PHP Tutorial – Convert PDF to HTML

Published Jul 14, 2021

Intro

If you are a PHP developer, you may occasionally need to convert a PDF into HTML, or to extract the text content of a PDF for indexing. Here at API2PDF, we have a PDF to HTML endpoint that makes a best-effort attempt to extract the text from a PDF and output it as an HTML document.

Our API takes your .pdf file and converts it to HTML. Just make sure your PDF is saved as a .pdf file and is accessible at a URL that our service can ingest, ideally on a file storage provider such as S3 or Azure Blob Storage. For example, see this: http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf. See the code sample below.
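Before sending a URL to the service, it can help to run a cheap local sanity check on it. The sketch below is only an illustration of the requirement described above: `looksLikePublicPdfUrl` is a hypothetical helper, not part of the Api2Pdf library, and it only checks the shape of the URL (http or https, path ending in .pdf). It does not confirm that the file is actually reachable.

```php
<?php
// Hypothetical helper (not part of the Api2Pdf library): a local sanity
// check on a PDF URL before handing it to a remote conversion service.
function looksLikePublicPdfUrl(string $url): bool
{
    // Must be a syntactically valid URL.
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }

    $parts  = parse_url($url);
    $scheme = strtolower($parts['scheme'] ?? '');
    $path   = $parts['path'] ?? '';

    // Only http(s) URLs can be fetched by a remote service.
    if ($scheme !== 'http' && $scheme !== 'https') {
        return false;
    }

    // The path (ignoring any query string) should end in .pdf.
    return strtolower(pathinfo($path, PATHINFO_EXTENSION)) === 'pdf';
}
```

Because only the path is inspected, a signed storage URL with a query string after the .pdf file name still passes the check.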

Convert PDF to HTML with PHP

Step 1) Install the PHP client library from GitHub: https://github.com/Api2Pdf/api2pdf.php

Step 2) Grab an API key from https://portal.api2pdf.com. It only takes 60 seconds.

Step 3) Use the sample code below and replace “YOUR-API-KEY” with the API key you acquired in step 2.

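// Load the Api2Pdf client library (adjust the path to where you installed it).
require_once 'your-own-directory/Api2Pdf.php';
// Create a client using the API key from step 2.
$apiClient = new Api2Pdf('YOUR-API-KEY');

// Convert the PDF at the given URL to HTML.
$result = $apiClient->libreOfficePdfToHtml('http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf');
// Print the file reference returned by the API.
echo $result->getFile();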

And that’s it! Modify the code as you see fit. Hopefully this saves you time and makes converting PDF files to HTML easy and painless for anyone writing PHP code.
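If the goal is indexing rather than display, the converted HTML usually needs to be reduced to plain text first. The following is a minimal sketch of one way to do that with PHP's built-in string functions. `htmlToIndexableText` is a hypothetical helper, not part of the Api2Pdf library, and it assumes you already have the converted HTML as a string.

```php
<?php
// Hypothetical helper (not part of the Api2Pdf library): reduce an HTML
// document to plain text suitable for a search index.
function htmlToIndexableText(string $html): string
{
    // Drop script and style blocks entirely, including their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1>#is', ' ', $html) ?? $html;

    // Put a space in front of every tag so adjacent blocks do not fuse into one word.
    $html = str_replace('<', ' <', $html);

    // Remove the remaining tags and decode entities such as &amp;.
    $text = html_entity_decode(strip_tags($html), ENT_QUOTES | ENT_HTML5, 'UTF-8');

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text) ?? $text);
}
```

Assuming `$result->getFile()` gives you a URL to the generated HTML (check the library README to confirm), you could fetch that URL with `file_get_contents` and pass the response body to this function. For messy or deeply nested markup, a real HTML parser such as `DOMDocument` would be more robust than regular expressions and `strip_tags`.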

See the full GitHub library

We have a full PHP client library for our API that does a lot more than just this. Check out everything the library can do here: https://github.com/Api2Pdf/api2pdf.php