Node.js / JavaScript Tutorial – Convert PDF to HTML
Intro
If you are a Node.js developer, you might have a niche requirement to convert a PDF into HTML, or to extract text content from a PDF for indexing purposes. Here at API2PDF, we have a PDF to HTML endpoint that makes a best effort to extract the text from a PDF and output an HTML document.
Our API takes your PDF and converts it to HTML. Just make sure the file is saved with a .pdf extension and is accessible at a URL that our service can ingest, ideally on a file storage provider like S3 or Azure Blob Storage. For example, see this: http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf. See the code sample below.
Convert PDF to HTML with Node.js / JavaScript
Step 1) Open a terminal in your project directory and run the command
npm install --save api2pdf
Step 2) Grab an API key from https://portal.api2pdf.com. It only takes 60 seconds.
Step 3) Use the sample code below and replace 'YOUR-API-KEY' with the API key you acquired in step 2.
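var Api2Pdf = require('api2pdf');
var a2pClient = new Api2Pdf('YOUR-API-KEY'); // replace with your own API key
var pdfUrl = 'http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf';
a2pClient.libreOfficePdfToHtml(pdfUrl)
  .then(function(result) { console.log(result); }) // log the API response
  .catch(function(err) { console.error(err); }); // log any request failure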
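The sample above only logs the result. If you want to do something with the converted file, you will probably want to pull the output URL out of that result and fail loudly when the conversion did not succeed. Below is a minimal sketch of that idea. The helper name getHtmlUrl is hypothetical, and the field names Success, FileUrl, and Error are assumptions about the shape of the result object, so log a real result first and adjust the names if yours differ.

```javascript
// Hypothetical helper: not part of the api2pdf library.
// ASSUMPTION: the resolved result looks roughly like
//   { Success: true, FileUrl: 'https://...', Error: null }
// Check a real logged result before relying on these field names.
function getHtmlUrl(result) {
  // Treat a missing result or an explicit Success === false as a failure.
  if (!result || result.Success === false) {
    var reason = (result && result.Error) || 'unknown error';
    throw new Error('PDF to HTML conversion failed: ' + reason);
  }
  // Guard against a result that carries no usable output URL.
  if (typeof result.FileUrl !== 'string' || result.FileUrl.length === 0) {
    throw new Error('Conversion result did not include a file URL');
  }
  return result.FileUrl;
}

// Possible usage with the client from step 3 (not executed here):
//   a2pClient.libreOfficePdfToHtml(pdfUrl)
//     .then(getHtmlUrl)
//     .then(function(htmlUrl) { console.log('HTML available at', htmlUrl); })
//     .catch(function(err) { console.error(err.message); });
```

Because the helper throws on a failed conversion, a single catch at the end of the promise chain handles both network errors and conversion errors.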
And that’s it! Modify the code as you see fit. Hopefully this saves you time and makes converting PDF files to HTML easy and painless in your Node.js / JavaScript code.
See the full GitHub library
We have a complete Node.js client library for our API that does a lot more than just this. Check out the full library capabilities here: https://github.com/Api2Pdf/api2pdf.node