C# / .NET Tutorial – Convert PDF to HTML
Intro
If you are a C# / .NET Core developer, you may occasionally need to convert a PDF into HTML, or to extract the text content of a PDF for indexing. Here at API2PDF, we have a PDF to HTML endpoint that makes a best-effort attempt to extract the text from a PDF and output an HTML document.
Our API takes your PDF and converts it to HTML. Just make sure the file is saved with a .pdf extension and is accessible at a URL that our service can fetch, ideally on a file storage provider like S3 or Azure Blob Storage. For example, see this: http://www.api2pdf.com/wp-content/uploads/2021/01/1a082b03-2bd6-4703-989d-0443a88e3b0f-4.pdf. See the code sample below.
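Since the service fetches the PDF by URL, it can help to sanity-check the link before sending a request. Here is a minimal sketch of that idea. It is not part of the Api2Pdf library: the names `PdfUrlCheck` and `LooksLikePublicPdfUrl` are made up for this example, and the check only looks at the shape of the URL. It cannot tell you whether the file is actually reachable from the outside.

```csharp
using System;

// Hypothetical helper for illustration; not part of the Api2Pdf library.
public static class PdfUrlCheck
{
    // Returns true when the input is an absolute http(s) URL whose path ends in ".pdf".
    // This inspects the URL only; it does not download or open the file.
    public static bool LooksLikePublicPdfUrl(string candidate)
    {
        if (!Uri.TryCreate(candidate, UriKind.Absolute, out var uri))
        {
            return false;
        }

        bool isHttp = uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps;

        // AbsolutePath excludes the query string, so signed URLs with
        // extra parameters after ".pdf" still pass.
        bool endsInPdf = uri.AbsolutePath.EndsWith(".pdf", StringComparison.OrdinalIgnoreCase);

        return isHttp && endsInPdf;
    }
}
```

Requiring a `.pdf` path is a deliberately strict choice that mirrors the advice above. If your storage provider serves PDFs from URLs without an extension, you would want to loosen that part.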
Convert PDF to HTML with C# / .NET Core
Step 1) Open the Package Manager Console and run this command:
Install-Package Api2Pdf -Version 2.0.0
Step 2) Grab an API key from https://portal.api2pdf.com. It only takes 60 seconds.
Step 3) Use the sample code below and replace "YOUR-API-KEY" with the API key you acquired in Step 2.
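// Create the client with your API key.
var a2pClient = new Api2Pdf("YOUR-API-KEY");
// Point the request at a PDF URL that the service can reach.
var request = new LibreFileConversionRequest
{
    Url = "https://link-to-your-pdf"
};
// Run the conversion and print the URL of the generated HTML file.
var apiResponse = a2pClient.LibreOffice.PdfToHtml(request);
Console.WriteLine(apiResponse.FileUrl);
// Keep the console window open until Enter is pressed.
Console.ReadLine();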
And that’s it! Modify the code as you see fit. Hopefully this saves you time and makes converting PDF files to HTML easy and painless when you are writing C# / .NET Core code.
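If your goal is indexing, as mentioned in the intro, the generated HTML still has to be reduced to plain text. The sketch below shows one rough way to do that with only the standard library. The `HtmlTextSketch` class and its `ToPlainText` method are hypothetical names for this example, not something the Api2Pdf library provides, and regex-based tag stripping is a simplification that a real HTML parser would handle more robustly.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the Api2Pdf library.
public static class HtmlTextSketch
{
    // Best-effort conversion of an HTML string into plain text for indexing.
    public static string ToPlainText(string html)
    {
        // Drop <script> and <style> elements together with their contents.
        string withoutCode = Regex.Replace(
            html,
            @"<(script|style)\b[^>]*>.*?</\1\s*>",
            " ",
            RegexOptions.IgnoreCase | RegexOptions.Singleline);

        // Replace every remaining tag with a space so neighboring words stay separated.
        string withoutTags = Regex.Replace(withoutCode, @"<[^>]+>", " ");

        // Decode entities such as &amp; and collapse runs of whitespace.
        string decoded = WebUtility.HtmlDecode(withoutTags);
        return Regex.Replace(decoded, @"\s+", " ").Trim();
    }
}
```

To use it, you would first download the converted file, for example with `HttpClient.GetStringAsync(apiResponse.FileUrl)`, assuming the returned URL points directly at the HTML output, and then pass the resulting string to `HtmlTextSketch.ToPlainText`.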
See the full GitHub library
We have a full .NET client library for our API that does a lot more than just this. Check out everything the library can do here: https://github.com/Api2Pdf/api2pdf.dotnet