Server-side HTML to PDF Converter

A common requirement of many document-based web-applications is the ability to generate a PDF document from dynamic content. There are two main methods for doing this:

  1. Use a raw PDF library such as Zend_PDF and write the complex logic needed to format a PDF from scratch;
  2. Use a library that can convert an HTML-formatted document to PDF.

Option number two is by far the easiest path, since your web framework already has all the libraries and modules required to render dynamic content into appropriate views. You simply create a new web page formatted the way you want your PDF to look, then pass its URL to a converter that renders the HTML to a PDF file.

There are many different libraries that do this. Some can be integrated directly into your web-app (depending on the language that you're using), while others are stand-alone command-line tools which can be invoked from any process.

The key features to look for when picking one of these HTML-to-PDF converters are:

  1. Level of support for standard CSS2/CSS3 and HTML4/5 - you want as much freedom as possible to format your document using standard HTML and have that reflected precisely in the generated PDF;
  2. Flexibility for customizing page dimensions, headers, footers, numbering, etc. - you typically can't do this with HTML alone, so good converters provide special mark-up or mechanisms to implement these;
  3. Speed of generating the PDF - if it takes more than 5 seconds for a 3-page PDF, you need to implement scheduled batch-processing, caching mechanisms, etc.

We previously integrated html2ps into one of our PHP web applications. This library was the best option for a native PHP solution in terms of items 1 and 2 above. Many of the other PHP converters supported only a limited subset of HTML and CSS and provided little support for custom page headers. The library was great, but slow - over 40 seconds for a 4-page document with mostly simple text and a few tables. It also consumed a lot of memory (256MB+ for a single request). We got it working with batch-processing and caching, but the code was not pretty, and the solution seemed far from ideal.
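
To give an idea of what the caching part of such a workaround involves, here is a minimal sketch in Python (not the PHP code we actually ran). It assumes each document has some version string that changes whenever its content changes; the names `cache_key`, `get_pdf` and the `render` callback are hypothetical and stand in for whatever slow converter is being wrapped.

```python
import hashlib
from pathlib import Path
from typing import Callable


def cache_key(url: str, version: str) -> str:
    """Derive a stable file name from the page URL and a content version."""
    return hashlib.sha256(f"{url}\n{version}".encode("utf-8")).hexdigest()


def get_pdf(url: str, version: str, cache_dir: Path,
            render: Callable[[str], bytes]) -> bytes:
    """Return PDF bytes for url, calling the slow render() only on a cache miss.

    `render` is a hypothetical callback that converts the page at `url`
    into PDF bytes; any converter could be plugged in here.
    """
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(url, version)}.pdf"
    if path.exists():
        # Cache hit: skip the expensive conversion entirely.
        return path.read_bytes()
    data = render(url)
    # Write to a temporary name first, then rename, so a concurrent
    # reader never sees a half-written PDF.
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    tmp.replace(path)
    return data
```

A scheduled batch job can call `get_pdf` for every document ahead of time, so that a user's request only ever hits the cache. The extra moving parts (invalidation, scheduling, disk clean-up) are exactly what made this approach feel heavier than it should.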

We recently switched to wkhtmltopdf - a stand-alone command-line utility that does an excellent job of rendering complex HTML to PDF, and does it fast! I haven't looked at the internals much, but according to the author's website, it uses the WebKit rendering engine that ships with the Qt framework. We can now generate the same PDFs in under 2 seconds, meaning we can do it in real-time, on client-click, without the need for nightly cron-jobs.
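
Because wkhtmltopdf is just a command-line tool, calling it from a web application amounts to spawning a process. The following is a rough Python sketch of that idea, not our production code: the helper names `build_command` and `render_pdf`, the page size, the margins and the footer text are all illustrative assumptions. The flags used are standard wkhtmltopdf options, although the footer options may not be available in builds compiled without the patched Qt.

```python
import subprocess
from typing import List


def build_command(url: str, output_path: str,
                  page_size: str = "A4", margin: str = "20mm") -> List[str]:
    """Build the wkhtmltopdf argument list for a single page URL."""
    return [
        "wkhtmltopdf",
        "--quiet",                      # suppress progress output
        "--page-size", page_size,
        "--margin-top", margin,
        "--margin-bottom", margin,
        # [page] and [topage] are placeholders that wkhtmltopdf replaces
        # with the current page number and the total page count.
        "--footer-center", "Page [page] of [topage]",
        url,
        output_path,
    ]


def render_pdf(url: str, output_path: str, timeout_seconds: float = 30.0) -> None:
    """Run wkhtmltopdf; raises if it exits non-zero or exceeds the timeout."""
    subprocess.run(build_command(url, output_path),
                   check=True, timeout=timeout_seconds)
```

With the wkhtmltopdf binary on the PATH, a call such as `render_pdf("http://localhost/report/42", "report-42.pdf")` (a made-up URL) would write the PDF to disk, ready to be streamed back to the browser. Passing the arguments as a list rather than a shell string avoids quoting problems when the URL contains query parameters.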

If you need this sort of thing, I recommend checking it out.

