Zend_Pdf not as useful as it sounds

So I started using the Zend Framework in a new PHP project as it seemed to be my one-stop shop for authentication, access control, PDF generation, database abstraction, etc. Well, it turns out many of these components are not as useful as they sound.

The Zend_Pdf component is described as:

The Zend_Pdf component is a PDF (Portable Document Format) manipulation engine. It can load, create, modify and save documents. Thus it can help any PHP application dynamically create PDF documents by modifying existing documents or generating new ones from scratch.

Sounds great! Let's use it to create a report containing some simple text, a few lists, a few long paragraphs, and perhaps a table or two. Umm, feeling lost? Maybe we should start with this tutorial describing all the basics. Here's a minimal Hello World example extracted from that guide:

<?php
//load Zend_Pdf if not using the auto Zend_Loader
require_once('Zend/Pdf.php');

//create PDF document instance
$pdf = new Zend_Pdf();

//create a new page
$page = $pdf->newPage(Zend_Pdf_Page::SIZE_A4);
$pdf->pages[] = $page;

//set the font
$page->setFont(Zend_Pdf_Font::fontWithName(Zend_Pdf_Font::FONT_HELVETICA), 20);

//write some text
$page->drawText('Hello world!', 100, 510);

//stream the pdf back to the client
$pdfData = $pdf->render();
header("Content-Disposition: inline; filename=result.pdf");
header("Content-Type: application/pdf");
echo $pdfData;
?>

Ok, that's not too bad. We just create a PDF document, add a page, set the font, write some text and dump it out. Easy enough. Although...notice how the text needs to be absolutely positioned. This means you need to know the precise [x,y] coordinates (in points, measured from the bottom-left corner of the page) of every string you write!

Yuk, but ok...how about writing some long paragraphs? Well...it turns out word-wrapping is not supported. You have to write a function that calculates the rendered width of a string from the font and font size, break the text into lines that fit the available width yourself, and then write another function that loops over those lines, making one drawText call per line and moving the y coordinate down by one line height each time (y is measured from the bottom of the page, so it decreases as you go down)...That's a lot of custom coding just to get word-wrapping working.
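To give an idea of how much plumbing that is, here is a rough sketch of the two helpers described above. The names wrapText, layoutLines and $fakeMeasure are made up for illustration and are not part of Zend_Pdf. The width measurement is passed in as a callable so the sketch runs on its own; the stand-in simply assumes every character is 6 points wide.

```php
<?php
/**
 * Split $text into lines no wider than $maxWidth.
 * $measure is a callable returning the rendered width of a string in points.
 */
function wrapText(string $text, float $maxWidth, callable $measure): array
{
    $lines = [];
    $current = '';
    foreach (preg_split('/\s+/', trim($text), -1, PREG_SPLIT_NO_EMPTY) as $word) {
        $candidate = $current === '' ? $word : $current . ' ' . $word;
        if ($current !== '' && $measure($candidate) > $maxWidth) {
            $lines[] = $current;   // current line is full, start a new one
            $current = $word;
        } else {
            $current = $candidate; // word fits (or is a lone over-long word)
        }
    }
    if ($current !== '') {
        $lines[] = $current;
    }
    return $lines;
}

/**
 * Pair each line with its baseline y coordinate, moving DOWN the page
 * (so y decreases by $lineHeight for every line).
 */
function layoutLines(array $lines, float $topY, float $lineHeight): array
{
    $placed = [];
    foreach ($lines as $i => $line) {
        $placed[] = ['text' => $line, 'y' => $topY - $i * $lineHeight];
    }
    return $placed;
}

// Stand-in measure (an assumption): every character is 6 points wide.
// A real measure would have to be derived from the font's glyph widths.
$fakeMeasure = function (string $s): float {
    return strlen($s) * 6.0;
};
```

Each entry returned by layoutLines would then be handed to $page->drawText($entry['text'], $x, $entry['y']). This is only the simplest greedy wrapping; it does not hyphenate or split a single word that is wider than the line.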

If you want to insert pictures and so forth, the functionality is there, although you'll need to manually set the absolute coordinates and size of every image. If you want to draw tables on the other hand, take out your calculator and get ready to do some vector maths, because you have to draw every line individually.
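For the table case, the arithmetic looks something like the sketch below. tableGridLines is a hypothetical helper of my own, not something Zend_Pdf provides; it only works out where the rules of a plain grid go, assuming fixed row heights and that you already know the column widths.

```php
<?php
/**
 * Return the line segments [x1, y1, x2, y2] needed to draw a table grid.
 * ($x, $topY) is the top-left corner; y decreases going down the page.
 */
function tableGridLines(float $x, float $topY, array $colWidths, int $rows, float $rowHeight): array
{
    $width = array_sum($colWidths);
    $height = $rows * $rowHeight;
    $segments = [];

    // Horizontal rules: one above each row plus one closing the bottom.
    for ($r = 0; $r <= $rows; $r++) {
        $y = $topY - $r * $rowHeight;
        $segments[] = [$x, $y, $x + $width, $y];
    }

    // Vertical rules: the left edge, then the right edge of every column.
    $cx = $x;
    $segments[] = [$cx, $topY, $cx, $topY - $height];
    foreach ($colWidths as $w) {
        $cx += $w;
        $segments[] = [$cx, $topY, $cx, $topY - $height];
    }
    return $segments;
}
```

Every segment then becomes one $page->drawLine($x1, $y1, $x2, $y2) call, and you still have to position the text of each cell yourself.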

This is NOT a productive tool. Can you imagine creating a report this way? It would take days to get it right. Then what happens when someone asks for a change, like adding a new table or changing the layout? It's not maintainable, not in the least. This poor chap started writing his own abstraction layer on top of Zend_Pdf to support tables and wrapping. Looks interesting, but again...that's a lot of work, and still far from achieving true flexibility.

Zend_Pdf reminds me of drawing images in QBasic in high school. It wasn't fun then and it's not fun now. I guess this may be of some use for creating high-precision documents that are unlikely to change. But then, why would you dynamically generate them in PHP in the first place?

A much better approach, it seems to me, is to convert HTML pages straight into PDF. This way you can create a report as if you were creating another web page using your existing PHP framework. You then just pass the URL of your page to a generic converter and you get back a streaming PDF. The results may not be 100% pixel-perfect, but I don't think that really matters.

There are a few libraries that do just this. In the PHP space I came across html2ps, dompdf, html2pdf and a few others. Of the three listed, html2ps seems to be the most useful (I like the simplicity of dompdf more, but it hasn't been updated for nearly three years and it does not support ordered lists). I'm currently experimenting with integrating html2ps into my Zend project as a generic service. I've had success in generating nice-looking PDFs so far, but performance is a little slow at the moment. Will post some more updates later.

Update: Seems like some caching may be very beneficial in the html2ps approach. Two simple ways of caching are to either store md5 hashes of the HTML content being parsed and only do the parsing if the content is new, or pass your own unique hash to the PDF generator function, which simply returns the existing file if that hash has been seen before. For example, you may pass the key of a record and its last modified time through md5 and use that as your PDF cache key, storing the files on the server. You can still serve the files under friendly names using the Content-Disposition header from the code above.
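A minimal sketch of the second variant (record key plus last-modified time) could look like this. pdfCachePath and getCachedPdf are names I made up, and $generate stands in for whatever actually produces the PDF bytes, e.g. a wrapper around the HTML converter; nothing here is html2ps API.

```php
<?php
/**
 * Build the cache file path for a record from its key and last-modified time.
 * A change to the record yields a new hash and therefore a new file.
 */
function pdfCachePath(string $cacheDir, string $recordKey, int $lastModified): string
{
    return rtrim($cacheDir, '/') . '/' . md5($recordKey . ':' . $lastModified) . '.pdf';
}

/**
 * Return the path to a cached PDF, calling $generate only on a cache miss.
 * $generate is a hypothetical callable returning the PDF bytes as a string.
 */
function getCachedPdf(string $cacheDir, string $recordKey, int $lastModified, callable $generate): string
{
    $path = pdfCachePath($cacheDir, $recordKey, $lastModified);
    if (!is_file($path)) {
        // Cache miss: run the slow conversion once and keep the result.
        file_put_contents($path, $generate(), LOCK_EX);
    }
    return $path;
}
```

The returned path can be sent to the browser with readfile() after the usual headers. Note that the old file for a modified record is never overwritten, it is simply orphaned, which is why some pruning is needed.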

This approach will work well if your data doesn't change frequently and if you have a small dataset; otherwise the initial PDF generation lag will still affect many users. In that case it may be worth considering pre-generating the PDFs, either as the data changes or as an off-peak batch job. You must also consider a strategy for pruning outdated reports, however...deleting old files based on creation timestamp would probably be the simplest solution.
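The pruning job could be as small as the sketch below. pruneOldPdfs is a hypothetical helper, and it uses the file modification time (filemtime) as a stand-in, since PHP has no portable way to read a true creation time.

```php
<?php
/**
 * Delete *.pdf files in $dir whose modification time is more than
 * $maxAgeSeconds in the past. Returns the number of files removed.
 * $now can be injected to make the function easy to test.
 */
function pruneOldPdfs(string $dir, int $maxAgeSeconds, ?int $now = null): int
{
    $now = $now ?? time();
    $removed = 0;
    foreach (glob(rtrim($dir, '/') . '/*.pdf') ?: [] as $file) {
        if (is_file($file) && $now - filemtime($file) > $maxAgeSeconds && unlink($file)) {
            $removed++;
        }
    }
    return $removed;
}
```

Run from a nightly cron script, something like pruneOldPdfs('/path/to/cache', 7 * 24 * 3600) would keep roughly a week of reports around; the path and the retention period are of course placeholders.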

Comments

  1. for easy table generation using zf check out this component:
    http://zendpdftable.sourceforge.net/

  2. Thank you so much for saving me the pain of learning and testing Zend_PDF only to find it not useful for what I need to do.

