Simple Service Layer PHP Caching

There are a lot of plugin caching systems out there, but the right caching mechanism is often specific to each individual application. This is especially true in Ajax-style web applications where data can be highly dynamic. Sometimes it doesn't make sense to deploy a cache across your entire application, especially if your data is dynamic to the point where the overhead from cache misses could actually slow down overall throughput.

Maybe you only have one service that processes a deep hierarchical data structure. This usually involves a lot of recursion and iteration, and possibly a lot of database queries. Performance for this type of service is generally O(n²), and while you may not notice any significant lag with small structures, once you exceed a few thousand nested elements, things start to bog down quickly.

The simple solution for this is to implement a service-specific cache. To do this you need two things:
  1. You must have some way of caching the service results.
  2. You must have some mechanism for invalidating the cached results when data changes.
It turns out that caching the results is very simple. Consider the following function:

public function varToFile($filename, $var) {
    $filename = $_SERVER['DOCUMENT_ROOT'] . '/files/cache/' . $filename;
    $data = gzcompress(serialize($var), 9);
    $file = fopen($filename, 'w');
    if ($file === false) return false;
    $res = fwrite($file, $data);
    fclose($file);
    return $res !== false;
}

This function will basically store a PHP variable into a file using the PHP serialize function and compression to make the files smaller. The variable could be a very deeply nested associative array containing your hierarchical data structure for example. Reading the variable back out is equally easy:
public function varFromFile($filename) {
    $filename = $_SERVER['DOCUMENT_ROOT'] . '/files/cache/' . $filename;
    if (file_exists($filename)) {
        $data = unserialize(gzuncompress(file_get_contents($filename)));
        if ($data) return $data;
        else return false;
    } else return false;
}
All we need to do is pass in the same filename we used to store the variable, and we get the PHP variable back. So the only trick left is to assign unique filenames and have a way of associating them with the service parameters. One way to achieve this is with an MD5 hash.

So for example, if I want to build a hierarchical data structure from 5000 database records, I run a query to retrieve all 5000 records in an array, serialize this array to a PHP string, then take an MD5 hash of the string and use this as my cache filename. If any of the 5000 records is changed in any way, the MD5 hash should be different so I can use this as my cache hit/miss mechanism.

A simple (pseudo) example would be as follows:
$records = queryDb($query); // array of, say, 5000 records
$filename = md5(serialize($records)); // generate the hash key
$data = varFromFile($filename); // try to get the data from cache
if ($data !== false) {
    return $data; // cache hit
} else {
    $data = parseRecords($records); // cache miss: run the recursive algorithm
    varToFile($filename, $data); // store the result in cache
    return $data;
}

This worked quite well in my case and reduced the loading times for my service from 40+ seconds to under 2 seconds for a cache hit. The only problem is that the cached entries are not automatically cleaned up in any way. An external script or cron task could be used to clean old entries. Also, cache-miss requests would still take 40+ seconds. If this is an issue, you could pre-generate the cache in the background whenever the data is modified, rather than when it's first accessed.
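A cron-driven cleanup could be sketched as follows. This is a minimal illustration, not code from the service above: the function name, the flat cache directory layout, and the 7-day default age are all assumptions.

```php
<?php
// Hypothetical cleanup helper: delete cache files older than $maxAgeSeconds.
// Assumes all cache entries live as plain files in one flat directory.
function purgeStaleCache($dir, $maxAgeSeconds = 604800) {
    $removed = 0;
    $cutoff = time() - $maxAgeSeconds;
    foreach (glob(rtrim($dir, '/') . '/*') as $file) {
        // Only plain files are cache entries; skip any subdirectories.
        if (is_file($file) && filemtime($file) < $cutoff) {
            if (unlink($file)) {
                $removed++;
            }
        }
    }
    return $removed; // number of entries deleted
}
```

Saved as, say, purge_cache.php, this could be run daily from cron (e.g. `0 3 * * * php /path/to/purge_cache.php`) so the cache directory doesn't grow without bound.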
