"I See Dead Code"
I've recently been working on a PHP project that has grown to over 30,000 lines of code. It has also gone through numerous requirement changes, so various parts were redesigned and reimplemented at different times in an iterative prototype development cycle. Needless to say, this left behind some dead code, i.e. functions that were used in an earlier prototype but were commented out or left uncalled in later revisions.
Detecting these dead code blocks is not as easy as it may seem. The project mixes PHP, HTML and JS. It uses the Zend Framework as the MVC layer, YUI and jQuery for various AJAX-like functionality, and a custom RESTful XML service layer built on abstracted, parameterized named queries stored in external SQL files.
The NetBeans refactoring tools (I use NetBeans as my PHP IDE) aren't smart enough to understand my project structure and its mash-up of languages, design patterns and frameworks. So they can't tell me which views never get called, which services never get invoked, or which named queries never get executed. Finding all the dead code manually would be very tedious, but leaving it in would create maintenance nightmares, especially for future developers.
Thankfully, the project follows some standard conventions that make it possible to write a staged, automatic dead code detection script. I chose to write it in Java (my main language), and it has the following structure:
- First, identify all dead Zend Views and delete them manually.
- Next, identify all dead service functions and delete them manually.
- Finally, identify all dead SQL named queries and delete them manually.
The entire script can be found here, although it is specific to my project and won't be immediately useful without customization. The logic, however, is fairly simple and may help others in their refactoring efforts.
The script is broken up into three main static functions: findDeadViews, findDeadServices and findDeadSQL. I first run the script by invoking findDeadViews, which prints the names of all view files that are never referenced from any other view. It works as follows:
- It gets a list of all files in the views directory.
- It stores every view name as a key in a HashMap.
- It reads every view file line-by-line and searches each line for every view name. Each hit increments that view's reference counter in the HashMap.
- It prints any views with a reference count of 0. I then check these manually and delete them if they are indeed unused, making sure to also delete any related unused Controller actions.
Deleting a dead view can leave other views unreferenced, so I run the script again to catch those. Repeat until the list is empty, at which point all dead views and controllers should be purged.
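Here is a minimal Java sketch of the findDeadViews logic described above. It is not the actual script. It assumes views are .phtml files under a hypothetical application/views/scripts directory and that every view file name is unique. The class name DeadViewFinder and its helper methods are illustrative. It keeps the plain substring search described above, so a short view name that also appears inside a longer one will be over-counted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeadViewFinder {

    // Hypothetical view location and suffix (.phtml is the Zend Framework 1 default).
    static final String VIEWS_DIR = "application/views/scripts";
    static final String VIEW_SUFFIX = ".phtml";

    /** Reads every view script under dir, keyed by file name without the suffix. */
    static Map<String, List<String>> loadViews(Path dir) throws IOException {
        Map<String, List<String>> views = new HashMap<>();
        try (Stream<Path> paths = Files.walk(dir)) {
            List<Path> files = paths
                    .filter(p -> Files.isRegularFile(p) && p.toString().endsWith(VIEW_SUFFIX))
                    .collect(Collectors.toList());
            for (Path file : files) {
                String name = file.getFileName().toString();
                // Relies on unique view names: a duplicate name overwrites the earlier entry.
                views.put(name.substring(0, name.length() - VIEW_SUFFIX.length()),
                        Files.readAllLines(file));
            }
        }
        return views;
    }

    /** Counts, per view name, how many lines across all view files mention it. */
    static Map<String, Integer> countReferences(Map<String, List<String>> views) {
        Map<String, Integer> refCounts = new HashMap<>();
        List<String> names = new ArrayList<>(views.keySet());
        for (String name : names) {
            refCounts.put(name, 0);
        }
        for (List<String> lines : views.values()) {
            for (String line : lines) {
                for (String name : names) {
                    if (line.contains(name)) {
                        refCounts.put(name, refCounts.get(name) + 1);
                    }
                }
            }
        }
        return refCounts;
    }

    public static void main(String[] args) throws IOException {
        Path dir = Paths.get(args.length > 0 ? args[0] : VIEWS_DIR);
        countReferences(loadViews(dir)).forEach((name, count) -> {
            if (count == 0) {
                System.out.println("Possibly dead view: " + name);
            }
        });
    }
}
```

Pass the real views directory as the first argument. Re-run after each round of deletions until nothing is printed.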
A few notes about this:
- This finds references to views regardless of how they're called, e.g. via JS, via embedded PHP or via direct HTML links. It will not, however, find dynamic calls where the view name is built through concatenation or pulled from a DB table.
- By convention, all my views have unique names, which makes storing them as keys in a HashMap the simplest approach. If your views use common names shared across different Controller classes, you'll need to key by Controller/Action name. You'll also need additional logic to split that key back up during the line-by-line keyword search.
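For projects without unique view names, here is one hypothetical way to build a Controller/Action key and split it again during the search. It assumes the Zend Framework 1 layout of one sub-directory per controller under the scripts directory. It also assumes, purely for illustration, that a view is referenced either as a URL path like /user/edit or as quoted names in a url(array('controller' => ..., 'action' => ...)) call. The class ViewKeys and both methods are illustrative names, and the quoted-names check is a loose heuristic.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

public class ViewKeys {

    /**
     * Builds a "controller/action" key from a view script path such as
     * views/scripts/user/edit.phtml, assuming one sub-directory per controller.
     */
    static String viewKey(Path scriptsDir, Path viewFile) {
        Path rel = scriptsDir.relativize(viewFile);
        if (rel.getNameCount() != 2) {
            throw new IllegalArgumentException("Expected <controller>/<action>.<ext>: " + rel);
        }
        String controller = rel.getName(0).toString();
        String file = rel.getName(1).toString();
        int dot = file.lastIndexOf('.');
        String action = dot >= 0 ? file.substring(0, dot) : file;
        return controller + "/" + action;
    }

    /**
     * Splits the key back into controller and action and checks one line for
     * either hypothetical reference style: a URL path ("/user/edit") or both
     * names in single quotes on the same line.
     */
    static boolean references(String line, String key) {
        int slash = key.indexOf('/');
        String controller = key.substring(0, slash);
        String action = key.substring(slash + 1);
        if (line.contains(controller + "/" + action)) {
            return true;
        }
        return line.contains("'" + controller + "'") && line.contains("'" + action + "'");
    }

    public static void main(String[] args) {
        Path scripts = Paths.get("application/views/scripts");
        String key = viewKey(scripts, scripts.resolve("user").resolve("edit.phtml"));
        System.out.println(key);
        System.out.println(references("<a href=\"/user/edit\">Edit</a>", key));
        System.out.println(references("<a href=\"/admin/edit\">Edit</a>", key));
    }
}
```

The HashMap from the single-key version would then be keyed by viewKey(...), and the line scan would call references(...) for each key.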
The findDeadServices and findDeadSQL functions work in a very similar fashion and are specific to my project structure. The main thing to note here is the structure of findDeadServices:
- Take the root directory under which all service classes are stored and get a list of all the class files.
- For each class file, read line-by-line and extract all 'public function fooBar(...)' declarations, where 'fooBar' is the service name.
- Store these in a HashMap, then go through all the files in the project that can contain calls to the services and count the references.
- Print out all services with a reference count of 1 (i.e. declaration only).
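Below is a minimal Java sketch of the findDeadServices steps listed above. The service directory (application/services), the project root, the scanned suffixes (.php, .phtml, .js) and the class name DeadServiceFinder are all hypothetical. One deliberate deviation from a plain substring search: each service name is matched as a whole word, so a service called find is not counted inside findAll.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collection;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import java.util.stream.Stream;

public class DeadServiceFinder {

    // Matches declarations such as "public function fooBar(" and captures "fooBar".
    private static final Pattern DECLARATION =
            Pattern.compile("public\\s+function\\s+(\\w+)\\s*\\(");

    /** Extracts the names of all public functions declared in one class file. */
    static List<String> extractServiceNames(List<String> classLines) {
        List<String> names = new ArrayList<>();
        for (String line : classLines) {
            Matcher m = DECLARATION.matcher(line);
            while (m.find()) {
                names.add(m.group(1));
            }
        }
        return names;
    }

    /** Counts the lines, across all given files, that mention each name as a whole word. */
    static Map<String, Integer> countReferences(List<String> names, Collection<List<String>> files) {
        Map<String, Integer> refCounts = new HashMap<>();
        Map<String, Pattern> patterns = new HashMap<>();
        for (String name : names) {
            refCounts.put(name, 0);
            patterns.put(name, Pattern.compile("\\b" + Pattern.quote(name) + "\\b"));
        }
        for (List<String> lines : files) {
            for (String line : lines) {
                for (Map.Entry<String, Pattern> e : patterns.entrySet()) {
                    if (e.getValue().matcher(line).find()) {
                        refCounts.merge(e.getKey(), 1, Integer::sum);
                    }
                }
            }
        }
        return refCounts;
    }

    /** Reads every regular file under root whose name ends with one of the suffixes. */
    static Map<Path, List<String>> readFiles(Path root, String... suffixes) throws IOException {
        Map<Path, List<String>> contents = new HashMap<>();
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
            for (Path p : files) {
                for (String suffix : suffixes) {
                    if (p.toString().endsWith(suffix)) {
                        contents.put(p, Files.readAllLines(p));
                        break;
                    }
                }
            }
        }
        return contents;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical directory layout; pass the real paths as arguments.
        Path servicesDir = Paths.get(args.length > 0 ? args[0] : "application/services");
        Path projectRoot = Paths.get(args.length > 1 ? args[1] : ".");

        List<String> names = new ArrayList<>();
        for (List<String> classLines : readFiles(servicesDir, ".php").values()) {
            names.addAll(extractServiceNames(classLines));
        }
        Map<String, Integer> counts =
                countReferences(names, readFiles(projectRoot, ".php", ".phtml", ".js").values());
        // 1 = declaration only; 0 is possible if the services directory lies outside the root.
        counts.forEach((name, count) -> {
            if (count <= 1) {
                System.out.println("Possibly dead service: " + name);
            }
        });
    }
}
```

Magic methods such as __construct are never called by name, so expect them in the output and skip them when reviewing.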
Again, this relies on a coding convention that every service has a unique name across all service classes. This may not hold in other projects. In that case you may need to either concatenate the class name and method name into a single key, or use nested HashMaps (a top-level map keyed by class name containing child maps keyed by service name).
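Here is a rough Java sketch of the nested-map alternative. For illustration only, it assumes call sites name the class explicitly using PHP's static-call syntax (UserService::find(...)). Projects that call services through instances or URLs would need a different scanner. ServiceIndex and its methods are hypothetical names. Declarations are registered with a count of 0 and are not counted as references, so here a dead service is one whose count stays at 0.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ServiceIndex {

    private static final Pattern CLASS_DECL = Pattern.compile("\\bclass\\s+(\\w+)");
    private static final Pattern FUNC_DECL = Pattern.compile("public\\s+function\\s+(\\w+)\\s*\\(");
    // Hypothetical call-site style: the class is named explicitly, e.g. UserService::find(...).
    private static final Pattern STATIC_CALL = Pattern.compile("\\b(\\w+)::(\\w+)\\s*\\(");

    // Class name -> (service name -> reference count). TreeMap only keeps the output sorted.
    private final Map<String, Map<String, Integer>> byClass = new TreeMap<>();

    /** Registers each public function under the most recently declared class. */
    void addClassFile(List<String> lines) {
        String currentClass = null;
        for (String line : lines) {
            Matcher c = CLASS_DECL.matcher(line);
            if (c.find()) {
                currentClass = c.group(1);
            }
            Matcher f = FUNC_DECL.matcher(line);
            if (currentClass != null && f.find()) {
                byClass.computeIfAbsent(currentClass, k -> new TreeMap<>()).put(f.group(1), 0);
            }
        }
    }

    /** Counts every Class::service(...) call on the line that matches a known pair. */
    void scanLine(String line) {
        Matcher m = STATIC_CALL.matcher(line);
        while (m.find()) {
            Map<String, Integer> services = byClass.get(m.group(1));
            if (services != null && services.containsKey(m.group(2))) {
                services.merge(m.group(2), 1, Integer::sum);
            }
        }
    }

    /** Lists "Class::service" for every service that was never referenced. */
    List<String> unreferenced() {
        List<String> dead = new ArrayList<>();
        byClass.forEach((cls, services) -> services.forEach((name, count) -> {
            if (count == 0) {
                dead.add(cls + "::" + name);
            }
        }));
        return dead;
    }

    public static void main(String[] args) {
        ServiceIndex index = new ServiceIndex();
        index.addClassFile(Arrays.asList("class UserService {", "  public function find($id) {", "}"));
        index.addClassFile(Arrays.asList("class OrderService {", "  public function find($id) {",
                "  public function purge() {", "}"));
        index.scanLine("$u = UserService::find(1); $o = OrderService::find(2);");
        System.out.println(index.unreferenced());
    }
}
```

Both classes can now declare a service called find without their counts colliding. Only OrderService::purge is reported as unreferenced.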
After running this function a few times and deleting all dead services, I'm ready to move on to stage 3: finding and deleting dead named SQL queries. In total, the process took about an hour (including writing the script). It helped me identify and delete close to 5,000 lines of unused code, i.e. roughly 16% of my project source was accumulated trash... I should housekeep more frequently.
UPDATE: And while we're at it, here's a quick and simple regex for finding magic numbers (basically any number other than 0): [^a-zA-Z_][1-9][0-9]*
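Here is a small Java check of the regex above using java.util.regex. The only change to the original pattern is a capture group around the number. The original only excludes a preceding letter or underscore, and it consumes that preceding character. As a result, it misses a number at the very start of a line. It also reports the tail of a digit run inside an identifier, such as the 4 in base64_encode. The lookbehind variant is an alternative offered for comparison, not part of the original regex.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MagicNumbers {

    // The regex from the post, with a capture group added around the number itself.
    static final Pattern ORIGINAL = Pattern.compile("[^a-zA-Z_]([1-9][0-9]*)");

    // Alternative: a zero-width lookbehind that also rejects a preceding digit.
    static final Pattern LOOKBEHIND = Pattern.compile("(?<![a-zA-Z_0-9])([1-9][0-9]*)");

    /** Returns every number captured by the pattern in the given line. */
    static List<String> find(Pattern pattern, String line) {
        List<String> found = new ArrayList<>();
        Matcher m = pattern.matcher(line);
        while (m.find()) {
            found.add(m.group(1));
        }
        return found;
    }

    public static void main(String[] args) {
        String[] samples = {"$total = $price * 12;", "base64_encode($data, 16)", "10 items per page"};
        for (String s : samples) {
            System.out.println(s + " -> original " + find(ORIGINAL, s)
                    + ", variant " + find(LOOKBEHIND, s));
        }
    }
}
```

For the three sample lines, the original pattern finds [12], [4, 16] and [], while the variant finds [12], [16] and [10].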