Code Reviews and the Digital Humanities

The following was a response I made in an email exchange with Tom Elliott of the Pleiades Project and Bethany Nowviskie. Our conversation was prompted by Tom’s inquiry on planning, budgeting for, and conducting a code review as part of a grant-funded project. What follows is a slightly modified (and expanded) version of that email conversation.

Testing and code review have been on my mind a lot lately, as our shop has been shifting its focus from boutique, one-off projects to building upon frameworks maintained by other organizations. As these code bases continue to grow, we need to ensure that subtle changes to the core functionality of the underlying systems do not propagate into bugs in our code. We also need a way to handle such bugs quickly and efficiently when they do arise. This was especially reinforced by two recent projects our group undertook to migrate nearly decade-old software onto new servers.

If you ask anyone in the office, they will most likely roll their eyes when I start beating the testing drum. Tests are great tools not only for generating pretty green and red bar charts, but also for documenting the programmer's intention in writing the code and for zeroing in on bugs where they occur, without weeks of hunting. However, testing is only one of the tools in the chest for writing solid code, sans bugs. In fact, there are a lot of sophisticated, freely available, automated tools that help programmers of all skill levels not only write more consistent code, but also zero in on potential performance issues and just plain smelly code (the kind they obviously wrote just to get things running and fully intended to go back and fix later).
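To make the point about tests documenting intention a little more concrete, here is a minimal sketch in Python (chosen only for illustration; the same idea applies in PHP, Java, or Ruby). The function `normalize_title` and its behavior are hypothetical and do not come from any of our projects. Each test name states one thing the programmer meant the code to do.

```python
import unittest


def normalize_title(raw):
    """Hypothetical helper: collapse runs of whitespace and drop a trailing period."""
    collapsed = " ".join(raw.split())
    return collapsed.rstrip(".")


class NormalizeTitleTest(unittest.TestCase):
    """Each test records one intention behind normalize_title."""

    def test_collapses_internal_whitespace(self):
        self.assertEqual(normalize_title("Notes  on\tVirginia"), "Notes on Virginia")

    def test_strips_trailing_period(self):
        self.assertEqual(normalize_title("Notes on Virginia."), "Notes on Virginia")

    def test_leaves_clean_titles_alone(self):
        self.assertEqual(normalize_title("Notes on Virginia"), "Notes on Virginia")


def run_tests():
    """Run the test case above and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTitleTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

If a later change to the helper breaks one of these expectations, the failing test name points at the broken intention directly, which is the "zeroing in" described above.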

Over the years, a whole family of tools has grown up around this problem: tools that measure code complexity (like PMD, PHPMD, and flog), code dependency analyzers (JDepend and PHP_Depend), copy/paste detectors (in PMD, flay, and phpcpd), and coding-standards checkers (a la PHP_CodeSniffer and rails_best_practices). Together with unit and integration tests (in whatever style you choose) and code coverage reports (from a tool like rcov) that show which lines were actually executed, they go a long way toward reducing the number of bugs in code. These tools are really a pre-emptive step in writing stronger, more elegant, and ultimately more sustainable code, all before one gets to the point of performing a human code review.
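For readers who have not used a complexity checker, the following is a rough sketch of the kind of measurement such tools make. It is not how PMD, PHPMD, or flog are implemented; it is a simplified, McCabe-style count written in Python with the standard `ast` module, and the sample source and function names are made up for illustration.

```python
import ast

# Node types that add a decision point (a simplified, McCabe-style list).
BRANCH_NODES = (
    ast.If, ast.For, ast.While, ast.ExceptHandler, ast.IfExp, ast.comprehension,
)


def cyclomatic_complexity(func_node):
    """Approximate complexity: 1 plus the number of branch points in the function.

    Nested functions are counted as part of the enclosing function.
    """
    complexity = 1
    for node in ast.walk(func_node):
        if isinstance(node, BRANCH_NODES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # "a and b and c" adds one extra path per additional operand.
            complexity += len(node.values) - 1
    return complexity


def complexity_report(source):
    """Map each function name in the source text to its approximate complexity."""
    tree = ast.parse(source)
    return {
        node.name: cyclomatic_complexity(node)
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }


# Made-up sample code to analyze.
SAMPLE = '''
def simple(x):
    return x + 1

def tangled(items, flag):
    total = 0
    for item in items:
        if item > 0 and flag:
            total += item
        elif item < -10:
            total -= 1
    return total
'''

if __name__ == "__main__":
    for name, score in complexity_report(SAMPLE).items():
        print(f"{name}: {score}")
```

A function with a high score is not necessarily wrong, but it is a reasonable candidate for a closer look, which is how these numbers feed into a human review.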

While I don’t need to be building software per se, I have started experimenting with the Hudson continuous integration server as a dashboard to get a quick snapshot of how these different metric tests all play together in the code that our team writes. It is no longer good enough to simply have functioning code; we need the code to pass certain thresholds of quality and sustainability before we can release. Where we find issues in the code, like low test coverage, high cyclomatic complexity, lots of copy-n-pasted code, or high volatility in dependency scans, we can sit down and perform a rather focused mini code review (resembling the pair-programming idiom) on that section of code to refactor toward a better solution or approach. To this end, we’re currently working on a set of baseline testing and reporting tools for our projects. Currently, we have Ant scripts for our PHP and Java projects, and a gem bundle for Rails and Sinatra projects.
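The idea of code passing "certain thresholds" before release can be sketched as a small quality gate. This is only an illustration in Python, not our Ant scripts or gem bundle and not a Hudson feature; the metric names and limits below are hypothetical, and in practice the numbers would come from the reports the tools above produce.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str             # key expected in the metrics dictionary
    limit: float            # boundary value for the metric
    higher_is_better: bool  # True: value must be >= limit; False: value must be <= limit


# Hypothetical limits, chosen only for illustration.
THRESHOLDS = [
    Threshold("line_coverage_percent", 80.0, higher_is_better=True),
    Threshold("max_cyclomatic_complexity", 10.0, higher_is_better=False),
    Threshold("duplicated_lines_percent", 5.0, higher_is_better=False),
]


def quality_gate(metrics, thresholds=THRESHOLDS):
    """Return a list of readable failures; an empty list means the gate passes."""
    failures = []
    for t in thresholds:
        if t.metric not in metrics:
            failures.append(f"{t.metric}: no value reported")
            continue
        value = metrics[t.metric]
        passed = value >= t.limit if t.higher_is_better else value <= t.limit
        if not passed:
            direction = "at least" if t.higher_is_better else "at most"
            failures.append(f"{t.metric}: {value} (expected {direction} {t.limit})")
    return failures


if __name__ == "__main__":
    # Made-up numbers standing in for a real build's reports.
    build = {
        "line_coverage_percent": 62.5,
        "max_cyclomatic_complexity": 14,
        "duplicated_lines_percent": 2.0,
    }
    for failure in quality_gate(build):
        print("REVIEW:", failure)
```

Each failure names the metric that fell short, which gives the focused mini code review a concrete place to start.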

While we take this approach in the Scholars’ Lab, we are wondering whether others out there have opinions or experiences to share about code review during development. If you do, leave a comment, write a post, or tweet at us (@scholarslab, @nowviskie, @wayne_graham), and at @paregorios, who started the conversation in the first place. We’d love to hear about your best practices (and even horror stories) and your philosophy on what constitutes good software and useful code reviewing, including whether you think current trends in open source development constitute a good-enough review for DH projects.


Wayne Graham is head of the Scholars' Lab Research and Development team. He holds an MA in history from the College of William and Mary and his BA in history from the Virginia Military Institute. Before joining the Scholars' Lab in 2009, he worked at the Colonial Williamsburg Foundation's Department of Historical Research, then as…


  1. Wayne, thanks for posting this. Count me in for reciprocal code review, and also in for trying to minimize formal review requirements for open source projects. Those of us working on open source understand that, if we’re successful, we get a lot of code review for free. We get patches, we get forked on Github, other developers are reading the code and suggesting meaningful improvements, often with tests and documentation. A high profile open source project like Anthologize will probably get more attention and review than it could possibly pay for or coordinate in any other way. In addition to formal reviews, we could try to help funders better appreciate this benefit of open source and open development.

  2. Hey, Wayne — Tom may have more to say, but the only thing I’d want to bring in from our email conversation was my suggestion that — if funding agencies are getting serious about requiring code reviews or other forms of quality assurance and post-project peer review — that a group like centerNet would be perfectly positioned to facilitate networking and cooperative relationships among centers and projects. A you-scratch-my-back/I’ll-scratch-yours model of reciprocal code review might be a way to get this going on a shoestring.