Why Ruby?

Stemming from a Twitter conversation last month, I thought it would be a good idea to describe — in more than the 140 character bursts that Twitter allows — why we at the Scholars’ Lab often promote Ruby, opposed to one of the other 4 or 5 languages we develop with. This isn’t an attempt to declare one language “the best,” but is meant to lay out some of the fundamental reasons why we use Ruby in the context of our digital humanities work and why we think it’s a nice language to suggest to folks just starting to program.

Qualification

People often think of the Scholars’ Lab as a Ruby (and Rails) shop, which isn’t quite the case. We code in many different languages. In the past several moths, we have written The Mind is a Metaphor in Ruby (with Rails). For Better For Verse uses WordPress with TEI and JSP, and some more recent work on a William Faulkner audio archive employs Cocoon with Solr. In addition to those collaborations with UVa faculty, we’ve been writing plugins for Omeka (and dusting off our PHP skills) and have created a discovery interface for our GIS infrastructure in Ruby (with Sinatra). If you analyze the technologies we currently deploy, it turns out we use Cocoon + Solr more than anything else, though we’re starting to move away from that particular stack as our approach for tool development.

The Scholars’ Lab has a lot of experience with all types of languages, and depending on the circumstances, we choose different tools to accomplish any given task. However, after quite a bit of time helping people get started on a programming path, I’ve come to appreciate some of the features Ruby provides in getting new programmers up to speed.

Learning Curve

Every language has a learning curve. However, once you get the hang of some of the basics of computer languages (flow control, data structures, objects, etc.), the biggest differences come from syntax. All web languages make certain programming exercises easy, and once you buy in to the way in which that language handles programming constructs, moving between languages for experienced programmers becomes a simpler exercise in exploring syntax and built-in functionality.

As one of my computer science professors posited, generally the first language you learn governs the way you code until something significantly better comes along. For a lot of folks getting in to programming for the first time, this usually means either taking a class or finding someone to show you the basics. If you’re in higher education, this has typically meant the tool of choice is PHP. However, having seen the look of bewilderment on the faces of enough graduate students and faculty members as I attempt to explain the difference between sprintf and printf (printf returns the length of the formatted String and sprintf returns the formatted String), I’ve come to believe that the syntax of a programming language (and it’s readability) is an exceptionally important part of a language, especially when teaching basics of software construction.

Method Chaining

Without getting into how the PHP and Ruby duck-type primitive data types and data structures, one big difference in syntax between the two is how one combines multiple method calls together. Ruby uses method chaining for objects whereas PHP uses “bolted-on” functions (think of these as order-of-operations from your high school Algebra class). Let’s look at this brief example of addressing and sorting an associative array in PHP and its equivalent in Ruby:

$projects = array("solr" => 4, "php" => 1, "rails" => 2, "jsp" => 3);
$keys = array_keys($projects);
sort($keys);
$sorted = array_slice($keys, 0, 3);

If you’re a little more advanced, you might refactor (rewrite) the code to look more like this (methods anonymously “bolted-on” to one another):

$projects = array("solr" => 4, "php" =>  1, "rails" => 2, "jsp" => 3);
$sorted = array_slice(sort(array_keys($projects)), 0, 3);

Now, the same examples in Ruby syntax:

projects = {"solr" => 4, "php" =>  1, "rails" => 2, "jsp" => 3}
sorted = projects.keys.sort.slice(0,3)

Or, even more concisely:

projects = {"solr" => 4, "php" =>  1, "rails" => 2, "jsp" => 3}
sorted = projects.keys.sort[0..3]

I’ve found that Ruby’s method chaining syntax makes more sense to new programmers than the more mathematical “bolted-on” syntax.

Blocks

Ruby has a neat construct that you use all over the place to create anonymous functions (a technical term for creating specific functionality without defining a new function to define the action). Let’s take a function to sort an array of projects. First, in PHP:


function sort_projects_by_count($a, $b)
{
    if($a -> counts == $b -> counts)
    {
        return 0;
    }
    return($a -> counts > $b -> counts) ? +1 : -1;
}

usort($projects, "sort_projects_by_count");

And the same thing in Ruby:

projects.sort do |a, b|
    a.counts <=> b.counts
end

Ok, so this is a bit of an unfair comparison, but here is an analogous version of the Ruby code in PHP:

usort($projects, create_function($a, $b, 'if($a->counts == $b->counts){
    return 0;}return ($a->counts > $b->counts ? +1 : -1));

No matter how you slice it, Ruby syntax just feels more human. Even if you don’t know exactly what’s going on, looking up one operator in the Ruby syntax as opposed to following the logic flow and determining what “? +1 : -1″ means (it’s shorthand for an if-then statement) makes the act of reading code much easier.

Monkeypatching

If you’re not familiar with the term, “monkeypatching” is what programmers call changing or extending a base class (like an array or string object) to add functionality or change the way it works. Let’s say you really need to be able to test a string to see if it looks like an integer, you could create a monkeypatch along these lines:

class String
    def is_int?
        self =~ /^[-+]?[0-9]*$/
    end
end

This code snip extends the String class and uses a Regular Expression (regex) to test if a given String is an integer (number) by simply calling “is_int?” (notice the question mark at the end of the definition; this is used for methods that return a Boolean value). That’s a little advanced, but it does show off a very useful piece of functionality of the language that allows you to do a better job dealing with a duck-typed language.

Frameworks

Many people when talking about Ruby associate its use in web development with the Rails MVC framework. Just as PHP has Zend, CodeIgniter, CakePHP, symfony, etc., Ruby has Rails, Merb, Sinatra, Camping, and many more. Rails is the 900-pound gorilla of Ruby frameworks, and has a lot of nice features to get new applications off the ground quickly and some really great online guides to setting things up (I frequent RailsGuides). Since we often suggest Rails to our collaborators I’ll focus on this framework, but there are several other frameworks out there to choose from.

Think of Ruby (or PHP for that matter) as a pile of building materials: you can to build anything you want if you know how to put everything together. Rails, on the other hand, is like a prefab house where workers pour a foundation, set the house up, and then leave you to add the drywall, siding, windows, and roof. If you need a new component for your prefab house, a sales representative is standing by to help immediately ship you what you need.

Generators

To extend the previous metaphor, generators are like sales representatives that allow you to place orders for new aspects of your site.  To create all the erb templates (the default templeting language for Rails), controller methods, model, database migrations, routes, and tests for a new project with a single line, you might run something like the following:

script/generate scaffold topic title:string description:text

I moved away from using scaffolding pretty quickly, but it does provide a nice starting point for new programmers to build interactions with data models.

Templates

The default templating engine in Rails is erb which provides a convenient method for generating views of  your models. One of erb’s most important features is the use of “partials,” pieces of code that are used in multiple views by calling the render method. I often replicated this behavior in PHP by calling an include somewhere in a view.

ActiveRecord

The key to database interactions in Rails is ActiveRecord. As an SQL expert, I have to admit this part of the Rails framework drove me a bit batty at first, but then again I’ve been writing SQL for over 10 years (my colleagues often roll their eyes when I start writing it with relational algebra nomenclature), so allowing a framework to abstract this particular piece took some getting used to. If you’re new to programming, though, this means you don’t have to learn SQL but instead can use Ruby-style syntax to interact with your database without necessarily needing to care what your RDBMS back-end happens to be.

Take this example of looking up a book review where you have both a “book” and “review” table. In PHP you would do something like this (this snip will only work with a MySQL connection but has some sub-selection stuff going on):

function query($sql)
{
    global $conn;
    return mysql_query($sql, $conn);
}

function recent_reviews($count)
{
    $query = query(sprintf('SELECT b.book_title, r.review_id, r.created_at, r.id, review_counter.review_total
        FROM reviews r, books b,
            (SELECT count(*) AS review_total FROM reviews) AS review_counter
        WHERE r.book_id = b.book_id
        ORDER BY created_at DESC
        LIMIT %d', $count);

    return $query
}

print_r(recent_reviews(5));

Now, for the ActiveRecord equivalent:

reviews = Review.find(:limit => 5, :order => "created_at DESC");
puts reviews.inspect

Because of the way in which ActiveRecord sets up its model associations, you’ll have access to the different name scopes to print out the same information, just in far less code. However, if you really want to, you can pass your SQL to get more granular control over the syntax:

reviews = Review.find_by_sql("SELECT b.book_title, r.review_id, r.created_at, r.id, review_counter.review_total
        FROM reviews r, books b,
            (SELECT count(*) AS review_total FROM reviews) AS review_counter
        WHERE r.book_id = b.book_id
        ORDER BY created_at DESC
        LIMIT ?", count);

One of the real beauties of the ActiveRecord methods is that as long as you’re using the generic ActiveRecord syntax, your data persistence layer can be pretty much any RDBMS and be changed with a couple lines in the configuration file. The trade-off however, is that you lose a few things and can make slightly more work for yourself than you might anticipate. One important caveat is that ActiveRecord doesn’t create foreign keys when you set up reference fields. This is actually by design as it’s using an object-oriented idiom (an object should validate the presence of another, without the underlying persistence layer enforcing any type of constraint), but I find myself adding these in to ensure that the RDBMS takes advantage of the pre-calculated indexes to improve overall performance.

I should also mention that I think the ActiveRecord model has some real limitations. As you develop your models, you will most like be tweaking its fields, which in turn requires new migrations, and you may forget which fields are actually in your models. There are plugins that help with this, but you do need to take additional steps to have this information placed somewhere convenient (I use a pre-commit git hook that calls the annotate gem to dynamically annotate my model schemas).

Security

You’ll notice in the last examples I was doing some funny stuff in both the PHP and Ruby examples to protect against SQL injection attacks. If you’re using the ActiveRecord methods of addressing objects, Rails will take care of this for you. If you’re using PHP, you’ll either need to do this yourself (sprintf is commonly used) or rely on a framework to parametrize your statements (you don’t want someone deleting everything in your database).

You also need to protect yourself from Cross-site Scripting Attacks (XSS) by escaping HTML from fields with dynamic content. erb has a helper function html_escape (h is the shorthand) which escapes this data. In Rails 3, this will change slightly and erb will automatically html_escape model output unless you explicitly tell it not to escape the field. One less thing to remember!

Testing

Testing is important, and I try to preach its virtues every chance I get. After finishing a project, the typical programmer won’t touch the code again until the application breaks. Good testing will save you (or the person that inherits the code) a lot of time discovering exactly what broke the application.

Let’s face it: testing is a pain in PHP, and I rarely see it done well. What I’ve really enjoyed about Ruby development is the fact that no matter how you code there is an appropriate testing framework available (I use rspec). There’s a strong emphasis on not only unit testing, but also integration and acceptance testing. There are also libraries that give you an idea of how well your code is tested, something I sorely missed from my Java coding endeavors. One other huge plus is that every gem on the rubygems.org site includes test-coverage metrics to give you an idea of how well the code you want to install is tested.

PHP also has testing frameworks with PHPUnit and PHPSpec being rather popular. I won’t say too much about the PHP testing frameworks other than to say that there are analogous frameworks for writing and running tests in PHP and Ruby. However, I’ve noticed a slightly more concerted effort to think through the inclusion of the tools in Ruby and their integration into the coding workflow than I’ve experienced with PHP. With the latter language, I’ve often fell in to the trap of writing the code, getting it to where I want it, and then, really as an afterthought, writing basic unit tests to get rather skimpy code coverage.

As a case in point, a mantra in Rails development is TATFT.

BryanL on TATFT from Bryan Liles on Vimeo.

Deployment

Two typical objections raised when contemplating Ruby development are that “Ruby doesn’t scale” and that server setup is a real pain. These are valid concerns, but as with many open source projects with a large number of fanatical supporters, the Ruby community has steadily made improvements in these areas. Actually, for the vast majority of our readership, these issues can be filed away in the solved category.

Scaling

I’ve always found objections to scaling a bit troubling. Scaling is one of those over-used terms that means different things to different people, but most of the “Rails doesn’t scale” comes from Twitter’s experience with the framework. They found, as they scaled horizontally (adding more servers) to handle loads of 11,000+ requests per second, that a bottleneck existed at the data persistence level as Rails doesn’t, by default, provide a mechanism to to address multiple databases. Twitter has since moved parts of their code base to Scala but has retained the majority of their code in Rails and has developed some rather ingenious messaging capabilities to talk to the appropriate abstraction layers that one needs in very large enterprise applications.

While Twitter shows that Rails is capable of scaling (with lots of work), quite honestly the likelihood of any of our applications needing this level of engineering is slim. I will say, however, that there are relatively simple methods of scaling with your infrastructure should you start running into performance issues. We have, for example, an application written in pure Ruby on Rails deployed as a Tomcat application (the details of which are completely outside the scope of this article, but the application gets all the benefits of an Enterprise class Java environment with the ease of Rails development).

Server Support

The Ruby language is included in most (if not all) modern Linux package systems and makes installation a snap. The other “major” piece of software you’ll need is RubyGems (a package manager for Ruby libraries), which is also generally available as a managed package.

Note: There is a major change occurring with the development of Rails 3. Rails is moving from a system of system-wide gems to application-level gems with the introduction of GemBundler. This approach is a more stable method of deploying application requirements which not only allows you to ensure that application libraries are properly resolved, but also provide better granular control over which libraries are deployed in specific contexts.

There was a time where deploying Rails applications was a real bear. Then along came Phusion Passenger (aka mod_rails). This allows you to run Rails (actually any Rack-based application) through Apache and Nginx without any other port management, service process monitoring, file cleanup, etc. As long as Apache is running, so is your Rails app!

Community of Support

The Rails community is pretty great in getting folks off the ground. As with any technology there are a fair number of curmudgeons, but leaders in the community as quick to remind people to be nice (see Yahuda Katz’s The Blind Men and the Elephant: A Story of Noobs). There are several corporations backing Rails development (EngineYard is a big one) and when the Google Summer of Code for Rails program wasn’t continued, the Rails community was able to raise $100,000 in three days to support a Ruby Summer of Code.

There are a number of really good podcasts (Ruby5 and The Ruby Show are good), vodcasts (Railscasts), tutorial sites (ACSIIcasts), blogs (RailsDispatch, ThoughtBot, EngineYard), open source projects (lots on github), open source books (Rails Tutorials), and some really good reference books from The Pragmatic Programmers. There’s even some humor…

Summary

As much as it drives language purists crazy when I say it, there’s nothing that you can do in Ruby that you can’t replicate in PHP (and vice-versa). In my experience, getting people set up with a functioning web application is far easier with Ruby than it is with PHP (and less prone to spaghetti code), and the deployment options make Ruby (plus a framework) a great starting point for new programmers to get their feet wet with web application development (check out Heroku). If you’re an experienced developer, does it make sense to drop PHP and rewrite your code base? Absolutely not. However, at some point, you will be faced with the prospect of needing to migrate a legacy application where Ruby may make a lot of sense. As someone who has spent quite a bit of time de-tangling spaghetti code from Perl CGI and mixed HTML and PHP pages, I’m hoping that the people that will eventually be migrating my Ruby code will not need to perform the level of coding archaeology we’ve needed to perform.

Like most things in life, choosing the correct tool for the job needs some careful consideration and planning. Ruby makes a lot of sense for getting applications off the ground quickly and reinforcing good practices like testing, code separation, and readability that I find important in forming new digital humanities programmers. Web development over the last decade has become exceptionally complex (AJAX, web services, web standards, multiple browsers, etc.) and the real hope is that by using Ruby and Rails as an approach, people will be inspired to continue down a development path to both enrich their own scholarship and impact the larger digital humanities community without becoming frustrated by syntax. This is a bit of an experiment which we and other digital humanities shops are undertaking, and in which we’re inviting everyone to participate. No matter the language, we should all be engaged in teaching best practices in project design and management, in software development techniques, in the construction of usable and elegant interfaces, and in the application of these things to humanities scholarship, through which everyone wins!

Other Resources

Wayne Graham is head of the Scholars' Lab Research and Development team. He holds an MA in history from the College of William and Mary and his BA in history from the Virginia Military Institute. Before joining the Scholars' Lab in 2009, he worked at the Colonial Williamsburg Foundation's Department of Historical Research, then as…

2 Comments

  1. $sorted = array_slice(sort(array_keys($projects)), 0, 3);

    This doesn’t work in PHP. sort returns a bool and NOT the sorted array.

    • Hi Greg,

      Yes, you’re correct, thanks for pointing that out. After looking at it again, that’s also probably not the most useful example of combining functions in a procedural language either to do something comparable to a object-oriented language. I know I try to go to extraordinary lengths not to write “clever” lines of code like that in PHP…especially when they don’t work :)

      Wayne

Pingbacks

The Rubyist Historian: Getting Started | History in the Digital

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

Archives