Tuesday, 17 May 2011

Software quality control at GBIF

We've not only set up Hadoop here at GBIF but also introduced a few other new things. With a growing software development team, we felt the need to put some control measures in place to safeguard the quality of our software and to make the development process more transparent, both for us at GBIF and hopefully for other interested parties as well.

GBIF projects have always been open source and hosted on their Google Code sites (e.g. GBIF Occurrencestore or the IPT), so in theory anyone could always check and review every commit. We have now also set up a Jenkins server that does continuous integration for us: every time a change is made to one of our projects, the project is checked out and a full build is run, including all tests, code quality measurements (I'll get back to those later), web site creation (e.g. Javadocs) and publishing of the results to our Maven repository.

This is the first step in our new process. Every commit is checked this way, and we've had great success improving the stability of our builds as a result. Our Jenkins server is publicly visible at http://hudson.gbif.org (background on the Hudson name in the URL can be found on Wikipedia).

As part of the process Jenkins also calls a code quality server called Sonar. Our Sonar instance is public as well. Take a look at the metrics for the IPT, for example. You'll see a lot of information about our code, good and bad. We're not yet using this information extensively, but we are looking into which metrics are useful so that we can incorporate them more closely into our development process. One example is a set of coding conventions to make the code consistent and easier for everybody to understand.

Once the build has finished the Sonar stage, its results are pushed to our Maven repository (which is running a Nexus server). That means we now have up-to-date SNAPSHOT builds of all our projects available, for use in our projects and yours.
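
To make the order of the stages concrete, here is a minimal Python sketch. It is only an illustration and not our actual Jenkins configuration: the stage names and the run_pipeline helper are made up for this example, and it assumes that a failing stage stops everything after it, so a broken build never reaches the Maven repository.

```python
from typing import Callable, List, Tuple

# A stage is a (name, action) pair; the action returns True on success.
# All names here are hypothetical and only mirror the order described above.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run the stages in order and stop at the first one that fails.

    Returns the names of the stages that completed successfully.
    """
    completed: List[str] = []
    for name, action in stages:
        if not action():
            break  # assumption: later stages are skipped after a failure
        completed.append(name)
    return completed


def example_stages(tests_pass: bool) -> List[Stage]:
    """Build a toy pipeline whose build-and-test stage can be made to fail."""
    return [
        ("checkout", lambda: True),
        ("build_and_test", lambda: tests_pass),
        ("sonar_analysis", lambda: True),
        ("publish_snapshot", lambda: True),
    ]


if __name__ == "__main__":
    print(run_pipeline(example_stages(tests_pass=True)))
    print(run_pipeline(example_stages(tests_pass=False)))
```

With passing tests every stage runs and the snapshot is published; with failing tests only the checkout completes and nothing is pushed.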

At the moment we don't have many code contributions to our projects from outside GBIF, but we hope that by making our development process more transparent we can encourage others to take a look as well.

We're always open to suggestions, questions and comments about our code base.


  1. I live in the Ruby on Rails world, where testing is practically a religion (including TDD - which requires writing a failing test before writing any code). It's clear that you guys have an awesome and mature development methodology, but only 26 tests? Did I miss something?

  2. I assume you're talking about the IPT. In that case you didn't miss anything. There are not as many tests as we would like, but if you take a look at the currently open issues (http://code.google.com/p/gbif-providertoolkit/issues/list) you'll see that we have issues open about adding tests for the next release, and they are all marked as critical.

  3. Well, it's 26 unit tests for the IPT, that's correct. But that project is probably the worst example, with the smallest number of tests. Also, increasing unit test coverage is the primary objective for the next release: http://code.google.com/p/gbif-providertoolkit/issues/list?q=Milestone%3DRelease2.0.3

    The IPT is also based on other libraries, in particular the dwc archive reader (which we use a lot for reading these archives). And that library is much better covered: http://sonar.gbif.org/dashboard/index/3618

  4. In comparison, for antcat.org (which I'm working on for Stan Blum here at CAS), there are 1097 unit tests and 50 integration tests for what I imagine is a much less complicated project (3416 LOC + 8018 test LOC). Like I said, we're rather religious.

    I personally hate writing tests - after the fact. There's nothing more tedious or boring. But writing tests first is a different story. I (we) think it improves our design. Some people say that TDD isn't even about testing - it's about making sure you only write the code you need, and that it has a usable interface.

    I could go on and on, but I won't. :) I'm looking forward to exploring your codebase.