Wednesday 26 November 2014

Upgrading our cluster from CDH4 to CDH5

A little over a year ago we wrote about upgrading from CDH3 to CDH4 and now the time had come to upgrade from CDH4 to CDH5. The short version: upgrading the cluster itself was easy, but getting our applications to work with the new classpaths, especially MapReduce v2 (YARN), was painful.

The Cluster

Our cluster has grown since the last upgrade (now 12 slaves and 3 masters), and we no longer had the luxury of splitting the machines to build a new cluster from scratch. So this was an in-place upgrade, using CDH Manager.

Upgrade CDH Manager

The first step was upgrading to CDH Manager 5.2 (from our existing 4.8). The Cloudera documentation is excellent so I don't need to repeat it here. What we did find was that the management service now requests significantly more RAM for it's monitoring services (minimum "happy" config of 14GB), to the point where our existing masters were overwhelmed. As a stop gap we've added a 4th old machine to the "masters" group, used exclusively for the management service. In the longer term we'll replace the 4 masters with 3 new machines that have enough resources. 

Upgrade Cluster Members

Again the Cloudera documentation is excellent but I'll just add a bit. The upgrade process will now ask if a JAVA jdk should be installed (an improvement over the old behaviour of just installing one anyway). That means we could finally remove the Oracle JDK 6 rpms that have been lying around on the machines. For some reason the Host Inspector still complains about OpenJDK 7 vs Oracle 7 but we've happily been running on OpenJDK 7 since early 2014, and so far so good with CDH5 as well. After the upgrade wizard finished we had to tweak memory settings throughout the cluster, including setting the "Memory Overcommit Validation Threshold" to 0.99, up from its (very conservative) default of 0.8. Cloudera has another nice blog post on figuring out memory settings for YARN. Additionally Hue's configuration required some attention because after the upgrade it had forgotten where Zookeeper and the HBase Thrift server were. All in all quite painless.

The Gotchas

Getting our software to work with CDH5 was definitely not painless. All of our problems stemmed from conflicting versions of jars, due either to changes in CDH dependencies, or in changes to how a user classpath is specified as having priority over that of Yarn/HBase/Oozie. Additionally it took some time to wrap our heads around the new artifact packaging used by YARN and HBase. Note that we also use Maven for dependency management.

Guava
We're not alone in our suffering at the hands of mismatched Guava versions (e.g. HADOOP-10101HDFS-7040), but suffer we did. We resorted to specifying version 14.0.1 in any of our code that touches Hadoop and more importantly HBase, and exclude any higher version guavas from our dependencies. This meant downgrading some actual code that was using guava 15, but was the easiest path to getting a working system.

Jackson
We have many dependencies on Jackson 1.9 and 2+ throughout our code, so downgrading to match HBase's shipped 1.8.8 was not an option. It meant figuring out the classpath precedence rules described below, and solving the problems (like logging) that doing so introduced.

Logging
Logging in Java is a horrible mess, and with the number of intermingled projects required to make application software run on a Hadoop/HBase cluster it's not surprise that getting logging to work was brutal. We code to the SLF4J API and use Logback as our implementation of choice. The Hadoop world uses a mix of Java Commons Logging, java.util.logging, and log4j. We thought that meant we'd be clear if we used the same SLF4J API (1.7.5) and used the bridges (log4j-over-slf4j, jcl-over-slf4j, and jul-to-slf4j), which has worked for us up to now. <montage>Angry men smash things angrily over the course of days</montage> Turns out, there's a bug in the 1.7.5 implementation of log4j-over-slf4j, which blows up as we described over at YARN-2875. Short version - use 1.7.6+ in client code that attempts to use YARN and log4j-over-slf4j.

YARN
The crux of our problems was having our classpath loaded after the Hadoop classpath had been loaded, meaning old versions of our dependencies were loaded first. The new, surprisingly hard to find parameter that tells YARN to load your classpath first is "mapreduce.job.user.classpath.first". YARN also quizzically claims that the parameter is deprecated, but.. works for me.

Oozie
Convincing Oozie to load our classpath involved another montage of angry faces. It uses the same parameter as YARN, but with a prefix, so what you want is "oozie.launcher.mapreduce.job.user.classpath.first". We had been loading the old parameter "mapreduce.task.classpath.user.precedence" in each action in the workflow using the <job-xml> tag to load the configs from a file called hive-default.xml. We then encountered two problems: 
  1. Note the name - we used hive-default.xml instead of hive-site.xml because of a bug in Oozie (discussed here and here). That was fixed in the CDH5.2 Oozie, but we didn't get the memo. Now the file is called hive-site.xml and contains our specific configs and is again being picked up. BUT:
  2. Adding oozie.launcher.mapreduce.job.user.classpath.first to hive-site.xml doesn't work! As we wrote up in Oozie bug OOZIE-2066 this parameter has to be specified for each action, at the action level, in the workflow.xml. Repeating the example workaround from the bug report:
 <action name="run-test">  
  <java>  
   <job-tracker>c1n2.gbif.org:8032</job-tracker>  
   <name-node>hdfs://c1n1.gbif.org:8020</name-node>  
   <configuration>  
    <property>  
     <name>oozie.launcher.mapreduce.task.classpath.user.precedence</name>  
     <value>true</value>  
    </property>  
   </configuration>  
   <main-class>test.CPTest</main-class>  
  </java>  
  <ok to="end" />  
  <error to="kill" />  
 </action>  


New Packaging Woes


We build our jars using a combination of jar-with-dependencies and the shade plugin, but in both cases it means all our dependencies are built in. The problems come when a downstream, transitive dependency loads a different (typically older) version of one of the jars we've bundled in our main jar. This happens a lot with the Hadoop and HBase artifacts, especially when it comes to MR1 and logging.

Example exclusions

hbase-server (needed to run MapReduce over HBase): https://github.com/gbif/datacube/blob/master/pom.xml#L268

hbase-testing-util (needed to run mini clusters): https://github.com/gbif/datacube/blob/master/pom.xml#L302

hbase-client: https://github.com/gbif/metrics/blob/master/pom.xml#L226

hadoop-client (removing logging): https://github.com/gbif/metrics/blob/master/pom.xml#L327


Beyond just sorting conflicting dependencies, we also encountered a problem that presented as "No FileSystem for scheme: file". It turns out we had projects bringing in both hadoop-common and hadoop-hdfs, and so we were getting only one of the META-INF/services files in the final jar.  Thus we could not use the FileSystem to read local files (like jars for the class path) and also from HDFS.  The fix was to include the org.apache.hadoop.fs.FileSystem in our project explicitly: https://github.com/gbif/metrics/blob/master/cube/src/main/resources/META-INF/services/org.apache.hadoop.fs.FileSystem

Finally we had to stop the TableMapReduceUtil from bringing in it’s own dependent jars, which brought in yet more conflicting jars - this appears to be a change in the default behaviour, where dependent jars are now being brought in by default in the shorter versions of initTableMapper:
https://github.com/gbif/metrics/blob/master/cube/src/main/java/org/gbif/metrics/cube/occurrence/backfill/BackfillCallback.java#L37

Conclusion

As you can see the client side of the upgrade was beset on all sides by the iniquities of jars, packaging and old dependencies. It seems strange that upgrading Guava is considered a no-no, major breaking change by these projects, yet discussions about removing HBaseTablePool are proceeding apace and will definitely break many projects (including any of ours that touch HBase). While we're ultimately pleased that everything now works, and looking forward to benefiting from the performance improvements and new features of CDH5, it wasn't a great trip. Hopefully our experience will help others migrate more smoothly.