Thursday 9 June 2011

Getting started with Avro RPC

Apache Avro is a data exchange format started by Doug Cutting of Lucene and Hadoop fame. The Cloudera blog already has a good introduction to Avro, so this post does not attempt one.

Avro is surprisingly difficult to get into, as it lacks the most basic "getting started" documentation for a newcomer to the project. This post serves as a reminder to myself of what I did, and will hopefully help others get a hello world working quickly. If people find it useful, let's fill it out and submit it to the Avro wiki!

Prerequisites: knowledge of Apache Maven

Start by adding the Avro Maven plugin to the pom. It is needed to compile the Avro schema and protocol definitions into Java classes.

<plugin>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-maven-plugin</artifactId>
  <version>1.5.1</version>
  <executions>
    <execution>
      <id>schemas</id>
      <phase>generate-sources</phase>
      <goals>
        <goal>schema</goal>
        <goal>protocol</goal>
        <goal>idl-protocol</goal>
      </goals>
      <configuration>
        <excludes>
          <exclude>**/mapred/tether/**</exclude>
        </excludes>
        <sourceDirectory>${project.basedir}/src/main/avro/</sourceDirectory>
        <outputDirectory>${project.basedir}/src/main/java/</outputDirectory>
        <testSourceDirectory>${project.basedir}/src/test/avro/</testSourceDirectory>
        <testOutputDirectory>${project.basedir}/src/test/java/</testOutputDirectory>
      </configuration>
    </execution>
  </executions>
</plugin>

Now add the dependencies on Avro and Avro IPC (inter-process communication):

<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro</artifactId>
  <version>1.5.1</version>
</dependency>
<dependency>
  <groupId>org.apache.avro</groupId>
  <artifactId>avro-ipc</artifactId>
  <version>1.5.1</version>
</dependency>

Now we create the Avro Protocol file, which defines the RPC exchange. This file is stored in /src/main/avro/nublookup.avpr and looks like so:

{"namespace": "org.gbif.ecat.ws",
 "protocol": "NubLookup",
 "types": [
     {"name": "Request", "type": "record",
      "fields": [
        {"name": "kingdom", "type": ["string", "null"]},
        {"name": "phylum", "type": ["string", "null"]},
        {"name": "class", "type": ["string", "null"]},
        {"name": "order", "type": ["string", "null"]},
        {"name": "family", "type": ["string", "null"]},
        {"name": "genus", "type": ["string", "null"]},
        {"name": "name", "type": ["string", "null"]}
      ]
     },
     {"name": "Response", "type": "record",
      "fields": [
        {"name": "kingdomId", "type": ["int", "null"]},
        {"name": "phylumId", "type": ["int", "null"]},
        {"name": "classId", "type": ["int", "null"]},
        {"name": "orderId", "type": ["int", "null"]},
        {"name": "familyId", "type": ["int", "null"]},
        {"name": "genusId", "type": ["int", "null"]},
        {"name": "nameId", "type": ["int", "null"]}
      ]
     }  
 ],
 "messages": {
     "send": {
         "request": [{"name": "request", "type": "Request"}],
         "response": "Response"
     }
 }
}

This protocol defines an interface called NubLookup with a single message, send, that takes a Request and returns a Response. Simple stuff.
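To make that mapping concrete, here is roughly the Java shape the protocol corresponds to. This is a hand-written stand-in in plain Java, not the generator's actual output: the real generated classes also carry their Avro schema, and the real send declares AvroRemoteException. The outer class name NubLookupShape and the EchoLookup implementation are made up for illustration, and only two of the seven fields are shown per record.

```java
// Hand-written stand-in for the shape of the generated code; not generator output.
public class NubLookupShape {

  // Stand-in for the "Request" record: each ["string", "null"] field is a nullable reference.
  public static class Request {
    public CharSequence kingdom;
    public CharSequence name;
  }

  // Stand-in for the "Response" record: each ["int", "null"] field is a nullable Integer.
  public static class Response {
    public Integer kingdomId;
    public Integer nameId;
  }

  // Stand-in for the protocol interface: one method per entry under "messages".
  public interface NubLookup {
    Response send(Request request);
  }

  // Hypothetical implementation that resolves only the kingdom and leaves the rest unset.
  public static class EchoLookup implements NubLookup {
    public Response send(Request request) {
      Response r = new Response();
      if (request.kingdom != null) {
        r.kingdomId = 100;
      }
      return r;
    }
  }

  public static void main(String[] args) {
    Request req = new Request();
    req.kingdom = "Animalia";
    req.name = "Puma";

    // Call the implementation directly, with no network involved.
    Response res = new EchoLookup().send(req);
    System.out.println("kingdomId: " + res.kingdomId);
    System.out.println("nameId: " + res.nameId);
  }
}
```

The point of the sketch is only that each entry under "messages" becomes a method and each record becomes a class with one field per schema field; the networking is layered on top of that interface.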

From the command line, issue a compile:
$ mvn compile
This generates the Java sources into src/main/java, under the package matching the namespace declared in the .avpr file (org.gbif.ecat.ws in my case).

Now we can test it using the simple Netty server that ships with the Avro IPC dependency:

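import java.io.IOException;
import java.net.InetSocketAddress;

import org.apache.avro.AvroRemoteException;
import org.apache.avro.ipc.NettyServer;
import org.apache.avro.ipc.NettyTransceiver;
import org.apache.avro.ipc.specific.SpecificRequestor;
import org.apache.avro.ipc.specific.SpecificResponder;
import org.apache.avro.util.Utf8;

// Classes generated from nublookup.avpr
import org.gbif.ecat.ws.NubLookup;
import org.gbif.ecat.ws.Request;
import org.gbif.ecat.ws.Response;

public class Test {
  private static NettyServer server;

  // A mock implementation
  public static class NubLookupImpl implements NubLookup {
    public Response send(Request request) throws AvroRemoteException {
      Response r = new Response();
      r.kingdomId = 100;
      return r;
    }
  }

  public static void main(String[] args) throws IOException {
    // Start a server on port 7001 that dispatches calls to the mock implementation
    server = new NettyServer(
        new SpecificResponder(NubLookup.class, new NubLookupImpl()),
        new InetSocketAddress(7001));

    // Connect a client to the port the server is listening on
    NettyTransceiver client = new NettyTransceiver(
        new InetSocketAddress(server.getPort()));

    // Create a proxy that exposes the NubLookup interface over the transceiver
    NubLookup proxy = (NubLookup) SpecificRequestor.getClient(NubLookup.class, client);

    Request req = new Request();
    req.name = new Utf8("Puma");
    System.out.println("Result: " + proxy.send(req).kingdomId);

    client.close();
    server.close();
  }
}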
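
One thing to watch when reading the result: every id in the Response is a union with null, and the mock sets only kingdomId, so as far as I can tell the other ids come back as null. Unboxing one of them straight into an int would throw a NullPointerException. A minimal sketch of a guard follows, in plain Java with no Avro classes involved; the class OptionalIdSketch and the helpers idOrDefault and describe are hypothetical names of my own, not part of Avro.

```java
// Plain-Java sketch for reading nullable ids; none of these names come from Avro.
public class OptionalIdSketch {

  // Hypothetical helper: returns the id, or the fallback when the union field holds null.
  public static int idOrDefault(Integer id, int fallback) {
    return id != null ? id : fallback;
  }

  // Hypothetical helper: formats an optional id for logging.
  public static String describe(String label, Integer id) {
    return label + ": " + (id == null ? "not matched" : id.toString());
  }

  public static void main(String[] args) {
    Integer kingdomId = 100; // the value the mock sets
    Integer genusId = null;  // a field the mock leaves unset

    System.out.println(describe("kingdomId", kingdomId));
    System.out.println(describe("genusId", genusId));
    System.out.println("genusId or -1: " + idOrDefault(genusId, -1));
  }
}
```

With the generated classes on the classpath, the same helpers could be applied to the fields of the Response returned by proxy.send(req).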

I am evaluating Avro to provide the high-performance RPC chatter for lookup services while we process the content for the portal. I'll blog later about how its performance compares to the Jersey REST implementation currently running.

8 comments:

  1. I think it would be even clearer if you reorganized this article so that the Maven stuff comes later: the schema is probably the most important and intuitive part, then comes the Java code, and finally the Maven configuration.

    Also, you may want to show the new .idl form, which is much nicer, although I am not sure whether .idl is supposed to replace the old .avpr form in the future.

  2. Thanks dogdogdog. I just did it in chronological order, and setting up the project was the first step (and it was not immediately clear which pieces were needed).
    Your comment on the .idl echoes my concern with Avro a little. It looks very interesting, but getting into it is not as easy as it should be. For example, I did not even spot a .idl. If you Google "Avro RPC Java" you don't find much info. In the Avro download/share schemas there are only .avsc, .avpr and .avdl files. The Avro docs [http://avro.apache.org/docs/1.5.1/idl.html] say "This document defines Avro IDL, an experimental higher-level language for authoring Avro schemata", so I guess it is not intended for use yet.

    To be clear to anyone reading: please don't misinterpret my comments. I think Avro looks very interesting and the core definitions are well documented. It is only the "getting started for dummies" and "how to use it" material that I found a little lacking, and that was my inspiration to blog this. The more people blogging their experiences, the better for the project, right?

  3. I agree with Tim's comment. Avro is very interesting and it has been very stable and fast for me but the documentation for some of the (newer) features could be better. This includes the MapReduce integration, data file writing/reading, the Maven plugin, stats plugin etc.

  4. A quick note to say that in terms of performance, on the first hacky test, a single Avro server's throughput matched that of 2 load-balanced Tomcat / Jersey servers. These web services are being called from the Hadoop cluster, with 53 concurrent clients. It fell over a fair amount though, but I did not code for dropped HTTP connections...

  5. Very nice starter :)
    How would you add snappy compression to the data you send?

  6. Hi Yogev,
    This was just a quick test and I don't know if I would really use this kind of approach in production apps. More recently I've been using Jersey web services, so I'd possibly just add snappy compression as an interceptor to Jersey, or similar if I wanted that. Sorry... not very helpful I know.
    Tim

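  7. What are these used for in the Avro Maven plugin configuration?

    <testSourceDirectory>${project.basedir}/src/test/avro/</testSourceDirectory>
    <testOutputDirectory>${project.basedir}/src/test/java/</testOutputDirectory>

    Which goal uses them?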
