Friday 8 July 2011

Are you the keymaster?


As I mentioned previously, I'm starting work on evaluating HBase for our occurrence record needs.  In the last little while that has meant coming up with a key structure and/or schema that optimizes reads for one major use case of the GBIF data portal - a user request to download an entire record set, including raw records as well as interpreted ones.  The most common form of this request looks like "Give me all records for <some taxon>", eg "Give me all records for Family Felidae".

So far I'm concentrating more on the lookup and retrieval rather than writing or data storage optimization, so the schema I'm using is two column families, one for verbatim columns, one for interpreted (for a total of about 70 columns).  The question of which key to use for HTable's single indexed column is what we need to figure out.  For all these examples we assume we know the backbone taxonomy id of the taxon concept in question (ie Family Felidae is id 123456).
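For concreteness, creating a table with those two column families would look roughly like this (a sketch only - "occurrence" and the family names "verbatim" and "interpreted" are my shorthand for illustration, not necessarily the real schema):

```java
// Sketch: create the occurrence table with two column families,
// using the 0.89-era client API. Table and family names are
// illustrative, not the actual GBIF schema.
HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
HTableDescriptor desc = new HTableDescriptor("occurrence");
desc.addFamily(new HColumnDescriptor("verbatim"));     // raw columns as received
desc.addFamily(new HColumnDescriptor("interpreted"));  // parsed/normalized columns
admin.createTable(desc);
```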

Option 1
Key: native record's unique id

Query style: The simplest way of finding all records that belong to Family Felidae is to scan all of them, checking each against the Family column from the interpreted column family.  The code looks like this:

    HTable table = new HTable(HBaseConfiguration.create(), tableName);
    byte[] cf = Bytes.toBytes(colFam);
    byte[] colName = Bytes.toBytes(col);
    byte[] value = Bytes.toBytes(val);

    Scan scan = new Scan();
    ResultScanner scanner = table.getScanner(scan);

    for (Result result : scanner) {
      byte[] testVal = result.getValue(cf, colName);
      // guard against rows that lack the column entirely
      if (testVal != null && Bytes.compareTo(testVal, value) == 0) {
        doSomething(result); // placeholder for processing a matching record
      }
    }

Because this means transferring all columns of every row to the client before checking if it's even a record we want, it's incredibly wasteful and therefore very slow.  It's a Bad Idea.

Option 2
Key: native record's unique id

Query style: HBase provides a SingleColumnValueFilter that executes our equality check on the server side, thereby saving the transfer of unwanted columns to the client.  Here's the code:

    HTable table = new HTable(HBaseConfiguration.create(), tableName);
    byte[] cf = Bytes.toBytes(colFam);
    byte[] colName = Bytes.toBytes(col);
    byte[] value = Bytes.toBytes(val);

    SingleColumnValueFilter valFilter = new SingleColumnValueFilter(cf, colName, CompareFilter.CompareOp.EQUAL, value);
    valFilter.setFilterIfMissing(true);

    Scan scan = new Scan();
    scan.setFilter(valFilter);
    ResultScanner scanner = table.getScanner(scan);


This is about as good as it gets until we start getting clever :)

Option 3
Key: concatenation of nub-taxonomy "left" with native record's unique id

Query style:  We know that a taxonomy is a tree, and our backbone taxonomy is a well-behaved (ie true) tree.  We can use nested sets to make our "get all children of node x" query much faster, which Markus realized some time ago, and so he thoughtfully included the left and right calculation as part of the backbone taxonomy creation.  Individual occurrences of the same taxon share the same backbone taxonomy id, as well as the same left and right.  One property of nested sets not mentioned in the Wikipedia article is that when the records are ordered by their lefts, the query "give me all records where left is between parent left and parent right" becomes "give me all rows starting with parent left and ending with parent right", which in HBase terms is much more efficient, since we're doing a sequential read from disk without any seeking.  So we build the key as leftId_uniqueId, and query as follows (note that startRow is inclusive and stopRow is exclusive, and we want exclusive on both ends):


    HTable table = new HTable(HBaseConfiguration.create(), tableName);
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes((left + 1) + "_"));
    scan.setStopRow(Bytes.toBytes(right + "_"));

    ResultScanner scanner = table.getScanner(scan);

Which looks pretty good, and is in fact about 40% faster than Option 2 (on average - depends on the size of the query result).  But on closer inspection, there's a problem.  By concatenating the left and unique ids with an underscore as separator, we've created a String, and now HBase is doing its usual lexicographical ordering, which means our rows aren't ordered as we'd hoped.  For example, this is the ordering we expect:

1_1234
2_3458
3_3298
4_9378
5_3435
10_5439
100_9763

but because these are strings, HBase compares them byte by byte, and since the underscore (0x5F) sorts after the digits (0x30-0x39), what we actually get is:

100_9763
10_5439
1_1234
2_3458
3_3298
4_9378
5_3435
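This behaviour is easy to sanity-check without a cluster: for pure-ASCII keys like these, Java's natural String ordering matches HBase's unsigned byte-by-byte comparison, so a plain sort reproduces what the scan sees (the class name here is mine, just for illustration):

```java
import java.util.*;

public class KeyOrderDemo {
  // Sort keys the way HBase compares row keys: lexicographically,
  // byte by byte. For ASCII-only keys, String's natural order is
  // identical to unsigned byte order, so a plain sort suffices.
  static List<String> hbaseOrder(Collection<String> keys) {
    List<String> sorted = new ArrayList<>(keys);
    Collections.sort(sorted);
    return sorted;
  }
}
```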

There isn't much we can do here but filter on the client side.  For every key, we can extract the left portion, convert to a Long, and compare it to our range, discarding those that don't match.  It sounds ugly, and it is, but it doesn't add anything appreciable to the processing time, so it would work.
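That client-side check is small enough to sketch here (assuming the "left_uniqueId" key format from above; class and method names are mine):

```java
public class LeftKeyFilter {
  // Pull the numeric left out of a "left_uniqueId" row key and test it
  // against the parent's (left, right) range, exclusive on both ends.
  // This discards rows that the lexicographic scan let through even
  // though their left falls outside the numeric range.
  static boolean inRange(String rowKey, long parentLeft, long parentRight) {
    long left = Long.parseLong(rowKey.substring(0, rowKey.indexOf('_')));
    return left > parentLeft && left < parentRight;
  }
}
```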

Except that there's a more fundamental problem - if we embed the left in our primary key, it only takes one node added to the backbone taxonomy to force an update in half of all the lefts (on average) which means all of our primary keys get rewritten.  At 300 million records and growing, that's not an option.

Option 4
Key: native record's unique id
Secondary index: left to list of unique ids

Query style: Following on from Option 3, we can build a second table that will serve as a secondary index.  We use the left as a numeric key (which gives us automatic, correct ordering) and write each corresponding unique occurrence id as a new column in the row.  Then we can do a proper range query on the lefts, and generate a distinct Get for each distinct id.  Unfortunately building that index is quite slow, and is still building as I write this, so I haven't been able to test the lookups yet. 
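Until the real index finishes building, the idea can at least be sketched in memory, with plain Java collections standing in for the index table (everything here - the names, the per-id Get step - is illustrative, not the real implementation):

```java
import java.util.*;

public class LeftIndexSketch {
  // Stand-in for the secondary index table: numeric left -> occurrence ids.
  // With a numeric key, a TreeMap's range view is the analogue of an HBase
  // scan over a contiguous block of index rows.
  final NavigableMap<Long, List<String>> index = new TreeMap<>();

  void put(long left, String occurrenceId) {
    index.computeIfAbsent(left, k -> new ArrayList<>()).add(occurrenceId);
  }

  // All occurrence ids in the subtree (parentLeft, parentRight), exclusive
  // on both ends; each returned id would become one Get on the main table.
  List<String> idsForSubtree(long parentLeft, long parentRight) {
    List<String> ids = new ArrayList<>();
    for (List<String> bucket :
         index.subMap(parentLeft, false, parentRight, false).values()) {
      ids.addAll(bucket);
    }
    return ids;
  }
}
```

Because the index key is a number rather than a string, the ordering problem from Option 3 disappears, and the lefts never touch the primary keys of the main table.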

For those keeping score at home, I'm using HBase 0.89 (from CDH3b4), which doesn't have built-in secondary indexes (as 0.19 and 0.20 did).

I'll write more when I've learned more, and welcome any tips or suggestions you might have to aid in my quest!


