Wednesday 11 May 2011

The GBIF Spreadsheet Processor - an easy option to publish data

Most data publishers in the GBIF Network use software wrappers to make data available on the web. Setting up those tools usually requires an institution or an individual to have a certain degree of technical capacity, which raises the threshold for publishing biodiversity data.

Imagine an entomologist who deals with collections and monographs every day, and whose only use of a PC is Word or Excel. S/he has no student to help, but is keen to share the data before s/he retires. What is s/he going to do?

One of our tools is built to support this kind of scenario - the GBIF Darwin Core Archive Spreadsheet Processor, which we usually just call "the Spreadsheet Processor."

The Spreadsheet Processor is a web application with which one can:

  1. Use templates provided on the web site;
  2. Fill in and upload (or email) the xls file;
  3. Get a Darwin Core Archive file as the result.

This is a straightforward way to prepare data for publishing: there is little to learn for users who already know how to use Excel and how to upload a file to a web site.

When the completed spreadsheet template is uploaded to the page, the web app first parses the values in the metadata sheet to generate an eml.xml file, and then parses the occurrence or checklist sheet to generate a meta.xml file and a csv file. These files are then collected and zipped according to the Darwin Core Archive standard, ready to download.
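To make the packaging step more concrete, here is a minimal Python sketch of how the three files could be put together into one archive. It is not the Spreadsheet Processor's actual code. The function names (build_eml, build_csv, build_meta, build_archive) are made up for illustration, the input is assumed to be already parsed from the spreadsheet into a dict and a list of rows, the column names are assumed to be plain Darwin Core term names, and the eml.xml written here is a heavily simplified stand-in rather than schema-valid EML.

```python
import csv
import io
import zipfile
from xml.sax.saxutils import escape, quoteattr

DWC = "http://rs.tdwg.org/dwc/terms/"  # Darwin Core terms namespace


def build_eml(metadata):
    # Simplified placeholder for eml.xml; real EML has a much richer schema.
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        "<eml><dataset>"
        f"<title>{escape(metadata['title'])}</title>"
        f"<creator>{escape(metadata['creator'])}</creator>"
        "</dataset></eml>\n"
    )


def build_csv(columns, rows):
    # One header line, then one line per record, in the given column order.
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(columns)
    for row in rows:
        writer.writerow([row.get(col, "") for col in columns])
    return buf.getvalue()


def build_meta(columns):
    # meta.xml describes the csv: column 0 is the record id, and every
    # other column is mapped to a Darwin Core term by its position.
    fields = "\n".join(
        f'    <field index="{i}" term={quoteattr(DWC + name)}/>'
        for i, name in enumerate(columns)
        if i > 0
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<archive xmlns="http://rs.tdwg.org/dwc/text/" metadata="eml.xml">\n'
        '  <core encoding="UTF-8" fieldsTerminatedBy="," linesTerminatedBy="\\n"\n'
        '        fieldsEnclosedBy=\'"\' ignoreHeaderLines="1"\n'
        f'        rowType="{DWC}Occurrence">\n'
        "    <files><location>occurrence.csv</location></files>\n"
        '    <id index="0"/>\n'
        f"{fields}\n"
        "  </core>\n"
        "</archive>\n"
    )


def build_archive(metadata, columns, rows):
    # Collect the three files and zip them in memory.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("eml.xml", build_eml(metadata))
        zf.writestr("meta.xml", build_meta(columns))
        zf.writestr("occurrence.csv", build_csv(columns, rows))
    return buf.getvalue()


if __name__ == "__main__":
    archive = build_archive(
        {"title": "Example beetle collection", "creator": "A. Entomologist"},
        ["occurrenceID", "scientificName", "country"],
        [{"occurrenceID": "1", "scientificName": "Apis mellifera", "country": "DK"}],
    )
    with open("dwca.zip", "wb") as out:
        out.write(archive)
```

Running the script writes a dwca.zip containing eml.xml, meta.xml and occurrence.csv. Reading the xls file itself is left out on purpose, since that needs a third-party library.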

With a DwC-A file, the data is in a standardized format and ready to be published. In the example scenario above, the entomologist can either share the archive among colleagues or send it to the nearest GBIF node that hosts an IPT. Since the IPT can ingest a DwC-A file and publish it, the entomologist doesn't need to learn how to use the IPT. To update the data, s/he can revise the spreadsheet, generate a new DwC-A and send it to the node again.

P.S. This manual explains how to publish and register data in DwC-A format.
