Friday 26 August 2011

Darwin Core Archives for Species Checklists

GBIF has long had the ambition of supporting the sharing of annotated species checklists through the network. This ambition has been frustrated by the lack of a data exchange standard with enough scope and simplicity to encourage publication of this type of resource. In 2009, the Darwin Core standard was formally ratified by Biodiversity Information Standards (TDWG). The addition of new terms, and a means of expressing these terms in a simplified and extensible text-based format, paved the way for a data exchange profile for species checklists known as the Global Names Architecture (GNA) Profile. Species checklists published in this format can be zipped into single, portable 'archive' files.
Here I introduce two example archives that illustrate the flexible scope of the format. The first represents a very simple species checklist while the second is a more richly documented taxonomic catalogue. The contents of any file can be viewed by clicking on the file icon or filename. A complete list of terms used in sharing checklists can be found here.
Example 1: U.S. National Arboretum Checklist
This checklist represents the simplest kind of checklist archive. It consists of a document that describes the checklist and a second file with the checklist data itself. The checklist data consist of two columns. Note that because the column headers match the standard Darwin Core term names, no additional mapping document is needed.
EML.xml The checklist is documented using an Ecological Metadata Language (EML) document.
Checklist.txt The checklist itself is kept in this simple text file.
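To illustrate why matching column headers remove the need for a mapping document, here is a minimal Python sketch that reads a two-column checklist and uses the header row directly as Darwin Core term names. The column names (scientificName, vernacularName), the tab delimiter and the sample rows are assumptions made for the example; they are not the actual contents of Checklist.txt.

```python
import csv
import io

# Hypothetical sample standing in for Checklist.txt. The real file's
# columns and delimiter may differ.
SAMPLE = (
    "scientificName\tvernacularName\n"
    "Acer rubrum\tred maple\n"
    "Quercus alba\twhite oak\n"
)


def read_checklist(handle, delimiter="\t"):
    """Return one dict per row, keyed by the header row's term names."""
    return list(csv.DictReader(handle, delimiter=delimiter))


rows = read_checklist(io.StringIO(SAMPLE))
for row in rows:
    # The header names are the Darwin Core terms, so no lookup table is needed.
    print(row["scientificName"], "-", row["vernacularName"])
```

With a real file, the same function could be called on open("Checklist.txt", newline="", encoding="utf-8"), assuming the file is UTF-8 encoded.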
Example 2: Catalog of Living Whales
This checklist represents an annotated species checklist. In addition to the core species list ('whales.tab'), there are several other data files, each a Darwin Core extension that conforms to the GNA Profile. The archive contains a resource map file ('meta.xml') that describes the files in the archive, and an EML metadata document that describes the catalog itself. A common identifier, taxonID, links the data in the extension files to the records in the core species checklist ('whales.tab').
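As a rough sketch of what a resource map provides, the Python below parses a small meta.xml and lists each file with its role and column-to-term mapping. The XML shown is an illustrative fragment written in the Darwin Core text format, not the actual meta.xml of the whale archive; the column indexes and the describe helper are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Illustrative resource map, not the real meta.xml from the whale archive.
META = """
<archive xmlns="http://rs.tdwg.org/dwc/text/">
  <core rowType="http://rs.tdwg.org/dwc/terms/Taxon" ignoreHeaderLines="1">
    <files><location>whales.tab</location></files>
    <id index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/scientificName"/>
  </core>
  <extension rowType="http://rs.gbif.org/terms/1.0/VernacularName" ignoreHeaderLines="1">
    <files><location>vernaculars.tab</location></files>
    <coreid index="0"/>
    <field index="1" term="http://rs.tdwg.org/dwc/terms/vernacularName"/>
  </extension>
</archive>
"""

NS = {"dwca": "http://rs.tdwg.org/dwc/text/"}


def describe(meta_xml):
    """Return (role, file location, {column index: term name}) per data file."""
    root = ET.fromstring(meta_xml.strip())
    described = []
    for element in root:
        role = element.tag.split("}")[-1]  # "core" or "extension"
        location = element.findtext("dwca:files/dwca:location", namespaces=NS)
        fields = {
            int(field.get("index")): field.get("term").rsplit("/", 1)[-1]
            for field in element.findall("dwca:field", NS)
        }
        described.append((role, location, fields))
    return described


files = describe(META)
for role, location, fields in files:
    print(role, location, fields)
```

The id element in the core and the coreid element in each extension mark the column that carries the shared identifier, which is what lets the extension rows be matched back to core records.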
EML.xml The checklist is documented using an Ecological Metadata Language (EML) document. It includes a title, contacts, citation information and more.
whales.tab The checklist itself is kept in this tab-delimited file.
meta.xml The files in the archive are described in this resource map file.
distribution.tab Distribution information conforming to the GNA Distribution extension is stored in this file.
references.tab Bibliographic references are stored in this file and linked to 'whales.tab' via the taxonID.
types.tab Type specimen details are contained in this file.
vernaculars.tab Common name information that conforms to the GNA Vernacular extension is stored in this file.
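The taxonID link between the core file and an extension file can be sketched in a few lines of Python. The sample rows below are hypothetical stand-ins for 'whales.tab' and 'vernaculars.tab', and the sketch assumes each file has a header row naming its columns; in a real archive, meta.xml states which column holds which term.

```python
import csv
import io
from collections import defaultdict

# Hypothetical stand-ins for whales.tab and vernaculars.tab.
CORE = (
    "taxonID\tscientificName\n"
    "1\tBalaenoptera musculus\n"
    "2\tOrcinus orca\n"
)
VERNACULARS = (
    "taxonID\tvernacularName\n"
    "1\tblue whale\n"
    "2\tkiller whale\n"
    "2\torca\n"
)


def read_tab(text):
    """Parse tab-delimited text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def group_by_taxon(rows):
    """Index extension rows by taxonID; one taxon may have many rows."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["taxonID"]].append(row)
    return grouped


core = read_tab(CORE)
names_by_taxon = group_by_taxon(read_tab(VERNACULARS))

# Attach every vernacular name to its core record via the shared taxonID.
checklist = {
    taxon["scientificName"]: [
        name["vernacularName"]
        for name in names_by_taxon.get(taxon["taxonID"], [])
    ]
    for taxon in core
}
print(checklist)
```

The same grouping step would apply to the distribution, references and types files, since each is linked to the core through the same identifier.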
