Wednesday 9 November 2011

Important Quality Boost for GBIF Data Portal

Improvements speed processing, “clean” name and location data, enable checklist publishing.

[This is a reposting from the GBIF news site]

A major upgrade to enhance the quality and usability of data accessible through the GBIF Data Portal has gone live.

The enhancements are the result of a year’s work by developers at the Copenhagen-based GBIF Secretariat, in collaboration with colleagues throughout the worldwide network.

They address a range of issues, including the need for a quicker ‘turnaround’ between entering new data and their appearance on the portal, the need to filter out inaccurate or incorrect locations and names for species occurrences, and the need to index species checklists as datasets accessible through the portal.

After a testing period, the changes now apply to the more than 312 million biodiversity data records currently indexed from some 8,500 datasets and 340 publishers worldwide.

Key improvements include:

•    processing time for data has fallen from 3-4 days to around 36 hours, paving the way for more frequent ‘rollovers’ or index updates;
•    the ‘backbone taxonomy’ used by the GBIF Portal has been reworked with up-to-date checklists and taxonomic catalogues such as the Catalogue of Life 2011, improving search and download;
•    checklists describing species in particular geographic locations, taxonomic groups or thematic categories (e.g. invasives) can now be published using a standard set of terms called the Global Names Architecture (GNA) Profile (see GNA guidelines), and thus become indexed and accessible via the Data Portal;
•    automated interpretation of the coordinates, country location and scientific names used in published records has been improved to screen out inaccuracies – for example, ensuring that records identified as coming from a particular country are shown as occurring within the borders and territorial waters of that country; and
•    a mechanism using the Hadoop open-source software system has been introduced to ensure that the Data Portal is able to cope with anticipated future growth in the volume of data.

The algorithms and dictionaries developed to improve interpretation of data published through the GBIF Data Portal are intended for future re-use by the wider biodiversity informatics community.

Commenting on the release of these substantive Data Portal improvements, GBIF Executive Secretary Nicholas King said: “These changes represent a major step forward in the usefulness of GBIF to science and society.

“They are a direct response to the feedback we have had from the data publishing and user communities, and will enable an even greater return on the long-term investment made over the past decade by GBIF Participant countries.”

IPT v.2.0.3 launched

The GBIF Secretariat has also issued a new release of the Integrated Publishing Toolkit (IPT), which enables biodiversity data updates to be ‘harvested’ automatically from databases published to the Internet.

IPT version 2.0.3 addresses 76 issues reported in the previous version and includes translations into French and Spanish.

Instructions for installing the new version are available here.