Wednesday 20 July 2016

Probably Turboveg's best-kept secret

Turboveg is one of the most widely used software programs for managing vegetation data. Probably its best-kept secret is that it can export vegetation data in Darwin Core Archive (DwC-A) format, a standard format that enables quick and easy integration with other resources on GBIF.org. Turboveg v2 converts vegetation data into species occurrence data packaged as a DwC-A. Now, thanks to an 8-month-long collaboration between GBIF and Stephan Hennekens (Turboveg's developer), v3 will convert vegetation data into sampling event data packaged as a DwC-A - a much more faithful and useful representation of the data.

Turboveg

Screenshot of Turboveg v3 prototype
Turboveg is an easy-to-install and easy-to-use Windows program for storing, managing, visualizing and exporting vegetation data (relevés). A relevé is a list of the plants in a delimited plot of vegetation, with information on species cover and on substrate and other abiotic features, recorded to give as complete a description as possible of plant community composition and structure.

Today there are about 1500 users of the software worldwide, managing more than 1.5 million relevés. Turboveg can export relevés in various file formats, which enables further analysis in other tools. Support for exporting relevés as species occurrence data packaged as a Darwin Core Archive (DwC-A) was added to v2 in 2011. Guidance on how to use this feature can be found in the Turboveg User Manual.

Version 3, due to be released in 2017, will export relevés as sampling event data packaged as a DwC-A - a format that more accurately reflects the original data.

Sampling event data

Sampling event data derive from environmental, ecological, and natural resource investigations that follow standardized protocols for measuring and observing biodiversity. This is in contrast to opportunistic observation and collection data, which today form a significant proportion of openly accessible biodiversity data. A good example of sampling event data is data from vegetation sampling events that use the Braun-Blanquet protocol. Because the sampling methodology and sampling units are precisely described, the resulting data are comparable and thus better suited for measuring trends in habitat change and climate change.

Sampling event data model

A data model provides the details of the structure of the data. Previously, sampling event data couldn't be modelled in a standardized way in Darwin Core due to the complexity of encoding the underlying protocols. Over the past two years, however, GBIF has been working with EU BON and the wider bioinformatics community to develop a data model for sharing sampling event data. In March 2015 TDWG, the international body responsible for maintaining standards for the exchange of biological data, ratified changes that enabled support for modelling sampling event data.

In summary, the de facto data model for sampling event data in Darwin Core consists of three tables: Sampling event, Measurements or Facts and Species occurrences. 

A Sampling event can be associated with many Species occurrences, while a Species occurrence can only be associated with one Sampling event. Similarly, a Sampling event can be associated with many Measurements or Facts. In this way a Sampling event has a one-to-many relationship to both Species occurrences and Measurements or Facts.
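The one-to-many relationships can be illustrated with a minimal Python sketch. It assumes the three tables have already been read into lists of dictionaries and that rows are linked through the Darwin Core term eventID; the plot identifiers, species and measurement values are made up for illustration and do not come from any real dataset.

```python
from collections import defaultdict

# Sampling event table: one row per plot visit (illustrative values).
events = [
    {"eventID": "plot-1", "samplingProtocol": "Braun-Blanquet", "eventDate": "2015-06-01"},
    {"eventID": "plot-2", "samplingProtocol": "Braun-Blanquet", "eventDate": "2015-06-02"},
]

# Species occurrence table: each row points to exactly one event.
occurrences = [
    {"occurrenceID": "occ-1", "eventID": "plot-1", "scientificName": "Calluna vulgaris"},
    {"occurrenceID": "occ-2", "eventID": "plot-1", "scientificName": "Erica tetralix"},
    {"occurrenceID": "occ-3", "eventID": "plot-2", "scientificName": "Calluna vulgaris"},
]

# Measurements or Facts table: each row also points to exactly one event.
measurements = [
    {"eventID": "plot-1", "measurementType": "plot area", "measurementValue": "4", "measurementUnit": "m2"},
    {"eventID": "plot-1", "measurementType": "slope", "measurementValue": "5", "measurementUnit": "degrees"},
]


def group_by_event(rows):
    """Collect the rows of a child table under the eventID they refer to."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row["eventID"]].append(row)
    return grouped


occurrences_by_event = group_by_event(occurrences)
measurements_by_event = group_by_event(measurements)

for event in events:
    event_id = event["eventID"]
    print(
        event_id,
        len(occurrences_by_event[event_id]), "occurrences,",
        len(measurements_by_event[event_id]), "measurements",
    )
```

Because each child row carries a single eventID, an occurrence or measurement cannot belong to more than one event, while an event can collect any number of child rows.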

Note that additional tables of information can also be added to a Sampling event, such as Multimedia (e.g. to record images of the plot). More information about this preferred data model for sampling event data can be found in the IPT Sampling Event Data Primer.


Sampling event data model for vegetation plot data

Vegetation surveys or relevés produce a wealth of information on species cover and on substrate and other abiotic features in the plot. Species cover can be measured using dozens of different vegetation abundance scales such as the Braun-Blanquet scale or Londo decimal scale to name a couple. To standardize how this information is stored, a custom Relevé table is used instead of the Measurements or Facts table.

This data model for vegetation plot data in Darwin Core consists of three tables: Sampling event, Relevé and Species Occurrence.

A Sampling event can be associated with only one Relevé. The Relevé consists of the most common relevé measurements covering all vegetation layers. Note that for each measurement the unit and precision are explicitly defined. A Sampling event can also be associated with many Species occurrences. However, each Species occurrence should specify the vegetation layer where it was found, so the same species can be recorded in multiple vegetation layers. In this way the vegetation composition can be described for each layer within the plot.
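A rough Python sketch of this structure follows. The Relevé field name and the layer values ("tree", "shrub", "herb") are hypothetical placeholders chosen for illustration; they are not the actual column names or vocabulary used in the Turboveg export.

```python
from collections import defaultdict

# Relevé table: at most one row per sampling event.
# "tree_layer_cover_percent" is a hypothetical field name.
releves = [
    {"eventID": "plot-1", "tree_layer_cover_percent": 60},
]

# Species occurrence table: each row names the layer it was recorded in,
# so one species may appear once per layer (layer values are hypothetical).
occurrences = [
    {"eventID": "plot-1", "scientificName": "Quercus robur", "layer": "tree"},
    {"eventID": "plot-1", "scientificName": "Quercus robur", "layer": "shrub"},
    {"eventID": "plot-1", "scientificName": "Vaccinium myrtillus", "layer": "herb"},
]


def index_releves(rows):
    """Map eventID to its single Relevé row, rejecting duplicates."""
    by_event = {}
    for row in rows:
        if row["eventID"] in by_event:
            raise ValueError(f"more than one Relevé for event {row['eventID']}")
        by_event[row["eventID"]] = row
    return by_event


def composition_by_layer(rows):
    """Group species names by event and then by vegetation layer."""
    layers = defaultdict(lambda: defaultdict(list))
    for row in rows:
        layers[row["eventID"]][row["layer"]].append(row["scientificName"])
    return layers


releve_by_event = index_releves(releves)
composition = composition_by_layer(occurrences)

for layer, species in sorted(composition["plot-1"].items()):
    print(layer, species)
```

Grouping by event and then by layer is what allows the same species to be listed once in the tree layer and again in the shrub layer of the same plot.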

Note that at the time of writing the Darwin Core standard doesn't have the terminology for storing vegetation layers. Therefore a formal proposal has been made to add the new term "layer" to Darwin Core. To standardise how this new term is populated, a custom vocabulary for vegetation layers has also been produced.


Example DwC-A export by Turboveg: Dutch Vegetation Database (LVD)

Fortunately, the Dutch Vegetation Database (LVD) has recently been republished using the new sampling event format and can thus serve as an exemplar dataset. LVD is a substantial dataset published by Alterra (a major Dutch research institute) that covers all plant communities in the Netherlands with more than 85 years of vegetation recording for some habitats. The latest version of this dataset has more than 650 thousand relevés associated with almost 12 million species occurrences.

Alterra uses Turboveg v3 to manage this dataset and export it in the standardized DwC-A format. It is important to note that the software takes special care to protect sensitive species: the locations of plots in which red list species have been observed are obfuscated to 5x5 km squares. Furthermore, the software converts all coverage values to the same unit (e.g. species coverage values are converted into percentage coverage) in order to make the data easy to use and integrate with other sources.
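The two steps can be sketched in Python as follows. This is not Turboveg's implementation, which is not described here. The sketch assumes plot coordinates are in a projected coordinate system measured in metres, and it assumes a simple lookup from Braun-Blanquet cover classes 1 to 5 to the midpoint of a commonly cited percentage range for each class; the percentages Turboveg actually assigns may differ.

```python
import math

GRID_CELL_M = 5000  # 5 km cell size, matching the 5x5 km squares in the text


def snap_to_grid(x_m, y_m, cell_m=GRID_CELL_M):
    """Return the lower-left corner of the grid cell containing the point.

    Assumes projected coordinates in metres (an assumption of this sketch).
    """
    return (
        math.floor(x_m / cell_m) * cell_m,
        math.floor(y_m / cell_m) * cell_m,
    )


# Illustrative midpoints only; NOT Turboveg's own conversion table.
BRAUN_BLANQUET_TO_PERCENT = {
    "1": 2.5,   # assumed range: less than 5%
    "2": 15.0,  # assumed range: 5-25%
    "3": 37.5,  # assumed range: 25-50%
    "4": 62.5,  # assumed range: 50-75%
    "5": 87.5,  # assumed range: 75-100%
}


def cover_to_percent(code):
    """Convert a Braun-Blanquet cover class to an approximate percentage."""
    try:
        return BRAUN_BLANQUET_TO_PERCENT[code]
    except KeyError:
        raise ValueError(f"no conversion defined for cover code {code!r}")


def export_location(x_m, y_m, has_red_list_species):
    """Obfuscate the location only when the plot holds red list species."""
    if has_red_list_species:
        return snap_to_grid(x_m, y_m)
    return (x_m, y_m)


print(export_location(123456.0, 456789.0, has_red_list_species=True))
print(cover_to_percent("3"))
```

Snapping to the cell corner means every plot inside the same 5x5 km square is exported with identical coordinates, so the exact location cannot be recovered from the published record.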

Sampling event data on GBIF.org: Dutch Vegetation Database (LVD)

GBIF.org map of LVD georeferenced data
All versions of LVD are imported to the EU BON IPT, where they are archived and published through GBIF.org.

The 8-month-long collaboration between GBIF and Stephan Hennekens culminated in the latest version of LVD being indexed into GBIF.org. Special thanks are owed to Stephan for all his hard work to make this happen.

Over the next couple of years GBIF will continue working on enhancing the indexing and discovery of sampling event datasets (e.g. showing events' plots/transects on a map, filtering events by sampling protocol, indexing Relevés, etc.). In the meantime, once Turboveg v3 is released in 2017, users will already be able to export their relevés into this new standardized format, which represents their data much more faithfully.