Thursday 28 June 2018

Occurrence Downloads

Occurrences at GBIF are often downloaded through the web interface or through the API (via rgbif etc.). Users can place various filters on the data to limit the number of records returned. As the occurrence index is currently a 447 GB CSV, most users want to use a filter.

Total monthly downloads

Here I plot the total monthly downloads for various popular filters. For the past few years, GBIF has been averaging around 10k downloads per month.

Two peaks in total downloads stand out:
  • Mar 2014
  • Sep 2016
The Sep 2016 peak seems to be explained by high DATASET_KEY downloads. Both the Mar 2014 and Sep 2016 peaks are well explained by the top users. "Top users" in this graph covers all downloads generated by the 3 most active users on GBIF. These users generate downloads in the thousands, which are most likely automated downloads generated internally.

One interesting detail is that while No Filter Used is not used very often, it accounts for more than 500 billion occurrence records downloaded.

Finally, if we look at the number of unique users (deselect everything else to see it in isolation), we see that the number of individuals making downloads on GBIF has been increasing steadily, with some perhaps interesting cyclical patterns. The graph below is interactive. You can see different data views by clicking on the names.
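To make the three series discussed above concrete (total downloads, downloads by top users, and unique users per month), here is a minimal Python sketch. The record layout, the user names and the function name monthly_summary are all hypothetical; this is not the code behind the graph, only one way such counts could be computed from a download log.

```python
from collections import Counter, defaultdict

# Hypothetical download log: one (month, user, filter) tuple per download.
downloads = [
    ("2016-08", "user_a", "TAXON_KEY"),
    ("2016-09", "user_a", "DATASET_KEY"),
    ("2016-09", "user_a", "DATASET_KEY"),
    ("2016-09", "user_b", "COUNTRY"),
    ("2016-09", "user_c", "TAXON_KEY"),
]


def monthly_summary(records, top_n=3):
    """Count total downloads, top-user downloads and unique users per month."""
    # The "top users" are the top_n users by overall download count.
    per_user = Counter(user for _, user, _ in records)
    top_users = {user for user, _ in per_user.most_common(top_n)}

    totals = defaultdict(lambda: {"all": 0, "top_users": 0})
    users_seen = defaultdict(set)
    for month, user, _ in records:
        totals[month]["all"] += 1
        if user in top_users:
            totals[month]["top_users"] += 1
        users_seen[month].add(user)

    summary = {
        month: {**counts, "unique_users": len(users_seen[month])}
        for month, counts in totals.items()
    }
    return summary, top_users


# top_n=1 keeps the toy example unambiguous; the post uses the top 3 users.
summary, top_users = monthly_summary(downloads, top_n=1)
for month in sorted(summary):
    print(month, summary[month])
```

Comparing the "all" and "top_users" counts month by month is what lets a peak be attributed to a handful of very active accounts.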


Popular filters explained

There are many ways that a user can filter data. The types and combinations of filters are almost limitless. Below I describe some of the most common filters:

1. TAXON_KEY

This is one of the most common filters users place on the GBIF occurrence index. Users can choose either one or many taxon names to filter the data, and they can choose any taxon rank they want (species, genus, family, kingdom etc.).

2. COUNTRY

Here users can return records only from a certain country. This is the country the user searched for, not the country the user is searching from.

3. HAS_GEOSPATIAL_ISSUE

Here users can specify that they want only occurrence records for which interpretation flagged no geospatial issue.

4. HAS_COORDINATE

Here users can say that they want occurrence records that have coordinates.

5. No Filter

Finally, a surprising number of users never apply any filter and instead request to download the entire occurrence index. In the overwhelming majority of cases, we have to assume these users have done this by mistake.
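As an illustration of how the filters above combine, here is a minimal Python sketch that builds the JSON body of a download request from a set of filter keys. The field names (creator, notificationAddresses, format, predicate) follow my understanding of the GBIF occurrence download API and should be checked against the current API documentation. The helper names, the user name, the e-mail address and the taxon key value are placeholders. The sketch only builds and prints the request; it does not contact GBIF.

```python
import json


def equals(key, value):
    """One 'equals' predicate for a single filter key."""
    return {"type": "equals", "key": key, "value": value}


def build_download_request(creator, email, filters, fmt="SIMPLE_CSV"):
    """Combine several filters with 'and' into one download request body.

    Field names are assumed from the GBIF download API documentation.
    """
    predicates = [equals(key, value) for key, value in filters.items()]
    return {
        "creator": creator,
        "notificationAddresses": [email],
        "format": fmt,
        "predicate": {"type": "and", "predicates": predicates},
    }


# Placeholder values: replace them with a real user name, e-mail and taxon key.
request_body = build_download_request(
    "my_gbif_username",
    "me@example.org",
    {
        "TAXON_KEY": "212",
        "COUNTRY": "DK",
        "HAS_COORDINATE": "true",
        "HAS_GEOSPATIAL_ISSUE": "false",
    },
)
print(json.dumps(request_body, indent=2))
```

Leaving the filters dictionary empty yields a request with no real restriction, which is effectively the No Filter case described above.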

You can read more about downloads at GBIF here:
http://www.johnwalleranalytics.org/2018/05/30/gbif-download-statistics/