Friday, 29 April 2011

The evolution of the GBIF Registry

The GBIF Registry has evolved over time into an important tool in GBIF's day-to-day work. Before going further, though, some basic understanding of the GBIF Network model is needed.

GBIF is a decentralised network made up of several kinds of network entities that are related to one another. At the top level are the GBIF Participant Nodes, which are typically countries or thematic networks that coordinate their domain. These Nodes endorse one or more Organisations or Institutions within their domain, and each Organisation owns one or more Resources exposed through the GBIF Network. Each Resource is also typically associated with a Technical Access Point, which is the URL used to access its data. There are other entities as well, such as IPT Installations, which are deployed inside specific Organisations but are not Resources themselves; they publish Resources that may be owned by other Organisations. A quick view of GBIF's network model is shown below:
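The relationships described above (Node endorses Organisation, Organisation owns Resource, Resource has a Technical Access Point, IPT Installation publishes Resources it does not necessarily own) can be sketched as a small data model. This is a minimal illustration under our own assumptions: the class names, fields and methods (`Node.endorse`, `Organisation.add_resource`, `IPTInstallation.publish`) are hypothetical and are not the Registry's actual schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Resource:
    title: str
    # Back-reference to the owning organisation; kept out of repr/eq to avoid cycles.
    owner: Organisation = field(repr=False, compare=False)
    access_point_url: Optional[str] = None  # Technical Access Point: URL to the data


@dataclass
class Organisation:
    name: str
    endorsed_by: Node = field(repr=False, compare=False)
    resources: List[Resource] = field(default_factory=list)

    def add_resource(self, title: str, url: Optional[str] = None) -> Resource:
        resource = Resource(title, owner=self, access_point_url=url)
        self.resources.append(resource)
        return resource


@dataclass
class Node:
    name: str  # a country or a thematic network
    organisations: List[Organisation] = field(default_factory=list)

    def endorse(self, name: str) -> Organisation:
        org = Organisation(name, endorsed_by=self)
        self.organisations.append(org)
        return org


@dataclass
class IPTInstallation:
    # Deployed inside one organisation, but not a Resource itself.
    hosted_by: Organisation = field(repr=False, compare=False)
    published: List[Resource] = field(default_factory=list)

    def publish(self, resource: Resource) -> None:
        # The published resource may be owned by a different organisation.
        self.published.append(resource)


# Example usage with made-up names.
node = Node("Example Participant Node")
museum = node.endorse("Example Museum")
institute = node.endorse("Example Institute")
birds = museum.add_resource("Bird collection", "http://example.org/birds")
ipt = IPTInstallation(hosted_by=institute)
ipt.publish(birds)
```

Here the IPT Installation lives at "Example Institute" but publishes a Resource owned by "Example Museum", which is exactly the case that does not fit a strict tree.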


Not long ago, this complexity was modeled using a Universal Description, Discovery and Integration (UDDI) system. This system served its purpose at the time, despite its limited set of data structure types (businessEntity, businessService, bindingTemplate, tModel). A businessEntity was associated with an Organisation/Institution, a businessService with a Resource, and a bindingTemplate with the technical access point used to access the data of that specific Resource. A tModel was used to associate the businessEntity (Organisation) with a specific Node inside the GBIF Network. A quick view of how the network information was kept in this Registry:
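The mapping onto UDDI's four structure types can be sketched roughly as follows. This is a simplified illustration, not real UDDI XML and not the API of any UDDI client library: the class and function names are hypothetical, and the tModel link between an Organisation and its Node is reduced to a plain key field. Note that contacts can only be stored on the businessEntity, not on a businessService.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TModel:
    tmodel_key: str
    name: str  # used here to stand for a GBIF Participant Node


@dataclass
class BindingTemplate:
    binding_key: str
    access_point: str  # technical access point URL of the Resource


@dataclass
class BusinessService:
    service_key: str
    name: str  # the Resource
    binding_templates: List[BindingTemplate] = field(default_factory=list)
    # No contacts field: contacts only exist on the businessEntity.


@dataclass
class BusinessEntity:
    business_key: str
    name: str             # the Organisation / Institution
    node_tmodel_key: str  # simplified link to the tModel standing for its Node
    contacts: List[str] = field(default_factory=list)
    business_services: List[BusinessService] = field(default_factory=list)


def new_key() -> str:
    return str(uuid.uuid4())


def register_organisation(node: TModel, org_name: str, contacts: List[str],
                          resources: Dict[str, str]) -> BusinessEntity:
    """Map an organisation and its resources (title -> access point URL)
    onto UDDI-style structures (hypothetical helper)."""
    entity = BusinessEntity(new_key(), org_name, node.tmodel_key, list(contacts))
    for title, url in resources.items():
        service = BusinessService(new_key(), title)
        service.binding_templates.append(BindingTemplate(new_key(), url))
        entity.business_services.append(service)
    return entity


# Example usage with made-up data.
node_tmodel = TModel(new_key(), "Example Participant Node")
entity = register_organisation(
    node_tmodel, "Example Museum", ["curator@example.org"],
    {"Bird collection": "http://example.org/birds"},
)
```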



The main disadvantages of the UDDI specification, for our purposes, were:

  1. Lack of contact information at the BusinessService (Resource) level: contacts can only be added at the BusinessEntity (Organisation) level.
  2. Lack of more descriptive metadata on Organisations and Resources, such as the organisation's address, homepage or phone number. You could provide all of this information through a complex use of UDDI's capabilities, but that makes it unnecessarily complex for third-party tools to extract.
  3. Limited to a fixed specification and a fixed API (although the available UDDI client libraries are quite straightforward to use).
  4. A general-purpose specification, not easily adapted to model the complexity of GBIF's network.
  5. Our software (Systinet WASP UDDI) dated back to the beginning of the past decade.
  6. Third-party consumers needed to know how to talk UDDI.

In 2009, we tried to overcome some of our Registry's limitations with a "UDDI on steroids" approach, which still relied on a UDDI system (jUDDI in our case) plus an external database holding extra data (e.g. Resource contact information, or an organisation's address, homepage or phone number). The main advantage was that we created our own APIs, so third-party tool developers who wanted to consume GBIF's network information no longer needed to know the nuts and bolts of the UDDI specifications. We offered the community a simple API with proper documentation, and we dealt with the inner workings ourselves.
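The idea behind this hybrid setup, a single simple API that merges what UDDI stores with the extra fields kept in a separate database, can be sketched as a small facade. This is an assumption-laden illustration: `RegistryFacade`, its methods and the record fields are hypothetical, and plain dictionaries stand in for jUDDI and the external database.

```python
import json
from typing import Any, Dict


class RegistryFacade:
    """Hypothetical facade: one simple API over a UDDI store plus an extra database."""

    def __init__(self, uddi_records: Dict[str, Dict[str, Any]],
                 extra_records: Dict[str, Dict[str, Any]]) -> None:
        self.uddi_records = uddi_records    # what UDDI can model (name, node, ...)
        self.extra_records = extra_records  # fields UDDI lacked (address, homepage, ...)

    def get_organisation(self, key: str) -> Dict[str, Any]:
        # Start from a copy of the UDDI data (KeyError if the organisation is unknown)...
        record = dict(self.uddi_records[key])
        # ...then overlay whatever the external database holds for the same key.
        record.update(self.extra_records.get(key, {}))
        return record

    def get_organisation_json(self, key: str) -> str:
        return json.dumps(self.get_organisation(key), sort_keys=True)


# Example usage with made-up data.
uddi = {"org-1": {"key": "org-1", "name": "Example Museum",
                  "node": "Example Participant Node"}}
extra = {"org-1": {"homepage": "http://example.org", "address": "1 Example Street"}}
api = RegistryFacade(uddi, extra)
print(api.get_organisation_json("org-1"))
```

The consumer only ever calls `get_organisation` or `get_organisation_json`; whether a field came from UDDI or from the extra database is hidden behind the facade.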

In the next step of this evolution, we removed the UDDI component altogether and were left with only a database, which gave us complete freedom to model the network. We now had a system that could create any kind of entity on the Network (Nodes, Organisations, Resources, Technical Installations) and any relation among them. Along with this new approach came a web application (http://gbrds.gbif.org) and a far better API, which offers the data in XML or JSON format. These APIs are easy to follow and well documented (http://code.google.com/p/gbif-registry/wiki/TableOfContents). Among the new features:

  1. Create any kind of entities
  2. Create any kind of relation among them
  3. More detailed metadata (for entities and contacts)
  4. Ability to tag entities
  5. Individual credentials for each Institution/Organisation, allowing it to add new resources or delete existing ones under its own Organisation (currently only available through the APIs or via admin management)
  6. Enhanced maintenance features (for admins)
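The move from fixed UDDI structures to a database with arbitrary entities, arbitrary relations and tags can be illustrated with a small in-memory model that serialises an entity to JSON or XML. This is only a sketch under our own assumptions: the `Entity`, `Relation` and `Registry` classes, the relation names ("endorses", "owns", "publishes") and the JSON/XML shapes are hypothetical and do not reproduce the real GBRDS schema or its API responses.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class Entity:
    key: str
    type: str  # any kind: "Node", "Organisation", "Resource", "TechnicalInstallation", ...
    name: str
    tags: Dict[str, str] = field(default_factory=dict)  # free-form tags


@dataclass
class Relation:
    source: str
    target: str
    type: str  # any kind of relation, e.g. "endorses", "owns", "publishes"


class Registry:
    def __init__(self) -> None:
        self.entities: Dict[str, Entity] = {}
        self.relations: List[Relation] = []

    def add(self, entity: Entity) -> Entity:
        self.entities[entity.key] = entity
        return entity

    def relate(self, source: str, rel_type: str, target: str) -> None:
        if source not in self.entities or target not in self.entities:
            raise KeyError("both ends of a relation must be registered")
        self.relations.append(Relation(source, target, rel_type))

    def related(self, source: str, rel_type: str) -> List[Entity]:
        return [self.entities[r.target] for r in self.relations
                if r.source == source and r.type == rel_type]

    def to_json(self, key: str) -> str:
        return json.dumps(asdict(self.entities[key]), sort_keys=True)

    def to_xml(self, key: str) -> str:
        e = self.entities[key]
        root = ET.Element("entity", key=e.key, type=e.type)
        ET.SubElement(root, "name").text = e.name
        for tag_name, value in e.tags.items():
            ET.SubElement(root, "tag", name=tag_name).text = value
        return ET.tostring(root, encoding="unicode")


# Example usage with made-up entities.
reg = Registry()
reg.add(Entity("node-1", "Node", "Example Participant Node"))
reg.add(Entity("org-1", "Organisation", "Example Museum"))
reg.add(Entity("ipt-1", "TechnicalInstallation", "Example IPT"))
reg.add(Entity("res-1", "Resource", "Bird collection", tags={"taxon": "Aves"}))
reg.relate("node-1", "endorses", "org-1")
reg.relate("org-1", "owns", "res-1")
reg.relate("ipt-1", "publishes", "res-1")
print(reg.to_json("res-1"))
print(reg.to_xml("res-1"))
```

Because entity and relation types are just strings, adding a new kind of entity or relation needs no schema change, which is the freedom a fixed specification like UDDI did not give us.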

[Evolution of GBIF's Registry]
Development is still ongoing and many exciting features are expected in the future. The status of development can be checked out here.
