Power to the people
The Museum has developed an online resource that gives everyone in the world digital access to the Museum’s collection data and research datasets.
The Museum is home to some 80 million natural history specimens, making it one of the most important scientific collections in the world.
The resources and data in this incredible collection have historically only been accessed when academics take the opportunity to visit the Museum in person.
Now, however, the Museum is launching the Data Portal - a one-stop access point for digital information about specimens. The Data Portal is open to anyone to browse, download and reuse the Museum’s collection data for their own research.
This ground-breaking idea was the brainchild of Museum scientist Dr Vince Smith and digital developer Ben Scott.
Dr Smith explains why it was an important move for the Museum and for science in general:
‘We wanted to expose the Museum’s data to our peers in a way that allows them to easily discover and reuse it. At the moment there is no simple way of doing this, and there is an inconsistent pattern of licensing across our data. The Data Portal is here to fill that gap.’
Both creators say their motivation for the development of the portal was to encourage innovation by sharing the Museum’s data with the global scientific community.
The Museum holds a significant amount of the world’s natural history resources and it has some incredible data on this vast collection.
Dr Smith says:
‘Data about the collection is one of our greatest assets and is arguably the most under-exploited opportunity within the Museum.’
To widen this limited access, the Museum is embarking on an ambitious plan, known as the digital collections programme, to create digital records for 20 million specimens in the next five years.
The aim of the programme is for the 20 million digital collections records to be open to everyone through the Museum’s Data Portal.
Dr Smith and Scott hope that by sharing the collection data with the world, other scientists will be able to reuse the data in new and unexpected ways.
Dr Smith says:
‘By publishing the collection data digitally we suddenly expose it to the world and there is a huge democratising potential in making that information accessible.’
Out in the open
It is hoped that by opening up the collections data to the outside world the Museum’s datasets will be enriched, as scientists and the public can contribute additional information about specimens.
There is also a function in the Data Portal that people can use to report errors.
Scott explains: ‘This Data Portal offers an important opportunity for citizen science, allowing people outside the Museum to contribute and help us correct these records.’
People can search through all of the collections data and any search results can be downloaded. The entire collections dataset is also accessible through an application programming interface (API) created by the Data Portal development team.
Scott explains: ‘The API allows you to take all the data you want in an easy, reusable format, so that you can do what you like with it.’
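As a rough illustration of what pulling records through the API might look like, here is a minimal Python sketch. It assumes the portal exposes the standard CKAN action API, in particular the `datastore_search` action, at `https://data.nhm.ac.uk`. The article gives neither the base URL nor any resource identifiers, so the `RESOURCE_ID` value is a hypothetical placeholder and the helper names are illustrative only.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: the portal's base URL (not stated in the article).
BASE_URL = "https://data.nhm.ac.uk"
# Hypothetical placeholder: replace with a real resource ID from the portal.
RESOURCE_ID = "example-resource-id"


def build_search_url(resource_id, query, limit=5, base_url=BASE_URL):
    """Build a CKAN datastore_search URL for a free-text query."""
    params = urlencode({"resource_id": resource_id, "q": query, "limit": limit})
    return f"{base_url}/api/3/action/datastore_search?{params}"


def extract_records(payload):
    """Return the list of records from a CKAN action API response."""
    if not payload.get("success"):
        raise RuntimeError(f"API call failed: {payload.get('error')}")
    return payload["result"]["records"]


def search(resource_id, query, limit=5):
    """Fetch matching records over HTTP (requires network access)."""
    url = build_search_url(resource_id, query, limit)
    with urlopen(url, timeout=30) as response:
        return extract_records(json.load(response))
```

Calling `search(RESOURCE_ID, "Lepidoptera")` with a real resource ID would return a list of dictionaries, one per matching record, ready to be written to a file or loaded into an analysis tool.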
Visual discovery and analysis
The Data Portal team have also developed detailed visualisations of the collections data. These include global distribution maps and statistical overviews of specimen records. The visualisations can be found on the Data Portal and act as a way of finding, or exposing, patterns and trends in the data for the first time.
Scott is busy planning more data visualisation and other analytical tools for the Data Portal. He and Dr Smith are also devising ‘hackathons’ at the Museum to help develop the Data Portal’s tools.
Scott says: ‘We hope to organise hackathons around topics like data cleaning and data visualisation where we can get other people to address some of the big problems that we face with these big datasets. Development resource at the Museum is limited, so if we can get people to build things from our data it’s also fantastic for the Museum.’
Credit where credit is due
Staff at the Museum generate almost 1,000 scientific papers every year and a key function of the Data Portal is to allow these scientists to upload datasets associated with their publications. Another important goal of the Data Portal was to add a mechanism for people to be able to cite data.
Dr Smith says:
‘One of the big reasons people don’t share data is because they don’t really see the value in doing it. A lot of that value comes from citation, from people being able to get credit for that data.’
Each dataset is assigned a DataCite digital object identifier so that people can cite datasets, allowing Museum staff to be credited for their work.
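To show how a DOI makes a dataset citable, the following Python sketch assembles a citation string from a dataset's basic metadata. The field layout loosely follows the common "creator (year). title. publisher. identifier" pattern; it is an assumption, not the Museum's prescribed format, and the example values, including the DOI, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    creators: list   # author names, e.g. ["Doe, J."]
    year: int        # publication year
    title: str
    publisher: str
    doi: str         # bare DOI, without the resolver prefix


def format_citation(dataset):
    """Return a one-line citation ending in a resolvable DOI link."""
    authors = "; ".join(dataset.creators)
    return (
        f"{authors} ({dataset.year}). {dataset.title}. "
        f"{dataset.publisher}. https://doi.org/{dataset.doi}"
    )


# Hypothetical example values, for illustration only.
example = Dataset(
    creators=["Doe, J.", "Roe, R."],
    year=2015,
    title="Example specimen dataset",
    publisher="Example Museum",
    doi="10.1234/example",
)
print(format_citation(example))
```

Because the DOI resolves to the dataset's landing page, anyone reading a paper that includes such a citation can follow it straight back to the data and to the staff who produced it.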
The Data Portal website uses open source software known as CKAN. This was developed by the Open Knowledge Foundation, and is used for the UK and US government data portals. It is a fast and scalable platform, which Scott says ‘is crucial to be able to deliver datasets of the size and complexity that we have within the Museum.’
Scott adds: ‘The fact that this software is open source has enabled the portal’s functions to be shared by other users of CKAN, so the Museum is contributing to the ongoing development of the platform.’
The Data Portal team plan to continue developing the portal’s discovery and analytical tools and are looking forward to the future.
‘Seeing what people will actually do with the dataset is exciting. There is huge added value in exposing the data to the world - to see what new and interesting things people create.’