People power helps turn historic collection digital

02 December 2013

Museum embarks on mammoth project to recreate itself online.

The Museum has begun the huge task of digitising 20 million specimens onto a database that will be available to everyone. The website will also house scientific publications and papers generated by Museum staff, allowing all data to be used freely.

It will also mean that the curators can better search and manage their collections.

The initial digitising programme is expected to take five years, with the remainder of the 79 million records in the collection being uploaded over five further years.

Butterflies as guinea pigs

The UK and Irish lepidoptera collection (butterflies and moths) was chosen to kick-start the iCollections project because it contains important scientific and historic information. It includes specimens collected from the mid-1800s to the 1960s.

By comparing phenology, ie when the first butterflies appear each year, scientists can see how the climate has changed over the past 200 years.

Critical information written on small labels, giving details of who, why, when and where for each specimen, will be used to create digital maps showing past geographical butterfly hotspots around the UK. These maps will also be useful for future conservation.

The painstaking work involves photographing 500,000 butterflies and their labels, uploading the images, entering the label data and then storing every specimen in new trays. 


Painstaking work: the iCollections team now upload on average one specimen every three minutes.

So far, the team has entered 100,000 specimens. At this rate it should take a year to capture this section of the butterfly collection.

Museum zoologist Gordon Paterson, head of the iCollections project, said, 'When we started, we hadn't necessarily thought through every process. But it's turned into a true team effort that has tested every resource of the Museum.'

Notes from Nature

Another ongoing digitisation project, Notes from Nature, is using crowdsourcing to decode and transcribe one million entries in the bird register. Volunteers, who range from ornithologists to the simply curious, see an image of a label or sheet from the register and have to transcribe short lines of handwritten text, which is sometimes difficult to read. 

Scientists' handwriting

Although there's a certain romance to the script, there are no pictures so it can be dry work. To keep the volunteers going, the iCollections team encouraged the development of online work group communities where volunteers trade stories and advice, with occasional input from Museum staff to keep it interesting.

People as machines

Lawrence Brooks, a database expert in the Zoology Department, said crowdsourcing is invaluable. 'It's a unique way of using people as machines and machines as people - a sort of joint payoff to compensate that computers can't read handwriting and people need computers to collate this huge collection of information.'

Museum entomologist Vince Smith, who is in overall charge of building the digital infrastructure, said, 'Pure data input of, say, research papers is relatively straightforward. The Museum collection is much more difficult to record, not least because some of the records are virtually illegible.'

Smith's job is to come up with a system capable of digitising the 79 million specimen records, which, in the time frame specified, means somehow uploading 18,000 records a day. The diversity of the collection means the solution for one part is different to that needed for another.

Smith, who admits one of the things that keep him awake at night is whether he can get enough information online to make it useful, is working to go beyond crowdsourcing to find a way of utilising the people power available daily in the Museum.

'It could mean tapping into people standing in the dinosaur queue, or sitting drinking coffee in one of our cafés. Or it could mean creating special technical areas where people can see their work going online.

'If there are 20 million specimens to upload in the first phase, and five million people pass through the Museum every year, if every person digitised one specimen, think what could be done.'



The Citizen Science Alliance is a collaboration of scientists, software developers and educators who collectively develop, manage and utilise internet-based citizen science projects.
