Making data digital

The 79 million specimens in our collection provide a vital source of information about the natural world. The informatics group is developing methods to digitally capture and analyse this powerful resource.

Our collections represent a treasure trove of scientific information relevant to the study of:

  • taxonomy
  • systematics
  • biological conservation
  • land management
  • agriculture
  • pollination
  • climate change

These data have the potential to transform our understanding of the biosphere, helping computational scientists and ecologists to model all life on Earth.

We aim to produce a globally accessible database of information by digitising and transcribing the data from millions of Museum specimen labels.

Dynamic data

The informatics group is working with commercial and academic partners to speed up the process of extracting information from Museum collections.

Techniques critical to this process include:

Industrial-scale imaging

We are developing methods of digitisation that minimise and streamline resource-intensive work such as selecting specimens and databasing.

In a collaborative project we have produced a museum drawer scanning system that digitises and captures data from multiple specimens in one step, rather than imaging and databasing each specimen individually.

Text recognition

Automated methods for capturing text can reduce the cost and increase the efficiency of digitising the Museum collections. These methods include:

  • optical character recognition (OCR)
  • natural language processing
  • human-assisted parsing
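
To illustrate how automated capture and human-assisted parsing can fit together, the sketch below takes text that an OCR step has already produced from a label and tries to pull out two fields. It is a minimal illustration under stated assumptions, not a description of the Museum's software: the field names, the regular expressions, the `LabelRecord` class and the sample label text are all hypothetical. Any label the patterns cannot fully parse is flagged for a person to check.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical patterns: a date such as 12.vi.1934 or 12/06/1934, and a
# collector line that starts with "leg." or "coll." (assumed conventions).
DATE_RE = re.compile(r"\b\d{1,2}[./-](?:\d{1,2}|[ivxlc]+)[./-]\d{4}\b", re.IGNORECASE)
COLLECTOR_RE = re.compile(r"^(?:leg|coll)\.?:?\s+(.+)$", re.IGNORECASE | re.MULTILINE)


@dataclass
class LabelRecord:
    raw_text: str             # OCR output, kept unchanged for later checking
    date: Optional[str]       # date string as written on the label, if found
    collector: Optional[str]  # collector name, if found
    needs_review: bool        # True when a person should check the record


def parse_label(raw_text: str) -> LabelRecord:
    """Extract a date and a collector from OCR text; flag incomplete results."""
    date_match = DATE_RE.search(raw_text)
    collector_match = COLLECTOR_RE.search(raw_text)
    date = date_match.group(0) if date_match else None
    collector = collector_match.group(1).strip() if collector_match else None
    # Human-assisted step: anything not fully parsed goes to a reviewer.
    return LabelRecord(raw_text, date, collector,
                       needs_review=date is None or collector is None)


if __name__ == "__main__":
    # Invented label text, for illustration only.
    print(parse_label("Kent, Darenth Wood\n12.vi.1934\nleg. A. Smith"))
```

Keeping the raw OCR text alongside the parsed fields means a reviewer can always compare the extracted values with what was actually read from the label.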


Current digitisation methods are relatively slow, labour-intensive and expensive. In collaboration with Zooniverse, we are exploring the use of crowdsourcing to transcribe handwritten museum labels and populate Museum databases.

Volunteers will be able to transcribe data from:

  • specimen labels
  • ledgers
  • herbarium sheets
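
When several volunteers transcribe the same label, their answers need to be reconciled before a value is stored. One common approach, shown here purely as an assumption and not as the method used in this project, is a simple majority vote. The function names `normalise` and `consensus` and the 60% agreement threshold are hypothetical choices made for the example.

```python
from collections import Counter
from typing import List, Optional


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so trivial differences do not split votes."""
    return " ".join(text.split()).casefold()


def consensus(transcriptions: List[str], min_agreement: float = 0.6) -> Optional[str]:
    """Return the majority transcription, or None if volunteers disagree too much."""
    if not transcriptions:
        return None
    counts = Counter(normalise(t) for t in transcriptions)
    winner, votes = counts.most_common(1)[0]
    if votes / len(transcriptions) < min_agreement:
        return None  # no clear majority: leave the label for expert review
    # Return the first volunteer's spelling that matches the winning form.
    for t in transcriptions:
        if normalise(t) == winner:
            return " ".join(t.split())
    return None


if __name__ == "__main__":
    # Invented volunteer answers, for illustration only.
    print(consensus(["Darenth Wood", "darenth  wood", "Darent Wood"]))
```

Returning `None` when agreement is low keeps uncertain readings out of the database until someone has looked at them.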