
The NaturePlus Forums will be offline from mid August 2018. The content has been saved and it will always be possible to see and refer to archived posts, but not to post new items. This decision has been made in light of technical problems with the forum, which cannot be fixed or upgraded.

We'd like to take this opportunity to thank everyone who has contributed to the very great success of the forums and to the community spirit there. We plan to create new community features and services in the future so please watch this space for developments in this area. In the meantime if you have any questions then please email:

Fossil enquiries:
Life Sciences & Mineralogy enquiries:
Commercial enquiries:

Curator of Micropalaeontology's blog

2 Posts tagged with the smithsonian tag

I have just read an excellent blog article by Nick Poole about the Smithsonian Digitisation Fair in Washington. Last December I gave a talk about the cost of mass digitisation at the Annual General Meeting of the Geological Curators' Group at Leeds Museum, and I feel inspired to jot down the thoughts of a curator in the middle of a mass digitisation project. Here are my 10 steps to mass digitisation, covering some of the pitfalls and how we have overcome them, a timeline and, finally, an estimate of the cost of this project.


  • Data entry templates

I have been asked many times whether I can provide a template for easy data capture. In my experience, each dataset is different, and considerable initial thought is required to design a good data capture structure. In 2009 I was given 100,000 micropalaeontological records that had been created in MS Access, using a data entry sheet designed to mirror the fields in our KE Software collections management system, KE Emu. You can never spend too much time at the start of the process testing how the template works so that the data you capture is usable. It could save weeks, if not months, of re-formatting later on. This is especially critical if you will later rely on someone else to deliver your data to the web.




The old paper microfossil registers transcribed into an MS Access database at the start of the project
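One way to "test how it works" before committing to a template is to describe the fields explicitly and run a small batch of rows through a few checks. The sketch below is only an illustration under stated assumptions: the field names (`register_number`, `verbatim_taxon` and so on), the register numbers and the two checks are hypothetical and are not KE Emu's real fields.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass
class RegisterEntry:
    # Hypothetical field names for illustration only; a real template
    # would mirror the target collections database's own fields.
    register_number: str
    verbatim_taxon: str
    verbatim_locality: str = ""
    verbatim_collector: str = ""


# Hypothetical choice of which fields must never be left blank.
REQUIRED_FIELDS = {"register_number", "verbatim_taxon"}


def validate(entry: RegisterEntry) -> List[str]:
    """Return a list of problems found in one row (an empty list means it passes)."""
    problems = []
    for f in fields(entry):
        value = getattr(entry, f.name)
        if f.name in REQUIRED_FIELDS and not value.strip():
            problems.append(f"missing {f.name}")
        elif value != value.strip():
            problems.append(f"stray whitespace in {f.name}")
    return problems


# Try a small trial batch before rolling the template out to everyone.
rows = [
    RegisterEntry("PM 1234", "Globigerina bulloides", "Atlantic"),
    RegisterEntry("", "Orbulina universa"),
    RegisterEntry("PM 1236", "Nodosaria sp. ", "Oman"),
]
for row in rows:
    print(row.register_number or "<blank>", validate(row))
```

Catching a blank required field or trailing spaces on a trial batch of a few hundred rows is far cheaper than finding them in 100,000 rows at import time.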


  • Getting help with entering data

Two contract data entry clerks were responsible for the initial data entry of our old micropalaeontology specimen registers. There has been a lot of debate about whether non-specialists can work as accurately as specialists. I would say that they did an excellent job of transcribing exactly what was written in the registers, except where the handwriting was poor; I often had trouble interpreting those entries myself! They did it in a fraction of the time it would have taken me. I haven't tried crowdsourcing, but I am certainly considering it to help clear the electronic registration backlog that has accumulated since we stopped recording everything in pen and ink.


  • Cleansing

The data entry clerks were told not to interpret anything and to transcribe exactly what had been written in the registers. This suited us because we wanted to keep a good balance between recording the original register data and making informed interpretations. No original data has been removed during the migration, as we were able to record details in verbatim fields. Considerable cleansing of the data has been necessary, mainly because the data in our registers is not sufficiently detailed or needs updating to reflect changes in political boundaries. Various other key areas required cleansing, and these are dealt with below.
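The verbatim-field approach can be sketched as "never overwrite, always add alongside". In this minimal example, assuming a simple lookup table of historic country names, the register text is stored untouched and an interpreted value sits next to it. The three mappings are real name changes but are only illustrative; a real project would draw on a curated gazetteer, and the field names are hypothetical.

```python
# Illustrative lookup only; a real project would use a curated gazetteer.
HISTORIC_TO_CURRENT = {
    "ceylon": "Sri Lanka",
    "burma": "Myanmar",
    "gold coast": "Ghana",
}


def cleanse_country(verbatim: str) -> dict:
    """Keep the register text untouched and add an interpreted value alongside it."""
    key = verbatim.strip().lower()
    return {
        "country_verbatim": verbatim,  # original data is never overwritten
        "country_interpreted": HISTORIC_TO_CURRENT.get(key, verbatim.strip()),
    }


print(cleanse_country("Ceylon"))
print(cleanse_country("Oman"))
```

Because the verbatim value survives, any interpretation can be revisited later without going back to the paper registers.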


  • Maintaining data standards

There are many ways of writing people's names (Miller, C. G., Mr C. G. Miller, Dr C. Giles Miller, etc.), and the handwritten registers reflect the fact that no standard was ever followed. Matching records in the MS Access database with those already in KE Emu was therefore difficult, if not impossible, without creating many duplicate entries. To avoid this, we compiled a list of all the names associated with the collection and distilled them down to a list of about 2,000. We then checked these against all current Museum records and found that many had already been created by other members of Museum staff. We linked these records directly to our data using an internal record number, or "irn", so that we could be sure the correct record, in the correct format, was being linked. New records were created where necessary from the dataset of names we compiled.
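Distilling name variants down to one record per person can be sketched by reducing each written form to a simple match key (surname plus initials) and looking that key up in a table of existing record numbers. This is a hypothetical illustration, not the method we actually used: the key format, the list of titles and the irn value 12345 are all assumptions. Different people who share a surname and initials would collide, so matches still need checking by eye.

```python
import re
from typing import Optional

# Titles to ignore when building initials; extend as the registers require.
TITLES = {"mr", "mrs", "ms", "miss", "dr", "prof"}


def name_key(raw: str) -> str:
    """Reduce a written name to 'surname|initials' so variants can be grouped."""
    raw = raw.strip()
    if "," in raw:  # 'Miller, C. G.' style
        surname, given = raw.split(",", 1)
    else:  # 'Mr C. G. Miller' style
        parts = raw.split()
        surname, given = parts[-1], " ".join(parts[:-1])
    tokens = [t for t in re.split(r"[\s.]+", given) if t]
    initials = "".join(t[0].lower() for t in tokens if t.lower() not in TITLES)
    return f"{surname.strip().lower()}|{initials}"


# Hypothetical irn value standing in for an existing people record.
PEOPLE_IRNS = {"miller|cg": 12345}


def link_person(raw: str, lookup: dict) -> Optional[int]:
    """Return the irn of an existing record, or None if a new record is needed."""
    return lookup.get(name_key(raw))


for variant in ["Miller, C. G.", "Mr C. G. Miller", "Dr C. Giles Miller"]:
    print(variant, "->", name_key(variant), link_person(variant, PEOPLE_IRNS))
```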



Some relatively complete examples of bibliographic citations


  • Breaking tasks down into manageable blocks

In some ways we did this with the process we used for people's names. I was interested to see in Nick Poole's blog that the Smithsonian are using similar strategies, breaking tasks down into smaller blocks to achieve larger digitisation goals. Bibliographic citations like those above have not been complete enough to create records directly from the registers, as many use abbreviations, lack vital data or need further research to make them meaningful. I wrote a short subproject proposal for internal funds to hire an assistant for 6 months, who created full reference details for all the published specimens in the collection. In reality this took much less time than expected, and she was able to help with many other tasks associated with preparing the data for migration into KE Emu.


  • Using pre-existing datasets

Again, the registers were not complete enough to create identification records from scratch, because generic names were often abbreviated or the original describing author details were missing. There are many biodiversity resources on the internet, including the Ellis and Messina Catalogue of microfossil species published by the Micropalaeontology Press. I asked them if I could use their list of microfossil names to help populate our database, and for a small fee they provided an MS Excel file of all the species in their database. I imported about 50,000 complete microfossil names into KE Emu and used a simple VLOOKUP function in MS Excel to match these with the electronic records created from the paper registers. When no match was found, I checked why, corrected the data if necessary or used it to create new species records in KE Emu.
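For anyone working outside Excel, the same exact-match lookup can be sketched in a few lines: index the catalogue by name, look up each register name, and collect the misses for checking by hand. This is a minimal sketch that behaves like VLOOKUP with its last argument set to FALSE (exact match). The catalogue rows are illustrative and do not reflect the layout of the supplier's actual file, and the whitespace and case normalisation is an assumption.

```python
# Illustrative catalogue rows; the real supplier file's layout is not shown here.
CATALOGUE = [
    {"name": "Globigerina bulloides", "author": "d'Orbigny, 1826"},
    {"name": "Orbulina universa", "author": "d'Orbigny, 1839"},
]


def normalise(name):
    """Collapse repeated spaces and ignore case before comparing names."""
    return " ".join(name.split()).lower()


def match_names(register_names, catalogue):
    """Exact-match lookup, similar in spirit to VLOOKUP with range_lookup FALSE."""
    index = {normalise(row["name"]): row for row in catalogue}
    matched, unmatched = {}, []
    for name in register_names:
        row = index.get(normalise(name))
        if row is None:
            unmatched.append(name)  # check by hand: abbreviation, typo or new species?
        else:
            matched[name] = row
    return matched, unmatched


matched, unmatched = match_names(
    ["Globigerina  bulloides", "G. bulloides", "orbulina universa"], CATALOGUE
)
print(matched)
print(unmatched)
```

The abbreviated "G. bulloides" ends up in the unmatched list, which is exactly the kind of record that needed a human decision.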


  • Thinking positively

Shortly after I arrived at the Museum in the 1990s, a senior member of staff told me that it would take us 250 years to database the entire collection. Sometimes it's difficult to get started when you feel that your efforts are only scratching the surface, or will disappear into the black hole of a database that will never be useful because hardly any of your hundreds of thousands of objects are registered in it. I have to admit that there have been times in my career when I have felt like this. My mentor encouraged me to see the bigger picture and the benefits of the project I was involved in. Bringing data checking to the top of my list of collections management priorities has paid immediate dividends.


  • The bigger picture

There are so many advantages to having the majority of your collection on an electronic database that is searchable via the web. Even though I am only halfway through, I have already seen real benefits in answering enquiries quickly and easily. Once everything is migrated, I will be able to spot areas for developing the collection and potential candidates for de-accession, while gathering hard data on the collection's strengths. It is much easier to raise the profile of the collection and encourage visitors through schemes such as SYNTHESYS when you can send messages to list-servers advertising a web link to your collection. Another major advantage is that I now have somewhere to associate the many electronic images and documents that relate to my collections, and these can be delivered to the web should I choose.


  • Estimating timescales

Our two clerks each spent 4 months on the initial data entry from the registers, inputting 100,000 records between them. In 6 months my assistant created full bibliographic records for the whole dataset and added "irn" references for all of the people associated with it as collectors, donors or publishers. The process that has taken longest is my data checking, particularly of the scientific accuracy of the fossil names. I estimate that I have spent between 5 and 10 per cent of my time checking data and preparing import sheets since the project started. I am therefore the log jam! At the current rate we are looking at some time in 2015 for completion of the entire 100,000-record dataset.
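The figures above allow a simple throughput calculation: 100,000 records over 2 × 4 clerk-months works out at 12,500 records per clerk-month. The sketch below shows that arithmetic and how a remaining-time estimate follows from the slowest stage. The checking rate of 2,500 records per month is purely hypothetical and chosen only to demonstrate the calculation; it is not a measured figure from this project. The 50,000 remaining records reflect being roughly halfway through.

```python
def records_per_person_month(records, people, months):
    """Average throughput of one person over the period."""
    return records / (people * months)


def months_remaining(records_left, records_checked_per_month):
    """The project finishes at the pace of its slowest stage (here, checking)."""
    return records_left / records_checked_per_month


# From the text: two clerks, 4 months each, 100,000 records between them.
entry_rate = records_per_person_month(100_000, 2, 4)
print(entry_rate)  # 12500.0

# Hypothetical checking rate, chosen only to illustrate the calculation.
print(months_remaining(50_000, 2_500))  # 20.0
```

However fast the data entry is, the completion date is set by the checking stage, which is why the curator's time is the figure worth measuring.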



Lyndsey Douglas researching full bibliographic microfossil reference details in the Heron-Allen Library


  • Costs

Obviously it would be imprudent to show a breakdown of salary costs here, so I will just say that at Christmas last year, when 36,000 KE Emu records had been created, the cost came to roughly one pound per record. This includes the Micropalaeontology Press fee and the salary costs for the initial data entry, for an assistant for 6 months and for 10 per cent of my time. I have not included other expenses such as building and IT overheads. I expect the final cost at the end of the project to be slightly less than a pound per record, as the major salary expenditure for the data entry clerks and the 6-month post has now been accounted for. The final cost will depend on how long it takes me to finish checking and migrating the data.
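The reason the cost per record should fall can be shown with a little arithmetic: roughly 36,000 records at about £1 each means about £36,000 has been spent, most of it on work that already covers the whole dataset. If each remaining record costs less than £1 to finish, the average must drop below £1. In the sketch below, the £0.90 marginal cost per remaining record is a hypothetical value for illustration only, and the 64,000 remaining records simply assume the 100,000-record total.

```python
def average_cost(spent_so_far, records_so_far, extra_records, marginal_cost):
    """Average cost per record once the remaining records are finished."""
    total = spent_so_far + extra_records * marginal_cost
    return total / (records_so_far + extra_records)


# From the text: about 36,000 records at roughly one pound each.
spent = 36_000 * 1.0

# Hypothetical: each remaining record costs only checking time, assumed £0.90.
print(round(average_cost(spent, 36_000, 64_000, 0.90), 2))  # 0.94
```

Any marginal cost below £1 pulls the average down, and the further it is below £1, the bigger the drop.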


I may be only halfway through importing the 100,000 records, but I would like to think that this project can provide some valuable benchmark data for those planning future projects, suggest some ways of making the process quicker and help with forecasting costs and timeframes.


I'm so tempted to say that a microfossil curator attends meetings and writes e-mails. Sometimes it feels like that. I decided to document a typical day back in January, when e-mails and meetings helped prepare for a loan to an art exhibition, brought news of a potentially exciting new acquisition and raised a possible research opportunity involving micro-CT scanning.



One of Irene Kopelman's items in the Gasworks Gallery based on microfossils from our collection


The bulk of the e-traffic involves preparations for an exhibition that opened on 10 Feb at the Gasworks Gallery near the Oval Cricket Ground. Artist Irene Kopelman's work was partly inspired by some slides of radiolarian microfossils from our collections. We are preparing an exhibition loan of the slides, and today there is a lot of correspondence about arrangements for two open day tours I am holding to accompany the exhibition.


Most microfossils are so small that I have to deal with images rather than the specimens themselves. We recently sent some specimens on loan to the Smithsonian Institution in Washington, where a researcher has made some images for a publication and left them on an FTP site for me to collect. I am also making arrangements for other images of our specimens to be sent to us by one of our regular visitors, who has posted them on an excellent site for people interested in foraminiferal microfossils.



Aggerostramen rustica, a type of foraminiferal microfossil that builds a shell from sediment. In this case, sponge spicules have been chosen. This image has been posted on-line at the site mentioned above


Typically, a day will not pass without some correspondence with future visitors to the collections and/or an actual visit from a scientist. Two visitors want to come in a couple of days' time, and another wants to visit the following week to discuss a short paper on a major collection of 2,500 slides that they donated last year.


In a few days' time I'm off to our collections outstation in Wandsworth to meet OU PhD student Kate Salmon, who is using our collections to study ocean acidification. I need to book a Museum vehicle to take me to Wandsworth and to bring back the collections she would like to borrow.


I mentioned meetings, but you'll be glad to know that I'm not going to go into detail here. From one meeting I come away with two additional enquiries to answer: a request from a journalism student for a 5-minute mock radio interview, and a student who wants images of some of our specimens for their thesis.


I am also asked to assess a destructive sampling request as my boss is away. Sometimes our samples or specimens need further analysis to reveal their true scientific potential. In this case the borrower wants to make thin sections of fragments of fish fossils and to carry out 3-D imaging using a synchrotron (see my previous blog on sex in the Cretaceous for details of synchrotrons). The work will potentially give important details about early fish evolution so the request is ratified.



Erasmus student Angelo Mossoni using one of the scanning electron microscopes at the Museum.



The excellent research facilities here at the Museum offer many exciting possibilities. Today an e-mail has come in requesting bids for use of the micro-CT scanner. I want to test whether this method can provide 3-D images of some tiny specimens whose reverse sides we cannot currently examine because they are embedded in wax. If it works, 3-D images of some of our most important specimens will be delivered to the web. Some of these species have been used extensively in studies on climate change and oceanography.


One message informs me that an exciting new sample has just been sent as a donation from Oman. When it arrives I will need to dissolve some of it in acid (vinegar) to release the tiny fossils. Traces of fish microfossils are clearly visible on the surface of the rock, so this sounds very promising and could be the subject of a new paper on early fish evolution.


It would appear from everything listed above that there is not much time for any other activities. However, documenting the collections for the web is one of our core duties, so I find time in the afternoon to work on a documentation project. I am also on duty for an hour to answer questions from my fellow curators and my mentee Jacqui about using the databasing system.


A number of people, including my two new colleagues Tom and Steve, pop their heads round my door to ask questions about the collections or bring me information. Retired Museum Associate Richard Hodgkinson is in today and has some questions about his project. Another retired member of staff brings me a copy of his latest paper, and former volunteer and now colleague Lyndsey Douglas comes to tell me that my blog has been quoted in the January edition of the Museums Journal!


It's an amazingly variable job being a microfossil curator, and no day is ever the same as another. I love my job and I think of it as unique. I don't know of anyone else in the world who has a similar job in micropalaeontology. If you have a similar job, I'd love to hear from you.

Giles Miller


Member since: Apr 21, 2010

This is Giles Miller's Curator of Micropalaeontology blog. I make the Museum micropalaeontology collections available to visitors from all over the world, publish articles on the collections, give public talks and occasionally make collections myself.
