
Over the last month or so we have been preparing a large microfossil teaching collection for loan to the University of Birmingham to support a new postgraduate masters course on Applied and Petroleum Micropalaeontology. The collection consists of 730 slides and over 2,500 countable specimens housed in a single cabinet.

 

Microfossils in the collection represent all the different foraminiferal groups and were compiled by Prof. John Haynes of University of Wales, Aberystwyth where he used the collection to teach an M.Sc. course in Micropalaeontology before his retirement in 1993.

 

Group_of_Foraminifera_A1_19-trans3-10x_blog.jpg

A glass slide with mounted specimens arranged to illustrate the shell structure in various groups of Foraminifera. The crack across the slide was caused, presumably by a student, while focussing the microscope too closely on the slide! In general the collection is in remarkable condition considering the many years of use for teaching.

P1010016_blog.jpg

The loaned teaching collection cabinet represents one of 62 microslide cabinets donated to the Museum by the University of Wales, Aberystwyth in the early 2000s along with associated residues, samples, notes and student theses. The university stipulated that a well curated collection be left before a student could graduate so the entire collection is in beautiful order.

 

Prof. Haynes supervised over 80 M.Sc. dissertations and 30 M.Phil and Ph.D. research students before he retired. The Aberystwyth Collection also contains ostracod collections compiled by Prof. Robin Whatley and his students. A searchable collections level catalogue of the Aberystwyth Microfossil Collection can be found on the Museum website.

 

Lepidocyclina_limestone-A15-8b_1_2x_blog.jpg

A thin section of limestone composed almost entirely of Lepidocyclina, a genus of larger Foraminifera.

 

The University of Birmingham is the only university in the UK currently offering a full M.Sc. course specialising in micropalaeontology. The course started in October 2012 and the teaching in the first month will include classes on Foraminifera taught by Haydon Bailey.

 

Foraminifera and other microfossil groups are very useful for dating rock formations as well as giving details of the environment that they were deposited in. This sort of information is vital in producing models for exploration of petroleum and other natural resources.

 

To prepare the loan we had to compile a list of all specimens, count them if possible, number the slides individually and make notes on the condition of specimens and slides that were in poor condition. A big thank you to Haydon Bailey and Daryl Tappin for help in preparing this vast loan.

 

Below are a few more images of some of the specimens that caught my eye while I was checking the loan. I hope that both you and the University of Birmingham students will enjoy this amazing collection!

 

Alveolina_elliptica-A7_16-section-2-5x_BLOG.jpg

A thin section of Alveolina elliptica. The cracks are in mounting balsam.

Various_foraminifera_Nothe-A9_7-slide2-3_2x_blog.jpg

A slide with an uncountable number of foraminiferal specimens from the Nothe Clay of the Jurassic coast. Roughly 2,500 specimens were counted on the loaned slides. In reality the collection consists of far more than 2,500 specimens as we did not try to count the individual specimens on 200 of the 730 slides because there were too many of them.

Nummulites_britannicus-A16_9-lat-1_2x_blog.jpg

... and finally Nummulites britannicus.


I have just read an excellent blog article by Nick Poole about the Smithsonian Digitisation Fair in Washington. I gave a talk last December about the cost of mass digitisation at the Annual General Meeting of the Geological Curators' Group at Leeds Museum and feel inspired to jot down the thoughts of a curator in the middle of a mass digitisation project. Here are my 10 steps to mass digitisation dealing with some of the pitfalls, how we have managed to overcome them, a timeline and finally an estimate of the cost of this mass digitisation project.

 

  • Data entry templates

I have been asked so many times if I can provide a template for easy data capture. In my experience, each dataset is different and considerable initial thought is required to design a good data capture structure. I was given 100,000 micropalaeontological records back in 2009 that were created using MS Access on a data entry sheet designed to mirror fields in our KE Software collections management system, KE Emu. You can never spend too much time at the start of the process testing how it works so that the data you capture is useable. It could save weeks if not months of re-formatting at a later stage. This is especially critical if you will later rely on someone else to deliver your data to the web.

 

Registers_DSCF1048_blog.jpg

 

The old paper microfossil registers transcribed into an MS Access database at the start of the project

 

  • Getting help with entering data

Two contract data entry clerks were responsible for initial data entry of our old micropalaeontology specimen registers. There has been a lot of debate about whether non-specialists can work as accurately as specialists. I would say that they did an excellent job of transcribing exactly what was written in the registers, except where the handwriting was poor. I often had trouble interpreting what had been written in these cases! They did it in a fraction of the time it would have taken me. I haven't tried crowdsourcing but I am certainly considering it to help clear some of the electronic registration backlog that has accumulated since we stopped recording everything in pen and ink.

 

  • Cleansing

The data entry clerks were told not to do any interpretation and to transcribe exactly what had been written in the registers. This is fine because we wanted to maintain a good balance between recording the original register data and making informed interpretations. No original data has been removed during the migration as we were able to record details in verbatim fields. Considerable cleansing of the data has been necessary, mainly because the data in our registers is not sufficiently detailed or needs updating to reflect changes in political boundaries. Various other key areas required cleansing and these are dealt with below.

 

  • Maintaining data standards

There are many ways of writing people's names (Miller, C. G., Mr C. G. Miller, Dr C. Giles Miller ... etc) and the handwritten registers reflect the fact that no standard was ever followed. Matching records in the MS Access database with those already in KE Emu was therefore difficult, if not impossible, without creating many duplicate entries. To avoid this, we compiled a list of all the names associated with the collection and distilled them down to a list of about 2,000. We then checked these against all current museum records and found that many had already been created by other members of Museum staff. We then linked these records directly back to our data using an internal record number or "irn" so that we could be sure that the correct record in the correct format was being linked to. New records were created if necessary from the dataset of names we compiled.
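As a rough illustration of the idea, and not the process actually used in KE Emu, the name variants quoted above can be reduced to a common key and then looked up against existing records. The function names, the title list and the irn value below are all hypothetical, and a surname plus first initial key is deliberately crude: it would merge different people who share both, so any match still needs checking by a person.

```python
# Hypothetical list of titles to ignore; extend as the registers require.
TITLES = {"mr", "mrs", "miss", "ms", "dr", "prof"}


def name_key(raw):
    """Reduce a name variant to a (surname, first initial) key."""
    text = raw.strip()
    if not text:
        return ("", "")
    if "," in text:
        # "Miller, C. G." style: surname comes before the comma.
        surname, rest = text.split(",", 1)
        given = rest.split()
    else:
        # "Mr C. G. Miller" style: surname is assumed to be the last word.
        tokens = text.split()
        surname, given = tokens[-1], tokens[:-1]
    # Drop titles such as "Mr" or "Dr." from the given-name part.
    given = [t for t in given if t.strip(".").lower() not in TITLES]
    first_initial = given[0][0].lower() if given else ""
    return (surname.strip().lower(), first_initial)


# Invented example: one existing person record with a made-up irn.
EXISTING_IRNS = {("miller", "c"): 1001}


def find_irn(raw):
    """Return the irn of a matching existing record, or None."""
    return EXISTING_IRNS.get(name_key(raw))


variants = ["Miller, C. G.", "Mr C. G. Miller", "Dr C. Giles Miller"]
keys = {name_key(v) for v in variants}
```

Here all three variants collapse to the same key, so each would link to the same existing record rather than creating a duplicate. A name with no match returns None and would be a candidate for a new record.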

 

Registers_citations_blog.jpg

Some relatively complete examples of bibliographic citations

 

  • Breaking tasks down into manageable blocks

In some ways we did this with the process we used for people names. I was interested to see in Nick Poole's blog that the Smithsonian are using similar strategies of breaking the tasks down into smaller blocks to achieve larger digitisation goals. Bibliographic citations like those above have not been complete enough to create records directly from the registers, as many use abbreviations, lack vital data or need further research to make them meaningful. I wrote a short subproject proposal for internal funds to hire an assistant for 6 months who created full reference details for all the published specimens in the collection. In reality this took a much shorter time than expected and she was able to help with many other tasks associated with preparing the data for migration into KE Emu.

 

  • Using pre-existing datasets

Again the registers were not complete enough to be able to create identification records from scratch because generic names were often abbreviated or the original describing author details were missing. There are many biodiversity resources on the internet including the Ellis and Messina Catalogue of microfossil species published by the Micropalaeontology Press. I asked them if I could use their list of microfossil names to help populate our database and for a small fee they provided an MS Excel file of all the species in their database. I imported about 50,000 complete microfossil names into KE Emu and used a simple VLOOKUP function in MS Excel to match these with electronic records created from the paper registers. When no match was achieved I checked why, corrected the data if necessary or used the data to create a new species record in KE Emu.
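The same kind of exact-match lookup can be sketched outside a spreadsheet. The snippet below is only an approximation of what VLOOKUP does: it ignores case, as VLOOKUP does, and additionally tidies stray spaces, which is an assumption added here. The function name and the tiny example lists are made up for illustration; the two species names are ones mentioned elsewhere on this blog.

```python
def normalise(name):
    """Lower-case a name and collapse repeated spaces."""
    return " ".join(name.split()).lower()


def match_names(register_names, catalogue_names):
    """Split register names into matched and unmatched against a catalogue."""
    index = {normalise(n): n for n in catalogue_names}
    matched = {}    # register spelling -> catalogue spelling
    unmatched = []  # names needing a manual check
    for raw in register_names:
        key = normalise(raw)
        if key in index:
            matched[raw] = index[key]
        else:
            unmatched.append(raw)
    return matched, unmatched


# Illustrative data only, not taken from the real registers.
catalogue = ["Alveolina elliptica", "Nummulites britannicus"]
register = ["alveolina  elliptica", "N. britannicus"]

matched, unmatched = match_names(register, catalogue)
```

The first register entry matches despite the odd spacing and capitalisation. The second, with its abbreviated genus, falls into the unmatched list, which corresponds to the pile of names that had to be checked by hand.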

 

  • Thinking positively

Shortly after arriving at the Museum in the 1990s I remember being told by a senior member of staff that it would take us 250 years to database the entire collection. Sometimes it's difficult to get started when you feel that your efforts are only just touching the surface or will go off into some black hole of a database that won't ever be useful because hardly any of your hundreds of thousands of objects are registered in it. I have to admit that there have been some times in my career when I have felt like this. My mentor encouraged me to see the bigger picture and the benefits of the project that I was involved in. Bringing data checking up to the top of my list of collections management priorities has paid immediate dividends.

  

  • The bigger picture

There are so many advantages to having the majority of your collection on an electronic database that is searchable via the web. Even though I am only halfway through, I have seen real benefits in answering enquiries quickly and easily. Once everything is migrated I will be spotting areas for development of the collection, looking for potential areas for de-accession while gathering hard data on the collection strengths. It is much easier to raise the profile of the collection and encourage visitors to the collections through schemes such as SYNTHESYS when you can send out messages to list-servers advertising a web link to your collection. Another major advantage is that I now have somewhere to associate the many electronic images and documents that relate to my collections, and these can be delivered to the web should I choose to do so.

 

  • Estimating timescales

The initial data entry from the registers took our two clerks 4 months each to input a total of 100,000 records. In 6 months my assistant created full bibliographic records for the whole dataset and added "irn" references for all of the people associated as either collectors, donors or publishers. The process that has taken longest is my data checking, particularly for the scientific accuracy of the fossil names. I would estimate that I spent between 5 and 10 per cent of my time checking data and preparing import sheets since the project started. I am therefore the log jam! At the current rate we are looking at sometime in 2015 for completion of the entire 100,000 record dataset.
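For anyone wanting to adapt these figures to their own project, the arithmetic can be sketched as follows. The data entry numbers come from this post, as does the 36,000 records quoted in the costs section; the monthly checking rate is a placeholder and not a figure from the project, and the function names are invented for the example.

```python
import math

TOTAL_RECORDS = 100_000  # register records entered (from the post)
CLERKS = 2               # data entry clerks (from the post)
MONTHS_EACH = 4          # months each clerk worked (from the post)
RECORDS_IN_EMU = 36_000  # records in KE Emu at the count quoted under Costs


def records_per_clerk_month(total, clerks, months_each):
    """Average data entry throughput per clerk per month."""
    return total / (clerks * months_each)


def months_to_finish(total, done, checked_per_month):
    """Whole months needed to check and import the remaining records."""
    if checked_per_month <= 0:
        raise ValueError("checked_per_month must be positive")
    return math.ceil((total - done) / checked_per_month)


entry_rate = records_per_clerk_month(TOTAL_RECORDS, CLERKS, MONTHS_EACH)

# Placeholder checking rate, purely to show how the forecast is used.
PLACEHOLDER_RATE = 2_000
remaining_months = months_to_finish(TOTAL_RECORDS, RECORDS_IN_EMU, PLACEHOLDER_RATE)
```

On the post's figures the clerks averaged 12,500 records per clerk per month. The forecast depends entirely on the checking rate you plug in, which is where the bottleneck described above shows up.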

 

DSCF1266_Lyndsey_blog.jpg

Lyndsey Douglas researching full bibliographic microfossil reference details in the Heron-Allen Library

 

  • Costs

Obviously it would be imprudent to show a breakdown of salary costs here so I will just say that at Christmas last year when 36,000 KE Emu records had been created, the cost came to roughly one pound per record. This includes the Micropalaeontology Press fee, salary costs for initial data entry, an assistant for 6 months and for 10 per cent of my time. I have not included other expenses like building and IT overheads. I expect that the final cost per record at the end of the project will be slightly less than a pound per record as the major expenditure of salary for the data entry people and the 6 month posts are now accounted for. The final cost will depend on how long it takes me to finish checking and migrating the data.

 

I may be only half way through importing the 100,000 records, but I would like to think that this project can provide some valuable benchmark data for those planning future projects, suggest some ways of making the process quicker and help with forecasting costs and timeframes.

Giles Miller

Member since: Apr 21, 2010

This is Giles Miller's Curator of Micropalaeontology blog. I make the Museum micropalaeontology collections available to visitors from all over the world, publish articles on the collections, give public talks and occasionally make collections myself.
