
Plant Challenge - reporting from Scotland

Posted by Tiina on Jul 29, 2012 4:49:11 PM

Days 1 and 2 from Edinburgh


Like Sandy, I was impressed by the plant-inspired ending of Friday night’s ceremony! Whilst watching the event, I was looking through our database and seeing just how big my part of the task ahead is going to be…


As Sandy explained, simple differences in how collectors’ names are typed can result in two names being allocated to a single person, as in the example of A. Fernandez and A. Fernández. The accent makes all the difference to the computer! The implications of such typos and spelling differences are what I’ll be focusing on this week.


Let me give an example of my job as the “duplicate hunter”, as I have named myself. If, for example, the specimen “Fernandez 212” is being entered into our database, the database automatically checks whether other specimens from the same collection event (also known as duplicates) have already been entered. If another duplicate from that collection event has been entered as “Fernández 212”, with the accent on the a, whilst the one being entered is missing it, the two specimens will end up in two separate collection events… Again, we can’t blame the computer, as the names are not exactly identical!
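A minimal Python sketch of why an exact comparison splits these two records, and of one possible accent-insensitive alternative. It assumes records are matched on collector name plus collection number; the function names `exact_key` and `folded_key` are hypothetical, and this is not how our database actually performs its check.

```python
import unicodedata

def exact_key(collector, number):
    """Naive key: the strings must match character for character."""
    return (collector, number)

def folded_key(collector, number):
    """Hypothetical accent-insensitive key: decompose accented letters (NFKD),
    drop the combining marks, ignore case and surrounding spaces."""
    decomposed = unicodedata.normalize("NFKD", collector)
    without_marks = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return (without_marks.casefold(), number.strip())

a = ("A. Fernandez", "212")
b = ("A. Fernández", "212")

print(exact_key(*a) == exact_key(*b))    # False: treated as two collection events
print(folded_key(*a) == folded_key(*b))  # True: flagged as the same collection event
```

Folding accents like this would also match names that genuinely differ only by an accent, so it is better suited to flagging candidates for a person to review than to merging records automatically.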


So I went into our database and checked how many collection events are identical based on collection number (for example 212) and collection date (day, month and year). As collection numbers and dates are numerical, they are not generally affected by the spelling variations that trip up names (although see below), so identical entries can be identified easily.
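The number-and-date check can be sketched in Python roughly as follows. The sample records and the helper `group_by_number_and_date` are invented for illustration; the same idea could equally be expressed as a grouping query inside the database.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample records: (record id, collector, collection number, collection date)
records = [
    (1, "A. Fernandez", "212", date(1998, 3, 14)),
    (2, "A. Fernández", "212", date(1998, 3, 14)),
    (3, "J. Smith", "45", date(2001, 7, 2)),
    (4, "K. Jones", "45", date(1987, 5, 30)),
]

def group_by_number_and_date(records):
    """Group record ids by (collection number, collection date), ignoring the collector's name."""
    groups = defaultdict(list)
    for rec_id, _collector, number, when in records:
        groups[(number, when)].append(rec_id)
    # Only keys shared by two or more records are potential duplicates
    return {key: ids for key, ids in groups.items() if len(ids) > 1}

suspects = group_by_number_and_date(records)
for (number, when), ids in suspects.items():
    print(number, when.isoformat(), ids)
# prints: 212 1998-03-14 [1, 2]
```

Records 3 and 4 share a number but not a date, so they are not flagged; records 1 and 2 are caught even though their collector names differ.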


Using this ever-so-clever but simple technique, I identified 1839 records that are potentially duplicated. Of course, many of the collections on our suspect list are not true duplicates: they simply have, by chance, the same number and collection date. A mere 1549 of the 1839 suspects are collections that lack a number, which by old tradition are all labelled “s.n.”, meaning “without number” in Latin. What the letters s.n. truly stand for escapes me now: s. = sine, but n. = numero or numerus? Latin speakers will be able to help me out here…
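A small hypothetical helper that sets the “s.n.” records aside before the rest are reviewed, since they only match each other by sharing the placeholder. The set of placeholder spellings in `NO_NUMBER_MARKERS` is an assumption; a real database may record missing numbers differently.

```python
# Hypothetical spellings of the "no number" placeholder (an assumption, not our actual data rules)
NO_NUMBER_MARKERS = {"s.n.", "s.n", ""}

def is_unnumbered(number):
    """True if the collection number field holds the 's.n.' placeholder or is empty."""
    return number.strip().lower() in NO_NUMBER_MARKERS

def split_suspects(suspects):
    """Split (record_id, collection_number) pairs flagged by the number-and-date check
    into records with a real number and 's.n.' records."""
    numbered = [(rid, num) for rid, num in suspects if not is_unnumbered(num)]
    unnumbered = [(rid, num) for rid, num in suspects if is_unnumbered(num)]
    return numbered, unnumbered

flagged = [(1, "212"), (2, "212"), (3, "s.n."), (4, "S.N."), (5, "s.n.")]
numbered, unnumbered = split_suspects(flagged)
print(len(numbered), len(unnumbered))  # 2 3
```

Applied to the figures above, the 1549 “s.n.” records would go into the second list, leaving 290 numbered suspects (1839 minus 1549) for closer inspection.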


Prior to our Plant Challenge, I did a burst of duplicate spotting in our database over one quiet day and found several errors that lead to duplication. Spelling mistakes or alternative spellings of collectors’ names are one cause; alternative spellings of numbers are another, albeit smaller, one, I grant you. Some numbers have been entered with an unnecessary leading 0, such as “012”, which appears simply as “12” in another duplicate entry. I plan to tackle these duplicates by filtering for all collection numbers with a “0” and then sorting them in numerical order. There seem to be an additional 100 or so records to check there.
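One way to find the leading-zero variants, sketched in Python under the assumption that collection numbers are stored as text. The helper names are hypothetical; rather than filtering on any “0”, this version strips leading zeros from purely numeric values and reports the numbers that then collide, sorted numerically.

```python
from collections import defaultdict

def canonical_number(number):
    """Strip leading zeros from a purely numeric collection number ("012" -> "12").
    Non-numeric values such as "s.n." or "212a" are returned unchanged (apart from spaces)."""
    stripped = number.strip()
    if stripped.isdecimal():
        return str(int(stripped))
    return stripped

def leading_zero_clashes(numbers):
    """Find numbers that match another entry once leading zeros are removed."""
    by_canonical = defaultdict(set)
    for n in numbers:
        by_canonical[canonical_number(n)].add(n.strip())
    clashes = {canon: sorted(forms) for canon, forms in by_canonical.items() if len(forms) > 1}
    # Only numeric values can clash, so every key here converts cleanly to int for sorting
    return dict(sorted(clashes.items(), key=lambda item: int(item[0])))

numbers = ["012", "12", "212", "0045", "45", "300", "7"]
print(leading_zero_clashes(numbers))
# {'12': ['012', '12'], '45': ['0045', '45']}
```

A value such as “300” contains a 0 but is left alone, which keeps the list shorter than a plain filter on “0” would.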


And lastly, there are records that are simply identical duplicates: the collector’s name and collection number are exactly the same, and the records truly are duplicates. Although we try to avoid entering duplicate records, it always happens somehow.
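Truly identical entries are the easiest case: counting exact (collector, number) pairs finds them. This sketch uses invented sample data, and the name `exact_duplicates` is hypothetical.

```python
from collections import Counter

def exact_duplicates(entries):
    """Return (collector, number) pairs that occur more than once, character for character."""
    counts = Counter(entries)
    return {entry: n for entry, n in counts.items() if n > 1}

entries = [
    ("J. Smith", "45"),
    ("J. Smith", "45"),
    ("A. Fernandez", "212"),
    ("A. Fernández", "212"),
]
print(exact_duplicates(entries))  # {('J. Smith', '45'): 2}
```

Because the comparison is exact, the Fernandez/Fernández pair is not reported here; those variants need an accent-insensitive comparison.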


Quite impressively, I have now flagged a list of 11 388 collection events for myself to check and go through!!! By no means will all of these records be true duplicate entries (we believe our data set is relatively clean), but one never knows…
