64.   Introducing SPIDA-web: An automated identification system for biological species

Kimberly N. Russell, Martin T. Do & Norman I. Platnick

Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th St., New York, NY 10024, USA (krussell@amnh.org, mtd001@attglobal.net, platnick@amnh.org).

We are currently developing an Internet-accessible automated identification system that uses artificial neural networks to identify species from digital images encoded by wavelet transformation. We call this system SPIDA-web (SPecies IDentification, Automated and web accessible). Our goal is to create a system that can identify any species in a particular family, or from a particular area, without requiring the user to have more than the most basic knowledge of the organism to be identified. Such a system has the potential to drastically improve the efficiency and scope of biological inventories and subsequent monitoring efforts.
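To make the idea of wavelet encoding concrete, the following is a minimal sketch rather than the SPIDA implementation. It assumes a Haar wavelet, three decomposition levels, and a grayscale image with power-of-two sides; none of these choices is stated in the abstract, and the function names `haar_step` and `encode` are hypothetical. The sketch reduces an image to a short vector of coarse coefficients of the kind that could be fed to a neural network.

```python
import numpy as np


def haar_step(image):
    """Apply one level of an orthonormal 2D Haar transform.

    Returns the coarse approximation band and a tuple of three detail bands.
    The image must have even height and width.
    """
    a = image[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = image[0::2, 1::2]  # top-right
    c = image[1::2, 0::2]  # bottom-left
    d = image[1::2, 1::2]  # bottom-right
    approx = (a + b + c + d) / 2.0
    detail_1 = (a - b + c - d) / 2.0  # difference between left and right columns
    detail_2 = (a + b - c - d) / 2.0  # difference between top and bottom rows
    detail_3 = (a - b - c + d) / 2.0  # diagonal difference
    return approx, (detail_1, detail_2, detail_3)


def encode(image, levels=3):
    """Encode an image as a flat vector of coarse Haar coefficients.

    The number of levels is an assumed parameter; each level halves both sides.
    """
    approx = np.asarray(image, dtype=float)
    for _ in range(levels):
        approx, _details = haar_step(approx)  # discard fine detail, keep coarse band
    return approx.ravel()


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    image = rng.random((256, 256))  # stand-in for a grayscale genitalia image
    features = encode(image, levels=3)
    print(features.shape)  # (1024,): a 32 x 32 coarse band, flattened
```

Under these assumptions a 256 x 256 image shrinks to 1,024 inputs, which illustrates why a wavelet encoding can keep the network small while preserving the overall shape of the structure.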

Our test group for the prototype is one of the world’s most diverse: spiders, order Araneae. As our test case, we are developing an identification system for the spider family Trochanteriidae using images of the external genitalia (ventral view of the female epigynum, ventral and retrolateral views of the male palp). This recently revised family of Australasian ground spiders includes 121 species in 15 genera. Although the work is not yet complete, results from the females indicate that SPIDA will be able to classify images to species with 90–99% accuracy when sufficient numbers of replicate specimens are available for training. When few specimens are available, however, results are at best unpredictable. Once the identification engine is complete (i.e., there is a trained neural network for each species in the group), the system will be connected to the Internet through a web interface. Users will be able to submit minimally processed images via a web page and receive identifications, together with information on the matching species drawn from a database, including images, drawings and distribution maps. Submitted images need only be cropped square and at least 256 × 256 pixels. The accuracy of the identifications depends on the quality and quantity of the training images.
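The submission requirements and the one-network-per-species design can be sketched as follows. This is an illustration under stated assumptions, not the SPIDA-web code: the function names, the 0.5 acceptance threshold, the rule of taking the highest-scoring network, and the placeholder species labels are all hypothetical.

```python
def check_submission(width, height, min_side=256):
    """Check the stated image requirements: cropped square, at least 256 x 256."""
    if width != height:
        return False, "image must be cropped square"
    if width < min_side:
        return False, f"image must be at least {min_side} x {min_side} pixels"
    return True, "ok"


def identify(scores, threshold=0.5):
    """Pick a species from per-species network outputs.

    `scores` maps a species label to the output (assumed to lie in [0, 1]) of
    the network trained for that species. The threshold is an assumed cutoff
    below which no identification is returned.
    """
    if not scores:
        return None
    best = max(scores, key=scores.get)  # species whose network responds most strongly
    if scores[best] < threshold:
        return None  # no network is confident enough to report a match
    return best


if __name__ == "__main__":
    print(check_submission(512, 512))
    print(check_submission(300, 256))
    # Placeholder labels; real keys would be the species names in the family.
    print(identify({"species_A": 0.12, "species_B": 0.93, "species_C": 0.40}))
```

Returning no match when every score is low is one plausible way to handle species with too few training specimens, for which the abstract reports unpredictable results.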

We foresee SPIDA-web as an evolving system, making use of submitted images to continually improve accuracy. The feasibility of larger systems (or, more realistically, hierarchically linked arrangements of smaller systems) remains to be tested. But a successful prototype could pave the way for widespread use of this technology in studying diverse taxa, and lead to a subsequent explosion of knowledge about the species composition of our biosphere.