1. Experience exporting and importing SDD 1.0

Jacob K. Asiedu & Robert A. Morris

Department of Computer Science, University of Massachusetts at Boston, 100 Morrissey Blvd., Boston, MA 02125, USA.

The proposed standard for the exchange of Structured Descriptive Data (SDD) supports descriptions of taxa and specimens both in natural language and as “CodedDescriptions” conforming to a community-constrained vocabulary (“SDD Terminology”) for describing characters and states. Even when a legacy database or data-content XML schema was not designed ab initio for such an SDD Terminology, software can still deduce a per-datasource Terminology and create instance documents against it. We will describe the methodology used by the Electronic Field Guide (EFG) Project to deduce a Terminology, create SDD instance documents, and import SDD documents created by other systems, notably Lucid3 and legacy flora automatically marked up in the Heidorn lab by machine learning techniques.
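
As a rough illustration of what deducing a per-datasource Terminology can mean, the following Java sketch treats every field name in a set of flat legacy records as a character and every distinct value observed in that field as one of its states. It is not the EFG implementation: the class name TerminologySketch, the record layout, and the convention that commas separate multiple states are all assumptions made for this example.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

/** Hypothetical sketch: deduce a terminology from flat legacy records. */
final class TerminologySketch {

    /**
     * Maps each field name (taken as a character) to the sorted set of
     * distinct values (taken as states) observed for it across all records.
     * Assumption for illustration: a comma-separated value lists several states.
     */
    static Map<String, SortedSet<String>> deduce(List<Map<String, String>> records) {
        Map<String, SortedSet<String>> terminology = new LinkedHashMap<>();
        for (Map<String, String> record : records) {
            for (Map.Entry<String, String> field : record.entrySet()) {
                SortedSet<String> states =
                        terminology.computeIfAbsent(field.getKey(), k -> new TreeSet<>());
                for (String state : field.getValue().split(",")) {
                    String trimmed = state.trim();
                    if (!trimmed.isEmpty()) {
                        states.add(trimmed);
                    }
                }
            }
        }
        return terminology;
    }

    public static void main(String[] args) {
        // Two made-up records standing in for rows of a legacy database.
        List<Map<String, String>> records = List.of(
                Map.of("flower color", "red, white", "leaf shape", "ovate"),
                Map.of("flower color", "white", "leaf shape", "lanceolate"));
        System.out.println(deduce(records));
    }
}
```

A real datasource would also need decisions this sketch ignores, such as distinguishing categorical characters from numeric measurements and from free text.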

We used a schema compiler (Castor) to generate marshalling and unmarshalling code (in this case between XML and Java objects), both for documents valid against our own schema and for documents valid against the SDD schema. We will explain what hand-crafted software glue was required between the generated Java methods, and will discuss where and how we needed to supply additional metadata to match the expressive power of SDD to the (weaker) expressive power of the EFG schema. In a single (minor) instance we needed to invoke an SDD extension mechanism to give SDD import software (including our own) semantic guidance on one issue for which SDD was not expressive enough to signal the differences we require. That issue concerns the use of lists (e.g. of other taxa) associated with a particular taxon. In EFG applications we use lists such as “similar species”, “nectar plants”, “host plants”, and “herbivores” in a final “verification” stage of taxonomic identification as well as in the construction of descriptive taxon pages. Finally, we will offer opinions on the possible use of SDD as a native XML format for the EFG project.
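
To make the list issue concrete, the sketch below unmarshals a small XML fragment into a Java object whose listKind field tells an importer whether a list of taxa holds, say, host plants or similar species. It is a hand-written stand-in built on the JDK's DOM parser, not Castor-generated code, and the element and attribute names (taxon, list, kind, item) are hypothetical; they are neither the EFG schema nor the SDD extension actually used.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

/** Hypothetical sketch: a taxon with one labelled list of associated taxa. */
final class TaxonListSketch {
    final String taxonName;
    final String listKind;       // e.g. "host plants" or "similar species"
    final List<String> members;  // names of the associated taxa

    TaxonListSketch(String taxonName, String listKind, List<String> members) {
        this.taxonName = taxonName;
        this.listKind = listKind;
        this.members = members;
    }

    /** Parses the (made-up) XML layout into an object; wraps parser failures. */
    static TaxonListSketch unmarshal(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
        Element root = doc.getDocumentElement();
        Element list = (Element) root.getElementsByTagName("list").item(0);
        if (list == null) {
            throw new IllegalArgumentException("no <list> element found");
        }
        List<String> members = new ArrayList<>();
        NodeList items = list.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            members.add(items.item(i).getTextContent().trim());
        }
        return new TaxonListSketch(root.getAttribute("name"), list.getAttribute("kind"), members);
    }

    public static void main(String[] args) {
        TaxonListSketch t = unmarshal(
                "<taxon name=\"Danaus plexippus\"><list kind=\"host plants\">"
                + "<item>Asclepias syriaca</item></list></taxon>");
        System.out.println(t.taxonName + " / " + t.listKind + " / " + t.members);
    }
}
```

Without some label like the kind attribute here, an importer would see only a list of taxa and could not tell which role the list plays in verification or on a taxon page.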

Links to development applications of this work can be found at http://efg.cs.umb.edu/SDD

This work was funded by the U.S. National Science Foundation.