Taxonomic Databases Working Group (TDWG)

Annual Meeting 2002

Indaiatuba, São Paulo, Brazil

18th – 20th October

 

 

This year’s meeting was hosted by CRIA (Centro de Referência em Informação Ambiental) and organised in conjunction with Stan Blum (chair of TDWG). Many thanks go to Vanderlei Canhos, Dora Lange Canhos and their marvellous staff at CRIA for the wonderful organisation, fabulous food and good entertainment during the whole series of meetings that formed the Trends and Developments in Biodiversity Informatics Forum (17 – 25 October).

 

 

The TDWG meeting began in traditional fashion with an evening reception on Friday 18th October. This was a barbecue by the poolside of the Vitoria Hotel, site of the main meeting (and the whole forum), accompanied by caipirinha, the typical Brazilian cocktail, and by “Choro Bandido”, a group of local musicians, making for a relaxed and jolly atmosphere!

 

The main meeting began on Saturday, 19th October, with the chairman of TDWG, Stan Blum, presenting an overview of the coming two days (see http://www.cria.org.br/eventos/tdbi/tdwg/presentations/MtgIntro_TDWG_2002.ppt).

With respect to possible future cooperation with GBIF (the Global Biodiversity Information Facility, http://www.gbif.org), the first day was to consist of the two organisations summarising their activities and aims, so that future cooperative developments could be discussed further.

 

Stan started with a brief history of TDWG, stating that it was important to remember that TDWG membership is open to all, in contrast with GBIF. Also in contrast to GBIF, TDWG is a volunteer organisation with a very small budget (mainly from annual membership fees). But the past year had been a significant one for TDWG: at GBIF 4 in Sydney in March 2002, TDWG became an associate member, and there has been a TDWG representative at 3 of the 4 STAG (Scientific and Technical Advisory Group) meetings. Each of the GBIF work programmes will require standards, and it is hoped that GBIF will collaborate with TDWG. The aim of this meeting was therefore to identify the standards needed by GBIF and to set up working groups to develop them, the immediate need being a work plan for developing each standard and obtaining funding.

 

Another important item for this meeting was to complete the voting on two proposed constitutional amendments, which would allow faster and easier voting on issues in future (see http://www.tdwg.org/2002meet/ballot2002.htm). Votes were to be in by 2pm on Sunday 20th October, and the results would be presented at the business meeting in the final session on Sunday.

 

Stan closed his introduction with a request for offers to host the 2003 (and 2004) meetings.

 

He then introduced the topic of “standards” by presenting some definitions and listing the existing TDWG standards (see Stan’s presentation or http://www.tdwg.org/standrds.html), before calling on the various TDWG subgroup convenors to introduce the work of their groups and the possible standards they were working on.

 

 

Geography – Rafael Govaerts & Neil Brummitt, Royal Botanic Gardens, Kew, UK

 

Unfortunately, neither of the convenors of the geography subgroup had been able to attend the meeting, but they had provided a report, and their colleague Mark Jackson presented a short summary of it to the meeting. After summarising what the standard was, he reported on the release of the second edition in 2001. There was a short discussion on the need to expand the standard to include marine zones, in line with TDWG’s expanded scope of dealing with all organisms, not just plants. The convenors requested help with this and hoped for progress before the next meeting.

 

Stan commented that this was one of the older style of standards, typical of what TDWG had produced in the past, and that there was a need to find out whether such standards were still applicable and whether they should be maintained.

 

 

Economic Botany - Daphne Christopher, New York Botanical Garden, USA

 

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/BrazilTDWG.ppt

Daphne’s presentation reviewed the history of the subgroup and the existing “standard”. She said that it had been created before many databases existed and that many modifications were now needed. The group wanted to make it a true standard, so that users would not need to modify it to meet individual needs, but doing so would require input from experts in many different fields, e.g. nutrition.

 

Daphne also mentioned that the standard was in use – even by the Bronx Zoo! (For their monkey food database.)

 

 

Access to Biological Collection Data (ABCD): schema for collections data – Walter Berendsohn, Botanic Garden and Botanic Museum Berlin-Dahlem, Germany

 

http://www.bgbm.org/tdwg/codata/IndaiatubaWorkshop.htm

Walter outlined the history of the group, saying that collections data had always been a topic at TDWG, but that there had been a quiet period while everyone was busy setting up databases. It was now an active area again, particularly since the Frankfurt meeting, after which CODATA awarded some funding for workshops. The ENHSIN and BIOCASE projects in Europe then helped to spur things on. Initially, there were about 12000 data elements in the schema defined for the first workshop in Santa Barbara, but this has been reduced to fewer than 700 elements in the current version by removing elements that were repetitive, overly complicated, or irrelevant in the context of collections data (e.g. synonyms). A workshop in March continued this work, and the third full workshop was held the day before the main TDWG meeting, with the schema now fairly well defined in hierarchical XML. Although it is yet to be finalised, it contains about 600 elements when “flattened out”. Once the final changes have been made, it will be sent out to the wider community for comments, with an editorial group meeting proposed for December. It could be submitted as a proposed TDWG standard within 6 months (assuming the TDWG constitutional changes are approved). The schema will be used in some projects before then (e.g. BIOCASE), but it will help those projects if it becomes an accepted standard.
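To illustrate what “flattening out” a hierarchical schema means, the following Python sketch walks an XML record and lists the path to every leaf element; counting such paths across a whole schema is one way of arriving at an element count like the one quoted above. The element names and values are hypothetical and are not taken from the actual ABCD schema, and the sketch works on a sample record rather than on the schema definition itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical record loosely modelled on collections data.
# These element names are illustrative only, NOT actual ABCD element names.
SAMPLE = """
<Unit>
  <Identification>
    <TaxonName>
      <Genus>Quercus</Genus>
      <Species>robur</Species>
    </TaxonName>
    <Identifier>J. Smith</Identifier>
  </Identification>
  <Gathering>
    <Locality>Kew</Locality>
    <Date>2001-05-14</Date>
  </Gathering>
</Unit>
"""

def flatten(element, prefix=""):
    """Return the slash-separated paths of all leaf elements, in document order."""
    path = f"{prefix}/{element.tag}"
    children = list(element)
    if not children:
        return [path]  # a leaf: one "flat" element
    paths = []
    for child in children:
        paths.extend(flatten(child, path))
    return paths

root = ET.fromstring(SAMPLE.strip())
leaf_paths = flatten(root)
for p in leaf_paths:
    print(p)
print(len(leaf_paths), "leaf elements")
```

Here the nested record collapses to five flat elements, such as `/Unit/Identification/TaxonName/Genus`.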

 

Walter also reported that CODATA had assigned the TDWG ABCD subgroup as a Task Group – this will mean continued funding for workshops and continued recognition.

 

 

Access to Biological Collection Data (ABCD): protocol for distributed queries – Stan Blum, California Academy of Sciences, San Francisco, USA

 

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/DiGIR_overview_TDWG_2002.ppt

Stan described DiGIR, which will provide a single point of access (portal) to distributed information resources, enabling the search and retrieval of structured data and making the location and technical representation of data transparent to users. The DiGIR protocol could be a common language between many different portals and provider layers. But before the protocol is even proposed as a standard, at least 2 implementations of it are needed.

Stan felt that a registry of DiGIR providers was needed, which could hold metadata about the providers (although DiGIR can be independent of this). He also explained that there was no provision for caching whole data sets as a backup for when providers’ systems were unavailable, since some providers would not want a copy of their data to be kept. The best solution was therefore to provide a monitoring mechanism to let providers know when there was a problem.
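As a rough illustration of the portal model described above, the following Python sketch sends one query to several providers in parallel, merges whatever records come back, and reports the providers that failed instead of falling back on cached copies of their data. The provider functions and `portal_search` are hypothetical stand-ins; the real DiGIR protocol exchanges XML messages over HTTP, which is not modelled here.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-process stand-ins for remote providers (not real DiGIR calls).
def provider_a(genus):
    data = [{"genus": "Quercus", "species": "robur", "source": "A"}]
    return [r for r in data if r["genus"] == genus]

def provider_b(genus):
    raise ConnectionError("provider B is offline")  # simulates an unavailable provider

def provider_c(genus):
    data = [{"genus": "Quercus", "species": "alba", "source": "C"},
            {"genus": "Acer", "species": "rubrum", "source": "C"}]
    return [r for r in data if r["genus"] == genus]

PROVIDERS = {"A": provider_a, "B": provider_b, "C": provider_c}

def portal_search(genus, providers=PROVIDERS):
    """Send one query to every provider; return (merged records, failed providers).

    Nothing is cached at the portal: an unavailable provider simply contributes
    no records and is listed as failed, so that it can be notified.
    """
    records, failed = [], []
    with ThreadPoolExecutor() as pool:
        futures = {name: pool.submit(fn, genus) for name, fn in providers.items()}
        for name, future in futures.items():
            try:
                records.extend(future.result(timeout=5))
            except Exception as exc:  # includes timeouts and provider errors
                failed.append((name, str(exc)))
    return records, failed

records, failed = portal_search("Quercus")
print(records)
print(failed)
```

The `failed` list is where a monitoring mechanism of the kind Stan described could hook in to alert the provider concerned.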

 

 

In answer to a question on the relationship between ABCD and HISPID (see http://www.rbgsyd.gov.au/HISCOM/), Jim Croft replied that ultimately the plan would be to merge the two so that they were indistinguishable.

 

 

Taxonomic Names

 

Stan summarised the history of the group, saying that there was an existing published botanical standard produced by Frank Bisby in 1995. However, TDWG had since expanded its scope, so the standard now needed revising to include the names of other organisms. Stan also commented that the published standard was a cross between semantics and structure, and wasn’t based on an RDBS. Frank added that it was really more of a data dictionary, as they had tried to keep it format-free.

 

Stan said that since many stakeholders in “the names field” were at the meeting, it was certainly time to revisit the issue. He suggested that a names subgroup would meet the following day at the time allocated for subgroup meetings, to decide what needed to be done.

 

 

 

Structure of Descriptive Data (SDD) – Gregor Hagedorn, BBA, Berlin, Germany

 

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/tdwgsdd.ppt

Gregor summarised the history of the group and its aims. He stressed that the title refers to structure, not to a particular format. He said that SDD really needed to link to a name identifier, a globally unique (permanent) identifier, and that they were looking to one of the Catalogue of Life projects to provide this.

 

 

Spatial Data Standards – Reed Beaman, University of Kansas, Lawrence, USA

 

Reed introduced the group by saying that it first met in Frankfurt in 1999, then doubled in size in 2000 at the Sydney meeting. He said that one of its aims was to recommend existing standards rather than develop new ones. This year there would be new priorities relating to georeferencing, and discussion on incorporating GML (Geography Markup Language) into the ABCD schema. It was also requested that there be discussion with the geography subgroup.

 

 

TDWG Standards Process – Stan Blum, California Academy of Sciences, San Francisco, USA

 

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/StandardsProcess.ppt

 

Stan summarised the history and aims of this subgroup, saying that the whole constitution and standards process of TDWG had been laid down at a time when computing capabilities were far more limited than they are now, so some revision was needed!

He had been looking at other standards bodies for comparison. There was definitely a need for procedures for revising and retiring standards, as many of the older ones were outdated.

 

Stan asked for volunteers, saying that it would be a painful process!

 

Frank agreed that the standards process was outdated, but said that funding was also necessary – money was required to do a professional job.

 

Hannu Saarenmaa (GBIF) commented that GBIF did need a standards community and that perhaps TDWG and GBIF should formalise a relationship. There was general agreement that this would be a good idea.

 

 

 


GBIF Overview – Review of GBIF Work Programs

 

Digitization of Natural History Collection Data (DIGIT) – Larry Speers

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/Brazil_Tdwg_DIGIT.ppt

 

Larry began by saying that the goal of GBIF was to liberate biodiversity data. But he felt there was a need to deal with newly gathered data as well as legacy data, and that proper thought should be given to how best to store it.

 

GBIF would be looking at repatriation of specimen data to the countries of origin. But the short-term aims would be to obtain a baseline estimate of the current state of digitisation, to produce a complete comparative review of what has been done, and to work on a “best practices” handbook to help those about to tackle a digitisation project.

 

GBIF has a commitment to long-term maintenance, to improving quality through time and to the development, adoption and use of community wide standards.

 

The current budget was really only seed money (up to 20%) for 15 – 20 projects, and the projects need to show rapid results, as there is a 3-year review of GBIF.

 

Larry asked for participation in an “experts” workshop that night, to share experiences of digitisation projects with him.

 

 

Electronic Catalog of Names of Known Organisms (ECAT) – Per Bjorn

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/Bjorn-TDWG_meeting.ppt

 

Per explained that this programme had a common goal with ITIS, Species 2000, etc. in trying to provide a global checklist of the world’s known organisms, and that ECAT would be working specifically with the ITIS/Sp2000 Catalogue of Life (CoL) Consortium.

 

Essential elements for ECAT would be the accepted scientific name plus reference, the source database, and the latest taxonomic scrutiny, and the catalogue would act as an authority file. The aim was to have it running by the end of the following year.

 

Seed money will be used to start up new or finish existing projects, with the aim of having 40% of names in the catalogue by 2005 and 90% by 2013.

 

ECAT would need to reflect existing hierarchies, and to collect common names too. But probably the main part of the project would be to bring the necessary people together at workshops.

 

 

Data Access and Database Interoperability (DADI) – Donald Hobern

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/GBIF_DADI_TDWG_2002_v.1.ppt

 

Donald explained that GBIF didn’t expect everyone to do things the same way, so there would need to be some data exchange etc. between nodes. He saw 3 different classes of nodes: data providers (collections etc.); participants (countries, organisational members) and GBIF portals, with certain GBIF services centralised.

The holdings would need to be indexed to help users locate information at runtime. Attribution of data was very important, however, and it could also be used as a mechanism to avoid displaying duplicate records.
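A minimal sketch of how attribution might be used to suppress duplicate records, assuming that each record carries an institution code, collection code and catalogue number (a common convention, but an assumption here rather than anything agreed at the meeting). When the same specimen is served by more than one node, only the first copy is kept for display. The function name and field names are hypothetical.

```python
def deduplicate(records):
    """Keep the first record seen for each attributed specimen identity.

    The identity key (institution, collection, catalogue number) is an
    assumption for illustration only.
    """
    seen = set()
    unique = []
    for rec in records:
        key = (rec["institution"], rec["collection"], rec["catalog_number"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

# Hypothetical records: the same specimen served by two different nodes.
records = [
    {"institution": "K", "collection": "Herb", "catalog_number": "123", "served_by": "Kew"},
    {"institution": "K", "collection": "Herb", "catalog_number": "123", "served_by": "national node"},
    {"institution": "NY", "collection": "Herb", "catalog_number": "9", "served_by": "NYBG"},
]
for rec in deduplicate(records):
    print(rec["institution"], rec["catalog_number"], rec["served_by"])
```

Keeping the first copy is only one possible policy; a portal could equally prefer the copy served by the originating institution.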

 

Donald asked for ideas on what users’ requirements would be.

 

He said it was important to get the standards settled to the point where they could be used in projects around the world. He also felt that some thought should be given to the funding of projects: there was a risk that other funding agencies would decline to fund this type of work because they saw it as GBIF’s area, which would be detrimental in the long run.

 

Donald listed some of the many standards requirements GBIF had, with possible TDWG solutions:

 

            Federated Data Access (DiGIR)

            Collection Data Exchange Schema (ABCD)

            Name Data Exchange Schema (Taxonomic Names)

            Geographical Services (Spatial Data)

 

and, further in the future, there would also be a requirement for the SDD standard.

 

 

Outreach and Capacity Building (OCB) – Beatriz Torres

 

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/tdwg_Beatriztorres.ppt

 

Beatriz described some of the perceived users and providers of GBIF, saying that they wanted to raise awareness of GBIF and had plans to increase GBIF membership. She said there was a need to provide adequate support to the nodes, and that forging synergies was important. They saw a need to provide training courses for network nodes. In addition to repatriation of data and information, they saw intellectual property rights as an important issue, with a workshop planned for the end of 2003.

 

 

 

GBIF and TDWG – implications for standards work

 

This discussion was the final session of the first day of the TDWG meeting.

 

It was felt that TDWG standards should be maintained by TDWG, but that GBIF could provide some help, particularly with electronic communications (hosting web site and mailing lists) and secretarial support for actually producing the standards. But a more formal agreement should be made with the GBIF executive and the TDWG executive before deciding on details of who needs to be on what committee/subgroup. There would be a need for liaison/monitoring between GBIF work groups and TDWG groups, and GBIF needed to show progress so it was important that the two groups worked together quickly. Meredith Lane (GBIF PR and Communications Officer) described the detail of GBIF organisation, saying that the sub-committees could incorporate TDWG members.

 

There was general agreement that the TDWG chairman, Stan, would discuss further relationships with GBIF, and that the standards needing immediate attention included the two worked on by the ABCD group, and the names standard.

 

 

 

Sunday, 20th October

 

The second day of the 2002 TDWG meeting began with several presentations:

 

The INRAM Biodiversity Database Project - Christopher Frazier, Institute of Natural Resource Analysis and Management, Albuquerque, USA

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#frazier

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/The_INRAM_Biodiversity_Database_Project.ppt

Christopher described this project, which is funded to create an infrastructure for biodiversity work in the state of New Mexico. It is a collaborative project, using open source technology wherever possible.

 

 

Bioinformatics in the New York Botanical Garden: an update – Melissa Tulig, Emily Ashley and Barbara M. Thiers, New York Botanical Garden, USA

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#tulig

They described how they were databasing the specimens in their collections, including georeferencing. They are also mounting database collections from smaller institutions, although these are maintained by the providers. Their library digitisation project includes an image of every page plus double-keyed text, so it is fully searchable. They hope to provide online interactive maps in future.

 

 

Data Access – Challenges and Opportunities – Charles Hussey, Natural History Museum, London, UK

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#hussey

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/TDWG02_CGH_DATA_ACCESS.ppt

Charles described how personal names and place names cause as many problems in databases as taxonomic names do, both in how they are recorded and through variations in language and spelling.

 

 

Taxonote: a platform-independent personal notebook for nomenclatural research – Nozomi (James) Ytow, [Hiroshi Kajihara, David R. Morse and David McL. Roberts,] University of Tsukuba, Japan

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#ytow

http://www.nomencurator.org/

“James” described this tool, which is written as a single Java file so that it is easy both to install and to control users’ access. It supports multiple data repositories and uses open source technology (the source code is available on the web).

 

 

 

 

Search Features over Semi-Structured Taxonomic Documents – P. Bryan Heidorn, University of Illinois, USA

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#heidorn

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/SearchTools_Bryan.ppt

Bryan described how he had taken open source software and modified it to use XML. His software uses a thesaurus to expand user queries into the terminology used in the database.
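The general idea of thesaurus-based query expansion can be sketched in Python as follows: each term in the user’s query is replaced by the set of equivalent terms from a thesaurus, and a document matches if it contains at least one equivalent for every query term. The thesaurus entries and function names are hypothetical, and the sketch matches plain text rather than the XML-structured documents Bryan described.

```python
import re

# Hypothetical thesaurus: each user-facing term maps to the equivalent terms
# that may appear in the documents. A real system would load a controlled vocabulary.
THESAURUS = {
    "leaf": {"leaf", "leaves", "foliage", "lamina"},
    "hairy": {"hairy", "pubescent", "pilose", "villous"},
}

def tokens(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())

def expand_query(query):
    """Replace each query term with its set of thesaurus equivalents."""
    return [THESAURUS.get(term, {term}) for term in tokens(query)]

def matches(document, expanded):
    """True if the document contains at least one equivalent of every query term."""
    words = set(tokens(document))
    return all(words & alternatives for alternatives in expanded)

docs = [
    "Lamina ovate, pubescent beneath",
    "Leaves glabrous, margins entire",
]
expanded = expand_query("hairy leaf")
print([d for d in docs if matches(d, expanded)])
# → ['Lamina ovate, pubescent beneath']
```

A user asking for “hairy leaf” thus finds a description written in the technical terms “lamina” and “pubescent”, which a literal keyword search would miss.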

 

 

Intelligent Components for Systematics – Guillaume Rousse, Université Pierre et Marie Curie, Paris, France

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#rousse

http://lis.snv.jussieu.fr/~rousse/recherche/TDWG2002/

Guillaume illustrated his talk using the RefTax project as an example of a federation of heterogeneous databases, searchable by XML mediation over HTTP.

 

 

International and Special Characters in Scientific Data – Adrian Rissone, Natural History Museum, London, UK

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#rissone

http://www.cria.org.br/eventos/tdbi/tdwg/presentations/TDWG2002_Rissone.ppt

Adrian talked about the problems with character representation and described how even the latest “standards” are still not well supported.

 

 

Unicode Processing for BIOCASE – Anton Güntsch and Wolfgang Lipp, Botanic Garden and Botanical Museum Berlin-Dahlem, Berlin, Germany

http://www.cria.org.br/eventos/tdbi/tdwg/TDWG_abstracts.html#guentsch

Anton described how they were now looking at the Python programming language for this project.

 

 

After lunch, the meeting attendees split into groups for the subgroup meetings, followed by the TDWG business meeting.

 

A summary of what came out of the Spatial Data subgroup meeting, and subsequent work, can be found at

http://www.biogeomancer.org/tdwg/TDWG-Spatial-Data-Subgroup-Agenda-Actions-Oct-2002.html

 

Notes on the discussion held in the Names subgroup can be seen at

http://www.tdwg.org/2002meet/names-subgroup-discussion.htm

 

 


TDWG business meeting:

 

The Treasurer’s report was proposed, seconded and accepted.

The two proposed changes to the constitution were both accepted by the voting members. The amended constitution will appear on the web site shortly.

The standing officers were all prepared to remain in office and no alternatives were suggested so the current officers were all re-elected.

 

The 2003 meeting may be held in Portugal, but discussions were still under way with the possible host organisation, and news will be posted on the web site as soon as possible. Volunteers were requested for a programme committee to help organise the meeting; this would involve compiling a scientific programme, inviting speakers and making the announcements as soon as possible.

The tentative dates for 2003 are:

 

            Nov 5 – 7        subgroup meetings

            Nov 8 – 9        TDWG Meeting

            Nov 10 – 11    symposium

 

There was a proposal that the 2004 meeting should be held in the United States – invitations were sought!

 

Thanks were given to CRIA, and to Vanderlei Canhos and his staff, for the excellent organisation of the meeting.

 

Karen Wilson proposed a vote of thanks to Stan Blum for moving TDWG forward to a more formalised standards body and looked forward to the next year of his chairmanship. This was heartily applauded.