Introduction

Data models

A model is 'a representation of something' (Hornby, 1974). In the technical sense, a model is a medium for recording the structure of an object in a more or less abstracted way, following predefined and documented rules.

The objective of applying modeling techniques is either to describe and document the structure of an existing object, or to prescribe the structure of one to be created. In both cases, the model can be used to test (physically or, in most cases, intellectually) the function of the object and to document it, for example for future maintenance. Testing usually serves to refine the model further, so that it either performs like the existing object or provides the functionality desired for the new object.

In computer science and the creation of computer programs, the descriptive process may be called 'System Analysis', while modeling the program itself is 'System Design'. In practice, the two processes usually go hand in hand: existing data and functions (computerized or not) are analyzed with the aim of creating an information system that handles the data and supports the functions. The modeling techniques that information science provides may roughly be subdivided into three types: function modeling, data modeling, and object-oriented modeling.

Function modeling is a comparatively rapid method of obtaining the specification framework for a computer system. However, information systems based mainly on function modeling techniques tend to be rather rigid when additional functions are to be added later on.

Data modeling needs a strong, in-depth understanding of the system and its environment. As a consequence, it is rather laborious and needs much more time, especially if the people conducting it do not have intimate knowledge of the field the system is to be used in (Coad & Yourdon, 1991).

Object-oriented modeling is a rather new technique; thus the modeling procedures provided by information science are not yet fully mature, and neither are the tools used in actual system design.

The data model presented here has been developed over the past three years, based on more than a decade of experience gathered in designing and working with botanical databases. It can be used as a basis for the application of object-oriented techniques, and it has already served as the basis for applied function modeling. It uncovers the complexity of the paradigms underlying the classification and naming of plants. We hope it will help designers of information systems that incorporate information on organisms to avoid the common error of over-simplifying taxonomic data, and the resulting loss in data accuracy and quality.

Scope of the "IOPI Model"

The present information model has been designed with the following aims:

Thus, the model includes a great number of data items which, at first glance, are not related to the checklist project. These are necessary to ensure future extensibility of the system (and there is no obligation to use all possible functions in a program based on this structure). On the other hand, the model includes certain data areas only in the rather rudimentary form asked for by the checklist data definitions: the treatment of geography and of references to nomenclatural types can hardly be deemed satisfactory. However, the central object of botanical information is the naming and classification of taxa, and this is covered in full.

History of the model

Throughout the 1980s, attempts were made to develop standards for botanical databases, at least to allow data exchange between different systems. One example is the International Transfer Format (BGCS, 1987). However, lists of fields or simple data dictionaries have proven unable to cope with the intricacies inherent in botanical nomenclature, taxonomy and collection information management.

Data models for botanical collections or taxonomic databases have been developed at various places since 1992, e.g. ASC (1992), Bolton et al. (1992), Sinnot (1993), Wilson (1993), NMNH (1994), and ITIS (1995). All represent attempts to bring order into the complex data structures which are involved when plants are named, collected, classified and investigated as to their properties. Without doubt, many more such unpublished documents exist, and even more systems have been developed without any attempt to publicise the underlying model (be it because the information is considered proprietary or simply because no such model exists outside the actual implementation).

During the 1992 meeting of IOPI and TDWG in Xalapa, the usefulness of models was demonstrated in talks presented by C. McMahon and Berendsohn (1993). The Information System Committee of IOPI agreed to elaborate a detailed data model for checklist data. Berendsohn provided a data model developed for the Botanical Garden and Botanical Museum Berlin-Dahlem on a portable PC equipped with a CASE system (Microtool 1990-). Part of this model was adapted to the Data Definition Subgroup's provisions during the meetings (IOPI model draft versions 1 and 2). The document has since undergone various changes, and some of the resulting drafts (versions 3, 4, 5.2 and 6.1) have been made available on the Internet; version 6.0 has been distributed as part of IOPI's Global Plant Checklist project plan (Wilson, 1994).

Already during the meeting in Xalapa it became clear that the task at hand, providing a world checklist of taxa, would require intensive input from large parts of the taxonomic community. A minimal model incorporating only the data prescribed by the Data Definitions would lead to an unjustifiable loss of data. Consequently, the complexity of the model increased over subsequent drafts in order to preserve the taxonomic information provided by data sources. A simple hierarchical model (higher taxon, family, genus, species, optionally infraspecies) was developed into a model supporting multiple taxonomies and multiple taxon levels.
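The shift from a single fixed hierarchy to several coexisting taxonomies can be illustrated with a minimal sketch. It is not part of the IOPI model itself: the class names (FlatRecord, Name, Classification), their fields, and the two "treatments" are hypothetical and chosen only for illustration. The example uses the genus Acer, which has been placed in Aceraceae by some authors and in Sapindaceae by others.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class FlatRecord:
    """Simple hierarchical model: exactly one classification per record."""
    family: str
    genus: str
    species: str
    infraspecies: Optional[str] = None


@dataclass(frozen=True)
class Name:
    """A scientific name at any rank, stored independently of its placement."""
    text: str
    rank: str


@dataclass
class Classification:
    """One taxonomy: the parent links asserted by a single data source."""
    source: str
    parent_of: dict = field(default_factory=dict)  # child Name -> parent Name

    def place(self, child: Name, parent: Name) -> None:
        self.parent_of[child] = parent

    def lineage(self, name: Name) -> list:
        """Return the name followed by its ancestors in this taxonomy.
        Assumes the parent links contain no cycles."""
        chain = [name]
        while chain[-1] in self.parent_of:
            chain.append(self.parent_of[chain[-1]])
        return chain


# The flat record can hold only one opinion about the family.
flat = FlatRecord(family="Aceraceae", genus="Acer", species="campestre")

# The multi-taxonomy sketch keeps both opinions, each tied to its source.
acer = Name("Acer", "genus")
treatment_a = Classification("Treatment A (hypothetical)")
treatment_a.place(acer, Name("Aceraceae", "family"))
treatment_b = Classification("Treatment B (hypothetical)")
treatment_b.place(acer, Name("Sapindaceae", "family"))

families = {c.source: c.lineage(acer)[-1].text for c in (treatment_a, treatment_b)}
```

Because names are stored separately from their placement, a differing opinion from a new data source is added as a further Classification rather than overwriting an existing record.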

The subsequent IOPI meetings which deserve special mention include: Data Definition Subgroup meetings in Geneva (June 1993) and Berlin (July 1993), and Information Committee meetings in Berlin (Feb. 1993, Jan. 1994), Geneva (June 1993), and Washington (Oct. 1993). Since then, the model has remained essentially stable. Some discrepancies have been resolved and additions have been made as a result of prototyping efforts, particularly in the course of the PLANTAS III software development. PLANTAS III is an implementation project carried out at the Botanical Garden and Botanical Museum Berlin-Dahlem in cooperation with the Jardín Botánico La Laguna in El Salvador, C.A., and a Berlin-based company (programmer: D. Raisin).

CASE techniques in data modeling and system development

The present document makes extensive use of hierarchical data structure diagrams (DSDs), which greatly facilitate discussion of the model with 'non-technical' participants. To depict the results of the data analysis in an abstracted form, entity relation diagrams (ERDs) are used, which represent a logical model of the data and their interrelations. The latter can be transformed into an implementable design model in the form of relational model diagrams (RMDs). These types of graphical abstraction of the analysis and system planning represent only three facets of what a modern CASE system offers. Function modeling, information flows, module structures, and dialog design are other analysis and design tools, all of which can be based directly on the results of the data analysis presented here.
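The step from a logical model to an implementable relational design can be sketched in a few lines. The following is an illustration under stated assumptions, not the procedure used by the CASE system: the entity names (taxon_name, publication), their attributes, and the helper to_ddl are hypothetical. It applies the common rule that each entity becomes a table and each many-to-many relation becomes a link table.

```python
import sqlite3

# Logical model (ERD level): two entities and one many-to-many relation.
# All entity and attribute names here are hypothetical.
entities = {
    "taxon_name": ["name_text", "taxon_rank"],
    "publication": ["citation"],
}
relations = [("taxon_name", "publication")]  # a name may appear in many publications


def to_ddl(entities, relations):
    """Derive CREATE TABLE statements: one table per entity and
    one link table per many-to-many relation."""
    ddl = []
    for table, columns in entities.items():
        cols = ", ".join(f"{c} TEXT NOT NULL" for c in columns)
        ddl.append(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, {cols})")
    for left, right in relations:
        ddl.append(
            f"CREATE TABLE {left}_{right} ("
            f"{left}_id INTEGER NOT NULL REFERENCES {left}(id), "
            f"{right}_id INTEGER NOT NULL REFERENCES {right}(id), "
            f"PRIMARY KEY ({left}_id, {right}_id))"
        )
    return ddl


# Check that the derived design is implementable by creating it in memory.
conn = sqlite3.connect(":memory:")
for statement in to_ddl(entities, relations):
    conn.execute(statement)

tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```

A real transformation would also carry over attribute types, optionality, and one-to-many relations (as foreign keys on the "many" side); these are omitted to keep the sketch short.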

Furthermore, the complete model held in the CASE system represents the unified efforts of various projects, particularly IOPI, CDEFD ("A common datastructure for European floristic databases") and PLANTAS III. All diagrams, definitions, etc. are held by the CASE system in a common relational data repository. This ensures optimal congruence between the different functional and information-related areas, and the different projects directly influence each other. In this way, a general view of botanical data (user and research) is taking shape, which will greatly benefit specialized systems based on it.

For a world checklist project, this CASE model may be extended in the future to include:


Last updated: June 23, 1995