
have a more structured format but are still considered text files. In this system,
each database record in any component database is processed separately as a collection
of data fields. An index is created for each data field (data field index). This index
classifies each record in the database according to a set of keywords from a controlled
vocabulary. Each database has one index per data field. A different type of index
(link index) is used to link individual databases. A link index is created for each
pair of databases that cross-reference each other, and these links are bidirectional.
Databases that do not directly reference each other can still be connected by
traversing links through intermediate databases. SRS has a unique approach to
addressing syntactic and semantic heterogeneities.
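The two index types described above can be illustrated with a minimal Python sketch. It is not SRS code: the class `FieldIndex`, the functions `build_link_graph` and `find_path`, and the database names used in the usage note are hypothetical. The sketch assumes that a data field index behaves like an inverted index (keyword to record identifiers) and that link indices can be treated as an undirected graph of databases.

```python
from collections import defaultdict, deque


class FieldIndex:
    """Inverted index for one data field: keyword -> set of record IDs."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, record_id, keywords):
        # Keywords are stored in lower case so that lookups are case-insensitive.
        for keyword in keywords:
            self._postings[keyword.lower()].add(record_id)

    def lookup(self, keyword):
        # Return a copy so callers cannot modify the index by accident.
        return set(self._postings.get(keyword.lower(), set()))


def build_link_graph(link_pairs):
    """Treat every link index as a bidirectional edge between two databases."""
    graph = defaultdict(set)
    for first, second in link_pairs:
        graph[first].add(second)
        graph[second].add(first)
    return graph


def find_path(graph, source, target):
    """Breadth-first search for the shortest chain of linked databases."""
    if source == target:
        return [source]
    queue = deque([[source]])
    seen = {source}
    while queue:
        path = queue.popleft()
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour in seen:
                continue
            if neighbour == target:
                return path + [neighbour]
            seen.add(neighbour)
            queue.append(path + [neighbour])
    return None  # the two databases are not connected
```

With hypothetical link indices for the pairs (EMBL, SWISSPROT) and (SWISSPROT, PDB), `find_path(graph, "EMBL", "PDB")` returns the chain through the intermediate database, although EMBL and PDB are not linked directly in this toy graph.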
SRS has an object-oriented design. It uses metadata to define a class for a database
entry object and rules for text-parsing methods, coupled with the entry attributes.
SRS incorporates a proprietary parsing language (Icarus) for generating the database
wrappers and another language (SRS query language) for the formulation of queries
(Zdobnov et al. 2002). Data in SRS can be subdivided into sections that correspond
to the main contents of the integrated databases, such as DNA sequences, protein
sequences, mapping data, SNPs, and metabolic processes.
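The idea of coupling entry attributes with text-parsing rules through metadata can be sketched in Python as follows. This is not Icarus and does not reproduce any SRS interface; the names `ENTRY_RULES` and `EntryClass`, the two-letter line codes, and the sample record are assumptions made for illustration only.

```python
import re

# Hypothetical metadata: attribute name -> regular expression with one
# capture group that extracts the attribute value from a flat-file record.
ENTRY_RULES = {
    "identifier": r"^ID\s+(\S+)",
    "description": r"^DE\s+(.+)$",
    "organism": r"^OS\s+(.+)$",
}


class EntryClass:
    """A database entry class defined entirely by its parsing metadata."""

    def __init__(self, name, rules):
        self.name = name
        # Compile each rule once; MULTILINE lets ^ and $ match at every line.
        self._rules = {
            attribute: re.compile(pattern, re.MULTILINE)
            for attribute, pattern in rules.items()
        }

    def parse(self, text):
        """Apply every rule to the record text and return the attributes found."""
        entry = {}
        for attribute, pattern in self._rules.items():
            match = pattern.search(text)
            entry[attribute] = match.group(1).strip() if match else None
        return entry
```

Under these assumptions, supporting a new flat-file format means supplying a new rule table rather than writing a new parser, which is the design benefit the metadata approach aims at.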
SRS is a keyword-based system. Queries can be combined using logical operators
such as “&” (AND), “|” (OR) and “!” (BUT NOT). An HTML interface is available for
the formulation of queries and for viewing the results of the data retrieval. Thus,
SRS can be used as a front end to independently query multiple data sources.
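The effect of the three logical operators can be illustrated with set operations on the record identifiers returned by keyword lookups. The following Python sketch is an illustration only: it does not parse the SRS query language, and the function `combine` and the record identifiers are hypothetical.

```python
def combine(left, operator, right):
    """Combine two sets of record IDs with one of the three logical operators."""
    if operator == "&":   # AND: records found by both keyword lookups
        return left & right
    if operator == "|":   # OR: records found by either keyword lookup
        return left | right
    if operator == "!":   # BUT NOT: records in the left set only
        return left - right
    raise ValueError(f"unknown operator: {operator!r}")


# Hypothetical results of two keyword lookups.
kinase_hits = {"P1", "P2", "P3"}
human_hits = {"P2", "P3", "P4"}

both = combine(kinase_hits, "&", human_hits)
either = combine(kinase_hits, "|", human_hits)
kinase_only = combine(kinase_hits, "!", human_hits)
```

Because every lookup yields a set, combined queries can be nested by feeding the result of one `combine` call into another.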
SRS version 6 provides APIs for the most widely used programming languages, such as
C++ and Java. This allows the development of customized interfaces to proprietary
analysis tools.
11.1.3 EnsMart
EnsMart (http://www.ensembl.org/EnsMart) provides a generic data warehouse system
(Kasprzyk et al. 2004). The system uses data-warehousing techniques to organize
descriptive data from individual databases into one query-optimized system. Currently,
it is focused on Ensembl and thus primarily contains genomic annotation data such as
genes and SNPs, functional annotation, and expression data. Data are available for
nine species annotated in Ensembl (Homo sapiens, Mus musculus, Rattus norvegicus,
Danio rerio, Fugu rubripes, Anopheles gambiae, Drosophila melanogaster,
Caenorhabditis briggsae, and Caenorhabditis elegans) (Birney et al. 2004).
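What "query-optimized" can mean in a data warehouse is illustrated by the following Python sketch, which uses the standard `sqlite3` module. It does not reproduce the EnsMart schema: the table names `gene_main` and `gene_expression`, the columns, and the sample rows are hypothetical. The assumption is that data from separate sources are loaded into one central table and into satellite tables that share its key, so that a typical question needs a single indexed join.

```python
import sqlite3


def build_demo_mart():
    """Create an in-memory warehouse with one central and one satellite table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE gene_main (
            gene_id    TEXT PRIMARY KEY,
            species    TEXT,
            chromosome TEXT
        );
        CREATE TABLE gene_expression (
            gene_id TEXT,
            tissue  TEXT
        );
        CREATE INDEX idx_expression_tissue ON gene_expression (tissue);
        """
    )
    # Hypothetical rows standing in for data merged from separate sources.
    conn.executemany(
        "INSERT INTO gene_main VALUES (?, ?, ?)",
        [("G1", "Homo sapiens", "1"),
         ("G2", "Homo sapiens", "2"),
         ("G3", "Mus musculus", "X")],
    )
    conn.executemany(
        "INSERT INTO gene_expression VALUES (?, ?)",
        [("G1", "liver"), ("G2", "brain"), ("G3", "liver")],
    )
    return conn


def genes_expressed_in(conn, tissue):
    """Answer a typical question with a single join on the shared key."""
    rows = conn.execute(
        "SELECT m.gene_id FROM gene_main AS m "
        "JOIN gene_expression AS e ON e.gene_id = m.gene_id "
        "WHERE e.tissue = ? ORDER BY m.gene_id",
        (tissue,),
    )
    return [row[0] for row in rows]
```

The trade-off in such a layout is that data are duplicated and must be reloaded when a source changes, in exchange for short and predictable queries.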
EnsMart data are organized around central objects, so-called foci, and additional
satellite data. Currently, two foci exist, gene and SNP, and all additional data are
presented in relation to these foci. EnsMart comes with three different user
interfaces. MartView is a Web-based user interface in the “wizard” style. The user
navigates through a series of pages to specify the input. It is also used to specify
the output format and to handle data export. The MartExplorer is a local database. It
is installed as a program and has a graphical user interface that allows displaying the