
shared-interest communities to define their own tag sets
and schemas in XML to create their own markup language.
Thus, we have ChemML, BioML, StatML, MathML, etc.—
hundreds of languages that are easy to create and modify
on top of XML, and serve as effective local standards.
We can think of XML as comparable to English, and of each
specialized language as comparable to the professional jargon
used by a particular discipline.
Definitions in XML
A markup language is a way of indicating, in a document,
any items of interest, including items such as headings,
paragraph boundaries, and highlighted concepts. Popular
markup languages include LaTeX for document processing
and HTML for Web page construction. Most markup languages
define a set of tags with associated meanings. For example, the
tag <P> in HTML indicates the beginning of a new paragraph.
As noted, XML stands for eXtensible Markup Language,
and was explicitly designed from the ground up with
extensibility in mind. There are no predefined tags in XML.
A tag <P> can refer to a paragraph boundary as in HTML,
or to something entirely different, such as a price attribute.
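As a minimal sketch of this point, assuming Python's standard `xml.etree.ElementTree` module, the snippet below parses two made-up documents that both use a `<P>` tag: one for paragraphs and one for a price. The parser handles both in exactly the same way, and deciding what `<P>` stands for is left entirely to the application. The documents and the helper `p_contents` are hypothetical illustrations.

```python
import xml.etree.ElementTree as ET

# Two hypothetical documents from different communities, both using <P>.
# XML itself assigns no meaning to <P>; each community decides what it denotes.
html_like = "<doc><P>First paragraph.</P><P>Second paragraph.</P></doc>"
catalog = "<item><name>Widget</name><P currency='USD'>9.99</P></item>"

def p_contents(xml_text):
    """Return the text of every <P> element, whatever <P> is meant to signify."""
    root = ET.fromstring(xml_text)
    # iter("P") walks the root and all its descendants in document order.
    return [p.text for p in root.iter("P")]

print(p_contents(html_like))  # ['First paragraph.', 'Second paragraph.']
print(p_contents(catalog))    # ['9.99']
```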
Obviously, markup is not very useful if it does not have
meaning. The expectation is that groups of users will define
sets of tags for which they agree on a shared meaning.
This has facilitated the proliferation of XML-based markup
languages, one for each application niche and user
community, as described above.
An XML document is said to be well formed if (1) every start tag
has a matching end tag, and each start–end pair is properly nested,
that is, completely included in, completely including, or completely
nonoverlapping with every other start–end tag pair, and (2) a “root”
tag pair encloses the entire document. Note that well-formedness
is a purely syntactic property: it says nothing about what
the tags are or what they mean. See Figure 9.1.
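The two conditions can be checked with a stack of open tag names, roughly as in the hypothetical `is_well_formed` function sketched below in Python. It is a simplification under stated assumptions: it recognizes only plain start and end tags and does not handle XML declarations, comments, CDATA sections, or empty-element tags such as `<br/>`. A real XML parser checks considerably more; Python's `xml.etree.ElementTree.fromstring`, for example, raises `ParseError` on input that is not well formed.

```python
import re

# Matches start tags like <a> or <a x="1"> and end tags like </a>.
# Simplifying assumption: no XML declaration, comments, CDATA,
# empty-element tags (<a/>), or '>' inside attribute values.
TAG = re.compile(r"<(/?)([A-Za-z_][\w.-]*)[^>]*>")

def is_well_formed(doc: str) -> bool:
    stack = []   # names of currently open elements
    roots = 0    # number of top-level start-end pairs
    pos = 0      # end of the previous tag
    for m in TAG.finditer(doc):
        # Non-whitespace text outside every element is not allowed.
        if not stack and doc[pos:m.start()].strip():
            return False
        closing, name = m.group(1), m.group(2)
        if closing:
            # Condition (1): an end tag must match the most recently opened
            # start tag; this rules out overlapping pairs like <a><b></a></b>.
            if not stack or stack.pop() != name:
                return False
        else:
            if not stack:
                roots += 1
            stack.append(name)
        pos = m.end()
    # All pairs closed, no trailing text, and exactly one root pair (condition 2).
    return not stack and not doc[pos:].strip() and roots == 1

print(is_well_formed("<doc><a>x</a><b><c>y</c></b></doc>"))  # True
print(is_well_formed("<doc><a><b></a></b></doc>"))           # False: overlapping pairs
print(is_well_formed("<a>1</a><b>2</b>"))                    # False: no single root
```

The check compares tag names only with each other, never against a list of known tags, which mirrors the point that well-formedness says nothing about what the tags mean.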
To be able to understand an XML document, one needs to
know what the structure of the document is and what tags it
contains. Such information about the structure of each document
type is stated in a Document Type Definition (DTD).
The notion of a DTD was first introduced in an influential
Chapter 9 XML AND WEB DATABASES 163