Download free books at BookBooN.com
An Introduction to Relational Database Theory
187
Database Design I: Projection-Join Normalization
As we have already seen, equality dependencies are a bit of a challenge for the DBMS and in most
commercial DBMSs existing in 2009 it cannot even be expressed. However, in those same DBMSs the JD
constraint required for ENROLMENT cannot be expressed either, so neither of the designs is fully
implementable in the current technology!
xxi
If we can express both of those constraints, then, on the
evidence so far, there doesn’t seem to be a lot to choose between the two designs and our decision is more
likely to be based on how well the DBMS supports multiple assignments; if we can’t express the
constraints, then the stated requirements cannot be met and we will have to compromise (and perhaps rely
on application code to maintain integrity). But efficacy of constraint checking isn’t the only criterion to
guide our choice. What if, for example, student S1’s name is incorrectly recorded as “Ann” and needs to
be corrected to “Anne”? In the unnormalized design that correction will entail updates to several tuples in
ENROLMENT, whereas in the 5NF design just one tuple in IS_CALLED is affected. On the other hand, a
query to give the names of all the students enrolled on course C1 is simpler to express (and might run
faster) in the unnormalized design. A frequent complaint about rigorous application of 5NF is that the
decomposition causes too many joins to have to be used in queries. Advocates of 5NF respond by pointing
out that the query to find the names of all the students is simpler to express (and might run faster) in the
5NF design!
It does seem that the 5NF design has a certain aesthetic appeal, giving a structure that is reduced to
simpler terms and is thus in a sense more flexible. Also, an abiding motivation for the relational approach
is that in principle the database designer need not anticipate the kinds of queries that will be presented to
the DBMS. A 5NF design “levels the playing field” in keeping with that principle. Moreover, the
discussion so far has been based on the assumption that the single relvar design, from which we inferred
those business rules, is correct, an assumption we might well question.
In particular, we might question BR4: “Every student whose name is recorded is enrolled on at least one
course”. Does the university really require every student to be always enrolled on at least one course, even
during the annual long vacation? What harm comes if that rule is relaxed? In that case the single relvar
design becomes incorrectand in any case we still need that complex and difficult constraint to express
the JD. With the 5NF design the difficult equality dependency becomes replaced by a simple foreign key
constraint to address BR2 and BR3.
Now, if 5NF is indeed a goal to be earnestly pursued, then some questions arise. How does the designer
discover that a relvar under consideration is not in 5NF? How do we spot the rogue JDsthe nontrivial
ones that do not arise simply as a consequence of keys? Well, there is an algorithm, given by Fagin (see
the annotation for reference [12] in Appendix B) for determining whether a given JD is implied by the
keys of the relvar to which it pertains; but to spot the JD in the first place you just have to examine the
business rules, and that’s not always very easy. However, a certain special kind of JD has been identified
such that when it holds in a given relvar r, r is not in 5NF. Not all 5NF-violating JDs are of this kind, but
in practice most of them are. If we can eliminate such JDs we stand a reasonable chance of achieving 5NF
by that action alone and any remaining rogue JDs should be quite easy to detect. A useful theory has been
developed around this special kind of JD, making it possible to mechanize certain important aspects of
relational database design.