REFERENCES 87
More sensitive sequence analysis can be obtained by dynamic programming
methods. In part they are used after the diagonals are located in the FASTN
and FASTA programs. Here similar sequence elements are aligned with
positive scores, and dissimilar elements are aligned with negative scores.
Complicating the analysis are insertions and deletions, which also receive
negative scores. The challenge of the problem is to arrange two sequences into
the maximum number of scoring alignments. Additional diffi culty arises from
the fact that slightly similar regions of DNA or protein sequences might lie in
otherwise unrelated sequences. Despite the complex nature of the problem,
an effi cient algorithm (Smith and Waterman, 1981 ) has been devised and is
widely used.
The problem of sequence comparison creates a related statistical problem
of estimating p - values (attained signifi cance levels) for the alignment scores.
The set of possible alignment scores from two sequences are dependent
random variables, since they result from overlapping sequence segments.
Another area of mathematical research that will be stimulated by biology is
the probabilistic theory of discrete and dynamic structures. While the scattered
beginnings of this fi eld have extended over the past three decades, the major
developments are yet to come. Illustrative developments in the fi eld include
random graphs and random directed graphs, interacting particle systems, sto-
chastic cellular automata, products of random matrices, and nonlinear dynami-
cal systems with random coeffi cients. For example, Erd ö s and R é nyi (1960)
created the fi eld of random graphs to model apparently random connections
in neural tissue. Erd ö s and R é nyi discovered numerous examples of “ phase
transitions, ” and many more have been discovered since (see Bollob á s, 1985 ).
REFERENCES
Altschul , S. F. , and Erickson , B. W. ( 1985 ). Signifi cance of nucleotide sequence align-
ments: a method for random sequence permutation that preserves dinucleotide and
codon usage . Mol. Biol. Evol. , 2 , 526 – 538 .
Altschul , S. F. , Gish , W. , Miller , W. , Myers , E. W. , and Lipman , D. J. ( 1990 ). Basic local
alignment search tool . J. Mol. Biol. , 215 , 403 – 410 .
Arratia , R. , Goldstein , L. , and Gordon , L. ( 1990 ). Poisson approximations and the
Chen – Stein method . Stat. Sci. , 5 , 403 – 434 .
Bollob á s , B. ( 1985 ). Random Graphs . Orlando, FL : Academic Press .
Chenna , R. , Sugawara , H. , Koike , T. , Lopez , R. , Gibson , T. J. , Higgins , D. G. , and
Thompson , J. D. ( 2003 ). Multiple sequence alignment with the Clustal series of
programs . Nucleic Acids Res. , 31 ( 13 ), 3497 – 3500 .
Chothia , C. , and Lesk , A. M. ( 1986 ). The relation between the divergence of sequence
and structure in proteins . EMBO J. , 5 ( 4 ), 823 – 826 .
Deken , J. ( 1983 ). Probabilistic behavior of longest - common - subsequence length . In:
D. Sankoff and J. B. Kruskal (Eds.), Time Warps, String Edits and Macromolecules: