of course, but they appear in front of each other only once. This prevents
Lalign from reporting trivial variations of a very good local alignment.
Figure 8-11 shows the output of Lalign for the two sequences used in
the preceding steps. Lalign reports the alignments in a BLAST-like format,
sorted according to their E-values. On the first line associated with each
alignment are the following features:
The percent identity: The proportion of identical residues aligned with
one another. For instance, you can see in Figure 8-11 that the best local
alignment has 25.7 percent identity, while the second-best alignment has
27.3 percent. (Alignments are ranked by E-value, not by identity, so a
lower-ranked alignment can have a higher percentage.)
The local alignment length (Overlap): This is the total length of the
local alignment.
The score: This score sums up the cost of the gaps and substitutions, as
given by the substitution matrix and the gap penalties. Generally speaking,
the higher the score, the better the alignment, yet be aware that
the absolute value has no clear meaning. The E-value is a better indicator
of the alignment’s quality.
The E-value: This value tells you how many times you could have
expected to find such a good alignment by chance, given your two
sequences. Be aware that this E-value is much less meaningful than the
one BLAST reports when searching a database. A good E-value must be
below 10⁻⁴.
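To make these quantities concrete, here is a minimal Python sketch that computes percent identity, overlap, and a raw score for one pair of already-aligned rows, then ranks hits by E-value. The scoring is a hypothetical stand-in: the residue groups, the match/similar/mismatch values, and the gap penalties are invented for illustration and are not Lalign's defaults (Lalign uses a real substitution matrix). The sketch also divides identities by the full alignment length, gap columns included; Lalign's own convention may differ. Only the 10⁻⁴ threshold comes from the text above.

```python
"""Toy summary of one local alignment: percent identity, overlap, score.

All scoring values below are hypothetical; real tools use a substitution
matrix (for example BLOSUM50) and their own gap penalties.
"""

# Hypothetical groups of residues treated as "similar" (stand-in for a matrix).
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
MATCH, SIMILAR, MISMATCH = 5, 1, -2   # hypothetical substitution scores
GAP_OPEN, GAP_EXTEND = -10, -2        # first gap position, each later one


def pair_score(a: str, b: str) -> int:
    """Score one aligned residue pair with the toy scheme."""
    if a == b:
        return MATCH
    if any(a in g and b in g for g in SIMILAR_GROUPS):
        return SIMILAR
    return MISMATCH


def summarize(top: str, bottom: str) -> tuple[float, int, int]:
    """Return (percent identity, overlap, score) for two aligned rows."""
    if not top or len(top) != len(bottom):
        raise ValueError("aligned rows must be non-empty and equally long")
    overlap = len(top)  # alignment length, gap columns included
    identical = sum(1 for a, b in zip(top, bottom) if a == b and a != "-")
    score = 0
    gapped_row = None  # which row is inside a gap: "top", "bottom", or None
    for a, b in zip(top, bottom):
        if a == "-" and b == "-":
            raise ValueError("a column cannot be gapped in both rows")
        if a == "-" or b == "-":
            row = "top" if a == "-" else "bottom"
            # Opening a new gap costs more than extending the current one.
            score += GAP_EXTEND if gapped_row == row else GAP_OPEN
            gapped_row = row
        else:
            score += pair_score(a, b)
            gapped_row = None
    return 100.0 * identical / overlap, overlap, score


def rank_by_evalue(hits, threshold=1e-4):
    """Sort (name, e_value) pairs, lowest first, flagging those below threshold."""
    return [(name, e, e < threshold) for name, e in sorted(hits, key=lambda h: h[1])]
```

For example, `summarize("AILKD-E", "AVLRDQE")` finds 4 identities in 7 columns (about 57.1 percent) and a toy score of 12. Sorting by E-value mirrors how Lalign orders its report; the flag simply applies the 10⁻⁴ rule of thumb.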
The alignment itself contains three types of information:
The residue index on the line above the sequence. The residue corresponding
to an index is the one below the last digit of this index.
The alignment itself, with gaps represented by dashes.
Identity and similarity. In the line between the two aligned sequences,
the (:) symbol means identity, while the (.) symbol means similarity.
Two residues are similar when their substitution score is greater than 0.
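The annotation lines described above can be rebuilt with a short Python sketch. It assumes ':' for identity and '.' for similarity (kept as constants you can change), and it replaces the real substitution matrix with hypothetical residue groups, so "similar" here only approximates "substitution score greater than 0". Gaps are skipped when numbering residues, and each label is placed so that its last digit sits above the residue it counts.

```python
"""Rebuild the index and markup lines of a FASTA-style pairwise alignment."""

# Hypothetical "similar" groups standing in for "substitution score > 0".
SIMILAR_GROUPS = ["ILVM", "FWY", "KRH", "DE", "NQ", "ST", "AG"]
IDENTITY_MARK, SIMILARITY_MARK = ":", "."  # assumed symbols


def is_similar(a: str, b: str) -> bool:
    """Toy similarity test: both residues fall in the same hypothetical group."""
    return any(a in g and b in g for g in SIMILAR_GROUPS)


def markup(top: str, bottom: str) -> str:
    """Middle line: identity mark, similarity mark, or blank for each column."""
    chars = []
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            chars.append(" ")
        elif a == b:
            chars.append(IDENTITY_MARK)
        elif is_similar(a, b):
            chars.append(SIMILARITY_MARK)
        else:
            chars.append(" ")
    return "".join(chars)


def index_line(row: str, start: int = 1, step: int = 10) -> str:
    """Label every `step`-th residue so the label's last digit sits above it."""
    line = [" "] * len(row)
    residue = start - 1
    for col, ch in enumerate(row):
        if ch == "-":
            continue  # gaps do not advance the residue numbering
        residue += 1
        if residue % step == 0:
            label = str(residue)
            first = col - len(label) + 1
            if first >= 0:  # skip labels that would run off the left edge
                line[first:col + 1] = label
    return "".join(line).rstrip()
```

Printing `index_line(top)`, `top`, `markup(top, bottom)`, and `bottom` one below another gives roughly the layout described above, without Lalign's sequence-name columns.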
The first alignment reported corresponds to the conserved serine protease
domain. On its own, this alignment isn’t really convincing: It contains less
than 26 percent identity over about 200 residues.
To be really convinced, we would need higher identity (close to 30
percent or more) and a better (lower) E-value. We may trust this alignment
anyway because it is consistent with the signal we previously saw on the
dot plot. (Refer to Figure 8-7.) The fact that two different analyses
(Dotlet and Lalign) give compatible results makes a good case for the
existence of a conserved serine protease domain in our proteins.
Chapter 8: Comparing Two Sequences