Part IV: Becoming a Specialist: Advanced Bioinformatics Techniques
For instance, here is how you could use an
initial alignment to generate two bootstrap
alignments:
Initial Alignment
Column  1 2 3 4 5 6 7 8 9
seq1    A B C D E F G H I
seq2    A A B B C B A C A
seq3    C C A C B A C A B
Bootstrap Alignment 1
Column  1 1 8 1 2 5 1 8 2
seq1    A A H A B E A H B
seq2    A A C A A C A C A
seq3    C C A C C B C A C
Bootstrap Alignment 2
Column  1 4 5 6 6 3 4 1 7
seq1    A D E F F C D A G
seq2    A B C B B B B A A
seq3    C C B A A A C C C
Note that the first bootstrap alignment contains
Column #1 of the initial alignment four times.
That happens because each bootstrap alignment is
built by drawing columns at random from the
initial alignment, with replacement, until it is
as long as the original. This procedure generates
many multiple alignments that look more or less
like the original. The purpose of all this is to
check whether all the columns in your initial
alignment tell a similar evolutionary story
(that should be the case).
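The column resampling described above can be sketched in a few lines of Python. This is only an illustration of the idea, not the code Phylip runs; the function name bootstrap_alignment and the fixed random seed are assumptions made for the example.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample whole alignment columns with replacement.

    alignment: dict mapping a sequence name to its aligned string
               (all strings must have the same length).
    rng:       a random.Random instance, so runs can be repeated.
    Returns (picked_columns, resampled_alignment); columns are numbered from 1.
    """
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as the original alignment has columns.
    picks = [rng.randrange(length) for _ in range(length)]
    resampled = {
        name: "".join(seq[i] for i in picks)
        for name, seq in alignment.items()
    }
    return [i + 1 for i in picks], resampled


# The toy alignment from the example above.
initial = {
    "seq1": "ABCDEFGHI",
    "seq2": "AABBCBACA",
    "seq3": "CCACBACAB",
}

rng = random.Random(0)  # arbitrary seed, chosen only for repeatability
for n in (1, 2):
    columns, boot = bootstrap_alignment(initial, rng)
    print(f"Bootstrap Alignment {n}: columns {columns}")
    for name, seq in boot.items():
        print(name, " ".join(seq))
```

Because a column is always copied as a whole, every sequence in a bootstrap alignment stays lined up with the others; only the choice of columns changes from one bootstrap alignment to the next.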
During the next step, each random alignment is
turned into a distance matrix, and each matrix
is turned into a tree. To build the consensus
tree, Phylip keeps the groupings that appear in
most of the trees it has generated from the
bootstrap alignments.
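To get a feel for what "turning an alignment into a distance matrix" means, here is a minimal sketch that uses the simplest possible distance: the fraction of columns at which two sequences differ. This measure is an assumption made for illustration; the distance programs in Phylip rely on evolutionary models that correct for multiple substitutions. The function names p_distance and distance_matrix are hypothetical.

```python
from itertools import combinations


def p_distance(a, b):
    """Fraction of aligned positions at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(alignment):
    """Return {(name1, name2): distance} for every pair of sequences."""
    return {
        (m, n): p_distance(alignment[m], alignment[n])
        for m, n in combinations(alignment, 2)
    }


# The toy alignment from the example above.
initial = {
    "seq1": "ABCDEFGHI",
    "seq2": "AABBCBACA",
    "seq3": "CCACBACAB",
}

for pair, distance in distance_matrix(initial).items():
    print(pair, round(distance, 3))
```

A tree-building method such as neighbor joining then works from these pairwise numbers alone, which is why each bootstrap alignment needs its own matrix.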
To assess the quality of each branch in the
consensus tree, Phylip counts the number of
bootstrap trees that contain this branch.
Good branches are those that appear in every
bootstrap tree. They are the ones on which you
can strap your swing!
Take a look at the consensus tree back in Figure
13-12. On this consensus tree, every branch
comes along with a number between 1 and 2.
This value tells you how solid your branch is.
This value falls between 1 and 2 because, in
Steps 8 and 15, we chose to have 2 bootstrap
cycles. Neighbor generated 2 bootstrap trees
from these 2 bootstrap alignments, and it
generated a consensus tree that summarizes
these 2 bootstrap trees.
In a tree, a branch always separates your data
into two groups: sequences on the left and
sequences on the right side of the branch. The
numbers above the branches in the consensus
tree indicate how many of your two bootstrap
trees contain a branch that splits the data in
exactly the same way as the branch you’re
looking at.
For instance, if you look at the consensus
tree in Figure 13-12, you find that the branch
containing giraffe and sheep has a value of only
1. The reason for this is that in one of the
bootstrap trees, giraffe and moose are in the
same group, while in the other tree, giraffe is
in the same group as sheep (and moose is alone).
In theory, you could conclude that the tree
indicates some uncertainty on whether sheep and
giraffe are more closely related to one another
than each is related to moose. In practice,
however, with only 2 bootstraps, you really
can’t say anything of the kind! You need at
least 100 bootstrap cycles before you start
convincing farmers to breed giraffes for their
wool.
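The counting behind these branch values can be sketched as follows. The two trees below are hypothetical, reduced to the three animals discussed here (the real trees in Figure 13-12 contain more sequences), and they are written as nested tuples. As a simplifying assumption, the sketch treats the trees as rooted and counts groups of leaves rather than true two-way splits; the function names are made up for the example.

```python
def clades(tree):
    """Return every group of leaves found under some branch of a tree.

    A tree is a nested tuple of leaf names, for example
    (("giraffe", "moose"), "sheep").
    """
    found = set()

    def leaves(node):
        if isinstance(node, str):          # a single leaf
            return frozenset([node])
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)                 # record the group under this branch
        return members

    leaves(tree)
    return found


def clade_support(group, trees):
    """Count how many trees contain exactly this group of leaves."""
    target = frozenset(group)
    return sum(target in clades(tree) for tree in trees)


# Hypothetical bootstrap trees matching the description above.
bootstrap_trees = [
    (("giraffe", "moose"), "sheep"),
    (("giraffe", "sheep"), "moose"),
]

print(clade_support({"giraffe", "sheep"}, bootstrap_trees))  # prints 1
```

With 100 bootstrap trees in the list instead of 2, the same count would range from 0 to 100, which is what makes the value informative.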
Making a maximum likelihood tree with PhyML
Maximum likelihood trees are generally considered more accurate than other trees
because the method searches for the tree that is most likely (statistically speaking) to
explain your alignment. In other words, your alignment is a little story that
explains how, starting from one ancestral sequence, a series of mutations