Taking your multiple alignment further
One sentence summarizes what you really want from your multiple alignment:
You want to identify important positions! You want to find the amino
acids that are not allowed to mutate, those that you find conserved even
when aligning distantly related proteins.
Consider, for instance, the alignment in Figure 9-9. You can generate this
multiple sequence alignment with any ClustalW or Tcoffee server you
fancy. It is a good alignment: It contains distantly related proteins,
and it is beginning to tell us a nice story about the various components of our
protein family. For instance, we can clearly see that the N-terminus region
seems to be more conserved than the C-terminus region. In the N-terminus,
we can see a short stretch of highly conserved amino acids that make this
region a good candidate for being a binding or an active site.
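If you prefer to count conserved positions rather than eyeball them, here is a minimal Python sketch of the idea. It scores each column by the fraction of sequences that share the most common residue. The function name column_conservation and the three toy sequences are our own inventions for illustration; they are not the sequences shown in Figure 9-9, and real servers use more refined conservation measures.

```python
from collections import Counter


def column_conservation(alignment):
    """For each column, return the fraction of sequences that share the
    most common residue. Gaps ('-') never count as a residue."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # a column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(column))
    return scores


# Toy alignment, invented for illustration only.
toy = [
    "MKDLGDADKDG",
    "MKDLGDTDGDG",
    "MRDL-DADKSG",
]
scores = column_conservation(toy)
# 0-based indices of the columns where every sequence agrees.
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)
```

A run of neighboring indices in fully_conserved is the kind of short, highly conserved stretch worth a closer look.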
This is interesting — but it isn’t enough to make a big story. This alignment still
contains too many conserved positions for a detailed analysis. At this point,
what we could do is add a few distantly related sequences, one by one, and
carefully check the effect of these sequences on the overall alignment quality.
More specifically, we want to make sure that these distantly related sequences
actually enhance existing patterns rather than completely destroying blocks.
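One way to carry out this check, sketched here in Python under our own assumptions, is to record which residues of a reference sequence sit in fully conserved columns before and after you add a sequence. Counting residue positions in the reference (rather than column numbers) keeps the comparison meaningful when the new sequence introduces gaps. The helper conserved_reference_positions and both toy alignments are hypothetical and exist only to illustrate the bookkeeping.

```python
def conserved_reference_positions(alignment, ref_index=0):
    """Return the 1-based residue positions of the reference sequence
    that fall in columns where every sequence has the same residue."""
    positions = set()
    ref_pos = 0
    for column in zip(*alignment):
        if column[ref_index] == "-":
            continue  # the reference has no residue in this column
        ref_pos += 1
        if len(set(column)) == 1:
            positions.add(ref_pos)
    return positions


# Toy alignments, invented for illustration only.
before = [
    "MKDLGDADKDG",
    "MKDLGDTDGDG",
    "MRDLGDADKSG",
]
# The same three sequences realigned with one extra, more distant sequence.
after = [
    "MKDL-GDADKDG",
    "MKDL-GDTDGDG",
    "MRDL-GDADKSG",
    "MSELAGDLDEEG",
]

kept = conserved_reference_positions(before) & conserved_reference_positions(after)
lost = conserved_reference_positions(before) - conserved_reference_positions(after)
print("still conserved:", sorted(kept))
print("no longer conserved:", sorted(lost))
```

If the new sequence wipes out most of the conserved positions at once, that is a hint that it destroys blocks rather than refining them.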
Here is a possible strategy to further reveal the evolutionary constraints
within our protein. This strategy relies on the integration within the multiple
alignment of precisely those sequences that BLAST reported as marginal hits
when we first scanned Swiss-Prot for homologues of the human parvalbumin.
(If this all sounds a bit unfamiliar, take a look at the “Selecting sequences on
the ExPASy server” section, earlier in this chapter.) You don’t actually need to
rerun BLAST to use this example; we give you the info you need to know. First
and foremost, gather your sequences as follows:
1. Point your browser to www.expasy.ch/sprot/sprot-retrieve-list.html.
The Swiss-Prot/TrEMBL: Retrieve a List of Entries page appears.
2. In the Format line, select the FASTA radio button.
3. Enter the accession numbers of your sequences in the Sequence window.
Enter one accession number per line. For our example, we entered
P20472, P80079, P02626, P02619, P43305, P32930, Q91482, P02620,
P02622, P02586.
P02586 is TPCS_RABIT, the troponin C of rabbit. In the BLAST search of
the human parvalbumin against Swiss-Prot that we used to select the other
sequences, BLAST reported this hit with an E-value of 5!
On its own, this result is not interesting, but now that we have a multiple
sequence alignment, we can see whether this rabbit can tell us something
about our human protein.
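If you keep your accession numbers in a comma-separated line, as we printed them above, a few lines of Python can turn them into the one-per-line layout the Sequence window expects. This is only a convenience sketch: the function format_accession_list is our own invention, and the pattern it checks against is the general UniProt accession number format, assumed here to cover every entry you want to retrieve.

```python
import re

# General UniProt accession number pattern (assumed to cover your entries).
ACCESSION_RE = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)


def format_accession_list(raw):
    """Split a comma- or space-separated string of accession numbers and
    return them one per line, without duplicates, in their original order."""
    accessions = []
    for token in re.split(r"[,\s]+", raw.strip()):
        token = token.strip(".").upper()  # drop a trailing period, if any
        if not token:
            continue
        if not ACCESSION_RE.match(token):
            raise ValueError(f"not a valid accession number: {token!r}")
        if token not in accessions:
            accessions.append(token)
    return "\n".join(accessions)


raw = ("P20472, P80079, P02626, P02619, P43305, P32930, Q91482, P02620, "
       "P02622, P02586.")
form_text = format_accession_list(raw)
print(form_text)
```

You can then paste the printed list straight into the Sequence window.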