Making a Multiple Protein Sequence Alignment with ClustalW
After running database searches that identify similar proteins pair by pair,
the second most common bioinformatic task that biologists perform
with protein sequences is building a multiple alignment.
A multiple alignment consists of
lining up many similar proteins side by side so that they can be compared.
Multiple alignments are used to
• Identify sequence positions where specific amino acids are critical for
the structural integrity or the function of a given protein
• Define specific sequence signatures for protein families
• Classify sequences and build evolutionary trees
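To make these uses concrete, here is a minimal Python sketch that works on an alignment that has already been computed (for example, by ClustalW). The three sequences in ALIGNMENT are made-up toy data, and the helper names (column_conservation, consensus, percent_identity) are hypothetical; they are not part of ClustalW or the PIR site. Fully conserved columns point to positions that may matter for structure or function, the consensus string is a crude sequence signature, and pairwise percent identity is a simple starting point for the distances that tree-building methods use.

```python
from collections import Counter

# Hypothetical toy alignment (NOT real data): equal-length sequences,
# with "-" marking gaps, as a multiple-alignment program would output them.
ALIGNMENT = {
    "seqA": "MKV-LDGA",
    "seqB": "MKVALDGS",
    "seqC": "MRV-LDGA",
}


def column_conservation(aligned):
    """For each column, the fraction of sequences sharing the most common
    character (gaps are counted like any other character)."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        _, count = Counter(s[i] for s in seqs).most_common(1)[0]
        scores.append(count / len(seqs))
    return scores


def consensus(aligned, threshold=1.0):
    """Crude signature: keep the most common residue where its frequency
    reaches `threshold`, otherwise write 'x' (any residue)."""
    seqs = list(aligned.values())
    out = []
    for i in range(len(seqs[0])):
        residue, count = Counter(s[i] for s in seqs).most_common(1)[0]
        out.append(residue if count / len(seqs) >= threshold else "x")
    return "".join(out)


def percent_identity(a, b):
    """Percent identical residues over columns where neither sequence has a gap."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


if __name__ == "__main__":
    print([round(s, 2) for s in column_conservation(ALIGNMENT)])
    # [1.0, 0.67, 1.0, 0.67, 1.0, 1.0, 1.0, 0.67]
    print(consensus(ALIGNMENT))  # MxVxLDGx
    print(round(percent_identity(ALIGNMENT["seqA"], ALIGNMENT["seqB"]), 1))  # 85.7
```

Real tree-building programs use more careful distance corrections and methods such as neighbor-joining; the identity score here only illustrates the idea.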
We detail all the intricacies of making good multiple alignments in Chapter
9. At this point, however, we simply want to show you that performing a
multiple alignment is easy, especially when you use a well-designed
Internet server, such as the one maintained by the Protein
Information Resource (PIR) at Georgetown University in Washington, D.C.
The PIR originated from the
Atlas of Protein Sequence and Structure, the first protein-sequence
collection, which the late Prof. M. Dayhoff began publishing in the
1960s. The PIR site offers useful protein analysis tools and databases
that we invite you to explore on your own. Among these tools is a
multiple-alignment server (running the standard ClustalW program) that is
easy for beginners to use.
Do the following to get your feet wet using ClustalW:
1. Point your browser to pir.georgetown.edu.
The PIR home page appears, as shown in Figure 2-29.
2. Under the Search/Analysis heading, choose Multiple Alignment from
the drop-down menu.
The input form appears. At this point, you need a few FASTA-formatted
protein sequences.
3. Open the dUTPase FASTA-formatted sequence file that you created on
your PC in the previous section of this chapter.
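If you would like to check your input before pasting it into the form, the following minimal Python sketch parses FASTA text (a ">" header line followed by one or more lines of residues) and flags obvious problems. The records in EXAMPLE are placeholders, not the real dUTPase sequences, and the function names are hypothetical. The accepted alphabet (the 20 standard amino-acid letters plus the ambiguity codes B, Z, X, the rarer U and O, and the stop symbol *) is an assumption; the PIR server may be stricter or more lenient.

```python
# Hypothetical placeholder records, NOT real dUTPase sequences.
EXAMPLE = """>seq1 hypothetical protein 1
MKVLDGA
MKV
>seq2 hypothetical protein 2
MRVLDGS
"""

# Assumed protein alphabet: 20 standard residues plus common extra codes.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY") | set("BZXUO*")


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            if header is None:
                raise ValueError("sequence data found before the first '>' header")
            chunks.append(line.upper())  # sequences may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def check_for_alignment(records):
    """List simple problems that would prevent a sensible multiple alignment."""
    problems = []
    if len(records) < 2:
        problems.append("a multiple alignment needs at least two sequences")
    for header, seq in records:
        if not seq:
            problems.append(f"{header}: empty sequence")
        bad = set(seq) - AMINO_ACIDS
        if bad:
            problems.append(f"{header}: unexpected characters {sorted(bad)}")
    return problems


if __name__ == "__main__":
    records = parse_fasta(EXAMPLE)
    print(len(records))                  # 2
    print(check_for_alignment(records))  # []
```

To check your own file, you could run `with open("dutpase.fasta") as fh: records = parse_fasta(fh.read())`, where dutpase.fasta is a hypothetical name for the file you saved in the previous section.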
Part I: Getting Started in Bioinformatics