Looking at sequence features in 3-D
On its own, a protein sequence is an important piece of information, but it
becomes more revealing and valuable when you compare it with homologous
sequences from a variety of species. Chapter 9 emphasizes the role of multiple
alignments in identifying the most significant regions of a sequence: Conserved
residues (or, alternatively, highly variable ones) are often key to predicting
or understanding a protein’s function. Going further, by precisely locating
these conserved residues in space, you can gather additional clues about their
biological roles. Such residues can, for example, delineate a cavity at the
active site of an enzyme; if they are found at the surface, they are good
candidates for interaction with other molecules.
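To make the idea of conserved residues concrete, here is a minimal Python sketch that scores each column of a multiple alignment and lists the fully conserved positions. The alignment is a toy example invented for illustration (it is not real TolB data), and the names column_conservation and conserved_positions are hypothetical helpers, not part of any alignment tool.

```python
from collections import Counter

# Toy, invented alignment for illustration only (NOT real TolB data).
# All sequences have the same length; '-' marks a gap.
toy_alignment = {
    "seq_a": "MKT-LDGSW",
    "seq_b": "MRTALDGTW",
    "seq_c": "MKSALDGSW",
}


def column_conservation(alignment):
    """For each column, return the fraction of sequences that carry
    the most common (non-gap) residue in that column."""
    seqs = list(alignment.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must all have the same length")
    scores = []
    for i in range(length):
        residues = [s[i] for s in seqs if s[i] != "-"]
        if not residues:
            scores.append(0.0)  # column made only of gaps
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        # Divide by the total number of sequences so gaps lower the score.
        scores.append(top_count / len(seqs))
    return scores


def conserved_positions(alignment, threshold=1.0):
    """Return 1-based column numbers whose score reaches the threshold."""
    scores = column_conservation(alignment)
    return [i + 1 for i, score in enumerate(scores) if score >= threshold]


print(conserved_positions(toy_alignment))  # prints [1, 5, 6, 7, 9]
```

Lowering the threshold (for example to 0.6) also reports columns that are only mostly conserved. Real analyses usually rely on substitution-aware scores rather than this simple identity count.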
We can use the example of the TolB protein family (of relatively unknown
function) to illustrate the interplay between multiple alignments and
structural analysis. Here’s how it’s done:
1. First, fetch TolB homologue sequences for several bacterial species
from NCBI:
a. Open a window in your browser and go to www.ncbi.nlm.nih.gov.
b. Choose Protein from the drop-down Search menu.
c. In the query window, type in the following identifiers: NP_360043
(R. conorii), NP_415268 (E. coli), NP_404737 (Y. pestis), NP_249663
(P. aeruginosa), and NP_438543 (H. influenzae); then click Go.
d. When the answer is returned, change the Display format to FASTA.
e. Finally, use Send to Text to get rid of any stray formatting characters.
Five TolB protein sequences are now ready for use. The next steps build a
multiple alignment out of these sequences.
2. Open a new browser window and go to www.igs.cnrs-mrs.fr/Tcoffee/.
3. Click on Regular in the very top TCOFFEE option line.
4. Copy the five TolB sequences from the other browser window and
paste them into the Tcoffee input window.
Be sure to include their FASTA headers.
5. Click Submit (at the bottom of the form).
The Tcoffee Results form automatically appears after an intermediary
waiting page.
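The sequence-gathering part of Step 1 can also be scripted. The following is a minimal sketch, assuming NCBI's public E-utilities efetch endpoint is used in place of the web forms described above; that is a substitute route, not the procedure this section describes. The helper names are hypothetical, and whether these particular accession numbers still resolve is not guaranteed. The network call is defined but not run here.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# NCBI E-utilities efetch endpoint (assumption: used here instead of
# the browser-based search described in the steps above).
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"

# The five TolB identifiers listed in Step 1c.
TOLB_IDS = ["NP_360043", "NP_415268", "NP_404737", "NP_249663", "NP_438543"]


def build_efetch_url(accessions):
    """Build an efetch URL requesting plain-text FASTA protein records."""
    params = {
        "db": "protein",
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    }
    return EFETCH_URL + "?" + urlencode(params)


def fetch_fasta(accessions):
    """Download the records as one FASTA-formatted string (needs network)."""
    with urlopen(build_efetch_url(accessions)) as response:
        return response.read().decode("utf-8")


def count_fasta_records(text):
    """Count FASTA headers, i.e. lines that start with '>'."""
    return sum(1 for line in text.splitlines() if line.startswith(">"))


# Show the request that would be sent, without contacting NCBI.
print(build_efetch_url(TOLB_IDS))
```

Calling fetch_fasta(TOLB_IDS) and checking that count_fasta_records returns 5 is a quick way to confirm that every sequence arrived with its FASTA header before you paste the text into the Tcoffee input window.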
Chapter 11: Working with Protein 3-D Structures