Discovering New Domains in Your Proteins
Hunting for new domains is a bit of an art, mastered by only a few highly
trained biologists around the world. But the good news is that everything you
want to know is at hand, if you fancy giving it a try. The simplest way is to
use BLAST (specifically its PSI-BLAST mode) and turn your database search into
what BLAST gurus call a PSSM, or Position-Specific Scoring Matrix (simple folks
call it a domain). Read the BLAST chapter (Chapter 8) if you
are not familiar with this tool, and you will find everything you need to build
and use PSSMs at the following online address:
www.ncbi.nlm.nih.gov/blast/blastcgihelp.shtml#pssm
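To make the idea of a PSSM a little more concrete, here is a minimal Python sketch. It is not how BLAST builds its matrices. It assumes a tiny gap-free alignment, a uniform background frequency of 1/20 per amino acid, and one pseudocount per residue, whereas PSI-BLAST adds sequence weighting and real background frequencies. The names `build_pssm` and `best_hit` are hypothetical, made up for this illustration. Each column of the matrix holds a log-odds score for every amino acid, and sliding the matrix along a protein picks out the stretch that best resembles the domain.

```python
import math
from collections import Counter

# The 20 standard amino acids; anything else (gaps, X, etc.) is ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Build a log-odds PSSM (one dict of scores per column) from aligned sequences.

    Assumptions for this sketch: a uniform background frequency of 1/20 and a
    fixed pseudocount per amino acid; PSI-BLAST uses weighting and real
    background frequencies instead.
    """
    if not alignment:
        raise ValueError("need at least one aligned sequence")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for pos in range(length):
        # Count the standard residues observed in this alignment column.
        counts = Counter(seq[pos] for seq in alignment if seq[pos] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        column = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            # Log-odds score in bits: positive means "more common than background".
            column[aa] = math.log2(freq / background)
        pssm.append(column)
    return pssm


def best_hit(pssm, sequence):
    """Slide the PSSM along a protein and return (best_score, start_index)."""
    width = len(pssm)
    if len(sequence) < width:
        raise ValueError("sequence is shorter than the PSSM")
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        # Residues outside the 20 standard amino acids contribute a neutral 0.
        score = sum(col.get(aa, 0.0) for col, aa in zip(pssm, window))
        if best is None or score > best[0]:
            best = (score, start)
    return best


if __name__ == "__main__":
    pssm = build_pssm(["ACDE", "ACDE", "ACDF"])
    score, start = best_hit(pssm, "GGACDEGG")
    print(f"best window starts at {start} with score {score:.2f}")
```

With the toy alignment `["ACDE", "ACDE", "ACDF"]`, scanning `"GGACDEGG"` reports the window starting at position 2 (counting from 0), which is exactly where the `ACDE` motif sits.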
If you go this route, you can do everything online, but don’t expect miracles.
Domains are like diamonds, scattered here and there in the protein world.
While you can expect to occasionally stumble by chance on a nice gem, it will
take more than that to uncover the Crown Jewels! If you want to go down that
road, you shouldn’t mind running a few programs on Unix, digging for
sequences in odd DNA sequence databases (such as EST databases; see
Chapters 3 and 4), gathering your sequences with BLAST, aligning them
with a Multiple Sequence Alignment program (see Chapter 9), turning your
alignment into a Hidden Markov Model (HMM) with the Hmmer program
(hmmer.wustl.edu/), and using Hmmer to search protein databases. This
is what folks at Pfam do every day.
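The Hmmer steps of that pipeline lend themselves to a small script. The Python sketch below assumes the HMMER3 command-line tools `hmmbuild` and `hmmsearch` are installed (older Hmmer releases used somewhat different commands), and the file names `domain.sto`, `domain.hmm`, `proteins.fasta`, and `hits.tbl` are hypothetical placeholders. It builds a profile HMM from your alignment, searches a protein database with it, and reads the per-sequence table that `hmmsearch --tblout` writes. The helper names and the E-value cutoff are our own choices, not part of Hmmer. By default it only prints the commands (a dry run), so you can check them before running anything.

```python
import shutil
import subprocess


def hmmer_commands(alignment_file, hmm_file, database_file, table_file):
    """Return the HMMER3 command lines: build a profile HMM, then search with it."""
    build = ["hmmbuild", hmm_file, alignment_file]
    search = ["hmmsearch", "--tblout", table_file, hmm_file, database_file]
    return [build, search]


def run_pipeline(commands, dry_run=True):
    """Print the commands (dry run) or execute them, stopping at the first failure."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
            continue
        if shutil.which(cmd[0]) is None:
            raise FileNotFoundError(f"{cmd[0]} not found on PATH; is HMMER installed?")
        # check=True raises CalledProcessError if the program exits with an error.
        # The long text report is discarded; the --tblout file keeps the hits.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)


def parse_tblout(text, max_evalue=1e-3):
    """Extract (target, E-value, score) for each sequence hit in a --tblout table.

    Columns 1, 5 and 6 of a hmmsearch --tblout line are the target name, the
    full-sequence E-value and the full-sequence bit score; '#' lines are comments.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split()
        target, evalue, score = fields[0], float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((target, evalue, score))
    return hits


if __name__ == "__main__":
    # Hypothetical file names; replace them with your own alignment and database.
    cmds = hmmer_commands("domain.sto", "domain.hmm", "proteins.fasta", "hits.tbl")
    run_pipeline(cmds)  # dry run: only prints the two commands
```

After a real run (`run_pipeline(cmds, dry_run=False)`), reading `hits.tbl` and passing its text to `parse_tblout` lists the database sequences whose full-sequence E-value is at or below the cutoff. The `1e-3` default is an arbitrary choice for this sketch, not a Hmmer recommendation.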
They say the Eskimos have 40 words for snow and can describe even the tiniest
differences. It’s similar with biologists; you won’t be surprised to hear
that they have more than one word for protein domains. In the context of a
biological paper, you can assume that the words HMM, PSSM, profile, domain,
MSA, weight matrix, and extended profile mean roughly the same thing.
More Protein Analysis for Free over the Internet
The Internet offers an extremely large number of free resources for doing
sequence analysis online; we’ve listed a few in Table 6-2. The following
links are only a sample of the most stable sites available to you.
In general, if your work depends on one of these sites, we suggest that you
choose a very stable resource. As a rule, avoid sites that run from a personal
home page (www.something.somewhere/~somebody), as they’re generally
less reliable.
Part II: A Survival Guide to Bioinformatics