look at the alignment between the profile and your sequence. Unfortunately,
InterProScan doesn’t let you do this, which is why you also need to use the
other servers (such as CD-Search and Motif-Scan) that we describe in the
next section.
Finding domains with the CD server
The CD (Conserved Domain) server of the NCBI follows the same principle as
InterProScan. Its main advantage is that reported hits come with a score
that helps you discriminate the good matches from the spurious ones. On the
downside, the CD server doesn’t integrate as many databases as
InterProScan, although it contains quite a few domains contributed
by the NCBI that you can’t find anywhere else.
If you’re interested in giving the CD server a try, here’s how you’d go about it:
1. Point your browser to www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi.
You can also access this server from the main NCBI BLAST page. Point
your browser to www.ncbi.nlm.nih.gov/BLAST/ and click the
Search the Conserved Domain Database using RPS-BLAST link. This
hyperlink is the fourth one in the second column.
Chapter 6: Working with a Single Protein Sequence
A common mistake when scanning domains
Don’t forget that when a domain server reports
some hits between your sequence and a domain,
this information always results from an alignment
between the domain (profile) you’ve found and
the sequence you’re using for comparison. If the
program believes the alignment is good enough,
it will report the match. Otherwise it will not.
Unfortunately, neither the profile nor the programs
are perfect, and mistakes occur. The
server can tell you that your protein contains a
specific domain when that isn’t true, or it may
fail to report a domain that is present in your
protein. If you trust the results blindly, there’s
always a chance you may get it wrong.
The InterProScan server isn’t very helpful when
it comes to avoiding that type of mistake: It
doesn’t report the score of the hits and does not
even display the alignments. The other servers
we show you in this chapter (CD-Search and
Motif Scan) give you a score along with the hits.
This score has a statistical meaning: It tells
you how likely it is that your match occurred
by chance alone and therefore has no
biological meaning.
Conservative interpretations (that is, only
believing very high scores) are almost always
correct. On the other hand, if the score isn’t so
good — or if only a portion of the domain matches
your sequence — you need additional evidence
to make sure that what you see is real.
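If you collect your hits in a script, the conservative rule above can be written down in a few lines. The following is only a minimal sketch under several assumptions: the DomainHit record and its field names are hypothetical (they aren't the output format of any server), the score is taken to be an E-value (where a smaller value means a better match, so a "very high score" corresponds to a very low E-value), and the two thresholds are illustrative choices rather than recommended values.

```python
from dataclasses import dataclass


@dataclass
class DomainHit:
    """One reported match between your sequence and a domain profile.

    Hypothetical record for illustration; not the format of any real server.
    """
    domain: str
    e_value: float       # assumed score: expected number of chance matches this good
    match_start: int     # first profile position covered by the alignment (1-based)
    match_end: int       # last profile position covered by the alignment
    domain_length: int   # full length of the domain profile

    def coverage(self) -> float:
        """Fraction of the domain profile covered by the alignment."""
        return (self.match_end - self.match_start + 1) / self.domain_length


def classify_hit(hit: DomainHit,
                 max_e_value: float = 1e-5,    # illustrative threshold (assumption)
                 min_coverage: float = 0.8) -> str:  # illustrative threshold (assumption)
    """Trust a hit only if it is both statistically strong and nearly complete."""
    if hit.e_value <= max_e_value and hit.coverage() >= min_coverage:
        return "trusted"
    return "needs more evidence"


if __name__ == "__main__":
    # Made-up example hits, not real results.
    hits = [
        DomainHit("domain_A", 1e-40, 1, 250, 255),  # strong score, nearly full length
        DomainHit("domain_B", 0.5, 1, 55, 57),      # full length, but weak score
        DomainHit("domain_C", 1e-12, 1, 40, 100),   # good score, only part of the domain
    ]
    for hit in hits:
        print(hit.domain, classify_hit(hit))
```

Note that a hit labeled "needs more evidence" isn't necessarily wrong. It simply falls into the category where you shouldn't believe the match until the alignment or some other source confirms it.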