We find the DOMAIN keyword used again to indicate a serine-rich segment,
and the region of the sequence responsible for the protein kinase activity.
This confirms what we’ve just said about the loose usage of this term in
Swiss-Prot. Let’s keep moving, however, in search of newer terms:
NP_BIND: The numbers here indicate the extent of the nucleotide
phosphate-binding region (residues 718 to 726), which binds ATP.
BINDING: Here we can see the precise binding site for ATP — on lysine
at position 745.
ACT_SITE: The numbers here indicate amino acids involved in the activity
of an enzyme.
To the nonspecialist, the three Key field terms listed above address
fairly overlapping concepts.
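One way to keep these keys apart is to treat each feature as a key plus a
residue range. The following is only a minimal Python sketch, not anything
provided by Swiss-Prot: the Feature class and the features_at helper are
hypothetical names, and the only coordinates used are the two quoted above
(the nucleotide-binding region, written NP_BIND in Swiss-Prot entries, from
718 to 726, and BINDING at 745).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """One row of a feature table (hypothetical helper class)."""
    key: str        # feature key, e.g. "NP_BIND"
    start: int      # first residue, 1-based and inclusive
    end: int        # last residue, 1-based and inclusive
    note: str = ""  # free-text description, e.g. the ligand

    def covers(self, position: int) -> bool:
        """Return True if the residue position lies inside this feature."""
        return self.start <= position <= self.end

    @property
    def length(self) -> int:
        """Number of residues spanned by the feature."""
        return self.end - self.start + 1


# Only the two features whose coordinates are given in the text.
features = [
    Feature("NP_BIND", 718, 726, "ATP"),
    Feature("BINDING", 745, 745, "ATP"),
]


def features_at(position, feats):
    """List the keys of all features that cover a given residue."""
    return [f.key for f in feats if f.covers(position)]


print(features_at(720, features))  # ['NP_BIND']
print(features_at(745, features))  # ['BINDING']
print(features[0].length)          # 9
```

Seen this way, NP_BIND describes a stretch of residues, whereas BINDING
points at a single residue.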
COMPBIAS: Extent of a compositionally biased (in this case, serine-rich)
region of the protein sequence.
Next comes some information about residues that are chemically modified
after translation (as shown in Figure 4-5). The corresponding keywords are:
MOD_RES: The numbers here indicate residues that undergo phosphorylation.
Click MOD_RES to see the list of chemical modifications associated with
that keyword.
CARBOHYD: Here we can see the numerous residues to which carbohydrate
molecules are attached, as well as the type of link (C–, N–, or O–).
Note that they’re all in the extracellular domain, as one would expect.
Clicking the CARBOHYD keyword can tell you much more about the intricacies
of glycosylation.
DISULFID: The numbers here indicate cysteine pairs that form a disulfide
bond in the mature protein; there are plenty of such pairs here.
Finally, the Features section ends with a series of specialized features:
VAR_SEQ: This field is used to indicate amino-acid changes that result
from translating the different mRNAs generated by alternative splicing
(each one giving rise to a protein isoform).
VARIANT: This field identifies natural variations in the EGFR protein
sequence, most of which have been associated with lung cancer.
MUTAGEN: In contrast with the previous field, this field is used to
record sequence changes that have been deliberately introduced in the
protein by experiment.
CONFLICT: This feature is used to indicate discrepancies between different
sources of the same protein sequence (basically errors or unrecognized
polymorphisms).
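The feature keys described above can be sorted into a few broad groups. The
sketch below is one hedged way to do that in Python. It assumes the classic
flat-file layout in which each feature line starts with FT, followed by the
key, the start position, the end position, and an optional description;
newer releases may lay out features differently. The category labels, the
parse_ft and group_by_category functions, and the two sample lines (built
from the coordinates quoted earlier) are all illustrative assumptions, not
an official parser or real entry text.

```python
from collections import defaultdict

# Grouping of feature keys following the order used in the text.
# The category labels are hypothetical; the disulfide key is spelled
# DISULFID in Swiss-Prot entries.
CATEGORIES = {
    "DOMAIN": "regions",
    "COMPBIAS": "regions",
    "NP_BIND": "binding and activity",
    "BINDING": "binding and activity",
    "ACT_SITE": "binding and activity",
    "MOD_RES": "post-translational modification",
    "CARBOHYD": "post-translational modification",
    "DISULFID": "post-translational modification",
    "VAR_SEQ": "sequence variation",
    "VARIANT": "sequence variation",
    "MUTAGEN": "sequence variation",
    "CONFLICT": "sequence variation",
}

# Simplified, illustrative feature lines (assumed layout, see above).
SAMPLE = """\
FT   NP_BIND     718    726       ATP
FT   BINDING     745    745       ATP
"""


def parse_ft(text):
    """Yield (key, start, end, note) for each simple FT line in text."""
    for line in text.splitlines():
        if not line.startswith("FT"):
            continue
        parts = line.split(None, 4)
        if len(parts) < 4:
            continue  # too short to be a key/start/end line
        key, start, end = parts[1:4]
        if not (start.isdigit() and end.isdigit()):
            continue  # skip continuation lines and fuzzy positions
        note = parts[4] if len(parts) > 4 else ""
        yield key, int(start), int(end), note


def group_by_category(rows):
    """Group parsed features under the category of their key."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[CATEGORIES.get(row[0], "other")].append(row)
    return dict(grouped)


for category, rows in group_by_category(parse_ft(SAMPLE)).items():
    print(category, rows)
```

Positions that are not plain integers are skipped rather than guessed at,
which keeps the sketch short but means it would miss features with
uncertain boundaries.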
Chapter 4: Using Protein and Specialized Sequence Databases