You have many ways of landing on the Swiss-Prot database (or the UniProt
knowledge base, as they also call it now!). Most genomic databases, genome
browsers, or sites running a sequence retrieval system (SRS) link you directly
to the relevant Swiss-Prot entry. If you want to know how to query Swiss-Prot
to find an entry by keyword, go to Chapter 2, where we explain how to do this
in some detail.
Part II: A Survival Guide to Informatics
Swiss-Prot: A personal vision of the protein world
Although GenBank and Swiss-Prot occupy sim-
ilarly central roles for the international commu-
nity of biologists, their overall philosophies
aren’t quite the same. GenBank, as a primary
sequence repository, obeys a relatively strict
historical point of view. In GenBank, the authors
have full authority over the content of the
entries they submit. GenBank annotators are
only responsible for the recently introduced
RefSeq entries — the ones they derive from
their own expert analysis of the community-sub-
mitted entries. However, the RefSeq entries
never replace the original GenBank entries, and
all these layers of information are maintained
side by side.
In contrast, Swiss-Prot is not a repository data-
base but a derived information resource, con-
veying the vision of its head and founder, Amos
Bairoch (helped out by a group of experts). As a
consequence, the only truly current Swiss-Prot
version is the one on Amos’ portable computer.
This idea of “personal vision” also means that
Amos Bairoch doesn’t need anybody’s permis-
sion to correct or change a Swiss-Prot entry on
the spot. To do this, Amos simply needs to be
convinced by an expert, or by his own evalua-
tion of the literature, that a change is necessary.
Believe us, we have seen this happening on our
own kitchen table!
The benefit of such a philosophy is obviously
great when it comes to flexibility; blatant errors
can be removed very quickly while hot discover-
ies are incorporated immediately. This is the
reason why Swiss-Prot is considered the best-
annotated protein database, and why it occupies
such a pivotal role in molecular biology and
genomics. The downside involves the (inevitable)
upper limit of what the best researcher in the
world can do (or supervise), even when helped
by a large team of annotators — at the European
Bioinformatics Institute (EBI) in Hinxton as well
as the Swiss Institute of Bioinformatics (SIB) in
Geneva — and a circle of experts. These days,
Swiss-Prot has trouble coping with the present
rate of new (nucleotide) sequence determination
and is falling behind in terms of completeness.
To alleviate this problem, a Swiss-Prot buffer
has been created that is called TrEMBL (for
automatic TRanslation of European Molecular
Biology Laboratory nucleotide sequences).
TrEMBL entries are generated at the EBI from
GenBank submissions and annotated mostly
automatically, using sequence similarity as a
main criterion. Upon visual inspection, manual
correction, and final approval by Amos Bairoch,
TrEMBL entries are then converted into bona fide
Swiss-Prot entries.
The idea that such a key worldwide information
resource rests on a single man’s shoulders may
come as a shock to you. However, this sort of
thing has been quite common since the early days
of bioinformatics. Among other famous examples
of one-man shows, we can cite Elvin Kabat’s
immunoglobulin sequence database, Richard J.
Roberts’ Restriction Enzyme Database, or Victor
McKusick’s database of human genetic dis-
eases (Online Mendelian Inheritance in Man,
OMIM). To paraphrase Sir Winston Churchill: “In
the field of bioinformatics, rarely have so many
owed so much to so few!”