second-generation nucleotide-sequence databases have adopted a more
gene-centric perspective, where all the sequence information relevant to a given
gene is made accessible at once. In addition to unraveling the mysteries of
the more traditional GenBank, then, we also show you how to use NCBI’s
Entrez Gene, a good example of a gene-centric database.
These days, nucleotide sequences are routinely determined at the whole-genome
or chromosome scale, at least for microorganisms. We now have
information not only about individual gene sequences, but also about their
relative positions, strand orientation, and the presence or absence of
biochemical functions within an entire organism. To take advantage of this more
global information, researchers have had to design state-of-the-art
genome-centric sequence-information management systems that can connect
specialized sequence collections with browsing tools. As an added bonus, this
chapter presents some of the great genome-centric resources dealing with
viruses, bacteria, or (you guessed it) human beings.
However, before you start delving into these information-management systems
in earnest, you may find it useful to read the next section. There, we quickly
summarize the fundamentals of genes and genomes and introduce the vocabulary
you need to read GenBank fluently. If you’re an experienced biologist, you
can skip this section completely, or read it word for word in hopes of finding
some embarrassing mistake. Go ahead. We dare you! Conversely, if you want to
learn much more, we advise you to go get Genetics For Dummies, by Tara
Rodden Robinson, a great companion book for would-be bioinformaticians.
Reading into Genes and Genomes
True, nucleotide sequences are universal, but the structure of the genes they
encode is markedly different between prokaryotic organisms (organisms lacking
a true nucleus) and eukaryotic organisms (the kind lucky enough to have
a true nucleus). You have to know the basic architecture of both types of
genomes and genes to make sense of a simple GenBank entry. Remember that
we need to define a gene as the contiguous genome segment encompassing
all the nucleotide-sequence information necessary to bring about its
successful expression, that is, the production of protein or RNA. This handy
definition includes both the coding and regulatory parts of a gene.
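If you like to see a definition spelled out in code, here is a minimal Python sketch of the gene definition above: one contiguous genome segment that covers both a regulatory part and a coding part, read from one strand or the other. Everything in it is an illustrative assumption rather than anything GenBank prescribes: the names Segment, Gene, span, and gene_sequence are made up for this example, coordinates are 0-based with an exclusive end, and a real gene may have several regulatory or coding pieces.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """A stretch of the genome; start is inclusive, end is exclusive (0-based)."""
    start: int
    end: int


@dataclass(frozen=True)
class Gene:
    """Hypothetical toy model: one regulatory part plus one coding part."""
    name: str
    strand: str          # "+" for the forward strand, "-" for the reverse strand
    regulatory: Segment  # for example, a promoter region
    coding: Segment

    def span(self) -> Segment:
        """Smallest contiguous segment covering both parts of the gene."""
        return Segment(
            min(self.regulatory.start, self.coding.start),
            max(self.regulatory.end, self.coding.end),
        )


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def gene_sequence(genome: str, gene: Gene) -> str:
    """Extract the whole gene (regulatory + coding) as read from its own strand."""
    s = gene.span()
    piece = genome[s.start:s.end]
    return piece if gene.strand == "+" else reverse_complement(piece)
```

On a toy genome such as "AAACCCGGGTTT", a forward-strand gene with its regulatory part at 0 to 3 and its coding part at 3 to 9 spans positions 0 to 9. A reverse-strand gene keeps its regulatory part on the other side of the coding part, which is why strand orientation matters as soon as you start reading coordinates.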
Prokaryotes: Small bugs, simple genes
The three most basic classes of living organisms are the bacteria (the
best-known prokaryotes), the archaea (bacteria-like organisms, many of which
live in extreme conditions), and the eukaryotes. Eukaryotes go all the way
from microscopic yeast to plants and animals, humans included.