in segmental duplication. Blocks of similar genes in
the same order are found throughout the human
genome. Chromosome 19 seems to have been the
biggest borrower, sharing blocks of genes with
16 other chromosomes.
Multigene families. As more has been learned about
eukaryotic genomes, many genes have been found to exist
as parts of multigene families, groups of related but
distinctly different genes that often occur together in
clusters. These genes appear to have arisen from a single
ancestral gene that duplicated during an uneven meiotic
crossover in which genes were added to one chromosome
and subtracted from the other. These multigene families
may include silent copies called pseudogenes, which are
inactivated by mutation.
Tandem clusters. Identical copies of genes can also be found
in tandem clusters. These genes are transcribed
simultaneously, increasing the amount of mRNA available
for protein production. Tandem clusters also include
genes that do not encode proteins, such as clusters of
rRNA genes.
Noncoding DNA in eukaryotes
One of the most notable characteristics of eukaryotic genomes is the
amount of noncoding DNA they possess. The Human Genome Project has
revealed a particularly startling picture. Each of your cells has
about 6 feet of DNA stuffed into it, but of that, less than 1 inch
is devoted to genes! Nearly 99% of the DNA in your cells is
non-protein coding DNA.
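The "nearly 99%" figure follows directly from the lengths quoted above. A minimal check of the arithmetic, treating the 1 inch devoted to genes as an upper bound (the variable names are illustrative only):

```python
# Figures taken from the text: about 6 feet of DNA per cell,
# of which less than 1 inch is devoted to genes.
INCHES_PER_FOOT = 12
total_inches = 6 * INCHES_PER_FOOT       # 72 inches of DNA per cell
coding_inches = 1                        # upper bound given in the text

coding_percent = 100 * coding_inches / total_inches
noncoding_percent = 100 - coding_percent

print(f"coding:    at most {coding_percent:.1f}%")
print(f"noncoding: at least {noncoding_percent:.1f}%")
```

One inch out of 72 is about 1.4%, so at least about 98.6% of the DNA is noncoding, which rounds to the "nearly 99%" stated above.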
True genes are scattered about the human genome in
clumps among the much larger amount of noncoding DNA,
like isolated oases in a desert. Seven major sorts of noncod-
ing human DNA have been described. (Table 18.1 shows
the composition of the human genome, including noncod-
ing DNA.)
Noncoding DNA within genes. As discussed in chapter 15, a
human gene is not simply a stretch of DNA, like the letters
of a word. Instead, a human gene is made up of numerous
fragments of protein-encoding information (exons)
embedded within a much larger matrix of noncoding DNA
(introns). Together, introns make up about 24% of the
human genome and exons less than 1.5%.
Structural DNA. Some regions of the chromosomes remain
highly condensed, tightly coiled, and untranscribed
throughout the cell cycle. Called constitutive
heterochromatin, these portions tend to be localized around
the centromere or located near the ends of the
chromosome, at the telomeres.
Simple sequence repeats. Scattered about chromosomes are
simple sequence repeats (SSRs). An SSR is a 1- to 6-nt
sequence such as CA or CGG, repeated like a broken
record thousands and thousands of times. SSRs can arise
from DNA replication errors. SSRs make up about 3% of
the human genome.
Segmental duplications. Blocks of genomic sequences
composed of 10,000 to 300,000 bp have duplicated
and moved either within a chromosome or to a
nonhomologous chromosome.
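The simple sequence repeats described above lend themselves to a short computational illustration. The following is a minimal sketch, not a production repeat finder: it uses a regular expression to locate a 1- to 6-nt unit repeated in tandem. The function name `find_ssrs` and the `min_repeats` threshold are assumptions made for this example; real SSRs can run to thousands of copies.

```python
import re

def find_ssrs(dna, min_repeats=4):
    """Return (start, unit, copies) for each tandem repeat of a 1- to 6-nt unit.

    min_repeats is an illustrative threshold, not a value from the text.
    """
    # ([ACGT]{1,6}?) lazily captures the shortest candidate unit;
    # \1{n,} then requires at least n further back-to-back copies of it.
    pattern = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (min_repeats - 1))
    hits = []
    for match in pattern.finditer(dna.upper()):
        unit = match.group(1)
        copies = len(match.group(0)) // len(unit)
        hits.append((match.start(), unit, copies))
    return hits

if __name__ == "__main__":
    example = "GGATCACACACACATTCGGCGGCGGCGGA"
    for start, unit, copies in find_ssrs(example):
        print(f"{unit} x {copies} starting at position {start}")
```

Because the unit is matched lazily, a run such as CACACACACA is reported with the unit CA rather than CACA.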
codons (TAA, TGA, or TAG) for a distance long enough to
encode a protein. This coding region is referred to as an
open reading frame (ORF). Although ORFs are likely to
be genes, they may or may not actually be translated into a
functional protein. Among putative genes, families of genes
can be identified based on common domains. For example,
genes in the HOX family have a conserved, 180-bp sequence
called the homeobox, which encodes the homeodomain re-
gion of certain transcription factors. Sequences for potential
genes need to be tested experimentally to determine whether
they have a function.
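The search for open reading frames can be sketched in a few lines of code. The example below is a simplified illustration under stated assumptions: it scans only the three forward reading frames and reports any stretch free of the stop codons TAA, TGA, and TAG that is at least `min_codons` long. The function name and the default threshold are hypothetical, and actual gene-finding programs also check for a start codon, examine the reverse strand, and apply further criteria.

```python
STOP_CODONS = {"TAA", "TGA", "TAG"}

def find_orfs(dna, min_codons=30):
    """Return (start, end, frame) for each stop-free run of >= min_codons codons.

    start and end are 0-based positions; end is exclusive and excludes the stop.
    Only the three forward reading frames are scanned (a simplification).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        run_start = frame
        # Step through the sequence one codon at a time in this frame.
        for pos in range(frame, len(dna) - 2, 3):
            if dna[pos:pos + 3] in STOP_CODONS:
                if (pos - run_start) // 3 >= min_codons:
                    orfs.append((run_start, pos, frame))
                run_start = pos + 3   # next run begins after the stop codon
        # A run may also extend to the end of the sequence without a stop.
        end = frame + ((len(dna) - frame) // 3) * 3
        if (end - run_start) // 3 >= min_codons:
            orfs.append((run_start, end, frame))
    return orfs

if __name__ == "__main__":
    sequence = "TAA" + "GCT" * 10 + "TGA"
    print(find_orfs(sequence, min_codons=10))
```

Short stop-free runs occur by chance in every frame, which is why the length threshold matters and why candidate ORFs still need experimental testing.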
Adding information to the basic sequence data, such as
identifying ORFs, is called sequence annotation.
This process is what converts simple sequence data into some-
thing that we can recognize based on landmarks such as regions
that are transcribed and regions that are known or thought to
encode proteins.
Inferring function across species: the BLAST algorithm
It is also possible to search genome databases for sequences
that are homologous to known genes in other species. A re-
searcher who has isolated a molecular clone for a gene of
unknown function can search the database for similar se-
quences to infer function. The tool that makes this possible
is a search algorithm called BLAST (which stands for Basic
Local Alignment Search Tool). Using a networked computer,
one can submit a sequence to the BLAST server and get
back a reply with all possible similar sequences contained in
the sequence database.
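To see what "local alignment" means in practice, consider the sketch below. It is not BLAST: BLAST relies on fast heuristics to search very large databases, whereas this example uses the exact but much slower Smith-Waterman dynamic-programming method to score the best-matching region shared by two sequences. The scoring values (match +2, mismatch -1, gap -2), the function names, and the toy database are all assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Smith-Waterman score of the best local alignment between a and b."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # A score never drops below 0, so an alignment can start anywhere.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best

def rank_database(query, database):
    """Return (score, name) pairs for every database entry, best match first."""
    scored = [(local_alignment_score(query, seq), name)
              for name, seq in database.items()]
    return sorted(scored, reverse=True)

if __name__ == "__main__":
    # Hypothetical sequences, invented for this example.
    toy_database = {"gene_x": "TTACGTTT", "gene_y": "CCCCCC"}
    for score, name in rank_database("ACGT", toy_database):
        print(name, score)
```

Ranking every database entry by its score mimics, on a very small scale, the list of similar sequences that a database search returns.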
Using these techniques, researchers have identified sequences
that are not part of ORFs yet have been conserved over millions
of years of evolution. These sequences may be important
for the regulation of the genes contained in the genome.
Searching for genes, comparing genomes, and assembling
genomes with computer programs are only a few of
the new genomics approaches that fall under the heading
of bioinformatics.
Genomes contain both coding and noncoding DNA
When genome sequences are analyzed, regions that encode
proteins and other regions that do not encode proteins are
revealed. For many years investigators had known of the
latter, but they did not know the extent and nature of the
noncoding DNA. We first consider the types of coding
DNA that have been found, then move on to look at types
of noncoding DNA.
Protein-encoding DNA in eukaryotes
Four different classes of protein-encoding genes are found in
eukaryotic genomes, differing largely in gene copy number.
Single-copy genes. Many genes exist as single copies on a
particular chromosome. Most mutations in these genes
result in recessive Mendelian inheritance.
Segmental duplications. Sometimes whole blocks of genes
are copied from one chromosome to another, resulting