If you want to make sense of human data, you need a clear idea of the
current state of that data:
The complete nucleotide sequence of the human genome is now at
hand, except for a few thousand gaps that are still being filled.
This is one reason why new versions (also called “assemblies”) of the
almost-finished human genome are periodically released. This state
of flux is true of all large animal genomes.
This sequence was obtained in raw format; the next challenge is the
annotation of the raw data, that is, creating a detailed and accurate
FEATURES table of the human genome.
Throughout the world, new information on human gene properties and
functions is generated daily, using a wide array of techniques. Ideally,
someone should gather it all, package it nicely, and offer it to the whole
research community for free!
Having doubts that the last goal will ever be attainable in this less-than-perfect
world? Well, think again, because this is where we’re taking you right now.
Finding out about the Ensembl project
The Internet home page of Ensembl (www.ensembl.org) says it all: Ensembl
is a joint project between the European Bioinformatics Institute (EBI) and the
Sanger Institute, both located near Cambridge (U.K.). Together they’ve
developed an integrated database and software system to produce and maintain
automatic annotations for the genomes of animals, with special attention to
our closest relatives: the vertebrates. This project — like the others we
describe in this chapter — relies heavily on the collaboration of a large
number of individual laboratories. It all started as part of the International
Human Genome Project, continued for the Mouse Genome project, and is
now being pursued for other animals. Data and software also flow freely
among numerous national database and bioinformatics centers all
over the world, allowing complex cross-linking to take place.
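Because the data is freely available, you don't have to rely on a browser to find out which assembly Ensembl is currently serving. The following is only a minimal sketch in Python. It assumes that Ensembl's public REST service is reachable at rest.ensembl.org and that its info/assembly endpoint returns JSON containing fields named assembly_name, assembly_date, and karyotype. Neither the address nor the field names come from this chapter, and the function names are our own, so check the current Ensembl documentation before relying on them.

```python
import json
import urllib.parse
import urllib.request

# Assumption: address of Ensembl's public REST service (not given in this chapter).
REST_SERVER = "https://rest.ensembl.org"


def assembly_info_url(species: str) -> str:
    """Build the URL that asks the REST service about a species' assembly."""
    safe_species = urllib.parse.quote(species)
    return f"{REST_SERVER}/info/assembly/{safe_species}?content-type=application/json"


def summarize_assembly(payload: dict) -> dict:
    """Keep only a few fields of interest; fields that are absent become None or []."""
    return {
        "assembly_name": payload.get("assembly_name"),
        "assembly_date": payload.get("assembly_date"),
        "karyotype": payload.get("karyotype", []),
    }


def fetch_assembly_summary(species: str = "homo_sapiens", timeout: float = 30.0) -> dict:
    """Download the assembly description for a species (needs Internet access)."""
    with urllib.request.urlopen(assembly_info_url(species), timeout=timeout) as response:
        return summarize_assembly(json.load(response))
```

Calling fetch_assembly_summary("homo_sapiens") from a machine with Internet access should report the name of the human assembly that Ensembl serves at that moment. Because assemblies are periodically re-released, expect the answer to change over time.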
Getting started on the Ensembl site
Considering how complex the human genome is, you may not be surprised to
find that you can approach the Ensembl resources from many different angles.
To quickly find out about your options, the best thing to do is to take the
guided tour that Ensembl offers on its home page. Here’s how it’s done:
1. Point your browser to www.ensembl.org/.
The impressive Ensembl home page appears, as shown in Figure 3-23.