
ning and image analysis will assign each such position a small intensity level that
corresponds to the local background. For each regular spot, the probability that it differs
from the negative sample can then be estimated (Fig. 9.2 d), either with outlier criteria or
by direct comparison with the negative sample (Kahlem et al. 2004).
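A minimal sketch of the direct comparison follows. It assumes the negative sample is available as a plain list of background intensities from the empty spots. The spot's signal is located in the empirical cumulative distribution function (ECDF) of those intensities, and the 0.9 cutoff is the one given in the caption of Fig. 9.2. The function names are hypothetical and do not come from Kahlem et al. (2004).

```python
from bisect import bisect_right


def ecdf_value(negative_controls, signal):
    """Fraction of negative-control (empty-spot) intensities that are <= signal."""
    ordered = sorted(negative_controls)
    return bisect_right(ordered, signal) / len(ordered)


def is_valid(negative_controls, signal, threshold=0.9):
    """Call a spot valid ("expressed") when its signal exceeds the given
    proportion of background intensities (0.9 as in the Fig. 9.2 caption)."""
    return ecdf_value(negative_controls, signal) > threshold


# Hypothetical background intensities from ten empty spots.
negatives = [10, 12, 11, 9, 13, 10, 12, 11, 14, 10]
print(ecdf_value(negatives, 12))  # 0.8: signal at background level
print(is_valid(negatives, 20))    # True: signal above all background spots
```

An outlier criterion (for example, a cutoff a few standard deviations above the mean background) would be a parametric alternative. The ECDF makes no assumption about the shape of the noise distribution.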
Signal validity indices can be used as an additional qualifier for the expression ratio.
Suppose we compare a gene’s expression in two different conditions (A and B); we can then
distinguish four cases (1 = signal is valid, 0 = signal is invalid):
A  B  Ratio    Interpretation                                                Possible marker
1  1  Valid    Gene expression is detectable in both conditions.           Yes
1  0  Invalid  Gene expression is detectable in condition A but not in B.  Yes
0  1  Invalid  Gene expression is detectable in condition B but not in A.  Yes
0  0  Invalid  Gene expression is detectable in neither condition.         No
Probes belonging to the fourth case should be removed from further analysis, since they
represent genes that either are expressed in neither condition or cannot be detected by
the microarray procedure (possibly because too few molecules are present). This occurs
fairly often in practice, since only a fraction of the genes on the array is actually
expressed in the tissue under analysis. The other three cases might reveal potential
targets, but the expression ratio is meaningful only in the first case, where both
conditions generate valid signals.
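The table and the filtering rule can be sketched as follows. This assumes each gene comes with raw signals and validity flags for conditions A and B, for example from a negative-control comparison like the one described above. The input layout, the function name and the use of a log2 ratio are illustrative choices and are not part of the original method.

```python
import math


def apply_validity_table(genes):
    """Filter genes with the four-case validity table.

    `genes` maps a gene name to (signal_A, valid_A, signal_B, valid_B).
    Genes invalid in both conditions (case 0/0) are dropped. The others are
    kept as possible markers. The log2 ratio A/B is computed only for case
    1/1, because a ratio involving a background-level signal is not meaningful.
    """
    kept = {}
    for name, (sig_a, ok_a, sig_b, ok_b) in genes.items():
        if not ok_a and not ok_b:
            continue  # case 0/0: remove from further analysis
        if ok_a and ok_b:
            kept[name] = ("1/1", math.log2(sig_a / sig_b))
        elif ok_a:
            kept[name] = ("1/0", None)  # detectable in A only, no valid ratio
        else:
            kept[name] = ("0/1", None)  # detectable in B only, no valid ratio
    return kept


# Hypothetical signals and validity flags for four genes.
genes = {
    "g1": (800.0, True, 200.0, True),
    "g2": (500.0, True, 15.0, False),
    "g3": (12.0, False, 300.0, True),
    "g4": (11.0, False, 13.0, False),
}
print(apply_validity_table(genes))
# {'g1': ('1/1', 2.0), 'g2': ('1/0', None), 'g3': ('0/1', None)}
```

Keeping the one-sided cases with an explicit `None` ratio, rather than computing a ratio against a background-level value, keeps them available as on/off candidates. It also prevents them from producing misleadingly large fold changes.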
9.1 Data Capture
Fig. 9.2 Image analysis and data acquisition.
(a) Visualization of individual sub-grid adjustment with Visual Grid (GPC Biotech AG).
Spotting patterns show the geometry of the cDNAs organized in sub-grids. Local background
can be assigned to each spot by defining specific neighborhoods.
(b) Simulated data were analyzed with three image analysis programs representing manual
(green bars), semiautomated (red bars), and automated (blue bars) procedures. The purpose
of the simulation was to compare the reproducibility of the signals by replicated analysis
of perturbed signals, measured as the coefficient of variation (CV). The histogram shows
the frequencies (y-axis) over the range of the CV (x-axis).
(c) Affymetrix geometry employs successive printing of gene representatives
(oligonucleotide probes). Approximately 20 different oligonucleotides spread across the
gene sequence are immobilized (perfect matches, PM). Next to each PM is a mismatch (MM)
probe that differs from it by a single base and serves as an estimator for the local
background. A PM-MM pair is called a probe pair, and the whole set of PM-MM pairs for the
same gene is called a probe set. After image analysis the feature values are condensed to
a single value reflecting the gene’s concentration in the target sample.
(d) Spot validity can be judged by a negative control sample distributed on the array.
After quantification a small, nonzero intensity is assigned to each of these empty spots,
reflecting the amount of background signal on the array. Since these positions are spread
uniformly over the array, the distribution of their signals reflects the distribution of
signal noise in this experiment and indicates whether a signal is at background level or
reflects a reliable expression level. If the cumulative distribution function of the
background signals, evaluated at the spot’s signal, is close to one (blue line), this
indicates that the cDNA is expressed in the tissue, whereas low values reflect noise (red
line). In practice cDNAs are considered “expressed” when their signal exceeds more than
90% of the background signals (a cumulative proportion above 0.9), a threshold consistent
with the limit of visual detection of the spots.
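Panel (c) states that the probe-pair feature values are condensed into a single value per gene. One simple way to do this is the average PM − MM difference, sketched here under the assumption that each probe set is given as a list of (PM, MM) intensity pairs. A median is shown alongside because a single aberrant probe pair can dominate the mean. This is an illustration of the condensation step, not the specific algorithm used by Affymetrix software.

```python
from statistics import mean, median


def average_difference(probe_pairs):
    """Condense a probe set to one value: the mean of PM - MM over all pairs.

    Each MM acts as the local background estimate for its PM partner.
    """
    if not probe_pairs:
        raise ValueError("probe set is empty")
    return mean(pm - mm for pm, mm in probe_pairs)


def median_difference(probe_pairs):
    """Same condensation, but robust to a few outlying probe pairs."""
    if not probe_pairs:
        raise ValueError("probe set is empty")
    return median(pm - mm for pm, mm in probe_pairs)


# Hypothetical probe set of five (PM, MM) pairs; the last pair is an outlier.
pairs = [(420, 120), (380, 100), (450, 150), (400, 110), (2000, 100)]
print(average_difference(pairs))  # 614: pulled up by the outlier
print(median_difference(pairs))   # 300
```

Production pipelines typically use more robust, often log-scale, summaries. The contrast between mean and median above shows why robustness matters at this step.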