
40 4 Empirical Orthogonal Functions
or introducing the synoptic vectors $\mathbf{x}_j = [x_1(j), x_2(j), \ldots, x_m(j)]$ for $j = 1, 2, \ldots, n$, we have the matrix $X$ defined in terms of column vectors,

$$X = [\mathbf{x}_1, \mathbf{x}_2, \ldots, \mathbf{x}_n],$$

where the rows of the matrix now describe the values at the same spatial location.¹
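As a small illustration of this layout, the following NumPy sketch assembles a data matrix from synoptic vectors. The sizes `m` and `n`, the random values, and all variable names are assumptions made for the example only; they do not come from the data sets discussed in the text.

```python
import numpy as np

m, n = 3, 5  # hypothetical sizes: m spatial points, n observation times

# One synoptic vector per time j, each holding the m values x_1(j), ..., x_m(j).
# Random numbers stand in for real observations.
rng = np.random.default_rng(0)
synoptic_vectors = [rng.normal(size=m) for _ in range(n)]

# X = [x_1, x_2, ..., x_n]: every synoptic vector becomes one column.
X = np.column_stack(synoptic_vectors)
print(X.shape)  # (3, 5)

# A column is the field at one time; a row is the time series at one location.
map_at_third_time = X[:, 2]
series_at_first_location = X[0, :]

# The n x m layout used in many other fields is simply the transpose.
X_T = X.T
```

With this convention, selecting a column returns a whole map, while selecting a row follows a single location through time.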
In the analysis of meteorological or climatological data it is very common that
time series come from observations or from numerical simulations taken at regular
intervals. A typical example is a set of temperatures taken at several
stations around the world, grouped in monthly means, so that only one value per
month is available. In this case the rows of the data matrix $X$ indicate the time
series at each station, whereas the synoptic vectors, i.e. the columns of the data matrix,
describe the geographic distribution of the temperature at a particular time. The
time evolution of the temperature can then be followed by looking at the sequence
of geographical maps depicting the temperature distribution every month. Usually,
for interpretation purposes it is useful to use contouring algorithms to represent the
field as a smooth two-dimensional function. Contouring is not a trivial operation
and, especially for observations that are distributed in space with large gaps and are
far from covering the surface of the Earth regularly, it must be done with care.
Sophisticated techniques, known as data assimilation, are employed to make sure
the data sets are put on regular grids in a physically consistent manner. In any case,
whether we are looking at data from modelling or working with observations coming
from data assimilation systems, we end up with data on regular grids, covering the
Earth with a regular pattern.
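To make the idea of monthly means concrete, here is a minimal sketch of how daily values at one station could be reduced to one value per month. The daily series is synthetic and the single non-leap year is an assumption chosen to keep the example short; real data sets would also need to handle leap years and missing values.

```python
import numpy as np

# Month lengths for one hypothetical non-leap year.
days_per_month = np.array([31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31])

# Synthetic daily temperatures for a single station (illustrative values only).
rng = np.random.default_rng(0)
daily = 15.0 + rng.normal(0.0, 5.0, size=days_per_month.sum())

# Index of the first day of each month within the year.
month_starts = np.concatenate(([0], np.cumsum(days_per_month)[:-1]))

# Sum the days belonging to each month, then divide by the month length.
monthly_mean = np.add.reduceat(daily, month_starts) / days_per_month
```

The resulting array has twelve entries, one per month, which is the form in which such a station would contribute to the time series described above.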
As already mentioned, we use two data sets to illustrate our discussion. The first
is a time series of monthly mean geopotential data at 500 mb (Z500), obtained from
a simulation with a general circulation model forced by observed values of monthly
mean Sea Surface Temperatures (SST). The data sets cover 34 years, corresponding
to the calendar years 1961–1994. The Z500 set is a very good indicator of upper
air flow, since the horizontal wind is predominantly aligned along the geopotential
isolines. Figure 4.1 shows a few examples taken from the data set. Note the
large variability from one month to the next (top panels), but also the
large variability at the same point, as the time series over the entire period (lower
panels) show. It is clear that the geopotential at 500 mb is characterized by intense
variability in space and time, and a typical month may be as different from the next
month as from another one chosen at random.
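A sequence of monthly maps on a regular grid can be arranged into a data matrix with synoptic vectors as columns, as in the hedged sketch below. Only the number of months (34 years of 12 months) is taken from the text; the grid dimensions `n_lat` and `n_lon`, the array name `z500`, and the random values are hypothetical stand-ins for the actual simulation output.

```python
import numpy as np

n_years = 34              # calendar years 1961-1994, as stated in the text
n_months = n_years * 12   # 408 monthly mean maps
n_lat, n_lon = 4, 8       # hypothetical tiny grid; real grids are much larger

# Synthetic stand-in for the monthly maps, indexed as z500[time, lat, lon].
rng = np.random.default_rng(1)
z500 = 5500.0 + 100.0 * rng.normal(size=(n_months, n_lat, n_lon))

# Flatten every map into a synoptic vector of length m = n_lat * n_lon
# and store these vectors as the columns of X (m x n).
m = n_lat * n_lon
X = z500.reshape(n_months, m).T

# Column j is the map of month j; reshaping it recovers the 2-D field.
map_10 = X[:, 10].reshape(n_lat, n_lon)

# A row is the time series at one grid point (here lat index 2, lon index 5).
i, j = 2, 5
series = X[i * n_lon + j, :]
```

Flattening in row-major order is an arbitrary choice; any fixed ordering of the grid points works as long as the same ordering is used to map vectors back to the grid.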
We can consider the map for each month as a vector in a special vector space, the
data space. Each vector in this data space represents a map, a possible case of a Z500
monthly mean. The space covers all possible shapes of the Z500, the vast majority
of which will never be realized, like the one in which the heights are constant everywhere,
or some other similarly strange construction. The mathematical dimension of
¹ This matrix notation is very common in meteorological data, whereas in many other fields data
are stored as an $n \times m$ matrix, namely as $X^T$. This difference affects the whole notation in later
chapters, when defining the covariance matrix and other statistical quantities.