K-means cluster analysis: A method of cluster analysis in which from an initial partition of the
observations into K clusters, each observation in turn is examined and reassigned, if
appropriate, to a different cluster in an attempt to optimize some predefined numerical
criterion that measures in some sense the ‘quality’ of the cluster solution. Many such
clustering criteria have been suggested, but the most commonly used arise from considering
features of the within groups, between groups and total matrices of sums of squares and
cross products (W, B, T) that can be defined for every partition of the observations
into a particular number of groups. The two most common of the clustering criteria arising
from these matrices are
minimization of trace(W)
minimization of determinant(W)
The first of these has the tendency to produce ‘spherical’ clusters, the second to produce
clusters that all have the same shape, although this will not necessarily be spherical. See also
agglomerative hierarchical clustering methods, divisive methods and hill-climbing
algorithm. [MV2 Chapter 10.]
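The reassignment idea described in this entry can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the procedure of any particular package: the function names (centroid, trace_w, k_means_reassign) are hypothetical, the criterion is trace(W) (which equals the total squared distance of the observations from their cluster centroids), observations are never moved out of a cluster of size one, and the determinant(W) criterion is not implemented.

```python
def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return [sum(p[j] for p in points) / n for j in range(len(points[0]))]


def trace_w(data, labels, k):
    """trace(W): sum of squared distances of observations to their cluster centroid."""
    total = 0.0
    for g in range(k):
        members = [x for x, lab in zip(data, labels) if lab == g]
        if not members:
            continue
        c = centroid(members)
        total += sum((xj - cj) ** 2 for x in members for xj, cj in zip(x, c))
    return total


def k_means_reassign(data, labels, k, max_sweeps=100):
    """Examine each observation in turn and move it to another cluster
    whenever that move lowers trace(W); stop when a full sweep moves nothing."""
    labels = list(labels)
    best = trace_w(data, labels, k)
    for _ in range(max_sweeps):
        moved = False
        for i in range(len(data)):
            current = labels[i]
            if labels.count(current) == 1:
                continue  # assumption: never empty a cluster
            for g in range(k):
                if g == current:
                    continue
                labels[i] = g  # tentatively reassign observation i
                value = trace_w(data, labels, k)
                if value < best - 1e-12:
                    best, current, moved = value, g, True
                labels[i] = current  # keep the best assignment found so far
        if not moved:
            break
    return labels, best


# Example: two well-separated groups, deliberately mixed in the initial partition.
data = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
final_labels, criterion = k_means_reassign(data, [0, 1, 0, 1, 0, 1], k=2)
```

With this initial partition the sketch ends with the first three observations in one cluster and the last three in the other. Like any hill-climbing procedure it may stop at a local optimum, so the result can depend on the initial partition.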
K-means inverse regression: An extension of sliced inverse regression to multivariate regression
with any number of response variables. The method may be particularly useful in the
‘exploration’ stage of an analysis, before any specific multivariate model is suggested.
[Technometrics, 2004, 46, 421–429.]
Knots: See spline functions.
Knowledge discovery in databases (KDD): A form of data mining which is interactive
and iterative, requiring many decisions by the researcher. [Communications of the ACM,
1996, 39 (11), 27–34.]
Knox’s tests: Tests designed to detect any tendency for patients with a particular disease to form a
disease cluster in time and space. The tests are based on a two-by-two contingency table
formed from considering every pair of patients and classifying them as to whether the
members of the pair were or were not closer than a critical distance apart in space and as to
whether the times at which they contracted the disease were closer than a chosen critical
period. [Statistics in Medicine, 1996, 15, 873–86.]
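As an illustration of how the two-by-two table is built from pairs of patients, a minimal Python sketch follows. The function name knox_table, the representation of each case as (x, y, time), and the use of Euclidean distance are assumptions made for the example; the sketch only tabulates the pairs and does not carry out a significance test.

```python
from itertools import combinations
from math import hypot


def knox_table(cases, critical_distance, critical_time):
    """Classify every pair of cases by closeness in space and in time.

    cases: list of (x, y, t) tuples (hypothetical layout).
    Returns [[a, b], [c, d]] where the row is close in space (yes, no)
    and the column is close in time (yes, no).
    """
    table = [[0, 0], [0, 0]]
    for (x1, y1, t1), (x2, y2, t2) in combinations(cases, 2):
        near_space = hypot(x1 - x2, y1 - y2) < critical_distance
        near_time = abs(t1 - t2) < critical_time
        table[0 if near_space else 1][0 if near_time else 1] += 1
    return table


# Four cases give 4 * 3 / 2 = 6 pairs in total.
cases = [(0, 0, 0), (1, 0, 1), (10, 10, 2), (10, 11, 50)]
table = knox_table(cases, critical_distance=2, critical_time=5)
```

The top-left cell counts pairs that are close in both space and time; an excess in that cell relative to what the margins would suggest is what points to space-time clustering.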
Kolmogorov, Andrei Nikolaevich (1903–1987): Born in Tambov, Russia, Kolmogorov first
studied Russian history at Moscow University, but turned to mathematics in 1922. During
his career he held important administrative posts in the Moscow State University and the
USSR Academy of Sciences. He made major contributions to probability theory and
mathematical statistics including laying the foundations of the modern theory of Markov
processes. Kolmogorov died on 20 October 1987 in Moscow.
Kolmogorov–Smirnov two-sample method: A distribution free method that tests for any
difference between two population probability distributions. The test is based on the
maximum absolute difference between the cumulative distribution functions of the
samples from each population. Critical values are available in many statistical tables.
[Biostatistics: A Methodology for the Health Sciences, 2nd edn, 2004, G. Van Belle,
L. D. Fisher, P. J. Heagerty and T. S. Lumley, Wiley, New York.]
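The test statistic described in this entry, the maximum absolute difference between the two sample cumulative distribution functions, can be computed as in the following Python sketch. The function name ks_two_sample_statistic is hypothetical, and the sketch returns only the statistic; critical values or p-values would still have to come from tables or a statistics library.

```python
from bisect import bisect_right


def ks_two_sample_statistic(sample_a, sample_b):
    """Maximum absolute difference between the two empirical CDFs."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    largest = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every distinct observation.
    for value in sorted(set(a) | set(b)):
        cdf_a = bisect_right(a, value) / len(a)  # proportion of a <= value
        cdf_b = bisect_right(b, value) / len(b)  # proportion of b <= value
        largest = max(largest, abs(cdf_a - cdf_b))
    return largest


# Example: two samples of size four that overlap on two values.
d = ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6])  # d == 0.5
```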
Korozy, Jozsef (1844–1906): Born in Pest, Korozy worked first as an insurance clerk and
then a journalist, writing a column on economics. Largely self-taught, he was appointed
director of a municipal statistical office in Pest in 1869. Korozy made enormous
contributions to the statistical and demographic literature of his age, in particular