Monitoring Committees in Clinical Trials: A Practical Perspective, S. S. Ellenberg,
T. R. Fleming and D. L. DeMets, 2002, Wiley, Chichester.]
Data perturbation: See statistical disclosure limitation.
Data reduction: The process of summarizing large amounts of data by forming frequency distribu-
tions, histograms, scatter diagrams, etc., and calculating statistics such as means, variances
and correlation coefficients. The term is also used when obtaining a low-dimensional
representation of multivariate data by procedures such as principal components analysis
and factor analysis. [Data Reduction and Error Analysis for the Physical Sciences, 1991,
P. R. Bevington and D. K. Robinson, McGraw-Hill.]
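As an illustration of the second sense, a minimal sketch in Python (NumPy, with made-up data) of projecting five-dimensional observations onto their first two principal components via the singular value decomposition:

```python
import numpy as np

# Hypothetical example: reduce 5-dimensional multivariate data to the
# coordinates on its first 2 principal components.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))          # 100 observations, 5 variables

Xc = X - X.mean(axis=0)                # centre each variable
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ Vt[:2].T                 # scores on the first 2 components

print(scores.shape)                    # (100, 2)
```

The singular values `s` are in decreasing order, so the first two rows of `Vt` span the plane capturing the most variance.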
Data science: A term intended to unify statistics, data analysis and related methods. It consists of
three phases: design for data, collection of data and analysis of data. [Data Science,
Classification and Related Methods, 1998, C. Hayashi et al., eds., Springer, Tokyo.]
Data screening: The initial assessment of a set of observations to see whether or not they appear to
satisfy the assumptions of the methods to be used in their analysis. Techniques which
highlight possible outliers, or, for example, departures from normality, such as a normal
probability plot, are important in this phase of an investigation. See also initial data
analysis. [SMR Chapter 6.]
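A minimal sketch, with simulated data, of the quantities behind a normal probability plot; the correlation between the ordered observations and the corresponding normal quantiles serves as a crude screening statistic (the plotting positions (i − 0.5)/n are one common convention, assumed here):

```python
import numpy as np
from statistics import NormalDist

# Simulated sample that really is normal, so the plot should be linear.
rng = np.random.default_rng(1)
x = np.sort(rng.normal(loc=10, scale=2, size=50))

# Plotting positions and the matching standard normal quantiles
n = len(x)
p = (np.arange(1, n + 1) - 0.5) / n
q = np.array([NormalDist().inv_cdf(pi) for pi in p])

# For normal data the points (q, x) lie close to a straight line;
# a correlation well below 1 suggests a departure from normality.
r = np.corrcoef(q, x)[0, 1]
print(round(r, 3))
```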
Data set: A general term for observations and measurements collected during any type of scientific
investigation.
Data smoothing algorithms: Procedures for extracting a pattern in a sequence of observations
when this is obscured by noise. Basically any such technique separates the original series
into a smooth sequence and a residual sequence (commonly called the ‘rough’). For
example, a smoother can separate seasonal fluctuations from briefer events such as identi-
fiable peaks and random noise. A simple example of such a procedure is the moving
average; a more complex one is locally weighted regression. See also Kalman filter and
spline function.
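The smooth/rough decomposition can be sketched with a simple moving average (simulated data; the window length of 5 is an arbitrary choice):

```python
import numpy as np

# A noisy periodic series: signal plus random noise.
rng = np.random.default_rng(2)
t = np.arange(100)
series = np.sin(2 * np.pi * t / 25) + rng.normal(scale=0.3, size=100)

# Moving average of width 5 gives the 'smooth'; the residual is the 'rough'.
window = 5
kernel = np.ones(window) / window
smooth = np.convolve(series, kernel, mode="same")
rough = series - smooth

# By construction the two sequences reconstruct the original series.
print(np.allclose(smooth + rough, series))  # True
```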
Data squashing: An approach to reducing the size of very large data sets in which the data are first
‘binned’ and then statistics such as the mean and variance/covariance are computed on each
bin. These statistics are then used to generate a new sample in each bin to construct a reduced
data set with similar statistical properties to the original one. [Graphics of Large Data Sets,
2006, A. Unwin, M. Theus, and H. Hofmann, Springer, New York.]
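A hypothetical sketch of the idea for a single variable binned into 20 equal-width intervals (the bin count and the five regenerated points per bin are arbitrary choices, not part of any particular squashing algorithm):

```python
import numpy as np

# Large original sample to be squashed.
rng = np.random.default_rng(3)
x = rng.normal(loc=5, scale=2, size=10_000)

# Step 1: bin the data.
edges = np.linspace(x.min(), x.max(), 21)            # 20 bins
which = np.clip(np.digitize(x, edges) - 1, 0, 19)

# Step 2: per-bin statistics; Step 3: regenerate a small sample per bin
# sharing that bin's mean and standard deviation.
per_bin = 5
pieces = []
for b in range(20):
    xb = x[which == b]
    if xb.size == 0:
        continue
    pieces.append(rng.normal(xb.mean(), xb.std(), size=per_bin))
squashed = np.concatenate(pieces)

print(squashed.size)
```

The squashed set (at most 100 points here) preserves the location and spread of the original 10 000 observations bin by bin.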
Data swapping: See statistical disclosure limitation.
Data tilting: A term applied to techniques for adjusting the empirical distribution by altering the data
weights from their usual uniform values, i.e., n^-1 where n is the sample size, to multinomial
weights, p_i for the ith data point. Often used in the analysis of time series. [Journal of the
Royal Statistical Society, Series B, 2003, 65, 425–442.]
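An illustrative sketch using exponential tilting (one common way of choosing the p_i, assumed here) to replace the uniform weights 1/n with multinomial weights whose weighted mean matches a chosen target:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=200)

target = 0.2                     # desired weighted mean (an assumption here)

# Tilted weights p_i proportional to exp(theta * x_i); the weighted mean is
# monotone increasing in theta, so theta can be found by bisection.
def weighted_mean(theta):
    w = np.exp(theta * x)
    p = w / w.sum()
    return p @ x

lo, hi = -5.0, 5.0
for _ in range(60):
    mid = (lo + hi) / 2
    if weighted_mean(mid) < target:
        lo = mid
    else:
        hi = mid
theta = (lo + hi) / 2

p = np.exp(theta * x)
p /= p.sum()                     # multinomial weights replacing 1/n
print(round(p @ x, 3))
```

The weights stay positive and sum to one, so they still define a distribution over the n data points, but one tilted away from the uniform empirical distribution.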
Data theory: Concerned with how observations are transformed into data that can be analyzed.
Data are hence viewed as theory laden in the sense that observations can be given widely
different interpretations, none of which are necessitated by the observations themselves.
[Data Theory and Dimensional Analysis, 1991, W. G. Jacoby, Sage, Newbury Park.]
Data visualization: Interpretable graphical representations of abstract data and their relationships.
See also statistical graphics. [Visualization Handbook, 2004, C. Hansen and C. R. Johnson,
Academic Press, Orlando, Florida.]