
7.1 Performance Measures
It is important to note that neither the training error nor the generalization error
alone is sufficient to quantify the accuracy of a NN; both errors should be considered.
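As a concrete illustration (a minimal sketch, not taken from the text; the synthetic linear task and all variable names are assumptions), the two errors can be estimated on disjoint pattern sets: the empirical training error is measured on the patterns used for fitting, while an estimate of the generalization error is obtained from a held-out set:

```python
import numpy as np

rng = np.random.default_rng(1)

# Assumed synthetic task: linear target with additive noise.
inputs = rng.uniform(-1.0, 1.0, size=(200, 3))
true_w = np.array([1.5, -2.0, 0.5])
targets = inputs @ true_w + 0.1 * rng.standard_normal(200)

# Disjoint training and held-out sets.
x_train, y_train = inputs[:150], targets[:150]
x_test, y_test = inputs[150:], targets[150:]

# A least-squares fit stands in for a trained network.
w, *_ = np.linalg.lstsq(x_train, y_train, rcond=None)

# Empirical (training) error vs. held-out estimate of generalization error.
train_error = float(np.mean((x_train @ w - y_train) ** 2))
gen_error = float(np.mean((x_test @ w - y_test) ** 2))
```

Reporting only `train_error` would hide how the model behaves on unseen patterns, while reporting only `gen_error` would hide how well the training data was actually fitted; both are needed.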
Additional Reading Material on Accuracy
The trade-off between training error and generalization has prompted much research in
the generalization performance of NNs. Average generalization performance has been
studied theoretically to better understand the behavior of NNs trained on a finite
data set. Research shows a dependence of generalization error on the training set, the
network architecture and weight values. Schwartz et al. [767] show the importance of
training set size for good generalization in the context of ensemble networks. Other
research uses the VC-dimension (Vapnik-Chervonenkis dimension) [8, 9, 152, 643] to
derive bounds on the generalization error as a function of network size and training
set size. Best known are the bounds derived by Baum and Haussler [54] and Haussler et
al. [353]. While these bounds are derived for, and therefore restricted to, discrete
input values, Hole derives generalization bounds for real-valued inputs [375].
Limits on generalization have also been developed by studying the relationship between
training error and generalization error. Based on Akaike’s final prediction error and
information criterion [15], Moody derived the generalized prediction error which gives
a limit on the generalization error as a function of the training error, training set size,
the number of effective parameters, and the effective noise variance [603, 604]. Murata
et al. [616, 617, 618] derived a similar network information criterion. Using a different
approach, i.e. Vapnik’s Bernoulli theorem, Depenau and Møller [202] derived a bound
as a function of training error, the VC-dimension and training set size.
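For orientation (this particular inequality is not quoted in the text, and the notation is assumed), bounds of this type typically take a classical Vapnik-style form: with probability at least $1-\delta$, for a network with VC-dimension $h$ trained on $P$ patterns,

```latex
R(w) \;\le\; R_{\mathrm{emp}}(w) \;+\;
\sqrt{\frac{h\left(\ln\frac{2P}{h} + 1\right) + \ln\frac{4}{\delta}}{P}}
```

where $R(w)$ denotes the true risk and $R_{\mathrm{emp}}(w)$ the empirical (training) error. The bound grows with the VC-dimension $h$ and shrinks with the training set size $P$, matching the dependence on training error, VC-dimension and training set size described above.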
These research results give limits, sometimes overly pessimistic ones, that help to
clarify the behavior of generalization and its relationship with architecture, training
set size and training error. Another important issue in the study of generalization is
overfitting. Overfitting means that the NN learns too much detail, effectively
memorizing training patterns. This normally happens when the network complexity does
not match the size of the training set, i.e. the number of adjustable weights (free
parameters) is larger than the number of independent patterns. In this case the
weights fit individual patterns and even capture noise. Overfitting is a consequence
of training on a finite data set while minimizing the empirical error function given
in equation (7.2), which differs from the true risk function given in equation (7.1).
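The overfitting effect can be made concrete with a small sketch (an illustrative example, not from the text; the polynomial models and all names are assumptions). A model with as many free parameters as training patterns interpolates the data, noise included, driving its training error to essentially zero while its error on fresh patterns remains large; a smaller model does not memorize:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy training patterns from an underlying smooth target.
x_train = np.linspace(-1.0, 1.0, 10)
y_train = np.sin(np.pi * x_train) + 0.3 * rng.standard_normal(10)

# Independent test patterns from the same distribution.
x_test = np.linspace(-0.95, 0.95, 50)
y_test = np.sin(np.pi * x_test) + 0.3 * rng.standard_normal(50)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Degree 9 => 10 free parameters for 10 patterns: the fit
# interpolates the training data, capturing the noise.
overfit = np.polyfit(x_train, y_train, 9)
# Degree 3 => far fewer parameters than patterns.
simple = np.polyfit(x_train, y_train, 3)

train_over = mse(overfit, x_train, y_train)   # essentially zero
test_over = mse(overfit, x_test, y_test)      # much larger
train_simple = mse(simple, x_train, y_train)  # noise-level residual
test_simple = mse(simple, x_test, y_test)
```

Here minimizing the empirical error alone rewards the degree-9 model, even though its behavior between training patterns is poor; only the held-out error exposes the overfitting.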
Amari et al. developed a statistical theory of overtraining in the asymptotic case
of large training set sizes [22, 21]. They analytically determine the ratio in which
patterns should be divided into training and test sets to obtain optimal generalization
performance and to avoid overfitting. Overfitting effects under large, medium and
small training set sizes have been investigated analytically by Amari et al. [21] and
Müller et al. [612].