Dw1, Dw2 and replaces the two matrix difference equations (6-29) with four
difference equations
MATRIX Dw1 = lrate1 * delta1 * x + mom1 * Dw1
MATRIX Dw2 = lrate2 * delta2 * v1 + mom2 * Dw2
DELTA W1 = Dw1 | W2 = Dw2 (6-30)
which make the connection-weight adjustments favor the directions of past
successes. The optimization parameters mom1, mom2 must be found by trial
and error and are typically between 0.1 and 0.9. There are literally hundreds
of papers and several books [6–9,12–18] describing other improved
backpropagation algorithms, but none work every time. References [2–5]
describe more advanced numerical function-optimization schemes applicable
to multilayer networks. In practice, the best algorithm for a specific
application must be selected (and possibly redesigned) by trial and error.
The Levenberg–Marquardt algorithm [5,7] is often a good compromise.
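For concreteness, the momentum update (6-30) can be written out in ordinary
matrix notation. The following Python/NumPy sketch mirrors the four difference
equations; the variable names follow the text, but the outer-product form of
delta1 * x and delta2 * v1 and the function wrapper are assumptions about the
vector shapes, not the DESIRE statements themselves.

    import numpy as np

    def momentum_step(W1, W2, Dw1, Dw2, x, v1, delta1, delta2,
                      lrate1, lrate2, mom1, mom2):
        # Weight-change matrices: gradient term plus a fraction of the
        # preceding change (the momentum term)
        Dw1 = lrate1 * np.outer(delta1, x) + mom1 * Dw1
        Dw2 = lrate2 * np.outer(delta2, v1) + mom2 * Dw2
        # DELTA W1 = Dw1 | W2 = Dw2: add the accumulated changes
        W1 = W1 + Dw1
        W2 = W2 + Dw2
        return W1, W2, Dw1, Dw2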
(c) Examples and Neural-network Submodels
Backpropagation regression networks with a few inputs and one or more out-
puts are used to model empirical relations. As a simple example, the program
in Figure 6-4a trains a two-layer regression network to produce a very accu-
rate sine function; Figure 6-4b shows results. We have defined the two-layer
neural network as a reusable submodel (Section 3-17) in the experiment-
protocol script. The same submodel is then invoked in two separate
DYNAMIC program segments, one for training and one for recall tests. Our
submodel could also be stored and used in another program, say, in a control-
system simulation.
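For readers without the DESIRE program of Figure 6-4a at hand, a
self-contained Python/NumPy sketch of such a sine-function training run
follows; the hidden-layer size, learning rates, momentum values, and step
count are illustrative assumptions, not the settings used in the figure.

    import numpy as np

    rng = np.random.default_rng(0)
    nh = 10                                   # hidden-layer neurons (assumed)
    W1 = rng.normal(0.0, 0.5, (nh, 2))        # hidden weights, +1 bias column
    W2 = rng.normal(0.0, 0.5, (1, nh + 1))    # output weights, +1 bias column
    Dw1 = np.zeros_like(W1); Dw2 = np.zeros_like(W2)
    lrate1 = lrate2 = 0.05; mom1 = mom2 = 0.5

    for step in range(100_000):               # training steps (illustrative)
        xin = rng.uniform(-np.pi, np.pi)      # random training input
        x = np.array([xin, 1.0])              # input plus bias term
        v1 = np.tanh(W1 @ x)                  # hidden-layer activations
        v1b = np.append(v1, 1.0)              # append bias for output layer
        y = W2 @ v1b                          # linear output layer
        delta2 = np.sin(xin) - y              # output error signal
        delta1 = (W2[:, :nh].T @ delta2) * (1.0 - v1**2)   # backpropagated error
        Dw2 = lrate2 * np.outer(delta2, v1b) + mom2 * Dw2  # momentum update (6-30)
        Dw1 = lrate1 * np.outer(delta1, x) + mom1 * Dw1
        W2 += Dw2; W1 += Dw1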
Figure 6-4c shows the squared-error time histories for 32 output activations
of a two-layer, 32-input backpropagation network with nv = 9 hidden-layer
neurons during a successful training run of NN = 200,000 steps. A 2.4-GHz
personal computer trained 585 connection and bias weights to produce this
display in 7.3 s. The same run took 5.5 s with the display turned off.
6-13. Radial-basis-function Networks
(a) Basis-function Expansion and Linear Optimization
Given a sample of corresponding measurements x, Y, traditional statistical
regression methods have long approximated mean-square regression functions
y(x) (Section 6-6) with weighted sums

    y(x) = W1 f1(x) + W2 f2(x) + … + Wn fn(x)

of conveniently chosen basis functions f1(x), f2(x), …, fn(x).
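Because the weights enter linearly, fitting them to the sample is an ordinary
linear least-squares problem once the basis functions are fixed. A brief
Python/NumPy sketch follows (an assumed illustration; the Gaussian radial
basis functions, their centers, and their common width are example choices,
not prescribed by the text).

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.uniform(-3.0, 3.0, 200)              # sample inputs x
    Y = np.sin(x) + 0.1 * rng.normal(size=200)   # corresponding measurements Y

    centers = np.linspace(-3.0, 3.0, 12)         # basis-function centers (assumed)
    width = 0.5                                  # common Gaussian width (assumed)
    F = np.exp(-((x[:, None] - centers[None, :]) / width) ** 2)  # F[k, i] = fi(x_k)

    W, *_ = np.linalg.lstsq(F, Y, rcond=None)    # linear optimization of W1, ..., Wn
    yhat = F @ W                                 # fitted y(x) = W1 f1(x) + ... + Wn fn(x)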
One