
of the neural network, which means that these fitting criteria consider for example,
both the RSQ and the roughness of the neural network. In terms of such a modified
fitting criterion, a network such as the one in Figure 2.11a can be better than the
‘‘rougher’’ network in Figure 2.11b (although the RSQ of the second network is
smaller). One of these regularization methods, called weight decay, makes use of the
fact that the roughness of neural networks is usually associated with ‘‘large’’ values
of their weight parameters, and this is why this method includes the sum of squares of
the network weights in the fitting criterion. This is the role of the decay parameter
of the nnet command: decay=0 means there is no penalty for large weights in the
fitting criterion. By increasing the value of decay, you increase the penalty for large
weights in the fitting criterion. Hence decay=0 means that you may get overfitting
for neural networks with sufficiently many nodes in their hidden layer, while
positive values of the decay parameter decrease the ‘‘roughness’’ of the neural
network and will generally improve its predictive capability. Ripley suggests using
decay values between 10⁻⁴ and 10⁻² [67, 68]. To see the effect of this parameter,
you may use a hidden layer with nine nodes similar to Figure 2.11b, but with
decay=1e-4. Using these settings, you will observe that the result will look similar
to Figures 2.10 and 2.11a, which means you get a much smoother (less ‘‘rough’’)
neural network compared to the one in Figure 2.11b.
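The effect of the decay parameter can be tried out with a small sketch like the following. It does not use the data behind Figures 2.10 and 2.11; instead it assumes a synthetic noisy sine curve (the names d, fit_rough, and fit_smooth are made up for this illustration). Both networks use nine hidden nodes, and to the best of our understanding nnet adds decay times the sum of the squared weights to its fitting criterion. Since nnet starts from random initial weights, the exact curves will vary with the seed.

```r
library(nnet)  # recommended package, shipped with standard R installations

# Hypothetical data: a noisy sine curve (NOT the data used in the book)
set.seed(1)
x <- seq(0, 1, length.out = 30)
d <- data.frame(x = x, y = sin(2 * pi * x) + rnorm(length(x), sd = 0.2))

# Nine hidden nodes, no penalty on large weights
fit_rough <- nnet(y ~ x, data = d, size = 9, decay = 0,
                  linout = TRUE, maxit = 1000, trace = FALSE)

# Same architecture, with a small weight decay penalty
fit_smooth <- nnet(y ~ x, data = d, size = 9, decay = 1e-4,
                   linout = TRUE, maxit = 1000, trace = FALSE)

# Sum of squared weights of each fitted network (the penalized quantity)
print(c(rough = sum(fit_rough$wts^2), smooth = sum(fit_smooth$wts^2)))

# Evaluate both networks on a fine grid and compare them visually
grid <- data.frame(x = seq(0, 1, length.out = 200))
pred_rough  <- as.vector(predict(fit_rough,  grid))
pred_smooth <- as.vector(predict(fit_smooth, grid))

plot(d$x, d$y, xlab = "x", ylab = "y")
lines(grid$x, pred_rough,  lty = 2)  # decay = 0
lines(grid$x, pred_smooth, lty = 1)  # decay = 1e-4
```

Comparing the two curves and the printed sums of squared weights for a few different seeds gives a feeling for how strongly a given decay value damps the weights in a particular data set.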
2.5.6 Several Inputs Example
As a second example, which involves several input quantities, we consider the data
in rock.csv, which you find in the book software. These data are part of the R
package, and they are concerned with petroleum reservoir exploration. To get oil out
of the pores of oil-bearing rocks, petroleum engineers need to initiate a flow of the
oil through the pores of the rock toward the exploration site. Naturally, such a flow
consumes more or less energy depending on the overall flow resistance of the rock,
and this is why engineers are interested in a prediction of flow resistance depending
on the rock material. The file rock.csv contains data that were obtained from
48 rock sample cross-sections, and it relates geometrical parameters of the rock
pores to the permeability of the rock, which characterizes the ease of flow through a
porous material [69]. The geometrical parameters in rock.csv are: area, a measure of
the total pore space in the sample (expressed in pixels in a 256 × 256 image);
peri, the total perimeter of the pores in the sample (again expressed in pixels);
and shape, a measure of the average ‘‘roundness’’ of the pores (computed as the
smallest perimeter divided by the square root of the area for each individual pore;
approx. 1.1 for an ideal circular pore, smaller for noncircular shapes). Depending
on these geometrical parameters, rock.csv reports the rock permeability perm
expressed in units of milli-Darcy (= 10⁻³ Darcy, see [70]).
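To get a first impression of these variables, the data can be inspected as in the following minimal sketch. It assumes that the rock data set shipped with R (in the datasets package) can stand in for rock.csv; the exact layout of rock.csv in the book software is not reproduced here.

```r
# Assumption: R's built-in `rock` data set is used in place of rock.csv
data(rock)

# Structure of the data frame: 48 samples with area, peri, shape and perm
str(rock)

# Ranges of the three geometrical parameters and of the permeability
summary(rock)

# Scatterplot matrix as a quick look at pairwise relationships
pairs(rock)
```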
In a first attempt to describe these data using a neural network, let us consider
two explanatory variables, area and peri, neglecting the third geometrical variable,
shape. With this restriction we will be able to generate 3D graphical plots of the
neural network below. Moreover, we will take log(perm) as the response variable