
3. Supervised Learning Neural Networks
3.5 Assignments
1. Give an expression for o_{k,p} for a FFNN with direct connections between the
input and output layer.
2. Why is the term (−1)v_{j,I+1} possible in equation (3.5)?
3. Explain what is meant by the terms overfitting and underfitting. Why is
E_V > \bar{E}_V + σ_{E_V}, where \bar{E}_V is the average of E_V over previous
epochs and σ_{E_V} its standard deviation, a valid indication of overfitting?
4. Investigate the following aspects:
(a) Are direct connections between the input and output layers advantageous?
Give experimental results to illustrate.
(b) Compare a FFNN and an Elman RNN trained using GD. Use the following
function as benchmark: z_t = 1 + 0.3 z_{t−2} − 1.4 z_{t−1}^2, with
z_1, z_2 ∼ U(−1, 1), sampled from a uniform distribution in the range (−1, 1).
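To experiment with this benchmark, the series can be generated as follows. This is a minimal NumPy sketch; the function name, series length, and seed are illustrative choices, not from the text. Note that this quadratic map can escape to infinity for some starting points, so the sketch redraws the initial conditions until the series stays bounded.

```python
import numpy as np

def generate_series(n=500, seed=0):
    """Generate z_t = 1 + 0.3*z_{t-2} - 1.4*z_{t-1}**2, with the two
    initial values z_1, z_2 drawn from U(-1, 1)."""
    rng = np.random.default_rng(seed)
    while True:
        z = np.empty(n)
        z[:2] = rng.uniform(-1.0, 1.0, size=2)
        for t in range(2, n):
            z[t] = 1.0 + 0.3 * z[t - 2] - 1.4 * z[t - 1] ** 2
            if abs(z[t]) > 1e3:   # escaped: this map diverges for
                break             # some initial points, so redraw
        else:
            return z              # stayed bounded: use this series

# One-step-ahead training pairs: predict z_t from (z_{t-1}, z_{t-2}).
z = generate_series()
X = np.column_stack([z[1:-1], z[:-2]])  # inputs (z_{t-1}, z_{t-2})
y = z[2:]                               # targets z_t
```

The same (X, y) pairs can feed the FFNN directly, while the Elman RNN would instead consume the series one value at a time and rely on its context units for the lagged information.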
(c) Compare stochastic learning and batch learning using GD for the function
o_t = z_t, where z_t = 0.3 z_{t−6} − 0.6 z_{t−4} + 0.5 z_{t−1} + 0.3 z_{t−6}^2
− 0.2 z_{t−4}^2 + ζ_t, and z_t ∼ U(−1, 1) for t = 1, · · · , 10, and
ζ_t ∼ N(0, 0.05).
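This series can be generated with the sketch below (NumPy; names and lengths are illustrative). The sketch reads N(0, 0.05) as a normal distribution with standard deviation 0.05, which is an interpretation, not stated in the text.

```python
import numpy as np

def noisy_series(n=500, seed=0):
    """Series for the benchmark: the first 10 values are U(-1, 1);
    thereafter z_t = 0.3 z_{t-6} - 0.6 z_{t-4} + 0.5 z_{t-1}
                   + 0.3 z_{t-6}**2 - 0.2 z_{t-4}**2 + zeta_t,
    with zeta_t ~ N(0, 0.05), taken here as standard deviation 0.05."""
    rng = np.random.default_rng(seed)
    z = np.empty(n)
    z[:10] = rng.uniform(-1.0, 1.0, size=10)
    for t in range(10, n):
        z[t] = (0.3 * z[t - 6] - 0.6 * z[t - 4] + 0.5 * z[t - 1]
                + 0.3 * z[t - 6] ** 2 - 0.2 * z[t - 4] ** 2
                + rng.normal(0.0, 0.05))
    return z
```

For the comparison itself, stochastic learning would update the weights after each (z_{t−6}, z_{t−4}, z_{t−1}) → z_t pattern, while batch learning accumulates the gradient over the whole series before updating.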
(d) Compare GD and SCG on any classification problem from the
UCI machine learning repository at http://www.ics.uci.edu/
~mlearn/MLRepository.html.
(e) Investigate whether PSO performs better than GD in training a FFNN.
5. Assume that gradient descent is used as the optimization algorithm, and derive
learning equations for the Elman SRNN, the Jordan SRNN, TDNN and FLNN.
6. Explain how a SRNN learns the temporal characteristics of data.
7. Show how a FLNN can be used to fit a polynomial through data points given in
a training set.
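One way to see assignment 7 concretely: with polynomial functional links (1, z, z^2, ...), the FLNN output unit is linear in the expanded features, so fitting a polynomial reduces to training a single linear layer. In the sketch below (NumPy; the function name is mine), closed-form least squares stands in for the gradient-descent training the chapter uses — both minimize the same sum-of-squares error on the linear output layer.

```python
import numpy as np

def flnn_poly_fit(z, y, degree):
    """Fit a polynomial with a functional link net: expand the scalar
    input into the links (1, z, z^2, ..., z^degree), then solve for the
    weights of the single linear output unit by least squares."""
    Z = np.vander(z, degree + 1, increasing=True)  # functional link layer
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)
    return w  # weight from each functional link to the output unit

# Recover y = 2 - 3z + 0.5z^3 from noiseless samples.
z = np.linspace(-1.0, 1.0, 50)
y = 2.0 - 3.0 * z + 0.5 * z ** 3
w = flnn_poly_fit(z, y, degree=3)
```

With noiseless data and enough samples, the recovered weights match the polynomial's coefficients term by term.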
8. Explain why a bias for only the output units of a PUNN, as discussed in this
chapter, is sufficient; that is, the PUs themselves do not have a bias. What
will the effect be if a bias is included in the PUs?
9. Explain why the function f(z_1, z_2) = z_1^3 z_2^7 − 0.5 z_1^6 requires only
two PUs, if it is assumed that PUs are used only in the hidden layer, with
linear activations in both the hidden and output layers.
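The intuition can be checked numerically: a PU computes z_1^{v_1} z_2^{v_2}, so each non-constant term of f needs exactly one PU, and the linear output unit supplies the coefficients. A small sketch (NumPy; the helper names are mine, and positive inputs are assumed to avoid non-integer powers of negative bases):

```python
import numpy as np

def product_unit(z, v):
    """A product unit computes prod_i z_i**v_i."""
    return np.prod(np.asarray(z, dtype=float) ** np.asarray(v, dtype=float))

def f_punn(z1, z2):
    # Hidden layer: one PU per non-constant term of f.
    h1 = product_unit([z1, z2], [3.0, 7.0])  # realizes z1^3 * z2^7
    h2 = product_unit([z1, z2], [6.0, 0.0])  # realizes z1^6
    # Linear output unit: its weights are the polynomial coefficients.
    return 1.0 * h1 - 0.5 * h2

# f_punn(z1, z2) agrees with z1**3 * z2**7 - 0.5 * z1**6 for any inputs.
value = f_punn(0.7, 1.3)
```

The distortion term −0.5 is absorbed by the output weight, not by an extra unit, which is why two hidden PUs suffice.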
10. Assume that a PUNN with PUs in the hidden layer, SUs in the output layer,
and linear activation functions in all layers, is used to approximate a polynomial.
Explain why the minimal number of hidden units is simply the total number of
non-constant, unique terms in the polynomial.
11. What is the main requirement for activation and error functions if gradient
descent is used to train supervised neural networks?
12. What is the main advantage of using recurrent neural networks instead of
feedforward neural networks?
13. What is the main advantage in using PUs instead of SUs?
14. Propose a way in which a NN can learn a functional mapping and its derivative.
15. Show that the PUNN as given in Section 3.1.3 implements a polynomial
approximation.