$$\mathbf{Y} = \mathbf{A}_{pq}^{T}\mathbf{X} \qquad (8.20)$$
where $q < p$, and $\mathbf{A}_{pq}$ consists of the first $q$ columns of $\mathbf{A}$.
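As a minimal sketch of the reduction in Eq. (8.20), the projection onto the first $q$ eigenvector columns can be written with NumPy; the random data and names such as `A_pq` are illustrative assumptions, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
p, q, n = 4, 2, 100              # original dimension, reduced dimension, samples
X = rng.normal(size=(p, n))      # p-by-n data matrix, one feature per row

S = np.cov(X)                            # p-by-p scatter (covariance) matrix
eigvals, A = np.linalg.eigh(S)           # eigh returns eigenvalues in ascending order
A = A[:, np.argsort(eigvals)[::-1]]      # reorder columns by descending eigenvalue

A_pq = A[:, :q]                  # first q columns of A (q < p)
Y = A_pq.T @ X                   # Eq. (8.20): reduced q-by-n representation
```

Here `np.linalg.eigh` is used because the scatter matrix is symmetric, which guarantees real eigenvalues and orthonormal eigenvectors.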
Given that the features transformed by the principal components are not directly connected to the physical nature of the defect, the eigenvectors in $\mathbf{A}_{pq}$ are used only as the basis for choosing the most significant features from the original $p$-dimensional feature vectors. This is explained by means of a numerical simulation. As illustrated in Fig. 8.4, four normalized feature vectors, $f_1$, $f_2$, $f_3$, and $f_4$, are constructed, each forming clusters around four distinct levels of magnitude. A total of 100 samples is considered for each of the four features; hence each feature is a 100-by-1 vector. The four features are simulated to have random variations about the same mean within each of the four clusters. This is similar in principle to the variation of a measured data feature for four different defect severities. Each of the four clusters for each feature contains 25 data points. The four features become less clearly differentiated from $f_1$ to $f_4$ as the overlap between the clusters increases.
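A data set with these properties can be sketched as follows. The exact cluster levels, separations, and noise level behind Fig. 8.4 are not given in the text, so the values here are illustrative assumptions, chosen so that the clusters overlap progressively more from $f_1$ to $f_4$:

```python
import numpy as np

rng = np.random.default_rng(1)

base = np.repeat([0.2, 0.4, 0.6, 0.8], 25)   # four cluster levels, 25 samples each
scales = [1.0, 0.6, 0.35, 0.2]               # cluster separation shrinks f1 -> f4

# Each feature shares the same four-cluster structure; shrinking the separation
# while keeping the noise fixed makes the clusters overlap more from f1 to f4
features = [0.5 + s * (base - 0.5) + rng.normal(scale=0.02, size=100)
            for s in scales]
f1, f2, f3, f4 = features
```

With these assumed values, $f_1$ has well-separated clusters (spacing 0.2 against noise 0.02), while the cluster means of $f_4$ are only 0.04 apart, so its clusters blur together.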
It is evident that a suitable feature selection scheme should rank $f_1$, $f_2$, $f_3$, and $f_4$ in the same order as shown in Fig. 8.4. To derive the principal components for the simulated data set, the four normalized features are collected in a 4-by-100 matrix $\mathbf{X}$:
$$\mathbf{X} = [\,f_1 \;\; f_2 \;\; f_3 \;\; f_4\,]^{T} \qquad (8.21)$$
The eigenvalues and the eigenvectors are calculated from the scatter matrix $\mathbf{S}$. The matrix of eigenvectors can be represented as $\mathbf{A} = [a_{i,j}]$, where $i = 1$ to 4 and $j = 1$ to 4. The eigenvector $\mathbf{a}_4$ consists of the four components of the fourth column of $\mathbf{A}$, i.e., $\mathbf{a}_4 = [\,a_{1,4} \;\; a_{2,4} \;\; a_{3,4} \;\; a_{4,4}\,]^{T}$. A similar arrangement applies to $\mathbf{a}_1$, $\mathbf{a}_2$, and $\mathbf{a}_3$ (i.e., $\mathbf{a}_1 = [\,a_{1,1} \;\; a_{2,1} \;\; a_{3,1} \;\; a_{4,1}\,]^{T}$, $\mathbf{a}_2 = [\,a_{1,2} \;\; a_{2,2} \;\; a_{3,2} \;\; a_{4,2}\,]^{T}$, and $\mathbf{a}_3 = [\,a_{1,3} \;\; a_{2,3} \;\; a_{3,3} \;\; a_{4,3}\,]^{T}$), respectively. The matrix $\mathbf{A}$ is a $4 \times 4$ square matrix because of the presence of the four features $f_1$–$f_4$. The eigenvector corresponding to the eigenvalue with the largest magnitude is chosen. As shown in Table 8.2, one of the four eigenvalues of the data set is much larger than the other three, indicating that most of the variance is concentrated in one direction. Table 8.3 lists the component magnitudes of the eigenvector corresponding to the largest eigenvalue. Since this eigenvector is $\mathbf{a}_4$, the feature responsible for the maximum variance in the data can thus be identified.
Subsequently, the magnitudes of the four components of $\mathbf{a}_4$ are examined. As shown in Table 8.3, $|a_{1,4}| > |a_{2,4}| > |a_{3,4}| > |a_{4,4}|$. This result can be interpreted in terms of the directionality of the eigenvector $\mathbf{a}_4$ in the original feature space. If the unit vectors of the original feature space are represented as $\mathbf{u}_1$, $\mathbf{u}_2$, $\mathbf{u}_3$, and $\mathbf{u}_4$ (where $\mathbf{u}_1 = [1\;0\;0\;0]^{T}$, $\mathbf{u}_2 = [0\;1\;0\;0]^{T}$, etc.), then a higher magnitude of $a_{i,4}$ denotes greater similarity in direction between the eigenvector $\mathbf{a}_4$ and $\mathbf{u}_i$, compared with the other unit vectors forming the basis of the original feature space. For the simulated data, the component $a_{1,4}$ has the largest magnitude, followed by $a_{2,4}$, $a_{3,4}$, and $a_{4,4}$.
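The whole selection procedure above (simulate the features, form the scatter matrix, take the dominant eigenvector, rank the original features by component magnitude) can be sketched as follows; the simulated values are illustrative assumptions, not the actual data behind Tables 8.2 and 8.3:

```python
import numpy as np

rng = np.random.default_rng(1)
base = np.repeat([0.2, 0.4, 0.6, 0.8], 25)    # four clusters, 25 samples each
scales = [1.0, 0.6, 0.35, 0.2]                # assumed: separation shrinks f1 -> f4
X = np.stack([0.5 + s * (base - 0.5) + rng.normal(scale=0.02, size=100)
              for s in scales])               # 4-by-100 matrix, one feature per row

S = np.cov(X)                                 # 4-by-4 scatter (covariance) matrix
eigvals, A = np.linalg.eigh(S)                # eigh: eigenvalues in ascending order
a_dom = np.abs(A[:, np.argmax(eigvals)])      # |components| of dominant eigenvector

ranking = np.argsort(a_dom)[::-1] + 1         # feature indices, most significant first
print(ranking)                                # [1 2 3 4]: f1 ranked most significant
```

Because the feature with the widest cluster separation contributes the most variance along the dominant direction, the component magnitudes of the dominant eigenvector recover the ordering $f_1, f_2, f_3, f_4$, matching the behavior described for Table 8.3.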