The goal of this paper is to set out the theoretical framework of PCA in the remote sensing of burned areas. Specific objectives are to introduce the reader to (i) EVD-based PCA and the significance of mean-centering and scaling as pre-processing steps; (ii) SVD, an alternative solution to PCA, and its differences from EVD; (iii) existing EVD- and SVD-based PCA applications in remote sensing; (iv) the methodological concepts of remotely sensing burned areas via EVD-based PCA; (v) four SVD-based PCA versions for burned area mapping; and (vi) the implications of mean-centering and scaling multi-dimensional data prior to PCA. Finally, we link the presented theoretical concepts to the remote sensing of burned areas and suggest that a non-centered SVD may perform better than the EVD-based PCA in capturing burned areas.
Principal components analysis
Principal components analysis (PCA) or transformation (PCT) is a non-parametric, orthogonal linear transformation of correlated variables [1, 2]. Being probably the oldest and most well-known multivariate analysis technique, PCA is useful in a wide range of applications including data exploration and visualisation of underlying patterns within correlated data sets; decorrelation; detection of outliers; data compression; feature reduction; enhancement of visual interpretability; improvement of statistical discrimination of clusters; ecological ordination; and more [3].
While the transformation can expose the internal structure of multivariate data sets, it is by no means optimised for class separability. PCA does not analyse class labels but uses global statistics to derive the transformation parameters [4]. Hence, there is no guarantee that the directions of maximum variance enhance class separabilities. It is up to the user to identify, via visual or quantitative inspection, which principal components carry a high signal-to-noise ratio for the feature of interest. In short, PCA supports cluster-seeking applications but cannot replace the need for user input.
Applying the EVD-based principal components transformation (henceforth noted as PCA) assumes linearity (or high data correlation); considers the statistical importance of the mean and the variance-covariance; and relies on large variances to highlight useful discrimination properties. In its classical form, the algorithm of the transformation is based on EVD and performs the following steps: i) organises the data set in a matrix, ii) mean-centers the columns of the matrix (henceforth also referred to as mean-centering or just centering), iii) calculates the covariance matrix (non-standardised PCA) or the correlation matrix (standardised PCA, a step also known as scaling), iv) calculates the eigenvectors and the eigenvalues of the data's variance-covariance or correlation matrix, v) sorts the variances (i.e. the eigenvalues) in decreasing order, and finally vi) projects the original data set into what are named principal components, or scores, by multiplying it with the eigenvectors, which act as weighting coefficients.
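To make these steps concrete, the following minimal sketch expresses them in Python with NumPy. It illustrates the procedure described above and is not an implementation taken from any of the cited works; the function name pca_evd and its scale argument are ours.

```python
import numpy as np

def pca_evd(X, scale=False):
    """EVD-based PCA sketch: rows of X are observations, columns are variables."""
    # ii) mean-center each column (variable)
    Xc = X - X.mean(axis=0)
    # iii) optional scaling to unit variance, i.e. standardised PCA on the correlation matrix
    if scale:
        Xc = Xc / Xc.std(axis=0, ddof=1)
    # iii)-iv) covariance (or correlation) matrix and its eigendecomposition
    C = np.cov(Xc, rowvar=False)
    eigenvalues, eigenvectors = np.linalg.eigh(C)      # symmetric matrix, so eigh applies
    # v) sort the eigenpairs by decreasing eigenvalue (variance)
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    # vi) project the centered data onto the eigenvectors to obtain the scores
    scores = Xc @ eigenvectors
    return scores, eigenvectors, eigenvalues
```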
Mathematically, PCA can be described as a set of p-dimensional vectors of weights or loadings $w_{(k)}=(w_{1}, \ldots, w_{p})_{(k)}$ that map each row vector $x_{(i)}$ of a zero-mean matrix X to a new vector of principal component scores $t_{(i)}=(t_{1}, \ldots, t_{m})_{(i)}$. The scores are given by $t_{k(i)}=x_{(i)}\cdot w_{(k)}$ for i=1, …, n and k=1, …, m, in such a way that the individual variables of t capture successively the maximum possible variance from x, with each loading vector w constrained to be a unit vector [3]. The complete decomposition of X can be given as
$$ Y=XW $$
(1)
where W is a p-by-p matrix whose columns are the eigenvectors of X′X (the prime denoting the transpose) determined by the algorithm, and Y is the matrix of the transformed data. In the case of remotely sensed multispectral images, we consider n spectral response observations (pixel values) on b spectral bands. Following the steps outlined above, the algorithm, using the covariance matrix:
1. Starts with arranging the spectral responses as n vectors $x_{1}, \ldots, x_{n}$ and placing them as rows in a single matrix X of dimensions n×b, where the columns correspond to the b spectral bands.
2. For each band, its mean m is subtracted from all of its spectral response values. Hence, all bands now present a zero mean.
3. Next, it calculates the covariance (or correlation) matrix $\Sigma_{x}$ of the input data, given by
$$ \Sigma_{x}=\frac{\sum\limits^{n}_{k=1}(x_{k}-m)(x_{k}-m)'}{n-1} $$
(2)
We emphasise that calculating the covariance matrix does, in effect, mean-center the data beforehand; it is nevertheless useful, for explanatory purposes, to present centering as a separate step. The covariance matrix describes the scatter or spread of the spectral responses in the multispectral space. It is symmetric and therefore has orthogonal eigenvectors and real eigenvalues. Alternatively, to perform PCA on the correlation matrix instead of the covariance matrix, each band is standardised, i.e. divided by its standard deviation.
4. Subsequently, the algorithm computes the matrix D of eigenvectors which diagonalises the covariance matrix $\Sigma_{x}$. The respective equation is
$$ \Lambda=D{\prime}\Sigma_{x}D $$
(3)
where Λ is the diagonal matrix of eigenvalues of $\Sigma_{x}$. It is shown that the covariance matrix of the transformed data, $\Sigma_{y}$, can be identified as the diagonal matrix of eigenvalues of $\Sigma_{x}$ [5]. The relationship between $\Sigma_{y}$ and $\Sigma_{x}$ is
$$ \Sigma_{y}=D{\prime}\Sigma_{x}D $$
(4)
where $\Sigma_{y}$ is the covariance matrix of the transformed data, which must be diagonal, and D is the matrix of eigenvectors of $\Sigma_{x}$, provided that D is an orthogonal matrix.
5. The last step sorts the eigenvectors in D and the eigenvalues in the matrix Λ in decreasing order. The eigenvector with the highest eigenvalue, which corresponds to the highest variance, is the first principal component of the data.
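The five steps can be sketched for a multispectral image as follows (Python with NumPy). The synthetic image, its dimensions and all variable names are ours and serve only as an illustration of the procedure, not as a reference implementation.

```python
import numpy as np

# Hypothetical multispectral image: rows x cols pixels with b spectral bands.
rows, cols, b = 100, 100, 6
image = np.random.default_rng(0).random((rows, cols, b))

# Step 1: arrange the n = rows*cols spectral responses as rows of an n-by-b matrix X.
X = image.reshape(-1, b)

# Step 2: subtract each band's mean so that every column has zero mean.
X_centered = X - X.mean(axis=0)

# Step 3: covariance matrix of the bands (Equation 2).
Sigma_x = (X_centered.T @ X_centered) / (X.shape[0] - 1)

# Step 4: eigendecomposition; the eigenvector matrix D diagonalises Sigma_x (Equations 3 and 4).
eigenvalues, D = np.linalg.eigh(Sigma_x)

# Step 5: sort eigenvectors and eigenvalues in decreasing order of the eigenvalues.
order = np.argsort(eigenvalues)[::-1]
eigenvalues, D = eigenvalues[order], D[:, order]

# Principal component scores, reshaped back into b component images.
pc_images = (X_centered @ D).reshape(rows, cols, b)
```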
Mean-centering, scaling and the eigendecomposition
Mean-centering the multivariate data matrix is considered to be an integral part of EVD-based PCA. It is achieved by subtracting the mean of each variable from its own observations. This results in a zero-mean vector of observations [6] and, in turn, a zero-mean matrix. Graphically, this action translates to a shift of the origin to the gravity centre of the original scatter plot, without altering its shape (Fig. 1). This is justified in terms of finding a basis that minimises the mean square error of approximating the original data.
Scaling the centered variables to have unit variance before the analysis takes place is optional. This action, also referred to as standardisation, forces the variables to be of equal magnitude by altering the range of the point swarm (Fig. 2). This is required when variables are measured in different units and may be useful when their ranges vary substantially. We note that scaling uncentered variables does not result in unit variance. The usefulness of this combination may even be questionable from a mathematical point of view. However, we include it in our analytical framework for the sake of experimental completeness.
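The four pre-processing combinations (centered or not, scaled or not) can be illustrated with the short sketch below (Python with NumPy). We assume here that, in the uncentered case, "scaling" means dividing each variable by a root-mean-square magnitude computed about zero rather than about the mean; the data and function names are ours.

```python
import numpy as np

rng = np.random.default_rng(1)
# Two variables of very different magnitude, both with non-zero means.
X = rng.normal(loc=5.0, scale=[1.0, 10.0], size=(500, 2))

def preprocess(X, center=True, scale=True):
    """Apply one of the four centering/scaling combinations."""
    Z = X - X.mean(axis=0) if center else X.copy()
    if scale:
        # For centered data this is the usual standard deviation;
        # for uncentered data it is a root-mean-square about zero.
        Z = Z / np.sqrt((Z ** 2).sum(axis=0) / (Z.shape[0] - 1))
    return Z

for center in (True, False):
    for scale in (True, False):
        Z = preprocess(X, center, scale)
        print(f"center={center!s:5} scale={scale!s:5} "
              f"means={Z.mean(axis=0).round(2)} variances={Z.var(axis=0, ddof=1).round(2)}")
```

Running the sketch shows that only the centered-and-scaled combination yields zero means and unit variances, while scaling without centering leaves the variances below unity, as noted above.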
The eigendecomposition of a symmetric, positive semi-definite matrix, such as the covariance matrix, yields orthogonal eigenvectors with real, non-negative eigenvalues. The eigenvectors define the directions of the variation and the eigenvalues define, proportionally, the lengths of the axes of variation [7]. Geometrically, the first principal component points in the direction with the largest variance. The second component, being orthogonal to the first, points in the direction of the second largest variance. The same pattern is repeated for all remaining components.
PCA actually rotates the point scatter around its centroid and aligns it with the coordinate axes so as to maximise the spread of the data projected onto them. It may be for instructive reasons that authors refer to PCA as a rotation of the coordinate axes [8]. This spread is the sum of squares of the data's coordinate points along the axes, also known as the principal component scores. In addition, the new axes are not correlated. The scores obtained from centered data are expressed as deviations from the mean of the variable. The rotation translates mathematically into a weighted linear combination of the original dimensions.
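A small numerical check of these statements, using NumPy on synthetic two-dimensional data (the data and names are ours), shows that the scores are uncorrelated and that their variances equal the sorted eigenvalues:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.multivariate_normal(mean=[3.0, 7.0], cov=[[4.0, 2.0], [2.0, 3.0]], size=1000)

Xc = X - X.mean(axis=0)                        # deviations from the variable means
eigenvalues, D = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
eigenvalues, D = eigenvalues[order], D[:, order]

scores = Xc @ D                                # weighted linear combinations of the original dimensions

# The new axes are uncorrelated and their variances are the (sorted) eigenvalues.
print(np.cov(scores, rowvar=False).round(3))   # approximately diagonal
print(eigenvalues.round(3))                    # matches the diagonal entries above
```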
SVD, an alternative solution to PCA
Another matrix decomposition algorithm which, among other uses, computes the least squares fitting of data is the Singular Value Decomposition (SVD). The algorithm yields the minimum number of dimensions required to represent a matrix or linear transformation. Clark and Clark [9] describe SVD as being essentially equivalent to a least squares method but, according to [10], numerically more robust. Wu [11] notes that better numerical properties are important when higher order dimensions are required. Specifically, when computing the covariance matrix (Equation 2) prior to EVD, it is the multiplication of the matrix X by itself that may lead to a loss of numerical accuracy. EVD, upon which the conventional PCA is based, and SVD have similar properties and products. Furthermore, PCA can be seen as SVD applied to a column-centered data matrix [6]. Both algorithms are non-sequential and extract hidden variables simultaneously. Yet they differ in several aspects. EVD works on the covariance (or correlation) matrix while SVD operates on the raw data matrix. The main difference of interest in this work, however, is that centering the columns of the input data is performed by default within the framework of EVD-based PCA, while it is optional in SVD.
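The sketch below (Python with NumPy, on synthetic data of our choosing) illustrates this relationship: SVD of the column-centered matrix reproduces the EVD-based principal component scores up to sign, while SVD can also be applied directly to the uncentered matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.random((500, 4))                       # n observations by p variables

# EVD-based PCA: centering is built into the covariance matrix.
Xc = X - X.mean(axis=0)
eigenvalues, W = np.linalg.eigh(np.cov(Xc, rowvar=False))
order = np.argsort(eigenvalues)[::-1]
scores_evd = Xc @ W[:, order]

# SVD applied to the column-centered matrix yields the same components (up to sign).
U, s, At = np.linalg.svd(Xc, full_matrices=False)
scores_svd = U * s                             # equivalently Xc @ At.T
print(np.allclose(np.abs(scores_evd), np.abs(scores_svd)))   # True

# With SVD, centering is optional: a non-centered variant decomposes X itself.
U0, s0, At0 = np.linalg.svd(X, full_matrices=False)
scores_noncentered = U0 * s0
```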
To describe SVD mathematically, let X denote a matrix of n observations by p variables. The singular value decomposition of X is
$$ X=ULA{\prime} $$
(5)
where U and A are, respectively, n×r and p×r matrices, each of which has orthonormal columns so that $U{\prime}U=I_{r}$ and $A{\prime}A=I_{r}$; L is an r×r diagonal matrix and r is the rank of X. The columns of U are orthogonal unit vectors of length n, called the left singular vectors of X. The columns of A are orthogonal unit vectors of length p, called the right singular vectors of X.
Jolliffe [3] states that SVD provides a computationally efficient method of actually finding PCs and that it offers additional insight into what a PCA actually does. He proves the relation between SVD and PCA, in that
$$ ULA{\prime}=X\sum\limits_{k=1}^{p}a_{k}a{\prime}_{k}=X $$
(6)
where $a_{k}$, k=(r+1), (r+2), …, p, are eigenvectors of X′X corresponding to zero eigenvalues. Essentially, the right singular vectors A of X are equivalent to the eigenvectors of X′X, while the singular values in L are equal to the square roots of the eigenvalues of X′X. These correspond, respectively, to the coefficients and the standard deviations of the principal components for the covariance matrix.
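These relations are easy to verify numerically; the short check below (Python with NumPy, synthetic data of our choosing) confirms that the right singular vectors of a column-centered X match the eigenvectors of X′X up to sign, and that the singular values are the square roots of its eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.random((200, 5))
X = X - X.mean(axis=0)                         # column-centered, as in PCA

# SVD of X: X = U L A' (NumPy returns A', the transposed right singular vectors).
U, L, At = np.linalg.svd(X, full_matrices=False)

# Eigendecomposition of X'X for comparison, sorted by decreasing eigenvalue.
eigenvalues, eigenvectors = np.linalg.eigh(X.T @ X)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# The right singular vectors of X are the eigenvectors of X'X (up to sign) ...
print(np.allclose(np.abs(At.T), np.abs(eigenvectors)))        # True
# ... and the singular values equal the square roots of its eigenvalues.
print(np.allclose(L, np.sqrt(eigenvalues)))                   # True
```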