Open Access

Remote sensing of burned areas via PCA, Part 2: SVD-based PCA using MODIS and Landsat data

Open Geospatial Data, Software and Standards 2017 2:21

https://doi.org/10.1186/s40965-017-0029-0

Received: 22 December 2016

Accepted: 29 May 2017

Published: 24 August 2017

Abstract

Background

Singular value decomposition (SVD), as an alternative solution to principal components analysis (PCA), may enhance the spectral profile of burned areas in satellite image composites.

Methods

In this regard, we combine the pre-processing options of centering, non-centering, scaling, and non-scaling the input multi-spectral data, prior to the matrix decomposition, and treat their combinations as four different SVD-based PCA versions. Using both unitemporal and bi-temporal data sets, we test all four combinations to derive principal components. We assess the effects of the transformations based on multiresponse permutation procedures and quantify the enhanced spectral separability between burned areas and other major land cover classes via the Jeffries-Matusita metric. Lastly, we evaluate visually and numerically all principal components and select a subset of interest.

Results

The best transformation for the subset of selected components is the uncentered-unscaled one.

Conclusions

The results indicate that an uncentered and unscaled SVD may improve the spectral separability of burned areas in some of the higher order components.

Keywords

PCA, EVD, SVD, Mean-centering, Scaling, Burned area mapping, MODIS, Landsat5 TM, Free open source software

Background

In the article “Remote sensing of burned areas via PCA, Part 1: centering, scaling and EVD vs SVD” [1], we present in depth the concepts of PCA [2]; past scientific literature on PCA in remote sensing applications [3]; the link of PCA to burned area mapping [4]; and the implications of centering and scaling [5]. We finally suggest that the uncentered-unscaled SVD-based PCA variant may further improve the spectral enhancement of burned area clusters compared to the conventional centered, EVD-based (eigenvalue decomposition) PCA.

In multi-spectral imagery, burned areas form homogeneous clusters with low internal heterogeneity. Their mean spectral value lies far from the composite’s overall mean, and they present lower projections in some dimensions, in both uni- and multi-temporal composites. In the latter case, it is well known that burned surfaces are absent from the prefire dimensions.

The pre-processing options of centering and scaling the image composites before the matrix decomposition can be combined in different ways [2]. Their application influences how the spectral properties of burned area clusters are transformed. The impact of these transformations is most evident in some of the higher order principal components. A non-centered SVD captures, in the first component, greater amounts of the information around the mean value of the input composite [5]. This can be advantageous for isolating burned clusters in some of the higher order components. Not scaling the input data may also allow subtle, yet useful, differences present in the initial data set to be expressed in the restructured principal components. In this article, we numerically demonstrate the theoretical concepts of spectrally enhancing remotely sensed burned areas via SVD-based PCA. We apply and discuss the performance of four SVD versions. In addition, we present an example-based quantitative discussion on selecting the best principal components obtained via SVD.
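The claim about the non-centered SVD can be illustrated with a minimal NumPy sketch. It uses synthetic band values (an assumption, not the study's data) whose means are large relative to their spread, and a hypothetical helper, `first_axis_vs_mean`, that reports the absolute cosine between the first right singular vector of the uncentered matrix and the column-mean vector. A value close to 1 indicates that the first uncentered component is essentially aligned with the mean level of the composite.

```python
import numpy as np

def first_axis_vs_mean(X):
    """|cosine| between the first uncentered SVD axis and the column-mean vector."""
    X = np.asarray(X, dtype=float)
    # Uncentered SVD: the mean is NOT removed before the decomposition
    _, _, vt = np.linalg.svd(X, full_matrices=False)
    v1 = vt[0]                    # first right singular vector, unit length
    m = X.mean(axis=0)
    # Singular vectors have an arbitrary sign, hence the absolute value
    return abs(v1 @ m) / np.linalg.norm(m)

# Synthetic band values (not study data): large means, small spread
rng = np.random.default_rng(0)
X = rng.normal(loc=[1200.0, 2500.0, 1800.0, 900.0], scale=50.0, size=(500, 4))
print(round(first_axis_vs_mean(X), 4))   # close to 1
```

Because the first axis absorbs the mean level, the remaining (higher order) axes are left to describe departures from it, such as the dark, compact burned clusters.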

Data

Within the first weeks after large wildfires are extinguished, burn scars absorb higher amounts of solar energy. Compared to other surfaces, they present lower reflectance values in both the near-infrared (NIR) and mid-infrared (MIR) bands (Fig. 1) and, as expected, appear darker than older burns. Therefore, postfire multi-spectral imagery should be acquired shortly after the fires are extinguished. In multi-temporal data sets, prefire images are best acquired within the same season as the postfire images, to keep the inter-seasonal reflectance variation of landscape features as low as possible. Generally, all scenes over large fire-affected regions should be as cloud-free as possible in order to obtain more accurate results.
Fig. 1

Density plots of (sub-plot a) pre- and postfire MODIS bands 2, 6 and 7 and (sub-plot b) burned area samples

Based on the above, we analyse daily MODIS Terra L2G (MOD09GA) and Landsat5 TM surface reflectance products (Figs. 2 and 3) over the Peloponnese and Mt Parnitha in Greece, respectively (Fig. 4). The selected MODIS acquisitions are a postfire scene from summer 2007 (Julian day 242) and a prefire scene from summer 2006 (Julian day 239). MOD09 products are estimates of the surface spectral reflectance for each designated MODIS band and are already atmospherically corrected. Variance-covariance and correlation coefficients for the selected input surface reflectance bands are presented in Table 1.
Fig. 2

Pre- and postfire MODIS surface reflectances

Fig. 3

Pre- and postfire Landsat5 TM surface reflectances

Fig. 4

Peloponnese (orange) and Mt Parnitha plus surroundings (red) - Scale in km

Table 1

Variance-covariance (white cells) and correlation coefficient (grey cells) matrices of MODIS and Landsat5 TM surface reflectance bands

It is worth mentioning that MODIS band 5 (1.240 µm) is a very good discriminator of the spectral response of burned areas (see the sampled burned areas in Fig. 5 and refer to [6, 7]). However, band 5 is striped in the acquired scene, likely due to a calibration artefact that causes anomalously high reflectance values [6]. Experimental transformations of data sets including band 5 yielded noisy components. Therefore, this band was excluded entirely from the analyses.
Fig. 5

A boxplot graph comparing spectral values of burned samples for both the pre- (2006) and post-fire (2007) MODIS bands 1, 2, 5, 6 and 7

The Landsat5 TM scenes were acquired in summer 2007 (Julian day 248, postfire) and in summer 2003 (Julian day 237, prefire). These are pre-processed Level-1 data, delivered as scaled digital numbers. Since we do not cross-compare data from different sensors, and burned areas feature distinct spectral profiles, no further pre-processing was performed.

The selected MODIS scenes (Fig. 2) cover the Peloponnese peninsula (southern Greece), with a total surface of 22,068 km² including the surrounding islands to the east and south (mainland of about 21,405 km²). The Landsat5 TM products (Fig. 3) cover a region of about 1027 km² north of Athens, including Mt Parnitha. Both areas were severely damaged by large, uncontrolled wildland fires at the end of summer 2007.

Tools

All methods were implemented with free and open source software. Geospatial processing was performed using GRASS-GIS [8], QGIS [9] and FWTools [10]. The SVD-based PCA algorithm was applied via R’s prcomp function [11]. Multi-Response Permutation Procedures (MRPP) statistics were estimated using the mrpp and meandist functions of the R package vegan [12]. The J-M index was implemented via custom R functions.

Methods

In the context of spectrally enhancing burned area clusters, we first present the uni- and bi-temporal study data sets. We then define the four SVD-based PCA versions used to derive principal components. Next, we describe the use of multiresponse permutation procedures to assess the effects of all applied transformations, namely centering, scaling and SVD itself. In addition, we introduce the Jeffries-Matusita spectral distance metric as a tool to quantify the separability between burned area samples and samples of other major land cover classes. Lastly, we outline an evaluation process for selecting principal components in which burned areas are spectrally enhanced. The complete workflow is visualised in Fig. 6.
Fig. 6

Overview of methodology

Samples of burned areas and major land cover classes

First, we delineated 42 samples of burned areas and numerous samples of vegetation and water bodies. Second, we extracted urban surfaces (greater than 200 ha) and bare ground samples from the CORINE 2000 land cover map [13]. The samples, visualised in Fig. 7, have both regular and irregular shapes and consist of at least 17 pixels. We avoided digitising large, mixed samples that could result in high internal class heterogeneity.
Fig. 7

Samples of burned areas and major land cover classes

Fig. 8

Principal components derived from SVD of unitemporal MODIS composites (1a)

Fig. 9

Principal components as derived from SVD on the bi-temporal MODIS composite (2a)

Fig. 10

Principal components as derived from SVD on the unitemporal Landsat5 TM composite (1b)

Fig. 11

Principal components as derived from SVD on the bi-temporal Landsat5 TM composite (2b)

Unitemporal and bitemporal composites

We define the following multi-spectral data sets:
  1. Two unitemporal postfire sets: (a) a MODIS set built from bands 1, 2, 6 and 7 (in Fig. 2) and (b) a Landsat5 TM set composed of bands 1, 2, 3, 4, 5 and 7 (in Fig. 3)
  2. Two bi-temporal sets: (a) a MODIS composite built from pre- and postfire bands 2, 6 and 7 (in Fig. 2) and (b) a Landsat5 TM composite using pre- and postfire bands 2, 4 and 7 (in Fig. 3)

MODIS bands 1 and 2 were resampled to 500 m to match the resolution of bands 6 and 7. The data sets are referred to as 1a, 1b, 2a and 2b hereafter. Scatterplot matrices for the samples in Fig. 7, extracted from the unitemporal and bi-temporal MODIS composites, are visualised in Figs. 12 and 13.
Fig. 12

Scatterplot matrix for major land cover classes extracted from the unitemporal MODIS composite (1a)

Fig. 13

Scatterplot matrix for major land cover classes extracted from the bi-temporal MODIS composite (2a)

Four ways of extracting principal components via SVD

In burned area mapping applications, SVD serves as an intermediate enhancement step, intended to improve the performance of subsequent classification algorithms. To this end, we extract principal components via SVD from MODIS and Landsat5 TM surface reflectance data.

We subject the following versions of the data sets defined in the subsection “Unitemporal and bitemporal composites” to SVD: (A) uncentered-unscaled, (B) uncentered-scaled, (C) centered-unscaled and (D) centered-scaled. Henceforth, these versions are referred to as A, B, C and D, respectively. Scatterplot matrices for the samples in Fig. 7, extracted from the MODIS-derived transformed images, are visualised in Figs. 14, 15, 16, 17, 18, 19, 20 and 21.
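The four versions can be sketched in NumPy as below. This is an illustrative re-implementation, not the authors' R code: it assumes the study's calls correspond to R's prcomp(x, center = ..., scale. = ...), and it mimics R's scale(), which (per R's documentation) divides centered columns by their standard deviation and uncentered columns by their root mean square, sqrt(sum(x²)/(n − 1)). Component standard deviations are reported as d/sqrt(n − 1), as prcomp does. The helper `svd_pca`, the `VERSIONS` table and the synthetic input are hypothetical.

```python
import numpy as np

def svd_pca(X, center, scale):
    """prcomp-like SVD-based PCA with optional centering and scaling."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = X - X.mean(axis=0) if center else X.copy()
    if scale:
        # sqrt(sum(z^2)/(n-1)) is the sd if centered, the root mean square otherwise
        Z = Z / np.sqrt((Z ** 2).sum(axis=0) / (n - 1))
    _, d, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = Z @ Vt.T            # principal components (equal to U * d)
    sdev = d / np.sqrt(n - 1)    # component standard deviations
    return scores, Vt.T, sdev    # rotation: loadings in its columns

VERSIONS = {
    "A": dict(center=False, scale=False),  # uncentered-unscaled
    "B": dict(center=False, scale=True),   # uncentered-scaled
    "C": dict(center=True, scale=False),   # centered-unscaled
    "D": dict(center=True, scale=True),    # centered-scaled
}

# Synthetic four-band input (illustrative values only)
rng = np.random.default_rng(1)
X = rng.normal(loc=[900.0, 2400.0, 1700.0, 1100.0],
               scale=[80.0, 300.0, 200.0, 120.0], size=(200, 4))
results = {k: svd_pca(X, **opts) for k, opts in VERSIONS.items()}
for k, (_, _, sdev) in results.items():
    # Variance percentages per component for each version
    print(k, np.round(100 * sdev ** 2 / (sdev ** 2).sum(), 1))
```

On such data, versions A and B place nearly all of the variance in the first component, whereas C and D spread it over the higher order components.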
Fig. 14

Scatterplot matrix for major land cover classes extracted from uncentered-unscaled principal components derived from the unitemporal MODIS composite (1a)

Fig. 15

Scatterplot matrix for major land cover classes extracted from uncentered-scaled principal components derived from the unitemporal MODIS composite (1a)

Fig. 16

Scatterplot matrix for major land cover classes extracted from centered-unscaled principal components derived from the unitemporal MODIS composite (1a)

Fig. 17

Scatterplot matrix for major land cover classes extracted from centered-scaled principal components derived from the unitemporal MODIS composite (1a)

Fig. 18

Scatterplot matrix for major land cover classes extracted from uncentered-unscaled principal components derived from the bitemporal MODIS composite (2a)

Fig. 19

Scatterplot matrix for major land cover classes extracted from uncentered-scaled principal components derived from the bitemporal MODIS composite (2a)

Fig. 20

Scatterplot matrix for major land cover classes extracted from centered-unscaled principal components derived from the bitemporal MODIS composite (2a)

Fig. 21

Scatterplot matrix for major land cover classes extracted from centered-scaled principal components derived from the bitemporal MODIS composite (2a)

Multiresponse permutation procedures

Following multiresponse permutation procedures (MRPP) [14], one can describe the composition and configuration of major land cover class samples extracted from both the original and the transformed composites (Tables 2 and 3).
Table 2

Statistics based on multiple response permutation procedures for MODIS and Landsat5 TM composites (based on Euclidean distance, 999 permutations and significance for all deltas 0.001)

Table 3

Statistics based on multiple response permutation procedures for principal components composites derived from MODIS composites (based on euclidean distance, 999 permutations and significance for all deltas 0.001)

The MRPP null hypothesis (H0) states that there are no differences among the sampled classes. This means that, under H0, every possible allocation of the observations to the classes is equally likely. The procedures estimate the observed average within-class distances, weighted by class sample size (n), which combine into δ_o, and compare δ_o with the average distance δ_exp expected under H0 over all possible combinations (permutations) of the sampled data. Essentially, they compare the dissimilarities within and among classes.

The significance of the test is given by the probability (P-value) of observing, under H0, a mean distance δ as small as or smaller than the observed δ_o. In addition, the within-class homogeneity is measured by the chance-corrected statistic A = 1 − δ_o/δ_exp. In the extreme case in which all observations within each class are identical, δ_o = 0 and A = 1. Since the expected value of A under H0 is 0, A > 0 indicates more within-class homogeneity than expected by chance, whereas A < 0 signifies within-class heterogeneity. Lastly, the classification strength [15] is the difference between the average between-class and within-class dissimilarities.
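The MRPP quantities described above can be sketched in NumPy as follows. This is not vegan's implementation (the study used vegan's mrpp and meandist); it is a minimal sketch that assumes Euclidean distances, class weights n_i/N, a permutation P-value of the form (count + 1)/(permutations + 1), and computes δ_exp as the mean of all pairwise distances, which is the exact expectation of δ over random relabellings when the weights sum to one. The pooling used for classification strength (all within-class pairs against all between-class pairs) is also an assumption. All function names are hypothetical.

```python
import numpy as np

def euclidean_dmat(X):
    """Full Euclidean distance matrix (memory grows with N^2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def weighted_delta(D, labels):
    """delta: mean within-class distances weighted by n_i / N."""
    N = len(labels)
    delta = 0.0
    for g in np.unique(labels):
        idx = np.flatnonzero(labels == g)
        n = len(idx)
        sub = D[np.ix_(idx, idx)]
        delta += (n / N) * sub[np.triu_indices(n, k=1)].mean()
    return delta

def mrpp_sketch(X, labels, permutations=999, seed=0):
    """Return (delta_o, delta_exp, A, P) for the given class labels."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    D = euclidean_dmat(X)
    delta_o = weighted_delta(D, labels)
    # Exact expectation under random relabelling: mean of all pairwise distances
    delta_exp = D[np.triu_indices(len(labels), k=1)].mean()
    rng = np.random.default_rng(seed)
    perm = np.array([weighted_delta(D, rng.permutation(labels))
                     for _ in range(permutations)])
    # Share of permutations with a delta as small as or smaller than observed
    p_value = (np.sum(perm <= delta_o) + 1) / (permutations + 1)
    A = 1.0 - delta_o / delta_exp
    return delta_o, delta_exp, A, p_value

def classification_strength(X, labels):
    """Mean between-class minus mean within-class distance (pooled pairs)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    iu = np.triu_indices(len(labels), k=1)
    d = euclidean_dmat(X)[iu]
    same = (labels[:, None] == labels[None, :])[iu]
    return d[~same].mean() - d[same].mean()

# Toy example: two well-separated synthetic classes in three bands
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(0.0, 1.0, (20, 3)), rng.normal(5.0, 1.0, (20, 3))])
labels = np.repeat(["burned", "vegetation"], 20)
delta_o, delta_exp, A, p_value = mrpp_sketch(X, labels, permutations=199)
cs = classification_strength(X, labels)
```

With these toy classes, δ_o is well below δ_exp, A is positive and the P-value should reach its minimum of 1/(permutations + 1). The full distance matrix grows quadratically with the number of observations, which illustrates why subsampling large pixel samples can be necessary.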

The tests were performed using the complete set of observations sampled from the MODIS-based composites (in total 1085 pixels extracted from each band). However, because of the computational cost of permuting the high number of observations sampled from Landsat5 TM data (in total 18865 pixels), we ran MRPP on 3000 randomly selected observations, independently for each Landsat5 TM-based data set. The Euclidean distance was selected as the measure of dissimilarity between two observations.

Spectral distance metric

The MRPP test primarily assesses whether the sampled burned area class differs from the rest of the classes. It also verifies numerically the effects of the pre-processing options, mean-centering and scaling, on the configuration and composition of the sampled class clusters. The procedures do not, however, precisely quantify the spectral enhancement of burned area samples after the application of SVD. To measure how much the spectral separability between burned and other class samples increases or decreases, we rely on the Jeffries-Matusita (J–M) index.

J–M is well established in remote sensing applications as a measure of spectral separability between classes. The index is a transformation of the Bhattacharyya distance (Eq. 2) and applies to multivariate normal spectral class models. It is bounded in the range [0, 2.0], as defined by [16].
$$ J_{ij}=2\,\left(1-e^{-B}\right) $$
(1)
where
$$ \begin{aligned} B&=0.125\,(i-j)^{t}\left\{ \frac{\Sigma_{i}+\Sigma_{j}}{2}\right\}^{-1}(i-j)\\ &\quad+0.5\,\ln\left\{ \frac{\left|\frac{\Sigma_{i}+\Sigma_{j}}{2}\right|}{\sqrt{\left|\Sigma_{i}\right|\left|\Sigma_{j}\right|}}\right\} \end{aligned} $$
(2)

where

B = Bhattacharyya distance; i = mean spectral signature vector of the first sample; j = mean spectral signature vector of the second sample; Σ_i = covariance matrix of sample i; and Σ_j = covariance matrix of sample j.
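Eqs. 1 and 2 can be sketched in NumPy as below, assuming each sample is an array with one row per pixel and one column per band or component. The study implemented the index with custom R functions that are not shown here, so this is an illustrative re-implementation with hypothetical function names. Log-determinants are computed with numpy.linalg.slogdet for numerical stability, and single-component inputs are handled by promoting the variance to a 1 × 1 matrix, since per-component separabilities are reported.

```python
import numpy as np

def _cov(x):
    # np.cov squeezes a single variable to a 0-d array; keep it 2-D
    return np.atleast_2d(np.cov(x, rowvar=False))

def bhattacharyya(x_i, x_j):
    """Bhattacharyya distance B (Eq. 2) between two Gaussian class models."""
    x_i = np.asarray(x_i, dtype=float).reshape(len(x_i), -1)
    x_j = np.asarray(x_j, dtype=float).reshape(len(x_j), -1)
    diff = x_i.mean(axis=0) - x_j.mean(axis=0)
    s_i, s_j = _cov(x_i), _cov(x_j)
    s = (s_i + s_j) / 2.0
    term1 = 0.125 * diff @ np.linalg.solve(s, diff)
    # slogdet returns (sign, log|det|); log-determinants avoid over/underflow
    term2 = 0.5 * (np.linalg.slogdet(s)[1]
                   - 0.5 * (np.linalg.slogdet(s_i)[1] + np.linalg.slogdet(s_j)[1]))
    return term1 + term2

def jeffries_matusita(x_i, x_j):
    """J-M distance (Eq. 1), bounded in [0, 2]."""
    return 2.0 * (1.0 - np.exp(-bhattacharyya(x_i, x_j)))

# Synthetic single-component samples (values are illustrative only)
rng = np.random.default_rng(7)
burned = rng.normal(0.05, 0.01, size=(100, 1))
urban = rng.normal(0.20, 0.03, size=(100, 1))
print(round(jeffries_matusita(burned, urban), 3))   # close to the upper bound of 2
```

Because of the exponential in Eq. 1, J–M saturates near 2 once B grows beyond a few units, so differences between well-separated classes are compressed.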

Evaluation of the principal components

Selecting the components in which burn scars are emphasized is important for any subsequent mapping attempt. The selection is essentially a rejection scheme that filters out components dominated by information linked to unchanged landscape features, as well as components consisting mainly of noise.

In this sense, we evaluate the outcomes of SVD by examining in depth, via MRPP, the effects of the pre-processing transformations (centering and scaling) on samples of the land cover classes of interest; by visually inspecting the principal components; and by comparing the eigenvectors and eigenvalues.

Results and discussion

Hereafter, we discuss the results of the transformations and their impact on the distances within and between the sampled land cover classes. In addition, we compare the performance of the four SVD-based PCA versions in terms of the spectral enhancement of burned area clusters via the Jeffries-Matusita index. Next, we evaluate the principal components visually and numerically; for the latter, we review in detail how the variance of the bi-temporal MODIS data set (2a) is redistributed among the principal components. Finally, we justify the selection of the components that hold the highest separabilities.

Synopsis of pre-processing effects

Centering shifts the origin of the coordinate axes to the centroid (centre of gravity) of the multidimensional data set. Scaling the centered dimensions forces unit variance before the analysis. In turn, this increases the influence of variables with low variance and decreases the influence of those with high variance. Scaling non-centered data, however, does not yield unit variance. Doing so may even be mathematically questionable; we nevertheless include this combination for experimental completeness. Although a centered SVD is equivalent to the conventional EVD-based PCA, differences in visual contrast may be perceived between components of the same order. These are attributed to the arbitrary sign of the eigenvectors.
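Both statements can be checked with a short NumPy sketch on synthetic data (not the study's bands). The centered SVD reproduces the eigenvalues of the sample covariance matrix, with eigenvectors that match only up to a factor of −1. Dividing uncentered columns by their root mean square, which is what R's scale() does when center = FALSE, leaves the columns far from unit variance.

```python
import numpy as np

# Synthetic three-band data (illustrative values only)
rng = np.random.default_rng(3)
X = rng.normal(loc=[500.0, 2000.0, 1500.0], scale=[40.0, 250.0, 150.0], size=(300, 3))
n = X.shape[0]

# Centered SVD: squared singular values / (n - 1) are component variances
Xc = X - X.mean(axis=0)
_, d, Vt = np.linalg.svd(Xc, full_matrices=False)
svd_eigvals = d ** 2 / (n - 1)

# EVD of the sample covariance matrix (divisor n - 1)
evals, evecs = np.linalg.eigh(np.cov(X, rowvar=False))
order = np.argsort(evals)[::-1]          # eigh returns ascending order
evals, evecs = evals[order], evecs[:, order]

print(np.allclose(svd_eigvals, evals))                            # same variances
# Same axes, but each vector may differ by a factor of -1
print(np.allclose(np.abs(np.sum(Vt.T * evecs, axis=0)), 1.0))

# Uncentered scaling by the root mean square sqrt(sum(x^2) / (n - 1))
Xs = X / np.sqrt((X ** 2).sum(axis=0) / (n - 1))
print(np.round(Xs.var(axis=0, ddof=1), 4))                        # far from 1
```

The per-column dot products of ±1 are exactly the sign ambiguity that causes inverted contrast between otherwise identical component images.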

Within- vs between classes mean distances

We performed the MRPP test to diagnose the internal heterogeneity of burned area samples (low within-class dispersion) and to assess their distinctness from other sampled land cover features (between-class heterogeneity).

The within-class structure is described by the A statistic, which generally changes little before and after the transformations: overall, it is around 0.4 for MODIS data and around 0.6 for Landsat5 TM data. Hence, the transformations do not disrupt the internal structure of the clusters of each class.

Before the transformations, the MRPP statistics show that burned area samples have a relatively small mean within-class distance, which reflects their low within-class heterogeneity. For example, the δ values for burned area samples extracted from the MODIS data sets 1a and 2a are 796.8 and 1079.0, lower than the observed δ_o for all observations, 979.8 and 1361.0, respectively (Table 2). In contrast, urban areas (and similarly mineral extraction sites in Landsat5 TM data) present δ values similar to δ_o (i.e. 1004.0 and 1354.0 vs. 979.8 and 1361.0), yet higher than those of burned areas. Depending on whether the samples are extracted from the uni- or bi-temporal composites, burned area class distances δ are close to those of vegetation, sparse vegetation and bare ground. A clear disjunction of water samples is present in all sampled data sets.

In the transformed data (Table 3), it is evident that centering does not alter the within- or between-class distances. The mean distances are identical for all MODIS-derived transformed composites (Table 3, A and C of 1a and 2a) and practically equal for all Landsat5 TM-derived transformed composites (Table 4, A and C of 1b and 2b).
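This invariance follows from geometry: centering is a translation, and SVD with all components retained is an orthogonal rotation, so pairwise Euclidean distances, and hence the MRPP distances, are unchanged; scaling, in contrast, rescales each axis. The NumPy sketch below checks this for the four versions on synthetic data, assuming all components are retained. The helpers `pairwise` and `pc_scores` are hypothetical.

```python
import numpy as np

def pairwise(X):
    """Pairwise Euclidean distance matrix."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))

def pc_scores(X, center, scale):
    """Scores of an SVD-based PCA version, keeping all components."""
    n = X.shape[0]
    Z = X - X.mean(axis=0) if center else X.copy()
    if scale:
        Z = Z / np.sqrt((Z ** 2).sum(axis=0) / (n - 1))
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    return Z @ Vt.T          # orthogonal rotation of Z

# Synthetic four-band samples (illustrative values only)
rng = np.random.default_rng(5)
X = rng.normal(loc=[800.0, 2200.0, 1600.0, 1000.0],
               scale=[60.0, 280.0, 180.0, 90.0], size=(60, 4))
D0 = pairwise(X)
for name, c, s in [("A", False, False), ("B", False, True),
                   ("C", True, False), ("D", True, True)]:
    # Expected: A True, B False, C True, D False
    print(name, np.allclose(pairwise(pc_scores(X, c, s)), D0))
```

If only a subset of components were retained, the distances would no longer be preserved exactly, even for versions A and C.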
Table 4

Statistics based on multiple response permutation procedures for principal components composites derived from Landsat5 TM composites (based on euclidean distance, 999 permutations and significance for all deltas 0.001)

Scaling effects on both the range and the shape of the original point scatters are evident in the statistics (A, δ_o and δ_exp values). For MODIS-based transformations, nearly all scaled data sets result in higher A values than the unscaled data (Table 3-1a: 0.439 (B) vs. 0.4282 (A); 0.4357 (D) vs. 0.4282 (C); and Table 3-2a: 0.4038 (B) vs. 0.3851 (A)). An exception is the bi-temporal centered-scaled data set, which is practically the same as the centered-unscaled data (Table 3-2a: 0.3833 (D) vs. 0.3851 (C)). For the Landsat5 TM-based transformations, scaled bi-temporal data have reduced A values, while for the scaled unitemporal data they are close to the A values of the non-centered and centered data. Hence, low A and decreasing δ_o values, as observed for all scaled versions, reflect the suppression of fine intra-class variations in the transformed data.

Lastly, we consider the classification strength values. Overall, the mean between-class distances are higher than the within-class distances for all data sets. For uncentered and centered (unscaled) data, both before and after the transformations, the values are identical for the MODIS data sets (991.80 and 1058.76 in Table 2) and practically equal for the Landsat5 TM data sets (64.97 and 55.13 in Table 2; 67.35, 66.23 and 54.69, 54.62 in Table 4). In contrast, they are suppressed to low values and differ for all scaled versions. This translates into smaller differences between within- and between-class dissimilarities.

Estimation of class separabilities

Separability estimates between samples of burned areas and major land cover classes quantify the magnitude of spectral enhancement. The indices are compared one-to-one, for all SVD versions, for each land cover class and principal component. Individual estimates and averages of the highest mean distances between samples of burned areas and major land cover classes are given in Tables 5 and 6 for MODIS and Landsat data, respectively.
Table 5

Jeffries-Matusita matrix for burned area against major land cover class samples extracted from Principal Components derived from MODIS data sets

Table 6

Jeffries-Matusita matrix for burned area against major land cover class samples extracted from Landsat5 TM data sets

In these tables, each row mean is the average of the spectral separabilities between burned area samples and the other major land cover classes for the row-specific principal component. Each column mean is the average of the spectral separabilities between burned area samples and the column-specific land cover class across the components of one SVD-based PCA version. For example, in Table 5, the separabilities between burned and other classes (first row) extracted from principal component 1 of the uncentered-unscaled version of the unitemporal MODIS data set are 0.959, 0.451, 1.178 and 1.005, with an average of 0.898. The separabilities exclusively between samples of burned and urban areas (first column), for components 1, 2, 3 and 4 of the same version, are 0.959, 0.122, 1.215 and 0.181, with an average of 0.619.
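The two averages quoted above follow directly from the listed Table 5 values, where the bars (notation introduced here for illustration) denote the row and column means:

$$ \bar{J}_{\text{row}}=\frac{0.959+0.451+1.178+1.005}{4}=\frac{3.593}{4}\approx 0.898,\qquad \bar{J}_{\text{col}}=\frac{0.959+0.122+1.215+0.181}{4}=\frac{2.477}{4}\approx 0.619 $$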

Overall higher separabilities

For the unitemporal MODIS data set 1a, the highest overall average separability, 0.722, is obtained with uncentered-unscaled data (A). The bi-temporal set 2a yields its highest average, 0.695, when the data are centered and scaled (D), practically identical to the 0.694 obtained with uncentered-unscaled data (A). The corresponding average separability peaks for the Landsat5 TM sets are 1.151 for the unitemporal set (1b) with uncentered-unscaled data (A) and 1.109 for the bi-temporal set (2b) with uncentered but scaled data (B).

Cell-by-cell highest separabilities

Overall, when comparing the separability matrices cell by cell (per class and component), most of the highest values are concentrated in the uncentered-scaled case (B), followed by the uncentered-unscaled case (A), leaving the other two cases behind. For the unitemporal sets, cases A and C share most of the highest separabilities, followed by the uncentered-scaled case, leaving the centered-scaled data behind. For the bi-temporal sets, uncentered-scaled (B) data account for most of the highest separabilities, followed by uncentered-unscaled, centered-scaled and, lastly, centered-unscaled (C) data.

Per-component and per-class highest separabilities

Centered and scaled data (D) produce the highest separabilities in components 1 and 2, while uncentered-scaled data (B) peak in components 3, 4 and 6. The 5th component contains the smallest number of separability peaks, most of them obtained with centered-unscaled data (C). Urban area samples are best separated from burned areas with centered-unscaled data (C), while vegetated and bare ground samples are best separated with uncentered-unscaled data (A). Water surface samples peak in their distance from burned areas twice in both uncentered-scaled (B) and centered-scaled (D) data. Mineral extraction sites peak once with uncentered-unscaled (A) and once with centered-scaled (D) data. In conclusion, the most critical classes are best separated using uncentered-unscaled data.

Visual inspection of the components

Visual inspection of the transformed images serves as a quick check and is part of the complete evaluation process. At first sight, components 2, 3 and 4 are expected to be among the candidates for extracting burned areas.
  1. MODIS unitemporal data set

     Burn scars are distinguishable in all components derived from the unitemporal MODIS data set (1a, Fig. 8). For all SVD versions, burned areas are poorly represented in the first component and rather blurry in the fourth. Only the centered (both unscaled and scaled) second component depicts the scars sharply. The uncentered components 2 and 3 appear to contain similar amounts of information linked to burned areas.

  2. MODIS bi-temporal data set

     In the components derived from the bi-temporal MODIS composite (2a, Fig. 9), we identify the burn scars in the 2nd, 3rd and 4th components, although the 3rd component occasionally appears unclear. Fragments of burn scars also appear in the 6th component, though they are rather noisy and striped. In contrast, the 1st and the 5th components do not appear to hold distinguishable burned areas.

  3. Landsat5 TM unitemporal data set

     In the components derived from the unitemporal postfire Landsat5 TM composite (1b, Fig. 10), the uncentered cases (A, B) distribute the scars across all components but the first, and the scars are barely visible in the 6th component. Conversely, in the centered but unscaled case (C), the scars appear concentrated within components 2, 3 and 4 and noisy in components 5 and 6. Finally, the centered and scaled case (D) clearly displays the burn signal in components 2 and 3, while the signal is rather weak in the remaining components.

  4. Landsat5 TM bi-temporal data set

     For the bi-temporal Landsat5 TM composite (2b, Fig. 11), all SVD versions yield a 2nd component that holds a moderate burn signal. Component 3 is weaker for the uncentered cases and weaker still for the centered cases (C, D). Component 4 is best in cases A, C and D, whereas in case B the scars appear very weak, if visible at all. The 5th component holds recognisable scars only in cases B and D.

A visual comparison of the outcomes allows a rough grouping of the images into centered and uncentered versions. We also observe that the uncentered-scaled set of components deviates from the uncentered-unscaled components.

Using the bi-temporal MODIS and the unitemporal Landsat5 TM composites, uncentered data highlight the burn scars in the third and fourth components, while the scars appear weaker in the 2nd component (Figs. 9 and 10, respectively). Centered data emphasize the large burned surfaces within the second component and slightly alter their presence in the fourth component. An exception is the 4th centered-scaled transformed image, which seems very poor for the features of interest. Using the unitemporal MODIS data (1a), burn scars are divided between the second and third components. Finally, for the bi-temporal Landsat5 TM (2b) composite, uncentered-unscaled data spread the information among the 4th, 2nd and 3rd components, in decreasing order of visual contrast against other features. The centered data, however, concentrate the scars in components 4 and 2 (Figs. 8 and 11).

Quantitative evaluation of the transformation matrices

The transformed variances, expressed as percentages, form two groups of ranges for each component, depending on whether the input data matrix was centered or not (Table 7). This is expected, as the first uncentered component passes through the origin of the coordinate system and near the centroid of the multidimensional point swarm. In the following subsections, we discuss the effects of centering and scaling based on the transformation matrices derived from SVD on the bi-temporal MODIS composite 2a (Fig. 9). All numbers compared below are drawn from Table 8. The transformation matrices for composites 1a, 1b and 2b are presented in Tables 9, 10 and 11.
Table 7

Variance percentages form two groups for each principal component depending on whether centering is applied or not in the initial data set

Table 8

Transformation matrices derived from SVD on the bi-temporal MODIS composite (corresponding components in Fig. 9)

Table 9

Transformation matrices derived from SVD on unitemporal MODIS composites (corresponding components in Fig. 8)

Table 10

Transformation matrices derived from SVD on the unitemporal Landsat5 TM composite (corresponding components in Fig. 10)

Table 11

Transformation matrices derived from SVD on the bi-temporal Landsat5 TM composite (corresponding components in Fig. 11)

A subtlety that affects the numerical results is the divisor used for the covariance matrix: N in the princomp function (an EVD-based PCA implementation) and N−1 in the prcomp function (an SVD-based implementation) [17]. The divisor only rescales the variances (eigenvalues) by the factor N/(N−1), leaving the eigenvectors and the variance percentages unchanged, so in practice it should make no meaningful difference for samples containing more than 30 observations.
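This can be checked numerically. The NumPy sketch below, on synthetic data with N = 31, builds covariance matrices with both divisors via numpy.cov's ddof argument; the "princomp-like" and "prcomp-like" labels refer only to the divisor, not to the full behaviour of those R functions.

```python
import numpy as np

# Synthetic correlated three-band sample with N = 31 observations
rng = np.random.default_rng(11)
X = rng.normal(size=(31, 3)) @ np.array([[3.0, 0.0, 0.0],
                                         [1.0, 2.0, 0.0],
                                         [0.5, 0.3, 1.0]])
N = X.shape[0]

ev_n, vec_n = np.linalg.eigh(np.cov(X, rowvar=False, ddof=0))    # divisor N (princomp-like)
ev_n1, vec_n1 = np.linalg.eigh(np.cov(X, rowvar=False, ddof=1))  # divisor N - 1 (prcomp-like)

print(np.allclose(ev_n1, ev_n * N / (N - 1)))               # eigenvalues rescaled by N/(N-1)
print(np.allclose(ev_n / ev_n.sum(), ev_n1 / ev_n1.sum()))  # variance percentages unchanged
print(np.allclose(np.abs(vec_n), np.abs(vec_n1)))           # same axes, up to sign
```

For N = 31 the rescaling factor is 31/30, i.e. about a 3% change in absolute variances, with no effect on the component directions.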

Variance

In general, uncentered data channel practically all of the original data variance into the 1st component (98.5% and 98.3% for cases A and B, respectively). Centered data, on the other hand, retain 74.7% and 72.1% in the 1st component (cases C and D, respectively) and distribute significant amounts of information into the higher order components.

For all cases, the variances of the last components (5th and 6th) are very low, while, as expected, the highest variance is found in the major (1st) component. In general, these components can safely be ignored, since the former mainly carry residual information and the latter mainly unchanged features. Thus, we focus on the 2nd, 3rd and 4th components. The contribution of each original band to the transformed images is reflected in the eigenvectors, whose elements act as weighting coefficients.
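In other words, each component value at a pixel p is a weighted sum of the pre-processed band values, with the eigenvector elements (loadings) as weights. For the six-band bi-temporal MODIS composite 2a, this can be written as below, where the notation is introduced here for illustration: \tilde{x}_b(p) denotes band b after the optional centering and scaling, and v_{bk} is the loading of band b on component k.

$$ \mathrm{PC}_{k}(p)=\sum_{b=1}^{6} v_{bk}\,\tilde{x}_{b}(p),\qquad k=1,\dots,6 $$

A larger absolute loading thus means that the corresponding band contributes more strongly to the contrast visible in that component image.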

Centering

Centering decreases the absolute standard deviations of the extracted components, yet the variance percentages of the higher order components increase substantially. This signifies that important amounts of the initial variation are redistributed among the higher order components 2, 3 and 4. Conversely, performing the analysis without centering results in higher absolute standard deviations, but the variance percentages of the higher order components are substantially reduced relative to the 1st component. We further observe that centering relocates much of the information from postfire band 2 into the 2nd component (its loading increases from 0.53 in case A to 0.73 in case C).

Burned surfaces are recorded as lower reflectance values in most of the spectral bands. Assuming they form data clusters that are clearly separated from the mean, most of the spectral information channelled into the 1st uncentered component represents features other than burned areas. Loadings of postfire band 7 increase in the 1st and 3rd centered components (from 0.29 and 0.16 to 0.39 and 0.22, respectively) and decrease in the 2nd and 4th components (from 0.47 and 0.58 to 0.42 and 0.54), which might also be interpreted as a loss of useful information from the higher order components 2 and 4.

Scaling

While the effect of centering is evident in both the eigenvalues (here, singular values) and the eigenvectors, scaling the input data affects finer details. Its influence on the variance percentages of the extracted components depends on whether the dimensions to be scaled are already centered. With uncentered input data, the variance changes very little, and only for the first two components. With centered input data, by contrast, the percentages change noticeably.
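The four SVD-based PCA versions differ only in the pre-processing applied before the decomposition. The sketch below assumes that "scaling" divides each band by its standard deviation after centering, and by its root mean square sqrt(Σx²/(n − 1)) when the data are not centered. This is how R's `scale()` behaves, but whether the study used exactly this convention is an assumption here. The sketch prints the variance percentages of all four versions for one synthetic data set. The helper names are hypothetical.

```python
import numpy as np

def preprocess(X: np.ndarray, center: bool, scale: bool) -> np.ndarray:
    """Center and/or scale the columns (bands) of a pixels x bands matrix."""
    Xp = X - X.mean(axis=0) if center else X.copy()
    if scale:
        n = Xp.shape[0]
        # After centering this is the standard deviation; without centering
        # it is the root mean square, as in R's scale(center = FALSE).
        Xp = Xp / np.sqrt(np.sum(Xp**2, axis=0) / (n - 1))
    return Xp

def variance_percentages(Xp: np.ndarray) -> np.ndarray:
    s = np.linalg.svd(Xp, compute_uv=False)
    return 100.0 * s**2 / np.sum(s**2)

rng = np.random.default_rng(3)
# Synthetic bands with different means and spreads, so scaling has an effect
means = rng.uniform(0.05, 0.4, size=6)
spreads = rng.uniform(0.01, 0.08, size=6)
X = means + rng.standard_normal((1000, 6)) * spreads

versions = {
    "uncentered-unscaled": (False, False),
    "uncentered-scaled": (False, True),
    "centered-unscaled": (True, False),
    "centered-scaled": (True, True),
}
results = {name: variance_percentages(preprocess(X, c, sc))
           for name, (c, sc) in versions.items()}
for name, pct in results.items():
    print(f"{name:>20}: {np.round(pct, 1)}")
```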

In general, scaling reduces the variance of the 1st component. The variance percentage of component 2 increases from 0.9 to 1% for the uncentered data (case A to B) and from 13 to 14.9% for the centered data (case C to D). In components 3 and 4, scaling the uncentered input data does not alter the variance percentages (0.4% and 0.2% respectively in both cases A and B). The same holds for centered input data with respect to components 5 and 6, whose variances are 0.7% and 0.2% in both cases C and D. It does not hold, however, for components 3 and 4 of the centered data, where the percentages increase from 7.5 to 7.9% and from 3.8 to 4.2% respectively (case C to D).

It is worth emphasising that scaling uncentered data prior to SVD relocates most of the information originating from both the prefire and the postfire band 2 to components other than the 3rd. For case B, the prefire band 2 loading in the 3rd component decreases from 0.52 to 0.18, and most of the prefire band 2 information is clearly channeled into the 4th component (loading −0.69). The postfire band 2 loading in the 3rd component also decreases, from 0.50 to 0.21. Thus, burned areas appear isolated in the 4th component (Fig. 9).

Selecting components with highest separabilities

Most of the highest per-class separability peaks occur in the uncentered-unscaled data set, followed by the uncentered-scaled, the centered-scaled and, lastly, the centered-unscaled data set. Yet the highest mean separabilities alone, whether per SVD version or per class, do not suffice for selecting the best components. The first and the last components are likely to be rejected: the first because its high variance mainly represents classes other than burned areas, and the last because its near-zero variance mainly captures noise. Hence, we focus on some of the higher order components while ignoring the last ones.
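The separabilities discussed here are Jeffries-Matusita (JM) distances between burned-area samples and samples of another land cover class. The minimal sketch below assumes Gaussian class distributions and the common form JM = 2(1 − e^(−B)), where B is the Bhattacharyya distance. Some authors report the square root of this quantity instead, and the exact normalisation behind the values in this article is not restated here. The function name and the synthetic samples are hypothetical.

```python
import numpy as np

def jeffries_matusita(a: np.ndarray, b: np.ndarray) -> float:
    """JM distance between two samples, assuming Gaussian classes.

    a, b: (samples,) or (samples x features) arrays, e.g. burned vs. forest
    pixels on one or more principal components. Returns a value in [0, 2].
    """
    a = np.asarray(a, dtype=float).reshape(len(a), -1)  # 1-D -> column
    b = np.asarray(b, dtype=float).reshape(len(b), -1)
    d = a.mean(axis=0) - b.mean(axis=0)
    c1 = np.atleast_2d(np.cov(a, rowvar=False))
    c2 = np.atleast_2d(np.cov(b, rowvar=False))
    c = (c1 + c2) / 2.0
    # Bhattacharyya distance between the two Gaussians
    term1 = d @ np.linalg.solve(c, d) / 8.0
    logdet_c = np.linalg.slogdet(c)[1]
    logdet_c1 = np.linalg.slogdet(c1)[1]
    logdet_c2 = np.linalg.slogdet(c2)[1]
    term2 = 0.5 * (logdet_c - 0.5 * (logdet_c1 + logdet_c2))
    return float(2.0 * (1.0 - np.exp(-(term1 + term2))))

# Usage on synthetic 1-D component scores (illustration only)
rng = np.random.default_rng(4)
burned = rng.normal(0.05, 0.01, size=200)
forest = rng.normal(0.15, 0.02, size=200)
print(round(jeffries_matusita(burned, forest), 3))  # close to 2: well separated
```

JM saturates towards its upper bound as classes separate, which is why it is preferred over unbounded measures when averaging separabilities across many class pairs.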

The mean separabilities for the subset of components of interest (components 2, 3 and 4) are summarised in Table 12. The overall best PCA version for these components is the uncentered-unscaled one. Even where centered data present relatively higher mean separabilities (in Table 12, 0.907 in case D versus 0.902 in case A for set 2a), we need to consider that a centered PCA redistributes greater amounts of the original variance, including unchanged patterns, among the higher order components.
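Ranking the versions by their mean separability over components 2, 3 and 4 amounts to a small loop. Transform the data with each pre-processing variant, take each class sample's scores on those components, compute a per-component JM distance, and average. The sketch below does this for one burned/non-burned pair on synthetic data. The 1-D JM form 2(1 − e^(−B)), the root-mean-square scaling of uncentered data, the helper names and the sample sizes are all assumptions, and the ranking it prints does not reproduce the study's results.

```python
import numpy as np

def preprocess(X, center, scale):
    """Center and/or scale bands; returns the matrix used for the SVD."""
    Xp = X - X.mean(axis=0) if center else X.copy()
    if scale:
        Xp = Xp / np.sqrt(np.sum(Xp**2, axis=0) / (Xp.shape[0] - 1))
    return Xp

def jm_1d(a, b):
    """1-D Jeffries-Matusita distance assuming Gaussian classes (range 0-2)."""
    v1, v2 = a.var(ddof=1), b.var(ddof=1)
    v = (v1 + v2) / 2.0
    bhatt = (a.mean() - b.mean()) ** 2 / (8.0 * v) + 0.5 * np.log(v / np.sqrt(v1 * v2))
    return 2.0 * (1.0 - np.exp(-bhatt))

rng = np.random.default_rng(5)
# Synthetic 6-band samples: burned pixels are darker in most bands
burned = rng.normal(0.08, 0.02, size=(300, 6))
other = rng.normal(0.20, 0.04, size=(300, 6))
X = np.vstack([burned, other])
labels = np.array([0] * len(burned) + [1] * len(other))

mean_jm = {}
for center in (False, True):
    for scale in (False, True):
        Xp = preprocess(X, center, scale)
        _, _, Vt = np.linalg.svd(Xp, full_matrices=False)
        scores = Xp @ Vt.T                        # component scores per pixel
        jm = [jm_1d(scores[labels == 0, k], scores[labels == 1, k])
              for k in (1, 2, 3)]                 # components 2, 3, 4 (0-based)
        name = f"{'centered' if center else 'uncentered'}-{'scaled' if scale else 'unscaled'}"
        mean_jm[name] = float(np.mean(jm))

best = max(mean_jm, key=mean_jm.get)
print({k: round(v, 3) for k, v in mean_jm.items()}, "->", best)
```

Because JM depends only on squared mean differences and variances, the arbitrary sign of each singular vector does not affect the result.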
Table 12

Mean separabilities for higher order principal components

Conclusions

The statistical evaluation shows that centering and scaling prior to SVD generally operate on the input multi-dimensional matrix in a non-destructive way. If performed, centering modifies the way the data clusters are intercepted by the transformed axes, effectively projecting spectral information related to unchanged patterns onto higher order components. This works against the spectral enhancement of burned area clusters. Scaling smooths out fine variations in the original data, which may neutralise minor to moderate, but potentially useful, details.

Within the framework of burned area mapping, the spectral separability estimates between burned and major land cover samples point to the uncentered-unscaled SVD-based PCA version as the most suitable one. The uncentered-scaled version is, rather expectedly, not useful, as its effects appear random. The centered-unscaled and centered-scaled versions should be tested further. Nevertheless, we generally discourage scaling the original data when it is important to retain fine details after the transformation.

Since SVD is not optimised for class separability, whether or not to center the input data matrix should be examined carefully. Even small improvements may be significant for the further analysis of the transformed data.

Endnotes

1 eigenvector decomposition

2 Distributed by the Land Processes Distributed Active Archive Center (LP DAAC), located at USGS/EROS, Sioux Falls, SD. http://lpdaac.usgs.gov

3 Local Granule ID: MOD09GQ.A2007242.h19v05.005.2007244231200.hdf

4 Local Granule ID: MOD09GQK.A2006239.h19v05.004.2006241155630

5 Available from the U.S. Geological Survey, http://www.usgs.gov.

6 Scene ID: LT51830332007248MOR00

7 Scene ID: LT51830332003237MTI01

8 Landsat Processing Details, “USGS - Landsat Missions,” https://landsat.usgs.gov/landsat-processing-details (accessed April 16, 2017)

9 Driven by the sample size restriction in GRASS-GIS' i.smap module, an implementation of the SMAP algorithm [18] to perform supervised image classification.

10 We use the term “class” in place of “group”, as used originally in the MRPP test

11 here actually singular vectors

12 vectors can be seen as loadings or weighting coefficients which determine the direction of the principal components

13 here actually singular values which are square roots of non-zero eigenvalues

14 eigenvalues represent the variance of the original data contained in the principal components

Declarations

Acknowledgments

The authors thank Aniruddha Ghosh and Georgia Kakoulaki for reading the manuscript.

Authors’ contributions

All authors contributed equally to this article. All authors read and approved the final manuscript.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

Authors’ Affiliations

(1)
Independent Researcher
(2)
Department of Environmental and Natural Resources Management, University of Patras
(3)
Institute of Environmental Studies, Kurukshetra University

References

  1. Alexandris N, Gupta S, Koutsias N. Remote sensing of burned areas via PCA. Part 1: centering, scaling and EVD vs SVD. Open Geospatial Data, Software and Standards. 2017. doi:10.1186/s40965-017-0028-1.
  2. Jolliffe IT. Principal Component Analysis, 2nd edn. Springer; 2002. 28 illustrations. http://www.springer.com/statistics/statistical+theory+and+methods/book/978-0-387-95442-4.
  3. Lu D, Mausel P, Brondizio E, Moran E. Change detection techniques. Int J Remote Sensing. 2003; 25(12):2365. doi:10.1080/0143116031000139863.
  4. Richards J, Milne A. Mapping fire burns and vegetation regeneration using principal components analysis. In: 1983 International Geoscience and Remote Sensing Symposium (IGARSS'83). San Francisco: 1983.
  5. Cadima J, Jolliffe I. On relationships between uncentred and column-centred principal component analysis. Pak J Stat. 2009; 25(4):473–503.
  6. Roy D, Lewis P, Justice C. Burned area mapping using multi-temporal moderate spatial resolution data - a bi-directional reflectance model-based expectation approach. Remote Sensing Environ. 2002; 83:263–86.
  7. Roy D, Landmann T. Characterizing the surface heterogeneity of fire effects using multi-temporal reflective wavelength data. Int J Remote Sensing. 2005; 26(19):4197–218.
  8. GRASS DT. Geographic Resources Analysis Support System (GRASS GIS) Software. Open Source Geospatial Foundation, 2008. Open Source Geospatial Foundation. http://grass.osgeo.org. Accessed 28 June 2017.
  9. QGIS DT. Quantum GIS Geographic Information System. Open Source Geospatial Foundation, 2009. Open Source Geospatial Foundation. http://qgis.osgeo.org. Accessed 28 June 2017.
  10. Warmerdam F. FWTools: Open Source GIS Binary Kit for Windows and Linux. http://fwtools.maptools.org/. Accessed 28 June 2017.
  11. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2010. R Foundation for Statistical Computing. ISBN 3-900051-07-0. http://www.R-project.org. Accessed 28 June 2017.
  12. Oksanen J, Blanchet FG, Kindt R, Legendre P, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H. Vegan: Community Ecology Package. 2010. R package version 1.17-5. http://CRAN.R-project.org/package=vegan. Accessed 28 June 2017.
  13. Bossard M, Feranec J, Otahel J, Steenmans C. CORINE land cover technical guide – Addendum 2000. European Environment Agency, Kongens Nytorv 6, DK–1050 Copenhagen K, Denmark: EEA; 2000.
  14. Mielke PWJ. The application of multivariate permutation methods based on distance functions in the earth sciences. Earth Science Rev. 1991; 31:55–71. doi:10.1016/0012-8252(91)90042-E.
  15. Sickle JV. Using mean similarity dendrograms to evaluate classifications. J Agric Biol Environ Stat. 1997; 2(4):370–88.
  16. Richards J, Jia X. Remote Sensing Digital Image Analysis. An Introduction. Third, Revised and Enlarged Edition, 3rd edn: Springer; 1999, p. 363. Hard cover. ISBN 3-540-64860-7.
  17. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2016. R Foundation for Statistical Computing. https://www.R-project.org/. Accessed 28 June 2017.
  18. Bouman CA, Shapiro M. A multiscale random field model for bayesian image segmentation. IEEE Trans Image Process. 1994; 3(2):162–77. doi:10.1109/83.277898.

Copyright

© The Author(s) 2017