The RPAS images were acquired in a test area inside the Agripolis Campus of the University of Padova in Legnaro (Italy). The area measures 241 m × 508 m and contains heterogeneous land cover, including bare ground, vegetation and urban features. The ground truth was defined by direct observation. Eighteen ground control points (GCPs) were placed in the area to orient the photogrammetric image block. Their coordinates were measured with GNSS in Real Time Kinematic mode; the root mean square error (RMSE) of the measurements ranged between 0.008 and 0.011 m.
The RPAS flight was performed in November 2015 using a Red, Green, Blue and Near-Infrared (RGBI) camera carried by the senseFly eBee fixed-wing platform. The average ground sampling distance (GSD) was 4.5 cm at a flight altitude of 150 m. The images were processed with Agisoft PhotoScan, yielding an ortho-rectified mosaic with an RMSE of 0.393 pixel. The final GSD, or spatial resolution, is 6 cm, giving dimensions of 4020 × 8466 pixels and a storage size of 48.9 MB. To reduce computation time, the full dataset was resampled to a cell size of 30 cm using the nearest-neighbour algorithm, which preserves the radiometric values of the cells. The orthomosaic was then clipped to a final size of 801 × 529 pixels and a storage size of 1.21 MB (Fig 1). The RF and SVM machine learning methods were tested on the clipped image using the R/rminer package [19], available in The Comprehensive R Archive Network repository [20].
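The key property of nearest-neighbour resampling mentioned above is that output cells keep the unmodified value of a source cell. A minimal sketch (not the actual GIS workflow, which operated on the 6 cm orthomosaic): for an integer downsampling factor, nearest-neighbour reduction amounts to picking every k-th cell, so no new radiometric values are created, unlike bilinear or cubic interpolation.

```python
def nearest_neighbour_downsample(grid, factor):
    """Downsample a 2-D list-of-lists by an integer factor, keeping the
    value of the nearest (here: top-left) source cell unchanged."""
    return [row[::factor] for row in grid[::factor]]

# toy 6x6 single-band raster with value r*10 + c in row r, column c
raster = [[r * 10 + c for c in range(6)] for r in range(6)]
coarse = nearest_neighbour_downsample(raster, 5)  # e.g. 30 cm / 6 cm = 5
print(coarse)  # [[0, 5], [50, 55]] - every value also occurs in `raster`
```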
The R/rminer package (version 1.4.1 for R) aggregates 14 classification and 15 regression methods, and also includes functions for computing common accuracy metrics on the results [21, 22]. Two algorithms, Support Vector Machine (SVM) and Random Forest (RF), were compared in this study. The SVM uses a separating hyperplane as a predictor: a decision plane divides the dataset into two groups, so the set of objects has different class memberships, and the data are mapped to classes by means of a mathematical function called a kernel [23]. The RF classifier consists of a collection of trees. It randomly samples the original dataset and builds decision trees using bootstrap aggregating. The bootstrap is a statistical technique that approximates statistics of the data (e.g. average, variance, confidence interval) from the data itself; it is used when the distribution of the original dataset is not known beforehand. A complete tree with all branches is grown for each sample, and the predictors are applied to each branch [24]. Finally, the best variable obtained from the predictor is chosen, and the predictions are aggregated into a new sample. Consequently, a new sample is predicted, and the estimation of errors can be calculated at the level of iteration and aggregation [25, 26]. In this study, RF and SVM were trained using subsets ranging from 2 to 20% of the total number of raster cells. For each percentage, ten training sets were extracted using stratified random sampling, which allowed the variance of the accuracy results to be assessed for each training-set size. The control dataset is an independent classification based on photo interpretation, as shown in Fig. 2. The LULC classes are: (i) broadleaf, (ii) building, (iii) grass, (iv) headland access path, (v) road, (vi) sowed land, (vii) vegetable.
The framework of the benchmarking process is illustrated in Fig. 3. Each class occupies a different share of the area (i.e. number of pixels). Therefore, the number of pixels sampled for training was proportional to the class area (i.e. stratified sampling). Pixels falling across two polygons, and thus mixing two different classes, were discarded to limit the use of pixels with mixed spectral signatures. For each set of stratified samples, ten different training sets and ten validation sets were created. The training set is used to fit the model, which is then applied to classify the image. The validation set is the difference between the full set and the training set.
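The stratified sampling step can be sketched as follows: for each class, a number of pixels proportional to its area (pixel count) is drawn at random without replacement, and this is repeated ten times to obtain ten independent training sets. Class names and counts below are invented for illustration; the study's actual sampling was done in the R workflow.

```python
import random

def stratified_sample(pixels_by_class, fraction, rng):
    """Draw `fraction` of the pixels of each class, without replacement,
    so every class is represented in proportion to its area."""
    training = {}
    for cls, pixels in pixels_by_class.items():
        k = max(1, round(fraction * len(pixels)))  # at least one pixel
        training[cls] = rng.sample(pixels, k)
    return training

# toy image: pixel ids grouped by class, list sizes ~ class areas
classes = {"grass": list(range(500)),
           "road": list(range(500, 600)),
           "building": list(range(600, 650))}
rng = random.Random(1)
# ten independent training sets for one sampling fraction, as in the study
training_sets = [stratified_sample(classes, 0.05, rng) for _ in range(10)]
print(len(training_sets[0]["grass"]))  # 25, i.e. 5% of the 500 grass pixels
```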
The framework trains and tests each of the two methods (RF and SVM), fitting the model and applying K-fold cross-validation. The K-fold cross-validation technique splits the data into K (here, 10) subsets (folds) of equal size. K − 1 folds are used for training, and the remaining fold is used for validation. The procedure is repeated K times, so that each fold is used exactly once for validation.
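The splitting scheme described above can be sketched in a few lines (pure Python for illustration; the study itself relied on the R/rminer implementation):

```python
def k_fold_splits(items, k):
    """Yield (train, validation) pairs; each item appears in exactly
    one validation fold across the k rounds."""
    folds = [items[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        validation = folds[i]
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, validation

data = list(range(20))            # toy dataset of 20 samples
splits = list(k_fold_splits(data, 10))
print(len(splits))                # 10 rounds, as with K = 10
print(len(splits[0][0]), len(splits[0][1]))  # 18 training, 2 validation
```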
The accuracy metrics used to compare the results are the Kappa index, the classification accuracy and the classification error. Their values range from 0 to 100, and they are estimated with three different approaches: (i) using pixels from the training set and applying K-fold cross-validation, (ii) using pixels from the validation set, and (iii) using pixels from the full set.
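The three metrics can be computed directly from paired ground-truth and predicted labels, as in this illustrative sketch (the labels are invented toy data; the study obtained these metrics from rminer):

```python
from collections import Counter

def accuracy_error_kappa(truth, pred):
    """Return overall accuracy, classification error and Cohen's Kappa,
    all expressed on a 0-100 scale."""
    n = len(truth)
    p_o = sum(t == p for t, p in zip(truth, pred)) / n  # observed agreement
    t_counts, p_counts = Counter(truth), Counter(pred)
    # agreement expected by chance, from the marginal class frequencies
    p_e = sum(t_counts[c] * p_counts.get(c, 0) for c in t_counts) / n**2
    kappa = (p_o - p_e) / (1 - p_e)
    return 100 * p_o, 100 * (1 - p_o), 100 * kappa

truth = ["grass", "grass", "road", "road", "building", "grass"]
pred  = ["grass", "road",  "road", "road", "building", "grass"]
acc, err, kap = accuracy_error_kappa(truth, pred)
print(f"accuracy={acc:.1f} error={err:.1f} kappa={kap:.1f}")
# accuracy=83.3 error=16.7 kappa=73.9
```

Kappa is lower than the raw accuracy because it discounts the agreement expected by chance given the class proportions.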