1 Sample Size Estimation and Spatial Distribution of Sampling Sites in a Stratified Randomised Design

When conducting error assessment, it is important to strike a balance between the theoretical requirements and the practical reality of implementation (Congalton 1991). In the map validation process, it is therefore crucial to have the right sample size and to use the right number of spatial units in order to ensure that reliable accuracy indices can be obtained without incurring high costs.

There is no single right way to calculate the ideal sample size; in general, this task could be regarded as a process of successive approximations, in which criteria such as the availability of resources, levels of sampling error, or the desired degree of accuracy all play an important role. The expertise of the user and his/her interest in certain thematic classes are also important factors in the success of the estimation process.

An initial estimation of the most appropriate sample size can be made with the formulae used in statistical sampling. Equations for the validation of thematic maps have often been taken from the original work by Cochran (1977), and for both simple and stratified randomised designs, Stehman and Foody (2019) propose the following formula:

$$n=\frac{{z}^{2}O\left(1-O\right)}{{d}^{2}}$$

where O = accuracy expressed as a proportion (in the case of simple random sampling, O is the anticipated overall accuracy, whilst in stratified sampling it is the anticipated user’s accuracy); n = number of sampling sites; z = percentile from the standard normal distribution (z = 1.96 for a 95% confidence interval); and d = desired half-width of the confidence interval of O, which can also be expressed as \(z\,S\left(\widehat{O}\right)\).

In the case of stratified random sampling, Olofsson et al. (2014) recommend the following formula:

$$n=\frac{{\left(\sum {W}_{i}{S}_{i}\right)}^{2}}{{\left[S\left(\widehat{O}\right)\right]}^{2}+\left(\frac{1}{N}\right)\sum {W}_{i}{S}_{i}^{2}}\approx {\left(\frac{\sum {W}_{i}{S}_{i}}{S\left(\widehat{O}\right)}\right)}^{2}$$

where S(Ô) = standard error of estimated overall accuracy; Wi = mapped proportion of the area of class i; Si = standard deviation of class i, Si = \(\sqrt{{U}_{i}\left(1-{U}_{i}\right)}\); and Ui = user’s accuracy for class i.

Note that in both cases, it is necessary for the user to define certain parameters in advance, such as the permissible level of error (S(Ô)) or the user’s accuracy values. These data should be obtained from prior or approximate knowledge regarding the quality of the map or from previous experience in producing maps with similar characteristics.
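To make these formulae concrete, the following sketch implements both sample size calculations in Python. The accuracy values, class proportions and target standard error used in the examples are hypothetical inputs chosen for illustration, not values taken from any particular map.

```python
import math

def cochran_n(O, d, z=1.96):
    """Sample size for simple random sampling (Cochran 1977).
    O: anticipated overall accuracy (proportion); d: desired half-width
    of the confidence interval; z: standard normal percentile."""
    return math.ceil(z**2 * O * (1 - O) / d**2)

def stratified_n(W, U, se_target):
    """Approximate sample size for stratified random sampling
    (Olofsson et al. 2014, simplified form for large N).
    W: mapped proportion of each class; U: anticipated user's accuracy
    of each class; se_target: target standard error of overall accuracy."""
    S = [math.sqrt(u * (1 - u)) for u in U]          # per-class std. deviation
    return math.ceil((sum(w * s for w, s in zip(W, S)) / se_target) ** 2)

# Example: anticipated overall accuracy 0.85, half-width 0.05
print(cochran_n(0.85, 0.05))  # -> 196

# Four classes covering 40/30/20/10% of the map, hypothetical
# user's accuracies and a target standard error of 0.01
print(stratified_n([0.4, 0.3, 0.2, 0.1], [0.9, 0.85, 0.8, 0.75], 0.01))  # -> 1228
```

Note how the second formula rewards confident a priori accuracy estimates: classes with user’s accuracy far from 0.5 have a small standard deviation and therefore demand fewer sites.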

Sometimes it may be difficult to estimate user’s accuracy, so practical recommendations for sample size calculation may be useful. Hay (1979) proposed allocating 50 sampling sites per thematic class. Congalton (1988, 2016), based on a series of Monte Carlo simulations, also recommended allocating 50 sampling sites per thematic class but only when map extent was under 500,000 ha and there were 12 or fewer thematic classes. In more complex situations, i.e. when the map extent was over 500,000 ha or it had more than 12 classes, he proposed allocating 75–100 samples per thematic class. According to his approach, therefore, total sample size is dependent on the size of the thematic map and the number of classes it contains.
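Congalton’s rule of thumb can be expressed as a simple decision function. The thresholds (500,000 ha and 12 classes) come from the text above; the function name and the (low, high) return convention are our own.

```python
def congalton_sites_per_class(map_ha, n_classes):
    """Rule of thumb after Congalton (1988, 2016): 50 sites per class,
    or 75-100 per class for large (>500,000 ha) or complex (>12 classes)
    maps. Returns the suggested range of sites per thematic class."""
    if map_ha > 500_000 or n_classes > 12:
        return (75, 100)
    return (50, 50)

print(congalton_sites_per_class(300_000, 8))   # -> (50, 50)
print(congalton_sites_per_class(800_000, 15))  # -> (75, 100)
```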

Sampling design is another important factor to consider. Frequently used types include systematic, simple random and stratified random sampling. Traditionally, the cost or ease of fieldwork was a criterion for preferring some designs over others. With the increased availability of high-resolution imagery, in many cases it is no longer essential to obtain data directly in the field. Reference data can now be interpreted from the imagery, thus reducing costs dramatically.

Systematic sampling is easier to implement in the field, but has the disadvantage that it cannot be used to construct an unbiased variance estimator (Stehman 2009). Randomised designs can be more effective at estimating accuracy parameters (Congalton 1988) and can adapt more easily to changes in sample size without losing their probability sampling character (Stehman 2009). In the case of stratified random sampling, once the sample size has been calculated, rules must be established to allocate the sampling sites to each of the strata or thematic classes. These rules normally apply one of the following criteria: an equal number of sites in each class, a number proportional to the size of the class, or a number that depends on both the size of the class and the expected user’s accuracy for this class (optimal allocation). The allocation criterion affects the precision of some of the accuracy parameters. For example, with optimal allocation, the variance of the overall accuracy and the user’s accuracy for rare classes decreases. By contrast, with equal distribution, more precise estimates of the user’s accuracy for rare classes can be obtained, whilst in large classes, the precision decreases (Stehman 2012).

Regardless of the design chosen, a problem that sometimes arises is the under-representation of small or rare thematic classes in the sample. In other words, once the sample has been calculated and distributed, it may leave some classes with too few sites (<50). Some authors (Olofsson et al. 2014; Finegold et al. 2016) suggested a two-step solution for the specific case of stratified random sampling. First, calculate and distribute the sample according to the proportional or optimal criteria, and if any class turns out to have a small number of sampling sites (<50), then allocate 50 sites to it and recalculate the total sample size.
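The two-step adjustment suggested by Olofsson et al. (2014) and Finegold et al. (2016) can be sketched as follows. The proportional first step and the hypothetical class areas are illustrative; the sketch simply raises any under-sampled class to the minimum and reports the recalculated total.

```python
def allocate_with_minimum(areas, n_total, n_min=50):
    """Proportional allocation with a minimum number of sites per class
    (after Olofsson et al. 2014; Finegold et al. 2016).
    areas: mapped area of each class; n_total: initial total sample size."""
    total_area = sum(areas)
    # Step 1: distribute the sample in proportion to class area
    n_i = [round(n_total * a / total_area) for a in areas]
    # Step 2: raise any under-sampled class to the minimum
    n_i = [max(n, n_min) for n in n_i]
    return n_i, sum(n_i)  # adjusted allocation and recalculated total

# Hypothetical areas (ha): the last class is rare and would get only 5 sites
alloc, new_total = allocate_with_minimum([40000, 30000, 20000, 1000], 500)
print(alloc, new_total)  # -> [220, 165, 110, 50] 545
```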

Once all the different stages of the accuracy assessment process have been performed, the precision values obtained should be reviewed, e.g. the magnitude of the overall accuracy standard error or the width of the user’s accuracy confidence intervals. Even if there are some variations from the expected values, if the values obtained meet the analyst’s targets as regards accuracy, then there would be no need to repeat the analysis (Stehman and Foody 2019). If not, it would be necessary to try again, adding new sites to increase the sample size.

QGIS Exercise: To calculate sample size and to distribute sample sites using a random stratified approach

Next, we present a practical way to carry out the sampling design for obtaining reference data. In this exercise, we will estimate the sample size for a stratified random design, for which we will have to specify the expected standard error of the overall accuracy and provide an a priori estimate of the user’s accuracy values. Sometimes, these figures may be difficult to provide, in which case we can use the default values provided by the tool we will be using.

Available tools

• MapAccurAssess plugin

• Semi-Automatic Classification Plugin (SCP)

• AcATaMa plugin

There are several useful tools in QGIS for statistical sampling design, all of them external plugins: the Semi-Automatic Classification Plugin (SCP), AcATaMa and MapAccurAssess. The MapAccurAssess plugin is a trial version specifically developed in the context of this book, which is not yet available in the official QGIS repositories.

SCP, which was developed by Congedo (2020), is a toolset for the classification and validation of land-cover and land use maps. With this plugin, the sample size and the allocation to each class must be calculated externally using a spreadsheet or other software. Once the number of sites per class has been defined, the plugin allows for a random distribution per thematic class. The size of the spatial units is indicated in number of pixels. Both the map and the samples must be in raster format.

AcATaMa was developed by the Group from the Forest and Carbon Monitoring System for the validation of single LUC maps (Llano 2019). It consists of a set of tools that guide the user through a series of steps: (a) sampling design (stratified or simple); (b) sample classification; and (c) calculation of the confusion matrix and accuracy statistics. In the sample classification step, the spatial unit is a pixel (or points in the GeoPackage or shapefile format), which is not very convenient for those who prefer to use a different spatial unit, such as groups of pixels or polygons. At that stage (classification), a set of tools is enabled to zoom in on each of the samples, and four windows are created to display images of interest. An editable attribute table is also created to classify the samples.

In this exercise, we will be using MapAccurAssess, a plugin developed by the authors of this chapter, which includes several of the suggestions proposed by Olofsson et al. (2013) and Finegold et al. (2016). It is available at https://doi.org/10.5281/zenodo.5419130 with its associated documentation. For more information on the plugin, readers are referred to Chapter “About This Book”.

This plugin provides several functions for calculating sample size in a stratified random design, using Neyman’s optimal allocation to calculate the number of sampling sites in each thematic class. If, after that, any class has fewer than 50 sampling sites, it is assigned between 50 and 100 sites, depending on the complexity of the map. The result is a layer of points (shapefile) that are distributed over all the thematic categories of the map according to the stratified random design criteria. The points can be further modified to represent polygons using QGIS functions.
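As a rough sketch of what the plugin computes internally, Neyman’s optimal allocation distributes a fixed total sample in proportion to Wi·Si, so large or uncertain classes receive more sites. The class proportions, accuracies and total below are hypothetical.

```python
import math

def neyman_allocation(W, U, n_total):
    """Neyman optimal allocation: n_i proportional to W_i * S_i,
    where S_i = sqrt(U_i * (1 - U_i)).
    W: mapped proportion of each class; U: anticipated user's accuracy."""
    S = [math.sqrt(u * (1 - u)) for u in U]
    weights = [w * s for w, s in zip(W, S)]
    total = sum(weights)
    return [round(n_total * wt / total) for wt in weights]

# Hypothetical four-class map, total sample of 500 sites
print(neyman_allocation([0.4, 0.3, 0.2, 0.1], [0.9, 0.85, 0.8, 0.75], 500))
# -> [171, 153, 114, 62]
```

In a real run of the plugin, any class falling below 50 sites at this stage would then be raised, as described above.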

Materials

Marqués de Comillas Land Use Cover Map 2019

Requisites

To calculate the area of each thematic class, the LUC map must be projected in a cartographic projection (not geographic coordinates). The plugin has been tested on map projections with distance units in metres (rather than feet, for example).

Execution

Step 1

Install the MapAccurAssess plugin. All the relevant information regarding the installation of the plugin can be found in Chapter “About This Book” and the plugin’s manual, which is included in the plugin’s download.

Step 2

Go to the Plugins Menu, select the Accuracy Assessment and Random point options. Alternatively, you can click on the Random Point icon (Fig. 1).

Fig. 1
figure 1

Exercise 1. Step 2. MapAccurAssess plugin icon

Step 3

In the dialogue box in Fig. 2, fill in the map filename (LUC map of Marqués de Comillas) and modify or accept the suggested values. A minimum distance between the centres of the spatial units must be specified in order to prevent overlapping, e.g. if the spatial units are square polygons of one ha, the minimum distance between their centres must be 100 m.

Fig. 2
figure 2

Exercise 1. Step 3. MapAccurAssess plugin

The Ui values (user’s accuracy for class i) refer to an a priori estimation of the accuracy of each thematic class, which could be based on expert judgement or on previous assessments. If there are any doubts about these values, the default values can be retained. Whilst Ui can vary between 0 and 1, a value of 0.5 produces the largest sample size, and values of over 0.5 will generate a smaller one. The last stage is to select the folder where the results will be saved.
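The minimum-distance constraint specified in the dialogue (100 m between the centres of 1-ha square units) can be verified on any set of projected point coordinates with a simple pairwise check. The coordinates below are hypothetical.

```python
import itertools
import math

def min_pairwise_distance(points):
    """Smallest distance between any two sampling-site centres.
    points: list of (x, y) tuples in projected metres."""
    return min(math.dist(a, b) for a, b in itertools.combinations(points, 2))

# Hypothetical centres (projected coordinates in metres)
centres = [(0, 0), (150, 0), (0, 120), (300, 300)]
spacing = min_pairwise_distance(centres)
print(spacing >= 100)  # -> True: no two 1-ha squares can overlap
```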

Results and Comments

The results are displayed in the Record tab, and two types of output are generated and saved in the selected folder: (i) a .csv file with statistics about the thematic classes and (ii) a point shapefile where the points represent the centres of the sampling sites.

The .csv file contains a row for each thematic class and four columns showing the id, the area, the number of sampling sites estimated using Neyman’s optimal criteria and the suggested number of sampling sites, adjusted to ensure that none of the classes has fewer than 50 sites (Table 1). If the area covered by a particular class is so small that 50 sites cannot be placed on it, the adjusted value will also be less than 50. Classes like this should be merged into other similar classes.

Table 1 Results from Exercise 1. Number of sampling sites per thematic class

The vector point layer contains the spatial location of the centres of randomly distributed sampling sites (Fig. 3). Each point has two attributes, a unique identifier (id) and the thematic class value recovered from the LUC map.

Fig. 3
figure 3

Result from Exercise 1. Map showing the spatial distribution of sampling sites and the corresponding table of attributes

2 Collection of Reference Data for Assessing the Accuracy of a Thematic Map

One of the major challenges of map evaluation is to obtain a reliable reference dataset with minimal positional errors and with the same date as the LUC maps. The aim is to obtain a data subset that faithfully represents the population from which it was extracted, so as to obtain confident accuracy estimates (Stehman 2009).

The collection of reference data requires the prior definition of several aspects relating to the size of the sampling area and the characteristics of the information we want to obtain (Olofsson et al. 2014): (a) characteristics of the spatial assessment units; (b) sources of reference data; (c) labelling protocol; and finally (d) classification agreements. Spatial assessment units refer to the sampling areas where the reference and map values are compared. Traditionally, the chosen spatial unit was a pixel or a polygon, or even a group of pixels, although there is still no consensus regarding the best size (Stehman and Czaplewski 1998; Olofsson et al. 2014; Stehman and Wickham 2011). What is certain is that various factors must be taken into account. For example, when a pixel is used as the spatial unit, it must be decided whether the land-cover label will be assigned exclusively on the basis of what is observed in each individual pixel or whether the surrounding context will be taken into account, so as to reduce possible georeferencing errors. If we use a polygon or a group of pixels, it will be necessary to define their size, for example, one hectare or blocks of three by three pixels. The advantage of using an area larger than a single pixel is that the incorrect assignment of labels due to georeferencing errors is minimised. The major drawback is that each spatial unit can contain several different land-cover classes, which means that rules must be drawn up to assign the unit to the right class (Stehman and Wickham 2011). The minimum mapping unit of the map must also be taken into account, given that the spatial unit must not be smaller than the minimum mapping unit. In the end, each user will have to opt for the spatial unit size that best suits his or her purposes.

The reference source can be either observed field data or data interpreted from satellite imagery and aerial photographs. Although data collected in the field is always preferable, this method is much more expensive, and the interpretation of aerial photographs and satellite images is often regarded as an acceptable alternative. In this case, it is important to ensure that the reference data has a higher quality and resolution than the images used in the initial mapping process. The labelling protocol should be the same as that used in the mapping, i.e. the land-cover classes or types of change, and the photointerpretation criteria for labelling the sampling sites should be the same as those used when drawing the map being assessed.

When the reference data are obtained from satellite imagery, there is a degree of uncertainty associated with the level of expertise of the photointerpreter. This uncertainty can be reduced if classification criteria are established before obtaining the reference data. To minimize interpreter bias, we suggest that at least two specialists perform the class assignment independently. When different labels are assigned to the same sampling unit, a third interpreter must decide which class it should be assigned to.

It is also necessary to establish the criteria for dealing with non-ideal situations. When the spatial reference unit, defined as a set of pixels, contains several different land-cover classes we suggest, when possible, assigning the reference unit to the majority category, representing more than 50% of the area. Another complex situation could be when the reference unit contains a linear feature or corridor which is assigned to several different land-cover classes. In this case, we suggest moving the sampling site to another place in which there is less uncertainty regarding class allocation. The producer and the person(s) assessing the map should always reach agreement on such decisions and document them, so as to avoid biases in the accuracy assessment.
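The majority rule suggested above (assign the unit to a class only if that class covers more than 50% of the unit) can be sketched as follows; ambiguous units are flagged for relocation or adjudication rather than forced into a class.

```python
from collections import Counter

def majority_label(pixel_labels):
    """Assign the reference unit to the majority class if it covers
    more than 50% of the unit; otherwise return None so the unit can
    be moved or adjudicated by a third interpreter."""
    counts = Counter(pixel_labels)
    label, n = counts.most_common(1)[0]
    if n / len(pixel_labels) > 0.5:
        return label
    return None  # ambiguous unit: no class exceeds half the area

# A hypothetical 1-ha unit of 100 pixels: 62 forest (1), 30 pasture (2), 8 water (3)
print(majority_label([1] * 62 + [2] * 30 + [3] * 8))   # -> 1
print(majority_label([1] * 40 + [2] * 40 + [3] * 20))  # -> None
```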

QGIS Exercise: To collect reference data

This exercise is a guide to collecting reference data. Instead of fieldwork, high-resolution satellite imagery, available on Google Earth, is used together with various QGIS tools. The result of this exercise is a set of comparisons between land-cover observations taken from the high-resolution image (reference data) and the land covers extracted from the LUC map under evaluation. The output data are formatted to compute the error matrix and accuracy statistics. These calculations are explained in Part III of this book (Sect. 5 in chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps”).

Available tools

• SCP plugin

• AcATaMa plugin

• Vectorial menu

     Geoprocessing tools

        Buffer

• QuickMapServices plugin

• Google Earth Engine Data Catalog plugin

The Semi-Automatic Classification Plugin (SCP) and the AcATaMa plugin have a module for the collection of reference data. AcATaMa provides a multi-view interface that allows spatial units to be added and revised in an orderly manner. However, spatial units can only be one pixel in size. For its part, the process for collecting the reference data using SCP is very similar to the process that would be followed if just QGIS tools were used. However, as SCP uses its own data format (.scp), it is quite complicated to add other types of data or to use information from other platforms.

Both plugins have valuable tools that assist in the capture of reference data. However, as we intend to use larger spatial units than one pixel and wish to keep the installation of new interfaces and formats to a minimum, we will only use the basic QGIS (Buffer) tools, and other data services such as the Google Earth Engine Data catalog and QuickMapServices.

Materials

Centroids of sample sites—Marqués de Comillas (the point vector layer RandomSample.shp created in the previous exercise that contains the centres of the sample sites)

Execution

Step 1

Before data collection can begin, the size and shape of the spatial unit must be established, i.e. the area over which we will be making the comparison between the thematic map values and the reference values. The minimum mappable area of the thematic map to be used in this exercise is 1 ha, and it is generally recommended that the spatial unit should be of a similar size. Accordingly, in this exercise, we will be using square polygons of 1 ha as the spatial unit.

The point layer containing the centroids (Centroids of sample sites—Marqués de Comillas) will be used to create the spatial assessment units. To form square polygons centred on each of the points in the point layer, first use the Buffer tool in the Geoprocessing Tools menu. The input is the point layer, and the distance value depends on the desired size of the square; in this case, 50 m. Change the End cap style to “Square” and leave the rest of the parameters unchanged (Fig. 4).

Fig. 4
figure 4

Exercise 2. Step 1. Buffer
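The geometry produced by this buffer can be double-checked numerically: a 50 m buffer with a square end cap around a centroid yields a 100 m × 100 m (1 ha) polygon. The sketch below reproduces this construction in plain Python; the centroid coordinates are hypothetical.

```python
def square_corners(cx, cy, half_side=50):
    """Corner coordinates of a square assessment unit centred on (cx, cy),
    in projected metres. half_side=50 m gives a 100 m x 100 m (1 ha) square,
    matching the 50 m 'Square' end-cap buffer used in this step."""
    return [(cx - half_side, cy - half_side),
            (cx + half_side, cy - half_side),
            (cx + half_side, cy + half_side),
            (cx - half_side, cy + half_side)]

# Hypothetical centroid in projected coordinates (metres)
corners = square_corners(500_000, 1_850_000)
side = corners[1][0] - corners[0][0]
print(side, side * side / 10_000)  # -> 100 1.0  (100 m side, 1 ha)
```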

The newly created layer will have two attributes: the id and the value of the thematic class (inherited from the previous exercise). To avoid bias in the photointerpretation decision-making process, we advise hiding the class column (the value of the thematic class taken from the LUC map). To hide a column in the attribute table without deleting it definitively, right-click on the area of the attribute table headers, select the option Organize Columns, and then select the columns to hide (Fig. 5).

Fig. 5
figure 5

Exercise 2. Step 1. How to hide columns in the attribute table

Step 2

Ideally, to identify the land-cover type of each sampling unit, we would overlay the units on high spatial resolution images with the same (or a similar) date as the images used in the mapping. If such data are available, photointerpretation of the spatial units can proceed directly. However, acquiring high-resolution images to verify extensive areas can be expensive, and limited resources often restrict the use of this source of imagery.

In the following steps, we propose a partial solution to this problem based on the combined use of image servers (Google Earth, Bing, ESRI) with high spatial and temporal resolution. However, in these servers it is impossible to identify and select scenes according to their acquisition dates. One way to estimate the dates of these data is to compare them with images with higher temporal resolution, which have a known acquisition time, and for which a longer historical record is available, such as Sentinel or Landsat images. For this purpose, we will install two plugins with which we can access the high spatial resolution image servers of Google, Bing and ESRI (QuickMapServices) and Sentinel, Landsat, Aster and other images (Google Earth Engine Data Catalog). To see how to install these plugins in QGIS, see Chapter “About This Book”.

Step 3

Once QuickMapServices is installed, open the plugin and select Setting. Then select the More Services tab and click on the Get contributed pack. To add images with high spatial resolution to the QGIS Project, in the Web option in the main menu select QuickMapService, then Google and finally Google Satellite. After selecting these options, the Google Satellite images become available in the Layers menu.

Step 4

Add Sentinel-2 images from the Google Earth Engine Data Catalog plugin. The plugin requires the product type, date and cloud cover percentage to be defined (Fig. 6). The images can be saved temporarily or in permanent files. The images to be added should be dated as close as possible to the date of the evaluated map.

Fig. 6
figure 6

Exercise 2. Step 4. Google Earth Engine Data Catalog plugin

Step 5

To facilitate the collection of reference data, we suggest creating multiple windows to display images with different dates or resolutions, a good way to work when assessing land-cover change maps.

In the Layers panel, select the image and vector layer (Centroids of sample sites—Marqués de Comillas) that will be added to the second window, click on the “Manage Maps Themes” button (represented as an eye) and select “Add Theme”. Name the theme “Image 1”. Then go to “View” in the main menu and select “New Map View”. This will create a new display window. Enter the new window and do the following: (a) select the set of layers to display (Image 1) by clicking on “Manage Maps Themes”; (b) synchronise the windows by selecting the “View settings” tool and then clicking on the “Synchronise scale” option. The example in Fig. 7 shows a Google server image (2) and a true-colour Sentinel image (1).

Fig. 7
figure 7

Exercise 2. Step 5. Synchronising windows

Step 6

To capture the reference data, the “Centroids of sample sites—Marqués de Comillas” vector layer must be edited and a new field (integer type) must be added. We suggest naming it “Refer_data”.

Step 7

To make reference data collection easier, we recommend displaying the Attribute Table as a form and anchoring it to the main window, displaying only the selected data. To do this, open the Attribute Table, select the “Dock Attribute Table” icon, select the “switch to form view” button and then “show Selected Feature” (Fig. 8).

Fig. 8
figure 8

Exercise 2. Step 7. Displaying selected data

Step 8

If you have completed Steps 1–7 successfully, you can now start photointerpreting high-resolution satellite images. The exercise involves identifying the predominant land-cover type in each sampling unit and recording the corresponding code in the “Refer_data” column of the attribute table (Fig. 9). The meanings of the codes are described in the auxiliary data distributed with the Marqués de Comillas LUC map, available at https://doi.org/10.5281/zenodo.5418318.

Fig. 9
figure 9

Exercise 2. Step 8. Photointerpretation over images of different resolutions displayed on synchronised windows

Photointerpreting all the spatial units of the sample can be a lengthy process, so we suggest that you try to photointerpret at least 10 to 20 spatial units and then compare your results with the “Photo-interpreted reference dataset—Marqués de Comillas 2019”, a reference dataset prepared by the authors. It is available, together with all the book’s data, at https://doi.org/10.5281/zenodo.5418318. For more information, see Chapter “About This Book”.

Results and Comments

The result of this exercise should be a shapefile with an attribute table in which the columns class (map class code) and Refer_data (photointerpreted class code) are filled in, as shown in Fig. 10. From the attribute table, you can now calculate the error matrix and the map accuracy statistics, as is done later in Sect. 5 in chapter “Metrics Based on a Cross-Tabulation Matrix to Validate Land Use Cover Maps” of this book.

Fig. 10
figure 10

Results from Exercise 2. Table of attributes with the map codes and the photointerpreted codes

If the images used in the validation were acquired on a different date from that represented by the LUC map, this must be taken into account when assigning labels. Such a date mismatch increases the uncertainty of the reference data and should be avoided where possible.

It is worth remembering that in the absence of high spatial resolution imagery, medium resolution imagery, such as Landsat or Sentinel, can provide sufficient information to validate maps, especially small-scale maps.

Although the spatial assessment unit used in this exercise is widely used and recommended, it may contain several land-cover types. This means that clear rules should be established when deciding the category to which the unit should be allocated in these circumstances.