5 Land cover and crop classification schemas
5.1 Introduction
Satellite images are the most comprehensive source of data about our environment; they provide essential information on global challenges. Images provide information for measuring deforestation, crop production, food security, urban footprints, water scarcity, land degradation, among other uses. In recent years, space agencies have adopted open distribution policies. Petabytes of Earth observation data are now available. Experts now have access to repeated acquisitions over the same areas; the resulting time series improve our understanding of ecological patterns and processes[1]. Instead of selecting individual images from specific dates and comparing them, researchers can track changes continuously[2]. To handle big data, scientists are developing new algorithms for image time series (for recent surveys, see [5]). These methods are data-driven and theory-limited. However, numbers do not speak for themselves [6]. Data-driven approaches without solid theories can lead to results which will not increase our knowledge [7].
Consider how experts use Earth observation data. Their input are images with resolution ranging from 5 to 500 meters, produced by satellites such as Landsat, Sentinels-1/2/3, and CBERS-4. To extract information, experts use methods that assign a label to each pixel (e.g., ‘grasslands’). Labels can represent either land cover or land use. Land cover is the observed biophysical cover of the Earth’s surface; land use concepts describe socio-economic activities\cite[8]. Thus, forest' is a type of land cover, while
corn plantation’ is a kind of land use. To support land classification, scientists have proposed ontologies and descriptive schemes [9]. We might thus ask: Are the current classification systems suitable to represent land change when working with big data? If not, which concepts are needed and how should they be applied?
In what follows, we present the prevailing consensus on classification systems: FAO’s Land Cover Classification System (LCCS)\cite[10]. We argue that LCCS does not meet the challenges posed by big data. To support our views, we consider concepts used on image time series analysis; we show these concepys are related to event recognition and are not representable in LCCS. To improve the theory behind big data, we introduce elements of a phenology based approach for land classification.
5.2 Classification systems for Earth observation data: current status
The act of classification raises philosophical questions dating as far back as Aristotle. We use an a priori conception of reality to classify the world; what we observe has to fit our categories. Words in our language describe elements of the external reality. However, geographical terms such as mountain' and
river’ are imprecise and context-dependent[13]. These ambiguities have motivated research on geospatial semantics [16]. However, building such complete ontologies is hard. Janowicz et al. [17] recognize that “geographical concepts are situated and context-dependent, can be described from different, equally valid, points of view, and ontological commitments are arbitrary to a large extent”. Work on classification systems has shifted. Rather than using a single ontology, the current consensus argues for domain ontologies based on a common foundational ontology. These domain ontologies are means of making concepts of specific disciplines explicit and better communicating them [19].
The semantics of Earth observation data are constrained by classification systems. Experts agree on what are the possible descriptions of the objects in the image (e.g., “forest”, “river”, “pasture”). Each pixel of the image is then labeled using visual or automated interpretation. As an example, for countries reporting greenhouse gas inventories, the International Panel of Climate Change (IPCC) restricts the top-level land classes to ‘forest’, ‘cropland’, ‘grassland’, ‘wetlands’, ‘settlements’, and ‘others’. This approach is too simplistic. Sasaki and Putz [20] criticize the IPCC base classes for inducing wrong assessments for ecological and biodiversity conservation. The IPCC classes are an example where pre-conceived rules collide with the diversity of the world’s ecosystems.
Since land classification provides essential information about our environment, many GIScience researchers have addressed the subject of land use and land cover semantics [23]. They investigated consistency of classification systems [24], semantic similarity between terms used by different systems [25], and disagreements between results [26]. The current consensus favors ontologies aiming at unambiguous definitions of land cover classes, such as the FAO Land Cover Classification System (LCCS) [10]. For this reason, it is important to discuss whether LCCS works well with big EO data.
FAO has developed the Land Cover Classification System (LCCS) “to provide a consistent framework for the classification and mapping of land cover” [27]. LCCS is a hierarchical system. At its highest level, LCCS has eight major land cover types:
- Cultivated and managed terrestrial areas.
- Natural and semi-natural terrestrial vegetation.
- Cultivated aquatic or regularly flooded areas.
- Natural and semi-natural aquatic or regularly flooded vegetation.
- Artificial surfaces and associated areas.
- Bare areas.
- Artificial water bodies, snow, and ice.
- Natural water bodies, snow, and ice.
The division on eight classes considers three criteria: presence of vegetation, edaphic conditions, and artificiality of cover [27]. Specialization of top-level LCCS classes uses properties such as life form, tree height, and vegetation density, setting pre-defined limits (e.g., “tree height > 10 meters”). These subdivisions are ad hoc and application-dependent, leading to a combinational explosion with dozens or even hundreds of subclasses [10]. Such high expressive power can lead to incompatible LCCS-based class hierarchies [[24]}.
LCCS is a landmark initiative; it provides a basis for a common understanding of land cover concepts. Many global and regional land mapping products use LCCS, including GLOBCOVER [28] and ESA CCI Land Cover [29]. However, LCCS makes assumptions which limit its use with big data:
- LCCS describes land properties based only on land cover types, disregarding land use. For example, LCCS does not distinguish
pasture' from
natural grasslands’; it labels both as herbaceous land cover types. - The LCCS hierarchy uses hard boundaries between its subclasses. At each level of the hierarchy, properties of subclasses use fixed values (e.g., “sparse forests have between 10% and 30% of trees”). Real-world class boundaries do not fit into such strict definitions. When doing data analysis with machine learning, boundaries between classes are data-dependent and cannot be set a priori [30].
- Classification in LCCS has no temporal reference. LCCS assumes that subtype properties (e.g., percent of tree cover) are detectable at the moment of classification. These properties do not refer to past or future values. Land use and land cover types whose values require time references (e.g., “forest land cleared in the last decade”) are not representable in LCCS.
For example, the UNFCCC Reduction of Emissions by Deforestation and Degradation initiative (REDD+) requires representing and measuring forest dynamics [31]. Static and rigid definitions of “forest” used by LCCS cannot represent concepts such as `forest degradation’ [32]. Forest degradation happens when a natural forest loses part of its biodiversity and its tree cover. It is not a stable state but an intermediary situation that can lead to different medium-term outcomes. One can restore a degraded forest; degradation may continue and lead to complete loss of forest cover. Whatever the case, LCCS lacks explicit temporal information to capture forest degradation and thus support initiatives such as REDD+. Therefore, LCCS is thus not fit for many critical applications of EO data.
5.3 Elements of a phenology-based classification schemas
To represent change in geographical space, GIScience authors distinguish between continuants and occurrents [36]. Continuants refer to entities that “endure through time even while undergoing different sorts of changes” [33]. The Amazon Forest and the city of Brasilia are continuants. Occurrents happen in a well-defined period and may have different stages during this time. Cutting down a forest area, cultivating a crop in a season, and building a road are occurrents. Objects are associated to continuants and events to occurrents.
Atemporal classification systems such as LCCS refer only to properties of continuants. One can state facts such as “this area has 30% forest cover” using LCCS, but cannot assert that “this area lost 70% of its forest in the last two years”. To convey change, classification systems for big data need to include occurrents. In what follows, we discuss concepts used in the analysis of satellite image time series. These time series are extracted from organized collections of Earth observation data covering a geographical area in regular temporal intervals.
5.4 The key role of time series
Since remote sensing satellites revisit the same place, we can calibrate their images so that measures of the same place at different times are comparable (Figure \(\ref{fig:sits_a}\)). These observations can be organized so that each measure from the sensor maps to a three-dimensional array in space-time. From a data analysis perspective, each pixel location \((x, y)\) at consecutive times, \(t_1,...,t_m\), makes up a satellite image time series (SITS), such as the one in Figure \(\ref{fig:sits_b}\). From these time series, we can extract land-use and land-cover change information. In Figure \(\ref{fig:sits_b}\), after the forest was cut in 2002, the area was used for cattle raising (pasture) for three years, during 2002 to 2008, then turned into cropland.
References{-}