Intelligent mapping of irrigated areas from Landsat 8 images using transfer learning

The lack of reliable and up-to-date data in developing countries is a major obstacle to sustainable development. In Morocco, where groundwater withdrawals by farmers are very intensive and informal, maps describing and monitoring the extension of irrigated areas are scarce and labor-intensive to obtain. In this paper, a novel transfer learning approach is proposed to map irrigated areas at different stages of an agricultural cycle from Landsat 8 images. The results show better performance than traditional machine learning algorithms. On a small dataset, we initially tested three well-known deep learning architectures (SegNet, DenseNet and UNet), but the results were not satisfactory. To achieve higher performance, and after a phase in which different configurations were tested, we adopted as a baseline a transfer learning architecture combining UNet with a ResNet50 backbone (trained on the 2012 ILSVRC ImageNet dataset). In the first part of this study, we compared three optimization methods: Adam and two variants of Stochastic Gradient Descent (SGD) associated with two techniques (Cyclical Learning Rate and Warm Restart) used to find the optimal learning rate. We then tested the impact of data augmentation on the overall accuracies. Data augmentation improved the overall accuracy for the three methods. For the Adam-based method, the overall accuracy increased from 94% to 97%, with a mean IoU of 0.79 (for all land cover classes) and 0.86 for the irrigated areas class. For the SGD-based methods, the overall accuracy increased from 91% to 94%, with a mean IoU of 0.75 (for all land cover classes) and 0.82 for the irrigated areas class. As we are interested in having maps of irrigated areas at different key periods of the agricultural cycle, we also explored, in the second part of this study, the temporal generalization of the best model.


INTRODUCTION
Deep learning (DL) is significantly impacting many areas of research, including computer vision, image processing, and remote sensing (Ball et al., 2018), thanks to the increased availability of data and computational resources (Zhu et al. 2017).
Generally, traditional deep networks (DN) are trained using large image datasets. However, in remote sensing the available datasets are typically very limited (Ball et al. 2018). In such contexts, with small to medium training datasets, architectures such as UNet and SegNet are frequently preferred (Younis and Keedwell 2019).
Fortunately, open high-resolution satellite imagery, such as from USGS and Copernicus, is becoming increasingly available. Such imagery can be used to extract useful insights to inform policy decisions in water resources management and feed datasets for training new deep learning architectures.
Different approaches are used to address the lack of training data: 1) unsupervised learning, 2) generative adversarial networks (GANs) and 3) transfer learning (Ball et al. 2018). In this paper, we examine transfer learning, which is still an active research area in remote sensing according to Tuia et al. (2016).
The goal of the next sections is to provide a brief overview of some existing architectures based on DN and the specific objectives of this study.

Deep Learning Approaches for Semantic Image Segmentation
DL architectures have been successfully applied to pixel-based classification of high-resolution satellite images, outperforming standard image classifiers. It has been shown that they can achieve far better classification performance (Zhu et al. 2017;Liu et al. 2018).
Despite the lack of training data, deep networks have proven effective at extracting mid- and high-level abstract and discriminative semantic features from images. Recent studies indicate that the feature representations learned by CNNs are highly effective in semantic segmentation (Long et al. 2015;Khryashchev et al. 2018).
LeNet-5 is the reference structure of a CNN. It was developed by LeCun et al. (1998). It consists of two convolutional layers followed by three fully connected layers.
Semantic image segmentation is defined as the task of clustering together the parts of an image that belong to the same object class (Thoma 2016).
So, to produce a land cover map, well-known deep architectures such as SegNet, DenseNet and UNet can be used. The latter initially received a lot of interest for the segmentation of biomedical images using a reduced dataset, and then for many applications in remote sensing (Iglovikov et al. 2017;Feng et al. 2019).
The UNet architecture (Ronneberger et al. 2015) resembles a convolutional autoencoder. It uses skip connections (Figure 1) to reinject the feature maps of the encoder into the decoding phase, and transposed convolutions to recover the original image resolution. These approaches use as the encoder the convolutional layers of CNNs pretrained for classification, such as VGG-16. The advantage of these symmetric approaches is that they can generate predictions at the same spatial resolution as the input image (Vooban 2017). In addition, given the better performance of the ResNet and DenseNet models in object recognition, researchers have also tried to adapt these architectures for semantic segmentation. Thanks to the increase in GPU computing capacity, Wu et al. (2016) proposed a first approach for ResNet.
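To make the role of the skip connections concrete, the following is a minimal NumPy sketch of one decoder step. It is only an illustration: nearest-neighbour upsampling stands in for the learned transposed convolution, and the function names `upsample2x` and `skip_connect` are hypothetical, not taken from any cited implementation.

```python
import numpy as np


def upsample2x(features):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map.

    Assumption: this replaces the learned transposed convolution
    that a real UNet decoder would use.
    """
    return features.repeat(2, axis=0).repeat(2, axis=1)


def skip_connect(decoder_features, encoder_features):
    """Upsample the decoder map and concatenate the encoder map on channels."""
    up = upsample2x(decoder_features)
    if up.shape[:2] != encoder_features.shape[:2]:
        raise ValueError("spatial sizes must match after upsampling")
    # Encoder features are reinjected next to the upsampled decoder features.
    return np.concatenate([encoder_features, up], axis=-1)


# Illustrative shapes only: a coarse decoder map and a finer encoder map.
decoder = np.zeros((14, 14, 8), dtype=np.float32)
encoder = np.ones((28, 28, 4), dtype=np.float32)
merged = skip_connect(decoder, encoder)
```

In a trained network, the concatenated map would then pass through further convolutions, so that fine spatial detail from the encoder can sharpen the decoder output.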
Recent research has shown that deeper architectures, such as deep residual networks (ResNets) (He et al. 2016), can gain accuracy from increasing the depth of the network. These residual networks 1) are substantially deeper (Table 1), 2) have fewer parameters, 3) are easier to optimize, and 4) can gain accuracy from considerably increased depth (Bilinski and Prisacariu 2018).
[Table 1: 3.8×10^9, 7.6×10^9, 11.3×10^9]
A fully convolutional version of DenseNet (Jégou et al. 2017) has also been proposed for semantic segmentation, combining an encoder-decoder approach with UNet-inspired activation passing.
In SegNet, Badrinarayanan et al. (2017) proposed a convolutional encoder-decoder architecture for image segmentation. Similar to the deconvolution network, it consists of an encoder network, which is topologically identical to the 13 convolutional layers of the VGG16 network, and a corresponding decoder network followed by a pixelwise classification layer.
Fully convolutional network (FCN) approaches for semantic segmentation of remotely sensed images have become much more popular. FCNs infer a prediction for every pixel of the image in a single pass, which also avoids patch-by-patch classification. This drastically reduces computation times without requiring unsupervised pre-segmentation.
Generally, deep learning methods using FCNs have emerged within a few years as the new state of the art for many remote sensing image interpretation tasks (Liu et al. 2019), including on UAV images (Figure 2).

Related Works
The first applications of FCNs to optical aerial data appeared several years ago (Paisitkriangkrai et al. 2015;Sherrah 2016). Since Mnih (2013), where FCN was used to extract roads and buildings in aerial images from image patches, these approaches have been successfully used on many Very High-Resolution satellite data (Lagrange et al. 2015).
Papadomanolaki et al. (2016) used Convolutional Neural Networks (CNN) to classify the SAT-4/SAT-6 dataset provided by the US National Agriculture Imagery Program. They compared different deep architectures (AlexNet, VGG, etc.).
Some studies (Xu et al. 2018;Iglovikov et al. 2017) start from UNet-based and adapted architectures to extract buildings, urban patterns and other land cover classes from satellite images.
Also, in (Audebert et al. 2016;Audebert et al. 2017) the authors train variants of the SegNet architecture on remotely sensed imagery over an urban area. The goal was to study different strategies for obtaining an accurate semantic segmentation.
For hyperspectral images (HSI), Lin et al. (2013) introduced, for the first time, the concept of deep learning in a new framework of spectral-spatial feature extraction. Pirotti et al. (2016), using mainly free ESA and USGS images, benchmarked nine machine learning algorithms (Random Forest, SVM, Neural Networks, etc.). These models were tested for accuracy and speed in training and classification of land-cover classes in a Sentinel-2 dataset.
Research on semantic segmentation includes some works that deal with training data scarcity, as in our paper. For the case of lacking training data, Acquarelli et al. (2018) proposed a convolutional neural network with a single hidden layer that can achieve state-of-the-art performance by using three tricks: a spectral-locality-aware regularization term, smoothing and label-based data augmentation. (Vooban 2017;Jiang 2017) used only 25 labelled satellite images for training, while Younis and Keedwell (2019) modified the structure of the SegNet architecture (Figure 3) and trained it using 6 RGB images. The results of this study were also promising. As mentioned before, transfer learning, which seeks to transfer what is learned from one domain to another (Tuia et al. 2016), can be used to fine-tune pretrained networks on a small training dataset. It can improve the learning of a target predictive function using knowledge from a source predictive function (Pan and Yang 2010).
When working on a new target domain, two options for transfer learning can be explored (Ghazi et al. 2017): using the pre-trained network as is to learn new features, or fine-tuning its weights.
Generally, most remote sensing-based transfer learning works are focused on updating the weights of a DL solution from another context to the current task based on available training data.
Yang et al. (2016) used dual CNNs and transfer learning as inputs to a fully connected layer for classification. Lower and middle layers were trained on an external dataset, whereas the top layers were trained on the available training samples. Othman et al. (2016) used a transfer learning architecture trained on the ILSVRC-12 dataset. The trained system was then applied to the UC Merced Land Use and Banja-Luka datasets.
Machine learning mapping approaches require a lot of effort to be invested in training the models. Wurm et al. (2019) analyze the transfer learning capabilities of FCNs for slum mapping in various satellite images. A model trained on very high-resolution optical satellite imagery from QuickBird is transferred to Sentinel-2 and TerraSAR-X data.
Recently, advanced methods based on domain-specific transfer learning have been proposed for semantic segmentation of remote sensing data. Panboonyuen et al. (2019) proposed a novel CNN called global convolutional network (GCN), which can capture different resolutions by extracting multi-scale features from different stages of the network.
In summary, automated land cover mapping based on satellite images is a great source of information for many fields such as land management, forestry and agriculture. In Morocco, where groundwater withdrawals by farmers are very numerous and informal, the need for information on the location of irrigated areas has become a strategic objective.
Studies on the use of remote sensing for mapping irrigated areas in Morocco are uncommon. In (Merdas et al. 2015) we used low-resolution data (MODIS) and a time series of NDVIs to map irrigated areas using a pixel-based approach (86.29% of well-classified pixels). This study was based on previous studies (Ozdogan et al. 2010). In (Benbahria et al. 2018), a new automatic mapping framework was proposed based on Landsat 8 (L8) time series images and a pixelwise Random Forest classification algorithm. Zhang et al. (2018) tested an approach based on well-known image classification convolutional neural networks to automatically detect only center pivot irrigation systems from Landsat 5 TM images.
Our main objective in this paper is to evaluate recent transfer learning and semantic segmentation approaches for monitoring the extension of irrigated areas at different stages of an agricultural cycle.
As a preliminary step, we tested known deep learning architectures such as SegNet, DenseNet and UNet on our training dataset, and then a transfer learning architecture combining UNet with ResNet50 as backbone. Three specific objectives are set: 1) to compare the use of three optimization methods (Adam and two variants of Stochastic Gradient Descent (SGD)); 2) to evaluate the impact of data augmentation on the overall accuracies of the three methods; 3) to assess the temporal generalization of the model to imagery collected at different times and under different atmospheric conditions (in the same agricultural cycle).

Study Area
Experiments are conducted on the Gharb site, which is located in the north-west of Morocco (Figure 4). It currently has 190 000 ha of irrigated area. About 80% of the rain falls between November and April. The dry period is usually between June and September.

Used Data
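Three cloud-free L8 scenes (October 2015, May 2016 and August 2016) were acquired and pre-processed. They were chosen to cover three key periods in the agricultural cycle (autumn, spring and summer).
The L8 images are used as three-channel false-colour composites in which the near-infrared band is displayed as red, the red band as green and the green band as blue (R = NIR, G = R, B = G). In addition to these images, ground truth data were collected and used for learning and validation.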
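The band arrangement described above (R = NIR, G = R, B = G) can be sketched as follows. This is a minimal illustration, assuming the three bands are already loaded as NumPy arrays of equal shape; for Landsat 8 OLI these would be bands 5, 4 and 3. The percentile stretch and the function names are illustrative assumptions, since the pre-processing applied in this study is not detailed here.

```python
import numpy as np


def false_colour_composite(nir, red, green):
    """Stack three single-band arrays as (H, W, 3) in the order NIR, red, green."""
    bands = [np.asarray(b, dtype=np.float32) for b in (nir, red, green)]
    if not (bands[0].shape == bands[1].shape == bands[2].shape):
        raise ValueError("all bands must share the same shape")
    return np.stack(bands, axis=-1)


def stretch_to_uint8(image, low=2.0, high=98.0):
    """Per-channel percentile stretch to 0..255 (an assumed display choice)."""
    out = np.empty(image.shape, dtype=np.uint8)
    for c in range(image.shape[-1]):
        lo, hi = np.percentile(image[..., c], [low, high])
        scale = max(hi - lo, 1e-6)  # guard against constant channels
        scaled = np.clip((image[..., c] - lo) / scale, 0.0, 1.0)
        out[..., c] = np.round(scaled * 255).astype(np.uint8)
    return out


# Synthetic bands, for illustration only.
nir = np.arange(16).reshape(4, 4)
red = nir * 2
green = nir * 3
composite = false_colour_composite(nir, red, green)
display = stretch_to_uint8(composite)
```

With this ordering, healthy vegetation, which reflects strongly in the near infrared, appears in red tones, which helps the photo-interpretation of irrigated plots.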
For collecting reference data, Corine Land Cover (CLC) based classification scheme is used by adapting it to the Moroccan context. Seven (7) classes were selected (Table 2) to appear in the final land cover maps. The training image dataset includes 50 images (128x128 pixels each with 3 channels (NIR, R, G)) with corresponding ground truth masks. The ground truth was collected by photointerpretation of the images and existing recent data (Ortho-images SPOT6 and 7 with 1.5m resolution and Google Earth).
For testing the models, initially three datasets of ten manually labelled images were prepared and then augmented to make tests on 100 images for each period of the agricultural cycle.
From the first image, we extracted 50 well-distributed patches (128x128 pixels each, with 3 channels (NIR, R, G)) covering all land cover classes. These patches were photo-interpreted and the vector results were converted to raster masks. Finally, all the generated images were resized to 224x224 to suit the training and validation of the tested DL architecture (UNet with ResNet50 as backbone).
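The patch preparation described above can be sketched in a few lines of NumPy. This is a hedged illustration, not the processing chain actually used: the function names are hypothetical, and nearest-neighbour resampling is an assumption chosen here because it keeps the mask values within the original class labels when going from 128x128 to 224x224.

```python
import numpy as np


def extract_patch(array, row, col, size=128):
    """Cut a size x size window whose top-left corner is (row, col)."""
    patch = array[row:row + size, col:col + size]
    if patch.shape[0] != size or patch.shape[1] != size:
        raise ValueError("patch falls outside the image")
    return patch


def resize_nearest(array, out_h=224, out_w=224):
    """Nearest-neighbour resize of an (H, W) or (H, W, C) array."""
    in_h, in_w = array.shape[:2]
    rows = np.arange(out_h) * in_h // out_h
    cols = np.arange(out_w) * in_w // out_w
    return array[rows[:, None], cols[None, :]]


# Synthetic scene and mask with 7 classes, for illustration only.
rng = np.random.default_rng(0)
scene = rng.random((300, 300, 3)).astype(np.float32)
labels = rng.integers(0, 7, size=(300, 300))

image_patch = resize_nearest(extract_patch(scene, 10, 20))
mask_patch = resize_nearest(extract_patch(labels, 10, 20))
```

Interpolating the masks bilinearly would instead create fractional values between class codes, which is why a label-preserving resampling is assumed in this sketch.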

Evaluating Deep Learning Architectures for Irrigated Areas Mapping
Although it was not designed specifically for satellite images, the UNet architecture is increasingly applied in remote sensing. As explained later, we use this structure as a base reference, but combined with ResNet.
In this paper, due to the limited training dataset, we tested a UNet based transfer learning architecture (Yakubovskiy 2018) to perform land cover semantic segmentation of L8 images.
Our multi class segmentation is based on UNet with ResNet50 backbone (which has weights trained on 2012 ILSVRC ImageNet).
In the first part of this study, this architecture was experimentally assessed through three different use cases depending on the used optimization methods: Adam (Diederik and Jimmy 2014) and two variants of Stochastic Gradient Descent (SGD).
An important part of developing a DL architecture is the selection of hyperparameters. Different methods exist for choosing such values:
1. Manual: hyperparameters are set through trial and error until a usable set of parameters is found.
2. Search algorithms: a grid search or random search algorithm can be deployed. Multiple models are then trained using combinations of the parameter values made available in the given ranges (Bergstra and Bengio, 2012).
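The difference between the two search strategies listed above can be sketched as follows. This is a toy illustration only: the search space, the helper names and the scoring function are hypothetical, and in practice the score of a candidate would come from training and validating a model with those settings.

```python
import itertools
import random


def grid_candidates(space):
    """Enumerate every combination of the listed hyperparameter values."""
    keys = sorted(space)
    combos = itertools.product(*(space[k] for k in keys))
    return [dict(zip(keys, values)) for values in combos]


def random_candidates(space, n, seed=0):
    """Draw n random combinations from the same space."""
    rng = random.Random(seed)
    return [{k: rng.choice(space[k]) for k in sorted(space)} for _ in range(n)]


def best_candidate(candidates, score_fn):
    """Return the candidate with the highest score."""
    return max(candidates, key=score_fn)


# Hypothetical search space; the batch sizes echo those tried in this study.
space = {"batch_size": [8, 16, 24, 32], "max_lr": [0.01, 0.1]}
grid = grid_candidates(space)
sample = random_candidates(space, n=5)


def toy_score(candidate):
    """Placeholder score; a real one would be a validation metric."""
    return -abs(candidate["batch_size"] - 24) - candidate["max_lr"]


best = best_candidate(grid, toy_score)
```

Grid search evaluates all combinations, whereas random search evaluates only a sample of them, which is the trade-off discussed by Bergstra and Bengio (2012).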
The learning rate (LR) range (base_lr, max_lr) was first determined by trial and error, by observing the variations of the loss value. Then two techniques (associated with SGD) were tested: Cyclical Learning Rate (CLR) (Kenstler 2018) and Warm Restart (SGDR) (Loshchilov and Hutter 2016).
According to Smith (2017), letting the learning rate vary cyclically between reasonable bounds (Figure 5) can increase the accuracy of the model in fewer steps and escape saddle points more efficiently. SGDR (Figure 6) is similar to CLR, but applies an aggressive annealing strategy: the learning rate is decayed during training and periodically reset to its initial value, which improves performance (Loshchilov and Hutter 2016).
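The two learning rate policies can be written as simple functions of the training step. The sketch below follows the triangular policy of Smith (2017) and the cosine annealing with warm restarts of Loshchilov and Hutter (2016). The bounds 0.001 and 0.01 are those used in this study, whereas `step_size`, `cycle_length` and `mult_factor` are illustrative assumptions and not the values used in our experiments.

```python
import math


def triangular_clr(iteration, base_lr=0.001, max_lr=0.01, step_size=100):
    """Triangular cyclical learning rate.

    The rate rises linearly from base_lr to max_lr over step_size
    iterations, then falls back over the next step_size iterations.
    """
    cycle = math.floor(1 + iteration / (2 * step_size))
    x = abs(iteration / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)


def sgdr_lr(epoch, min_lr=0.001, max_lr=0.01, cycle_length=10, mult_factor=2):
    """Cosine annealing with warm restarts.

    The rate decays from max_lr towards min_lr along a half cosine, then
    jumps back to max_lr; each cycle is mult_factor times longer.
    """
    t_cur, t_i = epoch, cycle_length
    while t_cur >= t_i:  # locate the position inside the current cycle
        t_cur -= t_i
        t_i *= mult_factor
    return min_lr + 0.5 * (max_lr - min_lr) * (1 + math.cos(math.pi * t_cur / t_i))


clr_schedule = [triangular_clr(i) for i in range(0, 401, 50)]
sgdr_schedule = [sgdr_lr(e) for e in range(0, 30)]
```

In Keras, such functions could be applied through a callback that sets the optimizer learning rate at each step or epoch. The main practical difference is that CLR returns to the upper bound gradually, while SGDR jumps back to it at each restart.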

Figure 6. SGD with restart and cosine annealing
In the second part of this study, we assess (for the three adopted architectures) the impact of data augmentation. We considerably augmented the initial training dataset offline (outside the training process) by combining 90-degree rotations with top-bottom and left-right flips (Bloice, 2017), which increased the training dataset 10 times (500 images with associated ground truth).
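The geometric transformations mentioned above can be illustrated with NumPy. The sketch below is a deterministic stand-in, not the augmentation pipeline actually used: it enumerates the eight distinct combinations of 90-degree rotations and flips, applying the same transformation to the image and to its mask. It therefore does not by itself reproduce the tenfold increase reported above. The function name is hypothetical.

```python
import numpy as np


def dihedral_variants(image, mask):
    """Return the 8 rotation/flip variants of an (image, mask) pair.

    A top-bottom flip is equivalent to a 180-degree rotation followed by a
    left-right flip, so it is already covered by this enumeration.
    """
    variants = []
    for k in range(4):
        rot_img = np.rot90(image, k, axes=(0, 1))
        rot_msk = np.rot90(mask, k, axes=(0, 1))
        variants.append((rot_img, rot_msk))
        variants.append((np.fliplr(rot_img), np.fliplr(rot_msk)))
    return variants


# Small asymmetric example so that all variants differ.
image = np.arange(4 * 4 * 3).reshape(4, 4, 3)
mask = np.arange(16).reshape(4, 4)
pairs = dihedral_variants(image, mask)
```

Applying identical transformations to the image and its mask is essential, otherwise the ground truth no longer lines up with the pixels it labels.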
In addition, we also assess the temporal generalization of the best model (from the three architectures) learned from the first Landsat 8 image acquired at the beginning of the agricultural cycle (Autumn period). The idea is to explore the feasibility of predicting (using the same model) the location of irrigated areas at some specific times in the same agricultural cycle (Spring and Summer periods).
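As evaluation metrics for image segmentation (Liu et al. 2019), we use Pixel Accuracy (Pacc) (Eq. 1) and Intersection over Union (IoU) (Eq. 2), the latter averaged over classes to give the mean IoU:
$$P_{acc} = \frac{\sum_i n_{ii}}{\sum_i t_i} \quad (1)$$
$$IoU = \frac{1}{n_{cl}} \sum_i \frac{n_{ii}}{t_i + \sum_j n_{ji} - n_{ii}} \quad (2)$$
where n_{ij} is the number of pixels of class i predicted to belong to class j, n_{cl} is the number of classes, and t_i = \sum_j n_{ij} is the total number of pixels of class i.
Our architectures were implemented with the Keras API on a TensorFlow backend. All models were trained and tested using Google Colaboratory.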
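Both metrics can be computed from a confusion matrix whose entry (i, j) counts the pixels of class i predicted as class j. The following NumPy sketch illustrates this under that convention. The function names are hypothetical, and ignoring classes that are absent from both the reference and the prediction when averaging is an assumption of this sketch.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Entry (i, j) is the number of pixels of class i predicted as class j."""
    y_true = np.asarray(y_true).ravel()
    y_pred = np.asarray(y_pred).ravel()
    index = y_true * n_classes + y_pred
    counts = np.bincount(index, minlength=n_classes ** 2)
    return counts.reshape(n_classes, n_classes)


def pixel_accuracy(cm):
    """Correctly classified pixels divided by all pixels."""
    return np.trace(cm) / cm.sum()


def per_class_iou(cm):
    """Intersection over union for each class (NaN if the class is absent)."""
    tp = np.diag(cm).astype(float)
    union = cm.sum(axis=1) + cm.sum(axis=0) - tp
    return np.where(union > 0, tp / np.maximum(union, 1), np.nan)


def mean_iou(cm):
    """Average of the per-class IoU values, skipping absent classes."""
    return float(np.nanmean(per_class_iou(cm)))


# Tiny two-class example.
cm = confusion_matrix([0, 0, 1, 1], [0, 1, 1, 1], n_classes=2)
acc = pixel_accuracy(cm)
ious = per_class_iou(cm)
```

In this small example, three of the four pixels are correct, and the second class has a higher IoU than the first because all of its reference pixels are recovered.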

Generating Land Cover Maps without Data Augmentation
On the limited training dataset, we tested three of the known DL architectures: SegNet, DenseNet and UNet. The overall accuracy obtained did not exceed 47%.
To improve the performance, we relied on the UNet architecture with ResNet50 as backbone and used a loss based on the categorical cross entropy. For all models, we used a minibatch size of 32 images (this choice resulted from trial-and-error testing of four values: 8, 16, 24 and 32) and learning rates bounded between base_lr=0.001 and max_lr=0.01. The results for the three use cases, which differ in the optimization method, are as follows. For the Adam method, the results show a clear improvement, with an overall accuracy fluctuating around 94% (76% on the validation dataset) (Figure 7). The two variants of SGD lead to a relatively smooth increase, reaching an overall accuracy of 91% (72% on the validation dataset) (Figure 8), but with a validation loss that does not fall below 1. With a mean IoU of 51% (63% for the Irrigated land class) and less confusion among classes, Adam gives the best result (Table 3). Some predictions on test images (Figure 9) and on the whole first image (Figure 10) are shown hereafter.
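As a reminder of what the loss measures, the categorical cross entropy can be written in a few lines of NumPy. This is a sketch for illustration, not the Keras implementation used in our experiments. The function names are hypothetical, and the clipping constant `eps` is an assumption added for numerical safety.

```python
import numpy as np


def one_hot(mask, n_classes):
    """Convert an (H, W) integer mask to an (H, W, n_classes) one-hot array."""
    return np.eye(n_classes, dtype=np.float32)[mask]


def categorical_cross_entropy(target, probs, eps=1e-7):
    """Mean over pixels of -sum_c target_c * log(probs_c)."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(target * np.log(probs), axis=-1)))


# Seven land cover classes, as in this study; the 2x2 mask is synthetic.
n_classes = 7
mask = np.array([[0, 3], [6, 3]])
target = one_hot(mask, n_classes)

uniform = np.full((2, 2, n_classes), 1.0 / n_classes)
loss_uniform = categorical_cross_entropy(target, uniform)
loss_perfect = categorical_cross_entropy(target, target)
```

A model that assigns equal probability to the seven classes has a loss of ln(7), about 1.95, which gives a reference point for reading loss curves.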

Generating Land Cover Maps with Data Augmentation
In the deep learning field, it is commonly known that a large amount of data is required to properly train a network. Unfortunately, obtaining a suitable amount of data together with the corresponding ground truth is not always possible, or is time consuming.
In this second part of the study, we ran the same three models on the augmented training dataset to assess the impact on performance and accuracies.
After artificially augmenting the training dataset, and with fewer epochs (200), the overall accuracy increased from 94% to 97% for the Adam-based method (Figure 11) and from 91% to 94% for the SGD-based methods (Figure 12). The overall accuracy on the validation dataset also increased, reaching 92% for the Adam-based method and 91% for the SGD-based methods. The models (Table 4) perform well (less confusion among classes). The Adam-based method outperforms the other methods, especially for the water and irrigated land classes. The latter is mapped with a high degree of performance (IoU of 87%). On Google Colaboratory with GPU processing, training and testing showed the same speed performance for all models.
Some predictions produced on test images (Figure 13) and on the whole first image (Figure 14) are shown below.

Temporal Generalization of the Best Learned Model
The aim here is to assess how robust the temporal generalization of the best learned model (from the previous phase) is. Two new test datasets (ground truth masks) were photo-interpreted from two acquisitions corresponding to the spring and summer periods of the same agricultural cycle. The results of the prediction evaluations are given in Table 5.
Figure 15. Predicted land cover maps for spring and summer L8 images
The results show low accuracies for the predictions on the spring and summer L8 images (Figures 15 and 17). The main cause is that the reflectance of the objects changes during the agricultural cycle. For the summer acquisition in particular, there is more confusion between the irrigated land and orchard classes (Figure 16) and low performance for the other classes.
The aim of this third set of experiments was to obtain a prediction model that can map irrigated areas at different times of an agricultural cycle, on the assumption that our class of interest generally keeps the same reflectance during the year. Unfortunately, 2015-2016 was an unusual year marked by a severe drought in Morocco, which explains the low accuracies and the confusion among classes.

CONCLUSIONS
With the aim of automatically mapping irrigated areas from RGB Landsat 8 satellite images, we evaluated three models based on UNet with ResNet50 as backbone. Initially, a small dataset of 50 images with associated ground truth labels was used for training and validation. Without data augmentation, the Adam-based model gives the best result, with a mean IoU of 51% (63% for the Irrigated land class) (Table 6). We believe that better performance can be achieved using more data.
After artificially augmenting the training dataset, the overall accuracy increased from 94% to 97% for the Adam-based model and from 91% to 94% for the SGD-based models. The irrigated areas are mapped with a high degree of performance (IoU >= 82%). On the same area of interest and using the Random Forest algorithm (Benbahria et al., 2018), we obtained lower accuracy. This could confirm the relevance of these new approaches based on deep learning architectures.
On the other hand, the temporal generalization of the best learned model to the spring and summer L8 images (in the same agricultural cycle) leads to low accuracies, with IoUs of 37% and 28% respectively. Many enhancements can be explored to improve these results. Firstly, the training dataset can be further augmented with images from different seasons. Adding more spectral bands and indices to the training images should also be explored.

ACKNOWLEDGMENT
This research was supported in part through computational resources of HPC-MARWAN (www.marwan.ma/hpc) provided by the National Center for Scientific and Technical Research (CNRST) in Morocco.
The authors also gratefully acknowledge all those who have made their code available to the scientific community in the GitHub repositories referenced hereafter.