Performance Evaluation of Capsule Networks for Classification of Plant Leaf Diseases

A version of this paper was presented at the 9th International Conference on Advanced Technologies (ICAT'20), 10-12 August 2020, Istanbul, Turkey, under the title “Performance Evaluation of Capsule Networks for Classification of Plant Leaf Diseases”.


Introduction
Food security is at the forefront of the issues that underpin a healthy life. It is a broad field of study that covers the planting and growing of seed, treating it with the right seed dressing, and the methods used in its harvest [1]. Especially in recent years, diseases that cause pathological changes in normal tissues, such as cancer, have come to the fore in connection with nutrition based on genetically modified and ready-made foods. In addition to its impact on human health, the nutritional quality of food is also an important issue. Plant diseases caused by bacteria and insects, affecting leaves, stems, fruits, roots, and flowers, are the most influential factors in the harvesting process and in the quality, safety, and efficiency of food production [2]. Even in developed countries, plant diseases and their species are still identified by traditional methods, and harvest areas are inspected zone by zone. Spraying the entire cultivation area in response to locally diagnosed plant diseases decreases the yield and places an additional burden on the producer. Correct detection of plant diseases is therefore of great importance, and automating it will speed up the measures needed to stop the progression of a disease and improve the quality of the harvest. Hence, the search for faster and more reliable methods is of great practical importance.
In recent years, the widespread use of cameras, especially on mobile devices, and the active use of computer vision techniques have driven the development of image processing and machine learning approaches. These make it possible to propose models that support food production and to build disease identification systems for agriculture based on image processing combined with hybrid analysis methods. Among these models, traditional machine learning approaches with hand-crafted feature extraction have commonly been used to classify plant leaf diseases from images of diseased spots. Jagan and Mohan focused on paddy diseases using Scale Invariant Feature Transform (SIFT) features of leaf images, which they classified with k-Nearest Neighbor (k-NN) and Support Vector Machine (SVM) classifiers [3]. Phadikar analyzed rice leaf diseases using morphological features, the radial distribution of pathology on leaves, and histogram equalization features. They used a Bayes classifier and SVM for classification and optimized the parameters of non-linear SVM kernels [4]. Islam et al. used the percentage of RGB values in the diseased spot areas of leaf images and classified rice diseases with a Naive Bayes classifier as a fast method [5]. Usha Kumari et al. evaluated the efficiency of contrast, energy, homogeneity, and statistical features extracted from segmented leaf spots for identifying tomato and cotton leaf diseases. They fed these hand-crafted features into an artificial neural network (ANN) and reported high classification performance [6]. Arivazhagan et al. applied the color co-occurrence method to extract shape, color, and texture statistics of spot areas to identify bean, lemon, guava, potato, and tomato leaf diseases on a limited variety of images, and classified the texture features with a non-linear-kernel SVM [7]. Chouhan et al. used a Region Growing Algorithm, which groups pixels based on the similarity of intensity level, color, or scalar features, to identify various plant leaf diseases. They classified the diseases with a radial basis function ANN and reached high plant leaf disease identification performance [8]. Kumar et al. proposed exponential spider monkey optimization to extract significant features and experimented with SVM, k-NN, and ANN classifiers, reporting it to be a successful feature extraction method [9].
Popular Deep Learning (DL) algorithms can achieve high generalization performance through detailed analysis with many layers and feature learning stages, removing separate feature extraction and image processing steps from the classification process. DL algorithms are commonly fed with raw images without pre-processing, since they have their own feature learning stages based on convolution or autoencoder models. Therefore, applying DL through transfer learning and designing novel architectures are the most common techniques for image classification. Sladojevic et al. used convolutional neural networks (CNN) to identify thirteen different types of plant leaf diseases, including those of pear, cherry, apple, grapevine, and peach, on a large private database [10]. Lee et al. used feature learning with pre-trained AlexNet weights. They analyzed the MalayaKew plant leaf dataset by fine-tuning the AlexNet architecture and reached good identification rates for forty leaf classes [11]. Amara et al. used feature learning with the LeNet CNN architecture to identify banana leaf diseases on the PlantVillage dataset and reported accurate results [12]. Brahimi et al. compared the efficiency of pre-trained CNN architectures, including GoogLeNet and AlexNet, fine-tuned on the PlantVillage dataset for identifying nine types of tomato leaf diseases [13]. Liu et al. compared the performance of popular CNN architectures, including AlexNet, GoogLeNet, VggNet, and ResNet, for identifying apple leaf diseases, and fine-tuned AlexNet on their own database [14]. Ferentinos et al. experimented with various CNN architectures, including AlexNet, VggNet, Overfeat, and GoogLeNet, to identify fifty-seven leaf classes, both diseased and healthy, in PlantVillage. They iterated over variations of the CNN classification parameters and reported VggNet as the best architecture for leaf classification [15]. Mohanty et al. used GoogLeNet and AlexNet to identify forty plant leaf diseases in the PlantVillage dataset, applying low-level image processing and leaf segmentation before fine-tuning the architectures [16]. Zhang et al. analyzed peach leaf images to identify diseased plants using the AlexNet architecture and compared its efficiency with k-NN, SVM, and ANN, reporting the superiority of the CNN over conventional machine learning algorithms [17]. Geetharamani and Pandian proposed their own CNN architecture to identify leaf diseases of thirteen plants in the PlantVillage dataset. They iterated their model over different dropout rates, learning rates, and numbers of fully-connected layers. Their most lightweight architecture for plant leaf disease classification was a nine-layer CNN [18].
Capsule Network (CapsNet) is a Deep Learning model proposed by Hinton and his colleagues to overcome the deficiencies of CNNs [19]. Although CNNs have high analysis capability on images with proven achievements, the pooling layer, a down-sampling approach, causes information loss that may leave the trained model prone to low generalization performance. Moreover, a CNN cannot pass spatial information and the instantiation parameters of low-level features, such as pose, texture, and deformation, on to one another. This raises error rates when parts must be recognized together as a whole object. Dynamic routing between capsules, which represent the likelihood and spatial information of low-level features, enables the transfer of pose parameters and preserves the part-whole hierarchy [19].
In recent years, CapsNet has been applied in many research areas where CNNs achieved good classification and segmentation performance. Verma et al. used CapsNet to identify potato leaf diseases in the PlantVillage dataset. They also experimented with various pre-trained CNN architectures, including ResNet, VggNet, and GoogLeNet, for comparison and highlighted the superiority of CapsNet over the CNN architectures in accuracy [20]. Dong et al. modified the CapsNet model by stacking three convolutional layers in addition to the conventional CapsNet architecture to identify peanut leaf diseases in their own dataset. They compared CapsNet with SVM and CNN and reported CapsNet as the algorithm with the best generalization performance for peanut leaf diseases [21]. Kurup et al. analyzed plant leaves of fourteen species from PlantVillage to identify leaf diseases using CapsNet, comparing the efficacy of CapsNet and CNN for multiple diseases [22]. To the best of our knowledge, no research has focused directly on the identification of bell pepper leaf disease. However, papers on multi-class plant leaf classification using the PlantVillage dataset have incorporated bell pepper analysis. Most of them reported only an average classification performance over all plant leaf diseases in PlantVillage, without class-specific performance. Their results therefore describe overall performance rather than the generalization capacity of the models for each plant. This paper aims to explore CapsNet architectures for identifying bell pepper leaf disease in the PlantVillage dataset, to evaluate the generalization performance of many CapsNet models and their ability to transfer spatial information between capsules for diseased spots, and to compare the classification performance with the state of the art.
The remainder of the paper is organized as follows. Section 2 details the PlantVillage dataset and the CapsNet algorithm. Section 3 presents the experimental setup and the statistical test characteristics used to evaluate the CapsNet architectures. Section 4 discusses a comparison with related works in terms of system performance and the advantages of the models.

PlantVillage Database
PlantVillage Database is a challenge dataset that aims to replace traditional harvesting processes with novel computer-aided developments for the identification of plant leaf diseases. It was collected by a land-grant university in the USA [23]. PlantVillage comprises a total of 54,305 images of healthy leaves and leaves with diseased spots from thirteen plants. Background images are also provided in the dataset for segmentation research.
The existing literature has addressed both single-plant binary classification (healthy vs. diseased) and multi-class plant leaf disease classification. In this study, we focused on the identification of bell pepper leaf disease using a total of 2475 images (997 with bacterial spot disease and 1478 healthy; Fig. 1).
We analyzed the plant leaf images without cropping or segmenting the diseased spots. Each leaf is photographed at a different angle against a background. Several images have shadow effects; however, no image was excluded from the dataset, so that realistic cases with background and shadows are included. We augmented the leaf images four-fold by adding horizontally flipped, vertically flipped, and both-ways flipped copies of each original image. The plant leaf images were resized to 64×64 to obtain a standard input size for the CapsNet.
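The four-fold flip augmentation can be sketched in Python with NumPy as below. This is an illustrative sketch, not the authors' code: it assumes each image is already a NumPy array, and the function names `flip_augment` and `augment_dataset` are hypothetical. Keeping the original image and adding its horizontal, vertical, and combined flips yields four images per leaf, which matches the reported counts (997 × 4 = 3988 diseased and 1478 × 4 = 5912 healthy).

```python
import numpy as np


def flip_augment(image: np.ndarray) -> list[np.ndarray]:
    """Return the original image plus its horizontal, vertical and double flips."""
    return [
        image,
        np.fliplr(image),             # horizontal flip (mirror left/right)
        np.flipud(image),             # vertical flip (mirror top/bottom)
        np.flipud(np.fliplr(image)),  # both flips, i.e. a 180-degree rotation
    ]


def augment_dataset(images: list[np.ndarray]) -> list[np.ndarray]:
    """Apply flip_augment to every image, growing the dataset four-fold."""
    out: list[np.ndarray] = []
    for img in images:
        out.extend(flip_augment(img))
    return out


# Dummy 64x64 grayscale placeholders, not PlantVillage data.
diseased = [np.zeros((64, 64), dtype=np.uint8) for _ in range(997)]
healthy = [np.zeros((64, 64), dtype=np.uint8) for _ in range(1478)]
print(len(augment_dataset(diseased)), len(augment_dataset(healthy)))  # 3988 5912
```

Flips are used here because they preserve leaf texture and lesion appearance exactly while changing orientation, which is the variation the capsule pose parameters are meant to capture.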

Capsule Networks (CapsNet)
Transfer learning starts training from weights pre-trained in a pre-defined space instead of from random initialization. One of the main reasons for the efficacy of pre-trained CNN architectures is their detailed convolutional analysis stages (CONV layers) for feature learning. This enables faster optimization of the pre-trained weights using regularization and factorization techniques. On the other hand, the pooling (down-sampling) layer may cause significant information loss among feature maps. The capsule network (CapsNet) is a novel DL algorithm that overcomes this shortcoming of CNNs by excluding the pooling layer from the architecture and transferring spatial information between layers through capsules [19].
The input of a capsule layer is the output of a CONV layer with a specified number and size of filters. The output of a capsule is an activity vector: its length encodes the likelihood that the feature represented by the capsule is present, and its orientation encodes instantiation parameters such as pose, texture, rotation, and deformation [24]. This spatial information allows the part-whole hierarchy of the learned feature maps to be transferred from low-level capsules to higher-level ones [19].
The main benefits of CapsNet are dynamic routing between capsules, the spatial information carried by capsules, and the squashing function, which maps the length of each output vector into the range [0, 1) so that it can be read as a likelihood. The capsule activity vectors are then fed into fully-connected (FC) layers, as in a CNN [19], [24].
The activity vectors and spatial information make CapsNet robust to overfitting even on small-scale databases and allow it to learn part-whole hierarchies that remain valid for rotated and scaled images. Furthermore, dropout regularizes the training of CapsNet by excluding neurons in each FC at a specified dropout rate [25].
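For reference, standard (inverted) dropout, as commonly implemented, zeroes each FC activation independently at random with probability `rate` during training and rescales the surviving activations by 1/(1 - rate), so that no rescaling is needed at test time. The minimal NumPy sketch below illustrates this; the function name and the rate of 0.5 are illustrative assumptions, not the settings used for the bell pepper models.

```python
import numpy as np


def dropout(activations: np.ndarray, rate: float, training: bool,
            rng: np.random.Generator) -> np.ndarray:
    """Inverted dropout: randomly zero units while training, identity at test time."""
    if not training or rate == 0.0:
        return activations
    keep_prob = 1.0 - rate
    # Each unit is kept independently with probability keep_prob.
    mask = rng.random(activations.shape) < keep_prob
    # Rescale kept units so the expected activation matches test time.
    return activations * mask / keep_prob


rng = np.random.default_rng(0)
fc1_out = np.ones((4, 960))      # a batch of 4 FC1 activation vectors (dummy values)
dropped = dropout(fc1_out, rate=0.5, training=True, rng=rng)
print(dropped.mean())            # close to 1.0 in expectation
print(np.array_equal(dropout(fc1_out, 0.5, False, rng), fc1_out))  # True
```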
The CONV layer in CapsNet generates different representations of the input data as feature maps followed by a rectified linear unit (ReLU). The structure of the CapsNet proposed for leaf disease identification is shown in Fig. 2; there is no pooling layer. The number of CONV layers depends on the level of the features to be extracted [26]. While the first CONV layers identify low-level features, the primary capsules transfer the spatial information of these low-level maps. Composing a CapsNet therefore requires choosing among many variations: the depth of the CONV stage, the number and size of filters, and classification parameters such as the depth of the FCs, the number of neurons in each FC, the learning rate, the dropout rate, and more [25].
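The squashing function is

$$\mathbf{v}_j = \frac{\lVert \mathbf{s}_j \rVert^2}{1 + \lVert \mathbf{s}_j \rVert^2}\,\frac{\mathbf{s}_j}{\lVert \mathbf{s}_j \rVert}, \qquad \mathbf{s}_j = \sum_i c_{ij}\,\hat{\mathbf{u}}_{j|i}, \qquad \hat{\mathbf{u}}_{j|i} = \mathbf{W}_{ij}\,\mathbf{u}_i,$$

where $\mathbf{s}_j$ represents the total input of the $j$-th capsule, formed from the individual primary capsule predictions $\hat{\mathbf{u}}_{j|i}$ weighted by the coupling coefficients $c_{ij}$, and $\mathbf{v}_j$ is its output vector.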
The primary capsule layer is the lowest capsule layer; it extracts the existence and spatial information of features. The subsequent capsules (routing capsules) capture higher-level features and their instantiation parameters.
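The squashing function and the routing-by-agreement procedure of the original CapsNet [19] can be sketched in NumPy as follows. This is a minimal, unbatched illustration under stated assumptions: the function names are hypothetical, three routing iterations is the default of [19] rather than a setting reported here, the small epsilon is added for numerical stability, and sharing one 16×8 matrix per class across all primary capsules is a simplification of the per-pair matrices $\mathbf{W}_{ij}$. Sizes follow the text: 24×24×32 primary capsules of dimension 8 routed to two 16-dimensional leaf capsules.

```python
import numpy as np


def squash(s: np.ndarray, axis: int = -1, eps: float = 1e-9) -> np.ndarray:
    """Shrink each vector's length into [0, 1) while keeping its direction."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    scale = sq_norm / (1.0 + sq_norm)
    return scale * s / np.sqrt(sq_norm + eps)


def softmax(x: np.ndarray, axis: int) -> np.ndarray:
    e = np.exp(x - np.max(x, axis=axis, keepdims=True))
    return e / np.sum(e, axis=axis, keepdims=True)


def dynamic_routing(u_hat: np.ndarray, iterations: int = 3) -> np.ndarray:
    """Routing by agreement.

    u_hat: predictions of shape (num_primary, num_classes, dim_out),
           i.e. u_hat[i, j] = W_ij @ u_i.
    Returns class-capsule outputs v of shape (num_classes, dim_out).
    """
    num_primary, num_classes, _ = u_hat.shape
    b = np.zeros((num_primary, num_classes))       # routing logits b_ij
    for _ in range(iterations):
        c = softmax(b, axis=1)                     # coupling coefficients, sum over j is 1
        s = np.einsum("ij,ijd->jd", c, u_hat)      # s_j = sum_i c_ij * u_hat_{j|i}
        v = squash(s)                              # v_j = squash(s_j)
        b = b + np.einsum("ijd,jd->ij", u_hat, v)  # agreement: b_ij += u_hat_{j|i} . v_j
    return v


rng = np.random.default_rng(0)
u = rng.normal(size=(24 * 24 * 32, 8))             # primary capsule vectors (random stand-ins)
W = rng.normal(scale=0.01, size=(2, 16, 8))        # one shared 16x8 matrix per class (simplification)
u_hat = np.einsum("jdk,ik->ijd", W, u)             # predictions, shape (18432, 2, 16)
v = dynamic_routing(u_hat)
print(v.shape)                                     # (2, 16)
print(np.linalg.norm(v, axis=1))                   # lengths in [0, 1), read as class likelihoods
```

Predictions that agree with the current output $\mathbf{v}_j$ (positive dot product) increase their logits, so later iterations route more of each primary capsule's vote to the leaf capsule it agrees with.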

Experimental Results
The plant leaf images contain shadows and sunlight reflections. To keep the heterogeneity of the dataset representative of real plant monitoring conditions, each image was inspected in detail before the analysis. About half of the diseased leaf images have only small bacterial spots, which makes it possible to assess CapsNet for early diagnosis of plant leaf diseases. We used a standard capsule layer in all CapsNet architectures, while the depth of the FCs and the number of neurons in each FC of the supervised learning stage were varied to find the model with the highest classification performance for bell pepper leaf disease identification. We report the best results for the CapsNet architectures and compare them with the state of the art on the PlantVillage dataset.
The leaf images were resized to 64×64 to obtain a standard input size for CapsNet, and the RGB images were converted to grayscale before the analysis. Data augmentation was performed to enlarge the dataset, enabling the CapsNet model to learn various representations of plant leaf images and to avoid overfitting. The number of analyzed PlantVillage images was increased four-fold by applying vertical flips, horizontal flips, and both. The pre-processing stage of the leaf images is shown in Fig. 3. After data augmentation, we obtained 3988 and 5912 plant leaf images for diseased and healthy bell peppers, respectively. The experiments were repeated for various CapsNet models with different FC configurations. Each model was trained on 80% of the dataset, stratified by plant leaves. None of the augmented images was used for both training and testing. The remaining 20% of the dataset was used as an independent test set to evaluate the performance of the trained CapsNet. Accuracy, sensitivity, and specificity were calculated from the confusion matrix of the trained CapsNet model [27].
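Accuracy, sensitivity, and specificity follow directly from the binary confusion matrix. The sketch below assumes that the diseased class is treated as positive (the text does not state which class is positive), and the counts are hypothetical, chosen only to illustrate the arithmetic; they are not the paper's results.

```python
from dataclasses import dataclass


@dataclass
class ConfusionMatrix:
    tp: int  # diseased leaves predicted as diseased
    fn: int  # diseased leaves predicted as healthy
    tn: int  # healthy leaves predicted as healthy
    fp: int  # healthy leaves predicted as diseased


def binary_metrics(cm: ConfusionMatrix) -> dict[str, float]:
    total = cm.tp + cm.fn + cm.tn + cm.fp
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        "sensitivity": cm.tp / (cm.tp + cm.fn),  # true positive rate (recall)
        "specificity": cm.tn / (cm.tn + cm.fp),  # true negative rate
    }


# Hypothetical counts for illustration only.
cm = ConfusionMatrix(tp=90, fn=10, tn=140, fp=10)
print(binary_metrics(cm))
# {'accuracy': 0.92, 'sensitivity': 0.9, 'specificity': 0.9333333333333333}
```

Because sensitivity and specificity are computed per class, they expose an imbalance between missed diseased leaves and false alarms that accuracy alone would hide.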
The CONV layer of the CapsNet uses 256 filters with a 9×9 convolution kernel. Next, 32 primary capsule maps are generated using a 9×9 convolution kernel with a stride of 2. Each map contains 24×24 capsules, giving 24×24×32 primary capsules in total. Each primary capsule is an 8-dimensional vector that encodes spatial information, whereas the class capsule layer outputs a 16-dimensional vector per capsule. Each 8-dimensional vector is converted into 16-dimensional leaf capsules by a learned weight matrix $\mathbf{W}_{ij}$. The class capsule layer (leaf capsules) thus produces a 2×16 matrix (healthy/diseased × spatial information). The best classification performances for bell pepper leaf disease identification are presented in Table 1. The results show that CapsNet models can classify bell pepper leaf diseases accurately using simple FCs on top of the capsule and spatial information. While increasing the number of neurons in each FC improved identification performance, using a large number of neurons in both FC1 and FC2 failed to separate healthy and diseased leaf images. The best results were obtained with the proposed capsule architecture using 960 neurons in FC1 and 768 neurons in FC2, reaching 95.76%, 96.37%, and 97.49% for accuracy, sensitivity, and specificity, respectively. The activation functions are ReLU, ReLU, and sigmoid for FC1, FC2, and FC3, respectively.
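The capsule dimensions above can be checked with the usual output-size formula for an unpadded convolution, out = floor((in - kernel) / stride) + 1. The sketch assumes a stride of 1 and no padding for the first CONV layer; neither is stated in the text, but they are the settings of the original CapsNet [19] and they reproduce the 24×24 grid. It then propagates the flattened 2×16 leaf-capsule matrix through FC1 (960, ReLU), FC2 (768, ReLU), and FC3 (sigmoid). The size of FC3 is not reported, so `fc3_units` is a hypothetical parameter; note that in the original CapsNet [19] a ReLU, ReLU, sigmoid FC stack serves as the reconstruction decoder, whose sigmoid layer has one unit per input pixel. Weights are random placeholders and biases are omitted for brevity.

```python
import numpy as np


def conv_out(size: int, kernel: int, stride: int = 1) -> int:
    """Spatial output size of an unpadded ('valid') convolution."""
    return (size - kernel) // stride + 1


# Feature-map sizes for a 64x64 input (stride 1 for CONV1 is an assumption).
conv1 = conv_out(64, kernel=9, stride=1)       # 56
primary = conv_out(conv1, kernel=9, stride=2)  # 24
num_primary_caps = primary * primary * 32      # 24*24*32 = 18432 capsules of 8-D
print(conv1, primary, num_primary_caps)        # 56 24 18432


def relu(x: np.ndarray) -> np.ndarray:
    return np.maximum(x, 0.0)


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def fc_head(leaf_caps: np.ndarray, fc3_units: int, rng: np.random.Generator) -> np.ndarray:
    """FC1 (960, ReLU) -> FC2 (768, ReLU) -> FC3 (fc3_units, sigmoid).

    leaf_caps: (batch, 2, 16) leaf-capsule matrices. Random weights, no biases.
    """
    x = leaf_caps.reshape(leaf_caps.shape[0], -1)  # flatten 2x16 -> 32 features
    for units, act in [(960, relu), (768, relu), (fc3_units, sigmoid)]:
        w = rng.normal(scale=0.05, size=(x.shape[1], units))
        x = act(x @ w)
    return x


rng = np.random.default_rng(0)
out = fc_head(rng.normal(size=(4, 2, 16)), fc3_units=1, rng=rng)
print(out.shape)  # (4, 1)
```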

Discussion
Most papers in the last decade have focused on pre-trained CNN architectures to identify plant leaf diseases. The availability of transfer learning with popular architectures is the main reason for this choice: the adaptability and applicability of pre-trained CNN architectures provide stable optimization for such problems. However, the information loss caused by the pooling layer remains a handicap of CNNs. The results reported in these papers for bell pepper leaf disease identification on PlantVillage are presented in Table 2.
The performance of the proposed CapsNet architecture can be compared with only a very limited number of studies, since most studies presented overall accuracy rather than class-based classification performance. To the best of our knowledge, no paper has focused directly on the identification of bell pepper leaf diseases. For the related works, performance was calculated from the confusion matrix as the weighted average of the class-based performances for healthy and diseased pepper leaves. Geetharamani and Arun Pandian proposed a lightweight nine-layer CNN architecture for identifying plant leaf diseases. They experimented with various batch sizes, numbers of epochs, validation folds, and dropout rates, compared their proposal with popular CNN architectures including AlexNet, VggNet, Inception-v3, and ResNet, and reported its superiority with pepper leaf disease identification rates of 93.00%, 92.00%, and 93.00% for accuracy, sensitivity, and specificity, respectively [18]. Kurup et al. applied the conventional CapsNet architecture to the PlantVillage dataset for leaf classification and leaf disease identification tasks. They achieved pepper leaf diseased-healthy identification rates of 91.00-96.00%, 85.00-92.00%, and 88.00-94.10% for precision, sensitivity, and F1 score, respectively [22]. Although CNNs with various feature learning architectures are very popular, their deficiencies, such as the pooling layer and the lack of connections between low-level features, are a central topic for machine learning researchers. This paper explored CapsNet for pepper leaf disease classification without a pooling layer, transferring spatial information between layers through capsules. For the proposed CapsNet architecture, the most suitable FC models were identified through experimental assessment and model-based evaluation metrics.
Using a large number of neurons in FC1 and a smaller number in FC2 led to better-optimized models for bell pepper leaf disease identification with a simple supervised learning procedure. The proposed CapsNet architecture with the highest classification performance reached 95.76%, 96.37%, and 97.49% for accuracy, sensitivity, and specificity, respectively, outperforming the results reported in the literature for pepper leaf diseases. Moreover, it provides pepper-specific leaf disease identification results on the PlantVillage dataset.
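The related-work figures were obtained as a weighted average of class-based performances. The text does not state the weights; a common choice, and the one assumed in this sketch, weights each class by its number of test samples (its support), as scikit-learn does with `average='weighted'`. The function name and the per-class values and supports below are hypothetical.

```python
def weighted_average(per_class: dict[str, float], support: dict[str, int]) -> float:
    """Average per-class scores, weighting each class by its number of samples."""
    total = sum(support.values())
    return sum(per_class[c] * support[c] / total for c in per_class)


# Hypothetical per-class sensitivities and test-set supports (illustration only).
sensitivity = {"healthy": 0.90, "diseased": 0.80}
support = {"healthy": 300, "diseased": 200}
print(weighted_average(sensitivity, support))  # approximately 0.86
```

With support weighting, the larger healthy class dominates the average, so a model that is weaker on diseased leaves can still report a high weighted score; this is one reason class-specific figures are reported alongside it.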

Conclusion
We explored the effect of FC models on the CapsNet architecture with various numbers of neurons in each FC. Although CNNs have been the most popular DL algorithm for several years, the capsule architecture is a formidable alternative for image analysis owing to its capability of transferring spatial information, pose parameters, activity vectors, and the part-whole hierarchy. Establishing pre-trained CapsNet architectures has greater potential than CNNs for transfer learning. However, training CapsNet on large datasets is still time-consuming. Optimization techniques in the FCs may reduce training time, so the spatial-information idea behind CapsNet should be supported with optimization algorithms for the FCs.