Body Condition Score (BCS) Segmentation and Classification in Dairy Cows using R-CNN Deep Learning Architecture

Body condition scoring (BCS) rates dairy cattle on a scale from 1 to 5 according to their appearance. It is a subjective method that assesses subcutaneous fat thickness over the back, loin and tail-head regions and the prominence of the bony protrusions in the pelvic region by visual inspection and palpation. The BCS of the animals is among the most important indicators of whether their needs are being met in a livestock enterprise. In general, BCS values are determined by observation and depend on expert knowledge. If an animal is above or below the desired BCS, diseases resulting from metabolic problems, low yield or animal losses may occur. Regular monitoring may therefore raise the profitability of the enterprise through healthier animals. This study aims to segment the body regions needed for BCS and to classify the segmented regions. Images of dairy cattle were used to train the R-CNN architecture, a Convolutional Neural Network (CNN) architecture used in object detection applications. Of the 184 images, 75% (138) were used for training and 25% (46) for testing. During training, the regions on which BCS could be assessed were labeled in the raw images and learned by the network. The segmentation of the correct regions from new images presented to the system was then tested. Pre-trained networks were used to increase system success. A CNN trained with the AlexNet architecture was used to classify the segmented regions. In terms of overall system success, the AlexNet-based R-CNN correctly segmented 40 of the 46 raw test images, and the AlexNet CNN correctly classified 28 of them, giving 60.86% overall success. The VGG16 network correctly segmented 42 of the 46 raw test images, and the AlexNet CNN correctly classified 30 of them, giving 65.21% overall success. The VGG19 network correctly segmented 43 of the 46 raw test images, and the AlexNet CNN correctly classified 31 of them, giving 67.39% overall success.


Introduction
Considering that feed accounts for 60-70% of the expenses of a dairy cattle enterprise, meeting the needs of the animals optimally is important for increasing farm income. Meeting these needs also influences the yield obtained from the animal, the occurrence of metabolic diseases and the profitability of the operation. For this purpose, the requirements of dairy cattle should be fully met from one calving to the next. During this period the needs of the animal vary considerably. It is therefore necessary to divide the feeding of dairy cattle into 5 phases and meet the needs in each. After calving, days 0-70 are called phase 1, days 70-140 phase 2, days 140-305 phase 3, days 305-360 the dry period and the last 21 days phase 5. Grouping the animals by period in this way is intended to meet their nutritional needs fully. The most important indicator of whether the needs of the animals are met under the conditions of the enterprise is their BCS. BCS is a subjective method that assesses subcutaneous fat thickness over the back, loin and tail-head regions and the prominence of the bony protrusions in the pelvic region by visual inspection and palpation (Canatan, 2013). Dairy cattle are scored from 1 to 5 according to their appearance. Since BCS is performed through observation, it is a very simple procedure that depends on the experience and knowledge of the assessor. BCS in dairy cattle provides general information about the care and feeding of animals under farm conditions. Animals of different weights are roughly divided into scores as follows.
Score 1: The animal is nothing but skin and bones.
Score 2, 2-: The animal is in negative energy balance.
Score 2+: Milk yield is high in early lactation but problems may arise.
Score 3: The animal is in ideal nutritional balance.
Score 4: Condition is high but milk production is poor. Lactation lasts too long and there is a high risk of infertility in the dry period.
Score 5: Over-fat cow; a candidate for Fatty Cow Syndrome (FCS) (Bayramoğlu, 2011).
Although this scoring is very simple for the enterprise to carry out, it is very important for the animals. In each phase, the animal must have the desired BCS value. If the animal is above or below the desired BCS at a given stage, diseases resulting from metabolic problems, low yields or animal losses may be observed. Regular monitoring may lead to healthier animals and higher profitability for the enterprise. The management of the enterprise can also be checked in this way. The expected BCS values are 3+ to 4- at calving, 3- to 3 in early lactation, 3 in mid-lactation, 3- at the end of lactation and 3+ to 4- in the dry period. Metabolic problems may occur below or above these values. For this reason, the animals can be monitored by checking their BCS at regular intervals.
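The stage targets listed above can be turned into a simple lookup, shown here as a minimal sketch rather than as part of the study. The numeric encoding of "+" and "-" as a quarter point, the stage names and the function `check_bcs` are assumptions made for illustration; the text does not define a numeric scale.

```python
# Target BCS ranges per stage, read from the expected values in the text.
# ASSUMPTION: "+" and "-" are encoded as +/-0.25 around the whole score
# (e.g. "3+" -> 3.25, "4-" -> 3.75). The source gives no numeric scale.
TARGET_BCS = {
    "calving": (3.25, 3.75),
    "early_lactation": (2.75, 3.00),
    "mid_lactation": (3.00, 3.00),
    "late_lactation": (2.75, 2.75),
    "dry_period": (3.25, 3.75),
}


def check_bcs(stage: str, score: float) -> str:
    """Compare an observed score with the target range of a stage."""
    low, high = TARGET_BCS[stage]
    if score < low:
        return "below target"
    if score > high:
        return "above target"
    return "within target"


print(check_bcs("calving", 3.5))
print(check_bcs("mid_lactation", 2.5))
```

In practice a tolerance around single-value targets such as mid-lactation would probably be needed, since an exact match is rarely observed.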
Regular monitoring of BCS values makes it possible to increase the milk yield of dairy cattle, regulate the postpartum anestrus interval and meet the needs of the animal throughout the year so as to prevent metabolic diseases. For this purpose, animals need to be grouped for care and feeding, and the needs of each group met accordingly. BCS indicates whether the nutritional needs of grouped dairy cattle are fully met. It is also important for adjusting the ration of animals with low BCS and for avoiding over- or undernourishment. Formulating rations regularly for each group ensures that the needs of the animals are fully met and supports the stability of the enterprise. For this reason, the animals should be grouped into 5 phases and fed accordingly, and BCS can be used to monitor this process.
After calving, cows with a high BCS at calving and greater lipid mobilization show a more marked alteration in oxidative status. These conditions may make the cows more sensitive to oxidative stress. The results on this issue emphasize the need for further investigation of the possible role of oxidative stress in obesity-related disorders in transition dairy cows (Bernabucci, Ronchi, Lacetera, & Nardone, 2005). Metabolic problems such as ketosis and fatty liver may also be encountered in animals that are in high condition at calving. Berry et al. (2007) reported that BCS affects the udder health of the animal. They found higher somatic cell scores in cows with high body weight, with low BCS during early lactation or with high live weight loss. They also stated that some numerical differences between the groups may have several causes, such as the higher degree of stress in younger animals at breeding and in early lactation, and that animals with either high or low BCS were in a condition unsuitable for animal welfare (Berry et al., 2007). Roche et al. (2009) stated that there is general recognition that BCS provides a gross but reasonably accurate measure of a cow's energy reserves. According to the same authors, BCS at calving is the most influential point in the lactation calendar with respect to early lactation, BCS loss and milk yield. An appropriate BCS supports reproduction, health and animal welfare while allowing the cow to reach the maximum milk production of her genetic merit. Cows with a lower BCS produce less milk, are likely to have an extended postpartum anestrus interval, are less likely to become pregnant and are more likely to fall into an animal welfare risk category (Roche et al., 2009).
When the literature was examined, no studies were found on BCS classification by image processing, or on selecting the regions required for BCS from an image and separating them from the background. In this respect, the study is expected to contribute to the literature and to guide further studies.

Material and Method
In this study, a system is proposed that segments the region of an image on which BCS classification can be made and then classifies the BCS of the segmented area. The proposed system consists of 4 stages. First, the BCS regions were labeled in images taken under the supervision of an expert zootechnician. Second, 75% of the labeled images were used for training and 25% for testing, and the system was modeled by transfer learning using the R-CNN architecture. Third, for BCS classification, the segmented regions were passed to a second CNN architecture, also built with transfer learning, designed for the segmented regions. Finally, the overall performance rate of the system was tested. The general architecture of the designed system is shown in Figure 2. For the study, an expert zootechnician selected 184 images of dairy cattle from different enterprises on which BCS could be performed. The expert made this selection entirely on the basis of his/her experience, by examining the obtained images. The images were recorded so as to show the bony protrusions in the pelvic region of the animal (Figure 3). The BCS classification regions determined by the expert were marked and labeled on these recorded images (Figure 4). Labeling was carried out with the Matlab ImageLabeler application tool. The resulting 184 labeled images were used as the input data of the system.
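The 75%/25% division of the 184 images can be illustrated with a short sketch. The file names, the random shuffling and the fixed seed are assumptions for illustration; the text does not say how the images were assigned to the two sets.

```python
import random


def split_dataset(items, train_fraction=0.75, seed=0):
    """Shuffle a copy of the items and cut it into training and test parts."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 184 labeled images.
image_names = [f"cow_{i:03d}.jpg" for i in range(184)]
train, test = split_dataset(image_names)
print(len(train), len(test))  # 138 46
```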

Convolutional Neural Network (CNN)
In standard machine learning, feature vectors must be extracted before a model can be developed and trained. Determining these features relies on expert knowledge, which takes time and is open to expert error. To eliminate this problem, deep learning aims to obtain results by processing raw data directly (Özkan & Ülker, 2017). The most important property of deep learning networks, which are designed in different forms according to the field in which they are used, is that they do not require "feature engineering" to extract features suited to the problem. In the layers of a deep structure, features are formed as the network learns. Because deep learning networks decide for themselves which information to learn rather than using only the information presented to them, they can produce more successful results than classical methods (Kızrak & Bolat, 2018).
Among deep learning methods, convolutional neural network (CNN) architectures are frequently used in the literature for image classification, object recognition and detection (Arı & Hanbay, 2019). A CNN consists of one or more convolution layers followed by a standard fully connected multilayer neural network. In the multi-layer CNN model, each layer performs its own operation on the data and passes the result to the next layer. The layers in CNN architectures and their operations are shown in Figure 5.

Figure 5. A typical CNN structure
Convolution Layer: Convolution is performed by sliding a small matrix (filter) of size 3x3, 5x5, 7x7, 9x9 or 11x11 over the image matrix. The filter passes over the entire image matrix and highlights features in the image. A new image matrix (feature map) is obtained at the end of the process (Doğan & Türkoğlu, 2018).
Pooling Layer: This layer is a downsampling step placed after the convolution layer. It reduces the number of values that must be processed and therefore the number of parameters that later layers need to learn, which speeds up the process.
ReLU (Rectified Linear Unit) Layer: It introduces non-linearity to the system by applying an activation function: negative values are set to zero and positive values are kept. Because the convolution layer that precedes it performs only linear operations, the network would otherwise remain linear, and this layer gives the deep network a nonlinear structure. With this layer the network learns faster, and only active features are passed to the next layer (Özkan & Ülker, 2017).
Fully Connected Layer: It is the standard neural network layer and is used for classification. The number of outputs of the layer depends on the number of classes to be learned.
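The four layer types described above can be sketched in plain Python on a tiny example. This is a didactic sketch, not the networks used in the study: the 6x6 image, the edge filter and the fully connected weights are made-up values, and the convolution is written without padding, with stride 1 and without flipping the filter, as is usual in CNN frameworks.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def relu(fmap):
    """Set negative values to zero and keep positive values."""
    return [[max(0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Keep the largest value in each non-overlapping size x size block."""
    return [
        [
            max(fmap[r + i][c + j] for i in range(size) for j in range(size))
            for c in range(0, len(fmap[0]) - size + 1, size)
        ]
        for r in range(0, len(fmap) - size + 1, size)
    ]


def fully_connected(features, weights, biases):
    """One output per weight row: weighted sum of the features plus a bias."""
    return [sum(w * x for w, x in zip(row, features)) + b
            for row, b in zip(weights, biases)]


# Made-up 6x6 image with a bright vertical stripe in columns 2 and 3.
image = [[0, 0, 1, 1, 0, 0] for _ in range(6)]
# Made-up 3x3 filter that responds to dark-to-bright vertical edges.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = conv2d(image, kernel)   # 4x4, rows of [3, 3, -3, -3]
activated = relu(feature_map)         # rows of [3, 3, 0, 0]
pooled = max_pool(activated)          # [[3, 0], [3, 0]]
flat = [v for row in pooled for v in row]
scores = fully_connected(flat, [[1, 0, 1, 0], [0, 1, 0, 1]], [0, 0])
print(scores)  # [6, 0]
```

The negative responses at the bright-to-dark edge are removed by the ReLU step, and pooling halves each spatial dimension before the fully connected layer produces one score per class.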

Region-based Convolutional Neural Network (R-CNN)
Image classification is the process of identifying the objects in an image. For this purpose, identifying the regions of the picture that are likely to contain objects and trying to recognize objects only in these regions gives much faster results than scanning the whole image. The method that searches only in such designated regions is called the Region-based Convolutional Neural Network (R-CNN) (Özkan & Ülker, 2017).
The R-CNN architecture generally consists of four parts. The first part contains the image to be classified. In the second part, the regions containing objects are determined. In the third part, a CNN is run on each region to carry out the classification. In the last part, the object class is determined and the corresponding region of the image is marked (Figure 6) (Girshick, Donahue, Darrell, & Malik, 2014). With this operating structure, R-CNN has achieved successful results in object detection applications.
Figure 6. R-CNN structure (Girshick et al., 2014)
In the literature, pre-trained models are used to train a network in a shorter time without reducing its performance when the data set is small, and system performance is improved by using them. These models are made available to researchers free of charge by various sources and are called pre-trained networks. Carrying out classification and object identification by adding new classes to these networks is called transfer learning. Major pre-trained CNN models in the literature include LeNet (1998), AlexNet (2012), GoogLeNet (2014), VGGNet (2014) and ResNet (2015); the last four became prominent through their success in the ImageNet classification competition (Kızrak, 2018).
LeNet: It is the convolutional neural network model that was designed in 1998 and was the first successful deployment of such a network. It was developed by Yann LeCun and his team to read digits on bank checks and bills. Experiments were conducted on the MNIST (Modified National Institute of Standards and Technology) data set. Unlike the models developed after it, it uses average pooling instead of max pooling in the size reduction steps, and sigmoid and hyperbolic tangent functions as activation functions (Kızrak, 2018; LeCun, Bottou, Bengio, & Haffner, 1998).
AlexNet: It is a convolutional neural network trained on more than a million images from the ImageNet database. The network is 8 layers deep and can classify images into 1000 object categories, such as keyboard, mouse, pencil and many animals. The network requires an image input size of 227x227 (Krizhevsky, Sutskever, & Hinton, 2012).
VGG-16/19: It is a simple network model. Its most important difference from earlier models is that convolution layers are used in groups of 2 or 3. After the last convolution block, the 7x7x512 feature map is flattened into a vector of 25,088 values and passed to fully connected (FC) layers of 4096 neurons each. Following the two 4096-neuron FC layers, a softmax over 1000 classes is computed. There are 16-layer and 19-layer versions. The 16-layer version has approximately 138 million parameters and the 19-layer version approximately 144 million. As in other models, the height and width of the matrices decrease from input to output, while the depth (number of channels) increases (Bayramoğlu, 2011). The network requires an image input size of 224x224 (Simonyan & Zisserman, 2014).
GoogLeNet: It is a pre-trained CNN that is 22 layers deep, with an error rate of 5.7%. It is the winner of the ImageNet 2014 competition. It is one of the first CNN architectures to move away from simply stacking convolution and pooling layers one after another. The GoogLeNet architecture also pays close attention to memory and power usage, because stacking all the layers on top of each other and adding a large number of filters increases computation and memory cost and can lead the network to memorize the training data (overfit). To overcome this, modules connected in parallel are used (Özkan & Ülker, 2017; Szegedy et al., 2015).
Microsoft ResNet: Its structure differs from traditional sequential network architectures such as VGGNet and AlexNet, and its micro-architecture module differs from those of other architectures. It can be preferable to pass information on to later layers while skipping the transformation applied by some intermediate layers. The ResNet architecture allows this, and the performance rate is raised to higher levels. The ResNet50 architecture includes a network of 177 layers. In addition to this layered structure, it holds information about how the interlayer connections are made (Doğan & Türkoğlu, 2018; He, Zhang, Ren, & Sun, 2016).
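The parameter counts quoted for the VGG networks can be checked with a short calculation. The sketch below assumes the layer widths of the 16- and 19-layer configurations in Simonyan and Zisserman (2014), 3x3 filters with biases, a 3-channel input and a 7x7x512 feature map (25,088 values) entering the first fully connected layer; the function name is chosen here for illustration.

```python
def vgg_parameter_count(conv_channels, input_channels=3, num_classes=1000):
    """Count weights and biases of the 3x3 convolutions and three FC layers."""
    total = 0
    in_ch = input_channels
    for out_ch in conv_channels:
        total += 3 * 3 * in_ch * out_ch + out_ch  # filter weights + biases
        in_ch = out_ch
    flat = 7 * 7 * 512  # 25,088 values after the last pooling step
    for n_in, n_out in [(flat, 4096), (4096, 4096), (4096, num_classes)]:
        total += n_in * n_out + n_out
    return total


# Output channels of each convolution layer (pooling layers have no parameters).
VGG16 = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 512, 512, 512]
VGG19 = [64, 64, 128, 128, 256, 256, 256, 256,
         512, 512, 512, 512, 512, 512, 512, 512]

print(vgg_parameter_count(VGG16))  # 138357544
print(vgg_parameter_count(VGG19))  # 143667240
```

Most of the parameters sit in the first fully connected layer (25,088 x 4096 weights), which is why the two versions differ by only about 5 million parameters despite the three extra convolution layers.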
In this study, the R-CNN model was used to separate the BCS regions of the image from the background, building on CNN models that have proven successful in the literature by means of the transfer learning method.
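The R-CNN flow described in this section (take an image, propose candidate regions, score each region, mark the best one) can be sketched as a toy example. This is not the detector trained in the study: the sliding-window proposals stand in for a real region-proposal method, the mean-intensity scorer stands in for the CNN, and all function names and values are hypothetical.

```python
def propose_regions(height, width, box=4, stride=2):
    """Candidate boxes (top, left, bottom, right) from a sliding window.

    ASSUMPTION: a simple stand-in for a real region-proposal step.
    """
    return [
        (r, c, r + box, c + box)
        for r in range(0, height - box + 1, stride)
        for c in range(0, width - box + 1, stride)
    ]


def score_region(image, region):
    """Toy scorer: mean intensity in the box. A real R-CNN runs a CNN here."""
    top, left, bottom, right = region
    values = [image[r][c] for r in range(top, bottom) for c in range(left, right)]
    return sum(values) / len(values)


def detect(image, threshold=0.5):
    """Return the best-scoring region, or None if no score reaches the threshold."""
    regions = propose_regions(len(image), len(image[0]))
    scored = [(score_region(image, reg), reg) for reg in regions]
    best_score, best_region = max(scored)
    return best_region if best_score >= threshold else None


# Made-up 8x8 image with a bright 4x4 "object" at rows 2-5, columns 4-7.
image = [[1 if 2 <= r < 6 and 4 <= c < 8 else 0 for c in range(8)]
         for r in range(8)]
blank = [[0] * 8 for _ in range(8)]

print(detect(image))  # (2, 4, 6, 8)
print(detect(blank))  # None
```

Only the proposed boxes are scored, which is the point made in the text: the classifier looks at a limited set of regions instead of every possible location in the image.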

Findings and Discussion
In this study, the images obtained for BCS classification were classified using pre-trained deep neural networks. Images of 184 dairy cattle on which the expert could perform BCS classification were obtained and preprocessed. In this preprocessing step, the BCS regions were labeled using the Matlab ImageLabeler application tool. Of the labeled images, 75% (138 images) were used for training and the remaining 25% (46 images) for testing. The AlexNet, VGG-16 and VGG-19 networks, which are among the pre-trained networks in the literature, were each trained in turn on the images using the R-CNN architecture. Training ran for 1760 steps (10 epochs of 176 iterations each); the resulting training times, training success rates and test success rates for each network type are given in Table 1. The BCS regions obtained in this step were then tested with a pre-trained CNN. This CNN is modeled on the pre-trained AlexNet architecture and was successfully tested in our previous study (Çevik & Boğa, 2019). The test attempted to determine the score (BCS 1-5) of each segmented BCS region. The regions produced by the three networks (AlexNet, VGG16, VGG19) were tested and the results are presented in Table 2.
When Table 1 is examined, AlexNet was the fastest network to train. VGG19 had the highest training success rate, 96.88%. Across the three networks, training and test success rates moved in parallel. AlexNet trained nearly 20 times faster than the other two networks, yet its training and test success rates were close to theirs. It can therefore be said that AlexNet gave acceptable results in terms of time versus performance. When training time is disregarded, the other networks gave very successful results. Table 2 presents the results of scoring, with the AlexNet CNN architecture, the BCS regions that could be segmented from the raw images. According to these results, the scoring success on segmented images was around 70% for all three networks, with VGG19 achieving the highest rate of the three at 72%.
When the overall success of the system was evaluated, the system using the AlexNet network for segmentation achieved a 60.86% success rate by correctly classifying 28 of the 46 raw test images. With the VGG16 network it achieved a 65.21% success rate by correctly classifying 30 of the 46 raw test images. With the VGG19 network it achieved a 67.39% success rate by correctly classifying 31 of the 46 raw test images.
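The overall figures follow from the two stages: an image counts as a success only if its BCS region is segmented correctly and the segmented region is then scored correctly. The sketch below recomputes the rates from the counts reported in the study (46 test images; 40, 42 and 43 segmented; 28, 30 and 31 classified); the function name and dictionary layout are illustrative choices.

```python
def stage_rates(total, segmented, classified):
    """Percent success of each stage and of the whole pipeline."""
    return {
        "segmentation": 100 * segmented / total,
        "classification": 100 * classified / segmented,
        "overall": 100 * classified / total,
    }


# (test images, correctly segmented, correctly classified) per network.
results = {
    "AlexNet": (46, 40, 28),
    "VGG16": (46, 42, 30),
    "VGG19": (46, 43, 31),
}

for name, counts in results.items():
    rates = stage_rates(*counts)
    print(name, {stage: round(value, 2) for stage, value in rates.items()})
```

The overall rate equals the segmentation rate multiplied by the classification rate on segmented images. Rounded to two decimals, the first two overall rates come out as 60.87 and 65.22; the values quoted in the text (60.86 and 65.21) appear to be truncated rather than rounded.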

Results
This study aimed to perform computer-aided classification of Body Condition Score (BCS) in dairy cattle. To this end, a system was proposed that segments the region on which BCS classification can be made and then classifies the BCS of the segmented area. To increase the success of the system, CNN models based on the transfer learning method and with proven success in the literature (AlexNet, VGG-16 and VGG-19) were used. The BCS regions were segmented from the raw images by pre-trained networks with the help of the R-CNN architecture, and the segmented regions were classified by a pre-trained network based on the AlexNet architecture. In the tests, the highest overall system success, 67.39%, was obtained with the VGG19 network. System performance is expected to improve as the number of images is increased beyond the 184 used in the current study. In addition, when the images in which incorrect BCS regions were identified, or in which the BCS regions could not be determined, were examined, it was found that the training data set contained images taken from different distances in which the BCS region was difficult to identify. For this reason, it can be said that having the data set created by a single expert may increase system performance.
Wider use of such practices will bring some important points in animal production into practice. It will allow over-conditioned or under-conditioned dairy cattle to be detected early and so help prevent metabolic problems. In addition, fast and realistic BCS values, checked regularly, will make it possible to increase the milk yield of dairy cattle, regulate the postpartum anestrus interval and meet the needs of the animal throughout the year in order to prevent metabolic diseases.
In future studies, an automated BCS classification system can be designed for the end user with a larger data set, using the R-CNN and CNN architectures described here. This application software could be run on streaming images. Combined with mobile technology, it could also make the system easier for users to access.