Comparison of SSD and Faster R-CNN Algorithms to Detect the Airports with Data Set Which Obtained From Unmanned Aerial Vehicles and Satellite Images

Today, image processing is used for a variety of purposes in many sectors that directly affect human life, especially health, manufacturing and the military. The development of deep learning algorithms and the growing use of computer vision have accelerated studies on the detection of critical targets, important locations and strategic regions, particularly in the military field. In this study, airports were detected on the basis of their landing runways. Training, test and evaluation data sets were created from images taken by medium- and high-altitude unmanned aerial vehicles and from satellite images. The SSD (Single Shot Multibox Detector) algorithm and the Faster R-CNN algorithm were retrained and used in the detection process. The results of both algorithms were evaluated according to criteria such as accuracy, sensitivity, specificity, false positive rate, false negative rate, positive predictive value, F score, error rate, result production time and training time. On the evaluation data set, the image detection accuracy was 76.61% with the SSD algorithm and 99.52% with the Faster R-CNN algorithm. This study thus reveals which of the two architectures is more successful in detecting critical areas in unmanned aerial vehicle and satellite images.


Introduction
Today, image processing is used for various purposes in many different sectors. The most distinctive examples include facial recognition systems in the workplace, motion detection systems for security cameras, license plate identification systems, banknote recognition systems in bank ATMs, product detection systems in manufacturing processes, and military security systems.
Thanks to the latest developments in unmanned aerial vehicles and satellite systems, it is now possible to capture live images. Although specialists constantly review these images, an automatic system that prevents errors is important for consistently correct detection. Improvements in image detection played a significant role in the choice of this subject.
In object detection studies conducted on the PASCAL VOC data set, the prediction rate could not exceed 40% until 2013 (Girshick, Donahue, Darrell & Malik, 2014) [1]. With the use of deep neural networks in this area, this percentage has increased over the years, reaching and exceeding the 80% level (Girshick, 2015) [2]. As a specific branch of machine learning, deep learning tries to reveal the unknown structure in the input distribution in order to achieve successful results.
Recently, deep neural network architectures have achieved remarkable results in recognizing objects and determining actions. Owing to the ability of these architectures to extract and represent strong and distinctive features, it has become possible to build deep feature detection, mapping and network structures for a particular object (Kamran, Shahzad & Shafait, 2018) [3]. The three main architectures that stand out in this respect are R-CNN (Region-based Convolutional Neural Network), Fast R-CNN and Faster R-CNN.
R-CNN is a convolutional neural network architecture that works with region extraction. Region proposals first determine the regions that are likely to contain the object. The proposed regions are resized to the same dimensions before they enter the convolutional neural network, and the resized images are then passed through the network. A Support Vector Machine then determines the borders of the resulting detections and verifies the prediction (Xiaozhu, 2017) [4] (Hsu, Chang & Lin, 2016) [5]. Although it produces successful results, the biggest drawback of the R-CNN architecture is that the training and testing stages take quite a long time. The Fast R-CNN architecture is an optimized version of this method.
The basic difference from R-CNN is that, instead of generating region proposals on the original image, the Fast R-CNN architecture sends the image directly to the convolutional neural network. This produces a high-resolution feature map that corresponds to the original image. Regions of interest are then defined on this map by the selective search method. Region detection is therefore performed on the feature map, not on the original image.
Another deep learning method, Faster R-CNN, builds a region proposal network instead of creating region proposals with selective search as Fast R-CNN does. After the proposals are produced by this Region Proposal Network (RPN), the method performs the same operations as Fast R-CNN. To sum up, the method has four components that need to be trained: the RPN classifier, the RPN bounding box regressor, the final class scores and the final bounding boxes (Hsu, Chang & Lin, 2016) [5].
SSD (Single Shot Multibox Detector), another deep learning technique, takes a different approach and performs object recognition in a single pass. While region proposal and region classification are done in two stages in Faster R-CNN, SSD does both at once in a single convolutional neural network.
This study aims to identify and mark airports, defined as critical regions, in satellite and unmanned aerial vehicle images. For this detection task, both the SSD (Single Shot Multibox Detector) algorithm and the Faster R-CNN algorithm were used, the performances of the two artificial neural network architectures were compared, and the results were evaluated.

Fast R-CNN
Fast R-CNN takes an approach similar to R-CNN, but combines different techniques to speed up the object detection process. Instead of generating region proposals first, the Fast R-CNN algorithm feeds the entire image into the CNN. As a result, a high-resolution convolutional feature map is obtained. Approximately 2000 regions of interest are defined on the convolutional feature map via selective search. The proposed regions are then warped to a fixed size by the RoI pooling layer and passed to a fully connected layer. For each region of interest, softmax is used for classification and linear regression is used for the bounding box (Girshick, 2015) [2]. The advantage of Fast R-CNN is its use of the convolutional feature map; however, selective search still creates a bottleneck in the process (Girshick, 2015) [2].
Figure 2. Fast R-CNN Architecture (Girshick, 2015) [2]
Avrupa Bilim ve Teknoloji Dergisi e-ISSN: 2148-2683

Faster R-CNN
The Faster R-CNN structure was developed by Shaoqing Ren and colleagues in 2016. Faster R-CNN does not use selective search in the object detection process, and this is the main time advantage of the algorithm, because selective search is the bottleneck of the object detection process. As in Fast R-CNN, the image is fed into the convolutional layers to obtain a high-resolution convolutional feature map. Instead of selective search, region proposals are made by a region proposal network. The proposed regions are reshaped by the RoI pooling layer. In the same layer, classification and bounding box estimation are completed within the regions of interest.
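The RoI pooling step described above, which brings every proposed region to the same fixed size, can be illustrated with a minimal sketch. It assumes a single-channel feature map stored as a nested list and integer region coordinates; the function name `roi_max_pool` and the 2x2 output grid are illustrative choices, not taken from the original implementations.

```python
def roi_max_pool(feature_map, roi, out_h=2, out_w=2):
    """Max-pool one region of interest to a fixed out_h x out_w grid.

    feature_map: 2-D list of numbers (one channel, for simplicity).
    roi: (row0, col0, row1, col1), end-exclusive, in feature-map cells.
    """
    row0, col0, row1, col1 = roi
    height, width = row1 - row0, col1 - col0
    if height < out_h or width < out_w:
        raise ValueError("RoI is smaller than the output grid")
    pooled = []
    for i in range(out_h):
        # Integer bin edges that split the RoI rows into out_h bins.
        r_start = row0 + (i * height) // out_h
        r_end = row0 + ((i + 1) * height) // out_h
        pooled_row = []
        for j in range(out_w):
            c_start = col0 + (j * width) // out_w
            c_end = col0 + ((j + 1) * width) // out_w
            # Keep only the strongest activation inside the bin.
            pooled_row.append(max(feature_map[r][c]
                                  for r in range(r_start, r_end)
                                  for c in range(c_start, c_end)))
        pooled.append(pooled_row)
    return pooled


# Toy 5x6 feature map whose values grow from left to right and top to bottom.
feature_map = [[r * 6 + c for c in range(6)] for r in range(5)]
small = roi_max_pool(feature_map, (0, 0, 2, 2))  # 2x2 region
large = roi_max_pool(feature_map, (1, 1, 5, 6))  # 4x5 region
```

Both regions come out as 2x2 grids even though their input sizes differ, which is what allows the following fully connected layer to have a fixed input size.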

Single Shot Multibox Detector
Liu, Anguelov, Erhan, Szegedy, Reed, Fu and Berg developed a new deep learning method, the Single Shot Multibox Detector (SSD), for real-time object detection. Faster R-CNN uses a region proposal network before the classification and bounding box steps. The Single Shot Multibox Detector detects the object in one shot; the region proposal network is eliminated in SSD.
As a first step, the input image is fed into a convolutional neural network. Feature maps are produced at different scales. Bounding boxes are produced by 3x3 convolutional filters on the feature maps. The borders and the class of each box are predicted at the same time. Because these boxes are located on every activation map, objects of different scales can be detected. During training, the predicted borders are compared with the correct (ground truth) borders. The best-matching boxes and the boxes whose overlap with the ground truth is higher than 0.5 are labeled as positive.
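The positive-labeling rule used during training can be sketched as follows. The sketch assumes that the overlap is measured as intersection over union (IoU) on boxes given as (xmin, ymin, xmax, ymax); the function names and the example boxes are hypothetical and serve only as an illustration.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    x0 = max(box_a[0], box_b[0])
    y0 = max(box_a[1], box_b[1])
    x1 = min(box_a[2], box_b[2])
    y1 = min(box_a[3], box_b[3])
    inter = max(0.0, x1 - x0) * max(0.0, y1 - y0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def positive_boxes(candidate_boxes, ground_truth, threshold=0.5):
    """Indices of candidate boxes labeled positive for one ground-truth box."""
    overlaps = [iou(box, ground_truth) for box in candidate_boxes]
    best = max(range(len(overlaps)), key=overlaps.__getitem__)
    positives = {best}  # the best-matching box is always kept
    positives.update(i for i, o in enumerate(overlaps) if o > threshold)
    return sorted(positives)


# Hypothetical boxes, not taken from the study's data set.
ground_truth = (10, 10, 50, 30)
candidate_boxes = [
    (10, 10, 50, 30),  # identical to the ground truth
    (20, 10, 60, 30),  # shifted to the right
    (60, 60, 90, 90),  # no overlap
    (10, 10, 30, 30),  # covers half of the ground truth
]
labels = positive_boxes(candidate_boxes, ground_truth)
```

In this example the first two boxes are labeled positive; the last box has an overlap of exactly 0.5 and is therefore not above the threshold.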

Creating Training, Testing, Validation Data Sets
Before retraining the SSD-Single Shot Multibox MobileNet_v2 (Github, 2019) [8] and Faster R-CNN Inception_v2 (Github, 2019) [8] architectures that are compared in this study, the data sets have to be prepared. Since the main purpose of the study is to detect airports by their runways in images from satellites and from medium- and high-altitude unmanned aerial vehicles, the data sets were created from images taken by satellite and UAV (Unmanned Aerial Vehicle).
In the first stage, 310 images, each containing one or more airports, were collected from Yandex Maps, Google Earth and the internet. The airports in all of the images were labelled using one or more additional interface programs.
While collecting the data, regions with different seasonal characteristics were selected from all over the world. In addition, to challenge the model, images were collected from many different terrain conditions such as forest areas, sea and ocean shores, islands, deserts and city centers. Airports located near city centers and main roads were especially preferred, because highways look similar to airport runways.
Similarly, images taken from altitudes between 2 km and 14 km were preferred to challenge the model. In this way, the scale of the region to be detected varied. The 310 images, which include both color and grayscale images, were divided into a training data set (80%) and a test data set (20%). Accordingly, 248 images were included in the training data set and 62 images in the test data set. Particular care was taken to include images containing more than one airport. In both data sets, the airports were marked on the images and their coordinates were saved in XML format for training. These XML files were then converted into TFRecord files for training with the TensorFlow library, and the important training parameters were adjusted.
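The 80%/20% division described above can be reproduced with a short sketch. The file names, the random shuffle and the seed are assumptions made for illustration; the paper does not state how the images were assigned to the two sets.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle the items reproducibly and split them into two lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]


# Hypothetical file names standing in for the 310 labelled images.
image_names = [f"image_{i:03d}.jpg" for i in range(310)]
train_set, test_set = split_dataset(image_names)
```

With 310 images this yields 248 training images and 62 test images, matching the counts given in the text.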
In addition to the training and test data sets, an evaluation data set that is completely independent of them was created for an objective evaluation of the results and performance. This data set was obtained from different terrain conditions, from areas belonging to different climate types and from different altitudes. The evaluation data set includes 50 images containing one or more airport runways and 50 images without an airport. Below are a few of the airport images used in the training data.

Results and Performance Evaluation Criteria
In order to evaluate the results and performance of the retrained SSD-Single Shot Multibox and Faster R-CNN architectures, the criteria in the table below were calculated from the measurements on the evaluation data set. In addition, both the training time and the result production time were calculated and compared for the two algorithms. All results were obtained on the same workstation with an Intel Core i5-3230M 2.60 GHz CPU for both architectures.
In general, the actual and predicted values of the classes are compared by means of the confusion matrix to evaluate the performance of such algorithms (Polat, Mehr & Cetin, 2017) [9]. The Receiver Operating Characteristic (ROC) is one of the methods used to measure classification performance (Lasko, Bhagwat, Zou & Ohno-Machado, 2005) [10].
To compare the performance of the algorithms in the study, the four possible outcomes of the results produced with the evaluation data set are given in the table below. The sum of the true positive rate and the false negative rate is equal to 1. Likewise, the sum of the true negative rate and the false positive rate is equal to 1.
Positive predictive value: shows how many of the samples labeled as positive are actually positive.
The ROC curve, which is another evaluation criterion, is the curve on which the true positive rate (vertical axis) is plotted against the false positive rate (horizontal axis) for different threshold values (Metz, 2006) [11]. The test whose ROC curve is closest to the upper left corner is known to be the most useful (Dirican, 2001) [12]. If the ROC curve of classifier x lies above the diagonal and closer to the upper left corner than that of classifier y, x is considered the more successful classifier (Fawcett, 2006) [13]. When the evaluation criteria are gathered, the following table emerges.

Error rate = (FP + FN) / (P + N)
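The evaluation criteria named in this section can be computed from the four confusion matrix counts, as in the sketch below. It uses the standard textbook definitions, which are assumed to match the ones in the paper's table; the F score is assumed to be the F1 score, and the counts in the example are hypothetical, not results from the study.

```python
def evaluation_criteria(tp, fp, tn, fn):
    """Standard confusion-matrix metrics (all denominators assumed non-zero)."""
    p = tp + fn  # actual positives
    n = tn + fp  # actual negatives
    sensitivity = tp / p       # true positive rate
    specificity = tn / n       # true negative rate
    ppv = tp / (tp + fp)       # positive predictive value
    return {
        "accuracy": (tp + tn) / (p + n),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "false_positive_rate": fp / n,
        "false_negative_rate": fn / p,
        "positive_predictive_value": ppv,
        "f_score": 2 * ppv * sensitivity / (ppv + sensitivity),
        "error_rate": (fp + fn) / (p + n),
    }


# Hypothetical counts, NOT the results reported in the study.
scores = evaluation_criteria(tp=45, fp=5, tn=40, fn=10)
```

The two identities stated in the text hold by construction: sensitivity plus the false negative rate equals 1, and specificity plus the false positive rate equals 1.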
The training times and the result production times of the models are also important. For this reason, the analysis was carried out by calculating their mean, standard deviation, minimum and maximum values.
Loss Rate: The loss rate is the average of the losses over each batch of the training set. Because a deep learning model learns over time, the loss at the first steps is generally higher than at the last steps. The loss rate shows whether the model is performing well or poorly after each iteration of the training phase. The aim is for the loss to decrease as the iterations are repeated (Chen, 2017) [15].
The loss function basically calculates how far the model's estimate is from the ground truth. If the model predicts poorly, the difference between the ground truth and the predicted value is large and the loss value is high; if the model is good, the loss value is low. If the prediction is exactly the same as the ground truth, the loss is 0.
There are several types of loss functions that calculate the loss value. These functions normalize the scores produced by the artificial neural network and calculate the loss value from them. The most common ones are Sigmoid, Multiclass Support Vector Machine (SVM) and Softmax.
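Two of the loss functions mentioned above, Softmax and Multiclass SVM, can be sketched for a single sample as follows. This is a minimal illustration with hypothetical class scores and a margin of 1 assumed for the SVM loss; it is not the loss configuration used by the SSD or Faster R-CNN models in the study.

```python
import math


def softmax(scores):
    """Normalize raw class scores into probabilities that sum to 1."""
    top = max(scores)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_loss(scores, true_class):
    """Cross-entropy loss: negative log-probability of the true class."""
    return -math.log(softmax(scores)[true_class])


def multiclass_svm_loss(scores, true_class, margin=1.0):
    """Hinge loss: penalize wrong classes that score within the margin."""
    return sum(max(0.0, s - scores[true_class] + margin)
               for i, s in enumerate(scores) if i != true_class)


# Hypothetical scores for three classes; class 0 is the ground truth.
good_scores = [5.0, 0.0, -2.0]  # the true class clearly wins
bad_scores = [0.0, 5.0, -2.0]   # a wrong class clearly wins
```

For both functions, the well-predicted sample gives a loss close to or equal to 0, while the badly predicted sample gives a much higher loss, in line with the behaviour described in the text.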

Results and Discussion
The retrained SSD-Single Shot Multibox and Faster R-CNN algorithms were evaluated on the evaluation data set, which was created completely independently of the training and test data sets.
First, when the accuracy rates were compared, a very successful result was obtained with the Faster R-CNN algorithm, with 99.52% detection. The result obtained with the SSD algorithm was only 76.61%.
Comparing the training times of the two algorithms on the Intel Core i5-3230M 2.60 GHz workstation, training took 165.62 hours with the Faster R-CNN algorithm and 763.88 hours with the SSD algorithm. In other words, training the SSD mobilenet_v2 architecture took 4.6 times longer than training the Faster R-CNN inception_v2 architecture. Training is expected to be faster if a GPU or a CPU with a higher processing speed is used instead of this CPU.
The comparison of the results of both algorithms according to the evaluation criteria is as follows. As can be seen, Faster R-CNN is very successful at detecting airports compared with the SSD algorithm, and its error rate is very low.
When the result production (image detection) times of the Faster R-CNN and SSD algorithms are compared, it is concluded that the SSD Mobilenet v2 architecture detects images about 2.5 times faster than the Faster R-CNN Inception v2 architecture. The mean, standard deviation, minimum and maximum of the detection times are summarized for comparison in the table below. In addition, a two-sample test was carried out on the result production times of the two series using the Minitab program. First, a normality test was performed separately for the SSD algorithm and the Faster R-CNN algorithm. When the result time series of both algorithms were evaluated with the Anderson-Darling test, H0 was not rejected for either series because both P values are greater than 0.05. Accordingly, both series are consistent with the normal distribution.
Since both series are consistent with the normal distribution, the two-sample test was carried out at the 95% confidence level. H0: There is no difference between the mean image detection times of the two algorithms.

Figure 8. SSD Algorithm and Faster R-CNN Algorithm Two Sample Test Results
H0 is rejected because the P value is less than 0.05. Accordingly, there is a significant difference between the image detection times of the SSD and Faster R-CNN algorithms. The "Estimate for difference" value indicates the difference between the means, which is 6.301 seconds.
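The reported difference and the speed ratios can be cross-checked from the figures given in the paper: mean detection times of 4.06 seconds (SSD) and 10.37 seconds (Faster R-CNN), and training times of 763.88 hours (SSD) and 165.62 hours (Faster R-CNN). The sketch below only repeats this arithmetic on the rounded values; the variable names are illustrative.

```python
# Figures as reported in the paper (rounded to two decimals there).
ssd_detect_s, frcnn_detect_s = 4.06, 10.37    # mean detection time, seconds
ssd_train_h, frcnn_train_h = 763.88, 165.62   # training time, hours

# How many times faster SSD produces a result than Faster R-CNN.
detect_speedup = frcnn_detect_s / ssd_detect_s
# Difference between the mean detection times, in seconds.
detect_difference = frcnn_detect_s - ssd_detect_s
# How many times longer SSD took to train than Faster R-CNN.
train_ratio = ssd_train_h / frcnn_train_h

print(f"detection speed-up: {detect_speedup:.2f}x")
print(f"difference of means: {detect_difference:.2f} s")
print(f"training time ratio: {train_ratio:.2f}x")
```

The rounded means give a difference of 6.31 seconds, which agrees with the reported estimate of 6.301 seconds to within rounding, and the ratios of about 2.55 and 4.61 agree with the "2.5 times" and "4.6 times" figures in the text.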
The results of the retrained SSD Mobilenet V2 and Faster R-CNN inception_v2 algorithms were plotted on the ROC curve. When the results are compared, the classification performance of the Faster R-CNN inception_v2 architecture is seen to be higher than that of the SSD Mobilenet V2 architecture. When the Total Loss graphs are compared, the loss of the SSD algorithm starts at about 9 and approaches 1 over 50,000 steps, whereas the loss of the Faster R-CNN algorithm converges from 0.15 to less than 0.1. This shows how strong and accurate the Faster R-CNN algorithm is compared with the SSD algorithm, and it indicates that its error decreased very quickly. Total Loss falls below 1 before 1,000 iterations with the Faster R-CNN algorithm, whereas it only falls below 1 after 50,000 iterations with the SSD algorithm.

Conclusions and Recommendations
This study, carried out in the field of image processing, has shown which algorithm produces more successful results in detecting objects in unmanned aerial vehicle and satellite images, and which one should be preferred for which purpose. A decision support system has been developed for operators who examine and analyze satellite or unmanned aerial vehicle images; in the detection of airports it produces 99.52% accurate results with the Faster R-CNN algorithm and 76.61% with the SSD algorithm. The successful detection results also show that data sets of many different objects can be created from images taken from the air and from space, and that these objects can be detected successfully after the model is retrained. Systems developed in this way will both support the operators and minimize the risk of overlooking targets or making mistakes.
In this study, which was carried out on a workstation with an Intel Core i5-3230M 2.60 GHz processor, the SSD algorithm produced results on the evaluation data set in 4.06 seconds on average and achieved 76.61% successful detection after 763.88 hours of training. The Faster R-CNN algorithm produced results in 10.37 seconds on average and achieved 99.52% successful detection after 165.62 hours of training. Past studies compare the training and test times of the R-CNN family of algorithms (R-CNN, Fast R-CNN and Faster R-CNN) on the VOC data set [16]. This study compares the training and test times of Faster R-CNN and SSD on a specific data set. If the study is repeated with a faster CPU or a GPU, a significant reduction in both training time and object detection time is expected. With shorter detection times, the image detection process could also be applied to live images from vehicles in operation. As mentioned, the data set consists of 310 UAV and satellite images; if the number of images is increased, the model can adapt better to different situations and the accuracy rate will be higher. Nowadays, companies have started to collect many different data sets and have built search engines for them, such as Google (Google, 2019) [17].
In addition to critical region detection, if new image classes are defined, the system can be used in search and rescue activities and in the detection of debris and accident areas. Independently of image analysis by humans, large areas can be scanned and potential areas of interest can be presented quickly to those concerned, without the need for any pause, rest or break.