A Convolutional Neural Network Application for Predicting the Locating of Squamous Cell Carcinoma in the Lung

Copyright © BAJECE ISSN: 2147-284X http://www.bajece.com

Abstract—Squamous cell carcinoma, one of the most common types of lung cancer, usually arises in the middle of the lung or in the right or left bronchus. Imaging methods can usually detect squamous cell carcinoma and determine its location within the lung. In rare cases, however, the location of the tumor cannot be determined, and the results of tests such as biopsy may be delayed. Such a delay means a delayed diagnosis and a delayed start of treatment. Machine learning methods offer one way to address this problem. In this study, a convolutional neural network was used to determine the location of the cancerous tumor in squamous cell carcinoma of the lung. With the designed convolutional neural network model, the tumor location was estimated with an accuracy rate close to 100%.


I. INTRODUCTION
WHILE lung cancer was a rare disease at the beginning of the 20th century, its frequency has increased steadily in parallel with the increase in smoking, and it has become the most common type of cancer in the world [1]. According to the World Health Organization (WHO) report, lung cancer is the leading cause of cancer death among males and ranks second among females all over the world [2]. Squamous cell lung cancer, also called lung squamous cell carcinoma, accounts for about 30% of all lung cancers [3]. There are many treatment options for people diagnosed with squamous cell lung cancer, and doctors are constantly working to expand and improve these treatments. Squamous cell lung cancer is a type of non-small cell lung cancer. Squamous cell carcinoma starts in squamous cells, which look like fish scales when viewed under a microscope. This cancer develops along the lining of the airways in the lungs and mostly progresses through the air passages. Squamous cells make up the surface of the skin and are also found on the surfaces of the body's organs and in the lining of the digestive tract. Squamous cell carcinoma that originates in the lungs is classified as lung cancer.
Squamous cell lung tumors usually begin in the middle of the lung or in one of the main airways (left or right bronchus) [3,4]. The location of the tumor may affect the appearance of symptoms such as coughing, shortness of breath, chest pain and blood in the sputum. If the tumor is enlarged, a chest radiograph or computerized tomography scan may detect a space in the lung. This space is a gas-filled or fluid-filled cavity in the tumor mass or nodule and is a classic sign of squamous cell lung cancer. Squamous cell lung cancer can spread to multiple sites such as the brain, spine, other bones, the adrenal glands, and the liver. Smoking is more strongly associated with this type of cancer than with any other type of non-small cell lung cancer. Other risk factors for squamous cell lung cancer include age, hereditary diseases, environmental pollution, mineral and metal dust, and exposure to asbestos or radon. Positive results may be observed with appropriate treatment methods, although this type of cancer sometimes develops occultly and is hard to notice [5][6][7]. Pathologically, lung cancer can be divided into 4 main groups: a) squamous cell carcinoma, b) large cell carcinoma, c) adenocarcinoma, d) small cell carcinoma. Squamous cell carcinoma, seen in 40%-60% of male patients, is more common in the middle and advanced age groups. The main diagnostic methods for lung cancer are: a) Computerized Tomography (CT), b) Endoscopic Ultrasonography, c) Bronchoscopy, Thoracoscopy, Dermoscopy, d) Pulmonary Radiography, e) Positron Emission Tomography (PET), f) Magnetic Resonance Imaging. The main treatment modalities are surgical treatment, radiotherapy, chemotherapy and cryotherapy. Squamous cell carcinoma shows different symptoms according to the tumor location. Identifying the tumor location is of great importance in the early diagnosis of the disease. The region of the lung where the tumor is located is an important piece of information for the treatment process. The location of the tumor is usually revealed immediately by imaging techniques. In some cases, however, the tumor location cannot be detected by imaging techniques. This leads to delays while waiting for biopsy results and to a later start of treatment [7][8][9][10][11][12][13].
Machine learning-based prediction methods have been used effectively in recent years for different purposes in various cancer types [14][15][16][17]. In this study, a convolutional neural network (CNN) model, one of the most popular machine learning approaches in recent years, is designed. The main goal of the study is to detect the tumor location in squamous cell carcinoma of the lung. The first section gives an introduction and basic information on lung cancer. The second section describes the data set and materials used in the study. The third section briefly explains the design of the convolutional neural network used in the study. In the fourth section, the estimation results obtained from the CNN

II. MATERIALS
The data set used in the study was obtained from the website of The Cancer Imaging Archive (TCIA) [18,19]. The original size of the images in the data set is 512 x 512, and all images consist of 3-channel data. In order to train the CNN model in a shorter time, all images in the data set were reduced to 84 x 84 and converted to single-channel gray scale. A data set of 300 images in total was used, with 60 images for each of the five tumor regions. Figure 1 shows a sample image from the data set at its original size. For the input layer of the CNN model, each image is used in matrix form. Figure 3 shows a contour plot of the digitized matrix form of a randomly selected image from the data set, as it is presented to the input layer. The convolutional neural network designed and applied in the study is shown in Figure 4. The CNN model in Figure 4 covers the processing that follows the reduction of the original image to 84 x 84 pixels and its conversion to a single channel.
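As a rough illustration of the reduction described above, the following Python sketch converts a 3-channel image to a single gray-scale channel and shrinks it from 512 x 512 to 84 x 84. The luminance weights and the nearest-neighbour resampling are assumptions made for this example; the study does not state which conversion or interpolation was used, and the function names are hypothetical.

```python
def to_grayscale(rgb):
    """Collapse an H x W x 3 image (nested lists) into an H x W gray-scale image.

    Assumption: standard luminance weights; the study does not name its method.
    """
    return [[0.299 * r + 0.587 * g + 0.114 * b for (r, g, b) in row] for row in rgb]


def resize_nearest(img, out_h, out_w):
    """Nearest-neighbour resize (assumed interpolation) of an H x W image."""
    in_h, in_w = len(img), len(img[0])
    return [
        [img[(i * in_h) // out_h][(j * in_w) // out_w] for j in range(out_w)]
        for i in range(out_h)
    ]


def preprocess(rgb, size=84):
    """Gray-scale conversion followed by reduction to size x size."""
    return resize_nearest(to_grayscale(rgb), size, size)


# Synthetic 512 x 512 3-channel image standing in for one image of the data set.
image = [[(x % 256, y % 256, 128) for x in range(512)] for y in range(512)]
matrix = preprocess(image)
print(len(matrix), len(matrix[0]))  # 84 84
```

The resulting 84 x 84 matrix is the kind of input the text describes for the input layer of the CNN model.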

Fig. 3. Contour plot of a randomly selected sample image
The designed CNN model consists of, in order, the input layer, the convolution layer, the ReLU layer, the max-pooling layer, the fully connected layer and the softmax layer. As mentioned earlier, the image data is fed to the input layer. The next layer, the convolution layer, uses 20 filters of size 5x5. After this filtering step, the ReLU transfer function is applied. The max-pooling layer uses a 2x2 filter with a stride of 2. The next processing layer is the fully connected layer, to which all neurons are connected. Classification is performed by the last layer, the softmax layer.
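The layer sequence above fixes the size of the data as it passes through the network. The short Python sketch below works through that arithmetic. It assumes a convolution stride of 1 with no padding and five output classes (one per tumor region); neither the stride nor the padding of the convolution layer is stated in the text, so the resulting numbers are illustrative only.

```python
import math


def output_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window along one axis."""
    return (n + 2 * padding - kernel) // stride + 1


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


NUM_FILTERS = 20   # from the text: 20 filters of 5x5
NUM_CLASSES = 5    # assumption: one class per tumor region

conv_side = output_size(84, kernel=5)                   # assumed stride 1, no padding
pool_side = output_size(conv_side, kernel=2, stride=2)  # 2x2 pooling, stride 2
flat_features = pool_side * pool_side * NUM_FILTERS     # input to the fully connected layer

conv_params = NUM_FILTERS * (5 * 5 * 1 + 1)             # weights + bias per filter
fc_params = flat_features * NUM_CLASSES + NUM_CLASSES   # weights + biases

print(conv_side, pool_side, flat_features)  # 80 40 32000
print(conv_params, fc_params)               # 520 160005
```

Under these assumptions almost all trainable parameters sit in the fully connected layer, which is one reason the input images are reduced before training.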
In the study, 15 of the 60 images for each region were used for testing and the rest were used for training. In total, 75 images were therefore used for testing and the remaining 225 images were used to train the model. The initial learning rate for the designed CNN model was set to 0.00001. The maximum number of epochs for training and testing was set to 1000. Such a high number of epochs was chosen in order to achieve high accuracy.
To evaluate the results obtained from the designed CNN model, the training process is examined first; as Table 2 shows, 1000 iterations were used in total. Table 2 records the lowest error rate for each block of 50 epochs, and on this basis the accuracy after the last epoch is 99.9983%. However, this value corresponds only to the lowest error rate of the last 50 epochs. When all epochs are considered, the average accuracy, which is the true accuracy rate, is 99.11%. This accuracy indicates that the model is highly successful. The error is very high in the first epoch, at 5.71, but it decreases over the following epochs and approaches zero. Another important factor is the length of the training process. The training ended after 114.45 seconds, at the end of one thousand epochs. This value may vary depending on the power of the processor, the size of the images, the number of images in the data set, and the number of convolutional and max-pooling layers used in the designed model. In fact, the number of filters used in the convolutional layer and the filter size affect both accuracy and training duration. As a result, with the designed CNN model, the region of the lung in which the squamous cell carcinoma tumor is located was estimated in approximately 2 minutes, with an accuracy rate close to 100%.
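The per-region split of 15 test images and 45 training images can be sketched in Python as follows. The random shuffling, the seed and the function name are assumptions for illustration; the text does not say how the 15 test images per region were chosen.

```python
import random


def stratified_split(labels, test_per_class, seed=0):
    """Split sample indices so each class contributes `test_per_class` test items."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        by_class.setdefault(label, []).append(index)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)  # assumption: test images are picked at random
        test_idx.extend(indices[:test_per_class])
        train_idx.extend(indices[test_per_class:])
    return train_idx, test_idx


# Five regions with 60 images each, as in the data set described in the text.
labels = [region for region in range(1, 6) for _ in range(60)]
train_idx, test_idx = stratified_split(labels, test_per_class=15)
print(len(train_idx), len(test_idx))  # 225 75
```

Splitting within each region keeps the five classes balanced in both the training and the test set.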

Fig. 4. Architecture of the designed CNN model

Fig. 5. The loss curve for the CNN model in the training process
The smallest loss value obtained at the end of each 50 epochs of the CNN model can be seen in the graph in Figure 5, which covers all epochs and is plotted from the values given in Table 2. To examine this graph more closely, the maximum value of the loss axis is reduced to 3, which gives the graph in Figure 6. The change in loss is shown more clearly in this graph.

Fig. 6. Detailed loss curve
It is also possible to obtain an accuracy curve from the values in Table 2. Figure 7 shows the obtained accuracy curve. This accuracy curve is obtained by taking the minimum values for every 50 epochs, as in the mini-batch loss curve. However, the total accuracy rate obtained is 99.11%.
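The summarization used for these curves, keeping one value per block of 50 epochs, can be sketched in Python as follows. The loss series here is synthetic and purely illustrative; it is not the data of Table 2, and `window_minima` is a hypothetical helper name.

```python
def window_minima(values, window=50):
    """Return the minimum of each consecutive block of `window` values."""
    return [min(values[i:i + window]) for i in range(0, len(values), window)]


# Synthetic, steadily decreasing loss over 1000 epochs (illustrative only).
loss = [1.0 / epoch for epoch in range(1, 1001)]
points = window_minima(loss)
mean_loss = sum(loss) / len(loss)

print(len(points))            # 20 plotted points, one per 50 epochs
print(points[0], points[-1])  # 0.02 0.001
```

Because each plotted point is a block minimum, the last point is more optimistic than an average taken over all epochs (here `mean_loss` is larger than `points[-1]`), which is consistent with reporting an overall rate separately from the curve.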

TABLE 1
SQUAMOUS CELL CARCINOMA LUNG LOCATION AND NUMBERING