A Decision Support System for Evaluating Masses in Breast Tissue Using Classification Algorithms

Breast cancer is the leading cause of cancer-related death among women worldwide. Epidemiological studies published in different parts of the world over the past two decades show a significant rise in breast cancer mortality rates. Today, mammography is the most effective method for imaging masses and microcalcifications in breast tissue. However, nearly 70 percent of the breast biopsies prompted by mammogram interpretation yield benign findings and could have been avoided. An automated method is therefore needed to assist physicians in interpreting mammograms. Researchers have recently proposed various medical decision support systems. In this study, a medical decision support system for use in the breast cancer diagnosis process is proposed. The primary purpose of this system is to lower the number of unnecessary breast biopsies and make the diagnosis more reliable. Accordingly, a dataset in which each mammographic mass has six attributes is used, and the attributes other than the age of the patient (the BI-RADS assessment of the breast tissue, the shape of the mass, the mass margin, the tissue density, and the class label indicating the severity of the lesion) are evaluated with two classification algorithms: Naive Bayes, a probabilistic classifier, and the Multilayer Perceptron, a feed-forward neural network. The proposed system can help in deciding between a biopsy and short-term follow-up. The test results are promising and suggest that the proposed method can be used as a decision module in computer-aided diagnosis systems.

* This paper was presented at the International Conference on Access to Recent Advances in Engineering and Digitalization (ARACONF 2020).
** Corresponding Author: Nevşehir Hacı Bektaş Veli Üniversitesi, Mühendislik Mimarlık Fakültesi, Biyomedikal Mühendisliği Bölümü, Nevşehir, Türkiye, ORCID: 0000-0002-9688-6293, pinarozel@nevsehir.edu.tr
European Journal of Science and Technology e-ISSN: 2148-2683

In one study, an artificial neural network (ANN) with eighteen inputs, comprising 10 BI-RADS lesion descriptors and eight values from the patients' medical history, was used to distinguish malignant from benign breast lesions; the network was trained and tested on 73 malignant and 133 benign cases. At a sensitivity of 95 percent, the specificity of the ANN (62 percent) was considerably higher than that of the radiologists (30 percent), and the positive predictive value of biopsy rose from 35 percent to 61 percent (Baker, J. A., Kornguth, P. J., Lo, J. Y., Williford, M. E., Floyd, C. E., 1995). Fernandes et al. (Fernandes, A. S., Alves, P., Jarman, I., Etchells, T. A., Foncea, J. M., Lisboa, P. J. G., 2010) used a partial logistic ANN with automatic relevance determination (PLANN-ARD) alongside two other prognostic modeling strategies: the Nottingham prognostic index (NPI), which is widely used clinically, and Cox regression.
Another methodology examined is a case-based reasoning framework developed to support the biopsy decision for patients with suspicious mammographic findings. The framework is intended to reduce the number of benign biopsies without missing malignancies. Radiologists evaluate the mammograms using a standard reporting lexicon. The case-based reasoning framework then compares these findings with a database of cases with known biopsy outcomes and returns the fraction of similar cases that were malignant. This malignant fraction is an intuitive response that radiologists can take into account when deciding on a biopsy. The framework was assessed using a round-robin sampling scheme and achieved an area under the ROC curve of 0.83 (Floyd, C. E., Lo, J. Y., Tourassi, G. D., 2000). Similarly, (Bilska-Wolak, A. O., Floyd, C. E., 2001) and (Bilska-Wolak, A. O., Floyd, C. E., 2002) also used case-based reasoning classifiers in their studies.
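The retrieval step described above can be sketched in a few lines. This is a minimal illustration only: the Hamming distance, the `max_distance` threshold, the function names, and the toy database are assumptions made for the example and are not taken from the cited studies.

```python
from typing import List, Sequence, Tuple


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions at which two coded sets of findings differ."""
    return sum(x != y for x, y in zip(a, b))


def malignant_fraction(query: Sequence[int],
                       cases: List[Tuple[Sequence[int], int]],
                       max_distance: int = 1) -> float:
    """Fraction of similar stored cases that were malignant.

    Each stored case is (findings, outcome), with outcome 1 = malignant
    and 0 = benign. A case counts as similar when its Hamming distance to
    the query is at most max_distance (an assumed similarity rule).
    Returns 0.0 when no stored case is similar enough (arbitrary choice).
    """
    matches = [outcome for findings, outcome in cases
               if hamming(query, findings) <= max_distance]
    return sum(matches) / len(matches) if matches else 0.0


# Hypothetical database: (shape, margin, density) codes with biopsy outcome.
database = [((4, 5, 3), 1), ((4, 4, 3), 1), ((1, 1, 3), 0), ((4, 5, 2), 0)]
print(malignant_fraction((4, 5, 3), database))
```

Here three stored cases lie within distance 1 of the query and two of them were malignant, so the returned fraction is two thirds.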
A Bayesian network structure study (Markey, M. K., Fischer, E. A., Lo, J. Y., 2004) demonstrated a difference between the classification of biopsy findings and the classification of the invasiveness of breast masses. Jiang et al. (Jiang, X., Wells, A., Brufsky, A., Neapolitan, R., 2019) developed a Bayesian network model named Causal Modeling with Internal Layers (CAMIL) and the Treatment Feature Interactions (TFI) algorithm. Using these methods, they analyzed the likelihood of remaining free of metastasis for 5 years among individuals whose choices matched those recommended by the decision support system (DSS).
Also, to build a computer-aided diagnosis (CAD) system, a Graph-Based Visual Saliency (GBVS) technique was used for automated mass identification. Classification and retrieval were then performed with an extreme learning machine (ELM) and a support vector machine (SVM), together with a linear combination-based similarity fusion method (Rahman, M., Alpaslan, N., 2017).
In another study, decision tree (DT), ANN, and SVM data mining classification algorithms were used to help clinicians judge the severity of a mammographic mass lesion from the patient's age and BI-RADS attributes. The mammographic masses dataset was split into training and test sets in a 70:30 ratio. Classification performance was measured with three statistical metrics: sensitivity, specificity, and accuracy. The accuracies of ANN, DT, and SVM on the test samples were 80.56%, 78.12%, and 81.25%, respectively. The study shows that, among these three classification models, SVM predicts breast cancer with the lowest error rate and the highest accuracy (Mokhtar, S. A., Elsayad, A. M., 2013).
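For reference, the three metrics named above follow directly from the confusion-matrix counts. The sketch below uses the usual definitions, with malignant taken as the positive class; the function name and the example counts are hypothetical and are not results from the cited study.

```python
def classification_metrics(tp: int, fn: int, tn: int, fp: int) -> dict:
    """Compute sensitivity, specificity, and accuracy from confusion counts.

    tp: malignant cases predicted malignant
    fn: malignant cases predicted benign
    tn: benign cases predicted benign
    fp: benign cases predicted malignant
    """
    return {
        "sensitivity": tp / (tp + fn),            # true positive rate
        "specificity": tn / (tn + fp),            # true negative rate
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
    }


# Hypothetical counts, for illustration only.
metrics = classification_metrics(tp=40, fn=10, tn=45, fp=5)
print(metrics)
```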
Furthermore, Elter et al. (Elter, M., Schulz-Wendtland, R., Wittenberg, T., 2011) propose two CAD strategies that both emphasize an intelligible decision process for predicting breast biopsy outcomes from BI-RADS findings. The first method builds a global model based on decision tree learning. The second is based on case-based reasoning and uses an entropic similarity measure. The performance of both CAD strategies was tested on two publicly available mammography databases using ROC analysis, bootstrap sampling, and the ANOVA statistical significance test. Both methods outperformed the physicians' diagnostic decisions.
Alaa et al. (Alaa, A. M., Moon, K. H., Hsu, W., Van Der Schaar, M., 2016) proposed a system called ConfidentCare, which works by identifying clusters of "similar" patients and learning the "best" screening policy for each cluster. ConfidentCare uses a sequential algorithm that applies K-means clustering to the women's feature space and then learns an effective classifier (a decision tree) for each cluster. The algorithm guarantees that the policy adopted for each cluster satisfies a predetermined accuracy requirement with a high level of confidence.
In this paper, we propose a DSS to predict the severity of masses in breast tissue from mammography findings. This system can be considered part of a CAD system for breast cancer detection using mammographic images.

Materials and Method
In the present study, a supervised classification approach is adopted. Two algorithms are investigated: the Naive Bayes classifier, a fundamental supervised classification algorithm, and the multilayer perceptron, which is trained with the back-propagation algorithm. These algorithms are described in the following subsections.

Naive Bayes Classifier
The Naive Bayes method assumes that each feature is independent of every other feature given the class. The design parameters, i.e., the feature probability distributions and the class priors, are estimated from the relative frequencies in the training data (Sebe, N., Lew, M. S., Cohen, I., Garg, A., Huang, T. S., 2002). Under this assumption, the posterior probability of a class is proportional to its prior multiplied by the product of the per-feature conditional probabilities.
Given the feature vector of an unknown sample, a decision is made by selecting the class with the highest posterior probability. This is the maximum a posteriori (MAP) decision rule, applied here to the model of equation (2.2).
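In standard textbook notation (the symbols are assumptions and may not match the paper's own numbered equations), let $c$ denote a class and $x = (x_1, \dots, x_n)$ a feature vector. A sketch of the Naive Bayes posterior and the resulting MAP rule is:

```latex
P(c \mid x_1, \dots, x_n) \;\propto\; P(c) \prod_{i=1}^{n} P(x_i \mid c)

\hat{c} \;=\; \arg\max_{c} \; P(c) \prod_{i=1}^{n} P(x_i \mid c)
```

The evidence term $P(x_1, \dots, x_n)$ is the same for every class, so it can be dropped when taking the maximum.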

Multilayer Perceptron
The multilayer perceptron (MLP) is a fully connected feed-forward ANN and is currently among the most widely used supervised classifiers. It is made up of multiple layers of simple, two-state, sigmoid processing elements, or neurons, that interact through weighted connections. An input layer at the bottom is followed by any number of hidden layers and, at the top, an output layer. There are no connections within a layer, whereas every neuron in a layer is connected to the neurons in the adjacent layers. The weights measure the degree of correlation between the activity levels of the neurons they connect (Ruck, D. W., Rogers, S. K., Kabrisky, M., Oxley, M. E., Suter, B. W., 1990).
The outputs of all neurons in a layer are connected to the inputs of all neurons in the following layer, weighted by values that have to be estimated. These weights are initialized with small random values. To estimate them, the learning vectors and the corresponding desired outputs, which are known targets, are presented to the network. The learning process aims to minimize the quadratic error between the desired and the obtained outputs. To this end, the back-propagation algorithm updates, at iteration t + 1, the weight between neuron l1 of one layer and neuron l2 of the following layer by an amount proportional to the learning rate, to the output of neuron l1 at iteration t, and to an error term computed for neuron l2.
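In a common textbook form (the notation is an assumption and may differ from the paper's own equations), with desired outputs $d_k$, obtained outputs $o_k$, learning rate $\eta$, and sigmoid activations, the quadratic error, the weight update, and the error term can be sketched as:

```latex
E = \frac{1}{2} \sum_{k} \left( d_k - o_k \right)^2

w_{l_1 l_2}(t+1) = w_{l_1 l_2}(t) + \eta \, \delta_{l_2} \, o_{l_1}(t)

\delta_{l_2} =
\begin{cases}
o_{l_2} (1 - o_{l_2}) (d_{l_2} - o_{l_2}) & \text{if } l_2 \text{ is an output neuron} \\
o_{l_2} (1 - o_{l_2}) \sum_{m} \delta_{m} \, w_{l_2 m} & \text{if } l_2 \text{ is a hidden neuron}
\end{cases}
```

Here $m$ ranges over the neurons of the layer that follows $l_2$, and the factor $o_{l_2}(1 - o_{l_2})$ is the derivative of the sigmoid, so the update is a gradient-descent step on $E$.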

Dataset
To train the classifiers and evaluate the classification performance, we used a dataset that was collected in 2007 by Prof. Dr. Rudiger Schulz-Wendtland and Matthias Elter (Elter, M., Schulz-Wendtland, R., Wittenberg, T., 2007).
There are 961 instances in the dataset, and each instance corresponds to a lesion in the breast tissue of a patient. Each instance is represented by the following six attributes: the BI-RADS assessment, the age of the patient, the shape of the mass, the mass margin, the tissue density, and the class label indicating the severity of the lesion (benign or malignant).
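A record with these six attributes could be represented as in the sketch below. The comma-separated layout, the column order, the `?` marker for missing values, and the 0/1 coding of the class label are all assumptions made for illustration; they are not stated in this paper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MassRecord:
    """One mammographic mass; None marks a missing value."""
    bi_rads: Optional[int]
    age: Optional[int]
    shape: Optional[int]
    margin: Optional[int]
    density: Optional[int]
    severity: int  # assumed coding: 0 = benign, 1 = malignant


def parse_record(line: str, missing: str = "?") -> MassRecord:
    """Parse one comma-separated line into a MassRecord."""
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    values = [None if field == missing else int(field) for field in fields]
    if values[5] is None:
        raise ValueError("the class label must not be missing")
    return MassRecord(*values)


# Hypothetical input lines, for illustration only.
print(parse_record("5,67,3,5,3,1"))
print(parse_record("4,?,1,1,?,0"))
```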

Proposed Method
A medical decision support system is proposed to estimate, from data obtained from mammography images, whether the masses in the breast tissue are benign or malignant. As a first step, the six attributes of the dataset described above are acquired from the mammography data. Then the attributes other than the age of the patient (the BI-RADS assessment of the breast tissue, the shape of the mass, the mass margin, the tissue density, and the class label indicating the severity of the lesion) are evaluated using two different classification algorithms. As a result, a DSS user can decide whether a biopsy or short-term follow-up is necessary. The flow chart of the proposed method is given in Figure 1.

Research Results and Discussion
To assess the performance of the system, we ran tests using the 10-fold cross-validation method. The classification results are reported as the numbers of correctly and incorrectly classified benign and malignant instances, together with the overall accuracy.
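The evaluation protocol can be illustrated with a small, self-contained sketch. It is not the implementation used in this study: the categorical Naive Bayes below uses Laplace smoothing, the folds are formed by taking every k-th instance without shuffling or stratification, and the toy data are invented. All of these are assumptions made for the example.

```python
import math
from collections import Counter, defaultdict
from typing import Callable, List, Sequence, Tuple

Row = Sequence[int]


def train_naive_bayes(X: List[Row], y: List[int]) -> Callable[[Row], int]:
    """Fit a categorical Naive Bayes model and return a MAP predict function."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
            values_seen[i].add(v)

    def predict(row: Row) -> int:
        best_label, best_score = None, -math.inf
        for label, n_c in class_counts.items():
            score = math.log(n_c / len(y))  # log prior
            for i, v in enumerate(row):
                # Laplace smoothing (an added assumption) avoids log(0).
                num = value_counts[(i, label)][v] + 1
                den = n_c + len(values_seen[i])
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

    return predict


def cross_validate(X: List[Row], y: List[int], k: int = 10) -> Tuple[int, int]:
    """Return (correct, incorrect) counts accumulated over k folds."""
    correct = incorrect = 0
    for fold in range(k):
        test_idx = set(range(fold, len(X), k))  # every k-th instance
        train_X = [x for i, x in enumerate(X) if i not in test_idx]
        train_y = [c for i, c in enumerate(y) if i not in test_idx]
        predict = train_naive_bayes(train_X, train_y)
        for i in test_idx:
            if predict(X[i]) == y[i]:
                correct += 1
            else:
                incorrect += 1
    return correct, incorrect


# Toy data: two perfectly separable groups of (shape, margin) codes.
toy_X = [(1, 1)] * 10 + [(4, 5)] * 10
toy_y = [0] * 10 + [1] * 10
correct, incorrect = cross_validate(toy_X, toy_y, k=10)
print(correct, incorrect, correct / (correct + incorrect))
```

Each instance is used for testing exactly once, so the two counts sum to the dataset size and the overall accuracy is the correct count divided by that total.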
Table I presents the test results obtained with the Naive Bayes algorithm, which reaches an overall accuracy of 83.35%. The test results obtained with the Multilayer Perceptron algorithm are given in Table II; the overall accuracy in this case is 81.06%. Other parameters were also evaluated with both algorithms, and the results are given in Table 3. Specifically, we evaluated the performance of the two classifiers on this dataset using a 10-fold cross-validation scheme for the BI-RADS assessment of the breast tissue, the shape of the mass, the mass margin, the tissue density, and the class label indicating the severity of the lesion. Among these, the highest accuracy was obtained for the density parameter, followed by the class label and the BI-RADS assessment, respectively.
Additionally, the given results show that the Naive Bayes algorithm slightly outperforms the Multilayer Perceptron algorithm; however, the margin is too small to justify choosing one algorithm over the other. Nevertheless, the difference in results can be interpreted in light of the fundamental differences between the two classifiers. First, Naive Bayes classifiers rely on probability theory for learning and classification, whereas Multilayer Perceptron classifiers rely on a neural network. Second, the Naive Bayes classifier must read the whole training set before updating its knowledge of the training data, whereas the Multilayer Perceptron classifier reads one sample at a time to update its understanding of the training data. Finally, training and test data are distinct for the Naive Bayes classifier, whereas for the Multilayer Perceptron classifier the training data also serve as test data.

Conclusion
In this paper, we have proposed a DSS for classifying the masses in breast tissue into two categories, benign and malignant. The proposed method estimates whether a mass is benign or malignant from data obtained from mammography images, so that a decision between biopsy and short-term follow-up can be made. By reducing the need for biopsy, the system aims to simplify and speed up the diagnosis process, reduce its cost, and spare the patient a painful procedure.
The system can be considered a decision module for computer-assisted diagnostic (CAD) systems. We plan to integrate image processing algorithms, such as tissue density prediction and mass shape analysis, into this system to develop a fully automatic CAD system. Furthermore, different classification algorithms, decision fusion algorithms, and ensemble classifiers are planned for our future studies.