Determination and Classification of Importance of Attributes Used in Diagnosing Pregnant Women's Birth Method

The rapid development of information technologies has enabled successful results in computer-aided studies. This has led researchers to investigate the usability of computer- and software-supported systems, machine learning, and artificial intelligence in many fields, one of which is health. For example, to avoid endangering the mother and baby, it is very important to correctly identify the cases in which a cesarean operation is mandatory. To make this decision faster and more accurately, it is important to know which attributes matter and how much each one contributes to the decision for a mandatory cesarean. In this study, the importance levels of the five criteria considered in the literature for deciding whether a cesarean is necessary were determined, feature selection was carried out, and a classification was then made. The data set covers 80 pregnant women and has 6 attributes. Although the same data set has previously been classified with different methods, no study was found that determines the significance levels of the attributes or uses artificial neural networks as the classification method. For this reason, in this study the features were determined using an adaptive neuro-fuzzy classifier and the classification was performed with artificial neural networks. The results show that the importance levels of the attributes differ. Although measures such as accuracy, sensitivity, and specificity were quite high for the training set, the desired success was not achieved on the test data. While this result is promising, it also reveals the need to improve learning by using larger data sets.


Introduction
Today, with developing technology, the amount of data increases day by day. The need to extract meaningful information from this growing volume of data has given rise to many concepts, such as data mining. Data mining, which aims to reveal the relationships between data, has many applications and uses (Alan, 2012). In this study, classification and feature selection were carried out.
Classification, which works in a supervised manner, is a very widely used data mining technique (Alan, 2004). It was developed to assign a sample whose class is unknown to one of the existing classes by using inferences drawn from existing samples. Classification is a very popular subject and one of the most studied topics in the literature. Some of these studies are as follows. Alptekin and Yeşilaydın (2015) classified OECD countries according to health indicators using the fuzzy clustering method. Haltaş and Alkan (2016) worked on the automatic classification of cancer types. Yakut and Gemici (2017) worked on estimating returns by classifying the stocks of some companies traded in the Istanbul stock exchange index. Çelik et al. (2018) performed feature selection on a human spine data set and then classified it.
In classification, many mathematical and statistical calculations are made to separate the classes, and feature selection is one of the methods used to determine the distinguishing features. Feature selection is among the basic steps of many data mining methods (Polat and Özerdem, 2016). For the methods to be used successfully, it is of great importance to determine the correct attributes. Because costs rise as data sizes grow, the need to remove attributes that do not affect the result increases the demand for determining the correct attributes (Bulut, 2016). The literature shows that feature selection is used in studies in many areas. Some of these studies are as follows. Kayım et al. (2003) worked on identifying distinctive features and on gender recognition using facial features. Gündüz et al. (2013) worked on identifying the attributes for extracting crowd dynamics. Yoldaş et al. (2014) first performed feature extraction to determine cancer levels and then classified them with k-nearest neighbors. Kaya and Ertuğrul (2016) proposed a new language recognition approach based on feature extraction. Kaya et al. (2017) performed feature selection to classify diabetic retinopathy. Emhan and Akın (2019) investigated the effect of feature selection methods on intrusion detection systems. Bilimleri (2019) performed feature selection on raw data collected from smartphone sensors. Al-Tashi et al. (2019) proposed a hybrid method for feature selection. Mafarja et al. (2019) worked on the most appropriate feature selection for classification purposes.
Cesarean delivery, the subject of this study, has become a very common operation in recent years. In some cases it is mandatory in order to avoid endangering the mother and baby. It is therefore very important to identify the situations in which it is obligatory and to determine which factors influence this decision.
Alphanumeric Journal Volume 8, Issue 1, 2020
This study also aims to estimate the delivery type of pregnant women and to determine the features that have a greater effect in cases requiring cesarean. The data set used has previously been studied in the literature with different methods. Gharehchopogh and his colleagues worked on predicting the situations in which cesarean is required; using the decision tree algorithm, they obtained a classification success rate of 86.25% (Gharehchopogh et al., 2012). Amin and Ali evaluated the performance of several data mining methods on the cesarean data: random forest, logistic regression, naive Bayes, k-nearest neighbors, and support vector machine. The highest success, 95%, was obtained with k-nearest neighbors and random forest (Amin and Ali, 2018). As can be seen, no study using an Artificial Neural Network (ANN) was found among those that analyzed this data set with different methods. For this reason, the cesarean data set was classified with ANN in this study. In addition, as a contribution to the literature, the importance levels of the features were determined and feature selection was carried out. Accuracy, Sensitivity, Specificity, Precision, Recall, F-Measure, Gmean, Mean Square Error (MSE) and Root Mean Square Error (RMSE) values were also calculated to evaluate the performance of the classification made with the trained ANN.
Evaluated in this context, the study includes methods and calculations that differ from those of the studies in the literature. Evaluating the same data set from different angles will thus contribute to the literature.

Methods and attributes used in the research
The data used in this study come from the study "Application of Decision Tree Algorithm for Data Mining in Healthcare Operations: A Case Study" (Gharehchopogh et al., 2012). The data, which were given explicitly in that study, were later shared on the UCI Machine Learning Repository together with the study "Performance Evaluation of Supervised Machine Learning Classifiers for Predicting Healthcare Operational Decisions" so that they could be used in other studies as well (Amin and Ali, 2018). For this study, the data were obtained from https://archive.ics.uci.edu/ml/datasets/Caesarian+Section+Classification+Dataset (accessed on 05.04.2019) and used with reference to these articles in the literature. The data set contains 6 attributes covering the information of 80 pregnant women and the type of birth; detailed information about the attributes is given in Table 1. In this study, the adaptive neuro-fuzzy classifier was used to determine the significance levels of the features and to select features, and artificial neural networks were used for classification.

Adaptive neuro-fuzzy classifier
The adaptive neuro-fuzzy classifier is an adaptive method that combines the inference structure of fuzzy logic with the learning ability of neural networks. It first determines the effect of every feature in the data set on the result and then performs classification with the neural network using these degrees of effect (Pençe et al., 2013). Although it was created for classification purposes, the adaptive neuro-fuzzy classifier can also be used solely for feature selection and for determining the level of effect of the attributes on the result. Fuzzy rule structures are used when determining the significance of attributes with this classifier. Ratings made with fuzzy rules are closer to human inference, using grades such as too little, little, much, and too much rather than crisp expressions such as yes or no (Çeşmeli et al., 2015). In addition, an attribute whose significance level is determined to be zero, that is, an unimportant attribute, does not need to be used in the classification stage. This saves memory and reduces costs (Çetişli, 2009).
Linguistic hedges are used when determining the significance of the attributes (Fırat, 2008). Figure 1 shows the modified forms of a linguistic term A obtained by applying linguistic hedges with the p values {-2, -0.5, 0, 0.5, 1, 2}. As shown in Figure 1, if p is less than 0, the direction of the membership function changes and, as a result, the membership values become greater than 1. A membership value outside the range [0, 1] is not desired, because the fuzziness of the sets should stay within this range; therefore p must be greater than or equal to zero (Çelik et al., 2018).
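The effect of the p values on a membership degree can be illustrated with a short sketch. It assumes that the exponent p is applied as a power of the membership value, and it uses a Gaussian membership function; the function names, the centre, the width, and the sample point are hypothetical choices for illustration and are not taken from the study.

```python
import math

def gaussian_membership(x, center, width):
    """Degree to which x belongs to a fuzzy set with a Gaussian shape (assumed shape)."""
    return math.exp(-0.5 * ((x - center) / width) ** 2)

def hedged_membership(mu, p):
    """Modify a membership value mu in (0, 1] by raising it to the power p."""
    return mu ** p

P_VALUES = [-2, -0.5, 0, 0.5, 1, 2]  # the p values listed in the text

# Hypothetical fuzzy set A centred at 0 with width 1, evaluated at x = 1.
mu = gaussian_membership(x=1.0, center=0.0, width=1.0)
hedged = {p: hedged_membership(mu, p) for p in P_VALUES}

for p, value in hedged.items():
    status = "within [0, 1]" if 0.0 <= value <= 1.0 else "outside [0, 1]"
    print(f"p = {p:>4}: membership = {value:.4f} ({status})")
```

In this sketch the negative p values push the membership above 1, p = 0 maps every membership to 1 so that the set no longer discriminates, and the positive p values keep the result inside [0, 1].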
The results of a two-input, one-output fuzzy rule in the inference system are shown in Figure 2 (Çetişli, 2006). In Figure 2, for the variables x1 and x2, if set B is to be selected with a high degree, set A2 prevents this. To increase the accuracy of the selection, only set A1 should be used and A2 should be disabled. To deactivate the fuzzy set A2, its linguistic hedge must be set to 0. The x2 space is thus removed from the rule, and because it is no longer used, the degree of selection of set B increases (Çetişli, 2006). Some of the studies that use the adaptive neuro-fuzzy classifier, which is applied in many areas, are as follows. Çetişli (2009) worked on reducing the size of gene data used for the treatment of gingival cancer and used the adaptive neuro-fuzzy classifier as the method. Pençe et al. (2013) worked on the modeling and classification of handwritten characters and also selected attributes using the adaptive neuro-fuzzy classifier. Çeşmeli et al. (2015) studied the success of students in their classes using data mining methods; they used ANN for classification and the adaptive neuro-fuzzy classifier for feature selection. Çelik et al. (2018) performed feature selection and classification on human spine data using the adaptive neuro-fuzzy classifier.
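The two-input rule discussed for Figure 2 can be sketched numerically. The sketch assumes that the firing strength of a rule is the product of the memberships of its inputs, each raised to its own exponent p; this product form, the function name, and the membership values 0.9 and 0.2 are illustrative assumptions rather than details given in the study.

```python
def firing_strength(memberships, hedges):
    """Product of memberships, each raised to its exponent p (assumed rule form)."""
    strength = 1.0
    for mu, p in zip(memberships, hedges):
        strength *= mu ** p
    return strength

mu_a1 = 0.9  # hypothetical membership of x1 in set A1
mu_a2 = 0.2  # hypothetical membership of x2 in set A2

# Both inputs take part in the rule: the low A2 membership holds the rule back.
both_active = firing_strength([mu_a1, mu_a2], hedges=[1, 1])

# Setting p = 0 for A2 turns its factor into 1, removing x2 from the rule.
a2_disabled = firing_strength([mu_a1, mu_a2], hedges=[1, 0])

print(f"A1 and A2 active: {both_active:.2f}")
print(f"A2 disabled     : {a2_disabled:.2f}")
```

With A2 disabled, the degree with which set B is selected depends only on A1 and therefore rises, which mirrors the argument made for Figure 2.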

Artificial neural networks (ANN)
Artificial neural networks (ANN) emerged from the idea that the working structure of the human brain can be expressed as a mathematical model and used as an information processing tool (El-Bouri et al., 2000). Their most important feature is the ability to learn from experience (Karahan, 2015). Through learning, ANN can automatically generate new information, which is one of the abilities of the human brain, and can establish relationships between existing pieces of information.
A representative image of ANN is shown in Figure 3. Inputs are the information that reaches the artificial nerve cell from the external environment. Outputs are produced by evaluating the functions within the ANN (Kaynar et al., 2011). Together, all the artificial nerve cells connected through inputs and outputs form the ANN (Kaynar and Taştan, 2009).
In ANN, artificial nerve cells connected in parallel form layers. Although the number of layers may vary according to the designed model, there are basically three kinds of layer: the input layer, the hidden layer, and the output layer. An example ANN model showing the layers is given in Figure 4. The layer formed by the external inputs is the input layer, the layer formed by the resulting outputs is the output layer, and all layers between the input and output layers are hidden layers.
The literature shows that ANN is used in many fields, such as medical and health applications, industrial applications, military and defense applications, and financial applications. In the field of health in particular, it is used for purposes such as the early diagnosis of diseases, grading the factors that cause a disease, and determining which treatment method is likely to give more successful results. Some studies are as follows. Hoskins and Himmelblau (1988) worked on detecting errors in information representation processes in the field of chemical engineering; they used artificial neural networks as the method, created a simulation, and obtained quite successful results. Park et al. (1991) worked on estimating the electrical load using ANN. Hill et al. (1994) evaluated the usability of ANN for prediction and decision-making models. Zhang et al. (1996) worked on the detection of faults in transformers using ANN. Türkoğlu and Arslan (1996) worked on the recognition of alphabetic patterns using artificial neural networks. In that study, the alphabet was taught to the system by providing the network with training data in the form of 26 basic alphabetical pattern matrices; the system that had learned the alphabet was then able to predict missing or broken letters, and the results obtained appear to be quite successful. Khan et al. (2001) worked on classifying cancer patients into predetermined categories using ANN. Dreiseitl and Ohno-Machado (2002) compared logistic regression and ANN for use in medical studies. Nagy et al. (2002) worked on estimating sediment load concentration in rivers using ANN. Çolak et al. (2005) worked on predicting atherosclerosis using ANN. Selim and Demirbilek (2009) used ANN as an alternative approach to analyze the factors that determine residential rental values in Turkey. Ulusoy (2010) worked on estimating the stock market index value for 3 years using ANN. Partal et al. (2011) worked on the prediction of daily precipitation with ANN and the wavelet transform. El_Jerjawi and Abu-Naser (2018) tried to predict whether a person has diabetes using ANN.

Findings and Evaluation
The selection criteria of the attributes determined by the adaptive neuro-fuzzy classifier are given in Figure 5. The features are numbered in Figure 5 as follows: 1 Age, 2 Number of births, 3 Delivery time, 4 Blood pressure, 5 Heart problem.
As can be seen in Figure 5, the features of low importance are age and blood pressure, the feature of medium importance is heart problem, and the features of high importance are the number of births and delivery time. As a result, the attributes with the highest importance were determined to be the number of births, shown in column 2, and delivery time, shown in column 3.
The data set contains 5 attributes together with the outcome that indicates the cesarean state. The class labels of the output are 1 and 2, representing yes and no. A feedforward ANN was used for classification. The data set was divided into 70% training and 30% test data. The structure of the trained ANN is shown in Figure 6. As seen in Figure 6, there are 5 nodes in the input layer, 25 nodes in each of the two hidden layers, and 2 nodes in the output layer. Training was carried out for 1000 iterations. As seen in Figure 7, the performance of the ANN on the training data improved as the number of iterations increased. As seen in Figure 8, the fit of the trained ANN to the training data is quite high.
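The split and the network shape described above can be sketched in plain Python. This is only an illustration of the 70%/30% split of 80 samples and of a forward pass through a 5-25-25-2 feedforward network; the sigmoid activation, the random untrained weights, the fixed seed, and the sample input are assumptions made for the sketch and do not reproduce the trained network of the study.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable (assumption)

N_SAMPLES = 80                # pregnant women in the data set
LAYER_SIZES = [5, 25, 25, 2]  # input, two hidden layers, output

# 70% / 30% split of the shuffled sample indices.
indices = list(range(N_SAMPLES))
random.shuffle(indices)
n_train = round(0.7 * N_SAMPLES)
train_idx, test_idx = indices[:n_train], indices[n_train:]

def init_layer(n_in, n_out):
    """Random weights and zero biases for one fully connected layer."""
    weights = [[random.uniform(-0.5, 0.5) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def sigmoid(z):
    """Assumed activation; the source does not state which one was used."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, layers):
    """Propagate one input vector through every layer in turn."""
    activation = x
    for weights, biases in layers:
        activation = [
            sigmoid(sum(w * a for w, a in zip(row, activation)) + b)
            for row, b in zip(weights, biases)
        ]
    return activation

layers = [init_layer(n_in, n_out)
          for n_in, n_out in zip(LAYER_SIZES, LAYER_SIZES[1:])]
n_parameters = sum(len(w) * len(w[0]) + len(b) for w, b in layers)

sample = [0.3, 0.5, 0.0, 1.0, 0.0]  # hypothetical scaled attribute values
outputs = forward(sample, layers)
predicted_class = outputs.index(max(outputs)) + 1  # class labels 1 and 2

print(len(train_idx), len(test_idx), n_parameters)
```

The split yields 56 training and 24 test samples, and a network of this shape has 852 weights and biases to fit from those 56 training samples.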
The Receiver Operating Characteristic (ROC) curve generated for the training data is given in Figure 9. In Figure 9, the area under the ROC curve (AUC) is very close to 1. This result indicates that high success was achieved in the classification of the training data.
The ROC curve generated for the test data is given in Figure 10. In Figure 10, the AUC is above 0.5 but does not come sufficiently close to 1. This result shows that sufficiently high success was not achieved in the classification of the test data.
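How an AUC value near 1 differs from one that is only somewhat above 0.5 can be seen in a small sketch. It computes AUC as the share of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half; the labels and scores are invented for illustration and are not the outputs of the network in this study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0   # positive ranked above negative
            elif pos == neg:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))

labels = [1, 1, 1, 0, 0, 0]                      # hypothetical class labels
well_separated = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]  # hypothetical scores
partly_mixed = [0.9, 0.4, 0.2, 0.6, 0.3, 0.1]    # hypothetical scores

auc_separated = roc_auc(labels, well_separated)
auc_mixed = roc_auc(labels, partly_mixed)
print(f"{auc_separated:.3f} {auc_mixed:.3f}")
```

The first score list separates the classes completely and gives an AUC of 1, while the second ranks only 6 of the 9 pairs correctly and gives about 0.667, a value above 0.5 that still falls well short of 1.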
Classification details for the training and test data are given as a confusion matrix in Table 2.

Table 2. Confusion matrix for training and test data

Data set    True Positive (TP)    True Negative (TN)    False Positive (FP)    False Negative (FN)
Training    26                    28                    0                      2
Test        6                     9                     2                      7

As can be seen in Table 2, 54 of the 56 training samples are classified correctly and 2 are classified incorrectly. For the test data, 15 of the 24 samples are classified correctly and 9 are classified incorrectly.
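The counts in Table 2 are enough to derive the usual performance measures. The sketch below uses the standard definitions, in which recall is the same quantity as sensitivity; taking G-mean as the square root of sensitivity times specificity is an assumption, since the study does not state the formula it used. The function name is hypothetical, and the values are derived here from Table 2 rather than copied from Table 3.

```python
import math

def classification_metrics(tp, tn, fp, fn):
    """Standard measures computed from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # also called recall
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f_measure": 2 * precision * sensitivity / (precision + sensitivity),
        # Assumed definition: geometric mean of sensitivity and specificity.
        "g_mean": math.sqrt(sensitivity * specificity),
    }

# Counts taken from Table 2.
train = classification_metrics(tp=26, tn=28, fp=0, fn=2)
test = classification_metrics(tp=6, tn=9, fp=2, fn=7)

for name in train:
    print(f"{name:<12} train = {train[name]:.4f}  test = {test[name]:.4f}")
```

Under these definitions the accuracy is 54/56 (about 0.964) on the training data and 15/24 = 0.625 on the test data, which matches the counts of correctly classified samples stated for Table 2.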
Performance and error values of the designed ANN for the training and test data are shown in Table 3. As can be seen in Table 3, the success values of the classification are quite high and the error values quite low for the training data, whereas the success values for the test data did not rise as high as expected and remained lower than those for the training data. Measures such as accuracy, sensitivity, specificity, recall, F-measure, and G-mean can be used to evaluate the results obtained in classification. These values should be close to 1, because classification success increases as they approach 1. A value of 1 means that all data were classified correctly and the method was 100% successful. This rate is very hard to reach in real-life problems, because a completely accurate estimate is difficult to make; for many problems, values close to 100% are therefore regarded as successful. The values in this study are very close to 1, especially on the training data, so it is possible to say that successful results were obtained on the training data. Another criterion is the difference in success rate between the training and test data, which is observed in this study. The probable reason is that the system could not learn fully because the training data were insufficient. Possible remedies are to increase the amount of training data or to improve network learning by using data preprocessing methods.

Conclusion
Cesarean is a very frequent operation, performed in some cases because the physician deems it mandatory when the life of the mother or baby is at risk, and in other cases at the request of the expectant mother. To avoid risking the life of the mother and baby, it is very important to identify the situations in which it is truly necessary and to intervene immediately. For this reason, studies in the literature aim to determine the cases in which cesarean is compulsory. As emphasized in the source articles, making use of developing technology and generating many alternatives in order to find the one closest to the truth will increase the success rate. Although the data set used has previously been classified with several data mining methods, no classification made with ANN was found. ANN models the working structure of the human brain, and the literature shows that predictions made with this method achieve very high success. For this reason, the cesarean data set, which had been classified with different methods in the literature, was classified with ANN in this study; the importance of the features in the data set was also determined and feature selection was carried out. The method used for feature selection is the adaptive neuro-fuzzy classifier.
The results of the feature selection show that the importance levels of the features differ and that the features do not affect the result equally. In detail, the features with the highest effect on the decision to perform a cesarean operation were the number of births and the delivery time, the features with the least effect were age and blood pressure, and the feature with a moderate effect was heart problem.
Two hidden layers were used in the ANN designed for classification, and the number of neurons in each was set to 25. The data set was divided into 70% training and 30% test data, and the values showing the success rates were calculated separately for the training and test data. To evaluate the classification results, accuracy, sensitivity, specificity, precision, recall, F-measure, G-mean, MSE, and RMSE values were calculated. According to these results, the classification success on the training data is very high and the error value is very low, but the desired success could not be achieved on the test data.
Among the studies that are the source of the data set, Amin and Ali (2018) evaluated the performance of several data mining methods on the cesarean data: random forest, logistic regression, naive Bayes, k-nearest neighbors, and support vector machine. Unlike the methods in the Amin and Ali (2018) study, the classification in this study was made with ANN. In addition, the importance levels of the features were determined and feature selection was carried out. According to the findings, which are limited to the data used in this study, the number of births and the delivery time were the two attributes with the highest significance level affecting cesarean status. In future studies, classification can be carried out using normalization or other preprocessing techniques, or using different classification methods. In addition, new data sets can be created by increasing the number of examples in the data set.