COMPUTER-AIDED DETECTION OF BRAIN TUMORS USING MORPHOLOGICAL RECONSTRUCTION

Computer-aided detection (CAD) systems help detect abnormalities in medical images using advanced image processing and pattern recognition techniques. CAD accelerates decision-making and reduces human error in the detection process. In this study, a CAD system based on morphological reconstruction and classification is developed; it uses morphological features of the regions of interest to detect brain tumors in brain magnetic resonance (MR) images. The CAD system consists of four stages: preprocessing, segmentation, region of interest specification and tumor detection. The system is evaluated on the REMBRANDT dataset with 497 MR image slices from 10 patients. In the classification stage, the CAD system achieved an accuracy of 93.36% with the Decision Tree algorithm, 94.89% with an Artificial Neural Network (Multilayer Perceptron), 96.93% with the K-Nearest Neighbour algorithm and 96.93% with the Meta-Learner (Decorate) algorithm. These results show that the proposed technique is effective and promising for detecting tumors in brain MR images and makes the classification process more accurate. The morphological reconstruction method used here is more useful and adaptive than the methods used in other CAD applications.


INTRODUCTION
Computer-Aided Detection (CAD) is a collaborative branch of science between computer science and the medical disciplines that allows medical images to be processed automatically by computers (Nagashima and Harakawa, 2011). CAD systems both save time for radiologists and minimize the risk of errors during the decision-making process (Akram and Usman, 2011). In this study, a computer-aided detection mechanism is developed to rapidly and accurately identify tumors in brain MR images.
Detection of tumors can be a difficult process because tumor regions are generally ambiguous and have various morphological structures (Vrji and Jayakumari, 2011). Methods developed for tumor detection include artificial neural networks (Logeswari and Karnan, 2010), masking (Akram and Usman, 2011), grey scale thresholding (Rulaningtyas and Ain, 2009), fuzzy logic (Khotanlou, Colliot, Atif, and Bloch, 2009), rule-based approaches (Chan, 2007) and clustering methods (Wu, Lin, and Chang, 2007). Furthermore, applying different classification methods after identifying a potential tumor region is another commonly used method of detection (El-Dahshan, Hosny, and Salem, 2010; Jayachandran and Dhanasekaran, 2013; Naik and Patel, 2014; Ulku and Camurcu, 2013). In such methods, other factors such as structural quantities also become applicable, since features derived from pixel values are inadequate in most cases (Xuan and Liao, 2007).
The aim of this study is to develop a computer aided detection system that allows rapid and accurate identification of a tumor region by applying advanced image-processing techniques and classification methods on brain MR images.

Computer-Aided Detection System
A computer-aided detection system is a computer algorithm that can analyze complicated relations in medical images and develop a decision-making mechanism for the image structures based on these relations (Mohsen, El-Dahshan, and Salem, 2012). Within the framework of this study, the CAD steps followed in tumor detection on brain MR images are preprocessing, segmentation, specification of the regions of interest and classification. The preprocessing phase primarily aims at eliminating distortions and improving the quality of the medical image (Arimura, Magome, Yamashita, and Yamamoto, 2009). In the segmentation phase, the speed of CAD is increased by narrowing the search field for abnormalities (Clark et al., 1998). The purpose of specifying the regions of interest is to reveal potential abnormal structures and restrict the search to these areas in order to obtain results as fast as possible (Ambrosini, Wang, and O'Dell, 2010; Zarandi, Zarinbal, Zarinbal, Turksen, and Izadi, 2010). Rule-based approaches, template matching, nearest-neighbor clustering, artificial neural networks and the Bayes classifier are a few examples of the methods used in the classification of regions of interest and the detection of tumors (Gopal and Karnan, 2010).

MATERIAL AND METHOD
For the development and evaluation of the proposed system, we used the REMBRANDT dataset (Zikopoulos, Parasuraman, Deutsch, Giles, and Corrigan, 2012). REMBRANDT is a dataset published by the National Cancer Institute in The Cancer Imaging Archive (TCIA) database. TCIA is an extensive, openly accessible archive of medical images, and all images are in the DICOM file format. Archived images are organized by disease and imaging technique. All images in the REMBRANDT dataset are digitized at a resolution of 256 × 256 pixels with a 16-bit gray scale. Every slice has a thickness of 5 mm. The dataset contains images in the axial, sagittal and coronal planes, with an average of 20 images per plane, for a total of 33 patients. In this study, we used 497 images of axial, sagittal and coronal sections from 10 patients.
Within the scope of the study, the steps and methods used in processing the brain MR images and detection of tumors are shown schematically in Figure 1.

Figure 1: The steps and methods used in the CAD study.

Preprocessing
The preprocessing stage of an image processing application enhances the images to make them suitable for the later stages. The preprocessing stage of the developed system consists of median filtering and histogram equalization. A median filter is used to suppress noise in the image (Castleman, 1995). Histogram equalization aims to obtain a homogeneous intensity distribution by spreading the intensity values between the minimum and the maximum. This makes it simpler and more accurate to choose the threshold value needed for intensity thresholding when the images are analyzed (Li, Wang, Liu, Lo, and Freedman, 2001).
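The two preprocessing operations can be illustrated with a minimal sketch. It is not the implementation used in this study: the 3 × 3 window, the border handling by index clamping, and the function names `median_filter` and `equalize_histogram` are assumptions made for illustration, and the image is represented as a plain list of rows of integer gray levels.

```python
from statistics import median


def median_filter(img, size=3):
    """Replace each pixel by the median of its size x size neighbourhood.

    Assumption: borders are handled by clamping indices to the image edge.
    """
    h, w = len(img), len(img[0])
    r = size // 2
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in range(-r, r + 1)
                for dx in range(-r, r + 1)
            ]
            out[y][x] = median(window)
    return out


def equalize_histogram(img, levels=256):
    """Spread intensities over [0, levels - 1] using the cumulative histogram."""
    h, w = len(img), len(img[0])
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)
    n = h * w
    if n == cdf_min:  # constant image: there is nothing to spread
        return [row[:] for row in img]
    return [
        [round((cdf[v] - cdf_min) / (n - cdf_min) * (levels - 1)) for v in row]
        for row in img
    ]
```

For 16-bit images such as those in the dataset, `levels` would be set to 65536 under the same assumptions.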

Segmentation
To separate the tumor region from the other structures, it was assumed that pixels in the region must be adjacent to one another with suitable intensities.
In intensity-based thresholding, the object and the background are separated into two distinct groups with different gray scales (Kittler and Illingworth, 1986). For intensity-based thresholding, using an image histogram that shows the gray scale distribution of the image is one of the simplest methods available (Capelle, Alata, Fernandez, Lefèvre, and Ferrie, 2000). Today, most of the thresholding methods in use are either semi-automatic or non-automatic. In this study, the threshold value for each MR image is determined by a rule-based approach. The pseudocode for determining the threshold value is shown in Figure 2.
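Histogram-based thresholding in general can be illustrated as follows. The sketch does not reproduce the rule-based procedure of this study; it substitutes Otsu's method, a standard automatic technique that picks the threshold maximizing the between-class variance of the histogram. The function names `otsu_threshold` and `binarize` are hypothetical.

```python
def otsu_threshold(img, levels=256):
    """Return the gray level that maximizes the between-class variance.

    Stand-in technique (Otsu's method), not the rule-based approach of the study.
    """
    hist = [0] * levels
    for row in img:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    weighted_total = sum(level * count for level, count in enumerate(hist))
    best_t, best_score = 0, -1.0
    w_bg = 0    # number of pixels at or below the candidate threshold
    sum_bg = 0  # sum of their gray levels
    for t in range(levels):
        w_bg += hist[t]
        sum_bg += t * hist[t]
        w_fg = total - w_bg
        if w_bg == 0 or w_fg == 0:
            continue  # one of the classes is empty at this threshold
        mean_bg = sum_bg / w_bg
        mean_fg = (weighted_total - sum_bg) / w_fg
        score = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if score > best_score:
            best_t, best_score = t, score
    return best_t


def binarize(img, threshold):
    """Pixels brighter than the threshold become object (1), the rest background (0)."""
    return [[1 if v > threshold else 0 for v in row] for row in img]
```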

Specification of Regions of Interest
Brain MR images contain several areas, such as bone tissue, soft tissue and image background, that are unrelated to the tumor tissue. These areas increase the complexity of the image and adversely affect the sensitivity of computer-aided detection (Mei, Zheng, Bingrong, and Guo, 2009). The regions of interest (ROI) were extracted to reduce the complexity of the system. With the specification of ROI, the unrelated structures were eliminated and only tumor candidates were retained as regions of interest. To increase system performance, only the ROI were scanned, rather than the whole image pixel by pixel (Pal and Pal, 1993; Vannier and Haller, 1998).

Figure 2: The pseudocode for determining the threshold value.

Morphological Reconstruction
In ROI specification, the objects in the images are processed according to their shapes by morphological operations. Morphological operations are based on the principle that an output image of the same size is obtained by applying a structuring element to the input image (Dougherty, 2009). A morphological closing is performed and the holes in the images are filled to eliminate imperfections or repair distortions that were introduced into the tumor candidate structures during the preprocessing and segmentation phases. The closing process is an erosion performed after a dilation with the same structuring element (Gonzalez, Woods, and Eddins, 2004). Dilation adds pixels to the borders of the objects in an image, whereas erosion removes pixels from the borders of the objects. The number of pixels added to or removed from an object depends on the size and the shape of the structuring element.
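The closing operation described above can be sketched for binary images as follows. This is an illustrative sketch rather than the implementation used in the study: the 3 × 3 square structuring element `SQUARE_3` and the function names are assumptions, the structuring element is assumed to be square, odd-sized and symmetric, and positions outside the image are ignored during erosion so that the image border is not eroded artificially.

```python
SQUARE_3 = [[1, 1, 1], [1, 1, 1], [1, 1, 1]]  # assumed structuring element


def _offsets(se):
    """Offsets of the active cells of a square, odd-sized structuring element."""
    r = len(se) // 2
    return [(dy - r, dx - r)
            for dy, row in enumerate(se)
            for dx, on in enumerate(row) if on]


def dilate(img, se):
    """A pixel becomes 1 if any pixel under the structuring element is 1."""
    h, w = len(img), len(img[0])
    offs = _offsets(se)
    return [[1 if any(0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx]
                      for dy, dx in offs) else 0
             for x in range(w)] for y in range(h)]


def erode(img, se):
    """A pixel stays 1 only if every in-image pixel under the element is 1."""
    h, w = len(img), len(img[0])
    offs = _offsets(se)
    return [[1 if all(img[y + dy][x + dx]
                      for dy, dx in offs
                      if 0 <= y + dy < h and 0 <= x + dx < w) else 0
             for x in range(w)] for y in range(h)]


def close(img, se):
    """Closing: dilation followed by erosion with the same structuring element."""
    return erode(dilate(img, se), se)
```

Applied to a small ring-shaped object, the closing fills the one-pixel hole in its center while leaving the outer extent of the object unchanged.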
Morphological reconstruction can be thought of conceptually as a dilation that is repeated on one image (the marker image), which contains small parts of the objects of another image, until the marker fits the corresponding larger objects of that other image (the mask image) (Gonzalez et al., 2004). The process therefore involves two images (the marker and the mask) instead of a single image and a structuring element.
To create the marker images, the original images first undergo an erosion to a certain degree. The structuring element used for this operation is selected carefully to protect the tumor structures. Then, the eroded images are used as marker images, while the images obtained from the segmentation phase are used as mask images. The erosion removes small and complex structures outside the region of interest, and the parts of the structures that could be regions of interest are restored by morphological reconstruction. Figure 3 shows the morphological reconstruction process.
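The marker and mask procedure can be sketched as reconstruction by dilation: the marker is dilated, clipped to the mask, and the two steps are repeated until the result stops changing. The sketch below is a minimal illustration under stated assumptions, not the implementation of this study: it uses a fixed 3 × 3 neighbourhood, ignores positions outside the image, and the function names are hypothetical.

```python
def _window(img, y, x):
    """Values of the 3 x 3 neighbourhood of (y, x) that lie inside the image."""
    h, w = len(img), len(img[0])
    return [img[ny][nx]
            for ny in range(max(y - 1, 0), min(y + 2, h))
            for nx in range(max(x - 1, 0), min(x + 2, w))]


def erode(img):
    """Erosion with a 3 x 3 square: minimum over the neighbourhood."""
    return [[min(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def dilate(img):
    """Dilation with a 3 x 3 square: maximum over the neighbourhood."""
    return [[max(_window(img, y, x)) for x in range(len(img[0]))]
            for y in range(len(img))]


def reconstruct(marker, mask):
    """Dilate the marker repeatedly, never letting it exceed the mask."""
    current = [[min(m, k) for m, k in zip(mrow, krow)]
               for mrow, krow in zip(marker, mask)]
    while True:
        grown = dilate(current)
        clipped = [[min(g, k) for g, k in zip(grow, krow)]
                   for grow, krow in zip(grown, mask)]
        if clipped == current:  # stable: the marker fits the mask objects
            return current
        current = clipped
```

With `marker = erode(mask)`, a small isolated structure in the mask disappears from the marker and is not restored, whereas a larger structure that survives the erosion is recovered in its original shape, including thin parts that the erosion removed.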

Connected Component Labeling (CCL)
For labeling regions of interest, the Connected Component Labeling (CCL) method is used (Ronse and Devijver). This allows all regions of interest to be handled and examined individually. The aim of CCL is to label the independent structures in an image and to record the coordinates of the pixels that form these structures (Manohar and Ramapriyan, 1989). The process of connected component labeling assigns a single label to each connected component.
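A minimal sketch of connected component labeling is given below. It assumes 8-connectivity and a breadth-first flood fill, neither of which is stated in the text, and the function name `label_components` is hypothetical. It returns both the label matrix and the pixel coordinates recorded for each label.

```python
from collections import deque


def label_components(img):
    """Label 8-connected components of a binary image.

    Returns (labels, coords): a label matrix with 0 for background, and a
    dict mapping each label to the list of (row, col) pixels it contains.
    """
    h, w = len(img), len(img[0])
    labels = [[0] * w for _ in range(h)]
    coords = {}
    next_label = 0
    for y in range(h):
        for x in range(w):
            if not img[y][x] or labels[y][x]:
                continue  # background, or already part of a labeled component
            next_label += 1
            labels[y][x] = next_label
            queue = deque([(y, x)])
            pixels = []
            while queue:
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = cy + dy, cx + dx
                        if (0 <= ny < h and 0 <= nx < w
                                and img[ny][nx] and not labels[ny][nx]):
                            labels[ny][nx] = next_label
                            queue.append((ny, nx))
            coords[next_label] = pixels
    return labels, coords
```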

Object Center-Based 8-Directional Analyzing (OCB8DA)
In the OCB8DA process, the label matrix obtained through the CCL method is used. Inspection of brain MR images shows that tumor structures have a more clearly defined diameter than other structures and that tumor diameters fall within a specific range. Therefore, to decide whether a region of interest is a tumor candidate, the diameter of the region is considered first (Ozekes and Camurcu, 2006; Pal and Pal, 1993; Pitas, 2000). For this, the region is analyzed in 8 directions starting from the center coordinates of each structure. Two threshold values that form the boundaries are defined at this stage.
The first value is the minimum distance threshold, which represents the lower boundary, and the other is the maximum distance threshold, which represents the upper boundary. As seen in Figure 4, if the run of adjacent pixels extending from the center pixel in the 8 directions is shorter than the maximum distance threshold and longer than the minimum distance threshold, then the region of interest is concluded to be a candidate tumor. Otherwise, the region is not a candidate tumor.
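One possible reading of the 8-directional analysis is sketched below. The details are assumptions, since the text does not fix them: the center is taken as the rounded centroid of the region, the extent in each direction is the number of consecutive pixels carrying the region's label, the bounds are inclusive, and all 8 extents must lie between the two thresholds. The threshold values and the function names are hypothetical.

```python
# The 8 directions as (row step, column step), clockwise starting from "up".
DIRECTIONS = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def directional_extents(labels, label, center):
    """Count consecutive pixels of the region from the center in each direction."""
    h, w = len(labels), len(labels[0])
    cy, cx = center
    extents = []
    for dy, dx in DIRECTIONS:
        steps = 0
        y, x = cy + dy, cx + dx
        while 0 <= y < h and 0 <= x < w and labels[y][x] == label:
            steps += 1
            y, x = y + dy, x + dx
        extents.append(steps)
    return extents


def is_tumor_candidate(labels, label, pixels, min_dist, max_dist):
    """Assumed rule: every directional extent lies within [min_dist, max_dist]."""
    cy = round(sum(p[0] for p in pixels) / len(pixels))
    cx = round(sum(p[1] for p in pixels) / len(pixels))
    if labels[cy][cx] != label:
        return False  # the center falls outside the region (e.g. a ring shape)
    extents = directional_extents(labels, label, (cy, cx))
    return all(min_dist <= e <= max_dist for e in extents)
```

Under these assumptions a compact, roughly round region passes the test, while a thin elongated region fails because its extent is too small in at least one direction.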

Figure 4: Object center-based 8-directional analysis of regions of interest.

Tumor Detection
In tumor detection, regions of interest are labeled automatically by assigning each of them to a predefined category, tumor or normal (Pal and Pal, 1993). This process consists of two parts: feature extraction and classification (Pitas, 2000). Decisions at this stage should be taken with the utmost care, since they have a large impact on the performance of the system. Figure 5 shows the images obtained by applying the CAD steps to a coronal section of a patient, in order from one side to the other.

Feature Extraction
In this study, shape-based and histogram-based features are used. After the identification of regions of interest, 10 different features of these regions are calculated. Histogram-based features are mean, variance, standard deviation, skewness, kurtosis and entropy. Shape-based features are area, perimeter, centroid and Euclidean distance.
Mean: Average of all pixel values.
Variance: Average of squared deviation of all pixels from mean.
Standard Deviation: A measure that summarizes the dispersion of data values, defined as the square root of the arithmetic mean of the squared deviations from the arithmetic mean.
Kurtosis: The kurtosis is a measure of the peakedness or flatness of the probability distribution of a real-valued random variable (Pitas, 2000).
Skewness: Skewness is a measure of the asymmetry of the probability distribution of a real-valued random variable.
Entropy Value: The entropy of a tissue image describes its information content. Wide gaps with no features contain less information, whereas diffused areas contain more.
Area: The area of all of the on pixels in an image, estimated by summing the areas of the individual pixels. The area of an individual pixel is determined by looking at its 2-by-2 neighbourhood.
Perimeter: The length of the perimeter of a region of interest in pixels, counted over the pixels that form part of the boundary of the object.
Centroid: The centroid (geometric center) of a region of interest is the mean of the pixel coordinates of that region.
Euclidean distance: The Euclidean distance is the straight-line distance between two pixels.
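The feature definitions above can be sketched in code as follows. This is an illustration under stated assumptions rather than the implementation of this study: moments are computed with population (divide by n) formulas, entropy is the Shannon entropy of the gray-level frequencies in bits, area is simplified to a plain pixel count instead of the 2-by-2 neighbourhood estimate, and the perimeter counts region pixels with at least one 4-connected neighbour outside the region. All function names are hypothetical.

```python
import math
from collections import Counter


def histogram_features(values):
    """Histogram-based features of the gray levels inside one region."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    # Standardized third and fourth moments; defined as 0 for a flat region.
    skewness = sum((v - mean) ** 3 for v in values) / n / std ** 3 if std else 0.0
    kurtosis = sum((v - mean) ** 4 for v in values) / n / std ** 4 if std else 0.0
    counts = Counter(values)
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    return {"mean": mean, "variance": variance, "std": std,
            "skewness": skewness, "kurtosis": kurtosis, "entropy": entropy}


def shape_features(pixels):
    """Shape-based features from the (row, col) coordinates of one region."""
    region = set(pixels)
    area = len(region)  # simplification: plain pixel count
    centroid = (sum(y for y, _ in region) / area,
                sum(x for _, x in region) / area)
    perimeter = sum(
        1 for y, x in region
        if any((y + dy, x + dx) not in region
               for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)))
    )
    return {"area": area, "centroid": centroid, "perimeter": perimeter}


def euclidean_distance(p, q):
    """Straight-line distance between two pixels."""
    return math.dist(p, q)
```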

Classification
Four different classification methods are used in this study; they are explained below.
Decision Tree: The decision tree classification algorithm is a method that classifies new data by forming decision trees or decision rules through a learning process on training data. We used the C4.5 algorithm to generate the decision tree (Quinlan, 2014).
Artificial Neural Networks: An artificial neural network (ANN) is a mathematical or computational model inspired by the human brain that is useful in application areas such as pattern recognition and classification (Lippmann, 1989). An ANN is a kind of non-linear statistical data modeling tool. It is usually used to model complex relationships or to find patterns in data. To build the ANN for classification, we used the Multilayer Perceptron (MLP). MLP is a classifier that uses backpropagation to classify instances. Neurons are grouped in layers, and only forward connections exist in an MLP (Haykin and Lippmann, 1994).
K-Nearest Neighbors: The K-nearest neighbors (KNN) algorithm classifies a new sample on the basis of a number of known examples. When a new sample comes in, the algorithm decides its class by checking its k nearest neighbors.
In the KNN algorithm, a positive integer k is specified first. Then, the k entries in the dataset that are closest to the new sample are selected. Finally, the most frequent class among the k nearest neighbors is computed and returned as the class of the new sample (Hellman, 1970).
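The three KNN steps can be sketched as follows. The Euclidean distance metric, the default k = 3, and the function name `knn_classify` are assumptions made for illustration; the study does not state which settings were used.

```python
import math
from collections import Counter


def knn_classify(train_X, train_y, sample, k=3):
    """Return the most frequent class among the k training points nearest to sample."""
    # Sort the labeled training points by their distance to the new sample.
    by_distance = sorted(zip(train_X, train_y),
                         key=lambda pair: math.dist(pair[0], sample))
    # Keep the k closest and let them vote.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors, made up for illustration only.
train_X = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
train_y = ["normal", "normal", "normal", "tumor", "tumor", "tumor"]
predicted = knn_classify(train_X, train_y, (8.5, 8.5), k=3)
```

In practice the features would first be scaled to comparable ranges, since features with large numeric ranges otherwise dominate the distance.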

RESULTS AND DISCUSSIONS
Through morphological reconstruction of the 497 segmented MR sections, 577 regions of interest are identified. Of these regions of interest, 90 are labeled as tumor and 487 as non-tumor. The success rate of the CAD system is calculated by applying various classifiers to these regions of interest. In the classification process, the Decision Tree, Artificial Neural Network, K-Nearest Neighbor and Meta-Learner (Decorate) classification algorithms are used. Our dataset was divided randomly into 66% and 33% for training and testing, respectively. During the classification phase, correct and incorrect detections of regions of interest are also recorded.
When the results in Table 1 were examined, the accuracy of the proposed method was found to be very close to or higher than that of CAD studies based on different classification algorithms. Furthermore, automating the segmentation phase so that it does not depend on image-specific parameter identification contributed to the higher accuracy rate compared to other studies. The Meta-Learner (Decorate) and K-Nearest Neighbors classifiers have the best specificity. In addition, the highest accuracy and sensitivity are achieved by the K-Nearest Neighbors algorithm. A comparison of the CAD system with recent studies is given in Table 2. We selected CAD schemes that use similar methods. The performance of our method is similar to or better than that of the other methods. In Table 2, the highest accuracy, 99%, is reported by El-Sayed; this results from selecting a small number of brain MR images from the dataset, which limits their variety. Also, dimension reduction using PCA can help learn a better classifier, particularly when the data has a low-dimensional structure and the dataset is small. As future work, our dataset can be analyzed for the role of dimension reduction with classifiers.

CONCLUSIONS
With the aid of the CAD software designed in this study, the tumors in brain MR images are easily detected. Segmentation is a very significant concept in the analysis of medical images. Unlike other methods, this study achieves an effective segmentation of regions of interest in the brain without depending on a particular dataset, by automating intensity-based thresholding. Another important point in achieving a successful classification during detection is the effective selection of the features that best differentiate a tumor region. To do this, the data should be analyzed well and the features that best separate the data should be identified. Here, the parameters used and the values calculated for these parameters become significant. At this point, the method of morphological reconstruction followed by classification, as implemented in this study, yields better discriminating characteristics. According to the experimental results, the proposed method is efficient for the classification of tumors in brain MR images. The classification performance obtained in this study shows the advantages of this technique. The developed CAD system facilitates tumor detection by the physician and accelerates the decision-making process.
Meta-Learner (DECORATE): DECORATE (Diverse Ensemble Creation by Oppositional Relabeling of Artificial Training Examples) is a meta-learner that uses specially constructed artificial training examples. DECORATE uses an existing "strong" learner as the base classifier to build an effective, diverse committee in a straightforward manner. New committee members are constructed by adding different random samples to the training set (Melville and Mooney, 2003).

Table 1. Classification results of the algorithms.
Table 1 shows the classification rates for the used classifiers, where TP is the number of true positives, TN the number of true negatives, FP the number of false positives and FN the number of false negatives.
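The accuracy, sensitivity and specificity discussed in the results follow from these four counts by their standard definitions, as the sketch below shows. The function name `classification_rates` is hypothetical, and the counts in the example are toy values chosen for illustration, not results of this study.

```python
def classification_rates(tp, tn, fp, fn):
    """Standard rates computed from the four confusion-matrix counts."""
    return {
        # Share of all regions that were classified correctly.
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        # Share of actual tumor regions that were detected.
        "sensitivity": tp / (tp + fn),
        # Share of actual non-tumor regions that were correctly rejected.
        "specificity": tn / (tn + fp),
    }


# Toy counts for illustration only.
rates = classification_rates(tp=8, tn=85, fp=5, fn=2)
```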