PARAMETER ESTIMATION BY TYPE-2 FUZZY LOGIC IN CASE THAT DATA SET HAS OUTLIER

Abstract. One of the problems encountered in estimating the unknown parameters of regression models is the presence of outliers in the data set. Outliers may violate assumptions required by the parameter estimation process, such as normally distributed errors and homogeneity of variances. When the data set contains outlier observations, estimation methods based on fuzzy logic are available that can reduce the influence of these observations. When fuzzy logic is used in regression analysis, parameter estimation involves two main steps. The first is to define the clusters that compose the data set, and the second is to calculate the degrees of membership that determine the contribution of each observation to the model of each cluster. In this study, a type-2 fuzzy clustering algorithm, defined as an extension of the fuzzy c-means algorithm, was used to determine the membership degrees, and the case in which the data set contains outliers is addressed. An algorithm is proposed to estimate the unknown parameters of the regression model using the membership degrees obtained for the cluster elements. To examine the effectiveness of the algorithm, the parameters were also estimated using the regression methods known as robust methods, and the results were compared.


Introduction
The concept of fuzzy sets was first described by Zadeh in 1965 in his work Fuzzy Sets [1]. The fuzzy c-means method was introduced by Dunn in 1973 and was further developed in the studies carried out by Bezdek [2]. Zadeh et al. published their studies on fuzzy sets and their application to decision processes in 1975 [3]. The fuzzy c-means algorithm developed by Bezdek et al. includes Euclidean, diagonal and Mahalanobis distance measures, and the output of this algorithm is controlled by validity criteria [4]. Mendel and John precisely defined type-2 fuzzy sets and used the extension principle for type-2 fuzzy sets in 2002 [5].
1194 T. ERBAY DALKILIC, K. SANLI KULA, S. SAGIRKAYA TOLAN
They discussed the uncertainty of the fuzzifier parameter m of the fuzzy c-means algorithm and defined m as an interval number. Type reduction was applied to the fuzzy c-means algorithm that includes the uncertainty of the fuzzifier parameter m, and solutions were obtained in 2007 [6]. Dalkilic and Apaydin determined the optimal class number using a validity criterion when the independent variables had an exponential distribution, and carried out parameter estimation using a fuzzy neural network [7]. Juang et al. proposed a recurrent type-2 Fuzzy Neural Network (FNN) for dynamic system processing. The self-evolving network is a structure that does not require a pre-assignment task and can automatically develop its parameters according to the training data [8]. Fazel Zarandi et al. suggested a new type-2 fuzzy c-regression clustering method for the Takagi-Sugeno (T-S) system identification stage in a steel industry application, and the model was tested on an actual data set from a steel company [9]. Enke and Mehdiyev proposed a hybrid model for the estimation of short-term US interest rates. The model consists of a type-2 fuzzy inference neural network that performs input preprocessing with multiple regression analysis and type-2 fuzzy clustering based on differential evolution. The proposed model was applied to estimate US 3-month T-bill rates in 2013 [10]. In 2015, Kalhori and Fazel Zarandi presented a new approach to type-2 fuzzy clustering. This approach separates clusters without relying only on the distance from the centers, and a new validity index is suggested to determine the optimal number of clusters [11]. In 2016, Golsefid and Zarandi presented a clustering algorithm. In this algorithm, developed according to the dual-centers type-2 fuzzy clustering model, the centers of the clusters are defined by a pair of objects. There are no type reduction or defuzzification steps in this algorithm [12].
In 2016, Hwak proposed a method for the design of linear regression using Type-2 Fuzzy C-Means (T2FCM) clustering. This clustering approach takes into account the uncertainty associated with the fuzzification factor when estimating cluster centers. The method was also supported by experimental results [13]. Rubio et al. presented an extension of the Fuzzy Possibilistic C-Means (FPCM) algorithm. In this algorithm, type-2 fuzzy logic techniques were used to increase the efficiency of the FPCM method. In addition, the performance of the method was checked with experimental data [14].
In this study, membership degrees for the cluster elements are obtained by using a type-2 fuzzy clustering algorithm, and an algorithm is proposed that incorporates these degrees into the parameter estimation of the regression model. The case of outlier observations in the data set is discussed, and estimates are obtained from the parameters based on type-2 fuzzy clustering.
The remainder of this paper is organized as follows. In the second section, the type-2 fuzzy clustering method is described. In the third section, the robust regression methods used for comparison are defined briefly. In the fourth section, an algorithm is proposed for parameter estimation based on type-2 fuzzy membership. In the application section, the estimates of the proposed algorithm for a data set with outliers are compared with those of the models obtained using the robust regression methods.

Fuzzy Clustering Based on Type-2 Fuzzy Logic
While the same fuzzifier index is given to each set in type-1 fuzzy clustering, the fuzzifier index is defined as an interval in type-2 fuzzy clustering, so that a different fuzzifier index is given to each cluster. This definition prevents the loss of performance that occurs when the sets have different volumes. First, the volumes of the sets obtained with the fuzzy clustering algorithm are determined. Fuzzifier indexes are calculated based on the obtained volumes, and cluster centers are calculated based on these fuzzifier indexes. The objective function values are determined based on the principle of minimizing the distance between cluster centers and cluster elements [15].
The center values and membership values are updated until the objective function reaches its smallest value. The fuzzifier indexes that give the optimal center values and membership degrees are determined, and observations are divided into sets based on the obtained membership degrees. The parameters of the linear regression model for each cluster are estimated based on the membership degrees obtained from type-2 fuzzy clustering.
In clustering based on type-2 fuzzy logic, the objective function calculated for the interval $m = [m_1, m_2]$ is defined as

$$J_{m_1}(U, v) = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ji}^{m_1} d_{ji}^{2}, \qquad J_{m_2}(U, v) = \sum_{i=1}^{n}\sum_{j=1}^{c} u_{ji}^{m_2} d_{ji}^{2}. \tag{1}$$

The aim of the functions given by Eq. (1) is to minimize the error. In Eq. (1), $m_1$ and $m_2$ are the fuzzifier indexes of the first and second sets, respectively. The weighted least squares function $J_{m_1}(U, v)$ is the sum of the weighted squared errors of the first set, $J_{m_2}(U, v)$ is the sum of the weighted squared errors of the second set, and $d_{ji}^{2} = \lVert x_i - v_j \rVert^{2}$ expresses the distance between the data and the cluster centers. The type-2 fuzzy clustering algorithm based on the objective function given by Eq. (1) consists of the following steps [16,17].
Step 1: The initial values are determined: $c$, the number of sets; $m_1$ and $m_2$, the fuzzifier indexes of the first and second sets; $U$, the matrix of membership degrees; and $\varepsilon$, the termination criterion.
Step 2: Set centers are calculated using the $U$ matrix determined arbitrarily in the first step and the fuzzifier parameters $m = [m_1, m_2]$,

$$v_j = \frac{\sum_{i=1}^{n} u_{ji}^{m} x_i}{\sum_{i=1}^{n} u_{ji}^{m}}, \quad j = 1, \dots, c. \tag{2}$$

Step 3: $\overline{u}_{ji}$ and $\underline{u}_{ji}$ indicate the upper membership degree and the lower membership degree, respectively [6]. These degrees are updated with Eq. (3) and Eq. (4).
Step 4: With $v_{Lj}$ and $v_{Rj}$ indicating the centers obtained by using $m_1$ and $m_2$, respectively, Eq. (5) is used for type reduction of the set centers.
Step 7: If $\lVert v_L^{(t)} - v_L^{(t-1)} \rVert < \varepsilon$ and $\lVert v_R^{(t)} - v_R^{(t-1)} \rVert < \varepsilon$, the iteration ends. Otherwise, the algorithm returns to Step 2.
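As an illustration only, the membership update of Step 3 can be sketched in Python under a commonly used interval type-2 formulation. This is an assumption, not a restatement of Eq. (3) and Eq. (4): the standard fuzzy c-means membership is computed once with $m_1$ and once with $m_2$, the larger value is taken as the upper membership degree and the smaller as the lower one, and their average serves as a simple type-reduced value. The function names, the one-dimensional data and the centers are hypothetical.

```python
def fcm_memberships(xs, centers, m):
    """Standard FCM membership u[j][i] of point i in cluster j for fuzzifier m."""
    power = 2.0 / (m - 1.0)
    u = [[0.0] * len(xs) for _ in centers]
    for i, x in enumerate(xs):
        dists = [abs(x - v) for v in centers]
        if 0.0 in dists:
            # The point coincides with a center: share full membership there.
            hits = [j for j, d in enumerate(dists) if d == 0.0]
            for j in hits:
                u[j][i] = 1.0 / len(hits)
            continue
        for j, dj in enumerate(dists):
            u[j][i] = 1.0 / sum((dj / dk) ** power for dk in dists)
    return u


def interval_memberships(xs, centers, m1, m2):
    """Upper, lower and averaged memberships (assumed max/min/mean rule)."""
    u1 = fcm_memberships(xs, centers, m1)
    u2 = fcm_memberships(xs, centers, m2)
    upper = [[max(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    lower = [[min(a, b) for a, b in zip(r1, r2)] for r1, r2 in zip(u1, u2)]
    reduced = [[(a + b) / 2.0 for a, b in zip(ru, rl)]
               for ru, rl in zip(upper, lower)]
    return upper, lower, reduced


# Hypothetical one-dimensional data with one distant observation (12.0).
xs = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9, 12.0]
centers = [1.0, 5.0]
upper, lower, reduced = interval_memberships(xs, centers, m1=1.5, m2=3.0)
```

Under this rule the averaged memberships of each observation still sum to one over the clusters, which makes them convenient as regression weights.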

Robust Regression Methods
When the data set contains an outlier, the fitted regression model is pulled toward the outlier and moves away from the other observations, so the residuals of the observations other than the outlier increase. In robust analysis, it is assumed that these deviations do not significantly affect the performance of the algorithm [18]. When the data contain outliers, robust regression analysis yields parameter estimates that are less affected by outliers than those of the Least Squares Method (LSM) [19]. In this study, estimates were obtained by using M methods from among the robust methods. The M method minimizes a function of the residuals rather than the sum of the squared residuals, and the regression coefficients are obtained by minimizing the sum of this function over the observations.
In the definition of Huber's function, k is called the tuning constant and is set at 1.5. The numerator of the scale estimate d is sometimes called the median of the absolute deviations (MAD) [20].
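A minimal sketch of Huber-type M estimation for a single independent variable is given below, under assumptions that are not taken from the text: the textbook Huber weight (1 when the scaled residual is at most k in absolute value, and k divided by its absolute value otherwise), the scale d computed as MAD / 0.6745, and iteratively reweighted least squares as the solver. The function names and the data are hypothetical.

```python
import statistics


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def huber_line_fit(x, y, k=1.5, max_iter=100, tol=1e-8):
    """Iteratively reweighted least squares with Huber weights."""
    b0, b1 = weighted_line_fit(x, y, [1.0] * len(x))  # start from the LSM fit
    for _ in range(max_iter):
        res = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        med = statistics.median(res)
        mad = statistics.median([abs(r - med) for r in res])
        d = mad / 0.6745  # assumed scale estimate
        if d == 0.0:
            break
        # Residuals larger than k * d are down-weighted.
        w = [1.0 if abs(r) <= k * d else k * d / abs(r) for r in res]
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        change = abs(new_b0 - b0) + abs(new_b1 - b1)
        b0, b1 = new_b0, new_b1
        if change < tol:
            break
    return b0, b1


# Hypothetical data on the line y = 2 + 0.5 x with one outlier at the end.
x = [float(i) for i in range(10)]
y = [2.0 + 0.5 * xi for xi in x]
y[-1] = 20.0
lsm_b0, lsm_b1 = weighted_line_fit(x, y, [1.0] * len(x))
hub_b0, hub_b1 = huber_line_fit(x, y)
```

On this toy data the LSM slope is pulled well above 0.5 by the single outlier, while the reweighted fit moves back toward the line followed by the remaining observations.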

Parameter Estimation based on Type-2 Fuzzy Logic When Data Set Has Outlier
The general purpose of regression analysis is to determine the mathematical structure of the functional relationship between the dependent variable $Y$ and the independent variables $X_1, \dots, X_p$. This structure is determined by estimating the regression coefficients $\beta$.

$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_p X_p + \varepsilon \tag{13}$$

The least squares method is one of the important methods used to estimate the parameters of the linear regression model given by Eq. (13). The important assumption for using this method is that the error terms of the model are normally distributed with zero mean and constant variance. This assumption is expressed mathematically as $\varepsilon \sim N(0, \sigma^2)$. The estimators of the regression coefficients are denoted by $\hat{\beta}$ and determined by

$$\hat{\beta} = (X'X)^{-1} X'Y. \tag{14}$$

The estimator of the dependent variable $Y$ is denoted by $\hat{Y}$ and determined by

$$\hat{Y} = X\hat{\beta}. \tag{15}$$

The error of the linear regression model, expressed as the difference between the observed values $Y$ and the estimated values $\hat{Y}$, is given by

$$\varepsilon = Y - \hat{Y}. \tag{16}$$

In classical regression analysis, the observations that make up the data set belong to a single class. If the data set contains different distributions, methods other than the classical ones should be used for parameter estimation. These methods do not need to satisfy the assumptions required by the classical method. Fuzzy methods are among the methods that do not require the assumptions of classical regression. The first step of fuzzy regression analysis is to determine the different clusters in the data set, and the second is to obtain the degrees of membership to be used in the prediction process.
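For a single independent variable, the least squares computation described above reduces to solving a 2 x 2 system of normal equations. The following sketch shows this case; the function name and the data are hypothetical and serve only to illustrate the estimator, the fitted values and the errors.

```python
def lsm_fit(x, y):
    """Solve the normal equations (X'X) b = X'Y for the model y = b0 + b1 * x."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(xi * xi for xi in x)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    det = n * sxx - sx * sx  # determinant of X'X
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


# Hypothetical observations.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = lsm_fit(x, y)
y_hat = [b0 + b1 * xi for xi in x]               # estimated values
errors = [yi - yh for yi, yh in zip(y, y_hat)]   # observed minus estimated
```

Because the model contains an intercept, the errors of the fitted line sum to zero up to rounding.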
When data with different distributions are separated into clusters, fuzzy clustering algorithms suitable for the distribution are used. Fuzzy clustering algorithms determine the degree of membership of each observation in each cluster. These membership degrees are used to weight the data. The parameters of the regression model are then determined so as to minimize the error, using the data weighted by the fuzzy membership degrees.
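How membership degrees arise from clustering can be sketched with the standard (type-1) fuzzy c-means iteration. The update formulas used here are the textbook ones and are assumed rather than quoted from this paper; the deterministic initial membership matrix, the function name and the one-dimensional data are hypothetical, and the sketch requires at least two clusters.

```python
def fcm(xs, c=2, m=2.0, eps=1e-6, max_iter=200):
    """Type-1 fuzzy c-means on one-dimensional data (requires c >= 2)."""
    n = len(xs)
    # Deterministic initial membership matrix (rows: clusters, columns: data).
    u = [[0.9 if i % c == j else 0.1 / (c - 1) for i in range(n)]
         for j in range(c)]
    power = 2.0 / (m - 1.0)
    centers = []
    for _ in range(max_iter):
        # Centers as membership-weighted means of the data.
        centers = [
            sum(u[j][i] ** m * xs[i] for i in range(n))
            / sum(u[j][i] ** m for i in range(n))
            for j in range(c)
        ]
        # Membership update from the distances to the current centers.
        new_u = [[0.0] * n for _ in range(c)]
        for i, x in enumerate(xs):
            dists = [max(abs(x - v), 1e-12) for v in centers]
            for j in range(c):
                new_u[j][i] = 1.0 / sum((dists[j] / dk) ** power for dk in dists)
        change = max(abs(new_u[j][i] - u[j][i])
                     for j in range(c) for i in range(n))
        u = new_u
        if change < eps:  # memberships have stabilised
            break
    return centers, u


# Hypothetical data forming two well-separated groups.
xs = [0.8, 1.0, 1.2, 4.9, 5.0, 5.3]
centers, u = fcm(xs)
```

Each column of the returned matrix sums to one, so the entries of a row can be used directly as weights for the observations in the regression model of that cluster.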
The algorithm proposed for estimating the parameters of the regression model from data weighted by the membership degrees of the type-2 fuzzy clustering process is given by the following steps.
Step 1: The initial values are determined: $c$, the number of sets; $m$, the fuzzifier indexes of the first and second sets; $U$, the matrix of membership degrees; and $\varepsilon$, the termination criterion.
Step 2: Set centers are calculated using the $U$ matrix and the fuzzifier indexes $m$,

$$v_j = \frac{\sum_{i=1}^{n} u_{ji}^{m} x_i}{\sum_{i=1}^{n} u_{ji}^{m}}, \quad j = 1, \dots, c. \tag{17}$$

Step 3: The objective function, which depends on the membership degrees and the set centers, is calculated with Eq. (18).
Step 4: The membership degrees of each set are updated with Eq. (19).
Step 5: The set centers and the objective function are updated with Eq. (17) and Eq. (18) by using the new membership degrees.
Step 6: If the difference between the membership degrees in step $t$ and those in step $(t-1)$ is smaller than $\varepsilon$, the iteration stops. This means that the optimal membership degrees and centers have been calculated.
Step 7: The membership degrees obtained from Eq. (19) are used to cluster the data set. The fuzzifier indexes $m_1$ and $m_2$ are calculated, and the set centers based on these fuzzifier indexes are calculated with Eq. (2) given in Section 2. The objective function values for these centers are calculated by using Eq. (1) given in Section 2.
Step 8: To reduce the type of fuzziness, type reduction is applied to the set centers by using Eq. (5), and the objective function value is calculated by using Eq. (7) given in Section 2.
Step 9: For clustering based on type-2 fuzzy logic, the membership degrees are determined by Eq. (3) and Eq. (4), and type reduction is applied to the membership degrees by Eq. (6).
Step 10: After the center values in step $t$ and in step $(t-1)$ are calculated, the difference between them is determined. If the difference is less than the termination criterion $\varepsilon$ for the existing sets, the optimal centers and membership degrees have been achieved.
Step 11: The parameters of the linear regression model are estimated by using the membership degrees obtained from type-2 fuzzy clustering as weights [6]. The independent variable is weighted by the membership degree, and the estimated values are calculated by

$$\hat{Y}_{i(\mathrm{Type\,2})} = X_{W_i(\mathrm{Type\,2})}\,\hat{B}_{i(\mathrm{Type\,2})}, \quad i = 1, \dots, c.$$

Step 12: The error related to each model is measured. The model that has the smallest error is used as the estimated linear regression model.
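The last two steps can be illustrated with a short sketch in which ordinary weighted least squares stands in for the weighting of the independent variable and the mean absolute residual stands in for the error measure; both are assumptions chosen for illustration, since the exact forms are not reproduced here. The membership degrees, the data and the function names are hypothetical.

```python
def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = b0 + b1 * x."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def mean_abs_error(x, y, b0, b1):
    """Assumed error measure: mean absolute residual over all observations."""
    return sum(abs(yi - (b0 + b1 * xi)) for xi, yi in zip(x, y)) / len(x)


# Hypothetical data on the line y = 1 + 2 x with one outlier at the end.
x = [float(i) for i in range(8)]
y = [1.0 + 2.0 * xi for xi in x]
y[-1] = 30.0

# Hypothetical membership degrees for c = 2 clusters: the first cluster
# holds the regular observations, the second is dominated by the outlier.
memberships = [
    [0.95] * 7 + [0.05],
    [0.05] * 7 + [0.95],
]

# One weighted regression model per cluster, then keep the smallest error.
models = [weighted_line_fit(x, y, w) for w in memberships]
errors = [mean_abs_error(x, y, b0, b1) for b0, b1 in models]
best = min(range(len(models)), key=errors.__getitem__)
best_b0, best_b1 = models[best]
```

In this toy setting the selected model is the one fitted with the memberships of the first cluster, in which the outlier receives a low weight.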

Application
In this application, carried out to determine the effectiveness of the proposed algorithm in obtaining the linear regression model, the data set contains one dependent and one independent variable and has 61 pairs of observations. The scatter plot of the data is shown in Figure 1. The prediction values and the corresponding error values for the related methods are given in Table 1. Table 2 gives the models generated from the parameters obtained using the related methods and the amount of error calculated from these models.
The graph of the errors of the models obtained using the relevant methods is shown in Figure 2.

Results and Discussion
In this study, the estimates obtained by determining the fuzzy parameters in the proposed algorithm were compared with the results obtained by the robust regression methods in the literature. Error amounts were obtained for the seven methods examined: 4.7141 for LSM, 4.2887 for type-1 fuzzy clustering, 4.2767 for type-2 fuzzy clustering, 5.0208 for the Huber method, 5.0646 for the Hampel method, 5.4003 for the Tukey method and 4.9785 for the Andrews method. As can be seen from these results, the model with the lowest error is the model obtained from type-2 fuzzy clustering. It can therefore be said that, if there are outlier observations in the data set, the method using type-2 fuzzy clustering can be preferred as an effective method.

Table 1. Prediction values for the related methods and error values of the related predictions.
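The comparison of the reported error amounts can be checked with a few lines of arithmetic. The values below are those stated in this section; the variable names are hypothetical, and the relative reduction with respect to LSM is a derived quantity that is not reported in the study.

```python
# Error amounts reported for the seven methods.
reported_errors = {
    "LSM": 4.7141,
    "Type-1 fuzzy clustering": 4.2887,
    "Type-2 fuzzy clustering": 4.2767,
    "Huber": 5.0208,
    "Hampel": 5.0646,
    "Tukey": 5.4003,
    "Andrews": 4.9785,
}

# Method with the smallest reported error.
best_method = min(reported_errors, key=reported_errors.get)

# Relative reduction of the smallest error with respect to LSM.
reduction_vs_lsm = (
    reported_errors["LSM"] - reported_errors[best_method]
) / reported_errors["LSM"]

# Methods ordered from the smallest to the largest error.
ranking = sorted(reported_errors, key=reported_errors.get)
```

With these figures the error of the type-2 fuzzy clustering model is roughly 9% below that of LSM, while the difference between the type-1 and type-2 fuzzy clustering models is small.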