A CORRECTION ON TANGENTBOOST ALGORITHM

ONUR TOKA AND MERAL CETIN


Introduction
Classification, in other words supervised learning, is a procedure that obtains a classifier from a training dataset. The obtained classifier determines which class an observation belongs to, and higher accuracy on the testing dataset indicates a better classifier. Risk classification, cancer detection, object detection, outlier detection and image classification are some of the areas where classification methods are applied. Over the last decade, many statistical methods have been applied, including linear regression, logistic regression (LR), neural networks (NNet), Naive Bayes (NB), k-nearest neighbor (kNN), Support Vector Machine (SVM), boosting methods and other approaches [1,2]. These methods are usually based on optimization problems built on loss functions. Some advanced methods, such as SVM, minimize misclassification using not only loss functions but also the distance between the inputs of different classes. Boosting methods, by contrast, classify inputs according to a sum of weak classifiers [3].
Boosting is a general method for improving the performance of weak learners. Boosting algorithms are iterative methods in which a weak classifier is obtained at each iteration. At the end of the iterations, the weak classifiers are combined to determine the propensity scores and class labels [1]. Despite their usefulness, boosting algorithms have some limitations. A common problem is that the loss grows without bound for negative margins. Thus, outliers and contaminated parts of the training data can spoil the classifiers in boosting methods. Great advances have been made toward more robust boosting algorithms in the last decade [4,5,6,7,8].
In [9], common loss functions are criticized for assigning unbounded penalty values. To address this problem, the authors established the connection between probability elicitation and Bayes consistency and formalized a new way to obtain Bayes-consistent loss functions. They argued that a robust loss function should penalize both large positive and large negative margins, and on that basis they proposed a new loss function, the Tangent loss, and a new boosting algorithm, TangentBoost. The method performs well in object tracking. However, its probabilities $p$ assign class labels improperly because $p \in \left(-\frac{\pi}{2} + 0.5, \frac{\pi}{2} + 0.5\right)$ [10], an interval that is not contained in $[0, 1]$. In this study, the propensity score of the TangentBoost algorithm is redefined in order to obtain accurate weights and proper class labels. Section 2 reviews binary classification, loss functions and the related boosting methods in the binary case. Section 3 gives the properties of robust losses, the Tangent loss function and the correction. It also shows the importance of the weights and the class assignment probabilities under the correction. Section 4 gives the simulation results.

Boosting Algorithms in Binary Classification
Binary classification is one of the most common tasks in applications. Spam mail detection, pattern characterization, diagnosis, digit recognition and signal recognition are some application areas of binary classification. The basic idea is to find a classifier that assigns observations to two classes well according to their inputs. Let $g$ map an input vector $x \in X$ to a label $y \in \{-1, 1\}$. The classifier function $f : X \to \mathbb{R}$ predicts the class label by way of $g(x) = \mathrm{sign}[f(x)]$. The loss function is defined on the margin $y f(x)$ as below:
$$L\left(f(x), y\right) = \phi\left(y f(x)\right) \qquad (2.1)$$
The predictor is $g(x) = \mathrm{sign}[f(x)]$: if $f(x) > 0$ the case is assigned to 1, and otherwise to $-1$. Combining $f(x)$ and $y$ as in Equation (2.1), $f(x) y < 0$ means misclassification and $f(x) y > 0$ means accurate classification. The magnitude of $f(x)$ reflects the distance from the case to the classifier. Therefore, minimizing Equation (2.1) is affected not only by misclassification but also by large margins from the classifier. To obtain robust classifiers, loss functions that also penalize large positive margins have been investigated [8,9].
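The prediction rule $g(x) = \mathrm{sign}[f(x)]$ and the margin $y f(x)$ can be illustrated with a minimal sketch. The scores are made-up values standing in for a trained classifier $f$. The function names `predict_label` and `margin` are hypothetical.

```python
def predict_label(f_value: float) -> int:
    """g(x) = sign[f(x)]: assign +1 when f(x) > 0 and -1 otherwise."""
    return 1 if f_value > 0 else -1


def margin(f_value: float, y: int) -> float:
    """Margin y * f(x): negative means misclassified, positive means correct."""
    return y * f_value


# Hypothetical classifier outputs f(x) paired with true labels y in {-1, 1}.
scores = [2.5, 0.3, -0.4, -3.0]
labels = [1, -1, -1, 1]

for f_value, y in zip(scores, labels):
    m = margin(f_value, y)
    status = "correct" if m > 0 else "misclassified"
    print(f"f(x)={f_value:+.1f} y={y:+d} margin={m:+.1f} "
          f"label={predict_label(f_value):+d} {status}")
```

The last case has a large negative margin. An unbounded loss assigns it a very large penalty, and robust losses are designed to limit that penalty.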
Minimizing the loss function value is an especially important task in boosting methods. The most common loss functions are the exponential loss and the logistic loss, defined in Equation (2.2) and Equation (2.3):
$$\phi_{\exp}(v) = e^{-v} \qquad (2.2)$$
$$\phi_{\log}(v) = \log_2\left(1 + e^{-v}\right) \qquad (2.3)$$
Changing the loss function in an algorithm is a way of obtaining a new boosting algorithm, since different loss functions change the penalty values for misclassification. For instance, for negative margins the exponential loss increases its penalty much more rapidly than the logistic loss, although both grow without bound. In addition, the exponential loss gives smaller penalty values than the logistic loss for accurate classifications, and the penalty values of both functions approach zero for large positive margins. This is also illustrated in Figure 1. These differences lead to different weightings of the training data. Many boosting algorithms have been proposed on the basis of such loss functions. AdaBoost is popular and was the first algorithm that could adapt to the weak learners (see [11] for the algorithm and the method). LogitBoost was proposed in a similar way. The main difference is that LogitBoost uses the logistic loss to weight the data points, while AdaBoost uses the exponential loss (see the LogitBoost algorithm in [12]). On the other hand, the unbounded growth of the penalty values leads to the overfitting problem. Therefore, bounded loss functions and the corresponding boosting algorithms have been proposed in recent years [13,14]. TangentBoost is an alternative method that uses a bounded loss function. The next section gives the algorithm and the correction on it.
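The comparison of the exponential and logistic losses can be checked numerically. The sketch below assumes the base-2 form $\log_2(1 + e^{-v})$ for the logistic loss, so that both losses equal 1 at $v = 0$. With the natural logarithm, the logistic loss lies below the exponential loss for every margin. The base therefore matters for the statement about accurate classifications.

```python
import math


def exponential_loss(v: float) -> float:
    """Exponential loss exp(-v) of the margin v = y f(x) (AdaBoost)."""
    return math.exp(-v)


def logistic_loss(v: float) -> float:
    """Logistic loss log2(1 + exp(-v)); the base-2 scaling is an assumption."""
    return math.log2(1.0 + math.exp(-v))


for v in [-4.0, -2.0, 0.0, 2.0, 4.0, 8.0]:
    exp_v, log_v = exponential_loss(v), logistic_loss(v)
    side = "exp > logistic" if exp_v > log_v else "exp <= logistic"
    print(f"v={v:+.0f} exp={exp_v:9.4f} logistic={log_v:9.4f} {side}")
```

Negative margins show the faster growth of the exponential loss. Positive margins show it falling below the logistic loss, with both tending to zero.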

TangentBoost and the Correction
Robust boosting algorithms obtain classifiers without being affected by outliers.
In training data, some mislabeled (outlying) and contaminated observations may affect the classifier. It is often pointed out that outliers may easily spoil classical boosting algorithms such as AdaBoost and RealBoost [15]. As a result, classifiers can be improper and their generalization ability may be poor. To make classifiers more stable, several researchers have proposed robust boosting algorithms [13,14,16,17,18]. The idea behind the TangentBoost algorithm is probability elicitation and conditional risk minimization [19]. The connection between risk minimization and probability elicitation has been studied in [9]. The results showed the following. Suppose the maximal reward function satisfies $J(\eta) = J(1 - \eta)$, and the link $f$ is invertible with the symmetry $f^{-1}(-v) = 1 - f^{-1}(v)$. Then a new link function and reward function yield a new loss function through Equation (3.1):
$$\phi(v) = -J\left[f^{-1}(v)\right] - \left(1 - f^{-1}(v)\right) J'\left[f^{-1}(v)\right] \qquad (3.1)$$
Building on these theoretical properties, the tangent loss function is given in Equation (3.2) [9]. It follows from the tangent link $f(\eta) = \tan(\eta - 0.5)$ and the risk function $C^{*}(\eta) = 4\eta(1 - \eta)$, with $J(\eta) = -C^{*}(\eta)$:
$$\phi_t(v) = \left(2 \tan^{-1}(v) - 1\right)^2 \qquad (3.2)$$
The tangent loss assigns more penalty to positive margins than the other loss functions. As Figure 2 shows, unlike the classical loss functions, it penalizes not only negative margins but also large positive margins. Penalizing large positive margins limits the effect of observations that lie very far from the classifier, even when they are correctly classified. The TangentBoost algorithm is adapted in a similar way to LogitBoost (see the LogitBoost codes in [20] and [21]). However, the class probability is not proper in the TangentBoost algorithm because $p \in \left(-\frac{\pi}{2} + 0.5, \frac{\pi}{2} + 0.5\right)$ [10]. To solve this problem, the propensity scores are restricted to the interval $[0, 1]$ by using the formula
$$p = \frac{\tan^{-1}(f) - \tan^{-1}(-\infty)}{\tan^{-1}(\infty) - \tan^{-1}(-\infty)}$$
instead of $p = \tan^{-1}(f) + 0.5$. The TangentBoost algorithm with the correction is given below [22].
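The sketch below checks the tangent loss derivation numerically. It evaluates the general construction $\phi(v) = -J[f^{-1}(v)] - (1 - f^{-1}(v))J'[f^{-1}(v)]$. It uses the inverse tangent link $f^{-1}(v) = \tan^{-1}(v) + 0.5$ and the reward $J(\eta) = -4\eta(1 - \eta)$. The result is compared with the closed form $(2\tan^{-1}(v) - 1)^2$. The function names are hypothetical. Algebraically, the construction reduces to $4(f^{-1}(v) - 1)^2 = (2\tan^{-1}(v) - 1)^2$.

```python
import math


def link_inverse(v: float) -> float:
    """Inverse tangent link: eta = arctan(v) + 0.5, from f(eta) = tan(eta - 0.5)."""
    return math.atan(v) + 0.5


def reward(eta: float) -> float:
    """Maximal reward J(eta) = -4 eta (1 - eta); symmetric, J(eta) = J(1 - eta)."""
    return -4.0 * eta * (1.0 - eta)


def reward_derivative(eta: float) -> float:
    """J'(eta) = 8 eta - 4."""
    return 8.0 * eta - 4.0


def loss_from_reward(v: float) -> float:
    """phi(v) = -J(eta) - (1 - eta) J'(eta) with eta = f^{-1}(v)."""
    eta = link_inverse(v)
    return -reward(eta) - (1.0 - eta) * reward_derivative(eta)


def tangent_loss(v: float) -> float:
    """Closed form of the tangent loss: (2 arctan(v) - 1)^2."""
    return (2.0 * math.atan(v) - 1.0) ** 2


for v in [-50.0, -2.0, 0.0, math.tan(0.5), 2.0, 50.0]:
    # Both expressions must agree for every margin v.
    assert math.isclose(loss_from_reward(v), tangent_loss(v), rel_tol=1e-9, abs_tol=1e-12)
    print(f"v={v:+8.3f} tangent loss={tangent_loss(v):.4f}")

# The loss is bounded: it tends to (pi + 1)^2 as v -> -inf and (pi - 1)^2 as v -> +inf.
print(f"limits: {(math.pi + 1) ** 2:.3f} and {(math.pi - 1) ** 2:.3f}")
```

The loss reaches zero at $v = \tan(0.5)$ and rises again for larger positive margins. This is how correctly classified points far from the classifier are penalized.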
In the algorithm, after the values are initialized, the weights and the working responses $z_i^{(m)}$ are calculated by formulas obtained from the tangent loss function. In the inner loop, weighted least squares is used to select the most important variable for the current iteration. Using the most important variable and its linear regression prediction, the classifier function, the weights and the propensity scores are updated. The algorithm repeats these steps at each iteration, and after the last iteration the classifier function determines the class labels.
With the correction, the probabilities for assigning class labels, in other words the propensity scores, are limited to lie between zero and one. When a propensity score is close to zero or one, the class label of the observation is clear and its weight is close to zero. That is, once the propensity score is sufficient to define the class label, the weight starts to decline and the observation becomes less important in the next iteration. On the other hand, a propensity score around 0.5 means that the observation is near the classifier. Its weight then starts to increase, and the observation becomes more important in the next iteration. Since the best variable is defined at each iteration via iteratively reweighted least squares, it is easy to find the classifiers for all iterations. At the end of the algorithm, the sign of the combined classifier, or equivalently the propensity scores, decides the class labels. Additionally, TangentBoost is one of the alternative boosting methods that, like logistic regression, produce propensity scores. Splitting the propensity scores into more than two groups is one way of obtaining multiclass labels [23]. With this statistical correction on the propensity scores, the method becomes comparable to logistic regression. Furthermore, the classification of observations becomes more stable with the correction.
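The sketch below shows the effect of the correction on the propensity scores, together with the behaviour of the weights described above. The weight formula $w = 4/(1 + f^2)^2$ is an assumption and is not taken from [22]. It is the Gauss-Newton weight obtained by writing the tangent loss as the squared residual $r(v) = 2\tan^{-1}(v) - 1$ with $v = y f$. Python's `math.atan` returns $\pm\pi/2$ for infinite arguments, so the corrected formula can be written with $\tan^{-1}(\pm\infty)$ exactly as stated.

```python
import math


def propensity_original(f: float) -> float:
    """Uncorrected score arctan(f) + 0.5, ranging over (-pi/2 + 0.5, pi/2 + 0.5)."""
    return math.atan(f) + 0.5


def propensity_corrected(f: float) -> float:
    """Corrected score (arctan(f) - arctan(-inf)) / (arctan(inf) - arctan(-inf)), in (0, 1)."""
    lo, hi = math.atan(-math.inf), math.atan(math.inf)
    return (math.atan(f) - lo) / (hi - lo)


def gauss_newton_weight(f: float) -> float:
    """Assumed weight 4 / (1 + f^2)^2: largest at f = 0 (p = 0.5), near 0 as |f| grows."""
    return 4.0 / (1.0 + f * f) ** 2


for f in [-20.0, -2.0, 0.0, 2.0, 20.0]:
    print(f"f={f:+6.1f} p_old={propensity_original(f):+.3f} "
          f"p_new={propensity_corrected(f):.3f} weight={gauss_newton_weight(f):.4f}")
```

The corrected score simplifies to $\tan^{-1}(f)/\pi + 0.5$, so it exceeds 0.5 exactly when $f > 0$. This is why the sign of the classifier and the propensity score give the same label.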
In summary, the correction on TangentBoost can be a good way to obtain classifiers that are not affected by outliers in the training data. The simulation study examines how TangentBoost can obtain better classifiers than the best-known classical methods in the presence of outliers and mislabeled data.
Algorithm: TangentBoost Algorithm with the correction on p
Inputs: Training data set $D = \{(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)\}$, where $y \in \{-1, 1\}$ is the class label for observations $x$, and the number $M$ of weak learners.
Initial values: class label probabilities $p_1(x_i) = 0.5$ and the classifier $f_1(x) = 0$.
Loop 1. For $m = 1, 2, \ldots, M$:
  Calculate $z_i^{(m)}$ and the weights for all observations with the formula below:
  Minimize the LS problem below to select the most important variable with the given equation, where $h_q$
  Obtain the important variable $k$ with the formula:
  Obtain the classifier and also the probability score for all observations.
End of Loop 1.
Define the class label as the sign of the final classifier, $\mathrm{sign}[f(x)]$.
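The loop can be made concrete with a plain-Python sketch of a TangentBoost-style procedure that uses the corrected propensity score. Several details are assumptions rather than the formulas of [22]:

- The weights come from a Gauss-Newton step on the tangent loss: $w_i = 4/(1+v_i^2)^2$, with $v_i = y_i f(x_i)$.
- The working responses come from the same step: $z_i = -y_i (2\tan^{-1}(v_i) - 1)(1+v_i^2)/2$.
- Each weak learner is a weighted least-squares line $h_q(x) = a + b x_q$ on a single variable. The variable $k$ is the one with the smallest weighted residual sum of squares.

All function names are hypothetical.

```python
import math
import random


def weighted_line_fit(xs, zs, ws):
    """Weighted least-squares fit of z on a + b * x; returns (a, b, weighted RSS)."""
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    mz = sum(w * z for w, z in zip(ws, zs)) / sw
    sxx = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    sxz = sum(w * (x - mx) * (z - mz) for w, x, z in zip(ws, xs, zs))
    b = sxz / sxx if sxx > 0 else 0.0
    a = mz - b * mx
    rss = sum(w * (z - a - b * x) ** 2 for w, x, z in zip(ws, xs, zs))
    return a, b, rss


def tangent_boost(X, y, M=40):
    """TangentBoost-style sketch; returns weak learners (k, a, b) and corrected scores p."""
    n, d = len(X), len(X[0])
    f = [0.0] * n          # f_1(x) = 0
    p = [0.5] * n          # p_1(x_i) = 0.5
    learners = []
    lo, hi = math.atan(-math.inf), math.atan(math.inf)
    for _ in range(M):
        v = [yi * fi for yi, fi in zip(y, f)]                  # margins y_i f(x_i)
        w = [4.0 / (1.0 + vi * vi) ** 2 for vi in v]           # assumed weights
        z = [-yi * (2.0 * math.atan(vi) - 1.0) * (1.0 + vi * vi) / 2.0
             for yi, vi in zip(y, v)]                           # assumed working responses
        # Fit one weighted LS line per variable q and keep the best one (variable k).
        fits = [(q, *weighted_line_fit([row[q] for row in X], z, w)) for q in range(d)]
        k, a, b, _ = min(fits, key=lambda t: t[3])
        learners.append((k, a, b))
        f = [fi + a + b * row[k] for fi, row in zip(f, X)]
        p = [(math.atan(fi) - lo) / (hi - lo) for fi in f]     # corrected score in (0, 1)
    return learners, p


def predict(learners, row):
    """Class label = sign of the combined classifier (f = 0 is assigned to -1)."""
    f = sum(a + b * row[k] for k, a, b in learners)
    return 1 if f > 0 else -1


# Usage on synthetic data: the label depends on the first variable, the second is noise.
rng = random.Random(0)
X = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(200)]
y = [1 if row[0] > 0 else -1 for row in X]
learners, p = tangent_boost(X, y, M=40)
accuracy = sum(predict(learners, row) == yi for row, yi in zip(X, y)) / len(y)
print(f"training accuracy: {accuracy:.3f}")
```

The Gauss-Newton step is used here because the tangent loss is a squared residual. This mirrors how LogitBoost turns each iteration into a weighted least-squares fit. A different second-order step would change $w_i$ and $z_i$ but not the structure of the loop.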

Simulation Study
A simulation study was carried out to compare TangentBoost with classical boosting algorithms on real datasets. Three different datasets, all obtained from the UCI Machine Learning Repository [24], were used: king gaming [25], qualitative bankruptcy [26] and credit approval [27]. Basic information about the datasets is given in Table 1.
In two of the datasets, the class labels are completely separable from each other. Only one dataset, credit approval, has a linearly inseparable data structure. The datasets were chosen to vary the number of observations, the number of variables and the class label proportions. Table 2 gives the means of the overall accuracy in both the training and testing parts over 250 repetitions. The training part was 70% or 80% of the data, and the number of iterations was 40. The TangentBoost algorithm had similar results in the training and testing parts for all datasets, with no dramatic decrease from training to testing accuracy. All the other boosting algorithms gave impressive results for the completely separable datasets. On these datasets, there were no significant differences in testing accuracy between the classical algorithms and TangentBoost. Moreover, the scores of all classical boosting algorithms decreased dramatically from the training data to the testing data, while TangentBoost showed no such difference. To summarize, TangentBoost is not useful for completely separable datasets without mislabeling, while the method may be useful for almost separable data. The accuracy of the classical boosting methods easily decreased on testing data when the training data were not completely separable. As seen in Table 2, the logistic and exponential loss functions are unable to preserve the stability of the overall accuracy on the CA testing data.
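The evaluation protocol behind Table 2 can be sketched generically: random training/testing splits, repeated 250 times, with accuracy averaged. The split procedure is an assumption, since the text does not say whether the splits were stratified. `fit` and `predict` are placeholders for any classifier. The nearest-centroid learner in the usage lines is a hypothetical stand-in, not one of the compared boosting methods.

```python
import random
from statistics import mean


def repeated_holdout(X, y, fit, predict, train_frac=0.7, reps=250, seed=1):
    """Mean training and testing accuracy over `reps` random splits."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    n_train = int(round(train_frac * len(y)))

    def accuracy(model, rows):
        return mean(1.0 if predict(model, X[i]) == y[i] else 0.0 for i in rows)

    train_acc, test_acc = [], []
    for _ in range(reps):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        model = fit([X[i] for i in train], [y[i] for i in train])
        train_acc.append(accuracy(model, train))
        test_acc.append(accuracy(model, test))
    return mean(train_acc), mean(test_acc)


def fit_centroids(X, y):
    """Hypothetical stand-in learner: the mean input vector of each class."""
    centroids = {}
    for label in (-1, 1):
        rows = [x for x, yi in zip(X, y) if yi == label]
        centroids[label] = [mean(col) for col in zip(*rows)]
    return centroids


def predict_centroid(centroids, row):
    """Assign the class whose centroid is closest in squared Euclidean distance."""
    return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))


rng = random.Random(0)
y = [rng.choice((-1, 1)) for _ in range(300)]
X = [[yi + rng.gauss(0, 1), rng.gauss(0, 1)] for yi in y]
for frac in (0.7, 0.8):
    tr, te = repeated_holdout(X, y, fit_centroids, predict_centroid, train_frac=frac, reps=250)
    print(f"train={frac:.0%}: mean train acc={tr:.3f}, mean test acc={te:.3f}")
```

Reporting both means side by side makes the training-to-testing drop described above directly visible for each method.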
To clarify the robustness of TangentBoost in the presence of mislabeled observations, different proportions of mislabeled observations were introduced into the qualitative bankruptcy and credit approval datasets. Table 3 gives the means of the overall accuracy in the testing parts over 250 repetitions. The training part was 70% of the data, and the number of iterations was 40. As seen in Table 3 and Figure 3, TangentBoost performed better than the other methods on the testing part in the presence of mislabeled observations.
To summarize the simulation results, TangentBoost can be a good robust procedure in the presence of outliers in the training data, since the method is not spoiled by contaminated parts of the training dataset. However, as expected, it does not perform as well as the classical methods on separable datasets. Adding mislabeled observations to separable data revealed that TangentBoost is better than the classical methods. The simulations on real data indicate that the algorithm is useful when the training data set contains mislabeled observations.
In this study, the TangentBoost algorithm is given with a correction. Outliers or contaminated parts of the training data may be a problem for boosting algorithms. In particular, outliers can influence the weak classifiers very easily. Robust boosting algorithms are effective methods for overcoming this problem. TangentBoost with the correction is quite useful when there are outliers or contaminated parts near the classifier in almost linearly separable data.
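The mislabeling experiment behind Table 3 requires injecting mislabeled observations, which amounts to flipping the labels of a randomly chosen fraction of observations. The sketch below shows only that step, and the name `mislabel` is hypothetical. The text does not specify whether the flipped labels were restricted to the training part, so that choice is left to the caller.

```python
import random


def mislabel(y, proportion, seed=None):
    """Return a copy of labels y in {-1, 1} with round(proportion * len(y)) labels flipped."""
    rng = random.Random(seed)
    noisy = list(y)
    n_flip = int(round(proportion * len(y)))
    for i in rng.sample(range(len(y)), n_flip):
        noisy[i] = -noisy[i]
    return noisy


labels = [1, -1] * 50
for prop in (0.05, 0.10, 0.20):
    noisy = mislabel(labels, prop, seed=42)
    flipped = sum(a != b for a, b in zip(labels, noisy))
    print(f"proportion={prop:.2f}: {flipped} of {len(labels)} labels flipped")
```

Sampling indices without replacement guarantees that exactly the requested number of labels is flipped. Flipping each label independently with the given probability would only match the proportion on average.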

Figure 3. Accuracy scores of the methods according to the mislabeled proportion. (Left) Qualitative Bankruptcy testing data. (Right) Credit Approval testing data.

Table 1. Some Information about the Real Datasets

Table 2. Mean of the Overall Accuracy in Real Datasets for TangentBoost and some Classical Boosting Methods. QB: Qualitative Bankruptcy; KG: King Gaming; CA: Credit Approval; TB: TangentBoost; RB-Exp: RealBoost with exponential loss; GB-Exp: GentleBoost with exponential loss; RB-Log: RealBoost with logistic loss; GB-Log: GentleBoost with logistic loss.

Table 3. Mean of the Overall Accuracy in Real Datasets for TangentBoost and some Classical Boosting Methods in the presence of mislabeled data. QB: Qualitative Bankruptcy; CA: Credit Approval; TB: TangentBoost; RB-Exp: RealBoost with exponential loss; GB-Exp: GentleBoost with exponential loss; RB-Log: RealBoost with logistic loss; GB-Log: GentleBoost with logistic loss.