PERFORMANCE OF SHANNON'S MAXIMUM ENTROPY DISTRIBUTION UNDER SOME RESTRICTIONS: AN APPLICATION ON TURKEY'S ANNUAL TEMPERATURES

Entropy plays a very important role in statistics. Recent studies show that entropy has begun to appear in nearly every branch of science. In information theory, entropy is a measure of the uncertainty in a random variable. While there are different entropy methods, the most common, the maximum entropy (MaxEnt) method, maximizes Shannon's entropy subject to restrictions obtained from the random variable. The MaxEnt distribution is the distribution obtained by this method. The purpose of this study is to calculate the MaxEnt distribution of Turkey's annual temperatures for the last 43 years under combinations of the restrictions 1, x, x², ln x, (ln x)² and ln(1+x²), and to compare this distribution with the real probability distribution with the help of the Kolmogorov-Smirnov goodness-of-fit test. According to the results, the goodness-of-fit statistics accept the null hypothesis that all the entropy distributions fit the probability distribution. The results are given in the related tables and figures.


Introduction
Historically, many notions of entropy have been proposed. The etymology of the word entropy dates back to Clausius (Clausius, 1865), who in 1865 coined the term from the Greek tropos, meaning transformation, with the prefix en- to recall the (in his work) indissociable relation to the notion of energy (Jaynes, 1980). A statistical concept of entropy was introduced by Shannon in the theory of communication and transmission of information (Lesne, 2011).
A maximum entropy (MaxEnt) density can be obtained by maximizing Shannon's information entropy measure subject to known moment constraints. According to Jaynes (1957), the maximum entropy distribution is "uniquely determined as the one which is maximally noncommittal with regard to missing information, and that it agrees with what is known, but expresses maximum uncertainty with respect to all other matters." The MaxEnt approach is a flexible and powerful tool for density approximation, which nests a whole family of generalized exponential distributions, including the exponential, Pareto, normal, lognormal, gamma and beta distributions as special cases (Wu, 2003).
There are potentially more appropriate measures of information than the variance, however, such as those developed by Shannon (1948), Shannon and Weaver (1949), Renyi (1961) and Khinchine (1957). This information-theoretic approach was rigorously related to the general body of statistics by Kullback and Leibler (1951) and Kullback (1959). These authors and other analysts such as Parzen (1990a, b) and Brockett (1992) have continued to conduct research showing how the information-theoretic approach can lead to a view of statistics that both unifies and extends the various parts of the body of statistical methods and theories (Brockett et al., 1995).

Material and Method
As Losee (1990) mentioned, the amount of self-information contained in or associated with a message being transmitted, when the probability of its transmission is p, is the logarithm of the inverse of that probability, as in [1]:

$$I(p) = \log\frac{1}{p} = -\log p \qquad [1]$$

For a random variable X with values in a finite set R, Shannon's entropy H(X) can be defined as in [2]:

$$H(X) = -\sum_{x \in R} p(x)\,\log p(x) \qquad [2]$$
The choice of a logarithmic base corresponds to the choice of a unit for measuring information. If base 2 is used, the resulting units may be called binary digits, or more briefly bits, a word suggested by J. W. Tukey (Shannon and Weaver, 1949).
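As a small illustration of definitions [1] and [2] (not part of the original study), Shannon's entropy of a discrete distribution can be computed in a few lines; the following Python sketch shows how the choice of base selects the unit:

```python
import numpy as np

def shannon_entropy(p, base=np.e):
    """Shannon entropy H(X) = -sum p(x) log p(x) of a discrete distribution.

    base=2 gives the result in bits; base=e (the default) gives nats."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]                      # terms with p(x) = 0 contribute nothing
    return -np.sum(p * np.log(p)) / np.log(base)

# A fair coin carries exactly 1 bit of uncertainty:
print(shannon_entropy([0.5, 0.5], base=2))   # 1.0
# A biased coin carries less:
print(shannon_entropy([0.9, 0.1], base=2))   # approximately 0.469
```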
Recent studies show that, when deciding on the restrictions, using the characterizing moments of a known statistical distribution, and some combinations of these moments, gives better results for modeling the data set. For example, Wu and Stengos (2005) used x, x², ln(1+x²) and sin x as restrictions; Wu and Perloff (2007) used x, x², ln(1+x²) and arctan x; and Shamilov et al. (2008) used x, x², x³, ln x, (ln x)² and ln(1+x²) as restrictions for the entropy distribution (Usta, 2009).
In our study, like these recent ones, we used 1, x, x², ln x, (ln x)² and ln(1+x²) as the restrictions to calculate the entropy distributions.
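Although the study's own computations were carried out in MATLAB, the six restriction functions can be written down compactly; the following Python sketch is an illustrative encoding (the variable names are ours, and x is assumed strictly positive so that ln x is defined, as it is for the temperature data):

```python
import numpy as np

# The six restriction (moment) functions of this study, as vectorized callables.
# x is assumed strictly positive so that ln x is well defined.
restrictions = [
    lambda x: np.ones_like(x),   # 1 (normalization)
    lambda x: x,                 # x
    lambda x: x**2,              # x^2
    lambda x: np.log(x),         # ln x
    lambda x: np.log(x)**2,      # (ln x)^2
    lambda x: np.log1p(x**2),    # ln(1 + x^2)
]
```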
When there is more than one restriction, Lagrange multipliers are needed to solve the constrained equations simultaneously. If we consider an entropy distribution with three restrictions, then to find the MaxEnt distribution of a random variable x taking the values x₁, x₂, …, xₙ with probabilities p₁, p₂, …, pₙ, H(x) must be maximized under the restrictions given below:

$$\sum_{i=1}^{n} p_i = 1, \qquad \sum_{i=1}^{n} p_i x_i = \mu_1, \qquad \sum_{i=1}^{n} p_i x_i^2 = \mu_2$$

For three restrictions like these, the Lagrange function can be obtained as in [6], where μⱼ is the j-th moment of the related data:

$$L = -\sum_{i=1}^{n} p_i \ln p_i - (\lambda_0 - 1)\Big(\sum_{i=1}^{n} p_i - 1\Big) - \lambda_1\Big(\sum_{i=1}^{n} p_i x_i - \mu_1\Big) - \lambda_2\Big(\sum_{i=1}^{n} p_i x_i^2 - \mu_2\Big) \qquad [6]$$

If we set the partial derivatives of [6] with respect to the pᵢ equal to zero, we obtain the MaxEnt distribution

$$p_i = \exp\left(-\lambda_0 - \lambda_1 x_i - \lambda_2 x_i^2\right)$$
As an illustrative example, let us suppose that we have the observations 3, 7, 10 and 12, and let us take the restrictions as (1, x, x²). We may then write the equations as in [13] (Çiçek, 2013). When we substitute the given observations, we obtain the equations in [14].
When we solve these equations, we obtain the Lagrange multipliers λ₀ = 0.5618, λ₁ = -7.80E-18 and λ₂ = 0.0141. With the help of these multipliers we obtain the MaxEnt distribution given in Table 1.
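A minimal numerical sketch of this worked example follows (illustrative, not the study's MATLAB program; the moment targets and the solver starting point are our assumptions, so the multipliers found may differ from the values reported above depending on the convention used):

```python
import numpy as np
from scipy.optimize import fsolve

# Observations from the illustrative example, with the restrictions (1, x, x^2).
x = np.array([3.0, 7.0, 10.0, 12.0])

# Assumed moment targets: 1 for normalization plus the sample moments of x.
mu = np.array([1.0, x.mean(), (x**2).mean()])

def residuals(lam):
    """Constraint residuals for the MaxEnt form p_i = exp(-l0 - l1*x_i - l2*x_i^2)."""
    p = np.exp(-lam[0] - lam[1] * x - lam[2] * x**2)
    return np.array([p.sum(), (p * x).sum(), (p * x**2).sum()]) - mu

lam = fsolve(residuals, np.zeros(3))
p = np.exp(-lam[0] - lam[1] * x - lam[2] * x**2)
print("multipliers:", lam)
print("MaxEnt probabilities:", p, "(sum =", p.sum(), ")")
```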

Application
In this section of the study, MaxEnt distributions for the temperature values in Turkey during the last 43 years are calculated. The data set was obtained from the Turkish State Meteorological Service. To calculate the MaxEnt distribution of the data set under the restrictions, with the help of the Lagrange multipliers, we used MATLAB and developed a program that can handle any discrete data set under given restrictions. The frequency distribution for this data set can be seen in Table 3 and its histogram is given in Figure 1. Figure 1 shows that the average annual temperature of Turkey in the last 43 years is about 11-12 °C.

Entropy values are calculated under two, three, four, five and six restrictions for this data set. The best entropy values (minimum uncertainty) for the related restrictions are shown in bold in Table 2. Table 2 shows that the minimum entropy (maximum information), 1.6507, is obtained under six restrictions. In summary, the minimum entropy value under the restrictions (1, x²) is 1.7268; under (1, x², ln(1+x²)) it is 1.7273; under (1, x, x², ln(1+x²)) it is 1.6865; under (1, x, x², (ln x)², ln(1+x²)) it is 1.6547; and under (1, x, x², ln x, (ln x)², ln(1+x²)) it is 1.6507.
It can also be seen that, as the number of restrictions increases, the entropy values decrease.
At the next step of the analysis, the Kolmogorov-Smirnov goodness-of-fit test is applied to test whether or not each of the entropy distributions under these restrictions fits the real probability distribution.
The two-sample Kolmogorov-Smirnov (K-S) goodness of fit test is one of the most useful and general nonparametric methods for comparing two samples, as it is sensitive to differences in both location and shape of the empirical cumulative distribution functions of the two samples.
If F₀(x) is the population cumulative distribution function, and S_N(x) the observed cumulative step function of a sample (i.e., S_N(x) = k/N, where k is the number of observations less than or equal to x), then the sampling distribution of D = max |F₀(x) − S_N(x)| is known, and is independent of F₀(x) if F₀(x) is continuous (Massey, 1951).
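For two discrete distributions given over the same support points, as in Table 3, D is simply the largest absolute gap between the two cumulative sums; the following Python sketch is illustrative (the probabilities shown are hypothetical, not the study's values):

```python
import numpy as np

def ks_statistic(p_model, p_observed):
    """Kolmogorov-Smirnov D: the maximum absolute difference between the two
    cumulative distribution functions, evaluated over the shared support."""
    d = np.abs(np.cumsum(p_model) - np.cumsum(p_observed))
    return d.max()

# Hypothetical probabilities over the same five support points:
p_obs = np.array([0.10, 0.25, 0.30, 0.25, 0.10])
p_fit = np.array([0.12, 0.22, 0.32, 0.24, 0.10])
print(ks_statistic(p_fit, p_obs))  # 0.02
```

For raw samples rather than tabulated probabilities, scipy.stats.ks_2samp performs the two-sample test directly.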

The maximum differences (D) between the probability distribution and the entropy distributions, computed from the cumulative distribution functions (CDFs), and the probabilities for these differences are given in Table 4, and the CDF graph for all entropy distributions is given in Figure 2. Table 4 shows that, according to the probabilities p(D) of the K-S test, we accept the null hypothesis: the maximum entropy distributions under all restriction sets statistically fit the related data set at the 95% confidence level.
While we obtain the maximum information from the entropy distribution under six restrictions, Figure 2 and the D values given in Table 4 show that the largest difference between cumulative distribution functions is that between the probability distribution (the red line in Figure 2) and the entropy distribution under three restrictions (the green line in Figure 2).

Figure 1. Histogram of the annual temperature values (in °C) of Turkey for the last 43 years.


Figure 2. CDF graph of the K-S test.

Table 1. MaxEnt distribution of the sample under three restrictions.

Table 2. Entropy values of the temperature distribution under the given restrictions.


Table 4. Goodness-of-fit statistics for the entropy distributions and the data set.

Table 3. Temperatures, observed frequencies, probability distribution, and entropy distributions under the given restrictions.