A New Skew-Symmetric Gudermannian-Laplace Distribution with Properties and Application to Wind Speed Data

ABSTRACT


1. Introduction
Data may have a longer tail on one side than the other, indicating that it is "skewed." Understanding skewed data is crucial for a data scientist or other professional who works with data because most real-world situations aren't symmetrical: real data sets are frequently skewed. Skewed data, on the other hand, can pose problems for statistical models because outliers, which frequently generate skew, can have a harmful effect on a model's performance. In this regard, the presence of a skewness parameter in a probability distribution improves modeling success. There are several methods of obtaining probability distributions whose skewness is adjustable via a parameter. For a detailed review of these methods, we refer interested readers to Gupta & Kundu (2009). In this study, we focus on one of these approaches, the family of skew-symmetric distributions, which was introduced by Azzalini (1985).
The skew-symmetric distribution family is a wide family of probability density functions that include one or more skewness parameters. The following lemma defines the main frame of the probability distributions in the family (Azzalini, 1985).
Lemma: Let $f_0$ be a density function symmetric about zero, and let $\Psi$ be a Lebesgue measurable function satisfying $0 \le \Psi(x) \le 1$ and $\Psi(x) + \Psi(-x) = 1$. Then
$$g(x) = 2 f_0(x)\,\Psi(w(x)) \qquad (1)$$
is a probability density function, where $w$ is a function that is odd and continuous.
The rest of this paper is organized as follows. The skew-symmetric Gudermannian-Laplace distribution is introduced in the next section, which also covers its raw moments, skewness and kurtosis coefficients, and entropy. The following section provides inference procedures for maximum likelihood estimation and simulation studies. The final two sections are the real-data application, which demonstrates the usefulness of the new distribution, and the conclusion, which discusses some findings related to the proposed distribution.

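2. Skew-Symmetric Gudermannian-Laplace Distribution
In this section, the probability density function of the skew-symmetric Gudermannian-Laplace (SSGL) distribution is presented with some basic properties. As the base distribution in eq(1), we use the standardized generalized Gudermannian (GG) distribution, which is symmetric about zero. The pdf of the standardized GG distribution is (Altun, 2019)
$$f_0(x) = \frac{1}{2}\operatorname{sech}\left(\frac{\pi x}{2}\right), \quad x \in \mathbb{R}.$$
As the skewing function $\Psi$, we use the cdf of the well-known Laplace distribution. Thus, by taking the odd function $w(x) = \lambda x$, our skewing function is obtained as
$$\Psi(\lambda x) = \frac{1}{2}\left[1 + \operatorname{sgn}(\lambda x)\left(1 - e^{-|\lambda x|}\right)\right].$$
Definition: A random variable $X$ has the Skew-Symmetric Gudermannian-Laplace distribution with parameter $\lambda$, $X \sim SSGL(\lambda)$, if its pdf has the form
$$f_X(x;\lambda) = \frac{1}{2}\operatorname{sech}\left(\frac{\pi x}{2}\right)\left[1 + \operatorname{sgn}(\lambda x)\left(1 - e^{-|\lambda x|}\right)\right], \qquad (2)$$
where sgn is the signum function and $\lambda \in \mathbb{R}$ is a shape parameter controlling the skewness. For $\lambda = 0$, the pdf reduces to that of the standardized GG distribution (Altun, 2019). The cdf of the $X \sim SSGL(\lambda)$ random variable is available in closed form and, to save space, is given piecewise as two separate functions.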
Raw Moments: Let $X \sim SSGL(\lambda)$. Because $f_X(x;\lambda) + f_X(-x;\lambda) = 2 f_0(x)$, the even moments of $X$ coincide with those of the base distribution and are therefore unaffected by $\lambda$. Their values are $E(X^2) = 1$, $E(X^4) = 5$, $E(X^6) = 61$, $E(X^8) = 1385$, …, which are known as the Euler numbers. The odd moments depend on $\lambda$ and are expressed in terms of $\zeta$, the generalized Riemann zeta function (Edwards, 2001). Using the raw moments, the expected value and variance of $X$ are calculated in terms of $G \cong 0.915966$, which is known as Catalan's constant. It is obvious that $Var(X) = 1$ for $\lambda = 0$.
Skewness and Kurtosis: The kurtosis coefficient of a random variable is defined as its standardized 4th central moment, and the skewness coefficient as its standardized 3rd central moment. Denote these coefficients for $X \sim SSGL(\lambda)$ by $\gamma_2(\lambda)$ and $\gamma_1(\lambda)$, respectively; both can be calculated using eq(7), eq(8), and eq(10). It is easy to calculate that $\gamma_2(0) = 5$. Using a numerical approach, we observe that the kurtosis coefficient attains its minimum at $\lambda = 1/2$, where it equals 4.8981. Note that $\gamma_1(-\lambda) = -\gamma_1(\lambda)$ and $\gamma_2(-\lambda) = \gamma_2(\lambda)$. The left panel of Figure 1 shows the effect of the $\lambda$ parameter on the skewness and kurtosis coefficients in the range $\lambda \in [0, 20]$; by these symmetries it can also be interpreted for $\lambda \in [-20, 20]$.
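The moment and shape statements above can be checked numerically. The following is a minimal Python sketch, assuming the SSGL pdf has the form $2 f_0(x)\Psi(\lambda x)$ with $f_0(x) = \frac{1}{2}\operatorname{sech}(\pi x/2)$ and $\Psi$ the standard Laplace cdf. The function names (`ssgl_pdf`, `simpson`, `raw_moment`, `skew_kurt`), the Simpson rule and the truncation bound are illustrative choices and are not taken from the paper.

```python
import math

def ssgl_pdf(x, lam):
    """Assumed SSGL(lam) density: 2 * f0(x) * Psi(lam * x)."""
    base = 0.5 / math.cosh(math.pi * x / 2.0)    # standardized GG base density
    t = lam * x
    # 2 * Psi(t) for the standard Laplace cdf, written without cancellation
    two_psi = 2.0 - math.exp(-t) if t >= 0.0 else math.exp(t)
    return base * two_psi

def simpson(func, a, b, n=8000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def raw_moment(k, lam, bound=40.0):
    """E(X^k), with the integral truncated to [-bound, bound]."""
    return simpson(lambda x: x ** k * ssgl_pdf(x, lam), -bound, bound)

def skew_kurt(lam):
    """Skewness and kurtosis coefficients obtained from the raw moments."""
    m1, m2, m3, m4 = (raw_moment(k, lam) for k in (1, 2, 3, 4))
    var = m2 - m1 ** 2
    mu3 = m3 - 3.0 * m1 * m2 + 2.0 * m1 ** 3
    mu4 = m4 - 4.0 * m1 * m3 + 6.0 * m1 ** 2 * m2 - 3.0 * m1 ** 4
    return mu3 / var ** 1.5, mu4 / var ** 2

if __name__ == "__main__":
    for lam in (0.0, 0.5, 2.0):
        evens = [round(raw_moment(k, lam), 4) for k in (2, 4, 6)]
        print(lam, evens, skew_kurt(lam))
```

Under these assumptions, the even moments come out as 1, 5 and 61 for every value of `lam`, and `skew_kurt(0.5)` returns a kurtosis of about 4.898, in line with the values quoted in the text.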
Shannon Entropy: Entropy is a measure of the variation or uncertainty of a random variable. Shannon entropy, defined as $H_X = E(-\ln f_X(X))$, is the most well-known measure of entropy. The right panel of Figure 1 shows the Shannon entropy graph for the random variable $X \sim SSGL(\lambda)$ for $\lambda$ values in the range $[-5, 5]$. The highest entropy value has been numerically observed to be 1.386 at $\lambda = 0$. Given that the variance in eq(8) also reaches its highest value of 1 at $\lambda = 0$, we may argue that the uncertainty in the distribution reaches its maximum in the symmetric case. On the other hand, the sign of the $\lambda$ parameter has no effect on the entropy value.
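The entropy behaviour described above can be reproduced with a short numerical integration. This is a hedged sketch that again assumes the pdf $2 f_0(x)\Psi(\lambda x)$ with the standardized GG base density and the standard Laplace cdf; `shannon_entropy`, the Simpson rule and the integration bound are hypothetical helpers chosen for illustration.

```python
import math

def ssgl_pdf(x, lam):
    """Assumed SSGL(lam) density: 2 * f0(x) * Psi(lam * x)."""
    base = 0.5 / math.cosh(math.pi * x / 2.0)
    t = lam * x
    two_psi = 2.0 - math.exp(-t) if t >= 0.0 else math.exp(t)
    return base * two_psi

def simpson(func, a, b, n=8000):
    """Composite Simpson rule on [a, b] with n (even) subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * func(a + i * h)
    return total * h / 3.0

def shannon_entropy(lam, bound=40.0):
    """H = E(-ln f(X)), integrated numerically over [-bound, bound]."""
    def integrand(x):
        f = ssgl_pdf(x, lam)
        # f * ln f tends to 0 as f tends to 0, so underflowed tails contribute nothing
        return -f * math.log(f) if f > 0.0 else 0.0
    return simpson(integrand, -bound, bound)

if __name__ == "__main__":
    for lam in (-2.0, 0.0, 2.0):
        print(lam, round(shannon_entropy(lam), 4))
```

With these assumptions the value at `lam = 0` is $\ln 4 \approx 1.386$, and `shannon_entropy(2.0)` and `shannon_entropy(-2.0)` agree, which mirrors the symmetry in the sign of $\lambda$ noted in the text.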

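Location-Scale Extension:
Location and scale parameters, $\mu$ and $\sigma$ respectively, can be introduced by means of $Y = \mu + \sigma X$, where $X$ is a random variable with density eq(2). Thus, the pdf of $Y$ is obtained as
$$f_Y(y;\mu,\sigma,\lambda) = \frac{1}{2\sigma}\operatorname{sech}\left(\frac{\pi (y-\mu)}{2\sigma}\right)\left[1 + \operatorname{sgn}\left(\lambda\,\frac{y-\mu}{\sigma}\right)\left(1 - e^{-\left|\lambda (y-\mu)/\sigma\right|}\right)\right],$$
where $\mu \in \mathbb{R}$, $\sigma > 0$, and $\lambda \in \mathbb{R}$. We use the notation $SSGL(\mu, \sigma, \lambda)$ for $Y$.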

3. Estimation of Parameters and Simulation
In this section, we study the maximum likelihood estimators of the parameters of the $SSGL(\mu, \sigma, \lambda)$ distribution. For a random sample $y_1, \ldots, y_n$, the log-likelihood function $L$ follows directly from eq(9). Taking the first derivatives of $L$ with respect to $\mu$, $\sigma$ and $\lambda$ and setting them to zero gives the normal equations, which are written in terms of $\Delta_i = (y_i - \mu)/\sigma$. Thus, the maximum likelihood (ML) estimates of the parameters $\mu$, $\sigma$, $\lambda$, say $\hat{\mu}$, $\hat{\sigma}$, and $\hat{\lambda}$, can be obtained by solving these equations simultaneously and numerically.
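To make the estimation step concrete, the log-likelihood can be coded directly and handed to a numerical optimizer. The sketch below assumes the location-scale pdf $\frac{1}{\sigma} \cdot 2 f_0(z)\Psi(\lambda z)$ with $z = (y - \mu)/\sigma$, the standardized GG base density $f_0$ and the standard Laplace cdf $\Psi$. The names `log_sech`, `log_two_psi` and `ssgl_loglik` are hypothetical, and the sketch evaluates the objective only; it does not reproduce the paper's normal equations.

```python
import math

def log_sech(u):
    """log(sech(u)), computed stably for large |u|."""
    a = abs(u)
    return math.log(2.0) - a - math.log1p(math.exp(-2.0 * a))

def log_two_psi(t):
    """log(2 * Psi(t)) for the standard Laplace cdf Psi."""
    # t >= 0: 2*Psi(t) = 2 - exp(-t);  t < 0: 2*Psi(t) = exp(t)
    return math.log(2.0 - math.exp(-t)) if t >= 0.0 else t

def ssgl_loglik(params, data):
    """Log-likelihood of (mu, sigma, lam) under the assumed SSGL(mu, sigma, lam) pdf."""
    mu, sigma, lam = params
    if sigma <= 0.0:
        return -math.inf                      # outside the parameter space
    total = 0.0
    for y in data:
        z = (y - mu) / sigma                  # Delta_i in the text
        total += (-math.log(2.0 * sigma)
                  + log_sech(math.pi * z / 2.0)
                  + log_two_psi(lam * z))
    return total

if __name__ == "__main__":
    sample = [1.2, 0.4, 2.9, 1.7, 0.9, 3.8, 1.1]   # made-up illustrative values
    print(ssgl_loglik((1.0, 1.0, 0.0), sample))
    print(ssgl_loglik((1.0, 1.0, 1.0), sample))
```

The negative of `ssgl_loglik` could then be minimized over $(\mu, \sigma, \lambda)$ with any general-purpose routine, for example `scipy.optimize.minimize`, which is one way of solving the normal equations numerically.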

Monte-Carlo Simulation:
We performed Monte Carlo simulation studies to illustrate the estimation performance of the obtained ML estimators. Since the quantile function of the distribution cannot be obtained analytically, the following algorithm can be used to generate random variables from the distribution.
Step 1. Set the parameter values $(\mu, \sigma, \lambda)$.
Step 2. Generate $u \sim U(0,1)$.
Step 3. Solve $F(x; \lambda) - u = 0$ with respect to $x$, where $F(x; \lambda)$ is the cdf of $SSGL(\lambda)$.
Step 4. Calculate $y = \mu + \sigma x$.
Different parameter values are used in the Monte Carlo simulations. Table 1 shows the mean absolute bias of the estimates, where $\theta$ represents the true parameter value and $\hat{\theta}$ is the maximum likelihood estimate of $\theta$.
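The four generation steps above can be sketched in Python as follows. The sketch assumes the pdf $2 f_0(x)\Psi(\lambda x)$ with the standardized GG base density and the standard Laplace cdf. Instead of the closed-form cdf, it tabulates $F(x;\lambda)$ numerically on a grid and solves Step 3 by table lookup with linear interpolation, so it is a numerical stand-in rather than the authors' implementation. All function names and grid settings are illustrative.

```python
import bisect
import math
import random

def ssgl_pdf(x, lam):
    """Assumed SSGL(lam) density: 2 * f0(x) * Psi(lam * x)."""
    base = 0.5 / math.cosh(math.pi * x / 2.0)
    t = lam * x
    two_psi = 2.0 - math.exp(-t) if t >= 0.0 else math.exp(t)
    return base * two_psi

def build_cdf_table(lam, lo=-30.0, hi=30.0, n=12000):
    """Tabulate F(x; lam) on a grid with the cumulative trapezoid rule."""
    h = (hi - lo) / n
    xs = [lo + i * h for i in range(n + 1)]
    cdf = [0.0]
    prev = ssgl_pdf(xs[0], lam)
    for x in xs[1:]:
        cur = ssgl_pdf(x, lam)
        cdf.append(cdf[-1] + 0.5 * h * (prev + cur))
        prev = cur
    total = cdf[-1]
    return xs, [c / total for c in cdf]       # renormalize so that F(hi) = 1

def solve_quantile(u, xs, cdf):
    """Step 3: solve F(x) - u = 0 by table lookup and linear interpolation."""
    j = bisect.bisect_left(cdf, u)
    j = min(max(j, 1), len(xs) - 1)
    c0, c1 = cdf[j - 1], cdf[j]
    if c1 == c0:
        return xs[j]
    return xs[j - 1] + (u - c0) / (c1 - c0) * (xs[j] - xs[j - 1])

def ssgl_sample(n, mu, sigma, lam, rng=random):
    """Draw n values from the assumed SSGL(mu, sigma, lam) distribution."""
    xs, cdf = build_cdf_table(lam)            # Step 1: parameters are fixed here
    out = []
    for _ in range(n):
        u = rng.random()                      # Step 2: u ~ U(0, 1)
        x = solve_quantile(u, xs, cdf)        # Step 3: x = F^{-1}(u)
        out.append(mu + sigma * x)            # Step 4: y = mu + sigma * x
    return out

if __name__ == "__main__":
    ys = ssgl_sample(1000, mu=10.0, sigma=2.0, lam=2.0, rng=random.Random(1))
    print(sum(ys) / len(ys))
```

Building the table once per parameter setting keeps each draw cheap, which matters when the simulation is repeated many times; a root finder applied to the closed-form cdf would follow the same four steps.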

4. Application to Real Data
The purpose of this section is to demonstrate the usefulness of the SSGL distribution by using two real-world data sets.
Australian Athletes Data: The first data set consists of the heights (in centimeters) of 100 Australian athletes (Telford & Cunningham, 1991) and is popular in the literature, especially in studies of skewed distributions. We employed Azzalini's skew Normal (SN) distribution and the well-known Normal (N) distribution to compare the modeling success of the SSGL distribution. The results are presented in Table 2 along with the maximum likelihood estimates, log-likelihood value (LH), Akaike information criterion (AIC) and Bayes information criterion (BIC) values, and Kolmogorov-Smirnov statistics with associated p-value (KS). The same table also presents the observed values of some statistics together with their theoretical values calculated from the parameter estimates. According to the KS values in Table 2, the goodness of fit could not be rejected for any of the three models. However, based on the LH, AIC, and BIC values, the SSGL distribution fits better than the other two distributions. Considering the values reported in Hasanalipour & Sharafi (2012) and Jamalizadeh, Behboodian, & Balakrishnan (2008), the SSGL distribution is also more successful than the alternatives considered in those studies.
Wind Speed Data: This dataset contains the average wind speeds recorded by the İstanbul Çatalca meteorological observatory (41°10'04.9"N, 28°29'27.1"E) in January 2020 at 2-hour intervals. Examining Table 3, we find that the KS test rejects the goodness of fit of the SN and N distributions, whereas the fit of the SSGL distribution is not rejected. AIC and BIC values also show that the SSGL distribution provides a better fit. The same holds for the first quartile (Q1), median, and third quartile (Q3). When we examine Figure 3 and Figure 4, it is seen that SSGL fits better than SN over most of the empirical distribution. At the end of the right tail, however, the SN distribution provides a better fit than the SSGL distribution. This explains why the skewness of the data is better predicted by SN.

5. Conclusion
In this study, we derived a new skew-symmetric model called SSGL to model skewed data. The closed-form pdf and cdf of the resulting distribution were obtained. Furthermore, key statistical properties of the distribution, such as raw moments, skewness and kurtosis coefficients, and Shannon entropy, have been investigated. In addition, maximum likelihood estimators for the unknown parameters of the new distribution were studied, and their performance was evaluated through a series of Monte Carlo simulation studies. Given the information obtained from the simulation study, it can be said that all obtained estimators of the SSGL parameters are consistent and asymptotically unbiased. Finally, the usability of the derived distribution has been illustrated by applications to two real-world datasets. In both samples, the SSGL distribution provides a better fit than Azzalini's SN distribution. As a result of this study, it was concluded that the SSGL distribution is a suitable alternative for modeling skewed data, especially with the help of computer programs.
Figure 2 illustrates the pdf of the location-scale extended SSGL distribution for different parameter values.

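Table 1 reports the mean absolute bias (Bias) and mean squared error (MSE) values obtained from simulations repeated $N = 1000$ times for the sample sizes n = 30, 50, 100 and 1000. The formulas used for computing Bias and MSE are
$$\mathrm{Bias} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{\theta}_i - \theta\right|, \qquad \mathrm{MSE} = \frac{1}{N}\sum_{i=1}^{N}\left(\hat{\theta}_i - \theta\right)^2,$$
where $\theta$ represents the true parameter value and $\hat{\theta}_i$ is its maximum likelihood estimate in the $i$-th replication.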
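As a small illustration of how these two summaries can be computed from the replicated estimates, the sketch below takes Bias to be the mean absolute deviation of the estimates from the true value and MSE to be the mean squared deviation. Both definitions are assumptions based on the wording "mean absolute bias" and the usual definition of MSE, and the function names are hypothetical.

```python
def mean_abs_bias(estimates, theta):
    """Mean absolute deviation of the replicated estimates from the true value."""
    return sum(abs(e - theta) for e in estimates) / len(estimates)

def mean_squared_error(estimates, theta):
    """Mean squared deviation of the replicated estimates from the true value."""
    return sum((e - theta) ** 2 for e in estimates) / len(estimates)

if __name__ == "__main__":
    lam_hats = [1.8, 2.3, 2.1, 1.9]           # made-up estimates of lambda = 2
    print(mean_abs_bias(lam_hats, 2.0), mean_squared_error(lam_hats, 2.0))
```

In a full simulation, `estimates` would hold the 1000 ML estimates of one parameter for a given sample size, and the two functions would be applied separately to $\hat{\mu}$, $\hat{\sigma}$ and $\hat{\lambda}$.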

Figure 3. Histogram of wind speed data with fitted densities (left), empirical cdf and fitted cdf (right).

Table 1. Monte-Carlo simulation results.
As seen in Table 1, we conducted our simulations with high and low skewness and sigma values. As the sample size increases, both the Bias and MSE values decrease for all parameter values. This indicates that the estimates become more accurate and more precise in larger samples. Because ML estimators are consistent and asymptotically unbiased, this is an expected result.

Table 2. Summary of fits for Australian athletes data set.

Table 3. Summary of fits for wind speed data set.

Table 3 also includes some statistics of the wind speed data. If the theoretical values of these statistics calculated with the estimated parameters for the SSGL and SN distributions are examined, one sees that the mean and standard deviation of wind speed are more accurately estimated by SSGL.