An Application of the Generalized Poisson Model for Over Dispersion Data on The Number of Strikes Between 1984 and 2017

Poisson regression analysis is widely used in studies involving count data. It is based on the assumption that the mean and variance are equal. However, this assumption is rarely satisfied in practice. When the assumption is not satisfied, over dispersion or under dispersion occurs. Over dispersion occurs when the variance of the dependent variable is greater than its mean. This causes the standard errors to be underestimated. The generalized Poisson regression model is one of the methods used in case of over dispersion. This model is a generalization of Poisson regression. In this study, Poisson regression and generalized Poisson regression methods were used to model count data on the determinants of strikes between 1984 and 2017. According to the empirical results, while all explanatory variables were significant in the Poisson regression model, the unemployment rate was found to be insignificant in the generalized Poisson regression model. This result was evaluated considering the structure of the data.


Introduction
Regression models are the most common methods used to model the relationship between dependent and independent variables. Depending on whether the dependent variable is discrete or continuous, different regression models are used in practice. In many applications, the dependent variable is count data consisting of nonnegative integers. Examples of count data are the number of accidents on natural gas pipes, the number of delays of airline companies, the number of party changes of deputies during the legislative year, the number of strikes per year in a country, and the number of traffic or occupational accidents occurring in a day. In such cases, the application of classical regression analysis will cause the estimated coefficients to be biased (King, 1988).
The most common regression model used for count data is the Poisson regression (PR) model. The Poisson regression model, in which the dependent variable is count data, is derived from the Poisson distribution. The Poisson regression model is suitable for data showing equal dispersion. Equal dispersion means that the expected value and variance of the dependent variable are equal. This is rarely the case.
If the variance in the Poisson model is greater than the mean, that is, there is more variability than expected, this is called over dispersion. If the data are over dispersed and parameter estimates are made without taking this into account, the standard errors of the parameter estimates will be underestimated, resulting in errors in the selection of explanatory variables for the model.
Over dispersion is usually caused by one of two conditions. First, in almost all studies, some explanatory variables that are not correlated with the variables included in the analysis are left out of the model. The second is dependence between observations. One assumption of the Poisson distribution is that the observations are independent of each other (King, 1989; Osgood, 2000).
One way of taking over dispersion into account is to derive a probability distribution with more dispersion than the Poisson distribution. In a Poisson process, assuming that the Poisson parameter is gamma distributed, a negative binomial distribution is obtained, and the resulting distribution is over dispersed relative to the Poisson. Joe and Zhu (2005) showed that the generalized Poisson (GP) distribution can be considered as a Poisson mixture and is therefore an alternative to the negative binomial (NB) distribution. Like NB, the GP distribution has a scale parameter.
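As a small numerical illustration of the gamma mixing argument, the sketch below evaluates the negative binomial probability function and checks that its variance exceeds its mean. The parameterization by mean `mu` and gamma shape `r`, the function name `nb_pmf`, and the values 3 and 2 are assumptions chosen for illustration; they are not taken from the strike data.

```python
import math

def nb_pmf(y, mu, r):
    """Negative binomial probability of count y with mean mu and gamma shape r."""
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)

mu, r = 3.0, 2.0            # illustrative values, not estimates from the paper
support = range(0, 400)     # truncation point; the probability beyond it is negligible
probs = [nb_pmf(y, mu, r) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))

print(f"mean = {mean:.4f}, variance = {var:.4f}")
print(f"variance / mean = {var / mean:.4f}")
```

For a Poisson variable the ratio of variance to mean is exactly 1, whereas under this parameterization the mixture has variance mu + mu^2 / r, so the ratio is 1 + mu / r.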
The aim of this study is to show that the generalized Poisson regression model can be used when the data are over dispersed. In the following sections, the Poisson regression and generalized Poisson regression models are explained through an application to the number of strikes.

Poisson Regression Analysis
The Poisson regression method belongs to the family of generalized linear models (Hoffman, 2004; Agresti, 1996). These models extend the scope of ordinary linear regression in two ways. First, they define transformations of the conditional mean of the dependent variable, rather than the mean itself, as linear functions of the explanatory variables; second, they allow the dependent variable to have conditional distributions other than the normal. Poisson regression is similar to the multiple regression method, except that the dependent variable is an observed count that follows a Poisson distribution. Accordingly, the possible values of the dependent variable are nonnegative integers such as 0, 1, 2, 3.
Alphanumeric Journal Volume 8, Issue 1, 2020
In some cases, the distribution of the dependent variable is skewed. The frequencies peak at the lowest value and drop sharply towards the top of the scale. Variables with such right-skewed distributions can be described by the Poisson distribution from the family of discrete distributions (Moksony and Hegedus, 2014).

Let $(x_i, y_i)$, $i = 1, 2, \ldots, n$, be observations from a data set. Here, $x_i$ and $y_i$ are, respectively, a vector of explanatory variables and the dependent variable. Poisson regression analysis assumes that the dependent variable follows the Poisson distribution. The probability function of the Poisson distribution with parameter $\mu$ is given in Equation 1.
$$P(Y = y) = \frac{e^{-\mu}\mu^{y}}{y!}, \quad y = 0, 1, 2, \ldots \tag{1}$$
In this expression, $y$ is the number of events occurring, and $\mu$ is the rate of occurrence of events per unit of time. In other words, $\mu$ gives the mean of the distribution. The probability here changes as a function of $\mu$. The Poisson probability distribution is skewed to the right, but as $\mu$ grows, the distribution approaches the normal distribution. Figure 1 shows how the distribution varies for different values of $\mu$. The most significant feature of the Poisson regression model is that the mean and variance are equal. Over or under dispersed data sets cannot be modeled adequately by the Poisson distribution, because the assumption that the conditional expected value equals the variance is not satisfied. In this case, updating the data set or starting the analysis with different methods may be a solution.
The expected value and variance of $y_i$ are given in Equation 2.
$$E(y_i \mid x_i) = \operatorname{Var}(y_i \mid x_i) = \mu_i \tag{2}$$
In order to ensure that the expected value of $y_i$ does not take negative values, the link function showing the relationship between the expected value and the independent variables must be in the form given in Equation 3 (Cameron and Trivedi, 1998).
$$\mu_i = \exp(x_i'\beta) \tag{3}$$
In this equation, $\mu_i$ is an exponential function of the explanatory variables. $x_i'\beta$ is as given in Equation 4.
$$x_i'\beta = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \tag{4}$$
where $\beta_0, \beta_1, \ldots, \beta_k$ represent the unknown parameter vector. In Poisson regression analysis, there are several methods for calculating estimators based on the distribution of the dependent variable $y_i$. The most commonly used and best known of these are the maximum likelihood (MLE) method, the pseudo maximum likelihood (PMLE) method and generalized linear models (GLM). MLE is the most commonly used technique for regression models. In the likelihood method, the Newton-Raphson iteration technique is generally used. When an observation set is given, the likelihood function of the Poisson regression model is as follows:
$$L(\beta) = \prod_{i=1}^{n} \frac{e^{-\mu_i}\mu_i^{y_i}}{y_i!} \tag{5}$$
When the logarithm of this function is taken, Equation 6 is obtained.
$$\ln L(\beta) = \sum_{i=1}^{n} \left[ y_i x_i'\beta - \exp(x_i'\beta) - \ln(y_i!) \right] \tag{6}$$
Accordingly, the Poisson MLE value is calculated from the expression in Equation 7.
$$\sum_{i=1}^{n} \left( y_i - \exp(x_i'\beta) \right) x_i = 0 \tag{7}$$
Famoye (1993) derived the generalized Poisson regression (GPR) model from the generalized Poisson distribution introduced by Consul and Jain (1973). These distributions can handle count data that are under dispersed, over dispersed or equally dispersed. The most widely used regression model for count data sets is the Poisson regression model. The most prominent feature of the Poisson model is that it is equally dispersed. In applications, however, the data sets generally have a variance that exceeds the mean. Therefore, they show over dispersion.
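Returning briefly to the estimation step, the Newton-Raphson iteration mentioned above for the Poisson maximum likelihood estimates can be sketched as follows. This is a minimal illustration for one explanatory variable plus an intercept, assuming the log link; the function name `fit_poisson` and the toy data `x` and `y` are hypothetical and are not the strike data or the authors' implementation.

```python
import math

def fit_poisson(x, y, n_iter=25, tol=1e-10):
    """Fit log(mu_i) = b0 + b1 * x_i by Newton-Raphson on the Poisson log likelihood."""
    b0, b1 = math.log(sum(y) / len(y)), 0.0   # start from the intercept-only fit
    for _ in range(n_iter):
        mu = [math.exp(b0 + b1 * xi) for xi in x]
        # Score vector: derivative of the log likelihood with respect to (b0, b1)
        g0 = sum(yi - mi for yi, mi in zip(y, mu))
        g1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu))
        # Information matrix (negative Hessian), a symmetric 2 x 2 matrix
        h00 = sum(mu)
        h01 = sum(mi * xi for xi, mi in zip(x, mu))
        h11 = sum(mi * xi * xi for xi, mi in zip(x, mu))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2 x 2 linear system (information) * step = score
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0, b1 = b0 + d0, b1 + d1
        if abs(d0) + abs(d1) < tol:
            break
    return b0, b1

# Toy data for illustration only
x = [0, 1, 2, 3, 4, 5]
y = [1, 2, 2, 5, 7, 12]

b0, b1 = fit_poisson(x, y)
mu_hat = [math.exp(b0 + b1 * xi) for xi in x]
# At the maximum, the likelihood equations (the score) should be numerically zero
score0 = sum(yi - mi for yi, mi in zip(y, mu_hat))
score1 = sum((yi - mi) * xi for xi, yi, mi in zip(x, y, mu_hat))
print(f"b0 = {b0:.4f}, b1 = {b1:.4f}")
print(f"score at solution = ({score0:.2e}, {score1:.2e})")
```

Checking that the score vanishes at the returned estimates is a simple way to confirm that the iteration has solved the likelihood equations.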

Generalized Poisson Regression Analysis
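Over dispersion of data is caused by the fact that the number of zero values observed exceeds the number of zeros predicted by the Poisson model, and by unobserved heterogeneity (Kibar, 2008). Over dispersion in the model does not affect the coefficient estimates, but it does affect their standard errors. The probability function of the generalized Poisson distribution is given in Equation 8.
$$P(Y = y) = \frac{\theta(\theta + \lambda y)^{y-1} e^{-\theta - \lambda y}}{y!}, \quad y = 0, 1, 2, \ldots \tag{8}$$
where $\theta > 0$ and $\max(-1, -\theta/4) < \lambda < 1$. Also, the mean and variance of the generalized Poisson distribution are given in Equations 9 and 10.
$$E(Y) = \frac{\theta}{1 - \lambda} \tag{9}$$
$$\operatorname{Var}(Y) = \frac{\theta}{(1 - \lambda)^{3}} = \frac{1}{(1 - \lambda)^{2}} E(Y) = \phi E(Y) \tag{10}$$
Specifically, the term $\phi = 1/(1 - \lambda)^{2}$ plays the role of a dispersion factor. It is clear that the generalized Poisson distribution for $\lambda = 0$ is the usual Poisson distribution with parameter $\theta$. When $\lambda < 0$, under dispersion occurs, while when $\lambda > 0$, over dispersion occurs (Yang et al., 2009). When there is over dispersion, the standard errors are underestimated and the regression parameters may be misinterpreted. Based on the GP distribution, the explanatory variables are combined in the regression model with the help of a log-link function as in Equation 11.
$$\ln\left(\frac{\theta_i}{1 - \lambda}\right) = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_k x_{ik} \tag{11}$$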
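The role of the dispersion factor can be checked numerically. The sketch below assumes the generalized Poisson probability function in the form P(Y = y) = theta * (theta + lam * y)^(y - 1) * exp(-theta - lam * y) / y!, with mean theta / (1 - lam) and variance theta / (1 - lam)^3. The function name `gp_pmf` and the values theta = 2 and lam = 0.3 are illustrative assumptions, not estimates from the strike data.

```python
import math

def gp_pmf(y, theta, lam):
    """Generalized Poisson probability of count y; used here only with 0 <= lam < 1."""
    log_p = (math.log(theta) + (y - 1) * math.log(theta + lam * y)
             - theta - lam * y - math.lgamma(y + 1))
    return math.exp(log_p)

theta, lam = 2.0, 0.3       # illustrative values; lam > 0 means over dispersion
support = range(0, 500)     # truncation point; the probability beyond it is negligible
probs = [gp_pmf(y, theta, lam) for y in support]

mean = sum(y * p for y, p in zip(support, probs))
var = sum((y - mean) ** 2 * p for y, p in zip(support, probs))
phi = 1.0 / (1.0 - lam) ** 2   # dispersion factor

print(f"mean = {mean:.4f}  (theta / (1 - lam) = {theta / (1 - lam):.4f})")
print(f"variance = {var:.4f}  (phi * mean = {phi * mean:.4f})")
```

Setting `lam` to 0 in `gp_pmf` gives back the ordinary Poisson probabilities with parameter `theta`, which matches the special case described in the text.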

Testing the Goodness of Fit of the Model
In linear regression models, the extent to which the regression line fits the data is called the goodness of fit of the regression line fitted to a data set (Gujarati, 1999). After estimating the parameters, the scatter of the observations around the fitted model should be measured, because the closer the observations are to the fitted model, the better the goodness of fit. In other words, the change in the dependent variable is then better explained by changes in the explanatory variables (Koutsoyiannis, 1989). In testing the goodness of fit of the Poisson regression model, the Pearson statistic $\chi^2$, the deviance statistic $G^2$, the pseudo $R^2$ measure, the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC) are the commonly used criteria.

Pearson Statistics
Pearson's statistic, which is frequently used to determine whether there is over dispersion in the series, is one of the basic goodness-of-fit criteria. The Pearson statistic for a model with mean $\mu_i$ and variance $\sigma_i^2$ is given in Equation 12.
$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{\sigma_i^2} \tag{12}$$
This value is used to determine whether the series is over dispersed. When the Pearson statistic is applied to Poisson regression, $\sigma_i^2 = \mu_i$ as a natural consequence of the Poisson distribution, and the formula takes the form of Equation 13.
$$\chi^2 = \sum_{i=1}^{n} \frac{(y_i - \mu_i)^2}{\mu_i} \tag{13}$$
If the ratio of the calculated $\chi^2$ to the degrees of freedom is more than 1, it indicates that the data are not suitable for the model and that over dispersion is present. The calculated $\chi^2$ value is likewise compared with the value $(n - k)$: if $\chi^2 > n - k$, the series is said to be over dispersed; if $\chi^2 < n - k$, the series is said to be under dispersed (Deniz, 2005).
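The comparison of the Pearson statistic with its degrees of freedom can be sketched in a few lines. The example assumes the Poisson form of the statistic, the sum of (y - mu)^2 / mu, and an intercept-only model whose fitted mean is the sample mean. The toy counts in `y` and the names `pearson_chi2` and `n_params` are hypothetical; `n_params` stands for the quantity written as k in the text, under the assumption that it counts the estimated parameters.

```python
def pearson_chi2(y, mu):
    """Pearson statistic for a Poisson model: sum of (y - mu)^2 / mu."""
    return sum((yi - mi) ** 2 / mi for yi, mi in zip(y, mu))

# Toy counts for illustration only (not the strike series)
y = [1, 3, 8, 0, 12, 6]
# Intercept-only Poisson fit: every fitted mean equals the sample mean
mu = [sum(y) / len(y)] * len(y)
n_params = 1                      # assumption: only the intercept is estimated

chi2 = pearson_chi2(y, mu)
df = len(y) - n_params            # degrees of freedom, n - k
ratio = chi2 / df

print(f"Pearson chi2 = {chi2:.2f}, df = {df}, ratio = {ratio:.2f}")
print("over dispersed" if ratio > 1 else "not over dispersed")
```

A ratio well above 1, as in this toy series, is the pattern that motivates moving from the Poisson model to a model with a dispersion parameter.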

Deviance Statistics
One of the techniques used to measure goodness of fit is the deviance statistic. This statistic is also called the 'G square statistic'. The deviance statistic is expressed by Equation 14.
$$G^2 = 2 \sum_{i=1}^{n} \left[ y_i \ln\left(\frac{y_i}{\mu_i}\right) - (y_i - \mu_i) \right] \tag{14}$$
The closer this statistic is to 0, the better the model fit. If the statistic is equal to 0, 'model fit is perfect'.
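A short sketch of the computation, assuming the usual Poisson form of the deviance, G2 = 2 * sum of [y * ln(y / mu) - (y - mu)], with the term y * ln(y / mu) taken as 0 when y = 0. The function name `poisson_deviance` and the toy counts are hypothetical and are not the strike data.

```python
import math

def poisson_deviance(y, mu):
    """Poisson deviance: 2 * sum of [y * ln(y / mu) - (y - mu)]."""
    total = 0.0
    for yi, mi in zip(y, mu):
        # y * ln(y / mu) is set to 0 when y == 0, which is its limiting value
        log_term = yi * math.log(yi / mi) if yi > 0 else 0.0
        total += log_term - (yi - mi)
    return 2.0 * total

# Toy counts for illustration only
y = [1, 3, 8, 0, 12, 6]
mu = [sum(y) / len(y)] * len(y)   # intercept-only fit: fitted mean = sample mean

g2 = poisson_deviance(y, mu)
g2_perfect = poisson_deviance([1, 3, 8], [1.0, 3.0, 8.0])  # fitted values equal to the data

print(f"deviance of intercept-only fit = {g2:.4f}")
print(f"deviance when fitted values equal the data = {g2_perfect}")
```

The second call shows the limiting case described in the text: when every fitted value coincides with its observation, the deviance is 0.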

Akaike Information Criterion (AIC)
The Akaike information criterion is used to select the most appropriate model among different models. Among the available models, the model with the lowest AIC value calculated by Equation 15 is selected as the appropriate model.
$$AIC = -2\mathcal{L} + 2k \tag{15}$$
In this equation, $\mathcal{L}$ represents the maximum value of the log likelihood function and $k$ represents the number of independent variables. Where the sample size $n$ is small relative to the number of parameters, the AICc recommended by Hurvich and Tsai should be used instead of the AIC. This value is expressed by Equation 16 (Akaike, 1973; Sugiura, 1978; Hurvich and Tsai, 1989).
$$AIC_c = AIC + \frac{2k(k + 1)}{n - k - 1} \tag{16}$$
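The two criteria can be computed directly from a maximized log likelihood. The sketch below assumes the standard forms AIC = -2 * loglik + 2 * k and AICc = AIC + 2k(k + 1) / (n - k - 1). The log likelihood value and `k = 5` are hypothetical; `n = 34` corresponds to one observation per year for 1984 to 2017.

```python
def aic(loglik, k):
    """Akaike information criterion from a maximized log likelihood."""
    return -2.0 * loglik + 2.0 * k

def aicc(loglik, k, n):
    """Small-sample corrected AIC (requires n > k + 1)."""
    return aic(loglik, k) + 2.0 * k * (k + 1) / (n - k - 1)

loglik = -120.0   # hypothetical value, not a result from the paper
k = 5             # hypothetical parameter count
n = 34            # yearly observations, 1984 to 2017

aic_value = aic(loglik, k)
aicc_value = aicc(loglik, k, n)
print(f"AIC = {aic_value:.3f}, AICc = {aicc_value:.3f}")
```

With a sample of this size the correction term is not negligible, which is why the corrected criterion is worth reporting alongside the AIC for short annual series.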

Bayesian Information Criterion (BIC)
The Bayesian information criterion, like the Akaike information criterion, is one of the methods used to measure the fit between the data and the model. The Bayesian information criterion is expressed in Equation 17.
$$BIC = -2\mathcal{L} + k \ln(n) \tag{17}$$
where $\mathcal{L}$ is the maximum value of the log likelihood function, $k$ is the number of parameters and $n$ is the sample size. Akaike derived the BIC model selection criterion for model selection problems in linear regression (McQuarrie and Tsai, 1998). As with the Akaike information criterion, the model with the smallest BIC value among the candidate models is selected as the appropriate model.
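Model selection by BIC can be sketched as follows, assuming the standard form BIC = -2 * loglik + k * ln(n). The log likelihood values and parameter counts in `candidates` are made up for illustration and are not the values reported in Tables 2 and 3; `n = 34` corresponds to one observation per year for 1984 to 2017.

```python
import math

def bic(loglik, k, n):
    """Bayesian information criterion from a maximized log likelihood."""
    return -2.0 * loglik + k * math.log(n)

n = 34  # yearly observations, 1984 to 2017
# Hypothetical (log likelihood, number of parameters) pairs, for illustration only
candidates = {
    "Poisson": (-150.0, 5),
    "generalized Poisson": (-120.0, 6),
}

scores = {name: bic(ll, k, n) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)   # the smallest BIC is preferred

for name, value in scores.items():
    print(f"{name}: BIC = {value:.3f}")
print(f"selected model: {best}")
```

The extra dispersion parameter costs ln(n) in the penalty term, so the richer model is selected only when its log likelihood improves by more than half of that amount.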

Results
In this study, the strikes carried out in Turkey between 1984 and 2017 and the determinants of these strikes are analysed by the Poisson regression method. Similar studies have been done before with different variables. Şahin (2002) used the Poisson regression model to test the hypothesis that the number of strikes occurring in a year during 1964-1998 is affected by the unemployment rate, the unionization rate, the rate of change of national income per employee and a dummy variable. Sezgin and Deniz (2004) also studied the factors affecting the number of strikes in Turkey.
The series of strikes for the period 1984-2017 is given in Figure 2 to show the course of the strikes over the years. According to Figure 2, the number of strikes, which increased between 1984 and 1987, experienced a sudden decline in 1988. The number of strikes then started to increase again and peaked in 1990. The numbers have decreased since then. The number of strikes has followed a relatively stable course since 1992 (except 1995). As a result of post-1980 politics, unions in Turkey began to lose their effectiveness. The 1982 constitution imposed restrictions on strike practices (Sezgin and Deniz, 2004). With the 1982 Constitution, the postponement of strikes gained constitutional status. Strike postponements mainly concerned the public sector until 1995, and the private sector gained weight in the 2000s. The highest number of strike postponements justified on national security grounds took place during the 1991 Gulf Crisis and in 1995 (Tokol, 2016).
According to the results of the analysis, the regression coefficients of the Poisson regression and generalized Poisson methods, their standard errors and the significance test statistics of the related coefficients are given in Table 2 and Table 3.
As can be seen in Table 2, the maximum likelihood coefficients are statistically significant at the 5% level. The signs of the statistically significant coefficients show whether the related variable affects the occurrence of strikes positively or negatively. Accordingly, the unemployment rate and inflation rate positively affect the likelihood of a strike occurring in one year. The unionization rate and the gross national product per capita negatively affect the likelihood of a strike. As the data were found to be over dispersed, the analysis was continued with the generalized Poisson regression model. Table 3 shows the results of the generalized Poisson regression analysis.

According to these results, all variables other than the unemployment rate were found to be significant at the 5% level. When the coefficients are examined, it is seen that their signs are the same as in the Poisson regression analysis, but their values differ. The unemployment rate and inflation rate positively affect the likelihood of a strike occurring in one year. The likelihood of a strike is negatively affected by the unionization rate and the gross national product per capita. When the information given in Tables 2 and 3 is examined, it is observed that all variables make a significant contribution in the Poisson regression model; however, the unemployment rate is not significant in the generalized Poisson regression model. In the study by Şahin (2002), it was observed that the rate of change of national income affected the number of strikes negatively but not statistically significantly. Similarly, it was concluded that the unionization rate had no effect on the number of strikes. In the study by Sezgin and Deniz (2004) on the factors affecting the number of strikes in Turkey, the unionization rate was found to have no effect on strikes, whereas the unemployment rate, the rate of change in national income per employee, and the dummy variable indicating the pre-1980 and post-1980 periods were found to have significant effects. When the AIC and BIC criteria of both models were examined, it was observed that the generalized Poisson regression model had lower values. Therefore, it can be concluded that the generalized Poisson regression model is statistically better for these data.

Discussion and Suggestions
This study includes an application of Poisson regression and generalized Poisson regression models in case of over dispersion. One important feature of the Poisson regression model is that the mean and variance have the same value. If the data do not satisfy this condition, the model may be misspecified. In the Poisson regression model, over dispersion occurs when the variance of the dependent variable is greater than its mean. This causes the standard errors to be underestimated.
The generalized Poisson regression model is used in the modelling of count data in order to solve the over dispersion problem. Another way of dealing with over dispersion is to use a negative binomial model, a parametric model that is more dispersed than the Poisson. However, it is also possible to use quasi maximum likelihood estimators, even if the distribution of the dependent variable is misspecified.
In this study, the determinants of strike numbers between 1984 and 2017 were analysed by the Poisson regression model. As the data were found to be over dispersed, the analysis was continued with the generalized Poisson regression model. At the end of the study, the empirical results of both models were compared. According to these results, while the unemployment rate was found to be significant in the Poisson regression model, it was found to be insignificant in the generalized Poisson regression model. This difference is thought to be due to the over dispersion of the data. Although the results of the Poisson regression model seem significant, the presence of over dispersion should not be ignored. Choosing the appropriate model for the structure of the data will increase the reliability and predictive power of the research results.