AIR TEMPERATURE ESTIMATION FOR BATMAN CITY WITH SIMPLE AND MULTI-LINEAR REGRESSION MODELS UTILIZING METEOROLOGICAL PARAMETERS

Determining air temperature plays a significant role in numerous activities such as agriculture, animal husbandry, industry, and highway, airline and railway transportation. In this study, monthly averages of 67 meteorological parameters that affect temperature, covering 2012 to 2017, were obtained from the Batman Provincial Directorate of Meteorology, and the monthly average air temperature of 2017 was estimated using the meteorological data from 2012-2016. The estimation was carried out under two separate scenarios. In the first scenario, individual parameters such as monthly average soil temperature, pressure, water vapour pressure, wind speed and relative humidity were each used separately as the input of a simple linear regression model to estimate the monthly average temperature. In the second scenario, all 67 parameters were used together as inputs of a multiple linear regression model. Across the models, root mean square error (RMSE) values ranged from 3.30 to 10.55, while coefficient of determination (R²) values ranged from 0.10 to 0.99.


Introduction
Air temperature (AT) is one of the main meteorological observations and plays an important role in a wide range of applications such as climate change studies, terrestrial hydrology and atmospheric sciences [1][2]. Moreover, it strongly influences the growth and development of plants. Each plant needs an optimum temperature to grow, and plants are adversely affected by extreme temperatures [3]. Thus, the first heat-based region mapping studies were carried out in the USA, using data from 4745 meteorological measurement centers for the years 1974-1995 [4].
AT is measured with high precision in thermometer shelters placed approximately 2 meters above the ground at meteorological stations. Nevertheless, the distribution of meteorological stations is generally sparse and uneven, particularly in sparsely populated zones [5]. To fill this gap, meteorological satellites can provide uninterrupted data for atmospheric and surface observations over wide areas [6].
Many studies in the literature have addressed AT estimation using meteorological data. These studies can be classified into two main groups: simple and advanced statistical approaches [6][7][8][9][10] and machine learning approaches [11][12][13]. Since simple and multiple linear regression models are used for AT estimation in this study, it falls within the first group.
Among the first-group studies, Bechtel et al. proposed an approach based on empirical models that use time series of land surface temperature over limited time intervals to estimate air temperature [14]. Yang et al. built multiple linear regression models for estimating maximum, minimum and mean air temperatures in Northeast China using MODIS land surface temperature data [15]. Good proposed a dynamic multiple linear regression model to estimate daily minimum and maximum land air temperatures using in situ observations at meteorological stations [16]. Zhang et al. applied multiple linear regression and partial least squares regression to estimate daily air temperatures over the Tibetan Plateau from MODIS LST data [17]. Xu et al. implemented two statistical methods for estimating daily maximum air temperature in British Columbia from MODIS data [18].
In this study, the following steps were carried out. First, 67 meteorological parameters affecting the monthly average temperature between 2012 and 2017 were obtained from the Batman Provincial Directorate of Meteorology. Second, using the meteorological data from 2012-2016, the monthly average air temperature of Batman province for 2017 was estimated. The estimation was performed in two stages. In the first stage, the most important parameters, such as monthly average soil temperature, pressure, water vapour pressure, wind speed and relative humidity, were each used separately as the input of a simple linear regression model to estimate the monthly average temperature of Batman city for 2017. In the second stage, all 67 parameters were used together as inputs of a multiple linear regression model to estimate the monthly average temperature.

Data Collection
The data employed in this study were obtained from the Batman Provincial Directorate of Meteorology. They comprise 67 meteorological parameters for each month of the years 2012-2017. The list of the 67 parameters is given below.

Regression Analysis
Regression expresses a dependent variable (or variables) as a function of one or more independent variables. Regression analysis reveals whether a relationship exists between the variables. By expressing that relationship as an equation, unknown values of one variable can be estimated from known values of the others. Depending on the number of independent variables, regression is classified as simple or multiple; depending on the form of the function used, it is classified as linear or non-linear [19]. In this study, simple linear and multiple linear regression models are used.

Simple Linear Regression Analysis
In simple linear regression (SLR) analysis, the relationship between one independent variable $x$ and the dependent variable $y$ is expressed by a linear function, as shown in Equation 1.
$$y = b_0 + b_1 x + \varepsilon \tag{1}$$
Here, $b_0$ and $b_1$ are unknown constants that represent the intercept and the slope, respectively; they are also called regression coefficients. $\varepsilon$ represents the error term. SLR estimates the unknown value of a normally distributed numerical variable from another normally distributed numerical variable to which it is related [18][19].
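The least squares estimates of the intercept and slope in Equation 1 have a simple closed form. The following is a minimal sketch, assuming NumPy is available; the function name `fit_slr` and the sample values are hypothetical and are not taken from the study's data.

```python
import numpy as np

def fit_slr(x, y):
    """Least-squares estimates of intercept b0 and slope b1 for y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x.
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point of means.
    b0 = y_mean - b1 * x_mean
    return b0, b1

# Hypothetical monthly values (not from the study), in degrees Celsius.
soil_temp = np.array([4.0, 6.5, 11.0, 16.5, 22.0, 28.5])
air_temp = 1.0 + 0.9 * soil_temp  # points lie exactly on a line, no error term

b0, b1 = fit_slr(soil_temp, air_temp)
predicted = b0 + b1 * 30.0  # estimated air temperature for a soil temperature of 30
print(f"b0={b0:.3f}, b1={b1:.3f}, predicted={predicted:.3f}")
```

Because the sample points lie exactly on a line, the fit recovers the intercept and slope used to generate them; with real measurements the error term would be non-zero.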

Multiple Linear Regression Analysis
In multiple linear regression (MLR), the relationship between multiple independent variables ($x_1, x_2, \dots, x_n$) and one dependent variable $y$ is modelled. The relationship between each independent variable and the dependent variable is given by Equation 2.
$$y = b_0 + b_1 x_1 + b_2 x_2 + b_3 x_3 + \dots + b_n x_n \tag{2}$$
Thus, instead of a single slope coefficient $b$, $n$ regression coefficients are estimated. In both simple and multiple regression, the coefficients are obtained with the least squares method. MLR can also be fitted in other ways, such as least absolute deviations regression, or by minimizing a penalized version of the least squares cost function, as in lasso (L1-norm penalty) and ridge regression (L2-norm penalty) [20][21][22].
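As an illustration of the least squares and ridge fits mentioned above, the following sketch fits a multiple linear regression with NumPy. It is an assumption-laden example, not the implementation used in the study: the function name `fit_mlr`, the parameter `ridge_lambda` and the synthetic data are all hypothetical.

```python
import numpy as np

def fit_mlr(X, y, ridge_lambda=0.0):
    """Return [b0, b1, ..., bn] for y = b0 + b1*x1 + ... + bn*xn.

    ridge_lambda = 0 gives ordinary least squares; a positive value adds an
    L2-norm penalty on the slopes (the intercept is left unpenalized).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])  # column of ones for the intercept
    if ridge_lambda == 0.0:
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        return coef
    penalty = ridge_lambda * np.eye(A.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    return np.linalg.solve(A.T @ A + penalty, A.T @ y)

# Synthetic data (hypothetical): 60 observations of 3 predictors, no noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 3))
y = 2.0 + X @ np.array([1.5, -0.5, 0.25])

coef_ols = fit_mlr(X, y)
coef_ridge = fit_mlr(X, y, ridge_lambda=10.0)
print("OLS:  ", np.round(coef_ols, 3))
print("Ridge:", np.round(coef_ridge, 3))
```

The ridge slopes are shrunk toward zero relative to the ordinary least squares slopes, which is the effect of the L2-norm penalty.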

Block Diagram of the Proposed Approach
The architecture of the proposed system is shown in Fig. 1. In the first phase, selected important parameters were each used separately as the input of a simple linear regression model. In the second phase, all parameters were used together as inputs of a multiple linear regression model to estimate the monthly average temperature.
Here, y, yi and m denote the actual value, the estimated value and the number of observations, respectively.
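For reference, the two evaluation metrics reported in this study are commonly defined as follows. This is a sketch that assumes the standard forms of RMSE and R²; the notation is chosen here for illustration, with $y_i$ the actual value, $\hat{y}_i$ the estimated value, $\bar{y}$ the mean of the actual values and $m$ the number of observations.

$$\mathrm{RMSE} = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{m}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{m}\left(y_i - \bar{y}\right)^2}$$

A lower RMSE and an R² closer to 1 both indicate estimates that are closer to the actual values.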

Results
In this section, the parameters affecting air temperature between 2012 and 2016, obtained from the Batman Provincial Directorate of Meteorology, were used individually and collectively as the training set of the regression models. The monthly average air temperature (AT) of Batman city in 2017 was then estimated with the trained models.
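The train-on-2012-2016, estimate-2017 procedure can be sketched as follows. This is only an illustration under stated assumptions: the data are synthetic, only two stand-in parameters are generated instead of the 67 used in the study, and all names (`fit_predict`, `features`, `results`) are hypothetical.

```python
import numpy as np

def rmse(actual, estimated):
    return float(np.sqrt(np.mean((actual - estimated) ** 2)))

def r_squared(actual, estimated):
    ss_res = np.sum((actual - estimated) ** 2)
    ss_tot = np.sum((actual - actual.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)

def fit_predict(X_train, y_train, X_test):
    """Fit a linear model with intercept by least squares, then predict."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

# Synthetic stand-in for the station records: 72 months, 2012-2017.
rng = np.random.default_rng(42)
years = np.repeat(np.arange(2012, 2018), 12)
month = np.tile(np.arange(12), 6)
air_temp = 17.0 + 13.0 * np.sin(2 * np.pi * (month - 3) / 12) + rng.normal(0, 0.5, 72)
features = {
    "soil_temperature": air_temp + rng.normal(0, 1.0, 72),  # closely tracks AT
    "wind_speed": rng.normal(2.0, 0.5, 72),                 # unrelated to AT here
}

train = years <= 2016  # 60 training months
test = years == 2017   # 12 months to estimate

# Scenario 1: one simple linear regression per parameter.
results = {}
for name, values in features.items():
    X = values.reshape(-1, 1)
    est = fit_predict(X[train], air_temp[train], X[test])
    results[name] = (r_squared(air_temp[test], est), rmse(air_temp[test], est))

# Scenario 2: multiple linear regression on all parameters together.
X_all = np.column_stack(list(features.values()))
est_all = fit_predict(X_all[train], air_temp[train], X_all[test])
results["all_parameters"] = (r_squared(air_temp[test], est_all),
                             rmse(air_temp[test], est_all))

for name, (r2, err) in results.items():
    print(f"{name}: R2={r2:.3f}, RMSE={err:.3f}")
```

One design point to keep in mind: with 60 training months and 67 predictors plus an intercept, an ordinary least squares system has more unknowns than equations. In that case `np.linalg.lstsq` returns the minimum-norm solution, and a penalized fit such as ridge regression is a common alternative.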
Using these parameters separately and collectively, the regression plots and the corresponding R² and RMSE values shown in the figures below were obtained. As seen in Fig. 2, estimating the monthly average air temperature from the monthly average soil temperature gave an almost perfect regression fit (R² = 0.993) with the lowest error (RMSE = 3.26), so soil temperature can be considered the parameter with the highest impact on AT estimation. In Fig. 3, the monthly average air pressure gave a poor fit (R² = 0.10) with a high error (RMSE = 9.78). In Fig. 4, the monthly average water vapour pressure gave R² = 0.42 with RMSE = 8.23. As indicated in Fig. 5, the monthly average wind speed gave R² = 0.11 with RMSE = 10.53, which places it among the parameters with the lowest impact on AT estimation. In Fig. 6, the monthly average relative humidity gave a low R² of 0.14 with a high error (RMSE = 10.55). Finally, in Fig. 7, the model using all 67 meteorological parameters gave a low error (RMSE = 4.47), with a reported R² of 0.14. It can be said that combining all features may yield a good AT estimate.
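The single-parameter results reported above can be ranked with a few lines of Python. The values are copied from the text; the variable names are hypothetical, and the snippet only sorts the reported numbers.

```python
# (R^2, RMSE) pairs as reported in the text for the five single-parameter models.
reported = {
    "soil temperature": (0.993, 3.26),
    "air pressure": (0.10, 9.78),
    "water vapour pressure": (0.42, 8.23),
    "wind speed": (0.11, 10.53),
    "relative humidity": (0.14, 10.55),
}

# Rank from lowest to highest RMSE, and from highest to lowest R^2.
by_rmse = sorted(reported, key=lambda name: reported[name][1])
by_r2 = sorted(reported, key=lambda name: reported[name][0], reverse=True)

for name in by_rmse:
    r2, err = reported[name]
    print(f"{name:<22} R2={r2:.3f}  RMSE={err:.2f}")
```

Both rankings place soil temperature first and water vapour pressure second; the remaining three parameters have similarly low R² and high RMSE values.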

Conclusion
In this study, the monthly average air temperature of Batman province was estimated by simple and multiple linear regression methods using some important meteorological parameters.
In the estimation phase, all necessary data between 2012 and 2017 were collected and the estimation was carried out in two stages. In the first stage, some of the important parameters (monthly average soil temperature, pressure, water vapour pressure, wind speed and relative humidity) were used separately as inputs in simple linear regression analysis. In the second stage, all meteorological data were used as the input of the multiple linear regression model. The estimation results were then obtained as numerical values and graphs, and the error rates of the analyses and the closeness of the estimated values to the actual values were examined. Careful examination of the numerical values in the graphs indicates that the study gives good results.
In future work, the aim is to collect more data and apply different machine learning techniques to achieve even better results.