Electricity Load Forecasting via ANN Approach in Turkish Electricity Markets

Forecasting electricity load has become an essential task for electric utilities, power plants and regulators. Load forecasts are a vital input to energy policy, so they must be sound and reliable. Artificial neural networks (ANN) can learn complex and nonlinear relationships. This article introduces 400 different ANN models for electricity load forecasting. Model performance is compared using the Mean Absolute Percentage Error (MAPE) and the Diebold-Mariano (DM) test. The electricity load data used for this study range from 2014 to 2016. The variation in the forecasting ability of ANNs across models is also discussed. The best-performing ANN model is trained with the Levenberg-Marquardt (LM) algorithm and the log-sigmoid transfer function.


Introduction
Today our daily life depends heavily on the use of electricity in its various forms. The rapid rise of industrialization within the last century has contributed to the growth of electricity consumption. With the liberalization of many electricity markets, accurate load forecasting has become a necessity for utilities. Forecasting the electricity load is crucial for market participants such as generation companies, companies engaged in energy transactions and planners. Market participants with good load forecasts can use electric energy efficiently. Likewise, maintenance scheduling and economic load dispatch are generally carried out on the basis of the electricity load forecast.
A proper load demand policy in the electricity industry is of the utmost importance for the whole economy. Accurate load forecasting results in considerable savings for electric utilities. A 1% reduction in the average load forecasting error can save millions of dollars (Hobbs et al., 1999, p. 1342). Policy decisions in competitive electricity markets need more algorithm-based tools. The future electricity strategy of Turkey is based on privatization. The underlying reason for this strategy is to build the confidence that investors need in order to engage in generation activities (Guney, 2005, p. 103).
Load forecasting in the electricity market is difficult because of the complexity of the market. This complexity arises from the instantaneous nature of electricity, complex market design and frequent regulatory interventions. Artificial intelligence is a relatively new field of research, but it is growing in quality. The term computational intelligence is widely used to refer to fuzzy systems, swarm intelligence, evolutionary computing and artificial neural networks. ANNs are among the areas most frequently used in electricity load forecasting. Because of their clear design, simple implementation and good performance, artificial neural networks have received increasing attention (Hahn et al., 2009, p. 903).
In order to select the best training algorithm, transfer function and neuron number for the ANN, all possible combinations should be considered. This paper introduces 400 different ANN models for electricity load forecasting in the Turkish Electricity Market. Load forecasting models are compared according to the Diebold-Mariano (DM) test and MAPE. The paper is structured as follows: Section 1 provides a short review of load forecasting; Section 2 provides a short overview of artificial neural networks; Section 3 presents the methods used in the study; and Section 4 discusses the implementation and the empirical findings. Finally, the results and some ideas for future work are discussed in the conclusion.

General Electricity Load Forecasting Review
There are various types of load forecasting models, distinguished by time horizon, algorithm and architecture. According to time horizon, load forecasting methods can be characterized as long-term, medium-term and short-term. Long-term load forecasting methods are usually used to forecast the next several years. Long-term load forecasting is attractive for electric utilities that want to make investment decisions. Medium-term forecasts cover horizons from several months up to the next year. These models are important for portfolio management. Short-term load forecasting methods are usually used for time horizons of hours, days or weeks. Models of this type are used by electric utilities for trading and for ensuring availability of supply.
A broad range of models for load forecasting has been suggested. Electricity load forecasting is classified into traditional approaches and computational approaches. In addition, hybrid models that combine wavelets and neural networks are applied to load forecasting (Kodogiannis et al., 2013, p. 30). Electricity load forecasting models are also classified as multiple regression, time series, adaptive load forecasting models, fuzzy logic, artificial neural networks and knowledge-based expert systems (Alfares & Nazeeruddin, 2002, p. 23). Likewise, load forecasting techniques are classified as mathematical models, soft-computing techniques and hybrids of different techniques. Because of the uncertainty in load characteristics, a combination of fuzzy systems and neural networks has been proposed for load forecasting (Lou & Dong, 2015, p. 35). Reported load forecasting MAPE results range from 0.56% to 1.30% (Mandal et al., 2006, p. 2128). Studies on energy demand forecasting for Turkey began in the 1960s, and since 1984 econometric models have been applied for forecasting purposes (Kankal et al., 2011, p. 1928). A fuzzy logic electricity model for the Turkish market shows that electricity demand is strongly related to the gross domestic product of the country. Additionally, short-term electricity forecasting that accounts for economic performance would provide reliable data for policy makers (Küçükali & Barış, 2010, p. 2440). Gross domestic product (GDP) per capita, population, inflation, unemployment and temperature are used to forecast the annual gross electricity demand of Turkey (Günay, 2016, p. 93).
Methods used in load forecasting models include artificial neural networks, swarm optimization, genetic algorithms, the harmony search algorithm and autoregressive integrated moving average (ARIMA) models.

ANN Specific Electricity Load Forecasting Review
ANN models have received much attention because of their easy implementation, clear design, fault tolerance and good performance. The advantages of ANNs are their capability of learning, massively parallel computation and robustness in the presence of noise. However, ANN models are prone to over-fitting, under-fitting and locally optimal solutions. Load forecasting techniques have been reviewed and evaluated across different architectures, with at most 1,825 training samples and at most 365 test days (Hippert et al., 2001, p. 47). In this study, 7,296 hourly samples are used for testing and 19,008 hourly samples are used for training. The testing and training samples in these models are therefore considerably larger than in other studies.
In addition, ANN models have shown their superiority over statistical methods in load forecasting (Srinivasan et al., 1991, p. 1173). Artificial neural network based models have been used to predict the electricity load for the Egyptian Unified System, with a maximum MAPE of 2.2% (Osman et al., 2009, p. 7). A multilayer feed-forward ANN model has also been trained with the Levenberg-Marquardt optimization algorithm (Buhari & Adamu, 2012, p. 125).

Methods
The first step in designing an ANN forecasting model is to select an appropriate architecture. There are several neural network subtypes. Radial basis function networks and recurrent neural networks are used for forecasting, but feed-forward neural networks are the type generally used in load forecasting (Hippert et al., 2001, p. 46). The Levenberg-Marquardt (LM), Broyden-Fletcher-Goldfarb-Shanno (BFGS), Gradient Descent (GD) and Gradient Descent with Momentum (GDM) optimizer (training) algorithms were used to create different models. The number of neurons in the hidden layer was varied step by step from 1 to 50 to determine the minimum network error. The activity function of the neurons must be differentiable and non-decreasing, as mentioned in the appendix. In the current study, the hyperbolic tangent sigmoid and log-sigmoid transfer functions were used in the models.
The artificial neural network performance was evaluated on the basis of MAPE results and the DM test. The average marginal effects of the hidden layer neurons have also been determined. Lastly, the effect of short-term load forecasting on energy policy is discussed. Figure 1 shows the steps of this study in order.
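The model count of 400 is consistent with combining the four training algorithms, the two transfer functions and the 50 hidden-layer sizes named above (4 x 2 x 50). The following Python sketch only enumerates such a grid; the labels `logsig` and `tansig` and the dictionary layout are illustrative assumptions, not the authors' code, and no network is trained here.

```python
from itertools import product

# Labels are illustrative; they only mirror the names used in the text.
TRAINING_ALGORITHMS = ["LM", "BFGS", "GD", "GDM"]
TRANSFER_FUNCTIONS = ["logsig", "tansig"]
HIDDEN_NEURONS = range(1, 51)  # 1 to 50 inclusive


def build_model_grid():
    """Return every (algorithm, transfer function, neuron count) combination."""
    return [
        {"algorithm": a, "transfer": t, "neurons": n}
        for a, t, n in product(TRAINING_ALGORITHMS, TRANSFER_FUNCTIONS, HIDDEN_NEURONS)
    ]


grid = build_model_grid()
print(len(grid))  # 4 * 2 * 50 = 400
```

Each entry of `grid` would then be trained and scored separately, so that the configurations can be ranked by their test MAPE.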

Data
Electricity load data were obtained from the Energy Exchange Operations Authority of Turkey (EPIAS). EPIAS carries out its activities in the electricity market under the supervision of the Energy Market Regulatory Authority (EPDK). The data cover the 1,096 days between 01.01.2012 and 31.12.2014 and consist of 26,304 hourly observations. Of these, the 19,008 hours between 01.01.2012 and 03.03.2014 are used to train the model, and the 7,296 hours between 03.03.2014 and 31.12.2014 are used to test it. Figure 2 shows the training and testing data. The actual data and the forecasted data were compared through MAPE. In addition to MAPE, we compared the accuracy of the forecasting results with the Diebold-Mariano (DM) test. The DM test compares the forecast accuracy of two forecast methods and can discriminate significant differences between models (Diebold, 2007).
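As a rough illustration of how a DM comparison of two forecast error series can be computed, the Python sketch below uses several simplifying assumptions that the text does not specify: an absolute-error loss, a one-step forecast horizon (so no autocorrelation correction is applied), and made-up error values. The 1.96 critical value is the 5% threshold used in the empirical results.

```python
import math


def diebold_mariano(errors_a, errors_b):
    """DM statistic for one-step-ahead forecasts with absolute-error loss.

    Positive values mean forecast A has the larger average loss.
    """
    if len(errors_a) != len(errors_b):
        raise ValueError("error series must have equal length")
    d = [abs(a) - abs(b) for a, b in zip(errors_a, errors_b)]  # loss differential
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n  # lag-0 autocovariance
    if var_d == 0:
        raise ValueError("loss differential has zero variance")
    return mean_d / math.sqrt(var_d / n)


def differs_at_5_percent(dm_stat, critical=1.96):
    """Two-sided decision at the 5% level under the normal approximation."""
    return abs(dm_stat) >= critical


# Made-up forecast errors for two hypothetical models.
errors_a = [2.0, -1.5, 2.5, -2.0, 1.8, -2.2, 2.1, -1.9]
errors_b = [0.5, -0.4, 0.6, -0.3, 0.5, -0.6, 0.4, -0.5]
dm = diebold_mariano(errors_a, errors_b)
print(differs_at_5_percent(dm))
```

For multi-step forecasts the variance term would normally include autocovariances of the loss differential, which this sketch deliberately leaves out.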

Load Forecasting Factors
The selection of the input variables can significantly affect network performance. In the short term, load depends on temperature, wind speed and business activity patterns such as weekdays, holidays and near-holidays. The optimal choice of input variables remains an open question.
There is a common consensus that temperature has the most significant impact in load forecasting (Hahn et al., 2009, p. 903). Demand is high in cold weather because electric heating is common. Similarly, demand is high in hot weather, which can be attributed to air-conditioning compressors. This results in a U-shaped function of load with respect to temperature (Hippert et al., 2001, p. 50). Using temperature on an hourly basis increases the success of the model. Hourly temperature for Turkish cities between 01.01.2012 and 31.12.2014 was used as input to the forecasting model. The temperature information was acquired from www.wunderground.com. The population and industries of a city affect its electricity consumption. For example, Istanbul, the largest city in Turkey, accounts for 16% of load demand, while Ankara, the capital, accounts for 5.5%. Therefore, the temperature of a city and its electricity demand should be considered together. Figure 3 shows the electricity load demand shares of Turkish cities. Load demand has a significant effect on electricity forecasting models, so the models combine temperature with city-based electricity consumption. The temperature input is calculated as a load-demand-weighted temperature. Equation 1 shows the consumption-weighted formula used to calculate this temperature.

$$\tau_m = \sum_{i=1}^{81} w_i \, t_{i,m} \tag{1}$$
Here $i$ indexes the 81 cities of Turkey, $w_i$ is the electricity consumption percentage of city $i$, $t_{i,m}$ is the temperature of city $i$, $m$ denotes the day, and $\tau_m$ is the load-demand-weighted temperature. Likewise, Table 1 shows the input data statistics. The model inputs for load forecasting are the weekday, whether the day is a working or non-working day (including holidays and community festivals), the electricity load at the same hour of the previous week, the electricity load at the same hour of the previous day, the 24-hour average electricity load and the hour of the day. Figure 4 shows the model input, hidden layer and model output.
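The consumption-weighted temperature of Equation 1 can be sketched in a few lines of Python. Only the Istanbul and Ankara shares (16% and 5.5%) come from the text; the "Other" bucket, the temperature values and the function name are hypothetical, and the shares are assumed to be expressed as fractions that sum to one.

```python
def weighted_temperature(shares, temperatures):
    """Consumption-weighted temperature: sum of share_i * temperature_i.

    shares: dict city -> fraction of national consumption (should sum to 1)
    temperatures: dict city -> temperature for the same period
    """
    total_share = sum(shares.values())
    if abs(total_share - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(shares[city] * temperatures[city] for city in shares)


# Hypothetical three-region example; the remaining cities are lumped
# into "Other" and all temperatures are made-up values in degrees Celsius.
shares = {"Istanbul": 0.16, "Ankara": 0.055, "Other": 0.785}
temps_c = {"Istanbul": 12.0, "Ankara": 6.0, "Other": 10.0}
tau = weighted_temperature(shares, temps_c)
print(round(tau, 2))
```

Weighting by consumption share means that a temperature change in a large load center such as Istanbul moves the input more than the same change in a small city.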

Empirical Results
The performance of the 400 different models is evaluated according to each model's MAPE. In addition to the forecasting performance of the models, their training time is also important. In this study, training time was not used to compare performance, but increasing the number of neurons also increased the training time of the models. The most successful MAPE result was 1.86%. The Levenberg-Marquardt learning algorithm reached this value with the log-sigmoid transfer function and 13 neurons in the hidden layer. Minimum, maximum and average MAPE values by training algorithm are given in Table 3. The following empirical results are obtained from Table 4.
According to the DM test, the statistic comparing BFGS and LM is less than 1.96, so the null hypothesis cannot be rejected at the 5% level of significance and the difference between the forecasting performance of the BFGS and LM models is not significant. According to the DM test, the statistics for the comparisons involving GD and GDM are not less than 1.96, so the null hypothesis is rejected at the 5% level of significance and the observed differences between GD and GDM, GD and BFGS, GD and LM, GDM and BFGS, and GDM and LM are significant.
The LM and BFGS algorithms have smaller MAPE values. Figure 6 shows how the MAPE of the LM and BFGS training algorithms changes with the number of hidden layer neurons.
Increasing the number of hidden layer neurons from 1 to 15 has a positive effect on the success of the model. Beyond 15 hidden layer neurons there is no substantial effect on the success of the model. Figure 7 shows the average marginal effect of the number of hidden layer neurons. The combination of the LM training algorithm, 13 neurons in the hidden layer and the log-sigmoid transfer function results in the most successful model. The actual and forecasted electricity loads are shown in Figure 8. The average difference between the forecasted data and the actual data is 0.1373 MWh. The largest difference between the actual and estimated amounts of electricity was observed on Thursday, 18.09.2014, at 2 am. Load fluctuations greatly influence the forecast error. Figure 9 shows the change of the Load Forecast Plan on 18.09.2014.

Conclusion
Effective implementation of electricity policies needs accurate load forecasting. Load forecasting contributes to planning and policy formulation, and modeling it plays a vital role for policy makers. Underestimating the load would result in potential power outages, while overestimating it would result in idle capacity, which means wasted resources. Decision makers who establish good load forecasting models are better placed to manage demand efficiently. This paper presented comprehensive models for load forecasting with the artificial neural network approach in the deregulated Turkish electricity market. In this regard, we may conclude that the electricity load forecasting models established in this empirical study could have fruitful implications for decision makers.

Appendix A. Artificial Neural Network
Appendix A.1. Artificial Neural Training Algorithms
In this section, we provide an introduction to artificial neural networks. Inspired by biological systems, an artificial neural network is a nonparametric and nonlinear mathematical model that maps relationships between inputs and outputs without exploring the internal processes. ANNs capture functional relationships in data that are nonlinear or difficult to describe explicitly. The characteristic features of artificial neural networks are adaptability and nonlinearity. These features make artificial neural networks appropriate and useful for forecasting.
ANNs are applied in many fields such as medicine, the defense industry, communications, finance and econometrics. In an artificial neural network, the primary unit is the artificial neuron. A single neuron is an information-processing unit that takes input from other neurons and produces output. A single artificial neuron consists of five components: inputs, weights, activity function, bias and output. Figure 10 shows all elements of a single neuron. The input-output relationship in a single artificial neuron is as follows.
$$y = f\left(\sum_{i=1}^{n} w_i x_i + b\right) \tag{2}$$
Here $x_1, x_2, \dots, x_n$ are the inputs, $w_1, w_2, \dots, w_n$ are the weights, $b$ is the bias, $f$ is the activity function and $y$ is the output. The most commonly preferred activity function is the sigmoid function. It is usually preferred because it is bounded: its output increases monotonically but stays within fixed limits, which keeps the behavior of the network stable. The sigmoid function is formulated as follows.
$$f(x) = \frac{1}{1 + e^{-x}} \tag{3}$$
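The single-neuron relationship and the two transfer functions used in this study can be sketched in Python as follows. The function names and the numeric inputs, weights and bias are illustrative assumptions; the logistic form is the standard definition of the log-sigmoid.

```python
import math


def log_sigmoid(z):
    """Logistic (log-sigmoid) function, output in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def tanh_sigmoid(z):
    """Hyperbolic tangent sigmoid, output in (-1, 1)."""
    return math.tanh(z)


def neuron_output(inputs, weights, bias, activation=log_sigmoid):
    """Single neuron: activation of the weighted input sum plus bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have equal length")
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


# Illustrative values only.
y = neuron_output(inputs=[0.5, -1.0, 2.0], weights=[0.4, 0.3, -0.1], bias=0.1)
print(y)
```

Both functions are differentiable and non-decreasing, which is the property required of the activity function in the Methods section.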
The ultimate goal of training an artificial neural network is to minimize the network error, which is the discrepancy between the target results and the results of the network. Taking the mean squared difference as the measure, the network error can be written as follows.
$$E = \frac{1}{n} \sum_{k=1}^{n} (t_k - o_k)^2 \tag{4}$$
Here $n$ is the total number of outputs of the neural network, $o_k$ is the output of the network for each $k$, and $t_k$ is the target to be met for each $k$. The network performance is determined by the difference between the $o_k$ and $t_k$ values. The learning process of an artificial neural network is a nonlinear minimization problem in which the weights of the network are iteratively changed to minimize the overall mean squared error.
In general, there is no algorithm that guarantees the optimal solution of a nonlinear optimization problem. The backpropagation algorithm is a popular training method, which is essentially a gradient steepest descent method.
Different algorithms have been suggested to enhance the training performance of neural networks such as gradient descent (Wang et al., 2016), conjugate gradients (Khadse, Chaudhari, & Borghate, 2016), Quasi-Newton (Likas & Stafylopatis, 2000) and Levenberg-Marquardt (Basterrech et al., 2011, p.130). The second order techniques are more effective techniques of nonlinear optimization. BFGS and LM are second-order methods used in this study.
In standard backpropagation, the network weights are moved along the negative of the gradient of the performance function. The gradient descent algorithm starts at a random point in the weight space of the neural network and changes the weights for each input so that the network produces the desired output. The gradient descent weight update is formulated as follows.
$$\Delta w_k = -\alpha_k \, g_k \tag{5}$$
Here $\Delta w_k$ is the change in the weight vector, $g_k$ is the gradient at iteration $k$, and $\alpha_k$ is the learning coefficient. The learning coefficient $\alpha_k$ should be neither too large nor too small. A small $\alpha_k$ value slows the learning of the network (Ali et al., 2014, p. 1022). The gradient descent algorithm has the disadvantages of a long training time and fluctuations around local minima (Azar, 2013). To prevent the network from oscillating, a momentum term is added, where $p$ is the momentum factor.
$$\Delta w_k = -\alpha_k \, g_k + p \, \Delta w_{k-1} \tag{6}$$
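A minimal Python sketch of gradient descent with an optional momentum term is given below, assuming the usual update in which the new weight change equals the negative learning coefficient times the gradient plus the momentum factor times the previous change. The quadratic error surface, the starting point and the parameter values are made up for illustration and do not come from the study.

```python
def gradient_descent_momentum(grad, w0, alpha=0.1, p=0.9, steps=200):
    """Minimise a function given its gradient, using a momentum term.

    Update used here: delta_k = -alpha * g_k + p * delta_{k-1}.
    Setting p = 0 gives plain gradient descent.
    """
    w = list(w0)
    delta = [0.0] * len(w)
    for _ in range(steps):
        g = grad(w)
        delta = [-alpha * gi + p * di for gi, di in zip(g, delta)]
        w = [wi + di for wi, di in zip(w, delta)]
    return w


# Toy error surface E(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2, minimum at (3, -1).
def toy_gradient(w):
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]


w_gd = gradient_descent_momentum(toy_gradient, [0.0, 0.0], alpha=0.05, p=0.0)
w_gdm = gradient_descent_momentum(toy_gradient, [0.0, 0.0], alpha=0.05, p=0.5)
print(w_gd, w_gdm)
```

On this toy surface both variants reach the minimum; with a learning coefficient that is too large, the plain update would overshoot along the steep direction, which is the behaviour the momentum and step-size choices are meant to control.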
The Quasi-Newton method examines changes in the weight vector together with the second-order derivatives (Fine, 1999, p. 156). Compared with the gradient descent method, the Quasi-Newton approach reaches the desired weight vector more quickly. Gradient Descent and Gradient Descent with Momentum use first-order derivatives. With the Quasi-Newton method, the weights of the network are calculated from the second-order Taylor series.
$$H(w_i) \, \Delta w = -\nabla E(w_i) \tag{7}$$
Here $\Delta w$ is the Newton weight step and $H(w_i)$ is the Hessian matrix (the matrix of second derivatives). In the neural network, $\Delta w$ is used to calculate the weight updates.
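The output obtained from the network is compared with the target to give the error. If the network error falls below a chosen level, training is accepted as successful and the network uses the resulting weight vector. In terms of the gradient, this stopping condition is written as follows, where $\delta$ denotes a small positive tolerance.
$$\lVert \nabla E(w_i) \rVert < \delta \tag{8}$$
The Hessian matrix (second-order derivative matrix) can be inverted only if its determinant is not equal to zero.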
The weight values of the network are calculated from the second-order Taylor series and the Newton weight change. There are different update formulas for the Quasi-Newton method, such as Davidon-Fletcher-Powell (DFP) and BFGS.
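The Newton step and the gradient-based stopping rule described above can be illustrated with the small Python sketch below. It solves the Newton system with the exact Hessian of a made-up two-parameter quadratic; it is not a BFGS or DFP implementation, since those methods build an approximation of the Hessian instead of computing it. The helper names and the tolerance are assumptions.

```python
import math


def solve_2x2(h, rhs):
    """Solve h * x = rhs for a 2x2 matrix h; fails if det(h) is zero."""
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    if det == 0:
        raise ValueError("Hessian is singular (determinant is zero)")
    x0 = (rhs[0] * h[1][1] - h[0][1] * rhs[1]) / det
    x1 = (h[0][0] * rhs[1] - h[1][0] * rhs[0]) / det
    return [x0, x1]


def newton_minimise(grad, hessian, w0, tol=1e-8, max_iter=50):
    """Repeat w <- w + dw with H(w) dw = -grad(w) until ||grad(w)|| < tol."""
    w = list(w0)
    for iteration in range(max_iter):
        g = grad(w)
        if math.hypot(g[0], g[1]) < tol:
            return w, iteration
        dw = solve_2x2(hessian(w), [-g[0], -g[1]])
        w = [w[0] + dw[0], w[1] + dw[1]]
    return w, max_iter


# Toy quadratic E(w) = (w0 - 3)^2 + 10 * (w1 + 1)^2 + w0 * w1 (made-up).
def toy_grad(w):
    return [2.0 * (w[0] - 3.0) + w[1], 20.0 * (w[1] + 1.0) + w[0]]


def toy_hessian(w):
    return [[2.0, 1.0], [1.0, 20.0]]


w_star, iterations = newton_minimise(toy_grad, toy_hessian, [0.0, 0.0])
print(w_star, iterations)
```

Because the toy error surface is exactly quadratic, a single Newton step lands on the minimum, whereas the gradient descent variants need many iterations on a comparable surface.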
The Levenberg-Marquardt algorithm can be seen as a combination of the Gradient Descent and Quasi-Newton methods. Rather than calculating the Hessian matrix, the Levenberg-Marquardt algorithm uses an approximation of it to accelerate the computation (Talaee, 2014, p. 699). In the Levenberg-Marquardt algorithm, the Hessian matrix is approximated from the Jacobian matrix as shown below.
$$H \approx J^{T} J \tag{9}$$
The new weight vector is then given by the following formula.
$$w_{k+1} = w_k - \left(J^{T} J + \mu I\right)^{-1} J^{T} \varepsilon \tag{10}$$
Here $w$ is the weight vector, $I$ is the identity matrix, $\mu$ is the combination coefficient, $J$ is the Jacobian matrix and $\varepsilon$ is the error vector. If a large $\mu$ is selected, the above formula acts like the gradient descent algorithm; if a small $\mu$ is selected, it acts like Newton's method (Kourentzes et al., 2014, p. 14). By default, the initial $\mu$ is 0.001, the increase factor is 10 and the maximum $\mu$ is 1e10.
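To make the role of the combination coefficient concrete, the Python sketch below applies a Levenberg-Marquardt update to a deliberately tiny problem: fitting the weight and bias of one log-sigmoid neuron to synthetic targets. The initial coefficient of 0.001, the increase factor of 10 and the maximum of 1e10 follow the defaults quoted above; the decrease factor of 0.1, the data and all function names are assumptions made for illustration, and this is not the network or software used in the study.

```python
import math


def logsig(z):
    return 1.0 / (1.0 + math.exp(-z))


def residuals_and_jacobian(params, xs, ts):
    """Errors e_i = t_i - y_i and their derivatives for y = logsig(w*x + b)."""
    w, b = params
    e, jac = [], []
    for x, t in zip(xs, ts):
        y = logsig(w * x + b)
        slope = y * (1.0 - y)             # derivative of logsig
        e.append(t - y)
        jac.append((-slope * x, -slope))  # d e_i / d w, d e_i / d b
    return e, jac


def sse(e):
    return sum(v * v for v in e)


def lm_fit(xs, ts, params=(0.0, 0.0), mu=0.001, mu_inc=10.0, mu_dec=0.1,
           mu_max=1e10, max_iter=100, tol=1e-12):
    """Levenberg-Marquardt for two parameters: step = -(J'J + mu*I)^-1 J'e."""
    params = tuple(params)
    e, jac = residuals_and_jacobian(params, xs, ts)
    for _ in range(max_iter):
        # Entries of J'J (2x2, symmetric) and J'e (2-vector).
        a = sum(j[0] * j[0] for j in jac)
        off = sum(j[0] * j[1] for j in jac)
        c = sum(j[1] * j[1] for j in jac)
        g0 = sum(j[0] * v for j, v in zip(jac, e))
        g1 = sum(j[1] * v for j, v in zip(jac, e))
        while mu <= mu_max:
            det = (a + mu) * (c + mu) - off * off
            step_w = -((c + mu) * g0 - off * g1) / det
            step_b = -((a + mu) * g1 - off * g0) / det
            trial = (params[0] + step_w, params[1] + step_b)
            e_trial, jac_trial = residuals_and_jacobian(trial, xs, ts)
            if sse(e_trial) < sse(e):     # accept: move towards Newton-like steps
                params, e, jac = trial, e_trial, jac_trial
                mu *= mu_dec
                break
            mu *= mu_inc                  # reject: move towards gradient descent
        else:
            break                         # mu exceeded mu_max, stop training
        if sse(e) < tol:
            break
    return params, sse(e)


xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ts = [logsig(1.5 * x - 0.5) for x in xs]  # synthetic targets
fitted, final_sse = lm_fit(xs, ts)
print(fitted, final_sse)
```

A rejected step raises the coefficient, which shortens the step and turns it towards the gradient direction; an accepted step lowers it, so the method behaves more like Newton's method near the minimum.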
There are also algorithms not covered in this study, such as the resilient and conjugate gradient methods. Table 5 shows the algorithms used in this study. There is a considerable amount of research on neural networks and their training methods. Different optimization algorithms are used to select the network parameters that minimize the artificial neural network error. Some of these methods are steepest descent, conjugate gradient, Quasi-Newton and Levenberg-Marquardt.
Neural network learning methods can be divided into heuristic optimization techniques and numerical methods. Heuristic optimization techniques are grouped into gradient descent, adaptive neural network, gradient descent with momentum and resilient algorithms (Lahmiri, 2011, p. 15). Numerical methods are categorized into Quasi-Newton and Levenberg-Marquardt.
The Quasi-Newton method is often faster than the conjugate gradient method because it does not calculate the second derivative. Instead, it updates an approximation of the Hessian matrix at each iteration. Finally, the Levenberg-Marquardt method combines the important features of the Quasi-Newton and steepest descent methods.

Appendix A.2. Artificial Neural Network Design
There are no definite rules for designing an artificial neural network. Despite the many satisfactory characteristics of neural networks, building one for a forecasting problem is not an easy process. The design process has a major impact on the success of the network, and the lack of precise rules for network design causes significant problems. Because no strict rules exist, design decisions are made by trial and error (Haykin, 1999, p. 72).
In general, the design process of the neural network can be summarized as in the following figure. Neural network design starts with identifying the inputs and outputs of the problem. The inputs and outputs of the network vary depending on the problem characteristics. Before starting the network design, the inputs and outputs of the model must be analyzed. The model inputs can be pre-processed: input data carrying similar information can be simplified before entering the network. Data pre-processing positively affects the performance of the network. Selecting the network structure is a major step that affects the success of the network. A feed-forward network structure is generally preferred in prediction and classification models. Recurrent networks are preferred for problems that involve feedback. Different numbers of layers can be used in the network design process. Increasing the number of layers is reported not to have a large impact on the performance of the network (Zhang et al., 1998, p. 44).
The number of neurons in each layer is usually determined heuristically. If the number of neurons in a layer is too small, the network cannot find the pattern linking input and output, and as a result it may have difficulty converging during training. If the number of neurons is too large, the learning process takes a long time and the capacity of the neural network is impaired.
No learning method guarantees the globally optimal solution of a nonlinear optimization problem. The backpropagation algorithm, which is basically the steepest descent gradient method, is currently the most popular training method. BFGS and LM are efficient training methods. The transfer function should be differentiable, and most papers use either the logistic or the hyperbolic tangent transfer function.
The mean absolute percentage error (MAPE), the weighted mean absolute percentage error (WMAPE), the mean absolute error (MAE) and the root mean square error (RMSE) are generally used to measure network performance. MAPE values are calculated using the following equation, where $A_t$ is the real value, $F_t$ is the estimated value and $n$ is the number of forecasts.
$$\mathrm{MAPE} = \frac{100}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right| \tag{11}$$
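As a small worked example of the MAPE measure, the Python sketch below computes it for two short series. The load values are made up for illustration and the function name is an assumption; the calculation follows the standard definition of MAPE as the average absolute error relative to the actual value, expressed in percent.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Made-up hourly loads in MWh, for illustration only.
actual_load = [30000.0, 32000.0, 35000.0, 33000.0]
forecast_load = [29400.0, 32640.0, 34300.0, 33660.0]
print(round(mape(actual_load, forecast_load), 2))
```

Every forecast in this example is off by 2% of the actual value, so the MAPE is 2%; the best model in the study reached 1.86% on the test data.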