Generative adversarial network for load data generation: Türkiye energy market case

Load modeling is crucial for improving energy efficiency and conserving energy sources. In the last decade, machine learning has gained favor and has demonstrated exceptional performance in load modeling. However, its implementation relies heavily on the quality and quantity of available data, and gathering sufficient high-quality data is time-consuming and extremely expensive. Generative adversarial networks (GANs) have shown promise in generating synthetic data, which can alleviate the data shortage problem. This study proposes GAN-based models (RCGAN, TimeGAN, CWGAN, and RCWGAN) to generate synthetic load data. It focuses on Türkiye's electricity load and generates realistic synthetic load data. When combined with recorded data, the generated synthetic load data can reduce load prediction errors and enhance risk management calculations.


Introduction
A smart grid is an electricity distribution network incorporating information and communication technologies to improve energy efficiency. It allows for the real-time exchange of data between electricity suppliers and consumers, which enables suppliers to forecast electricity demand based on current energy consumption and user profiles. This feature enables energy suppliers to optimize electricity efficiency by providing accurate load modeling, resulting in a more efficient power grid [1]. In the context of smart grids, accurate electricity demand forecasting is crucial for energy suppliers to avoid financial losses and system troubles, e.g., drops in frequency and blackouts. However, obtaining an extensive and high-quality electricity dataset is challenging and expensive. Although electricity grid models are known, data is scarce as a consequence of privacy concerns, which restricts researchers' access to datasets and limits the development and application of further load prediction models.

Generating synthetic data that accurately represents the statistical behavior and characteristics of real data can help address issues connected to the quantity, quality, and privacy of sensitive data. By generating synthetic data that mimics the patterns and trends of real data, it is possible to provide researchers and companies with a valuable resource for understanding the distribution of the original data while also enabling efficient data storage, data augmentation, system testing, and data disclosure. Synthetic data generation can also help mitigate concerns around data privacy, as it can provide a substitute for sensitive or confidential data that is not accessible to third parties.

This study focuses explicitly on Generative Adversarial Networks (GANs) for generating synthetic data because of their implementation performance and flexibility in mirroring historical data. GANs have successfully generated and manipulated images and natural languages, as demonstrated by various studies [2, 3, 4, 5]. As a result, GANs have become a prominent method for synthetic data generation. GANs are powerful generative models that can generate new samples having similar distributional properties to the real data, making them useful for data augmentation [6]. While initially developed for image processing and computer vision, GANs have garnered significant interest and advanced in various research fields [7]. GANs have also demonstrated favorable outcomes in generating sequential data (e.g., music, medical data, and finance). Therefore, this study focuses on applying GANs to sequential data, specifically generating synthetic load data for the Türkiye energy market.

There are two primary strategies for applying GANs to electricity consumption data forecasting. The first strategy uses a typical GAN architecture to generate synthetic load data, and the performance is evaluated by how closely the synthetic data converges to, or diverges from, the real data. The second strategy involves using more complex GAN architectures to generate synthetic electricity consumption data and combining it with real data to expand and improve real load data. The first approach is limited to scenario generation and produces load profiles lacking precision, while the second approach is data augmentation, which is highly influential but has yet to fully illustrate the capabilities of GANs. Therefore, this study focuses on the first strategy of synthetic data generation using GANs to produce hourly electricity consumption records. Although studies usually suggest using Long Short-Term Memory (LSTM) networks in GANs, this study avoids them to reduce the computational cost of training and the risk of overfitting. Instead, the study uses Recurrent Neural Networks (RNNs). More specifically, it uses the GANs called Recurrent Conditional GAN (RCGAN) [8], Time-Series GAN (TimeGAN) [9], Conditional Wasserstein GAN (CWGAN) [2], and Recurrent Conditional Wasserstein GAN (RCWGAN) as in [10].

The remaining part of the study is organized as follows: Section 2 briefly reviews the literature on GANs. Section 3 presents an overview of the GANs used in this study without delving into technical details. Section 4 introduces Türkiye's load data and includes exploratory data analysis and synthetic data generation using the selected GANs. Section 5 concludes the study.

Literature review
The concept of GANs was introduced in the paper by [11] and quickly gained traction in many research fields, such as 3D object generation [12], electronic health record generation [8], image processing [13] and generation [11,14], face detection [15], audio synthesis [16,17], natural language processing [18], traffic controlling [19], energy market modeling [10], and stock market modeling [20]. However, training GANs is challenging since they generally suffer from the missing modes problem, or mode collapse, where the generated samples lack variety and only cover some regions of the space. Another common issue is vanishing gradients, which can stop GAN training from converging to an optimal state. Dealing with these problems has gained significant attention in GAN research, and various approaches have been proposed to mitigate them. Consequently, many alternative GANs have been developed from empirical and mathematical perspectives to solve such problems. For instance, the study of [2] is the first that extends GANs by comparing various distance measures and suggests using the Wasserstein-1 metric, which leads to the development of WGAN. Later, [21] proposed the Least Squares GANs (LSGANs), which adopt a least-squares loss function for the discriminator to overcome the vanishing gradients problem. [22] also proposed a conditional model that extends the original GAN framework by incorporating additional information such as labels, tags, or attributes. This information is provided to the GAN framework through an additional input layer. However, a complete review of such approaches is beyond the scope of this study. Therefore, the study covers only the works focusing on time series data, since it is interested in generating time series data and can draw some inspiration from applications of GANs in financial and electricity markets.

Various GANs have been proposed to generate financial and energy time series data. Financial time series are more challenging to model than other time series because of their high volatility and unexpected market behavior. Therefore, alternative GAN models have been proposed to overcome these challenges. For instance, one of the earliest works, presented by [20], offers QuantGANs for financial time series data generation. QuantGANs utilize temporal convolutional networks to capture long-range dependencies and can generate realistic stock price simulations employing a data-driven neural network. They can capture the temporal dependence of financial time series, including volatility clustering. [23] proposed a variant of Conditional GANs (cGANs), called Stock-GAN, to generate order flow in the limit order book. The authors showed that cGANs could generate a realistic and high-fidelity stock market. Similarly, [24] and [25] generated transaction prices in a stock market by using cGAN and illustrated the accuracy of GANs in stock markets.

Recently, the use of GANs in electricity markets has gained significant attention. [10] utilized RCGAN, TimeGAN, CWGAN, and RCWGAN for univariate electricity consumption time series data generation. In their empirical analysis, the authors showed that all four GANs could generate realistic electricity consumption for an individual. Furthermore, they showed that the GANs were stable and did not suffer from vanishing gradients. [26] employed Deep Convolutional GANs (DCGANs) to generate power profile scenarios for wind and solar power plants and energy consumption data. They showed that GANs captured the patterns of renewable energy production in both temporal and spatial dimensions under the assumption of a large number of correlated resources. [27] utilized cWGAN to generate synthetic energy consumption data, producing realistic energy consumption data that imitates the real data distribution when given labels as a condition. [28] used deep learning GANs in generating electricity consumption and fault diagnosis to develop smart management tools for heating, ventilation, and air conditioning (HVAC). The authors showed that deep learning GANs can help to increase the fault diagnosis accuracy in electricity consumption.

[29] proposed GANs as a novel method to generate realistic electrical load profiles of buildings. They showed that the load profiles generated by GANs could mirror the general load trend and the random variations of the actual loads in buildings. Furthermore, they suggested that GANs can detect changes in load profiles, anonymize smart meter data, and support grid management applications. [30] utilized GANs to quantify the climate-driven and human-system-driven uncertainties in the energy market. They revealed that climate-driven uncertainties in human systems cause higher fluctuations in load profiles. [31] also aimed to build scenarios by embedding GANs and understanding the stochastic and dynamic characteristics of renewable energy resources. They demonstrated that GANs achieved the controllable generation of renewable energy generation scenarios covering various statistical characteristics and revealed new patterns. [32] employed GANs to generate scalable and realistic energy demand. The authors claimed that GANs are promising for generating realistic energy demand data. [33] propose GANs as a potential approach for predicting large-scale building energy consumption to manage grid operations. To this end, [33] used various GANs (the original GAN, cGAN, SGAN, InfoGAN, and ACGAN) to predict large-scale building energy consumption. The authors claimed that the success of the GANs highly depends on the data size. Further, they claim that SGAN and InfoGAN are unsuitable for large-scale building electricity consumption prediction since these two GANs do not control the number of generated building samples for different building types.

Machine learning applications in Türkiye's energy market are not a new concept, and there are remarkable works in the literature. For instance, [34] used artificial neural networks (ANN) to predict and forecast energy consumption and make correct investments in Türkiye by considering economic indicators (gross national product, GNP, and gross domestic product, GDP) and population increase as independent variables. [35] used ANN to forecast electricity consumption in various sectors. Similarly, [36] modeled Türkiye's energy consumption using ANN and regression analyses to forecast projections by considering explanatory variables, such as socio-economic and demographic factors (GDP, import and export, population, and employment). [37] and [38] developed methods based on the ANN model that use GDP, population, imports, exports, building area, and number of vehicles for estimating Türkiye's future energy demand, while [39] developed forecasting models relying on ANN to predict the energy consumption in Türkiye's transportation sector. However, there is no study utilizing GANs for Türkiye's energy market.

Generative adversarial networks
As [11] introduced, GANs belong to the family of unsupervised learning algorithms. They can learn dense representations of input datasets and are utilized as generative models. The strength of GANs is their ability to generate new samples having (nearly) the same distribution as the training dataset. They contain two competing neural networks, the Generator (G) and the Discriminator (D), so the training of GANs relies on a zero-sum game. G takes as input a latent vector z drawn from a well-known distribution (e.g., a normal or uniform distribution) and produces samples, and D attempts to distinguish between samples drawn from the training data and generated data. The discriminator output D(x) corresponds to the probability that a sample belongs to the distribution underlying the training data. On the other hand, the generator output G(z) is a sample from the learned distribution. The competition between G and D is formulated as

$$\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\left[\log D(x)\right] + \mathbb{E}_{z \sim \gamma}\left[\log\left(1 - D(G(z))\right)\right],$$

where $D : \mathbb{R}^n \to [0, 1]$ and $G : \mathbb{R}^d \to \mathbb{R}^n$. Here, G is the generator function that takes random samples $z \in \mathbb{R}^d$ from a predefined distribution $\gamma$ (usually a Gaussian distribution) and generates samples G(z) [40,41]. This minimax objective illustrates the adversarial competition between the generator and discriminator. Ideally, the discriminator assigns D(x) = 1 to real samples and D(x) = 0 to generated samples, while the generator outputs a synthetic sample vector.
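As an illustration only, the standard GAN value function of [11], the expected log-probability assigned to real samples plus the expected log of one minus the probability assigned to generated samples, can be estimated from discriminator outputs as in the following minimal Python sketch. The function name `gan_value` and the probability values are hypothetical and are not taken from the study.

```python
import math

def gan_value(d_real, d_fake):
    """Monte Carlo estimate of E[log D(x)] + E[log(1 - D(G(z)))]."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term

# Hypothetical discriminator outputs (probability that a sample is real).
confident = gan_value([0.9, 0.8, 0.95], [0.1, 0.2, 0.05])  # D separates well
fooled = gan_value([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])       # D cannot tell

print(confident, fooled)
```

When the discriminator cannot distinguish real from generated samples (every output equal to 0.5), the estimate equals −log 4, the value reached at the equilibrium of the zero-sum game; a discriminator that separates the two sets well yields a larger value.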
It is important to note that when it comes to generating or predicting time series data, it is more important to determine the correct conditional distribution than to learn the joint distribution. This is because, in predictive modeling, we are concerned with identifying the conditional distribution of the future time series $x_{\text{future}} = x_{t+1:t+q}$, which refers to the following q values, given the past p observations of the time series $x_{\text{past}} = x_{t-p+1:t}$ at time t (for more information, see [42]). The GANs considered in this study are characterized by the selection of their respective loss functions for the discriminator and generator.
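The split of a series into p past observations and the q values that follow them can be sketched as below. This is a minimal illustration under assumed names (`make_windows`, `hourly_load`) and made-up values, not the preprocessing code of the study.

```python
def make_windows(series, p, q):
    """Return (past, future) pairs with p conditioning values and q targets."""
    pairs = []
    # t counts how many observations are already known when we predict.
    for t in range(p, len(series) - q + 1):
        past = series[t - p:t]      # the p most recent observations
        future = series[t:t + q]    # the q values that follow them
        pairs.append((past, future))
    return pairs

hourly_load = [10, 11, 12, 13, 14, 15, 16, 17]  # illustrative values only
pairs = make_windows(hourly_load, p=3, q=3)

for past, future in pairs:
    print(past, "->", future)
```

Each pair spans a rolling window of length p + q, and consecutive windows are shifted by one time step.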

Recurrent Conditional GAN (RCGAN)
The RCGAN shares a similar architecture with the traditional GAN but with a modification where both the generator and discriminator are replaced with recurrent neural networks (RNNs). This change enables the RCGAN to generate sequence data dependent on specific conditional inputs and to produce realistic outputs.
Let RNN(X) be the vector consisting of T outputs from an RNN that receives a sequence of T vectors $\{x_t\}_{t=1}^{T}$ ($x_t \in \mathbb{R}^d$), and let CE(a, b) denote the average cross-entropy between the sequences a and b. Then, according to [8], the discriminator and generator losses for a pair $(X_n, y_n)$, where $X_n \in \mathbb{R}^{T \times d}$ and $y_n \in \{0, 1\}^T$, can be expressed as

$$D_{\text{loss}}(X_n, y_n) = -\mathrm{CE}\left(\mathrm{RNN}_D(X_n), y_n\right),$$
$$G_{\text{loss}}(Z_n) = D_{\text{loss}}\left(\mathrm{RNN}_G(Z_n), \mathbf{1}\right) = -\mathrm{CE}\left(\mathrm{RNN}_D(\mathrm{RNN}_G(Z_n)), \mathbf{1}\right),$$

where $y_n$ is a vector consisting of ones if the sequence is real and zeros if it is fake. $Z_n$ is a sequence of T points drawn from the latent space Z, which is typically an m-dimensional Gaussian distribution. Therefore, $Z_n$ is a matrix with dimensions T × m. The vector $\mathbf{1}$ represents the decision of the discriminator accepting a given sequence as real data. During each training step, the discriminator uses both real and fake sequences.
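To make the role of the label vectors concrete, the following Python sketch evaluates a per-time-step average cross-entropy on hypothetical discriminator outputs. It is an assumption-laden illustration: the probabilities are invented, no RNN is involved, and the sketch uses the usual positive binary cross-entropy that both networks would minimize, which may differ in sign from the convention used in [8].

```python
import math

def average_bce(probs, labels):
    """Mean binary cross-entropy between per-step probabilities and 0/1 labels."""
    eps = 1e-12  # keeps log() finite if a probability reaches 0 or 1
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(probs)

T = 4
d_on_real = [0.9, 0.8, 0.85, 0.95]  # hypothetical D outputs on a real sequence
d_on_fake = [0.2, 0.1, 0.3, 0.25]   # hypothetical D outputs on a generated one

# Discriminator: real sequences are labelled with ones, fake ones with zeros.
d_loss = average_bce(d_on_real, [1] * T) + average_bce(d_on_fake, [0] * T)
# Generator: scored against the vector of ones, i.e. "accepted as real".
g_loss = average_bce(d_on_fake, [1] * T)

print(d_loss, g_loss)
```

In this example the generator loss is large because the discriminator confidently rejects the generated sequence; it shrinks as the discriminator outputs on generated data approach one.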

Time-series GAN (TimeGAN)
TimeGAN was initially introduced in the work by [9]. This approach focuses on datasets that contain both static and temporal features. Static features remain constant over time (such as gender), while temporal features change and are updated over time. The static and temporal features take values in the vector spaces $\mathcal{S}$ and $\mathcal{X}$, respectively. Let $S \in \mathcal{S}$ and $X \in \mathcal{X}$ be random vectors whose realizations are denoted by s and x, respectively. Let us consider tuples $(S, X_{1:T})$, where the joint distribution is denoted as p, and the length T of each sequence is also a random variable. In the training data, we can index individual samples using $n \in \{1, \ldots, N\}$ and denote the training dataset as $\mathcal{D} = \{(s_n, x_{n,1:T_n})\}_{n=1}^{N}$. The objective is to find the density $\hat{p}(S, X_{1:T})$ that satisfactorily approximates the real data density $p(S, X_{1:T})$ using the training dataset $\mathcal{D}$. However, achieving this task may be difficult in the traditional GAN framework. To address this issue, [9] suggests using the autoregressive decomposition $p(S, X_{1:T}) = p(S) \prod_{t} p(X_t \mid S, X_{1:t-1})$ to concentrate on the additional information given as conditionals.

TimeGAN is distinct from traditional GANs in that it comprises four neural network components: two autoencoding components, namely the embedding and recovery functions, and two adversarial components, namely the generator and discriminator. The main concept behind TimeGAN is that the autoencoding and adversarial components are trained jointly. Consequently, TimeGAN can simultaneously learn how to encode features, generate representations, and iterate across time. The embedding network creates the latent space, while the adversarial network operates within this space. By means of a supervised loss, the latent dynamics of both empirical and generated data are synchronized.

Conditional Wasserstein GAN (CWGAN) and Recurrent Conditional Wasserstein GAN (RCWGAN)
The WGAN was first presented in [2] as a solution to address the issues of mode collapse and vanishing gradients in traditional GANs. Instead of optimizing the traditional GAN loss, which is known to be prone to these issues, the WGAN optimizes the Wasserstein-1 distance. However, calculating the exact Wasserstein-1 distance is often impractical, so instead, the objective function is altered to approximate the Wasserstein-1 distance as

$$\min_G \max_{D \in \mathcal{D}} \; \mathbb{E}_{x \sim p_{\text{data}}}\left[D(x)\right] - \mathbb{E}_{z \sim \gamma}\left[D(G(z))\right],$$

where $\mathcal{D}$ is the set of 1-Lipschitz functions. More generally, if D satisfies a Lipschitz constraint with a constant k, then it can be shown that the supremum of the expected output difference of D between real and generated samples equals the Wasserstein-1 distance up to the factor k. The WGAN uses weight clipping to enforce the Lipschitz constraint, which restricts the weights of D to a compact interval such as [−c, c], where c is a small positive value (e.g., 0.01). However, this technique can limit the capacity of the discriminator. It may also cause the weights to converge to the endpoints of the interval, leading to gradient issues like vanishing or exploding gradients.

The WGAN-GP method is an improvement over the WGAN, and it addresses the drawbacks of weight clipping by using a gradient penalty technique. Weight clipping is replaced with soft enforcement of the Lipschitz constraint through a penalty on the discriminator. This penalty is based on the fact that a differentiable function is 1-Lipschitz if its gradients have a norm of at most 1 everywhere. Therefore, the new objective of the GAN is expressed as

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[D(x)\right] - \mathbb{E}_{z \sim \gamma}\left[D(G(z))\right] - \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\right)^2\right],$$

where λ is the penalty coefficient and $\hat{x}$ denotes points sampled between real and generated samples. The CWGAN is an extension of WGAN-GP that incorporates extra information into the model. This leads to a modified optimization problem that is given by

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\text{data}}}\left[D(x \mid y)\right] - \mathbb{E}_{z \sim \gamma}\left[D(G(z \mid y) \mid y)\right] - \lambda\, \mathbb{E}_{\hat{x}}\left[\left(\lVert \nabla_{\hat{x}} D(\hat{x} \mid y) \rVert_2 - 1\right)^2\right],$$

where y is the vector of additional information. The RCWGAN architecture is similar to that of the CWGAN, but instead of using conventional neural networks as the generator and discriminator, Recurrent Neural Networks (RNNs) are employed.
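The difference between weight clipping and the gradient penalty can be illustrated with a toy linear critic D(x) = w · x, whose gradient with respect to x is w at every point. The sketch below is a simplified illustration under that assumption; the function names and the coefficient value `lam=10.0` are hypothetical and are not taken from the study.

```python
import math

def clip_weights(weights, c=0.01):
    """WGAN-style weight clipping: force every weight into [-c, c]."""
    return [max(-c, min(c, w)) for w in weights]

def gradient_penalty_linear(weights, lam=10.0):
    """Penalty lam * (||grad D|| - 1)^2 for the linear critic D(x) = w . x.

    For a linear critic the gradient with respect to x is w everywhere,
    so the penalty does not depend on where it is evaluated.
    """
    grad_norm = math.sqrt(sum(w * w for w in weights))
    return lam * (grad_norm - 1.0) ** 2

clipped = clip_weights([0.5, -0.3, 0.005])
penalty_unit = gradient_penalty_linear([0.6, 0.8])   # gradient norm 1
penalty_large = gradient_penalty_linear([3.0, 4.0])  # gradient norm 5

print(clipped, penalty_unit, penalty_large)
```

Clipping pushes the two larger weights to the endpoints of the interval, which is the saturation effect described above, whereas the penalty leaves the weights free and only charges a cost that grows as the gradient norm moves away from one.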

Data and its stylized facts
The study uses the load data from Türkiye gathered from Epias in its empirical analysis. The data consists of seven years of hourly load in Türkiye's energy market. The hourly load profile over the period 01.01.2016-31.12.2022 is visualized in Figure 1. The figure shows that the load data contains inherent patterns that can be effectively leveraged through machine-learning techniques for modeling purposes. The load has a strong seasonality and a slightly increasing trend. The figure reveals a significant decrease in electricity consumption in Türkiye in the first and second quarters of 2020 compared to other years. The decline in electricity demand can be attributed to several factors, such as the interruption of production in plants, reduced work hours resulting from restrictions, and the implementation of lockdown measures in cities. While climatic conditions and industrial activities commonly influence fluctuations in total electricity demand, these regular variations cannot account for the significant decrease observed. [43] explains the sharp decline in electricity demand as a direct consequence of the crisis caused by the COVID-19 pandemic, clearly highlighting the impact of the prevailing pandemic conditions.

Table 1 summarizes the descriptive statistics of the load. The maximum load, 55575.02 MWh, could be an outlier or a peak value in the investigated period.

In Figure 2, each cell corresponds to a day of a calendar year. The color intensity of the cells in the heatmap represents the varying electricity consumption levels across days. The lighter shades indicate higher electricity consumption, while darker shades represent lower electricity consumption. By analyzing this heatmap, we can observe patterns and trends in electricity usage throughout the years. The heatmap helps us to identify peak periods of electricity consumption, such as weekdays when industrial and commercial activities are at their highest, and lower periods during weekends or holidays when there is reduced demand. Such a visualization assists in understanding energy consumption patterns, identifying potential areas for energy conservation, and optimizing electricity distribution and resource planning strategies. For instance, we can detect certain days of the week or times of the year when the load tends to be higher or lower than average. Figure 2 also reveals a significant change in the load patterns over time, such as an increase in demand in August due to changes in the climate. It also reveals a significant decrease in the electricity demand during the weekends and public holidays. Outliers in the data indicate unusual electricity consumption levels that should be investigated further. In contrast, patterns in the data could reveal important insights into energy usage.

Data preprocessing
In standard modeling and machine learning applications, deseasonalizing the data may be necessary to remove the effects of seasonality. However, deseasonalizing can make it difficult for the GAN to learn the underlying patterns and generate realistic samples, since seasonality is an essential feature of the data that we want to preserve. Hence, the data is preprocessed using robust scaling. This scaling method uses the median and the interquartile range, which makes it robust to outliers. Its formula is given as

$$x_{\text{scaled}} = \frac{x - \operatorname{median}(x)}{Q_3 - Q_1},$$

where $Q_1$ and $Q_3$ correspond to the 1st and 3rd quartiles, i.e., the 25th and 75th percentiles, respectively. Consequently, it removes the median and scales the data by the interquartile range $Q_3 - Q_1$.
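Robust scaling, i.e., subtracting the median and dividing by the interquartile range, can be sketched in a few lines of Python. This is a minimal illustration rather than the study's preprocessing code: the function name `robust_scale` and the toy values are assumptions, and the quartiles follow the "inclusive" convention of the standard library, which may differ slightly from the convention of other tools.

```python
import statistics

def robust_scale(values):
    """Subtract the median and divide by the interquartile range Q3 - Q1."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return [(v - median) / iqr for v in values]

scaled = robust_scale([1, 2, 3, 4, 5])
# Replacing the largest value by an extreme one leaves median and IQR unchanged.
scaled_with_outlier = robust_scale([1, 2, 3, 4, 100])

print(scaled)
print(scaled_with_outlier)
```

In the second call only the scaled value of the extreme observation changes, which is why the method is described as robust to outliers.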
Table 2 provides descriptive load statistics after the preprocessing. The length of the training data decreases to 8017, so the number of data points is reduced significantly. Here, the study uses only the load data of 2017 for training since the load is quite regular and similar across ordinary calendar years (see Figures 2 and 3). After scaling the data, the mean value of −0.045 lies slightly below the median of 0.000, which suggests a slight negative skew, although the value is close to zero. The standard deviation of 0.677 indicates that the variable has moderate variability. The minimum value of −1.983 and the maximum value of 1.864 give the range of the scaled load values. The quartile values indicate the distribution across the dataset, with the median (50th percentile) value of 0.000 falling at the center of the distribution.

Experimental studies
The study fixes the time series parameters p and q at 4 for the CWGAN and at 3 for the remaining GANs to learn the conditional distribution. The GANs use the conditioning time series $x_{t-p+1:t}$ as input to generate the part of the time series $x_{t+1:t+q}$, i.e., the study uses a rolling window of size p + q = 8 for the CWGAN and p + q = 6 for the others. It optimizes the GAN algorithms for a total of 1000 generator weight updates. It utilizes the Adam optimizer [44] with parameters $\beta_1 = 0$ and $\beta_2 = 0.9$ to optimize the neural network weights in the generator and discriminator and sets the learning rates to 0.001. In the RCGAN and TimeGAN cases, it applies the two time-scale update rule (TTUR) [45] and sets the learning rate to 0.003. Further, it updates the discriminator weights twice per generator weight update to improve stability. The number of epochs is 1000, with a batch size of 200 for all GANs. In the empirical analysis, the study uses the PyTorch library [46] to build the GANs. It supplies high-level building blocks for designing deep learning models. PyTorch is a tensor computation framework and an alternative to TensorFlow.

Figure 4 shows the empirical distributional properties of the GANs and the real data to compare the distributions. The figure reveals a close match between real and synthetic load data distributions. All GANs have relatively close means, skewness, and kurtosis values. The histograms of the real and synthetic datasets and their skewness and kurtosis statistics are presented in the figure to assess symmetry, tail behavior, and changes in their auto-correlation. The histograms in the first column illustrate that the distribution of hourly load from synthetic load data (orange) is nearly equivalent to that of the real load data (blue). Only the RCGAN has a positive skewness statistic, as the real data does, while the remaining GANs have negative skewness statistics.
The kurtosis statistics, in contrast, share the same sign for all GANs and the real data. The RCGAN has the closest kurtosis statistic (−0.52) to that of the real data (−0.53), while the TimeGAN kurtosis statistic (−0.16) lies considerably further away. Consequently, the TimeGAN distribution is more peaked than that of the real data. Furthermore, the histograms of the log values presented in Figure 4 reveal some discrepancies in the tails of TimeGAN. The CWGAN and RCWGAN are better than the other two GANs in generating low loads, while the TimeGAN is the worst at generating low and high loads. The RCGAN, on the other hand, is the best at generating high loads. The auto-correlations of all the GANs and the real data are relatively close. While there are some differences in the skewness and kurtosis statistics, the distributional behavior of the synthetic datasets is nearly identical to the real data distribution.

Figure 5 includes a randomly selected synthetic data realization for each GAN (orange line) to compare with the real data (blue line) and to observe the similarity in behavior between the two. The results show that the synthetic data generated by the GANs are bounded above and below, and none of the generated data explodes. However, the figure reveals that some GAN-generated data points may have larger maximum and smaller minimum values than the real data. This feature is particularly interesting for risk management analysis, such as checking whether the electricity provider can handle extreme electricity demand. Additionally, the maximum and minimum electricity consumption values generated by TimeGAN are closer to the real data than those of the other GANs. Finally, the behavior of the highlighted path closely mimics the behavior of the empirical dataset.

According to Table 3, the standard deviation of the synthetic load data generated by TimeGAN is lower than that of the load data generated by the other GANs, implying that TimeGAN produced synthetic load data that is less variable than the others. Although the table presents variations in the statistical properties, these variations are relatively small. Overall, the table provides a helpful summary of the statistical properties of the real load data and the synthetic load data generated by the GANs. However, the table does not provide enough information to compare the success of the GANs. Furthermore, the table presents key statistics for a single synthetic load realization. Hence, the comparison can change for other synthetic load realizations (see Figure 5).
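The skewness and kurtosis statistics used in this comparison can be computed from central moments, as in the following minimal Python sketch. It assumes the excess-kurtosis convention, in which a Gaussian distribution scores zero and which is consistent with the negative values reported above. The function name and the toy samples are hypothetical.

```python
def skewness_and_excess_kurtosis(values):
    """Moment-based skewness and excess kurtosis (Gaussian = 0)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n  # variance
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    return skew, excess_kurtosis

# Toy samples, not load data.
sym_skew, sym_kurt = skewness_and_excess_kurtosis([1, 2, 3, 4, 5])
right_skew, _ = skewness_and_excess_kurtosis([1, 1, 1, 2, 10])

print(sym_skew, sym_kurt, right_skew)
```

Under this convention, a value closer to zero from below, such as the −0.16 reported for TimeGAN against −0.53 for the real data, corresponds to a more peaked and less flat distribution.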

Conclusion
In the realm of power grid regulations, precise simulation and prediction of load have become increasingly essential. Therefore, load modeling has been extensively researched using various methods, including regression-based and artificial intelligence (AI) modeling techniques. Over the last decade, AI models have gained significant attention due to their ability to model load without requiring detailed building and environmental parameters. Two primary approaches to AI modeling exist, deep learning and traditional machine learning, which rely heavily on real-time recorded load data. It is undeniable that time-dependent recorded load data serves as a critical source of information for energy market participants.
Having representative and diverse training load data is crucial to achieving good performance from AI models. However, obtaining such data can be challenging, costly, and time-consuming. In cases where there is insufficient load data or the sampling of load data deviates from the observed data distribution, the accuracy of model predictions can be significantly affected. As a result, energy suppliers may experience substantial trading losses, and energy sources may be overused, leading to more significant problems in the long run. Therefore, this paper proposes using GANs for synthetic load data generation. Specifically, it utilizes the RCGAN, TimeGAN, CWGAN, and RCWGAN models in real-world applications, achieving state-of-the-art results for synthetic load data generation. The findings suggest that GANs can be utilized to address data privacy concerns and enhance load modeling efficiency for grid modeling. As shown in Figure 4, the CWGAN and RCWGAN models perform relatively better than the other GANs, with the ability to capture values in the tails. In contrast, the TimeGAN model is unsuccessful in capturing tail values, resulting in a more peaked distribution than the real data distribution.
Future work can explore enhancing GAN efficiency by providing additional information during the training process. The study shows that GANs effectively generate synthetic data for load modeling, enabling risk management and analysis of various scenarios. The significance of this study lies in demonstrating the effectiveness of GANs for modeling electricity consumption patterns, allowing non-academic researchers and institutions to make informed decisions and develop strategies for energy-related challenges.
The results of our study demonstrate that the GANs proposed in this paper can effectively generate synthetic data for load modeling. Therefore, the generated data can be combined with existing empirical demand data to address risk management issues such as extreme demand tests, optimal timing of maintenance for wind turbines, energy efficiency assessments for buildings, and profitability analysis for demand- or time-dependent pricing strategies.
In essence, GANs offer a valuable solution for generating synthetic electricity consumption data, enabling non-academic researchers and institutions to gain insights, conduct simulations, and develop innovative approaches. This study opens up new possibilities for leveraging advanced AI techniques to improve energy management and contribute to a more sustainable and efficient energy future.

List of abbreviations
Not applicable.

Figure 3
Figure 3 contains distributions of yearly, daily, and monthly electricity consumption data. It provides a visual representation of the distribution of electricity consumption across different periods. The first row illustrates the yearly electricity consumption distributions, showing the distribution of electricity consumption for each year independently. The box in the middle of each plot represents the interquartile range (IQR) of the electricity consumption data, with the median value marked as a line in the box. The whiskers extending from the box represent the range of the data, excluding any outliers. The second row illustrates the distributional properties of electricity consumption across the days of the week, with the x-axis representing the day of the week and the y-axis representing the electricity consumption level. It reveals no significant change in the electricity consumption across business days and a slight decrease during the weekends. The third row shows the distributions of electricity consumption for the months of the year. It shows that there is an increase in electricity consumption during summers, as in Figure 2, which is most probably due to the cooling needs in the summer. All three rows show that there are outliers in the data.

Figure 4. The generated and original load are compared in terms of their marginal distributions using a linear scale (1st column) and a log-plot (2nd column), and in terms of the auto-correlation fit with real load data.

Figure 5. Synthetic load trajectories (orange and gray) generated by GANs and observed load (blue).

... summarizes the descriptive ... 42 MWh, which represents the third quartile of the load data distribution.

Table 1. Descriptive statistics of load in Türkiye energy market

Figure 2 presents the average daily load profile over seven years, 01.01.2016-31.12.2022. The figure shows the variation in electricity consumption at daily, weekly, monthly, and yearly scales. The panels are arranged sequentially starting from 2016 such that each panel represents a specific year; the top panel corresponds to daily electricity consumption in 2016, the second panel from the top illustrates daily electricity consumption in 2017, and finally, the bottom panel presents daily electricity consumption in 2022. The seven rows in each year are the days of the week, Monday through Sunday, and lines separate the months.

Table 2. Descriptive statistics of the load after preprocessing

Table 3 provides key statistics for the real electricity consumption data and the synthetic data generated by RCGAN, TimeGAN, CWGAN, and RCWGAN. The first column lists the statistical measure of interest: the mean, standard deviation, maximum, and minimum. The subsequent columns show the corresponding values for the real data and each GAN. The mean of the real historical electricity consumption data is −0.0453, while the mean of the synthetic data generated by RCWGAN is −0.04467. This suggests that RCWGAN can generate synthetic data that closely resembles the statistical properties of the real data.

Table 3. Key statistics of real and synthetic electricity consumption data