Research Article

Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling

Volume: 14 Issue: 4, 1 December 2024

Abstract

Machine learning is a powerful decision support tool for analyzing and evaluating real-life data. It aims to produce new solutions and improve performance, which ties it closely to the field of data science, and data lie at the core of that relationship. How effectively meaningful insights can be drawn from data depends on the quality of the model's training. To improve training, both the variety of combinations among the data and the total number of samples in the dataset should be increased. However, insufficient data access, legal regulations, ethical rules, confidentiality procedures, privacy, data-sharing restrictions, and cost are obstacles. Synthetic data generation is a basic step in data science for overcoming these obstacles, improving functionality, and enabling strong machine-learning inferences. This study therefore proposes a new synthetic data generation approach consisting of three basic stages. In the first stage, synthetic data whose distribution is similar to that of the original data are produced with a modified ABC (Artificial Bee Colony) optimization algorithm. In the second stage, the category information of the independent variables is determined through a statistical evaluation, based on regression methods, of the artificial data produced. In the third stage, the efficiency and applicability of the artificial data are evaluated with supervised machine learning classifiers. The evaluation shows that the proposed approach improves the performance of machine learning classifiers in proportion to the increasing number of data. The decision tree algorithm, which performed best, achieved success rates of 100%, 92.5%, 100%, 85%, and 66% on five separate enriched datasets, respectively.
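As a rough illustration of the first stage only, the sketch below runs a textbook Artificial Bee Colony search in which each food source is a candidate synthetic column and the cost is the gap between its mean and standard deviation and those of the original column. The abstract does not describe the author's modifications, so everything here is an assumption made for illustration: the cost function `distribution_gap`, the function name `abc_synthesize`, all parameter values, and the sample data. It should not be read as the method used in the paper.

```python
import random
import statistics


def distribution_gap(candidate, original):
    """Assumed cost: distance between the two columns' mean and std."""
    return (abs(statistics.fmean(candidate) - statistics.fmean(original))
            + abs(statistics.pstdev(candidate) - statistics.pstdev(original)))


def abc_synthesize(original, n_synthetic=20, n_sources=10,
                   n_cycles=200, limit=20, seed=0):
    """Textbook ABC loop (not the paper's modified variant)."""
    rng = random.Random(seed)
    lo, hi = min(original), max(original)

    def random_source():
        return [rng.uniform(lo, hi) for _ in range(n_synthetic)]

    sources = [random_source() for _ in range(n_sources)]
    costs = [distribution_gap(s, original) for s in sources]
    trials = [0] * n_sources
    i_best = min(range(n_sources), key=costs.__getitem__)
    best, best_cost = list(sources[i_best]), costs[i_best]

    def try_neighbour(i):
        # Standard ABC move on one dimension j: v = x_ij + phi * (x_ij - x_kj)
        k = rng.choice([s for s in range(n_sources) if s != i])
        j = rng.randrange(n_synthetic)
        phi = rng.uniform(-1.0, 1.0)
        candidate = list(sources[i])
        moved = candidate[j] + phi * (candidate[j] - sources[k][j])
        candidate[j] = min(hi, max(lo, moved))  # stay inside the observed range
        cost = distribution_gap(candidate, original)
        if cost < costs[i]:  # greedy selection
            sources[i], costs[i], trials[i] = candidate, cost, 0
        else:
            trials[i] += 1

    for _ in range(n_cycles):
        for i in range(n_sources):  # employed bee phase
            try_neighbour(i)
        weights = [1.0 / (1.0 + c) for c in costs]  # onlooker bee phase
        for i in rng.choices(range(n_sources), weights=weights, k=n_sources):
            try_neighbour(i)
        i_best = min(range(n_sources), key=costs.__getitem__)
        if costs[i_best] < best_cost:  # remember the best source so far
            best, best_cost = list(sources[i_best]), costs[i_best]
        for i in range(n_sources):  # scout bee phase
            if trials[i] > limit:
                sources[i] = random_source()
                costs[i] = distribution_gap(sources[i], original)
                trials[i] = 0
    return best, best_cost


# Hypothetical feature column; the values are made up for illustration.
original = [4.9, 5.1, 5.4, 5.8, 6.0, 6.3, 6.7, 7.1]
synthetic, gap = abc_synthesize(original)
print(len(synthetic), round(gap, 4))
```

Matching only the mean and standard deviation is a deliberately simple stand-in for "similar to the distribution of the original data"; a fuller treatment would compare the distributions more closely and handle several features jointly.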

Details

Primary Language

English

Subjects

Software Engineering (Other)

Section

Research Article

Publication Date

1 December 2024

Submission Date

4 June 2024

Acceptance Date

12 September 2024

Published in Issue

Year 2024 Volume: 14 Issue: 4

Cite

APA
Akalın, F. (2024). Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling. Journal of the Institute of Science and Technology, 14(4), 1408-1431. https://doi.org/10.21597/jist.1495455
AMA
1. Akalın F. Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling. Iğdır Üniv. Fen Bil Enst. Der. 2024;14(4):1408-1431. doi:10.21597/jist.1495455
Chicago
Akalın, Fatma. 2024. “Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling”. Journal of the Institute of Science and Technology 14 (4): 1408-31. https://doi.org/10.21597/jist.1495455.
EndNote
Akalın F (01 December 2024) Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling. Journal of the Institute of Science and Technology 14 4 1408–1431.
IEEE
[1] F. Akalın, “Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling”, Iğdır Üniv. Fen Bil Enst. Der., vol. 14, no. 4, pp. 1408–1431, Dec. 2024, doi: 10.21597/jist.1495455.
ISNAD
Akalın, Fatma. “Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling”. Journal of the Institute of Science and Technology 14/4 (01 December 2024): 1408-1431. https://doi.org/10.21597/jist.1495455.
JAMA
1. Akalın F. Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling. Iğdır Üniv. Fen Bil Enst. Der. 2024;14:1408–1431.
MLA
Akalın, Fatma. “Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling”. Journal of the Institute of Science and Technology, vol. 14, no. 4, December 2024, pp. 1408-31, doi:10.21597/jist.1495455.
Vancouver
1. Fatma Akalın. Synthetic Data Generation with Modified Artificial Bee Colony Optimization Algorithm and Statistical Modeling. Iğdır Üniv. Fen Bil Enst. Der. 01 December 2024;14(4):1408-31. doi:10.21597/jist.1495455