Research Article

Design and Implementation of a Big Data Reduction Algorithm Based on Artificial Neural Networks and K-Means

Year 2021, Volume 9, Issue 6, 329 - 342, 31.12.2021
https://doi.org/10.29130/dubited.1014161

Abstract

The main challenge in big data reduction is preserving the homogeneity of the data set and its ability to represent the problem space. When these properties are not preserved, the computational complexity of modeling studies on big data sets cannot be reduced sufficiently, and the stability and accuracy of the resulting model fall significantly below those of a model developed on the original data set. The aim of this study is to develop a data reduction algorithm that works stably and effectively on big data sets. To this end, a hybrid algorithm was developed that consists of an artificial neural network (ANN) based problem modeling module and a K-means based data reduction module. The problem modeling module defines performance threshold values for the big data set. These thresholds make it possible to analyze how well the original data set and the reduced data sets represent the problem space, and how stable they are. The K-means module groups the data space into K clusters and, for each cluster, gradually reduces the data (observations) with reference to the cluster center. Thus, while the K-means module performs the reduction, the ANN module tests the performance of the reduced data sets and checks whether they meet the performance thresholds. Three different data sets from the UCI Machine Learning Repository were used to test and validate the developed hybrid data reduction algorithm. The experimental results were analyzed statistically. According to the analysis, data reduction of between 30% and 40% was achieved on big data sets without loss of stability or performance.
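The reduce-then-check loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several choices are assumptions made only for illustration: a 1-nearest-neighbour regressor stands in for the ANN module, the performance threshold is taken as the baseline error plus a hypothetical 10% tolerance, and within each cluster the observations farthest from the centroid are dropped first (the abstract does not say which direction the authors use). All function names and parameters are hypothetical.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means on lists of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, l in zip(points, new_labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


def reduce_by_clusters(X, y, k, rate, seed=0):
    """Drop a fraction `rate` of each cluster's observations.

    Assumption: the observations farthest from the centroid are dropped first.
    """
    centroids, labels = kmeans(X, k, seed=seed)
    keep = []
    for c in range(k):
        idx = [i for i, l in enumerate(labels) if l == c]
        idx.sort(key=lambda i: math.dist(X[i], centroids[c]))
        n_keep = len(idx) - int(len(idx) * rate)
        keep.extend(idx[:n_keep])
    keep.sort()  # preserve the original observation order
    return [X[i] for i in keep], [y[i] for i in keep]


def model_error(X_train, y_train, X_test, y_test):
    """Stand-in for the ANN module: mean absolute error of a 1-NN regressor."""
    total = 0.0
    for x, target in zip(X_test, y_test):
        j = min(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
        total += abs(y_train[j] - target)
    return total / len(X_test)


def largest_safe_reduction(X, y, X_test, y_test, k=3, tolerance=0.10,
                           rates=(0.1, 0.2, 0.3, 0.4, 0.5)):
    """Raise the reduction rate step by step until the threshold is violated."""
    baseline = model_error(X, y, X_test, y_test)
    threshold = baseline * (1 + tolerance)  # assumed form of the threshold
    best_rate, best_data = 0.0, (X, y)
    for rate in rates:
        Xr, yr = reduce_by_clusters(X, y, k, rate)
        if model_error(Xr, yr, X_test, y_test) > threshold:
            break
        best_rate, best_data = rate, (Xr, yr)
    return best_rate, best_data, baseline
```

Calling `largest_safe_reduction` with a training set and a held-out test set returns the highest tested rate whose reduced data set still meets the threshold, together with that reduced data set and the baseline error. Swapping `model_error` for a trained neural network would bring the sketch closer to the hybrid design the abstract describes.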

Supporting Institution

Supported under the TÜBİTAK 2209-A program

Project Number

5207

References

  • [1] HT. Kahraman, "A novel and powerful hybrid classifier method: Development and testing of heuristic k-nn algorithm with fuzzy distance metric," Data & Knowledge Engineering, vol. 103, pp. 44-59, 2016.
  • [2] HT. Kahraman, B. Aras, and O. Yıldız, "Sınıflandırma Problemleri İçin Agde-Tabanlı Meta-Sezgisel Boyut İndirgeme Algoritmasının Geliştirilmesi," Mühendislik Bilimleri ve Tasarım Dergisi, vol. 8, no. 5, pp. 206-217, 2020.
  • [3] F. Arslan and HT. Kahraman, "Yapay zekâ tabanlı büyük veri yönetim aracı," Journal of Investigations on Engineering and Technology, vol. 2, no. 1, pp. 8-21, 2019.
  • [4] Ö. Köroğlu and HT. Kahraman, "K-Ortalamalar Tabanlı En Etkili Meta-Sezgisel Kümeleme Algoritmasının Araştırılması," Mühendislik Bilimleri ve Tasarım Dergisi, vol. 8, no. 5, pp. 173-184, 2020.
  • [5] N. Gokilavani and B. Bharathi, "Test case prioritization to examine software for fault detection using PCA extraction and K-means clustering with ranking," Soft Computing, vol. 25, no. 7, pp. 5163-5172, 2021.
  • [6] M. Sivaguru and M. Punniyamoorthy, "Performance-enhanced rough k-means clustering algorithm," Soft Computing, vol. 25, no. 2, pp. 1595-1616, 2021.
  • [7] Z. Wang, Y. Zhou, and G. Li, "Anomaly Detection by Using Streaming K-Means and Batch K-Means," 2020 5th IEEE International Conference on Big Data Analytics (ICBDA). IEEE, vol. 5, pp. 11-17, 2020.
  • [8] Y. Li and H. Wu, "A clustering method based on K-means algorithm," Physics Procedia, vol. 25, pp. 1104-1109, 2012.
  • [9] CU. Kumari, SJ. Prasad, and G. Mounika, "Leaf disease detection: feature extraction with K-means clustering and classification with ANN," 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC). IEEE, pp. 1095-1098, 2019.
  • [10] VP. Murugesan and P. Murugesan, "A new initialization and performance measure for the rough k-means clustering," Soft Computing, vol. 24, no. 15, pp. 11605-11619, 2020.
  • [11] OJ. Oyelade, OO. Oladipupo, and IC. Obagbuwa, "Application of k Means Clustering algorithm for prediction of Students Academic Performance," International Journal of Computer Science and Information Security (IJCSIS), vol. 7, no. 1, pp. 292-295, 2010.
  • [12] M. Yedla, SR. Pathakota, and TM. Srinivasa, "Enhancing K-means clustering algorithm with improved initial center," International Journal of Computer Science and Information Technologies, vol. 1, no. 2, pp. 121-125, 2010.
  • [13] BP. Koustubh, VV. Nair, and S. Kumaravel, "Anomaly Detection in Hybrid Electric Vehicles Using ANN Based Support Vector Data Description," 2018 International Conference on Power, Energy, Control and Transmission Systems (ICPECTS). IEEE, pp. 14-24, 2018.
  • [14] A. Pannu, "Artificial intelligence and its application in different areas," Artificial Intelligence, vol. 4, no. 10, pp. 79-84, 2015.
  • [15] N. Kayarvizhy, S. Kanmani, and RV. Uthariaraj, "ANN models optimized using swarm intelligence algorithms," WSEAS Transactions on Computers, vol. 13, no. 45, pp. 501-519, 2014.
  • [16] L. Cavallaro, "Artificial neural networks training acceleration through network science strategies," Soft Computing, vol. 24, no. 23, pp. 17787-17795, 2020.
  • [17] H. Yaşar, "A novel approach for estimation of coronary artery calcium score class using ANN and body mass index, age and gender data," 2018 4th International Conference on Computer and Technology Applications (ICCTA). IEEE, pp. 184-187, 2018.
  • [18] J. Xu, "ANN based on IncCond algorithm for MPP tracker," 2011 Sixth International Conference on Bio-Inspired Computing: Theories and Applications. IEEE, pp. 129-134, 2011.
  • [19] S. Akhmedova and E. Semenkin, "Co-operation of biology related algorithms meta-heuristic in ANN-based classifiers design," 2014 IEEE Congress on Evolutionary Computation (CEC). IEEE, pp. 2207-2214, 2014.
  • [20] S. Anitha and M. Vanitha, "Optimal Artificial Neural Network based Data Mining Technique for Stress Prediction in Working Employees," Soft Computing, vol. 25, no. 17, pp. 11421-11428, 2021.
  • [21] T. Srivastava. (October 20, 2014). How does Artificial Neural Network (ANN) algorithm work? [Online]. Available: https://www.analyticsvidhya.com/blog/2014/10/ann-work-simplified/
  • [22] C. Yilmaz, HT. Kahraman, and S. Söyler, "Passive mine detection and classification method based on hybrid model," IEEE Access, vol. 6, pp. 47870-47888, 2018.
  • [23] R. Bayindir, I. Colak, S. Sagiroglu, and HT. Kahraman, "Application of adaptive artificial neural network method to model the excitation currents of synchronous motors," IEEE, vol. 2, pp. 498-502, 2012.
  • [24] A. Radhika and MS. Masood, "Effective dimensionality reduction by using soft computing method in data mining techniques," Soft Computing, vol. 25, no. 6, pp. 4643-4651, 2021.
  • [25] T. Karin and D. Mondial, "Data Reduction and Deep-Learning Based Recovery for Geospatial Visualization and Satellite Imagery," 2020 IEEE International Conference on Big Data (Big Data). IEEE, vol. 16, no. 3, pp. 439-454, 2020.
  • [26] SL. Wong, BY. Ooi, and SY. Liew, "Data Reduction with Real-Time Critical Data Forwarding for Internet-of-Things," 2019 International Conference on Green and Human Information Technology (ICGHIT). IEEE, pp. 1-6, 2019.
  • [27] A. Moitra, NO. Malott, and PA. Wilsey, "Cluster-based data reduction for persistent homology," 2018 IEEE International Conference on Big Data (Big Data). IEEE, pp. 327-334, 2018.
  • [28] D. Dua and C. Graff. (2019). UCI Machine Learning Repository [Online]. Available: http://archive.ics.uci.edu/ml
  • [29] T. Athanasios and A. Xifara, "Accurate quantitative estimation of energy performance of residential buildings using statistical machine learning tools," Energy and Buildings, vol. 49, pp. 560-567, 2012.
  • [30] IC. Yeh, "Modeling of strength of high-performance concrete using artificial neural networks," Cement and Concrete Research, pp. 1797-1808, 1998.
  • [31] T. Athanasios, "Accurate telemonitoring of Parkinson’s disease progression by non-invasive speech tests," Nature Precedings, pp. 1-1, 2009.
  • [32] H. Kaya, P. Tüfekci, and E. Uzun, "Predicting CO and NOx emissions from gas turbines: novel data and a benchmark PEMS," Turkish Journal of Electrical Engineering & Computer Sciences, vol. 27, no. 6, pp. 4783-4796, 2019.
  • [33] P. Tüfekci, "Prediction of full load electrical power output of a base load operated combined cycle power plant using machine learning methods," International Journal of Electrical Power & Energy Systems, vol. 60, pp. 126-140, 2014.
  • [34] B. Rafael, EG. Paredes, and R. Pajarola, "Sobol tensor trains for global sensitivity analysis," Reliability Engineering & System Safety, vol. 183, pp. 311-322, 2019.
There are 34 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors:

Hamdi Kahraman (ORCID 0000-0001-9985-6324)

Seyithan Temel (ORCID 0000-0003-4157-8426)

Project Number: 5207
Publication Date: December 31, 2021
Published in Issue: Year 2021

Cite

APA Kahraman, H., & Temel, S. (2021). Yapay Sinir Ağları ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı ve Uygulaması. Duzce University Journal of Science and Technology, 9(6), 329-342. https://doi.org/10.29130/dubited.1014161
AMA Kahraman H, Temel S. Yapay Sinir Ağları ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı ve Uygulaması. DÜBİTED. December 2021;9(6):329-342. doi:10.29130/dubited.1014161
Chicago Kahraman, Hamdi, and Seyithan Temel. “Yapay Sinir Ağları Ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı Ve Uygulaması”. Duzce University Journal of Science and Technology 9, no. 6 (December 2021): 329-42. https://doi.org/10.29130/dubited.1014161.
EndNote Kahraman H, Temel S (December 1, 2021) Yapay Sinir Ağları ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı ve Uygulaması. Duzce University Journal of Science and Technology 9 6 329–342.
IEEE H. Kahraman and S. Temel, “Yapay Sinir Ağları ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı ve Uygulaması”, DÜBİTED, vol. 9, no. 6, pp. 329–342, 2021, doi: 10.29130/dubited.1014161.
ISNAD Kahraman, Hamdi - Temel, Seyithan. “Yapay Sinir Ağları Ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı Ve Uygulaması”. Duzce University Journal of Science and Technology 9/6 (December 2021), 329-342. https://doi.org/10.29130/dubited.1014161.
JAMA Kahraman H, Temel S. Yapay Sinir Ağları ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı ve Uygulaması. DÜBİTED. 2021;9:329–342.
MLA Kahraman, Hamdi and Seyithan Temel. “Yapay Sinir Ağları Ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı Ve Uygulaması”. Duzce University Journal of Science and Technology, vol. 9, no. 6, 2021, pp. 329-42, doi:10.29130/dubited.1014161.
Vancouver Kahraman H, Temel S. Yapay Sinir Ağları ve K-Ortalamalar Tabanlı Büyük Veri Azaltma Algoritmasının Tasarımı ve Uygulaması. DÜBİTED. 2021;9(6):329-42.