Research Article

Comparison of Feature Extraction Methods in High Dimensional Time Series

Year 2024, Volume: 39, Issue: 4, 991-997, 25.12.2024
https://doi.org/10.21605/cukurovaumfd.1606090

Abstract

Applying machine learning to high-dimensional datasets increases the workload. Before any prediction is made, the most meaningful data points in the dataset must therefore be identified; this step is critical for improving model performance. For this reason, five feature selection methods commonly used in the literature (Mutual Information, Principal Component Analysis, Chi-square, Information Gain, and Variance Thresholding) were tested on a 14,400-feature dataset obtained with a previously proposed system for determining the sand, silt, and clay ratios in soil. The performance of the five methods is compared in terms of R-squared (R²) and Mean Absolute Error (MAE). The best results were obtained with Information Gain for sand (R² = 0.44), Chi-square for silt (R² = 0.17), and Variance Thresholding for clay (R² = 0.61).
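The kind of comparison described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the author's pipeline: the data are synthetic, and the Ridge regressor, the train/test split, the variance threshold, and the number of retained features are all assumptions chosen only to keep the example small. Three of the five methods are shown using scikit-learn; Chi-square and Information Gain are left out to keep the sketch short.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.feature_selection import (
    SelectKBest,
    VarianceThreshold,
    mutual_info_regression,
)
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Synthetic stand-in data (assumption): 200 samples, 300 features with
# different scales; only the first 5 features drive the target.
rng = np.random.default_rng(0)
scales = rng.uniform(0.5, 2.0, size=300)
X = rng.normal(size=(200, 300)) * scales
coef = np.array([3.0, -2.0, 1.5, 1.0, -1.0])
y = X[:, :5] @ coef + rng.normal(scale=0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)


def mi_score(X, y):
    # Fix the seed so the nearest-neighbour MI estimate is repeatable.
    return mutual_info_regression(X, y, random_state=0)


# Each entry reduces the feature space before the same regressor is fitted.
reducers = {
    "variance_threshold": VarianceThreshold(threshold=1.0),
    "mutual_information": SelectKBest(score_func=mi_score, k=20),
    "pca": PCA(n_components=20, random_state=0),
}

results = {}
for name, reducer in reducers.items():
    model = make_pipeline(reducer, Ridge(alpha=1.0))
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "mae": mean_absolute_error(y_test, pred),
    }

for name, scores in results.items():
    print(f"{name}: R2={scores['r2']:.3f}  MAE={scores['mae']:.3f}")
```

Fitting each reducer inside a pipeline means the selection step only ever sees the training split, so the reported R² and MAE are computed on data that played no part in choosing the features.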

Details

Primary Language: English
Subjects: Artificial Life and Complex Adaptive Systems
Section: Articles
Authors

Emre Kılınç 0000-0002-5250-9322

Publication Date: 25 December 2024
Submission Date: 19 August 2024
Acceptance Date: 23 December 2024
Published in Issue: Year 2024, Volume: 39, Issue: 4

Cite

APA Kılınç, E. (2024). Comparison of Feature Extraction Methods in High Dimensional Time Series. Çukurova Üniversitesi Mühendislik Fakültesi Dergisi, 39(4), 991-997. https://doi.org/10.21605/cukurovaumfd.1606090