COMPONENT ANALYSIS FOR INTERESTING PATTERN DETECTION IN MULTI-VARIABLE DATA SETS

Ahmet Yücel

doi:10.33461/uybisbbd.802938

Research Article

Year 2021, pp. 1-11, 30.06.2021

Abstract

In recent years, great advances have been made in the handling of data, which has become the new power source of our age. Thanks to new methods and techniques at both the coding and mechanical levels, tremendous speeds have been achieved in transferring, storing, and processing data. As a result of these developments, storing even the smallest piece of information on digital platforms has become a natural part of daily life. From family photos to health histories, from commercial records to academic publications, from a comment shared on Twitter to a video shared on YouTube, data of varying sizes is stored instantly in almost every field. Revealing the interesting patterns and information hidden in stored data is the main goal of data mining. In data mining studies, the sheer size of the data is one of the biggest problems encountered. With large-scale data, these problems include the long time needed to structure the data and the bottlenecks that may occur when a model built afterward is executed. Many dimension reduction algorithms have been developed to overcome the problems arising from large data sizes. In this study, a new dimension reduction approach is developed for multivariate data. In general terms, the approach consists of pattern recognition steps based on Principal Component Analysis (PCA). The resulting models were applied to disjoint and balanced sub-datasets, and all produced significant results at the 0.05 significance level. The explanatory performance of the models lies in the range [0.819, 0.888] on the multiple R-Square scale and in the range [0.804, 0.878] on the R-Square scale.
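To make the general idea concrete, the following is a minimal sketch of PCA-based dimension reduction followed by a regression on the reduced data, evaluated on disjoint, equal-sized subsets. It is not the procedure developed in the study, whose details are not given in this abstract. The synthetic data, the use of a single component, the power-iteration solver, the four-way split, and all function names are assumptions made purely for illustration. The second score reported is the adjusted R-Square, which is also an assumption about what a second R-Square scale could look like.

```python
import math
import random


def center(rows):
    """Subtract each column mean so that every variable has mean zero."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(centered):
    """Sample covariance matrix (p x p) of already-centered rows."""
    n, p = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(p)]
            for i in range(p)]


def first_component(cov, iters=500):
    """Leading eigenvector of cov (unit length), found by power iteration."""
    p = len(cov)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def r_squared(x, y, n_predictors=1):
    """Fit y = a + b*x by least squares; return (R^2, adjusted R^2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - n_predictors - 1)
    return r2, adj


def pca_regression(X, y):
    """Project X onto its first principal component, then regress y on it."""
    Xc = center(X)
    pc1 = first_component(covariance(Xc))
    scores = [sum(a * b for a, b in zip(row, pc1)) for row in Xc]
    return pc1, r_squared(scores, y)


# Synthetic data (assumption): three correlated predictors driven by one
# latent factor, and a response that depends on the same factor.
random.seed(0)
latent = [random.gauss(0, 1) for _ in range(200)]
X = [[t + random.gauss(0, 0.1),
      2 * t + random.gauss(0, 0.1),
      -t + random.gauss(0, 0.1)] for t in latent]
y = [3 * t + random.gauss(0, 0.5) for t in latent]

# Split into k disjoint, equal-sized subsets and fit one model per subset.
k = 4
results = []
for part in range(k):
    idx = range(part, len(X), k)
    pc1, (r2, adj_r2) = pca_regression([X[i] for i in idx],
                                       [y[i] for i in idx])
    results.append((r2, adj_r2))
    print(f"subset {part}: R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")
```

Here three correlated variables are replaced by one component score per observation, so each regression uses a single predictor instead of three. In practice one would usually keep several components, chosen by explained variance, and rely on a linear algebra library rather than hand-written power iteration.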

Keywords

Principal Component Analysis, Pattern Recognition, Multivariate Data Analysis

Turkish title: ÇOK DEĞİŞKENLİ VERİ KÜMELERİNDE İLGİNÇ ÖRÜNTÜ TESPİTİ İÇİN BİLEŞEN ANALİZİ

There are 26 references in total.

Details

Primary Language	English
Subjects	Computer Software
Section	Articles
Authors	Ahmet Yücel 0000-0002-2364-9449
Publication Date	June 30, 2021
Published in Issue	Year 2021

Cite

APA	Yücel, A. (2021). COMPONENT ANALYSIS FOR INTERESTING PATTERN DETECTION IN MULTI-VARIABLE DATA SETS. International Journal of Management Information Systems and Computer Science, 5(1), 1-11. https://doi.org/10.33461/uybisbbd.802938

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.