Firms that specialize in hotel bookings typically hold large numbers of hotels in their databases, each described by hundreds of features. To extract meaningful insights from such data, it is vital to use appropriate machine learning techniques for segmenting the hotels into meaningful groups and for identifying their most important features. In this study, hotel data from the firm Setur were used for clustering, dimensionality reduction and feature selection analysis. First, the hotels were clustered with the KMeans clustering algorithm according to the similarity of their features. To assess the effect of dimensionality reduction on the clustering of the hotel data, PCA (Principal Component Analysis) was applied to the data and KMeans was then run on the transformed data, so that the clustering results obtained with and without PCA could be compared. After that, multivariate and univariate feature selection techniques were applied to the clustered hotel data to identify the features that most influence the clustering. The Random Forest algorithm was used as the multivariate feature selection technique. For the univariate technique, the SelectKBest algorithm with the chi2 score function was used as a filter-based feature selection method.
Machine Learning, KMeans Clustering, Principal Component Analysis, Elbow Method, Random Forest
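The pipeline described in the abstract can be sketched with scikit-learn as follows. This is a minimal illustration, not the authors' implementation: the Setur data are not available here, so the feature matrix is synthetic, and the number of clusters, the number of principal components, the number of selected features and the use of the adjusted Rand index to compare the two clusterings are all assumptions made for the example.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.metrics import adjusted_rand_score
from sklearn.preprocessing import MinMaxScaler

# Synthetic stand-in for the hotel feature matrix (hypothetical data,
# 300 hotels with 20 binary features; not the Setur dataset).
rng = np.random.default_rng(0)
n_hotels, n_features = 300, 20
X_raw = rng.integers(0, 2, size=(n_hotels, n_features)).astype(float)
groups = rng.integers(0, 3, size=n_hotels)
for g in range(3):
    # Let the first three features carry some group structure.
    X_raw[:, g] = (groups == g).astype(float)

# Scale to [0, 1]; chi2 requires non-negative feature values.
X = MinMaxScaler().fit_transform(X_raw)

# Elbow method: record the KMeans inertia for a range of cluster counts.
inertias = {
    k: KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).inertia_
    for k in range(2, 8)
}

# Assumed cluster count for this sketch (in practice read off the elbow plot).
k = 3

# KMeans on the original features.
labels_plain = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)

# KMeans on the PCA-reduced features (5 components is an assumption).
X_pca = PCA(n_components=5, random_state=0).fit_transform(X)
labels_pca = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_pca)

# One possible way to compare the two clusterings (our choice, not the paper's).
agreement = adjusted_rand_score(labels_plain, labels_pca)

# Multivariate selection: Random Forest trained to predict the cluster labels,
# with features ranked by impurity-based importance.
forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X, labels_plain)
rf_top = np.argsort(forest.feature_importances_)[::-1][:5]

# Univariate (filter-based) selection: SelectKBest with the chi2 score function.
selector = SelectKBest(score_func=chi2, k=5).fit(X, labels_plain)
chi2_top = np.flatnonzero(selector.get_support())

print("ARI between clusterings:", round(agreement, 3))
print("Random Forest top features:", rf_top.tolist())
print("chi2 top features:", chi2_top.tolist())
```

Using the cluster labels as the target for both selectors mirrors the idea in the abstract of finding the features that affect the clustering; comparing `rf_top` with `chi2_top` shows where the multivariate and univariate rankings agree.
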
Primary Language | English |
---|---|
Subjects | Artificial Intelligence |
Section | Research Articles |
Authors | |
Publication Date | June 15, 2021 |
Submission Date | February 22, 2021 |
Acceptance Date | March 20, 2021 |
Published in Issue | Year 2021 Volume: 2 Issue: 1 |