TY  - JOUR
T1  - K-Mean Clustering of Holstein Friesian Dairy Cattle using Genomic Breeding Values
TT  - K-Mean Clustering of Holstein Friesian Dairy Cattle using Genomic Breeding Values
AU  - Önder, Hasan
AU  - Hoşgönül, Buğra
PY  - 2025
DA  - January
Y2  - 2025
DO  - 10.34248/bsengineering.1601851
JF  - Black Sea Journal of Engineering and Science
JO  - BSJ Eng. Sci.
PB  - Karyay Karadeniz Yayımcılık Ve Organizasyon Ticaret Limited Şirketi
WT  - DergiPark
SN  - 2619-8991
SP  - 263
EP  - 267
VL  - 8
IS  - 1
LA  - en
AB  - Clustering refers to algorithms to uncover such clusters in unlabeled data. Data points belonging to the same cluster exhibit similar features, whereas data points from different clusters are dissimilar to each other. The identification of such clusters leads to segmentation of data points into a number of distinct groups. In this study it was aimed to classify the 492 Holstein Friesian dairy cattle with determining the optimum number of clusters using the genomic breeding values (GBVs) calculated with 13250 SNPs using GBLUP for milk yield (kg), milk fat (%), milk protein (%), milk lactose (%), and milk dry matter (%). Results showed that the optimum number cluster was determined as two for the genomic breeding values. Determining the most appropriate number of clusters, it provides great convenience in the selection of breeding animals after determining the animals that can provide optimum efficiency in the herd or the animals that need to be eliminated from the existing herd. As a result, it can be said that the k-means method can be used successfully in clustering animals for genomic breeding values, but for this, at first, the optimum number of clusters must be determined.
KW  - K-mean clustering
KW  - Breeding value
KW  - Genomic selection
KW  - Dairy cattle
N2  - Clustering refers to algorithms to uncover such clusters in unlabeled data. Data points belonging to the same cluster exhibit similar features, whereas data points from different clusters are dissimilar to each other. The identification of such clusters leads to segmentation of data points into a number of distinct groups. In this study it was aimed to classify the 492 Holstein Friesian dairy cattle with determining the optimum number of clusters using the genomic breeding values (GBVs) calculated with 13250 SNPs using GBLUP for milk yield (kg), milk fat (%), milk protein (%), milk lactose (%), and milk dry matter (%). Results showed that the optimum number cluster was determined as two for the genomic breeding values. Determining the most appropriate number of clusters, it provides great convenience in the selection of breeding animals after determining the animals that can provide optimum efficiency in the herd or the animals that need to be eliminated from the existing herd. As a result, it can be said that the k-means method can be used successfully in clustering animals for genomic breeding values, but for this, at first, the optimum number of clusters must be determined.
CR  - Cebeci Z, Yıldız F, Kayaalp GT. 2015. Choosing an optimal k in k-means clustering. 2. Ulusal Yönetim Bilişim Sistemleri Kongresi, October 8-10, Erzurum, Türkiye, pp: 231-242.
CR  - Çolak B, Durdağ Z, Erdoğmuş P. 2015. Automatic clustering with k-means. El-Cezeri J Sci Eng, 3(2): 315-323.
CR  - Doğan İ. 2002. Selection by Cluster Analysis. Turk J Vet Anim Sci, 26: 47-53.
CR  - Frades I, Matthiesen R. 2010. Overview on techniques in cluster analysis. In: Matthiesen R (eds) Bioinformatics Methods in Clinical Research. Methods in Molecular Biology, vol 593. Humana Press. https://doi.org/10.1007/978-1-60327-194-3_5
CR  - Janos T, Natasa F, Marton S. 2021. Determining the type of Limousin candidate bulls by cluster analysis. Nat Resour Sust Devel, 11(1): 113-120.
CR  - Immink KAS, Cai K, Weber JH. 2018. Dynamic threshold detection based on Pearson distance detection. IEEE Transact Commun, 66(7): 2958-2965.
CR  - Kodinariya TM, Makwana PR. 2013. Review on determining number of cluster in k-means clustering. Int J Adv Res Comput Sci Manag Stud, 1(6): 90-95.
CR  - Kurnaz B, Önder H. 2021. Distance based regression models. II. International Applied Statistics Conference, June 29 – July 2, Tokat, Türkiye, pp: 120-126.
CR  - Na S, Xumin L, Yong G. 2010. Research on k-means clustering algorithm: An improved k-means clustering algorithm. Third International Symposium on Intelligent Information Technology and Security Informatics, April 22, Jian, China, pp: 63-67.
CR  - Önder H, Sitskowska B, Kurnaz B, Piwczynski D, Kolenda M, Sen U, Tırınk C, Çanga Boğa D. 2023. Multi-trait single-step genomic prediction for milk yield and milk components for Polish Holstein population. Animals, 13: 3070. https://doi.org/10.3390/ani13193070
UR  - https://doi.org/10.34248/bsengineering.1601851
L1  - https://dergipark.org.tr/en/download/article-file/4442077
ER  - 