Evaluation of Group Homogeneity in Gaussian Mixture Models Using Combined Cluster and Discriminant Analysis
Year: 2017, Volume: 2, Issue: 1, pp. 121-132, 01.07.2017
Ezgi Nazman, Semra Erbaş
Abstract
Cluster analysis is widely used both in data mining, as an unsupervised learning method, and in statistics, as a multivariate method that reveals the natural groups underlying a data set. However, determining the number of homogeneous groups is difficult for finite mixture models, which provide a natural representation of heterogeneity, because of pairwise overlap between components. In this study, Gaussian mixture models, one family of finite mixture models, are considered in terms of group homogeneity. For this purpose, combined cluster and linear discriminant analysis is compared with combined cluster and quadratic discriminant analysis in order to evaluate the correct classification rates of the Gaussian mixture components and to determine whether further division of the components is necessary to obtain homogeneous groups. The comparison is carried out through a simulation study covering 81 different scenarios, and an illustrative example is presented.
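The comparison of correct classification rates under linear and quadratic discriminant analysis can be illustrated with a small sketch. It is not the authors' procedure: the group labels come directly from the simulation rather than from a clustering step, and the group means, standard deviations, sample sizes, and function names (mean_cov, gaussian_score, classification_rate, simulate) are all assumptions chosen for illustration. The sketch draws two bivariate Gaussian groups with unequal spread, then computes the resubstitution rate of correct classification once with a pooled covariance (linear rule) and once with group-specific covariances (quadratic rule).

```python
import math
import random


def mean_cov(points):
    """Return the mean vector and 2x2 sample covariance of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def gaussian_score(x, mean, cov, prior):
    """Log of (prior * Gaussian density), up to an additive constant."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy ** 2
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    # Squared Mahalanobis distance using the closed-form 2x2 inverse.
    maha = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return math.log(prior) - 0.5 * math.log(det) - 0.5 * maha


def classification_rate(groups, quadratic):
    """Resubstitution rate of correct classification for the linear or quadratic rule."""
    n_total = sum(len(g) for g in groups)
    stats = [mean_cov(g) for g in groups]
    if not quadratic:
        # Linear rule: every group shares the pooled within-group covariance.
        pooled = tuple(
            sum((len(g) - 1) * s[1][k] for g, s in zip(groups, stats))
            / (n_total - len(groups))
            for k in range(3)
        )
        stats = [(m, pooled) for m, _ in stats]
    correct = 0
    for label, g in enumerate(groups):
        for x in g:
            # Priors are estimated by the group proportions.
            scores = [gaussian_score(x, m, c, len(h) / n_total)
                      for (m, c), h in zip(stats, groups)]
            if scores.index(max(scores)) == label:
                correct += 1
    return correct / n_total


def simulate(n, mean, sd, rng):
    """Draw n points from a spherical bivariate Gaussian (illustrative only)."""
    return [(rng.gauss(mean[0], sd), rng.gauss(mean[1], sd)) for _ in range(n)]


rng = random.Random(1)
# Hypothetical scenario: two overlapping groups with very different spread.
groups = [simulate(300, (0.0, 0.0), 1.0, rng),
          simulate(300, (1.0, 1.0), 4.0, rng)]
lda_rate = classification_rate(groups, quadratic=False)
qda_rate = classification_rate(groups, quadratic=True)
print(f"Linear rule, correct classification rate:    {lda_rate:.3f}")
print(f"Quadratic rule, correct classification rate: {qda_rate:.3f}")
```

When the group covariances differ as strongly as in this assumed scenario, the quadratic rule would be expected to classify more points correctly than the linear rule. With similar covariances the two rates should be close.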