Research Article

A collective learning approach for semi-supervised data classification

Year 2018, Volume: 24 Issue: 5, 864 - 869, 12.10.2018

Abstract

Semi-supervised data classification is an important field of study in machine learning and data mining because it deals with datasets that contain a small number of labeled and a large number of unlabeled instances. Since most real-life datasets have this property, many researchers are interested in this area. In this paper, a collective method is proposed for solving semi-supervised data classification problems. To illustrate the topic, datasets defined in R1 were generated and the proposed algorithms were applied to them. The well-known WEKA machine learning tool was used for comparison with state-of-the-art techniques. The experiments were carried out on real-life datasets from the UCI dataset repository. Evaluation results obtained with ten-fold cross-validation are presented in tables.


A collective learning approach for semi-supervised data classification

Year 2018, Volume: 24 Issue: 5, 864 - 869, 12.10.2018

Abstract

Semi-supervised data classification is a significant field of study in machine learning and data mining since it deals with datasets that contain both a few labeled and many unlabeled instances. Researchers are interested in this field because most real-life datasets have this property. In this paper we propose a collective method for solving semi-supervised data classification problems. Examples defined in R1 are presented and solved to gain a clear understanding of the approach. The well-known machine learning tool WEKA is used for comparison with state-of-the-art methods. Experiments are carried out on real-world datasets from the UCI dataset repository. Results are reported in tables as testing accuracies obtained with ten-fold cross-validation.
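
The abstract describes estimating testing accuracy with ten-fold cross-validation on UCI datasets that contain few labeled and many unlabeled points. The following is a minimal sketch of such an experimental setup, not the paper's collective algorithm: it assumes scikit-learn is available, uses a plain self-training loop around Gaussian naive Bayes as a stand-in semi-supervised method, and uses a bundled dataset (load_breast_cancer) as a stand-in for a UCI dataset.

import numpy as np
from sklearn.datasets import load_breast_cancer          # stand-in for a UCI dataset
from sklearn.model_selection import StratifiedKFold
from sklearn.naive_bayes import GaussianNB
from sklearn.metrics import accuracy_score

def self_train(X_lab, y_lab, X_pool, threshold=0.95, max_iter=10):
    # Plain self-training: repeatedly fit on the labeled set and absorb
    # unlabeled points whose predicted class probability exceeds the threshold.
    model = GaussianNB()
    X_cur, y_cur = X_lab.copy(), y_lab.copy()
    for _ in range(max_iter):
        model.fit(X_cur, y_cur)
        if len(X_pool) == 0:
            break
        proba = model.predict_proba(X_pool)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        pseudo = model.classes_[proba[confident].argmax(axis=1)]
        X_cur = np.vstack([X_cur, X_pool[confident]])
        y_cur = np.concatenate([y_cur, pseudo])
        X_pool = X_pool[~confident]
    return model

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.RandomState(0)
scores = []
for train_idx, test_idx in StratifiedKFold(n_splits=10, shuffle=True, random_state=0).split(X, y):
    X_tr, y_tr = X[train_idx], y[train_idx]
    labeled = rng.rand(len(y_tr)) < 0.10      # keep ~10% of training labels; the rest is the unlabeled pool
    model = self_train(X_tr[labeled], y_tr[labeled], X_tr[~labeled])
    scores.append(accuracy_score(y[test_idx], model.predict(X[test_idx])))

print("10-fold CV accuracy: %.3f +/- %.3f" % (np.mean(scores), np.std(scores)))

Masking labels only inside each training fold keeps the test fold fully labeled, so the reported accuracies remain directly comparable to supervised baselines such as the WEKA classifiers used for comparison.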

References

  • Zhu X. “Semi-Supervised Learning Literature Survey”. University of Wisconsin, Madison, United States, Technical Report, 1530, 2008.
  • Hajmohammadi MS, Ibrahim R, Selamat A, Fujita H. “Combination of active learning and self-training for cross-lingual sentiment classification with density analysis of unlabelled samples”. Information Sciences, 317, 67-77, 2015.
  • Chinaei L. Active Learning with Semi-Supervised Support Vector Machines. MSc Thesis, University of Waterloo, Ontario, Canada, 2007.
  • Kang P, Kim D, Cho S. “Semi-supervised support vector regression based on self-training with label uncertainty: An application to virtual metrology in semiconductor manufacturing”. Expert Systems with Applications, 51, 85-106, 2016.
  • Bruzzone L, Chi M, Marconcini M. “A novel transductive SVM for semisupervised classification of remote-sensing images”. IEEE Transactions on Geoscience and Remote Sensing, 44(11), 3363-3373, 2006.
  • Ordin B. “Nonsmooth optimization algorithm for semi-supervised data classification”. Dynamics of Continuous, Discrete and Impulsive Systems Series B: Applications & Algorithms, 17, 741-749, 2010.
  • Zhou Z, Li M. “Semisupervised regression with cotraining-style algorithms”. IEEE Transactions on Knowledge and Data Engineering, 19(11), 1479-1493, 2007.
  • Zha ZJ, Mei T, Wang J, Wang Z, Hua XS. “Graph-based semi-supervised learning with multiple labels”. Journal of Visual Communication and Image Representation, 20(2), 97-103, 2009.
  • Alpaydın E. Introduction to Machine Learning. 2nd ed. Cambridge, Massachusetts, London, England, The MIT Press, 2010.
  • Frank E, Hall MA, Witten IH. Data Mining: Practical Machine Learning Tools and Techniques. 4th ed. San Francisco, Morgan Kaufmann, 2016.
  • Bagirov AM, Rubinov AM, Soukhoroukova NV, Yearwood J. “Unsupervised and supervised data classification via nonsmooth and global optimization”. Top, 11(1), 1-75, 2003.
  • Bagirov AM, Mardaneh K. “Modified Global K-Means Algorithm for Clustering in Gene Expression Data Sets”. Workshop on Intelligent Systems for Bioinformatics 2006 (WISB 2006), Hobart, Australia, 4-9 December, 2006.
  • Rish I. “An empirical study of the naive Bayes classifier”. IJCAI Workshop on Empirical Methods in Artificial Intelligence, 223, 41-46, 2001.
  • Kiang MY. “A comparative assessment of classification methods”. Decision Support Systems, 35, 441-454, 2003.
  • Press SJ, Wilson S. “Choosing between logistic regression and discriminant analysis”. Journal of the American Statistical Association, 73(364), 699-705, 1978.
  • Frank E, Wang Y, Inglis S, Holmes G, Witten IH. “Using model trees for classification”. Machine Learning, 32(1), 63-76, 1998.
  • Kohavi R. “The power of decision tables”. 8th European Conference on Machine Learning, Heraclion, Crete, Greece, 25-27 April 1995.
  • Quinlan R. C4.5: Programs for Machine Learning. San Mateo, CA, Morgan Kaufmann Publishers,1993.
  • Kohavi R. “A study of cross-validation and bootstrap for accuracy estimation and model selection”. Fourteenth International Joint Conference on Artificial Intelligence, Montreal, Quebec, Canada, 20-25 August 1995.
  • Garner SR. “Weka: The waikato environment for knowledge analysis”. Second New Zealand Computer Science Research Students Conference, Waikato, Hamilton, New Zealand, 18-21 April, 1995.
  • Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B. “Learning with local and global consistency”. Max Planck Institute for Biological Cybernetics, Tübingen, Germany, Technical Report, 112, June 2003.
  • Driessens K, Reutemann P, Pfahringer B, Leschi C. “Using weighted nearest neighbor to benefit from unlabeled data”. 10th Pacific-Asia Conference on Knowledge Discovery and Data Mining, Singapore, 9-12 April 2006.
  • Laorden C, Sanz B, Santos I, Galan-Garcia P, Bringas PG. “Collective classification for spam filtering”. 4th International Conference on Computational Intelligence in Security for Information Systems (CISIS), Spain, 8-10 June 2011.
  • Lichman M. “UCI Machine Learning Repository”. http://archive.ics.uci.edu/ml/datasets.html (01.05.2017).
There are 24 citations in total.

Details

Primary Language English
Subjects Engineering
Journal Section Research Article
Authors

Nur Uylaş Satı 0000-0003-1553-9466

Publication Date October 12, 2018
Published in Issue Year 2018 Volume: 24 Issue: 5

Cite

APA Uylaş Satı, N. (2018). A collective learning approach for semi-supervised data classification. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 24(5), 864-869.
AMA Uylaş Satı N. A collective learning approach for semi-supervised data classification. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. October 2018;24(5):864-869.
Chicago Uylaş Satı, Nur. “A Collective Learning Approach for Semi-Supervised Data Classification”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24, no. 5 (October 2018): 864-69.
EndNote Uylaş Satı N (October 1, 2018) A collective learning approach for semi-supervised data classification. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24 5 864–869.
IEEE N. Uylaş Satı, “A collective learning approach for semi-supervised data classification”, Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 24, no. 5, pp. 864–869, 2018.
ISNAD Uylaş Satı, Nur. “A Collective Learning Approach for Semi-Supervised Data Classification”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 24/5 (October 2018), 864-869.
JAMA Uylaş Satı N. A collective learning approach for semi-supervised data classification. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018;24:864–869.
MLA Uylaş Satı, Nur. “A Collective Learning Approach for Semi-Supervised Data Classification”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 24, no. 5, 2018, pp. 864-9.
Vancouver Uylaş Satı N. A collective learning approach for semi-supervised data classification. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2018;24(5):864-9.





Creative Commons License
This journal is licensed under a Creative Commons Attribution 4.0 International License.