CLASSIFICATION OF SHORT-TEXTS BY UTILIZING AN EXTERNAL KNOWLEDGE SOURCE
Year 2017,
Volume: 19 Issue: 57, pp. 693-700, 01.09.2017
Mert Calısan
C. Okan Sakar
Abstract
Traditional text representation methods have been successfully applied to normal-length documents in many applications. However, when these methods are used to represent short texts in short-text classification tasks, they may degrade classifier performance, especially when the training set is small. In this paper, we propose a method based on generating new feature representations of short texts by utilizing a knowledge base and using these representations together with the traditional feature representation for classification. The experimental results show that using the proposed and traditional representations together improves the overall accuracy of the classifier.
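The abstract describes the approach only at a high level. The following is a minimal sketch of the general idea under stated assumptions, not the authors' implementation:

- A tiny hypothetical dictionary, `KNOWLEDGE_BASE`, stands in for the external knowledge source.
- Each word is mapped to concept features, and these concept features form the "new" representation.
- The concept view and the traditional bag-of-words view are used together by concatenating their feature spaces.
- The classifier is a from-scratch multinomial Naive Bayes, chosen only for brevity.

The knowledge source, concept mapping, combination strategy and classifier used in the paper may all differ.

```python
import math
from collections import Counter, defaultdict

# HYPOTHETICAL knowledge source: a toy word -> concepts mapping standing in
# for an external resource such as an encyclopedia. Not from the paper.
KNOWLEDGE_BASE = {
    "striker": ["Sport"],
    "goal": ["Sport"],
    "league": ["Sport"],
    "inflation": ["Economy"],
    "bank": ["Economy"],
    "shares": ["Economy"],
}


def word_view(text):
    """Traditional bag-of-words view: term counts, prefixed with 'w:'."""
    return Counter("w:" + tok for tok in text.lower().split())


def concept_view(text):
    """Knowledge-based view: counts of the concepts linked to each token."""
    concepts = Counter()
    for tok in text.lower().split():
        for concept in KNOWLEDGE_BASE.get(tok, []):
            concepts["c:" + concept] += 1
    return concepts


def combined_view(text):
    """Use both views together by concatenating their feature spaces."""
    return word_view(text) + concept_view(text)


class MultinomialNaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self, featurize):
        self.featurize = featurize

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = self.featurize(doc)
            self.feature_counts[label].update(feats)
            self.vocab.update(feats)
        return self

    def scores(self, doc):
        """Unnormalized log-posterior for every class."""
        feats = self.featurize(doc)
        n_docs = sum(self.class_counts.values())
        out = {}
        for label, count in self.class_counts.items():
            total = sum(self.feature_counts[label].values())
            score = math.log(count / n_docs)
            for feat, n in feats.items():
                if feat in self.vocab:  # features unseen in training carry no evidence
                    p = (self.feature_counts[label][feat] + 1) / (total + len(self.vocab))
                    score += n * math.log(p)
            out[label] = score
        return out

    def predict(self, doc):
        s = self.scores(doc)
        return max(s, key=s.get)


train_docs = ["the striker scored", "inflation rose again"]
train_labels = ["sport", "economy"]

words_only = MultinomialNaiveBayes(word_view).fit(train_docs, train_labels)
both_views = MultinomialNaiveBayes(combined_view).fit(train_docs, train_labels)

test = "bank shares"  # neither word occurs in the training texts
print(words_only.scores(test))   # identical scores: no usable evidence
print(both_views.predict(test))  # economy
```

Neither `bank` nor `shares` occurs in the training texts. As a result, the word view alone gives both classes the same score. The concept view, by contrast, maps both words to a concept already seen during training. This mirrors the sparsity problem the abstract attributes to short texts with small training sets.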