Research Article
Year 2019, Volume: 8 Issue: 4, 1349 - 1362, 24.12.2019
https://doi.org/10.17798/bitlisfen.531221

Text Classification Using Tf-Idf and Eigen Decomposition on Weighted Graphs

Abstract

Text and sentence classification problems are currently the subject of intensive study. One of the main difficulties in text classification is that the texts to be classified are unstructured. Texts that do not follow a fixed format must first be preprocessed. In this study, a preprocessing tool named KUSH (Karci-Uçkan-Seyyarer-Hark) was developed to preprocess the texts to be classified. A graph-based mathematical approach is then presented for classifying the processed texts. The study uses the TTC-3600 dataset, which contains texts from 6 well-known news portals in Turkey covering 6 different domains. After the texts pass through several preprocessing steps that take their Tf (Term Frequency) and Idf (Inverse Document Frequency) values into account, a weighted graph consisting of nodes and edges is constructed. Using weighted graphs increased the effectiveness and mathematical efficiency of the classification. The Laplacian matrix is obtained from the adjacency matrix and the degree matrix that represent the resulting graph. The texts are then classified using the eigenvalues and eigenvectors obtained from the eigendecomposition of the Laplacian matrix. The tests show that the classification reaches a notable level of accuracy.
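
The pipeline outlined in the abstract (Tf-Idf weighting, a weighted graph, the Laplacian built from the adjacency and degree matrices, then eigendecomposition) can be illustrated with a minimal sketch. Several details are assumptions, since the abstract does not specify them: graph nodes are taken to be documents, edge weights are cosine similarities between Tf-Idf vectors, the Idf variant is log(N / df), and a two-way split by the sign of the second eigenvector (the Fiedler vector) stands in for the classification step. The four-document corpus and all function names are hypothetical; this is not the KUSH tool or the TTC-3600 data.

```python
import numpy as np

def tfidf_matrix(docs):
    """Return (matrix, vocabulary) with one Tf-Idf row per tokenized document."""
    vocab = sorted({term for doc in docs for term in doc})
    index = {term: j for j, term in enumerate(vocab)}
    counts = np.zeros((len(docs), len(vocab)))
    for i, doc in enumerate(docs):
        for term in doc:
            counts[i, index[term]] += 1
    # Term frequency: raw counts normalized by document length.
    tf = counts / counts.sum(axis=1, keepdims=True)
    # Inverse document frequency, assumed variant: log(N / df).
    df = (counts > 0).sum(axis=0)
    idf = np.log(len(docs) / df)
    return tf * idf, vocab

def cosine_similarity(x):
    """Pairwise cosine similarity between the rows of x."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # guard against all-zero rows
    unit = x / norms
    return unit @ unit.T

def laplacian_spectrum(weights):
    """Build L = D - A from a symmetric weight matrix and eigendecompose it."""
    adjacency = weights.copy()
    np.fill_diagonal(adjacency, 0.0)         # drop self-loops
    degree = np.diag(adjacency.sum(axis=1))  # weighted degree matrix
    laplacian = degree - adjacency
    # eigh is meant for symmetric matrices and returns ascending eigenvalues.
    eigenvalues, eigenvectors = np.linalg.eigh(laplacian)
    return laplacian, eigenvalues, eigenvectors

# Hypothetical toy corpus: two sports texts and two economy texts.
# The shared term "para" keeps the document graph connected.
docs = [
    ["futbol", "maç", "gol"],
    ["futbol", "gol", "transfer", "para"],
    ["borsa", "dolar", "faiz", "para"],
    ["borsa", "faiz", "enflasyon"],
]

tfidf, vocab = tfidf_matrix(docs)
weights = cosine_similarity(tfidf)
laplacian, eigenvalues, eigenvectors = laplacian_spectrum(weights)

# The eigenvector of the second-smallest eigenvalue (Fiedler vector)
# splits the nodes into two groups by sign.
fiedler = eigenvectors[:, 1]
labels = (fiedler > 0).astype(int)
print(labels)
```

The sign of an eigenvector is arbitrary, so which group receives label 0 or 1 may vary; only the grouping itself is meaningful. A setting with more than two classes, such as the six categories mentioned above, would need more eigenvectors and a further grouping step, which this sketch does not attempt.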

Details

Primary Language Turkish
Journal Section Research Article
Authors

Taner Uçkan

Cengiz Hark

Ebubekir Seyyarer

Ali Karcı

Publication Date December 24, 2019
Submission Date February 22, 2019
Acceptance Date July 1, 2019
Published in Issue Year 2019 Volume: 8 Issue: 4

Cite

IEEE T. Uçkan, C. Hark, E. Seyyarer, and A. Karcı, “Ağırlıklandırılmış Çizgelerde Tf-Idf ve Eigen Ayrışımı Kullanarak Metin Sınıflandırma”, Bitlis Eren Üniversitesi Fen Bilimleri Dergisi, vol. 8, no. 4, pp. 1349–1362, 2019, doi: 10.17798/bitlisfen.531221.
