A Turkish natural language processing based method for classifying R&D projects
Year 2023, 1375 - 1388, 06.01.2023
Serdar Kocak, Yusuf Tansel İç, Mustafa Sert, Berna Dengiz
Abstract
Natural language processing, text mining, and deep learning methods are used in many sectors to extract desired information from data held in text form. With the recent growth in the number of R&D projects and the diversification of project activity areas, difficulties in determining the research areas to which R&D projects belong, and in identifying reviewers suited to those areas, can adversely affect project support processes. In this article, to classify R&D projects, the data in the database on which the study was carried out were first cleaned. Then, using the “Word2Vec” word representation method, one of the natural language processing techniques, together with an automatic feature learning approach, an attempt was made to build Convolutional Neural Network (CNN) models to classify the features. The evaluation results of experiments on a dataset of R&D projects and articles with R&D project content, selected from the TÜBİTAK Dergipark site and with known classes, were compared with other classical algorithms. Across many performance metrics, CNNs with Word2Vec models in particular were shown to produce more effective results.
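The pipeline summarized in the abstract (clean the text, map words to vectors, convolve over word windows, classify) can be illustrated with a minimal forward-pass sketch in plain Python. This is not the authors' implementation: the two-dimensional vectors in `TOY_VECTORS` are hand-written stand-ins for trained Word2Vec embeddings, the filters and output weights are set by hand rather than learned, and the class labels `software` and `mechanical` are hypothetical.

```python
import math
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simple cleaning step)."""
    return re.findall(r"\w+", text.lower())

def embed(tokens, vectors, dim):
    """Look up each token; unknown words map to a zero vector."""
    return [vectors.get(t, [0.0] * dim) for t in tokens]

def conv_max_pool(matrix, kernel, bias=0.0):
    """Slide one filter over consecutive word windows, apply ReLU, keep the maximum."""
    width = len(kernel)
    best = 0.0  # ReLU outputs are never below zero
    for start in range(len(matrix) - width + 1):
        total = bias
        for offset in range(width):
            total += sum(w * x for w, x in zip(kernel[offset], matrix[start + offset]))
        best = max(best, max(0.0, total))
    return best

def softmax(scores):
    """Turn raw class scores into probabilities that sum to one."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    norm = sum(exps)
    return [e / norm for e in exps]

def classify(text, vectors, dim, kernels, weights, labels):
    """Embed the text, extract one pooled feature per filter, score each class."""
    matrix = embed(tokenize(text), vectors, dim)
    features = [conv_max_pool(matrix, k) for k in kernels]
    scores = [sum(w * f for w, f in zip(row, features)) for row in weights]
    probs = softmax(scores)
    return labels[probs.index(max(probs))], probs

# Hypothetical stand-ins for trained Word2Vec vectors (assumption, not from the paper).
TOY_VECTORS = {
    "derin": [1.0, 0.0], "öğrenme": [1.0, 0.0],
    "motor": [0.0, 1.0], "tasarımı": [0.0, 1.0],
}
# Two hand-set filters of width 2, one per toy topic (a trained CNN would learn these).
TOY_KERNELS = [
    [[1.0, 0.0], [1.0, 0.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
TOY_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]  # hand-set output layer
TOY_LABELS = ["software", "mechanical"]  # hypothetical class names

label, probs = classify("Derin öğrenme modeli", TOY_VECTORS, 2,
                        TOY_KERNELS, TOY_WEIGHTS, TOY_LABELS)
print(label, probs)
```

In a real setting the embeddings would come from a Word2Vec model trained on the project texts, and the filters and output weights would be learned with a deep learning framework. The sketch only shows how word vectors, windowed convolution, and max pooling combine into a class prediction.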