Prioritization of software development demands with text mining techniques

Murat Can Tekin; Volkan Tunalı

Research Article

Year 2019, Volume: 25 Issue: 5, 615 - 620, 21.10.2019

Abstract

In corporate companies, software bugs and change requests are usually
forwarded to the Information Technology (IT) unit through a demand management
system. The priority information recorded in this system is critically
important to the IT unit. However, the priority decision, which is left to the
discretion of the person entering the demand, is not always realistic. For
example, a non-critical, low-priority change request may be entered as high
priority, which can result in faulty planning and customer dissatisfaction. In
this study, internal customer demands were classified with text mining methods
in an attempt to predict their level of importance. Records taken from the
demand management system of a corporate company were used to train and test
the system. After cleaning and preprocessing the raw textual demand data, the
TF-IDF (Term Frequency – Inverse Document Frequency) weighting method was used
to build the document-term matrix. Various classification algorithms were
tested on the resulting dataset, and the highest performance, an F-Score of
54.1%, was obtained with the Sequential Minimal Optimization algorithm. In
addition, on the dataset whose classes were balanced through oversampling, the
highest performance, an F-Score of 74.5%, was reached with the Random Forest
algorithm.
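
The TF-IDF weighting step mentioned in the abstract can be illustrated with a minimal sketch. It assumes raw term counts for TF, log(N / df) for IDF, and whitespace tokenization; the abstract does not state which TF-IDF variant or tokenizer the study used, and the function name `tfidf_matrix` and the sample demand texts are hypothetical.

```python
import math
from collections import Counter


def tfidf_matrix(documents):
    """Build a document-term matrix with TF-IDF weights.

    Assumption: TF is the raw term count and IDF is log(N / df).
    Other variants (smoothed IDF, normalized TF) are equally common.
    """
    tokenized = [doc.lower().split() for doc in documents]
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n_docs / doc_freq[term]) for term in vocabulary}
    matrix = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # One row per document, one column per vocabulary term.
        matrix.append([counts[term] * idf[term] for term in vocabulary])
    return vocabulary, matrix


# Hypothetical demand texts, invented for illustration only.
demands = [
    "report screen error",
    "add new report field",
    "login error urgent",
]
vocabulary, matrix = tfidf_matrix(demands)
```

Terms that occur in fewer documents receive a larger weight, so in this toy input "screen" (one document) outweighs "error" (two documents) in the first row.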

Keywords

Software engineering, Demand prioritization, Machine learning, Text classification, Random forest

Prioritization of software development demands with text mining techniques

Abstract

In corporations, software issues and software change demands are forwarded to
the Information Technology (IT) unit via a demand management system. The
priority information in this system is of critical importance to the IT unit.
However, the priority decision, which is left to the individuals who create
the demand records, may not always be realistic. For instance, a non-critical,
low-priority demand may be created with the highest priority, and this may
lead to faulty planning and eventually to customer dissatisfaction. In this
work, internal customer demands were classified using text mining techniques
and their priorities were predicted. The system was trained and tested with
records extracted from the demand management system of a corporation. After
cleaning and preprocessing the raw textual demand data, the TF-IDF (Term
Frequency – Inverse Document Frequency) weighting scheme was used to create
the document-term matrix. Several classification algorithms were tested on the
resulting dataset, and the highest performance was obtained by the Sequential
Minimal Optimization algorithm with an F-Score of 54.1%. In addition, on the
dataset balanced with an oversampling technique, the highest performance was
achieved by the Random Forest algorithm with an F-Score of 74.5%.
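
The class-balancing and evaluation steps can be sketched as follows. This is only an illustration under stated assumptions: it uses plain random oversampling (duplicating minority-class records) and a macro-averaged F-Score, while the abstract names neither the oversampling variant nor the averaging scheme. The function names and the sample records are hypothetical. The abstract also does not say whether oversampling was applied before or after the train/test split; in general, duplicating records before splitting can place copies of the same record in both sets.

```python
import random
from collections import Counter


def random_oversample(samples, labels, seed=0):
    """Duplicate randomly chosen samples of smaller classes until every
    class is as large as the largest one (assumed variant)."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    target = max(len(items) for items in by_class.values())
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        extra = [rng.choice(items) for _ in range(target - len(items))]
        for sample in items + extra:
            out_samples.append(sample)
            out_labels.append(label)
    return out_samples, out_labels


def macro_f_score(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (assumed averaging)."""
    scores = []
    for cls in sorted(set(y_true)):
        tp = sum(t == cls and p == cls for t, p in zip(y_true, y_pred))
        fp = sum(t != cls and p == cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical imbalanced demand records: four low-priority, one high.
samples = ["d1", "d2", "d3", "d4", "d5"]
labels = ["low", "low", "low", "low", "high"]
balanced_samples, balanced_labels = random_oversample(samples, labels)
class_counts = Counter(balanced_labels)

# Hypothetical predictions, used only to exercise the metric.
score = macro_f_score(["high", "low", "low", "low"],
                      ["high", "high", "low", "low"])
```

After oversampling, both classes in the toy data contain four records each; a classifier would then be trained on the balanced set and scored with the F-Score.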

Keywords

Software engineering, Demand prioritization, Machine learning, Text classification, Random forest

Details

Primary Language	Turkish
Subjects	Engineering
Section	Article
Authors	Murat Can Tekin, Volkan Tunalı
Publication Date	October 21, 2019
Published in Issue	Year 2019 Volume: 25 Issue: 5

Cite

APA	Tekin, M. C., & Tunalı, V. (2019). Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 25(5), 615-620.
AMA	Tekin MC, Tunalı V. Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. October 2019;25(5):615-620.
Chicago	Tekin, Murat Can, and Volkan Tunalı. “Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 25, no. 5 (October 2019): 615-20.
EndNote	Tekin MC, Tunalı V (October 1, 2019) Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 25 5 615–620.
IEEE	M. C. Tekin and V. Tunalı, “Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi”, Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 25, no. 5, pp. 615–620, 2019.
ISNAD	Tekin, Murat Can - Tunalı, Volkan. “Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi 25/5 (October 2019), 615-620.
JAMA	Tekin MC, Tunalı V. Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2019;25:615–620.
MLA	Tekin, Murat Can, and Volkan Tunalı. “Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi”. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, vol. 25, no. 5, 2019, pp. 615-20.
Vancouver	Tekin MC, Tunalı V. Yazılım geliştirme taleplerinin metin madenciliği yöntemleriyle önceliklendirilmesi. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi. 2019;25(5):615-20.