Genetik Algoritma İle Öznitelik Seçimi Yapılarak Yazılım Projelerinin Maliyet Tahmini

Şükran Ebren Kara; Rüya Şamlı

doi:10.31590/ejosat.994372

Research Article

Year 2021, Issue 27, 985 - 994, 30.11.2021

Abstract

Being able to estimate the cost of a software project early in the development cycle is very important for the project manager. Unforeseen uncertainties in a project cause problems for the project manager in terms of both time and cost. Estimating software cost accurately minimizes such problems. The literature shows that many different methods have been used to estimate the cost of software projects. The aim of this study is to investigate how feature selection on the data sets, performed with Genetic Algorithms (one of the frequently used methods among these), affects the cost estimation of software projects.
In this study, 8 different Machine Learning algorithms available in the WEKA (Waikato Environment for Knowledge Analysis) environment and an Evolutionary Algorithm, Genetic Programming, were run with default settings in two ways for the cost estimation of software projects. First, software cost estimation was performed by applying the Machine Learning algorithms to the raw data sets (Albrecht, Finnish, Kemerer, Maxwell and Miyazaki94) obtained from the PROMISE (Predictor Models in Software Engineering) data repository, without any feature selection. Second, feature selection was first carried out by applying a Genetic Algorithm to the data sets. After the feature subset had been extracted, the Machine Learning algorithms were applied to the reduced data sets and software cost estimation was performed. The algorithms were tested with the 10-fold cross-validation technique, and the results were evaluated using the error measures MAE (mean absolute error) and RAE (relative absolute error) together with the correlation coefficient. When the findings were compared and the performance values analyzed, the estimation results obtained from the data sets with Genetic Algorithm feature selection were found to be better than those obtained without feature selection.
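The three evaluation measures named above (MAE, RAE and the correlation coefficient) can be illustrated with a minimal Python sketch. It uses the common textbook definitions and made-up effort values; the function names and the sample numbers are assumptions for illustration and are not taken from the study. WEKA reports RAE as a percentage and may compute its baseline differently in detail.

```python
import math

def mean_absolute_error(actual, predicted):
    # MAE: average magnitude of the prediction errors
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def relative_absolute_error(actual, predicted):
    # RAE: total absolute error relative to a baseline that always
    # predicts the mean of the actual values (textbook definition)
    mean_actual = sum(actual) / len(actual)
    numerator = sum(abs(a - p) for a, p in zip(actual, predicted))
    denominator = sum(abs(a - mean_actual) for a in actual)
    return numerator / denominator

def correlation_coefficient(actual, predicted):
    # Pearson correlation between actual and predicted values
    n = len(actual)
    mean_a = sum(actual) / n
    mean_p = sum(predicted) / n
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov / math.sqrt(var_a * var_p)

# Hypothetical effort values (e.g. person-months), for illustration only
actual = [10, 20, 30, 40]
predicted = [12, 18, 33, 37]
print(mean_absolute_error(actual, predicted))      # 2.5
print(relative_absolute_error(actual, predicted))  # 0.25
print(correlation_coefficient(actual, predicted))
```

Lower MAE and RAE and a correlation coefficient closer to 1 indicate a better estimate; an RAE below 1 means the model beats the mean-only baseline.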

Keywords

Genetic Algorithms, Genetic Programming, Machine Learning, Software Cost Estimation, WEKA

References

Abe, S., Thawonmas, R. and Kobayashi, Y., 1998, Feature selection by analyzing class regions approximated by ellipsoids, IEEE Trans. On Systems, Man, and Cybernetics-Part C: Applications and Reviews, 28(2), 282–287.
Albrecht, A. J., Gaffney, J. E., 1983, Software function, source lines of code, and development effort prediction: a software science validation. IEEE Trans. Softw. Eng. 9, 6 (1983), 639–648. DOI:10.1109/TSE.1983.235271.
Ayyıldız, M., 2007, Yazılım Projeleri Ölçüm Sonuçları Veri Tabanının Oluşturulması ve Yeni Yazılım Projelerinin Maliyet Tahmininde Kullanımı, Doktora Tezi, Yıldız Teknik Üniversitesi, Fen Bilimleri Enstitüsü.
Başkeleş, B., Turhan, B., Bener, A., 2007, Software Effort Estimation Using Machine Learning Methods, Computer and information sciences, Ankara, IEEE, DOI:10.1109/ISCIS.2007.4456863.
Bishop, C. M., 2006, Pattern recognition and machine learning, Springer, New York.
Bosu, M.F., Macdonell, S.G., 2019, Experience: Quality Benchmarking of Datasets Used in Software Effort Estimation, ACM Journal of Data and Information Quality, 11(4), 1 – 38.
Budak, H., 2018, Özellik Seçim Yöntemleri ve Yeni Bir Yaklaşım, Süleyman Demirel Üniversitesi Fen Bilimleri Enstitüsü Dergisi, DOI: 10.19113/sdufbed.01653.
Burgess, C.J., Lefley, M., 2001, Can Genetic Programming Improve Software Effort Estimation? A Comparative Evaluation, Information and Software Technology, 43, 863 – 873.
Caudill, M., 1987, Neural networks primer, J. AI Expert, 2(12), 46 – 52.
Demirörs, O., 2011, Yazılım Kestirimi İçin Referans Veri Kümesi Ve Süreç Odaklı Bir Yöntem, 5. Ulusal Yazılım Projeleri Sempozyumu-UYMS.
Diri, B., 2014, Makine Öğrenmesine Giriş, Ders Notları, https://www.siskon.com.tr/dosya/PDF/Makale/Makina_Ogrenmesi.pdf, [Ziyaret Tarihi: 24.06.2021].
Ebren Kara, Ş., Şamlı, R., 2021, Yazılım Projelerinin Maliyet Tahmini için WEKA’da Makine Öğrenmesi Algoritmalarının Karşılaştırmalı Analizi, Avrupa Bilim ve Araştırma Dergisi, 23, 415 – 426.
Gupta, A., 2015, Classification Of Complex UCI Datasets Using Machine Learning And Evolutionary Algorithms, International Journal Of Scientific & Technology Research, 4(5), 85 – 94.
Güven Aydın, Z.B. 2021, Makine Öğrenmesi Yöntemleri İle Yazılım Hata Tahmini, Doktora Tezi, İstanbul Üniversitesi-Cerrahpaşa, Lisansüstü Eğitim Fakültesi.
Hall, Mark A., 1999, Correlation-based Feature Selection for Machine Learning, Doktora Tezi, University of Waikato, Department of Computer Science.
Huang, D., Chow, T. W. S., 2005, Efficiently searching the important input variables using Bayesian discriminant. IEEE Trans. on Circuits and Systems-I: Regular Papers, 52(4), 785 – 793.
Kaluza, B., 2016, Machine Learning in Java, Packt Publishing.
Kemerer, C.F., 1987, An Empirical Validation Of Software Cost Estimation Models. Commun. ACM 30, 5 (1987), 416–429.
Kitchenham, B., Kansala, K., 1993, Inter-item correlations among function points. International Conference on Software Engineering. 229–238.
Kubat, C., 2014, Matlab Yapay Zeka ve Mühendislik Uygulamaları, 2. Baskı, Pusula Yayınları, İstanbul, ISBN: 978-605-5106-12-6.
Maxwell, K., 2002, Applied Statistics for Software Managers, Prentice-Hall, Englewood Cliffs.
Mitchell, T. M., 1997, Machine Learning, McGraw-Hill and MIT Press.
Miyazaki, Y., Terakado, M., Ozaki, K., Nozaki, H., 1994, Robust Regression For Developing Software Estimation Models. J. Syst. Softw. 27, 13–16.
Moghaddam, S.A.V., 2014, Etkin Sınıflandırma İçin Genetik Algoritma Tabanlı Öznitelik Alt Küme Seçimi, Yüksek Lisans Tezi, Gazi Üniversitesi, Fen Bilimleri Enstitüsü.
Nabiyev, V. V., 2016, Yapay Zeka, 5. Baskı, Seçkin Yayınları, Ankara, ISBN: 978-975-02-3727-0.
Prabhakar, Dutta, M., 2013, Application Of Machine Learning Techniques For Predicting Software Effort, Elixir Comp. Sci. & Engg., 56, 13677 – 13682.
Shan, Y., McKay, C.J., Essam, D.L., 2002, Software Project Effort Estimation Using Genetic Programming, International Conference on Communications Circuits and Systems.
Singh B.K., Misra, A.K., 2012, Software Effort Estimation by Genetic Algorithm Tuned Parameters of Modified Constructive Cost Model for NASA Software Projects, International Journal of Computer Applications, 59(9), DOI: 10.5120/9577-4053.
Soleimanian, F., Rezaii, R., Arasteh, B., 2015, A New Approach by Using Tabu Search and Genetic Algorithms in Software Cost Estimation, International Conference on Application of Information and Communication Technologies.
Tran, B., Xue, B., Zhang, M., 2015, Genetic programming for feature construction and selection in classification on high-dimensional data, Springer-Verlag Berlin Heidelberg: Regular Papers, 8, 3–15.
Wikipedia, 2020, Korelasyon, Korelasyon - Vikipedi (wikipedia.org), [Ziyaret Tarihi: 24.06.2021].
Yücalar, F., 2011, Use-Case Tabanlı Yazılım Emek Kestirim Modeli, Doktora Tezi, Trakya Üniversitesi, Fen Bilimleri Enstitüsü.

Cost Estimation of Software Projects by Feature Selection with Genetic Algorithm

Abstract

It is very important for a project manager to be able to estimate the cost of a software project early in the development cycle. By estimating the project cost accurately, the project manager can reduce the uncertainties in the project; otherwise, serious economic problems may arise. The literature shows that many different methods have been used to estimate the cost of software projects. The aim of this study is to investigate the effect of feature selection with Genetic Algorithms on software cost estimation. In this study, 8 different Machine Learning algorithms available in the WEKA (Waikato Environment for Knowledge Analysis) environment and an Evolutionary Algorithm, Genetic Programming, were run with default settings in two ways for the cost estimation of software projects. First, software cost estimation was performed by applying the Machine Learning algorithms to the raw data sets (Albrecht, Finnish, Kemerer, Maxwell and Miyazaki94) obtained from the PROMISE (Predictor Models in Software Engineering) data repository, without any feature selection. Second, feature selection was carried out by applying a Genetic Algorithm to the data sets. After the feature subset had been created, the Machine Learning algorithms were applied to the reduced data sets and software cost estimation was performed. The algorithms were tested with the 10-fold cross-validation technique, and the results were evaluated using the correlation coefficient and the error measures mean absolute error (MAE) and relative absolute error (RAE). When the results were examined and the performance values compared, the estimation results obtained from the data sets with Genetic Algorithm feature selection were found to be better than those obtained without feature selection.
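The second procedure described above, in which a Genetic Algorithm searches for a feature subset that is then scored by cross-validated error, can be sketched in Python as follows. This is an illustrative wrapper-style sketch under stated assumptions and not the WEKA configuration used in the study: the 1-nearest-neighbour learner, the population size, the mutation rate, the function names and the synthetic data are all hypothetical choices made for this example.

```python
import random

def knn_predict(train_X, train_y, x, mask):
    # 1-nearest-neighbour regression using only features with mask[j] == 1
    nearest = min(range(len(train_X)),
                  key=lambda i: sum((train_X[i][j] - x[j]) ** 2
                                    for j in range(len(mask)) if mask[j]))
    return train_y[nearest]

def cv_mae(X, y, mask, folds=10):
    # k-fold cross-validated MAE; an empty feature subset gets infinite error
    if not any(mask):
        return float("inf")
    n = len(X)
    total = 0.0
    for f in range(folds):
        train_idx = [i for i in range(n) if i % folds != f]
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in range(f, n, folds):  # rows held out in this fold
            total += abs(y[i] - knn_predict(train_X, train_y, X[i], mask))
    return total / n

def ga_select(X, y, pop_size=20, generations=15, p_mut=0.1, seed=0):
    # Chromosome = bit mask over features; fitness = cross-validated MAE
    # (lower is better). Assumes at least two features.
    rng = random.Random(seed)
    n_feat = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n_feat)] for _ in range(pop_size)]
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            cache[key] = cv_mae(X, y, mask)
        return cache[key]

    best = min(pop, key=fitness)
    for _ in range(generations):
        new_pop = [best[:]]  # elitism: carry the best mask forward
        while len(new_pop) < pop_size:
            p1 = min(rng.sample(pop, 3), key=fitness)  # tournament selection
            p2 = min(rng.sample(pop, 3), key=fitness)
            cut = rng.randint(1, n_feat - 1)           # one-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            new_pop.append(child)
        pop = new_pop
        best = min(pop, key=fitness)
    return best, fitness(best)

# Synthetic data: only features 0 and 2 influence the target
data_rng = random.Random(1)
X = [[data_rng.random() for _ in range(5)] for _ in range(60)]
y = [10 * row[0] + 5 * row[2] for row in X]
mask, err = ga_select(X, y)
print("selected mask:", mask, "cross-validated MAE:", err)
```

The learner would then be retrained on the columns where the mask is 1, which mirrors the two-step comparison in the abstract (all features versus the selected subset). Because of elitism, the best mask found never gets worse from one generation to the next.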

Keywords

Genetic Algorithms, Genetic Programming, Machine Learning, Software Cost Estimation, WEKA

There are 31 citations in total.

Details

Primary Language	Turkish
Subjects	Engineering
Journal Section	Articles
Authors	Şükran Ebren Kara 0000-0003-3071-6942 Rüya Şamlı 0000-0002-8723-1228
Publication Date	November 30, 2021
Published in Issue	Year 2021

Cite

APA	Ebren Kara, Ş., & Şamlı, R. (2021). Genetik Algoritma İle Öznitelik Seçimi Yapılarak Yazılım Projelerinin Maliyet Tahmini. Avrupa Bilim Ve Teknoloji Dergisi(27), 985-994. https://doi.org/10.31590/ejosat.994372
