Research Article

DETECTING DUPLICATE SOFTWARE BUG RECORDS USING MACHINE LEARNING AND DEEP LEARNING METHODS

Year: 2020, Volume: 8, Issue: 5, Pages: 45-51, Published: 29.12.2020
https://doi.org/10.21923/jesd.826251

Abstract

Bugs, which arise when software does not behave as expected during development, maintenance, or use, are reported by the technical team or by end users. Different reporters may describe the same bug in different ways, so separate bug records can point to the same underlying error. It is therefore quite likely that a bug about to be reported already exists in the system. Determining whether a given bug record has been entered before costs the developer who will fix it considerable effort, so an automatic mechanism is needed to detect whether a bug being entered already exists in the system. In this study, several models were developed that detect duplicate bug records with machine learning and deep learning methods, using bug records from 3 different open source projects. The performance of the machine learning algorithms and deep learning methods was compared on these data sets, and an ensemble method is proposed. The proposed ensemble method increased accuracy by at least 7.2% compared to the individual methods.
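The abstract does not say which models make up the ensemble, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes two simple text-similarity scorers (TF-IDF cosine similarity and Jaccard token overlap) whose scores are averaged, and it flags an existing record as a likely duplicate when the averaged score reaches a threshold. All function names, the equal weights, and the 0.3 threshold are hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (a dict) per tokenized document."""
    n = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF, so terms that occur in every document keep a small weight.
    idf = {tok: math.log((1 + n) / (1 + df)) + 1.0 for tok, df in doc_freq.items()}
    return [
        {tok: count * idf[tok] for tok, count in Counter(doc).items()}
        for doc in docs
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def jaccard(a, b):
    """Jaccard overlap between the token sets of two documents."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def ensemble_scores(new_report, existing_reports):
    """Average both similarity scores of the new report against each existing one."""
    docs = [tokenize(r) for r in existing_reports] + [tokenize(new_report)]
    vectors = tfidf_vectors(docs)
    query_tokens, query_vec = docs[-1], vectors[-1]
    # Equal weights are an arbitrary illustrative choice, not a tuned setting.
    return [
        0.5 * cosine(query_vec, vec) + 0.5 * jaccard(query_tokens, toks)
        for toks, vec in zip(docs[:-1], vectors[:-1])
    ]


def find_duplicates(new_report, existing_reports, threshold=0.3):
    """Return (index, score) pairs for existing reports scoring at or above the threshold."""
    scores = ensemble_scores(new_report, existing_reports)
    return [(i, s) for i, s in enumerate(scores) if s >= threshold]


if __name__ == "__main__":
    existing = [
        "Application crashes when opening the settings page",
        "Typo in the installation guide",
    ]
    print(find_duplicates("Crash when opening settings page", existing))
```

In this toy run the first existing record shares most of its tokens with the new report and is flagged, while the second shares none and scores zero. A real system would replace these scorers with trained classifiers or learned embeddings and would tune the weights and threshold on labeled duplicate pairs.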

YİNELENEN HATA KAYITLARININ MAKİNE ÖĞRENMESİ VE DERİN ÖĞRENME YÖNTEMLERİ İLE TESPİT EDİLMESİ


Details

Primary Language: Turkish
Subjects: Computer Software
Journal Section: Research Articles
Authors:

Azmi Yüksel (ORCID: 0000-0002-0698-4122)

Assoc. Prof. Dr. Aydın Çetin (ORCID: 0000-0002-8669-823X)

Publication Date: December 29, 2020
Submission Date: November 15, 2020
Acceptance Date: December 16, 2020
Published in Issue: Year 2020, Volume 8, Issue 5

Cite

APA: Yüksel, A., & Çetin, A. (2020). YİNELENEN HATA KAYITLARININ MAKİNE ÖĞRENMESİ VE DERİN ÖĞRENME YÖNTEMLERİ İLE TESPİT EDİLMESİ. Mühendislik Bilimleri Ve Tasarım Dergisi, 8(5), 45-51. https://doi.org/10.21923/jesd.826251