Research Article

Discovering Sequential Source Code Patterns in Software Engineering

Year 2022, Volume: 10 Issue: 1, 309 - 324, 31.01.2022
https://doi.org/10.29130/dubited.905510

Abstract

Discovering sequential patterns in source code is an important problem in software engineering, since such patterns provide useful knowledge for a variety of tasks such as code completion, code refactoring, developer profiling, and code complexity measurement. This paper proposes a new framework, called Source Code Miner (SCodeMiner), which discovers frequent sequential rules within a software project. The proposed framework first transforms Java source code into a sequence database and then applies a sequential pattern mining (SPM) algorithm. This study is also original in that it compares four SPM algorithms in terms of computational time: prefix-projected sequential pattern mining (PrefixSpan), sequential pattern discovery using equivalence classes (SPADE), bi-directional extension (BIDE+), and last position induction (LAPIN). Experiments carried out on an open-source software project showed that the proposed SCodeMiner framework is an effective mining tool for identifying coding patterns.
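The two-step pipeline described in the abstract (source code to sequences, then sequential pattern mining) can be illustrated with a minimal sketch. It is not the SCodeMiner implementation: the abstract does not say how Java code is turned into sequences, so the regex-based `to_sequence` tokenizer below, which keeps only the names of invoked methods, is purely an assumption. `prefixspan` is a simplified PrefixSpan-style miner for sequences of single items, and `rule_confidence` is a hypothetical helper that scores a sequential rule as support(antecedent followed by consequent) / support(antecedent).

```python
import re
from collections import defaultdict

# Hypothetical tokenizer (an assumption, not the paper's transformation):
# each input string is one Java method body, reduced to the ordered names
# of the methods and constructors it invokes.
CALL = re.compile(r"\b([A-Za-z_]\w*)\s*\(")
KEYWORDS = {"if", "for", "while", "switch", "catch", "return", "synchronized"}


def to_sequence(method_body):
    """Turn one method body into an ordered list of invoked names."""
    return [name for name in CALL.findall(method_body) if name not in KEYWORDS]


def prefixspan(sequences, min_support):
    """Return {pattern tuple: support} for every sequential pattern that
    occurs in at least min_support sequences (single-item events only)."""
    results = {}

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. the
        # suffixes that remain after matching the current prefix.
        support = defaultdict(set)
        for sid, start in projected:
            for item in set(sequences[sid][start:]):
                support[item].add(sid)
        for item in sorted(support):
            if len(support[item]) < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = len(support[item])
            # Project each suffix past the first occurrence of the item.
            next_projected = []
            for sid, start in projected:
                try:
                    pos = sequences[sid].index(item, start)
                except ValueError:
                    continue
                next_projected.append((sid, pos + 1))
            mine(pattern, next_projected)

    mine((), [(i, 0) for i in range(len(sequences))])
    return results


def rule_confidence(patterns, antecedent, consequent):
    """Confidence of the sequential rule antecedent -> consequent."""
    return patterns[antecedent + consequent] / patterns[antecedent]


# Made-up method bodies, used only to exercise the sketch.
methods = [
    "BufferedReader r = open(path); String s = r.readLine(); r.close();",
    "Reader r = open(p); while (r.ready()) { r.read(); } r.close();",
    "log(msg); flush();",
]
database = [to_sequence(m) for m in methods]
patterns = prefixspan(database, min_support=2)
print(patterns)
print(rule_confidence(patterns, ("open",), ("close",)))
```

On this toy input the only multi-item pattern meeting the support threshold is `open` followed by `close`, which is the kind of coding pattern the abstract refers to. A real front end would more plausibly parse the Java code properly rather than rely on a regular expression.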

References

  • [1] A. Agrawal, M. Alenezi, R. Kumar, and R. A. Khan, “Securing web applications through a framework of source code analysis,” J. Comput. Sci., vol. 15, no. 12, pp. 1780-1794, 2019.
  • [2] F. Ebert, F. Castor, N. Novielli, and A. Serebrenik, “An exploratory study on confusion in code reviews,” Empirical Software Eng., vol. 26, no. 12, pp. 1-48, 2021.
  • [3] S. Proksch, J. Lerch, and M. Mezini, “Intelligent code completion with Bayesian networks,” ACM Trans. Softw. Eng. Methodol., vol. 25, no. 1, pp. 1-31, 2015.
  • [4] M. M. Rahman, Y. Watanobe, K. Nakamura, and M. Bures, “A neural network based intelligent support model for program code completion,” Sci. Program., vol. 2020, pp. 1-18, 2020.
  • [5] L. Kaur and A. Mishra, “Cognitive complexity as a quantifier of version to version Java-based source code change: An empirical probe,” Inf. Softw. Technol., vol. 106, pp. 31-48, 2019.
  • [6] A. A. Abdelaal, S. Abed, M. Al-Shayeji, and M. Allaho, “Customized frequent patterns mining algorithms for enhanced Top-Rank-K frequent pattern mining,” Expert Syst. Appl., vol. 169, pp. 1-14, 2021.
  • [7] W. Gan, J. C.-W. Lin, P. Fournier-Viger, H.-C. Chao, and P. S. Yu, “A survey of parallel sequential pattern mining,” ACM Trans. Knowl. Discovery Data, vol. 13, no. 3, pp. 1-34, 2019.
  • [8] J. Pei et al., “Mining sequential patterns by pattern-growth: The PrefixSpan approach,” IEEE Trans. Knowl. Data Eng., vol. 16, no. 10, pp. 1-17, 2004.
  • [9] M. J. Zaki, “SPADE: An efficient algorithm for mining frequent sequences,” Mach. Learn., vol. 42, pp. 31-60, 2001.
  • [10] J. Wang and J. Han, “BIDE: Efficient mining of frequent closed sequences,” in Proc. 20th Int. Conf. on Data Eng., Boston, MA, USA, 2004, pp. 79-90.
  • [11] Z. Yang, Y. Wang, and M. Kitsuregawa, “LAPIN: Effective sequential pattern mining algorithms by last position induction for dense databases,” in 12th Int. Conf. on Database Syst. for Adv. Appl., Bangkok, Thailand, 2007, pp. 1020-1023.
  • [12] S. Cao, X. Sun, L. Bo, Y. Wei, and B. Li, “BGNN4VD: Constructing bidirectional graph neural-network for vulnerability detection,” Inf. Softw. Technol., vol. 136, pp. 1-11, 2021.
  • [13] S. Jeon and H. K. Kim, “AutoVAS: An automated vulnerability analysis system with a deep learning approach,” Comput. Secur., vol. 106, pp. 1-24, 2021.
  • [14] C. D. Newman et al., “On the generation, structure, and semantics of grammar patterns in source code identifiers,” J. Syst. Softw., vol. 170, pp. 1-21, 2020.
  • [15] X. Li, L. Wang, Y. Xin, Y. Yang, and Y. Chen, “Automated vulnerability detection in source code using minimum intermediate representation learning,” Appl. Sci., vol. 10, pp. 1-16, 2020.
  • [16] Y. Ueda, T. Ishio, A. Ihara, and K. Matsumoto, “Mining source code improvement patterns from similar code review works,” in IEEE 13th Int. Workshop on Softw. Clones, Hangzhou, China, Mar. 2019, pp. 13–19.
  • [17] Y. Fang, S. Han, C. Huang, and R. Wu, “TAP: A static analysis model for PHP vulnerabilities based on token and deep learning technology,” Plos One, vol. 14, no. 11, pp. 1-19, 2019.
  • [18] Y. Udagawa, “Maximal frequent sequence mining for finding software clones,” in Proc. of the 18th Int. Conf. on Inf. Integration and Web-based Appl. and Services, Singapore, Nov. 2016, pp. 26-33.
  • [19] H. Date, T. Ishio, M. Matsushita, and K. Inoue, “Analysis of coding patterns over software versions,” Inf. Media Technol., vol. 10, no. 2, pp. 226–232, 2015.
  • [20] R. J. Akbar, T. Omori, and K. Maruyama, “Mining API usage patterns by applying method categorization to improve code completion,” IEICE Trans. Inf. Syst., vol. E97.D, no. 5, pp. 1069–1083, May 2014.
  • [21] L. L. N. da Silva Junior, A. Plastino, and L. G. P. Murta, “What should I code now?” J. Univers. Comput. Sci., vol. 20, no. 5, pp. 797-821, 2014.
  • [22] H. Takei and H. Yamana, “IC-BIDE: Intensity constraint-based closed sequential pattern mining for coding pattern extraction,” in Proc. Int. Conf. on Adv. Inf. Networking and Appl., 2013, pp. 976-983.
  • [23] H. Date, T. Ishio, and K. Inoue, “Investigation of coding patterns over version history,” in 4th Int. Workshop on Empirical Softw. Eng. in Practice, Osaka, Japan, 2012, pp. 40-45.
  • [24] H. Kagdi, M. L. Collard, and J. I. Maletic, “An approach to mining call-usage patterns with syntactic context,” in ACM/IEEE Int. Conf. on Automated Softw. Eng., 2007, pp. 457-460.
  • [25] Y.-T. Kim, H.-T. Kong, and C.-S. Kim, “Analysis of characteristics and location of the appearance for codding pattern in the source code,” J. Digit. Policy Manag., vol. 11, no. 7, pp. 165-171, 2013.
  • [26] T. Ishio, H. Date, T. Miyake, and K. Inoue, “Mining coding patterns to detect crosscutting concerns in Java programs,” in Proc. Working Conf. on Reverse Eng., 2008, pp. 123–132.
  • [27] H. Tang, Y. Liu, and L. Wang, “A new algorithm of mining high utility sequential pattern in streaming data,” Int. J. Computational Intell. Syst., vol. 12, no. 1, pp. 342–350, 2019.
  • [28] I. Matloob, S. A. Khan, and H. U. Rahman, “Sequence mining and prediction-based healthcare fraud detection methodology,” IEEE Access, vol. 8, pp. 143256-143273, 2020.
  • [29] P. Fournier-Viger, J. C.-W. Lin, R. U. Kiran, Y. S. Koh, and R. Thomas, “A Survey of Sequential Pattern Mining,” Data Sci. Pattern Recognit., vol. 1, no. 1, pp. 54-77, 2017.
  • [30] A. Palacios, A. Martinez, L. Sanchez, and I. Couso, “Sequential pattern mining applied to aeroengine condition monitoring with uncertain health data,” Eng. Appl. Artif. Intell., vol. 44, pp. 10–24, 2015.
  • [31] P. Fournier-Viger et al., “The SPMF open-source data mining library version 2,” in European Conf. on Machine Learn. and Princ. and Practice of Knowl. Discovery in Databases, 2016, pp. 36-40.
  • [32] S. Lianglei, L. Yun, and Y. Jiang, “Multi-level sequential pattern mining based on prime encoding,” Phys. Procedia, vol. 24, pp. 1749-1756, 2012.
  • [33] Y.-H. Hu, F. Wu, and Y.-J. Liao, “An efficient tree-based algorithm for mining sequential patterns with multiple minimum supports,” J. Syst. Softw., vol. 86, pp. 1224-1238, 2013.
There are 33 citations in total.

Details

Primary Language: English
Subjects: Engineering
Journal Section: Articles
Authors:

Kökten Birant (ORCID: 0000-0002-5107-6406)

Dilara Kırnapcı (ORCID: 0000-0002-3630-9726)

Publication Date: January 31, 2022
Published in Issue: Year 2022, Volume: 10 Issue: 1

Cite

APA Birant, K., & Kırnapcı, D. (2022). Discovering Sequential Source Code Patterns in Software Engineering. Duzce University Journal of Science and Technology, 10(1), 309-324. https://doi.org/10.29130/dubited.905510
AMA Birant K, Kırnapcı D. Discovering Sequential Source Code Patterns in Software Engineering. DÜBİTED. January 2022;10(1):309-324. doi:10.29130/dubited.905510
Chicago Birant, Kökten, and Dilara Kırnapcı. “Discovering Sequential Source Code Patterns in Software Engineering”. Duzce University Journal of Science and Technology 10, no. 1 (January 2022): 309-24. https://doi.org/10.29130/dubited.905510.
EndNote Birant K, Kırnapcı D (January 1, 2022) Discovering Sequential Source Code Patterns in Software Engineering. Duzce University Journal of Science and Technology 10 1 309–324.
IEEE K. Birant and D. Kırnapcı, “Discovering Sequential Source Code Patterns in Software Engineering”, DÜBİTED, vol. 10, no. 1, pp. 309–324, 2022, doi: 10.29130/dubited.905510.
ISNAD Birant, Kökten - Kırnapcı, Dilara. “Discovering Sequential Source Code Patterns in Software Engineering”. Duzce University Journal of Science and Technology 10/1 (January 2022), 309-324. https://doi.org/10.29130/dubited.905510.
JAMA Birant K, Kırnapcı D. Discovering Sequential Source Code Patterns in Software Engineering. DÜBİTED. 2022;10:309–324.
MLA Birant, Kökten and Dilara Kırnapcı. “Discovering Sequential Source Code Patterns in Software Engineering”. Duzce University Journal of Science and Technology, vol. 10, no. 1, 2022, pp. 309-24, doi:10.29130/dubited.905510.
Vancouver Birant K, Kırnapcı D. Discovering Sequential Source Code Patterns in Software Engineering. DÜBİTED. 2022;10(1):309-24.