Leveraging Machine Learning and Transformers to Identify Domain-Specific Services Decomposition in Legacy Systems

Işıl Karabey Aksakallı

doi:10.18185/erzifbed.1590024

Research Article

Year 2025, Volume: 18 Issue: 2, 476 - 494, 31.08.2025

Abstract

Service-oriented architecture, a software architecture that has become very popular in recent years, offers scalability, isolation, and flexibility because it consists of smaller, independent, domain-specific services rather than a single monolithic system. For this reason, the transition from monolithic systems to service-oriented architectures is becoming widespread, giving large-scale applications with millions of users an easily manageable, scalable, and flexible structure. In this study, the effectiveness of various machine learning models and tokenization methods was evaluated by analyzing static source code to decompose monolithic legacy systems into domain-specific services. Standard machine learning algorithms and transformer-based tokenizers were applied to the FXML-POS legacy system, and model performance was evaluated using precision, recall, accuracy, and F1 score. Experimental results indicate that all transformer models achieve strong performance, with an F1 score of 91.9% using Random Forest and Logistic Regression classifiers. Furthermore, the Word2Vec vectorization method outperforms TF-IDF in most scenarios, and a maximum F1 score of 97.2% is achieved using the Random Forest classifier. These results underscore the utility of advanced embedding techniques and classifiers in accurately identifying domain-specific service components.
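
The four evaluation measures named in the abstract can be illustrated with a minimal, standard-library sketch. It is not the study's evaluation code: the abstract does not state how per-class scores are averaged, so macro-averaging is an assumption here, and the function name `classification_scores`, the service-type labels, and the toy predictions are all hypothetical.

```python
from collections import Counter


def classification_scores(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging (an unweighted mean over classes) is an assumption;
    the abstract does not say which averaging scheme was used.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class received a wrong sample
            fn[truth] += 1   # true class missed one of its samples

    def ratio(num, den):
        # Convention: a score with an empty denominator counts as 0.0.
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = ratio(tp[label], tp[label] + fp[label])
        r = ratio(tp[label], tp[label] + fn[label])
        precisions.append(p)
        recalls.append(r)
        f1s.append(ratio(2 * p * r, p + r))

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical service-type labels for six classes of a legacy system.
y_true = ["entity", "entity", "utility", "utility", "application", "application"]
y_pred = ["entity", "utility", "utility", "utility", "application", "entity"]

scores = classification_scores(y_true, y_pred)
for name, value in scores.items():
    print(f"{name}: {value:.3f}")
```

With imbalanced service classes, macro-averaged and support-weighted scores can differ noticeably, so the averaging scheme should be fixed before comparing F1 values across classifiers or vectorization methods.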

Keywords

Decomposition using source code, Static analysis, Transformer-based tokenizers, Word embeddings, Machine learning

There are 38 references in total.

Details

Primary Language	English
Subjects	Decision Support and Group Support Systems
Section	Articles
Authors	Işıl Karabey Aksakallı 0000-0002-4156-9098
Early Pub Date	August 14, 2025
Publication Date	August 31, 2025
Submission Date	November 22, 2024
Acceptance Date	March 11, 2025
Published in Issue	Year 2025 Volume: 18 Issue: 2

Cite

APA	Karabey Aksakallı, I. (2025). Leveraging Machine Learning and Transformers to Identify Domain-Specific Services Decomposition in Legacy Systems. Erzincan University Journal of Science and Technology, 18(2), 476-494. https://doi.org/10.18185/erzifbed.1590024
