TY - JOUR T1 - Meta-Sezgisel Tabanlı Clustal-SA Algoritmasını Kullanarak DNA Sekanslarında Çoklu Dizi Hizalama TT - Multiple Sequence Alignment for DNA Sequences by Using Meta-Heuristic Based Clustal-SA Algorithm AU - Erdirik, Hatic AU - Karcıoğlu, Abdullah Ammar AU - Tanyolaç, Bahattin AU - Bulut, Hasan PY - 2024 DA - June Y2 - 2024 DO - 10.21597/jist.1404898 JF - Journal of the Institute of Science and Technology JO - J. Inst. Sci. and Tech. PB - Iğdır Üniversitesi WT - DergiPark SN - 2536-4618 SP - 544 EP - 562 VL - 14 IS - 2 LA - tr AB - Biyoinformatik, biyolojik verilerin analizi ve kalıtsal ilişkilerin ortaya çıkarılması için matematik, biyoloji ve bilgisayar bilimlerini birleştiren bir disiplindir. Bu alandaki en kritik görevlerden biri, biyolojik dizilerin hizalanmasıyla ilgili olan dizi hizalama problemini çözmektir. Ancak, biyolojik verilerin hızla artması, bu problemi manuel olarak çözülemez hale getirmiş ve bilgisayar sistemlerinin biyoinformatikte daha yaygın bir şekilde kullanılmasına yol açmıştır. Bu çalışmada, mevcut Clustal algoritması ve benzetimli tavlama algoritması kullanılarak yeni bir dizi hizalama algoritması önerilmiştir. Clustal algoritmasının hız avantajını kullanarak ve benzetimli tavlama algoritmasını entegre ederek, Clustal'ın aç gözlü yaklaşımından uzaklaşılarak optimal hizalama skoru elde etmek amaçlanmıştır. Geliştirilen algoritmanın başarısını değerlendirmek için SP (Çiftlerin Toplamı) puanlama sistemi kullanılmış ve hizalama sonucunda sütun eşleşme sayısı dikkate alınmıştır. Elde edilen sonuçlar, geliştirilen algoritmanın aynı uzunluktaki dizi veri kümeleri üzerinde ClustalW programından daha iyi performans gösterdiğini, MUSCLE programına göre ise bazı veri setlerinde daha başarılı olduğu veya yakın sonuçlar verdiğini ortaya koymuştur. Bu gelişme, biyoinformatik alanında dizi hizalama problemini çözmek için yeni ve daha etkili bir yaklaşımın potansiyelini vurgulamaktadır. Gelecekte, bu tür geliştirmelerin biyolojik veri analizi alanında daha geniş bir uygulama alanı bulabileceği düşünülmektedir. KW - Çoklu dizi hizalama KW - Sezgisel Yaklaşım KW - Clustal Algoritması KW - Benzetimli Tavlama KW - Biyoinformatik N2 - Bioinformatics is a discipline that combines mathematics, biology and computer science to analyze biological data and reveal genetic relationships. One of the most critical tasks in this field is to solve the sequence alignment problem, which is related to the alignment of biological sequences. However, the rapid increase in biological data has made this problem unsolvable manually and led to the more widespread use of computer systems in bioinformatics. In this study, a new sequence alignment algorithm is proposed using the existing Clustal algorithm and simulated annealing algorithm. By using the speed advantage of the Clustal algorithm and integrating the simulated annealing algorithm, it is aimed to obtain an optimal alignment score by moving away from the greedy approach of Clustal. To evaluate the success of the developed algorithm, the SP (Sum of Pairs) scoring system was used and the number of column matches as a result of the alignment was taken into account. The results obtained revealed that the developed algorithm performed better than the ClustalW program on sequence data sets of the same length, and was more successful or gave similar results on some data sets compared to the MUSCLE program. This development highlights the potential of a new and more effective approach to solving the sequence alignment problem in bioinformatics. It is thought that in the future, such developments may find wider application in the field of biological data analysis. CR - Aarts, E. H., & van Laarhoven, P. J. (1987). Simulated annealing: a pedestrian review of the theory and some applications. In Pattern recognition theory and applications (pp. 179-192). Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-83069-3_15 CR - Aktan, M. N., & Bulut, H. (2022). Metaheuristic task scheduling algorithms for cloud computing environments. Concurrency and Computation: Practice and Experience, 34(9), e6513. Doi: 10.1002/cpe.6513 CR - Botta, M., & Negro, G. (2010). Multiple sequence alignment with genetic algorithms. In Computational Intelligence Methods for Bioinformatics and Biostatistics: 6th International Meeting, CIBB 2009, Genoa, Italy, October 15-17, 2009. Springer Berlin Heidelberg. Doi: 10.1007/978-3-642-14571-1_15 CR - Bucak, İ. Ö., & Uslan, V. (2011). Sequence alignment from the perspective of stochastic optimization: a survey. Turkish Journal of Electrical Engineering and Computer Sciences, 19(1), 157-173. Doi: 10.3906/elk-1002-410 CR - Chen J.T, Chao J.N, Liu H., Yang F.L., Zou Q. & Tang F.R, (2023) WMSA 2: a multiple DNA/RNA Sequence alignment tool implemented with accurate progressive mode and a fast win-win mode combining the center star and progressive strategies, Briefings in Bioinformatics, Volume: 24, Issue:4, Doi :10.1093/bib/bbad190 CR - Cohen, J. (2004). Bioinformatics—an introduction for computer scientists. ACM Computing Surveys (CSUR), 36(2), 122-158. Doi: 10.1145/1031120.1031122 CR - Diamantis, S., & Anna, C. (2005). Comparison of multiple sequence alignment programs. National and Kapodistrian university of Athens. CR - Doğan, H., & Otu, H., (2014) Multiple Sequence Alignment Methods: Objevtive Function, Springer Protocols, Chepter 3, 44-85p CR - Edgar, R.C., (2004), MUSCLE: multiple sequence alignment with high accuracy and high throughput, Nucleic Acids Research, 200, vol.32, No.5, DOI: 10.1093/nar/gkh340 CR - Edgar, R.C., Batzoglou, S., (2006) Multiple sequence alignmnet, Current Opinion in Structural Biology 2006, 16:368–373, Elsevier CR - Haque, W., Aravind, A., & Reddy, B. (2009, March). Pairwise sequence alignment algorithms: a survey. In Proceedings of the 2009 conference on Information Science, Technology and Applications (pp. 96-103). Doi: 10.1145/1551950.1551980 CR - Karcıoğlu, A. A., & Bulut, H. (2022). DNA sekansları için q-gram hash karşılaştırmasına dayalı çoklu kesin dizi eşleştirme algoritması. Gazi Üniversitesi Mühendislik Mimarlık Fakültesi Dergisi, 38(2), 875-888. Doi: 10.17341/gazimmfd.951157 CR - Karcioglu, A. A., & Bulut, H. (2021). Improving hash-q exact string matching algorithm with perfect hashing for DNA sequences. Computers in Biology and Medicine, 131, 104292. Doi: 10.1016/j.compbiomed.2021.104292 CR - Karcioglu, A. A., & Bulut, H. (2021). The WM-q multiple exact string matching algorithm for DNA sequences. Computers in Biology and Medicine, 136, 104656. Doi: 10.1016/j.compbiomed.2021.104656 CR - Karcioglu, A. A., & Bulut, H. (2022). q‐frame hash comparison based exact string matching algorithms for DNA sequences. Concurrency and Computation: Practice and Experience, 34(9), e6505. Doi: 10.1002/cpe.6505 CR - Lee, Z. J., Su, S. F., Chuang, C. C., & Liu, K. H. (2008). Genetic algorithm with ant colony optimization (GA-ACO) for multiple sequence alignment. Applied Soft Computing, 8(1), 55-78. Doi: 10.1016/j.asoc.2006.10.012 CR - Likic, V. (2008). The Needleman-Wunsch algorithm for sequence alignment. Lecture given at the 7th Melbourne Bioinformatics Course, Bi021 Molecular Science and Biotechnology Institute, University of Melbourne, 1-46. CR - Luscombe, N. M., Greenbaum, D., & Gerstein, M. (2001). What is bioinformatics? An introduction and overview. Yearbook of medical informatics, 10(01), 83-100. Doi: 10.1055/s-0038-1638103 CR - Major Differences, Difference between Global and Local Sequence Alignment, https://www.majordifferences.com/2016/05/differencebetween-global-and-local.html, Access Date: 28.12.2022. CR - Mirjalili, S. (2019). Evolutionary algorithms and neural networks. In Studies in computational intelligence (Vol. 780). Berlin/Heidelberg, Germany: Springer. CR - Omar, M.F., Salam, R.A., Abdullah, R. & Rashid, N.A (2004). Multiple Sequence Alignment Using Optimization Algorithms, International Journal of Computational Intelligence Volume 1 Number 2 CR - Pais, F. S. M., Ruy, P. D. C., Oliveira, G., & Coimbra, R. S. (2014). Assessing the efficiency of multiple sequence alignment programs. Algorithms for molecular biology, 9, 1-8. Doi: 10.1186/1748-7188-9-4 CR - Paruchuri T., Kancharla G.R. & Dara S. (2023). Solving multiple sequence alignment problems by using a swarm intelligent optimization based approach, International Journal of Electrical and Computer Engineering (IJECE), Vol. 13, No. 1, February 2023, pp. 1097-1104 CR - Russel and Norvig. (2010). Artificial intelligence: a modern approach, Global Edition. Harlow, Essex, England: Pearson Educatio. CR - Smith, T. F., & Waterman, M. S. (1981). Identification of common molecular subsequences. Journal of molecular biology, 147(1), 195-197. UR - https://doi.org/10.21597/jist.1404898 L1 - https://dergipark.org.tr/tr/download/article-file/3599688 ER -