Research Article

A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms

Volume: 23 Number: 3 October 26, 2018
EN TR

A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms

Abstract

 In this study, we present a new web interface for major bioinformatics algorithms and introduce a novel approximate string matching algorithm. Our web interface executes major algorithms on the field for the use of computational biologists, students or any other interested researchers. In the web interface, algorithms come under three sections: Sequence alignment, pattern matching and motif finding. In each section, we introduce algorithms in order to find best fitting one for specific dataset and problem. The interface introduces execution time, memory usage and context specific results of algorithms such as alignment score. The interface utilizes emerging open source languages and tools. In order to develop light and user-friendly interface, all parts of the interface coded with Python language. On the other hand, Django is used for web interface. Second contribution of the study is novel A-BOM algorithm, which is designed for approximate pattern matching problem. The algorithm is approximate matching variation of Backward Oracle Matching. We compare our algorithm with popular approximate string matching algorithms. Results denote that A-BOM introduces %30 to %80 short runtime improvement when compared to current approximate pattern matching algorithms on long patterns.

Keywords

References

  1. Alluzen, C., Crochemore, M. and Raffinot, M. (1999) Factor Oracle: A New Structure for Pattern Matching, SOFSEM’99: Theory and Practice of Informatics, Lecture Notes in Computer Science, Berlin, 291-306. doi: 10.1007/3-540-47849-3_18
  2. Bishop, C. M. (2006) Machine learning and pattern recognition. Information Science and Statistics. Springer, Heidelberg.
  3. Boyer, R.S., Moore, J.S and Pratt, W.R. (1977) A Fast String Searching Algorithm, Journal of Molecular Biology, Communications of the ACM, New York, 762-772. doi: 10.1145/359842.359859
  4. Burrows, W. and Wheeler, D. J. (1994) A block-sorting lossless data compression algorithm, Technical Report 124, Digital Equipment Corporation, Digital Equipment Corporation, California.
  5. D'haeseleer, P. (2006) How does DNA sequence motif discovery work?. Nature biotechnology, 24(8), 959-961
  6. Durbin, R., Eddy, S. R., Krogh, A. and Mitchison, G. (1998) Biological Sequence Analysis: Probabilistic Models of Proteins and Nucleic Acids, Cambridge University Press, Cambridge.
  7. Ji, H. and Shendure, J. (2008) Next-generation DNA sequencing, Nature biotechnology volume 26, Nature Publishing Group, London, 1135-1145. doi: 10.1038/nbt1486
  8. Knuth, D.E., Morris, J.H and Pratt, W.R. (1977) Fast Pattern Matching in Strings, Journal of Molecular Biology, SIAM Journal on Computing, Philadelphia, 323-350. doi: 10.1137/0206024

Details

Primary Language

English

Subjects

Engineering

Journal Section

Research Article

Publication Date

October 26, 2018

Submission Date

May 22, 2018

Acceptance Date

October 16, 2018

Published in Issue

Year 2018 Volume: 23 Number: 3

APA
Koca, B., & Özcan, G. (2018). A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, 23(3), 91-102. https://doi.org/10.17482/uumfd.425094
AMA
1.Koca B, Özcan G. A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. UUJFE. 2018;23(3):91-102. doi:10.17482/uumfd.425094
Chicago
Koca, Burak, and Gıyasettin Özcan. 2018. “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23 (3): 91-102. https://doi.org/10.17482/uumfd.425094.
EndNote
Koca B, Özcan G (December 1, 2018) A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23 3 91–102.
IEEE
[1]B. Koca and G. Özcan, “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”, UUJFE, vol. 23, no. 3, pp. 91–102, Dec. 2018, doi: 10.17482/uumfd.425094.
ISNAD
Koca, Burak - Özcan, Gıyasettin. “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi 23/3 (December 1, 2018): 91-102. https://doi.org/10.17482/uumfd.425094.
JAMA
1.Koca B, Özcan G. A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. UUJFE. 2018;23:91–102.
MLA
Koca, Burak, and Gıyasettin Özcan. “A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms”. Uludağ Üniversitesi Mühendislik Fakültesi Dergisi, vol. 23, no. 3, Dec. 2018, pp. 91-102, doi:10.17482/uumfd.425094.
Vancouver
1.Burak Koca, Gıyasettin Özcan. A New Automata Based Approximate String Matching Approach and Web Interface for Bioinformatics Algorithms. UUJFE. 2018 Dec. 1;23(3):91-102. doi:10.17482/uumfd.425094

Announcements:

30.03.2021-Beginning with our April 2021 (26/1) issue, in accordance with the new criteria of TR-Dizin, the Declaration of Conflict of Interest and the Declaration of Author Contribution forms fulfilled and signed by all authors are required as well as the Copyright form during the initial submission of the manuscript. Furthermore two new sections, i.e. ‘Conflict of Interest’ and ‘Author Contribution’, should be added to the manuscript. Links of those forms that should be submitted with the initial manuscript can be found in our 'Author Guidelines' and 'Submission Procedure' pages. The manuscript template is also updated. For articles reviewed and accepted for publication in our 2021 and ongoing issues and for articles currently under review process, those forms should also be fulfilled, signed and uploaded to the system by authors.