COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA

Bekir Aksoy; Sinan Uğuz; Okan Oral

doi:10.21923/jesd.467036

Research Article

Year 2019, Volume: 7 Issue: 3, 608 - 618, 15.09.2019

Cited By: 2

Abstract

The high level of activity in world tourism in recent years has made the
sector a subject of big data research. This study proposes a solution, based
on big data techniques and string similarity algorithms (SSA), to the problems
that arise when hotel data from different providers are entered into databases
under different names and addresses. To this end, 2599 London hotels belonging
to a tourism agency with a wide hotel network were selected as the sample, and
a Map-Reduce process using the Soundex algorithm was run to match these hotels
against approximately three million hotel records from seventy different
providers. Matching with Map-Reduce significantly reduced both the number of
operations and the processing time. In addition, the Dice coefficient,
Levenshtein, and longest common subsequence (LCS) algorithms were compared in
terms of the number of records they matched correctly and their processing
time. At this stage, words in the database that lowered the algorithms' scores
were identified and removed before the algorithms were applied. The Dice
coefficient algorithm gave better results for correct matching, and the
Levenshtein algorithm gave better results for processing time.
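
The pipeline described in the abstract can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the function names, the use of the first word of a hotel name as the Soundex key, the character-bigram form of the Dice coefficient, and the `NOISE_WORDS` list are all assumptions made for the example, and the grouping step only imitates Map-Reduce within a single process.

```python
from collections import defaultdict

# American Soundex digit classes; vowels, y, h and w carry no digit.
_CODES = {c: d for d, letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt", "4": "l", "5": "mn", "6": "r",
}.items() for c in letters}

# Hypothetical noise words; the abstract does not list the words removed.
NOISE_WORDS = {"hotel", "london", "the"}


def normalize(name):
    """Lowercase a name and drop the assumed noise words."""
    return " ".join(w for w in name.lower().split() if w not in NOISE_WORDS)


def soundex(word):
    """Return the four-character Soundex code of a word ('' if no letters)."""
    letters = [c for c in word.lower() if c.isascii() and c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = digit
    return (code + "000")[:4]


def block_by_soundex(names):
    """Group names by the Soundex code of their first word."""
    blocks = defaultdict(list)
    for name in names:  # "map": derive a key for each record
        words = name.split()
        key = soundex(words[0]) if words else ""
        blocks[key].append(name)  # "reduce": collect records sharing a key
    return blocks


def dice(a, b):
    """Dice coefficient over sets of character bigrams (0.0 to 1.0)."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    if not x and not y:
        return 1.0 if a == b else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def levenshtein(a, b):
    """Edit distance with unit-cost insert, delete and substitute."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of two strings."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]
```

In this sketch, candidate pairs would be scored only within a block that shares a Soundex key, which is the kind of reduction in comparisons the abstract attributes to the Map-Reduce step.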

Keywords

Algorithms, Text Analysis, Natural Language Processing, Data Analysis, Databases

Details

Primary Language	English
Subjects	Computer Software
Section	Research Articles
Authors	Bekir Aksoy (ORCID 0000-0001-8052-9411); Sinan Uğuz (ORCID 0000-0003-4397-6196); Okan Oral (ORCID 0000-0003-4256-0930)
Publication Date	15 September 2019
Submission Date	3 October 2018
Acceptance Date	4 April 2019
Published in Issue	Year 2019 Volume: 7 Issue: 3

Cite

APA	Aksoy, B., Uğuz, S., & Oral, O. (2019). COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA. Mühendislik Bilimleri ve Tasarım Dergisi, 7(3), 608-618. https://doi.org/10.21923/jesd.467036

Cited By

The experimental application of popular machine learning algorithms on predictive maintenance and the design of IIoT based condition monitoring system

Computers & Industrial Engineering

Mustafa Cakir

https://doi.org/10.1016/j.cie.2020.106948

FABRIC AND PRODUCTION DEFECT DETECTION IN THE APPAREL INDUSTRY USING DATA MINING ALGORITHMS

International Journal of 3D Printing Technologies and Digital Industry

https://doi.org/10.46519/ij3dptdi.1030676
