Research Article

COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA

Volume: 7, Issue: 3, 15 September 2019

Abstract

The rapid growth of world tourism in recent years has made the sector an application area for big data. This study proposes a solution, based on big data techniques and string similarity algorithms (SSA), to the problems that arise when hotel data from different providers are entered into databases under different names and addresses. As a sample, 2599 hotels of a tourism agency with a wide hotel network located in London were selected, and a Map-Reduce process using the Soundex algorithm was run to match these hotels against approximately three million hotel records from seventy different providers. Matching with Map-Reduce significantly reduced both the process count and the processing time. The Dice coefficient, Levenshtein, and longest common subsequence (LCS) algorithms were then compared in terms of the number of records they matched correctly and their processing time. At this stage, words that lowered the algorithms' scores were identified in the database and removed before the algorithms were applied. The Dice coefficient algorithm yielded better results in terms of correct matching, and the Levenshtein algorithm yielded better results in terms of processing time.
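The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the noise-word list, the choice to key each block on the Soundex code of the first remaining word, the bigram form of the Dice coefficient, and all function names are assumptions made for illustration. The grouping step only imitates the map (emit a key) and reduce (collect by key) phases in memory.

```python
from collections import defaultdict

# Hypothetical noise words; the abstract does not list the words that were removed.
NOISE_WORDS = {"hotel", "the", "london"}

def normalize(name):
    """Lowercase a name and drop the assumed noise words."""
    return " ".join(w for w in name.lower().split() if w not in NOISE_WORDS)

def soundex(word):
    """American Soundex: first letter followed by three digits."""
    groups = ["bfpv", "cgjkqsxz", "dt", "l", "mn", "r"]
    codes = {c: d for d, letters in enumerate(groups, start=1) for c in letters}
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = codes.get(letters[0], 0)
    for c in letters[1:]:
        code = codes.get(c, 0)
        if code and code != prev:
            result += str(code)
        if c not in "hw":  # h and w do not separate letters with equal codes
            prev = code
    return (result + "000")[:4]

def block_by_soundex(names):
    """Map each name to a Soundex key, then group (reduce) names by that key."""
    blocks = defaultdict(list)
    for name in names:
        words = normalize(name).split()
        blocks[soundex(words[0]) if words else ""].append(name)
    return blocks

def dice(a, b):
    """Dice coefficient over sets of character bigrams (one common variant)."""
    x = {a[i:i + 2] for i in range(len(a) - 1)}
    y = {b[i:i + 2] for i in range(len(b) - 1)}
    return 2 * len(x & y) / (len(x) + len(y)) if x or y else 1.0

def levenshtein(a, b):
    """Edit distance counting insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

# Made-up provider records, for illustration only.
provider_names = ["The Savoy Hotel", "Savoi London", "Hotel Ritz"]
blocks = block_by_soundex(provider_names)
candidates = blocks[soundex("savoy")]  # only this block is scored further
```

Blocking in this way means a reference hotel is compared only with provider records that share its Soundex key, rather than with every record, which is one plausible reading of how the reported reduction in process count arises. Each candidate can then be scored with `dice`, `levenshtein`, or `lcs_length` on the normalized names.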

Details

Primary Language

English

Subjects

Computer Software

Section

Research Article

Publication Date

15 September 2019

Submission Date

3 October 2018

Acceptance Date

4 April 2019

Published in Issue

Year 2019, Volume: 7, Issue: 3

Cite

APA
Aksoy, B., Uğuz, S., & Oral, O. (2019). COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA. Mühendislik Bilimleri ve Tasarım Dergisi, 7(3), 608-618. https://doi.org/10.21923/jesd.467036
AMA
1. Aksoy B, Uğuz S, Oral O. COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA. MBTD. 2019;7(3):608-618. doi:10.21923/jesd.467036
Chicago
Aksoy, Bekir, Sinan Uğuz, and Okan Oral. 2019. “COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA”. Mühendislik Bilimleri ve Tasarım Dergisi 7 (3): 608-18. https://doi.org/10.21923/jesd.467036.
EndNote
Aksoy B, Uğuz S, Oral O (01 September 2019) COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA. Mühendislik Bilimleri ve Tasarım Dergisi 7 3 608–618.
IEEE
[1] B. Aksoy, S. Uğuz, and O. Oral, “COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA”, MBTD, vol. 7, no. 3, pp. 608–618, Sep. 2019, doi: 10.21923/jesd.467036.
ISNAD
Aksoy, Bekir - Uğuz, Sinan - Oral, Okan. “COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA”. Mühendislik Bilimleri ve Tasarım Dergisi 7/3 (01 September 2019): 608-618. https://doi.org/10.21923/jesd.467036.
JAMA
1. Aksoy B, Uğuz S, Oral O. COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA. MBTD. 2019;7:608–618.
MLA
Aksoy, Bekir, et al. “COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA”. Mühendislik Bilimleri ve Tasarım Dergisi, vol. 7, no. 3, September 2019, pp. 608-618, doi:10.21923/jesd.467036.
Vancouver
1. Bekir Aksoy, Sinan Uğuz, Okan Oral. COMPARISON OF THE DATA MATCHING PERFORMANCES OF STRING SIMILARITY ALGORITHMS IN BIG DATA. MBTD. 01 September 2019;7(3):608-618. doi:10.21923/jesd.467036
