A method to improve full-text search performance of MongoDB
Abstract
The B-Tree-based text indexes used in MongoDB are slower than alternative structures such as inverted indexes. This study shows that full-text search speed can be increased significantly by indexing a structure in which each distinct word in the text appears only once. The Multi-Stream Word-Based Compression Algorithm (MWCA), developed in our previous work, stores word dictionaries and data in separate streams. As documents were added to a MongoDB collection, they were encoded with MWCA and separated into six streams. Each stream was stored in its own field, and the three streams containing unique words were used to create the text index. As a result, the index was created in less time and took up less space. The Snappy and Zlib block compression methods used by MongoDB also reached higher compression ratios on data encoded with MWCA. Search tests on text indexes created on collections using different compression options show that our method provides a 19- to 146-fold speed increase and 34% to 40% lower memory usage. Tests on regex searches that do not use the text index show that the MWCA model provides a 7- to 13-fold speed increase and 29% to 34% lower memory usage.
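The core idea of indexing each distinct word only once can be illustrated with a minimal sketch. This is not the MWCA encoding itself: the abstract does not specify its six streams, so the sketch assumes a simplified two-field layout, and the names `encode_document`, `decode_document`, `unique_words`, and `index_stream` are hypothetical.

```python
import re


def encode_document(text):
    """Split text into a unique-word field and an index stream.

    Simplified stand-in for illustration only; the real MWCA
    encoding uses six streams that are not reproduced here.
    """
    words = re.findall(r"\w+", text.lower())
    dictionary = {}      # word -> position in the unique-word list
    index_stream = []    # one dictionary position per word occurrence
    for word in words:
        if word not in dictionary:
            dictionary[word] = len(dictionary)
        index_stream.append(dictionary[word])
    return {
        # Each distinct word appears once; this is the field to text-index.
        "unique_words": " ".join(dictionary),
        # Needed to rebuild the word sequence; left out of the text index.
        "index_stream": index_stream,
    }


def decode_document(doc):
    """Rebuild the lowercased word sequence from the two fields."""
    unique = doc["unique_words"].split(" ")
    return " ".join(unique[i] for i in doc["index_stream"])


doc = encode_document("the cat saw the dog and the dog saw the cat")
print(doc["unique_words"])   # the cat saw dog and
print(doc["index_stream"])   # [0, 1, 2, 0, 3, 4, 0, 3, 2, 0, 1]
```

With PyMongo, such a document could then be inserted into a collection and indexed with `collection.create_index([("unique_words", "text")])`, so that the text index covers five words here instead of eleven. Note that this sketch discards case and punctuation, which a real encoding would have to preserve.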
Details
Primary Language
English
Subjects
Engineering
Section
Research Article
Publication Date
October 31, 2022
Submission Date
July 20, 2021
Acceptance Date
December 10, 2021
Published in Issue
Year 2022, Volume 28, Issue 5
APA
Mesut, A., & Öztürk, E. (2022). A method to improve full-text search performance of MongoDB. Pamukkale Üniversitesi Mühendislik Bilimleri Dergisi, 28(5), 720-729. https://izlik.org/JA55JM94MW