Benchmark Effect of Web Search Engines on Text Mining
Abstract
Dictionary creation has been studied for a long time, using a variety of methods and analyses. With the emergence of the World Wide Web, efforts to build dictionaries from up-to-date data have gained importance. The performance of a web search engine therefore directly affects any model that uses web documents for automatic dictionary creation. In this study, web search engines were evaluated in terms of how relevant their suggested documents are to the query. For this purpose, a model that creates a dictionary automatically from web documents was developed. First, topic seed words are determined from the documents initially presented to the system, and the first search is executed with these seed words. The TF-IDF metric is then used to select meaningful words from the first returned document: the n words with the highest TF-IDF values are selected and added to the dictionary, where the value of n was determined experimentally. Searching the web again with the words added to the dictionary makes the search engine suggest new documents. By repeating this process, experimental dictionaries of a certain size were obtained. Because the documents suggested by each search engine are generally different, the similarity of the dictionaries produced from the top suggested documents can be used to measure how well a search engine selects relevant documents. Hash similarity was used to evaluate dictionary performance. According to the results, the highest dictionary similarity was 73.9% for the Google search engine, 68.7% for Bing, and 60.5% for Yandex.
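The loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under several assumptions, not the authors' implementation: the function names are hypothetical, `search` stands for any caller-supplied function that returns document texts for a list of query words (no real search engine API is used), the IDF part is computed over the documents seen so far with a smoothed formula, and, since the abstract does not define "hash similarity", the Jaccard overlap of hashed word sets is used as a stand-in.

```python
import hashlib
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def top_n_tfidf(target_doc, corpus, n):
    """Return the n words of target_doc with the highest TF-IDF scores.

    corpus is the list of document texts used for the IDF part.
    """
    tokenized = [set(tokenize(doc)) for doc in corpus]
    target = tokenize(target_doc)
    if not target:
        return []
    scores = {}
    for word, count in Counter(target).items():
        df = sum(1 for tokens in tokenized if word in tokens)
        # Smoothed IDF (an assumption): avoids division by zero when df == 0.
        idf = math.log((1 + len(tokenized)) / (1 + df)) + 1
        scores[word] = (count / len(target)) * idf
    # Highest score first; ties broken alphabetically for determinism.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:n]


def build_dictionary(seed_words, search, n, target_size, max_rounds=50):
    """Grow a dictionary by repeatedly searching with its current words.

    search is a hypothetical callable: it takes a list of query words and
    returns a list of document texts, best match first.
    """
    dictionary = list(dict.fromkeys(seed_words))  # de-duplicate, keep order
    seen_docs = []
    for _ in range(max_rounds):
        if len(dictionary) >= target_size:
            break
        results = search(dictionary)
        if not results:
            break
        first_doc = results[0]  # only the top suggested document is used
        seen_docs.append(first_doc)
        added = False
        for word in top_n_tfidf(first_doc, seen_docs, n):
            if word not in dictionary:
                dictionary.append(word)
                added = True
        if not added:  # nothing new was learned, so stop early
            break
    return dictionary[:target_size]


def word_hash(word):
    """Hash a single word to a hex digest."""
    return hashlib.md5(word.encode("utf-8")).hexdigest()


def hash_similarity(dict_a, dict_b):
    """Stand-in metric (an assumption): Jaccard overlap of hashed word sets."""
    a = {word_hash(w) for w in dict_a}
    b = {word_hash(w) for w in dict_b}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

To compare engines in this sketch, one would call `build_dictionary` once per engine with the same seed words and a different `search` function, then score each resulting dictionary against a reference dictionary with `hash_similarity`. The abstract does not say what the reference dictionary is, so that choice is left to the caller.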
Details
Primary Language
English
Subjects
Engineering
Section
Research Article
Publication Date
January 15, 2021
Submission Date
July 21, 2020
Acceptance Date
November 15, 2020
Published in Issue
Year 2021 Volume: 4 Issue: 1
APA
Toprak, A., & Turan, M. (2021). Benchmark Effect of Web Search Engines on Text Mining. Veri Bilimi, 4(1), 84-92. https://izlik.org/JA23GJ35ZR