Research Article

ON THE USEFULNESS OF HTML META ELEMENTS FOR WEB RETRIEVAL

Volume: 21, Number: 1, March 31, 2020

Abstract

Web retrieval studies have mostly represented Web documents using the URL, title, body, and anchor text fields. HTML standards, however, provide a rich set of elements for defining different parts of a Web page. For example, meta elements provide structured metadata about a Web page that is intended not for end users but for browsers and crawlers. Because most previous studies relied on the URL, title, body, and anchor text fields, it remains unclear whether meta tags are useful for Web retrieval. In this work, we examine the usefulness of two meta tags, keywords and description, on the ad-hoc tasks of previous TREC studies. In experiments on the standard TREC Web datasets and several query sets with state-of-the-art term-weighting models, using the description field systematically increases retrieval effectiveness, and the increase is statistically significant most of the time. By contrast, using the keywords field may cause a significant deterioration in retrieval effectiveness for certain term-weighting models.
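As an illustration of the two fields examined here, the following is a minimal sketch of how the keywords and description meta tags could be pulled out of a page so that each can be indexed as a separate field. It uses only Python's standard `html.parser` module; the class name `MetaFieldParser` and the function `extract_meta_fields` are hypothetical names chosen for this example and do not describe the indexing pipeline used in the study.

```python
from html.parser import HTMLParser


class MetaFieldParser(HTMLParser):
    """Collects the content of <meta name="keywords"> and <meta name="description">.

    Hypothetical helper for illustration only; not the paper's implementation.
    """

    WANTED = {"keywords", "description"}

    def __init__(self):
        super().__init__()
        self.fields = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = dict(attrs)
        # Attribute values can be None (e.g. <meta name>), so guard with "or".
        name = (attr.get("name") or "").strip().lower()
        content = attr.get("content")
        if name in self.WANTED and content:
            # Keep the first occurrence if a page repeats a meta tag.
            self.fields.setdefault(name, content.strip())


def extract_meta_fields(html):
    """Return a dict with whichever of 'keywords' and 'description' the page defines."""
    parser = MetaFieldParser()
    parser.feed(html)
    parser.close()
    return parser.fields


if __name__ == "__main__":
    page = (
        '<html><head><title>Example</title>'
        '<meta name="Keywords" content="retrieval, meta tags">'
        '<meta name="description" content=" A short page summary. "/>'
        '</head><body>Body text</body></html>'
    )
    print(extract_meta_fields(page))
```

Pages that omit either tag simply yield a dictionary without that key, so a caller can treat the field as empty when building a fielded document representation.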

Keywords

Information Retrieval, Web Retrieval, Meta Tags, ClueWeb, HTML

APA
Arslan, A. (2020). ON THE USEFULNESS OF HTML META ELEMENTS FOR WEB RETRIEVAL. Eskişehir Technical University Journal of Science and Technology A - Applied Sciences and Engineering, 21(1), 182-198. https://doi.org/10.18038/estubtda.615103