This study applies lexical simplification techniques to reduce vocabulary mismatch between user queries and document collections. Two distinct strategies, complex word substitution (Stg1) and query expansion with simplified alternatives (Stg2), are systematically evaluated across six benchmark datasets (FiQA, ArguAna, NFCorpus, SCIDOCS, SciFact, and TREC-COVID) with dense and sparse neural retrieval models, using nDCG@10 and MAP as effectiveness measures. Our experiments reveal that Stg1 consistently degrades retrieval performance, with nDCG@10 and MAP declines often exceeding 0.10 and 0.03, respectively. In contrast, Stg2 preserves nearly all baseline effectiveness: nDCG@10 drops by less than 0.02 points and MAP by less than 0.01, and in some instances Stg2 even surpasses the baseline. Risk-sensitive evaluation further confirms that Stg2 incurs minimal or no per-query risk and shows promising results, whereas Stg1 uniformly harms effectiveness. These findings indicate that appending simplified term variants alongside the original query terms offers a safe and robust means of addressing vocabulary mismatch without sacrificing precision or recall. We recommend that future IR pipelines incorporate carefully tailored query simplification models that balance user query intent with high retrieval quality.
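The difference between the two strategies can be illustrated with a minimal sketch. The simplification lexicon, tokenization, and word choices below are purely hypothetical and are not taken from the paper; the sketch only shows the structural contrast between substituting complex words (Stg1) and appending simplified variants to the original query (Stg2).

```python
# Hypothetical simplification lexicon for illustration only;
# the actual study would use a trained complex word identification
# and substitution model.
SIMPLER = {"utilize": "use", "myocardial": "heart", "infarction": "attack"}

def stg1_substitute(query: str) -> str:
    """Stg1 sketch: replace each complex word with its simpler variant."""
    return " ".join(SIMPLER.get(tok, tok) for tok in query.lower().split())

def stg2_expand(query: str) -> str:
    """Stg2 sketch: keep the original terms and append simplified variants."""
    toks = query.lower().split()
    extras = [SIMPLER[tok] for tok in toks if tok in SIMPLER]
    return " ".join(toks + extras)

print(stg1_substitute("utilize myocardial infarction data"))
# -> "use heart attack data"
print(stg2_expand("utilize myocardial infarction data"))
# -> "utilize myocardial infarction data use heart attack"
```

Because Stg2 retains the original terms, documents matching the user's exact vocabulary remain retrievable, which is consistent with the abstract's finding that expansion preserves baseline effectiveness while substitution does not.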
Lexical simplification, Information retrieval, Complex word identification, Vocabulary mismatch, Query rewriting
| Primary Language | English |
|---|---|
| Subjects | Natural Language Processing |
| Journal Section | Articles |
| Authors | |
| Publication Date | September 25, 2025 |
| Submission Date | June 10, 2025 |
| Acceptance Date | September 19, 2025 |
| Published in Issue | Year 2025, Volume 26, Issue 3 |