Feature Extraction for Real Estate Images and Titles with LLMs

Afra Arslan; Tan Doruk Yetki; Arda Yücel; Hacer Turgut; Ömür Bali; Gülfem Işıklar Alptekin; Günce Keziban Orman

doi:10.35377/saucis...1829206


Abstract

Images and titles often carry rich latent information about the objects they describe, particularly on web-based platforms. Real estate websites are a clear example: listing images and titles convey details that help users make decisions. However, these unstructured elements cannot be used directly in downstream machine learning tasks, because their contextual meaning is not directly machine-interpretable. This work transforms listing images and titles into structured, tabular representations suitable for analytical and predictive modeling. To this end, we propose a modular framework built on state-of-the-art large language models that incorporates ReAct, LLM-as-a-Judge, and few-shot prompting techniques. We evaluate its performance on a real-world real estate dataset and compare it with BERT- and CLIP-based baselines. Experimental results show that the framework achieves up to a 44.26% improvement in recall for listing attributes, such as the presence of a balcony or the furnishing status of a property.
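As a rough illustration of the idea in the abstract, the sketch below turns a listing title into a fixed-column row with a few-shot prompt and scores one attribute with recall. It is not the authors' implementation: the attribute names, the example titles, and the helpers `build_prompt`, `extract_row`, `recall`, and `fake_llm` are all assumptions made for this example, and `fake_llm` is a keyword-matching stand-in for a real model call.

```python
import json
from typing import Callable

# Hypothetical attribute schema; the paper's full attribute set is not given here.
ATTRIBUTES = ("has_balcony", "is_furnished")

# Hypothetical few-shot examples (title -> labels), invented for illustration.
FEW_SHOT = [
    ("Furnished 2+1 flat with large balcony", {"has_balcony": True, "is_furnished": True}),
    ("Empty studio near metro", {"has_balcony": False, "is_furnished": False}),
]


def build_prompt(title: str) -> str:
    """Assemble a few-shot prompt that asks for one JSON object per title."""
    lines = ["Extract these attributes as JSON: " + ", ".join(ATTRIBUTES)]
    for example_title, labels in FEW_SHOT:
        lines.append(f"Title: {example_title}")
        lines.append(f"JSON: {json.dumps(labels)}")
    lines.append(f"Title: {title}")
    lines.append("JSON:")
    return "\n".join(lines)


def extract_row(title: str, llm: Callable[[str], str]) -> dict:
    """Call the model and coerce its reply into a fixed-column row."""
    try:
        parsed = json.loads(llm(build_prompt(title)))
    except json.JSONDecodeError:
        parsed = {}
    if not isinstance(parsed, dict):
        parsed = {}
    # Missing or non-boolean values become None so every row has the same columns.
    return {a: parsed.get(a) if isinstance(parsed.get(a), bool) else None
            for a in ATTRIBUTES}


def recall(y_true: list, y_pred: list) -> float:
    """Recall = TP / (TP + FN) for one boolean attribute."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t and p)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t and not p)
    return tp / (tp + fn) if (tp + fn) else 0.0


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call; it only matches keywords in the last title."""
    last_title = prompt.rsplit("Title: ", 1)[1].split("\n")[0].lower()
    return json.dumps({"has_balcony": "balcony" in last_title,
                       "is_furnished": "furnished" in last_title})
```

Calling `extract_row(title, fake_llm)` over a list of titles yields rows that can be loaded into a table, and `recall` can then be computed per attribute column against hand-labeled values. Replies that are not valid JSON fall back to `None` in every column rather than raising, so a single malformed reply does not break the table.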

Keywords

Supporting Institution

TUBITAK

Project Number

124E135

References

T. Brown et al., Language models are few-shot learners, Adv. Neural Inf. Process. Syst., 33 (2020), 1877–1901.
J. Dagdelen et al., Structured information extraction from scientific text with large language models, Nat. Commun., 15(1) (2024), 1418.
S. Desai and G. Durrett, Calibration of pre-trained transformers, in Proc. 2020 Conf. Empirical Methods Nat. Lang. Process. (EMNLP), Association for Computational Linguistics, (2020), 295–302.
F. Feng, Y. Yang, D. Cer, N. Arivazhagan, and W. Wang, Language-agnostic BERT sentence embedding, arXiv preprint arXiv:2007.01852, 2020. https://arxiv.org/abs/2007.01852
Y. Guo, C. Wang, S. X. Yu, F. McKenna, and K. H. Law, AdaLN: a vision transformer for multidomain learning and predisaster building information extraction from images, J. Comput. Civ. Eng., 36(5) (2022), 04022024.
T. Gupta, M. Zaki, N. M. A. Krishnan, and Mausam, MatSciBERT: a materials domain language model for text mining and information extraction, npj Comput. Mater., 8(1) (2022), 102.
K. Han, Y. Wang, J. Guo, Y. Tang, and E. Wu, Vision GNN: an image is worth graph of nodes, Adv. Neural Inf. Process. Syst., 35 (2022), 8291–8303.
ilab-core, LLM-based real estate information extraction, GitHub repository. https://github.com/ilab-core/llm-based-real-estate-information-extraction

M. Kvet, M. Potočár, and S. Tatarka, Real estate attribute value extraction using large language models, IEEE Access, 13 (2025).
F. Li, M. Zhang, G. Fu, and D. Ji, A neural joint model for entity and relation extraction from biomedical text, BMC Bioinformatics, 18 (2017), 1–11.
W.-C. Liao, D. Y. Koh, G. Yin, and S. Zheng, Feng Shui meets AI: Measuring its economic value in commercial real estate, MIT Center for Real Estate Research Paper No. 25/08, Nov. 2025.
S. Marukatat, Tutorial on PCA and approximate PCA and approximate kernel PCA, Artif. Intell. Rev., 56(6) (2023), 5445–5477.
OpenAI, OpenAI, online resource. https://openai.com
A. Radford et al., Learning transferable visual models from natural language supervision, arXiv preprint arXiv:2103.00020, 2021. https://arxiv.org/abs/2103.00020
G. Salton, Automatic Information Organization and Retrieval, McGraw-Hill Book Company, New York, 1968.
K. Spärck Jones, A theoretical comparison of recall and precision, J. Doc., 28(1) (1972), 11–21.
A. Vijayan, A prompt engineering approach for structured data extraction from unstructured text using conversational LLMs, in Proc. 2023 6th Int. Conf. Algorithms, Comput. Artif. Intell. (ACAI ’23), Association for Computing Machinery, (2024), 183–189.
A. Wang, A. Singh, J. Michael, F. Hill, O. Levy, and S. R. Bowman, GLUE: a multi-task benchmark and analysis platform for natural language understanding, in Proc. 7th Int. Conf. Learn. Represent. (ICLR), 2019.
K. Wu, S. Yuan, C. Shen, L. Xu, and M. Chen, AutoMV: An autonomous agent framework for real estate marketing video generation, Proc. AAAI Conf. Artif. Intell., 39(28) (2025), 29715–29717.
S. Yao et al., ReAct: Synergizing reasoning and acting in language models, arXiv preprint arXiv:2210.03629, 2023. https://arxiv.org/abs/2210.03629
L. Zheng et al., Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena, arXiv preprint arXiv:2306.05685, 2023. https://arxiv.org/abs/2306.05685

Details

Primary Language

English

Subjects

Natural Language Processing, Artificial Intelligence (Other)

Journal Section

Research Article

Authors

Afra Arslan *
0009-0006-4857-5155
Türkiye

Tan Doruk Yetki
0009-0000-7304-2605
Türkiye

Arda Yücel
0009-0003-2926-9257
Türkiye

Hacer Turgut
0000-0002-7680-0878
Türkiye

Ömür Bali
0009-0005-4907-649X
Türkiye

Gülfem Işıklar Alptekin
0000-0003-0146-1581
Türkiye

Günce Keziban Orman
0000-0003-0402-8417
Türkiye

Early Pub Date

June 19, 2026

Publication Date

June 30, 2026

Submission Date

November 24, 2025

Acceptance Date

May 5, 2026

Published in Issue

Year 2026 Volume: 9 Number: 3

DOI

https://doi.org/10.35377/saucis...1829206

IZ

https://izlik.org/JA63XS76PU

Cite


APA

Arslan, A., Yetki, T. D., Yücel, A., Turgut, H., Bali, Ö., Işıklar Alptekin, G., & Orman, G. K. (2026). Feature Extraction for Real Estate Images and Titles with LLMs. Sakarya University Journal of Computer and Information Sciences, 9(3), 690-699. https://doi.org/10.35377/saucis...1829206

AMA

1. Arslan A, Yetki TD, Yücel A, et al. Feature Extraction for Real Estate Images and Titles with LLMs. SAUCIS. 2026;9(3):690-699. doi:10.35377/saucis.1829206

Chicago

Arslan, Afra, Tan Doruk Yetki, Arda Yücel, et al. 2026. “Feature Extraction for Real Estate Images and Titles With LLMs”. Sakarya University Journal of Computer and Information Sciences 9 (3): 690-99. https://doi.org/10.35377/saucis. 1829206.

EndNote

Arslan A, Yetki TD, Yücel A, Turgut H, Bali Ö, Işıklar Alptekin G, Orman GK (June 1, 2026) Feature Extraction for Real Estate Images and Titles with LLMs. Sakarya University Journal of Computer and Information Sciences 9 3 690–699.

IEEE

[1] A. Arslan et al., “Feature Extraction for Real Estate Images and Titles with LLMs”, SAUCIS, vol. 9, no. 3, pp. 690–699, June 2026, doi: 10.35377/saucis...1829206.

ISNAD

Arslan, Afra - Yetki, Tan Doruk - Yücel, Arda - Turgut, Hacer - Bali, Ömür - Işıklar Alptekin, Gülfem - Orman, Günce Keziban. “Feature Extraction for Real Estate Images and Titles With LLMs”. Sakarya University Journal of Computer and Information Sciences 9/3 (June 1, 2026): 690-699. https://doi.org/10.35377/saucis. 1829206.

JAMA

1. Arslan A, Yetki TD, Yücel A, Turgut H, Bali Ö, Işıklar Alptekin G, Orman GK. Feature Extraction for Real Estate Images and Titles with LLMs. SAUCIS. 2026;9:690–699.

MLA

Arslan, Afra, et al. “Feature Extraction for Real Estate Images and Titles With LLMs”. Sakarya University Journal of Computer and Information Sciences, vol. 9, no. 3, June 2026, pp. 690-9, doi:10.35377/saucis. 1829206.

Vancouver

1. Afra Arslan, Tan Doruk Yetki, Arda Yücel, Hacer Turgut, Ömür Bali, Gülfem Işıklar Alptekin, Günce Keziban Orman. Feature Extraction for Real Estate Images and Titles with LLMs. SAUCIS. 2026 Jun. 1;9(3):690-9. doi:10.35377/saucis. 1829206