Deep Learning-Based Object Detection with Mobile Application and Expression Generation Using a Large Language Model
Abstract
This work presents an integrated mobile solution that allows users to detect objects in their environment, measure the distance to them, and understand the spatial relationships between them. The system combines YOLOv11-based real-time object detection, LiDAR-assisted distance measurement, and GPT-4o expression generation, so that users can locate a desired object and learn which objects are nearby. Users thus learn not only that an object is present but also where it is and how it relates spatially to other objects. Images are captured within the mobile application during object detection, which ensures that the detected object is always within the frame. This prevents problems such as blurring and incorrect framing, which are frequently encountered in photos taken by visually impaired users. Experimental results show that the YOLOv11 model achieves an F1 score of 0.77 and a mAP of 0.806. Furthermore, the fine-tuned GPT-4o model identifies object locations in images and generates expressions that refer to surrounding objects. By integrating object detection, LiDAR-based distance measurement, and expression generation from a large language model, the proposed system provides a reference for implementing more advanced solutions in the future.
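The abstract does not describe how detections and distances are turned into spatial descriptions. As a rough illustration only, the following Python sketch shows one way that bounding boxes and per-object distances could be combined into a plain-text relation, for example as structured input to a language model. The `Detection` class, the function names, and the 0.25 m tolerance are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    """One detected object (hypothetical structure, not from the paper)."""
    label: str
    x_min: float       # left edge of the bounding box, in pixels
    x_max: float       # right edge of the bounding box, in pixels
    distance_m: float  # assumed LiDAR distance to the object, in meters

    @property
    def x_center(self) -> float:
        return (self.x_min + self.x_max) / 2.0


def describe_relation(target: Detection, other: Detection) -> str:
    """Describe where `target` is relative to `other`, seen from the camera."""
    side = "left of" if target.x_center < other.x_center else "right of"
    gap = target.distance_m - other.distance_m
    if abs(gap) < 0.25:  # assumed tolerance for "about the same distance"
        depth = "at about the same distance as"
    elif gap < 0:
        depth = "closer than"
    else:
        depth = "farther away than"
    return (
        f"The {target.label} is {target.distance_m:.1f} m away, "
        f"to the {side} the {other.label} and {depth} it."
    )


def describe_scene(target_label: str, detections: list[Detection]) -> list[str]:
    """Relate the first detection matching `target_label` to every other one."""
    target = next((d for d in detections if d.label == target_label), None)
    if target is None:
        return [f"No {target_label} was detected."]
    return [describe_relation(target, d) for d in detections if d is not target]


if __name__ == "__main__":
    scene = [
        Detection("chair", 100, 200, 1.5),
        Detection("table", 300, 500, 2.4),
    ]
    for sentence in describe_scene("chair", scene):
        print(sentence)
```

In a system like the one described, sentences of this kind would more plausibly be produced by the fine-tuned language model itself; the rule-based version here only makes the underlying geometry explicit.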
Keywords
Details
Primary Language: English
Subjects: Computer Vision, Natural Language Processing
Journal Section: Research Article
Authors:
Nurcihan Dere * (ORCID: 0009-0009-6072-6990), Türkiye
Kazım Yıldız (ORCID: 0000-0001-6999-1410), Türkiye
Önder Demir (ORCID: 0000-0003-4540-663X), Türkiye
Early Pub Date: April 8, 2026
Publication Date: -
Submission Date: November 21, 2025
Acceptance Date: December 16, 2025
Published in Issue: Year 2026, Number: Advanced Online Publication