Research Article

An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval

Volume: 7 Number: 2 January 31, 2025

Abstract

A central problem in knowledge mining is how to extract correct and useful information from unstructured PDF documents. This paper proposes a PDF querying system that combines an OpenAI LLM with LangChain for contextual understanding of PDF content. The system operates in stages: extracting text, generating embeddings, and storing the embeddings in a vector database. Business users can pose natural language queries, which LangChain's Conversational Chain processes by retrieving relevant text chunks, assembling context, and optimizing the prompt. OpenAI's LLM then processes the user's input to produce a factual and relevant answer. Experiments on a variety of PDF materials show higher accuracy, more relevant search results, and greater user satisfaction than conventional keyword-based search. LangChain enriches the textual understanding provided by the OpenAI model, and its contextual reasoning helps extract structured information from text efficiently. The approach has use cases in science, law, and finance, where it allows easy access to the large amounts of information held in PDFs. By applying NLP, the proposed system enables effective search of, and improved learning from, data that is rarely managed in structured form.
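
The stages named in the abstract (text chunking, embedding, vector storage, retrieval, and prompt assembly) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' implementation: the bag-of-words `embed` function stands in for an OpenAI embedding model, the in-memory `VectorStore` class stands in for a real vector database, and every name and parameter below (`chunk_text`, `VectorStore`, `build_prompt`, `chunk_size`, `overlap`) is hypothetical and chosen only for illustration. The text is assumed to have already been extracted from the PDF.

```python
import math
import re
from collections import Counter


def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into overlapping word windows (sizes are illustrative assumptions)."""
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    last_start = max(len(words) - overlap, 1)
    return [" ".join(words[i:i + chunk_size]) for i in range(0, last_start, step)]


def embed(text):
    """Toy embedding: lowercase term counts. A stand-in for a learned embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class VectorStore:
    """Hypothetical in-memory store; a real system would use a vector database."""

    def __init__(self):
        self.items = []  # list of (chunk, embedding) pairs

    def add(self, chunk):
        self.items.append((chunk, embed(chunk)))

    def search(self, query, k=2):
        """Return the k stored chunks most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [chunk for chunk, _ in ranked[:k]]


def build_prompt(question, context_chunks):
    """Assemble the retrieved chunks and the question into a single LLM prompt."""
    context = "\n---\n".join(context_chunks)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


if __name__ == "__main__":
    # Made-up sample passages standing in for text extracted from PDFs.
    passages = [
        "The lease may be terminated by either party with ninety days written notice.",
        "Quarterly revenue grew by twelve percent driven by cloud subscriptions.",
        "The experiment measured protein folding rates at varying temperatures.",
    ]
    store = VectorStore()
    for passage in passages:
        for chunk in chunk_text(passage):
            store.add(chunk)
    question = "How much notice is required to terminate the lease?"
    print(build_prompt(question, store.search(question, k=1)))
```

In a full system, the prompt produced by `build_prompt` would be sent to the LLM. Swapping the toy embedding for a learned one is what would let retrieval match on meaning rather than shared words, which is the contrast with keyword-based search that the abstract draws.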

Supporting Institution

Université 8 Mai 1945 Guelma

Details

Primary Language

English

Subjects

Natural Language Processing, Autonomous Agents and Multiagent Systems

Journal Section

Research Article

Authors

Tushar Sharma
India

Pulkit Aggrawal
India

Early Pub Date

January 30, 2025

Publication Date

January 31, 2025

Submission Date

December 19, 2024

Acceptance Date

January 11, 2025

Published in Issue

Year 2024 Volume: 7 Number: 2

APA
Jindal, C., Gupta, S., Mehra, J., Sharma, T., & Aggrawal, P. (2025). An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval. International Journal of Informatics and Applied Mathematics, 7(2), 16-28. https://izlik.org/JA95TE26DL
AMA
1. Jindal C, Gupta S, Mehra J, Sharma T, Aggrawal P. An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval. IJIAM. 2025;7(2):16-28. https://izlik.org/JA95TE26DL
Chicago
Jindal, Chirag, Satyam Gupta, Jyoti Mehra, Tushar Sharma, and Pulkit Aggrawal. 2025. “An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval”. International Journal of Informatics and Applied Mathematics 7 (2): 16-28. https://izlik.org/JA95TE26DL.
EndNote
Jindal C, Gupta S, Mehra J, Sharma T, Aggrawal P (January 1, 2025) An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval. International Journal of Informatics and Applied Mathematics 7 2 16–28.
IEEE
[1] C. Jindal, S. Gupta, J. Mehra, T. Sharma, and P. Aggrawal, “An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval”, IJIAM, vol. 7, no. 2, pp. 16–28, Jan. 2025, [Online]. Available: https://izlik.org/JA95TE26DL
ISNAD
Jindal, Chirag - Gupta, Satyam - Mehra, Jyoti - Sharma, Tushar - Aggrawal, Pulkit. “An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval”. International Journal of Informatics and Applied Mathematics 7/2 (January 1, 2025): 16-28. https://izlik.org/JA95TE26DL.
JAMA
1. Jindal C, Gupta S, Mehra J, Sharma T, Aggrawal P. An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval. IJIAM. 2025;7:16–28.
MLA
Jindal, Chirag, et al. “An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval”. International Journal of Informatics and Applied Mathematics, vol. 7, no. 2, Jan. 2025, pp. 16-28, https://izlik.org/JA95TE26DL.
Vancouver
1. Chirag Jindal, Satyam Gupta, Jyoti Mehra, Tushar Sharma, Pulkit Aggrawal. An AI-Driven PDF Query System Leveraging OpenAI LLM and LangChain for Enhanced Data Retrieval. IJIAM [Internet]. 2025 Jan. 1;7(2):16-28. Available from: https://izlik.org/JA95TE26DL

International Journal of Informatics and Applied Mathematics