This work presents the development of a Retrieval-Augmented Generation (RAG) based conversational agent designed to query and interpret Pakistan’s National AI Policy 2025. The system leverages LangChain for orchestration, FAISS for vector-based retrieval, and OpenAI GPT models for response generation. Unlike traditional search within static documents, the proposed chatbot enables users to engage with complex policy documents interactively, receiving contextually accurate answers grounded in the original text. By integrating retrieval with generative models and conversational memory, the system demonstrates how policy documents, which are often lengthy and inaccessible to non-specialists, can be transformed into practical, queryable knowledge resources.
The system was implemented in Python using the LangChain framework. The methodology involved the following steps:
Document Ingestion
The Pakistan AI Policy 2025 PDF was loaded using PyPDFLoader.
The text was segmented into overlapping chunks using a recursive character splitter to preserve context.
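The overlap idea can be illustrated with a minimal sketch. This is a simplified fixed-window splitter, not LangChain's recursive character splitter, which first tries to break on separators such as paragraph and line breaks. The function name and the `chunk_size` and `chunk_overlap` values are assumptions for illustration; the report does not state the settings that were used.

```python
def split_with_overlap(text: str, chunk_size: int = 1000, chunk_overlap: int = 200) -> list[str]:
    """Split text into character windows; consecutive windows share chunk_overlap characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


# Tiny hypothetical input so the overlap is easy to see.
demo_chunks = split_with_overlap("abcdefghij", chunk_size=4, chunk_overlap=2)
print(demo_chunks)
```

Because each chunk repeats the tail of the previous one, a sentence that straddles a boundary still appears whole in at least one chunk, provided it is shorter than the overlap.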
Vectorization and Storage
Chunks were embedded using text-embedding-3-large from OpenAI.
The resulting vectors were stored in a FAISS index for efficient semantic search.
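Conceptually, semantic search over the stored vectors is a nearest-neighbour lookup. The sketch below is a brute-force stand-in for the FAISS index, written with the standard library only. The three-dimensional vectors are hand-made toy values rather than real text-embedding-3-large outputs, and the names `toy_index` and `top_k` are hypothetical.

```python
import math


def l2_distance(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def top_k(query_vec: list[float], index: list[tuple[str, list[float]]], k: int = 2) -> list[str]:
    """Return the texts of the k chunks whose vectors are closest to query_vec."""
    ranked = sorted(index, key=lambda item: l2_distance(query_vec, item[1]))
    return [text for text, _ in ranked[:k]]


# Toy index: (chunk text, made-up embedding). Real embeddings have thousands of dimensions.
toy_index = [
    ("chunk about governance", [0.9, 0.1, 0.0]),
    ("chunk about education", [0.1, 0.9, 0.0]),
    ("chunk about industry", [0.0, 0.1, 0.9]),
]

# A query vector that lies close to the governance chunk.
nearest = top_k([0.8, 0.2, 0.0], toy_index, k=2)
print(nearest)
```

A dedicated index such as FAISS serves the same purpose but avoids scanning every vector in pure Python, which matters as the number of chunks grows.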
Retrieval-Augmented Generation (RAG)
A retriever component was configured to fetch the most relevant chunks for a given query.
Retrieved text was passed to a custom prompt template along with the user’s question.
GPT-3.5-turbo generated the response, with the prompt instructing the model to answer only from the retrieved content.
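The way retrieved text and the user's question are combined can be sketched as plain string templating. The template wording, the refusal sentence, and the function name `build_prompt` below are assumptions made for illustration; the report does not reproduce the actual custom prompt.

```python
# Hypothetical refusal sentence the model is told to use when the context lacks the answer.
REFUSAL = "I could not find this in the policy document."

# Hypothetical template: restricts the model to the supplied context.
PROMPT_TEMPLATE = """Answer the question using only the context below.
If the context does not contain the answer, reply exactly: "{refusal}"

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    """Join the retrieved chunks into one context block and fill the template."""
    context = "\n\n".join(retrieved_chunks)
    return PROMPT_TEMPLATE.format(refusal=REFUSAL, context=context, question=question)


prompt = build_prompt(
    "What does the policy say about governance?",
    ["chunk about governance", "chunk about education"],
)
print(prompt)
```

The resulting string is what would be sent to the chat model. Keeping the refusal instruction in the template is one common way to encourage the model to decline rather than guess, although it does not guarantee it.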
Conversational Memory
ConversationBufferWindowMemory was integrated to retain the five most recent user–agent exchanges.
This provided continuity, enabling multi-turn discussions without losing context.
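The windowed-memory behaviour can be sketched with a bounded queue. This is a standard-library illustration of the idea, not the ConversationBufferWindowMemory implementation; the class name `WindowMemory` and its methods are hypothetical.

```python
from collections import deque


class WindowMemory:
    """Keep only the k most recent (user, agent) exchanges."""

    def __init__(self, k: int = 5):
        # A deque with maxlen silently drops the oldest item when a new one is appended.
        self.turns = deque(maxlen=k)

    def save(self, user_msg: str, agent_msg: str) -> None:
        self.turns.append((user_msg, agent_msg))

    def as_text(self) -> str:
        """Render the stored exchanges as a transcript that can be placed in the prompt."""
        return "\n".join(f"User: {u}\nAgent: {a}" for u, a in self.turns)


memory = WindowMemory(k=5)
for i in range(7):
    memory.save(f"question {i}", f"answer {i}")

# Only exchanges 2 through 6 remain; the two oldest were dropped.
print(memory.as_text())
```

Bounding the window keeps the prompt size predictable, at the cost of forgetting anything said more than five exchanges earlier.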
Evaluation
Queries were tested across multiple sections of the policy (e.g., governance, education, industry applications).
Performance was evaluated qualitatively based on answer relevance, fidelity to the document, and user experience.
The chatbot was successfully deployed in a local environment and tested against a range of queries derived from Pakistan’s National AI Policy 2025. The system produced accurate and context-grounded responses in real time, with explicit refusal to answer when the required information was not present in the document. Key outcomes include:
Precision: In the qualitative tests, responses could be traced back to specific sections of the source text, which reduced hallucinations.
Efficiency: Users obtained targeted answers in seconds, compared with manually scanning a 100+ page policy document.
Usability: Conversational memory allowed for natural follow-up questions, making the system suitable for interactive policy exploration.
Generalizability: The architecture is adaptable to other domains such as compliance documents, academic research papers, and corporate guidelines.
These results, although qualitative, indicate that RAG-based systems are a viable way to transform dense policy documents into accessible, interactive knowledge resources.