Figure 1: Complete architecture of the LangchainJS RAG implementation showing document processing, vector storage, LLM integration, and user interaction components.
The AI developer community is naturally drawn to Python, and with good reason. The ecosystem of ML libraries available in Python is vast, which means beginners can usually find better answers to the questions that come up while implementing a project.
This project aims to achieve the following specific objectives: build an end-to-end RAG example using LangchainJS, document the implementation in enough detail for other JavaScript developers to reproduce it, and record the exceptions we encountered along the way together with their solutions.
Frontend developers, who are generally more comfortable with JavaScript, find themselves at an impasse: either learn Python-compatible frontend libraries from scratch so the entire app can be built in Python, or awkwardly call Python code from JavaScript, which means maintaining and deploying separate codebases and dealing with type conversion and the other issues that arise when calling remote services written in different languages.
A simple project that integrated LangChain into NodeJS code to call OpenAI functions was proving to be an uphill task for us due to the lack of documentation, or even community support, for the questions that inevitably arose.
Then we discovered LangchainJS, a JS port of the LangChain library, though there didn't seem to be any sample project demonstrating end-to-end usage of the library in a JS project. This led to the decision to build one so that others can benefit from a full implementation of LangchainJS in a Retrieval Augmented Generation (RAG) project that can serve as the starting point for other, much more complex applications.
Retrieval Augmented Generation is an approach that enhances large language models (LLMs) by retrieving relevant information from external sources before generating responses. This technique addresses two key limitations of standalone LLMs: their knowledge is limited to what was available at training time, and they may fabricate answers about information they were never given.
A RAG system typically consists of the following components: a document loader, a text splitter that chunks the content, an embedding model, a vector store for similarity search, a retriever, and an LLM that generates the final answer.
The LangchainJS implementation we present here provides all these components in a JavaScript environment, making it accessible to frontend developers.
Before attempting to implement this project, you should have Node.js and npm installed, a working knowledge of TypeScript/JavaScript, and an OpenAI API key.
Our implementation follows a straightforward architecture with the components shown in Figure 1: document processing, vector storage, LLM integration, and the user-facing chat agent.
This architecture enables the system to retrieve the most relevant chunks of a provided document and use them as context when answering user questions, while the memory-enabled agent keeps track of the ongoing conversation.
This implementation guide is based on the actual code from the langchainjs-rag-example repository. The project consists of two main service files: gpt-service.ts, which handles the chat functionality and agent setup, and rag.service.ts, which handles document processing and the retrieval chain.
First, let's set up our project:
```bash
# Create a new directory for your project
mkdir langchainjs-rag-example
cd langchainjs-rag-example

# Initialize npm project
npm init -y

# Install required dependencies
npm install @langchain/openai @langchain/core @langchain/textsplitters langchain env-cmd ts-node typescript zod @nestjs/common
```
Next, create the basic project structure:
```bash
mkdir src
touch src/gpt-service.ts
touch src/rag.service.ts
touch .env
```
Create a .env file in the project's root directory with the following variables:
```
FILE_LOC_NAME=<The full path to the document file, including the filename>
OPENAI_API_KEY=<The OpenAI API key you created for your account on the OpenAI site>
CHAT_QUESTION=<The question you would like the model to answer based on the document you provided>
```
Create a rag.service.ts file that will handle document loading, processing, and retrieval. This service creates a retrieval chain that can be invoked to answer questions using the document content.
```typescript
// Import necessary dependencies
import { Injectable } from "@nestjs/common";
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";
import * as fs from "fs";
import { OpenAIEmbeddings } from "@langchain/openai";
import { ChatOpenAI } from "@langchain/openai";
import { createStuffDocumentsChain } from "langchain/chains/combine_documents";
import { createRetrievalChain } from "langchain/chains/retrieval";
import { MemoryVectorStore } from "langchain/vectorstores/memory";

@Injectable()
export class RagService {
  /**
   * Creates and returns a retrieval chain for answering questions based on document content
   * @param model - The LLM model to use for generating responses
   * @param prompt - The prompt template to use for structuring queries
   * @returns The retrieval chain result with answers based on document content
   */
  async getChain(model: ChatOpenAI, prompt) {
    try {
      // Step 1: Load document content from the specified file
      console.log("Retrieving file");
      const text = fs.readFileSync(process.env.FILE_LOC_NAME, "utf8");
      console.log("Retrieved text");

      // Step 2: Split document into smaller chunks for processing
      const textSplitter = new RecursiveCharacterTextSplitter({
        chunkSize: 100,
        chunkOverlap: 4,
      });
      const docs = await textSplitter.createDocuments([Buffer.from(text).toString()]);
      if (docs == null) throw new Error(`Docs not retrieved`);

      // Step 3: Filter out empty documents and log content
      console.log("Retrieved file");
      docs.forEach((doc) => console.log(doc.pageContent));
      const processedDocs = docs.filter((doc) => doc.pageContent);

      // Step 4: Create vector embeddings for the document chunks
      const vectorstore = await MemoryVectorStore.fromDocuments(
        processedDocs,
        new OpenAIEmbeddings()
      );

      // Step 5: Create the document chain for combining retrieved documents
      console.log("Before creating stuffDocumentsChain");
      console.log(prompt.promptMessages);
      const combineDocsChain = await createStuffDocumentsChain({
        llm: model,
        prompt,
      });

      // Step 6: Create a retriever from the vector store
      const retriever = vectorstore.asRetriever();

      // Step 7: Create the retrieval chain that connects retrieval and response generation
      console.log("Before creating retrieval chain");
      const retrievalChain = await createRetrievalChain({
        combineDocsChain,
        retriever,
      });

      // Step 8: Invoke the chain with the appropriate parameters
      return await retrievalChain.invoke({
        input: "prompt.promptMessages",
        // context: retriever,
        context: processedDocs,
        agent_scratchpad: [],
      });
    } catch (error) {
      console.log("Error in parsing documents: " + error);
      console.error(error);
      throw error; // Rethrow to allow proper error handling upstream
    }
  }
}
```
Key features of the RAG service:

- Loads the source document from the path specified in FILE_LOC_NAME
- Splits the text into chunks with RecursiveCharacterTextSplitter
- Embeds the chunks with OpenAIEmbeddings into an in-memory MemoryVectorStore
- Builds a retrieval chain with createStuffDocumentsChain and createRetrievalChain and invokes it to produce an answer
Next, create a gpt-service.ts file that will handle the chatbot functionality, integrating the RAG service through a tool-based architecture.
```typescript
// Import necessary dependencies
import { Injectable } from "@nestjs/common";
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";
import { ChatOpenAI } from "@langchain/openai";
import { convertToOpenAIFunction } from "@langchain/core/utils/function_calling";
import { RunnableSequence } from "@langchain/core/runnables";
import { AgentExecutor, AgentStep } from "langchain/agents";
import { formatToOpenAIFunctionMessages } from "langchain/agents/format_scratchpad";
import { OpenAIFunctionsAgentOutputParser } from "langchain/agents/openai/output_parser";
import { AIMessage, BaseMessage, HumanMessage } from "@langchain/core/messages";
import { RagService } from "./rag.service";
import { z } from "zod";
import { DynamicStructuredTool } from "langchain/tools";

// Initialize the OpenAI Chat model with specific parameters
const chatModel = new ChatOpenAI({
  model: "gpt-3.5-turbo",
  temperature: 0, // Set to 0 for deterministic responses
  apiKey: process.env.OPENAI_API_KEY,
});

// Define the schema for the RAG tool using Zod
const ragQuerySchema = z.object({
  query: z.string().describe("The entire query sent by user."),
});

// Create the RAG tool that will retrieve information from documents
const documentRetrievalTool = new DynamicStructuredTool({
  name: "query_user_information",
  description: "Send a query to get personal information about the user and their preferences",
  schema: ragQuerySchema,
  func: async ({ query }) => {
    console.log(query);
    console.log("Getting information..." + query);

    // Create a prompt for the RAG service
    const ragPrompt = ChatPromptTemplate.fromMessages([
      ["system", "You are very powerful assistant. Answer the user's questions by referring to all of your tools and knowledge sources. If you don't know the answer, just say that you don't know, don't try to make up an answer.{context}"],
      ["human", query],
      new MessagesPlaceholder("agent_scratchpad"),
    ]);
    console.log(ragPrompt.promptMessages);

    // Initialize and call the RAG service
    const ragService = new RagService();
    const retrievalResult = await ragService.getChain(chatModel, ragPrompt);
    console.log(retrievalResult);
    return retrievalResult.answer;
  },
});

// Create the tools array (can be extended with additional tools)
const availableTools = [documentRetrievalTool];

// Create the main conversation prompt template
const conversationPrompt = ChatPromptTemplate.fromMessages([
  ["system", "You are very powerful assistant. Answer the user's questions by referring to all of your tools and knowledge sources. If you don't know the answer, just say that you don't know, don't try to make up an answer.{context}"],
  ["human", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);

const modelWithFunctions = chatModel.bind({
  functions: availableTools.map((tool) => convertToOpenAIFunction(tool)),
});

// Create the agent sequence without memory
const basicAgentSequence = RunnableSequence.from([
  {
    input: (i: { input: string; steps: AgentStep[] }) => i.input,
    agent_scratchpad: (i: { input: string; steps: AgentStep[] }) =>
      formatToOpenAIFunctionMessages(i.steps),
  },
  conversationPrompt,
  modelWithFunctions,
  new OpenAIFunctionsAgentOutputParser(),
]);

// Create an executor for the basic agent
const basicExecutor = AgentExecutor.fromAgentAndTools({
  agent: basicAgentSequence,
  tools: availableTools,
});

// Define a key for storing chat history
const MEMORY_KEY = "chat_history";

// Create a prompt template that includes chat history
const memoryPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `Answer the user's questions by referring to all of your tools and knowledge sources. Don't make something up. If you don't know something, just say "I don't know"`,
  ],
  new MessagesPlaceholder(MEMORY_KEY),
  ["user", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);

// Initialize the chat history array
const chatHistory: BaseMessage[] = [];

// Create an agent sequence that incorporates memory
const agentWithMemory = RunnableSequence.from([
  {
    input: (i) => i.input,
    agent_scratchpad: (i) => formatToOpenAIFunctionMessages(i.steps),
    chat_history: (i) => i.chat_history,
  },
  memoryPrompt,
  modelWithFunctions,
  new OpenAIFunctionsAgentOutputParser(),
]);

// Create an executor for the agent with memory
const memoryEnabledExecutor = AgentExecutor.fromAgentAndTools({
  agent: agentWithMemory,
  tools: availableTools,
});

// Create the ChatService class for handling conversations
@Injectable()
export class ChatService {
  /**
   * Processes user input and generates a response using the LLM with RAG capabilities
   * @param content - The user's input message
   * @returns The AI-generated response based on context and user input
   */
  async chatWithGPT(content: string) {
    console.log(content);

    // Invoke the executor with the user's input and current chat history
    const result = await memoryEnabledExecutor.invoke({
      input: content,
      chat_history: chatHistory,
    });

    // Update the chat history with the new messages
    chatHistory.push(new HumanMessage(content));
    chatHistory.push(new AIMessage(result.output));

    console.log("Customized chat app: " + result.output);
    return result.output;
  }
}

// Initialize the service and process the question from environment variables
new ChatService().chatWithGPT(process.env.CHAT_QUESTION);
```
Key features of the Chat service:

- A RAG tool defined with DynamicStructuredTool and validated with a Zod schema
- A chatWithGPT method that processes user input and returns responses

The gpt-service.ts file is the core of our LangchainJS implementation, orchestrating the interactions between the language model, tools, and the RAG service. Let's break down its key components:
```typescript
const model = new ChatOpenAI({
  model: "gpt-3.5-turbo",
  temperature: 0,
  apiKey: process.env.OPENAI_API_KEY,
});
```
This code initializes the OpenAI Chat model with specific parameters:

- model: Specifies the model to use (gpt-3.5-turbo)
- temperature: Set to 0 to make responses more deterministic and focused on accuracy
- apiKey: Retrieves the API key from environment variables for authentication

```typescript
const ragSchema = z.object({
  query: z.string().describe("The entire query sent by user."),
});

const ragTool = new DynamicStructuredTool({
  name: "query_user_information",
  description: "Send a query to get personal information about the user and their preferences",
  schema: ragSchema,
  func: async ({ query }) => {
    // Tool implementation
  },
});
```
This section defines the RAG tool with Zod schema validation:

- ragSchema: Creates a schema that validates the input for the tool, requiring a string query
- DynamicStructuredTool: Creates a structured tool with a name, description, schema, and implementation function
- The description on the query field provides context to the LLM about what information to pass

The function inside the RAG tool performs several important steps: it logs the incoming query, builds a ChatPromptTemplate that embeds the query as the human message, instantiates RagService, calls getChain with the model and prompt, and returns the answer field from the retrieval result.
This implementation demonstrates how the tool serves as a bridge between the main conversation flow and the specialized RAG functionality.
The code defines two primary prompt templates:
```typescript
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are very powerful assistant. Answer the user's questions by referring to all of your tools and knowledge sources. If you don't know the answer, just say that you don't know, don't try to make up an answer.{context}"],
  ["human", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);
```
```typescript
const memoryPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `Answer the user's questions by referring to all of your tools and knowledge sources. Don't make something up. If you don't know something, just say "I don't know"`,
  ],
  new MessagesPlaceholder(MEMORY_KEY),
  ["user", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);
```
These templates define:

- The system message that sets the assistant's behavior
- A placeholder for the user's message ({input})
- A {context} placeholder for additional context information
- A MessagesPlaceholder for the agent_scratchpad, where intermediate tool-calling steps are inserted

The primary difference between these templates is that the memory prompt includes a placeholder for chat history, enabling multi-turn conversations.
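To make the template mechanics concrete, the following minimal sketch (using an illustrative prompt, not the exact one from gpt-service.ts) shows how formatMessages substitutes the {context} and {input} placeholders and expands the empty agent_scratchpad:

```typescript
// Minimal sketch of rendering a ChatPromptTemplate with sample values.
// The prompt text here is illustrative; the mechanism matches the templates above.
import { ChatPromptTemplate, MessagesPlaceholder } from "@langchain/core/prompts";

const demoPrompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a helpful assistant.{context}"],
  ["human", "{input}"],
  new MessagesPlaceholder("agent_scratchpad"),
]);

async function preview() {
  // formatMessages fills {context} and {input} and expands agent_scratchpad
  // (empty here, because no tool-calling steps have happened yet).
  const messages = await demoPrompt.formatMessages({
    context: " Use the retrieved document as context.",
    input: "What are my preferences?",
    agent_scratchpad: [],
  });
  console.log(messages);
}

preview();
```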
```typescript
const modelWithFunctions = model.bind({
  functions: tools.map((tool) => convertToOpenAIFunction(tool)),
});

const runnableAgent = RunnableSequence.from([
  {
    input: (i: { input: string; steps: AgentStep[] }) => i.input,
    agent_scratchpad: (i: { input: string; steps: AgentStep[] }) =>
      formatToOpenAIFunctionMessages(i.steps),
  },
  prompt,
  modelWithFunctions,
  new OpenAIFunctionsAgentOutputParser(),
]);
```
This section binds the tools to the model as OpenAI function definitions and composes the agent as a RunnableSequence: the first step maps the raw input and formats intermediate steps into the agent_scratchpad, followed by the prompt, the function-bound model, and the OpenAI functions output parser.
```typescript
const chatHistory: BaseMessage[] = [];

const agentWithMemory = RunnableSequence.from([
  {
    input: (i) => i.input,
    agent_scratchpad: (i) => formatToOpenAIFunctionMessages(i.steps),
    chat_history: (i) => i.chat_history,
  },
  memoryPrompt,
  modelWithFunctions,
  new OpenAIFunctionsAgentOutputParser(),
]);
```
This code initializes an empty chat history array and builds a second agent sequence that, in addition to the input and agent_scratchpad, maps chat_history into the memory-aware prompt, enabling multi-turn conversations.
```typescript
@Injectable()
export class ChatService {
  async chatWithGPT(content: string) {
    console.log(content);

    const result1 = await executorWithMemory.invoke({
      input: content,
      chat_history: chatHistory,
    });

    chatHistory.push(new HumanMessage(content));
    chatHistory.push(new AIMessage(result1.output));

    console.log("Customized chat app: " + result1.output);
    return result1.output;
  }
}
```
The ChatService class:

- Uses @Injectable() to enable NestJS dependency injection
- Provides a chatWithGPT method that processes user queries

```typescript
new ChatService().chatWithGPT(process.env.CHAT_QUESTION);
```
This final line demonstrates how to use the service: it instantiates ChatService directly and calls chatWithGPT with the question supplied in the CHAT_QUESTION environment variable.
Key architectural points to understand about this implementation: the RAG logic is exposed to the agent as a tool rather than called directly, the agent decides when to invoke that tool via OpenAI function calling, the vector store lives in memory for the lifetime of the process, and conversation state is kept in a simple chat history array passed to the memory-enabled executor.
To run the application, use the following command:
```bash
npx env-cmd ts-node .\src\gpt-service.ts
```
This will:

- Load the environment variables from your .env file
- Run the gpt-service.ts script, which automatically calls chatWithGPT with the question from your environment variable

The implementation follows a modular structure:
- RAG Service: Handles document processing and retrieval
- Chat Service: Manages the conversation flow
- Tool-Based Architecture: Follows OpenAI function calling pattern
This modular design allows for easy extension and customization, such as adding additional tools or modifying the RAG implementation.
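For example, adding a second tool only requires defining another DynamicStructuredTool and including it in the tools array. The sketch below adds a hypothetical current-date tool; the tool name and logic are illustrative and not part of the repository:

```typescript
// Sketch: extending the tools array with a second, hypothetical tool.
import { z } from "zod";
import { DynamicStructuredTool } from "langchain/tools";

const dateTool = new DynamicStructuredTool({
  name: "get_current_date",
  description: "Returns today's date so the assistant can answer time-related questions",
  schema: z.object({}), // this tool needs no input
  func: async () => new Date().toDateString(),
});

// documentRetrievalTool is the RAG tool already defined in gpt-service.ts.
// Because convertToOpenAIFunction() is mapped over this array when binding the model,
// the agent automatically learns about the new tool once it is added here.
const availableTools = [documentRetrievalTool, dateTool];
```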
Several exceptions cropped up during the implementation and execution of the project; we systematically researched and resolved each one. The exceptions and their solutions are presented below:
Exception message
"TypeError: text.replace is not a function"
Troubleshooting checklist
Check to see if the messages from your chat are being sent to the custom RAG tool you created and registered to your model.
Problem Explanation and Root Cause
OpenAI decides which tool to use based on the tool description, field description, and the prompt.
Initially, we had used the DynamicTool from LangChain to create our RAG tool. The tool was being called, but it was unable to fetch the chat question from the argument.
Solution
To provide the field description, we used Zod to create the tool schema and added the description to the query field there.
We then used DynamicStructuredTool to create the RAG tool, and the query started being sent to the tool properly.
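The difference is easiest to see side by side. A rough sketch follows (same imports as gpt-service.ts); the DynamicTool version receives a single untyped string, while the DynamicStructuredTool version receives a validated object whose query field carries a description for the model:

```typescript
import { DynamicTool, DynamicStructuredTool } from "langchain/tools";
import { z } from "zod";

// Before (problematic): DynamicTool exposes a single untyped string input,
// and in our runs the chat question did not reach the function reliably.
const ragToolV1 = new DynamicTool({
  name: "query_user_information",
  description: "Send a query to get personal information about the user",
  func: async (input: string) => {
    return "..."; // input arrived empty or unusable
  },
});

// After (working): DynamicStructuredTool with a Zod schema whose field
// description tells the model exactly what to pass in.
const ragToolV2 = new DynamicStructuredTool({
  name: "query_user_information",
  description: "Send a query to get personal information about the user",
  schema: z.object({
    query: z.string().describe("The entire query sent by user."),
  }),
  func: async ({ query }) => {
    return "..."; // query now contains the user's full question
  },
});
```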
Exception message
"BadRequestError: 400 Missing parameter 'name': messages with role 'function' must have a 'name'."
Troubleshooting checklist
Check if you are sending the entire response obtained from your RAG to the chat agent. You should only send the "response.answer" field to the chat agent.
Problem Explanation and Root Cause
When calling the invoke() method on the retrieval chain, we had sent the entire prompt containing all the templates to the "input" field, which then threw the above exception.
Solution
After studying the structure of the generated prompt, we understood that the "input" field of the chain's invoke() method should receive "prompt.promptMessages" rather than the full prompt object, so that only the messages are passed in and not the entire JSON structure.
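In code terms, the change in rag.service.ts looks roughly like this; a before/after sketch rather than a verbatim diff from the repository:

```typescript
// Before: passing the whole prompt object caused the 400 "Missing parameter 'name'" error.
const result = await retrievalChain.invoke({
  input: prompt, // entire ChatPromptTemplate, including template metadata
  context: processedDocs,
  agent_scratchpad: [],
});

// After: pass only the prompt messages so no extra JSON structure reaches the model.
const fixedResult = await retrievalChain.invoke({
  input: prompt.promptMessages,
  context: processedDocs,
  agent_scratchpad: [],
});
```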
While our LangchainJS implementation offers significant benefits for JavaScript developers, it's important to be aware of certain limitations and trade-offs:
Library Maturity: LangchainJS is less mature than its Python counterpart, with fewer community examples and resources available. This may result in more challenging debugging processes compared to the Python implementation.
Performance Trade-offs: JavaScript's single-threaded nature can impact performance for computationally intensive operations like embedding generation and vector similarity search, especially with large document collections.
Documentation Gaps: The official documentation for LangchainJS is less comprehensive than for Python LangChain, requiring developers to occasionally infer functionality from Python examples. As we encountered while implementing this project, this can lead to time-consuming troubleshooting.
Ecosystem Integration: Fewer pre-built integrations exist for JavaScript compared to Python's robust ML ecosystem, potentially requiring custom implementation for specialized needs.
TypeScript Type Definitions: While TypeScript support is provided, some type definitions may be incomplete or require manual augmentation in complex use cases.
NestJS Integration Considerations: When using LangchainJS with NestJS as in our implementation, developers need to consider how to properly integrate LangChain's functional programming style with NestJS's dependency injection patterns.
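One straightforward way to reconcile the two styles is to register both services as NestJS providers and let the framework inject RagService into ChatService instead of constructing it inside the tool function. A minimal sketch, with a module name of our own choosing (not part of the repository):

```typescript
// Hypothetical app.module.ts: registering both services as NestJS providers.
import { Module } from "@nestjs/common";
import { ChatService } from "./gpt-service";
import { RagService } from "./rag.service";

@Module({
  providers: [ChatService, RagService],
  exports: [ChatService],
})
export class AppModule {}

// ChatService could then receive RagService via constructor injection, e.g.
//   constructor(private readonly ragService: RagService) {}
// rather than calling `new RagService()` inside the tool's func.
```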
Memory Management: Processing large documents can lead to memory issues in Node.js environments with default memory limits. Users may need to increase memory allocation for production deployments.
Rate Limiting: OpenAI API rate limits can impact throughput for applications with high query volumes. Implement proper error handling and retry logic for production applications.
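A simple mitigation is to wrap executor calls in retry logic with exponential backoff. The helper below is plain application code, not a LangchainJS API, and the delays are illustrative:

```typescript
// Generic retry helper with exponential backoff for rate-limited API calls.
async function withRetry<T>(fn: () => Promise<T>, maxAttempts = 3): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return await fn();
    } catch (error) {
      lastError = error;
      const delayMs = 1000 * 2 ** attempt; // 1s, 2s, 4s, ...
      console.warn(`Attempt ${attempt + 1} failed, retrying in ${delayMs} ms`);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
  throw lastError;
}

// Usage inside ChatService.chatWithGPT:
// const result = await withRetry(() =>
//   memoryEnabledExecutor.invoke({ input: content, chat_history: chatHistory })
// );
```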
Environmental Differences: Behavior may differ between development and production environments, particularly around file path handling and environment variable management.
Vector Store Persistence: Our example uses an in-memory vector store that doesn't persist between application restarts. Production applications should implement a persistent storage solution.
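As an illustration, the MemoryVectorStore in rag.service.ts could be swapped for a disk-backed store such as HNSWLib from the @langchain/community package (an additional dependency); the exact import path and API may differ between LangchainJS versions:

```typescript
// Sketch: persisting embeddings to disk with HNSWLib instead of MemoryVectorStore.
// Requires @langchain/community and its hnswlib-node peer dependency.
import { HNSWLib } from "@langchain/community/vectorstores/hnswlib";
import { OpenAIEmbeddings } from "@langchain/openai";
import type { Document } from "@langchain/core/documents";

const VECTOR_STORE_DIR = "./vectorstore"; // illustrative path

async function buildOrLoadStore(processedDocs: Document[]) {
  const embeddings = new OpenAIEmbeddings();
  try {
    // Reuse the index from a previous run if it exists.
    return await HNSWLib.load(VECTOR_STORE_DIR, embeddings);
  } catch {
    const store = await HNSWLib.fromDocuments(processedDocs, embeddings);
    await store.save(VECTOR_STORE_DIR); // persists the index between restarts
    return store;
  }
}
```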
Simplicity vs. Flexibility: We prioritized simplicity and clarity over advanced features in this implementation, making it accessible to beginners but potentially requiring extensions for complex use cases.
Error Handling: The implementation provides basic error handling, but production applications should implement more robust error management strategies.
Security Considerations: Our implementation focuses on functionality rather than security. Production deployments should implement proper security measures, especially around API key management.
The final result of the process described above is a fully implemented LangchainJS RAG project that can be used as a baseline for further customization to individual or organizational use cases.
The Git repository for the project is as follows: langchainjs-rag-example

To run the project:

- Run npm install from the root of the project
- Run npx env-cmd ts-node .\src\gpt-service.ts
When implementing RAG systems with LangchainJS, consider the following performance aspects:
Document Chunking Strategy: The chunk size and overlap parameters significantly impact retrieval quality. Smaller chunks increase precision but may lose context, while larger chunks preserve context but may introduce noise.
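The chunking parameters are set directly on the text splitter in rag.service.ts, so experimenting only means changing two numbers; the values below are illustrative starting points, not recommendations:

```typescript
// Sketch: adjusting the chunking strategy used inside getChain() in rag.service.ts.
// Larger chunks preserve more context per embedding; larger overlap reduces the
// chance of splitting a relevant passage across two chunks.
import { RecursiveCharacterTextSplitter } from "@langchain/textsplitters";

const textSplitter = new RecursiveCharacterTextSplitter({
  chunkSize: 500,   // characters per chunk (the example uses 100)
  chunkOverlap: 50, // characters shared between consecutive chunks (the example uses 4)
});

// Used exactly as in rag.service.ts (inside the async getChain() method,
// where `text` holds the loaded document):
const docs = await textSplitter.createDocuments([text]);
```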
Embedding Model Selection: The choice of embedding model affects both performance and retrieval quality. OpenAI's embeddings offer high quality but at higher cost, while local models may be faster but potentially less accurate.
Vector Store Selection: For production applications with large document collections, consider alternatives to the in-memory store used in this example, such as Pinecone, Chroma, or Faiss.
Retrieval Strategy: The default similarity search can be enhanced with techniques like Maximum Marginal Relevance (MMR) to increase result diversity.
Having completed this basic implementation, you can explore the following advanced topics:
Vector Store Persistence: Implement a persistent vector store using solutions like Pinecone, Chroma, or a local database.
Advanced RAG Techniques: Explore techniques like hypothetical document embeddings, reranking, or multi-query retrieval to improve retrieval quality.
Streaming Responses: Implement streaming responses from the LLM to improve user experience for longer responses.
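As a brief illustration of the streaming idea, recent LangchainJS chat models expose a stream() method that yields message chunks as they are generated; the exact chunk shape may vary by version:

```typescript
// Sketch: streaming tokens from the chat model as they are generated.
// Assumes OPENAI_API_KEY is set in the environment, as elsewhere in this project.
import { ChatOpenAI } from "@langchain/openai";

const streamingModel = new ChatOpenAI({ model: "gpt-3.5-turbo", temperature: 0 });

async function streamAnswer(question: string) {
  const stream = await streamingModel.stream(question);
  for await (const chunk of stream) {
    // Each chunk is an AIMessageChunk; print its text content as it arrives.
    if (typeof chunk.content === "string") {
      process.stdout.write(chunk.content);
    }
  }
}

streamAnswer("Summarize the document in one sentence.");
```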
Fine-tuning: Explore fine-tuning the underlying LLM to improve performance on domain-specific tasks.
NestJS Web Application: Extend the implementation into a full NestJS web application with proper controllers and service structure, leveraging the framework's capabilities for a production-ready deployment.
Chat Memory Enhancement: Improve the chat history implementation with different memory types such as buffer window memory or summary memory for more efficient context management.
The development of a comprehensive LangchainJS RAG implementation has significance that extends far beyond solving technical implementation challenges for JavaScript developers. This work has important implications for the broader field of AI application development, knowledge democratization, and the evolution of software architecture patterns.
One of the most significant implications of this work is how it contributes to the democratization of AI development:
Expanding the Developer Community: By providing JavaScript implementations of RAG systems, we're extending advanced AI capabilities to the massive JavaScript developer community—estimated at over 13.8 million developers worldwide. This represents a substantial expansion of who can build AI-enhanced applications.
Reducing Barriers to Entry: The significant knowledge gap between machine learning experts and frontend developers has been a major barrier to AI adoption. This implementation reduces that gap, enabling web developers to participate in AI development without learning an entirely new technology stack.
Accelerating AI Integration in Web Applications: Web applications represent the primary interface through which most users experience technology. Enabling direct JavaScript implementation of RAG systems accelerates the integration of AI capabilities into these applications, potentially leading to faster adoption of AI-enhanced user experiences.
This implementation also contributes to the evolution of software architecture patterns for AI systems:
Unified Codebases: The traditional separation between AI components (typically in Python) and frontend components (typically in JavaScript) creates significant complexity in development, deployment, and maintenance. Our approach demonstrates the viability of unified codebases that can simplify the entire application lifecycle.
Tool-Based Agent Architecture: The architectural pattern we've implemented—using structured tools with schema validation in a function calling framework—represents a significant evolution in how AI agents are constructed. This pattern improves maintainability, testability, and extensibility compared to monolithic implementations.
Framework Integration Patterns: Our integration with NestJS demonstrates how modern AI capabilities can be incorporated into established enterprise application frameworks, providing a blueprint for similar integrations with other popular frameworks like Express, Next.js, or Angular.
The practical implications of this work for industry applications are substantial:
Reduced Development and Operational Costs: Organizations can leverage existing JavaScript expertise rather than hiring specialized Python developers or managing separate codebases, potentially reducing both development and operational costs.
Enhanced Information Retrieval Solutions: The RAG implementation provides a foundation for more sophisticated enterprise information retrieval solutions that can be directly integrated into existing JavaScript-based enterprise systems.
Improved Maintenance and Iteration: The unified codebase approach enables faster iteration and more consistent maintenance practices, potentially leading to more reliable and adaptable AI-enhanced applications.
This work opens several promising avenues for future research and development:
JavaScript-Native Vector Database Integrations: Our implementation highlights the need for more JavaScript-native vector database solutions with optimized client libraries for browser and Node.js environments.
Web-Specific RAG Optimizations: The web environment presents unique constraints and opportunities for RAG systems, such as leveraging browser capabilities for local processing or optimizing for variable network conditions. These represent rich areas for future optimization.
LLM Function Calling Standards: Our implementation leverages OpenAI's function calling capabilities, but there's an opportunity to develop more standardized approaches that would work across different LLM providers.
Client-Side RAG Systems: While our implementation runs in Node.js, the approach could potentially be extended to create client-side RAG systems that run entirely in the browser, opening new possibilities for privacy-preserving AI applications.
Perhaps most importantly, this work directly addresses what we might call the "JavaScript AI gap"—the disparity between the rich ecosystem of AI tools available to Python developers and the comparatively limited options for JavaScript developers:
Documentation and Examples: By providing detailed explanations and examples, we're contributing to the growth of JavaScript-specific AI documentation, which has historically lagged behind Python equivalents.
Error Resolution Patterns: The systematic identification and resolution of common exceptions establishes patterns that others can follow when encountering similar issues, potentially saving thousands of developer hours.
Community Knowledge Base: This implementation serves as a foundation for a growing knowledge base around JavaScript AI development, which can foster a more robust community and ecosystem.
In summary, the significance of this work extends far beyond its technical implementation details. It represents an important step toward more inclusive AI development practices, unified architectural approaches, and the integration of advanced AI capabilities into mainstream web development. The long-term implications may include faster adoption of AI technologies, more diverse AI applications, and innovative new approaches to human-AI interaction through web interfaces.
This project set out to address the challenges faced by JavaScript developers seeking to implement AI capabilities using LangchainJS. After completing the implementation and analyzing the results, we can summarize our key findings and contributions:
Our primary contribution is a complete, working implementation of a Retrieval Augmented Generation system using LangchainJS. This implementation demonstrates that a full RAG pipeline, from document loading through retrieval-grounded answering, can be built entirely in JavaScript/TypeScript without any Python components.
Through our development process, we identified and documented several key implementation patterns that are essential for JavaScript developers: defining tools with DynamicStructuredTool and Zod schemas, binding those tools to the model as OpenAI functions, composing agents with RunnableSequence, and wiring the retrieval chain into the agent as a tool.
Perhaps most valuable to other developers is our systematic documentation of common exceptions encountered during implementation, such as the "TypeError: text.replace is not a function" and the 400 "Missing parameter 'name'" errors described above.
These solutions directly address the lack of documentation and community support that initially motivated this project.
Our work has identified important trade-offs and limitations in the LangchainJS implementation, most notably the library's relative immaturity, gaps in documentation, and the need for persistent vector storage and more robust error handling in production deployments.
The modular architecture we've developed provides a foundation for future extensions such as persistent vector stores, advanced retrieval techniques, streaming responses, richer chat memory, and a full NestJS web application.
Revisiting our original objectives from the introduction, we set out to provide a documented, end-to-end LangchainJS RAG example that JavaScript developers can use as a starting point, and the delivered implementation, together with the troubleshooting guide, fulfills that aim.
In summary, this project has not only achieved its original objectives but has also provided valuable insights into the practical aspects of implementing RAG systems with LangchainJS. The solutions to common exceptions and the architectural patterns we've documented address a significant gap in the current ecosystem, making AI capabilities more accessible to JavaScript developers.
For those looking to deepen their understanding of LangchainJS and RAG systems, the official LangchainJS documentation and the example repository linked above are good starting points.
This publication has presented a comprehensive guide to implementing a RAG system using LangchainJS, addressing the specific needs of JavaScript developers working with AI technologies. By providing a complete implementation, common error solutions, and considerations for extensions, we aim to bridge the gap between Python-centric AI development and JavaScript frontend expertise.
The project demonstrates that effective AI-enhanced applications can be built entirely within the JavaScript ecosystem, eliminating the need for cross-language integration and enabling frontend developers to leverage their existing skills. We hope this serves as a valuable resource for the JavaScript AI development community and encourages further exploration of LangchainJS capabilities.