This project develops a Retrieval Augmented Generation (RAG)-based Large Language Model (LLM) assistant tailored to e-commerce data, specifically a health drink use case. Using Snowflake Cortex and Streamlit, the project integrates document retrieval, contextual response generation, and a user-friendly chat interface. The system reduces hallucinations by grounding responses in relevant documents, such as product manuals and sales metrics, while providing traceability through retrieved document chunks. The article outlines the methodology, implementation, and results of the project, demonstrating the effectiveness of the RAG framework in enhancing LLM performance for domain-specific applications.
The rapid adoption of AI-powered assistants in e-commerce has highlighted the need for domain-specific solutions that can provide accurate, context-aware responses. Traditional LLMs often struggle with hallucinations, generating incorrect or irrelevant information when queried about niche topics. To address this, Retrieval Augmented Generation (RAG) frameworks have emerged as a powerful solution, combining the strengths of document retrieval and generative AI.
This project focuses on building a RAG-based LLM assistant for a health drink e-commerce platform. By integrating Snowflake Cortex for document retrieval and Streamlit for the user interface, the system provides intelligent, contextually grounded responses. The assistant is designed to handle queries related to product manuals, sales metrics, and operational data, ensuring accuracy and relevance.
Beyond e-commerce, this solution has significant potential for internal organizational use cases. For example, it can be adapted to create a company-specific chatbot for employees, enabling them to access and query internal documents such as HR policies, operational guidelines, or training materials. Additionally, the system could be deployed in operations to assist employees in retrieving technical manuals, troubleshooting guides, or compliance documents, ensuring that they have the right information at their fingertips. This would streamline workflows, reduce dependency on manual searches, and enhance overall operational efficiency.
Existing solutions typically rely on generic pre-trained models, which fail to leverage structured or unstructured domain-specific data effectively. This project addresses that gap by implementing a RAG framework that integrates document retrieval, contextual grounding, and traceability into a single system. The key gaps identified are the ineffective use of domain-specific data, the lack of contextual grounding, and the absence of traceability for generated answers.
The dataset used in this project consists of product instruction manuals and sales metrics documents for the health drink use case. It is stored in Snowflake, with the documents pre-processed and categorized into two main types: instructions and metrics.
Note: The data in the instruction manuals and metrics documents are AI-generated (ChatGPT) and used as examples only.
The RAG-based LLM assistant relies on the following assumptions:
The project follows the workflow below to develop the RAG-based LLM assistant.
User manuals and metric documents are organized and preprocessed.
Snowflake Cortex’s serverless capabilities are used to label documents with metadata for filtered searches.
Cortex Search is employed for automatic embeddings and efficient document retrieval.
Documents are categorized into "instructions" and "metrics" for targeted retrieval.
A Streamlit-based chat interface is built, incorporating the retrieval and generation logic (a minimal sketch follows this list).
The UI displays retrieved document chunks alongside LLM-generated responses.
Conversation history summarization is implemented to maintain context.
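At a high level, the retrieval and generation logic wires Cortex Search and Cortex COMPLETE into the Streamlit app. The following is a minimal sketch, assuming a Snowpark session is available (as in Streamlit in Snowflake) and using the service and model named later in this article; the database and schema names, prompt wording, and top-k value are illustrative placeholders, not the project's exact code.

import streamlit as st
from snowflake.core import Root
from snowflake.snowpark.context import get_active_session

session = get_active_session()
root = Root(session)

# Locate the Cortex Search service defined later in this article.
# The database and schema names are placeholders for your environment.
svc = (
    root.databases["YOUR_DB"]
    .schemas["YOUR_SCHEMA"]
    .cortex_search_services["CC_SEARCH_SERVICE_CS"]
)

def retrieve_chunks(query, category=None, k=3):
    # Retrieve the top-k chunks, optionally filtered by document category.
    kwargs = {
        "query": query,
        "columns": ["chunk", "relative_path", "category"],
        "limit": k,
    }
    if category:
        kwargs["filter"] = {"@eq": {"category": category}}
    return svc.search(**kwargs).results

def answer(question, context_chunks):
    # Ground the response in the retrieved chunks to reduce hallucinations.
    context = "\n\n".join(c["chunk"] for c in context_chunks)
    prompt = (
        "Answer the question using only the context provided.\n"
        "<context>" + context + "</context>\n"
        "<question>" + question + "</question>"
    )
    return session.sql(
        "SELECT snowflake.cortex.COMPLETE('llama3-70b', ?)", params=[prompt]
    ).collect()[0][0]

question = st.chat_input("Ask about products, sales, or operations")
if question:
    chunks = retrieve_chunks(question)
    with st.chat_message("assistant"):
        st.write(answer(question, chunks))
        with st.expander("Retrieved document chunks"):  # traceability
            for c in chunks:
                st.markdown(f"**{c['relative_path']}**: {c['chunk'][:300]}")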
The RAG framework reduces hallucinations by grounding responses in retrieved documents.
Improved document traceability, with the retrieved chunks used to generate each answer displayed to the user.
Enhanced context, maintained through conversation history and sliding-window summarization (see the sketch after this list).
Product expertise: the chatbot answers queries related to health drink products, sales, and operations.
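The sliding-window summarization mentioned above can be sketched as follows: the last few turns are kept verbatim while older turns are compressed into a running summary with Cortex COMPLETE. The window size and summarization prompt here are illustrative assumptions, not the project's exact values.

WINDOW = 4  # number of recent turns kept verbatim (illustrative)

def build_context(session, history):
    # history is a list of {"role": ..., "content": ...} turns.
    older, recent = history[:-WINDOW], history[-WINDOW:]
    summary = ""
    if older:
        transcript = "\n".join(f"{t['role']}: {t['content']}" for t in older)
        summary = session.sql(
            "SELECT snowflake.cortex.COMPLETE('llama3-70b', ?)",
            params=["Summarize this chat history in a few sentences:\n" + transcript],
        ).collect()[0][0]
    recent_text = "\n".join(f"{t['role']}: {t['content']}" for t in recent)
    # Prepend the summary and recent turns to the next prompt.
    return summary, recent_text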
The following are the steps performed to build the Chatbot.
The following code demonstrates how documents are pre-processed and split into chunks using LangChain’s text splitter:
from langchain.text_splitter import RecursiveCharacterTextSplitter

# Configure the text splitter for an optimal chunk size
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1512,    # Adjust chunk size as needed
    chunk_overlap=256,  # Overlap to maintain context across chunks
    length_function=len
)

# Split the document text into chunks
chunks = text_splitter.split_text(document_text)
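Before the search service can index anything, the chunks need to land in a Snowflake table. Below is a minimal Snowpark sketch, assuming an active session and a docs_chunks_table with the columns referenced by the service definition that follows; relative_path and file_url are placeholders taken from the staged document being processed:

from snowflake.snowpark.types import StructType, StructField, StringType

schema = StructType([
    StructField("CHUNK", StringType()),
    StructField("RELATIVE_PATH", StringType()),
    StructField("FILE_URL", StringType()),
    StructField("CATEGORY", StringType()),  # filled in later by the classification step
])
rows = [(c, relative_path, file_url, None) for c in chunks]
session.create_dataframe(rows, schema=schema).write.save_as_table(
    "docs_chunks_table", mode="append"
)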
This SQL code sets up the Cortex Search service on the processed document chunks:
CREATE OR REPLACE CORTEX SEARCH SERVICE CC_SEARCH_SERVICE_CS
  ON chunk
  ATTRIBUTES category
  WAREHOUSE = COMPUTE_WH
  TARGET_LAG = '1 minute'
AS (
    SELECT chunk,
           relative_path,
           file_url,
           category
    FROM docs_chunks_table
);
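Once created, the service can be spot-checked with the SNOWFLAKE.CORTEX.SEARCH_PREVIEW SQL function. In the sketch below, the query text and filter value are illustrative, and depending on the session context the service name may need to be fully qualified (database.schema.service):

import json

raw = session.sql(
    """
    SELECT SNOWFLAKE.CORTEX.SEARCH_PREVIEW(
        'CC_SEARCH_SERVICE_CS',
        '{"query": "how should the health drink be stored?",
          "columns": ["chunk", "relative_path", "category"],
          "filter": {"@eq": {"category": "instructions"}},
          "limit": 3}'
    ) AS results
    """
).collect()[0]["RESULTS"]

for hit in json.loads(raw)["results"]:
    print(hit["relative_path"], "->", hit["chunk"][:80])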
The following SQL code uses Snowflake Cortex’s LLM capabilities to classify each document into a category; the resulting labels can then be written back to docs_chunks_table so that Cortex Search can filter on them (a sketch of that step follows the code).
CREATE OR REPLACE TEMPORARY TABLE docs_categories AS
WITH unique_documents AS (
    SELECT DISTINCT relative_path
    FROM docs_chunks_table
),
docs_category_cte AS (
    SELECT
        relative_path,
        TRIM(snowflake.cortex.COMPLETE(
            'llama3-70b',
            'Given the name of the file between <file> and </file>, determine whether it is an instructions document or a metrics document. Use only one word. <file>' || relative_path || '</file>'
        ), '\n') AS category
    FROM unique_documents
)
SELECT * FROM docs_category_cte;
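For the category attribute of the search service to be useful, these per-document labels then need to be copied back onto the chunk rows; a minimal sketch, assuming the tables shown above:

# Copy the per-document labels back onto the chunk rows so that
# Cortex Search can filter on the category attribute.
session.sql(
    """
    UPDATE docs_chunks_table
    SET category = c.category
    FROM docs_categories c
    WHERE docs_chunks_table.relative_path = c.relative_path
    """
).collect()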
The experiments had the following results:
The RAG framework significantly reduced hallucinations, ensuring responses were grounded in relevant documents. Additionally, the bot allowed users to verify the source of information through the displayed document chunks.
Cortex Search enabled fast and accurate retrieval of document chunks, even with large datasets. Hybrid search capabilities allowed for filtering based on document categories, improving relevance.
The Streamlit chat interface provided a seamless user experience, with features like conversation history and document traceability.
Users could toggle between responses with and without document context, highlighting the benefits of RAG.
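That toggle can be as simple as a Streamlit checkbox gating the retrieval step. The sketch below reuses the hypothetical retrieve_chunks and answer helpers from the earlier chat-interface sketch:

use_context = st.checkbox("Use document context (RAG)", value=True)

if question:
    if use_context:
        chunks = retrieve_chunks(question)
        response = answer(question, chunks)  # grounded in retrieved documents
    else:
        # Plain LLM completion with no retrieval, shown for comparison.
        response = session.sql(
            "SELECT snowflake.cortex.COMPLETE('llama3-70b', ?)",
            params=[question],
        ).collect()[0][0]
    st.write(response)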
Llama3-70B delivered high-quality responses and more accurate results than the other models tested.
To ensure the system’s long-term effectiveness, the following monitoring and maintenance practices are recommended:
The RAG-based approach outperformed traditional LLMs and other retrieval-based systems in terms of accuracy, relevance, and user satisfaction.
While the RAG-based LLM assistant offers significant advantages, it also has some limitations. Its performance heavily depends on the quality and relevance of the ingested documents; poorly structured or outdated documents can lead to suboptimal responses. Additionally, as the document repository grows, the system may encounter scalability challenges related to storage and retrieval latency.
Although Llama3-70B is cost-effective, frequent usage of LLMs can still incur significant costs, particularly for large-scale deployments. The accuracy of metadata labels relies on the LLM used for classification, which may occasionally produce incorrect labels. Furthermore, the system is currently optimized for English-language documents. Extending support to other languages would require additional fine-tuning and testing.
This project demonstrates the effectiveness of a RAG-based LLM assistant for e-commerce applications. By using Snowflake Cortex for document retrieval and Streamlit for the user interface, the system provides accurate, context-aware responses while reducing hallucinations. The integration of conversation history summarization and document traceability further enhances the user experience. Future work could explore fine-tuning LLMs for specific use cases, expanding the document repository, and incorporating additional data sources. The project serves as a blueprint for building domain-specific AI assistants that combine the strengths of retrieval and generative AI, with applications ranging from e-commerce to internal organizational support.