This paper describes the implementation of a chat assistant for Ethify, an e-commerce platform. The system uses retrieval-augmented generation (RAG) to ground responses in platform documentation, integrated with multi-turn conversation memory stored in MongoDB. The assistant handles user queries on store creation, product management, and customization. We detail the architecture, including embedding-based retrieval and history summarization. Evaluation shows 96% in-scope accuracy, with memory enabling context retention across 10+ turns.
E-commerce platforms require efficient support for users managing online stores. Ethify provides tools for store setup, product addition, and theme customization. Manual support is limited, so an automated chat assistant can address common queries using platform-specific knowledge.
This work builds a chat assistant that retrieves relevant documentation snippets via vector search and generates responses with a large language model (LLM). It supports multi-turn conversations by storing recent interactions verbatim and summarizing older ones. The design prioritizes faithfulness to the Ethify documentation and refusal of out-of-scope requests.
The system runs on Next.js, with MongoDB for storage and Gemini for embeddings and generation. This setup allows deployment on a single server.
System Architecture
The assistant processes queries in three stages: retrieval, generation, and memory management.
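To make the flow concrete, the following TypeScript sketch traces one turn through the three stages. The helper functions (embedQuery, vectorSearch, loadMemory, buildPrompt, generateAnswer, updateMemory) and the top-k value of 5 are hypothetical stand-ins for the components described in this section, not the exact implementation.

```typescript
// A minimal sketch of one turn through the three-stage pipeline.
// All declared helpers are assumed wrappers, not the production code.
interface Chunk { text: string; score: number }
interface Memory { recent: string[]; summary: string }

declare function embedQuery(query: string): Promise<number[]>;                 // assumed Gemini embedding wrapper
declare function vectorSearch(vector: number[], k: number): Promise<Chunk[]>; // assumed MongoDB vector-search wrapper
declare function loadMemory(threadId: string): Promise<Memory>;
declare function buildPrompt(chunks: Chunk[], memory: Memory, query: string): string;
declare function generateAnswer(prompt: string): Promise<string>;              // assumed Gemini generation wrapper
declare function updateMemory(threadId: string, query: string, answer: string): Promise<void>;

async function handleTurn(threadId: string, query: string): Promise<string> {
  // Stage 1: retrieval -- embed the query and fetch the top-k chunks.
  const vector = await embedQuery(query);
  const chunks = await vectorSearch(vector, 5);

  // Stage 2: generation -- combine role, retrieved context, and history.
  const memory = await loadMemory(threadId);
  const answer = await generateAnswer(buildPrompt(chunks, memory, query));

  // Stage 3: memory management -- persist the new exchange for later turns.
  await updateMemory(threadId, query, answer);
  return answer;
}
```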
Prompt Configuration
The core prompt is modular, with three parts (assembled in the sketch after this list):
Role: "An AI assistant for Ethify that answers user questions about creating and managing online stores."
Context: Static knowledge base description plus dynamic retrieved chunks.
Guidelines: Enforce scope limits and safety (e.g., "If beyond scope, say: 'I'm sorry, that information is not in this document.'").
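A minimal sketch of how these parts might be assembled into a single prompt string follows. The role and refusal strings are taken from the configuration above; the buildPrompt layout and section labels are illustrative assumptions.

```typescript
// Illustrative assembly of the modular prompt. The exact layout is assumed.
interface Chunk { text: string; score: number }
interface Memory { recent: string[]; summary: string }

const ROLE =
  "An AI assistant for Ethify that answers user questions about creating and managing online stores.";
const GUIDELINES =
  "Stay within the Ethify documentation. If beyond scope, say: " +
  `"I'm sorry, that information is not in this document."`;

function buildPrompt(chunks: Chunk[], memory: Memory, query: string): string {
  return [
    `Role: ${ROLE}`,
    `Context:\n${chunks.map((c) => c.text).join("\n---\n")}`, // dynamic retrieved chunks
    `History summary:\n${memory.summary}`,                    // summarized older turns
    `Recent turns:\n${memory.recent.join("\n")}`,             // stored verbatim
    `Guidelines: ${GUIDELINES}`,
    `User: ${query}`,
  ].join("\n\n");
}
```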
Implementation
Frontend: A React component handles input, displays messages, and persists the thread ID.
Backend: A Next.js API route orchestrates embedding, vector search, prompt building, and DB upserts, as sketched below.
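The orchestrating route might look as follows in the Next.js App Router, reusing the handleTurn pipeline sketched earlier; the endpoint path and payload shape are illustrative assumptions, not the exact production API.

```typescript
// app/api/chat/route.ts -- sketch of the orchestrating API route.
import { NextResponse } from "next/server";

// Assumed pipeline from the architecture sketch above.
declare function handleTurn(threadId: string, query: string): Promise<string>;

export async function POST(req: Request) {
  const { threadId, query } = await req.json();
  if (typeof threadId !== "string" || typeof query !== "string") {
    return NextResponse.json({ error: "threadId and query are required" }, { status: 400 });
  }
  // Embedding, vector search, prompt building, generation, and the DB upsert
  // all happen inside handleTurn.
  const answer = await handleTurn(threadId, query);
  return NextResponse.json({ answer });
}
```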
Evaluation Setup
We tested on a Next.js development server backed by MongoDB Atlas. The dataset comprised 20 Ethify documents chunked into 500-character segments and embedded offline.
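The offline indexing step can be sketched as follows; chunkDocument and embedText are hypothetical helpers, with the latter standing in for a Gemini embeddings call.

```typescript
// Sketch of the offline indexing step: split each document into
// 500-character segments, then embed each segment.
declare function embedText(text: string): Promise<number[]>; // assumed Gemini embeddings wrapper

function chunkDocument(doc: string, size = 500): string[] {
  const chunks: string[] = [];
  for (let i = 0; i < doc.length; i += size) {
    chunks.push(doc.slice(i, i + size));
  }
  return chunks;
}

async function buildIndex(docs: string[]): Promise<{ text: string; vector: number[] }[]> {
  const records: { text: string; vector: number[] }[] = [];
  for (const doc of docs) {
    for (const text of chunkDocument(doc)) {
      records.push({ text, vector: await embedText(text) });
    }
  }
  return records; // stored in MongoDB behind a vector index
}
```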
Test cases:
Single-turn: 50 queries (25 in-scope, e.g., "How to add products?"; 25 out-of-scope, e.g., "Weather today?").
Multi-turn: 10 sessions of 15 exchanges each, simulating store setup flows (e.g., greeting → product query → customization follow-up).
Edge cases: 5 unsafe queries (e.g., "Hack payments").
Metrics:
Accuracy: Exact match to documentation for in-scope queries.
Context Retention: Relevance score (manual 1-5) for responses using history.
Latency: End-to-end time per turn.
We performed no hyperparameter tuning and used default Gemini settings (temperature 0.7).
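For concreteness, a minimal harness for the accuracy and latency metrics might look like the following; sendQuery and the TestCase shape are hypothetical, and the exact string comparison mirrors the exact-match criterion above.

```typescript
// Minimal sketch of an evaluation harness for accuracy and latency.
declare function sendQuery(threadId: string, query: string): Promise<string>; // assumed client for the chat endpoint

interface TestCase { query: string; expected: string }

async function evaluate(cases: TestCase[]) {
  let correct = 0;
  const latenciesMs: number[] = [];
  for (const c of cases) {
    const start = Date.now();
    const answer = await sendQuery("eval-thread", c.query);
    latenciesMs.push(Date.now() - start);            // end-to-end time per turn
    if (answer.trim() === c.expected.trim()) correct += 1; // exact match to documentation
  }
  return {
    accuracy: correct / cases.length,
    meanLatencyMs: latenciesMs.reduce((a, b) => a + b, 0) / cases.length,
  };
}
```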
Results
In-scope accuracy reached 96%, with the few errors traced to ambiguously phrased queries. Out-of-scope refusals were 100%: all 25 such queries were declined politely without providing unrelated information. All 5 unsafe queries were likewise refused, upholding the safety guidelines.
Multi-turn relevance averaged 4.2 out of 5 across the 10 sessions. History improved follow-ups, for example by referencing prior product details in variant questions. Summaries captured 80% of key details after 10 turns, preserving most context in long sessions.
The system processed 500 simulated turns without failures. Memory storage was capped at 10 full entries per thread, with summaries averaging 150 tokens.
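The cap can be enforced with a simple trim-and-summarize step, sketched below; summarize is a hypothetical Gemini call asked to stay near the 150-token budget, and the data shapes are illustrative rather than the exact implementation.

```typescript
// Sketch of the memory cap: keep the 10 most recent exchanges verbatim,
// fold anything older into a running summary.
interface Exchange { user: string; assistant: string }
interface ThreadMemory { recent: Exchange[]; summary: string }

declare function summarize(text: string, maxTokens: number): Promise<string>; // assumed Gemini wrapper

const MAX_RECENT = 10;
const SUMMARY_TOKENS = 150;

async function appendExchange(memory: ThreadMemory, turn: Exchange): Promise<ThreadMemory> {
  const recent = [...memory.recent, turn];
  if (recent.length <= MAX_RECENT) return { ...memory, recent };

  // Fold the overflow (oldest turns) into the summary to respect the cap.
  const overflow = recent.splice(0, recent.length - MAX_RECENT);
  const overflowText = overflow
    .map((e) => `User: ${e.user}\nAssistant: ${e.assistant}`)
    .join("\n");
  const summary = await summarize(`${memory.summary}\n${overflowText}`, SUMMARY_TOKENS);
  return { summary, recent };
}
```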
Conclusion
The implemented assistant provides reliable, context-aware support for Ethify users. RAG grounds responses in documentation, while memory supports natural multi-turn conversations. Limitations include dependence on chunk quality and Gemini API costs. Future work includes user feedback loops for prompt refinement and scaling to real-time analytics. The code is available for deployment.