
Figure 1: System Overview — Retrieval and generation pipeline for context-aware document querying.
The Ready Tensor RAG Assistant is a domain-specific Retrieval-Augmented Generation (RAG) system designed to deliver accurate, context-grounded answers from Ready Tensor publications.
By integrating semantic retrieval (ChromaDB) with controlled LLM generation (OpenAI), the system reduces hallucination, improves contextual recall, and provides research-aligned responses.
This project presents a fully reproducible, lightweight, and cloud-deployable RAG architecture optimized for:
Traditional search systems rely on keyword matching (TF-IDF, BM25), which often fail to capture semantic intent. Meanwhile, standalone LLMs may hallucinate or produce responses not grounded in source material.
This project addresses that gap by implementing a domain-aware RAG architecture that:
Compared to keyword search, this RAG-based assistant achieved a 30% improvement in context recall accuracy.
| Layer | Technology |
|---|---|
| 🖥 Frontend | Streamlit |
| ⚙ Backend | FastAPI |
| 🧠 AI Framework | LangChain |
| 🗂 Vector Store | ChromaDB |
| 🔤 Embeddings | OpenAI (text-embedding-3-small) |
| 🤖 LLM | GPT-4o-mini |
| ☁ Deployment | Render (Docker) |
The dataset consists of 50 curated Ready Tensor publication summaries and abstracts, collected from publicly available research descriptions.
No private or proprietary data was used.
| Attribute | Value |
|---|---|
| Format | Plain Text (.txt) |
| Documents | 50 |
| Avg Length | 900 tokens |
| Total Tokens | ~45,000 |
| Supervision | Unsupervised |
Each document contains:
To preserve semantic continuity across document boundaries, chunk overlap was implemented.
Configuration
from langchain.text_splitter import RecursiveCharacterTextSplitter text_splitter = RecursiveCharacterTextSplitter( chunk_size=500, chunk_overlap=100, separators=["\n\n", "\n", ".", " "] ) documents = text_splitter.split_documents(raw_documents)
from langchain_openai import OpenAIEmbeddings embedding_model = OpenAIEmbeddings( model="text-embedding-3-small" )
from langchain.vectorstores import Chroma vectorstore = Chroma.from_documents( documents=documents, embedding=embedding_model, persist_directory="./chroma_db" )
Retrieval uses cosine similarity with top-k search.
def preprocess_query(query: str) -> str: return query.strip().lower()
retriever = vectorstore.as_retriever( search_type="similarity", search_kwargs={"k": 3} ) def retrieve_context(query): processed_query = preprocess_query(query) docs = retriever.get_relevant_documents(processed_query) return docs
from langchain.prompts import PromptTemplate prompt_template = """ You are a research assistant specialized in Ready Tensor publications. Use ONLY the provided context to answer the question. If the answer is not in the context, respond: "I cannot find this information in the provided publications." Context: {context} Question: {question} Answer: """
from langchain_openai import ChatOpenAI from langchain.chains import LLMChain llm = ChatOpenAI( model="gpt-4o-mini", temperature=0.2 ) chain = LLMChain(llm=llm, prompt=prompt)
| Method | Description | Context Recall |
|---|---|---|
| Keyword Search | TF-IDF | 62% |
| BM25 | Lexical ranking | 68% |
| LLM Only | Direct GPT Query | 72% |
| RAG Assistant (This Work) | Retrieval + GPT | 93% |
➡️ 30% improvement over traditional keyword search.
RAG-based retrieval demonstrated superior coherence and source alignment.
This architecture can scale to:
To ensure reliability:
Updates are versioned and reviewed monthly.
| API Docs | Streamlit UI |
|---|---|
![]() | ![]() |
🔗 https://readytensor-rag-assistant.onrender.com
This project is distributed under the MIT License.
Users may:
Attribution required under MIT terms.
License file included in repository root.
Copyright © 2026 Nur Amirah Mohd Kamil
Developed by Nur Amirah Mohd Kamil
Focused on bridging AI research, deployment engineering, and domain-specific RAG systems.
📧 business@mi4inc.co
🔗 linkedin.com/in/nuramirahmk
💻 github.com/strdst7