A production‑ready Retrieval‑Augmented Generation (RAG) assistant that:

- retrieves context from a **VectorDB** (MiniLM embeddings) → prompt → LLM
- keeps conversational memory via a **MemoryManager**: a short recent window + a running summary (compact & persisted)

## Project structure

```
app/
├─ app.py # CLI entry (baseline)
├─ app_gradio.py # Gradio UI (chat + debug panels)
├─ memory_utils.py # Rolling summary memory (persisted + recent window)
├─ log_utils.py # Logger + JSONL trace writer
├─ vectordb.py # Simple vector DB wrapper (add/search)
├─ file_utils.py # load_all_publications(), load_yaml_config()
├─ prompt_builder.py # build_prompt_from_config()
├─ paths.py # PROMPT_CONFIG_FPATH, OUTPUTS_DIR, etc.
```
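For orientation, here is a minimal sketch of how one conversational turn could wire these modules together. The module and function names come from the tree above, but the exact signatures (e.g. `memory.as_text()`, the keyword arguments of `build_prompt_from_config`) are assumptions, not the repo's verified API.

```python
# Minimal sketch of one conversational turn; module names follow the tree
# above, but the signatures used here are assumptions.
from vectordb import VectorDB                  # add/search wrapper
from memory_utils import MemoryManager         # recent window + rolling summary
from prompt_builder import build_prompt_from_config
from file_utils import load_yaml_config
from paths import PROMPT_CONFIG_FPATH

def answer(question: str, db: VectorDB, memory: MemoryManager, llm) -> str:
    docs = db.search(question, k=5)            # retrieve top-k chunks (assumed signature)
    cfg = load_yaml_config(PROMPT_CONFIG_FPATH)
    prompt = build_prompt_from_config(         # conversation-aware prompt (assumed kwargs)
        cfg, context=docs, memory=memory.as_text(), question=question)
    reply = llm.invoke(prompt).content         # any LangChain chat model
    memory.add_turn(question, reply)           # updates recent window + summary
    return reply
```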
## Dependencies

`langchain-core`, `langchain-openai` / `langchain-groq` / `langchain-google-genai`, `sentence-transformers`, `chromadb`, `python-dotenv`, `gradio`
## Setup

```bash
git clone <your-repo-url>
cd llm-rag
pip install -r requirements.txt
```
Create `.env` at the project root:

```bash
# choose one provider (the assistant auto-detects in this order)
OPENAI_API_KEY=sk-...
OPENAI_MODEL=gpt-4o-mini

# or GROQ
# GROQ_API_KEY=...
# GROQ_MODEL=llama-3.1-8b-instant

# or Google
# GOOGLE_API_KEY=...
# GOOGLE_MODEL=gemini-2.0-flash
```

---
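The detection order above takes only a few lines to implement. This is a minimal sketch assuming the standard LangChain chat-model classes (`ChatOpenAI`, `ChatGroq`, `ChatGoogleGenerativeAI`); the `make_llm` helper name is illustrative, not the repo's actual function.

```python
# Sketch of the provider auto-detection order: OpenAI → Groq → Google.
# Env var and model names match the .env template above.
import os
from dotenv import load_dotenv

def make_llm():
    load_dotenv()  # read .env from the project root
    if os.getenv("OPENAI_API_KEY"):
        from langchain_openai import ChatOpenAI
        return ChatOpenAI(model=os.getenv("OPENAI_MODEL", "gpt-4o-mini"))
    if os.getenv("GROQ_API_KEY"):
        from langchain_groq import ChatGroq
        return ChatGroq(model=os.getenv("GROQ_MODEL", "llama-3.1-8b-instant"))
    if os.getenv("GOOGLE_API_KEY"):
        from langchain_google_genai import ChatGoogleGenerativeAI
        return ChatGoogleGenerativeAI(model=os.getenv("GOOGLE_MODEL", "gemini-2.0-flash"))
    raise RuntimeError("No provider API key found in .env")
```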
## Run

CLI:

```bash
python app.py          # enter a question, or 'quit' to exit
```

Gradio UI:

```bash
python app_gradio.py   # open the printed local URL (default http://127.0.0.1:7860)
```
### UI panels

The Gradio app shows the chat alongside debug panels (see `app_gradio.py`).

## Memory

Memory keeps a short recent window plus a running summary. The summary is refreshed every `SUMMARIZE_EVERY_N` turns and persisted to `OUTPUTS_DIR/memory/memory_summary.json`.
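The sketch below only illustrates the idea; the real `MemoryManager` lives in `memory_utils.py`, and the default for `SUMMARIZE_EVERY_N` and the JSON layout shown here are assumptions.

```python
# Illustrative rolling-summary memory: keep the last few turns verbatim,
# fold older turns into a running summary every SUMMARIZE_EVERY_N turns,
# and persist the result as JSON.
import json
from pathlib import Path

SUMMARIZE_EVERY_N = 4  # assumed default

class RollingSummaryMemory:
    def __init__(self, path: Path, window: int = 4):
        self.path, self.window = path, window
        self.turns: list[dict] = []
        self.summary = ""

    def add_turn(self, user: str, assistant: str, llm=None) -> None:
        self.turns.append({"user": user, "assistant": assistant})
        if llm is not None and len(self.turns) % SUMMARIZE_EVERY_N == 0:
            older = self.turns[:-self.window]        # everything outside the window
            prompt = (f"Update this running summary with the older turns.\n"
                      f"Summary: {self.summary}\nOlder turns: {older}")
            self.summary = llm.invoke(prompt).content
            self.turns = self.turns[-self.window:]   # keep only the recent window
        self._persist()

    def _persist(self) -> None:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        self.path.write_text(json.dumps(
            {"summary": self.summary, "recent": self.turns}, indent=2))
```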
## Logging & traces

- `outputs/rag_assistant.log`: application log (rotating file handler)
- `outputs/rag_assistant_traces.jsonl`: per-turn traces (CLI)
- `outputs/rag_assistant_ui_traces.jsonl`: per-turn traces (UI)

Each trace includes timestamps, doc counts, memory excerpts, and answer snippets for easy offline debugging.
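A sketch of how these files could be produced; `write_trace` and its field names are illustrative, not the actual `log_utils.py` API.

```python
# Rotating application log plus one JSON object per turn appended to a
# .jsonl file (JSONL: one trace per line).
import json
import logging
import time
from logging.handlers import RotatingFileHandler

logger = logging.getLogger("rag_assistant")
logger.addHandler(RotatingFileHandler(
    "outputs/rag_assistant.log", maxBytes=1_000_000, backupCount=3))
logger.setLevel(logging.INFO)

def write_trace(path: str, question: str, docs: list, memory: str, answer: str) -> None:
    trace = {
        "ts": time.strftime("%Y-%m-%dT%H:%M:%S"),  # timestamp
        "question": question,
        "doc_count": len(docs),                    # how many chunks were retrieved
        "memory_excerpt": memory[:200],            # truncated for readability
        "answer_snippet": answer[:200],
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(trace) + "\n")
```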
## Prompting

The system prompt is conversation‑aware: it is assembled from the prompt config together with the memory summary and the retrieved context. See `config/prompt_config.yaml`.
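A hypothetical shape for that config, shown as the dict `load_yaml_config()` would return, with an illustrative assembly function; the real schema in `config/prompt_config.yaml` and the real `build_prompt_from_config()` may differ.

```python
# Assumed config shape and prompt assembly; keys and wording are illustrative.
prompt_cfg = {
    "system": "You are a helpful research assistant. Answer from the provided context.",
    "memory_header": "Conversation so far (summary + recent turns):",
    "context_header": "Retrieved documents:",
}

def build_prompt(cfg: dict, memory: str, context: str, question: str) -> str:
    # Placing the memory block before the retrieved context is what makes
    # the prompt conversation-aware: follow-up questions stay grounded.
    return "\n\n".join([
        cfg["system"],
        f"{cfg['memory_header']}\n{memory}",
        f"{cfg['context_header']}\n{context}",
        f"Question: {question}",
    ])
```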
## Retrieval

Retrieval is handled by the `VectorDB` wrapper in `vectordb.py`, which exposes `add`/`search` over MiniLM embeddings.
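A minimal sketch of such a wrapper, assuming a `chromadb` backend with `all-MiniLM-L6-v2` sentence-transformers embeddings (both appear in the dependencies); the real `vectordb.py` may be organized differently.

```python
# Simple add/search vector store: embed with MiniLM, index with chromadb.
import chromadb
from sentence_transformers import SentenceTransformer

class VectorDB:
    def __init__(self, collection: str = "publications"):
        self.model = SentenceTransformer("all-MiniLM-L6-v2")
        self.col = chromadb.Client().get_or_create_collection(collection)

    def add(self, ids: list[str], texts: list[str]) -> None:
        embeddings = self.model.encode(texts).tolist()
        self.col.add(ids=ids, documents=texts, embeddings=embeddings)

    def search(self, query: str, k: int = 5) -> list[str]:
        emb = self.model.encode([query]).tolist()
        res = self.col.query(query_embeddings=emb, n_results=k)
        return res["documents"][0]   # top-k matching chunks
```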