Mama Amaka: The RAG Application for Native Nigerian Dishes
Abstract
Mama Amaka is an AI-powered Retrieval-Augmented Generation (RAG) assistant that provides contextual, accurate answers about Nigerian cuisine.
Built with LangChain, ChromaDB, and multiple LLM providers (OpenAI, Groq, Google Gemini), the application combines vector-based semantic search with large language models to deliver personalized cooking guidance.
The system indexes traditional Nigerian recipes, chunks them for efficient retrieval, and generates warm, culturally-informed responses through a friendly "Mama Amaka" persona.
This tool addresses the gap in accessible, AI-driven resources for learning traditional Nigerian cooking methods and serves as both a practical culinary assistant and an educational reference for RAG architecture implementation.
Nigerian cuisine represents one of Africa's richest culinary traditions, featuring diverse dishes like Jollof rice, Egusi soup, and Suya that have gained international recognition. However, accessing authentic, detailed information about traditional cooking methods remains challenging for many enthusiasts. Existing recipe resources often lack the conversational, contextual guidance that home cooks need when preparing unfamiliar dishes.
Retrieval-Augmented Generation (RAG) has emerged as a powerful paradigm for building knowledge-intensive AI applications (Lewis et al., 2020). By combining the precision of information retrieval with the generative capabilities of large language models, RAG systems can provide accurate, contextual responses grounded in specific knowledge bases, making them ideal for domain-specific applications like culinary assistance.
Mama Amaka bridges this gap by creating an intelligent recipe assistant that understands and responds to natural language queries about Nigerian food. Rather than simply returning search results, the system synthesizes information from its recipe knowledge base into comprehensive, conversational answers, delivered with a warm, motherly personality that reflects the cultural tradition of learning to cook from family elders.
This publication presents the technical architecture, implementation methodology, and practical considerations for building and deploying Mama Amaka, serving both as documentation for users and as an educational resource for developers interested in RAG system development.
Methodology
RAG Architecture Overview
Mama Amaka implements a standard RAG pipeline that processes user queries through four distinct stages: document ingestion, indexing, retrieval, and response generation.
Document Ingestion
The system ingests recipe documents through a multi-step processing pipeline:
1. Document Loading: Text files containing Nigerian recipes are loaded from the data/ directory. Each recipe includes the dish name, ingredients list, cooking instructions, and optional serving suggestions.
2. Text Chunking: Documents are split using LangChain's RecursiveCharacterTextSplitter with the following parameters:
Chunk size: 500 characters
Chunk overlap: 50 characters
Splitting hierarchy: paragraphs → sentences → words
This hierarchical approach preserves semantic coherence while creating manageable chunks for embedding and retrieval.
3. Embedding Generation: Each chunk is converted to a 384-dimensional vector representation using the sentence-transformers/all-MiniLM-L6-v2 model. This lightweight model provides strong semantic similarity performance while maintaining reasonable computational requirements.
4. Vector Storage: Embeddings are stored in ChromaDB, an open-source vector database that supports persistent storage and efficient similarity search using cosine distance.
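The chunking scheme in step 2 can be sketched in plain Python. This is a simplified stand-in for LangChain's RecursiveCharacterTextSplitter, which additionally prefers paragraph and sentence boundaries when choosing split points; the sliding-window-with-overlap idea is the same:

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping fixed-size chunks (simplified sketch)."""
    chunks = []
    start = 0
    while start < len(text):
        end = min(start + chunk_size, len(text))
        chunks.append(text[start:end])
        if end == len(text):
            break
        # Step back by `overlap` so context is preserved across chunk boundaries
        start = end - overlap
    return chunks

recipe = "Jollof rice step one. " * 60  # ~1300 characters of sample text
chunks = chunk_text(recipe)
print(len(chunks), len(chunks[0]))  # → 3 500
```

The 50-character overlap means a sentence cut at a chunk boundary still appears intact in the neighboring chunk, which keeps retrieval from losing instructions that straddle a split.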
The VectorDB class handles all vector storage and retrieval operations:

```python
from src.vectordb import VectorDB

# Initialize vector database
vdb = VectorDB(collection_name="mama_amaka_recipes")

# Search for relevant recipe content
results = vdb.search("jollof rice", n_results=3)
print(f"Found {len(results['documents'])} relevant chunks")
```
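Under the hood, ChromaDB ranks chunks by cosine similarity between the query embedding and the stored embeddings. The core ranking step can be illustrated with NumPy (illustrative only, not VectorDB's actual code):

```python
import numpy as np

def cosine_top_k(query_vec, stored_vecs, k=3):
    """Rank stored vectors by cosine similarity to the query; return top-k indices."""
    stored = np.asarray(stored_vecs, dtype=float)
    q = np.asarray(query_vec, dtype=float)
    # Cosine similarity = dot product divided by the product of vector norms
    sims = (stored @ q) / (np.linalg.norm(stored, axis=1) * np.linalg.norm(q))
    return np.argsort(-sims)[:k].tolist()

# Toy 2-D "embeddings": index 0 points the same way as the query
print(cosine_top_k([1.0, 0.0], [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]], k=2))  # → [0, 2]
```

In the real system the vectors are the 384-dimensional MiniLM embeddings, but the ranking principle is identical.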
Retrieval Mechanism
When a user submits a query, the system:
1. Embeds the query using the same sentence transformer model
2. Performs cosine similarity search against the vector database
3. Retrieves the top-K most relevant chunks (default K=3)
4. Formats retrieved chunks with source attribution for context injection
```python
def ask(self, query: str, n_results: int = 3) -> str:
    """Process user query and generate contextual response."""
    # Retrieve relevant context from vector database
    search_results = self.vector_db.search(query, n_results=n_results)
    # Combine retrieved chunks into context
    context = self._format_context(search_results)
    # Generate response using LLM with retrieved context
    response = self.chain.invoke({
        "context": context,
        "question": query,
    })
    return response
```
Response Generation
The assembled context is combined with the user query in a structured prompt template that defines the "Mama Amaka" persona: a warm, knowledgeable Nigerian cooking expert.
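An illustrative version of such a persona prompt is shown below. The wording is assumed for illustration and is not the project's actual template:

```python
from string import Template

# Illustrative "Mama Amaka" persona prompt; the exact wording used by the
# project may differ.
MAMA_AMAKA_PROMPT = Template(
    "You are Mama Amaka, a warm, knowledgeable Nigerian cooking expert.\n"
    "Answer the question using ONLY the recipe context below, speaking\n"
    "kindly, as if teaching a family member. If the context does not\n"
    "cover the question, say so honestly.\n\n"
    "Context:\n$context\n\n"
    "Question: $question\n"
    "Answer:"
)

prompt = MAMA_AMAKA_PROMPT.substitute(
    context="Jollof rice: parboil rice, fry a tomato-pepper base...",
    question="What is jollof rice?",
)
print(prompt)
```

Grounding the answer in the retrieved context, and instructing the model to admit gaps, is what lets the system decline out-of-scope questions gracefully.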
The system supports multiple LLM providers (OpenAI GPT-4o-mini, Groq Llama-3.1, Google Gemini) through LangChain's unified interface, allowing users to choose based on performance requirements and cost considerations.
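Provider selection can be sketched as a simple dispatch over environment variables. The variable names mirror the configuration shown in the Experiments section; the dispatch logic itself is illustrative, not the project's actual code:

```python
import os

# Hypothetical provider dispatch: pick the first provider whose API key is set.
# (Key and model-variable names follow the .env configuration; the function
# itself is an illustration, not the project's source.)
PROVIDERS = [
    ("OPENAI_API_KEY", "openai", "gpt-4o-mini"),
    ("GROQ_API_KEY", "groq", "llama-3.1-8b-instant"),
    ("GOOGLE_API_KEY", "google", "gemini-2.0-flash"),
]

def pick_provider(env=None):
    """Return (provider_name, model_name) for the first configured provider."""
    env = os.environ if env is None else env
    for key, name, default_model in PROVIDERS:
        if env.get(key):
            return name, env.get(f"{name.upper()}_MODEL", default_model)
    raise RuntimeError("No LLM provider API key configured")
```

With this pattern, switching providers is a matter of editing .env rather than changing code, which is what makes the cost/speed/quality trade-off easy to explore.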
Experiments
System Configuration
The development and testing environment consisted of:
Hardware: CPU-based inference (GPU optional for embedding generation)
Memory: 4GB minimum, 8GB recommended
```bash
# .env configuration file
# Choose ONE of the following API keys:

# Option 1: OpenAI (Recommended for best quality)
OPENAI_API_KEY=your_openai_api_key_here
OPENAI_MODEL=gpt-4o-mini

# Option 2: Groq (Fast and free tier available)
GROQ_API_KEY=your_groq_api_key_here
GROQ_MODEL=llama-3.1-8b-instant

# Option 3: Google Gemini
GOOGLE_API_KEY=your_google_api_key_here
GOOGLE_MODEL=gemini-2.0-flash

# Optional: Custom ChromaDB collection name
CHROMA_COLLECTION_NAME=mama_amaka_recipes

# Optional: Custom embedding model
EMBEDDING_MODEL=sentence-transformers/all-MiniLM-L6-v2
```
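Projects like this typically load the file with python-dotenv, but the format is simple enough to sketch a dependency-free parser (illustrative, not the project's loader):

```python
def load_env(path: str = ".env") -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    config = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith("#") or "=" not in line:
                continue
            key, _, value = line.partition("=")
            config[key.strip()] = value.strip()
    return config

# Demo: write a minimal .env-style file and read it back
with open("demo.env", "w") as f:
    f.write("# comment\nGROQ_API_KEY=abc123\nGROQ_MODEL=llama-3.1-8b-instant\n")
print(load_env("demo.env"))  # → {'GROQ_API_KEY': 'abc123', 'GROQ_MODEL': 'llama-3.1-8b-instant'}
```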
Knowledge Base
The initial knowledge base includes 6 traditional Nigerian recipes:
Jollof Rice
Egusi Soup
Coconut Rice
Moi-Moi
Yam and Egg Sauce
Pepper Soup
Each recipe document contains comprehensive information including regional variations, ingredient substitutions, and cooking tips.
For example, the Jollof Rice document begins:

```text
Jollof rice is a popular party favourite in Nigeria...

INGREDIENTS
Serves 4
- 500g Long grain rice
- 3 cooking spoons Margarine/Vegetable oil
- 400g Tomato paste
- 2 Onions (chopped)
- 3 Scotch bonnet peppers
...

METHOD
STEP 1 Melt the butter in a large pot...
STEP 2 Add the rice and stir to coat...
...
```
```python
# Full RAG pipeline
from src.agent import MamaAmakaAgent  # import path assumed

# Initialize and prepare agent
agent = MamaAmakaAgent()
agent.ingest_data()

# Ask a question
answer = agent.ask("What is jollof rice?")
assert len(answer) > 0
print(answer)
```
Test Queries
The system was validated against a diverse set of query types:
| Query Type | Example | Expected Behavior |
| --- | --- | --- |
| Direct recipe request | "How do I make jollof rice?" | Return complete cooking instructions |
| Ingredient inquiry | "What ingredients are in egusi soup?" | List all required ingredients |
| Technique question | "How long does moi-moi take to steam?" | Provide specific timing information |
| General knowledge | "Tell me about coconut rice" | Offer overview with cultural context |
| Out-of-scope | "How do I make sushi?" | Acknowledge limitation gracefully |
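These categories can be turned into a lightweight smoke test. The harness below is illustrative and assumes the MamaAmakaAgent interface shown earlier:

```python
# Queries drawn from the validation categories; the last one is deliberately
# out of scope and should produce a graceful refusal rather than an error.
TEST_QUERIES = [
    "How do I make jollof rice?",
    "What ingredients are in egusi soup?",
    "How long does moi-moi take to steam?",
    "Tell me about coconut rice",
    "How do I make sushi?",
]

def run_smoke_tests(agent) -> int:
    """Assert every query yields a non-empty string answer; return count run."""
    for query in TEST_QUERIES:
        answer = agent.ask(query)
        assert isinstance(answer, str) and answer.strip(), f"empty answer: {query}"
    return len(TEST_QUERIES)
```

A harness like this only checks that answers come back non-empty; judging answer quality (correct ingredients, graceful refusals) still requires the qualitative review described below.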
Results
Retrieval Performance
The vector search component demonstrated strong performance characteristics.
Qualitative evaluation of system responses revealed:
Strengths:
Accurate ingredient lists and proportions
Clear, sequential cooking instructions
Appropriate handling of out-of-scope queries
Consistent persona maintenance across interactions
Areas for Improvement:
Limited coverage (6 recipes in initial knowledge base)
No support for follow-up questions or conversation memory
Text-only responses (no images or videos)
ChromaDB's in-memory search scales efficiently up to approximately 1 million vectors, making it suitable for recipe collections of significant size.
Conclusion
Mama Amaka demonstrates the practical application of RAG architecture for creating domain-specific AI assistants. By combining semantic search with large language models, the system provides accurate, contextual responses about Nigerian cuisine while maintaining a culturally appropriate conversational style.
Key Contributions
Practical RAG Implementation: A complete, working example of RAG architecture using modern tools (LangChain, ChromaDB, Sentence Transformers)
Multi-Provider Flexibility: Support for OpenAI, Groq, and Google Gemini allows optimization for cost, speed, or quality
Cultural Preservation: Digital documentation and accessibility of traditional Nigerian cooking knowledge
Educational Resource: Well-documented codebase serves as a learning reference for RAG system development
Limitations
Knowledge Base Scope: Currently limited to 6 recipes; expansion requires manual document creation
Single-Turn Interactions: No conversation memory or context carryover between queries
Text-Only Interface: CLI-based interaction without visual elements
Language Support: English only; no support for Nigerian languages (Yoruba, Igbo, Hausa)
Future Directions
Potential enhancements include:
Expanded Recipe Database: Integration with recipe APIs or web scraping for broader coverage
Multimodal Support: Adding recipe images and video tutorials
Web Interface: Streamlit or Gradio-based UI for improved accessibility
Conversation Memory: Implementing chat history for multi-turn interactions
Voice Interface: Speech-to-text input for hands-free cooking assistance
Multilingual Support: Nigerian language translations for broader accessibility
Availability
Mama Amaka is open-source under the MIT License. The complete codebase, documentation, and sample recipes are available at:
References
Lewis, P., et al. (2020). "Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks." Advances in Neural Information Processing Systems, 33, 9459-9474.