Ethiopian History RAG System: Based on Ethiopian history content from Wikipedia

Abstract

Despite the public availability of Ethiopia’s historical records and academic resources, their complexity and academic formality make them difficult for students and the general public to understand and apply in real-life learning. This project presents a conversational Ethiopian History RAG (Retrieval-Augmented Generation) Assistant that simplifies and contextualizes Ethiopian history through semantic search, reference-cited reasoning, and user-friendly interaction. Powered by LangChain, ChromaDB, HuggingFace embeddings, Groq-hosted LLaMA 3, and Streamlit, the assistant explains historical events in plain language, interprets user questions, and offers grounded insights—bridging the gap between historical text and practical understanding.

Introduction

Understanding Ethiopia’s rich and complex history requires more than a simple search query. Users often seek context—why events happened, their cultural impact, and their global significance. Traditional databases return raw text but fail to interpret meaning, while generic AI models risk hallucination, misrepresenting historical facts and eroding trust in educational settings.

This project bridges that gap with a context-sensitive history assistant powered by Retrieval-Augmented Generation (RAG). By combining semantic search, structured reasoning, and robust prompt engineering, it provides grounded, reference-cited responses to user queries. Whether you’re exploring the origins of the Ethiopian Empire, understanding the impact of coffee on global trade, or analyzing the Derg regime, this assistant delivers accurate, explainable insights—transforming passive access into active learning.


1. Problem Statement

While Ethiopia’s historical documents and Wikipedia entries are public and searchable, applying them meaningfully to specific real-world questions remains complex. For example, understanding the causes and consequences of the Battle of Adwa or the significance of the Derg regime requires cross-referencing multiple sources and synthesizing context. Few tools exist today that connect a user’s curiosity—about people, events, or cultural heritage—to a grounded, accessible historical interpretation.

Traditional databases provide raw text search but lack the ability to process full context or offer structured reasoning. Meanwhile, generic AI models may hallucinate answers or misrepresent historical facts, undermining user trust in educational applications. There’s a gap between historical literacy and historical applicability—especially when users seek not just what happened, but why it matters.

This project addresses that gap by delivering a context-sensitive history assistant that reasons from Ethiopia’s past to real-world understanding. Whether asked to explain the origins of the Ethiopian Empire or to assess the impact of coffee on global trade, the assistant provides grounded, reference-cited responses. In doing so, it transforms passive access into proactive learning—where history begins to answer back.

2. Methodology

This project combines retrieval-augmented generation (RAG), structured prompt engineering, and vector-based search to deliver grounded historical reasoning through natural-language interaction. Rather than being trained on new data, the assistant retrieves from a knowledge base of curated Wikipedia articles, academic summaries, and primary sources, and is engineered to interpret user queries while maintaining historical accuracy and relevance.

2.1 Architecture Overview

The system is composed of four main layers:

Document Store
- sentence-transformers/all-MiniLM-L6-v2 for semantic embeddings
- ChromaDB as the vector store
- Historical documents chunked and tagged by topic and period
LLM Reasoning Engine
- Groq-hosted LLaMA 3 (8B) via langchain_groq.ChatGroq
- Deterministic prompt execution and fast response
Prompt Layer (Chain of Thought for History)
- Structured to guide reference-cited, step-by-step historical reasoning
- Detects event-based inputs and infers plausible historical consequences
- Includes memory context and retrieval context in every call
Interfaces
- Streamlit UI for end-user interaction
- Command-Line Interface (CLI) for local or developer testing
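A minimal sketch of the prompt layer, assuming each retrieved chunk carries `source` and `summary_title` metadata: it numbers every passage so the model can cite it as [n], keeps a short window of chat history, and asks for step-by-step reasoning. The `Chunk` class, the `build_prompt` function, the instruction wording and the three-turn memory window are hypothetical illustrations, not code from the repository; in a LangChain pipeline the resulting string would be sent to the `ChatGroq` model.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Chunk:
    """A retrieved passage plus the metadata used for citation (hypothetical schema)."""
    text: str
    summary_title: str  # topic label attached during chunking
    source: str         # e.g. the Wikipedia article title


SYSTEM_INSTRUCTIONS = (
    "You are an assistant for Ethiopian history. Answer ONLY from the numbered "
    "context passages. Reason step by step about causes and consequences, cite "
    "passages as [n], and say you do not know if the context is insufficient."
)


def build_prompt(question: str, chunks: list[Chunk],
                 history: list[tuple[str, str]] | None = None,
                 max_turns: int = 3) -> str:
    """Combine instructions, recent chat history, retrieved context and the question."""
    # Number each passage so the model can cite it as [1], [2], ...
    context = "\n\n".join(
        f"[{i}] ({c.source} / {c.summary_title})\n{c.text}"
        for i, c in enumerate(chunks, start=1)
    )
    # Keep only the most recent turns so the prompt stays within the context window.
    recent = (history or [])[-max_turns:]
    memory = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in recent)
    parts = [SYSTEM_INSTRUCTIONS]
    if memory:
        parts.append(f"Conversation so far:\n{memory}")
    parts.append(f"Context:\n{context}")
    parts.append(f"Question: {question}\nAnswer (with [n] citations):")
    return "\n\n".join(parts)
```

Putting the instructions first and the question last keeps the model's attention on the retrieved passages immediately before it starts answering.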

2.2 Data Preparation

Source: Curated Wikipedia articles, academic repositories, and open educational resources
Cleaning: Removal of formatting and encoding artifacts
Chunking: Topic-aware logic with summary_title metadata
Embedding: Chunks embedded with all-MiniLM-L6-v2 and stored in ChromaDB for semantic retrieval
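The cleaning and topic-aware chunking steps might look roughly like the sketch below. It assumes the article text keeps Wikipedia-style `== Heading ==` section markers and splits on them, so each chunk's `summary_title` combines the article title and the section heading. The regex rules, the 1,000-character limit and the helper names are hypothetical; the repository's actual chunking logic may differ.

```python
from __future__ import annotations

import re
import unicodedata


def clean_text(raw: str) -> str:
    """Strip common Wikipedia extraction debris (hypothetical rules)."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\[\d+\]", "", text)                      # citation markers like [12]
    text = re.sub(r"\[citation needed\]", "", text, flags=re.I)
    text = re.sub(r"[ \t]+", " ", text)                      # collapse runs of spaces/tabs
    text = re.sub(r"\n{3,}", "\n\n", text)                   # collapse blank-line runs
    return text.strip()


def _make_chunk(article_title: str, heading: str, text: str) -> dict:
    return {
        "text": text,
        "metadata": {"source": article_title,
                     "summary_title": f"{article_title}: {heading}"},
    }


def chunk_by_section(article_title: str, text: str,
                     max_chars: int = 1000) -> list[dict]:
    """Split on '== Heading ==' lines and tag each chunk with a summary_title."""
    chunks = []
    # re.split with a capture group keeps the headings:
    # [lead, heading1, body1, heading2, body2, ...]
    pieces = re.split(r"^==+\s*(.+?)\s*==+\s*$", text, flags=re.M)
    sections = [("Overview", pieces[0])] + list(zip(pieces[1::2], pieces[2::2]))
    for heading, body in sections:
        paragraphs = [p.strip() for p in body.split("\n\n") if p.strip()]
        buffer = ""
        for para in paragraphs:
            # Start a new chunk when adding this paragraph would exceed max_chars.
            if buffer and len(buffer) + len(para) + 2 > max_chars:
                chunks.append(_make_chunk(article_title, heading, buffer))
                buffer = para
            else:
                buffer = f"{buffer}\n\n{para}" if buffer else para
        if buffer:
            chunks.append(_make_chunk(article_title, heading, buffer))
    return chunks
```

The resulting texts and metadata could then be embedded and stored with, for example, LangChain's `Chroma.from_texts(texts, embedding, metadatas=metadatas)`, using a `HuggingFaceEmbeddings` instance configured for `sentence-transformers/all-MiniLM-L6-v2`.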

2.3 Historical QA Flow

1. The user submits a question or topic
2. The query is embedded and the most relevant chunks are retrieved
3. A prompt is constructed with:
   - Retrieved context
   - User question
   - Chat history (if available)
4. The LLM generates a grounded answer with references
5. The UI renders the explanation and the referenced sources
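The QA flow above can be sketched end to end in plain Python. To keep the example self-contained, a toy bag-of-words vector stands in for the all-MiniLM-L6-v2 embeddings, an in-memory list stands in for ChromaDB, and `llm` is any callable that maps a prompt string to an answer (in the real system, the Groq-hosted LLaMA 3 model). Only the shape of the flow, top-k cosine-similarity retrieval followed by prompt assembly and generation, is meant to carry over; all names here are hypothetical.

```python
from __future__ import annotations

import math
import re
from collections import Counter
from typing import Callable


def toy_embed(text: str) -> Counter:
    """Toy stand-in for a sentence embedding: sparse bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, store: list[dict], k: int = 3) -> list[dict]:
    """Embed the query and return the k most similar chunks."""
    q = toy_embed(question)
    ranked = sorted(store, key=lambda doc: cosine(q, toy_embed(doc["text"])),
                    reverse=True)
    return ranked[:k]


def answer(question: str, store: list[dict], llm: Callable[[str], str],
           history: list[tuple[str, str]] | None = None, k: int = 3) -> dict:
    """Retrieve, build the prompt, generate, and return the answer with sources."""
    hits = retrieve(question, store, k)
    context = "\n".join(f"[{i}] {doc['text']}" for i, doc in enumerate(hits, 1))
    memory = "\n".join(f"User: {q}\nAssistant: {a}" for q, a in (history or []))
    prompt = (f"{memory}\n\nContext:\n{context}\n\n"
              f"Question: {question}\nAnswer with [n] citations:")
    reply = llm(prompt)
    # The UI renders the answer together with the titles of the retrieved chunks.
    sources = [doc["metadata"]["summary_title"] for doc in hits]
    return {"answer": reply, "sources": sources}
```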

2.4 System Architecture Diagram

The diagram below illustrates how the assistant is structured across four layers—data ingestion, retrieval, reasoning, and interaction:

3. Results & Insights

The Ethiopian History RAG Assistant was tested locally on foundational use cases covering key areas of Ethiopian history. While formal deployment and benchmarking are still in progress, early results indicate strong alignment between user questions and the historical responses returned.

The video below demonstrates the working implementation:

3.1 Local Functionality Tests

Core capabilities were validated:

Event-Based Queries
- Accurately responds to questions involving major historical events.
Personality-Based Queries
- Successfully interprets questions about historical figures and their impact.
Cultural and Social History
- Maps user questions to relevant cultural and social developments.
Out-of-Context Questions
- Detects when historical context is insufficient and responds appropriately.
Reference-Cited Reasoning
- Synthesizes sources and facts to infer historical significance with proper grounding.
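One simple way to implement the out-of-context behavior listed above is a relevance threshold: if no retrieved chunk scores above a cut-off, the assistant returns a fallback message instead of calling the LLM. This sketch assumes relevance scores normalized to [0, 1], such as those returned by LangChain's `similarity_search_with_relevance_scores`; the 0.35 threshold, the fallback wording and the function names are hypothetical and would need tuning on real queries.

```python
from __future__ import annotations

from typing import Callable

FALLBACK = ("The available historical sources do not cover this question, "
            "so I cannot give a grounded answer.")


def gate_context(scored_chunks: list[tuple[str, float]],
                 min_score: float = 0.35) -> list[str] | None:
    """Keep chunks scoring at least min_score; None signals an out-of-context query."""
    relevant = [text for text, score in scored_chunks if score >= min_score]
    return relevant or None


def respond(question: str, scored_chunks: list[tuple[str, float]],
            llm: Callable[[str, list[str]], str]) -> str:
    """Answer from relevant context, or return the fallback without calling the LLM."""
    context = gate_context(scored_chunks)
    if context is None:
        return FALLBACK
    return llm(question, context)
```

Skipping the LLM call for weak matches both saves latency and removes the opportunity for the model to improvise an ungrounded answer.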

3.2 Observations

In local tests, maintained high historical fidelity and avoided hallucination
Prompt design yielded conversational and reasoned responses
Local CLI and Streamlit testing proved smooth and responsive

4. Future Directions

The Ethiopian History RAG Assistant shows strong promise as a foundational tool for education and public understanding. Planned enhancements include:

No.	Feature	Description
1	Amharic and Multilingual Support	Make the assistant accessible to non-English speakers through Amharic and regional languages.
2	Primary Source and Archive Integration	Incorporate archival documents and oral histories for more dynamic and grounded responses.
3	Historical Event Simulation	Offer interactive walkthroughs and scenario explorers for students and educators.
4	Analytics and Feedback Loop	Use user feedback to refine retrieval, chunking logic, and prompt strategies for accuracy and relevance.
5	Deployment and Public Access	Launch a secure, hosted version suitable for schools, museums, and civic platforms.
6	Educational Mode and Timeline Explorer	Provide interactive timelines and learning modules to support students and lifelong learners.

5. References

Wikipedia: History of Ethiopia, Battle of Adwa, Menelik II, Haile Selassie, Derg, Ethiopian Empire, and more
Academic sources and open educational resources
Data and sources retrieved from public repositories and Wikipedia

6. Project Repository

https://github.com/YonatanAwoke/Ethiopian_History_RAG

7. UI Image

System Architecture

8. Installation & Usage

Prerequisites

Python 3.10+
Virtual environment recommended

Steps (from a terminal)
git clone https://github.com/YonatanAwoke/Ethiopian_History_RAG
cd Ethiopian_History_RAG
pip install -r requirements.txt

Run the CLI (from the repository root)
cd code
python vector_db_rag_retrieval.py

Run the Streamlit App (from the repository root)
cd code
streamlit run ethiopian_history_streamlit_chat.py
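Because the reasoning engine calls a Groq-hosted model, the app will most likely also need a Groq API key at run time; `langchain_groq.ChatGroq` reads `GROQ_API_KEY` from the environment by default, though the repository may load it another way (for example from a `.env` file). The hypothetical `preflight` helper below checks that requirement and the Python version before either interface is launched.

```python
from __future__ import annotations

import os
import sys
from collections.abc import Mapping


def preflight(env: Mapping[str, str] | None = None,
              version: tuple[int, int] | None = None) -> list[str]:
    """Return a list of setup problems; an empty list means ready to run."""
    env = os.environ if env is None else env
    version = tuple(sys.version_info[:2]) if version is None else version
    problems = []
    if version < (3, 10):
        problems.append(f"Python 3.10+ required, found {version[0]}.{version[1]}")
    if not env.get("GROQ_API_KEY"):
        problems.append("GROQ_API_KEY is not set (ChatGroq reads it by default)")
    return problems
```

Calling `preflight()` at the top of a launcher script and exiting when it returns a non-empty list gives a clearer error than a failed API call deep inside the chain.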

9. Safety & Guardrails

Source Transparency: All responses cite their origin.

Hallucination Mitigation: Answers limited to retrieved context.

Ethical Usage: Designed for educational purposes; not a substitute for academic research.
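The source-transparency and hallucination-mitigation rules can also be enforced after generation. The sketch below assumes the prompt asks the model to cite numbered passages as [n] (an assumption, not confirmed by the repository): it rejects answers with no citations or with citations to passages that were never retrieved, and otherwise appends the cited source titles. `check_citations` and `finalize` are hypothetical names.

```python
from __future__ import annotations

import re

CITATION = re.compile(r"\[(\d+)\]")
REFUSAL = "I could not produce an answer grounded in the retrieved sources."


def check_citations(answer: str, num_passages: int) -> tuple[bool, list[int]]:
    """Return (ok, cited): ok requires at least one citation, all in range."""
    cited = sorted({int(n) for n in CITATION.findall(answer)})
    ok = bool(cited) and all(1 <= n <= num_passages for n in cited)
    return ok, cited


def finalize(answer: str, sources: list[str]) -> str:
    """Append cited sources to a valid answer, or refuse an ungrounded one."""
    ok, cited = check_citations(answer, len(sources))
    if not ok:
        return REFUSAL
    refs = "\n".join(f"[{n}] {sources[n - 1]}" for n in cited)
    return f"{answer}\n\nSources:\n{refs}"
```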

Explore the full source code, CLI tools, UI, and model configuration on GitHub:

GitHub Repository
