Abstract
Accessing reliable, Kenya-specific information on tea cultivation and regulation is difficult because the relevant sources are fragmented. Farmers and students rely on Wikipedia for general tea knowledge, while the Tea Board of Kenya publishes detailed manuals and quality guidelines in lengthy documents that are hard to search. This project, the Tea Knowledge Assistant, is a question-answering system based on retrieval-augmented generation (RAG) that integrates Wikipedia's comprehensive tea content with official Tea Board of Kenya documents. By embedding and semantically indexing these resources, the system lets users ask questions in natural language and receive concise, cited answers. The assistant provides global context while emphasizing Kenya-specific standards, thereby improving access to critical knowledge for farmers, students, and policymakers.
Methodology
Data Collection
- Wikipedia articles on tea, tea processing, and tea production in Kenya were downloaded.
- Key documents from the Tea Board of Kenya (cultivation manuals, quality requirements, and regulations) were obtained in PDF format.
Preprocessing & Embedding
- Documents were split into text chunks of approximately 1,000 characters with a 200-character overlap between consecutive chunks.
- Embeddings were generated using OpenAI's text-embedding-3-small model.
- Chunks were stored in a ChromaDB vector database for similarity search.
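The chunking step can be sketched as a sliding character window. This is a minimal illustration assuming plain fixed-size windows; the report does not say which splitter was actually used (a library splitter may also respect sentence or paragraph boundaries), and the function name chunk_text is hypothetical. With 1,000-character chunks and a 200-character overlap, each chunk starts 800 characters after the previous one.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into fixed-size character windows that overlap.

    Hypothetical helper: each chunk starts (chunk_size - overlap) characters
    after the previous one, so consecutive chunks share `overlap` characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks
```

For a 2,500-character document this yields three chunks of 1,000, 1,000 and 900 characters. The overlap keeps sentences that straddle a boundary retrievable from at least one chunk. In the described pipeline, each chunk would then be embedded and added to a ChromaDB collection (for example with collection.add(ids=..., documents=..., embeddings=...)).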
Retrieval-Augmented Generation (RAG)
- User queries are embedded with the same model and compared against the stored chunk embeddings using cosine similarity.
- The most relevant chunks (top-k) are retrieved and passed to a GPT model via a structured prompt template.
- The model returns a synthesized answer with citations to the source documents.
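The retrieval and prompting steps above can be sketched in memory, without a vector database. This is an illustrative sketch only: the Chunk type, the top_k and build_prompt helpers and the prompt wording are hypothetical, not the project's actual template. In the real system the vectors would come from text-embedding-3-small (for example via the OpenAI client's embeddings.create call) and the search would be done by ChromaDB.

```python
import math
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    source: str              # document title or file name, used for citation
    embedding: list[float]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat zero vectors as unrelated to everything
    return dot / (norm_a * norm_b)


def top_k(query_embedding: list[float], chunks: list[Chunk], k: int = 4) -> list[Chunk]:
    """Return the k chunks most similar to the query, best first."""
    ranked = sorted(chunks,
                    key=lambda c: cosine_similarity(query_embedding, c.embedding),
                    reverse=True)
    return ranked[:k]


# Hypothetical prompt template; the project's real wording is not published.
PROMPT_TEMPLATE = """You are a tea knowledge assistant. Answer using only the numbered
context passages below. Prefer Tea Board of Kenya guidance where it applies,
and cite passages as [1], [2], ...

Context:
{context}

Question: {question}
Answer:"""


def build_prompt(question: str, retrieved: list[Chunk]) -> str:
    """Number each retrieved chunk and fill in the prompt template."""
    context = "\n\n".join(f"[{i}] ({c.source}) {c.text}"
                          for i, c in enumerate(retrieved, start=1))
    return PROMPT_TEMPLATE.format(context=context, question=question)
```

With toy 3-dimensional vectors A = [1, 0, 0], B = [0, 1, 0] and C = [0.7, 0.7, 0], a query of [0.9, 0.1, 0] ranks the chunks A, C, B. Numbering the passages in the prompt is what lets the model's citations be traced back to specific source documents. Note that ChromaDB collections use L2 distance by default; cosine has to be selected when the collection is created (for example through the "hnsw:space" metadata setting).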
Evaluation
- Test queries included practical questions such as "What are the optimal conditions for tea cultivation in Kenya?" and "What are the Tea Board of Kenya's quality standards for green leaf?"
- Responses were assessed for accuracy, relevance, and citation transparency.
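Part of the citation-transparency check can be automated. The sketch below assumes answers cite numbered passages as [1], [2], ... and that reviewers record a numeric rating per criterion; the rating scale, the field names and both helpers are hypothetical, since the report does not describe how scores were recorded.

```python
import re
from statistics import mean

CITATION_PATTERN = re.compile(r"\[(\d+)\]")


def check_citations(answer: str, num_passages: int) -> dict:
    """Summarize how an answer cites the numbered context passages.

    Returns the cited passage numbers, any numbers that do not match a
    retrieved passage, and whether the answer cites anything at all.
    """
    cited = sorted({int(n) for n in CITATION_PATTERN.findall(answer)})
    invalid = [n for n in cited if not 1 <= n <= num_passages]
    return {"cited": cited, "invalid": invalid, "has_citation": bool(cited)}


def summarize_scores(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average reviewer ratings per criterion across all test queries."""
    criteria = ("accuracy", "relevance", "citation_transparency")
    return {c: mean(r[c] for r in ratings) for c in criteria}
```

An answer that cites [5] when only four passages were retrieved is flagged as invalid, which catches fabricated references. Accuracy and relevance still require human judgment; the helper only aggregates those ratings.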
Results
- The system successfully retrieves and synthesizes both global tea knowledge and Kenya-specific guidelines in real time.
- Farmers can quickly look up standards and guidance (e.g., acceptable plucking intervals and pest management methods).
- Students gain structured answers that bridge academic knowledge and local practice.
- Policymakers and researchers can navigate scattered resources through a single conversational entry point.
- Compared with manual document search, the assistant significantly reduces the time needed to locate information and improves the contextual accuracy of answers.