EDUBOT is an AI-powered chatbot that helps K-12 students study by answering questions from the documents and images they upload. It combines FAISS-based semantic retrieval, the Google Gemini LLM, and document and image processing to produce accurate, structured, educational responses. The system supports multi-file uploads, applies OCR to images, and automatically generates concise study summaries. EDUBOT stays academically focused by strictly filtering non-academic queries, maintaining chat history, and adapting responses to the question type.
Modern students often struggle to extract knowledge efficiently from large amounts of study material. Existing chatbots either lack document understanding or provide generic responses. EDUBOT addresses these gaps by integrating:
Document Retrieval: Semantic search over text embeddings indexed in FAISS.
Generative AI: Google Gemini LLM generates concise, context-aware responses.
Multi-modal Input: Supports PDF, Word, PowerPoint, and Excel files as well as images.
Academic Focus: Strict question-type control (Who, What, When, Why, How) and off-topic filtering.
EDUBOT enhances learning efficiency by providing summarized notes, contextual answers, and visual cues for interactive study sessions.
3.1 Architecture
Input Layer: Accepts multi-file uploads (.txt, .pdf, .docx, .pptx, .xlsx) and images (.jpg, .png).
Preprocessing:
Text extraction from files.
OCR for images using EasyOCR.
Image captioning using BLIP.
Vectorization & Retrieval:
Embedding texts with HuggingFace MiniLM.
Storing embeddings in FAISS vector store.
Semantic retrieval of top-k relevant chunks.
Generative Layer:
Google Gemini LLM produces 4–5 line answers that include examples and applications.
Prompt template enforces question-type rules and fallback strategies.
Memory Management:
ConversationBufferMemory preserves chat history for context.
Output Layer:
Answers displayed with animated chat bubbles and user/bot icons.
Summaries generated for uploaded files/images.
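The vectorization and retrieval step above can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not EDUBOT's implementation: a bag-of-words count vector stands in for the MiniLM embedding, and a brute-force cosine scan stands in for the FAISS index. The names chunk_text, embed, cosine, and top_k are hypothetical, as is the chunk size.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40):
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text):
    """Toy embedding: lowercase word counts (stand-in for a MiniLM vector)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, k=2):
    """Return the k chunks most similar to the query.

    Brute-force scan; a real system would query a FAISS index instead.
    """
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


chunks = [
    "Photosynthesis lets green plants make food from sunlight, water and carbon dioxide.",
    "The French Revolution began in 1789.",
    "Chlorophyll absorbs sunlight during photosynthesis in plants.",
]
results = top_k("How do plants use sunlight in photosynthesis?", chunks, k=2)
```

Here the two photosynthesis chunks are returned and the history chunk is left out. Only the retrieved chunks would then be passed to the generative layer as context.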
3.2 Question-Type Control
Question    Response Focus
Who         Person, group, or entity (background, role, contributions)
What        Definition, meaning, applications
When        Time-related details
Why         Reason or importance
How         Steps, process, explanation
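One way to apply the table above is to detect the leading question word and state the matching focus in the prompt. The sketch below assumes detection by first word, which the source does not specify (the rules may instead live entirely in the prompt template sent to Gemini). FOCUS, question_type, build_prompt, and the fallback focus text are hypothetical names and wording.

```python
# Response focus per question type, taken from the table above.
FOCUS = {
    "who": "Person, group, or entity (background, role, contributions)",
    "what": "Definition, meaning, applications",
    "when": "Time-related details",
    "why": "Reason or importance",
    "how": "Steps, process, explanation",
}

# Assumed fallback for questions that match none of the five types.
DEFAULT_FOCUS = "A direct answer grounded in the context"


def question_type(question):
    """Return the leading question word if it is one of the five types, else None."""
    words = question.strip().lower().split()
    if not words:
        return None
    first = words[0].strip("?,.!:;'\"")
    # Reduce contractions such as "what's" or "who's" to the question word.
    first = first.split("'")[0]
    return first if first in FOCUS else None


def build_prompt(question, context):
    """Assemble a prompt that names the response focus for the detected type."""
    focus = FOCUS.get(question_type(question), DEFAULT_FOCUS)
    return (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        f"Response focus: {focus}\n"
        "Answer in 4-5 lines with an example."
    )


prompt = build_prompt("How do plants make food?", "Plants use sunlight to make glucose.")
```

Keeping the mapping in one dictionary means the table and the prompt cannot drift apart: adding a sixth question type is a one-line change.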
3.3 Academic Filter
Rejects off-topic questions, such as those about movies, jokes, politics, or personal matters.
Resolves follow-ups and vague references by consulting the chat history.
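The two filter behaviours above can be illustrated with a keyword check and a history lookup. This is only a sketch: the source does not say how EDUBOT implements the filter, so the keyword sets, the "REJECT" marker, and the function names are all hypothetical, and a plain list stands in for the conversation memory.

```python
import re

# Hypothetical blocklist; the source names the categories but not the keywords.
OFF_TOPIC = {"movie", "movies", "joke", "jokes", "politics", "celebrity", "celebrities"}
# Hypothetical cues that a question refers back to an earlier turn.
FOLLOW_UP_CUES = {"it", "this", "that", "further", "more"}


def tokens(text):
    """Lowercase alphabetic tokens of the text, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_off_topic(question):
    """True if the question mentions any blocklisted keyword."""
    return bool(tokens(question) & OFF_TOPIC)


def resolve(question, history):
    """Reject off-topic questions; attach the previous question to vague follow-ups."""
    if is_off_topic(question):
        return "REJECT"
    if history and tokens(question) & FOLLOW_UP_CUES:
        return f"{question} (follow-up to: {history[-1]})"
    return question


history = []  # plain list standing in for the chat memory
first = resolve("What is photosynthesis?", history)
history.append(first)
second = resolve("Explain the process further", history)
rejected = resolve("Tell me a joke", history)
```

In this run the follow-up is expanded with the earlier photosynthesis question, and the joke request is rejected before any retrieval or generation would take place. A keyword list is brittle, so a production filter would more plausibly rely on the LLM prompt or a classifier.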
Dataset: Multi-modal K-12 study material (PDF, Word, PowerPoint, and Excel files) together with images that contain text.
Procedure:
Upload documents/images.
All content is embedded with MiniLM and indexed in FAISS for retrieval.
Student queries tested on:
Direct academic questions (What is photosynthesis?)
Follow-ups using context (Explain the process further)
Non-academic queries (jokes, celebrities)
Evaluation Metrics:
BLEU & ROUGE for summaries.
Semantic similarity between retrieved context and generated answers.
User feedback on relevance, clarity, and correctness.
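For reference, ROUGE-L scores a candidate summary by the longest common subsequence (LCS) it shares with a reference summary. The sketch below computes the F1 form with whitespace tokenization; the source does not state which ROUGE variant or tooling produced its scores, so the function names and tokenization are assumptions, and an established evaluation package would normally be used instead.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two token lists."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(reference, candidate):
    """ROUGE-L F1: harmonic mean of LCS-based precision and recall."""
    ref = reference.lower().split()
    cand = candidate.lower().split()
    lcs = lcs_length(ref, cand)
    if lcs == 0:
        return 0.0
    precision = lcs / len(cand)
    recall = lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


score = rouge_l_f1(
    "plants make food using sunlight",
    "plants make their food from sunlight",
)
```

In this example the LCS is "plants make food sunlight" (4 tokens), so precision is 4/6, recall is 4/5, and the F1 score is 8/11, about 0.727.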
Multi-file Handling: Successfully processed PDFs, Word, PPT, Excel files and generated study summaries.
Image Processing: OCR + BLIP captioning produced accurate textual summaries from images.
QA Performance: Correctly followed question-type rules and rejected non-academic queries.
Evaluation Scores:
Metric                 Score
BLEU                   0.72
ROUGE-L                0.68
Semantic similarity    0.81
Sample Interaction:
User: What is photosynthesis?
EDUBOT: Photosynthesis is the process by which green plants use sunlight to synthesize food from carbon dioxide and water. It produces oxygen as a byproduct. The process involves chlorophyll, light energy, and biochemical reactions. For example, in plants, glucose is formed and stored as energy for growth. This process is essential for life on Earth.
EDUBOT demonstrates that multi-modal document processing, semantic retrieval, and generative language models can be integrated into an effective AI-assisted study tool. Key takeaways:
Supports multi-file and image inputs for dynamic learning.
Maintains context-aware chat history to handle follow-ups.
Implements strict academic filtering to focus responses.
Generates concise, 4–5 line study notes with examples.