The LLM Evaluation Chatbot is an interactive web application for evaluating the responses of large language models (LLMs) in real time. Users submit prompts, receive AI-generated responses, and provide feedback through a rating system. By collecting these user evaluations, the system helps analyze and benchmark model performance, which is essential for understanding the effectiveness and alignment of LLMs.
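To make the data flow concrete, here is a hypothetical TypeScript sketch of what one evaluation record and its submission to the backend might look like. The field names and the `/api/evaluations` route are illustrative assumptions, not the actual backend contract:

```typescript
// Hypothetical shape of one user evaluation (field names are assumptions).
interface Evaluation {
  prompt: string;      // the user's submitted prompt
  model: string;       // e.g. "gpt2" or "distilgpt2"
  response: string;    // the model's generated reply
  rating: number;      // user feedback, e.g. 1-5 stars
  createdAt: string;   // ISO timestamp
}

// Hypothetical frontend call submitting a rating to the backend;
// the route name is assumed for illustration.
async function submitEvaluation(evaluation: Evaluation): Promise<void> {
  const res = await fetch("/api/evaluations", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(evaluation),
  });
  if (!res.ok) throw new Error(`Failed to save evaluation: ${res.status}`);
}
```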
Try it on Hugging Face-hosted models like GPT-2, DistilGPT-2, and BERT-base-uncased.
Configure your `.env` settings (detailed below), then set up and run the backend:

```bash
# Clone the repository
git clone https://github.com/PhilJotham14/llm-evaluation-chatbot.git
cd llm-evaluation-chatbot/backend

# Install backend dependencies
npm install

# Run backend server
npx ts-node server.ts
```
```bash
cd ../frontend

# Install frontend dependencies
npm install

# Run frontend app
npm start
```
🌐 Open http://localhost:3000 to use the application.
Example `.env` configuration:

```env
HF_API_URL=https://api-inference.huggingface.co/models/gpt2
```

Alternative models supported: `bert-base-uncased`, `distilgpt2`
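As a sketch of how the backend can call the endpoint configured in `HF_API_URL`, the snippet below posts a prompt to the Hugging Face Inference API. It is a minimal illustration, not the project's actual `server.ts` code; the `HF_API_TOKEN` variable name is an assumption:

```typescript
// Minimal sketch of a text-generation call against HF_API_URL.
// HF_API_TOKEN is an assumed name for your Hugging Face access token.
async function generate(prompt: string): Promise<string> {
  const res = await fetch(process.env.HF_API_URL!, {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.HF_API_TOKEN}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ inputs: prompt }),
  });
  if (!res.ok) throw new Error(`Hugging Face API error: ${res.status}`);
  const data = await res.json();
  // Generation models (gpt2, distilgpt2) return [{ generated_text: "..." }];
  // fill-mask models such as bert-base-uncased return a different shape.
  return Array.isArray(data) ? data[0]?.generated_text ?? "" : String(data);
}
```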
The LLM Evaluation Chatbot contributes to agentic AI research by enabling human-in-the-loop feedback on LLM outputs. Through collected evaluations, the system aids in understanding model alignment, helpfulness, and user satisfaction. This facilitates creating more aligned, human-centered AI systems, which is a core goal of agentic AI innovation.
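As a simple illustration of how the collected ratings could be turned into a per-model signal, the sketch below averages ratings by model. It is illustrative only and assumes the hypothetical record shape sketched earlier:

```typescript
// Hypothetical rating record (matches the Evaluation sketch above).
type Rated = { model: string; rating: number };

// Aggregate user ratings into a simple per-model average.
function averageRatingByModel(evals: Rated[]): Map<string, number> {
  const sums = new Map<string, { total: number; count: number }>();
  for (const e of evals) {
    const s = sums.get(e.model) ?? { total: 0, count: 0 };
    s.total += e.rating;
    s.count += 1;
    sums.set(e.model, s);
  }
  return new Map([...sums].map(([model, s]) => [model, s.total / s.count]));
}
```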
- **Model Customization:** Easily switch models for better or different evaluations.
- **Future Enhancements:** Potential to integrate larger models (e.g., GPT-4) and alignment datasets.
- **User Data Privacy:** Ratings and prompts are stored locally (SQLite); see the storage sketch below.
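A minimal sketch of what the local SQLite storage could look like, assuming a driver such as `better-sqlite3`; the table and column names are hypothetical, not the project's actual schema:

```typescript
import Database from "better-sqlite3";

// Hypothetical local store; table and column names are assumptions.
const db = new Database("evaluations.db");
db.exec(`
  CREATE TABLE IF NOT EXISTS evaluations (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    prompt     TEXT NOT NULL,
    model      TEXT NOT NULL,
    response   TEXT NOT NULL,
    rating     INTEGER NOT NULL CHECK (rating BETWEEN 1 AND 5),
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
  )
`);

// Insert one evaluation record.
db.prepare(
  "INSERT INTO evaluations (prompt, model, response, rating) VALUES (?, ?, ?, ?)"
).run("Explain transformers.", "gpt2", "Transformers are...", 4);
```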
📌 Repository
🔗 GitHub Repo: https://github.com/PhilJotham14/llm-evaluation-chatbot
For collaboration or inquiries:
📧 Email: p.jothamokiror@gmail.com