AI-Powered Production-Grade GitHub Repository Intelligence System

📖 ABSTRACT

This project presents a production-grade, AI-powered GitHub Repository Intelligence System built on a modern full-stack architecture. It uses LangGraph for multi-agent AI workflows, FastAPI for backend API services, Streamlit for interactive visualization, and PostgreSQL hosted on Supabase for persistent cloud storage. The platform automatically:

- 🔍 Analyzes GitHub repositories
- 📄 Extracts documentation structure
- 🧠 Evaluates repository quality
- 💬 Provides conversational AI interaction
- 💾 Maintains persistent user sessions

The architecture follows a scalable, SaaS-oriented design that integrates authentication, persistent storage, a real-time chat interface, and deployment-ready infrastructure.

🌍 INTRODUCTION

Modern software repositories contain large amounts of documentation, source code, metadata, and development history, making manual evaluation increasingly difficult. This project introduces an AI-driven platform that automates repository analysis using a LangGraph multi-agent orchestration pipeline integrated with a cloud-based SaaS architecture. Key capabilities include repository analysis, metadata extraction, conversational interaction, persistent session management, and intelligent documentation evaluation.

🎯 OBJECTIVES

The major objectives of this project are:

Automated GitHub repository analysis
Documentation quality evaluation
AI-generated repository summaries
Persistent conversational interaction
SaaS deployment architecture

🌐 Live Deployment

This project is deployed using Render with separate frontend and backend services.

Frontend (Streamlit UI):
https://ai-github-intelligence-system-front-end.onrender.com
Backend (FastAPI):
https://ai-github-intelligence-system.onrender.com

⚠️ Important: Backend Cold Start (Render Free Tier)

The backend is hosted on Render’s free tier, which means it may go to sleep after periods of inactivity.

When this happens:

The first request may take 30–60 seconds to respond
You may see messages like:
- “Server is starting or temporarily unavailable”
- 502 Bad Gateway
- Request timeout errors

🔄 How to Wake the Backend

If the backend is asleep, open this URL in your browser to wake it up:

https://ai-github-intelligence-system.onrender.com

Once the first request completes, the backend is running and subsequent requests respond quickly.

💡 Why This Happens

Render free services spin down after periods of inactivity to save resources. This is expected behavior and not a bug in the application.

🚀 Recommended Usage

For the best experience:

Open the backend URL first (to wake it up)
Then open the frontend UI
Use the application normally
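The wake-up step can also be scripted by polling the backend until it answers. The sketch below assumes only that a GET request to the backend URL returns some HTTP response once the service is up; the function name `wait_for_backend`, the attempt count, and the backoff delays are illustrative choices, not part of the project.

```python
import time
import urllib.error
import urllib.request
from typing import Callable

BACKEND_URL = "https://ai-github-intelligence-system.onrender.com"


def _http_get(url: str, timeout: float) -> int:
    """Return the HTTP status of a GET request, including error statuses."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code  # the server answered, even if with 4xx/5xx


def wait_for_backend(
    url: str = BACKEND_URL,
    attempts: int = 5,
    base_delay: float = 5.0,
    timeout: float = 30.0,
    fetch: Callable[[str, float], int] = _http_get,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll url with exponential backoff until it answers with a non-5xx status.

    A 404 still counts as awake: the server is up even if "/" has no route.
    Returns False if every attempt fails.
    """
    for attempt in range(attempts):
        try:
            if fetch(url, timeout) < 500:
                return True
        except OSError:
            pass  # connection refused, DNS failure, or timeout while spinning up
        if attempt < attempts - 1:
            sleep(base_delay * 2 ** attempt)  # 5 s, 10 s, 20 s, ...
    return False
```

Calling `wait_for_backend()` before opening the frontend waits through a typical 30–60 second cold start; a 502 from Render's proxy is treated as "still starting" and retried. The `fetch` and `sleep` parameters exist so the polling logic can be tested without network access.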

📦 Installation (Local Development)

1. Clone repository

git clone https://github.com/Electrobello1/AI-Powered-Production-Grade-GitHub-Repository-Intelligence-System.git
cd AI-Powered-Production-Grade-GitHub-Repository-Intelligence-System

2. Create virtual environment

python -m venv venv
venv\Scripts\activate

On macOS or Linux, activate the environment with:

source venv/bin/activate

3. Install dependencies

pip install -r requirements.txt

4. Configure environment variables

DATABASE_URL=your_supabase_postgres_url
SECRET_KEY=your_jwt_secret
REFRESH_SECRET_KEY=your_refresh_secret
OLLAMA_API_Key=your_API_Key

5. Run backend

uvicorn main:app --reload

6. Run frontend

streamlit run app.py
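Before starting either service, it can help to fail fast when a variable from step 4 is missing. A minimal sketch, assuming the backend reads these four names directly from the process environment (how the project actually loads them, for example from a `.env` file, is not stated here); `Settings` and `load_settings` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Names copied from the configuration step above.
REQUIRED_VARS = ("DATABASE_URL", "SECRET_KEY", "REFRESH_SECRET_KEY", "OLLAMA_API_Key")


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    refresh_secret_key: str
    ollama_api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read required settings, raising one error that lists every missing name."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        database_url=env["DATABASE_URL"],
        secret_key=env["SECRET_KEY"],
        refresh_secret_key=env["REFRESH_SECRET_KEY"],
        ollama_api_key=env["OLLAMA_API_Key"],
    )
```

Reporting all missing names at once avoids the fix-one-restart-repeat cycle that a plain `os.environ["..."]` lookup would cause.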

📊 Example Output

{
  "title": "Flask Chatbot System",
  "summary": "A chatbot built using Flask and LLMs",
  "stars": 9,
  "forks": 6,
  "tags": ["flask", "chatbot", "api"],
  "quality_score": 3,
  "confidence": 0.87,
  "status": "pass"
}
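A client consuming this payload may want to check its shape before rendering it. The sketch below validates the fields shown in the example using only the standard library. The field types are inferred from this single example; the range of `quality_score`, the possible `status` values besides `"pass"`, and the assumption that `confidence` lies in [0, 1] are not documented and are guesses.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class RepoAnalysis:
    title: str
    summary: str
    stars: int
    forks: int
    tags: List[str]
    quality_score: int  # scale not documented; only type and sign are checked
    confidence: float   # assumed to be a probability-like value in [0, 1]
    status: str         # only "pass" appears in the example


def parse_analysis(raw: str) -> RepoAnalysis:
    """Parse one analysis result and check field types and basic ranges."""
    data = json.loads(raw)
    result = RepoAnalysis(**data)  # raises TypeError on missing or unexpected keys
    for name in ("title", "summary", "status"):
        if not isinstance(getattr(result, name), str):
            raise ValueError(f"{name} must be a string")
    for name in ("stars", "forks", "quality_score"):
        value = getattr(result, name)
        if isinstance(value, bool) or not isinstance(value, int) or value < 0:
            raise ValueError(f"{name} must be a non-negative integer, got {value!r}")
    if not isinstance(result.tags, list) or not all(isinstance(t, str) for t in result.tags):
        raise ValueError("tags must be a list of strings")
    conf = result.confidence
    if isinstance(conf, bool) or not isinstance(conf, (int, float)) or not 0.0 <= conf <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {conf!r}")
    return result
```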

🏗️ Methodology

⚙️ System Architecture

The platform follows a layered full-stack architecture.

Frontend (Streamlit UI)
           ↓
FastAPI Backend (Auth + API Layer)
           ↓
LangGraph Multi-Agent System
           ↓
GitHub API + LLM
           ↓
Supabase PostgreSQL
           ↓
Render Deployment

🧠 Multi-Agent Framework

The intelligence layer is implemented using LangGraph, enabling specialized agents to collaborate during repository analysis.

🤖 Agent Roles

Agent	Responsibility
📄 Content Agent	README summarization and content extraction
🏷️ Metadata Agent	Repository metadata and keyword extraction
🏛️ Structure Agent	Documentation structure validation
📊 Quality Agent	Repository quality scoring
🧾 Reviewer Agent	Aggregation and final decision making
💬 LLM Agent	Conversational repository interaction

🔄 Workflow Pipeline

GitHub Repo URL
        ↓
Analyzer Agent
        ↓
Parallel Agent Execution
├── Content Agent
├── Metadata Agent
├── Structure Agent
└── Quality Agent
         ↓
Reviewer Agent
         ↓
LLM Interaction Layer
         ↓
Persistent Database Storage
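The fan-out/fan-in shape of this pipeline can be illustrated without LangGraph. In the sketch below, plain Python functions stand in for the agents and a thread pool stands in for LangGraph's parallel branch execution. Every agent body, the `EXPECTED_SECTIONS` checklist, and the reviewer's pass threshold are hypothetical placeholders; the real agents presumably call the GitHub API and an LLM. Each agent returns only its own keys, and the reviewer runs after all four results have been merged, mirroring the diagram.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
EXPECTED_SECTIONS = ("installation", "usage", "license")  # hypothetical checklist


def analyzer(state: State) -> State:
    # Split "https://github.com/<owner>/<repo>" into its last two path parts.
    owner, repo = state["repo_url"].rstrip("/").split("/")[-2:]
    return {"owner": owner, "repo": repo}


def content_agent(state: State) -> State:
    lines = [line for line in state["readme"].splitlines() if line.strip()]
    return {"summary": lines[0].lstrip("#").strip() if lines else ""}


def metadata_agent(state: State) -> State:
    return {"tags": [part.lower() for part in state["repo"].split("-") if part]}


def structure_agent(state: State) -> State:
    headings = [line.lstrip("#").strip().lower()
                for line in state["readme"].splitlines() if line.startswith("#")]
    return {"headings": headings}


def quality_agent(state: State) -> State:
    text = state["readme"].lower()
    return {"quality_score": sum(section in text for section in EXPECTED_SECTIONS)}


def reviewer(state: State) -> State:
    # Hypothetical rule: pass when at least two expected sections are present.
    return {"status": "pass" if state["quality_score"] >= 2 else "fail"}


PARALLEL_AGENTS: List[Callable[[State], State]] = [
    content_agent, metadata_agent, structure_agent, quality_agent,
]


def run_pipeline(repo_url: str, readme: str) -> State:
    state: State = {"repo_url": repo_url, "readme": readme}
    state.update(analyzer(state))
    snapshot = dict(state)  # every parallel agent reads the same input
    with ThreadPoolExecutor(max_workers=len(PARALLEL_AGENTS)) as pool:
        futures = [pool.submit(agent, snapshot) for agent in PARALLEL_AGENTS]
        for future in futures:
            state.update(future.result())  # fan-in: merge each agent's keys
    state.update(reviewer(state))
    return state
```

In LangGraph itself, the same shape is typically built by adding edges from the analyzer node to each agent node and from each agent node to the reviewer node. As in this sketch, parallel branches should write distinct state keys; keys written by several branches in the same step need a reducer on the state schema.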

⚡ Backend Engineering

The backend is implemented using FastAPI and provides RESTful API endpoints for authentication, repository analysis, session management, and chat interaction.

🔒 Authentication System

The application uses JWT-based authentication with protected API routes, refresh token rotation, and user-specific data isolation.
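The sketch below illustrates the token flow: HS256 access and refresh tokens signed with separate secrets, plus a rotation step that makes each refresh token single-use. It assumes, from their names, that `SECRET_KEY` signs access tokens and `REFRESH_SECRET_KEY` signs refresh tokens. It uses only the standard library for illustration; a production service would normally rely on a maintained JWT library, and the project's actual lifetimes, claim names, and revocation storage are not stated here. All names, including the in-memory `_used_refresh_ids` set, are hypothetical.

```python
import base64
import hashlib
import hmac
import json
import time
import uuid
from typing import Any, Dict, Optional, Set, Tuple

# Placeholders: load the real values from SECRET_KEY / REFRESH_SECRET_KEY.
ACCESS_SECRET = "change-me-access"
REFRESH_SECRET = "change-me-refresh"
ACCESS_TTL = 15 * 60          # hypothetical lifetimes, in seconds
REFRESH_TTL = 7 * 24 * 3600
_used_refresh_ids: Set[str] = set()  # hypothetical; a real system would persist this


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: str, secret: str) -> bytes:
    return hmac.new(secret.encode(), signing_input.encode(), hashlib.sha256).digest()


def encode_jwt(claims: Dict[str, Any], secret: str) -> str:
    header = _b64url(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url(json.dumps(claims).encode())
    signing_input = f"{header}.{payload}"
    return signing_input + "." + _b64url(_sign(signing_input, secret))


def decode_jwt(token: str, secret: str, now: Optional[float] = None) -> Dict[str, Any]:
    """Verify algorithm, signature, and expiry; raise ValueError on any failure."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, sig_b64 = parts
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = _sign(f"{header_b64}.{payload_b64}", secret)
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("invalid signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] <= (time.time() if now is None else now):
        raise ValueError("token expired")
    return claims


def issue_tokens(user_id: str, now: Optional[float] = None) -> Tuple[str, str]:
    """Return a fresh (access_token, refresh_token) pair for user_id."""
    now = time.time() if now is None else now
    access = encode_jwt({"sub": user_id, "type": "access", "exp": now + ACCESS_TTL}, ACCESS_SECRET)
    refresh = encode_jwt(
        {"sub": user_id, "type": "refresh", "jti": uuid.uuid4().hex, "exp": now + REFRESH_TTL},
        REFRESH_SECRET,
    )
    return access, refresh


def rotate_refresh_token(refresh_token: str, now: Optional[float] = None) -> Tuple[str, str]:
    """Exchange a refresh token for a new pair; each refresh token works only once."""
    claims = decode_jwt(refresh_token, REFRESH_SECRET, now)
    if claims.get("type") != "refresh" or claims.get("jti") in _used_refresh_ids:
        raise ValueError("refresh token rejected")
    _used_refresh_ids.add(claims["jti"])
    return issue_tokens(claims["sub"], now)
```

Using a separate secret for refresh tokens means an access token can never be presented as a refresh token, because its signature will not verify under the other key.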

💾 Database Design

The system uses Supabase PostgreSQL for persistent storage and session management.

🗄️ Database Relationships

Users
  ↓
Sessions
  ↓
Messages
  ↓
Repository Analysis Results

The database supports persistent chat history, repository tracking, and session recovery.
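One possible reading of this hierarchy as tables is sketched below, with each level holding a foreign key to its parent. It uses Python's built-in `sqlite3` so it runs anywhere; on Supabase PostgreSQL the same structure would more likely use types such as `uuid`, `timestamptz`, and `jsonb`. All table and column names are assumptions, and attaching analysis results to a session rather than to an individual message is a guess that the diagram above does not settle.

```python
import sqlite3

SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE sessions (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
    repo_url TEXT,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE messages (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    role TEXT NOT NULL CHECK (role IN ('user', 'assistant')),
    content TEXT NOT NULL,
    created_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE analysis_results (
    id INTEGER PRIMARY KEY,
    session_id INTEGER NOT NULL REFERENCES sessions(id) ON DELETE CASCADE,
    result_json TEXT NOT NULL
);
"""


def connect(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def session_history(conn: sqlite3.Connection, user_id: int, session_id: int):
    """Return a session's messages in order, but only if the session belongs to user_id."""
    return conn.execute(
        """SELECT m.role, m.content FROM messages m
           JOIN sessions s ON s.id = m.session_id
           WHERE s.id = ? AND s.user_id = ?
           ORDER BY m.id""",
        (session_id, user_id),
    ).fetchall()
```

Filtering on both `session_id` and `user_id` in the same query is one simple way to get the per-user data isolation described earlier: a user who guesses another user's session id gets an empty history rather than someone else's messages.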

🎨 Frontend Design

The frontend is built using Streamlit and provides:

Repository analytics dashboard
Real-time chat interface
Sidebar session navigation
Metrics visualization

Displayed metrics include GitHub stars, forks, quality scores, and confidence scores.

🧪 EXPERIMENT

⚙️ Experimental Setup

The system was deployed on cloud infrastructure and evaluated against public GitHub repositories.

☁️ Deployment Environment

Component	Technology
🎨 Frontend	Streamlit
⚡ Backend	FastAPI
🧠 AI Framework	LangGraph
🗄️ Database	Supabase PostgreSQL
☁️ Deployment	Render
🔐 Authentication	JWT
🤖 LLM Integration	Ollama / Cloud APIs

🧪 Testing and Validation

To ensure reliability, scalability, and production readiness, the system underwent multiple layers of testing across the backend architecture and AI workflow pipeline.

🚀 End-to-End (E2E) API Testing

Comprehensive API testing was carried out using pytest and FastAPI’s TestClient to validate the complete request-response lifecycle of the platform.

These tests verified:

User authentication and authorization
Protected route access using JWT tokens
Repository analysis endpoint functionality
Error handling and validation responses
Session persistence and retrieval
Chat memory and conversational continuity

The E2E tests simulated real user interactions with the API to ensure the backend behaves correctly under production-like conditions.

🔗 Integration Testing (Graph Workflow Testing)

Integration testing was performed on the agentic workflow graph to validate communication and interoperability between interconnected AI components.

The tests ensured correct interaction between:

Repository ingestion modules
LLM enrichment agents
Memory and session management layers
Summarization pipelines
Review and scoring systems
Context-aware chat components

These tests verified that data flowed correctly across the graph-based architecture and that chained agent operations produced stable and coherent outputs.

🛠 Unit Testing (Tool-Level Validation)

Unit tests were conducted on individual tools and backend utility functions to ensure isolated component correctness.

This included testing for:

Summary extraction
Quality scoring
Repository parsing functions
Tag extraction

The unit testing process improved maintainability and reduced the likelihood of regression errors during future feature expansion.
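As an illustration of the tool-level tests described above, the sketch below pairs a small repository-URL parser with pytest-style test functions. `parse_repo_url` and `api_url` are hypothetical stand-ins for the project's own parsing helpers, whose names and behavior are not shown here. The `https://api.github.com/repos/{owner}/{repo}` form is GitHub's public REST endpoint for repository metadata. The tests are plain functions with bare `assert` statements, which is the form pytest collects, so they also run without pytest.

```python
import re
from typing import Tuple

_GITHUB_URL = re.compile(
    r"^(?:https?://)?(?:www\.)?github\.com/([A-Za-z0-9-]+)/([A-Za-z0-9._-]+?)(?:\.git)?/?$"
)


def parse_repo_url(url: str) -> Tuple[str, str]:
    """Return (owner, repo) from a GitHub repository URL, or raise ValueError."""
    match = _GITHUB_URL.match(url.strip())
    if not match:
        raise ValueError(f"not a GitHub repository URL: {url!r}")
    return match.group(1), match.group(2)


def api_url(url: str) -> str:
    """Build the GitHub REST API endpoint for the repository's metadata."""
    owner, repo = parse_repo_url(url)
    return f"https://api.github.com/repos/{owner}/{repo}"


def test_parses_plain_https_url():
    assert parse_repo_url("https://github.com/octocat/Hello-World") == ("octocat", "Hello-World")


def test_strips_git_suffix_and_trailing_slash():
    assert parse_repo_url("https://github.com/octocat/Hello-World.git") == ("octocat", "Hello-World")
    assert parse_repo_url("github.com/octocat/Hello-World/") == ("octocat", "Hello-World")


def test_rejects_non_github_urls():
    try:
        parse_repo_url("https://gitlab.com/octocat/Hello-World")
    except ValueError:
        return
    raise AssertionError("expected ValueError")


def test_builds_api_url():
    expected = "https://api.github.com/repos/octocat/Hello-World"
    assert api_url("https://github.com/octocat/Hello-World") == expected
```

Running `pytest` on a file containing these functions collects the four `test_` functions. Keeping the parser free of network calls is what makes it cheap to test in isolation.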

⚙️ Testing Frameworks and Utilities

The testing infrastructure leveraged the following technologies:

pytest
FastAPI TestClient
PostgreSQL
JWT authentication testing
Mock request validation
Structured API response verification

This multi-layered testing strategy is designed to keep the platform robust, secure, and production-ready as new features are added, and it supports continuous development and deployment workflows.

📈 Results

The developed system successfully demonstrated automated repository analysis, metadata extraction, AI-powered summarization, documentation evaluation, and persistent conversational interaction.

The multi-agent architecture improved scalability, modularity, and intelligent decision-making while supporting production-grade SaaS deployment capabilities.

[Screenshot: Login page]
[Screenshot: Chat page]
[Screenshot: Swagger UI]
[Screenshot: Supabase]
[Screenshot: LLM guardrail]
[Screenshot: End-to-end test (API)]
[Screenshot: Integration test (graph)]
[Screenshot: Unit test (tools)]

🏁 Conclusion

This project demonstrates the successful integration of multi-agent AI systems, modern backend engineering, and cloud deployment infrastructure to create a production-grade GitHub Repository Intelligence Platform.

The combination of LangGraph, FastAPI, Streamlit, and Supabase PostgreSQL provides a scalable ecosystem for intelligent repository analytics and AI-assisted developer tooling.