The AI Project Publication Assistant is a multi-agent system designed to help developers enhance the presentation and discoverability of their AI/ML projects on platforms like GitHub. The system uses a combination of language models, intelligent agents, and external tools to analyze repositories, extract meaningful metadata, and provide actionable suggestions for improvement.
This tool can be applied in real-world scenarios such as preparing a new repository for publication or optimizing an existing one for better discoverability.

The assistant's implementation is organized into three main modules: `tools.py`, `agents.py`, and `workflow.py`. Each module serves a specific purpose in the system architecture. All of these modules use a local Qwen chat model as the LLM: the `LocalQwenChat` wrapper around the Qwen-7B model, which is defined in the `qwen_chat.py` file.
The `tools.py` module defines the tools used by the agents in the LangGraph workflow:
```python
@tool
def git_clone_tool(repo_url: str, target_dir: str = "cloned_repo") -> dict:
    """Clones a GitHub repository into a local directory."""
    try:
        if not repo_url.startswith(("http://", "https://")):
            return {"status": "error", "message": "Invalid URL: must start with http:// or https://"}

        if os.path.exists(target_dir):
            os.system(f"rm -rf {target_dir}")

        print(f"Cloning {repo_url}...")
        Repo.clone_from(repo_url, target_dir)
        return {"status": "success", "repo_path": target_dir}

    except InvalidGitRepositoryError:
        return {"status": "error", "message": "The URL is not a valid Git repository."}
    except GitCommandError as e:
        return {"status": "error", "message": f"Git command failed: {e.stderr.strip()}"}
    except Exception as e:
        return {"status": "error", "message": f"Unexpected error: {str(e)}"}
```
```python
@tool
def repo_reader_tool(repo_path: str) -> dict:
    """Reads README and other key files from a given repository path."""
    content = {}
    for root, dirs, files in os.walk(repo_path):
        for file in files:
            if file.lower() in ["readme.md", "readme", "description.md"]:
                with open(os.path.join(root, file), "r", encoding="utf-8") as f:
                    content[file] = f.read()
    return {"repo_content": content}
```
```python
@tool
def keyword_extractor_tool(text: str) -> list:
    """Extracts keywords from input text."""
    import nltk
    from nltk.corpus import stopwords
    from collections import Counter

    try:
        nltk.data.find('tokenizers/punkt')
        nltk.data.find('corpora/stopwords')
    except LookupError:
        nltk.download(['punkt', 'stopwords'])

    words = nltk.word_tokenize(text.lower())
    stop_words = set(stopwords.words("english"))
    filtered_words = [word for word in words if word.isalnum() and word not in stop_words]
    common_words = Counter(filtered_words).most_common(10)
    return [word for word, count in common_words]
```
All tools follow the LangChain tool interface standard, making them easily integrable with LangGraph nodes and other LangChain components. The tools combine Git operations, text processing, and natural language analysis to support the multi-agent system's functionality.
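As a quick illustration, the tools can also be invoked directly through that same interface, outside of the graph. The snippet below is a hypothetical standalone usage with a placeholder repository URL:

```python
from tools import git_clone_tool, repo_reader_tool, keyword_extractor_tool

# Placeholder URL for illustration only.
result = git_clone_tool.invoke({"repo_url": "https://github.com/user/example-repo"})
if result["status"] == "success":
    files = repo_reader_tool.invoke({"repo_path": result["repo_path"]})
    # The keyword extractor takes plain text, so the content dict is stringified
    # here in the same way the workflow node does.
    keywords = keyword_extractor_tool.invoke(str(files["repo_content"]))
    print(keywords)
```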
The `agents.py` module contains the implementation of all specialized agents:
```python
class RepoAnalyzerAgent:
    def __init__(self):
        self.llm = LocalQwenChat()

    def analyze(self, repo_content):
        prompt = ChatPromptTemplate.from_template(
            "Analyze the following repository content:\n{repo_content}\n\n"
            "Identify key components such as project goals, features, missing sections, etc."
        )
        chain = prompt | self.llm
        return chain.invoke({"repo_content": repo_content})
```
Key aspects of the `RepoAnalyzerAgent`:

* Uses a `LocalQwenChat` LLM instance for local model inference.
* Takes the repository content (the output of `read_repo_files`) as input.
* Uses the pipe operator (`|`) to create a simple LLM chain.

```python
class MetadataRecommenderAgent:
    def __init__(self):
        self.llm = LocalQwenChat()

    def recommend(self, keywords):
        prompt = ChatPromptTemplate.from_template(
            "Given these keywords: {keywords}, suggest relevant tags, categories, and keywords "
            "for an AI/ML project on GitHub."
        )
        chain = prompt | self.llm
        return chain.invoke({"keywords": ", ".join(keywords)})
```
Key aspects of the `MetadataRecommenderAgent`:

* Uses a `LocalQwenChat` LLM instance for local model inference.
* Uses the pipe operator (`|`) to create a simple LLM chain.

```python
class ContentImproverAgent:
    def __init__(self):
        self.llm = LocalQwenChat()

    def improve(self, analysis_report):
        prompt = ChatPromptTemplate.from_template(
            "Improve the clarity and presentation of the following analysis report:\n{analysis}\n\n"
            "Rewrite it in a more professional and readable format suitable for publication."
        )
        chain = prompt | self.llm
        return chain.invoke({"analysis": analysis_report.content})
```
Key aspects of the `ContentImproverAgent`:

* Uses a `LocalQwenChat` LLM instance for local model inference.
* Uses the pipe operator (`|`) to create a simple LLM chain.

The `workflow.py` module implements the LangGraph-based orchestration logic for the multi-agent system:
```python
from langgraph.graph import StateGraph, END, START
from typing import Dict, Any
from tools import git_clone_tool, repo_reader_tool, keyword_extractor_tool
from agents import RepoAnalyzerAgent, MetadataRecommenderAgent, ContentImproverAgent
import os
import shutil


class PublicationAssistantState(Dict):
    repo_url: str
    repo_path: str
    repo_content: str
    keywords: list
    analysis_report: Any
    metadata_suggestions: Any
    final_report: Any
    error: str
```
```python
def clone_repo(state: PublicationAssistantState):
    """Clones GitHub repository using git_clone_tool"""
    result = git_clone_tool.invoke({"repo_url": state["repo_url"]})
    if result["status"] == "success":
        return {"repo_path": result["repo_path"]}
    else:
        return {"error": result["message"]}


def read_repo(state: PublicationAssistantState):
    """Reads repository files using repo_reader_tool"""
    result = repo_reader_tool.invoke({"repo_path": state["repo_path"]})
    state["repo_content"] = result.get("repo_content", {})
    return {"repo_content": state["repo_content"]}


def extract_keywords(state: PublicationAssistantState):
    """Extracts keywords from repository content"""
    content = str(state["repo_content"])
    result = keyword_extractor_tool.invoke(content)
    state["keywords"] = result
    return {"keywords": result}


def analyze_repo(state: PublicationAssistantState):
    """Analyzes repository content with RepoAnalyzerAgent"""
    agent = RepoAnalyzerAgent()
    analysis = agent.analyze(state["repo_content"])
    state["analysis_report"] = analysis
    return {"analysis_report": analysis}


def recommend_metadata(state: PublicationAssistantState):
    """Recommends metadata based on extracted keywords"""
    agent = MetadataRecommenderAgent()
    suggestions = agent.recommend(state["keywords"])
    state["metadata_suggestions"] = suggestions
    return {"metadata_suggestions": suggestions}


def improve_content(state: PublicationAssistantState):
    """Improves the analysis report with ContentImproverAgent"""
    agent = ContentImproverAgent()
    improved_report = agent.improve(state["analysis_report"])
    state["final_report"] = improved_report
    return {"final_report": improved_report}


def cleanup_repo(state: PublicationAssistantState):
    """Cleans up cloned repository after processing"""
    repo_path = state.get("repo_path")
    if repo_path and os.path.exists(repo_path):
        try:
            shutil.rmtree(repo_path)
            print(f"Successfully deleted: {repo_path}")
        except Exception as e:
            print(f"Failed to delete {repo_path}: {str(e)}")
    else:
        print("No repository path found or already deleted.")
    return {}


def handle_error(state: PublicationAssistantState):
    """Handles errors during execution"""
    print(f"Error occurred: {state.get('error', 'Unknown error')}")
    return {}


def route_after_clone(state: PublicationAssistantState):
    """Conditional router after cloning repository"""
    if "error" in state:
        return "error_node"
    else:
        return "read_repo"
```
```python
workflow = StateGraph(PublicationAssistantState)

# Add Nodes
workflow.add_node("clone_repo", clone_repo)
workflow.add_node("read_repo", read_repo)
workflow.add_node("extract_keywords", extract_keywords)
workflow.add_node("analyze_repo", analyze_repo)
workflow.add_node("recommend_metadata", recommend_metadata)
workflow.add_node("improve_content", improve_content)
workflow.add_node("cleanup_repo", cleanup_repo)
workflow.add_node("handle_error", handle_error)

# Conditional edge
workflow.add_conditional_edges("clone_repo", route_after_clone, {
    "read_repo": "read_repo",
    "error_node": "handle_error"
})

# Set Entry Point
workflow.set_entry_point("clone_repo")

# Define Edges
workflow.add_edge("read_repo", "extract_keywords")
workflow.add_edge("extract_keywords", "analyze_repo")
workflow.add_edge("analyze_repo", "recommend_metadata")
workflow.add_edge("recommend_metadata", "improve_content")
workflow.add_edge("improve_content", "cleanup_repo")
workflow.add_edge("cleanup_repo", END)
workflow.add_edge("handle_error", END)

# Compile Graph
app = workflow.compile()
```
The graph follows a sequential execution pattern with conditional branching: clone_repo → read_repo → extract_keywords → analyze_repo → recommend_metadata → improve_content → cleanup_repo, with a conditional branch from clone_repo to handle_error when cloning fails.
The graph handles errors gracefully: the conditional edge after cloning routes failed runs to the error-handling node, while the cleanup node ensures the cloned repository is removed once the processing pipeline completes.
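For reference, driving the compiled graph looks roughly like the following minimal sketch (the repository URL is a placeholder, and the actual app.py may differ):

```python
from workflow import app  # the compiled LangGraph application

# Placeholder URL; in the real assistant the user is prompted for it.
final_state = app.invoke({"repo_url": "https://github.com/user/example-repo"})

if "error" in final_state:
    print("Run failed:", final_state["error"])
else:
    print(final_state["final_report"].content)           # improved analysis report
    print(final_state["metadata_suggestions"].content)   # recommended tags and keywords
```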
LangGraph provides several advantages for this implementation, including a single shared state object, declarative node and edge definitions, and conditional routing for error handling.
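One concrete example: the compiled graph can be inspected and rendered, which is handy for verifying the wiring defined above (this assumes a reasonably recent langgraph / langchain-core version):

```python
# Print a Mermaid diagram of the compiled workflow, useful for documentation
# and for sanity-checking the node/edge structure.
print(app.get_graph().draw_mermaid())
```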
The `qwen_chat.py` module implements a custom wrapper for the Qwen-7B language model, enabling its integration with LangChain's chat model interface.
```python
class LocalQwenChat(BaseChatModel):
    # Using Qwen-7B model
    model_name: str = "Qwen/Qwen-7B-Chat"
    device: str = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer: Any = None
    model: Any = None
```
Key aspects of this class:

* Extends LangChain's `BaseChatModel` to provide a standardized interface.
* Uses the Qwen-7B chat model (`Qwen/Qwen-7B-Chat`) and selects the GPU automatically when one is available.

```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, trust_remote_code=True)
    self.model = AutoModelForCausalLM.from_pretrained(
        self.model_name,
        trust_remote_code=True,
        device_map={"": self.device},
        bf16=True
    ).eval()
```
```python
def _generate(self, messages, stop=None, run_manager=None, **kwargs) -> ChatResult:
    prompt = ""
    for message in messages:
        if isinstance(message, SystemMessage):
            prompt += f"System: {message.content}\n"
        elif isinstance(message, HumanMessage):
            prompt += f"User: {message.content}\n"
        elif isinstance(message, AIMessage):
            prompt += f"Assistant: {message.content}\n"
    prompt += "Assistant:"

    inputs = self.tokenizer(prompt, return_tensors="pt")
    inputs_on_device = {
        key: value.to(self.device) for key, value in inputs.items()
    }
    outputs = self.model.generate(**inputs_on_device, max_new_tokens=2048)
    response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response[len(prompt):]

    generation = ChatGeneration(message=AIMessage(content=response))
    return ChatResult(generations=[generation])
```
The `_generate` method processes input messages and generates responses: it concatenates the system, user, and assistant messages into a single prompt, runs the model on the selected device, strips the prompt from the decoded output, and returns a `ChatResult` object with the response.

```python
@property
def _llm_type(self) -> str:
    return "local_qwen_chat"
```
The `_llm_type` property identifies this as a "local_qwen_chat" type LLM for LangChain integration purposes.

This implementation provides several benefits.
By encapsulating the model-specific logic within this wrapper, the rest of the application can interact with Qwen-7B using standard LangChain patterns without needing to handle low-level model operations.
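For example, the wrapper can be used like any other LangChain chat model. The snippet below is a small sketch; the message import path is assumed to match the one used in qwen_chat.py:

```python
from langchain_core.messages import HumanMessage, SystemMessage
from qwen_chat import LocalQwenChat

llm = LocalQwenChat()
reply = llm.invoke([
    SystemMessage(content="You are a concise technical writing assistant."),
    HumanMessage(content="List three sections every README should include."),
])
print(reply.content)
```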
To set up and run the AI Project Publication Assistant, follow these steps:
```bash
git clone git@github.com:AndreyGermanov/ai_project_publication_assistant.git
cd ai_project_publication_assistant
```
```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
# or
venv\Scripts\activate      # Windows
```
```bash
pip install -r requirements.txt
```
```bash
python app.py
```
When prompted:
```
Enter GitHub repo URL: https://github.com/AndreyGermanov/langchain_qwen_chat_cli
```
The assistant will:
```
Running Publication Assistant...
Cloning repository...
Reading repository...
Extracting keywords...
Analyzing repository...
Recommending metadata...
Improving content...
```

Final Analysis Report:
This project aims to develop a command-line chatbot that utilizes a local copy of the Qwen Chat 7B language model to respond to user queries. The chatbot uses text files stored in a "texts" folder as its knowledge base. During execution, the chatbot converts the text files into vector embeddings using a chroma database. When a user inputs a question, the chatbot retrieves the top ten most similar text chunks from the database and incorporates them as context into the prompt generated by the large language model. However, the chatbot is limited to the information contained within the "texts" folder, and it returns an error message if the user's question is not related to the content of this folder.
The repository currently lacks several key components, including instructions on training the Qwen Chat 7B language model or generating vector embeddings, guidance on updating or expanding the "texts" folder, and details on planned future enhancements to the chatbot. Despite these omissions, the repository provides a solid foundation for developing a chatbot that leverages a pre-trained language model and a local text corpus to generate user responses.
In conclusion, while there is still room for improvement in terms of providing more comprehensive instructions and documentation, this repository offers a promising starting point for anyone interested in building a chatbot with these capabilities. By addressing the gaps identified in the current version of the repository, developers can build upon this foundation to create a more robust and sophisticated chatbot.
Recommended Metadata / Keywords:
Tags:
* Chatbot
* Natural Language Processing (NLP)
* Text Analysis
* Machine Learning
* Query Generation
Categories:
* Artificial Intelligence
* Computer Science
* Data Science
* Software Development
* Programming
Keywords:
* Language Model
* Sentiment Analysis
* Intent Recognition
* Text Classification
* Question Answering
* Dialogue Management
* Speech Recognition
* Machine Translation
* Text summarization
* Keyword extraction
* Named Entity Recognition
* Information Retrieval.
These tags and keywords can be used to organize and categorize the project on GitHub, making it easier for others to find and contribute to the project. Additionally, using relevant keywords in the project's title, description, and other metadata can help improve its visibility on search engines and make it more discoverable by potential users or contributors. Finally, including relevant documentation and tutorials can help new users understand how to use the project and contribute to its development.
This AI Project Publication Assistant demonstrates how a multi-agent system, powered by local LLMs, tool integration, and LangGraph orchestration, can significantly improve the quality and discoverability of open-source AI/ML projects. Whether you're preparing a new repository for publication or optimizing an existing one, this tool offers valuable insights and automated enhancements.