The AI Project Publication Assistant is a multi-agent system designed to help developers enhance the presentation and discoverability of their AI/ML projects on platforms like GitHub. The system uses a combination of language models, intelligent agents, and external tools to analyze repositories, extract meaningful metadata, and provide actionable suggestions for improvement.
This tool can be applied in real-world scenarios such as preparing a new repository for publication or optimizing an existing one for better discoverability.

The assistant's implementation is organized into three main modules: `tools.py`, `agents.py`, and `workflow.py`. Each module serves a specific purpose in the system architecture. All of these modules use a local Qwen chat model as the LLM: the `LocalQwenChat` wrapper around the Qwen-7B model, which is defined in the `qwen_chat.py` file.
The `tools.py` module defines the tools used by the agents in the LangGraph workflow:
```python
@tool
def git_clone_tool(repo_url: str, target_dir: str = "cloned_repo") -> dict:
    """Clones a GitHub repository into a local directory."""
    try:
        if not repo_url.startswith(("http://", "https://")):
            return {"status": "error", "message": "Invalid URL: must start with http:// or https://"}

        if os.path.exists(target_dir):
            os.system(f"rm -rf {target_dir}")

        print(f"Cloning {repo_url}...")
        Repo.clone_from(repo_url, target_dir)
        return {"status": "success", "repo_path": target_dir}

    except InvalidGitRepositoryError:
        return {"status": "error", "message": "The URL is not a valid Git repository."}
    except GitCommandError as e:
        return {"status": "error", "message": f"Git command failed: {e.stderr.strip()}"}
    except Exception as e:
        return {"status": "error", "message": f"Unexpected error: {str(e)}"}
```
```python
@tool
def repo_reader_tool(repo_path: str) -> dict:
    """Reads README and other key files from a given repository path."""
    content = {}
    for root, dirs, files in os.walk(repo_path):
        for file in files:
            if file.lower() in ["readme.md", "readme", "description.md"]:
                with open(os.path.join(root, file), "r", encoding="utf-8") as f:
                    content[file] = f.read()
    return {"repo_content": content}
```
```python
@tool
def keyword_extractor_tool(text: str) -> list:
    """Extracts keywords from input text."""
    import nltk
    from nltk.corpus import stopwords
    from collections import Counter

    try:
        nltk.data.find('tokenizers/punkt')
        nltk.data.find('corpora/stopwords')
    except LookupError:
        nltk.download(['punkt', 'stopwords'])

    words = nltk.word_tokenize(text.lower())
    stop_words = set(stopwords.words("english"))
    filtered_words = [word for word in words if word.isalnum() and word not in stop_words]
    common_words = Counter(filtered_words).most_common(10)
    return [word for word, count in common_words]
```
All tools follow the LangChain tool interface standard, making them easily integrable with LangGraph nodes and other LangChain components. The tools combine Git operations, text processing, and natural language analysis to support the multi-agent system's functionality.
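As a quick illustration, the tools can also be invoked directly through that same interface, outside of the graph. The snippet below is a hypothetical standalone usage with a placeholder repository URL:

```python
from tools import git_clone_tool, repo_reader_tool, keyword_extractor_tool

# Placeholder URL for illustration only.
result = git_clone_tool.invoke({"repo_url": "https://github.com/user/example-repo"})
if result["status"] == "success":
    files = repo_reader_tool.invoke({"repo_path": result["repo_path"]})
    # The keyword extractor takes plain text, so the content dict is stringified
    # here in the same way the workflow node does.
    keywords = keyword_extractor_tool.invoke(str(files["repo_content"]))
    print(keywords)
```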
The `agents.py` module contains the implementation of all specialized agents:
```python
class RepoAnalyzerAgent:
    def __init__(self):
        self.llm = LocalQwenChat()

    def analyze(self, repo_content):
        prompt = ChatPromptTemplate.from_template(
            "Analyze the following repository content:\n{repo_content}\n\n"
            "Identify key components such as project goals, features, missing sections, etc."
        )
        chain = prompt | self.llm
        return chain.invoke({"repo_content": repo_content})
```
Key aspects of the `RepoAnalyzerAgent`:

* Uses a `LocalQwenChat` LLM instance for local model inference.
* Takes the repository content (the output of `read_repo_files`) as input.
* Uses the pipe operator (`|`) to create a simple LLM chain.

```python
class MetadataRecommenderAgent:
    def __init__(self):
        self.llm = LocalQwenChat()

    def recommend(self, keywords):
        prompt = ChatPromptTemplate.from_template(
            "Given these keywords: {keywords}, suggest relevant tags, categories, and keywords "
            "for an AI/ML project on GitHub."
        )
        chain = prompt | self.llm
        return chain.invoke({"keywords": ", ".join(keywords)})
```
Key aspects of the `MetadataRecommenderAgent`:

* Uses a `LocalQwenChat` LLM instance for local model inference.
* Uses the pipe operator (`|`) to create a simple LLM chain.

```python
class ContentImproverAgent:
    def __init__(self):
        self.llm = LocalQwenChat()

    def improve(self, analysis_report):
        prompt = ChatPromptTemplate.from_template(
            "Improve the clarity and presentation of the following analysis report:\n{analysis}\n\n"
            "Rewrite it in a more professional and readable format suitable for publication."
        )
        chain = prompt | self.llm
        return chain.invoke({"analysis": analysis_report.content})
```
Key aspects of the `ContentImproverAgent`:

* Uses a `LocalQwenChat` LLM instance for local model inference.
* Uses the pipe operator (`|`) to create a simple LLM chain.

The `workflow.py` module implements the LangGraph-based orchestration logic for the multi-agent system:
```python
from langgraph.graph import StateGraph, END, START
from typing import Dict, Any
from tools import git_clone_tool, repo_reader_tool, keyword_extractor_tool
from agents import RepoAnalyzerAgent, MetadataRecommenderAgent, ContentImproverAgent
import os
import shutil


class PublicationAssistantState(Dict):
    repo_url: str
    repo_path: str
    repo_content: str
    keywords: list
    analysis_report: Any
    metadata_suggestions: Any
    final_report: Any
    error: str
```
```python
def clone_repo(state: PublicationAssistantState):
    """Clones GitHub repository using git_clone_tool"""
    result = git_clone_tool.invoke({"repo_url": state["repo_url"]})
    if result["status"] == "success":
        return {"repo_path": result["repo_path"]}
    else:
        return {"error": result["message"]}


def read_repo(state: PublicationAssistantState):
    """Reads repository files using repo_reader_tool"""
    result = repo_reader_tool.invoke({"repo_path": state["repo_path"]})
    state["repo_content"] = result.get("repo_content", {})
    return {"repo_content": state["repo_content"]}


def extract_keywords(state: PublicationAssistantState):
    """Extracts keywords from repository content"""
    content = str(state["repo_content"])
    result = keyword_extractor_tool.invoke(content)
    state["keywords"] = result
    return {"keywords": result}


def analyze_repo(state: PublicationAssistantState):
    """Analyzes repository content with RepoAnalyzerAgent"""
    agent = RepoAnalyzerAgent()
    analysis = agent.analyze(state["repo_content"])
    state["analysis_report"] = analysis
    return {"analysis_report": analysis}


def recommend_metadata(state: PublicationAssistantState):
    """Recommends metadata based on extracted keywords"""
    agent = MetadataRecommenderAgent()
    suggestions = agent.recommend(state["keywords"])
    state["metadata_suggestions"] = suggestions
    return {"metadata_suggestions": suggestions}


def improve_content(state: PublicationAssistantState):
    """Improves the analysis report with ContentImproverAgent"""
    agent = ContentImproverAgent()
    improved_report = agent.improve(state["analysis_report"])
    state["final_report"] = improved_report
    return {"final_report": improved_report}


def cleanup_repo(state: PublicationAssistantState):
    """Cleans up cloned repository after processing"""
    repo_path = state.get("repo_path")
    if repo_path and os.path.exists(repo_path):
        try:
            shutil.rmtree(repo_path)
            print(f"Successfully deleted: {repo_path}")
        except Exception as e:
            print(f"Failed to delete {repo_path}: {str(e)}")
    else:
        print("No repository path found or already deleted.")
    return {}


def handle_error(state: PublicationAssistantState):
    """Handles errors during execution"""
    print(f"Error occurred: {state.get('error', 'Unknown error')}")
    return {}


def route_after_clone(state: PublicationAssistantState):
    """Conditional router after cloning repository"""
    if "error" in state:
        return "error_node"
    else:
        return "read_repo"
```
```python
workflow = StateGraph(PublicationAssistantState)

# Add Nodes
workflow.add_node("clone_repo", clone_repo)
workflow.add_node("read_repo", read_repo)
workflow.add_node("extract_keywords", extract_keywords)
workflow.add_node("analyze_repo", analyze_repo)
workflow.add_node("recommend_metadata", recommend_metadata)
workflow.add_node("improve_content", improve_content)
workflow.add_node("cleanup_repo", cleanup_repo)
workflow.add_node("handle_error", handle_error)

# Conditional edge
workflow.add_conditional_edges("clone_repo", route_after_clone, {
    "read_repo": "read_repo",
    "error_node": "handle_error"
})

# Set Entry Point
workflow.set_entry_point("clone_repo")

# Define Edges
workflow.add_edge("read_repo", "extract_keywords")
workflow.add_edge("extract_keywords", "analyze_repo")
workflow.add_edge("analyze_repo", "recommend_metadata")
workflow.add_edge("recommend_metadata", "improve_content")
workflow.add_edge("improve_content", "cleanup_repo")
workflow.add_edge("cleanup_repo", END)
workflow.add_edge("handle_error", END)

# Compile Graph
app = workflow.compile()
```
The graph follows a sequential execution pattern with conditional branching: clone_repo → read_repo → extract_keywords → analyze_repo → recommend_metadata → improve_content → cleanup_repo, with a conditional branch from clone_repo to handle_error when cloning fails.
The graph handles errors gracefully: the conditional edge after cloning routes failed runs to the error-handling node, while the cleanup node ensures the cloned repository is removed once the processing pipeline completes.
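For reference, driving the compiled graph looks roughly like the following minimal sketch (the repository URL is a placeholder, and the actual app.py may differ):

```python
from workflow import app  # the compiled LangGraph application

# Placeholder URL; in the real assistant the user is prompted for it.
final_state = app.invoke({"repo_url": "https://github.com/user/example-repo"})

if "error" in final_state:
    print("Run failed:", final_state["error"])
else:
    print(final_state["final_report"].content)           # improved analysis report
    print(final_state["metadata_suggestions"].content)   # recommended tags and keywords
```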
LangGraph provides several advantages for this implementation, including a single shared state object, declarative node and edge definitions, and conditional routing for error handling.
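One concrete example: the compiled graph can be inspected and rendered, which is handy for verifying the wiring defined above (this assumes a reasonably recent langgraph / langchain-core version):

```python
# Print a Mermaid diagram of the compiled workflow, useful for documentation
# and for sanity-checking the node/edge structure.
print(app.get_graph().draw_mermaid())
```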
The `qwen_chat.py` module implements a custom wrapper for the Qwen-7B language model, enabling its integration with LangChain's chat model interface.
```python
class LocalQwenChat(BaseChatModel):
    # Using Qwen-7B model
    model_name: str = "Qwen/Qwen-7B-Chat"
    device: str = "cuda" if torch.cuda.is_available() else "cpu"
    tokenizer: Any = None
    model: Any = None
```
Key aspects of this class:

* Extends LangChain's `BaseChatModel` to provide a standardized interface.
* Uses the Qwen-7B chat model (`Qwen/Qwen-7B-Chat`) and selects the GPU automatically when one is available.

```python
def __init__(self, **kwargs):
    super().__init__(**kwargs)
    self.tokenizer = AutoTokenizer.from_pretrained(self.model_name, trust_remote_code=True)
    self.model = AutoModelForCausalLM.from_pretrained(
        self.model_name,
        trust_remote_code=True,
        device_map={"": self.device},
        bf16=True
    ).eval()
```
```python
def _generate(self, messages, stop=None, run_manager=None, **kwargs) -> ChatResult:
    prompt = ""
    for message in messages:
        if isinstance(message, SystemMessage):
            prompt += f"System: {message.content}\n"
        elif isinstance(message, HumanMessage):
            prompt += f"User: {message.content}\n"
        elif isinstance(message, AIMessage):
            prompt += f"Assistant: {message.content}\n"
    prompt += "Assistant:"

    inputs = self.tokenizer(prompt, return_tensors="pt")
    inputs_on_device = {
        key: value.to(self.device) for key, value in inputs.items()
    }
    outputs = self.model.generate(**inputs_on_device, max_new_tokens=2048)
    response = self.tokenizer.decode(outputs[0], skip_special_tokens=True)
    response = response[len(prompt):]

    generation = ChatGeneration(message=AIMessage(content=response))
    return ChatResult(generations=[generation])
```
The `_generate` method processes input messages and generates responses: it concatenates the system, user, and assistant messages into a single prompt, runs the model on the selected device, strips the prompt from the decoded output, and returns a `ChatResult` object with the response.

```python
@property
def _llm_type(self) -> str:
    return "local_qwen_chat"
```
The `_llm_type` property identifies this as a "local_qwen_chat" type LLM for LangChain integration purposes.

This implementation provides several benefits.
By encapsulating the model-specific logic within this wrapper, the rest of the application can interact with Qwen-7B using standard LangChain patterns without needing to handle low-level model operations.
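For example, the wrapper can be used like any other LangChain chat model. The snippet below is a small sketch; the message import path is assumed to match the one used in qwen_chat.py:

```python
from langchain_core.messages import HumanMessage, SystemMessage
from qwen_chat import LocalQwenChat

llm = LocalQwenChat()
reply = llm.invoke([
    SystemMessage(content="You are a concise technical writing assistant."),
    HumanMessage(content="List three sections every README should include."),
])
print(reply.content)
```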
To set up and run the AI Project Publication Assistant, follow these steps:
```bash
git clone git@github.com:AndreyGermanov/ai_project_publication_assistant.git
cd ai_project_publication_assistant
```
```bash
python -m venv venv
source venv/bin/activate   # Linux/macOS
# or
venv\Scripts\activate      # Windows
```
```bash
pip install -r requirements.txt
```
```bash
python app.py
```
When prompted:
```
Enter GitHub repo URL: https://github.com/AndreyGermanov/langchain_qwen_chat_cli
```
The assistant will:
```
Running Publication Assistant...
Cloning repository...
Reading repository...
Extracting keywords...
Analyzing repository...
Recommending metadata...
Improving content...
```

Final Analysis Report:
This project aims to develop a command-line chatbot that utilizes a local copy of the Qwen Chat 7B language model to respond to user queries. The chatbot uses text files stored in a "texts" folder as its knowledge base. During execution, the chatbot converts the text files into vector embeddings using a chroma database. When a user inputs a question, the chatbot retrieves the top ten most similar text chunks from the database and incorporates them as context into the prompt generated by the large language model. However, the chatbot is limited to the information contained within the "texts" folder, and it returns an error message if the user's question is not related to the content of this folder.
The repository currently lacks several key components, including instructions on training the Qwen Chat 7B language model or generating vector embeddings, guidance on updating or expanding the "texts" folder, and details on planned future enhancements to the chatbot. Despite these omissions, the repository provides a solid foundation for developing a chatbot that leverages a pre-trained language model and a local text corpus to generate user responses.
In conclusion, while there is still room for improvement in terms of providing more comprehensive instructions and documentation, this repository offers a promising starting point for anyone interested in building a chatbot with these capabilities. By addressing the gaps identified in the current version of the repository, developers can build upon this foundation to create a more robust and sophisticated chatbot.
Recommended Metadata / Keywords:
Tags:
* Chatbot
* Natural Language Processing (NLP)
* Text Analysis
* Machine Learning
* Query Generation
Categories:
* Artificial Intelligence
* Computer Science
* Data Science
* Software Development
* Programming
Keywords:
* Language Model
* Sentiment Analysis
* Intent Recognition
* Text Classification
* Question Answering
* Dialogue Management
* Speech Recognition
* Machine Translation
* Text summarization
* Keyword extraction
* Named Entity Recognition
* Information Retrieval.
These tags and keywords can be used to organize and categorize the project on GitHub, making it easier for others to find and contribute to the project. Additionally, using relevant keywords in the project's title, description, and other metadata can help improve its visibility on search engines and make it more discoverable by potential users or contributors. Finally, including relevant documentation and tutorials can help new users understand how to use the project and contribute to its development.
This AI Project Publication Assistant demonstrates how a multi-agent system, powered by local LLMs, tool integration, and LangGraph orchestration, can significantly improve the quality and discoverability of open-source AI/ML projects. Whether you're preparing a new repository for publication or optimizing an existing one, this tool offers valuable insights and automated enhancements.