Artificial Intelligence is the ability of a machine to think and behave as closely as possible to humans. It enables a system to learn, reason, and solve problems much like humans do. AI agents represent the next level of artificial intelligence. There has been a great deal of discussion about AI agents since the beginning of this year, with the CEOs of Meta, Microsoft, and Nvidia talking about the rise of AI agents and the spread of autonomous AI systems, potentially reshaping the technological landscape. These tech giants are trimming their workforces by embedding agentic behavior into routine tasks, and many companies are rapidly integrating AI agents into their workflows, transforming their processes through intelligent, autonomous systems in order to increase productivity.
AI agents are widely seen as one of the biggest transformations in the history of Artificial Intelligence. They are set to revolutionize the way work gets done in any given domain.
AI agents have the potential to redefine how work is done across industries. The first question that comes to mind is: what exactly are AI agents, and how is agentic AI different from traditional AI, specifically generative AI? In this article, I'll explore AI agents in detail, defining them in the simplest possible way and then going deeper into agentic patterns, architectures, and workflows, along with some real-world examples of AI agents and their implementation.
An AI agent is an intelligent system that solves a real-world problem by automating tasks, ultimately making life easier and improving productivity.
Think of AI as raw intelligence. It has the potential to behave intelligently, but it's not really useful until you give it a job; it needs clear direction to utilize its full potential. But where does all this intelligence reside? Large language models (LLMs) hold this vast amount of intelligence. They have been trained on huge corpora of text and are becoming more capable with each passing day. Despite all this intelligence, LLMs don't take action on their own. This is where an AI agent steps in: a system that uses the intelligence of LLMs to achieve a useful goal.
AI agents expand beyond the capabilities of a generative AI model alone. Unlike traditional AI models that generate responses based on prompts, AI agents go a step further. They are given clear goals and objectives, and they can autonomously make decisions, interact with external systems, and execute tasks, whether it's answering questions, managing workflows, or automating daily tasks. It's the difference between knowing how to do something and actually doing it.
Here's a simple way to think about it:
AI = Brain (Knowledge and Intelligence)
AI Agent = The Executor (The one who makes things happen)
An AI agent follows a structured process: it takes in a goal, plans the steps needed, acts using the tools at its disposal, and evaluates the results before moving on.
Let's think of an analogy to understand an AI agent:
Imagine you return from a long vacation and find your inbox with hundreds of emails waiting to be read. You feel overwhelmed; you obviously don't have time to go through everything.
Now, think of an AI agent as your smart assistant. It can scan the inbox, flag the emails that need urgent attention, summarize the rest, and draft replies where needed.
You give your agent instructions according to your preferences, and it performs tasks accordingly, saving you time and effort. That's exactly how an AI agent works. Developers provide these agents with detailed instructions so they know exactly how to respond in different situations. Some agents come with basic built-in instructions, like an experienced assistant capable of performing routine tasks. Others can be customized to fit your needs, like giving the assistant special rules based on your preferences.
A more formal definition of an AI agent could be:
An AI agent is a software system capable of performing tasks autonomously in order to achieve a specific goal by leveraging tools and external systems such as APIs and databases, and by interacting with other agents.
An AI agent takes your goal, processes information, makes decisions, and executes actions - automating tedious tasks so you don't have to.
Agents are not as simple as they may seem. They are complex systems involving models, tools, memory, knowledge, and orchestration logic working together.
Not everything is an AI agent. Nowadays, every developer wants to develop an AI agent whether it's useful or not.
Be cautious! Not everything is, or should be, an AI agent.
We have to be thoughtful about the effort we put into developing an AI agent. It should solve a real-world problem and be intelligent enough to help humans in their routine tasks, enhancing their productivity. It can also help with specialized tasks using its intelligence and decision-making capabilities. Otherwise, it's just a piece of complicated software.
Agentic AI is the future!
However, the concept of agents is not new; it has been around for decades.
In their book "Artificial Intelligence: A Modern Approach", Stuart J. Russell and Peter Norvig define an agent as:
An agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.
Today, when we talk about AI agents, we are usually referring to LLM-powered AI agents.
LLMs on their own have certain limitations: they are constrained by the data they have been trained on. They can perform a variety of tasks, such as summarizing documents, drafting emails, and generating reports, but fine-tuning them requires a substantial investment in data and resources. This is where AI agents take charge. The real magic happens when we start building systems around LLMs, taking the model and integrating it into our processes and workflows.
For example, imagine an autonomous AI agent that scans your inbox to summarize important emails, highlighting the ones requiring urgent attention, saving a lot of time that can be used to focus on more productive tasks.
Taking this a step further, think of a collaborative research copilot, a team of AI agents working together to achieve the research goals.
The future of AI isn't just about more powerful models - it's about intelligent, autonomous systems that enhance productivity and decision-making. Systems that automate repetitive tasks, taking your business to the next level. That's the power of Agentic AI.
Agentic AI may well be the biggest transformation we see in our lifetime. Those who embrace AI agents now will have a major advantage over those who don't. The world of AI is changing fast, and we must understand the importance of AI agents and where they fit in our particular scenario, or risk being left behind. The key is identifying the right agent type based on our requirements.
How is an AI agent different from a traditional AI system?
The key difference lies in autonomy: the ability to think independently (to a certain extent) in order to make decisions and give intelligent suggestions.
According to the Agents whitepaper by Google,
The combination of reasoning, logic, and access to external information that are all connected to a Generative AI model invokes the concept of an agent.
While a traditional AI system completes its task as directed, an AI agent adds more intelligence to traditional AI capabilities, behaving more like a human.
Let's say we want to automate the process of ordering food at a restaurant. A traditional AI chatbot would interact with the user, take the order, extract the important information, and place the order according to the user's requirements. An AI agent, however, would remember returning users, recall their past orders, and give them valuable suggestions or recommendations to make their experience more pleasant. That sounds more like an attentive human who recognizes their returning customers.
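To make the difference concrete, here is a minimal sketch (not tied to any particular framework) of an ordering assistant that layers memory on top of plain order-taking; the order_history store and recommend helper are hypothetical:

# Hypothetical sketch: an ordering assistant that remembers returning customers.
# A plain chatbot would only handle take_order; the agent adds memory and suggestions.

order_history = {}  # long-term memory: customer -> list of past orders

def take_order(customer: str, order: str) -> str:
    order_history.setdefault(customer, []).append(order)
    return f"Order placed for {customer}: {order}"

def recommend(customer: str) -> str:
    past = order_history.get(customer)
    if not past:
        return "Welcome! Here's our menu."
    favourite = max(set(past), key=past.count)
    return f"Welcome back, {customer}! Would you like your usual {favourite}?"

print(take_order("Ali", "margherita pizza"))
print(recommend("Ali"))  # the agent recalls the past order and suggests it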
Generative AI is all about LLMs. Big tech companies have been competing to build the best models for the past few years, and 2024 was the year of building custom chatbots on top of LLMs such as OpenAI's GPT models and Meta's Llama 3. These models generate content: a user's query is sent to the LLM as a prompt, and the model generates a response as output. LLMs can also be fine-tuned for specific tasks.
Agentic AI, on the other hand, refers to autonomous AI systems that can perform tasks on their own; 2025 is the year of building autonomous AI agents. Each agent has a specific goal to achieve and works towards it without constant human intervention. You can integrate as many tools into these systems as you want, and the goal is usually tied to a business outcome. They have complex workflows and can adjust their own behavior to improve their performance.
LLMs are the brain of AI agents. They provide the agents with the autonomy they require to perform specific tasks. They are smart enough to break the bigger task into smaller sub-tasks and decide a plan of action to execute dynamically. They enable the agent to perform in a specific manner using memory, knowledge, and tools.
Agentic workflows chain LLM calls together, with the agent letting the LLM decide how many times to run; it may continue the loop until it finds a resolution, for example when talking to a customer in a support conversation or iterating on code changes. When you don't know the number of steps required to complete a task, you have an autonomous workflow where the path is not predefined. All of this has become possible with the evolution of LLMs: the models and tools are getting better every day, making agents more capable and prevalent.
The prompt is the most important part of instructing an AI agent. It should be clear and thoughtful, defining the goal to be achieved and the methodology for achieving it. Equally important, the tools given to the model must be properly documented; this helps the LLM decide which tool to use for a particular task. How can the model use tools and function calling if it doesn't know what the parameters mean or what a tool actually does? Tools should be described well enough for their purpose to be understood, and a good tool description influences the rest of the agent prompt.
The control logic of a compound AI system is the path it takes to answer a query. Previously, this control logic was defined by hard-coded rules. Another way to control the logic of a compound AI system is to put the LLM in charge. This has only become possible because the reasoning capabilities of large language models have improved considerably: you can feed complex problems to LLMs and prompt them to break each problem into smaller problems and come up with a plan to tackle them.
If we control the logic programmatically, we design the system to stick to the instructions given, without deviating from a set path. If we control the logic through the LLM, we design the system to break the problem into sub-problems, make a plan to solve them, attempt a solution, and readjust the plan if needed.
The agentic AI approach puts the LLM in charge of the whole system.
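To make the contrast concrete, here is a tiny sketch in which call_llm is a stand-in for a real model call; the functions and the canned plan are purely illustrative:

def call_llm(prompt: str) -> str:
    """Stand-in for a real LLM call; returns a canned plan for illustration."""
    return "1. Parse the invoice  2. Validate totals  3. Flag mismatches"

# Programmatic control: the path is fixed in code.
def handle_query_hardcoded(query: str) -> str:
    if "invoice" in query:
        return "route: invoice_pipeline"
    return "route: general_pipeline"

# LLM-in-charge: the model decides how to break the problem down.
def handle_query_agentic(query: str) -> str:
    return call_llm(f"Break this task into sub-tasks and plan them: {query}")

print(handle_query_hardcoded("Check this invoice"))
print(handle_query_agentic("Check this invoice"))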
Since the release of ChatGPT in November 2022, the field of Generative AI has gone through several shifts.
2022 was the year of LLMs. Models on their own are limited by the data they were trained on, which constrains what they know about the world and which tasks they can solve. They have limited knowledge and are hard to adapt to changes in the real world, where new data is generated constantly. One option is to fine-tune the models, but that requires an investment in data, resources, and time. On their own, models are still useful for a variety of tasks such as summarizing documents, creating first drafts of emails, or creating reports.
In 2023, compound AI systems started to emerge. Certain problems are better solved when you apply the principles of system design. Building an AI system around the model makes it possible to solve many complex real-world problems where context is important.
Accomplishing a task through system design is inherently easier than tuning a model: systems are faster and easier to adapt. They have programmatic components such as output verifiers, query decomposers, database search engines, and customized tools, and we can pick the right components to solve our problem.
Retrieval-Augmented Generation (RAG) is one of the most popular and commonly used compound AI systems. It uses the capabilities of LLMs while adding context to the information, which also helps reduce hallucinations.
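As a rough illustration, a minimal RAG pipeline retrieves relevant context and prepends it to the prompt before calling the model; the keyword-overlap retriever and the call_llm stub below are deliberate simplifications:

documents = [
    "Our refund policy allows returns within 30 days of purchase.",
    "Support hours are 9am to 5pm, Monday to Friday.",
    "Premium plans include priority email support.",
]

def retrieve(query: str, k: int = 1) -> list[str]:
    """Naive keyword-overlap retriever; real systems use vector embeddings."""
    overlap = lambda d: len(set(query.lower().split()) & set(d.lower().split()))
    return sorted(documents, key=overlap, reverse=True)[:k]

def call_llm(prompt: str) -> str:
    return f"[model answer based on a prompt of {len(prompt)} characters]"  # stand-in

def rag_answer(query: str) -> str:
    context = "\n".join(retrieve(query))
    prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
    return call_llm(prompt)

print(rag_answer("What is the refund policy?"))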
In 2024, the focus shifted to building AI agents, LLM agents to be specific. AI agents are systems built around large language models, taking the model and integrating it into the processes we already have.
Following are the core components of an AI agent:
Every agent defines its role, description, and specific instructions to carry out tasks. It specifies the model (LLM) and available tools to be used at various stages during its course of action. Similarly, it also specifies the memory and knowledge base to be used for storing conversations, interactions, and other information.
The role of an agent defines its personality. It allows the agent to behave in a specific manner appropriate to its assigned task.
Large language models (LLMs) are the basic building blocks of an AI agent. They are the brain of the agent allowing it to think, understand, reason, plan, and act accordingly. In addition to that, they serve as the generative component of the agent as well, helping it to generate meaningful responses.
Tools are external resources that an agent can use to achieve its goal. They can be as simple as Python functions, or they can be APIs, databases, or other external systems. Tools enhance the capabilities of an agent by helping it access information and interact with its environment. Agents are smart enough to understand the functionalities of their tools and use them as required during their workflow. Examples of tools include web search APIs, calculators, database query functions, and code interpreters.
Agents possess short-term and long-term memory for specific purposes. Short-term memory is used for immediate interactions with the environment, whereas long-term memory is used for storing information and conversations. This allows agents to maintain context, learn from historical data and past experiences, and improve their performance; they are adaptive and learn from past interactions.
A knowledge base helps agents store the information and context required to perform their tasks.
Agents possess the ability to think and reason utilizing the large language models (LLMs) at their disposal. This helps them achieve the level of autonomy required to complete various tasks moving towards their goal.
Imagine you are going on a 5-day business trip to the Netherlands next month. You want to pack efficiently, considering the weather, your meetings, and any formal events.
You ask an AI agent:
I have a 5-day business trip to the Netherlands next month. I need to know what clothes to pack based on the weather, my meetings, and evening events. Can you please help?
An LLM alone won't be able to solve this problem for you because it needs your specific travel details along with real-time weather information, which it could acquire through APIs.
This is a modular, multi-step planning problem, where the agent retrieves external data, personalizes recommendations, and optimizes the process, making it a great AI agent use case!
Here are a few other agentic AI use-cases which help solve real-world problems.
Think of an agent that writes code and then tests it against all unit tests, corrects the code, and continues this loop, improving the program's quality with each pass. When the coding agent sees an error at each step of the loop, that feedback helps the model converge towards the right solution sooner.
Think of a chatbot that remembers the customer, knows all the relevant information about each particular customer, and serves them according to their preferences.
A personal healthcare assistant that remembers everything about a person's health records, nutrition requirements, fitness, and well-being. It has access to the person's vitals through fitness trackers and advises them about regular physical and mental check-ups based on their current condition.
Think of an automated system that assesses an emergency situation through its sensors and automatically triggers the appropriate alarms. It might also call the relevant helplines and rescue services when needed.
An agent to review emails and prepare tasks on a digital to-do list based on the highest priority emails.
An environmental sustainability agent that uses weather data, satellite images, and other information to decide how to take care of plants and their environment.
A travel planning agent that takes a user's preferences and needs into account, researches destinations, hotels, and activities, and then suggests an itinerary. It might use a collaborative agent to book everything, with the user's confirmation.
Agents have a clear, specific goal to achieve, and every step they take moves them closer to it. They plan and divide the goal into sub-tasks, processing information iteratively and deciding at every step which action to take next based on the output of the previous step. The orchestration layer is responsible for the overall execution of an agent, delegating responsibilities to the various components of the agentic architecture. Prompts define the role and task of an agent; the better the prompt, the easier it is for the agent to achieve the required objective. The evolution of prompt engineering and task planning for language models keeps making agents better over time.
Here are a few prompt engineering frameworks considered among the best for agents:
The ReAct framework is a technique for building AI agents that combine the reasoning capabilities of LLMs with the ability to take actions, allowing them to solve complex problems.
Chain-of-thought prompting is a technique that enhances the reasoning capabilities of LLMs by guiding them to generate step-by-step explanations before arriving at a final answer.
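For example, a chain-of-thought style prompt simply asks the model to show its intermediate reasoning before giving the final answer; the snippet below only builds such a prompt, which would then be sent to an LLM:

question = "A train travels 60 km in 45 minutes. What is its average speed in km/h?"

cot_prompt = (
    "Solve the problem step by step, showing your reasoning, "
    "then give the final answer on the last line.\n\n"
    f"Problem: {question}"
)
# The step-by-step instruction encourages intermediate reasoning
# (45 min = 0.75 h, so 60 / 0.75 = 80 km/h) before the final answer.
print(cot_prompt)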
The tree-of-thoughts framework is a problem-solving technique that allows LLMs to explore multiple reasoning paths simultaneously and evaluate the different solution paths before settling on one.
Agents can utilize these frameworks or other techniques to choose the next best action based on a user query.
The quality of an agent's response highly depends on the quality of the prompt it receives.
These prompts enable the agent to think and plan its tasks effectively, and help it decide which tools or external resources to use in order to complete its assigned tasks properly.
An AI agent is only as good as the model you are using. You can fully customize your agents with agentic frameworks by providing various tools, and the docstring of a function or tool is extremely important here. When you register a tool, the LLM understands its purpose by reading the docstring: the agent passes the docstring to the LLM, which maps the tool to the intent of the user query and supplies the required arguments to the function.
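Here is a small sketch of why the docstring matters; the describe_tool helper is illustrative, not part of any specific framework, but it shows the kind of description an agent can hand to the LLM:

import inspect

def get_stock_price(ticker: str) -> float:
    """Return the latest stock price for the given ticker symbol, e.g. 'AAPL'."""
    prices = {"AAPL": 189.5, "MSFT": 415.2}  # dummy data for illustration
    return prices.get(ticker.upper(), 0.0)

def describe_tool(func) -> str:
    """Build the tool description an agent would pass to the LLM."""
    signature = inspect.signature(func)
    return f"{func.__name__}{signature}: {inspect.getdoc(func)}"

# The LLM sees this description and can decide when to call the tool
# and which arguments to pass.
print(describe_tool(get_stock_price))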
There are many types of AI agents based on the level of autonomy, task execution, and integration with tools and memory. Here are a few common types of agents:
These are the simplest form of automation. They execute preprogrammed instructions without thinking or adaptation; they are efficient but not very flexible. Such agents are best suited for repetitive tasks such as email autoresponders and invoice-processing agents.
These are LLM-enhanced agents with contextual understanding. They are simple but intelligent, and mostly suitable for low-complexity, high-volume tasks such as filtering emails and AI-assisted content generation.
They have a clear, specific goal to achieve. They instruct the LLM, which thinks and makes a plan of action to achieve that goal. They use tools, knowledge, and memory to complete intermediate tasks and iterate until the goal is achieved. Examples include coding copilots and automated data analysis systems.
These agents learn, adapt, evolve, and improve themselves over time without constant human intervention, using reasoning, memory, and autonomous learning capabilities. Examples include complex research and simulation agents.
These agents combine reasoning and action. They involve strategic thinking for task decomposition and multi-step decision-making, and they are the closest to human problem-solving behavior, dynamically adjusting their approach. Examples include project planning tools and content generation agents that produce and critique their own content iteratively, improving the response at every step. Combining real-time external knowledge with a ReAct agent makes it even more powerful for precision-critical tasks.
These agents use historical context, task history, and previous interactions to remember user preferences. Examples include customer service chatbots and personalized recommendation systems for shopping, travel, and more.
An AI agent is a combination of two things: an interface and a workflow.
The interface is how you interact with an AI agent. It may be through a chat window, a voice assistant or a button on a website.
Behind that interface is the workflow. You ask the AI agent a question or give it a command through the interface. From there, an AI agent follows a series of steps in a workflow. These steps could involve accessing information from a database, making a decision based on what it finds or even performing a task like sending an email. The steps in this workflow may change depending on what you ask or what the agent needs to do next. The AI agent works through a whole process to get you the best result.
At a higher level, an agentic workflow is a four-step process: receive the goal, plan the sub-tasks, act using the available tools and information, and evaluate the results.
At a lower level, an AI agent combines an LLM, tools, memory, and an orchestration loop that decides the next action at each step.
Before LLMs, agents relied on predefined rules to make decisions on the go. LLMs give agents the level of autonomy required to make real-time decisions based on the current situation and the results of their own previous actions.
Autonomous AI agents work by simplifying and automating complex tasks.
A basic agentic workflow includes planning, information acquisition, and task implementation.
The agent receives a specific instruction or goal from the user. Planning involves breaking the goal down into smaller sub-tasks and deciding the order in which to tackle them.
To act on the planned tasks, agents may acquire information by calling external tools and APIs, querying databases or knowledge bases, and retrieving relevant past interactions from memory.
Task implementation involves accomplishing a task and moving on to the next one in the plan. Before moving on, the agent needs to evaluate whether the goal has been achieved, using external feedback or by inspecting its own logs; this may require creating and acting on more tasks as the agent moves towards its goal, as sketched below.
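The loop below is a schematic sketch of this process; plan, execute, and goal_reached are hypothetical stand-ins for LLM-driven planning, tool calls, and self-evaluation:

def plan(goal: str) -> list[str]:
    """Stand-in for an LLM planning step: break the goal into sub-tasks."""
    return ["gather data", "analyse data", "write summary"]

def execute(task: str) -> str:
    return f"result of '{task}'"  # stand-in for a tool call or LLM action

def goal_reached(results: list[str]) -> bool:
    return len(results) >= 3  # stand-in for self-evaluation or external feedback

def run_agent(goal: str) -> list[str]:
    tasks, results = plan(goal), []
    for task in tasks:
        results.append(execute(task))
        if goal_reached(results):
            break
    return results

print(run_agent("Produce a market summary report"))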
A workflow prompt is a prompt that you feed to an LLM, after which you take its response and feed it to the same or another LLM, continuing this pattern until you are done. You might have a fixed number of steps, or you may check intermediate results to decide if they are good enough to stop. Each of these prompts is very specific: it takes one input and transforms it into an output.
In contrast, an agent prompt is much more open-ended and usually gives the model tools and multiple things to check. Feedback is essential for the agent to converge to the right answer.
In multi-agent collaborative systems, we may use a combination of workflow prompts and agent prompts. Every agent in the system is given an open-ended prompt to accomplish its task, and these single agents are then placed in a workflow in which the input of the next agent depends on the output of the previous one. This lets agents achieve a bigger goal collaboratively while completing their own tasks autonomously.
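As a rough sketch (mirroring the research copilot built later in this article), each agent below receives an open-ended prompt and the second agent consumes the first agent's output; call_llm is a stand-in for a real model call:

def call_llm(prompt: str) -> str:
    return f"[response to: {prompt[:60]}...]"  # stand-in for a real model call

def summarization_agent(topic: str) -> str:
    # Open-ended prompt: the agent decides how to search and summarize.
    return call_llm(f"Find recent papers on {topic} and summarize each one.")

def review_agent(summaries: str) -> str:
    return call_llm(f"Write a literature review based on these summaries:\n{summaries}")

# Workflow: the output of one agent feeds the next.
summaries = summarization_agent("AI agents")
review = review_agent(summaries)
print(review)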
AI agents can be configured in many ways. The configuration of an AI agent in a specific manner is called an agentic pattern.
Most popular agentic patterns include:
The reflection pattern involves a generate agent and a reflect agent working in a loop: one agent generates content, and the other critiques it, sending its feedback back to the generate agent so that it can improve the output. When an agent iteratively reflects on its response this way, we eventually get far better results.
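A bare-bones sketch of the generate/reflect loop, with call_llm standing in for real model calls and a fixed number of improvement rounds:

def call_llm(prompt: str) -> str:
    return f"[output for: {prompt[:50]}...]"  # stand-in for a real model call

def reflection_loop(task: str, rounds: int = 3) -> str:
    draft = call_llm(f"Generate: {task}")
    for _ in range(rounds):
        feedback = call_llm(f"Critique this draft and list concrete improvements:\n{draft}")
        draft = call_llm(f"Revise the draft using this feedback:\nDraft: {draft}\nFeedback: {feedback}")
    return draft

print(reflection_loop("a short product announcement email"))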
Large language models can become effective AI agents by reflecting on their own behavior.
A tool is a way for an LLM to access the outside world. It is like a Python function that the LLM can call to fetch some relevant result.
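For example, a hypothetical weather-lookup tool might look like this (the hard-coded data stands in for a real weather API):

def get_current_weather(city: str) -> str:
    """Return the current weather for the given city."""
    # Hard-coded data stands in for a real weather API call.
    fake_weather = {"Amsterdam": "12°C, light rain", "Lahore": "31°C, sunny"}
    return fake_weather.get(city, "Weather data not available.")

print(get_current_weather("Amsterdam"))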
How can we make this function (tool) available to an LLM? We use a system prompt in which we tell the LLM to behave as a function-calling AI model.
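A hedged sketch of such a system prompt, reusing the hypothetical weather tool above; the exact wording and the JSON calling convention vary by model and framework:

system_prompt = """You are a function-calling AI model. You have access to the following tool:

get_current_weather(city: str) -> str: Return the current weather for the given city.

When the user's question requires the tool, respond only with a JSON object such as:
{"tool": "get_current_weather", "arguments": {"city": "Amsterdam"}}
Otherwise, answer directly in natural language."""

print(system_prompt)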
A tool-use agent has the capability of using tools: it has a list of tools, selects the right one for the question we have asked, runs it to fetch the relevant information from the outside world, and ultimately returns this information as a natural-language response.
This pattern is also called ReAct (reason and act). It aims to improve the reasoning and acting capabilities of LLMs.
Core agents implement the ReAct technique because it is one of the best-known techniques for improving the capabilities of LLMs.
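Here is a simplified sketch of that reason-and-act loop: the stubbed call_llm picks a tool, the agent runs it, feeds the observation back, and stops once the model produces a final answer. The tool and the canned replies are purely illustrative.

def call_llm(prompt: str) -> str:
    # Stand-in: a real model would decide whether to act or answer.
    if "Observation" in prompt:
        return "FINAL: It is 12°C and raining in Amsterdam."
    return "ACTION: get_current_weather[Amsterdam]"

def get_current_weather(city: str) -> str:
    return "12°C, light rain"  # dummy tool result

TOOLS = {"get_current_weather": get_current_weather}

def react_agent(question: str, max_steps: int = 3) -> str:
    prompt = f"Question: {question}"
    for _ in range(max_steps):
        reply = call_llm(prompt)
        if reply.startswith("FINAL:"):
            return reply[len("FINAL:"):].strip()
        tool_name, arg = reply[len("ACTION: "):].rstrip("]").split("[")
        observation = TOOLS[tool_name](arg)
        prompt += f"\n{reply}\nObservation: {observation}"
    return "No answer found within the step limit."

print(react_agent("What's the weather in Amsterdam?"))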
The main idea of this pattern is to divide a task into smaller, simpler subtasks that are executed by different agents. Each agent adopts a role and solves a smaller task; when all the small tasks have been completed, the bigger task has been achieved by the crew of agents. There is a dependency between the agents in a multi-agent system.
According to the Agents whitepaper by Google, the three essential components in an agent runtime from user query to agent response include:
Large language model (LLM) is the main decision-maker or task-planner in an agent.
Foundation models are constrained by their inability to interact with the outside world. This brings the importance of using tools in agentic architecture.
Orchestration is an iterative process where an agent takes information, does internal reasoning, and makes the next decision. The loop continues until the agent has reached its goal or a stopping point. It can be simple (calculations with decision rules) or complex (chained logic, additional machine learning algorithms, probabilistic reasoning techniques) depending on the goal and tasks.
Let's differentiate between a model and an agent: a model generates a response to a single prompt with no memory of previous turns, whereas an agent manages multi-turn execution, maintains session history, and has native access to tools and an orchestration layer.
The following are a few agentic frameworks used to build cutting-edge AI agents for real-world use cases.
Despite their many benefits, agents pose some significant challenges, which include:
Confused about whether you should build an AI agent or not for your use case? Consider the following aspects.
Agents are highly beneficial when tasks require complex decision-making, autonomy, and adaptability, for example the coding, customer support, and research use cases described earlier.
Agents are not useful for straightforward, infrequent, or minimal-automation tasks, or for tasks requiring deep domain-specific knowledge. They are also not suitable for high-stakes decision-making tasks involving financial transactions or data security and privacy concerns.
Let's build two AI agents, a simple AI research agent and a multi-agent collaborative system, in order to understand the agentic workflow process. We'll use agno to build our AI agents, and I'll walk you through each step in detail so that you get an in-depth understanding of the agentic workflow.
Agno is an open-source Python framework for building multi-modal AI agents. It claims to be the fastest framework for building AI agents. It is simple, intuitive, efficient, and easy to learn. It enhances LLMs with memory, knowledge, tools, and reasoning as required.
Finding and reading research papers is a tedious task that can take days or even months of effort, and understanding the papers and writing literature reviews is very time-consuming. Automating this task can make life easier for students, researchers, and AI professionals when it comes to understanding and reviewing research papers.
We'll build the following two AI agents to understand the agentic process.
Let's build a simple research agent that finds research papers on a given topic, extracts their metadata, summarizes them, and generates a comprehensive literature review, all with a single research agent.
simple-research-agent/            # Main project folder
│── frontend/                     # Frontend directory
│   ├── app.py                    # Streamlit application
│
│── backend/                      # Backend directory
│   ├── main.py                   # FastAPI backend
│   ├── pdf.py                    # Converts output to PDF
│   ├── utils.py                  # Utility functions
│   ├── research_agent.py         # Simple Research Agent
│
│── requirements.txt              # Dependencies
│── .env                          # Environment variables
│── LICENSE                       # License file
│── README.md                     # Documentation
Create a new virtual environment and install the dependencies using the requirements.txt file.
requirements.txt
streamlit
uvicorn
fastapi
agno
groq
openai
requests
arxiv
pypdf
reportlab
html2text
.env
Create a .env file and add your API key.
TOGETHER_API_KEY="your_api_key_here"
The frontend of the simple research agent looks like this:
User Input: User enters a research topic and the number of papers to fetch from arXiv.
Output: Literature review of the fetched papers.
app.py
This is the main entry point of the application built in Streamlit. It uses requests to interact with the FastAPI endpoints at the backend.
import streamlit as st import requests import json # Set Streamlit page configuration st.set_page_config(page_title="AI Research Agent", layout="wide") # Initialize session state variables if "result_json" not in st.session_state: st.session_state.result_json = None # Stores fetched research paper results if "research_topic" not in st.session_state: st.session_state.research_topic = "" # Stores user-entered research topic if "max_papers" not in st.session_state: st.session_state.max_papers = 5 # Stores user-defined number of papers to fetch # Sidebar UI elements with st.sidebar: st.image("../images/research_logo.png", width=150) # Display research agent logo st.subheader("Search and Generate Literature Reviews") st.header("🔍 Search Papers") # User input fields for topic and number of papers topic = st.text_input("Enter Research Topic:", key="topic_input", value=st.session_state.research_topic) max_papers = st.number_input("Number of Papers to Fetch:", min_value=1, max_value=10, value=st.session_state.max_papers, key="papers_input") # Button columns for fetching papers and refreshing col_fetch, col_refresh = st.columns([2, 1]) with col_fetch: if st.button("Fetch Papers & Generate Review"): if topic: st.session_state.research_topic = topic # Save topic in session state try: with st.spinner("Fetching papers..."): API_URL = "http://127.0.0.1:8000/fetch_papers/" # Backend API endpoint response = requests.post(API_URL, json={"topic": topic, "max_papers": max_papers}) if response.status_code != 200: st.error("Error: Invalid response from server.") st.stop() result_decoded = response.content.decode("utf-8") st.session_state.result_json = json.loads(result_decoded) # Store API response in session state except json.JSONDecodeError: st.error("Error: Received an unreadable response from the server.") st.stop() except requests.exceptions.RequestException: st.error("Error: Could not connect to the server.") st.stop() with col_refresh: if st.button("🔄 Refresh"): st.session_state.result_json = None # Clear previous results st.session_state.research_topic = "" # Reset topic field st.session_state.max_papers = 5 # Reset number of papers st.rerun() # Force a page refresh # Main content layout col1, col2 = st.columns([1, 11]) # Page title st.title("AI Research Agent") # Initial message when no results are available if st.session_state.result_json is None: st.info("Enter a research topic and click 'Fetch Papers & Generate Review' to get started.") # Display results if available if st.session_state.result_json: research_topic = st.session_state.research_topic research_topic = ' '.join(word.capitalize() for word in research_topic.split()) # Capitalize topic words st.header(f"📖 Literature Review: {research_topic}") # If a PDF path is provided in the response, display download option if "pdf_path" in st.session_state.result_json: pdf_path = st.session_state.result_json["pdf_path"] filename = f"{topic.replace(' ', '_')}_literature_review.pdf" # Create a row layout for success message + download button col_success, col_download = st.columns([7, 1]) with col_success: st.success("Literature Review Generated!") with col_download: st.download_button(label="📥 Download", data=open(pdf_path, "rb"), file_name=filename, mime="application/pdf") # Display the generated literature review text st.write(st.session_state.result_json.get("response", "No review available.")) else: st.error("No papers found!")
from fastapi import FastAPI
from pydantic import BaseModel
from agno.agent import RunResponse
from research_agent import research_agent
from utils import extract_metadata, save_paper_metadata, generate_pdf
import os

# Initialize FastAPI app
app = FastAPI()

# Define request model for research paper fetching
class ResearchRequest(BaseModel):
    topic: str  # Research topic provided by the user
    max_papers: int = 5  # Default number of papers to fetch

# Define API endpoint to fetch research papers
@app.post("/fetch_papers/")
async def fetch_papers(request: ResearchRequest):
    # Call the research agent to search for relevant papers and generate a literature review
    response: RunResponse = research_agent.run(
        f"Search for {request.max_papers} most relevant papers on {request.topic} and generate a literature review."
    )
    if response:
        # Extract metadata from the response (e.g., title, authors, publication year, etc.)
        metadata = extract_metadata(response.content)
        # Save extracted metadata to a file for future reference
        metadata_file = save_paper_metadata(request.topic, metadata)
        # Generate a PDF containing the literature review
        generate_pdf(request.topic, response.content)
        # Define the PDF file path based on the topic name
        pdf_path = os.path.join("../literature_reviews", f"{request.topic.replace(' ', '_')}_literature_review.pdf")
        # Return the generated literature review details
        return {
            "message": "Literature review generated!",
            "pdf_path": pdf_path,
            "response": response.content
        }
    else:
        # Return an error message if no papers were found
        return {"error": "No papers found"}
from reportlab.lib.pagesizes import letter from reportlab.lib.units import inch from reportlab.pdfgen import canvas from reportlab.lib.colors import blue, black, gray import re class PDFDocument: def __init__(self, filename, title): """ Initializes the PDF document with the given filename and title. """ self.canvas = canvas.Canvas(filename, pagesize=letter) self.canvas.setTitle(title) self.width, self.height = letter def add_page_number(self, page_num): """ Adds a page number at the bottom of each page. """ self.canvas.setFont('Helvetica', 10) self.canvas.drawRightString(self.width - inch, 0.5 * inch, f"Page {page_num}") self.canvas.line(inch, 0.6 * inch, self.width - inch, 0.6 * inch) def draw_text(self, text, x, y, size=12, line_height=14, indent=0): """ Draws formatted text onto the PDF, handling bold, italic, and links. """ def get_chunks(text): """ Splits the text into chunks based on formatting (bold, italic, links). """ patterns = [r'\*\*(.*?)\*\*', r'\*(.*?)\*', r'(https?://\S+)'] cursor = 0 for match in re.finditer('|'.join(patterns), text): if cursor < match.start(): yield text[cursor:match.start()], 'Helvetica', black if match.group().startswith('**'): yield match.group()[2:-2], 'Helvetica-Bold', black elif match.group().startswith('*') and not match.group().startswith('**'): yield match.group()[1:-1], 'Helvetica', black elif re.match(r'https?://', match.group()): yield match.group(), 'Helvetica-Oblique', blue cursor = match.end() if cursor < len(text): yield text[cursor:], 'Helvetica', black cursor_x = x + indent cursor_y = y line_width = self.width - 2 * inch - indent for chunk, style, color in get_chunks(text): self.canvas.setFont(style, size) self.canvas.setFillColor(color) words = chunk.split() for word in words: word_width = self.canvas.stringWidth(word, style, size) if cursor_x + word_width > x + line_width: cursor_y -= line_height cursor_x = x + indent self.canvas.drawString(cursor_x, cursor_y, word) cursor_x += word_width + self.canvas.stringWidth(' ', style, size) self.canvas.setFillColor(black) return x, cursor_y - line_height def create_pdf(self, topic, content): """ Generates a literature review PDF from the provided topic and content. 
""" page_num = 1 y = self.height - 1.5 * inch sections = content.split("\n") bullet = u"\u2022" # Unicode for bullet point list_counter = 1 # Add Title at the top self.canvas.setFont('Helvetica-Bold', 20) self.canvas.drawString(inch, y, f"Literature Review: {topic}") y -= 0.5 * inch # Space below the title for section in sections: # Add new page if the section is a new heading and space is low if section.startswith("### ") and y < self.height - 2 * inch: self.add_page_number(page_num) self.canvas.showPage() page_num += 1 y = self.height - inch # Add new page if reaching bottom of the page if y < 2 * inch: self.add_page_number(page_num) self.canvas.showPage() page_num += 1 y = self.height - inch # Handle headings if section.startswith("### "): y -= 0.5 * inch self.canvas.setFont('Helvetica-Bold', 18) self.canvas.drawString(inch, y, section[4:].strip()) y -= 0.3 * inch self.canvas.line(inch, y, self.width - inch, y) y -= 0.4 * inch # Handle bold text elif section.startswith("**"): y -= 0.3 * inch _, y = self.draw_text(section, inch, y, 14) # Handle numbered lists elif section.startswith("* "): section = f"**{list_counter}.** {section[2:]}" list_counter += 1 _, y = self.draw_text(section, inch, y, 12, 14, 10) # Handle bullet points elif section.startswith(" + ") or section.startswith(" - "): section = f"{bullet} {section[4:]}" _, y = self.draw_text(section, inch, y, 12, 14, 20) # Handle normal text else: _, y = self.draw_text(section, inch, y) y -= 0.1 * inch # Add slight space between sections # Add final page number and save PDF self.add_page_number(page_num) self.canvas.save()
import os import json import re from pdf import PDFDocument def extract_metadata(response_content: str): metadata = [] paper_sections = response_content.split("## ")[1:] # Splitting by section titles for section in paper_sections: lines = section.strip().split("\n") title = lines[0].strip() # Skip non-research sections if title.lower() in ["conclusion", "references"]: continue authors = re.search(r"\*\*Authors\*\*: (.+)", section) publication_date = re.search(r"\*\*Publication Date\*\*: (.+)", section) keywords = re.search(r"\*\*Keywords\*\*: (.+)", section) source_link = re.search(r"(http[s]?://[^\s]+)", section) metadata.append({ "title": title, "authors": authors.group(1) if authors else "", "publication_date": publication_date.group(1) if publication_date else "", "keywords": [kw.strip() for kw in keywords.group(1).split(",")] if keywords else [], "source_link": source_link.group(1) if source_link else "" }) return metadata # Save paper metadata as JSON def save_paper_metadata(topic, papers): PAPERS_DIR = "../papers_metadata" os.makedirs(PAPERS_DIR, exist_ok=True) file_path = os.path.join(PAPERS_DIR, f"{topic}_papers.json") with open(file_path, "w") as f: json.dump(papers, f, indent=4) return file_path def generate_pdf(topic, response): PDF_DIR = "../literature_reviews" os.makedirs(PDF_DIR, exist_ok=True) pdf_path = os.path.join(PDF_DIR, f"{topic.replace(' ', '_')}_literature_review.pdf") print("Generating PDF...") # Initialize PDFDocument pdf = PDFDocument(pdf_path, f"Literature Review: {topic}") # Generate PDF content using the class method pdf.create_pdf(topic, response) print(f"PDF successfully generated: {pdf_path}") return pdf_path
We are using the Agent class from the agno framework. It uses a model (LLM) and a tool (ArxivTools) to accomplish its task. The LLM is accessed through the Together API support provided in the framework, and ArxivTools comes from the framework's built-in toolkits for searching and fetching research papers from arXiv. The description and instructions provided to the agent are very important here: a comprehensive set of instructions helps the agent autonomously decide which course of action to take to achieve its goal. First, it uses ArxivTools to search for the papers and retrieve their content, including metadata and summaries; then it uses those summaries to generate a comprehensive literature review.
from agno.agent import Agent, RunResponse
from agno.models.together import Together
from agno.tools.arxiv import ArxivTools
from agno.utils.pprint import pprint_run_response
from dotenv import load_dotenv
import os

load_dotenv()

# Define the research agent
research_agent = Agent(
    name="research-agent",
    model=Together(
        id="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
        api_key=os.getenv("TOGETHER_API_KEY")),
    tools=[ArxivTools()],
    description='''
        You are a research agent that will fetch papers from arXiv based on a given topic and maximum number of papers.
        You will generate a comprehensive literature review of these papers.
    ''',
    instructions=[
        "Search for research papers on the given topic from arXiv.",
        "If the number of papers is not specified, fetch 5 by default.",
        "Process papers one at a time to reduce token usage.",
        "For each paper, extract the following metadata: title, abstract, authors, publication date, keywords, and source link.",
        "Limit extracted text to at most 2000 tokens per paper.",
        "Generate a concise summary (max 300 tokens) for each paper.",
        "After all papers are processed, generate a literature review using only stored summaries.",
        "The final response should include the literature review of each research paper in a separate paragraph.",
        "Each paper review should start with the number and title of the paper as a subheading.",
        "The metadata of each paper should be displayed after the title.",
        "Follow the following format for the metadata of each paper in normal text displaying each metadata item on a new line:",
        "**Authors**: [Authors of the Paper]",
        "**Publication Date**: [Date of Publication]",
        "**Keywords**: [Keywords of the Paper]",
        "Follow the following format for the review of each paper in italic text:",
        "**Review**: *[Literature review of the Paper]*",
        "Do not include a Literature Review heading in the response.",
        "Include a conclusion section providing a combined summary of all papers.",
        "Include a references section with citations and source links at the end of the literature review.",
        "Include numbers for each paper in front of their titles.",
        "Generate the literature review by processing the summaries of the papers in smaller chunks instead of one long response."
    ],
    markdown=True,
    show_tool_calls=True,
    debug_mode=True
)
Here is a demonstration of using the simple research agent.
cd backend
uvicorn main:app --reload
cd frontend
streamlit run app.py
Access the application through the following link.
http://localhost:8501
Let's build a multi-agent collaborative system by enhancing the simple research agent built previously. This system finds research papers on a given topic, extracts their metadata, summarizes them using a summarization agent, and generates a comprehensive literature review using a review generation agent. The two agents work in an agentic workflow, as the review generation agent depends on the response from the summarization agent.
research-copilot/                 # Main project folder
│── frontend/                     # Frontend directory
│   ├── app.py                    # Streamlit application
│
│── backend/                      # Backend directory
│   ├── main.py                   # FastAPI backend
│   ├── pdf_from_json.py          # Converts JSON output to PDF
│   ├── utils.py                  # Utility functions
│   │
│   ├── agents/                   # Agents folder
│   │   ├── summarization_agent.py
│   │   ├── review_generation_agent.py
│   │
│   ├── tools/                    # Tools folder
│   │   ├── paper_download_tool.py
│   │
│   ├── workflows/                # Workflows folder
│   │   ├── research_workflow.py
│
│── requirements.txt              # Dependencies
│── .env                          # Environment variables
│── LICENSE                       # License file
│── README.md                     # Documentation
Create a new virtual environment and install the dependencies using the requirements.txt file.
streamlit
uvicorn
fastapi
agno
requests
arxiv
pypdf
sqlalchemy
reportlab
html2text
pymupdf
TOGETHER_API_KEY="your_api_key_here"
The frontend of the research copilot looks like this:
User Input: User enters a research topic and the number of papers to fetch from arXiv.
Output: Literature review of the fetched papers in JSON and PDF format.
app.py
This is the main entry point of the application built in Streamlit. It uses requests to interact with the FastAPI endpoints at the backend.
import streamlit as st import requests import json # Set Streamlit page configurations st.set_page_config(page_title="AI Research Copilot", layout="wide") # Initialize session state variables to maintain user input and results if "result_json" not in st.session_state: st.session_state.result_json = None if "research_topic" not in st.session_state: st.session_state.research_topic = "" if "max_papers" not in st.session_state: st.session_state.max_papers = 5 if "selected_paper" not in st.session_state: st.session_state.selected_paper = None if "chat_history" not in st.session_state: st.session_state.chat_history = [] # Define API URL for backend communication API_URL = "http://127.0.0.1:8000" # Sidebar UI for user inputs and controls with st.sidebar: st.image("../images/research_copilot_logo.png", width=150) # Display logo st.subheader("Search and Generate Literature Reviews") st.header("🔍 Search Papers") # Input fields for research topic and number of papers topic = st.text_input("Enter Research Topic:", key="topic_input", value=st.session_state.research_topic) max_papers = st.number_input("Number of Papers to Fetch:", min_value=1, max_value=10, value=st.session_state.max_papers, key="papers_input") col_btn1, col_btn2 = st.columns([2, 1]) # Layout for buttons with col_btn1: # Fetch papers button if st.button("Fetch Papers & Generate Review"): if topic: st.session_state.research_topic = topic try: with st.spinner("Fetching papers from arXiv..."): response = requests.post(API_URL+"/fetch_papers/", json={"topic": topic, "max_papers": max_papers}) if response.status_code != 200: st.error("Error: Invalid response from server.") st.stop() # Parse response JSON result_decoded = response.content.decode("utf-8") st.session_state.result_json = json.loads(result_decoded) except json.JSONDecodeError: st.error("Error: Received an unreadable response from the server.") st.stop() except requests.exceptions.RequestException: st.error("Error: Could not connect to the server.") st.stop() with col_btn2: # Refresh button to clear results and reset inputs if st.button("🔄 Refresh"): st.session_state.result_json = None st.session_state.research_topic = "" st.session_state.max_papers = 5 st.rerun() # Display main application title st.title("AI Research Copilot") # Show instructions if no results are available if st.session_state.result_json is None: st.info("Enter a research topic and click 'Fetch Papers & Generate Review' to get started.") # Display results if available if st.session_state.result_json: research_topic = st.session_state.research_topic.title() # Capitalize topic words st.header(f"📖 Literature Review: {research_topic}") if "pdf_path" in st.session_state.result_json: pdf_path = st.session_state.result_json["pdf_path"] filename = f"{topic.replace(' ', '_')}_literature_review.pdf" # Layout for success message and download button col_success, col_download = st.columns([5, 1]) with col_success: st.success("Literature Review Generated!") with col_download: st.download_button(label="📥 Download", data=open(pdf_path, "rb"), file_name=filename, mime="application/pdf") # Extract and display literature review details response = st.session_state.result_json.get("response") data = json.loads(response) papers = data.get("papers", []) conclusion = data.get("conclusion", "") references = data.get("references", []) if papers: # Iterate through each retrieved paper and display details for i, paper in enumerate(papers): with st.container(): st.subheader(f"📄 {i+1}. 
{paper['title']}") st.markdown(f"**🖊️ Authors:** {paper['authors']}") st.markdown(f"**📅 Publication Date:** {paper['publication_date'][:10]}") st.markdown(f"**🔑 Keywords:** {paper['keywords']}") st.markdown("**📌 Summary:**") st.write(paper["summary"]) st.markdown("**📝 Review:**") st.write(paper["review"]) st.divider() # Display conclusion section st.subheader("🧐 Conclusion") st.write(conclusion) # Display references section st.subheader("📚 References") for ref in references: st.markdown(f"- {ref}") else: st.error("No papers found!")
This is the FastAPI backend of the application. It has the following endpoint:
fetch_papers - This endpoint fetches research papers on a given topic from arXiv, extracts and saves their metadata, and generates a comprehensive literature review of the papers using the research workflow. It saves the generated literature review in PDF format and returns the response in JSON format to be displayed on the frontend.
from fastapi import FastAPI from pydantic import BaseModel from agents.chat_agent import chat_agent from workflows.research_workflow import research_workflow from agno.agent import RunResponse from knowledge.knowledge_base import pdf_knowledge_base from utils import extract_metadata, save_paper_metadata, generate_pdf import os # Initialize FastAPI application app = FastAPI() # Define request model for research paper fetching class ResearchRequest(BaseModel): topic: str # Research topic to fetch papers for max_papers: int = 5 # Default number of papers to fetch @app.post("/fetch_papers/") async def fetch_papers(request: ResearchRequest): """ Endpoint to fetch research papers and generate a literature review. Executes the research workflow and processes the retrieved papers. """ print("Executing workflow...") # Run the research workflow to fetch relevant papers response: RunResponse = research_workflow.run(topic=request.topic, max_papers=request.max_papers) if response: # Extract metadata from the response metadata = extract_metadata(response.content) metadata_file = save_paper_metadata(request.topic, metadata) # Generate and save the literature review PDF generate_pdf(request.topic, response.content) pdf_path = os.path.join("../literature_reviews", f"{request.topic.replace(' ', '_')}_literature_review.pdf") return { "message": "Literature review generated!", "pdf_path": pdf_path, "response": response.content } else: return {"error": "No papers found"}
from reportlab.lib.pagesizes import A4 from reportlab.lib import colors from reportlab.lib.styles import getSampleStyleSheet, ParagraphStyle from reportlab.platypus import SimpleDocTemplate, Paragraph, Spacer, Table, TableStyle def generate_pdf_from_json(topic, json_data, output_filename="research_report.pdf"): """ Generate a PDF from a JSON response containing research papers, a conclusion, and references. Args: json_data (dict): JSON response with papers, conclusion, and references. output_filename (str): Name of the output PDF file. """ doc = SimpleDocTemplate(output_filename, pagesize=A4) elements = [] # Define styles with Unicode font styles = getSampleStyleSheet() title_style = ParagraphStyle(name="Title", fontName="Helvetica-Bold", fontSize=16, spaceAfter=10) heading_style = ParagraphStyle(name="Heading2", fontName="Helvetica", fontSize=14, spaceAfter=8, textColor=colors.darkblue) body_style = ParagraphStyle(name="BodyText", fontName="Helvetica", fontSize=12, spaceAfter=6) bold_style = ParagraphStyle(name="Bold", parent=body_style, fontName="Helvetica-Bold") # Add Title elements.append(Paragraph(f"Literature Review: {topic}", title_style)) elements.append(Spacer(1, 12)) # Process Papers Section if "papers" in json_data: elements.append(Paragraph("Papers", heading_style)) elements.append(Spacer(1, 6)) for idx, paper in enumerate(json_data["papers"], 1): elements.append(Paragraph(f"{idx}. {paper['title']}", bold_style)) elements.append(Paragraph(f"Authors: {paper['authors']}", body_style)) elements.append(Paragraph(f"Publication Date: {paper['publication_date'][:10]}", body_style)) elements.append(Paragraph(f"Keywords: {', '.join(paper['keywords'])}", body_style)) elements.append(Paragraph(f"Source: <a href='{paper['source_link']}'>{paper['source_link']}</a>", body_style)) elements.append(Spacer(1, 6)) # Add Abstract # elements.append(Paragraph("Abstract:", bold_style)) # elements.append(Paragraph(paper["abstract"], body_style)) # elements.append(Spacer(1, 6)) # Add Summary elements.append(Paragraph("Summary:", bold_style)) elements.append(Paragraph(paper["summary"], body_style)) elements.append(Spacer(1, 12)) # Add Review (if available) if "review" in paper and paper["review"]: elements.append(Paragraph("Review:", bold_style)) elements.append(Paragraph(paper["review"], body_style)) elements.append(Spacer(1, 12)) # Conclusion Section if "conclusion" in json_data: elements.append(Paragraph("Conclusion", heading_style)) elements.append(Spacer(1, 6)) elements.append(Paragraph(json_data["conclusion"], body_style)) elements.append(Spacer(1, 12)) # References Section if "references" in json_data: elements.append(Paragraph("References", heading_style)) elements.append(Spacer(1, 6)) # Format references as a table ref_data = [[f"{idx}. {ref}"] for idx, ref in enumerate(json_data["references"], 1)] ref_table = Table(ref_data, colWidths=[500]) # Style the table ref_table.setStyle(TableStyle([ ("TEXTCOLOR", (0, 0), (-1, -1), colors.black), ("ALIGN", (0, 0), (-1, -1), "LEFT"), ("FONTNAME", (0, 0), (-1, -1), "Helvetica"), # ✅ Change to built-in font ("FONTSIZE", (0, 0), (-1, -1), 10), ("BOTTOMPADDING", (0, 0), (-1, -1), 6), ("GRID", (0, 0), (-1, -1), 1, colors.black), ])) elements.append(ref_table) # Build PDF doc.build(elements) print(f"✅ PDF generated successfully: {output_filename}")
These are utility functions to extract and save metadata and generate PDF.
import os import json import re from pdf_from_json import generate_pdf_from_json def extract_metadata(response_content: str): metadata = [] paper_sections = response_content.split("## ")[1:] # Splitting by section titles for section in paper_sections: lines = section.strip().split("\n") title = lines[0].strip() # Skip non-research sections if title.lower() in ["conclusion", "references"]: continue authors = re.search(r"\*\*Authors\*\*: (.+)", section) publication_date = re.search(r"\*\*Publication Date\*\*: (.+)", section) keywords = re.search(r"\*\*Keywords\*\*: (.+)", section) journal = re.search(r"\*\*Journal\*\*: (.+)", section) review_match = re.search(r"\*\*Review\*\*: \*(.+?)\*", section, re.DOTALL) source_link = re.search(r"(http[s]?://[^\s]+)", section) metadata.append({ "title": title, "authors": authors.group(1) if authors else "", "publication_date": publication_date.group(1) if publication_date else "", "keywords": [kw.strip() for kw in keywords.group(1).split(",")] if keywords else [], "journal": journal.group(1) if journal else "", "source_link": source_link.group(1) if source_link else "" }) return metadata # Save paper metadata as JSON def save_paper_metadata(topic, papers): PAPERS_DIR = "../papers_metadata" os.makedirs(PAPERS_DIR, exist_ok=True) file_path = os.path.join(PAPERS_DIR, f"{topic}_papers.json") with open(file_path, "w") as f: json.dump(papers, f, indent=4) return file_path def generate_pdf(topic, response): PDF_DIR = "../literature_reviews" os.makedirs(PDF_DIR, exist_ok=True) pdf_path = os.path.join(PDF_DIR, f"{topic.replace(' ', '_')}_literature_review.pdf") print("Generating PDF...") # Generate PDF from JSON json_response = json.loads(response) generate_pdf_from_json(topic, json_response, pdf_path) print(f"PDF successfully generated: {pdf_path}") return pdf_path
This tool helps download the research papers from arXiv into a local directory for further reference and processing.
import requests import os def download_arxiv_papers(topic, links): """ Downloads PDFs from given arXiv source links and saves them locally. Args: topic (str): Topic of the papers (used as the directory name). links (list): List of arXiv paper URLs (e.g., "https://arxiv.org/abs/2403.12345"). Returns: list: A list of dictionaries containing 'link', 'file_path' (or 'error' if any). """ save_dir = f"downloaded_papers/{topic}" if not os.path.exists(save_dir): os.makedirs(save_dir) # Create directory if it doesn't exist results = [] for link in links: result_entry = {"link": link} # Initialize entry with the link try: # Convert abstract page URL to PDF URL if "arxiv.org/abs/" in link: pdf_url = link.replace("arxiv.org/abs/", "arxiv.org/pdf/") + ".pdf" elif "arxiv.org/pdf/" in link and not link.endswith(".pdf"): pdf_url = link + ".pdf" else: pdf_url = link # Assume it's already a direct PDF link paper_id = pdf_url.split("/")[-1] # Extract paper ID file_path = os.path.join(save_dir, f"{paper_id}") # Download the PDF response = requests.get(pdf_url, stream=True) response.raise_for_status() # Raise error for failed requests with open(file_path, "wb") as file: for chunk in response.iter_content(chunk_size=1024): file.write(chunk) result_entry["file_path"] = file_path # Store success result except requests.exceptions.RequestException as e: result_entry["error"] = str(e) # Store error message results.append(result_entry) # Append the result entry to the list return results
This is the summarization agent which fetches papers from arXiv based on a given topic and maximum number of papers. It extracts the metadata, summary, and a list of source links of the papers and downloads them in PDF format into a local directory.
import os

from agno.agent import Agent, RunResponse
from agno.models.together import Together
from agno.tools.arxiv import ArxivTools
from dotenv import load_dotenv

from tools.paper_download_tool import download_arxiv_papers

load_dotenv()

# Define the summarization agent
summarization_agent = Agent(
    name="summarization-agent",
    model=Together(
        id="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
        api_key=os.getenv("TOGETHER_API_KEY"),
    ),
    tools=[ArxivTools(), download_arxiv_papers],
    role="Download papers from arXiv and save them for further processing.",
    description='''
    You are a summarization agent that will fetch papers from arXiv based on a given topic
    and maximum number of papers. You will extract the metadata, summary, and a list of
    source links of the papers and download them in PDF format.
    ''',
    instructions=[
        "Search for the latest and most relevant research papers on the given topic from arXiv.",
        "If the number of papers is not specified, fetch 5 by default.",
        "Extract the metadata, summary, and a list of source links of the papers for downloading.",
        "For each paper, extract the following metadata: title, abstract, authors, publication date, keywords, and source link.",
        "Download the papers in PDF format for processing by other agents.",
        "Your response should include a list of the extracted papers including the following information for each paper:",
        "1. Metadata: Title, Authors, Abstract, Publication Date, Keywords, Source Link",
        "2. Summary: Abstract or a concise summary of the paper",
        "3. Paths to the downloaded papers for further processing",
        "The information for each paper must be included in the following format:",
        "Title: [Title of the Paper]",
        "Authors: [Authors of the Paper]",
        "Abstract: [Abstract of the Paper]",
        "Publication Date: [Date of Publication]",
        "Keywords: [Keywords of the Paper]",
        "Source Link: [Source Link of the Paper]",
        "Summary: [Summary of the Paper]",
        "PDF Path: [Path to the Downloaded Paper]",
    ],
    markdown=True,
    show_tool_calls=True,
    debug_mode=True,
)
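For quick standalone testing of this agent outside the full workflow, here is a minimal sketch; the topic in the prompt is a placeholder.

# Hypothetical standalone test of the summarization agent; the topic in the prompt is a placeholder.
response: RunResponse = summarization_agent.run(
    "Search for 3 most relevant papers on agentic AI."
)
print(response.content)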
This is the review generation agent which generates a comprehensive literature review of the research papers using the provided metadata and summaries of papers generated by the summarization agent. It saves the literature review in PDF format for downloading.
import os

from agno.agent import Agent
from agno.models.together import Together
from dotenv import load_dotenv

load_dotenv()

# Define the review generation agent
review_generation_agent = Agent(
    name="review-generation-agent",
    model=Together(
        id="meta-llama/Llama-3.3-70B-Instruct-Turbo-Free",
        api_key=os.getenv("TOGETHER_API_KEY"),
    ),
    description='''
    You are a review generation agent that will generate a comprehensive literature review
    of research papers using the provided metadata and summaries of papers generated by the
    Summarization Agent.
    ''',
    instructions=[
        "Generate a literature review using the metadata, abstracts, and summaries of the papers from the Summarization Agent.",
        "The final response should include the literature review of each research paper in a separate paragraph.",
        "Each paper review should start with the title of the paper as a subheading, along with the metadata of that paper at the beginning.",
        "Follow this format for the metadata of each paper in normal text, displaying each metadata item on a new line:",
        "**Authors**: [Authors of the Paper]",
        "**Publication Date**: [Date of Publication]",
        "**Keywords**: [Keywords of the Paper]",
        "**PDF Path**: [Path to the Downloaded Paper]",
        "Display each item of the metadata on a new line.",
        "Follow this format for the review of each paper in italic text:",
        "**Review**: *[Literature review of the Paper]*",
        "Do not include a Literature Review heading in the response.",
        "Include a conclusion section providing a combined summary of all papers.",
        "Include a references section at the end of the literature review which includes a list of citations with source links.",
        "Generate the literature review by processing the summaries of the papers in smaller chunks instead of one long response.",
        "Important: Your final response must be in JSON format with the following structure:",
        "{",
        "  'papers': [",
        "    {",
        "      'title': 'Title of the Paper',",
        "      'authors': 'Authors of the Paper',",
        "      'abstract': 'Abstract of the Paper',",
        "      'publication_date': 'Date of Publication',",
        "      'keywords': 'Keywords of the Paper',",
        "      'source_link': 'Source Link of the Paper',",
        "      'summary': 'Summary of the Paper',",
        "      'review': 'Literature Review of the Paper',",
        "      'pdf_path': 'Path to the Downloaded Paper'",
        "    },",
        "    {...}",
        "  ],",
        "  'conclusion': 'Combined summary of all papers',",
        "  'references': 'List of citations with source links'",
        "}",
        "The above JSON response must include all papers with the required information, conclusion, and references.",
        "The response MUST be in proper JSON format with keys and values in double quotes.",
        "The final response MUST NOT include anything else other than the JSON response.",
    ],
    markdown=True,
    show_tool_calls=True,
    debug_mode=True,
)
This is the agentic workflow of the multi-agent collaborative system. It runs the summarization agent and the review generation agent step by step, since the input of one agent depends on the output of the other. It is an end-to-end workflow that lets the agents collaborate effectively to achieve their shared goal.
from agno.agent import Agent
from agno.workflow import Workflow, RunResponse, RunEvent
from agno.storage.workflow.sqlite import SqliteWorkflowStorage
from agno.utils.pprint import pprint_run_response
from agno.utils.log import logger

from agents.summarization_agent import summarization_agent
from agents.review_generation_agent import review_generation_agent


# Define the ResearchCopilot workflow
class ResearchCopilot(Workflow):
    summarization_agent: Agent = summarization_agent
    review_generation_agent: Agent = review_generation_agent

    def run(self, topic: str, max_papers: int = 5) -> RunResponse:
        """
        Executes the research workflow:
        1. Searches for research papers related to the topic.
        2. Generates a literature review based on the extracted papers.
        """
        logger.info(f"Generating a literature review of {max_papers} research papers from arXiv on: {topic}")

        # Step 1: Search arXiv for research papers on the topic and summarize them
        extracted_papers = self.get_extracted_papers(topic, max_papers)

        # If no extracted papers are found for the topic, end the workflow
        if extracted_papers is None:
            return RunResponse(
                event=RunEvent.workflow_completed,
                content=f"Sorry, could not find any research papers on the topic: {topic}",
            )

        print("Extracted papers:", extracted_papers)

        # Step 2: Generate a literature review of the extracted papers
        literature_review: RunResponse = self.review_generation_agent.run(extracted_papers)
        if literature_review is None:
            return RunResponse(
                event=RunEvent.workflow_completed,
                content="Sorry, could not generate a literature review of the research papers.",
            )

        print("Literature review:", literature_review.content)

        return RunResponse(
            event=RunEvent.workflow_completed,
            content=literature_review.content,
        )

    def get_extracted_papers(self, topic: str, max_papers: int):
        """Get the search results for a topic."""
        MAX_ATTEMPTS = 3

        for attempt in range(MAX_ATTEMPTS):
            try:
                prompt = f"Search for {max_papers} most relevant papers on {topic}."
                summarizer_response: RunResponse = self.summarization_agent.run(prompt)

                # Validate the response content
                if not summarizer_response or not summarizer_response.content:
                    logger.warning(f"Attempt {attempt + 1}/{MAX_ATTEMPTS}: Empty searcher response")
                    continue

                logger.info(f"Found papers on the topic {topic} in attempt {attempt + 1}")
                print("Extracted papers:", summarizer_response.content)
                return summarizer_response.content

            except Exception as e:
                logger.warning(f"Attempt {attempt + 1}/{MAX_ATTEMPTS} failed: {str(e)}")

        logger.error(f"Failed to get extracted papers after {MAX_ATTEMPTS} attempts")
        return None


# Initialize the Research Copilot workflow with SQLite storage
research_workflow = ResearchCopilot(
    session_id="generate-literature-review",
    storage=SqliteWorkflowStorage(
        table_name="generate_literature_review_workflows",
        db_file="workflows/db/workflows.db",
    ),
)
This is a demonstration of using the Research Copilot. Start the backend and the frontend with the following commands.
cd backend
uvicorn main:app --reload
cd frontend
streamlit run app.py
Access the application through the following link.
http://localhost:8501
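For quick testing without the web UI, the workflow can also be invoked directly from Python. The sketch below is based on the ResearchCopilot.run signature shown earlier; the topic and max_papers values are placeholders, and it assumes the research_workflow instance from the workflow module is in scope or importable in your script.

# Minimal sketch: run the end-to-end workflow directly (placeholder topic and paper count).
# Assumes `research_workflow` from the workflow module above is in scope or importable.
response = research_workflow.run(topic="multi-agent systems", max_papers=3)

# Print the final literature review (a JSON string) returned by the workflow.
print(response.content)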
Agents are not magic; they are complex software systems powered by LLMs. Here are a few tips from my personal experience for building effective agents:
AI agents are set to transform how we live and work, with an impact across every domain, including local businesses. By taking over routine tasks, they free business owners and teams to focus on growth, innovation, and the work that matters most.
Powerful agentic AI systems have become possible thanks to recent advances in LLMs and their reasoning capabilities. The future of AI agents looks bright, with multi-agent collaborative systems increasingly being incorporated into business processes.
This article serves as a starting point for anyone who wants to understand AI agents from the ground up, along with real-world use cases and implementation. To further enhance your understanding and take it to the next level, consider:
AI agents are among the most significant innovations in artificial intelligence to date. They are software applications that use large language models (LLMs) to autonomously perform specific tasks, from answering research questions to handling backend services. They work toward a specific goal in an optimized way, making our lives easier in the process. AI agents are especially useful for tasks that demand complex decision-making, autonomy, and adaptability. LLMs, tools, and orchestration are the core components of an agentic architecture: agents leverage LLMs to understand goals, generate tasks, and complete them efficiently. Examples of AI agents include coding agents, customer service chatbots, and personalized learning management systems.