The Agentic Research Abstract Generator and Web Content Summariser Agent With LangGraph

Project Description

This project presents an agentic AI system designed to support research workflows by either generating structured research abstracts or summarizing the content of online webpages. Built with LangGraph and integrated with HuggingFace models, Selenium, and LangSmith, this assistant leverages a modular agent architecture that enables easy experimentation, task routing, and feedback loops.

The two primary functionalities, research abstract generation and webpage summarization, are orchestrated through separate but interlinked LangGraph flows. The system’s design emphasises the following:

Reproducibility
Modularity
Usability

This makes it ideal for students, researchers, and AI developers interested in agent-based architectures.

Aims and Objectives

AIM
To build an interactive agentic system that automates the summarization of online research content and the generation of domain-specific abstracts, using LangGraph as the core orchestrator and HuggingFace LLMs as reasoning engines.

OBJECTIVES

Implement modular agents using LangGraph.
Enable URL-based summarization via Selenium.
Generate structured, relevant abstracts with feedback refinement.
Use LangSmith for tracing the flow of the agents.
Provide a reproducible, interpretable, and easy-to-extend environment.

Methodology

Figure: Agentic Research Assistant AI.png

Abstract Generation Pipeline

A user begins by inputting a research title and selecting a category. The system then routes this input through two LangGraph agents. The Writer agent is responsible for generating an initial abstract based on the user’s prompt. This abstract is then passed to the Critic agent, which evaluates the draft’s relevance and coherence. If the abstract is accepted, the process ends; if it is rejected, the Writer is prompted to regenerate a revised version. This loop continues until the Critic approves the abstract, effectively simulating a writer–reviewer interaction found in academic publishing.
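The Writer/Critic loop described above can be sketched in plain Python as follows. This is a minimal, framework-free illustration rather than the project's code: `writer_critic_loop`, `toy_writer`, and `toy_critic` are hypothetical names, both agents are stubbed so the control flow runs without a model, and the `max_rounds` cap is an added assumption that stops a Critic which never approves from looping forever. In a LangGraph graph, the accept/revise decision would typically sit in a routing function registered on the Critic node with `add_conditional_edges`, with one branch to `END` and one back to the Writer.

```python
from typing import Callable, List, Optional, Tuple

# writer(category, title, feedback) -> draft abstract; feedback is None on the first pass
WriterFn = Callable[[str, str, Optional[str]], str]
# critic(draft) -> (accepted?, feedback to send back to the Writer)
CriticFn = Callable[[str], Tuple[bool, str]]


def writer_critic_loop(
    category: str,
    title: str,
    writer: WriterFn,
    critic: CriticFn,
    max_rounds: int = 5,
) -> Tuple[str, int]:
    """Alternate Writer and Critic until the Critic accepts a draft.

    max_rounds is a safety cap added in this sketch (not part of the described
    design) so that a Critic which never approves cannot loop forever.
    Returns the last draft and the number of rounds used.
    """
    feedback: Optional[str] = None
    draft = ""
    for round_no in range(1, max_rounds + 1):
        draft = writer(category, title, feedback)  # Writer step
        accepted, feedback = critic(draft)         # Critic step
        if accepted:                               # accepted: stop here
            return draft, round_no
        # rejected: loop back to the Writer, passing along the Critic's feedback
    return draft, max_rounds


# Toy stand-ins (no model calls): the Critic approves the third draft.
calls: List[Optional[str]] = []


def toy_writer(category: str, title: str, feedback: Optional[str]) -> str:
    calls.append(feedback)
    return f"[{category}] {title} (draft {len(calls)})"


def toy_critic(draft: str) -> Tuple[bool, str]:
    return "(draft 3)" in draft, "Make the contribution more explicit."


abstract, rounds = writer_critic_loop("Healthcare", "Sepsis prediction", toy_writer, toy_critic)
print(abstract, rounds)  # [Healthcare] Sepsis prediction (draft 3) 3
```

Passing the Critic's feedback back into the Writer is what turns a simple retry into a refinement loop: each new draft can address the specific objection that caused the rejection.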

Web Content Summarization Pipeline

The second workflow focuses on web summarization. When a user provides a URL, the input is first processed by a Search agent that validates and cleans the link. The Loader agent then uses Selenium to retrieve the webpage content, limiting the total length to approximately 32,000 tokens to ensure efficient summarization. Finally, the Summarizer agent, powered by a HuggingFace model, condenses the extracted content into a concise summary that captures the key insights of the page.
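The cleaning and length-limiting steps can be illustrated with two small helpers. Both are hypothetical (`clean_url` and `truncate_to_token_budget` are not taken from the project), and the truncation uses whitespace-separated words as a crude stand-in for tokens; the summariser model's own tokenizer will usually count more tokens than words, so the 32,000 figure should be treated as approximate.

```python
from urllib.parse import urlsplit, urlunsplit


def clean_url(raw: str) -> str:
    """Validate and normalise a user-supplied URL (hypothetical Search-agent helper).

    Strips surrounding whitespace, assumes https:// when no scheme is given,
    drops the #fragment, and rejects anything that is not an http(s) URL with a host.
    """
    url = raw.strip()
    if "://" not in url:
        url = "https://" + url
    parts = urlsplit(url)
    if (
        parts.scheme.lower() not in ("http", "https")
        or not parts.netloc
        or any(ch.isspace() for ch in parts.netloc)
    ):
        raise ValueError(f"Not a valid web URL: {raw!r}")
    # Normalise the case of the scheme and network location; keep path and query as-is.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, parts.query, ""))


def truncate_to_token_budget(text: str, max_tokens: int = 32_000) -> str:
    """Cap page text before summarisation, counting whitespace-separated words as tokens.

    Text within the budget is returned unchanged; longer text is cut to the
    first max_tokens words, joined with single spaces.
    """
    words = text.split()
    if len(words) <= max_tokens:
        return text
    return " ".join(words[:max_tokens])


print(clean_url("  Example.org/article?id=7#refs "))                 # https://example.org/article?id=7
print(truncate_to_token_budget("one two three four", max_tokens=2))  # one two
```

Rejecting malformed input in the Search step means the Selenium-based Loader only ever receives well-formed http(s) URLs.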

These two flows are encapsulated in a shared state and routed using a central manager node defined in main.py. Users can interactively select which workflow they wish to run, and LangGraph handles the orchestration of the agents in a visually traceable manner.
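A central manager node of this kind reduces to a dispatch on a field of the shared state. The sketch below shows the idea with a plain dictionary; `manager`, `FLOWS`, the `task`/`title`/`url` keys, and the two placeholder flow functions are all hypothetical. In LangGraph, the same choice is usually expressed as a routing function passed to `add_conditional_edges`, whose return value names the next node to run.

```python
from typing import Any, Callable, Dict

State = Dict[str, Any]


def run_abstract_flow(state: State) -> State:
    # Placeholder for invoking the compiled abstract-generation graph.
    return {**state, "result": f"abstract for {state['title']!r}"}


def run_web_flow(state: State) -> State:
    # Placeholder for invoking the compiled web-summarisation graph.
    return {**state, "result": f"summary of {state['url']}"}


# Dispatch table: task name -> workflow entry point (names are illustrative).
FLOWS: Dict[str, Callable[[State], State]] = {
    "abstract": run_abstract_flow,
    "summarize": run_web_flow,
}


def manager(state: State) -> State:
    """Route the shared state to the workflow named in state['task']."""
    task = state.get("task")
    if task not in FLOWS:
        raise ValueError(f"Unknown task {task!r}; expected one of {sorted(FLOWS)}")
    return FLOWS[task](state)


print(manager({"task": "summarize", "url": "https://example.org"})["result"])
# summary of https://example.org
```

Each flow returns a new dictionary rather than mutating its input, so the caller's original state is left untouched.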

Code Structure and Key Components

The project follows a modular codebase that facilitates maintainability and experimentation. The graph_article folder contains the abstract generation components, including the Writer and Critic agents and the graph controller. Similarly, the graph_web folder contains the summarization logic, divided into the Search, Loader, and Summarizer agents. A shared.py module defines a Pydantic-based state object that both graphs reference, ensuring consistent data handling.
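The shared state can be pictured as a single record holding the fields both flows need. The project defines it with Pydantic in shared.py. The sketch below uses a standard-library dataclass instead so it runs without extra dependencies (a Pydantic `BaseModel` with the same fields would serve the same purpose), and every field name is a hypothetical illustration rather than the actual contents of shared.py. `apply_update`, also hypothetical, mimics the way graph nodes return only the fields they change, and it records which node ran to help with debugging.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass(frozen=True)
class ResearchState:
    """Hypothetical shared state; field names are illustrative only."""
    task: str = ""                       # "abstract" or "summarize"
    # Abstract-generation fields
    category: str = ""
    title: str = ""
    draft: str = ""
    critic_feedback: Optional[str] = None
    approved: bool = False
    # Web-summarisation fields
    url: str = ""
    page_text: str = ""
    summary: str = ""
    # Names of the nodes that have run, in order (useful when debugging loops)
    history: List[str] = field(default_factory=list)


def apply_update(state: ResearchState, node: str, **updates) -> ResearchState:
    """Return a new state with a node's partial update merged in and the node logged."""
    return replace(state, history=[*state.history, node], **updates)


s0 = ResearchState(task="abstract", category="Education", title="Peer tutoring at scale")
s1 = apply_update(s0, "writer", draft="First draft")
s2 = apply_update(s1, "critic", approved=True)
print(s2.history, s2.approved, s0.history)  # ['writer', 'critic'] True []
```

Freezing the dataclass makes every update produce a new object, so earlier states stay available for inspection when tracing a writer–critic loop.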

For tracing and observability, the system is integrated with LangSmith. Users can view detailed logs of the agent flows and monitor how decisions were made during abstract generation or summarization. A utility script is provided to visualize the LangGraph flows as images, which are automatically saved for documentation or debugging.
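The visualisation utility itself is not shown here. One way to produce such images is to call `get_graph().draw_mermaid_png()` on a compiled LangGraph graph, which returns PNG bytes. The helper below is a minimal sketch under that assumption: `save_graph_png` and the output paths are hypothetical, and because the default Mermaid rendering may go through an external web service, it may need network access.

```python
from pathlib import Path
from typing import Any


def save_graph_png(compiled_graph: Any, path: str) -> Path:
    """Render a compiled LangGraph graph to PNG and write it to `path`.

    Assumes `compiled_graph` exposes get_graph().draw_mermaid_png(), which
    returns the image as bytes. Parent directories are created if needed.
    """
    png_bytes: bytes = compiled_graph.get_graph().draw_mermaid_png()
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_bytes(png_bytes)
    return out


# Usage (hypothetical names), after building each graph in the project:
#   save_graph_png(article_graph, "graphs/abstract_flow.png")
#   save_graph_png(web_graph, "graphs/web_summary_flow.png")
```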

The entire environment is defined in environment.yml, and a clear README.md is included for setup instructions and usage notes. Two Jupyter notebooks (research_graph2.ipynb and research_graph3.ipynb) demonstrate usage of the abstract and summarization graphs independently, enabling quick testing and iteration.

Workflow Summary

Once the system is installed, users simply run the main.py script. The program launches an interactive session where users can choose whether to run the abstract generation or the webpage summarization flow. Depending on the selection, the appropriate LangGraph workflow is activated.
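The selection step at the start of the session can be sketched as a small prompt loop. `choose_flow` and `MENU` are hypothetical names, and the retry limit is an assumption. Taking the input function as a parameter keeps the prompt testable without a keyboard.

```python
from typing import Callable, Dict

# Hypothetical menu mapping keyboard choices to workflow names.
MENU: Dict[str, str] = {"1": "abstract", "2": "summarize"}


def choose_flow(ask: Callable[[str], str] = input, max_attempts: int = 3) -> str:
    """Prompt until the user picks a valid option and return the workflow name.

    `ask` defaults to the built-in input(); passing a stub makes the function testable.
    """
    prompt = "Choose a workflow: [1] abstract generation, [2] webpage summarisation: "
    for _ in range(max_attempts):
        choice = ask(prompt).strip()
        if choice in MENU:
            return MENU[choice]
        print(f"{choice!r} is not a valid option, please enter 1 or 2.")
    raise SystemExit("No valid workflow selected; exiting.")


# In main.py this might be used as:
#   flow = choose_flow()   # then run the graph that matches `flow`
```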

In the abstract generation path, users are prompted to input a category (e.g., Artificial Intelligence, Healthcare, Education) and a custom title. The writer–critic loop then iteratively generates and evaluates abstracts until a suitable one is produced. In the web summarization path, users input a valid URL, and the system returns a summarized version of the page content after loading and processing it through a HuggingFace model.
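The category and title gathered in this path ultimately become the Writer's prompt, with the Critic's feedback appended on later rounds. The helper below is a hypothetical illustration of that assembly: `build_writer_prompt` is not from the project, and the prompt wording, including the list of abstract sections, is an assumption about what a "structured" abstract might cover.

```python
from typing import Optional


def build_writer_prompt(category: str, title: str, feedback: Optional[str] = None) -> str:
    """Assemble a hypothetical Writer prompt from the user's category and title.

    On revision rounds the Critic's feedback is appended so the model can address it.
    The section list in the instruction is an illustrative assumption.
    """
    category, title = category.strip(), title.strip()
    if not category or not title:
        raise ValueError("Both a category and a title are required.")
    lines = [
        f"You are writing a research abstract in the field of {category}.",
        f"Title: {title}",
        "Write one paragraph covering background, aim, methods, expected results and significance.",
    ]
    if feedback:
        lines.append(f"Revise the previous draft to address this reviewer feedback: {feedback}")
    return "\n".join(lines)


print(build_writer_prompt("Education", "Peer tutoring in large lecture courses"))
```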

LangSmith traces are automatically activated (if configured), offering transparency and debugging support. Visual graphs of each flow are saved and can be used to understand how data and decisions propagate through the agents.
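LangSmith tracing is normally switched on through environment variables: `LANGSMITH_TRACING=true` with `LANGSMITH_API_KEY` (and optionally `LANGSMITH_PROJECT`), or the older `LANGCHAIN_TRACING_V2`, `LANGCHAIN_API_KEY`, and `LANGCHAIN_PROJECT` names. The hypothetical check below reports whether those look configured, so the program can print a hint instead of silently running untraced. It is only a convenience check; the LangSmith SDK applies its own rules.

```python
import os
from typing import Mapping, Optional


def langsmith_tracing_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if LangSmith tracing appears to be switched on.

    Looks for a tracing flag set to "true" plus an API key, using either the
    newer LANGSMITH_* names or the older LANGCHAIN_* names.
    """
    env = os.environ if env is None else env
    flag = (env.get("LANGSMITH_TRACING") or env.get("LANGCHAIN_TRACING_V2") or "").strip().lower()
    api_key = env.get("LANGSMITH_API_KEY") or env.get("LANGCHAIN_API_KEY")
    return flag == "true" and bool(api_key)


# Example: print a hint at start-up instead of failing silently.
if not langsmith_tracing_enabled():
    print("LangSmith tracing is off; set LANGSMITH_TRACING=true and LANGSMITH_API_KEY to enable it.")
```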

Example Usage

When main.py is run, the user is prompted to choose between the abstract generation and webpage summarization workflows, and the selected graph is then executed.

Issues Faced

One of the primary technical challenges was coordinating multiple agents and state transitions within LangGraph. Ensuring that each agent could pass outputs in the correct structure, while also enabling looping (especially in the writer–critic process), required careful definition of the shared state and transitions.

Another practical limitation encountered during testing was the restriction imposed by the free HuggingFace inference tier. Model usage often exceeded the quota during development, suggesting that a paid plan or local hosting (e.g., via Ollama) may be necessary for long-term use or scalability. Additionally, URLs passed to the summarizer must be publicly accessible and not behind authentication; MDPI open-access research articles worked reliably.

Finally, although LangGraph provides robust agent orchestration, frameworks built specifically around multi-agent collaboration, such as CrewAI, may offer greater flexibility for similar future projects.

Conclusion

The Agentic Research Abstract Generator and Web Content Summariser demonstrates how agent-based AI systems can be constructed using LangGraph and LLMs to streamline core research tasks. Through a combination of interactive interfaces, modular agent design, and iterative refinement loops, the system provides both a functional tool and a reference architecture for future research assistant agents.

It shows how tools such as LangChain, HuggingFace, and LangSmith can be orchestrated into a cohesive workflow that is both transparent and extensible. Whether used as a base for academic support tools or a playground for experimenting with agentic reasoning, the project highlights the potential of graph-based AI orchestration in research automation.

License

This project is licensed under the MIT License. See the LICENSE file for more details.