Built with LangGraph, integrated with Cohere LLM, Jina embeddings, and Qdrant vector database, it is a multi-agent system that analyses an INGV dataset, generates comprehensive documentation, and provides intelligent question-answering capabilities.
The system is designed as a pipeline orchestrating seven collaborative agents:
| Agent | Responsibility |
|---|---|
| Dataset Analyzer | Inspects and structures the input dataset to build a foundational understanding. |
| Embedding Agent | Generates semantic embeddings used for similarity search and knowledge extraction. |
| Content Improver | Enhances the clarity, structure, and readability of the text. |
| Metadata Recommender | Proposes metadata, labels, and structural annotations for enriched context. |
| QA Agent | Answers analytical or data‑driven queries using processed information. |
| QA Reviewer | Validates, refines, and ensures quality of answers generated by other agents. |
| Article Generator | Produces final articles or reports synthesising all processed information. |

The architecture is multi-level: user interaction occurs through different interfaces, such as the CLI, Streamlit, or a REST API, each of which invokes LangGraph, which in turn coordinates the flow of operations across the collaborating agents. Some agents analyse the dataset and generate embeddings; others improve the content, suggest metadata, or handle question answering and answer review. Each agent has a specific role and contributes to transforming, enriching, or validating the information. To support this, the tool layer provides essential utilities such as file upload, data analysis, text splitting, and web search, which feed the agents with structured inputs. The architecture also integrates with external services, including the data files INGV makes available on its website and the Cohere LLM API, which add advanced semantic analysis. The entire system thus operates as a coordinated chain, with each layer collaborating to produce high-quality, intelligent outputs.
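The orchestration pattern described above can be illustrated with a minimal, self-contained sketch. This is plain Python rather than the project's actual LangGraph graph; the agent functions, state fields, and dataset shape are all illustrative, but the idea is the same: each agent reads and updates a shared state that flows through the pipeline.

```python
from typing import Callable

# Each "agent" is a function over a shared state dict; LangGraph models
# the same pattern as typed graph nodes connected by edges.
def dataset_analyzer(state: dict) -> dict:
    # Inspect the raw dataset and record a structural summary.
    rows = state["dataset"]
    state["summary"] = {"events": len(rows), "fields": sorted(rows[0])}
    return state

def qa_agent(state: dict) -> dict:
    # Answer a simple data-driven query from the analyzed state.
    state["answer"] = f"The dataset contains {state['summary']['events']} events."
    return state

def qa_reviewer(state: dict) -> dict:
    # Validate the answer before it would reach the Article Generator.
    state["approved"] = state["answer"].endswith("events.")
    return state

PIPELINE: list[Callable[[dict], dict]] = [dataset_analyzer, qa_agent, qa_reviewer]

def run(state: dict) -> dict:
    for agent in PIPELINE:
        state = agent(state)
    return state

result = run({"dataset": [
    {"magnitude": 3.2, "depth": 10.0, "lat": 42.1, "lon": 13.4},
    {"magnitude": 4.0, "depth": 22.5, "lat": 41.9, "lon": 13.1},
]})
print(result["answer"])  # The dataset contains 2 events.
```

In the real system, LangGraph replaces the hand-rolled `run` loop and adds conditional edges, retries, and checkpointing on top of this state-passing pattern.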
A lightweight Streamlit‑based interface abstracts the system’s internal complexity and presents agent capabilities through a guided, user‑friendly workflow. The UI allows users to provide inputs such as a dataset, and then follow the multi‑agent pipeline step by step. Each stage displays structured outputs, including intermediate embeddings, recommendations, improvements, and final generated content. The interface also provides validation feedback, error notifications, and clear status indicators, ensuring transparency throughout the pipeline. At the end of the workflow, the interface assembles all validated outputs into a coherent scientific‑style article generated by the Article Generator. This final report is presented in a clean, readable layout, ready for export or further review. Through this design, the UI opens the system to researchers, analysts, and non‑technical users alike, enabling advanced multi‑agent capabilities through a simple, guided experience.
Resilience improvements ensure the earthquake‑processing pipeline remains stable even when dealing with noisy data, missing parameters, or inconsistent INGV feeds. The system now applies backoff‑based retry logic when remote seismic data endpoints fail or return incomplete TXT payloads, preventing abrupt interruptions. Structured error propagation makes every failure explicit, enabling agents and the Streamlit UI to respond predictably rather than silently discarding information. Early‑exit detection stops the workflow immediately when essential seismic fields—such as magnitude, depth, or coordinates—are absent, avoiding the generation of misleading outputs. All user‑provided notes and textual inputs are sanitised to prevent malformed strings from contaminating downstream agents. Deterministic state management ensures reproducible behaviour across multiple runs, even under fluctuating network conditions or intermittent INGV availability. Together, these upgrades enable the system to gracefully handle real‑world seismological data and deliver reliable, scientifically valid results under a variety of environmental and data‑quality constraints.
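The two resilience mechanisms described above (backoff-based retries and early-exit validation) can be sketched as follows. This is a simplified illustration, not the project's actual code: the field names, retry counts, and the stdlib-only fetch are assumptions.

```python
import time
import urllib.request
import urllib.error

# Illustrative set of essential seismic fields; the real pipeline may check more.
REQUIRED_FIELDS = ("magnitude", "depth", "latitude", "longitude")

def fetch_with_backoff(url: str, retries: int = 3, base_delay: float = 1.0) -> str:
    """Retry a remote fetch with exponential backoff before giving up."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=10) as resp:
                return resp.read().decode("utf-8")
        except (urllib.error.URLError, TimeoutError):
            if attempt == retries - 1:
                raise  # surface the failure explicitly instead of swallowing it
            time.sleep(base_delay * 2 ** attempt)  # wait 1s, 2s, 4s, ...
    raise RuntimeError("unreachable")

def missing_fields(event: dict) -> list[str]:
    """Early-exit check: report which essential seismic fields are absent."""
    return [f for f in REQUIRED_FIELDS if event.get(f) is None]

missing = missing_fields({"magnitude": 4.1, "depth": None, "latitude": 42.3})
print(missing)  # ['depth', 'longitude']
```

When `missing_fields` returns a non-empty list, the workflow can stop immediately rather than generating misleading outputs from incomplete data.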
A dedicated automated test suite validates tools, pipeline logic, keyword extraction, and end-to-end execution. Mocked INGV file calls ensure the suite runs without external dependencies. Tests also verify that human edits flow through the state and influence final outputs, guaranteeing correctness and enabling maintainability for iterative development.
| Layer | Tools |
|---|---|
| Language | Python 3.10+ |
| Orchestration | LangGraph (0.0.x) – multi‑agent workflow engine |
| Text Pipeline | LangChain (0.1.x) – text utilities, tool integration |
| LLM / AI | Cohere API (5.20.0) – generation, analysis |
| Embeddings | MiniLM & Sentence‑Transformers – local embedding fallback |
| Vector DB | Qdrant (local, 1.6.0) – vector search and storage |
| UI | Streamlit – lightweight web interface |
| HTTP / I/O | requests, file loader utilities |
| Environment | python‑dotenv – configuration & secrets |
| Testing | pytest |
| Clients | cohere SDK, qdrant‑client |
| Output | Scientific reports, structured analysis, agent‑generated content |
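The vector-search step in the stack above can be illustrated without the full Qdrant and Sentence-Transformers dependencies. The toy 3-dimensional vectors below stand in for MiniLM sentence embeddings (384-dimensional in practice), and the hand-rolled cosine similarity mirrors the metric Qdrant applies during semantic search.

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: the ranking metric used for semantic search."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Toy document vectors standing in for stored MiniLM embeddings.
index = {
    "shallow quake near L'Aquila": [0.9, 0.1, 0.2],
    "deep offshore event":         [0.1, 0.8, 0.3],
}
query = [0.85, 0.15, 0.25]  # embedding of a hypothetical user question

best = max(index, key=lambda doc: cosine(index[doc], query))
print(best)  # shallow quake near L'Aquila
```

In production, `qdrant-client` performs this search over the local Qdrant store, returning the top-k nearest embeddings rather than a single best match.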
🧿 1. Retrieval Stability Assessment
The system was tested with multiple seismic input files to verify consistent extraction under varying data conditions. When endpoints were unreachable or files were not found, the pipeline activated controlled retries and fallback mechanisms. These checks confirmed stable degradation behaviour and safe handling of retrieval failures.
🔄 2. Multi‑Agent Coordination Performance
Repeated executions of the seismic analysis pipeline confirmed that agents consistently maintained shared context, correctly incorporated upstream refinements, and produced logically aligned downstream outputs. These tests validated reliable handoff, stable state transitions, and coherent reasoning across the entire multi‑agent chain.
🛠️ 3. Fault Handling & Robustness
The pipeline was tested against invalid URLs, missing seismic datasets, and temporary API interruptions. It responded with clear logs, controlled retries, and explicit termination messages, demonstrating strong resilience and awareness of failure conditions.
🧭 4. Usability Assessment
Non‑technical users tested the Streamlit interface and were able to run the full earthquake‑analysis workflow without touching the command line. The guided layout and structured panels enabled them to follow each step and ultimately obtain a complete, automatically generated scientific-style report.
✅ 5. Automated Consistency Checks
A suite of automated tests verifies that each agent executes correctly and that the orchestration behaves deterministically across all seismic‑analysis workflows. The validation also confirms the accuracy of extracted features and ensures that human‑in‑the‑loop signals propagate reliably through every stage of the pipeline.
The improved system delivered stable performance, recovered smoothly from failure scenarios, and preserved user‑provided edits throughout the entire seismic‑analysis pipeline. Both CLI and browser‑based workflows produced clear, actionable outputs, with all automated tests confirming full operational readiness. Feedback loops enhanced the clarity of the generated scientific reports, and usability tests showed successful adoption, even among users unfamiliar with multi‑agent systems.
This project developed from a conceptual earthquake-analysis pipeline into a deployable, production-ready system focused on robustness, traceability, and user-driven control. Its strength lies not in adding more agents, but in engineering the existing architecture to behave like dependable software: measurable, diagnosable, steerable, resilient, and easy to operate. Through orchestration design, UI support, and fault-tolerance strategies, the system now meets real-world expectations for agentic AI and demonstrates readiness for professional use.