
Open Data Insight Generator is a multi-agent analytics pipeline that turns open government datasets into validated insights, visualizations, and audit trails. LangGraph-driven orchestration coordinates data collection, statistical analysis, automated auditing, and LLM-based evaluation so policy analysts and civic technologists can publish decision-ready briefs without touching raw data.
Modern public-sector teams struggle to translate sprawling Data.gov inventories into trustworthy narratives. This project automates the path from a natural-language question to a vetted insight pack: datasets are fetched with license checks, cleaned, analyzed, visualized, numerically audited, and finally scored by an LLM judge for clarity, honesty, and statistical soundness. The modular design mirrors human analyst roles, which simplifies monitoring, extension, and integration into existing research workflows.
workflow.add_edge("data_collector", "analyst") workflow.add_edge("analyst", "auditor") workflow.add_edge("auditor", "evaluator") workflow.add_edge("evaluator", END)
Each run outputs machine-readable artifacts (insights.json, validation_report.txt, evaluation_results.json) and visualization assets so downstream platforms can render dashboards or PDF briefings without rerunning the pipeline.
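For example, a downstream dashboard could consume those artifacts directly. The sketch below assumes the default `output/` directory and the per-dataset insight structure produced by the analyst (keys such as `insights_text` and `visualizations`); the exact schema may differ.

```python
import json
from pathlib import Path

output_dir = Path("output")  # default artifact directory assumed

# Load the machine-readable artifacts produced by a pipeline run
insights = json.loads((output_dir / "insights.json").read_text(encoding="utf-8"))

# Render a minimal text summary without rerunning the pipeline
for dataset_name, payload in insights.items():
    print(dataset_name)
    print("  ", payload.get("insights_text", "")[:200])
    for plot_name, plot_path in payload.get("visualizations", {}).items():
        print("  figure:", plot_name, "->", plot_path)
```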
The solution targets open government datasets, with a primary integration into the Data.gov catalog and support for civic competitions and research corpora hosted on Kaggle. Data.gov's CKAN APIs deliver structured metadata and license signals, while Kaggle expands the domain to global civic, climate, and mobility datasets; together they cover the bulk of Ready Tensor's government analytics use cases.
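For reference, Data.gov exposes its catalog through CKAN's action API. A raw metadata query (outside the project's own client, shown here only to illustrate the license and format signals the collector relies on) might look like this:

```python
import requests

# Query Data.gov's CKAN catalog for dataset metadata (package_search action)
resp = requests.get(
    "https://catalog.data.gov/api/3/action/package_search",
    params={"q": "renewable energy consumption", "rows": 5},
    timeout=30,
)
resp.raise_for_status()

for pkg in resp.json()["result"]["results"]:
    # License id and resource formats are the signals the collector checks
    print(pkg["title"], "|", pkg.get("license_id"), "|",
          [r.get("format") for r in pkg.get("resources", [])])
```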
A typed LangGraph state machine orchestrates specialized agents so each stage can focus on a single responsibility while sharing context through structured state payloads.
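A minimal sketch of that wiring, assuming a TypedDict-style state and stub node functions (the field and function names here are illustrative, not the project's exact signatures):

```python
from typing import TypedDict
from langgraph.graph import StateGraph, END

class PipelineState(TypedDict, total=False):
    # Shared payload the agents read from and write to (illustrative fields)
    query: str
    datasets: dict
    insights: dict
    audit: dict
    evaluation: dict

# Stub node functions standing in for the real agents
def collect_node(state: PipelineState) -> dict:
    return {"datasets": {}}

def analyze_node(state: PipelineState) -> dict:
    return {"insights": {}}

def audit_node(state: PipelineState) -> dict:
    return {"audit": {}}

def evaluate_node(state: PipelineState) -> dict:
    return {"evaluation": {}}

workflow = StateGraph(PipelineState)
workflow.add_node("data_collector", collect_node)
workflow.add_node("analyst", analyze_node)
workflow.add_node("auditor", audit_node)
workflow.add_node("evaluator", evaluate_node)
workflow.set_entry_point("data_collector")
workflow.add_edge("data_collector", "analyst")
workflow.add_edge("analyst", "auditor")
workflow.add_edge("auditor", "evaluator")
workflow.add_edge("evaluator", END)

app = workflow.compile()
final_state = app.invoke({"query": "renewable energy consumption"})
```

The analyst stage's own per-dataset loop, from the project itself, follows.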
```python
for dataset_name, df in datasets.items():
    print(f"  Analyzing {dataset_name}...")
    statistics = self._compute_statistics(df)

    visualization_paths = {}
    if generate_visualizations:
        plots = self.visualization_tool.generate_summary_plots(df)
        visualization_paths = {k: str(v) for k, v in plots.items()}

    insights_text = self._generate_insights(df, statistics)

    all_insights[dataset_name] = {
        "statistics": statistics,
        "insights_text": insights_text,
        "visualizations": visualization_paths,
        "data_summary": get_data_summary(df)
    }
```
DatasetResult models enforce schema guarantees and track license/file-type compliance during collection.

```python
if dataset_id:
    try:
        dataset_info = self.data_gov_client.get_dataset(dataset_id)
        # ... existing code ...
        if "public domain" in license_info or "open" in license_info or "cc0" in license_info:
            license_verified = True
    except Exception as e:
        errors.append(f"Failed to verify license: {e}")

result = DatasetResult(
    datasets=validated_datasets,
    metadata=metadata,
    license_verified=license_verified,
    file_types_valid=file_types_valid,
    errors=errors
)
```
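The model itself is plausibly a Pydantic class along these lines; the field names are taken from the construction call above, while the types and defaults are assumptions.

```python
from typing import Any, Dict, List
from pydantic import BaseModel, Field
import pandas as pd

class DatasetResult(BaseModel):
    """Typed collection result shared with downstream agents (sketch)."""
    model_config = {"arbitrary_types_allowed": True}  # allow pandas DataFrames

    datasets: Dict[str, pd.DataFrame] = Field(default_factory=dict)
    metadata: Dict[str, Any] = Field(default_factory=dict)
    license_verified: bool = False
    file_types_valid: bool = True
    errors: List[str] = Field(default_factory=list)
```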
result = { "valid": True, "score": 0.0, "issues": [], "warnings": [], "metrics": {} } # ... existing code ... score = max(0.0, score) result["metrics"] = metrics result["issues"] = issues result["warnings"] = warnings result["score"] = score result["valid"] = score >= settings.min_data_quality_score return result
```python
class Settings(BaseSettings):
    """Application settings."""

    kaggle_username: Optional[str] = Field(default=None, description="Kaggle username")
    # ... existing code ...

    def validate(self) -> None:
        """Validate required settings."""
        errors = []
        # ... existing code ...
        self.data_dir.mkdir(parents=True, exist_ok=True)
        self.output_dir.mkdir(parents=True, exist_ok=True)
```
Automated validation runs at multiple layers. The Auditor Agent recomputes descriptive statistics and numeric claims, derives confidence scores, and compiles a discrepancy log. Concurrently, the Evaluator Agent applies NumericValidator heuristics plus LLM qualitative scoring to produce accuracy, clarity, soundness, and honesty ratings, effectively acting as a regression suite against statistical drift.
result = { "overall_valid": True, "confidence_score": 1.0, "statistical_checks": [], "numeric_checks": [], "errors": [], "warnings": [], "discrepancies": [] } # ... existing code ... if result["errors"]: result["overall_valid"] = False return result
```python
numeric_eval = self._evaluate_numeric_accuracy(
    insights, dataset_name, dataset_metadata.get(dataset_name, {})
)
qualitative_eval = self._evaluate_qualitative(
    insights, auditor_validation, dataset_metadata.get(dataset_name, {})
)

evaluation_result = {
    "dataset_name": dataset_name,
    "accuracy_score": qualitative_eval.get("accuracy_score", 5.0),
    "clarity_score": qualitative_eval.get("clarity_score", 5.0),
    "soundness_score": qualitative_eval.get("soundness_score", 5.0),
    "honesty_score": qualitative_eval.get("honesty_score", 5.0),
    "numeric_accuracy": numeric_eval.get("accuracy", 0.0),
    "justification": qualitative_eval.get("justification", {}),
    "discrepancies": auditor_validation.get("errors", [])
}
```
License verification, file-type constraints, dataframe validation, and statistical guardrails ensure that only open, structured, and numerically stable datasets advance. If Data.gov metadata lacks explicit licensing, the collector flags the uncertainty for reviewer attention. Downstream agents operate on typed inputs, reducing the risk of prompt injection or schema drift as insights are piped between agents.
Guarded try/except blocks at each stage surface descriptive errors to the console, while partially successful stages still persist their outputs for later diagnosis. Evaluation metrics, discrepancy lists, and confidence scores provide monitoring signals that can be scraped into dashboards or alerts. All generated reports land in a versionable output directory so operators can trace regressions across runs.
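Those signals can be scraped with a few lines. The sketch below assumes evaluation_results.json holds per-dataset score dictionaries shaped like the evaluation_result shown earlier, and simply flags runs whose qualitative scores fall below a chosen threshold.

```python
import json
from pathlib import Path

MIN_SCORE = 7.0  # alert threshold, chosen arbitrarily for illustration

results = json.loads(Path("output/evaluation_results.json").read_text(encoding="utf-8"))

# Flag any dataset whose qualitative scores dip below the threshold
for entry in (results if isinstance(results, list) else results.values()):
    low = {k: v for k, v in entry.items()
           if k.endswith("_score") and isinstance(v, (int, float)) and v < MIN_SCORE}
    if low:
        print(f"ALERT {entry.get('dataset_name', 'unknown')}: {low}")
```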
print("=" * 80) print("Open Data Insight Generator") # ... existing code ... try: if args.dataset == "kaggle": # ... existing code ... elif args.dataset == "data-gov": # ... existing code ... else: print("Error: --dataset required or use --query for query-based search") return datasets = data_result["datasets"] metadata = data_result["metadata"] except Exception as e: print(f"โ Failed to collect dataset: {e}") return
```python
insights = analyst.analyze(
    datasets,
    generate_visualizations=not args.no_viz
)
insights_path = output_dir / "insights.json"
analyst.save_insights(insights, str(insights_path))
# ... existing code ...
report_path = output_dir / "validation_report.txt"
# ... existing code ...
eval_report_path = output_dir / "evaluation_report.txt"
with open(eval_report_path, 'w', encoding='utf-8') as f:
    f.write(eval_report)
```
The current interface is a guided CLI that echoes pipeline progress, highlights validation warnings, and summarizes evaluation metrics. Visualization assets (heatmaps, distributions, scatter matrices, time series) are generated automatically to support publication-ready figures. These assets are saved with consistent naming conventions so a Streamlit or Ready Tensor front-end can embed them directly.
```python
for col in numeric_cols[:5]:
    try:
        plots[f'distribution_{col}'] = self.generate_distribution_plot(
            df[col],
            title=col,
            output_path=output_dir / f"distribution_{col}.{settings.visualization_format}"
        )
    except Exception as e:
        print(f"Warning: Failed to generate distribution plot for {col}: {e}")
```
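Because plot files follow the `distribution_<column>.<format>` pattern above, a front-end can discover them by glob. A minimal Streamlit sketch, assuming PNG output in the default `output/` directory:

```python
import streamlit as st
from pathlib import Path

output_dir = Path("output")

st.title("Open Data Insight Generator - Figures")

# Pick up distribution plots by their consistent naming convention
for plot_path in sorted(output_dir.glob("distribution_*.png")):
    st.subheader(plot_path.stem.replace("distribution_", ""))
    st.image(str(plot_path))
```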


The pipeline is packaged as a Python application with a single entry point. Operators install dependencies via pip install -r requirements.txt, configure credentials in .env, and run python main.py --query "...". A feature flag toggles between LangGraph coordination and the legacy sequential fallback so teams can deploy in environments without LangGraph while retaining compatibility with orchestration-first setups.
1. Install dependencies: `pip install -r requirements.txt`
2. Set up `.env` file from `.env.example`
3. Run: `python main.py --dataset kaggle --dataset-name "username/dataset-name"`
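The orchestration toggle mentioned above might reduce to a check like the following sketch; the environment-variable flag name and the helper functions are hypothetical stand-ins, not the project's actual code.

```python
import os

# Hypothetical flag; the real project may read this from Settings instead
USE_LANGGRAPH = os.getenv("USE_LANGGRAPH", "true").lower() == "true"

def run_pipeline(query: str) -> dict:
    if USE_LANGGRAPH:
        # Orchestration-first path: invoke the compiled LangGraph workflow
        return run_langgraph_workflow(query)   # hypothetical helper
    # Legacy fallback: call collector, analyst, auditor, evaluator in sequence
    return run_sequential_pipeline(query)      # hypothetical helper
```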
Recent dry runs on U.S. renewable energy datasets demonstrate the end-to-end loop: the analyst highlights growth trends, the auditor confirms parity between claimed and computed statistics with a 0.95 confidence score, and the evaluator scores accuracy, clarity, soundness, and honesty between 8 and 9 out of 10 while measuring 98% numeric accuracy. Each artifact is archived in the output/ directory for transparency and re-use.
**Analyst Agent:**
- Generates summary: "Renewable energy grew 18% between 2020–2023"
- Identifies: "Wind energy is the fastest-growing sector"
- Creates visualization: Year vs. Production chart

# ... existing code ...

**Evaluator Agent (LLM-as-Judge):**
- Rates analysis clarity (9/10), accuracy (9/10), soundness (8/10), honesty (9/10)
- Numeric accuracy: 98%
Open government analytics typically lean on ad-hoc spreadsheets or isolated BI dashboards, which seldom guarantee license compliance, numeric validation, or reproducible prompt histories. The Open Data Insight Generator closes that gap by encoding each analyst role as an auditable agent with typed state transitions, something absent in most tooling meant for Data.gov exploration.
result = { "overall_valid": True, "confidence_score": 1.0, "statistical_checks": [], "numeric_checks": [], "errors": [], "warnings": [], "discrepancies": [] } # ... existing code ... if result["errors"]: result["overall_valid"] = False return result
| Baseline | Strength | Gap Addressed by This Project |
|---|---|---|
| Manual Data.gov downloads | Direct access to authoritative datasets | No automated license verification, cleaning, or audit trail; this pipeline automates those steps via the Data Collector Guardrails. |
| Tableau / Power BI dashboards | Rich visualization tooling | Require pre-processed data and manual statistical validation; Analyst + Auditor Agents automate statistical checks and produce machine-readable reports. |
| CKAN Harvester scripts | Automated ingestion | Focus on metadata sync, not insight generation or validation; our multi-agent chain delivers insights, audits, and evaluation artifacts. |
| Single-agent LLM summarizers | Fast narrative generation | Often hallucinate or misstate figures; Evaluator + NumericValidator modules constrain narratives with numeric cross-checks. |
| Kaggle notebooks | Flexible analysis environment | Depend on expert intervention, lacking turnkey orchestration; LangGraph coordination provides reusable, query-driven workflows. |
workflow.add_edge("data_collector", "analyst") workflow.add_edge("analyst", "auditor") workflow.add_edge("auditor", "evaluator") workflow.add_edge("evaluator", END)
Query-based search for Kaggle is not yet implemented (the collector raises NotImplementedError), so operators must supply explicit dataset names; extending fetch_by_query is a priority. NumericValidator may miss complex claims, potentially under-reporting discrepancies; augmenting it with structured parsing or semantic tagging would improve recall.

```python
elif source.lower() == "kaggle":
    # For Kaggle, would need to search - simplified for now
    raise NotImplementedError("Kaggle query search not yet implemented")
```
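One way to close the fetch_by_query gap, assuming the official kaggle package is installed and credentials are configured, would be to search the Kaggle catalog before downloading. A rough sketch, not the project's code:

```python
from kaggle.api.kaggle_api_extended import KaggleApi

def search_kaggle_datasets(query: str, limit: int = 5) -> list[str]:
    """Return candidate dataset refs ("owner/slug") for a free-text query."""
    api = KaggleApi()
    api.authenticate()  # reads ~/.kaggle/kaggle.json or KAGGLE_* env vars
    results = api.dataset_list(search=query)
    return [str(d.ref) for d in results[:limit]]

# Example: feed the top match into the existing explicit-name collection path
print(search_kaggle_datasets("us renewable energy consumption"))
```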
```python
if self.provider == LLMProvider.GROQ.value:
    if not settings.groq_api_key:
        raise ValueError("GROQ_API_KEY is required for Groq provider")
    self.client = Groq(api_key=settings.groq_api_key)
elif self.provider == LLMProvider.OPENAI.value:
    if not settings.openai_api_key:
        raise ValueError("OPENAI_API_KEY is required for OpenAI provider")
    self.client = OpenAI(api_key=settings.openai_api_key)
```
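Both SDKs expose an OpenAI-compatible chat-completions interface, which is what lets the evaluator stay provider-agnostic. A minimal usage sketch, where `client` is either constructed object above and the model names are examples only:

```python
response = client.chat.completions.create(
    model="llama-3.1-8b-instant",  # e.g. "gpt-4o-mini" for the OpenAI provider
    messages=[
        {"role": "system", "content": "You are an impartial data-analysis judge."},
        {"role": "user", "content": "Rate the clarity of this insight from 1 to 10: ..."},
    ],
    temperature=0.0,
)
print(response.choices[0].message.content)
```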
```python
patterns = [
    (r'(?:mean|average|avg)[:\s]*([+-]?\d+\.?\d*)', 'mean'),
    (r'(?:std|standard deviation)[:\s]*([+-]?\d+\.?\d*)', 'std'),
    (r'(?:count|total)[:\s]*(\d+)', 'count'),
    (r'(\d+\.?\d*)%', 'percentage'),
]

for pattern, claim_type in patterns:
    matches = re.finditer(pattern, text, re.IGNORECASE)
    # ... existing code ...
```
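Applied to a short insight sentence, the patterns extract (claim_type, value) pairs that can then be compared against recomputed statistics. The example text below is illustrative:

```python
import re

text = "Reported mean: 1234.5 GWh; total: 48 records; production grew 18.2% year over year."

patterns = [
    (r'(?:mean|average|avg)[:\s]*([+-]?\d+\.?\d*)', 'mean'),
    (r'(?:std|standard deviation)[:\s]*([+-]?\d+\.?\d*)', 'std'),
    (r'(?:count|total)[:\s]*(\d+)', 'count'),
    (r'(\d+\.?\d*)%', 'percentage'),
]

claims = []
for pattern, claim_type in patterns:
    for match in re.finditer(pattern, text, re.IGNORECASE):
        claims.append((claim_type, float(match.group(1))))

print(claims)  # [('mean', 1234.5), ('count', 48.0), ('percentage', 18.2)]
```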
- "Analyze open data about U.S. renewable energy consumption from 2020 onward" → Fetches DOE renewable production tables, produces growth-rate commentary, and exports heatmaps plus a 0.95-confidence audit report.
- "Compare county-level vaccination rates with demographic indicators in California" → Pulls CDC immunization datasets, highlights correlations between coverage and age brackets, and flags counties with data-quality warnings before publishing evaluator scores.
- "What are the top safety incidents reported in public transit systems in 2023?" → Aggregates Bureau of Transportation Statistics feeds, ranks incident categories, and attaches audited count totals ready for Ready Tensor dashboards.