Personal Identity OS: Multi-Agent Fact Dossier Builder


Overview

Personal Identity OS is a local-first multi-agent system built with LangGraph. It helps a user turn raw personal inputs such as notes, public URLs, and PDFs into a structured Fact Dossier.md file, and every pending update can be reviewed before it is written to disk.

The project was built as a capstone submission for the Ready Tensor Multi-Agent System program. Its goal is to demonstrate practical multi-agent collaboration, tool integration, orchestration with LangGraph, and human-in-the-loop review in a workflow that solves a real documentation problem: turning scattered professional information into a grounded and reusable profile.

GitHub repository: https://github.com/Zegor88/identity-os

Why This Project Matters

Professionals often have their career facts spread across CVs, PDFs, websites, notes, and memory. Turning that material into a clean and trustworthy profile is harder than it looks.

A single prompt is usually not enough for this task because the workflow requires several different behaviors:

ingesting different source types
extracting only grounded facts
checking new facts against what is already known
presenting pending updates in a human-readable way
letting the user approve, cancel, or correct the result before it is written

This is exactly the kind of problem that benefits from a modular multi-agent design rather than a one-shot generation flow.

Problem Statement

The system addresses a practical problem:

How can we convert unstructured personal and professional source material into a reusable, source-grounded fact dossier without blindly trusting a single LLM pass?

The project focuses on three requirements:

support multiple input formats
separate responsibilities across specialized agents
keep the human in control before the final write

What the System Does

Personal Identity OS supports two working modes:

Interview mode: the system conducts a structured 32-question interview across seven sections. After each section it summarizes the answers, asks for anything missing, and then sends the section through the same extraction and review pipeline
Direct mode: the user provides raw text, a public URL, or a local PDF path and the system processes that input directly

In both modes, the workflow ends with a preview and a human approval gate before the dossier is updated.

Generated artifacts:

output/Fact Dossier.md
output/change_log.md

Multi-Agent Architecture

The system is orchestrated with LangGraph. Each node has a distinct responsibility, and the nodes collaborate by reading from and writing to shared graph state.

High-Level Architecture

graph TD
    User[User]
    CLI[CLI]
    Graph[LangGraph Orchestrator]
    Sources[Text / URL / PDF]
    Interview[Interview Flow]
    Direct[Direct Update Flow]
    Extract[Fact Extraction]
    Check[Conflict Checking]
    Preview[Preview + Approval]
    Dossier[(Fact Dossier.md)]

    subgraph Tools[Custom Tools]
        PDF[pdf_reader]
        WEB[web_search]
        MD[md_manager]
    end

    User --> CLI
    CLI --> Graph
    User --> Sources
    Graph -->|interview mode| Interview
    Graph -->|direct mode| Direct
    Sources --> Direct
    Interview --> Extract
    Direct --> Extract
    Extract --> Check
    Check --> Preview
    Preview --> Dossier

    Sources -.-> PDF
    Sources -.-> WEB
    Preview -. read/write .-> MD

Execution Graph

flowchart TD
    Entry[entry]

    Entry -->|mode=direct| Ingestion[ingestion]
    Entry -->|mode=interview| IQ[interview_question]

    IQ --> IP[interview_progress]
    IP -->|more questions| IQ
    IP -->|section complete| ISR[interview_section_review]
    ISR --> IA[interview_apply]
    IA --> Ingestion

    Ingestion --> FE[fact_extractor]
    FE --> CC[conflict_checker]
    CC --> PP[prepare_preview]
    PP --> AP[approval]

    AP -->|approve| WR[write]
    AP -->|edit| FE
    AP -->|cancel| End1((END))

    WR -->|direct| End2((END))
    WR -->|interview| ADV[interview_advance]
    ADV -->|next section| IQ
    ADV -->|complete| End3((END))
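
The branching in the execution graph can be expressed as small routing functions over the shared state. The following is a minimal sketch, not the repository's code: the `GraphState` fields `mode` and `decision` are assumed names, and the string `"END"` stands in for LangGraph's `END` sentinel. In LangGraph, functions of this shape are typically passed to `StateGraph.add_conditional_edges`.

```python
from typing import Literal, TypedDict


class GraphState(TypedDict, total=False):
    """Hypothetical subset of the shared graph state (field names assumed)."""
    mode: Literal["direct", "interview"]
    decision: Literal["approve", "edit", "cancel"]


def route_from_entry(state: GraphState) -> str:
    # entry -> ingestion (direct) or interview_question (anything else)
    return "ingestion" if state.get("mode") == "direct" else "interview_question"


def route_after_approval(state: GraphState) -> str:
    # approval -> write | fact_extractor | END
    # A missing or unknown decision ends the run without writing.
    decision = state.get("decision", "cancel")
    return {"approve": "write", "edit": "fact_extractor"}.get(decision, "END")


def route_after_write(state: GraphState) -> str:
    # write -> END (direct) or interview_advance (interview)
    return "END" if state.get("mode") == "direct" else "interview_advance"
```

Defaulting to `"END"` when no decision is present is a design choice in this sketch: the only path to `write` is an explicit approval.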

Agent Roles

This project satisfies the capstone requirement of using at least three agents with distinct responsibilities. In practice, it uses several specialized graph nodes:

| Agent / Node | Responsibility |
| --- | --- |
| `Router / Ingestion` | Normalizes raw text, URL, or PDF input into graph-ready text |
| `Fact Extractor` | Extracts grounded professional facts into structured state |
| `Conflict Checker` | Compares new facts to the current dossier and classifies them |
| `Prepare Preview` | Builds the human-readable preview shown before writing |
| `Markdown Writer` | Rewrites the dossier using approved updates |
| `Interview Question` | Asks the next question during interview mode |
| `Section Review` | Summarizes a completed interview section and asks for missing details |
| `Interview Apply / Advance` | Converts interview answers into graph input and moves to the next section |

The key point is not just the number of nodes, but the separation of concerns: extraction, checking, summarization, approval, and writing are handled independently rather than collapsed into one model invocation.

Tool Integration

The project also satisfies the capstone requirement of integrating at least three tools beyond basic LLM responses.

| Tool | Purpose | Why it matters |
| --- | --- | --- |
| `tools/pdf_reader.py` | Extracts text from local PDF files using PyMuPDF | Extends the system to work with document inputs rather than plain prompts |
| `tools/web_search.py` | Fetches public web page content through Firecrawl | Allows ingestion of external web sources |
| `tools/md_manager.py` | Handles markdown reads, writes, template lookup, and output paths | Separates file lifecycle management from agent logic |

These tools give the system real I/O behavior and make the pipeline useful beyond pure text completion.

Human-in-the-Loop Design

One of the most important design choices in this project is the explicit approval gate before writing to disk.

After extraction and conflict checking, the system shows the user a preview of the pending facts. At that point, the user can:

approve the write
cancel the write
provide an edit instruction or correction

This keeps the workflow grounded and reduces the risk of silently writing incorrect facts into the dossier.
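
The three choices at the approval gate can be modeled as a small parser over the user's reply. This is a sketch under assumptions: the accepted keywords and the `ApprovalDecision` type are hypothetical, and the real CLI may prompt and parse differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ApprovalDecision:
    action: str                        # "approve", "cancel", or "edit"
    instruction: Optional[str] = None  # free-text correction when action == "edit"


# Hypothetical keywords; the real CLI may accept different replies.
APPROVE_WORDS = {"y", "yes", "approve"}
CANCEL_WORDS = {"n", "no", "cancel"}


def parse_approval(reply: str) -> ApprovalDecision:
    text = reply.strip()
    lowered = text.lower()
    if lowered in APPROVE_WORDS:
        return ApprovalDecision("approve")
    if lowered in CANCEL_WORDS or not text:
        # An empty reply is treated as cancel so nothing is written by accident.
        return ApprovalDecision("cancel")
    # Anything else is treated as an edit instruction or correction.
    return ApprovalDecision("edit", instruction=text)
```

Treating every unrecognized reply as an edit instruction keeps the prompt forgiving: the user can simply type the correction instead of first selecting an "edit" option.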

How the Workflow Operates

Direct Mode

The user provides:

raw text
a public URL
or a local PDF path

The graph then runs:

entry -> ingestion -> fact_extractor -> conflict_checker -> prepare_preview -> approval -> write
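
Before ingestion can normalize the input, something has to decide which of the three input types was supplied. A minimal sketch of that decision, assuming a hypothetical helper named `classify_input` and a purely syntactic check (the project's own routing may work differently):

```python
from urllib.parse import urlparse


def classify_input(raw: str) -> str:
    """Return "url", "pdf", or "text" for a single CLI argument."""
    candidate = raw.strip()
    has_whitespace = any(ch.isspace() for ch in candidate)

    try:
        parsed = urlparse(candidate)
    except ValueError:
        parsed = None  # malformed URL-like text falls through to the checks below

    if (
        parsed is not None
        and not has_whitespace
        and parsed.scheme in ("http", "https")
        and parsed.netloc
    ):
        return "url"

    # Only the extension is checked; a real reader would also confirm
    # that the file exists and can be opened.
    if "\n" not in candidate and candidate.lower().endswith(".pdf"):
        return "pdf"

    return "text"
```

The extension check is a heuristic: a sentence that happens to end in ".pdf" would be misclassified, so a fuller version would probably also test whether the path exists.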

Interview Mode

The user launches the CLI with no argument. The graph then runs a 32-question interview across these seven sections:

Core Identity
Career Timeline
Business Facts
Personal Story
Positioning
Hard Boundaries
Content Assets

Each completed section is summarized, optionally extended by the user, and then processed through the same extraction and approval pipeline before the interview advances.
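
The section-by-section progression can be reduced to two small decisions that return the edge labels used in the execution graph. This sketch assumes zero-based indices and takes the number of questions per section as a parameter, because the source gives only the total of 32 and not how they are distributed.

```python
SECTIONS = [
    "Core Identity",
    "Career Timeline",
    "Business Facts",
    "Personal Story",
    "Positioning",
    "Hard Boundaries",
    "Content Assets",
]


def interview_progress(question_idx: int, questions_in_section: int) -> str:
    """Edge label after an answer: keep asking, or review the section."""
    if question_idx + 1 < questions_in_section:
        return "more questions"
    return "section complete"


def interview_advance(section_idx: int) -> str:
    """Edge label after a section has passed approval and been written."""
    if section_idx + 1 < len(SECTIONS):
        return "next section"
    return "complete"
```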

Data Structures and Review Logic

The structured graph state includes:

router decisions (has_facts, is_ambiguous, justification)
extracted facts with section, label, value, and source_excerpt
fact classifications with one of four labels: Duplicate, Contradiction, Enrichment, or Novel
preview payloads for the approval screen

This design gives the workflow a clear internal contract and makes it easier to reason about how information moves through the system.
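
As an illustration, the state described above could be typed roughly as follows. The field names come from the list above, while the field types, the class names, and the `classify_fact` rule are assumptions. The source does not describe how the Conflict Checker decides, so the rule here is a deliberately naive string comparison standing in for it, not the project's method.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple


class Classification(str, Enum):
    DUPLICATE = "Duplicate"
    CONTRADICTION = "Contradiction"
    ENRICHMENT = "Enrichment"
    NOVEL = "Novel"


@dataclass(frozen=True)
class RouterDecision:
    has_facts: bool
    is_ambiguous: bool
    justification: str


@dataclass(frozen=True)
class Fact:
    section: str
    label: str
    value: str
    source_excerpt: str


def classify_fact(new: Fact, known: Dict[Tuple[str, str], str]) -> Classification:
    """Naive stand-in for the Conflict Checker (illustrative assumption)."""
    old = known.get((new.section, new.label))
    if old is None:
        return Classification.NOVEL
    old_norm, new_norm = old.strip().lower(), new.value.strip().lower()
    if old_norm == new_norm:
        return Classification.DUPLICATE
    if old_norm in new_norm:
        # The new value repeats the old one and adds detail.
        return Classification.ENRICHMENT
    return Classification.CONTRADICTION
```

Keeping `source_excerpt` on every fact is what makes the preview auditable: each pending update can be shown next to the text it was extracted from.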

Example Usage

Setup

git clone https://github.com/Zegor88/identity-os.git
cd identity-os
python3.10 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
cp .env_example .env

Required environment variables:

GOOGLE_API_KEY
FIRECRAWL_API_KEY for URL ingestion
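
A quick pre-flight check can catch a missing key before the graph starts. This is a sketch with a hypothetical helper, `missing_env_vars`; it assumes the variables are already present in the process environment (for example, exported from `.env`), and it is not part of the repository.

```python
import os
from typing import List, Mapping


def missing_env_vars(env: Mapping[str, str], needs_url: bool) -> List[str]:
    """Return the names of required variables that are unset or empty."""
    required = ["GOOGLE_API_KEY"]
    if needs_url:
        # Only URL ingestion goes through Firecrawl.
        required.append("FIRECRAWL_API_KEY")
    return [name for name in required if not env.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars(os.environ, needs_url=True)
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```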

Run in Interview Mode

python run_graph.py

Run in Direct Mode

python run_graph.py "From 2022 to 2024 I led analytics automation for 14 markets."
python run_graph.py "https://example.com/about-me"
python run_graph.py "/absolute/path/to/profile.pdf"

Design Choices

Several practical design choices shaped the final system:

LangGraph over a linear script: useful because the workflow has branching, interruptions, and section-by-section progression
Specialized nodes instead of one agent: useful because extraction, checking, writing, and interviewing have different objectives
Markdown output: useful because the result remains inspectable, portable, and easy to version
Human approval before write: useful because the workflow deals with personal facts, where silent mistakes are costly

Current Limitations

This is a working capstone project, not a polished production platform. Current limitations include:

URL ingestion depends on Firecrawl availability and API access
the project does not yet include an automated test suite
evaluation is qualitative rather than benchmark-driven
the system currently focuses on writing a fact dossier, not a broader identity platform

I state these limits explicitly because Ready Tensor evaluation emphasizes technical honesty and verifiable claims.

What This Project Demonstrates

From a capstone perspective, the project demonstrates:

a working multi-agent system
distinct agent roles with shared orchestration
LangGraph as the orchestration framework
at least three integrated tools
a meaningful human-in-the-loop step
runnable local setup and example usage

Why I Think This Is a Good Fit for the Capstone

This project is a good fit for the Mastering AI Agents capstone because it solves a real workflow problem that is hard to address with a single prompt. It requires routing, structured extraction, consistency checking, staged review, and controlled writing. Those responsibilities are distributed across multiple cooperating agents and tools, with LangGraph coordinating the overall flow.

In other words, the system is not “multi-agent” in name only. The decomposition is central to how the project works.

Repository

GitHub: https://github.com/Zegor88/identity-os

Future Improvements

If I continue this project, the next improvements would be:

add automated tests for graph behavior and tool paths
add evaluation datasets for extraction and conflict classification quality
improve conflict resolution UX
support richer output artifacts beyond a single dossier file
add demo screenshots or a short walkthrough video

Conclusion

Personal Identity OS is a practical example of a local-first multi-agent pipeline that turns scattered professional inputs into a reviewed fact dossier. The project combines orchestration, tool use, human review, and structured writing into one coherent LangGraph workflow. It is designed to be understandable, runnable, and evaluable as a Ready Tensor capstone submission.