Mr.DOC: An Agentic AI-Powered Document Interaction Suite for Privacy-Preserving, Cross-Platform Knowledge Extraction
Author: Abhishek N
Affiliation: N/A
Submission Date: March 27, 2025
Contact: abhismail998@gmail.com
Abstract:
"Mr.DOC" represents a pioneering advancement in AI-driven document interaction, merging local processing with optional cloud-based enhancements to deliver a privacy-preserving, intuitive query interface. Built on Google Cloud’s Vertex AI, LangChain, ChromaDB, and a Streamlit front-end, this suite enables users to engage in dynamic dialogues with PDF documents, extracting insights with high accuracy and efficiency. Beyond its initial scope, "Mr.DOC" exhibits potential as an agentic AI, capable of autonomously interacting with diverse software ecosystems and file types when embedded, offering a scalable, repurposable framework for knowledge extraction. This paper presents the system’s design, implementation, and evaluation, highlighting its privacy-first approach, adaptability, and future roadmap for cross-platform integration.
The exponential growth of digital documentation has created a need for tools that simplify information retrieval from complex files. Traditional methods, such as manual searches and cloud-based AI assistants, often compromise efficiency or privacy, respectively. "Mr.DOC" addresses these challenges by offering a locally executable, AI-driven solution that processes documents on the user’s machine, ensuring data privacy while leveraging technologies such as Google Cloud’s Vertex AI, LangChain, and ChromaDB.
This work introduces "Mr.DOC" as more than a document query tool; it positions the system as an agentic AI, capable of autonomous decision-making and interaction with embedded software environments or file systems. Agentic AI systems proactively execute tasks, adapt to contexts, and interface with external tools, making "Mr.DOC" a versatile candidate for integration into applications like word processors, cloud storage platforms, or enterprise software. By running locally with optional cloud support, it allows unlimited, cost-free use while maintaining user control over sensitive data.
This paper evaluates the current implementation of "Mr.DOC", demonstrates its agentic potential through theoretical extensions, and proposes a roadmap for its evolution into a cross-platform knowledge extraction agent.
2.1 Document Interaction Tools
Existing document interaction tools range from basic keyword search utilities to cloud-based AI solutions like Adobe Acrobat’s Liquid Mode or Google’s Document AI. While effective, these often require internet connectivity and send user data to external servers, raising privacy concerns.
2.2 Agentic AI
Agentic AI refers to systems with autonomy, goal-directed behavior, and the ability to interact with their environment (Wooldridge & Jennings, 1995). Examples include virtual assistants (e.g., Siri) and robotic process automation (RPA) bots. However, few agentic systems prioritize local execution or document-specific tasks, creating a niche for "Mr.DOC."
2.3 Privacy and Local Processing
Privacy-preserving AI has gained traction with frameworks like federated learning (McMahan et al., 2017). "Mr.DOC" aligns with this trend by processing data locally, leveraging built-in models like Sentence Transformers, with optional Vertex AI integration for enhanced capabilities.
3.1 Architecture Overview
"Mr.DOC" integrates several components into a cohesive system:
• Front-End: Streamlit provides a lightweight, chat-like interface for user interaction.
• Document Processing: PyPDF2 extracts text from PDFs, and LangChain’s CharacterTextSplitter splits that text into chunks.
• Embedding and Retrieval: Sentence Transformers (all-MiniLM-L6-v2) generate embeddings, stored in ChromaDB for efficient retrieval.
• Language Model: Google Vertex AI’s gemini-pro powers question-answering via LangChain’s RetrievalQA chain.
• Storage: Query logs, session history, and feedback are stored locally, ensuring privacy.
The architecture supports local execution, with Vertex AI calls as an optional enhancement, preserving user data sovereignty. A possible extension is a pipeline in which old query data is cleaned manually and fed back into model training. This would require more processing power but would still maintain security and privacy.
3.2 Possible Future Agentic Capabilities
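The chunking and retrieval steps listed above can be sketched with the Python standard library alone. This is a minimal illustration, not the system's code: `split_text` only mimics the fixed-size, overlapping windows of a character-based splitter, and the bag-of-words `embed` function is a toy stand-in for the all-MiniLM-L6-v2 embeddings and the ChromaDB index. All function names and the chunk sizes are assumptions made for the example.

```python
import math
import re
from collections import Counter


def split_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Cut text into fixed-size character windows that overlap by `overlap`."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last window already reaches the end of the text
        start += step
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (stand-in for a sentence encoder)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

In the described system, the retrieved chunks would then be passed to the language model together with the question; that step is omitted here because it depends on the external model.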
As an agentic AI, "Mr.DOC" can be embedded into software ecosystems (e.g., Microsoft Word, Google Drive) to:
• Autonomous Interaction: Monitor open files, extract content, and preemptively index data for queries.
• Context Awareness: Adapt responses based on the software context (e.g., formatting queries in Word, metadata in cloud storage).
• Task Execution: Perform actions like summarizing documents or exporting answers to other applications.
This is enabled by extending its retrieval and language model components to interface with APIs or file systems, transforming it from a passive query tool to an active agent.
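As one way to picture the "monitor and preemptively index" behavior, the sketch below polls a folder and re-indexes files that are new or have changed since the last scan. It is a hypothetical illustration under simple assumptions: the class name `FolderIndexer` is invented for this example, files are plain text, and "indexing" just means caching the file contents. A real integration would hook into the host application's API rather than poll the file system.

```python
from pathlib import Path


class FolderIndexer:
    """Tracks files in a folder and re-indexes those that are new or changed."""

    def __init__(self, folder: Path, pattern: str = "*.txt") -> None:
        self.folder = folder
        self.pattern = pattern
        self._seen: dict[Path, int] = {}  # path -> modification time at last index (ns)
        self.index: dict[Path, str] = {}  # path -> extracted text

    def scan(self) -> list[Path]:
        """Index new or modified files and return the paths that were (re)indexed."""
        updated = []
        for path in sorted(self.folder.glob(self.pattern)):
            mtime = path.stat().st_mtime_ns
            if self._seen.get(path) != mtime:
                self.index[path] = path.read_text(encoding="utf-8")
                self._seen[path] = mtime
                updated.append(path)
        return updated
```

Calling `scan()` on a timer (or from an editor's "document saved" event) keeps the index current without the user asking for it, which is the proactive element that distinguishes an agent from a passive query tool.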
3.3 Implementation
The system was implemented in Python 3.12, with dependencies managed via requirements.txt. Local execution relies on the user’s own hardware, while GCP deployment leverages Compute Engine for scalability. Privacy is ensured by processing all data locally unless Vertex AI is explicitly invoked.
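The "local unless explicitly invoked" rule can be expressed as a small opt-in gate. The sketch below is an assumption about how such a switch might look, not the project's actual code: `Settings`, `answer`, and the two backend callables are hypothetical names, and the backends are passed in as plain functions so the example runs without any external service.

```python
from dataclasses import dataclass
from typing import Callable

# A backend takes (question, context) and returns an answer string.
Backend = Callable[[str, str], str]


@dataclass(frozen=True)
class Settings:
    use_cloud: bool = False  # cloud calls stay off unless the user opts in


def answer(question: str, context: str, settings: Settings,
           local_model: Backend, cloud_model: Backend) -> str:
    """Route the query to the cloud backend only when explicitly enabled."""
    backend = cloud_model if settings.use_cloud else local_model
    return backend(question, context)
```

Defaulting `use_cloud` to `False` means that forgetting to configure the system can never cause data to leave the machine.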
4.1 Functionality Testing
"Mr.DOC" was tested with PDFs ranging from 10 to 1,500 pages, including technical manuals and academic papers. It accurately answered queries (e.g., “What is the main topic?”) with a response time of 2–5 seconds for small files and 10–15 seconds for larger ones, using a MacBook Pro (M1, 16GB RAM).
4.2 Privacy Assurance
No user data was transmitted externally during local runs, confirmed via network monitoring. Vertex AI calls sent only query-specific embeddings, adhering to Google’s privacy policies.
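External network monitoring can be complemented by an in-process check. The sketch below is not the method used in the evaluation; it is a hypothetical guard that makes any TCP connection attempt through Python's `socket.socket.connect` fail while a local run is in progress. The names `no_network` and `OutboundConnectionError` are invented for this example, and the guard does not cover subprocesses, `connect_ex`, or connectionless sends.

```python
import socket
from contextlib import contextmanager


class OutboundConnectionError(RuntimeError):
    """Raised when code tries to open a connection inside no_network()."""


@contextmanager
def no_network():
    """Make socket.socket.connect raise while the with-block is active."""
    original_connect = socket.socket.connect

    def blocked(self, address):
        raise OutboundConnectionError(f"blocked connection to {address!r}")

    socket.socket.connect = blocked
    try:
        yield
    finally:
        # Always restore the real method, even if the block raised.
        socket.socket.connect = original_connect
```

Wrapping a local query in `with no_network():` turns an accidental outbound call into an immediate, visible error instead of a silent data transfer.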
4.3 Agentic Potential
A proof-of-concept extension embedded "Mr.DOC" into a mock text editor, autonomously indexing an open document and suggesting answers to hypothetical queries. This demonstrated its ability to interact with software environments, laying the groundwork for broader applications.
5.1 Contributions
"Mr.DOC" offers three key contributions:
5.2 Comparison with Existing Systems
Unlike cloud-based tools (e.g., Google Document AI), "Mr.DOC" prioritizes privacy and cost-free use. Compared to local tools like PDF readers with search, it adds conversational AI and agentic features, enhancing usability.
5.3 Limitations
• Hardware Dependency: Performance depends on the user’s hardware, limiting scalability for resource-intensive tasks.
• Current Scope: Supports only PDFs, though extensible to other formats.
• Interface: The Streamlit POC prioritizes functionality over aesthetics.
Future Work
6.1 Cross-Platform Embedding
To fully realize its agentic potential, "Mr.DOC" can be integrated into:
• Office Suites: Embed in Microsoft Office or LibreOffice to process live documents.
• Cloud Storage: Interface with Google Drive or Dropbox APIs for real-time file indexing.
• Custom Applications: Offer an SDK for developers to embed "Mr.DOC" into proprietary software.
6.2 Expanded File Support
Support for DOCX, PPTX, and multimedia files will broaden its utility, requiring updates to the document processing pipeline.
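One way to keep the processing pipeline open to new formats is a registry that maps file extensions to extractor functions. The sketch below is a hypothetical design, not the current implementation: `register`, `extract_text`, and the plain-text extractor are illustrative names, and only text-based formats are handled so the example stays self-contained. A DOCX or PPTX extractor would be registered the same way using a suitable parsing library.

```python
from pathlib import Path
from typing import Callable

Extractor = Callable[[Path], str]
_EXTRACTORS: dict[str, Extractor] = {}


def register(*suffixes: str) -> Callable[[Extractor], Extractor]:
    """Decorator that registers an extractor for one or more file suffixes."""
    def decorator(func: Extractor) -> Extractor:
        for suffix in suffixes:
            _EXTRACTORS[suffix.lower()] = func
        return func
    return decorator


@register(".txt", ".md")
def extract_plain_text(path: Path) -> str:
    """Read a UTF-8 text file as-is."""
    return path.read_text(encoding="utf-8")


def extract_text(path: Path) -> str:
    """Dispatch to the extractor registered for the file's suffix."""
    try:
        extractor = _EXTRACTORS[path.suffix.lower()]
    except KeyError:
        raise ValueError(f"unsupported file type: {path.suffix!r}") from None
    return extractor(path)
```

With this layout, adding a format touches only one new function; the chunking and retrieval stages downstream continue to receive plain text.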
6.3 Multilingual and Customization Enhancements
Adding multilingual models (e.g., via Hugging Face) and user-defined AI parameters (e.g., tone, verbosity) will enhance accessibility and personalization.
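User-defined parameters such as tone and verbosity could be carried in a small settings object and folded into the prompt sent to the language model. The sketch below is an assumption about one possible shape for this feature: `AnswerStyle`, `build_prompt`, and the prompt wording are hypothetical and not taken from the system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AnswerStyle:
    tone: str = "neutral"        # e.g. "formal", "friendly"
    verbosity: str = "concise"   # e.g. "concise", "detailed"
    language: str = "English"    # target language of the answer


def build_prompt(question: str, context: str,
                 style: AnswerStyle = AnswerStyle()) -> str:
    """Combine style instructions, retrieved context, and the question."""
    return (
        f"Answer in {style.language} using a {style.tone} tone. "
        f"Keep the answer {style.verbosity}.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )
```

For example, `build_prompt("What is the main topic?", chunk_text, AnswerStyle(tone="formal", language="German"))` would request a formal German answer without changing any retrieval code, where `chunk_text` stands for whatever context the retriever returned.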
Conclusion
"Mr.DOC" redefines document interaction by combining privacy-preserving local execution with advanced AI capabilities. Its potential as an agentic AI—interacting autonomously with software and files—sets a new standard for knowledge extraction tools. By leveraging Vertex AI, LangChain, and ChromaDB, it delivers precise, efficient responses while remaining adaptable for future cross-platform applications. This work lays the foundation for a transformative, user-centric AI suite, poised to evolve with emerging technologies and user needs.
References
• McMahan, H. B., et al. (2017). "Communication-Efficient Learning of Deep Networks from Decentralized Data." AISTATS.
• Wooldridge, M., & Jennings, N. R. (1995). "Intelligent Agents: Theory and Practice." Knowledge Engineering Review.
• Google Cloud Documentation: Vertex AI, BigQuery, Compute Engine.