This paper presents the development and deployment of the DocVerify App, an AI-based tool for document authentication. Using optical character recognition (OCR) and text similarity algorithms, the app extracts text from images and PDFs and compares it against a reference document to detect inconsistencies. The project aims to give businesses and institutions a robust mechanism to prevent forgery, streamline verification, and keep records accurate.
Forgery and document manipulation have become pervasive problems in sectors such as finance, law, and healthcare. Traditional verification methods often fail to detect subtle changes in official records. This project introduces an AI-driven document verifier that uses OCR and cosine similarity to automate document validation, reducing human error and increasing efficiency.
The app aims to simplify verification: users upload a document, the app compares it with the original template, and any mismatches are reported to the user.
System Architecture
The DocVerify App follows a modular design consisting of the following components:
OCR and Text Extraction
Tesseract OCR is used to extract text from scanned images and PDFs. The extracted text is then preprocessed to remove noise, special characters, and formatting inconsistencies.
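The extraction and cleanup steps described above might look roughly like the following sketch. It assumes the `pytesseract` and `Pillow` packages plus a local Tesseract installation for the OCR call. The function names `extract_text` and `clean_text` and the specific cleanup rules (lowercasing, keeping only letters, digits, and whitespace) are illustrative assumptions, not the app's actual implementation. PDFs would first need to be rendered to images, which is not shown here.

```python
import re
import unicodedata


def extract_text(image_path: str) -> str:
    """Run Tesseract OCR on a single image file (hypothetical helper)."""
    # Imported lazily so the cleanup code below also works without OCR installed.
    import pytesseract
    from PIL import Image

    with Image.open(image_path) as img:
        return pytesseract.image_to_string(img)


def clean_text(raw: str) -> str:
    """Normalize OCR output using an assumed set of cleanup rules."""
    # Unify Unicode variants (e.g. full-width characters) and lowercase.
    text = unicodedata.normalize("NFKC", raw).lower()
    # Replace anything that is not a letter, digit, or whitespace with a space.
    text = re.sub(r"[^\w\s]|_", " ", text)
    # Collapse runs of whitespace (including line breaks) into single spaces.
    return re.sub(r"\s+", " ", text).strip()
```

Stripping all punctuation also removes decimal points and currency symbols, so a real deployment would likely need gentler rules for documents where numeric values matter.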
Text Similarity Analysis
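Text comparison is performed using vectorization and cosine similarity: each document's text is converted to a numeric vector, and the cosine of the angle between the two vectors serves as a similarity score. This helps the app detect small variations between the original document and the one submitted for verification.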
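As a rough illustration of the comparison step, the sketch below computes cosine similarity over simple term-frequency vectors using only the Python standard library. The source does not specify the vectorization method, so the raw word counts, the function names, and the 0.95 threshold are all assumptions made for illustration; TF-IDF weighting would be a common alternative.

```python
import math
from collections import Counter


def cosine_similarity(text_a: str, text_b: str) -> float:
    """Cosine similarity between word-count vectors of two texts."""
    vec_a = Counter(text_a.lower().split())
    vec_b = Counter(text_b.lower().split())
    # Dot product over the terms of vec_a; a missing term in vec_b counts as 0.
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document cannot match anything
    return dot / (norm_a * norm_b)


def is_match(original: str, candidate: str, threshold: float = 0.95) -> bool:
    """Flag a candidate as consistent with the original (assumed threshold)."""
    return cosine_similarity(original, candidate) >= threshold


if __name__ == "__main__":
    original = "pay 100 dollars to the bearer"
    tampered = "pay 900 dollars to the bearer"
    print(round(cosine_similarity(original, tampered), 3), is_match(original, tampered))
```

With word counts, a single altered figure in a long document moves the score only slightly, so any threshold would need to be tuned on real documents, and a word-level diff may be a useful complement for pinpointing the change.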
Dataset and Document Types
The experiment was conducted using a dataset comprising:
Testing Environment
The DocVerify App demonstrated the following performance:
The app successfully identified tampered documents.
The DocVerify App proves to be an effective solution for automating document authentication. By integrating OCR and similarity analysis, the app offers a high-accuracy tool for detecting document forgery.
Future improvements will focus on enhancing multilingual support, integrating AI-driven anomaly detection, and expanding the app’s capabilities to handle larger document volumes.
This project highlights the potential of AI and computer vision in strengthening document verification processes, contributing to fraud prevention across industries.