The banking industry has rapidly embraced AI and machine learning models across various operations, with text recognition emerging as a pivotal technology for automating tasks. From streamlining account opening procedures to verifying KYC details, AI-driven solutions are transforming traditional banking processes.
A critical application of this technology lies in automating the validation of vendor invoices for payment disbursement. Historically, this has been a manual and time-consuming process, where teams meticulously cross-check key invoice metrics—such as tax details, PAN numbers, and GST information—against existing system records. By leveraging AI in this domain, banks are poised to significantly enhance efficiency, reduce errors, and accelerate payment cycles.
The invoice validation process in our banking operations is currently facing significant challenges due to its manual nature and high volume. The following flowchart illustrates the current manual process:
Our current invoice validation process involves the following steps:
This manual approach presents several challenges:
To address the challenges of our current manual invoice validation process, we set out to develop an automated solution. This section outlines our proposed solution and the steps we took to arrive at our final approach.
Our proposed solution aims to automate the invoice validation process using AI and OCR technologies. Here's a visual representation of how this solution compares to our current process:
The proposed automated process involves the following steps:
This automated approach is designed to streamline the invoice validation process, minimize manual effort, and improve the accuracy and efficiency of invoice processing.
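The validation step of such a pipeline, where extracted values are cross-checked against existing system records, can be sketched as follows. This is a minimal illustration rather than our production logic: the function names, the field keys (`pan`, `invoice_no`, `vendor_gstn`), and the sample values are all hypothetical, and the normalization rule (uppercase, letters and digits only) is an assumption.

```python
import re


def normalize(value: str) -> str:
    """Uppercase the value and drop everything except letters and digits."""
    return re.sub(r"[^A-Z0-9]", "", value.upper())


def cross_check(extracted: dict, record: dict, fields: list) -> dict:
    """Return {field: bool}, True when extracted and recorded values agree."""
    results = {}
    for name in fields:
        got = extracted.get(name)
        expected = record.get(name)
        results[name] = (
            got is not None
            and expected is not None
            and normalize(got) == normalize(expected)
        )
    return results


# Hypothetical values for illustration only, not real identifiers.
extracted = {
    "pan": "abcde 1234f",
    "invoice_no": "INV-0042",
    "vendor_gstn": "27ABCDE1234F1Z5",
}
record = {
    "pan": "ABCDE1234F",
    "invoice_no": "INV/0042",
    "vendor_gstn": "27ABCDE1234F2Z5",
}

report = cross_check(extracted, record, ["pan", "invoice_no", "vendor_gstn"])
# Fields that disagree are the ones a human checker would still need to review.
needs_review = [name for name, ok in report.items() if not ok]
print(report)
print(needs_review)
```

Stripping separators makes the comparison tolerant of spacing and punctuation differences between the invoice and the system record. A stricter rule may be preferable for invoice numbers, where separators can be significant.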
Before we describe the implementation of our final solution, it's important to understand the major challenges we faced at the outset of this project:
Varying Invoice Formats:
Image Quality Issues:
Complex Metric Extraction:
India-Specific Metrics:
These challenges significantly impacted the performance of standard OCR solutions. For instance, despite efforts in image preprocessing (enhancing image clarity, spell checking, and adjusting text spacing), our initial approach using EasyOCR only achieved an overall accuracy of 39%.
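India-specific identifiers are unforgiving for OCR because they have fixed layouts, so a single misread character makes the whole value unusable. A cheap post-OCR sanity check can catch many such errors before any lookup against system records. The sketch below assumes the commonly documented layouts: a PAN is five letters, four digits, and one letter, and a GSTIN is a two-digit state code, the ten-character PAN, an entity character, the letter Z, and a check character. It checks shape only and does not verify the GSTIN check character. The function names are hypothetical and the sample values are made up.

```python
import re

# Shape-only patterns based on the commonly documented layouts (assumption).
PAN_RE = re.compile(r"[A-Z]{5}[0-9]{4}[A-Z]")
GSTIN_RE = re.compile(r"[0-9]{2}[A-Z]{5}[0-9]{4}[A-Z][1-9A-Z]Z[0-9A-Z]")


def is_valid_pan(text: str) -> bool:
    """True when the text has the shape of a PAN."""
    return PAN_RE.fullmatch(text.strip().upper()) is not None


def is_valid_gstin(text: str) -> bool:
    """True when the text has the shape of a GSTIN (check character not verified)."""
    return GSTIN_RE.fullmatch(text.strip().upper()) is not None


def gstin_matches_pan(gstin: str, pan: str) -> bool:
    """Characters 3 to 12 of a GSTIN are expected to equal the holder's PAN."""
    return gstin.strip().upper()[2:12] == pan.strip().upper()


# Made-up sample values, including a typical OCR confusion of "1" and "I".
for candidate in ["ABCDE1234F", "ABCDEI234F"]:
    print(candidate, is_valid_pan(candidate))

print(is_valid_gstin("27ABCDE1234F1Z5"))
print(gstin_matches_pan("27ABCDE1234F1Z5", "ABCDE1234F"))
```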
Our path to this proposed solution involved exploring several approaches, each building upon the lessons learned from the previous one.
We initially experimented with various OCR libraries:
We favored EasyOCR due to its ease of implementation and support for multiple languages. However, this approach faced significant challenges:
Despite efforts in image preprocessing, this approach only achieved an overall accuracy of 39%.
Recognizing the limitations of our initial approach, we turned to Microsoft Azure Document Intelligence. We first tested its basic "Read" model to extract text directly. However, this proved insufficient due to:
Next, we experimented with Azure's pre-built invoice models. While these improved overall performance, they struggled with:
To overcome the limitations of both our initial approach and the pre-built Azure models, we developed a custom model tailored to our specific needs:
This custom approach proved to be the key to overcoming our unique challenges, significantly improving our ability to accurately process and validate invoices at scale.
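For readers who want to try something similar, the sketch below shows roughly how a Document Intelligence model can be called from Python and how the returned fields can be flattened for review. It assumes the `azure-ai-formrecognizer` package (`DocumentAnalysisClient`); the newer `azure-ai-documentintelligence` package has a different client and call signature. The custom model ID, the field names, and the 0.8 confidence threshold are hypothetical placeholders, not the values we used.

```python
from types import SimpleNamespace


def analyze_invoice(path, endpoint, key, model_id):
    """Send one invoice file to Document Intelligence and return the result.

    model_id can be a prebuilt model such as "prebuilt-read" or
    "prebuilt-invoice", or the ID of a custom-trained model.
    """
    # Imported lazily so the helper below can be tried without the SDK installed.
    from azure.ai.formrecognizer import DocumentAnalysisClient
    from azure.core.credentials import AzureKeyCredential

    client = DocumentAnalysisClient(
        endpoint=endpoint, credential=AzureKeyCredential(key)
    )
    with open(path, "rb") as f:
        poller = client.begin_analyze_document(model_id, document=f)
        return poller.result()


def fields_to_rows(result, min_confidence=0.8):
    """Flatten extracted fields and flag low-confidence ones for manual review."""
    rows = []
    for doc in result.documents:
        for name, field in doc.fields.items():
            confidence = field.confidence or 0.0
            rows.append(
                {
                    "field": name,
                    "text": field.content,
                    "confidence": confidence,
                    "needs_review": confidence < min_confidence,
                }
            )
    return rows


# Offline stand-in that mimics the shape of an analysis result (made-up values).
fake_result = SimpleNamespace(
    documents=[
        SimpleNamespace(
            fields={
                "VendorGSTN": SimpleNamespace(content="27ABCDE1234F1Z5", confidence=0.97),
                "HSN": SimpleNamespace(content="9983", confidence=0.55),
            }
        )
    ]
)
rows = fields_to_rows(fake_result)
for row in rows:
    print(row)

# With real credentials (all placeholders here) the call would look like:
# result = analyze_invoice("invoice.pdf", "<endpoint>", "<key>", "<custom-model-id>")
```

Routing only low-confidence fields to a human checker is one way to keep manual effort focused on the values most likely to be wrong.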
In the next section, we'll discuss the results and impact of our custom model implementation.
The development and implementation of our custom model using Microsoft Azure Document Intelligence led to significant improvements in our invoice validation process. Here, we present the key results and their impact on our operations.
To evaluate the effectiveness of our solution, we compared the performance of different approaches on a sample of 25 invoices. The following table summarizes the recall for key fields extracted from the invoices using the Pre-Built Invoice Model:
Invoice Field | Present in Images | OCR Matches | Recall |
---|---|---|---|
PAN No. | 23 | 23 | 100% |
Invoice No. | 23 | 23 | 100% |
Invoice Date | 23 | 23 | 100% |
Bank GSTN | 22 | 21 | 95% |
Vendor GSTN | 22 | 20 | 91% |
HSN | 21 | 4 | 19% |
Tax Total | 18 | 5 | 28% |
Next, we have the results from the custom model:
Invoice Field | Present in Images | OCR Matches | Recall |
---|---|---|---|
PAN No. | 18 | 18 | 100% |
Invoice No. | 21 | 19 | 90% |
Invoice Date | 21 | 21 | 100% |
Bank GSTN | 21 | 20 | 95% |
Vendor GSTN | 18 | 17 | 94% |
HSN | 17 | 15 | 88% |
Invoice Total | 21 | 17 | 81% |
CGST | 18 | 17 | 94% |
SGST | 18 | 16 | 89% |
IGST | 7 | 6 | 86% |
As seen from the results, our custom model demonstrated significant improvements over the pre-built invoice model, particularly in handling India-specific fields. Here are the key findings from our performance comparison:
Consistent High Performance: Both models showed excellent recall (100%) for basic fields like PAN No. and Invoice Date.
Improved HSN Code Extraction: The custom model dramatically improved HSN code recognition from 19% to 88% recall, a critical enhancement for Indian invoices.
New Capabilities: The custom model introduced the ability to extract CGST, SGST, and IGST separately, which wasn't possible with the pre-built model.
Slight Decrease in Invoice Number Recall: Recall for the invoice number dropped from 100% with the pre-built model to 90% with our custom model (19 of 21 invoices). We consider this acceptable given the gains elsewhere, particularly in India-specific fields.
These results demonstrate that our custom model significantly outperforms the pre-built model in handling the complexities of Indian invoices, especially in tax-related information extraction. The slight decrease in invoice number recall is an area for potential future improvement, but it does not detract from the model's overall enhanced performance.
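The recall column in both tables is the number of OCR matches divided by the number of images in which the field is present. The short sketch below reproduces those percentages from the counts reported above and prints the change for the fields that appear in both tables. The variable and function names are ours, chosen for illustration.

```python
# (present in images, OCR matches) per field, copied from the two tables.
prebuilt = {
    "PAN No.": (23, 23),
    "Invoice No.": (23, 23),
    "Invoice Date": (23, 23),
    "Bank GSTN": (22, 21),
    "Vendor GSTN": (22, 20),
    "HSN": (21, 4),
    "Tax Total": (18, 5),
}
custom = {
    "PAN No.": (18, 18),
    "Invoice No.": (21, 19),
    "Invoice Date": (21, 21),
    "Bank GSTN": (21, 20),
    "Vendor GSTN": (18, 17),
    "HSN": (17, 15),
    "Invoice Total": (21, 17),
    "CGST": (18, 17),
    "SGST": (18, 16),
    "IGST": (7, 6),
}


def recall_pct(present: int, matches: int) -> int:
    """Recall as a whole-number percentage: matches / present."""
    return round(100 * matches / present)


prebuilt_recall = {name: recall_pct(*counts) for name, counts in prebuilt.items()}
custom_recall = {name: recall_pct(*counts) for name, counts in custom.items()}

# Change in recall (percentage points) for fields reported by both models.
delta = {
    name: custom_recall[name] - prebuilt_recall[name]
    for name in prebuilt_recall
    if name in custom_recall
}
for name, change in delta.items():
    print(f"{name}: {prebuilt_recall[name]}% -> {custom_recall[name]}% ({change:+d})")
```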
Our custom model brought about significant improvements in invoice processing, particularly in handling India-specific metrics like GST, HSN codes, and PAN numbers. It demonstrated remarkable adaptability to diverse invoice formats from various vendors and enabled granular data extraction, including specific tax breakdowns that were previously unattainable with pre-built models.
The operational impact of implementing this custom model has been substantial. Processing time for each invoice has been slashed from 15 minutes to approximately 5 minutes, significantly improving overall efficiency. This reduction, coupled with higher accuracy rates, has drastically decreased the need for manual verification and correction. As a result, our team of 30 checkers can now manage the daily influx of 4,000-5,000 invoices more effectively, freeing up valuable resources for other critical tasks.
The streamlined process has accelerated our payment cycle, reducing the turnaround time for payment disbursal from 4-5 days to just 2-3 days. This improvement not only enhances vendor relationships but also potentially qualifies us for early payment discounts.
Moreover, the automated system's scalability allows us to handle increased invoice volumes without a proportional rise in processing time or resources, positioning us well for future growth and fluctuations in workload.
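To put the per-invoice figures in perspective, the arithmetic below converts them into total daily effort. It is a back-of-the-envelope sketch that assumes every invoice takes the full quoted time (15 minutes before, about 5 minutes after) at the quoted daily volumes; actual effort will vary by invoice.

```python
def person_hours(invoices_per_day: int, minutes_per_invoice: float) -> float:
    """Total daily effort in person-hours for a given volume and per-invoice time."""
    return invoices_per_day * minutes_per_invoice / 60


summary = {}
for volume in (4000, 5000):
    before = person_hours(volume, 15)
    after = person_hours(volume, 5)
    summary[volume] = (before, after, 1 - after / before)
    print(
        f"{volume} invoices/day: {before:.0f} h -> {after:.0f} h "
        f"({(1 - after / before):.0%} less effort)"
    )
```

Because the per-invoice time falls to a third, total effort falls by two thirds at any volume under these assumptions.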
Our project to automate invoice validation has successfully transformed a manual, time-intensive process into an efficient, scalable system. By developing a custom extraction model using Microsoft Azure Document Intelligence, we've dramatically improved our operations:
These improvements have optimized resource allocation, accelerated our payment cycle, and enhanced vendor relationships.
Looking ahead, we are exploring cost-effective machine learning models and hybrid approaches combining OCR and Natural Language Processing (NLP) techniques. These investigations aim to further enhance our system's ability to recognize and process complex invoice formats, ensuring we remain adaptable to future challenges in financial operations.
Tesseract OCR
EasyOCR
OpenCV (Open Source Computer Vision Library)
Microsoft Azure Document Intelligence
Optical Character Recognition (OCR) Technology