This project introduces an API for extracting text from images, built on Azure Computer Vision OCR and FastAPI. Designed to streamline document digitization and improve accessibility, the system accepts various image formats, automatically optimizes images for processing, and returns the recognized text. By pairing a cloud-based OCR service with a lightweight API framework, the project simplifies text extraction workflows for developers and businesses. Key features include automatic resizing, format conversion, and deployment on the Render platform. The result shows how Azure’s cognitive services and FastAPI can be combined into an efficient, easy-to-use tool for text extraction and analysis.
This project integrates Azure Computer Vision OCR with FastAPI to deliver an efficient and user-friendly API for image-to-text extraction. The methodology encompasses the following key components:
Format Support: The API accepts JPEG and PNG images directly. Images in other formats are automatically converted to a compatible format using the Pillow library.
Automatic Resizing: Very large images are resized to optimize processing time and accuracy without compromising the quality of the extracted text.
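The two preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the helper names fit_within and prepare_image and the MAX_SIDE limit are assumptions, since the source does not state the size threshold it uses.

```python
import io

SUPPORTED_FORMATS = {"JPEG", "PNG"}  # formats the API accepts as-is
MAX_SIDE = 4000  # hypothetical pixel limit; the source does not state one


def fit_within(width: int, height: int, max_side: int = MAX_SIDE) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


def prepare_image(data: bytes, max_side: int = MAX_SIDE) -> bytes:
    """Return image bytes that are JPEG or PNG and no larger than max_side."""
    # Pillow is imported lazily so fit_within stays usable without it.
    from PIL import Image

    img = Image.open(io.BytesIO(data))
    original_format = img.format
    target_format = original_format if original_format in SUPPORTED_FORMATS else "PNG"
    new_size = fit_within(img.size[0], img.size[1], max_side)

    if target_format == original_format and new_size == img.size:
        return data  # already a supported format and small enough

    # RGB can be written as both JPEG and PNG; transparency is dropped,
    # which is acceptable for OCR input.
    img = img.convert("RGB")
    if new_size != img.size:
        img = img.resize(new_size, Image.LANCZOS)

    out = io.BytesIO()
    img.save(out, format=target_format)
    return out.getvalue()
```

Keeping the size calculation separate from the Pillow calls makes the resizing rule easy to check on its own. For example, fit_within(8000, 4000, 4000) gives (4000, 2000), which preserves the aspect ratio.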
The core of the system is powered by Azure Computer Vision OCR, a cloud-based solution offering fast and accurate text recognition.
The API communicates with Azure’s endpoint to process images and retrieve the extracted text.
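As a rough sketch of that exchange, the snippet below builds a request to Azure's OCR REST endpoint and flattens the JSON reply into text lines. It assumes the v3.2 "ocr" operation and its regions/lines/words response layout; the project may use a different API version or the Azure SDK instead, and the function names are illustrative.

```python
import json
import urllib.request

OCR_PATH = "vision/v3.2/ocr"  # assumed API version; confirm against your resource


def build_ocr_request(endpoint: str, key: str, image: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying raw image bytes."""
    url = endpoint.rstrip("/") + "/" + OCR_PATH
    headers = {
        "Ocp-Apim-Subscription-Key": key,  # Azure subscription key header
        "Content-Type": "application/octet-stream",
    }
    return urllib.request.Request(url, data=image, headers=headers, method="POST")


def lines_from_ocr_result(result: dict) -> list[str]:
    """Flatten the regions -> lines -> words layout into text lines."""
    lines = []
    for region in result.get("regions", []):
        for line in region.get("lines", []):
            words = (word["text"] for word in line.get("words", []))
            lines.append(" ".join(words))
    return lines


def run_ocr(endpoint: str, key: str, image: bytes) -> str:
    """Send the image to Azure and return the recognized text."""
    request = build_ocr_request(endpoint, key, image)
    with urllib.request.urlopen(request, timeout=30) as response:
        result = json.load(response)
    return "\n".join(lines_from_ocr_result(result))
```

Only run_ocr touches the network; the other two functions can be exercised offline with a saved response.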
FastAPI Framework: The API is developed using FastAPI for high performance and ease of use.
Endpoints:
Root Endpoint: Provides a status check to ensure the API is running correctly.
Text Extraction Endpoint: Allows users to upload images and receive the extracted text in JSON format.
Invalid Input Formats: Returns clear error messages if the uploaded file is not a supported image format.
Azure Service Issues: Provides descriptive error messages when there is an issue with Azure’s OCR service.
Internal Server Errors: Ensures general server-side issues are logged and reported effectively.
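The endpoints and error cases above might be wired together along these lines. This is a sketch under assumptions: the exception classes, the create_app factory, the JSON field names, and the choice of status 500 for Azure service failures are all illustrative, and file uploads in FastAPI additionally require the python-multipart package.

```python
import logging


class UnsupportedImageError(Exception):
    """Hypothetical: the upload is not a supported image."""


class AzureAuthError(Exception):
    """Hypothetical: Azure rejected the subscription key."""


class AzureServiceError(Exception):
    """Hypothetical: Azure's OCR service failed or was unreachable."""


def to_http_error(exc: Exception) -> tuple[int, str]:
    """Map an internal exception to an HTTP status code and message."""
    if isinstance(exc, UnsupportedImageError):
        return 400, f"Unsupported image: {exc}"
    if isinstance(exc, AzureAuthError):
        return 401, "Azure API key is invalid or missing"
    if isinstance(exc, AzureServiceError):
        return 500, f"Azure OCR service error: {exc}"
    return 500, "Internal server error"


def create_app(extract_text):
    """Build the FastAPI app; extract_text is a callable bytes -> str."""
    # Imported here so the error mapping above works without FastAPI installed.
    from fastapi import FastAPI, File, HTTPException, UploadFile

    app = FastAPI()
    log = logging.getLogger("ocr_api")

    @app.get("/")
    def root():
        # Status check; the response shape is an assumption.
        return {"status": "running"}

    @app.post("/extract-text/")
    def extract(file: UploadFile = File(...)):
        try:
            text = extract_text(file.file.read())
        except Exception as exc:
            log.exception("Text extraction failed")
            status, detail = to_http_error(exc)
            raise HTTPException(status_code=status, detail=detail) from exc
        return {"extracted_text": text}

    return app
```

Centralizing the mapping in one function keeps the status codes consistent across endpoints, and logging before re-raising means server-side failures are recorded even when the client only sees a generic message.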
The API is hosted on the Render platform, ensuring reliability and scalability for production use.
Seamless deployment allows users to access the service without needing extensive technical setup.
To start using the Azure OCR API, clone the repository from GitHub.
Install the required dependencies listed in requirements.txt (for example, with pip install -r requirements.txt).
Create a config.py file and add your Azure OCR credentials, which you can obtain as described next.
To use the Azure OCR API, you need to obtain an Azure Computer Vision subscription key and endpoint. Follow these steps:
If you don’t already have an Azure account, create one through the Azure Portal.
Go to the Azure Portal and navigate to the Create a resource section.
Search for Computer Vision and select Create.
Fill in the required details (e.g., subscription, resource group, region) and create the resource.
Obtain API Key and Endpoint
Once the resource is created, go to the Keys and Endpoint section under the resource overview.
Copy your Endpoint URL and one of the API Keys.
Create a config.py file in the root directory of your project.
Add the below lines to store your credentials:
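A config.py along these lines would hold the two values copied from the portal. The variable names AZURE_ENDPOINT and AZURE_KEY are illustrative and may differ from the names the project's code imports, so match them to the repository before use.

```python
# config.py: variable names are illustrative, not necessarily the project's own.

# Endpoint URL copied from the "Keys and Endpoint" page of the resource.
AZURE_ENDPOINT = "https://<your-resource-name>.cognitiveservices.azure.com/"

# Either of the two API keys shown on the same page.
AZURE_KEY = "<your-api-key>"
```

Because this file contains a secret, it is usually kept out of version control.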
To understand the capabilities and limitations of the Azure OCR API, visit the official Azure Computer Vision Documentation.
Start the FastAPI server locally with the following command:
Open the Swagger UI in a browser; FastAPI serves it by default at the /docs path of the local server URL:
Locate the /extract-text/ endpoint and click Try it out.
Upload an image under the file parameter and click Execute.
View the JSON response in the output section.
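The same upload can also be sent from a script instead of the Swagger UI. The sketch below uses only the standard library and assumes the endpoint accepts a multipart form with the image under the file field, as the steps above describe; the function names are illustrative, and base_url must be supplied by the caller.

```python
import json
import mimetypes
import os
import urllib.request
import uuid


def encode_upload(filename: str, content: bytes, field: str = "file") -> tuple[bytes, str]:
    """Encode one file as multipart/form-data; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    mime = mimetypes.guess_type(filename)[0] or "application/octet-stream"
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {mime}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + content + tail, f"multipart/form-data; boundary={boundary}"


def extract_text(base_url: str, path: str) -> dict:
    """Upload the image at path to /extract-text/ and return the parsed JSON."""
    with open(path, "rb") as fh:
        body, content_type = encode_upload(os.path.basename(path), fh.read())
    request = urllib.request.Request(
        base_url.rstrip("/") + "/extract-text/",
        data=body,
        headers={"Content-Type": content_type},
        method="POST",
    )
    # urlopen raises urllib.error.HTTPError for non-2xx responses.
    with urllib.request.urlopen(request, timeout=60) as response:
        return json.load(response)
```

Calling extract_text(base_url, "scan.png") with the address of your local server or deployment returns the same JSON that the Swagger UI displays.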
You can access the deployed version of this API here:
A response code of 200 indicates the process was completed successfully. Other response codes are as follows:
400 Bad Request: An invalid request was made (e.g., an unsupported image format was uploaded).
401 Unauthorized: The Azure API key is invalid or missing.
500 Internal Server Error: An error occurred on the server.