This project, titled "Multi-Format Translator with Azure OpenAI," was developed as part of the DIO challenge "Technical Articles Translator with Azure AI" for the Microsoft Certification Challenge #1 - AI 102 Bootcamp. It is a versatile translation tool designed to convert various types of content—including plain text, Word documents (.docx), and complete websites—into a target language (with Portuguese set as the default). Leveraging the Azure OpenAI API, the tool delivers high-quality and precise translations while providing a command-line interface for ease of use and accessibility.
The solution is implemented in Python 3.12 and integrates several key libraries to support its multi-format translation capabilities:
Azure OpenAI Integration:
The core of the translation functionality is built around the Azure OpenAI API. The service constructs a dynamic prompt that instructs the AI model (configured with the "o1-mini" model) to translate the input text into the desired language.
Text Translation:
For plain text inputs, the tool sends the text directly to the OpenAI API, receiving a translated version as output. This process is encapsulated in the traduzir_texto
method, which calls a lower-level method that interacts with the API.
Document Translation:
Using the python-docx
library, the tool reads Word documents (.docx) and extracts text from each paragraph. Each paragraph is individually processed and translated by the API, ensuring that the document’s content is accurately translated in segments. This modular approach allows for handling documents with varied formatting and content density.
Website Translation:
The project employs the requests
library to fetch website content and BeautifulSoup
from beautifulsoup4
to parse and clean the HTML. The process involves:
<p>
, <h1>
–<h6>
, <span>
, <div>
) to extract and translate the visible text.Environment and Configuration Management:
The python-dotenv
library is used to manage environment variables, ensuring sensitive information like the GitHub token (used as an API key) remains secure and is easily configurable.
Error Handling and Robustness:
Throughout the application, try/except blocks are employed to manage exceptions—ranging from missing environment variables to API errors or issues during HTTP requests—thus ensuring the tool fails gracefully and provides informative error messages.
The project successfully demonstrates a robust, multi-format translation tool that meets its design objectives:
Versatility:
The tool is capable of translating plain text, extracting and converting content from Word documents, and processing entire web pages. This multi-format support showcases its adaptability to various use cases and content types.
Accuracy and Quality:
Leveraging the Azure OpenAI API ensures high-quality translations. The modular approach—translating individual text segments—helps maintain the contextual integrity of the original content, whether from documents or websites.
User-Friendly Interface:
A command-line interface allows users to specify the input type (text, document, or HTML), target language, and input source, making the tool accessible and easy to use for a variety of translation tasks.
Scalability and Extensibility:
The code’s structure, centered around the ServicoTraducao
class, allows for easy extension. Future improvements could include additional file format support, enhanced error handling, or further integration with other AI services.
In summary, the project meets its objectives by providing a flexible and high-performing translation service, demonstrating the practical application of Azure OpenAI in real-world multi-format translation scenarios.
There are no models linked
There are no datasets linked
There are no models linked
There are no datasets linked