This article presents an artificial intelligence (AI) system designed to generate Portuguese subtitles for audiovisual content, whether the original audio is in Portuguese or English. Aimed at enhancing accessibility for deaf individuals and non-English speakers in Brazil, this tool addresses critical gaps in educational and cultural inclusivity. Drawing on insights from studies on audiovisual translation, inclusive education, and AI in pedagogy, we analyze the AI’s technical framework, its alignment with accessibility standards, and its potential impact on Brazilian society. Challenges such as ethical considerations and implementation barriers are discussed, alongside recommendations for integration into Brazil’s educational and cultural sectors.
Keywords: Artificial Intelligence, Subtitling, Accessibility, Deaf Education, Brazil.
In Brazil, approximately 10.7 million people live with hearing impairments, yet only 26.9% of public schools offer any accessibility resources to their students (INEP, Censo Escolar, 2022). Concurrently, English proficiency remains low, with Brazil ranking 58th globally (EF EPI, 2022), limiting access to international educational content. These disparities underscore the urgency of innovative solutions to democratize access to knowledge. Recent studies on accessible audiovisual translation (TAVa) emphasize the importance of subtitling parameters such as linguistic segmentation and speed (Vieira et al., 2020), while AI tools like ChatGPT demonstrate transformative potential in education (Silva & Oliveira, 2023). This article introduces an AI subtitling system tailored for Brazil’s educational and cultural contexts, leveraging natural language processing (NLP) to bridge accessibility gaps.
While only 7% of public schools offer resources tailored to deaf students, the broader cultural landscape suffers from a parallel neglect: iconic Brazilian films, documentaries, and educational programs often lack subtitles or rely on poorly adapted translations. For instance, less than 30% of content on platforms like Cinemateca Brasileira or TV Escola includes closed captions, and even fewer adhere to accessibility standards like syntactic segmentation or optimal reading speeds (Vieira et al., 2020). This deficiency not only hinders deaf individuals from engaging with national heritage—such as classics like Central do Brasil or educational series on Afro-Brazilian history—but also marginalizes non-English speakers, given Brazil’s low global English proficiency (58th rank, EF EPI, 2022).
The problem extends beyond technical shortcomings. Subtitles, when available, frequently ignore cultural nuances, such as regional dialects or sociohistorical references, flattening narratives that are vital for identity formation (Revista ES, 2023). For example, documentaries on Indigenous traditions or cordel literature often lose contextual depth in literal translations, alienating audiences who rely on textual support. Moreover, as highlighted by Salut Captions (2022), subtitles for deaf viewers rarely incorporate non-verbal cues (e.g., ambient sounds, music tones), critical for immersive storytelling.
In education, the absence of pedagogically oriented subtitles exacerbates learning disparities. This misalignment between audiovisual tools and pedagogical needs stifles inclusive learning.
Addressing these barriers demands a solution that harmonizes technical precision, cultural fidelity, and equitable access, ensuring education and heritage are truly democratized.
The system prioritizes three pillars:
Accuracy: Whisper’s large-v1 model provides robust multilingual transcription for both Portuguese and English audio.
Scalability: FFmpeg optimizes audio extraction, while parallel processing ensures rapid translation.
Accessibility: Subtitles are formatted as SRT files, compatible with all major platforms.
Key technical challenges included synchronizing translated text with scene transitions and minimizing GPU memory usage during batch processing. By segmenting audio into 30-second chunks and leveraging lightweight translation APIs, we achieved high performance on consumer-grade hardware.
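The audio-extraction step described above can be sketched as a thin wrapper around an FFmpeg call. The `extract_audio` helper below, along with the 16 kHz mono settings (the sample rate Whisper expects), is an illustrative reconstruction rather than the system's exact parameters.

```python
import os
import subprocess

def build_ffmpeg_cmd(video_path, audio_path):
    """FFmpeg arguments: strip the video stream, downmix to 16 kHz mono WAV."""
    return [
        "ffmpeg", "-y",      # overwrite any previous output without prompting
        "-i", video_path,    # input video file
        "-vn",               # drop the video stream
        "-ac", "1",          # single (mono) channel
        "-ar", "16000",      # 16 kHz, the sample rate Whisper expects
        audio_path,
    ]

def extract_audio(video_path):
    """Extract the audio track of a video into a sibling .wav file."""
    audio_path = os.path.splitext(video_path)[0] + ".wav"
    subprocess.run(build_ffmpeg_cmd(video_path, audio_path), check=True)
    return audio_path
```

Discarding the video stream early keeps GPU and disk pressure low, which matters for the consumer-grade hardware constraint noted above.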
```mermaid
graph LR
    A[Video] --> B(Audio Extraction - FFmpeg)
    B --> C(Transcription - Whisper)
    C --> D[Temporal Segments]
    D --> E(Segment-Based Translation - mtranslate)
    E --> F(SRT Generation)
```
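The segment-based translation step of the pipeline can be sketched with the mtranslate package, a lightweight wrapper around the public Google Translate endpoint. This `translate_subs` helper, including its fallback to the source text when the service is unreachable, is an illustrative reconstruction, not the system's exact implementation.

```python
def translate_subs(text, target_language="pt"):
    """Translate one subtitle segment, keeping the source text on failure."""
    text = text.strip()
    if not text:
        return ""
    try:
        # mtranslate wraps the public Google Translate endpoint
        from mtranslate import translate
        return translate(text, target_language)
    except Exception:
        # Fall back to the original line rather than leaving a gap on screen
        return text
```

Translating segment by segment preserves Whisper's timestamps, so each translated line can be re-attached to its original time window without re-alignment.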
```python
import os
import whisper

def process_video(video_path, target_language="pt"):
    model = whisper.load_model("large-v1")
    audio_path = None
    try:
        audio_path = extract_audio(video_path)
        text, language, segments = transcribe_audio(audio_path, model)
        translated_segments = [
            translate_subs(seg["text"], target_language) for seg in segments
        ]
        srt_path = os.path.splitext(video_path)[0] + "_translated.srt"
        save_srt(segments, translated_segments, srt_path)
        return srt_path
    finally:
        # Clean up the temporary audio track even if transcription fails
        if audio_path and os.path.exists(audio_path):
            os.remove(audio_path)
            print(f"Temporary audio removed: {audio_path}")
```
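The `save_srt` helper called in `process_video` is not shown in the article; a minimal sketch, assuming Whisper's per-segment `start`/`end` timestamps in seconds, could format the numbered SRT blocks as follows:

```python
def format_timestamp(seconds):
    """Convert seconds to the SRT timestamp format HH:MM:SS,mmm."""
    ms = int(round(seconds * 1000))
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def save_srt(segments, translated_segments, srt_path):
    """Write Whisper segments and their translations as a numbered SRT file."""
    with open(srt_path, "w", encoding="utf-8") as f:
        for i, (seg, text) in enumerate(zip(segments, translated_segments), start=1):
            f.write(f"{i}\n")
            f.write(f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n")
            f.write(text.strip() + "\n\n")
```

Because SRT is plain text with millisecond timestamps, this output stays compatible with the major players and platforms mentioned above.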
Initial qualitative tests conducted by the researcher using cinema clips (e.g., Cidade de Deus) and TED Talks revealed that the AI-generated subtitles effectively handled syntactic segmentation, aligning with theoretical recommendations (Karamitroglou, 1998). However, challenges emerged in translating idiomatic expressions (e.g., regional slang) and capturing non-verbal audio cues (e.g., ambient sounds in documentaries about Amazonian rituals). The system showed potential in maintaining synchronization between dialogue and text, but some inconsistencies occurred in scenes with overlapping sounds (e.g., crowd chatter).
If optimized, this tool could democratize access to Brazil’s audiovisual culture—such as restoring subtitles for lost films in the Cinemateca Brasileira archive—and enhance educational equity. For instance, integrating the AI into platforms like MEC’s Hora do Enem could provide deaf students with real-time subtitles for exam prep lectures, while partnerships with streaming services (e.g., Globoplay) might expand access to international content for non-English speakers. Culturally, it could preserve oral traditions (e.g., literatura de cordel) through accurate subtitling, fostering broader appreciation of regional narratives.
Next steps include formalizing accuracy metrics through collaboration with linguists. Testing with NGOs across regions (e.g., São Paulo’s urban centers vs. Amazonian rural schools) will identify infrastructural and pedagogical needs. Technical priorities include integrating non-verbal sound descriptions (e.g., “[folk music playing softly]”) and improving real-time processing for live educational broadcasts. Partnerships with institutions like Ancine and Fundação Dorina Nowill could drive policy changes, mandating AI subtitling in public media.
FLORES, L. B.
A INCLUSÃO DOS EDUCANDOS COM DEFICIÊNCIA AUDITIVA NO ENSINO REGULAR: AVANÇOS E DESAFIOS.
Alegrete: Universidade Federal do Pampa, 2021.
Available at: https://www.academia.edu/87900719/A_inclus%C3%A3o_dos_educandos_com_defici%C3%AAncia_auditiva_no_ensino_regular_avan%C3%A7os_e_desafios?source=swp_share
VIEIRA, P. A.; ASSIS, Í. A. P.; ARAÚJO, V. L. S.
TRADUÇÃO AUDIOVISUAL: ESTUDOS SOBRE A LEITURA DE LEGENDAS PARA SURDOS E ENSURDECIDOS.
Cadernos de Tradução, Florianópolis, v. 40, nº esp. 2, p. 97-124, 2020.
DOI: 10.5007/2175-7968.2020v40nesp2p97.
NASCIMENTO, J. L. A. do.
O IMPACTO DA INTELIGÊNCIA ARTIFICIAL NA EDUCAÇÃO: UMA ANÁLISE DO POTENCIAL TRANSFORMADOR DO CHATGPT.
Formiga (MG): Editora MultiAtual, 2024. 47 p. : il.
Available at: https://www.editoramultiatual.com.br/2024/06/o-impacto-da-inteligencia-artificial-na.html
INEP. Censo Escolar 2023: divulgação de resultados. Available at: https://www.gov.br/inep/pt-br/assuntos/noticias/censo-escolar/mec-e-inep-divulgam-resultados-do-censo-escolar-2023 (Accessed: Mar 13, 2025)
REVISTA ES. O papel da legenda no audiovisual. Available at: https://revistaes.com.br/colunas/o-papel-da-legenda-no-audiovisual/ (Accessed: Mar 13, 2025)
SALUT CAPTIONS. Quem não ouve assiste. Como? Available at: https://salutcaptions.com/2022/07/27/quem-nao-ouve-assiste-como/ (Accessed: Mar 13, 2025)
CNN BRASIL. Brasil ocupa 58ª posição entre os 111 países avaliados em domínio do inglês. Available at: https://www.cnnbrasil.com.br/internacional/brasil-ocupa-58a-posicao-entre-os-111-paises-avaliados-em-dominio-do-ingles/ (Accessed: Mar 13, 2025)
AGÊNCIA DE NOTÍCIAS UNICEUB. Escolas brasileiras não são acessíveis para pessoas com deficiência. Available at: https://agenciadenoticias.uniceub.br/destaque/escolas-brasileiras-nao-sao-acessiveis-para-pessoas-com-deficiencia/ (Accessed: Mar 13, 2025)
Gabrielly Alves Gomes is a Data Management student, Machine Learning researcher specializing in Natural Language Processing (NLP), and Data Engineer at Cortex Intelligence. Their work bridges academic research and industry applications, focusing on leveraging NLP and data engineering to develop inclusive, data-driven technologies. They designed the AI subtitling system discussed in this article as part of their ongoing personal exploration of ethical AI solutions for accessibility.
Contact: gabrielly.gomes@ufpi.edu.br | codeonthespectrum.online
LinkedIn