In this project, I designed a computer vision system that interprets text from protest banners in real time. The system uses Vision Transformers (ViT) and the Segment Anything Model (SAM) to detect and mask banners in images, then applies Optical Character Recognition (OCR) to extract the text. The extracted text is passed to a generative chatbot (based on Gemini) with a carefully crafted prompt that analyzes the message and provides contextual insights. The system aims to support social and political research by making the content of protest banners easier to study.
Protests play a significant role in expressing collective societal and political views. Understanding the messages displayed on protest banners can provide valuable insights into public opinion. However, manually analyzing banners in images is labor-intensive and error-prone. To address this challenge, I developed an automated AI-powered system capable of real-time text extraction and contextual analysis.
The system consists of the following main components:
Using Vision Transformers and SAM, the system generates precise masks for banners in images, ensuring robust segmentation even in complex scenes.
# Example of using SAM for banner segmentation
from transformers import AutoProcessor, AutoModelForMaskGeneration

processor = AutoProcessor.from_pretrained("Zigeng/SlimSAM-uniform-77")
model = AutoModelForMaskGeneration.from_pretrained("Zigeng/SlimSAM-uniform-77")

# raw_image (the input image) and input_points (prompt points on the banner)
# are assumed to be defined earlier
inputs = processor(raw_image, input_points=input_points, return_tensors="pt")
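Once the model has produced a mask, the banner region has to be isolated before OCR. The following is a minimal sketch of that step, assuming the mask has already been post-processed into a 2-D boolean array at the image's resolution. The helper name `mask_and_crop` is hypothetical and is not taken from the project's code.

```python
import numpy as np


def mask_and_crop(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Black out pixels outside the mask and crop to the mask's bounding box.

    Assumed shapes: image is H x W x 3 (uint8), mask is H x W (bool).
    """
    if image.shape[:2] != mask.shape:
        raise ValueError("mask must have the same height and width as image")
    if not mask.any():
        raise ValueError("mask is empty; no banner region to crop")

    # Zero every pixel that the mask marks as background.
    masked = np.where(mask[..., None], image, 0).astype(image.dtype)

    # Bounding box of the True region: first and last row/column containing it.
    rows = np.flatnonzero(mask.any(axis=1))
    cols = np.flatnonzero(mask.any(axis=0))
    return masked[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]
```

Cropping to the bounding box keeps the OCR input small, and zeroing the background reduces the chance that text outside the banner is picked up. The returned array could serve as the `masked_img` passed to the OCR step.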
The system uses OCR to extract text from the segmented banners; the example here uses EasyOCR.
# Example of OCR text extraction
import easyocr
import numpy as np

reader = easyocr.Reader(['en'])

# Ensure masked_img is converted to a format EasyOCR can handle
results = reader.readtext(np.array(masked_img))

# Each result is (bounding box, text, confidence); keep only the text
extracted_text = " ".join([result[1] for result in results])
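Joining every detection in the order returned can let noisy, low-confidence fragments into the text. The sketch below shows one possible refinement, assuming the default EasyOCR result format of `(bounding box, text, confidence)` tuples. The function name `join_detections` and the 0.4 threshold are illustrative assumptions, not values from the project, and sorting by top-left corner is only a rough approximation of reading order.

```python
from typing import List, Sequence, Tuple

# One detection: (four corner points as [x, y], recognized text, confidence).
Detection = Tuple[Sequence[Sequence[float]], str, float]


def join_detections(results: List[Detection], min_confidence: float = 0.4) -> str:
    """Drop low-confidence detections and join the rest top-to-bottom, left-to-right."""
    kept = [det for det in results if det[2] >= min_confidence]

    def reading_order(det: Detection) -> Tuple[float, float]:
        box = det[0]
        top = min(point[1] for point in box)
        left = min(point[0] for point in box)
        return (top, left)

    kept.sort(key=reading_order)
    return " ".join(text for _, text, _ in kept)
```

A suitable threshold depends on image quality, so it is worth tuning against a few sample banners rather than treating 0.4 as a recommendation.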
A Gemini-based generative chatbot is employed to process the extracted text. With a carefully crafted prompt, the chatbot provides context and analysis for the message, such as identifying themes and sentiment.
import google.generativeai as genai

# GEMINI_API_KEY and query_oneline are assumed to be defined earlier
genai.configure(api_key=GEMINI_API_KEY)

prompt = f"""
Please review and enhance the following text by:
Correcting grammar, phrasing, and clarity issues, ensuring consistent capitalization and proper punctuation, eliminating redundant words while maintaining the original tone and message, standardizing formatting and spacing.
After these corrections, provide a **detailed and thoughtful analysis** of the main message of the text. If the text mentions a place, organization, or cultural reference, include a creative explanation of its significance or relevance, with additional context where possible. Focus on providing unique insights while keeping the response concise and engaging.
Avoid including section headings like "Analysis" or "Main Message" in the response.
TEXT: {query_oneline}
"""

model = genai.GenerativeModel("gemini-1.5-flash-latest")
answer = model.generate_content(prompt)
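The prompt interpolates a variable called `query_oneline`, which the snippet does not define. A plausible reading is that it holds the OCR output flattened onto a single line. The sketch below shows that preparation step under this assumption; `to_oneline` and `build_prompt` are hypothetical helper names, not part of the project or of the Gemini SDK.

```python
def to_oneline(text: str) -> str:
    """Collapse newlines, tabs, and repeated spaces into single spaces."""
    return " ".join(text.split())


def build_prompt(instructions: str, banner_text: str) -> str:
    """Append the flattened banner text to the instruction block."""
    return f"{instructions.strip()}\n\nTEXT: {to_oneline(banner_text)}"
```

Keeping the banner text on one line after a fixed `TEXT:` marker makes it easier for the model to tell the instructions apart from the content being analyzed.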
A demo of this project is available on YouTube.
This AI-powered system demonstrates the potential of integrating computer vision and NLP to analyze protest banners effectively. It is a step forward in leveraging AI for social and political research.
The full implementation of this project is available on GitHub.
# Clone the repository
$ git clone https://github.com/fedy-benhassouna/Protest-Banner-Interpreter.git