This project develops a real-time sign language detection model that leverages Mediapipe for hand landmark detection, OpenCV for image capture and processing, and gTTS for converting recognized text into speech. By recognizing gestures and speaking the corresponding text, the system enables communication between sign language users and non-signers.
The dataset consists of images of hand gestures corresponding to sign language labels. Using OpenCV, images are captured from a webcam and organized into folders for training purposes.
```python
import os
import time
import uuid
import cv2

# Create folders and capture images for each label.
# `labels`, `IMAGES_PATH`, and `number_img` are assumed to be defined in the project's setup.
for label in labels:
    label_path = os.path.join(IMAGES_PATH, label)
    os.makedirs(label_path, exist_ok=True)
    cap = cv2.VideoCapture(0)
    print('Collecting images for {}'.format(label))
    time.sleep(5)
    for imgnum in range(number_img):
        ret, frame = cap.read()
        imagename = os.path.join(label_path, '{}.jpg'.format(str(uuid.uuid1())))
        cv2.imwrite(imagename, frame)
        cv2.imshow('frame', frame)
        time.sleep(2)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break
    cap.release()

cv2.destroyAllWindows()
print('Image collection completed!')
```
Mediapipe is used to extract hand landmarks, which are preprocessed and normalized to prepare the data for the CNN model.
```python
import numpy as np
import mediapipe as mp

# Preprocess Mediapipe landmarks into a normalized feature vector
def preprocess_landmarks(landmarks):
    landmarks = np.array([[lm.x, lm.y, lm.z] for lm in landmarks]).flatten()
    landmarks = (landmarks - np.mean(landmarks)) / np.std(landmarks)  # Normalize
    return landmarks[:63].reshape(1, -1)  # 21 landmarks x 3 coordinates, reshaped for the model
```
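For completeness, the sketch below shows one way the captured images could be turned into a landmark dataset using this preprocessing function. The folder layout (`IMAGES_PATH/<label>/*.jpg`, as produced by the capture script above) is taken from the source; the `X`/`y` arrays and the static-image Mediapipe setup are illustrative assumptions rather than the repository's exact code.

```python
import os
import cv2
import numpy as np
import mediapipe as mp

# Hypothetical sketch: extract one landmark vector per captured image.
mp_hands = mp.solutions.hands
X, y = [], []

with mp_hands.Hands(static_image_mode=True, max_num_hands=1) as hands:
    for class_idx, label in enumerate(labels):
        label_path = os.path.join(IMAGES_PATH, label)
        for filename in os.listdir(label_path):
            image = cv2.imread(os.path.join(label_path, filename))
            if image is None:
                continue
            results = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                features = preprocess_landmarks(results.multi_hand_landmarks[0].landmark)
                X.append(features.flatten())
                y.append(class_idx)

X = np.array(X)  # shape: (num_samples, 63)
y = np.array(y)  # integer class indices
```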
The CNN model is designed to classify the preprocessed landmarks into one of the nine labels.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LeakyReLU, BatchNormalization, Dropout

# Define the CNN model
model = Sequential()
model.add(Dense(256, input_shape=(63,)))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(128))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(64))
model.add(LeakyReLU(alpha=0.1))
model.add(BatchNormalization())
model.add(Dropout(0.5))
model.add(Dense(9, activation='softmax'))

# Compile the model
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```
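The training call itself is not reproduced here; below is a minimal sketch assuming the `X`/`y` arrays from the dataset step. The split ratio, epoch count, and batch size are illustrative assumptions, not the project's exact values.

```python
from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

# Illustrative training setup; hyperparameters are assumptions
y_onehot = to_categorical(y, num_classes=9)
X_train, X_val, y_train, y_val = train_test_split(X, y_onehot, test_size=0.2, random_state=42)

model.fit(X_train, y_train,
          validation_data=(X_val, y_val),
          epochs=50,
          batch_size=32)
```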
Real-time detection is performed using Mediapipe for hand tracking, and the landmarks are processed and passed through the trained CNN model for prediction.
```python
import cv2
import numpy as np
import mediapipe as mp

cap = cv2.VideoCapture(0)
mp_hands = mp.solutions.hands
hands = mp_hands.Hands()

while cap.isOpened():
    ret, frame = cap.read()
    if not ret:
        break
    image = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    results = hands.process(image)
    if results.multi_hand_landmarks:
        for hand_landmarks in results.multi_hand_landmarks:
            landmarks = preprocess_landmarks(hand_landmarks.landmark)
            prediction = model.predict(landmarks)
            label = labels[np.argmax(prediction)]
            # Display the prediction on the video feed
            cv2.putText(frame, label, (10, 50), cv2.FONT_HERSHEY_SIMPLEX,
                        1, (255, 0, 0), 2, cv2.LINE_AA)
    cv2.imshow('Real-Time Detection', frame)
    if cv2.waitKey(10) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
```
The detected gesture is converted into text and then into speech using the gTTS library.
```python
from gtts import gTTS
import os

# Convert recognized text to speech and play the audio file
def text_to_speech(text):
    tts = gTTS(text, lang='en')
    tts.save('output.mp3')
    os.system('start output.mp3')  # 'start' opens the file with the default player on Windows
```
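In the real-time loop, calling `text_to_speech(label)` on every frame would repeat the same phrase continuously. The small guard below is a hypothetical addition (the `speak_if_new` helper and `last_spoken` variable are not part of the original code) showing one way to speak a gesture only when it changes.

```python
# Hypothetical helper: speak a label only when it differs from the last one spoken
last_spoken = None

def speak_if_new(label):
    global last_spoken
    if label != last_spoken:
        text_to_speech(label)
        last_spoken = label
```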
This project showcases a complete pipeline for real-time sign language detection, from dataset generation to gesture classification and text-to-speech conversion. The integration of Mediapipe, OpenCV, and gTTS makes it an accessible and efficient tool for breaking communication barriers between sign language users and others.
[American-Sign-Language-Detection](https://github.com/Ankitach780/American-Sign-Language-Detection)