Approximately 99% of the U.S. population does not use American Sign Language (ASL) (see https://cdhh.ri.gov/information-referral/american-sign-language.php), creating significant communication barriers for the Deaf and Hard of Hearing community. This project aims to bridge that gap by developing an ASL hand sign recognition system that makes everyday interactions more accessible.
This project introduces a system for American Sign Language (ASL) hand sign recognition, reaching 95% training accuracy and 92% validation accuracy with a Bidirectional Long Short-Term Memory (LSTM) model built in TensorFlow. The system processes both static images and live signing sequences for robustness, drawing on a dataset of over 30,000 samples. MediaPipe provides precise hand landmark detection, while OpenCV handles video capture and frame processing, allowing letters, words, and phrases to be recognized with immediate visual feedback. Preprocessing with Scikit-Learn, including normalization and oversampling, addresses class imbalance in the dataset. This work shows how AI and computer vision can be combined into practical tools for the Deaf and Hard of Hearing community and serves as a foundation for future inclusive technologies, such as production real-time ASL recognition glasses.
Feel free to check out the GitHub repo at https://github.com/qbeka/ASL-Gesture-Recognition
Data Collection and Preprocessing
A dataset of 30,000+ samples, including static and dynamic hand signs, was curated to train and validate the model.
MediaPipe was utilized for detecting and labeling hand landmarks, producing feature vectors used as model inputs.
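MediaPipe reports 21 landmarks per hand, each with x, y, and z coordinates. One common way to turn them into a feature vector, sketched below with a hypothetical helper (the repo's exact feature layout may differ), is to center the landmarks on the wrist and flatten them:

```python
import numpy as np

def landmarks_to_features(hand_landmarks):
    """Flatten 21 MediaPipe (x, y, z) landmarks into one feature vector."""
    coords = np.array([[lm.x, lm.y, lm.z] for lm in hand_landmarks],
                      dtype=np.float32)
    # Translate so the wrist (landmark 0) is the origin, making the
    # features invariant to where the hand appears in the frame.
    coords -= coords[0]
    return coords.flatten()  # shape: (63,)
```

Centering on the wrist is one of several possible normalizations; scaling by hand size is another common addition.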
OpenCV was employed to extract frames from signing videos, supporting dynamic sequence recognition.
```python
import cv2

video = cv2.VideoCapture("sample_video.mp4")
while video.isOpened():
    ret, frame = video.read()
    if not ret:
        break
    # Process frame for landmark detection
    cv2.imshow("Frame", frame)
    # waitKey lets the window refresh; press 'q' to stop early
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
video.release()
cv2.destroyAllWindows()
```
Data preprocessing, including normalization and oversampling of minority classes, was implemented using Scikit-Learn.
```python
from sklearn.preprocessing import StandardScaler

# X: array of landmark feature vectors, one row per sample
scaler = StandardScaler()
X_normalized = scaler.fit_transform(X)
```
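The oversampling of minority classes can be sketched with Scikit-Learn's resample utility. The helper below is our illustration, not the repo's code; it upsamples every class to the size of the largest class:

```python
import numpy as np
from sklearn.utils import resample

def oversample_minority(X, y):
    """Upsample every class (with replacement) to the majority-class size."""
    classes, counts = np.unique(y, return_counts=True)
    target = counts.max()
    X_parts, y_parts = [], []
    for cls in classes:
        X_cls = X[y == cls]
        # Sample with replacement until this class reaches the target count
        X_up = resample(X_cls, replace=True, n_samples=target, random_state=42)
        X_parts.append(X_up)
        y_parts.append(np.full(target, cls))
    return np.concatenate(X_parts), np.concatenate(y_parts)
```

Oversampling should be applied only to the training split, after the train/validation split, so that duplicated samples never leak into validation.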
Model Architecture
A Bidirectional LSTM model was chosen for its capability to process sequential data effectively, with TensorFlow as the framework.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense, Bidirectional

model = Sequential([
    Bidirectional(LSTM(128, return_sequences=True), input_shape=(None, 21)),
    LSTM(64),
    Dense(26, activation='softmax')
])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```
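Because the model is compiled with categorical_crossentropy, the integer letter labels (0-25) must be one-hot encoded before training. A minimal NumPy sketch (the helper name is ours; tf.keras.utils.to_categorical does the same):

```python
import numpy as np

def one_hot(labels, num_classes=26):
    """Convert integer class labels into one-hot vectors."""
    labels = np.asarray(labels)
    out = np.zeros((len(labels), num_classes), dtype=np.float32)
    # Set a 1.0 at each sample's class index
    out[np.arange(len(labels)), labels] = 1.0
    return out
```

Alternatively, keeping integer labels and switching the loss to sparse_categorical_crossentropy avoids this step entirely.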
Real-Time Recognition
The system supports real-time recognition of letters, words, and phrases.
OpenCV integrates with MediaPipe to capture and process live video input, enabling immediate visual feedback.
```python
import mediapipe as mp

hands = mp.solutions.hands.Hands()
```
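Per-frame predictions in a live loop tend to flicker between classes. A common way to keep the on-screen feedback stable, not detailed in the write-up above (the class name is ours), is a majority vote over the most recent frames:

```python
from collections import Counter, deque

class PredictionSmoother:
    """Majority vote over the last few per-frame predictions."""
    def __init__(self, window=15):
        # deque with maxlen automatically drops the oldest prediction
        self.buffer = deque(maxlen=window)

    def update(self, label):
        self.buffer.append(label)
        # Return the most frequent label in the current window
        return Counter(self.buffer).most_common(1)[0][0]
```

Each frame's raw model output is passed through update, and only the smoothed label is drawn on screen, trading a few frames of latency for far steadier feedback.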
Results
The model achieved 95% training accuracy and 92% validation accuracy, demonstrating strong generalization across diverse samples, including dark and bright lighting conditions and both blurry and sharp images.
Real-time testing revealed consistent performance, with accurate recognition of both static and dynamic signs.
The tool was also evaluated through user testing, which confirmed effective recognition of ASL signs across varied lighting and motion conditions.