This project involves building a Hand Sign Recognition Model to interact with a Tetris game. Hand gestures control the movement and rotation of Tetris pieces in real-time.
Below is a detailed guide on how the project is structured and implemented.
This project uses a hand gesture recognition system to control shapes in a Tetris game. The gestures and their corresponding actions are:
- `point_up`: Rotate the Tetris piece.
- `point_down`: Move the Tetris piece down.
- `left_thumb`: Move the Tetris piece left.
- `right_thumb`: Move the Tetris piece right.

Data is collected using a webcam with MediaPipe's hand detection library. Each gesture is recorded as a series of 21 hand landmark coordinates saved as `.npy` files.
```python
import os
import numpy as np


class Action_Data_Manager():
    def __init__(self):
        self.actions = ['point_up', 'point_down', 'left_thumb', 'right_thumb']
        self.root_path = './action_data'
        self.actions_paths = []

    def create_action_paths(self):
        # Create the root folder and one sub-folder per gesture (skip any that already exist)
        os.makedirs(self.root_path, exist_ok=True)
        for action in self.actions:
            path = f'{self.root_path}/{action}'
            os.makedirs(path, exist_ok=True)
            self.actions_paths.append(path)
```
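The snippets that follow use an `action_data` instance of this class; it can be created and its gesture folders initialized like this:

```python
# Create the manager and the ./action_data/<gesture> folders used below
action_data = Action_Data_Manager()
action_data.create_action_paths()
```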
Run the webcam to collect hand landmark data:
```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils


def get_landmarks_from_cam(path, num_record=50, message=''):
    count, paused = 0, False
    capture = cv2.VideoCapture(0)
    with mp_hands.Hands(min_detection_confidence=0.8,
                        min_tracking_confidence=0.5) as hands:
        while capture.isOpened():
            if not paused:
                ret, frame = capture.read()
                if not ret:
                    break
                image = cv2.flip(frame, 1)
                # MediaPipe expects RGB input; the flipped BGR frame is kept for display
                detected_image = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if detected_image.multi_hand_landmarks:
                    for hand_lms in detected_image.multi_hand_landmarks:
                        mp_drawing.draw_landmarks(image, hand_lms, mp_hands.HAND_CONNECTIONS)
                        # Save the 21 (x, y, z) landmark coordinates of this frame
                        landmarks = np.array([[lm.x, lm.y, lm.z] for lm in hand_lms.landmark])
                        np.save(file=path + f'{count}.npy', arr=landmarks)
                        count += 1
                        if count == num_record:
                            break
                cv2.putText(image, message, org=(10, 20), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                            fontScale=1, color=(255, 0, 0))
                cv2.imshow('Webcam', image)
            key = cv2.waitKey(10) & 0xFF  # Wait for a key press and store the key code
            if key == ord('p'):  # Press 'p' to pause or resume
                paused = not paused
            if (key == ord('q')) or (count >= num_record):  # Quit on 'q' or once enough frames are saved
                break
    capture.release()
    cv2.destroyAllWindows()


num_record = 50
for i in range(len(action_data.actions)):
    message = f'DO {action_data.actions[i]} action'
    path = f'{action_data.actions_paths[i]}/{action_data.actions[i]}'
    get_landmarks_from_cam(path, num_record, message)
```
After collection, the data is loaded and reshaped into arrays suitable for training:
```python
# Load the 50 saved landmark arrays for each action and build the matching label vector
X = np.array([
    [np.array(action_data.get_action_landmarks(action, id)) for id in range(50)]
    for action in action_data.actions
])
y = np.array([[id] * 50 for id in range(len(action_data.actions))]).flatten()
```
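With four gestures and 50 recordings each (where `get_action_landmarks` is assumed to load the saved `.npy` file for a given action and sample index), a quick sanity check of the resulting array shapes looks like this:

```python
# Expected shapes: 4 actions x 50 samples x 21 landmarks x 3 coordinates, and 200 labels
print(X.shape)  # (4, 50, 21, 3)
print(y.shape)  # (200,)
```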
A simple feed-forward neural network (FNN) is used to classify hand gestures based on the 21 hand landmarks.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Prepare data
X = X.reshape(-1, 21 * 3)  # One row of 63 values (21 landmarks x 3 coordinates) per sample
y = to_categorical(y)      # One-hot encode the labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Build model
def create_fnn_model(input_shape, num_classes):
    model = Sequential([
        Input(shape=input_shape),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')
    ])
    return model

model = create_fnn_model((X_train.shape[1],), num_classes=len(action_data.actions))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=100, batch_size=32)
model.save('fnn.keras')
```
Show the learning curves:
```python
import matplotlib.pyplot as plt

# Plot training and validation accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```
Evaluate the model on the test data:
```python
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
```
```
Test Loss: 0.0044
Test Accuracy: 1.0000
```
The trained model is deployed to classify gestures in real-time:
```python
def classify(model, landmarks):
    # Flatten the 21 x 3 landmark array and return the gesture with the highest probability
    prob = model.predict(landmarks.reshape(1, -1))
    return action_data.actions[np.argmax(prob)]
```
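Only the `classify` helper is shown above; the sketch below is one way it could be wired into a live webcam loop. This loop is an assumption that mirrors the data-collection script (reusing `cv2`, `np`, and `mp_hands` from earlier), not the attached implementation:

```python
def run_realtime(model):
    # Read webcam frames, extract hand landmarks with MediaPipe, and overlay the predicted gesture
    capture = cv2.VideoCapture(0)
    with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands:
        while capture.isOpened():
            ret, frame = capture.read()
            if not ret:
                break
            image = cv2.flip(frame, 1)
            detected = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if detected.multi_hand_landmarks:
                hand_lms = detected.multi_hand_landmarks[0]
                landmarks = np.array([[lm.x, lm.y, lm.z] for lm in hand_lms.landmark])
                action = classify(model, landmarks)  # e.g. 'point_up'
                cv2.putText(image, action, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0))
            cv2.imshow('Webcam', image)
            if cv2.waitKey(10) & 0xFF == ord('q'):  # Press 'q' to quit
                break
    capture.release()
    cv2.destroyAllWindows()
```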
Using the classified gestures, players can control Tetris pieces in the game. Here's how gestures map to actions in Tetris:
- `point_up`: Rotate the piece.
- `point_down`: Move the piece down.
- `left_thumb`: Move the piece left.
- `right_thumb`: Move the piece right.

The Tetris game listens to the classified gestures and updates the game state accordingly:
```python
if action == 'left_thumb':
    piece_pos[0] -= 1
elif action == 'right_thumb':
    piece_pos[0] += 1
elif action == 'point_down':
    piece_pos[1] += 1
elif action == 'point_up':
    rotated_piece = rotate_piece(current_piece)
```
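The snippet above calls a `rotate_piece` helper that is not shown here; a minimal sketch of what such a helper could look like, assuming each piece is stored as a 2D list of cells, is a 90-degree clockwise rotation:

```python
def rotate_piece(piece):
    # Rotate a piece stored as a list of rows by 90 degrees clockwise:
    # the new first row is the old first column read from bottom to top.
    return [list(row) for row in zip(*piece[::-1])]


# Example with an L-shaped piece
piece = [[1, 0],
         [1, 0],
         [1, 1]]
print(rotate_piece(piece))  # [[1, 1, 1], [1, 0, 0]]
```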
To see the full implementation of the Tetris game itself, please refer to the attached Tetris.py in the uploaded files.
Explore the attached machine_human_interact.ipynb for the model development code.
This project demonstrates the integration of computer vision and machine learning in game control. The hand sign recognition model is versatile and can be extended to other interactive applications beyond gaming.