This project involves building a Hand Sign Recognition Model to interact with a Tetris game. Hand gestures control the movement and rotation of Tetris pieces in real-time.
Below is a detailed guide on how the project is structured and implemented.
This project uses a hand gesture recognition system to control shapes in a Tetris game. The gestures and their corresponding actions are:
- `point_up`: Rotate the Tetris piece.
- `point_down`: Move the Tetris piece down.
- `left_thumb`: Move the Tetris piece left.
- `right_thumb`: Move the Tetris piece right.

Data is collected using a webcam with MediaPipe's hand detection library. Each gesture is recorded as a series of 21 hand landmark coordinates saved as `.npy` files.
```python
import os
import numpy as np


class Action_Data_Manager():
    def __init__(self):
        self.actions = ['point_up', 'point_down', 'left_thumb', 'right_thumb']
        self.root_path = './action_data'
        self.actions_paths = []

    def create_action_paths(self):
        # Create the root folder and one sub-folder per gesture (skip any that already exist)
        os.makedirs(self.root_path, exist_ok=True)
        for action in self.actions:
            path = f'{self.root_path}/{action}'
            os.makedirs(path, exist_ok=True)
            self.actions_paths.append(path)
```
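The snippets that follow use an `action_data` instance of this class; it can be created and its gesture folders initialized like this:

```python
# Create the manager and the ./action_data/<gesture> folders used below
action_data = Action_Data_Manager()
action_data.create_action_paths()
```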
Run the webcam to collect hand landmark data:
```python
import cv2
import mediapipe as mp
import numpy as np

mp_hands = mp.solutions.hands
mp_drawing = mp.solutions.drawing_utils


def get_landmarks_from_cam(path, num_record=50, message=''):
    count, paused = 0, False
    capture = cv2.VideoCapture(0)
    with mp_hands.Hands(min_detection_confidence=0.8,
                        min_tracking_confidence=0.5) as hands:
        while capture.isOpened():
            if not paused:
                ret, frame = capture.read()
                if not ret:
                    break
                image = cv2.flip(frame, 1)
                # MediaPipe expects RGB input; the flipped BGR frame is kept for display
                detected_image = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
                if detected_image.multi_hand_landmarks:
                    for hand_lms in detected_image.multi_hand_landmarks:
                        mp_drawing.draw_landmarks(image, hand_lms, mp_hands.HAND_CONNECTIONS)
                        # Save the 21 (x, y, z) landmark coordinates of this frame
                        landmarks = np.array([[lm.x, lm.y, lm.z] for lm in hand_lms.landmark])
                        np.save(file=path + f'{count}.npy', arr=landmarks)
                        count += 1
                        if count == num_record:
                            break
                cv2.putText(image, message, org=(10, 20), fontFace=cv2.FONT_HERSHEY_SIMPLEX,
                            fontScale=1, color=(255, 0, 0))
                cv2.imshow('Webcam', image)
            key = cv2.waitKey(10) & 0xFF  # Wait for a key press and store the key code
            if key == ord('p'):  # Press 'p' to pause or resume
                paused = not paused
            if (key == ord('q')) or (count >= num_record):  # Quit on 'q' or once enough frames are saved
                break
    capture.release()
    cv2.destroyAllWindows()


num_record = 50
for i in range(len(action_data.actions)):
    message = f'DO {action_data.actions[i]} action'
    path = f'{action_data.actions_paths[i]}/{action_data.actions[i]}'
    get_landmarks_from_cam(path, num_record, message)
```
After collection, the data is loaded and reshaped into arrays suitable for training:
```python
# Load the 50 saved landmark arrays for each action and build the matching label vector
X = np.array([
    [np.array(action_data.get_action_landmarks(action, id)) for id in range(50)]
    for action in action_data.actions
])
y = np.array([[id] * 50 for id in range(len(action_data.actions))]).flatten()
```
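With four gestures and 50 recordings each (where `get_action_landmarks` is assumed to load the saved `.npy` file for a given action and sample index), a quick sanity check of the resulting array shapes looks like this:

```python
# Expected shapes: 4 actions x 50 samples x 21 landmarks x 3 coordinates, and 200 labels
print(X.shape)  # (4, 50, 21, 3)
print(y.shape)  # (200,)
```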
A simple feed-forward neural network (FNN) is used to classify hand gestures based on the 21 hand landmarks.
```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Input
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split

# Prepare data
X = X.reshape(-1, 21 * 3)  # One row of 63 values (21 landmarks x 3 coordinates) per sample
y = to_categorical(y)      # One-hot encode the labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Build model
def create_fnn_model(input_shape, num_classes):
    model = Sequential([
        Input(shape=input_shape),
        Dense(128, activation='relu'),
        Dropout(0.5),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(num_classes, activation='softmax')
    ])
    return model

model = create_fnn_model((X_train.shape[1],), num_classes=len(action_data.actions))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
history = model.fit(X_train, y_train, validation_data=(X_test, y_test),
                    epochs=100, batch_size=32)
model.save('fnn.keras')
```
Show the learning curves:
```python
import matplotlib.pyplot as plt

# Plot training and validation accuracy
plt.plot(history.history['accuracy'], label='Train Accuracy')
plt.plot(history.history['val_accuracy'], label='Validation Accuracy')
plt.xlabel('Epochs')
plt.ylabel('Accuracy')
plt.legend()
plt.show()
```
Evaluate the model on the test data:
```python
test_loss, test_accuracy = model.evaluate(X_test, y_test, verbose=0)
print(f"Test Loss: {test_loss:.4f}")
print(f"Test Accuracy: {test_accuracy:.4f}")
```
```
Test Loss: 0.0044
Test Accuracy: 1.0000
```
The trained model is deployed to classify gestures in real-time:
```python
def classify(model, landmarks):
    # Flatten the 21 x 3 landmark array and return the gesture with the highest probability
    prob = model.predict(landmarks.reshape(1, -1))
    return action_data.actions[np.argmax(prob)]
```
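Only the `classify` helper is shown above; the sketch below is one way it could be wired into a live webcam loop. This loop is an assumption that mirrors the data-collection script (reusing `cv2`, `np`, and `mp_hands` from earlier), not the attached implementation:

```python
def run_realtime(model):
    # Read webcam frames, extract hand landmarks with MediaPipe, and overlay the predicted gesture
    capture = cv2.VideoCapture(0)
    with mp_hands.Hands(min_detection_confidence=0.8, min_tracking_confidence=0.5) as hands:
        while capture.isOpened():
            ret, frame = capture.read()
            if not ret:
                break
            image = cv2.flip(frame, 1)
            detected = hands.process(cv2.cvtColor(image, cv2.COLOR_BGR2RGB))
            if detected.multi_hand_landmarks:
                hand_lms = detected.multi_hand_landmarks[0]
                landmarks = np.array([[lm.x, lm.y, lm.z] for lm in hand_lms.landmark])
                action = classify(model, landmarks)  # e.g. 'point_up'
                cv2.putText(image, action, (10, 20), cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 0, 0))
            cv2.imshow('Webcam', image)
            if cv2.waitKey(10) & 0xFF == ord('q'):  # Press 'q' to quit
                break
    capture.release()
    cv2.destroyAllWindows()
```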
Using the classified gestures, players can control Tetris pieces in the game. Here's how gestures map to actions in Tetris:
- `point_up`: Rotate the piece.
- `point_down`: Move the piece down.
- `left_thumb`: Move the piece left.
- `right_thumb`: Move the piece right.

The Tetris game listens to the classified gestures and updates the game state accordingly:
```python
if action == 'left_thumb':
    piece_pos[0] -= 1
elif action == 'right_thumb':
    piece_pos[0] += 1
elif action == 'point_down':
    piece_pos[1] += 1
elif action == 'point_up':
    rotated_piece = rotate_piece(current_piece)
```
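The snippet above calls a `rotate_piece` helper that is not shown here; a minimal sketch of what such a helper could look like, assuming each piece is stored as a 2D list of cells, is a 90-degree clockwise rotation:

```python
def rotate_piece(piece):
    # Rotate a piece stored as a list of rows by 90 degrees clockwise:
    # the new first row is the old first column read from bottom to top.
    return [list(row) for row in zip(*piece[::-1])]


# Example with an L-shaped piece
piece = [[1, 0],
         [1, 0],
         [1, 1]]
print(rotate_piece(piece))  # [[1, 1, 1], [1, 0, 0]]
```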
To see the full implementation of the Tetris game itself, please refer to the attached Tetris.py in the uploaded files.
Explore the attached machine_human_interact.ipynb for the model development code.
This project demonstrates the integration of computer vision and machine learning in game control. The hand sign recognition model is versatile and can be extended to other interactive applications beyond gaming.