In recent years, interest and innovation in artificial intelligence and machine learning have grown dramatically, opening new possibilities in many disciplines, including computer vision. In our project, we used the NVIDIA Jetson Nano, a device with an integrated GPU suited to artificial intelligence and computer vision applications. The goal of our work was to recognize flag semaphore signals on the Jetson Nano from key points detected on the human body. Once individual letters were recognized, we could decode the transmitted message. Our experiments gave us valuable insights into the practical advantages and limitations of this device. The results of our work can be found in our GitHub repository.
We tested the entire project on the NVIDIA Jetson Nano, a small single-board computer with a GPU. Its size makes our project portable, and GPU-accelerated computation makes it sufficiently fast. To speed up computations, we used NVIDIA's CUDA technology, which allows efficient processing of parallel tasks. NVIDIA also provides a Software Development Kit for deep learning, which significantly eased the development of our application. Based on our experiments, we analyzed the results and discussed in detail the advantages and limitations of this device for practical use.
For recognizing key points on the body, we used a pre-trained model, Pose-ResNet18-Body, which can identify 18 key points on the human body, as shown in the image below. This model allowed us to track hand positions and effectively recognize individual semaphore flag signals. We developed the entire system using the Python programming language, which offers a specialized library for working with this device.
Our first task was to enable the program to recognize gestures. This involved identifying when the arms are in a straight position, indicating an attempt at a gesture. We explored various methods to define this condition and eventually found an optimal solution. We calculated the differences in the
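As an illustration only, one possible way to test whether an arm is straight is to measure the bend at the elbow between the upper arm and the forearm. This is a sketch and not necessarily the condition we settled on: the function name `is_arm_straight`, the `(x, y)` tuple format for key points, and the 20-degree tolerance `max_bend_deg` are all assumptions made for the example.

```python
import math

def is_arm_straight(shoulder, elbow, wrist, max_bend_deg=20.0):
    """Return True if shoulder, elbow and wrist are roughly collinear.

    Each point is an (x, y) tuple in image coordinates (assumed format).
    max_bend_deg is a hypothetical tolerance, not a value from the project.
    """
    # Vector of the upper arm and vector of the forearm
    upper = (elbow[0] - shoulder[0], elbow[1] - shoulder[1])
    fore = (wrist[0] - elbow[0], wrist[1] - elbow[1])

    # A zero-length segment means a key point is missing or duplicated
    if upper == (0, 0) or fore == (0, 0):
        return False

    # Angle between the two vectors from their cross and dot products
    cross = upper[0] * fore[1] - upper[1] * fore[0]
    dot = upper[0] * fore[0] + upper[1] * fore[1]
    bend_deg = abs(math.degrees(math.atan2(cross, dot)))

    return bend_deg <= max_bend_deg
```

A fully extended arm gives a bend close to 0 degrees, while an arm bent at a right angle gives about 90 degrees and is rejected.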
Next, we needed to determine the angles the arms made with the vertical axis. We achieved this by calculating the differences in the x and y coordinates between the shoulder and wrist points and then using the two-argument
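The angle calculation can be sketched as follows. The sketch assumes that the two-argument function is the arctangent (`atan2`), that key points are `(x, y)` tuples in image coordinates with y growing downward, and that angles are measured from the downward vertical. The names `arm_angle_deg` and `nearest_semaphore_position` are hypothetical. The second helper relies on the fact that flag semaphore uses eight arm positions spaced 45 degrees apart.

```python
import math

def arm_angle_deg(shoulder, wrist):
    """Signed angle in degrees between the shoulder->wrist vector and
    the downward vertical axis.

    0 = arm hanging straight down, +/-90 = arm horizontal,
    180 = arm straight up. The sign follows the x direction of the wrist.
    """
    dx = wrist[0] - shoulder[0]
    dy = wrist[1] - shoulder[1]  # positive when the wrist is below the shoulder
    return math.degrees(math.atan2(dx, dy))

def nearest_semaphore_position(angle_deg):
    """Snap an angle to the nearest of the eight 45-degree positions.

    Returns an index 0..7, where 0 is straight down and 4 is straight up.
    """
    return round(angle_deg / 45.0) % 8
```

Snapping to the nearest 45-degree position makes the later letter lookup tolerant of small deviations in how the arm is held.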
We also developed a condition for accurately detecting the letter the person was trying to signal. Since relying on a single frame could lead to errors, we implemented a more robust approach. We defined a variable
    if is_gesture(pose):
        frames_to_detect = net.GetNetworkFPS() * 2.5
        letter = detectLetter(pose)
        if i == 0:
            letter_first = letter
        if letter == letter_first:
            i += 1
        else:
            i -= 1
        if i > frames_to_detect:
            if letter == "-":
                message = message[:-1]
            else:
                message += letter
            i = 0
        trust = i / frames_to_detect
        drawProgress(img, trust)
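To try the counting logic without a camera or the Jetson Nano, the same idea can be wrapped in a small class. This is a minimal sketch that mirrors the snippet above: the class name `LetterConfirmer` and its attributes are hypothetical, and `frames_to_detect` is passed in directly instead of being derived from the network frame rate.

```python
class LetterConfirmer:
    """Accepts a letter only after it has been held for enough frames."""

    def __init__(self, frames_to_detect):
        self.frames_to_detect = frames_to_detect
        self.count = 0          # plays the role of the counter i
        self.candidate = None   # plays the role of letter_first
        self.message = ""

    def update(self, letter):
        """Feed the letter detected in one frame; return progress in [0, 1]."""
        # Start tracking a new candidate whenever the counter is at zero
        if self.count == 0:
            self.candidate = letter

        # Agreeing frames raise the counter, disagreeing frames lower it
        if letter == self.candidate:
            self.count += 1
        else:
            self.count -= 1

        # Enough agreement: commit the letter, or apply backspace for "-"
        if self.count > self.frames_to_detect:
            if self.candidate == "-":
                self.message = self.message[:-1]
            else:
                self.message += self.candidate
            self.count = 0

        return self.count / self.frames_to_detect
```

Decrementing on a mismatch rather than resetting to zero means a single misdetected frame costs only a small amount of progress and does not discard a gesture that is otherwise held steadily.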
The graphical interface, shown in the video below, includes a red circle in the upper right corner that gradually fills with green as the counter i grows. If we hold a position long enough, the circle fills completely and the letter is added to the message in the upper left corner. The letter currently detected by the program is displayed in the center of this circle. The flag semaphore characters include a space and a backspace, which lets us delete letters. The interface also draws the skeleton of the recognized body key points holding virtual flags, to better imitate real signaling.
The program we developed successfully detects most flag semaphore characters, as shown in the picture of the whole alphabet below. One significant limitation occurred when the arm was positioned directly above the head: the neural network has difficulty recognizing key points on the arm in this position. To address this, we adjusted the gesture detection condition so that it does not activate at certain angles above the head.
Another issue arose when the arm crossed the body and pointed to the opposite side. This positioning sometimes caused the program to incorrectly identify key points on the arm, leading to the misrecognition of certain letters. We implemented additional corrective measures to increase accuracy, but there is still room for improvement.
These limitations highlighted the complexity of gesture recognition and the need for continuous refinement. Future work could involve improving the algorithm to better handle extreme arm positions and exploring more advanced machine learning models. Training a custom neural network specifically for these purposes could lead to improved accuracy.