Building an Object Detection System with OpenCV's Haar Cascade Classifier and Docker Using Python
An overview of object detection using Haar cascade classifier
Object detection is a critical task in computer vision that enables other processes to be applied effectively. It involves not only identifying objects but also localizing their positions within an image or video frame. The use cases for object detection are diverse, ranging from face recognition on smartphones to obstacle and vehicle detection in autonomous driving. This variety of applications highlights the essential role of object detection in numerous fields. For a more comprehensive guide, you can refer to the official OpenCV website.
One popular method for object detection is the Haar Cascade Classifier, a machine learning-based approach introduced by Paul Viola and Michael Jones in their 2001 paper, "Rapid Object Detection using a Boosted Cascade of Simple Features." This method applies a series of classifiers to an image at various scales and positions to detect and localize objects.
The Haar Cascade Classifier works by extracting features from an image and comparing them with pre-trained features. The classifier is called a "cascade" because it involves multiple stages, where each stage consists of several weak learners (simple classifiers) that, when combined, create a strong classifier. This cascading structure allows for efficient detection, as it quickly discards areas of the image that are unlikely to contain the object of interest, focusing computational resources on more promising regions. More about this algorithm can be found on the official OpenCV website.
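To get a feel for how a trained cascade is used before building our own, here is a minimal sketch that runs one of the pretrained cascades bundled with the opencv-python package (the image path photo.jpg is a placeholder):

import cv2

# Load one of the pretrained Haar cascades that ships with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + 'haarcascade_frontalface_default.xml')

# 'photo.jpg' is a placeholder; substitute any image containing faces.
image = cv2.imread('photo.jpg')
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# The cascade slides a detection window across the image at multiple
# scales; regions rejected by early stages are discarded immediately.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
print(f'Found {len(faces)} face(s)')

Detecting our own object works exactly the same way, except that we first have to train the cascade.xml file ourselves, which is what the rest of this article covers.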
In this article, we'll guide you through the process of training your own Haar Cascade Classifier using OpenCV, a powerful computer vision library, and Docker, which gives us an isolated environment that behaves much like a lightweight virtual machine. We'll use Python as our programming language, ensuring that the process is both accessible and reproducible. By the end, we will be able to detect a rubber duck in an image.
The first step in training a Haar Cascade Classifier is to gather positive images (containing the desired object) and negative images (without the desired object). These images are essential for training the algorithm to accurately detect the object - in this case, a rubber duck.
Role of Docker in training
You can download and install Docker from its official website, depending on your operating system.
In the training process, you'll need three OpenCV commands:
opencv_annotation
opencv_createsamples
opencv_traincascade
These commands ship with a full OpenCV installation, so if you already have OpenCV installed on your machine, you can skip this step. However, if you installed OpenCV through Python (for example, with pip), the command-line tools may not be installed system-wide.
One approach is to install OpenCV locally to access these commands. However, if you prefer not to clutter your filesystem or need a more manageable way to use the package, Docker provides an excellent solution. With Docker, you can use these OpenCV commands without installing them directly on your system. To use this approach, you need to pull the relevant Docker image and ensure it has been downloaded completely.
docker pull spmallick/opencv-docker
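You can confirm the pull completed by listing your local images:
docker images
The spmallick/opencv-docker repository should appear in the output once the download has finished.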
Positive images
In this case, I used approximately 200 images, each containing one or more rubber ducks.
[Example positive images: three objects on a solid background; two objects with a different background.]
It's better if objects appear in a variety of backgrounds and environments.
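If your photos come in assorted sizes, a small helper like the sketch below can bring them to a uniform width before annotation (the folder names raw and positives, and the 640-pixel target width, are assumptions):

import cv2
import glob
import os

# Resize every collected photo to a uniform 640 px width, preserving
# aspect ratio, and store the results in a 'positives' folder.
os.makedirs('positives', exist_ok=True)
for i, path in enumerate(glob.glob('raw/*.jpg')):
    img = cv2.imread(path)
    if img is None:
        continue  # skip unreadable files
    scale = 640 / img.shape[1]
    img = cv2.resize(img, (640, int(img.shape[0] * scale)))
    cv2.imwrite(f'positives/{i}.jpg', img)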
After gathering the required number of positive images, you'll need to create a file containing information about all the images in the following format:
IMAGE_PATH <number_of_objects> <x y width height> [<x y width height> ...]
For example, if an image contains a single object, the format should be as follows:
positives/IMAGE_1.jpg 1 124 75 54 76
Or if an image contains two objects, it should be demonstrated like this:
positives/IMAGE_2.jpg 2 85 64 32 65 234 675 45 76
Finding objects inside an image manually can be time-consuming. To streamline this process, you can use third-party software. One tool that is particularly suited for this task comes from OpenCV and can be used by running a simple command. To do this, run a Docker container with the image you just downloaded using the following command:
docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix -v /path/to/your/project/directory:/workspace -it spmallick/opencv-docker
-e DISPLAY=$DISPLAY: Passes the host's display environment variable into the container so graphical applications know where to render.
-v /tmp/.X11-unix:/tmp/.X11-unix: The -v flag mounts a volume from the host machine into the container. This specific mount links the X11 Unix socket on the host (/tmp/.X11-unix) to the container, allowing graphical applications inside the container to access the X server on the host machine for displaying graphical interfaces.
-v /path/to/your/project/directory:/workspace: This is another volume mounting command that links a specific directory on your host machine to a directory inside the container. This setup is important for making the generated files persistent.
-it: The -i flag stands for interactive, and -t stands for terminal. Together, -it makes the container interactive and attaches a terminal session, allowing you to interact with it via command line.
By running this command, you will enter the terminal of the container, which has OpenCV installed. At this point, you can use the following command to open the annotator tool.
opencv_annotation --annotations=/path/to/annotations/file.txt --images=/path/to/image/folder/
The annotations parameter refers to the final .txt file, while the images parameter specifies the directory containing all the positive images.
If you encounter an error related to accessing the display, simply exit the container, type xhost + on the host machine to allow all clients to use the X server, and then start the container again.
Now you can draw rectangles around all objects in each image using the following shortcuts in the annotation tool:
c: Accept the current selection.
d: Discard the last selection.
n: Move to the next image.
Once all the images are annotated, a .txt file will be generated, indicating that you're ready to proceed to the next step. Congratulations!
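Before moving on, it can be worth sanity-checking the generated file with a short script like this sketch (the file name positives.txt is an assumption; use whatever name you chose):

# Verify that each line declares the right number of coordinates:
# every annotated object contributes four values (x, y, width, height).
with open('positives.txt') as f:
    for line_no, line in enumerate(f, 1):
        parts = line.split()
        if not parts:
            continue
        path, count = parts[0], int(parts[1])
        coords = parts[2:]
        if len(coords) != 4 * count:
            print(f'Line {line_no}: expected {4 * count} coordinates, '
                  f'got {len(coords)} ({path})')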
Creating a vector
A vector is a file, often with the .vec extension, that contains processed training samples. These samples are used during the training process to create the classifier. To generate a vector file, use the following command in the same container terminal:
opencv_createsamples -info positives.txt -num <number_of_samples> -w <width> -h <height> -vec positives.vec
opencv_createsamples: This is the OpenCV utility that creates the vector file from annotated positive samples.
-info positives.txt: Specifies the file that contains information about the positive samples.
-num <number_of_samples>: Specifies the number of samples to generate. This should match the total number of annotated objects (which can exceed the number of images if some contain more than one object).
-w <width> and -h <height>: Width and height, in pixels, of the training window; every sample is scaled to this size.
-vec positives.vec: Name of the output vector file.
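For example, with the roughly 200 annotated images mentioned earlier and a 24x24 training window (a common choice for Haar cascades; both numbers here are illustrative assumptions), the invocation could look like this:
opencv_createsamples -info positives.txt -num 200 -w 24 -h 24 -vec positives.vec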
Now that you have everything ready with the positive images, it's time to move on to the next step.
Negative images
Negative images in object detection are images that do not contain the object you want to detect. They are used to train the classifier to distinguish between the presence and absence of the object. By including a variety of negative images, the classifier learns to ignore backgrounds and other irrelevant features, focusing only on the object of interest.
To gather negative images, you should capture images of objects or environments that you want the classifier to ignore. Generally, the number of negative images should exceed the number of positive ones, so depending on the complexity of the detection task, it's better to have roughly this many images:
Low complexity: 500 to 1000 images
Medium complexity: 1000 to 2000 images
High complexity: 2000 to 3000 or more images
The good news is that you don't need to do anything else for negative images except create a file containing their paths, which can be named negatives.txt, for example. In Linux, you can list all negative image filenames into a single .txt file using the following command:
find <negative_images_directory> -name '*.jpg' > negatives.txt
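If you are not working on Linux, a short Python sketch can produce the same file (the directory name negatives is an assumption):

import glob

# Write one image path per line into negatives.txt.
with open('negatives.txt', 'w') as f:
    for path in sorted(glob.glob('negatives/*.jpg')):
        f.write(path + '\n')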
We are done here, let's move on to the training step.
Training
Now that all the materials are ready, use the following command in the Docker container terminal to start the training:
opencv_traincascade -data <output_directory> -vec positives.vec -bg negatives.txt -numPos <number_of_positive_samples> -numNeg <number_of_negative_samples> -numStages <number_of_stages> -w <width> -h <height>
-data <output_directory>: Directory where the trained classifier will be saved.
-vec positives.vec: The .vec file containing positive samples.
-bg negatives.txt: File listing the paths to the negative images.
-numPos <number_of_positive_samples>: Number of positive samples used to train each stage. In practice this should be somewhat below the total number of samples in the .vec file, since the trainer consumes a few extra positives at every stage.
-numNeg <number_of_negative_samples>: Number of negative samples used at each stage.
-numStages <number_of_stages>: Number of stages in the cascade. More stages take longer to train but generally yield a more accurate detector.
-w <width> and -h <height>: Width and height of the training window. These must match the values used in the vector creation command.
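Putting it all together, a concrete invocation might look like the following, where every number is an illustrative assumption (-numPos is deliberately below the 200 samples in the .vec file, as noted above):
opencv_traincascade -data classifier -vec positives.vec -bg negatives.txt -numPos 180 -numNeg 500 -numStages 10 -w 24 -h 24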
The time required for this command to execute depends on your machine's resources, the number of samples, window size, number of stages, and other factors.
Final result
After executing this command, the directory specified with the -data parameter will contain one XML file per stage, plus two additional files: params.xml and cascade.xml. The final generated file, cascade.xml, contains the trained model needed for detection. You can use that file in the following code to detect objects.
import cv2

# Load the trained cascade produced by opencv_traincascade.
cascade_path = './classifier/cascade.xml'
cascade = cv2.CascadeClassifier(cascade_path)

# Read a test image and convert it to grayscale, since Haar features
# are computed on single-channel images.
image_path = './positives/resized/172.jpg'
image = cv2.imread(image_path)
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# scaleFactor controls how much the image shrinks per pyramid level;
# minNeighbors requires overlapping detections, suppressing false positives.
objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a red rectangle around each detected object.
for (x, y, w, h) in objects:
    cv2.rectangle(image, (x, y), (x+w, y+h), (0, 0, 255), 4)

cv2.imshow('Object Detection', image)
cv2.waitKey(0)
cv2.destroyAllWindows()
Or to detect objects from your webcam, you can use this code:
import cv2

# Load the trained cascade and open the default webcam (device 0).
cascade = cv2.CascadeClassifier('./classifier/cascade.xml')
cap = cv2.VideoCapture(0)

while True:
    ret, frame = cap.read()
    if not ret:
        print("Error: Failed to capture image.")
        break

    # Detect objects in the grayscale version of the current frame.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    objects = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

    # Draw a red rectangle around each detection.
    for (x, y, w, h) in objects:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)

    cv2.imshow('Object Detection', frame)

    # Press 'q' to quit the loop.
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()
The final result will look like this:
[Image: detected objects outlined in red]
That's it - congratulations, fellow geeks!