CamVisioTech - AIoT Security System

Abstract

CamVisioTech is an AI-powered, IoT-based security system developed collaboratively over several iterations. Each version builds on its predecessor, adding surveillance features for home and office security.
CamVisioTech has been a journey from a simple ESP32-CAM to an Edge AI system. It started as a third-semester project: a basic security system using dlib. Recognizing its potential, I teamed up with a friend to develop CamVisioTech 2.0, which runs on an ESP32-CAM with optimized synchronization and improved ML models. This version won us Bytecraft at IIIT Nagpur TechFest 2023. Inspired by the Maixduino’s AI capabilities, we started CamVisioTech 3.0, a standalone, AI-driven face detection and recognition solution. With Wi-Fi and GSM connectivity, it operates autonomously at the edge, which suits rural and remote areas with limited connectivity and makes edge AI accessible in low-resource environments.

CamVisioTech MK-0: ESP32-CAM Security System

- Introduction :

The system uses facial recognition to control a solenoid lock, unlocking the door for 3 seconds when a known person is recognized. If an intruder is detected, it sends an alert via Telegram and activates a buzzer. The system also offers a GUI for monitoring and control: a custom desktop application, built with Python and Tkinter, that manages both the facial recognition and the alert system.

- Features :

Face Recognition: Captures and processes images via the ESP32-CAM and compares them against pre-stored face encodings.
Door Control: Utilizes a solenoid lock controlled via a relay to lock/unlock the door based on successful face recognition.
Intruder Alert System: Sends the intruder’s image to the user’s Telegram account and sounds an alarm if an unauthorized face is detected.
Python-based GUI: Ships in two variants, an older high-latency GUI and a newer low-latency GUI, for user-friendly system control.

- Requirements :

1. Hardware

ESP32-CAM microcontroller
Solenoid lock, relay module, buzzer
IC-7805 voltage regulator
Additional components: diode, transistor, resistors, and capacitors.

2. Software

Arduino IDE for programming ESP32-CAM.
Python with libraries like opencv-python, face_recognition, and customtkinter.
Telegram Bot: For sending notifications in case of an intruder alert.

- Methodology :

1. Face Recognition

The ESP32-CAM captures images at regular intervals and sends them to the server. The server compares the captured images with pre-stored face encodings to authenticate users.

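import urllib.request

import cv2
import face_recognition
import numpy as np
import requests

# Assumed to be defined elsewhere in the application (not shown in this listing):
# url, urlOn, buzzOn, encodeListKnown, classNames, sendPhoto

def process_frame():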
    # Fetch the image from the specified URL
    img_resp = urllib.request.urlopen(url)
    
    # Convert the image data into a NumPy array
    imgnp = np.array(bytearray(img_resp.read()), dtype=np.uint8)
    
    # Decode the image array into an OpenCV image (None if the data is not a valid image)
    img = cv2.imdecode(imgnp, -1)
    if img is None:
        return
    
    # Resize the image to 25% of its original size for faster processing
    imgS = cv2.resize(img, (0, 0), None, 0.25, 0.25)
    
    # Convert the image from BGR (OpenCV default) to RGB (required by face_recognition)
    imgS = cv2.cvtColor(imgS, cv2.COLOR_BGR2RGB)

    # Detect face locations in the resized image
    facesCurFrame = face_recognition.face_locations(imgS)
    
    # Compute face encodings for the detected faces
    encodesCurFrame = face_recognition.face_encodings(imgS, facesCurFrame)

    # Loop through each detected face and its encoding
    for encodeFace, faceLoc in zip(encodesCurFrame, facesCurFrame):
        # Compare the detected face encoding with known encodings
        matches = face_recognition.compare_faces(encodeListKnown, encodeFace)
        
        # Calculate the distance between the detected face and known faces
        faceDis = face_recognition.face_distance(encodeListKnown, encodeFace)
        
        # Find the index of the closest match
        matchIndex = np.argmin(faceDis)

        # If the face matches a known person
        if matches[matchIndex]:
            # Retrieve the name of the person
            name = classNames[matchIndex].upper()
            
            # Extract the face location and scale back to the original image size
            y1, x2, y2, x1 = faceLoc
            y1, x2, y2, x1 = y1 * 4, x2 * 4, y2 * 4, x1 * 4
            
            # Draw a green rectangle around the recognized face
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            
            # Draw a filled rectangle below the face for the name label
            cv2.rectangle(img, (x1, y2 - 35), (x2, y2), (0, 255, 0), cv2.FILLED)
            
            # Display the person's name on the image
            cv2.putText(img, name, (x1 + 6, y2 - 6), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)
            
            # Trigger an action (e.g., turning on a device or sending a notification)
            requests.get(urlOn)
        else:
            # If the face is not recognized, treat it as an intruder
            y1, x2, y2, x1 = faceLoc
            y1, x2, y2, x1 = y1 * 4, x2 * 4, y2 * 4, x1 * 4
            
            # Draw a red rectangle around the unrecognized face
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 0, 255), 2)
            
            # Draw a filled rectangle below the face for the label
            cv2.rectangle(img, (x1, y2 - 35), (x2, y2), (0, 0, 255), cv2.FILLED)
            
            # Display the "Intruder!!!" warning on the image
            cv2.putText(img, "Intruder!!!", (x1 + 6, y2 - 6), cv2.FONT_HERSHEY_COMPLEX, 1, (255, 255, 255), 2)
            
            # Print an intruder warning in the console
            print("Intruder")
            
            # Send the image with the intruder to telegram
            sendPhoto(img)
            
            # Trigger an alert action (e.g., sounding a buzzer)
            requests.get(buzzOn)

2. Intruder Detection

If a face does not match the stored faces, the system triggers an intruder alert.

An image of the intruder is sent to the user’s Telegram account.
The buzzer is activated for 1.5 seconds as a local alert.
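
The Telegram step above can be sketched as follows. This is a minimal illustration rather than the project's actual sendPhoto helper, which is not shown in the source: the function names, the placeholder token, and the chat ID are hypothetical. It assumes the frame has already been encoded to JPEG bytes and uses the Telegram Bot API's sendPhoto method through the requests library.

```python
TELEGRAM_API = "https://api.telegram.org"

def build_send_photo_request(bot_token, chat_id, jpeg_bytes, caption="Intruder detected"):
    """Assemble the URL, form fields, and file part for Telegram's sendPhoto method."""
    url = f"{TELEGRAM_API}/bot{bot_token}/sendPhoto"
    data = {"chat_id": str(chat_id), "caption": caption}
    files = {"photo": ("intruder.jpg", jpeg_bytes, "image/jpeg")}
    return url, data, files

def send_intruder_photo(bot_token, chat_id, jpeg_bytes, timeout=10):
    """POST the photo to Telegram; returns True when the API reports success."""
    import requests  # third-party; imported here so the builder works without it
    url, data, files = build_send_photo_request(bot_token, chat_id, jpeg_bytes)
    response = requests.post(url, data=data, files=files, timeout=timeout)
    return response.ok and response.json().get("ok", False)

# Hypothetical token and chat ID, shown only to illustrate the request layout
url, data, files = build_send_photo_request("123456:PLACEHOLDER", 42, b"\xff\xd8fake-jpeg")
print(url)
```

Keeping the request builder separate from the network call makes the payload easy to inspect without contacting Telegram.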

3. GUI Application

The GUI application allows users to:

View the real-time camera feed at different resolutions.
Control the solenoid lock (unlock for 3 seconds).
Trigger the buzzer manually if needed.
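
The timed actions above (a 3-second unlock, a 1.5-second buzzer) share one pattern: switch on, wait, switch off. A minimal sketch, assuming the on and off actions are plain callables; the pulse helper and the logging lambdas are hypothetical, and the source does not say whether the timing runs in the desktop application or in the ESP32-CAM firmware.

```python
import time

def pulse(turn_on, turn_off, seconds, sleep=time.sleep):
    """Switch an actuator on, wait, and always switch it off again."""
    turn_on()
    try:
        sleep(seconds)
    finally:
        # Runs even if the wait is interrupted, so the lock is never left open
        turn_off()

UNLOCK_SECONDS = 3.0   # door stays unlocked this long
BUZZER_SECONDS = 1.5   # local alert duration

# Demonstration with stand-in actions and a fake sleep that only records the wait
log = []
pulse(lambda: log.append("lock:open"),
      lambda: log.append("lock:closed"),
      UNLOCK_SECONDS,
      sleep=lambda s: log.append(f"wait:{s}"))
print(log)  # ['lock:open', 'wait:3.0', 'lock:closed']
```

In the real application the two callables would be the HTTP requests (or GPIO writes) that drive the relay and the buzzer.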

- Models :

Face recognition is performed with the face_recognition Python library, which is built on dlib's deep-learning face recognition model. The model has an accuracy of 99.38% on the Labeled Faces in the Wild benchmark.

1. Face Detection

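The library detects faces with either a Histogram of Oriented Gradients (HOG) model or a convolutional neural network (CNN) and returns the bounding-box coordinates of each detected face. The face recognition listing in the Methodology section calls face_locations with its default arguments, which select the HOG model.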

2. Face Encoding

A detected face is transformed into a numerical representation called a "face encoding": a fixed-length vector of 128 values that captures the distinguishing features of the face.

3. Face Comparison

The library compares face encodings by Euclidean distance.
If the distance between two encodings is at or below a threshold (the library's default tolerance is 0.6), the faces are considered a match.
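
The threshold test can be sketched in plain Python. This is an illustration, not the library's internals: best_match and the toy 3-value vectors are hypothetical (real encodings have 128 values), and the 0.6 tolerance mirrors the default of face_recognition.compare_faces.

```python
import math

def best_match(known, names, candidate, tolerance=0.6):
    """Return the name of the closest known encoding, or None if none is within tolerance."""
    if not known:
        return None
    distances = [math.dist(encoding, candidate) for encoding in known]
    idx = min(range(len(distances)), key=distances.__getitem__)
    return names[idx] if distances[idx] <= tolerance else None

known = [[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]]
names = ["ALICE", "BOB"]

print(best_match(known, names, [0.12, 0.21, 0.33]))  # ALICE
print(best_match(known, names, [0.5, -0.5, 0.9]))    # None
```

Checking for an empty list first matters in practice: taking the minimum over zero distances would raise an error before any face has been enrolled.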

4. Known and Unknown Face Identification

The library allows maintaining a database of face encodings (representing known faces). Detected faces can be matched against this database to identify known individuals or flag unknown ones.

- Conclusion :

In conclusion, the face_recognition library simplifies implementing advanced face detection and recognition in Python, enabling applications such as security, access control, and surveillance. The project’s ability to send intruder alerts via Telegram and activate a buzzer enhances its utility as a reliable security system. Its modular design, including Python-based GUIs and a streamlined workflow, makes it a practical and scalable solution for modern smart security needs.


CamVisioTech MK-1: AI-Driven Smart Security Camera

- Introduction :

The enhanced version introduces real-time object detection and a Flask web application:

Haar Cascade: Utilized for object and face detection.
Alerts via Email & Telegram: Notifications for security breaches.
Flask Web App: Streams live video feed with overlays.

- Features :

Combines face recognition and object detection.
Real-time alerts for unauthorized access.
Web-based video streaming with detection overlays.
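
The web-based streaming can be served by Flask as a multipart response. The sketch below is a minimal example under stated assumptions: the /video_feed route name and the frame_source generator are hypothetical, and frame_source yields a few fake JPEG payloads where the real application would yield annotated camera frames.

```python
from flask import Flask, Response

app = Flask(__name__)

def frame_source():
    """Stand-in for the real frame generator: yields three fake JPEG payloads."""
    for i in range(3):
        jpeg = b"\xff\xd8" + f"frame-{i}".encode() + b"\xff\xd9"
        yield b"--frame\r\nContent-Type: image/jpeg\r\n\r\n" + jpeg + b"\r\n"

@app.route("/video_feed")
def video_feed():
    # multipart/x-mixed-replace tells the browser to replace each part with the
    # next one, which is what lets an <img> tag display a live MJPEG stream.
    return Response(frame_source(),
                    mimetype="multipart/x-mixed-replace; boundary=frame")
```

A page can then embed the stream with an img tag whose src points at /video_feed. The boundary name in the mimetype must match the marker that precedes each frame.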

- Requirements :

1. Hardware

ESP32-CAM module
The same supporting components as MK-0 (solenoid lock, relay module, buzzer, and voltage regulator).

[Figure: ckt (1).png]

2. Software

Python and Flask for backend and web streaming.
Libraries for detection: opencv-python, face_recognition, and mediapipe (the frame loop in the Methodology section uses MediaPipe Face Detection).
PySerial (for serial communication with Arduino or ESP32)

- Methodology :

[Figure: USER (1).png]

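import datetime
import urllib.request

import cv2
import face_recognition
import mediapipe as mp
import numpy as np

mp_face_detection = mp.solutions.face_detection

# Assumed to be defined elsewhere in the application (not shown in this listing):
# url, encodeListKnown, classNames, trigger_buzzer, lock_door, send_notification
continuous_zeros = 0
last_notification_time = None

def generate_frames():
    global continuous_zeros, last_notification_time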

    while True:
        try:
            # Capture the image from the ESP32-CAM's URL
            img_response = urllib.request.urlopen(url)
            img_np = np.array(bytearray(img_response.read()), dtype=np.uint8)
            frame = cv2.imdecode(img_np, -1)

            # Convert the frame to RGB for processing
            image_rgb = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            # Detect faces in the frame using MediaPipe Face Detection
            with mp_face_detection.FaceDetection(
                model_selection=0, min_detection_confidence=0.5) as face_detection:
                results = face_detection.process(image_rgb)

                # Initialize flags for recognized and unknown faces
                recognized = False
                unknown = False

                if results.detections:
                    for detection in results.detections:
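                        bboxC = detection.location_data.relative_bounding_box
                        ih, iw, _ = frame.shape
                        # Clamp to the frame: a box can start slightly outside the image,
                        # and a negative index would crop the wrong region
                        x, y = max(int(bboxC.xmin * iw), 0), max(int(bboxC.ymin * ih), 0)
                        w, h = int(bboxC.width * iw), int(bboxC.height * ih)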

                        # Crop and resize the face region for face recognition
                        face_image = frame[y:y+h, x:x+w]
                        if face_image.shape[0] > 0 and face_image.shape[1] > 0:
                            imgS = cv2.resize(face_image, (0, 0), None, 0.25, 0.25)
                            imgS = cv2.cvtColor(imgS, cv2.COLOR_BGR2RGB)
                            facesCurFrame = face_recognition.face_locations(imgS)
                            encodesCurFrame = face_recognition.face_encodings(imgS, facesCurFrame)

                            for encodeFace, faceLoc in zip(encodesCurFrame, facesCurFrame):
                                matches = face_recognition.compare_faces(encodeListKnown, encodeFace)
                                if any(matches):
                                    recognized = True
                                    name = classNames[matches.index(True)]
                                    # Draw bounding box and label for recognized face
                                    y1, x2, y2, x1 = [v * 4 for v in faceLoc]
                                    y1 += y; x2 += x; y2 += y; x1 += x
                                    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
                                    cv2.putText(frame, name, (x1 + 6, y2 - 6),
                                                cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)
                                else:
                                    unknown = True
                                    # Draw bounding box for unknown face
                                    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 0, 255), 2)

            # Determine the displayed text based on face recognition
            if recognized or not unknown:
                displayed_text = "1"
                continuous_zeros = 0
            else:
                displayed_text = "0"
                continuous_zeros += 1

            # Trigger notifications once only unknown faces have been seen for enough consecutive frames
            if continuous_zeros >= 9:  # Approximately 3 seconds (3 frames/sec)
                current_time = datetime.datetime.now()
                if last_notification_time is None or (current_time - last_notification_time).total_seconds() >= 15:
                    last_notification_time = current_time
                    trigger_buzzer()
                    lock_door()
                    send_notification("Motion Detected!",
                                      "Someone has entered the frame. Check the link for details:\n\n"
                                      "https://mohittalwar23.github.io/PythonSystemTest/")

            # Overlay text and timestamp on the frame
            cv2.putText(frame, displayed_text, (10, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)
            timestamp = datetime.datetime.now().strftime("%Y-%m-%d %H:%M:%S")
            cv2.putText(frame, timestamp, (frame.shape[1] - 300, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 0, 255), 2)

            # Encode and yield the frame as a JPEG stream
            _, buffer = cv2.imencode('.jpg', frame)
            frame = buffer.tobytes()
            yield (b'--frame\r\nContent-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

        except Exception as e:
            print(f"Error: {e}")
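
The alert logic in the listing above (nine consecutive unknown-only frames, then at most one alert every 15 seconds) can be isolated into a small class, which makes it testable without a camera. A minimal sketch: the AlertGate name and its interface are hypothetical, while the two thresholds are taken from the listing.

```python
import datetime

class AlertGate:
    """Fire an alert after N consecutive unknown-only frames, at most once per cooldown."""

    def __init__(self, frames_required=9, cooldown_seconds=15):
        self.frames_required = frames_required
        self.cooldown = datetime.timedelta(seconds=cooldown_seconds)
        self.consecutive = 0
        self.last_alert = None

    def update(self, recognized, unknown, now):
        """Feed one frame's flags; returns True when an alert should fire."""
        if recognized or not unknown:
            self.consecutive = 0      # a known face or an empty frame resets the count
            return False
        self.consecutive += 1
        if self.consecutive < self.frames_required:
            return False
        if self.last_alert is not None and now - self.last_alert < self.cooldown:
            return False              # still inside the cooldown window
        self.last_alert = now
        return True

gate = AlertGate()
start = datetime.datetime(2024, 1, 1, 12, 0, 0)
# Twelve unknown-only frames at roughly 3 frames per second
fired = [gate.update(False, True, start + datetime.timedelta(seconds=i / 3))
         for i in range(12)]
print(fired.index(True), fired.count(True))  # 8 1
```

Passing the current time in as an argument, instead of reading the clock inside the class, keeps the cooldown behaviour deterministic in tests.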

- Models :

1. What is Haar Cascade?

Haar Cascades are machine learning object detection algorithms used to identify objects in images or videos. The technique was introduced by Paul Viola and Michael Jones in their 2001 research paper "Rapid Object Detection using a Boosted Cascade of Simple Features". The algorithm uses a cascade function trained on many positive and negative images to detect objects, primarily faces.

It works in stages, where each stage applies increasingly complex filters to identify the features of the object of interest.
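
The staged idea can be illustrated with a toy example in plain Python. This is a simplified sketch, not OpenCV's implementation: the function names, the 4x4 window, and the thresholds are all hypothetical, and each stage here is a single Haar-like feature, whereas a trained cascade combines many boosted features per stage. It shows the two ingredients that make the method fast: an integral image for constant-time rectangle sums, and early rejection at the first failing stage.

```python
def integral_image(img):
    """Summed-area table with an extra zero row and column for easy lookups."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii

def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the rectangle with top-left (x, y), using four lookups."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]

def edge_feature(ii, x, y, w, h):
    """Two-rectangle Haar-like feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)

def run_cascade(ii, stages):
    """Apply the stages in order; reject at the first stage whose feature is too weak."""
    for stage_number, (feature, threshold) in enumerate(stages, start=1):
        if feature(ii) < threshold:
            return False, stage_number
    return True, len(stages)

stages = [
    (lambda t: edge_feature(t, 0, 0, 4, 4), 20),  # coarse check on the whole window
    (lambda t: edge_feature(t, 0, 0, 4, 2), 20),  # finer check on the top half
]

edge_window = [[9, 9, 1, 1] for _ in range(4)]    # bright left half, dark right half
flat_window = [[5, 5, 5, 5] for _ in range(4)]    # no contrast at all

print(run_cascade(integral_image(edge_window), stages))  # (True, 2)
print(run_cascade(integral_image(flat_window), stages))  # (False, 1)
```

Most windows in an image look like the flat one and are discarded by the first stage, so the expensive later stages only run on promising regions.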

2. Object Detection and Face Recognition

This project combines a fast face detector (Haar Cascades, or MediaPipe Face Detection as in the listing above) with recognition by the face_recognition library. The system recognizes known individuals and triggers a response when an unknown face is detected.


- Conclusion :

By incorporating features such as real-time object detection, live video streaming, automated alerts, and door-locking mechanisms, this project provides a comprehensive solution for modern security needs. The detailed implementation steps and modular design make it a great resource for learning and extending into advanced AI-powered applications.


CamVisioTech MK-2: Advanced Security via Edge AI

- Introduction :

The MK-2 version of CamVisioTech moves beyond conventional surveillance by leveraging Edge AI for on-device processing. Unlike earlier iterations, it focuses on executing models locally, enhancing privacy, reliability, and efficiency, even in low-bandwidth environments.

- Features :

YOLOv2 Integration: Enables precise object detection and activity recognition directly on the device.
Edge AI Processing: All inference operations are performed on the hardware itself (Maixduino), eliminating the need for cloud-based computations.
Multi-Connectivity Support: Offers Wi-Fi or GSM for sending real-time alerts and notifications.
Actuator Integration: Supports on-site physical responses such as buzzer alerts or relay-controlled actions.
Third-Party Integration: Alerts dispatched via Wi-Fi or GSM can be extended to third-party apps like Telegram using platforms such as IFTTT or Pipedream.

- Requirements :

1. Hardware

Maixduino RISC-V + AI Kit: AI-capable development board with integrated ESP32 for Wi-Fi and Bluetooth capabilities.
OV2640 Camera Module: High-quality imaging for real-time object detection.
Buzzer: For audio alerts.
Breadboard & Jumper Wires: For modular connections.
2.4-inch TFT Display: For on-device status monitoring.

[Figure: Screenshot 2024-12-30 222620.jpg]

2. Software

MicroPython: Lightweight scripting language for development.
MaixPy IDE: To build and deploy applications on the Maixduino hardware.
kflash_gui: For uploading firmware and pre-trained models.
uPyLoader: For accessing, updating, and managing files on the device.
YOLOv2 Model Files: Optimized for real-time inference on the Maixduino platform.

- Methodology :

1. AI Model Training and Deployment

The Maixduino Kit, which includes an AI-capable microcontroller and a camera, is the primary processing unit. It uses three .smodel files trained via MaixHub.com: a face detection model, a face landmark detection model, and a feature extraction model. Together these models detect faces and extract their features; the features of known faces are stored in arrays under labels such as "Person 1" and "Person 2" for rapid matching. The setup can recognize multiple faces simultaneously, so individuals stored in the system are identified quickly.

2. Intruder Detection and Actuator Control

When a person who is not in the known faces array is detected, the system categorizes them as an intruder. It triggers an immediate response, such as sounding a buzzer or activating an LED indicator. The system also supports relay control, which allows integration with other actuators such as door locks and provides an immediate physical security response.
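
The matching and response steps can be sketched in plain Python. This is not the MaixPy code that runs on the device: the function names, the cosine-similarity score, and the 0.85 threshold are assumptions chosen for illustration, and the toy 3-value vectors stand in for the output of the feature extraction model.

```python
import math

def cosine_similarity(a, b):
    """Similarity in [-1, 1]; higher means the two feature vectors are closer."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def classify_faces(face_features, known_features, threshold=0.85):
    """Label each detected face 'Person N' or 'Intruder' by its best similarity score."""
    labels = []
    for feature in face_features:
        scores = [cosine_similarity(feature, known) for known in known_features]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            labels.append(f"Person {best + 1}")
        else:
            labels.append("Intruder")
    return labels

def respond(labels, buzzer_on, relay_on):
    """Drive the actuators when at least one intruder is in the frame."""
    if "Intruder" in labels:
        buzzer_on()
        relay_on()
        return True
    return False

known = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # stored features for Person 1 and Person 2
faces = [[0.9, 0.1, 0.0], [0.0, 0.0, 1.0]]   # two faces detected in one frame
labels = classify_faces(faces, known)
print(labels)  # ['Person 1', 'Intruder']

actions = []
respond(labels, lambda: actions.append("buzzer"), lambda: actions.append("relay"))
print(actions)  # ['buzzer', 'relay']
```

Because every face in the frame is scored independently, one recognized person does not mask an intruder standing next to them.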

3. Communication and Alerts

For reliable notifications, the system can send alerts over either Wi-Fi or GSM, so communication continues in areas with variable internet access. Alerts can also be forwarded to third-party messaging apps such as Telegram. These options keep users informed wherever they are, making the system suitable for both urban and remote applications.
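
One way to combine the two links is an ordered fallback: try Wi-Fi first and use GSM only if Wi-Fi fails. The sketch below assumes each transport is a callable that returns True on success; send_alert and the two stand-in transports are hypothetical and do not touch real hardware.

```python
def send_alert(message, transports):
    """Try each (name, send) pair in order; return the name of the first that succeeds."""
    for name, send in transports:
        try:
            if send(message):
                return name
        except OSError:
            # Network-level failure: fall through to the next transport
            continue
    return None

def wifi_send(message):
    raise OSError("Wi-Fi link down")   # simulate an outage

sent_sms = []

def gsm_send(message):
    sent_sms.append(message)           # stand-in for sending an SMS through the GSM modem
    return True

used = send_alert("Intruder detected", [("wifi", wifi_send), ("gsm", gsm_send)])
print(used, sent_sms)  # gsm ['Intruder detected']
```

Returning the name of the transport that worked lets the caller log which link carried the alert, and a None result signals that the message should be queued and retried.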


- Models :

- Face Detection

Purpose: Detects faces in the camera feed, identifying multiple faces at once.

Model Type: YOLOv2 object detection model, optimized for real-time processing on the Maixduino’s hardware.

Outcome: Locates face bounding boxes to trigger further analysis.

- Face Landmark Detection

Purpose: Identifies specific facial landmarks (e.g., eyes, nose, mouth) within detected faces.

Model Type: Keypoint detection model, enabling finer facial structure identification.

Outcome: Provides a precise facial map, aiding in consistent feature extraction for recognition.

- Feature Extraction

Purpose: Extracts unique facial features from detected landmarks to distinguish individual identities.

Model Type: Embedding model that outputs feature vectors for each detected face.

Outcome: Compares these vectors with stored profiles, recognizing known individuals and categorizing unknown ones as intruders.

