Face Sight is an AI-powered system for real-time face detection and recognition that combines traditional computer vision with deep learning. It uses OpenCV for face detection and a fine-tuned InceptionV3 model to generate face embeddings for recognition. Designed for real-time applications such as surveillance, access control, and user authentication, the system detects and recognizes faces live from a webcam feed with high accuracy and low latency. It is lightweight enough to run on a CPU without a GPU, so it can be deployed on standard hardware.
The system follows a modular pipeline architecture consisting of the following components:
Face Detection:
Uses OpenCV’s Haar Cascade Classifier to detect faces in real-time video streams from a webcam. Detected face regions are cropped and preprocessed for embedding generation.
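The detection step can be sketched as follows. This is a minimal illustration, not the project's actual code: the function names (pad_box, detect_and_crop), the 10% margin, and the detectMultiScale settings are assumptions chosen for the example. The cascade is expected to be a cv2.CascadeClassifier loaded from OpenCV's bundled haarcascade_frontalface_default.xml.

```python
def pad_box(x, y, w, h, frame_w, frame_h, margin=0.1):
    """Grow a detection box by `margin` on each side, clipped to the frame.

    Returns corner coordinates (x0, y0, x1, y1). The margin is a
    hypothetical choice; the project does not state one.
    """
    dx, dy = int(w * margin), int(h * margin)
    x0, y0 = max(x - dx, 0), max(y - dy, 0)
    x1, y1 = min(x + w + dx, frame_w), min(y + h + dy, frame_h)
    return x0, y0, x1, y1


def detect_and_crop(frame, cascade, margin=0.1):
    """Return a list of (box, face_crop) pairs for one BGR webcam frame."""
    import cv2  # imported here so pad_box stays usable without OpenCV

    # Haar cascades operate on grayscale images.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # Assumed settings: typical starting values, not the project's tuning.
    boxes = cascade.detectMultiScale(
        gray, scaleFactor=1.1, minNeighbors=5, minSize=(60, 60)
    )
    frame_h, frame_w = frame.shape[:2]
    results = []
    for (x, y, w, h) in boxes:
        box = pad_box(int(x), int(y), int(w), int(h), frame_w, frame_h, margin)
        x0, y0, x1, y1 = box
        results.append((box, frame[y0:y1, x0:x1]))
    return results
```

A typical call would load the cascade with cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml"), read a frame from cv2.VideoCapture(0), and pass both to detect_and_crop. Raising minNeighbors usually trades missed faces for fewer false positives.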
Preprocessing:
Faces are resized and normalized before being passed into the deep learning model. Basic augmentation techniques are applied during user registration to improve generalization.
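A possible shape for this step is shown below. The target size and value range are assumptions: 299x299 is InceptionV3's default input resolution, and [-1, 1] is the range produced by the Keras InceptionV3 preprocess_input function. The project does not list its augmentations, so the mirror and brightness shift here are illustrative only, as are the function names.

```python
import numpy as np

INPUT_SIZE = (299, 299)  # assumption: InceptionV3's default input resolution


def normalize(face_rgb):
    """Scale uint8 pixels from [0, 255] to float32 values in [-1, 1]."""
    return face_rgb.astype(np.float32) / 127.5 - 1.0


def preprocess(face_bgr, size=INPUT_SIZE):
    """Resize a BGR face crop, convert it to RGB, and normalize it."""
    import cv2  # only needed for resizing and color conversion

    resized = cv2.resize(face_bgr, size, interpolation=cv2.INTER_AREA)
    rgb = cv2.cvtColor(resized, cv2.COLOR_BGR2RGB)
    return normalize(rgb)


def augment(face_rgb, rng):
    """Yield registration-time variants of a uint8 RGB face image.

    Hypothetical augmentations: the original, a horizontal mirror, and a
    random brightness shift of up to +/-30 intensity levels.
    """
    yield face_rgb
    yield face_rgb[:, ::-1]
    shift = int(rng.integers(-30, 31))
    shifted = np.clip(face_rgb.astype(np.int16) + shift, 0, 255)
    yield shifted.astype(np.uint8)
```

During registration, each variant from augment(face, np.random.default_rng()) would be normalized and embedded, so that one captured image contributes several stored embeddings.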
Face Embedding with InceptionV3:
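An InceptionV3 model, pre-trained and then fine-tuned, with its top classification layer removed, extracts a 128-dimensional feature vector (embedding) from each face image. Since InceptionV3's pooled output is 2048-dimensional, the 128-dimensional size presumably comes from a projection layer added on top of the backbone. The embeddings are stored in a local database (e.g., a JSON or pickle file) alongside the associated user ID.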
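The storage side of this component might look like the sketch below, which uses the JSON option. The file layout (a mapping from user ID to a list of embeddings) and the function names load_db and register are assumptions made for illustration; the project does not document its schema.

```python
import json
from pathlib import Path


def load_db(path):
    """Load the embedding database, or return an empty one if none exists."""
    path = Path(path)
    if not path.exists():
        return {}
    with path.open("r", encoding="utf-8") as f:
        return json.load(f)


def register(path, user_id, embedding):
    """Append one embedding for `user_id` and write the database back."""
    db = load_db(path)
    # Convert to plain floats so NumPy scalars serialize cleanly to JSON.
    db.setdefault(user_id, []).append([float(v) for v in embedding])
    # Write to a temporary file first so a crash cannot truncate the database.
    tmp = Path(path).with_suffix(".tmp")
    with tmp.open("w", encoding="utf-8") as f:
        json.dump(db, f)
    tmp.replace(path)
    return db
```

Keeping several embeddings per user lets registration store one vector per augmented image. JSON is human-readable and safe to load, whereas pickle is more compact but should only be loaded from trusted files.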
Recognition Engine:
During inference, the system calculates the cosine similarity between the real-time face embedding and each stored embedding. If the highest similarity exceeds a predefined threshold, the face is assigned the identity of the best-matching user; otherwise it is treated as unknown.
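The matching rule can be illustrated with a short sketch. Cosine similarity between embeddings a and b is their dot product divided by the product of their lengths. The threshold of 0.8, the gallery layout (user ID mapped to a list of embeddings), and the function names are assumptions for the example; the project does not state the threshold it uses.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(query, gallery, threshold=0.8):
    """Return (user_id, score) for the best match above `threshold`.

    Returns (None, best_score) when no stored embedding is similar enough.
    The default threshold is a placeholder, not the project's value.
    """
    best_id, best_score = None, -1.0
    for user_id, embeddings in gallery.items():
        for emb in embeddings:
            score = cosine_similarity(query, emb)
            if score > best_score:
                best_id, best_score = user_id, score
    if best_score > threshold:
        return best_id, best_score
    return None, best_score
```

In practice the threshold is tuned on held-out images: a higher value reduces false accepts at the cost of rejecting more genuine users. With many registered users, stacking the embeddings into a NumPy matrix and computing all similarities in one product would be faster than this loop.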
Real-Time Display:
Recognized faces are displayed on the webcam feed with bounding boxes and labels. Unknown faces are optionally saved for future registration.
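One way to draw the overlay and save unknown faces is sketched below. The colors, label format, output file naming, and function names are illustrative assumptions. The box is expected as corner coordinates (x0, y0, x1, y1), and user_id is None for an unrecognized face.

```python
import time
from pathlib import Path

KNOWN_COLOR = (0, 255, 0)    # green in OpenCV's BGR order
UNKNOWN_COLOR = (0, 0, 255)  # red in OpenCV's BGR order


def label_for(user_id, score):
    """Text shown above the bounding box, e.g. 'alice (0.93)'."""
    name = user_id if user_id is not None else "Unknown"
    return f"{name} ({score:.2f})"


def annotate(frame, box, user_id, score, unknown_dir=None):
    """Draw the box and label on `frame` in place and return the frame.

    If the face is unknown and `unknown_dir` is given, the crop is saved
    there before drawing, so the saved image contains no overlay.
    """
    import cv2  # imported here so label_for stays usable without OpenCV

    x0, y0, x1, y1 = box
    color = KNOWN_COLOR if user_id is not None else UNKNOWN_COLOR
    if user_id is None and unknown_dir is not None:
        out_dir = Path(unknown_dir)
        out_dir.mkdir(parents=True, exist_ok=True)
        out_path = out_dir / f"unknown_{time.time_ns()}.png"
        cv2.imwrite(str(out_path), frame[y0:y1, x0:x1])
    cv2.rectangle(frame, (x0, y0), (x1, y1), color, 2)
    # Keep the label inside the frame when the face touches the top edge.
    text_origin = (x0, max(y0 - 8, 12))
    cv2.putText(frame, label_for(user_id, score), text_origin,
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, color, 2)
    return frame
```

The annotated frame would then be shown with cv2.imshow inside the capture loop. Saving every unknown frame can fill the disk quickly, so a real deployment would likely rate-limit the writes.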
The system was tested under various lighting conditions and face orientations to assess its robustness and responsiveness. Here are the key results:
Detection Accuracy: Over 97% on frontal faces using the Haar Cascade detector
Recognition Accuracy: ~95% on registered faces using cosine similarity
Latency: Processes webcam video at ~20 frames per second on a standard CPU (Intel i5, no GPU)
Scalability: Supports multiple users with efficient real-time performance
Robustness: Maintains high accuracy despite minor variations in face angle or lighting
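A throughput figure such as the ~20 frames per second above can be measured with a small rolling counter. The sketch below is one way to do it, not the project's measurement code; the class name FpsMeter and the 30-frame window are assumptions. At 20 frames per second, the whole pipeline has a budget of 1/20 s = 50 ms per frame.

```python
import time
from collections import deque


class FpsMeter:
    """Rolling frames-per-second estimate over the last `window` frames."""

    def __init__(self, window=30):
        self.stamps = deque(maxlen=window)

    def tick(self, now=None):
        """Record one processed frame and return the current FPS estimate.

        `now` can be passed explicitly (in seconds) to make the meter
        deterministic; by default the monotonic clock is used.
        """
        self.stamps.append(time.perf_counter() if now is None else now)
        if len(self.stamps) < 2:
            return 0.0
        elapsed = self.stamps[-1] - self.stamps[0]
        # N timestamps span N - 1 frame intervals.
        return (len(self.stamps) - 1) / elapsed if elapsed > 0 else 0.0
```

Calling meter.tick() once per loop iteration, after detection, embedding, and matching have finished, gives the end-to-end rate rather than the camera's capture rate.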
The system generalized well to unseen samples and responded quickly, indicating its potential for real-world applications such as smart home security, automated attendance, and contactless access control.