This project explores datasets of fake and real videos, applying face detection, data preprocessing, and the construction of structured training, validation, and test sets. It was the culmination of my Master's degree in Ottawa in 2023.
The deepfake detection system utilizes a multi-stage approach involving data preprocessing, feature extraction, deep learning-based classification, and a user-friendly web interface. It employs state-of-the-art algorithms to distinguish between authentic and manipulated videos, addressing the challenge of deepfake proliferation.
Data Collection
Data Exploration
Number of Fake Videos: 1000
Number of Real Videos: 1000
Video Processing
Capture one frame every 1 second: total number of videos 1999, total number of frames 16370, average frames per video 8.19.
Capture one frame every 2 seconds: total number of videos 1999, total number of frames 7965, average frames per video 3.98.
Capture one frame every 4 seconds: total number of videos 1999, total number of frames 3258, average frames per video 1.63.
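The sampling intervals above can be illustrated with a minimal sketch. It assumes the frames are read with OpenCV (`cv2`), which is not confirmed here; the function names `sampled_frame_indices` and `extract_frames` are hypothetical and chosen only for this example.

```python
from typing import Iterator, List, Tuple


def sampled_frame_indices(total_frames: int, fps: float, every_seconds: float) -> List[int]:
    """Return the frame indices kept when capturing one frame every `every_seconds`."""
    if total_frames <= 0 or fps <= 0 or every_seconds <= 0:
        return []
    # Number of frames between two captures, at least 1.
    step = max(1, round(fps * every_seconds))
    return list(range(0, total_frames, step))


def extract_frames(video_path: str, every_seconds: float = 1.0) -> Iterator[Tuple[int, object]]:
    """Yield (frame_index, frame) pairs from a video file (assumes OpenCV is installed)."""
    import cv2  # imported lazily so the index helper above works without OpenCV

    capture = cv2.VideoCapture(video_path)
    try:
        fps = capture.get(cv2.CAP_PROP_FPS)
        total = int(capture.get(cv2.CAP_PROP_FRAME_COUNT))
        for index in sampled_frame_indices(total, fps, every_seconds):
            capture.set(cv2.CAP_PROP_POS_FRAMES, index)  # jump to the wanted frame
            ok, frame = capture.read()
            if not ok:
                break
            yield index, frame
    finally:
        capture.release()
```

For a hypothetical 10-second clip at 30 fps, the helper keeps 10, 5, and 3 frames for the 1-, 2-, and 4-second intervals respectively.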
Frame records: Video ID, Frame ID, Video Label.
Face detection with cvlib, resizing images to 300x300, and drawing rectangles around faces.
Data Preprocessing
Label encoding with LabelEncoder.
Data Preparation
Conversion of the Normalized Frame data and Labels columns to TensorFlow tensors.
Model Creation and Training
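The models in this section end in a two-unit softmax trained with categorical cross-entropy, so the encoded labels have to reach them as one-hot vectors. The following is a plain-Python sketch of the label encoding and frame normalization from the preparation steps, not the scikit-learn or TensorFlow calls themselves. The helper names are hypothetical, and dividing pixel values by 255 is an assumption about how the frames were normalized.

```python
from typing import Dict, List, Sequence


def fit_label_mapping(labels: Sequence[str]) -> Dict[str, int]:
    """Map each distinct label to an integer in sorted order, as LabelEncoder does."""
    return {name: index for index, name in enumerate(sorted(set(labels)))}


def one_hot(index: int, num_classes: int) -> List[float]:
    """Return a one-hot vector with a 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(num_classes)]


def scale_pixels(pixels: Sequence[int]) -> List[float]:
    """Scale 8-bit pixel values to [0, 1]; the divisor 255 is an assumption."""
    return [value / 255.0 for value in pixels]


labels = ["REAL", "FAKE", "FAKE", "REAL"]
mapping = fit_label_mapping(labels)  # sorted order puts FAKE before REAL
targets = [one_hot(mapping[name], len(mapping)) for name in labels]
```

With sorted ordering, FAKE maps to class 0 and REAL to class 1, which fixes the meaning of the two softmax outputs.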
ResNet50, InceptionResNetV2, MobileNetV2, and VGG16 models pre-trained on ImageNet.
Transfer learning with custom layers added on top of each base model:
    # Custom classification head on top of the pre-trained base model
    x = GlobalAveragePooling2D()(resnet_model.output)
    x = Dense(512, activation='relu')(x)
    x = Dropout(0.5)(x)
    x = Dense(2, activation='softmax')(x)
Model compilation:
    # Adam with a fixed learning rate
    custom_optimizer = Adam(learning_rate=0.0001)
    model.compile(optimizer=custom_optimizer, loss='categorical_crossentropy', metrics=['accuracy'])
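    # Adam with an exponentially decaying learning rate (initial_learning_rate must be defined beforehand)
    lr_schedule = ExponentialDecay(initial_learning_rate, decay_steps=100000, decay_rate=0.96, staircase=True)
    optimizer = Adam(learning_rate=lr_schedule)
    # Pass the optimizer that carries the schedule, not the fixed-rate one
    model.compile(optimizer=optimizer, loss='categorical_crossentropy', metrics=['accuracy'])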
    # Stochastic Gradient Descent with a fixed learning rate (lr is deprecated in favor of learning_rate)
    sgd = SGD(learning_rate=0.0001)
    vgg_model_transfer.compile(loss='categorical_crossentropy', optimizer=sgd, metrics=['accuracy'])
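To see what the ExponentialDecay settings above do, here is a small plain-Python sketch of the documented decay rule, `initial_learning_rate * decay_rate ^ (step / decay_steps)`, with integer division when `staircase=True`. The starting value of 1e-4 is an assumption borrowed from the fixed-rate Adam configuration, since `initial_learning_rate` is not given.

```python
def exponential_decay(initial_learning_rate: float, step: int,
                      decay_steps: int = 100000, decay_rate: float = 0.96,
                      staircase: bool = True) -> float:
    """Learning rate at a given optimizer step under exponential decay."""
    if staircase:
        exponent = step // decay_steps  # changes only at multiples of decay_steps
    else:
        exponent = step / decay_steps   # decays smoothly at every step
    return initial_learning_rate * decay_rate ** exponent


# Assumed starting rate of 1e-4 (not stated for this schedule).
rate_at_start = exponential_decay(1e-4, 0)
rate_before_first_drop = exponential_decay(1e-4, 99999)
rate_after_first_drop = exponential_decay(1e-4, 100000)
```

With `staircase=True` and `decay_steps=100000`, the rate stays at its initial value for the first 100000 optimizer steps and only then drops by 4%, so a short training run may never see a decay.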
Training details: epochs, batch size, early stopping.
    epochs = 100
    batch_size = 32
    learning_rate = 0.00001
    early_stopping = EarlyStopping(monitor='val_loss', patience=7, restore_best_weights=True)
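The effect of `patience=7` with `restore_best_weights=True` can be sketched in plain Python. This is an illustration of the stopping rule only, not Keras's implementation, and the validation losses below are made-up numbers.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(val_losses: Sequence[float], patience: int = 7) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch), both 0-based.

    Training stops once the validation loss has failed to improve for
    `patience` consecutive epochs; the weights from `best_epoch` are the
    ones that would be restored.
    """
    best_loss = float("inf")
    best_epoch = -1
    wait = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch


# Hypothetical losses: improvement for three epochs, then a plateau.
losses = [0.9, 0.7, 0.6] + [0.65] * 10
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=7)
```

In this made-up run the best loss occurs at epoch 2, training stops at epoch 9 after seven epochs without improvement, and the epoch-2 weights would be restored.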
Evaluation and Result Analysis
Confusion matrix for video label determination: computed after applying a threshold to the per-frame predictions in order to assign each video a single label (REAL or FAKE).
Prediction Threshold:
Categorization of Videos:
Comparison with Actual Labels:
Here is an example illustrating our evaluation process for the ResNetV2 model on the Test Set at one frame per second. The green column (Actual Label) contains the known label of each video, while the red column (Model Decision) is derived from the two blue columns (Predicted Fake Count, Predicted Real Count).
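The step from per-frame predictions to a per-video decision can be sketched as follows. The threshold value is not stated here, so `fake_threshold=0.5` is a placeholder, and the rule "label the video FAKE when the share of FAKE frames reaches the threshold" is an assumption about how the counts are compared. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def video_decisions(frame_predictions: Iterable[Tuple[str, str]],
                    fake_threshold: float = 0.5) -> Dict[str, dict]:
    """Aggregate (video_id, predicted_label) pairs into one decision per video."""
    counts = defaultdict(Counter)
    for video_id, predicted_label in frame_predictions:
        counts[video_id][predicted_label] += 1

    results = {}
    for video_id, counter in counts.items():
        fake, real = counter["FAKE"], counter["REAL"]
        total = fake + real
        if total == 0:
            continue  # no usable frame predictions for this video
        results[video_id] = {
            "predicted_fake_count": fake,
            "predicted_real_count": real,
            # Assumed rule: FAKE when the share of FAKE frames reaches the threshold.
            "model_decision": "FAKE" if fake / total >= fake_threshold else "REAL",
        }
    return results


def confusion_matrix(decisions: Dict[str, dict], actual_labels: Dict[str, str]) -> Counter:
    """Count (actual_label, model_decision) pairs over all videos."""
    matrix = Counter()
    for video_id, actual in actual_labels.items():
        matrix[(actual, decisions[video_id]["model_decision"])] += 1
    return matrix


# Made-up frame predictions for three videos.
frames = [("v1", "FAKE"), ("v1", "FAKE"), ("v1", "REAL"),
          ("v2", "REAL"), ("v2", "REAL"),
          ("v3", "FAKE"), ("v3", "REAL")]
decisions = video_decisions(frames)
matrix = confusion_matrix(decisions, {"v1": "FAKE", "v2": "REAL", "v3": "REAL"})
```

Raising or lowering `fake_threshold` trades false FAKE decisions against missed fakes, which is why the confusion matrix depends on the chosen value.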
+ 1-Second Superiority: Sampling one frame per second is recommended because it gave consistently high training and validation accuracy across the models (ResNetV2, InceptionResNetV2, MobileNetV2, VGG-16). This interval balances capturing essential temporal information and generalizing well against the computational load of training and inference.
Cross-Validation
Soft Voting
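As a reminder of what soft voting means in general: each model outputs class probabilities, the probabilities are averaged (optionally with weights), and the class with the highest average wins. The sketch below is generic; which models were combined and whether weights were used is not stated here, and the class order (index 0 = FAKE, index 1 = REAL) and the probabilities are assumptions for illustration.

```python
from typing import List, Optional, Sequence


def soft_vote(probabilities: Sequence[Sequence[float]],
              weights: Optional[Sequence[float]] = None) -> List[float]:
    """Return the weighted average of per-model class probabilities for one sample."""
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    num_classes = len(probabilities[0])
    return [
        sum(w * p[c] for w, p in zip(weights, probabilities)) / total
        for c in range(num_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    """Index of the largest value."""
    return max(range(len(values)), key=lambda i: values[i])


# Made-up outputs of three models for one frame: [P(FAKE), P(REAL)].
per_model = [[0.9, 0.1], [0.4, 0.6], [0.45, 0.55]]
averaged = soft_vote(per_model)
soft_choice = argmax(averaged)

# Hard (majority) voting on the same outputs, for comparison.
hard_votes = [argmax(p) for p in per_model]
hard_choice = max(set(hard_votes), key=hard_votes.count)
```

In this made-up case soft voting picks class 0 because one model is very confident, while majority voting picks class 1; that sensitivity to confidence is the main difference between the two schemes.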
Hyperparameter Tuning on the Champion Model (MobileNetV2)
Different learning rates: with batch size 32 and early stopping after 5 epochs.
Different batch sizes: with learning rate 10^(-4) and early stopping after 5 epochs.
Different numbers of early-stopping epochs: with learning rate 10^(-4) and batch size 32.
Overall Comparison and The Superior Model
- Save the superior model for further development.