Catch The AI is our graduation project: an intelligent system for detecting AI-generated content with our advanced models. Our deep learning technology distinguishes between AI-generated and human-authored media in images, text, and audio.
We deployed our models in a web application to make them easy for everyone to use. You can visit the website at Catch The AI.
Detect AI-generated content in images, text, and audio.
User-friendly web application.
Full user authentication and authorization system.
Detection history for each user.
Admin panel to manage users and their data.
Subscription system to unlock more features (coming soon).
🚀 Data Sources and Datasets
This project utilizes a range of datasets to train and test the AI detection models, categorized by media type: images, audio, and text. They are sourced from various repositories and research projects, providing a diverse and comprehensive collection of AI-generated and authentic content for model training and evaluation.
Audio Data
Fake-or-Real Dataset (FoR): Baseline detection with genuine and fake audio samples.
SceneFake Dataset: Diverse deepfake audio clips produced by various synthesis techniques.
In the Wild Dataset: Real and fake audio from diverse internet sources.
ASVspoof 2019 Dataset: Authentic and spoofed audio for ASV challenges.
Image Data
140k Real and Fake Faces: 70,000 real faces from Flickr and 70,000 StyleGAN-generated faces, resized to 256x256.
CelebA-HQ (256x256): 30,000 high-quality celebrity faces for model training.
Synthetic Faces High Quality (SFHQ) Part 2: 91,361 curated faces at 1024x1024, enhanced by StyleGAN2.
Face Dataset Using Stable Diffusion v1.4: Real and fake faces, resized to 256x256, using Stable Diffusion models.
Stable Diffusion Face Dataset: AI-generated faces at 512x512, 768x768, and 1024x1024 resolutions using Stable Diffusion checkpoints.
Synthetic Faces High Quality (SFHQ) Part 3: 118,358 faces at 1024x1024, generated by StyleGAN2 with advanced techniques.
Synthetic Human Faces for 3D Reconstruction: High-quality 512x512 faces generated using the EG3D model for 3D reconstruction.
Text Data
LLM Generated Essays for the Detect AI Comp: 700 essays, including 500 generated with GPT-3.5-turbo and 200 with GPT-4.
DAIGT Data - Llama 70b and Falcon 180b:
Llama Falcon v3: 7,000 LLM-generated essays.
Llama 70b v2: 1,172 LLM-generated essays.
Llama 70b v1: 1,172 LLM-generated essays.
Falcon 180b v1: 1,055 LLM-generated essays.
Persuade Corpus 2.0: Over 25,000 argumentative essays by U.S. students in grades 6-12.
DAIGT External Dataset: 2,421 student-generated texts and 2,421 AI-generated texts for balanced training data.
📊 Model Selection and Initial Results
Text Models
BERT: Achieved 90% accuracy but showed signs of overfitting on smaller datasets.
RoBERTa: Outperformed BERT with 99% accuracy and demonstrated better generalization.
DeBERTa: Achieved the highest accuracy at 99%, showing superior handling of complex text patterns (see the inference sketch after this list).
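For illustration, a minimal inference sketch with the Hugging Face transformers library is shown below. The roberta-base checkpoint and the label order are assumptions; in practice the project's fine-tuned weights would be loaded instead.

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# "roberta-base" is a stand-in; the project's fine-tuned checkpoint would be used in practice.
name = "roberta-base"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSequenceClassification.from_pretrained(name, num_labels=2)
model.eval()

texts = ["An essay whose origin we want to classify."]
inputs = tokenizer(texts, truncation=True, padding=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
probs = logits.softmax(dim=-1)  # two probabilities; the label order depends on training
```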
Audio Models
Wav2Vec2: Excelled with a word error rate of 7% and robust anomaly detection.
Mel-spectrogram + CNN: Delivered reasonable accuracy but was less effective than Wav2Vec2 at detecting subtle anomalies (see the front-end sketch after this list).
ResNet-based Model: Provided good results but was more computationally intensive.
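To make the mel-spectrogram + CNN baseline concrete, the sketch below shows a torchaudio front end that turns a clip into the 2-D time-frequency "image" a CNN classifies. The file name and spectrogram parameters are illustrative assumptions, not the project's actual settings.

```python
import torchaudio

# "clip.wav" and all parameters below are illustrative assumptions.
waveform, sr = torchaudio.load("clip.wav")
waveform = torchaudio.functional.resample(waveform, orig_freq=sr, new_freq=16000)

mel = torchaudio.transforms.MelSpectrogram(
    sample_rate=16000, n_fft=1024, hop_length=256, n_mels=64
)
to_db = torchaudio.transforms.AmplitudeToDB()

# Shape: (channels, n_mels, time) -- a 2-D feature map fed to the CNN classifier.
features = to_db(mel(waveform))
```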
Image Models
EfficientNet: Balanced computational efficiency with strong performance, achieving 99% accuracy.
ResNet: Reached 99% accuracy but required more computational resources.
Xception: Offered detailed feature extraction but was less efficient compared to EfficientNet.
🏆 Final Model
Text
Ensemble of RoBERTa and DeBERTa: Combines the outputs of both models and integrates them through a final linear layer to enhance overall classification performance.
Architecture:
RoBERTa Output: Captures robust language patterns.
DeBERTa Output: Provides nuanced language understanding.
Final Linear Layer: Integrates the concatenated outputs to improve classification (a minimal sketch follows).
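A minimal PyTorch sketch of this ensemble is given below. The base checkpoints (roberta-base, microsoft/deberta-v3-base) and the use of each encoder's first-token hidden state as the pooled representation are assumptions; the deployed model's exact checkpoints and pooling may differ.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class TextEnsemble(nn.Module):
    """Concatenates RoBERTa and DeBERTa representations; one linear layer classifies."""

    # Checkpoint names are assumptions, not the project's actual fine-tuned weights.
    def __init__(self, roberta_name="roberta-base", deberta_name="microsoft/deberta-v3-base"):
        super().__init__()
        self.roberta = AutoModel.from_pretrained(roberta_name)
        self.deberta = AutoModel.from_pretrained(deberta_name)
        combined = self.roberta.config.hidden_size + self.deberta.config.hidden_size
        self.classifier = nn.Linear(combined, 2)  # human vs. AI-generated

    def forward(self, roberta_inputs, deberta_inputs):
        # Each encoder has its own tokenizer, so each receives its own input dict.
        r = self.roberta(**roberta_inputs).last_hidden_state[:, 0]
        d = self.deberta(**deberta_inputs).last_hidden_state[:, 0]
        return self.classifier(torch.cat([r, d], dim=-1))
```

At inference time, the same text is tokenized once per encoder before the two input dicts are passed to forward.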
🔗 For more details about the text models, please check the DAIGT-Catch-the-AI repository.
Audio
Wav2Vec2: Selected for its state-of-the-art performance in audio anomaly detection.
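As a rough sketch only, classifying a clip with a Wav2Vec2 sequence-classification head looks like the following; the facebook/wav2vec2-base checkpoint is a stand-in for the project's fine-tuned detector.

```python
import torch
from transformers import AutoFeatureExtractor, Wav2Vec2ForSequenceClassification

# The base checkpoint is a stand-in; its classification head is untrained until fine-tuned.
name = "facebook/wav2vec2-base"
extractor = AutoFeatureExtractor.from_pretrained(name)
model = Wav2Vec2ForSequenceClassification.from_pretrained(name, num_labels=2)
model.eval()

speech = torch.randn(16000).numpy()  # stand-in for 1 s of 16 kHz mono audio
inputs = extractor(speech, sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits  # two logits: real vs. AI-generated
```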
Image
EfficientNet: Chosen for its efficiency and high accuracy in distinguishing real from AI-generated images.
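A minimal sketch of such a binary classifier with torchvision follows; the b0 variant, 224x224 input size, and two-class head are assumptions, since this README does not pin down the exact configuration.

```python
import torch
import torch.nn as nn
from torchvision import models

# The b0 variant is an assumption; the project may use a different EfficientNet size.
model = models.efficientnet_b0(weights=models.EfficientNet_B0_Weights.DEFAULT)
model.classifier[1] = nn.Linear(model.classifier[1].in_features, 2)  # real vs. AI-generated
model.eval()

image = torch.randn(1, 3, 224, 224)  # stand-in for a preprocessed face image
with torch.no_grad():
    logits = model(image)
```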
The text ensemble was validated on additional test datasets, confirming its robustness and its ability to generalize across varied scenarios. This approach delivered significant improvements over the individual models, providing a more comprehensive understanding and classification of text inputs.
Collaborators
We did not just work as a team; we were a family. These people are truly skilled and creative. Follow them and look out for their wonderful projects, through which they learn a lot and benefit many people. ❤️
Romani Nasrat (Team Leader)
Ahmed Mohamed Ali
Reham Mustafa
Sara Reda Moatamed
Zeyad El-Sayed Abdel-Azim
Rawan Abdel-Aziz Ahmed
Abdalla Mohammed
Mohannad Ayman
Mohammed Abdeldayem
Contact
You can contact the team leader with any questions or for help via the following links: