🌿🎵 Welcome to Environmental Sound Classification (ESC), a project that applies transformer-based architectures and Convolutional Neural Networks (CNNs) to the unique challenges of classifying environmental audio. Using models such as the Vision Transformer (ViT) and the Audio Spectrogram Transformer (AST), this project explores new ways to process environmental audio data.
🎯 Project Highlights
- State-of-the-Art Models: Demonstrates the potential of ViT and AST for environmental sound classification.
- Superior Accuracy: AST achieved 88.35% validation accuracy, outperforming the CNN and ViT baselines in this project.
- Sustainability Applications: Enhances ecological monitoring, biodiversity research, and urban soundscape analysis.
🧪 Problem Statement
Environmental sound classification presents unique challenges:
- Environmental sounds are often polyphonic and lack stable temporal structures.
- Traditional approaches using CNNs are limited in capturing long-range dependencies.
This project hypothesizes that transformer-based models can outperform or match CNNs in accuracy and efficiency for ESC tasks.
📊 Key Results
| Model | Validation Accuracy |
|---|---|
| CNN (ResNet-50) | 60% |
| Vision Transformer (ViT) | 40% |
| Audio Spectrogram Transformer (AST) | 88.35% |
Visualized Results (plots available in the GitHub repository):
- CNN training loss
- ViT validation accuracy
- AST training and validation curves
🔍 Methodology
📁 Data Preparation
- Dataset: Bird sound recordings from 20 species.
- Preprocessing: Audio converted into mel spectrograms with the Librosa library (see the sketch below).
- Normalization: Spectrogram values normalized and scaled to standardize model inputs.
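A minimal sketch of the kind of Librosa-based preprocessing described above. The sample rate, `n_mels`, and file path are illustrative assumptions, not values taken from this project.

```python
import numpy as np
import librosa

def audio_to_mel(path, sr=22050, n_mels=128):
    """Load an audio file and convert it to a normalized log-mel spectrogram."""
    y, sr = librosa.load(path, sr=sr)                       # load and resample to a fixed rate
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel, ref=np.max)          # convert power to a dB scale
    # Min-max scale to [0, 1] so all spectrograms share a common input range.
    scaled = (log_mel - log_mel.min()) / (log_mel.max() - log_mel.min() + 1e-8)
    return scaled.astype(np.float32)

# Example usage (path is hypothetical):
# spec = audio_to_mel("data/bird_species_01/recording_001.wav")
# print(spec.shape)  # (n_mels, time_frames)
```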
🧠 Model Architectures
- CNN (ResNet-50):
  - Fine-tuned for ESC from pretrained weights.
  - Showed overfitting after 60 epochs.
- Vision Transformer (ViT):
  - Adapted to take spectrogram inputs.
  - Struggled on the small dataset due to its lack of convolutional inductive biases.
- Audio Spectrogram Transformer (AST):
  - Initialized from an AudioSet-pretrained checkpoint and fine-tuned for this spectrogram classification task (see the sketch after this list).
  - Leverages overlapping patches for fine-grained feature extraction.
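As referenced above, one possible way to load and adapt an AudioSet-pretrained AST checkpoint, assuming the Hugging Face `transformers` implementation; the checkpoint name, sampling rate, and 20-class head are illustrative assumptions and may differ from the exact setup used in this project.

```python
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

CHECKPOINT = "MIT/ast-finetuned-audioset-10-10-0.4593"  # assumed AudioSet checkpoint

# The feature extractor converts raw 16 kHz waveforms into the model's spectrogram input.
feature_extractor = ASTFeatureExtractor.from_pretrained(CHECKPOINT)

# Replace the AudioSet head with a 20-class head for the bird-species dataset.
model = ASTForAudioClassification.from_pretrained(
    CHECKPOINT,
    num_labels=20,
    ignore_mismatched_sizes=True,  # the new classifier head is randomly initialized
)

# Example forward pass on one second of silence (placeholder waveform).
waveform = torch.zeros(16000)
inputs = feature_extractor(waveform.numpy(), sampling_rate=16000, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits
print(logits.shape)  # torch.Size([1, 20])
```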
📈 Metrics and Validation
- Validation Accuracy: Assessed model performance.
- Training/Validation Loss Trends: Evaluated convergence and overfitting.
- Statistical Testing: Welch's t-tests validated AST's performance advantage (a sketch follows this list).
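A minimal sketch of the Welch's t-test comparison mentioned above, using SciPy; the accuracy arrays are placeholder values, not the project's actual per-run results.

```python
from scipy import stats

# Placeholder per-run validation accuracies (e.g., from repeated training runs).
ast_acc = [0.88, 0.87, 0.89, 0.88, 0.90]
cnn_acc = [0.60, 0.58, 0.62, 0.61, 0.59]

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, p_value = stats.ttest_ind(ast_acc, cnn_acc, equal_var=False)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
# A small p-value indicates the difference in mean accuracy is unlikely to be due to chance.
```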
🚀 Innovations
1️⃣ Spectrogram-Specific Transformers
- AST demonstrated superior capability in audio classification thanks to its spectrogram-specific design.
2️⃣ Sustainable Solutions
- Enables non-invasive environmental monitoring and public health applications.
3️⃣ Multimodal Applications
- Potential for combining audio and visual data for enhanced analytics in ESC.
💬 Engage with Us!
Interested in our work? Feel free to open an issue, start a discussion, or contribute to the repository.
🙏 Thank you for exploring Environmental Sound Classification! Together, let's redefine audio analytics for a better future. 🐦🎧✨