Welcome to Environmental Sound Classification (ESC), a project that applies transformer-based architectures and Convolutional Neural Networks (CNNs) to the challenges of classifying environmental audio. By evaluating models such as the Vision Transformer (ViT) and the Audio Spectrogram Transformer (AST) on spectrogram inputs, the project explores how well modern architectures handle environmental audio data.
Environmental sound classification poses challenges distinct from speech and music recognition. This project hypothesizes that transformer-based models can match or outperform CNNs in accuracy and efficiency for ESC tasks.
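As a concrete starting point, the sketch below shows one common way to turn a raw environmental audio clip into the log-mel spectrogram that CNN- and transformer-based classifiers consume. The file name, sample rate, and mel parameters are illustrative assumptions, not the project's actual configuration.

```python
# Sketch: converting an environmental audio clip into a log-mel spectrogram.
# Settings below (16 kHz, 128 mel bands, "dog_bark.wav") are illustrative only.
import librosa
import numpy as np

def audio_to_log_mel(path: str, sr: int = 16000, n_mels: int = 128) -> np.ndarray:
    """Load a clip and return a log-scaled mel spectrogram of shape (n_mels, frames)."""
    y, sr = librosa.load(path, sr=sr)                         # load and resample
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    return librosa.power_to_db(mel, ref=np.max)               # power -> decibels

spec = audio_to_log_mel("dog_bark.wav")   # hypothetical ESC clip
print(spec.shape)                         # (128, num_frames)
```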
| Model | Validation Accuracy |
|---|---|
| CNN (ResNet-50) | 60% |
| Vision Transformer (ViT) | 40% |
| Audio Spectrogram Transformer (AST) | 88.35% |
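For reference, a validation-accuracy figure like those in the table can be computed with a loop of the following shape. `model` and `val_loader` are assumed placeholders for any of the three trained models and the project's validation split; this is a generic sketch, not the project's exact evaluation code.

```python
# Sketch: computing validation accuracy for a spectrogram classifier in PyTorch.
import torch

@torch.no_grad()
def validation_accuracy(model: torch.nn.Module,
                        val_loader: torch.utils.data.DataLoader,
                        device: str = "cpu") -> float:
    """Return accuracy (in percent) of `model` over `val_loader`."""
    model.eval()
    correct, total = 0, 0
    for specs, labels in val_loader:                 # spectrogram tensors, int labels
        specs, labels = specs.to(device), labels.to(device)
        logits = model(specs)                        # (batch, num_classes)
        preds = logits.argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.size(0)
    return 100.0 * correct / total
```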
- CNN (ResNet-50): see GitHub for the implementation.
- Vision Transformer (ViT): see GitHub for the implementation.
- Audio Spectrogram Transformer (AST): see GitHub for the implementation (a loading sketch follows below).
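As a hedged illustration of the AST path, the snippet below loads a publicly available AudioSet-pretrained AST checkpoint with Hugging Face `transformers` and classifies a single clip. The checkpoint name, file name, and 16 kHz input are illustrative assumptions; the project's own fine-tuned weights and ESC label set may differ.

```python
# Sketch: running a pretrained Audio Spectrogram Transformer on one clip.
import librosa
import torch
from transformers import ASTFeatureExtractor, ASTForAudioClassification

checkpoint = "MIT/ast-finetuned-audioset-10-10-0.4593"   # assumed public checkpoint
extractor = ASTFeatureExtractor.from_pretrained(checkpoint)
model = ASTForAudioClassification.from_pretrained(checkpoint)

waveform, sr = librosa.load("rain.wav", sr=16000)        # hypothetical clip
inputs = extractor(waveform, sampling_rate=sr, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits                      # (1, num_labels)
pred = logits.argmax(dim=-1).item()
print(model.config.id2label[pred])                       # predicted sound class
```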
Interested in our work? Feel free to explore the code, open an issue, or contribute.
Thank you for exploring Environmental Sound Classification! Together, let's redefine audio analytics for a better future.