Environmental Sound Classification: Vision Transformers and CNNs in Action
🌿🎵 Welcome to Environmental Sound Classification (ESC), a project that applies transformer-based architectures and Convolutional Neural Networks (CNNs) to the challenges of classifying environmental sounds. It evaluates Vision Transformers (ViT) and Audio Spectrogram Transformers (AST) alongside CNNs on environmental audio data.
🎯 Project Highlights
State-of-the-Art Models: Demonstrates the potential of ViT and AST for environmental sound classification.
Superior Accuracy: AST achieved 88.35% validation accuracy, outperforming both the CNN and ViT models.