This project explores how deep learning can enhance audio spectrograms using a Deep Convolutional Autoencoder (DCAE). The goal is to reconstruct sound spectrograms accurately while preserving their time-frequency structure, with the aim of improving denoising and the restoration of spectral information. The model is named OMNIA, inspired by its ability to process and enhance diverse soundscapes.
As a sound engineer and AI researcher, I aimed to synthesize realistic sound representations using a CNN-based autoencoder. While reproducing a full-spectrum sound remains complex, the use of Computer Vision techniques on spectrograms helps analyze and generate structured representations of sound.
The model consists of two main components:
Encoder
✔️ Uses Convolutional Neural Networks (CNNs) to extract spatial features.
✔️ MaxPooling layers reduce spectrogram dimensionality while retaining essential information.
✔️ Encodes a compressed representation of the spectrogram.
Decoder
✔️ Uses UpSampling layers to restore the spectrogram resolution.
✔️ Convolutional layers refine and reconstruct details.
✔️ Applies Cropping to remove the few extra rows or columns left when the upsampled output is slightly larger than the input.
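The role of the Cropping step can be traced with plain shape arithmetic. The sketch below is not OMNIA's actual configuration: the spectrogram size (129 x 126), the two pooling stages, the factor of 2, and the assumption that pooling rounds odd sizes up (as "same" padding does) are all hypothetical choices made for illustration, and the function names are invented here.

```python
import math


def encoder_shape(height, width, num_pools, pool=2):
    """Shape after `num_pools` pooling stages that round odd sizes up."""
    for _ in range(num_pools):
        height = math.ceil(height / pool)
        width = math.ceil(width / pool)
    return height, width


def decoder_shape(height, width, num_upsamples, factor=2):
    """Shape after `num_upsamples` upsampling stages that multiply each axis."""
    for _ in range(num_upsamples):
        height *= factor
        width *= factor
    return height, width


def crop_amounts(input_shape, num_stages, factor=2):
    """Extra rows and columns the decoder produces relative to the input."""
    enc = encoder_shape(*input_shape, num_stages, factor)
    dec = decoder_shape(*enc, num_stages, factor)
    return dec[0] - input_shape[0], dec[1] - input_shape[1]


# Hypothetical spectrogram: 129 frequency bins by 126 time frames.
input_shape = (129, 126)
latent = encoder_shape(*input_shape, num_pools=2)
restored = decoder_shape(*latent, num_upsamples=2)
extra = crop_amounts(input_shape, num_stages=2)

print("latent:", latent)      # latent: (33, 32)
print("restored:", restored)  # restored: (132, 128)
print("to crop:", extra)      # to crop: (3, 2)
```

Under these assumptions the decoder output is 3 rows and 2 columns larger than the input, and that is the mismatch a cropping layer trims. When both dimensions are divisible by the total downsampling factor (for example 128 x 128 with two stages), nothing needs to be cropped.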
Dataset Preparation
Model Training
Performance Evaluation
✔️ The model successfully denoised and enhanced spectrograms.
✔️ It showed promising results in restoring lost spectral details.
✔️ Generating fully realistic audio remained a challenge: the low sampling rate limited quality and led to slightly robotic reconstructions.
✔️ Further optimization with higher-resolution datasets and GAN-based approaches could improve results.
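One general constraint behind the sampling-rate observation is that audio sampled at a given rate cannot represent frequencies above half that rate (the Nyquist limit). The sketch below only illustrates that relationship, together with the spacing of STFT frequency bins. The sample rates and the FFT size of 1024 are hypothetical values, not the settings used for OMNIA, and it makes no claim about what caused the robotic timbre.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def bin_width_hz(sample_rate_hz, n_fft):
    """Spacing in Hz between adjacent STFT frequency bins."""
    return sample_rate_hz / n_fft


# Hypothetical sample rates and FFT size, chosen only for illustration.
for rate in (8000, 16000, 44100):
    print(rate, nyquist_hz(rate), round(bin_width_hz(rate, 1024), 2))
```

At 8 kHz, for example, a spectrogram contains nothing above 4 kHz, so any detail above that frequency is absent from the training data before the model ever sees it.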
- Explore GANs for spectrogram generation.
- Improve sampling fidelity for clearer reconstructed sounds.
- Implement latent space audio synthesis for game and film sound design.
This research demonstrates the potential of AI in audio processing through spectrogram-based deep learning models. OMNIA serves as a stepping stone toward AI-driven sound design, merging the power of machine learning and digital signal processing to create richer, more immersive audio environments.
GitHub Repository: github.com/Mike014
📢 Feedback & Collaboration Welcome!