This project leverages Generative Adversarial Networks (GANs) to revolutionize the process of image colorization, transforming grayscale images into vibrant, realistic color outputs. By automating a traditionally manual task, this solution highlights the power of AI in achieving professional-grade results with minimal human intervention.
The approach combines a U-Net-based generator with a CNN-based discriminator to ensure high-quality outputs through adversarial learning. Utilizing a robust dataset and state-of-the-art training techniques, the model achieves remarkable accuracy and consistency, with significant applications across historical restoration, media production, creative design, and research.
This project is a testament to the capabilities of AI in reshaping creative workflows and paving the way for further advancements in computer vision.
The advancement of artificial intelligence and deep learning has revolutionized how we interact with and manipulate digital content. One such area where AI has shown remarkable promise is image colorization—the process of adding vibrant, realistic colors to grayscale images. Traditionally, colorization was a time-intensive task requiring expert knowledge of color theory and a contextual understanding of the image's content. However, with the emergence of Generative Adversarial Networks (GANs), this challenge is being addressed with automation, accuracy, and efficiency.
This project explores the development of an AI-powered image colorization model, leveraging GANs to transform grayscale images into richly detailed and natural-colored visuals. At the core of the model are two key components: a Generator, responsible for predicting and applying colors to grayscale inputs, and a Discriminator, which evaluates the realism of the generated output, enabling the system to iteratively improve over time.
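Formally, this interplay can be written as a conditional GAN objective. The formulation below is the standard one, stated here for context rather than taken verbatim from the project code: the generator $G$ tries to fool the discriminator $D$, while $D$ learns to separate real color images from generated ones, both conditioned on the grayscale input.

$$\min_{G}\max_{D}\;\; \mathbb{E}_{x,y}\big[\log D(x,y)\big] \;+\; \mathbb{E}_{x}\big[\log\big(1 - D(x, G(x))\big)\big]$$

Here $x$ denotes the grayscale input, $y$ its ground-truth color image, $G(x)$ the predicted colorization, and $D(x,\cdot)$ the discriminator's estimate that a grayscale/color pair is real.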
Our solution is built on a robust training pipeline using PyTorch and trained on a carefully curated dataset. The model not only achieves impressive results in generating realistic color images but also has broad applications across industries such as historical restoration, media production, creative design, and education. This project is a step forward in demonstrating the potential of AI to transform creative processes, automate complex workflows, and deliver solutions that are both scalable and impactful.
This report outlines the methodology, innovations, challenges, and future opportunities stemming from this project, offering a glimpse into how AI is redefining the boundaries of what's possible in image processing.
The primary objective of this project is to develop an AI-driven system capable of automatically colorizing grayscale images with high accuracy, realism, and efficiency. The goal is to minimize the need for human intervention while ensuring natural and vibrant results that can be applied across various domains, including historical restoration, media production, creative design, and education.
By leveraging Generative Adversarial Networks (GANs), the project aims to create a scalable and adaptable framework for image colorization that meets the demands of real-world applications.
The methodology for this project is structured around key phases, combining advanced deep learning techniques with a systematic training approach. At its core is the model architecture, which pairs two networks trained against each other:
1.1. Generator (U-Net Architecture):
An encoder-decoder network with skip connections that takes a single-channel grayscale image as input and predicts a three-channel color image, reusing encoder features in the decoder so that spatial detail is preserved.
1.2. Discriminator:
A convolutional classifier that receives the grayscale input concatenated with either the ground-truth or the generated color image and judges whether the pair looks real, providing the adversarial training signal (formalized below).
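These two networks are trained jointly. In the training script included at the end of this report, the generator minimizes a binary cross-entropy adversarial loss plus an L1 pixel loss weighted by $\lambda = 100$, and the discriminator averages its losses on real and generated pairs:

$$\mathcal{L}_{G} = \mathrm{BCE}\big(D(x, G(x)),\, 1\big) \;+\; \lambda\,\lVert G(x) - y \rVert_{1}, \qquad \lambda = 100$$

$$\mathcal{L}_{D} = \tfrac{1}{2}\Big[\mathrm{BCE}\big(D(x, y),\, 1\big) \;+\; \mathrm{BCE}\big(D(x, G(x)),\, 0\big)\Big]$$

The large pixel-loss weight keeps predicted colors close to the ground truth, while the adversarial term pushes the outputs toward the distribution of natural color images.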
This project introduces several groundbreaking innovations that redefine image colorization:
End-to-End Automation:
The system eliminates the need for manual effort, seamlessly transforming grayscale images into realistic, vibrant colorized versions.
Generative Adversarial Learning:
By leveraging the dynamic interplay between the generator and discriminator, the GAN-based approach iteratively refines the colorization quality, producing results that closely resemble naturally colored photographs.
Scalable Framework:
Built using PyTorch, the system is designed for scalability, enabling easy adaptation for enhanced features, diverse datasets, and real-world applications.
Preservation of Fine Details:
The U-Net architecture ensures that intricate textures and edges are preserved while applying color, maintaining the authenticity of the images; a minimal sketch of one such skip connection follows this list.
Generalization Across Use Cases:
The model's robust design allows it to perform well across diverse image sets, including historical photographs, artistic creations, and modern grayscale content.
These innovations highlight the transformative potential of the project in advancing image-processing technologies.
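For illustration, the fragment below is a simplified excerpt of the generator defined in the full training script later in this report. It shows how a decoder stage reuses the matching encoder feature map through an elementwise addition; the random tensors are stand-ins for real feature maps and are used only to make the snippet self-contained.

```python
import torch
import torch.nn as nn

# One encoder stage and the matching decoder stage from the U-Net generator.
enc3 = nn.Sequential(nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
dec3 = nn.Sequential(nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))

x2 = torch.randn(1, 128, 64, 64)          # feature map entering the encoder stage (stand-in)
x3 = enc3(x2)                              # encoder features at 32x32 resolution
bottleneck = torch.randn(1, 512, 16, 16)   # stand-in for the bottleneck output
up = dec3(bottleneck)                      # decoder features back at 32x32

# Addition-based skip connection: encoder detail is injected into the decoder path,
# which is how edges and textures stay aligned with the grayscale input.
fused = up + x3
print(fused.shape)  # torch.Size([1, 256, 32, 32])
```

This project uses addition-based skips rather than the channel concatenation of the original U-Net; the effect is the same in spirit, carrying high-resolution detail past the bottleneck.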
This project demonstrates significant practical applications across various industries and domains:
Historical Restoration:
Automates the revival of black-and-white photographs, film archives, and other historical content, preserving cultural and artistic heritage.
Media Production:
Streamlines workflows in the entertainment industry, enabling rapid and professional-grade colorization of grayscale footage.
Creative Tools for Artists:
Provides designers and artists with AI-powered tools to add color to grayscale content, enhancing creativity and productivity.
Education and Research:
Serves as a valuable resource for advancing studies in image processing, computer vision, and machine learning.
Advertising and Marketing:
Assists in creating visually appealing materials by converting black-and-white visuals into vivid imagery.
This versatile system empowers professionals and enthusiasts to harness AI for enhancing imagery, fostering creativity, and preserving history.
Challenge: Ensuring Generated Colors Are Consistent and Contextually Accurate
Challenge: Limited Diversity in the Dataset
Challenge: Training Stability of GANs
Challenge: Computational Resources and Efficiency
Addressing these challenges was essential to developing a reliable, high-quality image colorization system and to managing the difficulties inherent to GAN-based training; the settings most relevant to them are summarized below.
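For reference, the values below are taken from the training script at the end of this report; the mapping to each challenge is our reading of the code rather than an explicit design note. A strong L1 pixel-loss weight keeps colors anchored to the ground truth, the Adam settings follow common practice for stabilizing GAN training, and a fixed 256×256 resolution with a moderate batch size keeps memory use manageable.

```python
# Settings excerpted from the training script (see the code section below).
lr = 0.0002               # learning rate for both optimizers
betas = (0.5, 0.999)      # Adam momentum terms commonly used for stable GAN training
lambda_pixel = 100        # weight of the L1 pixel loss relative to the adversarial loss
image_size = (256, 256)   # all images are resized to a fixed resolution
batch_size = 16           # per-step batch size on a single GPU
```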
This project demonstrates the transformative power of AI-driven solutions in the field of image colorization. By leveraging Generative Adversarial Networks (GANs), we have developed a system capable of transforming grayscale images into vivid, lifelike color versions with minimal human intervention.
The project highlights the synergy of advanced neural architectures, such as U-Net and CNN-based discriminators, to deliver high-quality outputs. With its end-to-end automation, scalability, and adaptability, the system addresses a wide range of real-world applications, from historical restoration to creative media production and beyond.
This work is a testament to the capability of AI to bridge traditional challenges in image processing, opening new opportunities for innovation in both creative and professional domains.
```python
import os

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import DataLoader, Dataset
from torchvision import transforms
from PIL import Image

# Check for GPU
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")


# Define a custom dataset for paired grayscale and color images
class ColorizationDataset(Dataset):
    def __init__(self, black_dir, color_dir, transform=None):
        self.black_dir = black_dir
        self.color_dir = color_dir
        self.transform = transform
        self.image_names = os.listdir(black_dir)

    def __len__(self):
        return len(self.image_names)

    def __getitem__(self, idx):
        black_image_path = os.path.join(self.black_dir, self.image_names[idx])
        color_image_path = os.path.join(self.color_dir, self.image_names[idx])
        black_image = Image.open(black_image_path).convert("L")
        color_image = Image.open(color_image_path).convert("RGB")
        if self.transform:
            black_image = self.transform(black_image)
            color_image = self.transform(color_image)
        return black_image, color_image


class GeneratorUNet(nn.Module):
    def __init__(self):
        super(GeneratorUNet, self).__init__()
        # Encoder: each stage halves the spatial resolution
        self.enc1 = nn.Sequential(nn.Conv2d(1, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.enc2 = nn.Sequential(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.enc3 = nn.Sequential(nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Bottleneck
        self.bottleneck = nn.Sequential(nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Decoder: each stage doubles the spatial resolution
        self.dec3 = nn.Sequential(nn.ConvTranspose2d(512, 256, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec2 = nn.Sequential(nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        self.dec1 = nn.Sequential(nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1), nn.ReLU(inplace=True))
        # Output layer: maps back to 3 color channels
        self.output_layer = nn.ConvTranspose2d(64, 3, kernel_size=4, stride=2, padding=1)

    def forward(self, x):
        x1 = self.enc1(x)
        x2 = self.enc2(x1)
        x3 = self.enc3(x2)
        bottleneck = self.bottleneck(x3)
        x = self.dec3(bottleneck)
        x = self.dec2(x + x3)  # Skip connection
        x = self.dec1(x + x2)  # Skip connection
        return torch.tanh(self.output_layer(x + x1))  # Final skip connection; output in [-1, 1]


class Discriminator(nn.Module):
    def __init__(self):
        super(Discriminator, self).__init__()
        self.model = nn.Sequential(
            nn.Conv2d(4, 64, kernel_size=4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 256, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(256),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(256, 512, kernel_size=4, stride=2, padding=1),
            nn.BatchNorm2d(512),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(512, 1, kernel_size=4, padding=1),
        )

    def forward(self, img_A, img_B):
        # Ensure img_A and img_B have identical batch, height, and width dimensions
        if img_A.size(0) != img_B.size(0):
            raise ValueError(f"Batch sizes of img_A ({img_A.size(0)}) and img_B ({img_B.size(0)}) do not match.")
        if img_A.shape[2:] != img_B.shape[2:]:
            img_B = F.interpolate(img_B, size=(img_A.size(2), img_A.size(3)), mode='bilinear', align_corners=False)
        # Concatenate grayscale and color images along the channel dimension
        img_input = torch.cat((img_A, img_B), 1)
        return torch.sigmoid(self.model(img_input))


# Paths to the training folders
train_black_dir = r"F:\archive\data\train_black"
train_color_dir = r"F:\archive\data\train_color"

# Load the training data; Normalize scales both the grayscale and color tensors to [-1, 1],
# matching the tanh output range of the generator
transform = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),
])
train_dataset = ColorizationDataset(train_black_dir, train_color_dir, transform=transform)
train_loader = DataLoader(train_dataset, batch_size=16, shuffle=True)

# Instantiate models, loss functions, and optimizers
generator = GeneratorUNet().to(device)
discriminator = Discriminator().to(device)
optimizer_G = optim.Adam(generator.parameters(), lr=0.0002, betas=(0.5, 0.999))
optimizer_D = optim.Adam(discriminator.parameters(), lr=0.0002, betas=(0.5, 0.999))
criterion_GAN = nn.BCELoss()
criterion_pixel = nn.L1Loss()

# Training loop
num_epochs = 100  # Define the number of epochs
for epoch in range(num_epochs):
    for i, (black, color) in enumerate(train_loader):
        # Ensure `black` and `color` have matching batch sizes
        assert black.size(0) == color.size(0), f"Batch size mismatch: black({black.size(0)}), color({color.size(0)})"
        black, color = black.to(device), color.to(device)

        # Generate colorized images
        generated_color = generator(black)

        # The discriminator outputs a patch map; build real/fake label tensors of matching shape
        d_out_shape = discriminator(black, generated_color).shape
        valid = torch.ones(d_out_shape, requires_grad=False, device=device)
        fake = torch.zeros(d_out_shape, requires_grad=False, device=device)

        # Train the generator: adversarial loss plus a weighted L1 pixel loss
        optimizer_G.zero_grad()
        g_loss_gan = criterion_GAN(discriminator(black, generated_color), valid)
        g_loss_pixel = criterion_pixel(generated_color, color)
        g_loss = g_loss_gan + 100 * g_loss_pixel
        g_loss.backward()
        optimizer_G.step()

        # Resize defensively so grayscale and color tensors share spatial dimensions
        black_resized = F.interpolate(black, size=(color.size(2), color.size(3)), mode='bilinear', align_corners=False)
        generated_color_resized = F.interpolate(generated_color, size=(color.size(2), color.size(3)), mode='bilinear', align_corners=False)

        # Train the discriminator on real pairs and detached fake pairs
        optimizer_D.zero_grad()
        real_loss = criterion_GAN(discriminator(black_resized, color), valid)
        fake_loss = criterion_GAN(discriminator(black_resized, generated_color_resized.detach()), fake)
        d_loss = (real_loss + fake_loss) / 2
        d_loss.backward()
        optimizer_D.step()

        print(f"[Epoch {epoch}/{num_epochs}] [Batch {i}/{len(train_loader)}] "
              f"[D loss: {d_loss.item()}] [G loss: {g_loss.item()}]")

# Save the trained model weights
torch.save(generator.state_dict(), "generator.pth")
torch.save(discriminator.state_dict(), "discriminator.pth")
print("Model Training Completed")
```
```python
# Load the trained generator model
generator = GeneratorUNet().to(device)
generator.load_state_dict(torch.load("generator.pth", map_location=device))
generator.eval()

# Define the transformation for test images (mirrors the training pre-processing)
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
    transforms.Normalize([0.5], [0.5]),  # scale to [-1, 1], matching training
])


# Function to colorize a single grayscale image
def colorize_image(image_path, output_path):
    image = Image.open(image_path).convert("L")
    grayscale_image = transform(image).unsqueeze(0).to(device)

    # Generate the colorized image
    with torch.no_grad():
        colorized_image = generator(grayscale_image)

    # Post-process: map the tanh output from [-1, 1] back to [0, 1] and save
    colorized_image = colorized_image.squeeze(0).cpu()
    colorized_image = (colorized_image * 0.5 + 0.5).clamp(0, 1)
    colorized_image = transforms.ToPILImage()(colorized_image)
    colorized_image.save(output_path)
    print(f"Colorized image saved to {output_path}")


# Paths for the test and output folders
test_black_dir = r"C:\Users\Pictures\1black"  # your path (location)
output_dir = r"C:\Users\Pictures\2color"      # your path (location)
os.makedirs(output_dir, exist_ok=True)

# Loop through the test images and colorize them
for filename in os.listdir(test_black_dir):
    if filename.endswith(".jpg") or filename.endswith(".png"):
        image_path = os.path.join(test_black_dir, filename)
        output_path = os.path.join(output_dir, f"colorized_{filename}")
        colorize_image(image_path, output_path)
```
Sample results (images omitted): grayscale input image | colorized output image | final enhancement.