At our workplace, a bank provides credit to entrepreneurs engaged in gold mining within designated areas. However, these entrepreneurs risk losing their mining licenses over violations such as trespassing or environmental pollution, including improper waste disposal into water bodies and soil contamination. If a license is lost, the bank cannot recover the credit, leading to irretrievable losses. Detecting such violations is challenging because the satellite image data needed for accurate identification and monitoring is scarce.
Gold mining sites and the borders (in red) of multiple entrepreneurs
To address this issue, we aimed to develop a model capable of identifying unauthorized mining activities and environmental breaches. The primary challenge was the lack of sufficient labeled satellite images to train an effective detection system. To overcome this data scarcity, we employed advanced data augmentation techniques and leveraged Stable Diffusion to generate synthetic satellite images. Additionally, we integrated Low-Rank Adaptation (LoRA) to fine-tune Stable Diffusion on our data with minimal computational overhead.
This article explores the application of LoRA to generating high-quality satellite images with Stable Diffusion, and how those images, combined with advanced data augmentation techniques, improve a YOLO (You Only Look Once) object detection model.
By the end of this article, readers will understand how combining LoRA with YOLO and sophisticated augmentation methods can significantly enhance satellite image generation, making it more detailed, realistic, and computationally efficient.
Low-Rank Adaptation (LoRA) is a technique designed to improve the adaptability and efficiency of large pre-trained models. Traditional fine-tuning adjusts all model parameters, which is computationally intensive and requires substantial memory. LoRA introduces low-rank matrices to adapt the model's weights with minimal overhead, enabling efficient fine-tuning without retraining the entire model.
Advantages of LoRA:
In satellite image generation, LoRA allows fine-tuning pre-trained models to produce high-resolution images efficiently, enhancing model performance without excessive computational costs.
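To make the low-rank idea concrete, here is a minimal NumPy sketch of a single linear layer with a LoRA update. It is an illustration only, not the code used in the project: the layer size (320 x 320) is an assumption, while `r = 4` and `alpha = 4` mirror the LoRA configuration shown later in the article. The frozen weight `W` stays untouched and only the two small matrices `A` and `B` would be trained.

```python
import numpy as np


def lora_forward(x, W, A, B, alpha, r):
    """Apply a frozen weight W plus a scaled low-rank update B @ A."""
    # Equivalent to x @ (W + (alpha / r) * B @ A).T, without forming the full update.
    return x @ W.T + (alpha / r) * (x @ A.T) @ B.T


rng = np.random.default_rng(0)
d_out, d_in = 320, 320   # illustrative layer size (assumption)
r, alpha = 4, 4          # rank and scaling, as in the article's LoRA config

W = rng.normal(size=(d_out, d_in))           # frozen pre-trained weight
A = rng.normal(scale=0.01, size=(r, d_in))   # trainable, Gaussian init
B = np.zeros((d_out, r))                     # trainable, zero init: update starts at 0

x = rng.normal(size=(2, d_in))
y = lora_forward(x, W, A, B, alpha, r)

full_params = W.size            # parameters touched by full fine-tuning
lora_params = A.size + B.size   # parameters trained with LoRA
print(full_params, lora_params)
```

For this layer, LoRA trains 2,560 values instead of 102,400, which is where the memory and compute savings come from.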
By merging these technologies, the project aims to achieve superior performance in satellite image generation, making high-quality imagery more accessible and affordable.
Benefits of This Connection
The foundation of our project lies in the collection and preparation of satellite images relevant to gold mining activities. We sourced high-resolution satellite images from publicly available datasets and proprietary sources, focusing on areas designated for mining operations.
Key Steps in Data Preparation:
Image and Mask Loading: We used the `rasterio` library to handle geospatial data formats, loading both satellite images and their corresponding masks.
Filtering and Splitting: Due to the high resolution of satellite images, we divided them into smaller blocks of 512x512 pixels using custom scripts. This approach ensures manageable data sizes and maintains the integrity of the original imagery.
Saving Processed Data: The processed images and masks were systematically saved into designated directories for training, validation, and testing.
```python
import logging
import os

from PIL import Image
from tqdm import tqdm

# dataset_dir, output_dirs and filtered_threshold are assumed to be defined
# earlier in the full script.


# Function to divide and filter images
def divide_image(image, mask, split, image_basename, image_save_dir, mask_save_dir,
                 filtered_image_dir, filtered_mask_dir, threshold):
    # Detailed implementation can be seen on GitHub, hidden for readability
    ...


# Creating output directories
for split in ['train', 'val', 'test']:
    os.makedirs(os.path.join(output_dirs[split], 'images'), exist_ok=True)
    os.makedirs(os.path.join(output_dirs[split], 'masks'), exist_ok=True)
    os.makedirs(os.path.join(output_dirs[f'{split}_filtered'], 'images'), exist_ok=True)
    os.makedirs(os.path.join(output_dirs[f'{split}_filtered'], 'masks'), exist_ok=True)

# Processing and dividing images
for split in ['train', 'val', 'test']:
    image_dir = os.path.join(dataset_dir, split, 'images')
    mask_dir = os.path.join(dataset_dir, split, 'masks')
    output_image_dir = os.path.join(output_dirs[split], 'images')
    output_mask_dir = os.path.join(output_dirs[split], 'masks')
    filtered_image_dir = os.path.join(output_dirs[f'{split}_filtered'], 'images')
    filtered_mask_dir = os.path.join(output_dirs[f'{split}_filtered'], 'masks')

    for image_name in tqdm(os.listdir(image_dir), desc=f"Processing {split}"):
        if image_name.lower().endswith('.png'):
            image_path = os.path.join(image_dir, image_name)
            # Mask files share the image name with "image" replaced by "mask"
            mask_name = (image_name.replace("train_image", "train_mask")
                         .replace("val_image", "val_mask")
                         .replace("test_image", "test_mask"))
            mask_path = os.path.join(mask_dir, mask_name)

            if not os.path.exists(mask_path):
                logging.warning(f'Mask file not found: {mask_path}')
                continue

            try:
                image = Image.open(image_path).convert("RGB")
                mask = Image.open(mask_path).convert("L")
            except Exception as e:
                logging.error(f"Error opening {image_path} or {mask_path}: {e}")
                continue

            image_basename = os.path.splitext(image_name)[0]
            divide_image(image, mask, split, image_basename,
                         output_image_dir, output_mask_dir,
                         filtered_image_dir, filtered_mask_dir,
                         threshold=filtered_threshold)


# Logging and statistics
def print_folder_stats(output_dirs):
    # Detailed implementation can be seen on GitHub, hidden for readability
    ...


print_folder_stats(output_dirs)
```
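Since the body of `divide_image` is hidden above, the following is a rough sketch of what splitting into 512x512 blocks and filtering by mask content could look like. It is not the repository's implementation: the function name `tile_image`, the non-overlapping grid, the dropping of partial edge tiles, and the meaning of `threshold` (minimum fraction of non-zero mask pixels) are all assumptions made for illustration.

```python
import numpy as np


def tile_image(image, mask, tile=512, threshold=0.01):
    """Split an image/mask pair into non-overlapping square tiles.

    Returns two lists of (row, col, image_tile, mask_tile): tiles whose mask
    coverage is at least `threshold`, and tiles that fall below it.
    Partial tiles at the right and bottom edges are dropped.
    """
    with_objects, mostly_empty = [], []
    height, width = mask.shape[:2]
    for top in range(0, height - tile + 1, tile):
        for left in range(0, width - tile + 1, tile):
            image_tile = image[top:top + tile, left:left + tile]
            mask_tile = mask[top:top + tile, left:left + tile]
            # Fraction of labeled (non-zero) pixels in this tile
            coverage = np.count_nonzero(mask_tile) / mask_tile.size
            target = with_objects if coverage >= threshold else mostly_empty
            target.append((top // tile, left // tile, image_tile, mask_tile))
    return with_objects, mostly_empty


# Synthetic example: a 1024x1536 scene with one small labeled region
image = np.zeros((1024, 1536, 3), dtype=np.uint8)
mask = np.zeros((1024, 1536), dtype=np.uint8)
mask[0:100, 0:100] = 255

with_objects, mostly_empty = tile_image(image, mask)
print(len(with_objects), len(mostly_empty))  # prints: 1 5
```

Keeping the two groups separate makes it easy to decide later how many nearly empty tiles to include in training.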
Data augmentation is pivotal in enhancing the diversity and robustness of the training dataset, allowing models to generalize better to unseen data. Given the limited availability of real satellite images for gold mining detection, we implemented several advanced augmentation techniques using the `albumentations` library.
Augmentation Pipeline:
```python
from albumentations import (
    Compose,
    GridDistortion,
    ElasticTransform,
    Flip,
    Affine,
    RandomBrightnessContrast,
)

# Define augmentation pipeline
transform = Compose([
    GridDistortion(p=0.5),
    ElasticTransform(p=0.5),
    Flip(p=0.5),
    Affine(
        rotate=(-10, 10),
        translate_percent={"x": (-0.05, 0.05), "y": (-0.05, 0.05)},
        p=0.5
    ),
    RandomBrightnessContrast(p=0.5)
])


def augment_image(image):
    augmented = transform(image=image)
    return augmented['image']
```
This is what we get after applying the augmentations and pasting cropped objects to make the dataset more diverse:
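Pasting cropped objects, mentioned above, can be sketched as follows. This is a simplified illustration rather than the project's code: the function name `paste_object`, the rectangular paste without blending, and the class id are assumptions. The returned label uses the normalized `class x_center y_center width height` layout that YOLO training expects.

```python
import numpy as np


def paste_object(background, patch, class_id, rng):
    """Paste `patch` at a random position; return the new image and a YOLO label."""
    bg_h, bg_w = background.shape[:2]
    p_h, p_w = patch.shape[:2]
    if p_h > bg_h or p_w > bg_w:
        raise ValueError("patch must fit inside the background")

    # Random top-left corner such that the patch stays fully inside the image
    top = int(rng.integers(0, bg_h - p_h + 1))
    left = int(rng.integers(0, bg_w - p_w + 1))

    out = background.copy()
    out[top:top + p_h, left:left + p_w] = patch

    # YOLO label: class, x_center, y_center, width, height, normalized to [0, 1]
    label = (
        class_id,
        (left + p_w / 2) / bg_w,
        (top + p_h / 2) / bg_h,
        p_w / bg_w,
        p_h / bg_h,
    )
    return out, label


background = np.zeros((512, 512, 3), dtype=np.uint8)
patch = np.full((64, 32, 3), 255, dtype=np.uint8)   # stand-in for a cropped object
augmented, label = paste_object(background, patch, class_id=1,
                                rng=np.random.default_rng(42))
print(label)
```

In practice the patch would come from a labeled crop of a real tile, and blending the edges usually gives more natural results than a hard rectangular paste.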
Training the YOLO model involved several key steps to ensure effective detection of unauthorized mining activities and environmental breaches.
Model Initialization: We initialized the YOLO model with pre-trained weights to leverage existing knowledge, accelerating the training process and enhancing initial performance.
Data Integration: Combined real and synthetic (augmented) satellite images to create a balanced and comprehensive training dataset. This approach mitigates data scarcity and enhances the model's ability to generalize across diverse scenarios.
Training Configuration: Configured training parameters, including learning rate, batch size, and number of epochs, to optimize model performance.
Training Loop: Executed the training process, monitoring per-class performance metrics such as box precision (P) and mAP50. The training aimed to maximize these metrics, ensuring robust detection capabilities.
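The steps above map onto only a few lines with the Ultralytics API. The sketch below is a hedged example, not our exact training script: the dataset file `mining.yaml`, the `yolov8n.pt` checkpoint size, and every hyperparameter value are placeholders chosen for illustration.

```python
def build_train_args(data_yaml="mining.yaml"):
    """Collect training settings; all values here are illustrative assumptions."""
    return {
        "data": data_yaml,   # hypothetical dataset config: image paths and class names
        "epochs": 100,       # number of passes over the training set
        "imgsz": 512,        # matches the 512x512 tiles
        "batch": 16,         # batch size
        "lr0": 0.01,         # initial learning rate
    }


def train(weights="yolov8n.pt"):
    """Fine-tune a pre-trained YOLOv8 checkpoint and validate it."""
    # Imported lazily so the settings can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)                  # model initialization from pre-trained weights
    model.train(**build_train_args())      # training loop
    metrics = model.val()                  # per-class precision and mAP50 on the val split
    return model, metrics


print(build_train_args())
```

Calling `train()` starts the actual run; it requires the `ultralytics` package and a dataset config that lists both the real and the synthetic images.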
To evaluate the model's effectiveness, we assessed its performance using standard metrics. The results are summarized below:
Category | Box (P) | mAP50 |
---|---|---|
All | 0.81 | 0.80 |
Clouds | 0.74 | 0.97 |
Gold | 0.89 | 0.70 |
Poisonous Water | 0.68 | 0.665 |
Water | 0.89 | 0.86 |
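For readers unfamiliar with these metrics: box precision is the share of predicted boxes that are correct, and mAP50 averages precision over recall levels and classes while counting a prediction as correct when its intersection over union (IoU) with a ground-truth box is at least 0.5. The following simplified sketch shows the IoU test and a precision computation at that threshold. It is not how the YOLO tooling computes its report: it ignores confidence thresholds and the precision-recall curve, and the boxes are made up.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def precision_at_iou(predictions, ground_truth, threshold=0.5):
    """Fraction of predictions matched to a distinct ground-truth box."""
    matched = set()
    true_positives = 0
    for pred in predictions:  # assumed sorted by confidence, highest first
        best_idx, best_iou = None, threshold
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue  # each ground-truth box can be matched only once
            score = iou(pred, gt)
            if score >= best_iou:
                best_idx, best_iou = idx, score
        if best_idx is not None:
            matched.add(best_idx)
            true_positives += 1
    return true_positives / len(predictions) if predictions else 0.0


ground_truth = [(0, 0, 10, 10), (20, 20, 30, 30)]
predictions = [(0, 0, 10, 10), (21, 21, 31, 31), (50, 50, 60, 60)]
precision = precision_at_iou(predictions, ground_truth)
print(round(precision, 3))
```

Here two of the three predictions overlap a ground-truth box with IoU of at least 0.5, so precision is 2/3.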
Combining LoRA-based fine-tuning of Stable Diffusion with the YOLO model improved our ability to generate high-resolution satellite images tailored to our requirements and to detect violations in them. This integration involved training the Stable Diffusion model on our dataset to generate synthetic satellite images, manually annotating these images, and subsequently retraining the YOLO model with the enriched dataset. The training process was adapted from the book "Using Stable Diffusion with Python. Leverage Python to control and automate high-quality AI image generation using Stable Diffusion" by Andrew Zhu (Shudong Zhu), which also details the inference optimization techniques we employed.
To generate realistic and context-specific satellite images, we fine-tuned the Stable Diffusion model using our curated dataset of satellite images related to gold mining activities. The fine-tuning process included the following steps:
Dataset Preparation: Collected and organized a comprehensive set of high-resolution satellite images from designated gold mining areas. Ensured diversity in environmental conditions and mining scenarios to improve the model's generalization capabilities.
Model Fine-Tuning: Leveraged the `diffusers` library to fine-tune the Stable Diffusion model on our dataset. This process involved configuring hyperparameters such as learning rate, batch size, and number of epochs to optimize image generation quality.
LoRA Configuration: Applied LoRA to the Stable Diffusion model to enable efficient fine-tuning with minimal computational overhead. This adaptation allowed us to enhance the model's performance without extensive retraining.
Key Code Snippet for Training Stable Diffusion with LoRA:
```python
import torch
from diffusers import StableDiffusionPipeline, DDPMScheduler
from peft import LoraConfig
from peft.utils import get_peft_model_state_dict
from accelerate import Accelerator

# Initialize Accelerator
accelerator = Accelerator(gradient_accumulation_steps=4, mixed_precision="no")
device = accelerator.device

# Define LoRA configuration
lora_config = LoraConfig(
    r=4,
    lora_alpha=4,
    init_lora_weights="gaussian",
    target_modules=["to_k", "to_q", "to_v", "to_out.0"]
)

# Load pre-trained Stable Diffusion model
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float32
).to(device)

# Add LoRA adapter to the Unet component
pipe.unet.add_adapter(lora_config)
print("LoRA adapter added to Unet.")
```
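Once the adapter is trained, generating a synthetic tile could look like the sketch below. It assumes the LoRA weights were saved in a format that `load_lora_weights` in `diffusers` can read; the function name `generate_tile`, the directory, the prompt, and the sampling settings are placeholders, not the values used in the project.

```python
# Sampling settings: illustrative values, not tuned for this dataset
GENERATION_KWARGS = {
    "num_inference_steps": 30,
    "guidance_scale": 7.5,
    "height": 512,   # matches the 512x512 training tiles
    "width": 512,
}


def generate_tile(prompt, lora_dir, out_path,
                  base_model="runwayml/stable-diffusion-v1-5"):
    """Generate one synthetic satellite tile with the fine-tuned LoRA weights."""
    # Imported lazily so the settings can be inspected without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    device = "cuda" if torch.cuda.is_available() else "cpu"
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model, torch_dtype=torch.float32
    ).to(device)
    pipe.load_lora_weights(lora_dir)   # hypothetical directory holding the trained adapter
    image = pipe(prompt, **GENERATION_KWARGS).images[0]
    image.save(out_path)
    return out_path


print(GENERATION_KWARGS)
```

A call such as `generate_tile("satellite image of a gold mining site near a river", "lora_out", "tile.png")` would then write one image to disk; the prompt wording here is only an example.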
Example of generated image:
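To evaluate the effect of the LoRA-generated images on the YOLO model, we compared the performance metrics before and after the integration across all categories. Across the three result tables below, overall box precision rises from 0.80 to 0.81 to 0.827 and overall mAP50 from 0.79 to 0.80 to 0.818; the second table matches the results reported earlier for the model trained on real and augmented images.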
Comparative Analysis of YOLO Configurations

Category | Box (P) | mAP50 |
---|---|---|
All | 0.80 | 0.79 |
Clouds | 0.725 | 0.959 |
Gold | 0.87 | 0.67 |
Poisonous Water | 0.663 | 0.6 |
Water | 0.875 | 0.86 |

Category | Box (P) | mAP50 |
---|---|---|
All | 0.81 | 0.80 |
Clouds | 0.74 | 0.97 |
Gold | 0.89 | 0.70 |
Poisonous Water | 0.68 | 0.665 |
Water | 0.89 | 0.86 |

Category | Box (P) | mAP50 |
---|---|---|
All | 0.827 | 0.818 |
Clouds | 0.77 | 0.995 |
Gold | 0.913 | 0.718 |
Poisonous Water | 0.706 | 0.67 |
Water | 0.921 | 0.891 |
In this article, we explored the integration of Low-Rank Adaptation (LoRA) with the YOLO model and advanced augmentation techniques to enhance satellite image generation. By leveraging Stable Diffusion, we achieved the generation of high-resolution, realistic satellite images with improved performance metrics. Our approach not only demonstrates the efficacy of LoRA in fine-tuning large models but also underscores the importance of robust data augmentation in machine learning workflows.
The combination of these technologies paves the way for more efficient and scalable satellite image generation solutions, with potential applications spanning environmental monitoring, urban planning, and compliance enforcement. As we continue to refine these methods, future research will focus on overcoming current limitations and expanding the applicability of our approach to broader domains.
For more technical details and access to the project code, please visit our GitHub repository: https://github.com/jettooss/stable_diffusion_satellite_images/tree/main
The overall result is a complete application with a Leaflet-based frontend. Here you can see how we detect violations in the Russian Far East: https://drive.google.com/file/d/1EztfKvIE9mN8j3p1MvQRDZTYIZV9MU95/view?usp=sharing
If you want to download the dataset, please click Download in Google Drive, and we will then grant you access.
A1: YOLOv8 (You Only Look Once version 8) was chosen for its real-time object detection capabilities, high accuracy, and versatility. Additionally, I have extensive experience with YOLOv8, not only at work but also in hackathons, where it consistently delivered strong results even on difficult datasets. This familiarity and proven performance made YOLOv8 the ideal choice for detecting unauthorized mining activities and environmental breaches in satellite images.
A2: Integrating LoRA with YOLOv8 and Stable Diffusion proved to be quite challenging. The process was complex and required a deep understanding of both model architectures and their training pipelines. To overcome these difficulties, we referred to the book by Andrew Zhu (Shudong Zhu), "Using Stable Diffusion with Python. Leverage Python to control and automate high-quality AI image generation using Stable Diffusion." This resource was instrumental in implementing the training part of the integration, enabling us to fine-tune the models effectively.
A3: Yes, the methodology of integrating LoRA with YOLOv8 and using advanced augmentation techniques can be applied to various domains that require high-quality image generation and object detection, such as environmental monitoring, urban planning, agriculture, and disaster management. The flexibility and scalability of this approach make it adaptable to different scenarios and datasets, allowing for broad applicability across diverse fields.
A4: We ensured the quality of synthetic images by combining quantitative and qualitative evaluation methods. After generating the images with Stable Diffusion, we assessed them by analyzing key performance metrics and conducting visual inspections. This dual approach helps ensure that the synthetic images accurately reflect potential violations and environmental conditions, maintaining high standards of reliability and realism in the training dataset.
A5: The inference optimization techniques employed are detailed in the guide "Using Stable Diffusion with Python. Leverage Python to control and automate high-quality AI image generation using Stable Diffusion." These techniques include efficient model loading, leveraging GPU acceleration, and implementing memory management strategies to enhance the speed and efficiency of image generation. By following these optimized practices, we ensured that the Stable Diffusion model operates effectively within our computational constraints.
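As a rough illustration of the kinds of optimizations named in A5, the sketch below loads the pipeline in half precision when a GPU is available and enables two memory-saving options that `diffusers` provides. It is not a reproduction of the book's recipes or of our deployment code; the helper names `pick_dtype_name` and `load_optimized_pipeline` and the choice of options are assumptions.

```python
def pick_dtype_name(has_gpu):
    """Half precision on GPU saves memory; CPU inference stays in float32."""
    return "float16" if has_gpu else "float32"


def load_optimized_pipeline(base_model="runwayml/stable-diffusion-v1-5"):
    """Load Stable Diffusion with memory-saving options enabled."""
    # Imported lazily so the helper above works without torch/diffusers installed.
    import torch
    from diffusers import StableDiffusionPipeline

    has_gpu = torch.cuda.is_available()
    pipe = StableDiffusionPipeline.from_pretrained(
        base_model, torch_dtype=getattr(torch, pick_dtype_name(has_gpu))
    )
    pipe.enable_attention_slicing()       # compute attention in slices to lower peak memory
    if has_gpu:
        pipe.enable_model_cpu_offload()   # keep idle submodules on the CPU (needs accelerate)
    return pipe


print(pick_dtype_name(True), pick_dtype_name(False))
```

These options trade a little speed for a smaller memory footprint, which matters most when generation shares a GPU with other workloads.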