Stable Diffusion has revolutionized image generation and gained immense popularity across many domains. Training a LoRA, however, requires a large dataset of images paired with captions, and building such a dataset by hand takes a significant amount of time. This project introduces a tool that automatically creates datasets for the desired anime characters. With it, creators can focus on innovation rather than laborious data collection, unlocking new possibilities in AI-driven art.
LoRA Dataset Automaker is a Jupyter Notebook that facilitates the creation of datasets for training Stable Diffusion LoRAs of anime characters.
Key features:
Duplicate images are filtered with the pre-trained clip-vit-base32-torch model. It generates a vector embedding for each image that represents the image's features. The cosine similarity between these embeddings measures how similar each image is to every other image. The result is a symmetric similarity matrix in which each off-diagonal value is the similarity between two images (the diagonal entries, each image's similarity to itself, are ignored). Images whose similarity exceeds a defined threshold are marked as duplicates: the first image of each duplicate set is kept, and the rest are tagged as "delete".
The image illustrates the architecture of a CLIP model.
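The duplicate-marking step can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the notebook's code: the embeddings are assumed to be already computed (one row per image, for example from a CLIP image encoder), the 0.95 threshold is a hypothetical placeholder because the project's value is not given, and a duplicate set is interpreted as "every later image that is too similar to an earlier image that was kept".

```python
import numpy as np

def mark_duplicates(embeddings: np.ndarray, threshold: float = 0.95) -> list:
    """Tag each image 'keep' or 'delete' using pairwise cosine similarity.

    embeddings: (N, D) array, one image embedding per row (computed elsewhere).
    threshold: hypothetical similarity cutoff; the project's value is not stated.
    """
    # Normalize rows so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T                # (N, N) symmetric similarity matrix
    np.fill_diagonal(sim, -np.inf)     # exclude each image's similarity to itself

    tags = ["keep"] * len(embeddings)
    for i in range(len(embeddings)):
        if tags[i] == "delete":
            continue                   # already part of an earlier duplicate set
        # Later images too similar to image i join its set; image i stays as the first one.
        for j in np.nonzero(sim[i, i + 1:] > threshold)[0] + i + 1:
            tags[j] = "delete"
    return tags

emb = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(mark_duplicates(emb))  # ['keep', 'delete', 'keep']
```

Scanning only the upper triangle (`j > i`) is what guarantees that the first image of each duplicate set is the one kept.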
YOLOv5 anime models were used for face detection. According to their GitHub page, their performance on a Tesla P100 is as follows:
Model | Images | Targets | Precision | Recall | mAP@0.5 | mAP@0.5:0.95 | Inference (ms) | NMS (ms) | Total (ms) |
---|---|---|---|---|---|---|---|---|---|
yolov5x_anime | 655 | 873 | 0.964 | 0.95 | 0.947 | 0.518 | 22.6 | 1.5 | 24.1 |
yolov5s_anime | 655 | 873 | 0.959 | 0.955 | 0.953 | 0.582 | 3.4 | 1.3 | 4.6 |
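The NMS column in the table refers to non-maximum suppression, the post-processing step that discards overlapping detections of the same face so that each face yields one bounding box. The sketch below is a generic greedy NMS in NumPy meant only to show what that step does; it is not the yolov5 repository's implementation, and the 0.45 IoU threshold is an assumed default.

```python
import numpy as np

def iou(box: np.ndarray, boxes: np.ndarray) -> np.ndarray:
    """Intersection over union of one box with many boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)

def nms(boxes: np.ndarray, scores: np.ndarray, iou_threshold: float = 0.45) -> list:
    """Greedy NMS: return indices of kept boxes, highest confidence first."""
    order = np.argsort(scores)[::-1]   # indices sorted by descending score
    keep = []
    while order.size > 0:
        best = order[0]
        keep.append(int(best))
        rest = order[1:]
        # Drop remaining boxes that overlap the chosen box too much.
        order = rest[iou(boxes[best], boxes[rest]) <= iou_threshold]
    return keep

boxes = np.array([[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]], dtype=float)
print(nms(boxes, np.array([0.9, 0.8, 0.7])))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.82, so only the higher-scoring one survives; the third box does not overlap and is kept.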
A Siamese EfficientNet-B0 model filters images based on facial similarity to the desired character. Similarity is measured as the pairwise distance between the output vectors of two images: the smaller the distance, the more similar the faces. Triplet margin loss was used as the loss function.
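The distance, the loss, and the filtering decision can be written out in a few lines of NumPy. This is a minimal sketch, assuming the face embeddings have already been produced by the EfficientNet-B0 branch; the Euclidean distance (p = 2) and the margin of 1.0 are assumptions that match the defaults of PyTorch's `nn.TripletMarginLoss`, not values confirmed by the project. The triplet margin loss for an anchor $a$, positive $p$ and negative $n$ is $\max(d(a, p) - d(a, n) + \text{margin}, 0)$.

```python
import numpy as np

def pairwise_distance(x1: np.ndarray, x2: np.ndarray) -> np.ndarray:
    """Euclidean distance between corresponding rows of two (N, D) arrays."""
    return np.linalg.norm(x1 - x2, axis=1)

def triplet_margin_loss(anchor, positive, negative, margin: float = 1.0) -> float:
    """Mean of max(d(a, p) - d(a, n) + margin, 0) over a batch of triplets."""
    d_ap = pairwise_distance(anchor, positive)   # should become small
    d_an = pairwise_distance(anchor, negative)   # should become large
    return float(np.mean(np.maximum(d_ap - d_an + margin, 0.0)))

def same_character(emb_a, emb_b, threshold: float) -> np.ndarray:
    """Predict 'same character' when the embedding distance is below the threshold."""
    return pairwise_distance(emb_a, emb_b) < threshold

a = np.array([[0.0, 0.0]])
print(triplet_margin_loss(a, np.array([[3.0, 4.0]]), np.array([[6.0, 8.0]])))  # 0.0
print(triplet_margin_loss(a, np.array([[3.0, 4.0]]), np.array([[0.0, 5.0]])))  # 1.0
```

The loss is zero once the negative is at least `margin` farther from the anchor than the positive, which is what pushes faces of different characters apart in the embedding space.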
JoyTag is used to tag filtered images.
The Siamese model was trained and evaluated on the following triplet datasets:

Dataset | Triplets | Accuracy | F1 | Precision | Recall | Best threshold |
---|---|---|---|---|---|---|
Train | 22993 | 0.93265 | 0.93388 | 0.91723 | 0.95116 | 35.48 |
Test | 7926 | 0.88540 | 0.88695 | 0.88982 | 0.88410 | 31.68 |

Predictions on 20 random images from the test dataset (17 of 20 correct).
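The metrics and the "best threshold" above can be reproduced in principle with a simple threshold sweep over the pair distances. This is a sketch under assumptions: each triplet is assumed to yield one same-character pair (anchor, positive) and one different-character pair (anchor, negative), a pair is predicted as "same" when its distance is at or below the threshold, and the best threshold is assumed to be the one that maximizes accuracy, since the selection criterion is not stated.

```python
import numpy as np

def metrics_at(distances: np.ndarray, labels: np.ndarray, threshold: float) -> dict:
    """Accuracy, precision, recall and F1 when 'distance <= threshold' means same character.

    labels: 1 for a same-character pair, 0 for a different-character pair.
    """
    pred = distances <= threshold
    truth = labels.astype(bool)
    tp = int(np.sum(pred & truth))
    fp = int(np.sum(pred & ~truth))
    fn = int(np.sum(~pred & truth))
    tn = int(np.sum(~pred & ~truth))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": (tp + tn) / len(truth), "precision": precision,
            "recall": recall, "f1": f1}

def best_threshold(distances: np.ndarray, labels: np.ndarray) -> float:
    """Try every observed distance as the cutoff and keep the most accurate one (assumed criterion)."""
    candidates = np.unique(distances)
    return float(max(candidates, key=lambda t: metrics_at(distances, labels, t)["accuracy"]))

d = np.array([10.0, 20.0, 30.0, 40.0, 50.0])
y = np.array([1, 1, 1, 0, 0])
print(best_threshold(d, y))  # 30.0
```

As a sanity check on the reported numbers, F1 = 2PR / (P + R) reproduces the F1 scores in the table up to rounding; for the train set, 2 × 0.91723 × 0.95116 / (0.91723 + 0.95116) ≈ 0.9339.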
Examples of parsed datasets for different anime: https://drive.google.com/drive/folders/1QmGk8vdOLigLIft6rjCgjzdUAbmgVxuT?usp=sharing
GitHub repository: https://github.com/Maximax67/LoRA-Dataset-Automaker
Google Colab link for the Dataset Automaker notebook: https://colab.research.google.com/github/Maximax67/LoRA-Dataset-Automaker/blob/main/Dataset_Automaker.ipynb
The LoRA Dataset Automaker streamlines the creation of high-quality datasets for training Stable Diffusion models, automating tasks like image collection, filtering, and tagging. This project makes dataset generation more accessible and efficient.