Defect detection and classification in manufacturing components are critical for ensuring product quality and operational efficiency. This project addresses the problem of identifying and highlighting defects in screws and then classifying the type of defect. We fine-tuned the Mask2Former semantic segmentation model on the screw subset of the MVTec AD dataset to detect defective regions, and subsequently employed the Qwen2-VL-7B-Instruct model for defect classification using few-shot prompting. The approach identified and categorized various screw defects, reaching a validation mIoU of 0.85 for segmentation and 82% accuracy for defect classification. These results highlight the potential of combining advanced segmentation and vision-language models for automated quality control in manufacturing.
The quality control of mechanical components such as screws is critical in industries ranging from manufacturing to aerospace. Defects in screws, such as thread misalignment, cracks, or corrosion, can lead to product failure or safety hazards. Detecting these defects manually is time-consuming and prone to human error, particularly when dealing with large-scale production.
Recent advances in computer vision and natural language processing (NLP) offer an opportunity to automate defect detection and classification. This project combines Mask2Former, a state-of-the-art segmentation model, with Qwen2-VL-7B-Instruct, a multimodal large language model, to identify and classify defects in screws. The primary goal is to detect the location of defects in screw images and provide an accurate classification of the defect type.
We propose a two-stage approach: first, a segmentation model (Mask2Former) is fine-tuned to identify defective areas in screw images, and then a vision-language model (Qwen2-VL) is used with few-shot prompting to classify these defects based on the segmented mask. Our findings show that this combination of computer vision and NLP techniques offers a robust solution for screw defect detection, with significant potential for deployment in industrial settings.
• Dataset Selection: Utilized the screw subset of the MVTec AD dataset, comprising high-resolution images of screws with corresponding binary masks indicating defects.
• Data Splitting: Divided the dataset into training (80%) and validation (20%) sets to ensure robust evaluation of the model's performance.
• Model Selection: Chose Mask2Former for its state-of-the-art performance in semantic segmentation tasks.
• Fine-Tuning: Fine-tuned the Mask2Former model on the training dataset to adapt it to the specific task of defect detection in screws.
• Data Augmentation: Applied transformations such as resizing, horizontal flipping, brightness and contrast adjustments, and rotations to enhance the model's generalization capabilities.
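A minimal sketch of the data handling and augmentation described above is shown below. It assumes the MVTec AD screw images and binary defect masks are stored as paired files on disk; the file layout, image size, and augmentation parameters are illustrative rather than the project's exact values.

```python
# Hypothetical data pipeline sketch: paths, resolution, and augmentation
# probabilities are assumptions, not the project's exact configuration.
import cv2
import albumentations as A
from albumentations.pytorch import ToTensorV2
from torch.utils.data import Dataset

# Resize, horizontal flip, brightness/contrast jitter, rotation, normalization.
train_transform = A.Compose([
    A.Resize(512, 512),
    A.HorizontalFlip(p=0.5),
    A.RandomBrightnessContrast(p=0.3),
    A.Rotate(limit=15, p=0.5),
    A.Normalize(mean=(0.485, 0.456, 0.406), std=(0.229, 0.224, 0.225)),
    ToTensorV2(),
])

class ScrewDefectDataset(Dataset):
    """Pairs each screw image with its binary defect mask."""

    def __init__(self, image_paths, mask_paths, transform=None):
        self.image_paths = image_paths
        self.mask_paths = mask_paths
        self.transform = transform

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, idx):
        image = cv2.cvtColor(cv2.imread(self.image_paths[idx]), cv2.COLOR_BGR2RGB)
        mask = cv2.imread(self.mask_paths[idx], cv2.IMREAD_GRAYSCALE)
        mask = (mask > 0).astype("uint8")  # 0 = background, 1 = defect
        if self.transform is not None:
            # Albumentations applies the same spatial transforms to image and mask.
            augmented = self.transform(image=image, mask=mask)
            image, mask = augmented["image"], augmented["mask"]
        return image, mask
```

The 80/20 training/validation split can then be obtained by partitioning the image and mask path lists (for example with sklearn.model_selection.train_test_split) before constructing the two dataset instances.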
• Model Integration: Employed the Qwen2-VL-7B-Instruct model for classifying the type of defect based on the segmented defective regions.
• Few-Shot Learning: Provided the model with two examples of each defect type to facilitate accurate classification on new images.
• Pipeline Integration: Combined the segmentation and classification models to create an end-to-end system that first identifies defective regions and then classifies the defect type.
• Defect Localization: Applied the fine-tuned Mask2Former model to new screw images to generate segmentation masks highlighting defective areas.
• Segmentation Overlay: Overlaid the segmentation masks on the original images to visually indicate defects (sketched in the example after this list).
• Defect Classification: Utilized the Qwen2-VL model to analyze the overlaid images and classify the type of defect present.
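The localization and overlay steps above can be sketched as follows, assuming a fine-tuned Mask2Former checkpoint saved in the Hugging Face transformers format; the checkpoint path, input file name, and defect class id are placeholders.

```python
# Inference-and-overlay sketch; paths and the defect class id are placeholders.
import numpy as np
import torch
from PIL import Image
from transformers import AutoImageProcessor, Mask2FormerForUniversalSegmentation

checkpoint = "path/to/finetuned-mask2former"          # assumed local checkpoint
processor = AutoImageProcessor.from_pretrained(checkpoint)
model = Mask2FormerForUniversalSegmentation.from_pretrained(checkpoint).eval()

image = Image.open("screw.png").convert("RGB")
inputs = processor(images=image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the outputs into a per-pixel class map at the original resolution.
seg_map = processor.post_process_semantic_segmentation(
    outputs, target_sizes=[image.size[::-1]]          # (height, width)
)[0]

# Paint the predicted defect pixels (class id 1 here) red and blend with the input.
overlay = np.array(image).copy()
overlay[seg_map.numpy() == 1] = [255, 0, 0]
blended = Image.blend(image, Image.fromarray(overlay), alpha=0.5)
blended.save("screw_overlay.png")
```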
• Developed custom dataset classes and data loaders using PyTorch to handle image and mask data efficiently.
• Implemented transformations using Albumentations to perform data augmentation and normalization.
• Configured the Mask2Former model with appropriate hyperparameters, including learning rate and batch size.
• Employed the AdamW optimizer and a MultiStepLR scheduler to optimize the training process (see the configuration sketch after this list).
• Monitored training and validation loss, as well as mean Intersection over Union (mIoU) metrics to evaluate model performance.
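The training configuration described above can be sketched as follows. The starting checkpoint, label mapping, learning rate, milestones, and epoch count are assumed values, and the data loader is assumed to yield batches already prepared by the Mask2Former image processor.

```python
# Fine-tuning sketch for Mask2Former; hyperparameter values are illustrative.
import torch
from torch.optim import AdamW
from torch.optim.lr_scheduler import MultiStepLR
from transformers import Mask2FormerForUniversalSegmentation

id2label = {0: "background", 1: "defect"}
model = Mask2FormerForUniversalSegmentation.from_pretrained(
    "facebook/mask2former-swin-tiny-ade-semantic",   # assumed starting checkpoint
    id2label=id2label,
    label2id={v: k for k, v in id2label.items()},
    ignore_mismatched_sizes=True,                    # replace the pretrained class head
)

optimizer = AdamW(model.parameters(), lr=5e-5, weight_decay=1e-4)
scheduler = MultiStepLR(optimizer, milestones=[20, 35], gamma=0.1)

# `train_loader` is assumed to yield batches prepared by Mask2FormerImageProcessor
# (keys: pixel_values, mask_labels, class_labels); its construction is omitted here.
num_epochs = 50
for epoch in range(num_epochs):
    model.train()
    for batch in train_loader:
        outputs = model(
            pixel_values=batch["pixel_values"],
            mask_labels=batch["mask_labels"],
            class_labels=batch["class_labels"],
        )
        loss = outputs.loss
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    scheduler.step()
    # Validation loss and mIoU (e.g. with evaluate.load("mean_iou")) are
    # computed here each epoch to track convergence and generalization.
```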
• Leveraged the Qwen2-VL model's ability to process multimodal inputs, integrating both image and textual data for accurate defect classification.
• Designed a prompt with examples to guide the model in categorizing defects into predefined categories.
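A minimal sketch of the prompt-based classification described above is given below, following the standard transformers interface for Qwen2-VL-7B-Instruct. The example image paths, prompt wording, and number of in-context examples shown are placeholders for the project's own (the project used two examples per defect type).

```python
# Few-shot defect classification sketch with Qwen2-VL; file names and prompt
# wording are placeholders. Requires `pip install qwen-vl-utils` for
# process_vision_info, as in the model card's reference usage.
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from qwen_vl_utils import process_vision_info

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "Qwen/Qwen2-VL-7B-Instruct", torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# Labelled overlay examples precede the query image (few-shot prompt).
messages = [{
    "role": "user",
    "content": [
        {"type": "image", "image": "examples/scratch_neck_overlay.png"},
        {"type": "text", "text": "Defect type: scratch neck"},
        {"type": "image", "image": "examples/thread_side_overlay.png"},
        {"type": "text", "text": "Defect type: thread side"},
        {"type": "image", "image": "query_overlay.png"},
        {"type": "text", "text": (
            "The highlighted region marks a defect. Classify it as one of: "
            "manipulated front, scratch head, scratch neck, thread side, thread top."
        )},
    ],
}]

text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
image_inputs, video_inputs = process_vision_info(messages)
inputs = processor(
    text=[text], images=image_inputs, videos=video_inputs,
    padding=True, return_tensors="pt",
).to(model.device)

output_ids = model.generate(**inputs, max_new_tokens=32)
answer = processor.batch_decode(
    output_ids[:, inputs.input_ids.shape[1]:], skip_special_tokens=True
)[0]
print(answer)  # e.g. "thread side"
```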
The training and validation performance of the Mask2Former model is depicted in the attached graphs showing loss and mean Intersection over Union (mIoU) over epochs.
The training loss decreased steadily across epochs, indicating the model's convergence as it learned to segment defective regions. The validation loss also showed a consistent downward trend initially but experienced fluctuations in later epochs. This behavior suggests that while the model generalized well to unseen data, there were some variations likely caused by differences in defect complexity in the validation set. The final training loss was 0.045, and the validation loss was 0.050, reflecting minimal overfitting.
The mIoU metric, which measures the overlap between predicted segmentation masks and ground truth, demonstrated progressive improvement throughout the training process. The training mIoU steadily increased and stabilized at 0.88, while the validation mIoU achieved a comparable value of 0.85, indicating robust segmentation performance. Despite some fluctuations in validation mIoU, the overall trend showcases the model's capability to generalize well to new data.
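For reference, the IoU for a class c compares the predicted pixel set P_c with the ground-truth pixel set G_c, and mIoU averages over the C classes:

```latex
\mathrm{IoU}_c = \frac{|P_c \cap G_c|}{|P_c \cup G_c|},
\qquad
\mathrm{mIoU} = \frac{1}{C} \sum_{c=1}^{C} \mathrm{IoU}_c
```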
To demonstrate the improvements achieved through fine-tuning the Mask2Former model, we present a visual comparison of the segmentation results. The "Before Fine-Tuning" image shows the initial performance of the pre-trained model on an example screw, while the "After Fine-Tuning" image highlights the refined segmentation achieved after adapting the model to the MVTec dataset.
The pre-trained Mask2Former model struggles to precisely segment defective regions, producing incomplete or incorrect masks. After fine-tuning, the model demonstrates a significant improvement, accurately identifying defective regions and producing well-defined segmentation masks.
Before Fine-Tuning:
After Fine-Tuning:
Upon fine-tuning, the Mask2Former model effectively identified defective regions in screws, overlaying segmentation masks with high precision. The subsequent classification using Qwen2-VL accurately categorized defects into predefined types based on the highlighted regions.
• Classification Accuracy: 82%
• Confusion Matrix:
This project developed an automated system for detecting and classifying defects in screws using the Mask2Former and Qwen2-VL models. The fine-tuned Mask2Former model accurately segmented defective regions, reaching a validation mIoU of 0.85, while the Qwen2-VL model classified the defect type with 82% accuracy. The integration of these models demonstrates the potential of advanced AI techniques for enhancing quality control processes in manufacturing.