This notebook explores and compares various Explainable AI (XAI) techniques—LIME, Anchors, and SHAP—to explain the predictions made by a pre-trained ResNet34 model for image classification. Specifically, a sample bus image is used, which the model misclassifies as a passenger car, and these XAI methods are applied to understand why the model made this incorrect prediction.
The notebook walks through the implementation of each explanation technique, highlighting their strengths and weaknesses. Each method is evaluated based on its interpretability, computational efficiency, and the clarity of its visualizations. In the end, LIME is selected as the preferred technique due to its intuitive explanations and computational efficiency.
Key components of this notebook include:

- Loading the pre-trained ResNet34 model and the sample bus image that it misclassifies as a passenger car.
- Generating and interpreting explanations with the LIME Image Explainer, the AnchorImage explainer, and the SHAP Gradient Explainer.
- Comparing the three techniques on interpretability, computational efficiency, and clarity of visualization.
- Selecting the most suitable technique (LIME) and discussing its limitations and possible improvements.
By the end of this notebook, we will gain a deeper understanding of how XAI techniques can be used to interpret the behavior of black-box models like ResNet34, and how these techniques compare in terms of effectiveness and usability in real-world image classification tasks.
LIME is a technique that explains the predictions of any black-box machine learning model by approximating its behavior locally (near the instance of interest) using a simple interpretable model like linear regression or decision trees. The goal is to explain why a model made a particular prediction for a specific data point, rather than understanding the global behavior of the model. LIME approximates the decision function around the instance of interest using a linear model or some other simple interpretable model.
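To make this local-surrogate idea concrete, here is a minimal sketch (not the full LIME algorithm, which also maps inputs to a binary interpretable representation and performs feature selection). The black-box function and the instance below are made-up placeholders:

```python
import numpy as np
from sklearn.linear_model import Ridge

# Placeholder black-box model: any function mapping features -> a prediction score.
def black_box_predict(X):
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - X[:, 1] + 0.5 * X[:, 2])))

x = np.array([0.5, 1.2, -0.3])   # instance whose prediction we want to explain
rng = np.random.default_rng(0)

# 1. Sample perturbations in the neighborhood of x.
Z = x + rng.normal(scale=0.5, size=(1000, x.size))
# 2. Query the black box on the perturbed samples.
y = black_box_predict(Z)
# 3. Weight each sample by its proximity to x (exponential kernel).
kernel_width = 0.75
weights = np.exp(-np.linalg.norm(Z - x, axis=1) ** 2 / kernel_width ** 2)
# 4. Fit a simple interpretable surrogate on the weighted samples.
surrogate = Ridge(alpha=1.0).fit(Z, y, sample_weight=weights)
print("Local feature importances:", surrogate.coef_)
```

The coefficients of the weighted linear surrogate act as local feature-importance scores around `x`.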
| Aspect | Pros | Cons | 
|---|---|---|
| Interpretability | Provides feature importance scores that are intuitive and easy to understand | Explanations can be inconsistent across settings or experiments, leading to instability | 
| Model Agnosticism | Flexible – can be applied to any black-box model, even if the model is replaced | Can struggle with nonlinear decision boundaries due to its use of linear approximations | 
| Flexibility | Works well with tabular data, text, and images | Defining the neighborhood and optimizing the kernel width can be difficult | 
| Efficiency | Provides local explanations without requiring a global model to be retrained | Can be computationally expensive when a large number of perturbations is required | 
| Local Approximation | Approximates the local decision boundary around the instance, making it simple | May provide oversimplified explanations, missing complex model behavior | 
| Bias | Can work with various data types and models, remaining intuitive | Can be fooled or used to hide biases, as the explanation may focus on irrelevant features | 
The LIME Image Explainer (Local Interpretable Model-agnostic Explanations) is a technique designed to explain the predictions of black-box models for image classification tasks. It works by perturbing the original image—typically by segmenting the image into superpixels (small regions of similar pixels) and selectively altering these regions to create multiple perturbed versions of the image. The model’s predictions for these perturbed images are then used to train an interpretable local surrogate model, such as a linear classifier, which approximates the model's behavior in the vicinity of the original image. LIME highlights the most important regions of the image that contributed to the model's prediction, providing human-understandable visual explanations for complex models like neural networks. This approach is especially useful for understanding why certain parts of an image strongly influenced the model’s classification decision.
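The sketch below shows how this could look for the ResNet34/bus example using the `lime` package and a recent `torchvision`; the preprocessing, the `"bus.jpg"` path, and the perturbation settings are illustrative assumptions rather than the notebook's exact configuration:

```python
import numpy as np
import torch
from torchvision import models, transforms
from lime import lime_image
from skimage.io import imread
from skimage.transform import resize
from skimage.segmentation import mark_boundaries

# Pre-trained ResNet34 from torchvision (assumption: the notebook loads it similarly).
model = models.resnet34(weights=models.ResNet34_Weights.IMAGENET1K_V1).eval()
normalize = transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])

def predict_fn(images):
    """LIME passes a batch of (H, W, 3) float arrays; return class probabilities."""
    batch = torch.stack([
        normalize(torch.tensor(im, dtype=torch.float32).permute(2, 0, 1))
        for im in images
    ])
    with torch.no_grad():
        return torch.softmax(model(batch), dim=1).numpy()

# "bus.jpg" is a placeholder path for the bus image used in the notebook.
img = resize(imread("bus.jpg"), (224, 224))   # (H, W, 3) floats in [0, 1]

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(img, predict_fn, top_labels=5,
                                         hide_color=0, num_samples=1000)

# Show only the superpixels that support the top class, graying out the rest.
temp, mask = explanation.get_image_and_mask(explanation.top_labels[0],
                                            positive_only=True,
                                            num_features=5, hide_rest=True)
overlay = mark_boundaries(temp, mask)
```

Calling `get_image_and_mask` with `hide_rest=True` is what produces the grayed-out visualizations discussed below.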
| Aspect | Pros | Cons | 
|---|---|---|
| Interpretability | Provides visual explanations by highlighting important image regions, making it intuitive | Explanations can be inconsistent across different image perturbations, leading to instability | 
| Model Agnosticism | Flexible – can be applied to any black-box image classifier, regardless of the model architecture | May struggle with highly nonlinear decision boundaries due to its linear approximations | 
| Flexibility | Works well with various image classification models | Requires careful tuning of the perturbation process (e.g., superpixel size) | 
| Efficiency | Provides local explanations for specific images without retraining the model | Can be computationally expensive when a large number of image perturbations are required | 
| Local Approximation | Approximates the local decision boundary of the model for each image | May lead to oversimplified explanations, failing to capture complex model behaviors | 
| Bias | Provides intuitive visual cues for feature importance in images | Can be fooled or manipulated to highlight irrelevant regions, leading to biased explanations | 
The LIME explanation provides clear insights into how ResNet34 makes its top 5 predictions.
Passenger car: The ResNet34 model predicts a 76.77% probability that the image is a passenger car, making it the top predicted class. From the LIME explanation, we can observe that the model focused on the bus door, the wheel, the upper side bus windows, and the sky. This is logical because, when these features are pictured together, they resemble those of a passenger car.
Trolleybus: The model predicts a 21.94% probability for the image being a trolleybus. The LIME explanation shows that the model focused on the large front bus window and the lower side features along the entire length of the bus, such as the door, wheels, side windows, and rearview mirror. This makes sense as trolleybuses typically have a long structure, and the model has captured those characteristics.
Minibus: With a 0.28% probability for the minibus class, the model focused on the sky, the features along the lower and middle sections of the bus's full length (door, wheels, side windows, rearview mirror), the slogan on the side, and part of the front window. Although this prediction has a low probability, it is the most accurate label for a bus, as it captures the essential features of one.
Racecar: The model predicts a 0.21% probability that the image is a racecar. The LIME explanation reveals that this prediction is based on the bus’s bumper and a black car beside the bus near a building. This is understandable, as focusing on just that part of the image could lead one to mistake it for a racecar.
Tram: The model predicts a 0.15% probability for the image being a tram. LIME shows that the model focused on the bus’s bumper, along with features from the lower and middle parts of the bus, such as the door, wheels, side windows, rearview mirror, and half of the front window. These features are similar to those found on a tram, explaining the model’s prediction.
Overall, LIME provides a solid explanation for how the ResNet34 model is making its predictions. The visualizations are clear, with irrelevant parts of the image grayed out, highlighting only the most important features considered by the model for each class prediction.
Anchors is an explanation technique that describes individual predictions of machine learning models with if-then rules. Unlike LIME, which approximates complex decision boundaries using local linear models, Anchors provides decision rules that "anchor" the prediction to certain features or conditions. When these conditions hold true, the model’s prediction remains consistent with high probability, even if other features change.
| Aspect | Pros | Cons | 
|---|---|---|
| Interpretability | Rules are easy to understand—if-then format is intuitive | Requires careful parameter tuning, which can be challenging | 
| Handling Complexity | Works well with nonlinear or complex models | Perturbation function needs to be custom-designed for each specific use case | 
| Model Agnosticism | Can be used with any model, model agnostic | Coverage is hard to define and measure in some domains, making it difficult to compare | 
| Efficiency | Highly efficient search algorithms like beam search | Imbalanced data can result in biased perturbation spaces unless careful mitigation is used | 
alibi.explainers.anchors.anchor_image module documentation
The AnchorImage explainer is an extension of the Anchors approach designed specifically for image classification tasks, providing highly interpretable, rule-based explanations for predictions made by complex models such as deep neural networks. Instead of relying on global model behavior or linear approximations, AnchorImage identifies if-then rules that are based on superpixels, or segmented regions of the image. These anchors represent key areas of the image that, when present, strongly influence the model’s prediction. The method generates a set of perturbed images by altering different regions, and it tests whether the model’s prediction remains consistent when the anchor region is unchanged. This allows AnchorImage to provide a highly stable and visual explanation by highlighting the specific areas of the image responsible for the model’s decision, making it an effective tool for understanding predictions in complex models like convolutional neural networks (CNNs).
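A minimal sketch of setting this up with `alibi`, reusing the `predict_fn` wrapper and the resized `img` from the LIME sketch above; the SLIC segmentation settings and sampling parameters are illustrative assumptions:

```python
from alibi.explainers import AnchorImage

# SLIC superpixel segmentation; the number and size of superpixels is a tuning choice.
segmentation_kwargs = {'n_segments': 15, 'compactness': 20, 'sigma': 0.5}
explainer = AnchorImage(predict_fn, image_shape=(224, 224, 3),
                        segmentation_fn='slic',
                        segmentation_kwargs=segmentation_kwargs)

# threshold: minimum precision the anchor rule must reach;
# p_sample: probability that a superpixel is replaced by its average value during perturbation.
explanation = explainer.explain(img, threshold=0.95, p_sample=0.5)

print("precision:", explanation.precision)   # ~0.99 in the run discussed below
print("coverage:", explanation.coverage)     # ~0.25 in the run discussed below
anchor_overlay = explanation.anchor          # image showing only the anchor superpixels
```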
| Aspect | Pros | Cons | 
|---|---|---|
| Interpretability | Provides easy-to-understand if-then rules based on image superpixels, making visual explanations intuitive | Requires careful parameter tuning for perturbations, such as defining superpixels and thresholds | 
| Handling Complexity | Works well with nonlinear and complex models, such as deep neural networks | Perturbation function and superpixel segmentation need to be custom-designed for each image dataset | 
| Model Agnosticism | Can be used with any image classification model, remaining model-agnostic | Coverage can be hard to measure, and it is difficult to define in certain domains | 
| Efficiency | Uses efficient search algorithms like beam search to find anchor regions | Computationally expensive when generating and testing a large number of image perturbations | 
| Stability | Provides stable explanations by identifying key superpixels that consistently influence predictions | Imbalanced data can lead to anchors focusing on irrelevant regions or failing to generalize | 
In the simplified Anchors visualization (as the original was cluttered with too many colors and difficult to interpret), we can observe how ResNet34 predicts the bus image to be a passenger car. The model focuses on specific areas, such as part of the bus's upper front window, the large bus number, and the majority of the bus's iron sheet body. Interestingly, it ignores significant features such as the bus's entire front windows, the upper and lower side windows, the wheels, and the door—all of which are completely grayed out. There is also notable focus on the road, which makes some sense, as a road paired with a metal structure resembling a car could reasonably lead to a passenger car prediction.
Looking at the anchor precision of 0.99, we can see that the anchor rule is highly reliable—99% of the time, when the anchor conditions hold, the model makes the same prediction. This suggests that the features highlighted by the anchor are critical to the model’s decision-making. However, the anchor coverage is 0.25, meaning that this rule applies to only 25% of similar instances. While the explanation is precise, it is relatively specific and doesn’t generalize widely across other examples.
Overall, while Anchors provides a reasonable explanation with high precision, the visualization is less intuitive compared to LIME. It’s noteworthy that Anchors is designed to address some of LIME’s limitations by offering a more robust explanation, yet the resulting visualization can feel less clear and interpretable in practice.
SHAP values are based on the idea of Shapley values from game theory, which fairly distribute the difference between the model's prediction for an instance and a baseline (such as the average prediction across all instances) among the individual feature values.
SHAP uses this concept to explain the marginal contribution of each feature to the prediction.
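For completeness, the standard Shapley value formula underlying SHAP is:

$$
\phi_i \;=\; \sum_{S \subseteq F \setminus \{i\}} \frac{|S|!\,\bigl(|F|-|S|-1\bigr)!}{|F|!}\,\Bigl[f_{S \cup \{i\}}\bigl(x_{S \cup \{i\}}\bigr) - f_S\bigl(x_S\bigr)\Bigr]
$$

where $F$ is the set of all features and $f_S$ is the model evaluated using only the features in $S$ (with the remaining features marginalized out or fixed to a baseline). Because the sum runs over all $2^{|F|-1}$ subsets, SHAP implementations estimate these values rather than computing them exactly.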
| Aspect | Pros | Cons | 
|---|---|---|
| Fairness in Feature Attribution | SHAP fairly distributes the difference between the prediction and average prediction among the feature values | Extremely slow to compute, especially for complex models or large datasets | 
| Contrastive Explanations | Enables contrastive explanations, allowing comparisons to the average prediction, subsets, or even specific data points | Requires access to the full dataset to accurately calculate baseline predictions | 
| Model-Agnostic | SHAP is model-agnostic, applicable to any type of machine learning model | Can be misused to create misleading interpretations or hide biases by manipulating feature attributions | 
| Consistency | SHAP guarantees consistency, meaning that if a model changes such that a feature’s contribution increases, its SHAP value will also increase | Computationally expensive, especially for models with many features or complex decision boundaries | 
| Robust Explanations | Provides robust and interpretable explanations that account for both individual and global feature contributions | Can be challenging to interpret for non-technical users if not simplified properly | 
The SHAP Deep Explainer is a specialized version of SHAP (SHapley Additive exPlanations) designed to explain predictions made by deep learning models, such as neural networks. It leverages DeepLIFT (Deep Learning Important FeaTures) and the properties of Shapley values to provide more accurate and efficient explanations for complex, nonlinear models. The Deep Explainer uses backpropagation to attribute the contribution of each input feature to the model's output, making it well-suited for deep networks where traditional methods like linear approximations (e.g., LIME) may fall short. It is particularly effective for models built with frameworks like TensorFlow and Keras.
I'm not using the SHAP Deep Explainer approach in this case due to constraints related to gradient calculation within the internal architecture of the ResNet34 model. Specifically, there is an issue where BackwardHookFunctionBackward produces an output that is a view and is then modified in-place. Such a view, created inside a custom function, can interfere with the autograd mechanism used to calculate gradients, potentially leading to incorrect gradients. Since this behavior is not permitted, and working around it would require cloning the output of the custom function, the SHAP Deep Explainer is not feasible for this bus example; I therefore use the Gradient Explainer (described next) instead.
shap.GradientExplainer documentation
GradientSHAP is a variant of SHAP (SHapley Additive exPlanations) that combines ideas from Integrated Gradients and SHAP values. It aims to improve the explanation of machine learning models by attributing the contribution of each feature to the model's prediction using gradients.
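A minimal sketch of how `shap.GradientExplainer` could be applied to the ResNet34 model, reusing `model` from the LIME sketch; the baseline batch and input tensor are random placeholders here, and the return format of `shap_values` can vary across `shap` versions:

```python
import numpy as np
import shap
import torch

# Placeholder tensors: in the notebook these would be normalized ImageNet-style
# images, i.e. a small baseline batch and the bus image itself.
background = torch.randn(8, 3, 224, 224)
bus_tensor = torch.randn(1, 3, 224, 224)

explainer = shap.GradientExplainer(model, background)

# ranked_outputs=1 restricts the attributions to the top predicted class
# (class 705, "passenger car", in the run discussed below).
shap_values, indexes = explainer.shap_values(bus_tensor, ranked_outputs=1, nsamples=200)

# shap.image_plot expects channel-last arrays, so move the channel axis.
sv = [np.transpose(v, (0, 2, 3, 1)) for v in shap_values]
shap.image_plot(sv, np.transpose(bus_tensor.numpy(), (0, 2, 3, 1)))
```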
| Aspect | Pros | Cons | 
|---|---|---|
| Fairness in Feature Attribution | The difference between the prediction and the average prediction is fairly distributed among the feature values | Very slow to compute due to accumulation of gradients along many paths from baseline to input | 
| Contrastive Explanations | Enables contrastive explanations, allowing comparisons to average predictions, subsets, or specific points | Requires access to the data for accurate baseline generation and comparison | 
| Handling Nonlinearity | Works well for nonlinear models like neural networks, as it integrates gradients along multiple paths | Can be used to hide biases or create misleading interpretations, just like other SHAP methods | 
| Interpretability | Provides robust and interpretable attributions by considering gradients across input space | Computationally expensive compared to simpler explanation methods | 
| Adaptability | Can be adapted to different baselines (e.g., comparing to the mean or to zero) for more meaningful explanations | Attributions depend on the choice of baseline, which must be selected carefully | 
The SHAP explanation visualization highlights key areas of the bus image, marked in color, that contribute to the model's decision to classify the bus as a passenger car. From the visualization, we can see that parts of the bus’s wheels, side, front, and even portions of the sky play a role in influencing the prediction.
The maximum absolute SHAP value for the predicted class (705, passenger car) is 0.4788, which indicates that one particular region has a substantial impact on the model's decision. SHAP values quantify the contribution of each feature to the difference between the actual prediction (passenger car) and the mean prediction across all possible classes, so a maximum of 0.4788 suggests that the region highlighted most strongly in the SHAP visualization plays a considerable role in shifting the model's output toward the passenger car class.
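For reference, that maximum can be read off the GradientExplainer output along these lines, assuming `shap_values[0]` holds the attributions for the top-ranked class as in the sketch above:

```python
import numpy as np

# Largest single attribution (in absolute value) pushing toward class 705.
max_abs_shap = np.abs(shap_values[0]).max()
print(f"Max absolute SHAP value for the predicted class: {max_abs_shap:.4f}")
```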
Overall, the SHAP explanation falls short compared to LIME and Anchors. The visualization is difficult to interpret, as the highlighted colors blend into the background, making it hard to distinguish the important features. Additionally, the SHAP Gradient Explainer took more than 2 hours to run on an Nvidia L4 GPU, which is highly time-consuming and inefficient. In contrast, both LIME and Anchors generated their explanations in a matter of seconds on the same GPU. Considering both explanation clarity and computational efficiency, I would not recommend SHAP for image explanations in this context.
After experimenting with three different explainable AI (XAI) techniques—LIME, Anchors, and SHAP—I chose LIME as the most suitable explanation technique for the bus image classification task. This decision was based on several factors, including its interpretability, computational efficiency, and overall effectiveness in providing a clear explanation of why the ResNet34 model incorrectly classified the bus image as a passenger car.
Interpretability:
LIME provides visual explanations that are highly intuitive for image classification tasks. It highlights important image regions (in this case, parts of the bus) that contribute to the model's predictions. For example, in the bus image classification, the LIME explanation clearly shows that the model focused on parts of the bus door, wheels, upper side bus windows, and even the sky to predict the image as a passenger car. These visual cues are straightforward to interpret and allow us to understand the model's reasoning, even though the prediction was incorrect. By contrast, Anchors achieved high precision but, as discussed above, produced a less intuitive visualization. This intuitive visual feedback is one of the key strengths of LIME, making it easy to explain model decisions to both technical and non-technical audiences.
Efficiency:
One of the main reasons for selecting LIME was its computational efficiency. Compared to other XAI methods like SHAP, which took more than 2 hours to run on an Nvidia L4 GPU, LIME was able to generate explanations in just seconds using the same GPU. This makes LIME highly suitable for tasks where quick insights are needed, especially when experimenting with large datasets or complex models like ResNet34. Given that LIME only approximates the local decision boundary and doesn’t require retraining the model or running complex gradient computations, it is far more lightweight in terms of resource usage.
Flexibility:
LIME is model-agnostic, meaning it can be applied to any black-box model, including deep learning models like ResNet34. This flexibility is particularly valuable in the context of image classification, where models can be swapped or updated without the need to change the explanation technique. LIME works effectively across various image classification models and can easily adapt to changes in the underlying architecture without losing its ability to provide explanations.
Local Approximation:
LIME’s ability to focus on local explanations is well-suited for this task. By approximating the local decision boundary around the instance of interest (the bus image), LIME provides insights into why the model made a specific prediction rather than explaining the global behavior of the model. In this case, the local approximation around the bus image shows that the ResNet34 model misclassified the bus as a passenger car due to its focus on specific features, such as the wheels and the bus door, which in isolation resemble features of a car.
Inconsistency Across Perturbations:
One limitation of LIME is that its explanations can be inconsistent across different perturbations of the image. LIME perturbs the input image by segmenting it into superpixels and altering them to create variations of the image. Since this process involves randomness, the explanation generated for the same input image might slightly change depending on the perturbations. This can lead to some instability in the explanation, which might be a concern when consistency is crucial. In the bus image classification task, this inconsistency is not a major issue but can affect the reliability of explanations when making fine-grained decisions.
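One practical mitigation, not applied in this notebook, is to fix the explainer's random state and increase the number of perturbed samples, trading extra compute for lower variance:

```python
from lime import lime_image

# Fixed seeds make runs reproducible; more samples stabilize the surrogate fit.
explainer = lime_image.LimeImageExplainer(random_state=42)
explanation = explainer.explain_instance(img, predict_fn,
                                         top_labels=5, hide_color=0,
                                         num_samples=3000, random_seed=42)
```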
Oversimplified Explanations:
Another limitation of LIME is its tendency to oversimplify the model's behavior due to its use of linear approximations. ResNet34 is a highly nonlinear model, and by approximating the decision boundary using a linear model in the vicinity of the input, LIME may not capture the full complexity of the model’s decision-making process. For example, the classification of the bus as a passenger car involves complex interactions between multiple features, which may not be fully captured by the linear surrogate model used by LIME. In this case, while LIME provides a clear and intuitive explanation, it might miss some of the underlying complexity of ResNet34’s internal decision-making.
Bias in Explanation:
LIME’s explanation can sometimes focus on irrelevant regions of the image, leading to biased explanations. In the bus image, for instance, LIME highlights parts of the sky, which might not be relevant for distinguishing between a bus and a passenger car. This potential bias could make the explanation less useful in certain situations, as the highlighted regions may not always align with human intuition about what should be important for classification.
Optimizing Superpixel Segmentation:
One way to improve the accuracy of LIME’s explanations is to carefully tune the superpixel segmentation process. By adjusting the size and granularity of the superpixels, it might be possible to capture more meaningful segments of the image that contribute to the prediction. For example, finer segmentation could help separate key features like the bus windows, wheels, and body, leading to a more accurate and detailed explanation.
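As sketched below, `lime` lets a custom segmenter be passed to `explain_instance`; the SLIC parameters here are illustrative assumptions that would need tuning for the bus image:

```python
from lime import lime_image
from lime.wrappers.scikit_image import SegmentationAlgorithm

# Finer SLIC segmentation: more, smaller superpixels than lime's default quickshift.
segmenter = SegmentationAlgorithm('slic', n_segments=150, compactness=10, sigma=1)

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(img, predict_fn,
                                         top_labels=5, hide_color=0,
                                         num_samples=1000,
                                         segmentation_fn=segmenter)
```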
Combining LIME with Other Techniques:
To address LIME’s potential limitations in oversimplifying complex models, it could be beneficial to combine LIME with other XAI techniques like Anchors. Anchors provides rule-based explanations that capture important feature interactions. A hybrid approach could help balance LIME’s intuitiveness and computational efficiency with Anchors’ ability to handle nonlinearities and provide more robust, rule-based explanations.
Use of Additional Perturbation Methods:
To mitigate the issue of inconsistent explanations, it may be useful to introduce more controlled or domain-specific perturbation methods. Instead of randomly perturbing the image, a more structured approach that considers the domain (e.g., ensuring that the bus’s critical features are perturbed systematically) could lead to more consistent and reliable explanations.
In summary, LIME was chosen for this bus image classification task because it provides the most intuitive and computationally efficient explanation. Despite some limitations like inconsistency across perturbations and potential bias in the highlighted regions, LIME offers clear visual feedback that helps explain why the ResNet34 model misclassified the bus as a passenger car. By focusing on local explanations and being highly flexible and model-agnostic, LIME is well-suited for tasks that require fast, interpretable insights. With potential improvements like optimizing superpixel segmentation and combining LIME with other XAI methods, LIME could become an even more effective tool for understanding complex model behavior.