“Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization” Summarized
https://arxiv.org/abs/1610.02391 (2016-10-7)
1. Accuracy vs Interpretability
There is typically a trade-off between accuracy and interpretability (simplicity).
- Classical rule-based systems: interpretable but not accurate
- Deep models: accurate but not interpretable
With Grad-CAM, the authors make deep models interpretable without architectural changes or re-training.
2. Grad-CAM Formulation
We first compute the weights $\alpha^c_k$ by global-average-pooling the gradients of the class score with respect to the feature map activations:
\[\alpha^c_k = \frac{1}{Z}\sum_i\sum_j \frac{\partial y^c}{\partial A^k_{i,j}}\]
$\alpha^c_k$
- represents a partial linearization of the deep network downstream from $A$
- captures the ‘importance’ of feature map $k$ for a target class $c$
- $c$: class index
- $k$: channel index
- $i$: height index, $j$: width index
- $y^c$: the score for class $c$ (before the softmax); in general, it could be any differentiable output for any task
- $A$: the feature map activations of the last convolutional layer
- $A^k_{i,j}$: number at $(i,j)$ position of $k$th channel of $A$
- $Z$: width × height of $A^k$ (the number of spatial locations)
Then we compute the Grad-CAM map $L^c_{Grad-CAM}$:
\[L^c_{Grad-CAM} = ReLU\left(\sum_k \alpha^c_k A^k\right)\]
We apply a ReLU to the linear combination of maps because we are only interested in the features that have a positive influence on the class of interest.
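Below is a minimal PyTorch sketch of this formulation (illustrative, not the authors' implementation). Hooks capture the last-conv activations $A$ and their gradients, the gradients are global-average-pooled into $\alpha^c_k$, and the ReLU-ed weighted sum gives $L^c_{Grad-CAM}$. Using torchvision's ResNet-50 with `layer4` as the target layer is an assumption for illustration; any CNN and conv layer can be substituted.

```python
# Minimal Grad-CAM sketch (illustrative, not the authors' implementation).
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights="IMAGENET1K_V2").eval()
store = {}

# Capture A on the forward pass and dy^c/dA on the backward pass
model.layer4.register_forward_hook(lambda m, inp, out: store.update(A=out))
model.layer4.register_full_backward_hook(lambda m, gin, gout: store.update(dA=gout[0]))

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    model.zero_grad()
    scores = model(x)                  # y: pre-softmax class scores
    scores[0, class_idx].backward()    # populates store["dA"]
    # alpha^c_k = (1/Z) sum_{i,j} dy^c/dA^k_{i,j}  (global average pooling)
    alpha = store["dA"].mean(dim=(2, 3), keepdim=True)        # (1, K, 1, 1)
    # L^c = ReLU(sum_k alpha^c_k A^k)
    cam = F.relu((alpha * store["A"]).sum(dim=1)).squeeze(0)  # (H, W)
    return cam / (cam.max() + 1e-8)    # normalize to [0, 1] for display

# Usage: x is a normalized (1, 3, 224, 224) image tensor
# heatmap = grad_cam(x, class_idx=281)  # 281 = "tabby cat" in ImageNet
```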
3. Grad-CAM generalizes CAM
The earlier work CAM is formulated as follows:
\[Y^c = \sum_k w^c_k \frac{1}{Z}\sum_i\sum_j A^k_{i,j}\]
CAM can only be applied to a specific kind of architecture in which global-average-pooled convolutional feature maps are fed directly into the softmax layer. Under this condition, the paper shows that Grad-CAM is equivalent to CAM, making Grad-CAM a strict generalization of CAM.
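To make the equivalence concrete, here is the one-step derivation (following the paper's appendix). Writing $F^k = \frac{1}{Z}\sum_i\sum_j A^k_{i,j}$ for the global-average-pooled feature, the chain rule gives
\[\frac{\partial Y^c}{\partial A^k_{i,j}} = \frac{\partial Y^c}{\partial F^k}\cdot\frac{\partial F^k}{\partial A^k_{i,j}} = w^c_k\cdot\frac{1}{Z}\]
so the Grad-CAM weights become
\[\alpha^c_k = \frac{1}{Z}\sum_i\sum_j\frac{\partial Y^c}{\partial A^k_{i,j}} = \frac{1}{Z}\cdot Z\cdot\frac{w^c_k}{Z} = \frac{w^c_k}{Z} \propto w^c_k\]
The constant factor $1/Z$ is normalized away during visualization, so on this architecture Grad-CAM reduces exactly to CAM.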
4. Guided Grad-CAM
A good visual explanation from the model for justifying any target category should be ‘class-discriminative’ and ‘high-resolution’.
- Guided Backpropagation, Deconvolution: high-resolution but not class-discriminative
- Grad-CAM, CAM: class-discriminative but not high-resolution
The authors upsample the Grad-CAM map to the input resolution and multiply it pixel-wise with the Guided Backpropagation saliency map, producing Guided Grad-CAM visualizations that are both high-resolution and class-discriminative (a sketch of the fusion step follows).
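A small NumPy/OpenCV sketch of that fusion step (the function name and array shapes are our assumptions; the Guided Backpropagation map is assumed to be precomputed):

```python
# Fuse a coarse Grad-CAM map with a high-resolution Guided Backpropagation
# saliency map (illustrative sketch, not the authors' code).
import numpy as np
import cv2  # used only for bilinear upsampling

def guided_grad_cam(cam: np.ndarray, guided_bp: np.ndarray) -> np.ndarray:
    """cam: (H, W) Grad-CAM map; guided_bp: (H_in, W_in, 3) saliency map."""
    h, w = guided_bp.shape[:2]
    # Upscale the coarse, class-discriminative map to input resolution ...
    cam_up = cv2.resize(cam, (w, h), interpolation=cv2.INTER_LINEAR)
    # ... then gate the high-resolution saliency with it, pixel-wise
    return guided_bp * cam_up[..., np.newaxis]
```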
5. Counterfactual Explanations
A counterfactual explanation highlights the regions that, if removed, would make the network change its prediction.
It is computed the same way as Grad-CAM but with negated gradients, as shown below; in code, this amounts to flipping the sign of the pooled gradients before the weighted sum.
\[\alpha^c_k = \frac{1}{Z}\sum_i\sum_j -\frac{\partial y^c}{\partial A^k_{i,j}}\]
6. Localization Ability
1. Weakly-supervised Localization: a bounding box derived from the Grad-CAM heatmap localizes objects using only image-level labels (see the sketch after this list).
2. Weakly-supervised Segmentation: Grad-CAM heatmaps can serve as seed cues for weakly-supervised segmentation pipelines in place of CAM.
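For localization, the paper binarizes the Grad-CAM heatmap at 15% of its maximum intensity and draws a bounding box around the largest connected segment. A sketch of that recipe (the helper name is ours; scipy is used for connected components):

```python
# Weakly-supervised localization from a Grad-CAM heatmap (illustrative sketch).
import numpy as np
from scipy import ndimage

def cam_to_bbox(cam: np.ndarray, threshold: float = 0.15):
    """Bounding box (x1, y1, x2, y2) around the largest hot segment."""
    mask = cam >= threshold * cam.max()  # binarize at 15% of max intensity
    labels, n = ndimage.label(mask)      # connected components
    if n == 0:
        return None
    # Keep the largest connected segment of the binarized map
    sizes = ndimage.sum(mask, labels, range(1, n + 1))
    ys, xs = np.where(labels == 1 + int(np.argmax(sizes)))
    return xs.min(), ys.min(), xs.max(), ys.max()
```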
7. Pros of Grad-CAM
1. Class Discrimination
When viewing Guided Grad-CAM, human subjects identify the category being visualized more accurately than with Guided Backpropagation alone.
2. Trust
Human subjects are able to identify the more accurate of two classifiers from Guided Grad-CAM visualizations alone, even when both models make identical predictions.
3. Faithful to the model
Occlusion patches that change the CNN score are also the patches to which Grad-CAM assigns high intensity; a sketch of this occlusion check appears after this list.
4. Analyzing failure modes
By applying Grad-CAM to misclassified images, we can see that seemingly unreasonable predictions often have reasonable explanations.
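A hedged sketch of the occlusion-based faithfulness check mentioned in item 3 (patch size, stride, and zero fill are illustrative choices, not the paper's exact protocol): slide a patch over the input, record the class-score drop at each location, and rank-correlate the resulting map with Grad-CAM intensities at the same locations.

```python
# Occlusion-based faithfulness check (illustrative sketch).
import torch
from scipy.stats import spearmanr

@torch.no_grad()
def occlusion_drops(model, x, class_idx, patch=32, stride=16):
    """Class-score drop when each patch location is zeroed out."""
    base = model(x)[0, class_idx].item()
    _, _, h, w = x.shape
    drops = []
    for i in range(0, h - patch + 1, stride):
        for j in range(0, w - patch + 1, stride):
            occ = x.clone()
            occ[:, :, i:i + patch, j:j + patch] = 0.0
            drops.append(base - model(occ)[0, class_idx].item())
    return drops

def faithfulness(drops, cam_patch_means):
    """Rank correlation between occlusion importance and Grad-CAM intensity
    averaged over the same patch locations (cam_patch_means precomputed)."""
    rho, _ = spearmanr(drops, cam_patch_means)
    return rho
```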