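Essential for Papers - RT-DETR Heatmap Visualization: Choose the Model, Choose the Displayed Layers, Set the Confidence Threshold, and Pick from 10 Visualization Methods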

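1. Introduction

This article presents a heatmap visualization script for RT-DETR. It lets you choose the model, choose the layers to display, set the confidence threshold, and pick from 10 visualization methods.

Papers often include heatmaps. They give an intuitive sense of how well a model works, and they also enrich the paper. This is especially useful for networks with attention modules, where a heatmap can verify whether the attention mechanism really focuses on the important features it is expected to, which helps in judging whether the model is effective and behaves sensibly. For example, the heatmaps shown in Centralized Feature Pyramid for Object Detection clearly convey the advantage of the authors' improved model over earlier ones.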


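2. Full Project Code

Create a file named heatmap.py in the project root and paste in the code from this section.

The code relies mainly on the grad-cam package (imported as pytorch_grad_cam), which must be installed first. The download can be slow, so using a PyPI mirror is recommended.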

pip install grad-cam==1.4.8

The full code is as follows:

import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
import torch, yaml, cv2, os, shutil
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
from tqdm import trange
from PIL import Image
from ultralytics.nn.tasks import attempt_load_weights
from ultralytics.utils.ops import xywh2xyxy
from pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image
from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients

def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)

    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)

    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios

    dw /= 2  # divide padding into 2 sides
    dh /= 2

    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)

class ActivationsAndGradients:
    """ Class for extracting activations and
    registering gradients from targetted intermediate layers """

    def __init__(self, model, target_layers, reshape_transform):
        self.model = model
        self.gradients = []
        self.activations = []
        self.reshape_transform = reshape_transform
        self.handles = []
        for target_layer in target_layers:
            self.handles.append(
                target_layer.register_forward_hook(self.save_activation))
            # Because of https://github.com/pytorch/pytorch/issues/61519,
            # we don't use backward hook to record gradients.
            self.handles.append(
                target_layer.register_forward_hook(self.save_gradient))

    def save_activation(self, module, input, output):
        activation = output

        if self.reshape_transform is not None:
            activation = self.reshape_transform(activation)
        self.activations.append(activation.cpu().detach())

    def save_gradient(self, module, input, output):
        if not hasattr(output, "requires_grad") or not output.requires_grad:
            # You can only register hooks on tensor requires grad.
            return

        # Gradients are computed in reverse order
        def _store_grad(grad):
            if self.reshape_transform is not None:
                grad = self.reshape_transform(grad)
            self.gradients = [grad.cpu().detach()] + self.gradients

        output.register_hook(_store_grad)

    def post_process(self, result):
        logits_ = result[:, 4:]
        boxes_ = result[:, :4]
        sorted, indices = torch.sort(logits_.max(1)[0], descending=True)
        return logits_[indices], boxes_[indices], xywh2xyxy(boxes_[indices]).cpu().detach().numpy()
  
    def __call__(self, x):
        self.gradients = []
        self.activations = []
        model_output = self.model(x)
        post_result, pre_post_boxes, post_boxes = self.post_process(model_output[0][0])
        return [[post_result, pre_post_boxes]]

    def release(self):
        for handle in self.handles:
            handle.remove()

class rtdetr_target(torch.nn.Module):
    def __init__(self, ouput_type, conf, ratio) -> None:
        super().__init__()
        self.ouput_type = ouput_type
        self.conf = conf
        self.ratio = ratio
    
    def forward(self, data):
        post_result, pre_post_boxes = data
        result = []
        for i in trange(int(post_result.size(0) * self.ratio)):
            if float(post_result[i].max()) < self.conf:
                break
            if self.ouput_type == 'class' or self.ouput_type == 'all':
                result.append(post_result[i].max())
            # Plain 'if' rather than 'elif', so that 'all' also adds the box terms
            if self.ouput_type == 'box' or self.ouput_type == 'all':
                for j in range(4):
                    result.append(pre_post_boxes[i, j])
        return sum(result)

class rtdetr_heatmap:
    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize):
        device = torch.device(device)
        ckpt = torch.load(weight)
        model_names = ckpt['model'].names
        model = attempt_load_weights(weight, device)
        model.info()
        for p in model.parameters():
            p.requires_grad_(True)
        model.eval()
        
        target = rtdetr_target(backward_type, conf_threshold, ratio)
        target_layers = [model.model[l] for l in layer]
        method = eval(method)(model, target_layers, use_cuda=device.type == 'cuda')
        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)

        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int64)
        self.__dict__.update(locals())
    
    def post_process(self, result, shape):
        logits_ = result[:, 4:]
        boxes_ = result[:, :4]
        
        # filter
        score, cls = logits_.max(1, keepdim=True)
        idx = (score > self.conf_threshold).squeeze()
        logits_, boxes_ = logits_[idx], boxes_[idx]
        
        # xywh -> xyxy
        h, w = shape
        boxes_ = xywh2xyxy(boxes_)
        boxes_[:, 0] *= w
        boxes_[:, 2] *= w
        boxes_[:, 1] *= h  # y coordinates scale with the height, not the width
        boxes_[:, 3] *= h
        
        return torch.cat([boxes_, logits_], dim=1)
    
    def draw_detections(self, box, color, name, img):
        xmin, ymin, xmax, ymax = list(map(int, list(box)))
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2)
        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)
        return img

    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):
        """Normalize the CAM to be in the range [0, 1] 
        inside every bounding boxes, and zero outside of the bounding boxes. """
        h, w, _ = image_float_np.shape
        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)
        for x1, y1, x2, y2 in boxes:
            x1, y1 = max(x1 , 0) , max(y1, 0) 
            x2, y2 = min(grayscale_cam.shape[1] - 1, x2) , min(grayscale_cam.shape[0] - 1, y2) 
            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())    
        renormalized_cam = scale_cam_image(renormalized_cam)
        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)
        return eigencam_image_renormalized
    
    def process(self, img_path, save_path):
        # img process
        img = cv2.imread(img_path)
        ori_h, ori_w = img.shape[:2]
        img = letterbox(img, auto=False, scaleFill=True)[0]
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = np.float32(img) / 255.0
        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)
        
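        try:
            grayscale_cam = self.method(tensor, [self.target])
        except AttributeError:
            # Usually means no prediction passed conf_threshold: the target then
            # returns the int 0, which has no backward() method.
            print(f'Skipping {img_path}: no heatmap could be computed')
            return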
        
        grayscale_cam = grayscale_cam[0, :]
        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)
        pred = self.model(tensor)[0][0]
        pred = self.post_process(pred, img.shape[:2])
        if self.renormalize:
            cam_image = self.renormalize_cam_in_bounding_boxes(pred[:, :4].cpu().detach().numpy().astype(np.int32), img, grayscale_cam)
        if self.show_box:
            for data in pred:
                data = data.cpu().detach().numpy()
                cam_image = self.draw_detections(data[:4], self.colors[int(data[4:].argmax())], f'{self.model_names[int(data[4:].argmax())]} {float(data[4:].max()):.2f}', cam_image)
        cam_image = cv2.resize(cam_image, (ori_w, ori_h))
        cam_image = Image.fromarray(cam_image)
        cam_image.save(save_path)
    
    def __call__(self, img_path, save_path):
        # remove dir if exist
        if os.path.exists(save_path):
            shutil.rmtree(save_path)
        # make dir if not exist
        os.makedirs(save_path, exist_ok=True)

        if os.path.isdir(img_path):
            for img_path_ in os.listdir(img_path):
                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')
        else:
            self.process(img_path, f'{save_path}/result.png')

def get_params():
    params = {
        'weight': 'runs/detect/train/weights/best.pt',
        'device': 'cuda:0',
        'method': 'GradCAMPlusPlus', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
        'layer': [15,17,23],
        'backward_type': 'all', # class, box, all
        'conf_threshold': 0.2, # 0.2
        'ratio': 0.02, 
        'show_box': True,
        'renormalize': True
    }
    return params

if __name__ == '__main__':
    model = rtdetr_heatmap(**get_params())
    model(r'figures', 'result')

3. Parameters

The parameters you need to configure are all in the get_params() function:

def get_params():
    params = {
        'weight': 'runs/detect/train/weights/best.pt',
        'device': 'cuda:0',
        'method': 'GradCAMPlusPlus', # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
        'layer': [15,17,23],
        'backward_type': 'all', # class, box, all
        'conf_threshold': 0.2, # 0.2
        'ratio': 0.02, 
        'show_box': True,
        'renormalize': True
    }
    return params


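Meaning of each parameter:

Parameter	Description
weight	Path to the weight file produced by training
device	Device to run on; set it the same way as the device argument used for training
method	Visualization method; the supported names are listed in the code comment, and each gives a different result, so it is worth trying several
layer	Indices of the layers whose heatmaps you want; try different layers and feature-map sizes and compare the results
backward_type	Which outputs the gradients are backpropagated from: class uses the highest class score of each selected prediction, box uses the four box coordinates, all uses both
conf_threshold	Confidence threshold for detected objects
ratio	Fraction of the highest-scoring predictions used for backpropagation; 0.02 is the suggested value
show_box	Whether to draw the detection boxes: True draws them, False does not
renormalize	Whether to renormalize the heatmap using the detections: values are rescaled to [0, 1] inside each detected box and set to zero outside the boxes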

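[Figure: screenshot with the layer numbers boxed in red]

The numbers in the red box are the row indices, which are the layer indices that the layer parameter refers to.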
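To make the renormalize option more concrete, here is a minimal pure-Python sketch of the idea behind renormalize_cam_in_bounding_boxes. It is a simplification under stated assumptions: the helper name renormalize_in_boxes is hypothetical, the heatmap is a nested list rather than a NumPy array, plain min-max scaling stands in for scale_cam_image, and the final rescaling of the whole map is left out.

```python
def renormalize_in_boxes(cam, boxes):
    """Min-max scale the heatmap inside each box and zero it elsewhere.

    cam:   2D list of floats (rows of the heatmap)
    boxes: iterable of integer (x1, y1, x2, y2) boxes in pixel coordinates
    """
    h, w = len(cam), len(cam[0])
    out = [[0.0] * w for _ in range(h)]
    for x1, y1, x2, y2 in boxes:
        # Clip the box to the heatmap, as the script does
        x1, y1 = max(x1, 0), max(y1, 0)
        x2, y2 = min(x2, w - 1), min(y2, h - 1)
        patch = [cam[y][x] for y in range(y1, y2) for x in range(x1, x2)]
        if not patch:
            continue  # empty box after clipping
        lo, hi = min(patch), max(patch)
        span = (hi - lo) or 1.0  # avoid dividing by zero on a flat patch
        for y in range(y1, y2):
            for x in range(x1, x2):
                out[y][x] = (cam[y][x] - lo) / span
    return out
```

The effect is that every detected object uses the full color range of the heatmap, while the background is suppressed.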

4. Usage

4.1 Specifying the model

To specify the model, change the weight parameter.

'weight': 'runs/detect/train/weights/best.pt'

Set it to the path of your own weight file.
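A mistyped path only shows up as an error deep inside the loading code, so it can help to check the file before building the heatmap object. A minimal sketch, assuming a hypothetical helper check_weight that is not part of the script, and assuming the checkpoint uses the .pt suffix:

```python
from pathlib import Path

def check_weight(path):
    """Return the weight path as a Path, or raise a readable error (hypothetical helper)."""
    p = Path(path)
    if p.suffix != ".pt":
        raise ValueError(f"expected a .pt checkpoint, got: {p.name}")
    if not p.is_file():
        raise FileNotFoundError(f"weight file not found: {p}")
    return p
```

It could be called on the value of weight in get_params() before rtdetr_heatmap is constructed.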

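4.2 Choosing the visualization method

To choose the visualization method, change the method parameter.

'method': 'GradCAMPlusPlus'

The following 10 methods are listed: GradCAMPlusPlus , GradCAM , XGradCAM , EigenCAM , HiResCAM , LayerCAM , RandomCAM , EigenGradCAM , ScoreCAM , GradCAMElementWise . Note that ScoreCAM and GradCAMElementWise are not imported by the script above, so they must be added to the from pytorch_grad_cam import line before they can be selected.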
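The script turns the method string into a class with eval(method), which fails with a rather opaque NameError on a typo. Below is a sketch of a stricter lookup. The helper names validate_method_name and load_cam_class are hypothetical, and the tuple lists only the eight names that the script imports.

```python
import importlib

# The eight names imported by the script; extend this tuple if you import more.
SUPPORTED = (
    "GradCAMPlusPlus", "GradCAM", "XGradCAM", "EigenCAM",
    "HiResCAM", "LayerCAM", "RandomCAM", "EigenGradCAM",
)

def validate_method_name(name):
    """Return name unchanged if it is supported, otherwise raise ValueError."""
    if name not in SUPPORTED:
        raise ValueError(
            f"unknown method {name!r}; choose one of: {', '.join(SUPPORTED)}"
        )
    return name

def load_cam_class(name):
    """Look the class up by attribute instead of eval(); needs grad-cam installed."""
    module = importlib.import_module("pytorch_grad_cam")
    return getattr(module, validate_method_name(name))
```

In the script, eval(method) could then be swapped for load_cam_class(method), with the rest of the constructor call unchanged.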

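4.3 Choosing the displayed layers

To choose the displayed layers, change the layer parameter.

'layer': [15,17,23]

Other layer indices work too, as long as the index is not -1 (the last layer, which is the detection head rather than a feature layer).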
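The rule about layer indices can be checked before the model is wrapped. A minimal sketch, assuming a hypothetical helper check_layers, where num_layers would be len(model.model) in the script (the value 29 in the usage note is only illustrative):

```python
def check_layers(layers, num_layers):
    """Normalize layer indices and reject the last layer (hypothetical helper)."""
    checked = []
    for idx in layers:
        # Turn a negative index such as -2 into its positive position
        pos = idx + num_layers if idx < 0 else idx
        if not 0 <= pos < num_layers:
            raise IndexError(f"layer {idx} is out of range for {num_layers} layers")
        if pos == num_layers - 1:
            raise ValueError(f"layer {idx} is the last layer; pick an earlier one")
        checked.append(pos)
    return checked
```

For example, check_layers([15, 17, 23], 29) returns the list unchanged, while check_layers([-1], 29) raises a ValueError.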

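4.4 Setting the confidence threshold

To set the confidence threshold, change the conf_threshold parameter.

'conf_threshold': 0.2

This mainly affects the detection side: predictions whose highest class score falls below the threshold are neither drawn as boxes nor used as backpropagation targets.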
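The filtering that post_process performs with this threshold can be sketched in plain Python. This is an illustration rather than the script's tensor code: the helper name filter_and_scale is hypothetical, and it assumes (as the script does) that each box is a normalized center-x, center-y, width, height tuple.

```python
def filter_and_scale(preds, conf_threshold, width, height):
    """Keep confident predictions and convert their boxes to pixel xyxy.

    preds: list of (cx, cy, w, h, scores), box values normalized to [0, 1]
    Returns a list of (x1, y1, x2, y2, class_index, score).
    """
    kept = []
    for cx, cy, w, h, scores in preds:
        score = max(scores)
        if score <= conf_threshold:  # same strict '>' test as the script
            continue
        # x values scale with the image width, y values with the height
        x1, y1 = (cx - w / 2) * width, (cy - h / 2) * height
        x2, y2 = (cx + w / 2) * width, (cy + h / 2) * height
        kept.append((x1, y1, x2, y2, scores.index(score), score))
    return kept
```

Raising conf_threshold therefore shrinks the list of drawn boxes, and lowering it adds weaker detections.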

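4.5 Specifying the images and the save location

The image path and save location are set in the if __name__ == '__main__': block at the bottom of the script.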

if __name__ == '__main__':
    model = rtdetr_heatmap(**get_params())
    model(r'figures', 'result')


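In model(r'figures', 'result'):

The first argument, r'figures', is the path of the original images to draw heatmaps for. It can be a folder of images or a single image file.

The second argument, 'result', is the output folder for the finished heatmaps. Note that the script deletes this folder if it already exists and then recreates it, so do not point it at a folder that holds files you want to keep.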
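The way the two arguments map inputs to outputs can be previewed without running the model. A minimal sketch, assuming a hypothetical helper plan_outputs; unlike the script, it also skips files that are not images, and the suffix set is an assumption to adjust as needed.

```python
from pathlib import Path

# Assumption: suffixes treated as images; the script itself does not filter.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}

def plan_outputs(img_path, save_path):
    """Return (input, output) path pairs in the way __call__ walks them."""
    src, dst = Path(img_path), Path(save_path)
    if src.is_dir():
        # Folder input: one output per image, same file name
        return [
            (f, dst / f.name)
            for f in sorted(src.iterdir())
            if f.suffix.lower() in IMAGE_SUFFIXES
        ]
    # Single-file input: the script always writes result.png
    return [(src, dst / "result.png")]
```

Printing plan_outputs('figures', 'result') before a long run is a cheap way to confirm which files will be processed and where the results will go.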

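5. Heatmap Visualization Results

Running the file starts detection and heatmap drawing. The progress bar shown below does not fill up because of the confidence threshold: the remaining iterations belong to predictions that do not meet the threshold, and skipping them does not affect the output.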
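The interplay between ratio and conf_threshold that produces the partly filled progress bar can be sketched as follows. The helper name count_backprop_queries is hypothetical, and the figure of 300 predictions in the usage note is only an illustrative assumption.

```python
def count_backprop_queries(top_scores, conf, ratio):
    """Count how many predictions actually contribute to the backward pass.

    top_scores: highest class score of each prediction, sorted in
                descending order (the script sorts them the same way)
    Returns (used, budget), where budget is the length of the progress bar.
    """
    budget = int(len(top_scores) * ratio)
    used = 0
    for score in top_scores[:budget]:
        if score < conf:
            break  # the script breaks here, so the bar stops early
        used += 1
    return used, budget
```

With 300 predictions and ratio 0.02 the bar has 6 steps; if only 3 of the top scores reach conf_threshold, the loop stops after 3 and the bar stays half full.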


[Figure: rendered heatmap result]
