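Essential for papers: RT-DETR heatmap visualization with a selectable model, selectable display layers, a configurable confidence threshold, and 10 visualization methods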
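1. Introduction
This article presents a heatmap visualization tool for RT-DETR. It supports choosing the model, choosing the layers to display, setting the confidence threshold, and 10 visualization methods.
Papers often include heatmaps of various kinds. They give readers an intuitive sense of how well a model works, and they also enrich the paper. This is especially useful for networks that use attention modules, where a heatmap can verify whether the attention mechanism really focuses on the expected important features, which helps assess whether the model is effective and behaves sensibly. The heatmaps shown in the paper Centralized Feature Pyramid for Object Detection, for example, clearly convey how the authors' improved model outperforms the previous one.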
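2. Complete project code
Create a new file named `heatmap.py` in the project root and paste in the code given below.
The project mainly relies on the `grad-cam` package (imported as `pytorch_grad_cam`), which must be installed beforehand. The download can be slow, so using a mirror is recommended.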
pip install grad-cam==1.4.8
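Before running the script, it can help to confirm which version of the package is actually installed. The following is a minimal sketch using only the standard library. The helper name `installed_version` is hypothetical, and the distribution name `grad-cam` is taken from the install command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is missing."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("grad-cam")
    if version is None:
        print("grad-cam is not installed; run: pip install grad-cam==1.4.8")
    elif version != "1.4.8":
        print(f"grad-cam {version} found; this article pins 1.4.8")
    else:
        print("grad-cam 1.4.8 is installed")
```

Other versions may work, but the script passes arguments such as `use_cuda` that are tied to the pinned release, so a mismatch is worth checking first when something fails.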
The complete code is as follows:
import warnings
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
import torch, yaml, cv2, os, shutil
import numpy as np
np.random.seed(0)
import matplotlib.pyplot as plt
from tqdm import trange
from PIL import Image
from ultralytics.nn.tasks import attempt_load_weights
from ultralytics.utils.ops import xywh2xyxy
from pytorch_grad_cam import GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
from pytorch_grad_cam.utils.image import show_cam_on_image, scale_cam_image
from pytorch_grad_cam.activations_and_gradients import ActivationsAndGradients
def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True, scaleFill=False, scaleup=True, stride=32):
    # Resize and pad image while meeting stride-multiple constraints
    shape = im.shape[:2]  # current shape [height, width]
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    # Scale ratio (new / old)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:  # only scale down, do not scale up (for better val mAP)
        r = min(r, 1.0)
    # Compute padding
    ratio = r, r  # width, height ratios
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]  # wh padding
    if auto:  # minimum rectangle
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)  # wh padding
    elif scaleFill:  # stretch
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        ratio = new_shape[1] / shape[1], new_shape[0] / shape[0]  # width, height ratios
    dw /= 2  # divide padding into 2 sides
    dh /= 2
    if shape[::-1] != new_unpad:  # resize
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right, cv2.BORDER_CONSTANT, value=color)  # add border
    return im, ratio, (dw, dh)
class ActivationsAndGradients:
    """ Class for extracting activations and
    registering gradients from targeted intermediate layers """
    def __init__(self, model, target_layers, reshape_transform):
        self.model = model
        self.gradients = []
        self.activations = []
        self.reshape_transform = reshape_transform
        self.handles = []
        for target_layer in target_layers:
            self.handles.append(
                target_layer.register_forward_hook(self.save_activation))
            # Because of https://github.com/pytorch/pytorch/issues/61519,
            # we don't use backward hook to record gradients.
            self.handles.append(
                target_layer.register_forward_hook(self.save_gradient))
    def save_activation(self, module, input, output):
        activation = output
        if self.reshape_transform is not None:
            activation = self.reshape_transform(activation)
        self.activations.append(activation.cpu().detach())
    def save_gradient(self, module, input, output):
        if not hasattr(output, "requires_grad") or not output.requires_grad:
            # You can only register hooks on tensor requires grad.
            return
        # Gradients are computed in reverse order
        def _store_grad(grad):
            if self.reshape_transform is not None:
                grad = self.reshape_transform(grad)
            self.gradients = [grad.cpu().detach()] + self.gradients
        output.register_hook(_store_grad)
    def post_process(self, result):
        logits_ = result[:, 4:]
        boxes_ = result[:, :4]
        sorted, indices = torch.sort(logits_.max(1)[0], descending=True)
        return logits_[indices], boxes_[indices], xywh2xyxy(boxes_[indices]).cpu().detach().numpy()
    def __call__(self, x):
        self.gradients = []
        self.activations = []
        model_output = self.model(x)
        post_result, pre_post_boxes, post_boxes = self.post_process(model_output[0][0])
        return [[post_result, pre_post_boxes]]
    def release(self):
        for handle in self.handles:
            handle.remove()
class rtdetr_target(torch.nn.Module):
    def __init__(self, ouput_type, conf, ratio) -> None:
        super().__init__()
        self.ouput_type = ouput_type
        self.conf = conf
        self.ratio = ratio
    def forward(self, data):
        post_result, pre_post_boxes = data
        result = []
        # only the top `ratio` fraction of the score-sorted predictions is considered
        for i in trange(int(post_result.size(0) * self.ratio)):
            if float(post_result[i].max()) < self.conf:
                break
            if self.ouput_type == 'class' or self.ouput_type == 'all':
                result.append(post_result[i].max())
            # a second `if` (not `elif`), so that 'all' uses both class scores and boxes
            if self.ouput_type == 'box' or self.ouput_type == 'all':
                for j in range(4):
                    result.append(pre_post_boxes[i, j])
        return sum(result)
class rtdetr_heatmap:
    def __init__(self, weight, device, method, layer, backward_type, conf_threshold, ratio, show_box, renormalize):
        device = torch.device(device)
        ckpt = torch.load(weight)
        model_names = ckpt['model'].names
        model = attempt_load_weights(weight, device)
        model.info()
        for p in model.parameters():
            p.requires_grad_(True)
        model.eval()
        target = rtdetr_target(backward_type, conf_threshold, ratio)
        target_layers = [model.model[l] for l in layer]
        method = eval(method)(model, target_layers, use_cuda=device.type == 'cuda')
        method.activations_and_grads = ActivationsAndGradients(model, target_layers, None)
        colors = np.random.uniform(0, 255, size=(len(model_names), 3)).astype(np.int64)
        self.__dict__.update(locals())
    def post_process(self, result, shape):
        logits_ = result[:, 4:]
        boxes_ = result[:, :4]
        # filter
        score, cls = logits_.max(1, keepdim=True)
        idx = (score > self.conf_threshold).squeeze()
        logits_, boxes_ = logits_[idx], boxes_[idx]
        # xywh -> xyxy
        h, w = shape
        boxes_ = xywh2xyxy(boxes_)
        # x coordinates scale with the width, y coordinates with the height
        boxes_[:, 0] *= w
        boxes_[:, 2] *= w
        boxes_[:, 1] *= h
        boxes_[:, 3] *= h
        return torch.cat([boxes_, logits_], dim=1)
    def draw_detections(self, box, color, name, img):
        xmin, ymin, xmax, ymax = list(map(int, list(box)))
        cv2.rectangle(img, (xmin, ymin), (xmax, ymax), tuple(int(x) for x in color), 2)
        cv2.putText(img, str(name), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.8, tuple(int(x) for x in color), 2, lineType=cv2.LINE_AA)
        return img
    def renormalize_cam_in_bounding_boxes(self, boxes, image_float_np, grayscale_cam):
        """Normalize the CAM to be in the range [0, 1]
        inside every bounding box, and zero outside of the bounding boxes. """
        h, w, _ = image_float_np.shape
        renormalized_cam = np.zeros(grayscale_cam.shape, dtype=np.float32)
        for x1, y1, x2, y2 in boxes:
            x1, y1 = max(x1, 0), max(y1, 0)
            x2, y2 = min(grayscale_cam.shape[1] - 1, x2), min(grayscale_cam.shape[0] - 1, y2)
            renormalized_cam[y1:y2, x1:x2] = scale_cam_image(grayscale_cam[y1:y2, x1:x2].copy())
        renormalized_cam = scale_cam_image(renormalized_cam)
        eigencam_image_renormalized = show_cam_on_image(image_float_np, renormalized_cam, use_rgb=True)
        return eigencam_image_renormalized
    def process(self, img_path, save_path):
        # img process
        img = cv2.imread(img_path)
        ori_h, ori_w = img.shape[:2]
        img = letterbox(img, auto=False, scaleFill=True)[0]
        img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        img = np.float32(img) / 255.0
        tensor = torch.from_numpy(np.transpose(img, axes=[2, 0, 1])).unsqueeze(0).to(self.device)
        try:
            grayscale_cam = self.method(tensor, [self.target])
        except AttributeError as e:
            # typically raised when no prediction passes the confidence threshold,
            # so there is nothing to backpropagate; skip this image
            return
        grayscale_cam = grayscale_cam[0, :]
        cam_image = show_cam_on_image(img, grayscale_cam, use_rgb=True)
        pred = self.model(tensor)[0][0]
        pred = self.post_process(pred, img.shape[:2])
        if self.renormalize:
            cam_image = self.renormalize_cam_in_bounding_boxes(pred[:, :4].cpu().detach().numpy().astype(np.int32), img, grayscale_cam)
        if self.show_box:
            for data in pred:
                data = data.cpu().detach().numpy()
                cam_image = self.draw_detections(data[:4], self.colors[int(data[4:].argmax())], f'{self.model_names[int(data[4:].argmax())]} {float(data[4:].max()):.2f}', cam_image)
        cam_image = cv2.resize(cam_image, (ori_w, ori_h))
        cam_image = Image.fromarray(cam_image)
        cam_image.save(save_path)
    def __call__(self, img_path, save_path):
        # remove dir if exist
        if os.path.exists(save_path):
            shutil.rmtree(save_path)
        # make dir if not exist
        os.makedirs(save_path, exist_ok=True)
        if os.path.isdir(img_path):
            for img_path_ in os.listdir(img_path):
                self.process(f'{img_path}/{img_path_}', f'{save_path}/{img_path_}')
        else:
            self.process(img_path, f'{save_path}/result.png')
def get_params():
    params = {
        'weight': 'runs/detect/train/weights/best.pt',
        'device': 'cuda:0',
        'method': 'GradCAMPlusPlus',  # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
        'layer': [15, 17, 23],
        'backward_type': 'all',  # class, box, all
        'conf_threshold': 0.2,  # 0.2
        'ratio': 0.02,
        'show_box': True,
        'renormalize': True
    }
    return params
if __name__ == '__main__':
    model = rtdetr_heatmap(**get_params())
    model(r'figures', 'result')
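The `post_process` method of `rtdetr_heatmap` converts the predicted boxes from center format to corner format and then scales them to pixels. The conversion can be sketched in plain Python as follows. The function name `xywh_norm_to_xyxy_pixels` is hypothetical, and the sketch assumes, as the scaling in the script implies, that the model emits boxes as normalized `(cx, cy, w, h)` values in the range 0 to 1.

```python
def xywh_norm_to_xyxy_pixels(box, img_w, img_h):
    """Convert a normalized (cx, cy, w, h) box to pixel (x1, y1, x2, y2)."""
    cx, cy, bw, bh = box
    # x coordinates are scaled by the image width, y coordinates by the height
    x1 = (cx - bw / 2) * img_w
    y1 = (cy - bh / 2) * img_h
    x2 = (cx + bw / 2) * img_w
    y2 = (cy + bh / 2) * img_h
    return x1, y1, x2, y2


# A box centered in a 640x480 image, covering half of each side
box = xywh_norm_to_xyxy_pixels((0.5, 0.5, 0.5, 0.5), img_w=640, img_h=480)
print(box)  # (160.0, 120.0, 480.0, 360.0)
```

In the script itself the input is always stretched to 640x640 (`letterbox(..., scaleFill=True)`), so width and height happen to be equal there. The distinction matters as soon as a non-square input size is used.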
3. Parameter reference
The parameters that need configuring are mainly in the `get_params()` function:
def get_params():
    params = {
        'weight': 'runs/detect/train/weights/best.pt',
        'device': 'cuda:0',
        'method': 'GradCAMPlusPlus',  # GradCAMPlusPlus, GradCAM, XGradCAM, EigenCAM, HiResCAM, LayerCAM, RandomCAM, EigenGradCAM
        'layer': [15, 17, 23],
        'backward_type': 'all',  # class, box, all
        'conf_threshold': 0.2,  # 0.2
        'ratio': 0.02,
        'show_box': True,
        'renormalize': True
    }
    return params
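What each parameter means:
| Parameter | Description |
|---|---|
| weight | Path to the weight file produced by training |
| device | Device to run on; set it the same way as the `device` argument used for training |
| method | Visualization method. The code comment lists the available choices; try several, since each gives a different result |
| layer | Indices of the layers whose heatmaps should be output. Try different layers and feature-map sizes and compare the results |
| backward_type | What the backward pass is computed from: `class` uses the maximum class probability, `box` uses the box coordinates, and `all` uses both |
| conf_threshold | Object confidence threshold |
| ratio | Fraction of the top-ranked predictions (sorted by score) to use; set to 0.02 |
| show_box | Whether to draw detection boxes: `True` draws them, `False` does not |
| renormalize | Whether to renormalize the heatmap inside the detected boxes (scaled to [0, 1] inside each box, zero outside) |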
In the screenshot, the numbers in the red box are the row (layer) numbers.
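How `ratio` and `conf_threshold` interact is easier to see in isolation. The sketch below mirrors the loop in `rtdetr_target.forward` using plain Python lists instead of tensors. The function name `select_targets` is hypothetical, and the count of 300 predictions per image is an assumption made for the example.

```python
def select_targets(scores, ratio, conf):
    """Return the scores that would contribute to the backward target.

    scores: best class score per prediction (any order)
    ratio:  fraction of predictions to consider after sorting by score
    conf:   stop as soon as a score drops below this value
    """
    ranked = sorted(scores, reverse=True)
    limit = int(len(ranked) * ratio)
    selected = []
    for score in ranked[:limit]:
        if score < conf:
            break  # every later score is lower still, so stop early
        selected.append(score)
    return selected


# Assumed example: 300 predictions, three confident ones, the rest near zero
scores = [0.9, 0.75, 0.4] + [0.01] * 297
picked = select_targets(scores, ratio=0.02, conf=0.2)
print(len(picked), picked)  # 3 [0.9, 0.75, 0.4]
```

With `ratio=0.02` at most 6 of the 300 predictions are examined, and the loop stops at the first one below `conf`. That early stop is also why the progress bar printed by the script can end before it is full.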
4. Usage
4.1 Specifying the model
The parameter to change for the model is `weight`.
'weight': 'runs/detect/train/weights/best.pt'
Change the value to the path of your own weight file.
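4.2 Setting the visualization method
The parameter to change for the visualization method is `method`.
'method': 'GradCAMPlusPlus'
The following 10 methods are listed as options:
`GradCAMPlusPlus`, `GradCAM`, `XGradCAM`, `EigenCAM`, `HiResCAM`, `LayerCAM`, `RandomCAM`, `EigenGradCAM`, `ScoreCAM`, `GradCAMElementWise`
Note that the script in Section 2 imports only the first eight. `ScoreCAM` and `GradCAMElementWise` are not imported, so selecting either one as-is raises a `NameError`; they would first have to be added to the `from pytorch_grad_cam import ...` line, and may need further adaptation.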
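The script turns the method name into a class with `eval(method)`, which fails with a bare `NameError` on a typo or an unimported name. A hedged alternative is to validate the name against an explicit list first. In the sketch below, `resolve_method` and `SUPPORTED_METHODS` are hypothetical names, and a stand-in namespace replaces the real `pytorch_grad_cam` module so that the example runs on its own.

```python
from types import SimpleNamespace

# The eight class names that the script imports from pytorch_grad_cam
SUPPORTED_METHODS = (
    "GradCAMPlusPlus", "GradCAM", "XGradCAM", "EigenCAM",
    "HiResCAM", "LayerCAM", "RandomCAM", "EigenGradCAM",
)


def resolve_method(name, namespace, allowed=SUPPORTED_METHODS):
    """Look up a CAM class by name on `namespace`, rejecting unknown names."""
    if name not in allowed:
        raise ValueError(f"Unknown method {name!r}; choose one of: {', '.join(allowed)}")
    return getattr(namespace, name)


# Stand-in namespace holding dummy classes, used only to keep this sketch runnable.
# In the real script, the imported pytorch_grad_cam module would be passed instead.
fake_module = SimpleNamespace(**{n: type(n, (), {}) for n in SUPPORTED_METHODS})
cam_class = resolve_method("GradCAM", fake_module)
print(cam_class.__name__)  # GradCAM
```

Compared with `eval`, this gives a clear error message listing the valid choices and avoids executing an arbitrary string taken from the configuration.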
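4.3 Specifying the display layers
The parameter to change for the display layers is `layer`.
'layer': [15,17,23]
Any other layer indices can be used, as long as the index is not -1.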
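A small check before building the heatmap object can catch unusable indices early. The sketch below is an interpretation rather than part of the original script: it assumes the "not -1" rule means the final layer should be avoided, so it also rejects the explicit index of that layer. The function name `validate_layers` and the layer count of 29 are hypothetical.

```python
def validate_layers(layers, num_layers):
    """Return `layers` as a list if every index is usable, else raise ValueError.

    num_layers is the length of the model's layer list (a made-up value below).
    """
    last = num_layers - 1  # the same layer that index -1 refers to
    for idx in layers:
        if idx < 0:
            raise ValueError(f"negative index {idx} is not supported; write the layer number explicitly")
        if idx > last:
            raise ValueError(f"layer {idx} is out of range; valid indices are 0..{last}")
        if idx == last:
            raise ValueError(f"layer {idx} is the final layer (same as -1); pick an earlier layer")
    return list(layers)


print(validate_layers([15, 17, 23], num_layers=29))  # [15, 17, 23]
```

In the real script, the layer count could be taken from `len(model.model)` after the weights are loaded.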
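4.4 Setting the confidence threshold
The parameter to change for the confidence threshold is `conf_threshold`.
'conf_threshold': 0.2
It mainly affects detection: predictions scoring below the threshold are neither drawn as boxes nor used as targets for the backward pass.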
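4.5 Specifying the images and the save location
The input images and the save location are set in the `__main__` block.
if __name__ == '__main__':
    model = rtdetr_heatmap(**get_params())
    model(r'figures', 'result')
In `model(r'figures', 'result')`:
The first argument, `r'figures'`, is the path of the source images to draw heatmaps for. It can be a single image or a folder of images.
The second argument, `'result'`, is the folder the finished heatmaps are written to. If this folder already exists, the script deletes it before writing.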
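The way the two arguments map input files to output files can be sketched without loading a model. The helper below mirrors the branching in `rtdetr_heatmap.__call__`. The name `plan_outputs` is hypothetical, and the `sorted` call is an addition for deterministic ordering (the script uses `os.listdir` directly).

```python
import os


def plan_outputs(img_path, save_path):
    """Return (input_file, output_file) pairs the way the script pairs them."""
    if os.path.isdir(img_path):
        # A folder: every entry is processed and saved under the same file name
        names = sorted(os.listdir(img_path))
        return [(f"{img_path}/{name}", f"{save_path}/{name}") for name in names]
    # A single file: the result is always saved as result.png
    return [(img_path, f"{save_path}/result.png")]


# Single-image case (assumes no directory named "cat.jpg" exists)
print(plan_outputs("cat.jpg", "result"))  # [('cat.jpg', 'result/result.png')]
```

Because every entry of an input folder is passed on, the folder should contain only image files. Anything worth keeping in the output folder should be moved elsewhere first, since the script removes that folder on each run.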
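5. Heatmap visualization results
Running the file starts detection and draws the heatmaps. If the progress bar stops before it is full, that is a consequence of the confidence setting: the iterations that were skipped belong to predictions that did not meet the confidence threshold, and this does not affect the output.
Rendered results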