RT-DETR Improvement Strategy [Neck] | PRCV 2023, SBA (Selective Boundary Aggregation): a feature-fusion module that delineates object contours and recalibrates object positions to resolve blurred boundaries
一、Introduction

This article uses the SBA module from DuAT to improve the RT-DETR object-detection network. The SBA module borrows a distinctive idea from medical image segmentation for handling boundary information: through an innovative structural design, and while keeping computational complexity reasonable, it fuses shallow boundary-detail features with deep semantic information, achieving precise boundary-feature extraction and effective integration of semantics. Applied to RT-DETR, it lets the model focus on the boundary regions of target objects, suppress background and other irrelevant information, and strengthen the boundary representation of targets, thereby improving detection accuracy and localization precision in complex scenes.
二、SBA Overview

DuAT: Dual-Aggregation Transformer Network for Medical Image Segmentation

The SBA module plays an important role in medical image segmentation. It was designed to address blurred object boundaries in images, as well as the redundancy and inconsistency that arise when low-level and high-level features are fused directly.
2.1 Motivation

Medical images suffer from blurred object boundaries, and naively fusing low-level and high-level features introduces redundancy and inconsistency. Shallow features are rich in detail and have clear boundaries but carry little semantics, while deep features are semantically rich. To better delineate object contours and recalibrate object positions, a more effective feature-aggregation scheme is needed.
2.2 Structure

A novel Re-calibration Attention Unit (RAU) block is designed.
- Before fusion, it adaptively selects mutually complementary representations from its two inputs ($F^{s}$, $F^{b}$). Shallow and deep information are fed to the two RAU blocks in different ways: $F^{s}$ is the deep semantic information obtained by fusing the third and fourth encoder stages, while $F^{b}$ comes from the first backbone stage and carries rich boundary detail.
- The outputs of the two RAU blocks each pass through a 3×3 convolution and are then concatenated.
- The RAU function $PAU(\cdot,\cdot)$ proceeds as follows: linear mappings followed by sigmoid functions, $W_{\theta}(\cdot)$ and $W_{\phi}(\cdot)$, are applied to the input features $T_{1}$ and $T_{2}$, reducing the channel dimension to 32 and producing gate maps $T_{1}'$ and $T_{2}'$; the final result is then $T_{1} + T_{1} \odot T_{1}' + (1 - T_{1}') \odot (T_{2} \odot T_{2}')$, with the cross term resampled to the size of $T_{1}$, which matches the gated fusion in the implementation in Section 三.
2.3 Effect

By selectively aggregating boundary information and semantic information, SBA can more finely delineate object contours and recalibrate object positions, effectively alleviating boundary ambiguity. This improves the model's boundary accuracy and thereby its overall performance.
Paper: https://arxiv.org/pdf/2212.11677
Code: https://github.com/Barrett-python/DuAT
三、SBA Implementation

The implementation of the SBA module is as follows:
import torch
from torch import nn


def autopad(k, p=None, d=1):
    """
    Pads kernel to 'same' output shape, adjusting for optional dilation; returns padding size.

    `k`: kernel, `p`: padding, `d`: dilation.
    """
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initializes a standard convolution layer with optional batch normalization and activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Applies a convolution followed by batch normalization and an activation function to the input tensor `x`."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Applies a fused convolution and activation function to the input tensor `x`."""
        return self.act(self.conv(x))


def Upsample(x, size, align_corners=False):
    """Wrapper around the bilinear interpolation call."""
    return nn.functional.interpolate(x, size=size, mode='bilinear', align_corners=align_corners)


class SBA(nn.Module):
    """Selective Boundary Aggregation: gated cross-scale fusion of a deep (semantic) feature and a shallow (boundary-rich) feature."""

    def __init__(self, inc, input_dim=64):
        super().__init__()
        self.input_dim = input_dim
        self.d_in1 = Conv(input_dim // 2, input_dim // 2, 1)
        self.d_in2 = Conv(input_dim // 2, input_dim // 2, 1)
        self.conv = Conv(input_dim, input_dim, 3)
        self.fc1 = nn.Conv2d(inc[1], input_dim // 2, kernel_size=1, bias=False)
        self.fc2 = nn.Conv2d(inc[0], input_dim // 2, kernel_size=1, bias=False)
        self.Sigmoid = nn.Sigmoid()

    def forward(self, x):
        H_feature, L_feature = x  # high-level (deep) and low-level (shallow) inputs
        # project both inputs to input_dim // 2 channels
        L_feature = self.fc1(L_feature)
        H_feature = self.fc2(H_feature)
        # sigmoid gates for each branch
        g_L_feature = self.Sigmoid(L_feature)
        g_H_feature = self.Sigmoid(H_feature)
        L_feature = self.d_in1(L_feature)
        H_feature = self.d_in2(H_feature)
        # each branch keeps its own gated signal and takes the complementary
        # (1 - gate) part from the other scale, resampled to its own size
        L_feature = L_feature + L_feature * g_L_feature + (1 - g_L_feature) * Upsample(g_H_feature * H_feature, size=L_feature.size()[2:], align_corners=False)
        H_feature = H_feature + H_feature * g_H_feature + (1 - g_H_feature) * Upsample(g_L_feature * L_feature, size=H_feature.size()[2:], align_corners=False)
        H_feature = Upsample(H_feature, size=L_feature.size()[2:])
        out = self.conv(torch.cat([H_feature, L_feature], dim=1))  # concatenate and fuse with a 3x3 conv
        return out
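The core of the forward pass, stripped of the learned convolutions, can be sketched with plain tensors. This is a minimal illustration of the gated cross-scale mixing only (the `Conv` projections are omitted, and the tensor shapes are arbitrary examples), not a drop-in replacement for the module above:

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
h = torch.randn(1, 4, 5, 5)    # deep feature (smaller spatial size)
l = torch.randn(1, 4, 10, 10)  # shallow feature (larger spatial size)

# sigmoid gates, as in SBA.forward
g_l, g_h = torch.sigmoid(l), torch.sigmoid(h)
up = lambda x, size: F.interpolate(x, size=size, mode='bilinear', align_corners=False)

# each branch keeps its gated self plus the complementary part of the other scale
l_out = l + l * g_l + (1 - g_l) * up(g_h * h, l.shape[2:])
h_out = h + h * g_h + (1 - g_h) * up(g_l * l_out, h.shape[2:])

# align spatial sizes and concatenate along channels
out = torch.cat([up(h_out, l.shape[2:]), l_out], dim=1)
print(out.shape)  # torch.Size([1, 8, 10, 10])
```

Note that the output takes the shallow branch's spatial resolution, and the channel count doubles before the final 3×3 fusion convolution reduces it back in the real module.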
四、Integration Steps

4.1 Modification 1

① Under the ultralytics/nn/ directory, create an AddModules folder to hold the module code.
② Inside AddModules, create SBA.py and paste the code from Section 三 into it.
4.2 Modification 2

Inside the AddModules folder, create __init__.py (skip this if it already exists) and import the module in it:

from .SBA import *
4.3 Modification 3

In the ultralytics/nn/tasks.py file, the module class name needs to be added in two places.

First, import the module.
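A minimal way to do this, assuming the AddModules package created in 4.1 above, is a wildcard import next to the other module imports at the top of tasks.py:

```python
from ultralytics.nn.AddModules import *  # exposes SBA to parse_model
```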
Then, add the following branch to the parse_model function:
        elif m in {SBA}:
            c1 = [ch[x] for x in f]
            c2 = c1[-1]
            args = [c1, c2]
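To see what this branch produces, here is a standalone sketch of the channel bookkeeping for the first SBA layer (`[[-1, 7], 1, SBA, []]`): `ch` mimics parse_model's running list of per-layer output channels, with the values taken from the printed model in Section 六.

```python
# per-layer output channels of layers 0..12 (from the model printout)
ch = [48, 128, 128, 512, 512, 1024, 1024, 1024, 1024, 2048, 256, 256, 256]
f = [-1, 7]  # the 'from' field: previous layer (12) and backbone stage 3 (7)

c1 = [ch[x] for x in f]  # input channels of both sources: [256, 1024]
c2 = c1[-1]              # SBA's output width follows the last input: 1024
args = [c1, c2]
print(args)  # [[256, 1024], 1024]
```

This is exactly the `[[256, 1024], 1024]` argument list shown for layer 13 in the run output of Section 六.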
五、yaml Model File

5.1 Improved Model Version ⭐

Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a model file for training on your own dataset, rtdetr-l-SBA.yaml, in the same directory. Copy the contents of rtdetr-l.yaml into rtdetr-l-SBA.yaml and set nc to the number of classes in your dataset.

📌 The modification replaces the upsample and Concat operations in the neck with the SBA module.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [[-1, 7], 1, SBA, []]
  - [-1, 3, RepC3, [256]] # 14, fpn_blocks.0
  - [[-1, 3], 1, SBA, []]
  - [-1, 3, RepC3, [256]] # X3 (16), fpn_blocks.1
  - [[-1, 14], 1, SBA, []] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (18), pan_blocks.0
  - [[-1, 12], 1, SBA, []] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (20), pan_blocks.1
  - [[16, 18, 20], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
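With the yaml saved, the model can be built and inspected through the usual ultralytics API. A minimal sketch (the yaml path and the dataset file mydata.yaml are placeholders for your own setup, and this assumes the modified ultralytics source from Section 四 is the one installed):

```python
from ultralytics import RTDETR

# build the improved model from the new yaml (example path)
model = RTDETR('ultralytics/cfg/models/rt-detr/rtdetr-l-SBA.yaml')
model.info()  # prints the layer table shown in Section 六

# train on your own dataset (placeholder data yaml)
model.train(data='mydata.yaml', epochs=100, imgsz=640)
```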
六、Verifying the Result

Printing the network shows that SBA has been added to the model, which can now be trained.
rtdetr-l-SBA :
rtdetr-l-SBA summary: 711 layers, 45,855,427 parameters, 45,855,427 gradients, 172.8 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 1695360 ultralytics.nn.modules.block.HGBlock [512, 192, 1024, 5, 6, True, False]
6 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
7 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [1024, 1024, 3, 2, 1, False]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 [-1, 7] 1 10620928 ultralytics.nn.AddModules.SBA.SBA [[256, 1024], 1024]
14 -1 3 2494464 ultralytics.nn.modules.block.RepC3 [1024, 256, 3]
15 [-1, 3] 1 2689024 ultralytics.nn.AddModules.SBA.SBA [[256, 512], 512]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 [-1, 14] 1 689152 ultralytics.nn.AddModules.SBA.SBA [[256, 256], 256]
18 -1 3 2101248 ultralytics.nn.modules.block.RepC3 [256, 256, 3]
19 [-1, 12] 1 689152 ultralytics.nn.AddModules.SBA.SBA [[256, 256], 256]
20 -1 3 2101248 ultralytics.nn.modules.block.RepC3 [256, 256, 3]
21 [16, 18, 20] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]