RT-DETR改进策略【卷积层】| RFEM:感受野增强模块, 解决多尺度目标检测中因感受野不足导致的小目标信息丢失问题
一、本文介绍
本文记录的是
利用 RFEM 模块优化 RT-DETR 的目标检测网络模型
。
RFE(Receptive Field Enhancement Module)
模块的设计
解决了多尺度目标检测中因感受野不足导致的小目标信息丢失问题
。通过引入
扩张卷积
,在不增加计算量的前提下扩大有效感受野,
增强模型对不同尺度目标的特征表达能力
。
二、RFEM模块介绍
2.1 设计出发点
RFE模块(Receptive Field Enhancement Module)的设计出发点是解决多尺度人脸检测中因感受野不足导致的小目标信息丢失问题。传统特征金字塔网络(如FPN)在处理小尺度人脸时,多次卷积操作会导致浅层特征的空间信息逐渐丢失,而深层特征虽语义丰富但空间分辨率较低。
RFE模块通过引入扩张卷积,在不增加计算量的前提下扩大有效感受野,增强模型对不同尺度人脸的特征表达能力。
2.2 结构
RFE模块由以下核心部分构成:
- 多分支扩张卷积 :采用三个并行分支,分别使用扩张率为1、2、3的3×3卷积,捕捉不同范围的上下文信息。所有分支共享权重,减少参数冗余。
- 聚合与加权层 :通过平均池化融合多分支特征,并利用1×1卷积调整通道数,最后通过残差连接将输出与输入相加,避免梯度消失问题。
- 替换瓶颈结构 :将CSPDarknet53中的Bottleneck替换为RFE模块,优化P5层的特征提取能力,平衡模型深度与感受野扩张需求。
2.3 优势
- 多尺度适应性 :不同扩张率的卷积分支有效覆盖多尺度感受野,提升对小目标和密集人脸的检测能力。
- 轻量高效 :共享权重设计在增强模型能力的同时保持计算效率,避免过拟合风险。
- 鲁棒性提升 :残差连接确保梯度流畅传递,结合扩张卷积的上下文建模能力,显著改善遮挡和模糊人脸的检测效果。
论文: https://arxiv.org/pdf/2208.02019
源码: https://github.com/Krasjet-Yu/YOLO-FaceV2
三、RFEM的实现代码
RFEM
的实现代码如下:
import torch
import torch.nn as nn
class TridentBlock(nn.Module):
def __init__(self, c1, c2, stride=1, c=False, e=0.5, padding=[1, 2, 3], dilate=[1, 2, 3], bias=False):
super(TridentBlock, self).__init__()
self.stride = stride
self.c = c
c_ = int(c2 * e)
self.padding = padding
self.dilate = dilate
self.share_weightconv1 = nn.Parameter(torch.Tensor(c_, c1, 1, 1))
self.share_weightconv2 = nn.Parameter(torch.Tensor(c2, c_, 3, 3))
self.bn1 = nn.BatchNorm2d(c_)
self.bn2 = nn.BatchNorm2d(c2)
self.act = nn.SiLU()
nn.init.kaiming_uniform_(self.share_weightconv1, nonlinearity="relu")
nn.init.kaiming_uniform_(self.share_weightconv2, nonlinearity="relu")
if bias:
self.bias = nn.Parameter(torch.Tensor(c2))
else:
self.bias = None
if self.bias is not None:
nn.init.constant_(self.bias, 0)
def forward_for_small(self, x):
residual = x
out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
out = self.bn1(out)
out = self.act(out)
out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[0],
dilation=self.dilate[0])
out = self.bn2(out)
out += residual
out = self.act(out)
return out
def forward_for_middle(self, x):
residual = x
out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
out = self.bn1(out)
out = self.act(out)
out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[1],
dilation=self.dilate[1])
out = self.bn2(out)
out += residual
out = self.act(out)
return out
def forward_for_big(self, x):
residual = x
out = nn.functional.conv2d(x, self.share_weightconv1, bias=self.bias)
out = self.bn1(out)
out = self.act(out)
out = nn.functional.conv2d(out, self.share_weightconv2, bias=self.bias, stride=self.stride, padding=self.padding[2],
dilation=self.dilate[2])
out = self.bn2(out)
out += residual
out = self.act(out)
return out
def forward(self, x):
xm = x
base_feat = []
if self.c is not False:
x1 = self.forward_for_small(x)
x2 = self.forward_for_middle(x)
x3 = self.forward_for_big(x)
else:
x1 = self.forward_for_small(xm[0])
x2 = self.forward_for_middle(xm[1])
x3 = self.forward_for_big(xm[2])
base_feat.append(x1)
base_feat.append(x2)
base_feat.append(x3)
return base_feat
class RFEM(nn.Module):
def __init__(self, c1, c2, n=1, e=0.5, stride=1):
super(RFEM, self).__init__()
c = True
layers = []
layers.append(TridentBlock(c1, c2, stride=stride, c=c, e=e))
c1 = c2
for i in range(1, n):
layers.append(TridentBlock(c1, c2))
self.layer = nn.Sequential(*layers)
# self.cv = Conv(c2, c2)
self.bn = nn.BatchNorm2d(c2)
self.act = nn.SiLU()
def forward(self, x):
out = self.layer(x)
out = out[0] + out[1] + out[2] + x
out = self.act(self.bn(out))
return out
def autopad(k, p=None, d=1): # kernel, padding, dilation
"""Pad to 'same' shape outputs."""
if d > 1:
k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k] # actual kernel-size
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad
return p
class Conv(nn.Module):
"""Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
default_act = nn.SiLU() # default activation
def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
"""Initialize Conv layer with given arguments including activation."""
super().__init__()
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
self.bn = nn.BatchNorm2d(c2)
self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
def forward(self, x):
"""Apply convolution, batch normalization and activation to input tensor."""
return self.act(self.bn(self.conv(x)))
def forward_fuse(self, x):
"""Perform transposed convolution of 2D data."""
return self.act(self.conv(x))
class C3RFEM(nn.Module):
def __init__(self, c1, c2, n=1, shortcut=True, e=0.5): # ch_in, ch_out, number, shortcut, groups, expansion
super().__init__()
c_ = int(c2 * e) # hidden channels
self.cv1 = Conv(c1, c_, 1, 1)
self.cv2 = Conv(c1, c_, 1, 1)
self.cv3 = Conv(2 * c_, c2, 1) # act=FReLU(c2)
# self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, e=1.0) for _ in range(n)))
# self.rfem = RFEM(c_, c_, n)
self.m = nn.Sequential(*[RFEM(c_, c_, n=1, e=e) for _ in range(n)])
# self.m = nn.Sequential(*[CrossConv(c_, c_, 3, 1, g, 1.0, shortcut) for _ in range(n)])
def forward(self, x):
return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), dim=1))
四、创新模块
4.1 改进点⭐
模块改进方法
:加入
RFEM模块
(
第五节讲解添加步骤
)。
RFEM模块
添加后如下:
五、添加步骤
5.1 修改一
① 在
ultralytics/nn/
目录下新建
AddModules
文件夹用于存放模块代码
② 在
AddModules
文件夹下新建
RFEM.py
,将
第三节
中的代码粘贴到此处
5.2 修改二
在
AddModules
文件夹下新建
__init__.py
(已有则不用新建),在文件内导入模块:
from .RFEM import *
5.3 修改三
在
ultralytics/nn/modules/tasks.py
文件中,需要在两处位置添加各模块类名称。
首先:导入模块
其次:在
parse_model函数
中注册
C3RFEM
:
六、yaml模型文件
6.1 模型改进版本
此处以
ultralytics/cfg/models/rt-detr/rtdetr-l.yaml
为例,在同目录下创建一个用于自己数据集训练的模型文件
rtdetr-l-RFEM.yaml
。
将
rtdetr-l.yaml
中的内容复制到
rtdetr-l-RFEM.yaml
文件下,修改
nc
数量等于自己数据中目标的数量。
📌 模型的修改方法是将
骨干网络
中添加
C3RFEM模块
。
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, C3RFEM, [128]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, C3RFEM, [512]] # cm, c2, k, light, shortcut
- [-1, 6, C3RFEM, [512]]
- [-1, 6, C3RFEM, [512]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
- [-1, 6, C3RFEM, [1024]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
七、成功运行结果
打印网络模型可以看到
C3RFEM
已经加入到模型中,并可以进行训练了。
rtdetr-l-RFEM :
rtdetr-l-RFEM summary: 1,007 layers, 55,168,835 parameters, 55,168,835 gradients, 114.9 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 324480 ultralytics.nn.AddModules.RFEM.C3RFEM [128, 128]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [128, 512, 3, 2, 1, False]
5 -1 6 5131776 ultralytics.nn.AddModules.RFEM.C3RFEM [512, 512]
6 -1 6 5131776 ultralytics.nn.AddModules.RFEM.C3RFEM [512, 512]
7 -1 6 5131776 ultralytics.nn.AddModules.RFEM.C3RFEM [512, 512]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [512, 1024, 3, 2, 1, False]
9 -1 6 20487168 ultralytics.nn.AddModules.RFEM.C3RFEM [1024, 1024]
10 -1 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-RFEM summary: 1,007 layers, 55,168,835 parameters, 55,168,835 gradients, 114.9 GFLOPs