RT-DETR Improvement Strategy [Convolutional Layers] | AAAI 2025 FBRT-YOLO applied to RT-DETR to strengthen cross-layer feature fusion and multi-scale adaptability, with both l and resnet18 versions
1. Introduction
This post documents how the FCM and MKP modules from FBRT-YOLO are used to improve the RT-DETR object detection model.
In small-object detection tasks, conventional convolutions struggle to balance the fusion of spatial position information with deep semantic information, so small-object features are easily lost and multi-scale perception falls short, hurting both detection accuracy and efficiency. This post introduces the Feature Complementary Mapping module (FCM) and the Multi-Kernel Perception unit (MKP) from FBRT-YOLO into RT-DETR. Through channel splitting, directional transformation, and a complementary mapping mechanism, FCM embeds shallow spatial position information layer by layer into deep semantic features, alleviating the spatial-semantic mismatch in the backbone and letting the model capture the spatial localization of small objects more precisely. MKP cascades convolutions with kernels of several sizes followed by pointwise convolutions, strengthening feature perception for objects of different scales and enlarging the effective receptive field.
2. FBRT-YOLO Overview
FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection
FBRT-YOLO is a model for real-time aerial image detection. Its architecture is built around two core lightweight modules and shows strong performance on aerial detection tasks.
2.1 FBRT-YOLO Architecture
FBRT-YOLO contains two core lightweight modules: the Feature Complementary Mapping module (FCM) and the Multi-Kernel Perception unit (MKP).
- FCM integrates more spatial position information into semantically rich features, strengthening the representation of small objects;
- MKP captures multi-scale object information with convolution kernels of different sizes;
- In addition, for aerial image detection the baseline network is slimmed down, removing non-critical or redundant computation.
2.2 Motivation Behind the FCM Module
In deep networks, small-object information is easily lost, creating an information imbalance; meanwhile, the high-resolution spatial information of shallow layers is mismatched with the low-resolution semantic information of deep layers. Spatial position and semantic information are therefore poorly integrated, which hurts the detection and localization of small objects.
The FCM module is designed to address exactly these problems: it integrates object spatial position information more deeply into the network so that it aligns better with deep semantic information, improving small-object localization.
2.3 Structure of the FCM Module
- Channel splitting: the input channels are split by a ratio $\alpha$ into two parts of $\alpha C$ and $(1-\alpha)C$ channels, so the network can better acquire and process low-level spatial information as it deepens.
- Directional transformation: the two parts are fed into separate branches. $X^{1}$ passes through standard 3×3 convolutions to extract richer feature information $X^{C}$; $X^{2}$ passes through a pointwise convolution, preserving abundant shallow spatial position information $X^{S}$.
- Complementary mapping: $X^{C}$ and $X^{S}$ are mapped complementarily. Channel interaction and spatial interaction assign weights to the important information of each branch and map them onto the other branch, achieving complementary feature fusion and resolving the imprecise feature matching caused by feature dispersion.
- Feature aggregation: the channel weight $\omega_{1}$ and the spatial weight $\omega_{2}$ produced by complementary mapping are applied to the features containing $X^{S}$ and $X^{C}$ respectively, and the two branches are then combined to give $X^{FCM}$, which carries both the spatial and the semantic mapping (summarized in the formula below).
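Read directly from the reference implementation in Section 3 (my interpretation of the code, not a formula quoted from the paper), the complementary mapping and aggregation amount to

$$X^{FCM} = \mathrm{PW}\big(\,\omega_{2}(X^{S}) \odot X^{C} + \omega_{1}(X^{C}) \odot X^{S}\,\big)$$

where $\omega_{1}(\cdot)$ is the channel weight (depthwise conv → global average pooling → sigmoid), $\omega_{2}(\cdot)$ is the spatial weight (1×1 conv → BN → sigmoid), $\odot$ is element-wise multiplication, and $\mathrm{PW}$ is the final pointwise convolution.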
2.4 Advantages of the FCM Module
- Alleviates information imbalance: by fusing complementary information, shallow spatial position information is propagated into deeper layers, effectively mitigating the loss of object position information during backbone downsampling and promoting complementary learning of spatial and semantic information across backbone stages.
- Strengthens small-object detection: by folding spatial position information more effectively into deep semantic information, FCM improves the network's feature matching for small objects, strengthens their representation in deep layers, and thus raises small-object detection and localization accuracy.
- Low computational cost: information is processed and fused with relatively little computation, so detection performance improves without imposing a heavy computational burden, which helps real-time detection.
Paper: https://arxiv.org/pdf/2504.20670
Code: https://github.com/galaxy-oss/FCM
3. FBRT-YOLO Implementation Code
The implementation of FBRT-YOLO and its improvements is as follows:
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import LightConv
class Channel(nn.Module):
    """Channel interaction: per-channel weights from a depthwise conv, global average pooling and a sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, groups=dim)  # 3x3 depthwise conv
        self.apt = nn.AdaptiveAvgPool2d(1)  # global average pooling -> (B, C, 1, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.dwconv(x)
        x = self.apt(x)
        return self.sigmoid(x)  # channel weights, broadcast over H and W when multiplied
class Spatial(nn.Module):
    """Spatial interaction: a single-channel spatial weight map from a 1x1 conv, BN and a sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, 1, 1, 1)  # collapse channels into one spatial map
        self.bn = nn.BatchNorm2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        return self.sigmoid(x)  # spatial weights, broadcast over channels when multiplied
class FCM(nn.Module):
    """Feature Complementary Mapping: splits channels into semantic/spatial branches and fuses them complementarily."""

    def __init__(self, dim, dim_out):
        super().__init__()
        self.one = dim // 4        # channels of the semantic branch (alpha*C with alpha = 1/4)
        self.two = dim - dim // 4  # channels of the spatial branch
        self.conv1 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv12 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv123 = Conv(dim // 4, dim, 1, 1)      # expand semantic branch back to dim channels -> X^C
        self.conv2 = Conv(dim - dim // 4, dim, 1, 1)  # pointwise conv preserving spatial info -> X^S
        self.conv3 = Conv(dim, dim, 1, 1)             # final pointwise fusion
        self.spatial = Spatial(dim)
        self.channel = Channel(dim)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.one, self.two], dim=1)  # channel splitting
        x3 = self.conv1(x1)
        x3 = self.conv12(x3)
        x3 = self.conv123(x3)        # X^C: richer semantic features
        x4 = self.conv2(x2)          # X^S: shallow spatial position features
        x33 = self.spatial(x4) * x3  # spatial weights from X^S mapped onto X^C
        x44 = self.channel(x3) * x4  # channel weights from X^C mapped onto X^S
        return self.conv3(x33 + x44)  # X^FCM
class Pzconv(nn.Module):
    """Multi-Kernel Perception (MKP) unit: cascaded 3x3/5x5/7x7 depthwise convs interleaved with pointwise convs, plus a residual connection."""

    def __init__(self, dim, k=1, s=1, p=None, g=1, d=1, act=True):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 3, 1, 1, groups=dim)  # 3x3 depthwise
        self.conv2 = Conv(dim, dim, k=1, s=1)                  # pointwise
        self.conv3 = nn.Conv2d(dim, dim, 5, 1, 2, groups=dim)  # 5x5 depthwise
        self.conv4 = Conv(dim, dim, 1, 1)                      # pointwise
        self.conv5 = nn.Conv2d(dim, dim, 7, 1, 3, groups=dim)  # 7x7 depthwise

    def forward(self, x):
        y = self.conv1(x)
        y = self.conv2(y)
        y = self.conv3(y)
        y = self.conv4(y)
        y = self.conv5(y)
        return y + x  # residual connection; the kernel cascade enlarges the effective receptive field cheaply
def autopad(k, p=None, d=1): # kernel, padding, dilation
"""Pad to 'same' shape outputs."""
if d > 1:
k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k] # actual kernel-size
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad
return p
class Conv(nn.Module):
"""Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
default_act = nn.SiLU() # default activation
def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
"""Initialize Conv layer with given arguments including activation."""
super().__init__()
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
self.bn = nn.BatchNorm2d(c2)
self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
def forward(self, x):
"""Apply convolution, batch normalization and activation to input tensor."""
return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (for fused inference)."""
        return self.act(self.conv(x))
class HGBlock_FCM(nn.Module):
    """
    HGBlock of PPHGNetV2 (LightConv stack plus squeeze/excitation convs) with an FCM module appended.
    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """
def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
"""Initializes a CSP Bottleneck with 1 convolution using specified input and output channels."""
super().__init__()
block = LightConv if lightconv else Conv
self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act) # squeeze conv
self.ec = Conv(c2 // 2, c2, 1, 1, act=act) # excitation conv
self.add = shortcut and c1 == c2
        self.cv = FCM(c2, c2)  # FCM appended to the original HGBlock output
def forward(self, x):
"""Forward pass of a PPHGNetV2 backbone layer."""
y = [x]
y.extend(m(y[-1]) for m in self.m)
y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
return y + x if self.add else y
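A quick shape check for the modules above (my own sketch, not part of the original code; it assumes this file is on the import path and that ultralytics is installed for LightConv):

import torch

x = torch.randn(1, 64, 40, 40)
print(FCM(64, 64)(x).shape)   # torch.Size([1, 64, 40, 40])
print(Pzconv(64)(x).shape)    # torch.Size([1, 64, 40, 40])
print(HGBlock_FCM(64, 32, 64, k=3, n=6, shortcut=True)(x).shape)  # torch.Size([1, 64, 40, 40])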
4. Innovation Modules
4.1 Improvement ⭐
Module improvement method: an HGBlock based on the FCM module (the integration steps are covered in Section 5).
The first improvement method modifies the HGBlock module in RT-DETR, inserting FCM into the HGBlock module.
The improvement appends the FCM module to the HGBlock module; the resulting HGBlock_FCM class is the one already included at the end of the Section 3 code above.
4.2 Improvement ⭐
Module improvement method: the second method targets the resnet18 version. First configure the resnet18 version, then copy the code below and overwrite ResNet.py with it.
The improved code is as follows:
from collections import OrderedDict
import torch
import torch.nn as nn
import torch.nn.functional as F
def autopad(k, p=None, d=1): # kernel, padding, dilation
"""Pad to 'same' shape outputs."""
if d > 1:
k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k] # actual kernel-size
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad
return p
class Conv(nn.Module):
"""Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
default_act = nn.SiLU() # default activation
def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
"""Initialize Conv layer with given arguments including activation."""
super().__init__()
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
self.bn = nn.BatchNorm2d(c2)
self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
def forward(self, x):
"""Apply convolution, batch normalization and activation to input tensor."""
return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (for fused inference)."""
        return self.act(self.conv(x))
class Channel(nn.Module):
    """Channel interaction: per-channel weights from a depthwise conv, global average pooling and a sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, groups=dim)  # 3x3 depthwise conv
        self.apt = nn.AdaptiveAvgPool2d(1)  # global average pooling -> (B, C, 1, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.dwconv(x)
        x = self.apt(x)
        return self.sigmoid(x)  # channel weights, broadcast over H and W when multiplied
class Spatial(nn.Module):
    """Spatial interaction: a single-channel spatial weight map from a 1x1 conv, BN and a sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, 1, 1, 1)  # collapse channels into one spatial map
        self.bn = nn.BatchNorm2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        return self.sigmoid(x)  # spatial weights, broadcast over channels when multiplied
class FCM(nn.Module):
    """Feature Complementary Mapping: splits channels into semantic/spatial branches and fuses them complementarily."""

    def __init__(self, dim, dim_out):
        super().__init__()
        self.one = dim // 4        # channels of the semantic branch (alpha*C with alpha = 1/4)
        self.two = dim - dim // 4  # channels of the spatial branch
        self.conv1 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv12 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv123 = Conv(dim // 4, dim, 1, 1)      # expand semantic branch back to dim channels -> X^C
        self.conv2 = Conv(dim - dim // 4, dim, 1, 1)  # pointwise conv preserving spatial info -> X^S
        self.conv3 = Conv(dim, dim, 1, 1)             # final pointwise fusion
        self.spatial = Spatial(dim)
        self.channel = Channel(dim)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.one, self.two], dim=1)  # channel splitting
        x3 = self.conv1(x1)
        x3 = self.conv12(x3)
        x3 = self.conv123(x3)        # X^C: richer semantic features
        x4 = self.conv2(x2)          # X^S: shallow spatial position features
        x33 = self.spatial(x4) * x3  # spatial weights from X^S mapped onto X^C
        x44 = self.channel(x3) * x4  # channel weights from X^C mapped onto X^S
        return self.conv3(x33 + x44)  # X^FCM
class ConvNormLayer(nn.Module):
def __init__(self,
ch_in,
ch_out,
filter_size,
stride,
groups=1,
act=None):
super(ConvNormLayer, self).__init__()
self.act = act
self.conv = nn.Conv2d(
in_channels=ch_in,
out_channels=ch_out,
kernel_size=filter_size,
stride=stride,
padding=(filter_size - 1) // 2,
groups=groups)
self.norm = nn.BatchNorm2d(ch_out)
def forward(self, inputs):
out = self.conv(inputs)
out = self.norm(out)
if self.act:
out = getattr(F, self.act)(out)
return out
class SELayer(nn.Module):
def __init__(self, ch, reduction_ratio=16):
super(SELayer, self).__init__()
self.avg_pool = nn.AdaptiveAvgPool2d(1)
self.fc = nn.Sequential(
nn.Linear(ch, ch // reduction_ratio, bias=False),
nn.ReLU(inplace=True),
nn.Linear(ch // reduction_ratio, ch, bias=False),
nn.Sigmoid()
)
def forward(self, x):
b, c, _, _ = x.size()
y = self.avg_pool(x).view(b, c)
y = self.fc(y).view(b, c, 1, 1)
return x * y.expand_as(x)
class BasicBlock_FCM(nn.Module):
expansion = 1
def __init__(self,
ch_in,
ch_out,
stride,
shortcut,
act='relu',
variant='b',
att=False):
super(BasicBlock_FCM, self).__init__()
self.shortcut = shortcut
if not shortcut:
            if variant == 'd' and stride == 2:
                # add_sublayer is a PaddlePaddle API; in PyTorch build the shortcut as a Sequential
                self.short = nn.Sequential(OrderedDict([
                    ('pool', nn.AvgPool2d(kernel_size=2, stride=2, padding=0, ceil_mode=True)),
                    ('conv', ConvNormLayer(ch_in=ch_in, ch_out=ch_out, filter_size=1, stride=1))
                ]))
else:
self.short = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=1,
stride=stride)
self.branch2a = ConvNormLayer(
ch_in=ch_in,
ch_out=ch_out,
filter_size=3,
stride=stride,
act='relu')
self.branch2b = ConvNormLayer(
ch_in=ch_out,
ch_out=ch_out,
filter_size=3,
stride=1,
act=None)
        self.att = att
        if self.att:
            self.se = FCM(ch_out, ch_out)  # FCM in place of the usual SE attention
def forward(self, inputs):
out = self.branch2a(inputs)
out = self.branch2b(out)
if self.att:
out = self.se(out)
if self.shortcut:
short = inputs
else:
short = self.short(inputs)
out = out + short
out = F.relu(out)
return out
class BottleNeck(nn.Module):
expansion = 4
def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='d', att=False):
super().__init__()
if variant == 'a':
stride1, stride2 = stride, 1
else:
stride1, stride2 = 1, stride
width = ch_out
self.branch2a = ConvNormLayer(ch_in, width, 1, stride1, act=act)
self.branch2b = ConvNormLayer(width, width, 3, stride2, act=act)
self.branch2c = ConvNormLayer(width, ch_out * self.expansion, 1, 1)
self.shortcut = shortcut
if not shortcut:
if variant == 'd' and stride == 2:
self.short = nn.Sequential(OrderedDict([
('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
('conv', ConvNormLayer(ch_in, ch_out * self.expansion, 1, 1))
]))
else:
self.short = ConvNormLayer(ch_in, ch_out * self.expansion, 1, stride)
self.att = att
if self.att:
self.se = SELayer(ch_out * 4)
def forward(self, x):
out = self.branch2a(x)
out = self.branch2b(out)
out = self.branch2c(out)
if self.att:
out = self.se(out)
if self.shortcut:
short = x
else:
short = self.short(x)
out = out + short
out = F.relu(out)
return out
class Blocks(nn.Module):
def __init__(self,
ch_in,
ch_out,
count,
block,
stage_num,
att=False,
variant='b'):
super(Blocks, self).__init__()
self.blocks = nn.ModuleList()
block = globals()[block]
for i in range(count):
self.blocks.append(
block(
ch_in,
ch_out,
stride=2 if i == 0 and stage_num != 2 else 1,
shortcut=False if i == 0 else True,
variant=variant,
att=att)
)
if i == 0:
ch_in = ch_out * block.expansion
def forward(self, inputs):
block_out = inputs
for block in self.blocks:
block_out = block(block_out)
return block_out
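A quick shape check for the modified basic block (my own sketch, not part of the file above):

import torch

x = torch.randn(1, 64, 80, 80)
blk = Blocks(64, 64, count=2, block='BasicBlock_FCM', stage_num=2, att=True)
print(blk(x).shape)  # torch.Size([1, 64, 80, 80]); stage_num=2 keeps stride 1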
Note ❗: the module name that needs to be declared in Section 5 is HGBlock_FCM.
5. Integration Steps
5.1 Change 1
① Under the ultralytics/nn/ directory, create an AddModules folder to hold the module code.
② Inside the AddModules folder, create FBRT_YOLO.py and paste the code from Section 3 into it.
5.2 Change 2
In the AddModules folder, create __init__.py (skip this if it already exists) and import the module inside it:
from .FBRT_YOLO import *
5.3 Change 3
In the ultralytics/nn/tasks.py file, the module class names must be added in the required places.
First: import the module.
Then: register the HGBlock_FCM module in the parse_model function, as sketched below.
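A minimal sketch of both edits (the surrounding code differs across ultralytics versions, so treat the elif shown here as illustrative; stock tasks.py already handles HGStem/HGBlock with a branch of this shape):

# At the top of ultralytics/nn/tasks.py:
from ultralytics.nn.AddModules import HGBlock_FCM

# Inside parse_model, extend the existing HGStem/HGBlock branch:
elif m in {HGStem, HGBlock, HGBlock_FCM}:
    cm, c2 = args[0], args[1]
    args = [c1, cm, c2, *args[2:]]
    if m in {HGBlock, HGBlock_FCM}:
        args.insert(4, n)  # number of LightConv repeats
        n = 1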
For the resnet18 version, tasks.py only needs to be set up once following the tutorial steps; once that configuration is done, it does not need to be repeated here.
6. YAML Model Files
6.1 Improved Model Version ⭐
Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a model file for training on your own dataset in the same directory, named rtdetr-l-FBRT_YOLO.yaml.
Copy the contents of rtdetr-l.yaml into rtdetr-l-FBRT_YOLO.yaml and change nc to the number of object classes in your dataset.
📌 The modification here improves HGBlock in rtdetr-l using FBRT-YOLO.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, HGBlock_FCM, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock_FCM, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock_FCM, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
- [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
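With the YAML in place, training runs as usual (a minimal sketch; data.yaml is a placeholder for your own dataset file):

from ultralytics import RTDETR

model = RTDETR("ultralytics/cfg/models/rt-detr/rtdetr-l-FBRT_YOLO.yaml")
model.train(data="data.yaml", epochs=100, imgsz=640, batch=8)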
6.2 Improved Model Version ⭐
Taking rtdetr-resnet18.yaml as an example, create a model file for training on your own dataset in the same directory, named rtdetr-resnet18-FBRT_YOLO.yaml.
Copy the contents of rtdetr-resnet18.yaml into rtdetr-resnet18-FBRT_YOLO.yaml and change nc to the number of object classes in your dataset.
📌 The modification here adds the FCM module to the network's BasicBlock modules.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P5 outputs.
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 0-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 1
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2
- [-1, 2, Blocks, [64, BasicBlock_FCM, 2, True]] # 4
- [-1, 2, Blocks, [128, BasicBlock_FCM, 3, True]] # 5-P3
- [-1, 2, Blocks, [256, BasicBlock_FCM, 4, True]] # 6-P4
- [-1, 2, Blocks, [512, BasicBlock_FCM, 5, True]] # 7-P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
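The workflow is the same as in 6.1, pointing RTDETR at the resnet18-based config instead (a minimal sketch; the path assumes the file was created as described above):

from ultralytics import RTDETR

model = RTDETR("ultralytics/cfg/models/rt-detr/rtdetr-resnet18-FBRT_YOLO.yaml")
model.info()  # prints a layer table like the one shown in Section 7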
7. Successful Run Output
Printing the network model shows that HGBlock_FCM and BasicBlock_FCM have been added to the model, which is now ready for training.
rtdetr-l-HGBlock_FCM:
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 4990595 ultralytics.nn.AddModules.FBRT_YOLO.HGBlock_FCM[512, 192, 1024, 5, 6, True, False]
6 -1 6 5351043 ultralytics.nn.AddModules.FBRT_YOLO.HGBlock_FCM[1024, 192, 1024, 5, 6, True, True]
7 -1 6 5351043 ultralytics.nn.AddModules.FBRT_YOLO.HGBlock_FCM[1024, 192, 1024, 5, 6, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [1024, 1024, 3, 2, 1, False]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-HGBlock_FCM summary: 755 layers, 42,693,836 parameters, 42,693,836 gradients, 139.7 GFLOPs
rtdetr-resnet18-FBRT_YOLO:
from n params module arguments
0 -1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
1 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
2 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
3 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
4 -1 2 180422 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock_FCM', 2, True]
5 -1 2 633222 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock_FCM', 3, True]
6 -1 2 2519814 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock_FCM', 4, True]
7 -1 2 10053126 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock_FCM', 5, True]
8 -1 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
9 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
10 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
11 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
12 6 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
13 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
14 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
15 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
16 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
17 5 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
18 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
19 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
20 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
21 [-1, 15] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
23 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
24 [-1, 10] 1 0 ultralytics.nn.modules.conv.Concat [1]
25 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
26 [19, 22, 25] 1 3917684 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-FBRT_YOLO summary: 582 layers, 22,298,284 parameters, 22,298,284 gradients, 63.8 GFLOPs