RT-DETR Improvement Strategy [Convolution Layer] | SAConv Switchable Atrous Convolution, Secondary Innovation on ResNetLayer
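1. Introduction
This article describes how to use SAConv to optimize the RT-DETR object detection model.
Atrous convolution is a technique that enlarges a filter's field of view by inserting holes between the elements of the convolution kernel, without increasing the number of parameters or the amount of computation. To help the model adapt to objects of different scales, this article uses SAConv to combine the outputs of convolutions with different atrous rates. The goal is a more comprehensive feature representation and higher detection accuracy.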
2. Introduction to SAConv
DetectoRS: Detecting Objects with Recursive Feature Pyramid and Switchable Atrous Convolution
The design motivation, principle, structure, and advantages of the Switchable Atrous Convolution (SAC) module are as follows:
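2.1 Design Motivation
- Improving detection performance: the design is inspired by the "looking and thinking twice" mechanism in computer vision. At the micro level, it convolves features with different atrous rates to better capture object information in the image and improve detection performance.
- Adapting to objects of different scales: combining convolution results at different atrous rates yields a more comprehensive feature representation, which lets the model handle objects of different sizes.
- Making pretrained models easy to reuse: the module provides a mechanism to convert a pretrained standard convolutional network easily, without training the whole model from scratch.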
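2.2 Principle
- Atrous convolution: atrous convolution enlarges a filter's field of view by inserting holes between kernel elements, without increasing parameters or computation. An atrous convolution with rate $r$ inserts $r - 1$ zeros between consecutive filter values. This is equivalent to enlarging a $k \times k$ kernel to an effective size of $k_e = k + (k - 1)(r - 1)$.
- SAC: the SAConv module convolves the same input features with different atrous rates and gathers the results using switch functions. The switch functions are spatially dependent, i.e., each location of the feature map may have a different switch controlling the output of SAConv. In this way, the model can adaptively choose the appropriate atrous-rate result according to the scale of the objects at each location in the image.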
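The effective-size formula can be checked with a few lines of plain Python. The sketch below computes $k_e$ and the padding that keeps a stride-1 output the same size, assuming an odd $k$. The helper names `effective_kernel_size` and `same_padding` are illustrative and do not come from the DetectoRS code. For $k = 3$, the padding goes from 1 at rate 1 to 3 at rate 3. This is consistent with the SAConv2d code in Section 3, which multiplies both padding and dilation by 3 for its large-rate branch.

```python
def effective_kernel_size(k: int, r: int) -> int:
    """Spatial extent of a k x k kernel dilated with atrous rate r: k + (k - 1)(r - 1)."""
    return k + (k - 1) * (r - 1)


def same_padding(k: int, r: int) -> int:
    """Padding that keeps a stride-1 output the same size (assumes odd k)."""
    return effective_kernel_size(k, r) // 2


for k, r in [(3, 1), (3, 3), (1, 3)]:
    print(f"k={k}, r={r}: k_e={effective_kernel_size(k, r)}, padding={same_padding(k, r)}")
# k=3, r=1: k_e=3, padding=1
# k=3, r=3: k_e=7, padding=3
# k=1, r=3: k_e=1, padding=0
```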
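2.3 Structure
- Main components: the SAConv module consists of three parts: the SAC component in the middle and one global context module on each side of it.
- SAC component: this is the core part, which converts a convolutional layer into SAC. For a convolutional layer, the conversion is $\mathrm{Conv}(x, w, 1) \xrightarrow[\text{to SAC}]{\text{Convert}} S(x) \cdot \mathrm{Conv}(x, w, 1) + (1 - S(x)) \cdot \mathrm{Conv}(x, w + \Delta w, r)$. Here $r$ is a hyperparameter of SAC, $\Delta w$ is a trainable weight, and the switch function $S(\cdot)$ is implemented as a $5 \times 5$ average pooling layer followed by a $1 \times 1$ convolutional layer.
- Global context modules before and after: two global context modules are inserted, one before and one after the SAC component. Each first compresses the input features with a global average pooling layer, then applies a $1 \times 1$ convolutional layer with no nonlinearity, and adds the output directly back to the main stream. These modules resemble SENet but differ in two ways: there is only one convolutional layer, and the output is added back to the features rather than used to rescale them.
- Application to the backbone: in the implementation, every $3 \times 3$ convolutional layer in the backbone (e.g., ResNet and its variants) is replaced with SAConv. The convolutions in the formula are replaced with deformable convolutions, whose offset functions are initialized to predict $0$ when loading from a pretrained backbone.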
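The conversion formula and the two global context modules can be sketched functionally as follows. This is a simplified illustration, not the DetectoRS implementation, and it makes these assumptions:
- stride 1, an odd square kernel, and $r = 3$ (the factor used by the SAConv2d code in Section 3);
- a plain `F.conv2d` where the paper uses deformable convolution;
- no weight standardization.

The names `sac_forward`, `switch`, `pre_ctx`, and `post_ctx` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def sac_forward(x, weight, weight_diff, switch, pre_ctx, post_ctx, r=3):
    """Sketch of SAC (stride 1, odd k x k kernel), wrapped by pre/post global context.

    out = S(x) * Conv(x, w, 1) + (1 - S(x)) * Conv(x, w + dw, r)
    """
    pad = weight.shape[-1] // 2  # 'same' padding at rate 1
    # Pre-context: global average pool -> 1x1 conv -> add back (broadcast over H, W).
    x = x + pre_ctx(F.adaptive_avg_pool2d(x, 1))
    # Switch S(x): reflect-padded 5x5 average pooling followed by a 1x1 conv to one channel.
    s = switch(F.avg_pool2d(F.pad(x, (2, 2, 2, 2), mode="reflect"), kernel_size=5, stride=1))
    out_small = F.conv2d(x, weight, padding=pad, dilation=1)
    # Locked weights: the large-rate branch reuses w plus a trainable difference dw.
    out_large = F.conv2d(x, weight + weight_diff, padding=pad * r, dilation=r)
    out = s * out_small + (1 - s) * out_large
    # Post-context: the same global context trick applied to the output.
    return out + post_ctx(F.adaptive_avg_pool2d(out, 1))


torch.manual_seed(0)
c_in, c_out = 8, 4
x = torch.randn(2, c_in, 16, 16)
w = torch.randn(c_out, c_in, 3, 3)
dw = torch.zeros_like(w)
switch = nn.Conv2d(c_in, 1, kernel_size=1)
pre_ctx = nn.Conv2d(c_in, c_in, kernel_size=1)
post_ctx = nn.Conv2d(c_out, c_out, kernel_size=1)

y = sac_forward(x, w, dw, switch, pre_ctx, post_ctx)
print(y.shape)  # torch.Size([2, 4, 16, 16])
```

Because `S(x)` has a single channel, it broadcasts over all output channels: the switch chooses a rate per spatial location, not per channel.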
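2.4 Advantages
- Improved performance: by combining convolution results at different atrous rates, SAC captures multi-scale object information better and improves detection performance. Experiments on the COCO dataset show that the SAC module raises detection accuracy, with sizable AP gains under different backbone settings.
- Flexible adaptation to scale: because the switch functions are spatially dependent, the model can adapt its convolution to the location and scale of each object in the image. This helps it detect objects of different sizes and is especially effective for large objects, as reflected in a higher $AP_L$.
- Effective use of pretrained models: SAC provides an effective mechanism for converting standard convolutions into conditional convolutions without changing any pretrained model; one only replaces the convolutional layers in the backbone. SAConv can therefore serve as a plug-and-play module for many pretrained backbones, greatly reducing training cost and time.
- A novel weight-locking mechanism: the weights of the different atrous convolutions are identical except for a trainable difference. Experiments show that this mechanism helps stabilize training and improve performance; when the lock is broken, AP drops noticeably.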
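The plug-and-play property rests on initialization. In the Section 3 code, the following parameters start as shown:
- the switch convolution: weight 0 and bias 1;
- `weight_diff`: zero;
- both context convolutions: zero.

So S(x) = 1, and the converted layer initially computes exactly the original convolution with its pretrained weights. The minimal check below shows the SAC blend collapsing to the rate-1 branch under that initialization. A random kernel stands in for a pretrained one, and weight standardization, the 5×5 pooling, and deformable convolution are left out for brevity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
c_in, c_out, r = 8, 4, 3
x = torch.randn(1, c_in, 12, 12)
w = torch.randn(c_out, c_in, 3, 3)  # stands in for a pretrained 3x3 kernel

# Initialization mirrored from SAConv2d: switch weight 0 / bias 1, weight_diff 0.
switch = nn.Conv2d(c_in, 1, kernel_size=1)
nn.init.zeros_(switch.weight)
nn.init.ones_(switch.bias)
weight_diff = torch.zeros_like(w)

with torch.no_grad():
    s = switch(x)  # equals 1 everywhere, whatever the input
    out_small = F.conv2d(x, w, padding=1)
    out_large = F.conv2d(x, w + weight_diff, padding=r, dilation=r)  # locked weights: w + dw
    sac_out = s * out_small + (1 - s) * out_large

print(f"SAC output equals the original conv: {torch.allclose(sac_out, out_small)}")
```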
Paper: https://arxiv.org/pdf/2006.02334
Code: https://github.com/joe-siyuan-qiao/DetectoRS
3. SAConv Implementation Code
The implementation code of the SAConv module is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only (used after BatchNorm has been fused into the conv)."""
        return self.act(self.conv(x))


class ConvAWS2d(nn.Conv2d):
    """Conv2d with per-channel weight standardization and affine buffers (weight_gamma, weight_beta)."""

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size,
                 stride=1,
                 padding=0,
                 dilation=1,
                 groups=1,
                 bias=True):
        super().__init__(
            in_channels,
            out_channels,
            kernel_size,
            stride=stride,
            padding=padding,
            dilation=dilation,
            groups=groups,
            bias=bias)
        self.register_buffer('weight_gamma', torch.ones(self.out_channels, 1, 1, 1))
        self.register_buffer('weight_beta', torch.zeros(self.out_channels, 1, 1, 1))

    def _get_weight(self, weight):
        # Standardize each output channel's kernel to zero mean and unit variance...
        weight_mean = weight.mean(dim=1, keepdim=True).mean(dim=2,
                                                             keepdim=True).mean(dim=3, keepdim=True)
        weight = weight - weight_mean
        std = torch.sqrt(weight.view(weight.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)
        weight = weight / std
        # ...then rescale and shift it with the per-channel gamma/beta buffers.
        weight = self.weight_gamma * weight + self.weight_beta
        return weight

    def forward(self, x):
        weight = self._get_weight(self.weight)
        # Note: the bias is passed as None, so self.bias is not applied here.
        return super()._conv_forward(x, weight, None)

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        # Sentinel: if the checkpoint contains weight_gamma, loading overwrites this -1.
        self.weight_gamma.data.fill_(-1)
        super()._load_from_state_dict(state_dict, prefix, local_metadata, strict,
                                      missing_keys, unexpected_keys, error_msgs)
        if self.weight_gamma.data.mean() > 0:
            return
        # Plain (non-AWS) checkpoint: set gamma/beta to the kernel's per-channel std/mean
        # so that the standardized weight reproduces the pretrained weight.
        weight = self.weight.data
        weight_mean = weight.data.mean(dim=1, keepdim=True).mean(dim=2,
                                                                  keepdim=True).mean(dim=3, keepdim=True)
        self.weight_beta.data.copy_(weight_mean)
        std = torch.sqrt(weight.view(weight.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)
        self.weight_gamma.data.copy_(std)


class SAConv2d(ConvAWS2d):
    """Switchable Atrous Convolution: blends a dilation-d and a dilation-3d branch with a spatial switch."""

    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size=3,
                 s=1,
                 p=None,
                 g=1,
                 d=1,
                 act=True,
                 bias=True):
        super().__init__(
            in_channels,
            out_channels,
            kernel_size,
            stride=s,
            padding=autopad(kernel_size, p, d),
            dilation=d,
            groups=g,
            bias=bias)
        # Switch S(x): 1x1 conv to a single channel, initialized so that S(x) = 1.
        self.switch = torch.nn.Conv2d(
            self.in_channels,
            1,
            kernel_size=1,
            stride=s,
            bias=True)
        self.switch.weight.data.fill_(0)
        self.switch.bias.data.fill_(1)
        # Weight locking: the large-rate branch uses weight + weight_diff; weight_diff starts at 0.
        self.weight_diff = torch.nn.Parameter(torch.Tensor(self.weight.size()))
        self.weight_diff.data.zero_()
        # Global context modules before and after SAC, initialized to contribute nothing.
        self.pre_context = torch.nn.Conv2d(
            self.in_channels,
            self.in_channels,
            kernel_size=1,
            bias=True)
        self.pre_context.weight.data.fill_(0)
        self.pre_context.bias.data.fill_(0)
        self.post_context = torch.nn.Conv2d(
            self.out_channels,
            self.out_channels,
            kernel_size=1,
            bias=True)
        self.post_context.weight.data.fill_(0)
        self.post_context.bias.data.fill_(0)
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = Conv.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        # pre-context: global average pooling -> 1x1 conv -> add back to the input
        avg_x = torch.nn.functional.adaptive_avg_pool2d(x, output_size=1)
        avg_x = self.pre_context(avg_x)
        avg_x = avg_x.expand_as(x)
        x = x + avg_x
        # switch: reflect-padded 5x5 average pooling (size preserved) -> 1x1 conv
        avg_x = torch.nn.functional.pad(x, pad=(2, 2, 2, 2), mode="reflect")
        avg_x = torch.nn.functional.avg_pool2d(avg_x, kernel_size=5, stride=1, padding=0)
        switch = self.switch(avg_x)
        # sac: small-rate branch Conv(x, w, d)
        weight = self._get_weight(self.weight)
        out_s = super()._conv_forward(x, weight, None)
        # large-rate branch Conv(x, w + dw, 3d): temporarily triple padding and dilation
        ori_p = self.padding
        ori_d = self.dilation
        self.padding = tuple(3 * p for p in self.padding)
        self.dilation = tuple(3 * d for d in self.dilation)
        weight = weight + self.weight_diff
        out_l = super()._conv_forward(x, weight, None)
        out = switch * out_s + (1 - switch) * out_l
        self.padding = ori_p
        self.dilation = ori_d
        # post-context: same global context trick on the output
        avg_x = torch.nn.functional.adaptive_avg_pool2d(out, output_size=1)
        avg_x = self.post_context(avg_x)
        avg_x = avg_x.expand_as(out)
        out = out + avg_x
        return self.act(self.bn(out))


class ResNetBlock(nn.Module):
    """ResNet block with standard convolution layers."""

    def __init__(self, c1, c2, s=1, e=4):
        """Initialize convolution with given parameters."""
        super().__init__()
        c3 = e * c2
        self.cv1 = Conv(c1, c2, k=1, s=1, act=True)
        self.cv2 = Conv(c2, c2, k=3, s=s, p=1, act=True)
        self.cv3 = SAConv2d(c2, c3, kernel_size=1, act=False)  # SAConv2d replaces the 1x1 Conv
        self.shortcut = nn.Sequential(Conv(c1, c3, k=1, s=s, act=False)) if s != 1 or c1 != c3 else nn.Identity()

    def forward(self, x):
        """Forward pass through the ResNet block."""
        return F.relu(self.cv3(self.cv2(self.cv1(x))) + self.shortcut(x))


class ResNetLayer_SAConv(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first
        if self.is_first:
            self.layer = nn.Sequential(
                Conv(c1, c2, k=7, s=2, p=3, act=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
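`ConvAWS2d` standardizes each output channel's kernel and then applies `weight_gamma` and `weight_beta`. When a checkpoint does not contain these buffers, `_load_from_state_dict` detects it through the -1 sentinel. It then sets them to the pretrained kernel's per-channel standard deviation and mean, so the standardized kernel reproduces the pretrained one. The sketch below re-implements only that arithmetic, with a random tensor standing in for a pretrained kernel. `standardize` is an illustrative helper, not part of the code above.

```python
import torch


def standardize(weight: torch.Tensor, eps: float = 1e-5) -> torch.Tensor:
    """Per-output-channel standardization, as in ConvAWS2d._get_weight before gamma/beta."""
    centered = weight - weight.mean(dim=(1, 2, 3), keepdim=True)
    std = torch.sqrt(centered.view(weight.size(0), -1).var(dim=1) + eps).view(-1, 1, 1, 1)
    return centered / std


torch.manual_seed(0)
w = torch.randn(16, 8, 3, 3) * 0.1 + 0.05  # stands in for a pretrained 3x3 kernel

# Values that _load_from_state_dict copies into the buffers for a plain checkpoint.
gamma = torch.sqrt(w.view(w.size(0), -1).var(dim=1) + 1e-5).view(-1, 1, 1, 1)
beta = w.mean(dim=(1, 2, 3), keepdim=True)

w_hat = standardize(w)
max_abs_mean = w_hat.mean(dim=(1, 2, 3)).abs().max().item()
recovered = torch.allclose(gamma * w_hat + beta, w, atol=1e-6)
print(f"max |channel mean| after standardization: {max_abs_mean:.2e}")
print(f"pretrained kernel recovered: {recovered}")
```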
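4. Innovation Modules
4.1 Improvement 1
Module improvement method 1️⃣: use the SAConv module directly; its source code is listed in Section 3. The steps for adding it are explained in Section 5.
Note ❗: the module name to declare in Section 5 is SAConv2d.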
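4.2 Improvement 2⭐
Module improvement method 2️⃣: a ResNetLayer based on the SAConv module (the steps for adding it are explained in Section 5).
The second method improves the ResNetLayer module in RT-DETR. With SAConv inside ResNetLayer, combining convolution results at different atrous rates is intended to capture multi-scale object information better and thereby improve detection performance.
The improved code is as follows. First add the code below, which modifies the ResNetBlock module and renames ResNetLayer to ResNetLayer_SAConv: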
class ResNetBlock(nn.Module):
    """ResNet block with standard convolution layers."""

    def __init__(self, c1, c2, s=1, e=4):
        """Initialize convolution with given parameters."""
        super().__init__()
        c3 = e * c2
        self.cv1 = Conv(c1, c2, k=1, s=1, act=True)
        self.cv2 = Conv(c2, c2, k=3, s=s, p=1, act=True)
        self.cv3 = SAConv2d(c2, c3, kernel_size=1, act=False)  # changed: SAConv2d replaces the 1x1 Conv
        self.shortcut = nn.Sequential(Conv(c1, c3, k=1, s=s, act=False)) if s != 1 or c1 != c3 else nn.Identity()

    def forward(self, x):
        """Forward pass through the ResNet block."""
        return F.relu(self.cv3(self.cv2(self.cv1(x))) + self.shortcut(x))


class ResNetLayer_SAConv(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first
        if self.is_first:
            self.layer = nn.Sequential(
                Conv(c1, c2, k=7, s=2, p=3, act=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
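`cv3` is a 1×1 convolution, which matters for this placement. For $k = 1$, the effective size $k + (k - 1)(r - 1)$ stays 1, so the tripled dilation of the large-rate branch does not enlarge the receptive field. Both SAC branches therefore see the same neighborhood and differ only through `weight_diff`, while the switch and global context modules still act.

The sketch below, with arbitrary shapes, checks that a 1×1 convolution produces the same output at dilation 1 and dilation 3. The paper applies SAC to the 3×3 layers, which here would be `cv2`; choosing the placement is left to the reader.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 16, 10, 10)
w = torch.randn(32, 16, 1, 1)  # a 1x1 kernel, like cv3 in ResNetBlock

out_rate1 = F.conv2d(x, w, padding=0, dilation=1)
# SAConv2d triples padding and dilation for its large-rate branch; for k = 1 the padding stays 0.
out_rate3 = F.conv2d(x, w, padding=0, dilation=3)

same = torch.allclose(out_rate1, out_rate3, atol=1e-5)
print(f"1x1 conv output unchanged by dilation: {same}")
```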
Note ❗: the module name to declare in Section 5 is ResNetLayer_SAConv.
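5. Adding Steps
5.1 Modification 1
① Create an AddModules folder under ultralytics/nn/ to hold the module code.
② Create SAConv.py in the AddModules folder and paste the code from Section 3 into it.
5.2 Modification 2
Create __init__.py in the AddModules folder (skip this if it already exists) and import the module in it:
from .SAConv import *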
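5.3 Modification 3
In ultralytics/nn/tasks.py, add the module class names in two places.
First, import the modules, e.g. from ultralytics.nn.AddModules import *.
Second, register SAConv2d and ResNetLayer_SAConv in the parse_model function:
- SAConv2d goes with the Conv-type modules whose first YAML argument is the output channel count.
- ResNetLayer_SAConv is handled in the same branch as ResNetLayer.

The exact form of these branches differs between Ultralytics versions.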
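Registration matters because parse_model must know how to derive each new module's output channels. The sketch below is a hypothetical, simplified mirror of that bookkeeping, not the Ultralytics source: `make_divisible` and `infer_out_channels` are illustrative names, and the `c2 != nc` check is omitted. It assumes two rules:
- SAConv2d behaves like the Conv-type modules, whose first YAML argument is c2.
- ResNetLayer_SAConv follows the ResNetLayer rule: c2 = args[1] if is_first, else 4 × args[1].

Running it over the Version 2 backbone gives 64, 256, 512, 1024, 2048. These match the c1 values written in that YAML and the 2048-channel input of the first head Conv in the printed summary.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def infer_out_channels(module, c_in, args, width=1.0, max_channels=1024):
    """Hypothetical, simplified mirror of parse_model's channel rules for the two new modules."""
    if module == "SAConv2d":
        # Conv-type branch: args[0] is c2; c1 comes from the previous layer's output.
        c2 = make_divisible(min(args[0], max_channels) * width, 8)
        return c2, [c_in, c2, *args[1:]]
    if module == "ResNetLayer_SAConv":
        # ResNetLayer-type branch: args = [c1, c2, s, is_first, n]; bottlenecks expand c2 by e = 4.
        c2 = args[1] if args[3] else args[1] * 4
        return c2, args
    raise ValueError(f"unhandled module: {module}")


backbone = [  # args of the ResNetLayer_SAConv layers in rtdetr-ResNetLayer_SAConv.yaml
    [3, 64, 1, True, 1],
    [64, 64, 1, False, 3],
    [256, 128, 2, False, 4],
    [512, 256, 2, False, 6],
    [1024, 512, 2, False, 3],
]
ch = 3  # RGB input
outs = []
for args in backbone:
    assert args[0] == ch, "c1 in the YAML must match the previous layer's output channels"
    ch, _ = infer_out_channels("ResNetLayer_SAConv", ch, args)
    outs.append(ch)
print(outs)  # [64, 256, 512, 1024, 2048]
```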
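6. YAML Model Files
6.1 Improved Model Version 1
After the code is configured, set up the model's YAML file.
Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a model file named rtdetr-l-SAConv.yaml in the same directory for training on your own dataset.
Copy the contents of rtdetr-l.yaml into rtdetr-l-SAConv.yaml and set nc to the number of object classes in your dataset.
📌 The modification replaces the three HGBlock modules of stage 3 in the backbone (layers 5 to 7) with SAConv2d modules.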
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
  - [-1, 6, SAConv2d, [512]] # 5, args: c2
  - [-1, 6, SAConv2d, [512]]
  - [-1, 6, SAConv2d, [512]] # stage 3
  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
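6.2 Improved Model Version 2⭐
Again, taking ultralytics/cfg/models/rt-detr/rtdetr-resnet50.yaml as an example, create a model file named rtdetr-ResNetLayer_SAConv.yaml in the same directory for training on your own dataset.
Copy the contents of rtdetr-resnet50.yaml into rtdetr-ResNetLayer_SAConv.yaml and set nc to the number of object classes in your dataset.
📌 The modification replaces the ResNetLayer modules in the backbone (layers 0 to 4) with ResNetLayer_SAConv modules.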
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet50 object detection model with P3-P5 outputs.
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
  # [from, repeats, module, args]
  - [-1, 1, ResNetLayer_SAConv, [3, 64, 1, True, 1]] # 0
  - [-1, 1, ResNetLayer_SAConv, [64, 64, 1, False, 3]] # 1
  - [-1, 1, ResNetLayer_SAConv, [256, 128, 2, False, 4]] # 2
  - [-1, 1, ResNetLayer_SAConv, [512, 256, 2, False, 6]] # 3
  - [-1, 1, ResNetLayer_SAConv, [1024, 512, 2, False, 3]] # 4
head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 5
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 7
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 9
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 11
  - [-1, 1, Conv, [256, 1, 1]] # 12
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [2, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (16), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]] # 17, downsample_convs.0
  - [[-1, 12], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (19), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.1
  - [[-1, 7], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (22), pan_blocks.1
  - [[16, 19, 22], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
7. Successful Run Results
Printing each network model shows that SAConv2d and ResNetLayer_SAConv have been added to the models and that the models can be trained.
rtdetr-l-SAConv :
rtdetr-l-SAConv summary: 607 layers, 121,297,237 parameters, 121,297,237 gradients, 89.1 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 31475718 ultralytics.nn.AddModules.SAConv.SAConv2d [512, 512]
6 -1 6 31475718 ultralytics.nn.AddModules.SAConv.SAConv2d [512, 512]
7 -1 6 31475718 ultralytics.nn.AddModules.SAConv.SAConv2d [512, 512]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [512, 1024, 3, 2, 1, False]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-ResNetLayer_SAConv :
rtdetr-ResNetLayer_SAConv summary: 625 layers, 69,207,475 parameters, 69,207,475 gradients, 117.1 GFLOPs
from n params module arguments
0 -1 1 9536 ultralytics.nn.AddModules.SAConv.ResNetLayer_SAConv[3, 64, 1, True, 1]
1 -1 1 475779 ultralytics.nn.AddModules.SAConv.ResNetLayer_SAConv[64, 64, 1, False, 3]
2 -1 1 2600964 ultralytics.nn.AddModules.SAConv.ResNetLayer_SAConv[256, 128, 2, False, 4]
3 -1 1 15371270 ultralytics.nn.AddModules.SAConv.ResNetLayer_SAConv[512, 256, 2, False, 6]
4 -1 1 31495171 ultralytics.nn.AddModules.SAConv.ResNetLayer_SAConv[1024, 512, 2, False, 3]
5 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
6 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
7 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
8 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
9 3 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
10 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
11 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 2 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
18 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
19 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
20 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
21 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
23 [16, 19, 22] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
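As a sanity check on the printed summaries, the per-module parameter counts can be reproduced by hand from the Section 3 code. The pure-Python sketch below counts the following parameters:
- `weight`, `weight_diff`, and the bias;
- the switch convolution and the two context convolutions;
- BatchNorm's weight and bias (running statistics are buffers, not parameters).

The helper names are illustrative. The results match the params column for the SAConv2d rows and for the first four ResNetLayer_SAConv rows.

```python
def conv_bn_params(c1, c2, k):
    """Conv (bias=False) + BatchNorm2d, as in the Conv class of Section 3."""
    return k * k * c1 * c2 + 2 * c2


def saconv2d_params(c1, c2, k=3):
    """SAConv2d from Section 3 with groups=1 and bias=True."""
    kernels = 2 * k * k * c1 * c2  # weight + weight_diff (same shape)
    bias = c2
    switch = c1 + 1  # 1x1 conv to a single channel, with bias
    pre_context = c1 * c1 + c1
    post_context = c2 * c2 + c2
    bn = 2 * c2
    return kernels + bias + switch + pre_context + post_context + bn


def resnet_block_params(c1, c2, s=1, e=4):
    c3 = e * c2
    total = conv_bn_params(c1, c2, 1) + conv_bn_params(c2, c2, 3) + saconv2d_params(c2, c3, k=1)
    if s != 1 or c1 != c3:
        total += conv_bn_params(c1, c3, 1)  # projection shortcut
    return total


def resnet_layer_saconv_params(c1, c2, s=1, is_first=False, n=1, e=4):
    if is_first:
        return conv_bn_params(c1, c2, 7)  # the max-pool has no parameters
    return resnet_block_params(c1, c2, s, e) + (n - 1) * resnet_block_params(e * c2, c2, 1, e)


print(6 * saconv2d_params(512, 512))                      # 31475718
print(resnet_layer_saconv_params(3, 64, 1, True, 1))      # 9536
print(resnet_layer_saconv_params(64, 64, 1, False, 3))    # 475779
print(resnet_layer_saconv_params(256, 128, 2, False, 4))  # 2600964
print(resnet_layer_saconv_params(512, 256, 2, False, 6))  # 15371270
```

The count also shows where the size of Version 1 comes from. Its three SAConv2d rows account for 94,427,154 of the 121,297,237 parameters. In addition, `weight_diff` doubles the kernel parameters of every converted layer, so weight locking shares the initialization rather than saving parameters.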