RT-DETR Improvement Strategy [Conv and Transformer] | ACmix: Combining Convolution and Self-Attention to Exploit the Strengths of Both
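1. Introduction
This article documents how to improve the RT-DETR detection model with ACmix. Convolution and self-attention are two powerful representation-learning techniques. Building on the close underlying relationship between them, this article reworks ACmix into RT-DETR so that the two complement each other and redundant computation is reduced. According to the author's experiments, the change yields an effective accuracy gain for the model.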
2. Introduction to ACmix
On the Integration of Self-Attention and Convolution
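2.1 Principle
2.1.1 Decomposing convolution
A standard convolution can be decomposed into several $1 \times 1$ convolutions followed by shift and summation operations. For example, a $k \times k$ kernel decomposes into $k^{2}$ separate $1 \times 1$ convolutions, one per kernel position.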
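The decomposition can be checked numerically. The following is a minimal pure-Python sketch, not code from the ACmix repository: the function names are hypothetical, it handles a single output channel with zero padding, and it uses nested lists instead of tensors to stay dependency-free.

```python
def conv2d_direct(x, w):
    """Zero-padded k x k cross-correlation for one output channel.

    x: nested list [C][H][W]; w: nested list [C][k][k]; returns [H][W].
    """
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(w[0])
    r = k // 2
    out = [[0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            s = 0
            for c in range(C):
                for p in range(k):
                    for q in range(k):
                        ii, jj = i + p - r, j + q - r
                        if 0 <= ii < H and 0 <= jj < W:
                            s += w[c][p][q] * x[c][ii][jj]
            out[i][j] = s
    return out


def conv2d_shift_sum(x, w):
    """Same result, computed as k*k separate 1x1 convolutions, then shift and sum."""
    C, H, W = len(x), len(x[0]), len(x[0][0])
    k = len(w[0])
    r = k // 2
    out = [[0] * W for _ in range(H)]
    for p in range(k):
        for q in range(k):
            # Stage I: a 1x1 convolution using only the weights at kernel position (p, q).
            proj = [[sum(w[c][p][q] * x[c][i][j] for c in range(C))
                     for j in range(W)] for i in range(H)]
            # Stage II: shift the projected map by (p - r, q - r) and accumulate.
            for i in range(H):
                for j in range(W):
                    ii, jj = i + p - r, j + q - r
                    if 0 <= ii < H and 0 <= jj < W:
                        out[i][j] += proj[ii][jj]
    return out
```

With integer inputs the two functions return identical nested lists, which is the sense in which a $k \times k$ convolution equals $k^{2}$ shifted $1 \times 1$ convolutions.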
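2.1.2 Interpreting self-attention
The query, key, and value projections in a self-attention module can be viewed as several $1 \times 1$ convolutions; the attention weights are then computed and used to aggregate the values.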
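Written out for a single head, the two stages look roughly as follows. This is a sketch in assumed notation rather than a quotation from the paper: $f_{ij}$ is the input feature at pixel $(i, j)$, $\mathcal{N}_k(i,j)$ is the local $k \times k$ window around it, and $d$ is the head dimension.

```latex
% Stage I: per-pixel projections, i.e. three 1x1 convolutions
q_{ij} = W_q f_{ij}, \qquad k_{ij} = W_k f_{ij}, \qquad v_{ij} = W_v f_{ij}

% Stage II: attention weights over the local window, then aggregation of the values
g_{ij} = \sum_{(a,b) \in \mathcal{N}_k(i,j)}
         \operatorname{softmax}_{(a,b)}\!\left( \frac{q_{ij}^{\top} k_{ab}}{\sqrt{d}} \right) v_{ab}
```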
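2.1.3 Similarity and dominant computational cost
The first stage of both modules consists of similar $1 \times 1$ convolutions, and this first stage dominates the computational cost over the second stage because it scales with the square of the channel size. This shared, dominant stage is the theoretical basis for integrating the two.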
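A rough per-pixel count of multiply-accumulate operations makes the quadratic-versus-linear gap concrete. The sketch below assumes equal input and output channel counts $C$ and a local attention window of size $k_{att}$; the function names are hypothetical and the counts ignore softmax, bias, and positional-encoding terms.

```python
def conv_stage1_macs(c, k):
    """k*k separate 1x1 convolutions, each mapping C -> C channels."""
    return k * k * c * c


def attn_stage1_macs(c):
    """Three 1x1 convolutions (query, key, value), each mapping C -> C channels."""
    return 3 * c * c


def attn_stage2_macs(c, k_att):
    """Dot products with k_att*k_att keys plus a weighted sum of as many values."""
    return 2 * k_att * k_att * c


C, K_CONV, K_ATT = 256, 3, 7
print(conv_stage1_macs(C, K_CONV))   # 589824
print(attn_stage1_macs(C))           # 196608
print(attn_stage2_macs(C, K_ATT))    # 25088
```

For these example sizes the second attention stage costs about 13% of the first, and the gap widens as $C$ grows because the first stage is quadratic in $C$ while the second is linear.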
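2.2 Structure
- Stage I: the input feature map is projected by three $1 \times 1$ convolutions, and each result is reshaped into $N$ pieces. This gives a rich set of intermediate features containing $3 \times N$ feature maps.
- Stage II:
  - Self-attention path: the intermediate features are gathered into $N$ groups, each containing three features (one from each $1 \times 1$ convolution) that serve as query, key, and value. They are processed as in a conventional multi-head self-attention module.
  - Convolution path: for a kernel size of $k$, a lightweight fully connected layer generates $k^{2}$ feature maps, which are then shifted and aggregated so that information is collected from a local receptive field.
- Final output: the outputs of the two paths are added, with their strengths controlled by two learnable scalars $\alpha$ and $\beta$, i.e. $F_{out} = \alpha F_{att} + \beta F_{conv}$.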
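To see how these stages map onto tensors, the sketch below traces the shapes produced by the `forward` method listed in Section 3. It is a bookkeeping aid only: the helper name `acmix_shapes` is hypothetical, `stride=1` is assumed, and $N$ corresponds to `head` in the code.

```python
def acmix_shapes(b, out_planes, h, w, head=2, kernel_att=7, kernel_conv=3):
    """Trace tensor shapes through ACmix.forward, assuming stride=1."""
    head_dim = out_planes // head
    ka2 = kernel_att * kernel_att
    kc2 = kernel_conv * kernel_conv
    return {
        # Stage I: three 1x1 convolutions.
        "q, k, v": (b, out_planes, h, w),
        # Self-attention path: heads folded into the batch dimension.
        "q_att": (b * head, head_dim, h, w),
        "unfold_k": (b * head, head_dim, ka2, h, w),
        "att": (b * head, ka2, h, w),
        "out_att": (b, out_planes, h, w),
        # Convolution path: 3*head maps mixed into kernel_conv^2 maps by the fc layer.
        "fc_in": (b, 3 * head, head_dim, h * w),
        "fc_out": (b, kc2, head_dim, h * w),
        "f_conv": (b, kc2 * head_dim, h, w),
        "out_conv": (b, out_planes, h, w),
    }


for name, shape in acmix_shapes(b=2, out_planes=64, h=20, w=20).items():
    print(f"{name:10s} {shape}")
```

Both paths end with the same shape, which is what allows the weighted sum $\alpha F_{att} + \beta F_{conv}$.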
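2.3 Advantages
- Computational efficiency:
  - In theory, the Stage I cost depends on the channel size. It matches the Stage I cost of self-attention and is lighter than that of a traditional convolution (e.g. a $3 \times 3$ convolution). Stage II adds some overhead, but its complexity is linear in the channel size and small compared with Stage I.
  - The shift and summation operations are improved, for example by replacing inefficient tensor shifts with a depthwise (grouped) convolution, which raises the practical efficiency of the module.
- Performance: on image recognition and downstream tasks (image classification, semantic segmentation, and object detection), the model yields consistent improvements over competitive baselines.
- Flexibility and generality:
  - The model can adaptively adjust the strength of the convolution and self-attention paths, combining the two modules flexibly depending on where the filter sits in the network.
  - It can be applied to several self-attention patterns, such as the Patchwise attention, Window attention, and Global attention variants.
Paper: https://arxiv.org/pdf/2111.14556
Source code: https://github.com/LeapLabTHU/ACmix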
3. ACmix Implementation Code
The ACmix module is implemented as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F


def position(H, W, is_cuda=True):
    # Build a (1, 2, H, W) grid of normalized x/y coordinates in [-1, 1].
    if is_cuda:
        loc_w = torch.linspace(-1.0, 1.0, W).cuda().unsqueeze(0).repeat(H, 1)
        loc_h = torch.linspace(-1.0, 1.0, H).cuda().unsqueeze(1).repeat(1, W)
    else:
        loc_w = torch.linspace(-1.0, 1.0, W).unsqueeze(0).repeat(H, 1)
        loc_h = torch.linspace(-1.0, 1.0, H).unsqueeze(1).repeat(1, W)
    loc = torch.cat([loc_w.unsqueeze(0), loc_h.unsqueeze(0)], 0).unsqueeze(0)
    return loc


def stride(x, stride):
    # Subsample the two spatial dimensions.
    b, c, h, w = x.shape
    return x[:, :, ::stride, ::stride]


def init_rate_half(tensor):
    if tensor is not None:
        tensor.data.fill_(0.5)


def init_rate_0(tensor):
    if tensor is not None:
        tensor.data.fill_(0.)
class ACmix(nn.Module):
    def __init__(self, in_planes, out_planes, kernel_att=7, head=2, kernel_conv=3, stride=1, dilation=1):
        super(ACmix, self).__init__()
        self.in_planes = in_planes
        self.out_planes = out_planes
        self.head = head
        self.kernel_att = kernel_att
        self.kernel_conv = kernel_conv
        self.stride = stride
        self.dilation = dilation
        # Learnable path weights (alpha and beta), set to 0.5 in reset_parameters.
        self.rate1 = torch.nn.Parameter(torch.Tensor(1))
        self.rate2 = torch.nn.Parameter(torch.Tensor(1))
        self.head_dim = self.out_planes // self.head

        # Stage I: three 1x1 projections shared by both paths.
        self.conv1 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv2 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv3 = nn.Conv2d(in_planes, out_planes, kernel_size=1)
        self.conv_p = nn.Conv2d(2, self.head_dim, kernel_size=1)

        # Self-attention path: local windows gathered with reflection padding + unfold.
        self.padding_att = (self.dilation * (self.kernel_att - 1) + 1) // 2
        self.pad_att = torch.nn.ReflectionPad2d(self.padding_att)
        self.unfold = nn.Unfold(kernel_size=self.kernel_att, padding=0, stride=self.stride)
        self.softmax = torch.nn.Softmax(dim=1)

        # Convolution path: fc mixes the 3*head maps into kernel_conv^2 maps,
        # then a grouped convolution performs the shift and aggregation.
        self.fc = nn.Conv2d(3 * self.head, self.kernel_conv * self.kernel_conv, kernel_size=1, bias=False)
        self.dep_conv = nn.Conv2d(self.kernel_conv * self.kernel_conv * self.head_dim, out_planes,
                                  kernel_size=self.kernel_conv, bias=True, groups=self.head_dim, padding=1,
                                  stride=stride)

        self.reset_parameters()

    def reset_parameters(self):
        init_rate_half(self.rate1)
        init_rate_half(self.rate2)
        # Initialize dep_conv with one-hot "shift" kernels, one per kernel position.
        kernel = torch.zeros(self.kernel_conv * self.kernel_conv, self.kernel_conv, self.kernel_conv)
        for i in range(self.kernel_conv * self.kernel_conv):
            kernel[i, i // self.kernel_conv, i % self.kernel_conv] = 1.
        kernel = kernel.squeeze(0).repeat(self.out_planes, 1, 1, 1)
        self.dep_conv.weight = nn.Parameter(data=kernel, requires_grad=True)
        # Note: init_rate_0 returns None, so this assignment leaves dep_conv without a bias term.
        self.dep_conv.bias = init_rate_0(self.dep_conv.bias)

    def forward(self, x):
        q, k, v = self.conv1(x), self.conv2(x), self.conv3(x)
        scaling = float(self.head_dim) ** -0.5
        b, c, h, w = q.shape
        h_out, w_out = h // self.stride, w // self.stride

        # ### att
        # ## positional encoding
        pe = self.conv_p(position(h, w, x.is_cuda))

        q_att = q.view(b * self.head, self.head_dim, h, w) * scaling
        k_att = k.view(b * self.head, self.head_dim, h, w)
        v_att = v.view(b * self.head, self.head_dim, h, w)

        if self.stride > 1:
            q_att = stride(q_att, self.stride)
            q_pe = stride(pe, self.stride)
        else:
            q_pe = pe

        unfold_k = self.unfold(self.pad_att(k_att)).view(b * self.head, self.head_dim,
                                                         self.kernel_att * self.kernel_att, h_out,
                                                         w_out)  # b*head, head_dim, k_att^2, h_out, w_out
        unfold_rpe = self.unfold(self.pad_att(pe)).view(1, self.head_dim, self.kernel_att * self.kernel_att, h_out,
                                                        w_out)  # 1, head_dim, k_att^2, h_out, w_out

        att = (q_att.unsqueeze(2) * (unfold_k + q_pe.unsqueeze(2) - unfold_rpe)).sum(
            1)  # (b*head, head_dim, 1, h_out, w_out) * (b*head, head_dim, k_att^2, h_out, w_out) -> (b*head, k_att^2, h_out, w_out)
        att = self.softmax(att)

        out_att = self.unfold(self.pad_att(v_att)).view(b * self.head, self.head_dim, self.kernel_att * self.kernel_att,
                                                        h_out, w_out)
        out_att = (att.unsqueeze(1) * out_att).sum(2).view(b, self.out_planes, h_out, w_out)

        ## conv
        f_all = self.fc(torch.cat(
            [q.view(b, self.head, self.head_dim, h * w), k.view(b, self.head, self.head_dim, h * w),
             v.view(b, self.head, self.head_dim, h * w)], 1))
        f_conv = f_all.permute(0, 2, 1, 3).reshape(x.shape[0], -1, x.shape[-2], x.shape[-1])

        out_conv = self.dep_conv(f_conv)

        # Weighted sum of the two paths: rate1 * F_att + rate2 * F_conv.
        return self.rate1 * out_att + self.rate2 * out_conv
def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only, skipping batch normalization (used after layer fusion)."""
        return self.act(self.conv(x))
class ResNetBlock(nn.Module):
    """ResNet block with standard convolution layers."""

    def __init__(self, c1, c2, s=1, e=4):
        """Initialize convolution with given parameters."""
        super().__init__()
        c3 = e * c2
        self.cv1 = Conv(c1, c2, k=1, s=1, act=True)
        self.cv2 = Conv(c2, c2, k=3, s=s, p=1, act=True)
        self.cv3 = ACmix(c2, c3)
        self.shortcut = nn.Sequential(Conv(c1, c3, k=1, s=s, act=False)) if s != 1 or c1 != c3 else nn.Identity()

    def forward(self, x):
        """Forward pass through the ResNet block."""
        return F.relu(self.cv3(self.cv2(self.cv1(x))) + self.shortcut(x))


class ResNetLayer_ACmix(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first

        if self.is_first:
            self.layer = nn.Sequential(
                Conv(c1, c2, k=7, s=2, p=3, act=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
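The padding arithmetic in `ACmix.__init__` determines whether the unfolded attention windows line up with `h_out` and `w_out` in `forward`. The sketch below reproduces that arithmetic in plain Python; the helper names are hypothetical, and the output-size formula is the standard sliding-window one that `nn.Unfold` follows.

```python
def att_padding(kernel_att, dilation=1):
    """Reflection padding per side, as computed for self.padding_att."""
    return (dilation * (kernel_att - 1) + 1) // 2


def unfold_out_size(size, kernel, pad_each_side, stride=1, dilation=1):
    """Number of sliding-window positions along one spatial dimension."""
    padded = size + 2 * pad_each_side
    return (padded - dilation * (kernel - 1) - 1) // stride + 1


h = 20
pad = att_padding(kernel_att=7)                      # 3
print(unfold_out_size(h, 7, pad, stride=1))          # 20, equals h // 1
print(unfold_out_size(h, 7, pad, stride=2))          # 10, equals h // 2

# The module builds nn.Unfold without passing dilation, so the windows are not
# dilated even though the padding grows with the dilation argument.
pad_d2 = att_padding(kernel_att=7, dilation=2)       # 6
print(unfold_out_size(h, 7, pad_d2, stride=1))       # 26, no longer equals h
```

Under these assumptions the default `dilation=1` keeps the spatial size consistent, while a larger `dilation` appears to produce more window positions than the later `.view(...)` call expects. Similarly, `dep_conv` uses a fixed `padding=1`, which preserves the spatial size only for `kernel_conv=3`.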
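4. Innovation Modules
4.1 Improvement ⭐
Method: add ACmix directly (Section 5 explains the steps).
ACmix is inserted into the network as a standalone layer; the model file in Section 6.1 shows where it goes.
4.2 Improvement ⭐
Method: a ResNetLayer built on the ACmix module (Section 5 explains the steps).
The second improvement modifies the ResNetLayer module in RT-DETR by adding ACmix to it.
The modified code is shown below. ACmix is added to the ResNetBlock module, and ResNetLayer is renamed ResNetLayer_ACmix.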
class ResNetBlock(nn.Module):
    """ResNet block with standard convolution layers."""

    def __init__(self, c1, c2, s=1, e=4):
        """Initialize convolution with given parameters."""
        super().__init__()
        c3 = e * c2
        self.cv1 = Conv(c1, c2, k=1, s=1, act=True)
        self.cv2 = Conv(c2, c2, k=3, s=s, p=1, act=True)
        self.cv3 = ACmix(c2, c3)
        self.shortcut = nn.Sequential(Conv(c1, c3, k=1, s=s, act=False)) if s != 1 or c1 != c3 else nn.Identity()

    def forward(self, x):
        """Forward pass through the ResNet block."""
        return F.relu(self.cv3(self.cv2(self.cv1(x))) + self.shortcut(x))


class ResNetLayer_ACmix(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first

        if self.is_first:
            self.layer = nn.Sequential(
                Conv(c1, c2, k=7, s=2, p=3, act=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
Note ❗: the module names that must be declared in Section 5 are ACmix and ResNetLayer_ACmix.
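5. Steps for Adding the Modules
5.1 Modification 1
① Under the ultralytics/nn/ directory, create a new folder named AddModules to hold the module code.
② In the AddModules folder, create ACmix.py and paste the code from Section 3 into it.
5.2 Modification 2
In the AddModules folder, create __init__.py (skip this if it already exists) and import the module in it:
from .ACmix import *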
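5.3 Modification 3
In the ultralytics/nn/tasks.py file, the module class names need to be added in two places.
First: import the modules.
Second: register the ACmix and ResNetLayer_ACmix modules in the parse_model function.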
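The exact edit depends on the Ultralytics version, so the following is only a standalone sketch of what the registration appears to achieve, inferred from the layer table in Section 7. It assumes ACmix is added to the branch that handles `Conv`-style modules (arguments become `[c1, c2, ...]`) and ResNetLayer_ACmix to the branch that handles `ResNetLayer`. The function `resolve_args` is hypothetical and is not part of Ultralytics; the import line in the comment assumes the AddModules package from 5.1 and 5.2.

```python
import math

# In tasks.py itself, the import step would presumably be:
#   from ultralytics.nn.AddModules import *


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_args(module, c_in, args, nc, width=1.0, max_channels=1024):
    """Return (constructor args, output channels) for one YAML layer entry."""
    if module == "ACmix":
        # Conv-style handling: prepend the input channels and clamp the output channels.
        c2 = args[0]
        if c2 != nc:
            c2 = make_divisible(min(c2, max_channels) * width, 8)
        return [c_in, c2, *args[1:]], c2
    if module == "ResNetLayer_ACmix":
        # args = [c1, c2, stride, is_first, n]; non-first layers expand channels by 4.
        c2 = args[1] if args[3] else args[1] * 4
        return list(args), c2
    raise ValueError(f"unregistered module: {module}")


print(resolve_args("ACmix", 1024, [2048], nc=1))                          # ([1024, 1024], 1024)
print(resolve_args("ResNetLayer_ACmix", 64, [64, 64, 1, False, 3], nc=1))
```

The first call reproduces the `[1024, 1024]` arguments shown for layer 9 of rtdetr-l-ACmix: the `2048` written in the YAML file is clamped by `max_channels`.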
Finally, find amp in ultralytics/cfg/default.yaml and set it to False.
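If editing the global defaults is undesirable, the same setting can usually be passed per run, since keyword arguments to `train` override the defaults. The sketch below assumes the `RTDETR` class exported by the `ultralytics` package; the wrapper `train_acmix`, the dataset name `coco8.yaml`, and the epoch count are placeholders, not values from this article.

```python
# Overrides applied on top of ultralytics/cfg/default.yaml for this run only.
TRAIN_OVERRIDES = {"amp": False}


def train_acmix(model_yaml="rtdetr-l-ACmix.yaml", data_yaml="coco8.yaml", epochs=1):
    """Build the modified model from its YAML file and train with AMP disabled."""
    # Imported lazily so that this file can be inspected without ultralytics installed.
    from ultralytics import RTDETR

    model = RTDETR(model_yaml)
    return model.train(data=data_yaml, epochs=epochs, **TRAIN_OVERRIDES)


# Example call (requires ultralytics and a dataset definition):
# train_acmix("ultralytics/cfg/models/rt-detr/rtdetr-l-ACmix.yaml", "your_dataset.yaml", epochs=100)
```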
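6. YAML Model Files
6.1 Improved model version ⭐
Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the example, create a model file named rtdetr-l-ACmix.yaml in the same directory for training on your own dataset.
Copy the contents of rtdetr-l.yaml into rtdetr-l-ACmix.yaml and set nc to the number of object classes in your dataset.
📌 The model is modified by adding the ACmix module to the backbone.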
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
- [-1, 6, ACmix, [2048]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
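6.2 Improved model version ⭐
Taking ultralytics/cfg/models/rt-detr/rtdetr-resnet50.yaml as the example, create a model file named rtdetr-ResNetLayer_ACmix.yaml in the same directory for training on your own dataset.
Copy the contents of rtdetr-resnet50.yaml into rtdetr-ResNetLayer_ACmix.yaml and set nc to the number of object classes in your dataset.
📌 The model is modified by replacing the ResNetLayer modules in the backbone with ResNetLayer_ACmix modules.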
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet50 object detection model with P3-P5 outputs.
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ResNetLayer_ACmix, [3, 64, 1, True, 1]] # 0
- [-1, 1, ResNetLayer_ACmix, [64, 64, 1, False, 3]] # 1
- [-1, 1, ResNetLayer_ACmix, [256, 128, 2, False, 4]] # 2
- [-1, 1, ResNetLayer_ACmix, [512, 256, 2, False, 6]] # 3
- [-1, 1, ResNetLayer_ACmix, [1024, 512, 2, False, 3]] # 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 5
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 7
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 9
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 11
- [-1, 1, Conv, [256, 1, 1]] # 12
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [2, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (16), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 17, downsample_convs.0
- [[-1, 12], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (19), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.1
- [[-1, 7], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (22), pan_blocks.1
- [[16, 19, 22], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
7. Successful Run Results
Printing the network shows that ACmix and ResNetLayer_ACmix have been added to the models, which can now be trained.
rtdetr-l-ACmix :
rtdetr-l-ACmix summary: 686 layers, 45,237,523 parameters, 45,237,523 gradients, 118.0 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 1695360 ultralytics.nn.modules.block.HGBlock [512, 192, 1024, 5, 6, True, False]
6 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
7 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [1024, 1024, 3, 2, 1, False]
9 -1 6 19400016 ultralytics.nn.AddModules.ACmix.ACmix [1024, 1024]
10 -1 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-ACmix summary: 686 layers, 45,237,523 parameters, 45,237,523 gradients, 118.0 GFLOPs
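The parameter count reported for layer 9 can be reproduced by hand from the layer definitions in Section 3, which is a quick way to confirm that the module was built with the expected arguments. The helper `acmix_params` is a hypothetical name; it counts the three projections, the positional-encoding convolution, the bias-free `fc`, the `dep_conv` weight (its bias is `None` after `reset_parameters`), and the two rate scalars.

```python
def acmix_params(in_planes, out_planes, head=2, kernel_conv=3):
    """Count the learnable parameters of one ACmix module."""
    head_dim = out_planes // head
    k2 = kernel_conv * kernel_conv
    qkv = 3 * (in_planes * out_planes + out_planes)   # conv1, conv2, conv3 (weights + biases)
    conv_p = 2 * head_dim + head_dim                  # 1x1 conv from 2 coordinate channels
    fc = 3 * head * k2                                # 1x1 conv, no bias
    dep_conv = out_planes * k2 * k2                   # weight shape (out_planes, k^2, k, k)
    rates = 2                                         # rate1 and rate2
    return qkv + conv_p + fc + dep_conv + rates


per_module = acmix_params(1024, 1024)
print(per_module)        # 3233336
print(6 * per_module)    # 19400016, the value listed for layer 9 (n = 6 repeats)
```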
rtdetr-ResNetLayer_ACmix :
rtdetr-ResNetLayer_ACmix summary: 689 layers, 54,084,643 parameters, 54,084,643 gradients, 166.8 GFLOPs
from n params module arguments
0 -1 1 9536 ultralytics.nn.AddModules.ACmix.ResNetLayer_ACmix[3, 64, 1, True, 1]
1 -1 1 378408 ultralytics.nn.AddModules.ACmix.ResNetLayer_ACmix[64, 64, 1, False, 3]
2 -1 1 1915104 ultralytics.nn.AddModules.ACmix.ResNetLayer_ACmix[256, 128, 2, False, 4]
3 -1 1 10757456 ultralytics.nn.AddModules.ACmix.ResNetLayer_ACmix[512, 256, 2, False, 6]
4 -1 1 21769384 ultralytics.nn.AddModules.ACmix.ResNetLayer_ACmix[1024, 512, 2, False, 3]
5 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
6 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
7 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
8 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
9 3 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
10 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
11 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 2 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
18 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
19 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
20 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
21 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
23 [16, 19, 22] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-ResNetLayer_ACmix summary: 689 layers, 54,084,643 parameters, 54,084,643 gradients, 166.8 GFLOPs