RT-DETR Improvement Strategies [Attention Mechanisms] | SENet V2: Refining SE Attention by Aggregating Channel and Global Information
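I. Introduction
This article documents how to use the SENet V2 module to improve the RT-DETR object detection model. SENet V2 extends V1 with multi-branch dense layers so that the module captures both channel-wise and global information. This addresses the weak global representation learning of conventional convolutional neural networks as well as the headroom left in V1 itself. The module is added at different positions in RT-DETR, with further modifications of its own, to make full use of SE V2.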
II. Introduction to SENet V2
SENetV2: Aggregated dense layer for channelwise and global representations
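1. Design Motivation
- Limitations of existing techniques
  - CNNs learn spatial structure well but global context poorly: Convolutional neural networks (CNNs) excel at learning spatial correlations within local receptive fields but are comparatively weak at learning global representations. In image classification, for example, they extract local features well yet may not fully capture the features that characterize the image as a whole.
  - Room for improvement in SENet: SENet strengthens channel representations through its squeeze and excitation operations, but it still leaves room for optimization.
- Ideas borrowed from other successful architectures
  - Multi-branch convolution in the Inception module: The Inception module uses multi-branch convolutions, with each branch applying filters of a different size before the outputs are concatenated. This improves performance while lowering theoretical complexity. The multi-branch structure inspired the new module's design and helps it learn features at different scales.
  - Aggregated modules in ResNeXt: ResNeXt introduces aggregated residual modules and the notion of "cardinality", reducing theoretical complexity while improving performance. This gives the new module a reference for its structural design, so that it integrates information better and more efficiently.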
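2. Principle
- Processing channel information
  - Squeeze: After the convolutional layers, the input passes through a global average pooling layer that produces a per-channel descriptor, which then enters a fully connected (FC) layer with a reduced size. This FC layer recombines and filters the channel information to extract the key features.
  - Excitation: The squeezed information enters the excitation component, which contains an FC layer without reduction that restores the original channel dimension. The result is used as a per-channel scale and multiplied with the feature map, so the output keeps the original shape. This step strengthens informative channels and suppresses uninformative ones.
- Fusing global and local information
  - Multi-branch dense layers: Multi-branch dense layers are introduced into the squeeze operation, and the aggregated branch outputs are concatenated and passed to the FC layer. This structure lets the module learn a broader global representation and combine it with the channel representation, fusing global and local information.
  - Interaction between core features and the excitation layer: With a suitable cardinality (for example 4), the core features interact effectively with the excitation layer without unnecessary added complexity or parameters, so the module learns global representations better while keeping an efficient structure.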
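The squeeze, aggregation, and excitation steps described above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the paper's reference code: the function name `se_v2_gate`, the random weights, and the tensor sizes are assumptions chosen for the example, and biases are omitted.

```python
import numpy as np


def se_v2_gate(x, branch_weights, w_excite):
    """Apply an SE V2 style channel gate to a feature map.

    x:              (B, C, H, W) feature map
    branch_weights: list of k matrices, each (C, C // r), one per squeeze branch
    w_excite:       (k * C // r, C) matrix of the excitation layer
    """
    # Global average pooling gives one descriptor per channel: (B, C)
    squeezed = x.mean(axis=(2, 3))
    # Each branch is a reduced FC layer followed by ReLU: (B, C // r)
    branches = [np.maximum(squeezed @ w, 0.0) for w in branch_weights]
    # Aggregate the branches by concatenation: (B, k * C // r)
    aggregated = np.concatenate(branches, axis=1)
    # Excitation: FC back to C channels, then sigmoid to get a gate in (0, 1)
    gate = 1.0 / (1.0 + np.exp(-(aggregated @ w_excite)))
    # Channel-wise rescaling of the input feature map
    return x * gate[:, :, None, None]


rng = np.random.default_rng(0)
B, C, H, W, r, k = 2, 32, 8, 8, 16, 4  # illustrative sizes only
x = rng.standard_normal((B, C, H, W))
branch_weights = [0.1 * rng.standard_normal((C, C // r)) for _ in range(k)]
w_excite = 0.1 * rng.standard_normal((k * C // r, C))

out = se_v2_gate(x, branch_weights, w_excite)
print(out.shape)  # (2, 32, 8, 8)
```

Because the gate lies strictly between 0 and 1, the output never exceeds the input in magnitude; the module only reweights channels and leaves the spatial layout untouched.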
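3. Structure
- Comparison with existing modules
  - Aggregated residual module (ResNeXt): ResNeXt's aggregated residual module connects branched convolutions directly to the input, written as $\text{ResNeXt} = x + \sum F(x)$. The new module builds on this idea, with more emphasis on channel information and global representation learning.
  - Squeeze-and-excitation module (SENet): SENet's squeeze and excitation operations are written as $\text{SENet} = x + F(x \cdot Ex(Sq(x)))$. The new module adds multi-branch dense layers to the squeeze step, giving $\text{SENetV2} = x + F\left(x \cdot Ex\left(\sum Sq(x)\right)\right)$.
- Structural characteristics
  - Multi-branch FC layers: Following the ResNeXt approach, the module introduces multiple FC branches of the same size, which raises the cardinality between layers and improves information flow.
  - Layered processing flow: The squeeze layers pass the key features on before excitation, a series of operations then restores the original form, and finally the processed information is joined with the input in the residual module, forming a complete layered pipeline.
Figure: Comparison of the ResNeXt, SENet, and SENetV2 modules.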
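4. Advantages
- Performance gains
  - Experimental validation: In experiments on CIFAR-10, CIFAR-100, and a customized version of ImageNet, SENetV2 reaches higher classification accuracy than existing architectures such as ResNet and SENet. On CIFAR-10, for example, ResNet scores 77.38, SE ResNet 77.79, and SE ResNetV2 78.60.
  - Stronger feature representation: By fusing channel representations with global representations more effectively, the network extracts image features better, which improves classification performance.
- Controlled complexity
  - Acceptable parameter growth: The model has more parameters than SENet, but the increase is modest. On CIFAR-100, for example, ResNet has 23.62M parameters, SE ResNet 24.90M, and SE ResNetV2 28.67M, about 15% more than SE ResNet. The extra parameters buy better performance, which is acceptable in practice.
  - Structural optimization: A suitable cardinality and the multi-branch structure improve performance without adding much complexity, keeping the model structure efficient.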
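To make the parameter growth concrete, the weights of the attention FC layers alone can be counted by hand. The sketch below assumes bias-free linear layers, a reduction ratio of 16, and a cardinality of 4; the helper names `se_fc_params` and `se_v2_fc_params` are illustrative. It counts only the attention layers, not the whole backbone, so it does not reproduce the model totals quoted above.

```python
def se_fc_params(channels, reduction=16):
    """Weights in a plain SE block: C -> C/r -> C, no biases."""
    hidden = channels // reduction
    return channels * hidden + hidden * channels


def se_v2_fc_params(channels, reduction=16, cardinality=4):
    """Weights in an SE V2 block: k branches C -> C/r, then k*C/r -> C."""
    hidden = channels // reduction
    squeeze = cardinality * channels * hidden   # k parallel reduced FC layers
    excite = cardinality * hidden * channels    # one FC layer on the concatenation
    return squeeze + excite


for c in (64, 128, 256, 512):
    print(c, se_fc_params(c), se_v2_fc_params(c))
# For 512 channels: 32768 weights for SE and 131072 for SE V2
```

Under these assumptions the attention layers grow by a factor equal to the cardinality, yet they remain small next to the convolutional weights of the surrounding block.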
Paper: https://arxiv.org/pdf/2311.10807
Source code: https://github.com/mahendran-narayanan/SENetV2-Aggregated-dense-layer-for-channelwise-and-global-representations
III. SE v2 Implementation
The implementation of SE v2 and its improved variants is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F

from ultralytics.nn.modules.conv import LightConv


class SELayer(nn.Module):
    """Original squeeze-and-excitation (SE) channel attention."""

    def __init__(self, channel, reduction=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)


class SELayerV2(nn.Module):
    """SE V2 channel attention with four parallel squeeze branches."""

    def __init__(self, in_channel, reduction=16):
        super(SELayerV2, self).__init__()
        assert in_channel >= reduction and in_channel % reduction == 0, 'invalid in_channel in SELayerV2'
        self.reduction = reduction
        self.cardinality = 4
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # cardinality 1
        self.fc1 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 2
        self.fc2 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 3
        self.fc3 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 4
        self.fc4 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # excitation: map the concatenated branches back to in_channel gates
        self.fc = nn.Sequential(
            nn.Linear(in_channel // self.reduction * self.cardinality, in_channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)  # (b, c) channel descriptor
        y1 = self.fc1(y)
        y2 = self.fc2(y)
        y3 = self.fc3(y)
        y4 = self.fc4(y)
        y_concate = torch.cat([y1, y2, y3, y4], dim=1)  # aggregate the branches
        y_ex_dim = self.fc(y_concate).view(b, c, 1, 1)
        return x * y_ex_dim.expand_as(x)


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation, skipping batch normalization (for use after BN has been fused)."""
        return self.act(self.conv(x))


class HGBlock_SEV2(nn.Module):
    """
    HG_Block of PPHGNetV2 with 2 convolutions and LightConv, followed by SELayerV2.

    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """

    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initialize the HGBlock with SELayerV2 applied to the block output."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = SELayerV2(c2)  # SE V2 attention on the c2 output channels

    def forward(self, x):
        """Forward pass of a PPHGNetV2 backbone layer."""
        y = [x]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
        return y + x if self.add else y


class ResNetBlock(nn.Module):
    """ResNet block with standard convolution layers and SELayerV2 after the 3x3 convolution."""

    def __init__(self, c1, c2, s=1, e=4):
        """Initialize convolution with given parameters."""
        super().__init__()
        c3 = e * c2
        self.cv1 = Conv(c1, c2, k=1, s=1, act=True)
        self.cv2 = Conv(c2, c2, k=3, s=s, p=1, act=True)
        self.cv3 = Conv(c2, c3, k=1, act=False)
        self.cv4 = SELayerV2(c2)  # applied between cv2 and cv3
        self.shortcut = nn.Sequential(Conv(c1, c3, k=1, s=s, act=False)) if s != 1 or c1 != c3 else nn.Identity()

    def forward(self, x):
        """Forward pass through the ResNet block."""
        return F.relu(self.cv3(self.cv4(self.cv2(self.cv1(x)))) + self.shortcut(x))


class ResNetLayer_SEV2(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first

        if self.is_first:
            self.layer = nn.Sequential(
                Conv(c1, c2, k=7, s=2, p=3, act=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
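SELayerV2 asserts that its input width is at least `reduction` and divisible by it, so a channel width that breaks this rule fails when the model is built. A small pre-check, written here as a hypothetical helper `se_v2_accepts` that mirrors that assertion, can confirm the widths used in this article before editing a YAML file. The width lists below are read off the configurations in this article and are otherwise an assumption.

```python
def se_v2_accepts(in_channel, reduction=16):
    """Mirror the constructor assertion of SELayerV2 without building the layer."""
    return in_channel >= reduction and in_channel % reduction == 0


# Widths that reach SELayerV2 in this article's two variants (assumed from the YAML files):
# HGBlock_SEV2 applies it to c2; ResNetBlock applies it to the bottleneck width c2.
widths = {
    "HGBlock_SEV2": [512],
    "ResNetBlock": [64, 128, 256, 512],
}

for name, channels in widths.items():
    for c in channels:
        status = "ok" if se_v2_accepts(c) else "rejected"
        print(f"{name}: {c} channels -> {status}")
```

If the backbone is scaled to other widths, running the same check on the new values shows whether `reduction` needs to change.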
IV. Novel Modules
4.1 Improvement 1 ⭐
Method: an HGBlock built on the SEv2 module (Section V covers the steps for adding it).
With the SEv2 module added, HGBlock looks like this:
class HGBlock_SEV2(nn.Module):
    """
    HG_Block of PPHGNetV2 with 2 convolutions and LightConv, followed by SELayerV2.

    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """

    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initialize the HGBlock with SELayerV2 applied to the block output."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = SELayerV2(c2)  # SE V2 attention on the c2 output channels

    def forward(self, x):
        """Forward pass of a PPHGNetV2 backbone layer."""
        y = [x]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
        return y + x if self.add else y
4.2 Improvement 2 ⭐
Method: a ResNetLayer built on the SEv2 module (Section V covers the steps for adding it).
The second improvement modifies the ResNetLayer module in RT-DETR by adding SEv2 to it. The modified module is renamed ResNetLayer_SEV2, and its code is as follows:
class ResNetBlock(nn.Module):
    """ResNet block with standard convolution layers and SELayerV2 after the 3x3 convolution."""

    def __init__(self, c1, c2, s=1, e=4):
        """Initialize convolution with given parameters."""
        super().__init__()
        c3 = e * c2
        self.cv1 = Conv(c1, c2, k=1, s=1, act=True)
        self.cv2 = Conv(c2, c2, k=3, s=s, p=1, act=True)
        self.cv3 = Conv(c2, c3, k=1, act=False)
        self.cv4 = SELayerV2(c2)  # applied between cv2 and cv3
        self.shortcut = nn.Sequential(Conv(c1, c3, k=1, s=s, act=False)) if s != 1 or c1 != c3 else nn.Identity()

    def forward(self, x):
        """Forward pass through the ResNet block."""
        return F.relu(self.cv3(self.cv4(self.cv2(self.cv1(x)))) + self.shortcut(x))


class ResNetLayer_SEV2(nn.Module):
    """ResNet layer with multiple ResNet blocks."""

    def __init__(self, c1, c2, s=1, is_first=False, n=1, e=4):
        """Initializes the ResNetLayer given arguments."""
        super().__init__()
        self.is_first = is_first

        if self.is_first:
            self.layer = nn.Sequential(
                Conv(c1, c2, k=7, s=2, p=3, act=True), nn.MaxPool2d(kernel_size=3, stride=2, padding=1)
            )
        else:
            blocks = [ResNetBlock(c1, c2, s, e=e)]
            blocks.extend([ResNetBlock(e * c2, c2, 1, e=e) for _ in range(n - 1)])
            self.layer = nn.Sequential(*blocks)

    def forward(self, x):
        """Forward pass through the ResNet layer."""
        return self.layer(x)
Note ❗: the module names that need to be declared in Section V are HGBlock_SEV2 and ResNetLayer_SEV2.
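V. Steps for Adding the Modules
5.1 Change 1
① Under the ultralytics/nn/ directory, create a new folder named AddModules to hold the module code.
② In the AddModules folder, create SEv2.py and paste the code from Section III into it.
5.2 Change 2
In the AddModules folder, create __init__.py (skip this if it already exists) and import the module in it:
from .SEv2 import *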
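5.3 Change 3
In ultralytics/nn/tasks.py, the module class names need to be added in two places.
First, import the modules.
Next, register the HGBlock_SEV2 and ResNetLayer_SEV2 modules in the parse_model function.
Finally, add the following code to the parse_model function:
elif m in {SELayerV2}:
    c2 = ch[f]
    args = [c2, *args]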
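The registration step can be pictured with a toy, string-keyed model of how a YAML-driven parser turns a layer entry into constructor arguments. This is not Ultralytics code: `resolve_args` and its branch layout are hypothetical. The SELayerV2 branch copies the snippet above, the HGBlock_SEV2 branch is inferred from the HGBlock argument lists in the model printout later in this article (for example `[1024, 384, 2048, 5, 6, True, False]`, where the repeat count sits at index 4), and the ResNetLayer_SEV2 rule is inferred from the channel widths in the YAML files. Check the parse_model function in your installed version before copying the pattern.

```python
def resolve_args(module, from_idx, ch, args, n):
    """Toy sketch: return (constructor args, output channels, repeats) for one YAML entry.

    module:   module name as a string (the real parser works with classes)
    from_idx: index of the layer feeding this one
    ch:       list of output channels of the layers built so far
    args:     argument list from the YAML entry
    n:        repeat count from the YAML entry
    """
    args = list(args)
    if module == "HGBlock_SEV2":
        c1, cm, c2 = ch[from_idx], args[0], args[1]
        args = [c1, cm, c2, *args[2:]]
        args.insert(4, n)  # hand the repeat count to the block itself
        n = 1              # so the parser builds the block only once
    elif module == "ResNetLayer_SEV2":
        # The first layer keeps c2; later layers expand it by e=4
        c2 = args[1] if args[3] else args[1] * 4
    elif module == "SELayerV2":
        c2 = ch[from_idx]   # attention keeps the channel count
        args = [c2, *args]  # prepend the input channels
    else:
        raise ValueError(f"unregistered module: {module}")
    return args, c2, n


print(resolve_args("HGBlock_SEV2", -1, [512], [192, 512, 5, True, False], 6))
# ([512, 192, 512, 5, 6, True, False], 512, 1)
print(resolve_args("ResNetLayer_SEV2", -1, [256], [256, 128, 2, False, 4], 1))
# ([256, 128, 2, False, 4], 512, 1)
print(resolve_args("SELayerV2", -1, [256], [16], 1))
# ([256, 16], 256, 1)
```

In the printout in this article, the HGBlock rows carry seven arguments with the repeat count at index 4, while the HGBlock_SEV2 rows carry six, so it may be worth checking that the repeat count is forwarded the same way for the new class.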
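VI. YAML Model Files
6.1 Improved Model, Version 1
Using ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the example, create a model file for training on your own dataset, rtdetr-l-HGBlock_SEV2.yaml, in the same directory.
Copy the contents of rtdetr-l.yaml into rtdetr-l-HGBlock_SEV2.yaml and set nc to the number of object classes in your dataset.
📌 The modification replaces HGBlock modules in the backbone with HGBlock_SEV2 modules (stage 3 in the file below).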
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, HGBlock_SEV2, [192, 512, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock_SEV2, [192, 512, 5, True, True]]
- [-1, 6, HGBlock_SEV2, [192, 512, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
- [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
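6.2 Improved Model, Version 2 ⭐
Using ultralytics/cfg/models/rt-detr/rtdetr-resnet50.yaml as the example, create a model file for training on your own dataset, rtdetr-ResNetLayer_SEV2.yaml, in the same directory.
Copy the contents of rtdetr-resnet50.yaml into rtdetr-ResNetLayer_SEV2.yaml and set nc to the number of object classes in your dataset.
📌 The modification replaces the ResNetLayer modules in the backbone with ResNetLayer_SEV2 modules.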
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet50 object detection model with P3-P5 outputs.
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ResNetLayer_SEV2, [3, 64, 1, True, 1]] # 0
- [-1, 1, ResNetLayer_SEV2, [64, 64, 1, False, 3]] # 1
- [-1, 1, ResNetLayer_SEV2, [256, 128, 2, False, 4]] # 2
- [-1, 1, ResNetLayer_SEV2, [512, 256, 2, False, 6]] # 3
- [-1, 1, ResNetLayer_SEV2, [1024, 512, 2, False, 3]] # 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 5
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 7
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 9
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 11
- [-1, 1, Conv, [256, 1, 1]] # 12
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [2, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (16), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 17, downsample_convs.0
- [[-1, 12], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (19), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.1
- [[-1, 7], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (22), pan_blocks.1
- [[16, 19, 22], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
VII. Successful Run Results
Printing the network shows that HGBlock_SEV2 and ResNetLayer_SEV2 have been added to the model, which is now ready for training.
rtdetr-l-HGBlock_SEV2 :
rtdetr-l-HGBlock_SEV2 summary: 1,081 layers, 50,854,723 parameters, 50,854,723 gradients, 158.3 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 17411328 ultralytics.nn.AddModules.SEv2.HGBlock_SEV2 [512, 192, 512, 5, True, False]
6 -1 6 3286656 ultralytics.nn.AddModules.SEv2.HGBlock_SEV2 [512, 192, 512, 5, True, True]
7 -1 6 3286656 ultralytics.nn.AddModules.SEv2.HGBlock_SEV2 [512, 192, 512, 5, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [512, 1024, 3, 2, 1, False]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-HGBlock_SEV2 summary: 1,081 layers, 50,854,723 parameters, 50,854,723 gradients, 158.3 GFLOPs
rtdetr-ResNetLayer_SEV2 :
rtdetr-ResNetLayer_SEV2 summary: 689 layers, 44,099,555 parameters, 44,099,555 gradients, 134.2 GFLOPs
from n params module arguments
0 -1 1 9536 ultralytics.nn.AddModules.LSKA.ResNetLayer_LSKA[3, 64, 1, True, 1]
1 -1 1 232128 ultralytics.nn.AddModules.LSKA.ResNetLayer_LSKA[64, 64, 1, False, 3]
2 -1 1 1295872 ultralytics.nn.AddModules.LSKA.ResNetLayer_LSKA[256, 128, 2, False, 4]
3 -1 1 7523840 ultralytics.nn.AddModules.LSKA.ResNetLayer_LSKA[512, 256, 2, False, 6]
4 -1 1 15783424 ultralytics.nn.AddModules.LSKA.ResNetLayer_LSKA[1024, 512, 2, False, 3]
5 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
6 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
7 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
8 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
9 3 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
10 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
11 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 2 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
18 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
19 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
20 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
21 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
23 [16, 19, 22] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-ResNetLayer_SEV2 summary: 689 layers, 44,099,555 parameters, 44,099,555 gradients, 134.2 GFLOPs