RT-DETR Improvement Strategies [Attention Mechanisms] | FCAttention (2024, SCI TOP journal): a plug-and-play attention module that strengthens local and global feature interaction
I. Introduction
This post documents an RT-DETR object-detection improvement based on the FCAttention module. FCAttention is a channel-attention mechanism recently proposed for image dehazing that effectively integrates global and local information and allocates weights sensibly, letting the network emphasize useful features more accurately while suppressing less useful ones. It is equally effective in object detection.
II. How FCA Works
Source paper: "Unsupervised bidirectional contrastive reconstruction and adaptive fine-grained channel attention networks for image dehazing".
The design principles and advantages of the FCA (Adaptive Fine-Grained Channel Attention) module are as follows:
2.1 Principle
- Feature-map processing: first, global average pooling converts the feature map $F$, which carries global spatial information, into a channel descriptor $U$ for extracting channel information: $U_{n}=GAP(F_{n})=\frac{1}{H\times W}\sum_{i=1}^{H}\sum_{j=1}^{W}F_{n}(i,j)$, where $F\in\mathbb{R}^{C\times H\times W}$; $C$, $H$ and $W$ denote the number of channels, the height and the width; $U\in\mathbb{R}^{C}$; and $GAP(\cdot)$ is the global average pooling function.
- Local information: to obtain local channel information while adding few model parameters, a banded matrix $B=[b_{1},b_{2},b_{3},\ldots,b_{k}]$ performs local channel interaction, giving the local information $U_{lc}=\sum_{i=1}^{k}U\cdot b_{i}$, where $U$ is the channel descriptor and $k$ is the number of neighboring channels.
- Global information: a diagonal matrix $D=[d_{1},d_{2},d_{3},\ldots,d_{c}]$ captures the dependencies among all channels as global information, giving $U_{gc}=\sum_{i=1}^{c}U\cdot d_{i}$, where $c$ is the number of channels.
- Correlation capture: a cross-correlation operation combines the global information $U_{gc}$ with the local information $U_{lc}$ into the correlation matrix $M=U_{gc}\cdot U_{lc}^{T}$, capturing their correlation at different granularities.
- Adaptive fusion: the row and column sums of the correlation matrix and of its transpose serve as weight vectors for the global and local information, which are fused dynamically through a learnable factor: $U_{gc}^{w}=\sum_{j}^{c}M_{i,j},\ i\in\{1,2,\ldots,c\}$; $U_{lc}^{w}=\sum_{j}^{c}(U_{lc}\cdot U_{gc}^{T})_{i,j}=\sum_{j}^{c}M_{i,j}^{T},\ i\in\{1,2,\ldots,c\}$; $W=\sigma(\sigma(\theta)\times\sigma(U_{gc}^{w})+(1-\sigma(\theta))\times\sigma(U_{lc}^{w}))$, where $U_{gc}^{w}$ and $U_{lc}^{w}$ are the global and local channel weights, $\sigma$ is the sigmoid activation function, and $\theta$ is the learnable fusion factor.
- Weight application: multiplying the weights by the input feature map gives the final output, $F^{*}=W\otimes F$, where $F$ is the input feature map and $F^{*}$ the final output. A tensor-level sketch of this pipeline follows the list.
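For readers who prefer shapes to symbols, here is a minimal, self-contained sketch of the pipeline above. It is illustrative only, not the official code: the local and global interactions are stubbed with random vectors, and theta stands in for the learnable factor.

import torch

# Illustrative sketch of the FCA pipeline from Section 2.1.
C, H, W = 8, 16, 16
F = torch.randn(1, C, H, W)                  # input feature map

U = F.mean(dim=(2, 3))                       # GAP -> channel descriptor, shape (1, C)
U_lc = torch.randn(1, C)                     # stand-in for the local interaction (banded matrix B)
U_gc = torch.randn(1, C)                     # stand-in for the global interaction (diagonal matrix D)

M = U_gc.unsqueeze(-1) @ U_lc.unsqueeze(1)   # correlation matrix, shape (1, C, C)
w_gc = M.sum(dim=2)                          # row sums -> global weight vector, (1, C)
w_lc = M.transpose(1, 2).sum(dim=2)          # column sums -> local weight vector, (1, C)

theta = torch.zeros(1)                       # learnable fusion factor (an nn.Parameter in practice)
s = torch.sigmoid
W_attn = s(s(theta) * s(w_gc) + (1 - s(theta)) * s(w_lc))  # fused channel weights

out = F * W_attn.view(1, C, 1, 1)            # reweight the input feature map
print(out.shape)                             # torch.Size([1, 8, 16, 16])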
2.2 Advantages
- Effective information integration: global and local information are integrated effectively; the correlation matrix captures their correlation at different granularities and promotes interaction between the two.
- Sensible weight allocation: the adaptive fusion strategy avoids redundant cross-correlation operations between local and global information, further promoting their interaction and assigning weights to dehazing-relevant features more precisely.
- Better dehazing performance: by fully exploiting global and local channel information during dehazing, the network emphasizes useful features more accurately and suppresses less useful ones, improving dehazing performance.
Paper: https://doi.org/10.1016/j.neunet.2024.106314
Code: https://github.com/Lose-Code/UBRFC-Net
III. FCAttention Implementation Code
The implementation of the FCAttention module is as follows:
import math
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import LightConv
class Mix(nn.Module):
    """Learnable sigmoid-gated blend of two feature maps."""
    def __init__(self, m=-0.80):
        super(Mix, self).__init__()
        self.w = nn.Parameter(torch.FloatTensor([m]), requires_grad=True)
        self.mix_block = nn.Sigmoid()

    def forward(self, fea1, fea2):
        mix_factor = self.mix_block(self.w)  # scalar gate in (0, 1)
        out = fea1 * mix_factor.expand_as(fea1) + fea2 * (1 - mix_factor.expand_as(fea2))
        return out
class FCAttention(nn.Module):
    def __init__(self, channel, b=1, gamma=2):
        super(FCAttention, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)  # global average pooling
        # 1-D convolution; kernel size k adapts to the channel count (ECA-style)
        t = int(abs((math.log(channel, 2) + b) / gamma))
        k = t if t % 2 else t + 1
        self.conv1 = nn.Conv1d(1, 1, kernel_size=k, padding=int(k / 2), bias=False)
        self.fc = nn.Conv2d(channel, channel, 1, padding=0, bias=True)
        self.sigmoid = nn.Sigmoid()
        self.mix = Mix()
    def forward(self, input):
        x = self.avg_pool(input)                                            # (B, C, 1, 1)
        x1 = self.conv1(x.squeeze(-1).transpose(-1, -2)).transpose(-1, -2)  # local branch, (B, C, 1)
        x2 = self.fc(x).squeeze(-1).transpose(-1, -2)                       # global branch, (B, 1, C)
        out1 = torch.sum(torch.matmul(x1, x2), dim=1).unsqueeze(-1).unsqueeze(-1)  # cross-correlation x1*x2 summed over one axis, (B, C, 1, 1)
        out1 = self.sigmoid(out1)
        out2 = torch.sum(torch.matmul(x2.transpose(-1, -2), x1.transpose(-1, -2)), dim=1).unsqueeze(-1).unsqueeze(-1)  # same for the transposed correlation
        out2 = self.sigmoid(out2)
        out = self.mix(out1, out2)                                          # adaptive fusion of the two weight vectors
        out = self.conv1(out.squeeze(-1).transpose(-1, -2)).transpose(-1, -2).unsqueeze(-1)
        out = self.sigmoid(out)
        return input * out
def autopad(k, p=None, d=1): # kernel, padding, dilation
"""Pad to 'same' shape outputs."""
if d > 1:
k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k] # actual kernel-size
if p is None:
p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad
return p
class Conv(nn.Module):
"""Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
default_act = nn.SiLU() # default activation
def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
"""Initialize Conv layer with given arguments including activation."""
super().__init__()
self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
self.bn = nn.BatchNorm2d(c2)
self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
def forward(self, x):
"""Apply convolution, batch normalization and activation to input tensor."""
return self.act(self.bn(self.conv(x)))
    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (for fused inference)."""
        return self.act(self.conv(x))
class HGBlock_FCAttention(nn.Module):
    """
    HG_Block of PPHGNetV2 with LightConv/Conv stacks, followed by an FCAttention layer on the output.
    Based on: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """
    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initialize the HG_Block variant with FCAttention using the given input/output channels."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = FCAttention(c2)  # channel attention on the block output
def forward(self, x):
"""Forward pass of a PPHGNetV2 backbone layer."""
y = [x]
y.extend(m(y[-1]) for m in self.m)
y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
return y + x if self.add else y
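A quick sanity check of the two modules (a minimal sketch; the shapes are arbitrary, and LightConv must be importable from your ultralytics install):

if __name__ == "__main__":
    x = torch.randn(1, 64, 32, 32)

    attn = FCAttention(channel=64)
    print(attn(x).shape)   # torch.Size([1, 64, 32, 32]) -- attention preserves the shape

    block = HGBlock_FCAttention(c1=64, cm=32, c2=64, lightconv=True, shortcut=True)
    print(block(x).shape)  # torch.Size([1, 64, 32, 32]) -- c1 == c2, so the residual add applies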
IV. Improved Modules
4.1 Improvement 1
Module improvement method 1️⃣: insert the FCAttention module directly.
The position of the FCAttention module after insertion is shown in the YAML file of Section 6.1.
Note ❗: the module name to declare is FCAttention.
4.2 Improvement 2⭐
Module improvement method 2️⃣: an HGBlock based on the FCAttention module.
The second method modifies the HGBlock module in RT-DETR. With the FCAttention attention module added, the block can fully exploit global and local channel information, letting the network emphasize useful features more accurately while suppressing less useful ones. Moreover, its adaptive fusion strategy avoids redundant cross-correlation operations between local and global information, further promoting their interaction and weighting features more precisely.
The modified code is as follows:
class HGBlock_FCAttention(nn.Module):
    """
    HG_Block of PPHGNetV2 with LightConv/Conv stacks, followed by an FCAttention layer on the output.
    Based on: https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """
    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initialize the HG_Block variant with FCAttention using the given input/output channels."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = FCAttention(c2)  # channel attention on the block output
def forward(self, x):
"""Forward pass of a PPHGNetV2 backbone layer."""
y = [x]
y.extend(m(y[-1]) for m in self.m)
y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
return y + x if self.add else y
Note ❗: the module name to declare is HGBlock_FCAttention.
V. Integration Steps
5.1 Step 1
① Create an AddModules folder under the ultralytics/nn/ directory to hold the module code.
② Create FCAttention.py inside the AddModules folder and paste the code from Section III into it.
5.2 Step 2
Create __init__.py inside the AddModules folder (skip this if it already exists) and import the module in it:
from .FCAttention import *
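Because FCAttention.py defines FCAttention and HGBlock_FCAttention as public names, the wildcard import above is enough; if you prefer an explicit form, the equivalent line is:

from .FCAttention import FCAttention, HGBlock_FCAttention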
5.3 Step 3
In the ultralytics/nn/tasks.py file, the module class names must be added in two places.
First: import the modules.
Second: register the FCAttention and HGBlock_FCAttention modules in the parse_model function, as sketched below.
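The exact code depends on your ultralytics version; the fragment below is a minimal sketch, assuming parse_model has the usual branch layout (the HGStem/HGBlock branch shown mirrors the stock one, and the FCAttention branch follows the single-channel-argument convention; adapt names and placement to your copy of tasks.py):

# In ultralytics/nn/tasks.py -- near the other module imports:
from ultralytics.nn.AddModules import FCAttention, HGBlock_FCAttention

# Inside parse_model(), as additional branches (fragment, not standalone code):
        elif m in {HGStem, HGBlock, HGBlock_FCAttention}:
            cm, c2 = args[0], args[1]
            args = [ch[f], cm, c2, *args[2:]]
            if m in {HGBlock, HGBlock_FCAttention}:
                args.insert(4, n)  # number of repeated inner convs
                n = 1
        elif m is FCAttention:
            c2 = ch[f]            # attention keeps the channel count
            args = [c2, *args]    # prepend the incoming channels to the YAML args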
VI. YAML Model Files
6.1 Model Variant 1
With the code configured, set up the model's YAML file.
Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a model file for training on your own dataset in the same directory, named rtdetr-l-FCAttention.yaml.
Copy the contents of rtdetr-l.yaml into rtdetr-l-FCAttention.yaml and set nc to the number of classes in your dataset.
Add the FCAttention module in the deep layers of the backbone; it takes only one argument, the channel count.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
  - [-1, 1, FCAttention, [1024]] # 9
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 11 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 13, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 15 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 17, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 18, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 20 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (22), fpn_blocks.1
  - [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.0
  - [[-1, 18], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (25), pan_blocks.0
  - [-1, 1, Conv, [256, 3, 2]] # 26, downsample_convs.1
  - [[-1, 13], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (28), pan_blocks.1
- [[22, 25, 28], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
6.2 Model Variant 2⭐
Again taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a model file for training on your own dataset in the same directory, named rtdetr-l-HGBlock_FCAttention.yaml.
Copy the contents of rtdetr-l.yaml into rtdetr-l-HGBlock_FCAttention.yaml and set nc to the number of classes in your dataset.
📌 The modification replaces some of the HGBlock modules in the backbone with HGBlock_FCAttention modules.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, HGBlock_FCAttention, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock_FCAttention, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock_FCAttention, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
- [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
VII. Verifying the Result
Printing the two network models shows that the FCAttention module and HGBlock_FCAttention have been added to the models, which are now ready for training.
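A minimal sketch for reproducing these printouts (it assumes the two YAML files from Section VI exist at the paths shown; the commented train() call is the usual next step, with the data path replaced by your own dataset YAML):

from ultralytics import RTDETR

# Build each modified model from its YAML and print the layer table.
for cfg in (
    "ultralytics/cfg/models/rt-detr/rtdetr-l-FCAttention.yaml",
    "ultralytics/cfg/models/rt-detr/rtdetr-l-HGBlock_FCAttention.yaml",
):
    model = RTDETR(cfg)
    model.info()

# Once the structure looks right, training proceeds as usual, e.g.:
# model.train(data="your_dataset.yaml", epochs=100, imgsz=640)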
rtdetr-l-FCAttention:
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 1695360 ultralytics.nn.modules.block.HGBlock [512, 192, 1024, 5, 6, True, False]
6 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
7 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [1024, 1024, 3, 2, 1, False]
9 -1 1 1050118 ultralytics.nn.AddModules.FCAttention.FCAttention[1024, 1024]
10 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
11 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
12 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
13 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
14 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
15 7 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
16 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
17 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
18 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
19 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
20 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
21 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
23 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
24 [-1, 18] 1 0 ultralytics.nn.modules.conv.Concat [1]
25 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
26 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
27 [-1, 13] 1 0 ultralytics.nn.modules.conv.Concat [1]
28 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
29 [22, 25, 28] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-FCAttention summary: 688 layers, 33,858,249 parameters, 33,858,249 gradients, 108.0 GFLOPs
rtdetr-l-HGBlock_FCAttention:
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 2744966 ultralytics.nn.AddModules.FCAttention.HGBlock_FCAttention[512, 192, 1024, 5, 6, True, False]
6 -1 6 3105414 ultralytics.nn.AddModules.FCAttention.HGBlock_FCAttention[1024, 192, 1024, 5, 6, True, True]
7 -1 6 3105414 ultralytics.nn.AddModules.FCAttention.HGBlock_FCAttention[1024, 192, 1024, 5, 6, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [1024, 1024, 3, 2, 1, False]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-HGBlock_FCAttention summary: 703 layers, 35,956,949 parameters, 35,956,949 gradients, 108.0 GFLOPs