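RT-DETR Improvement Strategy [Exclusive Fusion Improvement] | SPD-Conv + PPA: Further Improving the Model's Feature Extraction for Small Objects
1. Introduction
This article describes how to optimize the RT-DETR object detection model with SPD-Conv. When downsampling feature maps, SPD-Conv preserves all of the information. This avoids the loss of fine-grained information caused by conventional strided convolution and pooling, and it makes SPD-Conv especially effective on small objects and low-resolution images. On top of this, a PPA module is added. By improving on the conventional convolution operations, PPA better preserves the important information of small targets.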
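2. Introduction to SPD-Conv
SPD-Conv is a new CNN building block that replaces the strided convolution and pooling layers used in conventional CNN architectures. It consists of a space-to-depth (SPD) layer followed by a non-strided convolution layer.
2.1 Design Principles of the SPD-Conv Module
2.1.1 SPD Layer
The SPD layer downsamples a feature map while keeping all of its information in the channel dimension, so no information is lost.

Specifically, for any intermediate feature map X of size S×S×C1, the SPD layer slices X into a series of sub-feature maps. Examples are f_{0,0} = X[0:S:scale, 0:S:scale] and f_{1,0} = X[1:S:scale, 0:S:scale]. In general, the sub-map f_{x,y} consists of all entries X(i, j) with i mod scale = x and j mod scale = y. Each sub-map therefore downsamples X by a factor of scale. With scale = 2, for example, there are four sub-maps f_{0,0}, f_{1,0}, f_{0,1} and f_{1,1}, each of shape (S/2, S/2, C1), so X is downsampled by a factor of 2.

These sub-feature maps are then concatenated along the channel dimension to form a feature map X'. Its spatial dimensions are reduced by a factor of scale, and its channel dimension is increased by a factor of scale². In other words, SPD transforms the feature map X(S, S, C1) into an intermediate feature map X'(S/scale, S/scale, scale²·C1).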
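A minimal NumPy sketch makes the slicing concrete. It works on a single (S, S, C1) feature map in the HWC layout used in the text; the PyTorch module in Section 4 works on NCHW batches instead. The helper names `space_to_depth` and `depth_to_space` are illustrative and are not part of the SPD-Conv repository. The inverse is included only to show that no entries are discarded.

```python
import numpy as np


def space_to_depth(x: np.ndarray, scale: int = 2) -> np.ndarray:
    """SPD sketch: (S, S, C1) -> (S/scale, S/scale, scale**2 * C1), no entries dropped."""
    h, w, _ = x.shape
    assert h % scale == 0 and w % scale == 0, "S must be divisible by scale"
    # Sub-map f_{x,y} = X[x::scale, y::scale], ordered f_{0,0}, f_{1,0}, f_{0,1}, f_{1,1} for scale=2
    subs = [x[i::scale, j::scale, :] for j in range(scale) for i in range(scale)]
    return np.concatenate(subs, axis=-1)


def depth_to_space(y: np.ndarray, scale: int = 2) -> np.ndarray:
    """Inverse of space_to_depth: scatter each channel group back onto its sub-grid."""
    h, w, c = y.shape
    c1 = c // (scale * scale)
    x = np.empty((h * scale, w * scale, c1), dtype=y.dtype)
    k = 0
    for j in range(scale):
        for i in range(scale):
            x[i::scale, j::scale, :] = y[..., k * c1:(k + 1) * c1]
            k += 1
    return x


X = np.arange(8 * 8 * 3).reshape(8, 8, 3)
Y = space_to_depth(X, scale=2)
print(Y.shape)                                   # (4, 4, 12)
print(np.array_equal(depth_to_space(Y, 2), X))   # True
```

With scale = 2, the channel-group order f_{0,0}, f_{1,0}, f_{0,1}, f_{1,1} is the same order used by the `torch.cat` call in `SPDConv.forward`.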
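2.1.2 Non-Strided Convolution Layer
After the SPD feature transformation layer, a non-strided (stride 1) convolution layer is added. It has C2 filters, where C2 < scale²·C1, and transforms X'(S/scale, S/scale, scale²·C1) further into X''(S/scale, S/scale, C2). A non-strided convolution is used to retain as much discriminative feature information as possible.

A strided convolution would not do this. With a 3×3 filter and stride 3, for example, the feature map "shrinks", but each pixel is sampled only once. With stride 2, sampling becomes asymmetric: even and odd rows/columns are sampled a different number of times. In general, a stride greater than 1 causes a non-discriminative loss of information. This is true even though, on the surface, a strided convolution also appears to transform X(S, S, C1) into X''(S/scale, S/scale, C2), just without the intermediate X'.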
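The sampling argument can be checked by simple counting. The plain-Python sketch below counts how many k-wide windows of a convolution touch each input index along one axis, given stride s and zero padding p. The same pattern repeats in 2D. The helper `coverage` is illustrative: it models only which input positions are read, not the learned weights.

```python
def coverage(length: int, k: int, s: int, p: int) -> list:
    """Count how many k-wide windows (stride s, zero padding p) read each input index along one axis."""
    counts = [0] * length
    out_len = (length + 2 * p - k) // s + 1  # standard convolution output-length formula
    for o in range(out_len):
        start = o * s - p
        for t in range(start, start + k):
            if 0 <= t < length:  # indices outside [0, length) fall on padding
                counts[t] += 1
    return counts


print(coverage(9, k=3, s=3, p=0))  # [1, 1, 1, 1, 1, 1, 1, 1, 1]
print(coverage(8, k=3, s=2, p=1))  # [1, 2, 1, 2, 1, 2, 1, 1]
```

With stride 3, every position is read exactly once. With stride 2 and padding 1, odd positions are read twice and even positions once, which is the asymmetry described above.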
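2.2 Advantages of the SPD-Conv Module
- Generality and uniformity: SPD-Conv can be applied to most CNN architectures, and it replaces strided convolution and pooling in the same uniform way.
- Higher accuracy: Experiments applying SPD-Conv to YOLOv5 and ResNet show that it significantly improves accuracy on object detection and image classification, especially for small objects and low-resolution images.
  - In object detection, YOLOv5-SPD shows clear gains over the baseline models in AP (average precision) and AP_S (AP on small objects).
  - In image classification, ResNet18-SPD and ResNet50-SPD clearly outperform the baseline models in Top-1 accuracy.
- Information preservation: When downsampling feature maps through the SPD layer, SPD-Conv keeps all of the information. It thereby avoids the fine-grained information loss caused by conventional strided convolution and pooling, which lets the network learn more effective feature representations.
- Easy integration: SPD-Conv can be easily integrated into popular deep learning libraries such as PyTorch and TensorFlow, which may broaden its impact.
Paper: https://arxiv.org/pdf/2208.03641v1.pdf
Code: https://github.com/LabSAINT/SPD-Conv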
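3. Introduction to PPA
HCF-Net: Hierarchical Context Fusion Network for Infrared Small Object Detection
3.1 Principles
3.1.1 Multi-Branch Feature Extraction
PPA uses a multi-branch feature extraction strategy in which different branches extract features at different scales and levels. The input feature tensor is processed by three branches: local, global and serial convolution.

The local and global branches are distinguished by the patch size parameter. Attention is computed among non-overlapping patches, which extracts local and global features and lets them interact. During feature extraction, a series of operations also selects features and reweights them. Finally, the outputs of the three branches are summed to produce the fused feature.
3.1.2 Feature Fusion and Attention
After multi-branch feature extraction, an attention mechanism adaptively enhances the features. The attention module consists of efficient channel attention and spatial attention components. The features are processed first by a 1D channel attention map and then by a 2D spatial attention map. They then pass through dropout, batch normalization and an activation function to produce the final output.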
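3.2 Structure
3.2.1 Multi-Branch Feature Extraction Structure
- PPA consists mainly of two parts: multi-branch fusion and the attention mechanism. The multi-branch fusion part includes patch-aware branches and serial convolutions. The patch-aware parameter $p$ is set to 2 and 4, which represent the local and global branches respectively. For an input feature tensor $F$, a point-wise convolution first produces $F'$. The three branches then compute $F_{local}$, $F_{global}$ and $F_{conv}$, and these three results are summed to obtain $\tilde{F}$.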
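The local/global distinction comes down to how finely the map is cut into non-overlapping patches. The NumPy sketch below shows only that first partition step, producing the same (N, P·P, C) layout that the `LocalGlobalAttention` comments describe. The channel mean, MLPs and attention that follow are omitted, and the function name `partition_patches` is illustrative.

```python
import numpy as np


def partition_patches(x: np.ndarray, p: int) -> np.ndarray:
    """Split an (H, W, C) feature map into non-overlapping p x p patches -> (H/p * W/p, p*p, C)."""
    h, w, c = x.shape
    assert h % p == 0 and w % p == 0, "H and W must be divisible by the patch size"
    patches = x.reshape(h // p, p, w // p, p, c)   # (H/p, p, W/p, p, C)
    patches = patches.transpose(0, 2, 1, 3, 4)     # (H/p, W/p, p, p, C)
    return patches.reshape(-1, p * p, c)           # (N, p*p, C), patches in row-major order


feat = np.arange(16 * 16 * 8, dtype=float).reshape(16, 16, 8)
for p in (2, 4):  # p=2: local branch, p=4: global branch
    print(p, partition_patches(feat, p).shape)
# 2 (64, 4, 8)
# 4 (16, 16, 8)
```

A smaller p gives many small patches, each describing a fine local neighbourhood. A larger p gives fewer patches, each summarizing a wider region.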
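3.2.2 Feature Fusion and Attention Structure
- This part includes channel attention and spatial attention components. $\tilde{F}$ is processed in turn by a 1D channel attention map $M_c$ and a 2D spatial attention map $M_s$ through element-wise multiplication. Activation, batch normalization and related operations follow, yielding the PPA output $F''$.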
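The attention path can be written step by step. The notation below is ours: it follows the order of operations in the `PPA.forward` code in Section 4 (`self.cn`, `self.sa`, `self.drop`, `self.bn1`, `self.relu`). In that code, $\tilde{F}$ corresponds to the sum `x1 + x2 + x3 + x_skip + x_lga2 + x_lga4`.

```latex
\begin{aligned}
F_c  &= M_c(\tilde{F}) \otimes \tilde{F}, \\
F_s  &= M_s(F_c) \otimes F_c, \\
F'' &= \delta\big(\mathcal{B}(\mathcal{D}(F_s))\big),
\end{aligned}
```

In these equations:
- $\otimes$ is element-wise multiplication, with $M_c$ broadcast over spatial positions and $M_s$ broadcast over channels.
- $\mathcal{D}$ is dropout.
- $\mathcal{B}$ is batch normalization.
- $\delta$ is ReLU.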
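3.3 Advantages
- Multi-branch feature extraction: The multi-branch strategy captures multi-scale features of objects, which improves small-object detection accuracy. Different branches attend to information at different scales and levels, which avoids the loss of small-object features that can occur at a single scale.
- Feature fusion and attention: The attention mechanism adaptively enhances features and highlights the key information of small objects. Combining channel and spatial attention helps the network select and focus on features relevant to small objects, which improves its representation of them.
Paper: https://arxiv.org/pdf/2403.10778
Code: https://github.com/zhengshuchen/HCFNet
4. SPDConv and PPA Implementation Code
The implementation of the SPDConv module is as follows: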
import torch
import torch.nn as nn


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class SPDConv(nn.Module):
    """Space-to-depth (scale=2) followed by a non-strided Conv-BN-Act; args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize the layer; the convolution sees 4 * c1 input channels after space-to-depth."""
        super().__init__()
        c1 = c1 * 4  # space-to-depth with scale=2 multiplies the channel count by scale**2 = 4
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Move each 2x2 neighbourhood into channels (H and W halved), then apply conv, BN and activation."""
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Forward pass without BN, used after BN has been fused into the conv weights."""
        x = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2], x[..., ::2, 1::2], x[..., 1::2, 1::2]], 1)
        return self.act(self.conv(x))
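The parameter counts printed in Section 7 can be cross-checked against this implementation by hand. With the default k=1 and groups=1, SPDConv has two parameterized parts:
- a bias-free convolution from 4·c1 to c2 channels;
- a BatchNorm2d with a learnable weight and bias per output channel. Its running statistics are buffers, not parameters.

The helper name `spdconv_params` is illustrative.

```python
def spdconv_params(c1: int, c2: int, k: int = 1, scale: int = 2) -> int:
    """Parameter count of SPDConv: bias-free conv on scale**2 * c1 channels plus BatchNorm2d."""
    conv = (scale * scale * c1) * c2 * k * k  # nn.Conv2d(..., bias=False), groups=1
    bn = 2 * c2                               # BatchNorm2d learnable weight and bias
    return conv + bn


print(spdconv_params(128, 128))    # 65792   (layer 2)
print(spdconv_params(512, 512))    # 1049600 (layer 4)
print(spdconv_params(1024, 1024))  # 4196352 (layer 8)
print(spdconv_params(256, 256))    # 262656  (head downsampling layers)
print(spdconv_params(128, 256))    # 131584  (layer 26 of the P2 model)
```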
The implementation of the PPA module is as follows:
import math

import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttentionModule(nn.Module):
    def __init__(self):
        super(SpatialAttentionModule, self).__init__()
        self.conv2d = nn.Conv2d(in_channels=2, out_channels=1, kernel_size=7, stride=1, padding=3)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        # Channel-wise mean and max -> 7x7 conv -> sigmoid gives a 2D spatial attention map
        avgout = torch.mean(x, dim=1, keepdim=True)
        maxout, _ = torch.max(x, dim=1, keepdim=True)
        out = torch.cat([avgout, maxout], dim=1)
        out = self.sigmoid(self.conv2d(out))
        return out * x


class PPA(nn.Module):
    def __init__(self, in_features, filters) -> None:
        super().__init__()
        self.skip = conv_block(in_features=in_features,
                               out_features=filters,
                               kernel_size=(1, 1),
                               padding=(0, 0),
                               norm_type='bn',
                               activation=False)
        self.c1 = conv_block(in_features=in_features,
                             out_features=filters,
                             kernel_size=(3, 3),
                             padding=(1, 1),
                             norm_type='bn',
                             activation=True)
        self.c2 = conv_block(in_features=filters,
                             out_features=filters,
                             kernel_size=(3, 3),
                             padding=(1, 1),
                             norm_type='bn',
                             activation=True)
        self.c3 = conv_block(in_features=filters,
                             out_features=filters,
                             kernel_size=(3, 3),
                             padding=(1, 1),
                             norm_type='bn',
                             activation=True)
        self.sa = SpatialAttentionModule()
        self.cn = ECA(filters)
        self.lga2 = LocalGlobalAttention(filters, 2)
        self.lga4 = LocalGlobalAttention(filters, 4)
        self.bn1 = nn.BatchNorm2d(filters)
        self.drop = nn.Dropout2d(0.1)
        self.relu = nn.ReLU()
        self.gelu = nn.GELU()

    def forward(self, x):
        x_skip = self.skip(x)        # point-wise (1x1) conv: F'
        x_lga2 = self.lga2(x_skip)   # local branch, patch size 2
        x_lga4 = self.lga4(x_skip)   # global branch, patch size 4
        x1 = self.c1(x)              # serial 3x3 convolutions
        x2 = self.c2(x1)
        x3 = self.c3(x2)
        x = x1 + x2 + x3 + x_skip + x_lga2 + x_lga4  # fused feature
        x = self.cn(x)               # channel attention (ECA)
        x = self.sa(x)               # spatial attention
        x = self.drop(x)
        x = self.bn1(x)
        x = self.relu(x)
        return x
class LocalGlobalAttention(nn.Module):
    def __init__(self, output_dim, patch_size):
        super().__init__()
        self.output_dim = output_dim
        self.patch_size = patch_size
        self.mlp1 = nn.Linear(patch_size * patch_size, output_dim // 2)
        self.norm = nn.LayerNorm(output_dim // 2)
        self.mlp2 = nn.Linear(output_dim // 2, output_dim)
        self.conv = nn.Conv2d(output_dim, output_dim, kernel_size=1)
        self.prompt = torch.nn.parameter.Parameter(torch.randn(output_dim, requires_grad=True))
        self.top_down_transform = torch.nn.parameter.Parameter(torch.eye(output_dim), requires_grad=True)

    def forward(self, x):
        x = x.permute(0, 2, 3, 1)  # (B, C, H, W) -> (B, H, W, C); H and W must be divisible by patch_size
        B, H, W, C = x.shape
        P = self.patch_size

        # Local branch: non-overlapping P x P patches
        local_patches = x.unfold(1, P, P).unfold(2, P, P)  # (B, H/P, W/P, C, P, P): unfold appends window dims last
        local_patches = local_patches.permute(0, 1, 2, 4, 5, 3)  # (B, H/P, W/P, P, P, C)
        local_patches = local_patches.reshape(B, -1, P * P, C)  # (B, H/P*W/P, P*P, C)
        local_patches = local_patches.mean(dim=-1)  # (B, H/P*W/P, P*P): channel mean per pixel
        local_patches = self.mlp1(local_patches)  # (B, H/P*W/P, output_dim // 2)
        local_patches = self.norm(local_patches)  # (B, H/P*W/P, output_dim // 2)
        local_patches = self.mlp2(local_patches)  # (B, H/P*W/P, output_dim)
        local_attention = F.softmax(local_patches, dim=-1)  # (B, H/P*W/P, output_dim)
        local_out = local_patches * local_attention  # (B, H/P*W/P, output_dim)

        # Cosine similarity with a learnable prompt vector, clamped to [0, 1] as a per-patch mask
        cos_sim = F.normalize(local_out, dim=-1) @ F.normalize(self.prompt[None, ..., None], dim=1)  # (B, N, 1)
        mask = cos_sim.clamp(0, 1)
        local_out = local_out * mask
        local_out = local_out @ self.top_down_transform

        # Restore shapes and upsample back to H x W
        local_out = local_out.reshape(B, H // P, W // P, self.output_dim)  # (B, H/P, W/P, output_dim)
        local_out = local_out.permute(0, 3, 1, 2)  # (B, output_dim, H/P, W/P)
        local_out = F.interpolate(local_out, size=(H, W), mode='bilinear', align_corners=False)
        output = self.conv(local_out)
        return output
class ECA(nn.Module):
    def __init__(self, in_channel, gamma=2, b=1):
        super(ECA, self).__init__()
        k = int(abs((math.log(in_channel, 2) + b) / gamma))
        kernel_size = k if k % 2 else k + 1
        padding = kernel_size // 2
        self.pool = nn.AdaptiveAvgPool2d(output_size=1)
        self.conv = nn.Sequential(
            nn.Conv1d(in_channels=1, out_channels=1, kernel_size=kernel_size, padding=padding, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        out = self.pool(x)
        out = out.view(x.size(0), 1, x.size(1))
        out = self.conv(out)
        out = out.view(x.size(0), x.size(1), 1, 1)
        return out * x
class conv_block(nn.Module):
    def __init__(self,
                 in_features,
                 out_features,
                 kernel_size=(3, 3),
                 stride=(1, 1),
                 padding=(1, 1),
                 dilation=(1, 1),
                 norm_type='bn',
                 activation=True,
                 use_bias=True,
                 groups=1
                 ):
        super().__init__()
        self.conv = nn.Conv2d(in_channels=in_features,
                              out_channels=out_features,
                              kernel_size=kernel_size,
                              stride=stride,
                              padding=padding,
                              dilation=dilation,
                              bias=use_bias,
                              groups=groups)
        self.norm_type = norm_type
        self.act = activation
        if self.norm_type == 'gn':
            self.norm = nn.GroupNorm(32 if out_features >= 32 else out_features, out_features)
        if self.norm_type == 'bn':
            self.norm = nn.BatchNorm2d(out_features)
        if self.act:
            # self.relu = nn.GELU()
            self.relu = nn.ReLU(inplace=False)

    def forward(self, x):
        x = self.conv(x)
        if self.norm_type is not None:
            x = self.norm(x)
        if self.act:
            x = self.relu(x)
        return x
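ECA derives its 1D kernel size from the channel count instead of fixing it. This plain-Python snippet re-evaluates the same expression used in `ECA.__init__`, so you can see which kernel the PPA module gets for a given `filters` value. The function name `eca_kernel_size` is ours.

```python
import math


def eca_kernel_size(channels: int, gamma: int = 2, b: int = 1) -> int:
    """Same rule as ECA.__init__ above: k = int(|(log2(C) + b) / gamma|), bumped to the next odd number."""
    k = int(abs((math.log(channels, 2) + b) / gamma))
    return k if k % 2 else k + 1


for c in (64, 256, 512, 1024):
    print(c, eca_kernel_size(c))
# 64 3
# 256 5
# 512 5
# 1024 5
```

Because `int()` truncates, the kernel grows only slowly with C. The `k + 1` step keeps it odd, so `padding = kernel_size // 2` preserves the length of the channel sequence.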
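5. Integration Steps
For the steps to add SPD-Conv, refer to:
For the steps to add PPA, refer to (use the C3k2_PPA module from that guide):
6. YAML Model Files
6.1 Improved Model Version ⭐
This example is based on ultralytics/cfg/models/rt-detr/rtdetr-l.yaml. In the same directory, create a model file named rtdetr-SPDConv-PPA.yaml for training on your own dataset.
Copy the contents of rtdetr-l.yaml into rtdetr-SPDConv-PPA.yaml, and set nc to the number of object classes in your dataset.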
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, SPDConv, [128]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, SPDConv, [512]] # 4-P4/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
  - [-1, 1, SPDConv, [1024]] # 8-P5/32
  - [-1, 6, HGBlock_PPA, [384, 2048, 5, True, False]] # stage 4
head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
  - [-1, 1, SPDConv, [256]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
  - [-1, 1, SPDConv, [256]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
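The head's `from` indices (7, 3 and, in the P2 variant, 1) only make sense once you know the resolution of each backbone output. The plain-Python sketch below traces it under these assumptions:
- the input is 640×640;
- HGStem downsamples by 4;
- each SPDConv halves H and W (space-to-depth with scale 2, then a stride-1 conv);
- HGBlock and HGBlock_PPA preserve the spatial size.

The `BACKBONE` list is a hand-written summary of the YAML above, not something Ultralytics generates.

```python
# Hand-written summary of the backbone in rtdetr-SPDConv-PPA.yaml: (module, spatial downsampling factor).
# Assumptions: HGStem downsamples by 4, SPDConv by 2, HGBlock / HGBlock_PPA keep H and W.
BACKBONE = [
    ("HGStem", 4), ("HGBlock", 1), ("SPDConv", 2), ("HGBlock", 1), ("SPDConv", 2),
    ("HGBlock", 1), ("HGBlock", 1), ("HGBlock", 1), ("SPDConv", 2), ("HGBlock_PPA", 1),
]


def trace_sizes(img_size: int = 640):
    """Return (layer index, module, cumulative stride, feature-map side length) for each backbone layer."""
    size, stride, rows = img_size, 1, []
    for idx, (name, factor) in enumerate(BACKBONE):
        size //= factor
        stride *= factor
        rows.append((idx, name, stride, size))
    return rows


for idx, name, stride, size in trace_sizes(640):
    print(f"{idx:>2} {name:<12} stride {stride:>2} -> {size}x{size}")
```

Layers 1, 3 and 7 come out at strides 4, 8 and 16 (160, 80 and 40 pixels per side for a 640 input). This is why the head taps them as its P2, P3 and P4 inputs, while layer 9 (stride 32) feeds the input projection in front of AIFI.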
📌 Create rtdetr-SPDConv-PPA-p2.yaml, which adds a small-object detection layer (P2):
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P2-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1
  - [-1, 1, SPDConv, [128]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2
  - [-1, 1, SPDConv, [512]] # 4-P4/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
  - [-1, 1, SPDConv, [1024]] # 8-P5/32
  - [-1, 6, HGBlock_PPA, [384, 2048, 5, True, False]] # stage 4
head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 23 input_proj (P2)
  - [[-2, -1], 1, Concat, [1]] # cat backbone P2
  - [-1, 3, RepC3, [128]] # X2 (25), fpn_blocks.2
  - [-1, 1, SPDConv, [256]] # 26, downsample_convs.0
  - [[-1, 21], 1, Concat, [1]] # cat X3
  - [-1, 3, RepC3, [256]] # F3 (28), pan_blocks.0
  - [-1, 1, SPDConv, [256]] # 29, downsample_convs.1
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (31), pan_blocks.1
  - [-1, 1, SPDConv, [256]] # 32, downsample_convs.2
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (34), pan_blocks.2
  - [[25, 28, 31, 34], 1, RTDETRDecoder, [nc]] # Detect(P2, P3, P4, P5)
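One way to see the cost of the extra P2 level is to count decoder tokens. In the Ultralytics implementation, RTDETRDecoder flattens each input feature level and concatenates them into one token sequence. The plain-Python estimate below assumes a 640×640 input and strides 4/8/16/32 for P2 to P5; the function name `decoder_tokens` is ours.

```python
def decoder_tokens(img_size: int, strides) -> int:
    """Total tokens when each square level of side img_size // stride is flattened and concatenated."""
    return sum((img_size // s) ** 2 for s in strides)


p3_p5 = decoder_tokens(640, (8, 16, 32))     # rtdetr-SPDConv-PPA.yaml
p2_p5 = decoder_tokens(640, (4, 8, 16, 32))  # rtdetr-SPDConv-PPA-p2.yaml
print(p3_p5, p2_p5)  # 8400 34000
```

The P2 level alone contributes 160 × 160 = 25600 tokens, roughly four times the token count. Token count is only part of the cost: the GFLOPs gap in Section 7 (215.4 vs 304.9) also comes from the extra Upsample, Conv, RepC3 and SPDConv layers that run at high resolution.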
7. Successful Run Results
Printing the network model shows that the modifications are in place and the model is ready for training.
rtdetr-SPDConv-PPA :
rtdetr-SPDConv-PPA summary: 718 layers, 175,925,933 parameters, 175,925,933 gradients, 215.4 GFLOPs
                   from  n    params  module                                       arguments
  0                  -1  1     25248  ultralytics.nn.modules.block.HGStem          [3, 32, 48]
  1                  -1  6    155072  ultralytics.nn.modules.block.HGBlock         [48, 48, 128, 3, 6]
  2                  -1  1     65792  ultralytics.nn.AddModules.SPDConv.SPDConv    [128, 128]
  3                  -1  6    839296  ultralytics.nn.modules.block.HGBlock         [128, 96, 512, 3, 6]
  4                  -1  1   1049600  ultralytics.nn.AddModules.SPDConv.SPDConv    [512, 512]
  5                  -1  6   1695360  ultralytics.nn.modules.block.HGBlock         [512, 192, 1024, 5, 6, True, False]
  6                  -1  6   2055808  ultralytics.nn.modules.block.HGBlock         [1024, 192, 1024, 5, 6, True, True]
  7                  -1  6   2055808  ultralytics.nn.modules.block.HGBlock         [1024, 192, 1024, 5, 6, True, True]
  8                  -1  1   4196352  ultralytics.nn.AddModules.SPDConv.SPDConv    [1024, 1024]
  9                  -1  6 145188202  ultralytics.nn.AddModules.PPA.HGBlock_PPA    [1024, 384, 2048, 5, 6, True, False]
 10                  -1  1    524800  ultralytics.nn.modules.conv.Conv             [2048, 256, 1, 1, None, 1, 1, False]
 11                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]
 12                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14                   7  1    262656  ultralytics.nn.modules.conv.Conv             [1024, 256, 1, 1, None, 1, 1, False]
 15            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 16                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 17                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]
 18                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 19                   3  1    131584  ultralytics.nn.modules.conv.Conv             [512, 256, 1, 1, None, 1, 1, False]
 20            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 22                  -1  1    262656  ultralytics.nn.AddModules.SPDConv.SPDConv    [256, 256]
 23            [-1, 17]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 24                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 25                  -1  1    262656  ultralytics.nn.AddModules.SPDConv.SPDConv    [256, 256]
 26            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 27                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 28        [21, 24, 27]  1   7303907  ultralytics.nn.modules.head.RTDETRDecoder    [1, [256, 256, 256]]
rtdetr-SPDConv-PPA-p2 :
rtdetr-SPDConv-PPA-p2 summary: 803 layers, 179,129,069 parameters, 179,129,069 gradients, 304.9 GFLOPs
                   from  n    params  module                                       arguments
  0                  -1  1     25248  ultralytics.nn.modules.block.HGStem          [3, 32, 48]
  1                  -1  6    155072  ultralytics.nn.modules.block.HGBlock         [48, 48, 128, 3, 6]
  2                  -1  1     65792  ultralytics.nn.AddModules.SPDConv.SPDConv    [128, 128]
  3                  -1  6    839296  ultralytics.nn.modules.block.HGBlock         [128, 96, 512, 3, 6]
  4                  -1  1   1049600  ultralytics.nn.AddModules.SPDConv.SPDConv    [512, 512]
  5                  -1  6   1695360  ultralytics.nn.modules.block.HGBlock         [512, 192, 1024, 5, 6, True, False]
  6                  -1  6   2055808  ultralytics.nn.modules.block.HGBlock         [1024, 192, 1024, 5, 6, True, True]
  7                  -1  6   2055808  ultralytics.nn.modules.block.HGBlock         [1024, 192, 1024, 5, 6, True, True]
  8                  -1  1   4196352  ultralytics.nn.AddModules.SPDConv.SPDConv    [1024, 1024]
  9                  -1  6 145188202  ultralytics.nn.AddModules.PPA.HGBlock_PPA    [1024, 384, 2048, 5, 6, True, False]
 10                  -1  1    524800  ultralytics.nn.modules.conv.Conv             [2048, 256, 1, 1, None, 1, 1, False]
 11                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]
 12                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 14                   7  1    262656  ultralytics.nn.modules.conv.Conv             [1024, 256, 1, 1, None, 1, 1, False]
 15            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 16                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 17                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]
 18                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 19                   3  1    131584  ultralytics.nn.modules.conv.Conv             [512, 256, 1, 1, None, 1, 1, False]
 20            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 21                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 22                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 23                   1  1     33280  ultralytics.nn.modules.conv.Conv             [128, 256, 1, 1, None, 1, 1, False]
 24            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 25                  -1  3    624640  ultralytics.nn.modules.block.RepC3           [512, 128, 3]
 26                  -1  1    131584  ultralytics.nn.AddModules.SPDConv.SPDConv    [128, 256]
 27            [-1, 21]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 28                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 29                  -1  1    262656  ultralytics.nn.AddModules.SPDConv.SPDConv    [256, 256]
 30            [-1, 17]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 31                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 32                  -1  1    262656  ultralytics.nn.AddModules.SPDConv.SPDConv    [256, 256]
 33            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 34                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]
 35    [25, 28, 31, 34]  1   7485219  ultralytics.nn.modules.head.RTDETRDecoder    [1, [128, 256, 256, 256]]