RT-DETR Improvement Strategy [Convolutional Layers] | CG block from the Context Guided Network: exploiting multi-level information to improve multi-class classification (with a secondary innovation)
1. Introduction
This article documents how the CG block module from CGNet is used to improve the RT-DETR object detection model. The CG block uses a local feature extractor, a surrounding context extractor, a joint feature extractor and a global context extractor to capture local features, surrounding context and global context, making full use of information at different levels. This article applies the module to RT-DETR and adds a secondary innovation so that the network handles multi-class object classification better.
2. CG block Overview
CGNet: A Light-Weight Context Guided Network for Semantic Segmentation
2.1 Design Motivation
- Inspired by the way the human visual system relies on contextual information to understand a scene. For example, when identifying the yellow region, looking at the region alone makes it hard to recognize, but combining it with its surrounding context (the red region) and the global context of the whole scene (the purple region) makes the yellow region much easier to classify.
- The CG block is therefore designed to fully exploit local features, surrounding context and global context to improve accuracy.
2.2 Principle
1️⃣ A local feature extractor $f_{loc}(*)$ and a surrounding context extractor $f_{sur}(*)$ first learn the local features and the surrounding context, respectively;
2️⃣ a joint feature extractor $f_{joi}(*)$ then produces the joint feature;
3️⃣ finally, a global context extractor $f_{glo}(*)$ extracts the global context and uses it to refine the joint feature.
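Putting the three steps together, the data flow through a CG block can be written compactly as follows (a restatement of the description above, not an equation copied from the paper):

$$f_{joi}(x)=\mathrm{PReLU}\big(\mathrm{BN}\big([\,f_{loc}(x);\,f_{sur}(x)\,]\big)\big),\qquad y=f_{glo}\big(f_{joi}(x)\big)\odot f_{joi}(x)$$

where $[\cdot\,;\cdot]$ denotes channel-wise concatenation and $\odot$ is the channel-wise re-weighting performed by the scale layer.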
2.3 Structure
As shown in figure (d), the CG block consists of the following parts:
- Local feature extractor $f_{loc}(*)$: instantiated as a 3×3 standard convolution layer that learns local features from the 8 neighboring feature vectors, corresponding to the yellow region in figure (a).
- Surrounding context extractor $f_{sur}(*)$: instantiated as a 3×3 dilated/atrous convolution layer; its larger receptive field effectively learns the surrounding context, corresponding to the red region in figure (b).
- Joint feature extractor $f_{joi}(*)$: designed as a concatenation layer followed by batch normalization (BN) and parametric ReLU (PReLU) operators; it obtains the joint feature from the outputs of $f_{loc}(*)$ and $f_{sur}(*)$.
- Global context extractor $f_{glo}(*)$: instantiated as a global average pooling layer that aggregates the global context corresponding to the purple region in figure (c), followed by a multilayer perceptron that further extracts the global context; a scale layer then re-weights the joint feature with the extracted global context.
- In addition, the CG block employs two kinds of residual connections:
  - Local residual learning (LRL): connects the input to the joint feature extractor $f_{joi}(*)$.
  - Global residual learning (GRL): connects the input to the global context extractor $f_{glo}(*)$; GRL is stronger than LRL at promoting information flow through the network.
2.4 Advantages
- Effective use of multiple kinds of information: local features, surrounding context and global context are all exploited, so objects are classified better and segmentation accuracy improves.
- Residual learning eases training: residual learning helps the network learn highly complex features and improves gradient back-propagation during training.
- Sensible structural design: each part is instantiated appropriately, e.g. the surrounding context extractor uses a dilated/atrous convolution layer, and the global context extractor combines global average pooling with a multilayer perceptron.
Paper: https://arxiv.org/pdf/1811.08201.pdf
Code: https://github.com/wutianyiRosun/CGNet
3. CG block Implementation Code
The implementation of the CG block and its improved variants is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
from ultralytics.nn.modules.conv import LightConv
# https://arxiv.org/pdf/1811.08201.pdf
# https://github.com/wutianyiRosun/CGNet
class ConvBNPReLU(nn.Module):
    def __init__(self, nIn, nOut, kSize, stride=1):
        """
        args:
            nIn: number of input channels
            nOut: number of output channels
            kSize: kernel size
            stride: stride rate for down-sampling. Default is 1
        """
        super().__init__()
        if isinstance(kSize, tuple):
            kSize = kSize[0]
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2d(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias=False)
        self.bn = nn.BatchNorm2d(nOut, eps=1e-03)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: transformed feature map
        """
        output = self.conv(input)
        output = self.bn(output)
        output = self.act(output)
        return output

class BNPReLU(nn.Module):
    def __init__(self, nOut):
        """
        args:
            nOut: channels of output feature maps
        """
        super().__init__()
        self.bn = nn.BatchNorm2d(nOut, eps=1e-03)
        self.act = nn.PReLU(nOut)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: normalized and thresholded feature map
        """
        output = self.bn(input)
        output = self.act(output)
        return output

class ConvBN(nn.Module):
    def __init__(self, nIn, nOut, kSize, stride=1):
        """
        args:
            nIn: number of input channels
            nOut: number of output channels
            kSize: kernel size
            stride: optional stride for down-sampling
        """
        super().__init__()
        if isinstance(kSize, tuple):
            kSize = kSize[0]
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2d(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias=False)
        self.bn = nn.BatchNorm2d(nOut, eps=1e-03)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: transformed feature map
        """
        output = self.conv(input)
        output = self.bn(output)
        return output

class Conv(nn.Module):  # NOTE: this plain conv (no BN/activation) is shadowed by the Ultralytics-style Conv defined further below
    def __init__(self, nIn, nOut, kSize, stride=1):
        """
        args:
            nIn: number of input channels
            nOut: number of output channels
            kSize: kernel size
            stride: optional stride rate for down-sampling
        """
        super().__init__()
        if isinstance(kSize, tuple):
            kSize = kSize[0]
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2d(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias=False)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: transformed feature map
        """
        output = self.conv(input)
        return output

class ChannelWiseConv(nn.Module):
    def __init__(self, nIn, nOut, kSize, stride=1):
        """
        Args:
            nIn: number of input channels
            nOut: number of output channels, default (nIn == nOut)
            kSize: kernel size
            stride: optional stride rate for down-sampling
        """
        super().__init__()
        if isinstance(kSize, tuple):
            kSize = kSize[0]
        padding = int((kSize - 1) / 2)
        self.conv = nn.Conv2d(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn,
                              bias=False)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: transformed feature map
        """
        output = self.conv(input)
        return output

class DilatedConv(nn.Module):
    def __init__(self, nIn, nOut, kSize, stride=1, d=1):
        """
        args:
            nIn: number of input channels
            nOut: number of output channels
            kSize: kernel size
            stride: optional stride rate for down-sampling
            d: dilation rate
        """
        super().__init__()
        if isinstance(kSize, tuple):
            kSize = kSize[0]
        padding = int((kSize - 1) / 2) * d
        self.conv = nn.Conv2d(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), bias=False,
                              dilation=d)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: transformed feature map
        """
        output = self.conv(input)
        return output

class ChannelWiseDilatedConv(nn.Module):
    def __init__(self, nIn, nOut, kSize, stride=1, d=1):
        """
        args:
            nIn: number of input channels
            nOut: number of output channels, default (nIn == nOut)
            kSize: kernel size
            stride: optional stride rate for down-sampling
            d: dilation rate
        """
        super().__init__()
        if isinstance(kSize, tuple):
            kSize = kSize[0]
        padding = int((kSize - 1) / 2) * d
        self.conv = nn.Conv2d(nIn, nOut, (kSize, kSize), stride=stride, padding=(padding, padding), groups=nIn,
                              bias=False, dilation=d)

    def forward(self, input):
        """
        args:
            input: input feature map
        return: transformed feature map
        """
        output = self.conv(input)
        return output

class FGlo(nn.Module):
    """
    the FGlo class is employed to refine the joint feature of both local feature and surrounding context.
    """
    def __init__(self, channel, reduction=16):
        super(FGlo, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y

class ContextGuidedBlock_Down(nn.Module):
    """
    the size of feature map divided 2, (H,W,C)---->(H/2, W/2, 2C)
    """
    def __init__(self, nIn, nOut, dilation_rate=2, reduction=16):
        """
        args:
            nIn: the channel of input feature map
            nOut: the channel of output feature map, and nOut=2*nIn
        """
        super().__init__()
        self.conv1x1 = ConvBNPReLU(nIn, nOut, 3, 2)  # size/2, channel: nIn--->nOut
        self.F_loc = ChannelWiseConv(nOut, nOut, 3, 1)
        self.F_sur = ChannelWiseDilatedConv(nOut, nOut, 3, 1, dilation_rate)
        self.bn = nn.BatchNorm2d(2 * nOut, eps=1e-3)
        self.act = nn.PReLU(2 * nOut)
        self.reduce = Conv(2 * nOut, nOut, 1, 1)  # reduce dimension: 2*nOut--->nOut
        self.F_glo = FGlo(nOut, reduction)

    def forward(self, input):
        output = self.conv1x1(input)
        loc = self.F_loc(output)
        sur = self.F_sur(output)

        joi_feat = torch.cat([loc, sur], 1)  # the joint feature
        joi_feat = self.bn(joi_feat)
        joi_feat = self.act(joi_feat)
        joi_feat = self.reduce(joi_feat)  # channel= nOut

        output = self.F_glo(joi_feat)  # F_glo is employed to refine the joint feature
        return output

class ContextGuidedBlock(nn.Module):
    def __init__(self, nIn, nOut, dilation_rate=2, reduction=16, add=True):
        """
        args:
            nIn: number of input channels
            nOut: number of output channels,
            add: if true, residual learning
        """
        super().__init__()
        n = int(nOut / 2)
        self.conv1x1 = ConvBNPReLU(nIn, n, 1, 1)  # 1x1 Conv is employed to reduce the computation
        self.F_loc = ChannelWiseConv(n, n, 3, 1)  # local feature
        self.F_sur = ChannelWiseDilatedConv(n, n, 3, 1, dilation_rate)  # surrounding context
        self.bn_prelu = BNPReLU(nOut)
        self.add = add
        self.F_glo = FGlo(nOut, reduction)

    def forward(self, input):
        output = self.conv1x1(input)
        loc = self.F_loc(output)
        sur = self.F_sur(output)

        joi_feat = torch.cat([loc, sur], 1)
        joi_feat = self.bn_prelu(joi_feat)

        output = self.F_glo(joi_feat)  # F_glo is employed to refine the joint feature
        # if residual version
        if self.add:
            output = input + output
        return output

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p

# Ultralytics-style Conv. Note that this redefinition shadows the plain Conv above, so later
# uses (e.g. ContextGuidedBlock_Down.reduce) resolve to this version with BN and SiLU.
class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (used after conv and BN are fused)."""
        return self.act(self.conv(x))

class HGBlock_ContextGuidedBlock(nn.Module):
    """
    HG_Block of PPHGNetV2 (with 2 convolutions and LightConv), followed by a ContextGuidedBlock.
    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """
    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initializes the HG block with the given input, mid and output channels, appending a ContextGuidedBlock."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = ContextGuidedBlock(c2, c2)

    def forward(self, x):
        """Forward pass of a PPHGNetV2 backbone layer."""
        y = [x]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
        return y + x if self.add else y
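As a quick sanity check (a minimal sketch added here, not part of the original post), the blocks can be run standalone; the expected shapes follow directly from the code above:

if __name__ == "__main__":
    x = torch.randn(1, 128, 64, 64)           # dummy feature map (B, C, H, W)
    down = ContextGuidedBlock_Down(128, 128)  # downsampling variant: halves H and W
    block = ContextGuidedBlock(128, 128)      # plain variant with residual add
    print(down(x).shape)   # expected: torch.Size([1, 128, 32, 32])
    print(block(x).shape)  # expected: torch.Size([1, 128, 64, 64])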
4. Innovation Modules
4.1 Improvement 1⭐
Module improvement method: directly insert the ContextGuidedBlock_Down module (the integration steps are explained in Section 5).
The backbone after adding the ContextGuidedBlock_Down module is shown in the yaml file in Section 6.1.
4.2 Improvement 2⭐
Module improvement method: an HGBlock built on the ContextGuidedBlock module (the integration steps are explained in Section 5).
The second improvement modifies the HGBlock module in RT-DETR by adding a ContextGuidedBlock into it.
The improved code is as follows. First, add the ContextGuidedBlock module inside HGBlock and rename the block HGBlock_ContextGuidedBlock.
class HGBlock_ContextGuidedBlock(nn.Module):
    """
    HG_Block of PPHGNetV2 (with 2 convolutions and LightConv), followed by a ContextGuidedBlock.
    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """
    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initializes the HG block with the given input, mid and output channels, appending a ContextGuidedBlock."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = ContextGuidedBlock(c2, c2)

    def forward(self, x):
        """Forward pass of a PPHGNetV2 backbone layer."""
        y = [x]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
        return y + x if self.add else y
Note❗: the module names that need to be declared in Section 5 are ContextGuidedBlock_Down and HGBlock_ContextGuidedBlock.
5. Integration Steps
5.1 Modification 1
① In the ultralytics/nn/ directory, create a new AddModules folder to hold the module code.
② In the AddModules folder, create CGNetBlock.py and paste the code from Section 3 into it.
5.2 Modification 2
In the AddModules folder, create __init__.py (skip this if it already exists) and import the module inside it:
from .CGNetBlock import *
5.3 Modification 3
In the ultralytics/nn/tasks.py file, the module class names need to be added in two places.
First: import the modules.
Second: register the ContextGuidedBlock_Down and HGBlock_ContextGuidedBlock modules in the parse_model function, as sketched below.
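A minimal sketch of these two additions follows. The exact layout of parse_model differs between ultralytics versions, so the branch is shown as a commented template to adapt rather than a drop-in patch; the argument handling mirrors the printed module arguments in Section 7.

# (a) near the other module imports at the top of ultralytics/nn/tasks.py:
from ultralytics.nn.AddModules import ContextGuidedBlock_Down, HGBlock_ContextGuidedBlock

# (b) inside parse_model(), next to the existing HGStem/HGBlock handling, something like:
#
#     elif m in {HGStem, HGBlock, HGBlock_ContextGuidedBlock}:
#         cm, c2 = args[0], args[1]
#         args = [ch[f], cm, c2, *args[2:]]
#         if m in {HGBlock, HGBlock_ContextGuidedBlock}:
#             args.insert(4, n)  # number of repeats
#             n = 1
#     elif m is ContextGuidedBlock_Down:
#         c2 = args[0]
#         args = [ch[f], c2]  # yaml arg [128] becomes [c1, 128], matching the printout in Section 7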
6. yaml Model Files
6.1 Model Improvement Version 1
Using ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the starting point, create a model file for training on your own dataset in the same directory and name it rtdetr-l-ContextGuidedBlock_Down.yaml.
Copy the contents of rtdetr-l.yaml into rtdetr-l-ContextGuidedBlock_Down.yaml and set nc to the number of object classes in your dataset.
📌 The modification replaces the downsampling modules in the backbone with ContextGuidedBlock_Down modules.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, ContextGuidedBlock_Down, [128]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, ContextGuidedBlock_Down, [512]] # 4-P4/16
- [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, ContextGuidedBlock_Down, [1024]] # 8-P5/32
- [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
6.2 Model Improvement Version 2⭐
Using ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the starting point, create a model file for training on your own dataset in the same directory and name it rtdetr-l-HGBlock_ContextGuidedBlock.yaml.
Copy the contents of rtdetr-l.yaml into rtdetr-l-HGBlock_ContextGuidedBlock.yaml and set nc to the number of object classes in your dataset.
📌 The modification replaces some of the HGBlock modules in the backbone with HGBlock_ContextGuidedBlock modules.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, HGStem, [32, 48]] # 0-P2/4
- [-1, 6, HGBlock, [48, 128, 3]] # stage 1
- [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
- [-1, 6, HGBlock, [96, 512, 3]] # stage 2
- [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
- [-1, 6, HGBlock_ContextGuidedBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
- [-1, 6, HGBlock_ContextGuidedBlock, [192, 1024, 5, True, True]]
- [-1, 6, HGBlock_ContextGuidedBlock, [192, 1024, 5, True, True]] # stage 3
- [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
- [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
- [[-2, -1], 1, Concat, [1]] # cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
- [[-1, 17], 1, Concat, [1]] # cat Y4
- [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
- [[-1, 12], 1, Concat, [1]] # cat Y5
- [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1
- [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
7. Successful Run Results
Printing the network model shows that ContextGuidedBlock_Down and HGBlock_ContextGuidedBlock have been added to the model, which can now be trained.
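One way to reproduce this printout and start training is via the standard ultralytics Python API (a sketch; the dataset yaml name below is a placeholder to replace with your own):

from ultralytics import RTDETR

# build the modified model from the yaml created in Section 6 (adjust the path if needed)
model = RTDETR("ultralytics/cfg/models/rt-detr/rtdetr-l-ContextGuidedBlock_Down.yaml")
model.info()  # prints a layer table and parameter summary like the one below

# train on your own dataset (replace the data yaml, epochs and image size as needed)
model.train(data="your_dataset.yaml", epochs=100, imgsz=640)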
rtdetr-l-ContextGuidedBlock_Down :
rtdetr-l-ContextGuidedBlock_Down summary: 733 layers, 47,603,883 parameters, 47,603,883 gradients, 128.9 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 186120 ultralytics.nn.AddModules.CGNetBlock.ContextGuidedBlock_Down[128, 128]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 2931744 ultralytics.nn.AddModules.CGNetBlock.ContextGuidedBlock_Down[512, 512]
5 -1 6 1695360 ultralytics.nn.modules.block.HGBlock [512, 192, 1024, 5, 6, True, False]
6 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
7 -1 6 2055808 ultralytics.nn.modules.block.HGBlock [1024, 192, 1024, 5, 6, True, True]
8 -1 1 11696192 ultralytics.nn.AddModules.CGNetBlock.ContextGuidedBlock_Down[1024, 1024]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-ContextGuidedBlock_Down summary: 733 layers, 47,603,883 parameters, 47,603,883 gradients, 128.9 GFLOPs
rtdetr-l-HGBlock_ContextGuidedBlock :
rtdetr-l-HGBlock_ContextGuidedBlock summary: 739 layers, 34,818,947 parameters, 34,818,947 gradients, 113.2 GFLOPs
from n params module arguments
0 -1 1 25248 ultralytics.nn.modules.block.HGStem [3, 32, 48]
1 -1 6 155072 ultralytics.nn.modules.block.HGBlock [48, 48, 128, 3, 6]
2 -1 1 1408 ultralytics.nn.modules.conv.DWConv [128, 128, 3, 2, 1, False]
3 -1 6 839296 ultralytics.nn.modules.block.HGBlock [128, 96, 512, 3, 6]
4 -1 1 5632 ultralytics.nn.modules.conv.DWConv [512, 512, 3, 2, 1, False]
5 -1 6 2365632 ultralytics.nn.AddModules.CGNetBlock.HGBlock_ContextGuidedBlock[512, 192, 1024, 5, 6, True, False]
6 -1 6 2726080 ultralytics.nn.AddModules.CGNetBlock.HGBlock_ContextGuidedBlock[1024, 192, 1024, 5, 6, True, True]
7 -1 6 2726080 ultralytics.nn.AddModules.CGNetBlock.HGBlock_ContextGuidedBlock[1024, 192, 1024, 5, 6, True, True]
8 -1 1 11264 ultralytics.nn.modules.conv.DWConv [1024, 1024, 3, 2, 1, False]
9 -1 6 6708480 ultralytics.nn.modules.block.HGBlock [1024, 384, 2048, 5, 6, True, False]
10 -1 1 524800 ultralytics.nn.modules.conv.Conv [2048, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 7 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 3 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
22 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
23 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
28 [21, 24, 27] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-l-HGBlock_ContextGuidedBlock summary: 739 layers, 34,818,947 parameters, 34,818,947 gradients, 113.2 GFLOPs