

RT-DETR Improvement Strategies [Convolution] | AAAI 2025 FBRT-YOLO Applied to RT-DETR: Strengthening Cross-Layer Feature Fusion and Multi-Scale Adaptability, with Both l and resnet18 Versions

I. Introduction

This article documents how to improve the RT-DETR object detection network using the FCM and MKP modules from FBRT-YOLO.

In small-object detection, conventional convolutions struggle to balance the fusion of spatial position information with deep semantic information, so small-object features are easily lost and multi-scale perception falls short, hurting both detection accuracy and efficiency. This article improves RT-DETR with two modules from FBRT-YOLO: the Feature Complementary Mapping module (FCM) and the Multi-Kernel Perception unit (MKP). FCM uses channel splitting, a directional transform, and a complementary mapping mechanism to embed shallow spatial position information layer by layer into deep semantic features, alleviating the spatial-semantic mismatch in the backbone so the model can localize small objects more precisely. MKP cascades convolutions with kernels of different sizes, interleaved with pointwise convolutions, to strengthen feature perception across object scales and enlarge the effective receptive field.



II. FBRT-YOLO Overview

FBRT-YOLO: Faster and Better for Real-Time Aerial Image Detection

FBRT-YOLO is a model for real-time aerial image detection. Its architecture contains two core lightweight modules and delivers strong performance on aerial detection tasks.

2.1 The FBRT-YOLO Architecture

FBRT-YOLO contains two core lightweight modules: the Feature Complementary Mapping module (FCM) and the Multi-Kernel Perception unit (MKP).

  • FCM integrates more spatial position information into semantically rich features, strengthening the representation of small objects;
  • MKP captures multi-scale object information with convolution kernels of different sizes;
  • In addition, for aerial image detection the baseline network is slimmed down, removing non-critical or redundant computation.


2.2 Motivation Behind the FCM Module

In deep networks, small-object information is easily lost, creating an information imbalance; meanwhile, the high-resolution spatial information of shallow layers is mismatched with the low-resolution semantic information of deep layers. As a result, spatial position and semantic information are poorly integrated, which hurts the detection and localization of small objects.

The FCM module is designed to address exactly these problems: it integrates the spatial position information of objects deeper into the network so that it aligns better with deep semantic information, improving small-object localization.

2.3 Structure of the FCM Module

  • Channel split: the input feature's channels are split by a ratio α into two parts of αC and (1−α)C channels, so that the network can better acquire and process low-level spatial information as it deepens.
  • Directional transform: the two parts are fed into separate branches. X¹ passes through standard 3×3 convolutions to extract richer features X^C; X² passes through a pointwise convolution, preserving abundant shallow spatial position information X^S.
  • Complementary mapping: X^C and X^S are mapped onto each other. Through channel interaction and spatial interaction, weights are assigned to the important information of each branch and mapped to the other, fusing the features complementarily and resolving the imprecise matching of object features caused by feature dispersion.
  • Feature aggregation: the channel weights ω₁ and the spatial weights ω₂ obtained from complementary mapping are applied to the features containing X^S and X^C respectively; the two branches are then combined to obtain X^FCM, which carries the dual mapping of spatial and semantic relations.
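The four steps above can be traced shape-by-shape in a few lines. The sketch below is illustrative only (α = 1/4 as in the reference code, and random 1×1 projections standing in for the learned convolutions); it shows how the split, the cross-gating, and the aggregation keep the output at the input's shape.

```python
import torch

B, C, H, W = 2, 64, 32, 32
x = torch.randn(B, C, H, W)

# 1) Channel split: alpha*C vs (1 - alpha)*C channels, alpha = 1/4
a = C // 4
x1, x2 = torch.split(x, [a, C - a], dim=1)

# 2) Directional transform: both branches end up with C channels again
#    (random 1x1 projections stand in for the learned convolutions)
xc = torch.einsum('bchw,dc->bdhw', x1, torch.randn(C, a))      # semantic X^C
xs = torch.einsum('bchw,dc->bdhw', x2, torch.randn(C, C - a))  # spatial  X^S

# 3) Complementary mapping: a (B,1,H,W) spatial map from X^S gates X^C,
#    a (B,C,1,1) channel vector from X^C gates X^S
w_spatial = torch.sigmoid(xs.mean(dim=1, keepdim=True))
w_channel = torch.sigmoid(xc.mean(dim=(2, 3), keepdim=True))

# 4) Feature aggregation: cross-gated sum, same shape as the input
x_fcm = w_spatial * xc + w_channel * xs
print(x_fcm.shape)  # torch.Size([2, 64, 32, 32])
```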


2.4 Advantages of the FCM Module

  • Alleviates information imbalance: complementary information fusion propagates shallow spatial position information into the deep layers, effectively mitigating the loss of object position information during backbone downsampling and promoting complementary learning of spatial and semantic information across backbone stages.
  • Strengthens small-object detection: by folding spatial position information into deep semantic features more effectively, FCM improves feature matching for small objects and strengthens their representation in deep layers, raising small-object detection and localization accuracy.
  • Low computational cost: information is processed and fused with comparatively little computation, so the detection gains come without a heavy computational burden, which helps keep inference real-time.

Paper: https://arxiv.org/pdf/2504.20670
Code: https://github.com/galaxy-oss/FCM

III. Implementation of FBRT-YOLO

The implementation of the FBRT-YOLO modules and the improved block is as follows:

import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import LightConv

class Channel(nn.Module):
    """Channel attention: depthwise conv -> global average pooling -> sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, groups=dim)  # depthwise 3x3
        self.Apt = nn.AdaptiveAvgPool2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.dwconv(x)
        x = self.Apt(x)
        return self.sigmoid(x)  # (B, C, 1, 1) channel weights

class Spatial(nn.Module):
    """Spatial attention: 1x1 conv to a single map -> BN -> sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, 1, 1, 1)
        self.bn = nn.BatchNorm2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        return self.sigmoid(x)  # (B, 1, H, W) spatial weights

class FCM(nn.Module):
    """Feature Complementary Mapping: split channels, transform each branch,
    then cross-gate them with spatial/channel attention before fusing."""

    def __init__(self, dim, dim_out):
        super().__init__()
        # dim_out is kept for interface compatibility; the output has `dim` channels.
        self.one = dim // 4            # alpha*C channels (alpha = 1/4)
        self.two = dim - dim // 4      # (1 - alpha)*C channels
        self.conv1 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv12 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv123 = Conv(dim // 4, dim, 1, 1)

        self.conv2 = Conv(dim - dim // 4, dim, 1, 1)  # pointwise: spatial branch
        self.conv3 = Conv(dim, dim, 1, 1)
        self.spatial = Spatial(dim)
        self.channel = Channel(dim)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.one, self.two], dim=1)
        x3 = self.conv1(x1)
        x3 = self.conv12(x3)
        x3 = self.conv123(x3)          # semantic branch X^C
        x4 = self.conv2(x2)            # spatial branch X^S
        x33 = self.spatial(x4) * x3    # spatial weights from X^S gate X^C
        x44 = self.channel(x3) * x4    # channel weights from X^C gate X^S
        x5 = x33 + x44
        return self.conv3(x5)

class Pzconv(nn.Module):
    """Multi-Kernel Perception (MKP): cascaded depthwise 3x3/5x5/7x7 convs
    with pointwise convs in between, plus a residual connection."""

    def __init__(self, dim, k=1, s=1, p=None, g=1, d=1, act=True):
        # extra args are kept for a Conv-compatible signature; only `dim` is used
        super().__init__()
        self.conv1 = nn.Conv2d(dim, dim, 3, 1, 1, groups=dim)  # depthwise 3x3
        self.conv2 = Conv(dim, dim, k=1, s=1)                  # pointwise
        self.conv3 = nn.Conv2d(dim, dim, 5, 1, 2, groups=dim)  # depthwise 5x5
        self.conv4 = Conv(dim, dim, 1, 1)                      # pointwise
        self.conv5 = nn.Conv2d(dim, dim, 7, 1, 3, groups=dim)  # depthwise 7x7

    def forward(self, x):
        y = self.conv1(x)
        y = self.conv2(y)
        y = self.conv3(y)
        y = self.conv4(y)
        y = self.conv5(y)
        return y + x  # residual add

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
 
class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation
 
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
 
    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))
 
    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (fused inference)."""
        return self.act(self.conv(x))

class HGBlock_FCM(nn.Module):
    """
    HG_Block of PPHGNetV2 with 2 convolutions and LightConv.

    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """

    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initializes a CSP Bottleneck with 1 convolution using specified input and output channels."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = FCM(c2, c2)
        
    def forward(self, x):
        """Forward pass of a PPHGNetV2 backbone layer."""
        y = [x]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
        return y + x if self.add else y
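A quick smoke test of the Pzconv/MKP idea in the listing above: cascaded depthwise 3×3 → 5×5 → 7×7 kernels with pointwise convs in between preserve both channel count and resolution, so the residual add in `Pzconv.forward` is shape-safe. Plain `nn.Conv2d` layers stand in here for the `Conv` wrapper (no BN/activation), so this checks the data flow only:

```python
import torch
import torch.nn as nn

dim = 32
mkp = nn.Sequential(
    nn.Conv2d(dim, dim, 3, 1, 1, groups=dim),  # depthwise 3x3
    nn.Conv2d(dim, dim, 1),                    # pointwise
    nn.Conv2d(dim, dim, 5, 1, 2, groups=dim),  # depthwise 5x5
    nn.Conv2d(dim, dim, 1),                    # pointwise
    nn.Conv2d(dim, dim, 7, 1, 3, groups=dim),  # depthwise 7x7
)
x = torch.randn(1, dim, 20, 20)
y = mkp(x) + x  # residual, as in Pzconv.forward
print(y.shape)  # torch.Size([1, 32, 20, 20])
```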


IV. Module Improvements

4.1 Improvement 1 ⭐

Module improvement: an HGBlock based on the FCM module (the integration steps are covered in Section V).

The first improvement modifies the HGBlock module in RT-DETR, adding the FCM module into it.

The improved code is as follows:

The HGBlock module with the FCM module added:

class HGBlock_FCM(nn.Module):
    """
    HG_Block of PPHGNetV2 with 2 convolutions and LightConv.

    https://github.com/PaddlePaddle/PaddleDetection/blob/develop/ppdet/modeling/backbones/hgnet_v2.py
    """

    def __init__(self, c1, cm, c2, k=3, n=6, lightconv=False, shortcut=False, act=nn.ReLU()):
        """Initializes a CSP Bottleneck with 1 convolution using specified input and output channels."""
        super().__init__()
        block = LightConv if lightconv else Conv
        self.m = nn.ModuleList(block(c1 if i == 0 else cm, cm, k=k, act=act) for i in range(n))
        self.sc = Conv(c1 + n * cm, c2 // 2, 1, 1, act=act)  # squeeze conv
        self.ec = Conv(c2 // 2, c2, 1, 1, act=act)  # excitation conv
        self.add = shortcut and c1 == c2
        self.cv = FCM(c2, c2)
        
    def forward(self, x):
        """Forward pass of a PPHGNetV2 backbone layer."""
        y = [x]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv(self.ec(self.sc(torch.cat(y, 1))))
        return y + x if self.add else y
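For a concrete sense of the channel bookkeeping inside `HGBlock_FCM.forward`, take the stage-3 yaml row used later, `[-1, 6, HGBlock_FCM, [192, 1024, 5, True, False]]` (so c1 = 512, cm = 192, c2 = 1024, n = 6): the n branch outputs are concatenated with the input before the squeeze/excite 1×1 convs, and FCM leaves c2 unchanged.

```python
# Channel bookkeeping for HGBlock_FCM with c1=512, cm=192, c2=1024, n=6
c1, cm, c2, n = 512, 192, 1024, 6
concat_ch = c1 + n * cm        # channels entering the squeeze conv
squeeze_ch = c2 // 2           # squeeze conv output
print(concat_ch, squeeze_ch, c2)  # 1664 512 1024
```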


4.2 Improvement 2 ⭐

Module improvement: the second method targets the resnet18 version. First set up the resnet18 version, then copy the code below and use it to replace the contents of ResNet.py.

The improved code is as follows:

from collections import OrderedDict
import torch
import torch.nn as nn
import torch.nn.functional as F

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p
 
class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation
 
    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
 
    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))
 
    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (fused inference)."""
        return self.act(self.conv(x))

class Channel(nn.Module):
    """Channel attention: depthwise conv -> global average pooling -> sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, 3, 1, 1, groups=dim)  # depthwise 3x3
        self.Apt = nn.AdaptiveAvgPool2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.dwconv(x)
        x = self.Apt(x)
        return self.sigmoid(x)  # (B, C, 1, 1) channel weights

class Spatial(nn.Module):
    """Spatial attention: 1x1 conv to a single map -> BN -> sigmoid."""

    def __init__(self, dim):
        super().__init__()
        self.conv1 = nn.Conv2d(dim, 1, 1, 1)
        self.bn = nn.BatchNorm2d(1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x):
        x = self.conv1(x)
        x = self.bn(x)
        return self.sigmoid(x)  # (B, 1, H, W) spatial weights

class FCM(nn.Module):
    """Feature Complementary Mapping: split channels, transform each branch,
    then cross-gate them with spatial/channel attention before fusing."""

    def __init__(self, dim, dim_out):
        super().__init__()
        # dim_out is kept for interface compatibility; the output has `dim` channels.
        self.one = dim // 4            # alpha*C channels (alpha = 1/4)
        self.two = dim - dim // 4      # (1 - alpha)*C channels
        self.conv1 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv12 = Conv(dim // 4, dim // 4, 3, 1, 1)
        self.conv123 = Conv(dim // 4, dim, 1, 1)

        self.conv2 = Conv(dim - dim // 4, dim, 1, 1)  # pointwise: spatial branch
        self.conv3 = Conv(dim, dim, 1, 1)
        self.spatial = Spatial(dim)
        self.channel = Channel(dim)

    def forward(self, x):
        x1, x2 = torch.split(x, [self.one, self.two], dim=1)
        x3 = self.conv1(x1)
        x3 = self.conv12(x3)
        x3 = self.conv123(x3)          # semantic branch X^C
        x4 = self.conv2(x2)            # spatial branch X^S
        x33 = self.spatial(x4) * x3    # spatial weights from X^S gate X^C
        x44 = self.channel(x3) * x4    # channel weights from X^C gate X^S
        x5 = x33 + x44
        return self.conv3(x5)

class ConvNormLayer(nn.Module):
    def __init__(self,
                 ch_in,
                 ch_out,
                 filter_size,
                 stride,
                 groups=1,
                 act=None):
        super(ConvNormLayer, self).__init__()
        self.act = act
        self.conv = nn.Conv2d(
            in_channels=ch_in,
            out_channels=ch_out,
            kernel_size=filter_size,
            stride=stride,
            padding=(filter_size - 1) // 2,
            groups=groups)
 
        self.norm = nn.BatchNorm2d(ch_out)
 
    def forward(self, inputs):
        out = self.conv(inputs)
        out = self.norm(out)
        if self.act:
            out = getattr(F, self.act)(out)
        return out
 
class SELayer(nn.Module):
    def __init__(self, ch, reduction_ratio=16):
        super(SELayer, self).__init__()
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction_ratio, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(ch // reduction_ratio, ch, bias=False),
            nn.Sigmoid()
        )
 
    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y = self.fc(y).view(b, c, 1, 1)
        return x * y.expand_as(x)

class BasicBlock_FCM(nn.Module):
    expansion = 1
 
    def __init__(self,
                 ch_in,
                 ch_out,
                 stride,
                 shortcut,
                 act='relu',
                 variant='b',
                 att=False):
        super(BasicBlock_FCM, self).__init__()
        self.shortcut = shortcut
        if not shortcut:
            if variant == 'd' and stride == 2:
                self.short = nn.Sequential()
                self.short.add_module(  # PyTorch nn.Sequential uses add_module, not Paddle's add_sublayer
                    'pool',
                    nn.AvgPool2d(
                        kernel_size=2, stride=2, padding=0, ceil_mode=True))
                self.short.add_module(
                    'conv',
                    ConvNormLayer(
                        ch_in=ch_in,
                        ch_out=ch_out,
                        filter_size=1,
                        stride=1))
            else:
                self.short = ConvNormLayer(
                    ch_in=ch_in,
                    ch_out=ch_out,
                    filter_size=1,
                    stride=stride)
 
        self.branch2a = ConvNormLayer(
            ch_in=ch_in,
            ch_out=ch_out,
            filter_size=3,
            stride=stride,
            act='relu')
 
        self.branch2b = ConvNormLayer(
            ch_in=ch_out,
            ch_out=ch_out,
            filter_size=3,
            stride=1,
            act=None)
 
        self.att = att
        if self.att:
            self.se = FCM(ch_out, ch_out)
 
    def forward(self, inputs):
        out = self.branch2a(inputs)
        out = self.branch2b(out)
 
        if self.att:
            out = self.se(out)
 
        if self.shortcut:
            short = inputs
        else:
            short = self.short(inputs)
 
        out = out + short
        out = F.relu(out)
 
        return out
 
class BottleNeck(nn.Module):
    expansion = 4
 
    def __init__(self, ch_in, ch_out, stride, shortcut, act='relu', variant='d', att=False):
        super().__init__()
 
        if variant == 'a':
            stride1, stride2 = stride, 1
        else:
            stride1, stride2 = 1, stride
 
        width = ch_out
 
        self.branch2a = ConvNormLayer(ch_in, width, 1, stride1, act=act)
        self.branch2b = ConvNormLayer(width, width, 3, stride2, act=act)
        self.branch2c = ConvNormLayer(width, ch_out * self.expansion, 1, 1)
 
        self.shortcut = shortcut
        if not shortcut:
            if variant == 'd' and stride == 2:
                self.short = nn.Sequential(OrderedDict([
                    ('pool', nn.AvgPool2d(2, 2, 0, ceil_mode=True)),
                    ('conv', ConvNormLayer(ch_in, ch_out * self.expansion, 1, 1))
                ]))
            else:
                self.short = ConvNormLayer(ch_in, ch_out * self.expansion, 1, stride)
 
        self.att = att
        if self.att:
            self.se = SELayer(ch_out * 4)
 
    def forward(self, x):
        out = self.branch2a(x)
        out = self.branch2b(out)
        out = self.branch2c(out)
 
        if self.att:
            out = self.se(out)
 
        if self.shortcut:
            short = x
        else:
            short = self.short(x)
 
        out = out + short
        out = F.relu(out)
 
        return out
 
class Blocks(nn.Module):
    def __init__(self,
                 ch_in,
                 ch_out,
                 count,
                 block,
                 stage_num,
                 att=False,
                 variant='b'):
        super(Blocks, self).__init__()
        self.blocks = nn.ModuleList()
        block = globals()[block]
        for i in range(count):
            self.blocks.append(
                block(
                    ch_in,
                    ch_out,
                    stride=2 if i == 0 and stage_num != 2 else 1,
                    shortcut=False if i == 0 else True,
                    variant=variant,
                    att=att)
            )
            if i == 0:
                ch_in = ch_out * block.expansion
 
    def forward(self, inputs):
        block_out = inputs
        for block in self.blocks:
            block_out = block(block_out)
        return block_out
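The stride/shortcut/channel bookkeeping in `Blocks.__init__` can be mirrored in plain Python: only the first block of a stage downsamples (except stage 2, which follows the max-pool) and uses a projection shortcut, and `ch_in` is bumped to `ch_out * expansion` after it. A small illustrative helper:

```python
def stage_plan(ch_in, ch_out, count, expansion, stage_num):
    """Mirror the per-block settings produced by Blocks.__init__."""
    plan = []
    for i in range(count):
        plan.append({
            "ch_in": ch_in,
            "stride": 2 if i == 0 and stage_num != 2 else 1,   # as in Blocks
            "identity_shortcut": i != 0,                       # first block projects
        })
        if i == 0:
            ch_in = ch_out * expansion  # channels grow after the first block
    return plan

# A BasicBlock_FCM stage (expansion = 1), e.g. stage 3 of the resnet18 config
print(stage_plan(64, 128, count=2, expansion=1, stage_num=3))
```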

Note ❗: the module name to declare in Section V is HGBlock_FCM.


V. Integration Steps

5.1 Modification 1

① Under ultralytics/nn/, create a new AddModules folder to hold the module code.

② Inside AddModules, create FBRT_YOLO.py and paste the code from Section III into it.


5.2 Modification 2

AddModules 文件夹下新建 __init__.py (已有则不用新建),在文件内导入模块: from .FBRT_YOLO import *


5.3 Modification 3

ultralytics/nn/tasks.py 文件中,需要在指定位置添加各模块类名称。

First, import the modules at the top of the file (e.g. from ultralytics.nn.AddModules import *).


Next, register the HGBlock_FCM module in the parse_model function.
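As a sanity check on the registration, the HGBlock-style branch of parse_model assembles the constructor arguments from a yaml row roughly as sketched below (this is an illustrative stand-in, not the library source; the exact code varies by ultralytics version). For the stage-3 row `[-1, 6, HGBlock_FCM, [192, 1024, 5, True, False]]` with 512 input channels it reproduces the argument list shown in the printed model summary at the end of the article:

```python
def build_args(ch_in, n, yaml_args):
    """Assemble HGBlock_FCM constructor args from a yaml row (sketch)."""
    cm, c2, *rest = yaml_args
    args = [ch_in, cm, c2, *rest]   # prepend the incoming channel count
    args.insert(4, n)               # the repeat count becomes internal depth n
    return args, 1                  # the module is instantiated once (n reset to 1)

print(build_args(512, 6, [192, 1024, 5, True, False]))
# ([512, 192, 1024, 5, 6, True, False], 1)
```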


For the resnet18 version, tasks.py only needs to be configured once by following the tutorial steps; once that is done, no further changes are required here.


VI. Model yaml Files

6.1 Improved Model (l version) ⭐

Using ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the base, create a model file rtdetr-l-FBRT_YOLO.yaml in the same directory for training on your own dataset.

rtdetr-l.yaml 中的内容复制到 rtdetr-l-FBRT_YOLO.yaml 文件下,修改 nc 数量等于自己数据中目标的数量。

📌 This modification applies the FBRT-YOLO FCM improvement to HGBlock on top of rtdetr-l.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
  - [-1, 6, HGBlock_FCM, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock_FCM, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock_FCM, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 16, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]] # 17, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.0
  - [[-2, -1], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (21), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 22, downsample_convs.0
  - [[-1, 17], 1, Concat, [1]] # cat Y4
  - [-1, 3, RepC3, [256]] # F4 (24), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]] # 25, downsample_convs.1
  - [[-1, 12], 1, Concat, [1]] # cat Y5
  - [-1, 3, RepC3, [256]] # F5 (27), pan_blocks.1

  - [[21, 24, 27], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)

6.2 Improved Model (resnet18 version) ⭐

Using rtdetr-resnet18.yaml as the base, create a model file rtdetr-resnet18-FBRT_YOLO.yaml in the same directory for training on your own dataset.

rtdetr-resnet18.yaml 中的内容复制到 rtdetr-resnet18-FBRT_YOLO.yaml 文件下,修改 nc 数量等于自己数据中目标的数量。

📌 This modification adds the FCM module to the network's BasicBlock.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P5 outputs.

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
 
backbone:
  # [from, repeats, module, args]
  - [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 0-P1
  - [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 1
  - [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 2
  - [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2
 
  - [-1, 2, Blocks, [64,  BasicBlock_FCM, 2, True]] # 4
  - [-1, 2, Blocks, [128, BasicBlock_FCM, 3, True]] # 5-P3
  - [-1, 2, Blocks, [256, BasicBlock_FCM, 4, True]] # 6-P4
  - [-1, 2, Blocks, [512, BasicBlock_FCM, 5, True]] # 7-P5
 
head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 8 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]]  # 10, Y5, lateral_convs.0
 
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
  - [6, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 12 input_proj.1
  - [[-2, -1], 1, Concat, [1]]
  - [-1, 3, RepC3, [256, 0.5]]  # 14, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]]  # 15, Y4, lateral_convs.1
 
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
  - [5, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 17 input_proj.0
  - [[-2, -1], 1, Concat, [1]]  # 18 cat backbone P4
  - [-1, 3, RepC3, [256, 0.5]]  # X3 (19), fpn_blocks.1
 
  - [-1, 1, Conv, [256, 3, 2]]  # 20, downsample_convs.0
  - [[-1, 15], 1, Concat, [1]]  # 21 cat Y4
  - [-1, 3, RepC3, [256, 0.5]]  # F4 (22), pan_blocks.0
 
  - [-1, 1, Conv, [256, 3, 2]]  # 23, downsample_convs.1
  - [[-1, 10], 1, Concat, [1]]  # 24 cat Y5
  - [-1, 3, RepC3, [256, 0.5]]  # F5 (25), pan_blocks.1
 
  - [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]]  # Detect(P3, P4, P5)


VII. Successful Run

Printing the network model shows that HGBlock_FCM and BasicBlock_FCM have been added, and the model is ready for training.

rtdetr-l-HGBlock_FCM

rtdetr-l-HGBlock_FCM summary: 755 layers, 42,693,836 parameters, 42,693,836 gradients, 139.7 GFLOPs

                   from  n    params  module                                       arguments                     
  0                  -1  1     25248  ultralytics.nn.modules.block.HGStem          [3, 32, 48]                   
  1                  -1  6    155072  ultralytics.nn.modules.block.HGBlock         [48, 48, 128, 3, 6]           
  2                  -1  1      1408  ultralytics.nn.modules.conv.DWConv           [128, 128, 3, 2, 1, False]    
  3                  -1  6    839296  ultralytics.nn.modules.block.HGBlock         [128, 96, 512, 3, 6]          
  4                  -1  1      5632  ultralytics.nn.modules.conv.DWConv           [512, 512, 3, 2, 1, False]    
  5                  -1  6   4990595  ultralytics.nn.AddModules.FBRT_YOLO.HGBlock_FCM[512, 192, 1024, 5, 6, True, False]
  6                  -1  6   5351043  ultralytics.nn.AddModules.FBRT_YOLO.HGBlock_FCM[1024, 192, 1024, 5, 6, True, True]
  7                  -1  6   5351043  ultralytics.nn.AddModules.FBRT_YOLO.HGBlock_FCM[1024, 192, 1024, 5, 6, True, True]
  8                  -1  1     11264  ultralytics.nn.modules.conv.DWConv           [1024, 1024, 3, 2, 1, False]  
  9                  -1  6   6708480  ultralytics.nn.modules.block.HGBlock         [1024, 384, 2048, 5, 6, True, False]
 10                  -1  1    524800  ultralytics.nn.modules.conv.Conv             [2048, 256, 1, 1, None, 1, 1, False]
 11                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]                
 12                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
 13                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 14                   7  1    262656  ultralytics.nn.modules.conv.Conv             [1024, 256, 1, 1, None, 1, 1, False]
 15            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 17                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
 18                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 19                   3  1    131584  ultralytics.nn.modules.conv.Conv             [512, 256, 1, 1, None, 1, 1, False]
 20            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 21                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 22                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 23            [-1, 17]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 24                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 25                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 26            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 27                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 28        [21, 24, 27]  1   7303907  ultralytics.nn.modules.head.RTDETRDecoder    [1, [256, 256, 256]]          
rtdetr-l-HGBlock_FCM summary: 755 layers, 42,693,836 parameters, 42,693,836 gradients, 139.7 GFLOPs

rtdetr-resnet18-FBRT_YOLO

rtdetr-resnet18-FBRT_YOLO summary: 582 layers, 22,298,284 parameters, 22,298,284 gradients, 63.8 GFLOPs

                   from  n    params  module                                       arguments                     
  0                  -1  1       960  ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']      
  1                  -1  1      9312  ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']     
  2                  -1  1     18624  ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']     
  3                  -1  1         0  torch.nn.modules.pooling.MaxPool2d           [3, 2, 1]                     
  4                  -1  2    180422  ultralytics.nn.AddModules.ResNet.Blocks      [64, 64, 2, 'BasicBlock_FCM', 2, True]
  5                  -1  2    633222  ultralytics.nn.AddModules.ResNet.Blocks      [64, 128, 2, 'BasicBlock_FCM', 3, True]
  6                  -1  2   2519814  ultralytics.nn.AddModules.ResNet.Blocks      [128, 256, 2, 'BasicBlock_FCM', 4, True]
  7                  -1  2  10053126  ultralytics.nn.AddModules.ResNet.Blocks      [256, 512, 2, 'BasicBlock_FCM', 5, True]
  8                  -1  1    131584  ultralytics.nn.modules.conv.Conv             [512, 256, 1, 1, None, 1, 1, False]
  9                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]                
 10                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 12                   6  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1, None, 1, 1, False]
 13            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 14                  -1  3    657920  ultralytics.nn.modules.block.RepC3           [512, 256, 3, 0.5]            
 15                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
 16                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 17                   5  1     33280  ultralytics.nn.modules.conv.Conv             [128, 256, 1, 1, None, 1, 1, False]
 18            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 19                  -1  3    657920  ultralytics.nn.modules.block.RepC3           [512, 256, 3, 0.5]            
 20                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 21            [-1, 15]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 22                  -1  3    657920  ultralytics.nn.modules.block.RepC3           [512, 256, 3, 0.5]            
 23                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 24            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 25                  -1  3    657920  ultralytics.nn.modules.block.RepC3           [512, 256, 3, 0.5]            
 26        [19, 22, 25]  1   3917684  ultralytics.nn.modules.head.RTDETRDecoder    [1, [256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-FBRT_YOLO summary: 582 layers, 22,298,284 parameters, 22,298,284 gradients, 63.8 GFLOPs