

RT-DETR Improvement Strategy [Neck] | GFPN Surpasses BiFPN: Improving the RT-DETR Neck with Skip-layer and Cross-scale Connections

1. Introduction

This article documents how the GFPN neck structure is used to optimize the RT-DETR object detection model. The GFPN-based neck uses skip-layer connections to mitigate the vanishing-gradient problem during backpropagation, and introduces cross-scale connections so that features from different levels and layers are fully fused, exchanging sufficient high-level semantic and low-level spatial information and thereby improving detection performance in scenes with large scale variation.



2. GFPN Overview

GIRAFFEDET: A HEAVY-NECK PARADIGM FOR OBJECT DETECTION

2.1 Design Motivation

  • Conventional FPNs and their variants have limitations. The standard FPN (Lin et al., 2017a) fuses multi-scale features through a single top-down information path; PANet (Liu et al., 2018) adds a bottom-up path-aggregation network but at a higher computational cost; BiFPN (Tan et al., 2020) optimizes nodes and connections but lacks intra-block connections. GFPN was designed to overcome these issues and achieve efficient multi-scale information fusion to handle the large scale variation encountered in object detection.


2.2 Principle

2.2.1 Skip-layer Connection

  • Purpose: to mitigate the vanishing-gradient problem during backpropagation through the deep, "giraffe"-shaped neck.
  • Approach: two feature-linking schemes are proposed, dense-link and $\log_2 n$-link.
  • dense-link: inspired by DenseNet (Huang et al., 2017). For each scale feature $P_k^l$ in level $k$, the $l$-th layer receives the feature maps of all preceding layers: $P_k^l = \mathrm{Conv}\big(\mathrm{Concat}(P_k^0, \ldots, P_k^{l-1})\big)$.
  • $\log_2 n$-link: within each level $k$, the $l$-th layer receives at most $\log_2 l + 1$ preceding feature maps, whose depths are spaced exponentially with base 2: $P_k^l = \mathrm{Conv}\big(\mathrm{Concat}(P_k^{l-2^n}, \ldots, P_k^{l-2^1}, P_k^{l-2^0})\big)$, where $l - 2^n \ge 0$. Compared with dense-link, the time complexity of $\log_2 n$-link at depth $l$ is only $O(l \cdot \log_2 l)$ instead of $O(l^2)$, and the distance between linked layers grows more slowly during backpropagation, so it scales to deeper networks (see the sketch after this list).
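
A minimal Python sketch (illustrative only, not the official GiraffeDet code) of which earlier-layer indices each scheme concatenates into layer $l$ of one pyramid level:

def dense_link_inputs(l):
    # dense-link: layer l receives every preceding layer 0 .. l-1
    return list(range(l))

def log2n_link_inputs(l):
    # log2n-link: layer l receives layers l - 2^0, l - 2^1, ... while l - 2^n >= 0,
    # i.e. at most log2(l) + 1 predecessors, exponentially spaced in depth
    inputs, n = [], 0
    while l - 2 ** n >= 0:
        inputs.append(l - 2 ** n)
        n += 1
    return inputs

print(dense_link_inputs(8))  # [0, 1, 2, 3, 4, 5, 6, 7] -> O(l^2) total connections
print(log2n_link_inputs(8))  # [7, 6, 4, 0]             -> O(l * log2(l)) total connections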

2.2.2 Cross-scale Connection

  • Purpose: skip-layer connections alone are not sufficient; cross-scale connections are also needed to achieve full information exchange and cope with large scale variation.
  • Approach: a new cross-scale fusion scheme, Queen-fusion, fuses features from the same level and the adjacent levels. For example, the Queen-fusion node at $P_5$ takes the downsampled previous-layer $P_4$, the upsampled previous-layer $P_6$, the previous-layer $P_5$, and the current-layer $P_4$. In the implementation, bilinear interpolation and max pooling are used as the upsampling and downsampling functions, respectively (a sketch follows this list).
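
A minimal PyTorch sketch of the Queen-fusion node at $P_5$ described above (the exact wiring is an assumption; in particular, the current-layer $P_4$ input is assumed to be downsampled to the $P_5$ resolution, and fuse_block stands for any fusion module such as the CSPStage from Section 3):

import torch
import torch.nn.functional as F

def queen_fusion_p5(p4_prev, p5_prev, p6_prev, p4_curr, fuse_block):
    # p4_*: features one level below (2x the spatial size of P5),
    # p6_prev: one level above (half the spatial size), p5_prev: same level.
    h, w = p5_prev.shape[-2:]
    p4_down = F.max_pool2d(p4_prev, kernel_size=2, stride=2)          # downsample previous-layer P4
    p6_up = F.interpolate(p6_prev, size=(h, w), mode='bilinear',
                          align_corners=False)                        # upsample previous-layer P6
    p4_curr_down = F.max_pool2d(p4_curr, kernel_size=2, stride=2)     # current-layer P4, resized to P5
    fused = torch.cat([p4_down, p6_up, p5_prev, p4_curr_down], dim=1) # cross-scale concat
    return fuse_block(fused)                                          # e.g. CSPStage(4 * C, C_out)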


2.3 Structure

GFPN combines the skip-layer connections described above (dense-link and $\log_2 n$-link) with the Queen-fusion cross-scale connections. Compared with other FPN designs such as PANet and BiFPN, each GFPN layer corresponds to a single depth, whereas a PANet or BiFPN layer spans two depths.

2.4 Advantages

  • Efficient information transmission: the $\log_2 n$-link skip-layer scheme provides more effective information transmission at the same FLOPs level; compared with dense-link it avoids potentially redundant information transfer and lets the network extend to deeper layers.
  • Sufficient information fusion: the Queen-fusion cross-scale scheme fully fuses features across different levels and layers, exchanging enough high-level semantic and low-level spatial information to improve detection performance under large scale variation.
  • Performance: experiments show that, across FLOPs levels, GFPN gives the GiraffeDet models a good balance of accuracy and efficiency, outperforming methods built on other backbones and FPN structures. For example, on COCO, GiraffeDet-D29 with GFPN reaches 54.1% mAP under multi-scale testing, surpassing other SOTA methods.

Paper: https://arxiv.org/pdf/2202.04256
Code: https://github.com/damo-cv/GiraffeDet

3. GFPN Implementation Code

The implementation code of the GFPN module is as follows:

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
 
def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1):
    '''Basic cell for rep-style block, including conv and bn'''
    result = nn.Sequential()
    result.add_module(
        'conv',
        nn.Conv2d(in_channels=in_channels,
                  out_channels=out_channels,
                  kernel_size=kernel_size,
                  stride=stride,
                  padding=padding,
                  groups=groups,
                  bias=False))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result
 
class RepConv(nn.Module):
    '''RepConv is a basic rep-style block, including training and deploy status
    Code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    '''
    def __init__(self,
                 in_channels,
                 out_channels,
                 kernel_size=3,
                 stride=1,
                 padding=1,
                 dilation=1,
                 groups=1,
                 padding_mode='zeros',
                 deploy=False,
                 act='relu',
                 norm=None):
        super(RepConv, self).__init__()
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels
        self.out_channels = out_channels
 
        assert kernel_size == 3
        assert padding == 1
 
        padding_11 = padding - kernel_size // 2
 
        if isinstance(act, str):
            self.nonlinearity = get_activation(act)
        else:
            self.nonlinearity = act
 
        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels,
                                         out_channels=out_channels,
                                         kernel_size=kernel_size,
                                         stride=stride,
                                         padding=padding,
                                         dilation=dilation,
                                         groups=groups,
                                         bias=True,
                                         padding_mode=padding_mode)
 
        else:
            self.rbr_identity = None
            self.rbr_dense = conv_bn(in_channels=in_channels,
                                     out_channels=out_channels,
                                     kernel_size=kernel_size,
                                     stride=stride,
                                     padding=padding,
                                     groups=groups)
            self.rbr_1x1 = conv_bn(in_channels=in_channels,
                                   out_channels=out_channels,
                                   kernel_size=1,
                                   stride=stride,
                                   padding=padding_11,
                                   groups=groups)
 
    def forward(self, inputs):
        '''Forward process'''
        if hasattr(self, 'rbr_reparam'):
            return self.nonlinearity(self.rbr_reparam(inputs))
 
        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)
 
        return self.nonlinearity(
            self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out)
 
    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(
            kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid
 
    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])
 
    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3),
                                        dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(
                    branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std
 
    def switch_to_deploy(self):
        if hasattr(self, 'rbr_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(
            in_channels=self.rbr_dense.conv.in_channels,
            out_channels=self.rbr_dense.conv.out_channels,
            kernel_size=self.rbr_dense.conv.kernel_size,
            stride=self.rbr_dense.conv.stride,
            padding=self.rbr_dense.conv.padding,
            dilation=self.rbr_dense.conv.dilation,
            groups=self.rbr_dense.conv.groups,
            bias=True)
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_dense')
        self.__delattr__('rbr_1x1')
        if hasattr(self, 'rbr_identity'):
            self.__delattr__('rbr_identity')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')
        self.deploy = True
 
class Swish(nn.Module):
    def __init__(self, inplace=True):
        super(Swish, self).__init__()
        self.inplace = inplace
 
    def forward(self, x):
        if self.inplace:
            x.mul_(torch.sigmoid(x))  # torch.sigmoid: F.sigmoid is deprecated
            return x
        else:
            return x * torch.sigmoid(x)
 
def get_activation(name='silu', inplace=True):
    if name is None:
        return nn.Identity()
 
    if isinstance(name, str):
        if name == 'silu':
            module = nn.SiLU(inplace=inplace)
        elif name == 'relu':
            module = nn.ReLU(inplace=inplace)
        elif name == 'lrelu':
            module = nn.LeakyReLU(0.1, inplace=inplace)
        elif name == 'swish':
            module = Swish(inplace=inplace)
        elif name == 'hardsigmoid':
            module = nn.Hardsigmoid(inplace=inplace)
        elif name == 'identity':
            module = nn.Identity()
        else:
            raise AttributeError('Unsupported act type: {}'.format(name))
        return module
 
    elif isinstance(name, nn.Module):
        return name
 
    else:
        raise AttributeError('Unsupported act type: {}'.format(name))
 
def get_norm(name, out_channels, inplace=True):
    if name == 'bn':
        module = nn.BatchNorm2d(out_channels)
    else:
        raise NotImplementedError
    return module
 
class ConvBNAct(nn.Module):
    """A Conv2d -> Batchnorm -> silu/leaky relu block"""
    def __init__(
        self,
        in_channels,
        out_channels,
        ksize,
        stride=1,
        groups=1,
        bias=False,
        act='silu',
        norm='bn',
        reparam=False,
    ):
        super().__init__()
        # same padding
        pad = (ksize - 1) // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=ksize,
            stride=stride,
            padding=pad,
            groups=groups,
            bias=bias,
        )
        if norm is not None:
            self.bn = get_norm(norm, out_channels, inplace=True)
        if act is not None:
            self.act = get_activation(act, inplace=True)
        self.with_norm = norm is not None
        self.with_act = act is not None
 
    def forward(self, x):
        x = self.conv(x)
        if self.with_norm:
            x = self.bn(x)
        if self.with_act:
            x = self.act(x)
        return x
 
    def fuseforward(self, x):
        return self.act(self.conv(x))
 
class BasicBlock_3x3_Reverse(nn.Module):
    def __init__(self,
                 ch_in,
                 ch_hidden_ratio,
                 ch_out,
                 act='relu',
                 shortcut=True):
        super(BasicBlock_3x3_Reverse, self).__init__()
        assert ch_in == ch_out
        ch_hidden = int(ch_in * ch_hidden_ratio)
        self.conv1 = ConvBNAct(ch_hidden, ch_out, 3, stride=1, act=act)
        self.conv2 = RepConv(ch_in, ch_hidden, 3, stride=1, act=act)
        self.shortcut = shortcut
 
    def forward(self, x):
        y = self.conv2(x)
        y = self.conv1(y)
        if self.shortcut:
            return x + y
        else:
            return y
 
class SPP(nn.Module):
    def __init__(
        self,
        ch_in,
        ch_out,
        k,
        pool_size,
        act='swish',
    ):
        super(SPP, self).__init__()
        self.pool = []
        for i, size in enumerate(pool_size):
            pool = nn.MaxPool2d(kernel_size=size,
                                stride=1,
                                padding=size // 2,
                                ceil_mode=False)
            self.add_module('pool{}'.format(i), pool)
            self.pool.append(pool)
        self.conv = ConvBNAct(ch_in, ch_out, k, act=act)
 
    def forward(self, x):
        outs = [x]
 
        for pool in self.pool:
            outs.append(pool(x))
        y = torch.cat(outs, dim=1)
 
        y = self.conv(y)
        return y
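
# CSPStage is the GFPN fusion block used at each merge point of the modified neck:
# two parallel 1x1 convs (conv1 / conv2) project the input into a shortcut branch
# and a main branch; the main branch passes through n BasicBlock_3x3_Reverse units
# (RepConv 3x3 followed by ConvBNAct 3x3 with a residual shortcut), every
# intermediate output is kept, and all branches are concatenated and fused by a
# final 1x1 conv (conv3).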
 
class CSPStage(nn.Module):
    def __init__(self,
                 ch_in,
                 ch_out,
                 n=1,
                 block_fn='BasicBlock_3x3_Reverse',
                 ch_hidden_ratio=1.0,
                 act='silu',
                 spp=False):
        super(CSPStage, self).__init__()
 
        split_ratio = 2
        ch_first = int(ch_out // split_ratio)
        ch_mid = int(ch_out - ch_first)
        self.conv1 = ConvBNAct(ch_in, ch_first, 1, act=act)
        self.conv2 = ConvBNAct(ch_in, ch_mid, 1, act=act)
        self.convs = nn.Sequential()
 
        next_ch_in = ch_mid
        for i in range(n):
            if block_fn == 'BasicBlock_3x3_Reverse':
                self.convs.add_module(
                    str(i),
                    BasicBlock_3x3_Reverse(next_ch_in,
                                           ch_hidden_ratio,
                                           ch_mid,
                                           act=act,
                                           shortcut=True))
            else:
                raise NotImplementedError
            if i == (n - 1) // 2 and spp:
                self.convs.add_module(
                    'spp', SPP(ch_mid * 4, ch_mid, 1, [5, 9, 13], act=act))
            next_ch_in = ch_mid
        self.conv3 = ConvBNAct(ch_mid * n + ch_first, ch_out, 1, act=act)
 
    def forward(self, x):
        y1 = self.conv1(x)
        y2 = self.conv2(x)
 
        mid_out = [y1]
        for conv in self.convs:
            y2 = conv(y2)
            mid_out.append(y2)
        y = torch.cat(mid_out, dim=1)
        y = self.conv3(y)
        return y



4. Integration Steps

4.1 Modification 1

① Create a new AddModules folder under the ultralytics/nn/ directory to hold the module code.

② In the AddModules folder, create CSPStage.py and paste the code from Section 3 into it.


4.2 Modification 2

In the AddModules folder, create __init__.py (skip this if it already exists) and import the module in it: from .CSPStage import *


4.3 Modification 3

In the ultralytics/nn/tasks.py file, the module class name needs to be added in two places.

First: import the module.
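
A minimal sketch of the added import near the top of tasks.py (the exact form depends on how __init__.py re-exports the module; the path assumes the AddModules package created in Section 4.1):

from ultralytics.nn.AddModules import CSPStage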


Second: register the CSPStage module in the parse_model function.
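
A minimal sketch of the registration inside parse_model (the surrounding branch structure varies between Ultralytics versions; CSPStage only needs its input and output channels resolved, which matches the [512, 512] / [1024, 1024] arguments shown in the Section 6 printout):

        elif m is CSPStage:
            c1, c2 = ch[f], args[0]     # input channels from the feeding layer, output channels from the yaml
            args = [c1, c2, *args[1:]]

With this form, the repeat count in the yaml (3 for each CSPStage entry) is applied by parse_model stacking the block in an nn.Sequential, which appears consistent with the parameter counts reported in Section 6.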



5. YAML Model File

5.1 Improved Model Version ⭐

Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a new model file for training on your own dataset in the same directory, named rtdetr-l-CSPStage.yaml.

Copy the contents of rtdetr-l.yaml into rtdetr-l-CSPStage.yaml, and change nc to the number of classes in your own dataset.

📌 This modification targets the neck network.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, HGStem, [32, 48]] # 0-P2/4
  - [-1, 6, HGBlock, [48, 128, 3]] # stage 1

  - [-1, 1, DWConv, [128, 3, 2, 1, False]] # 2-P3/8
  - [-1, 6, HGBlock, [96, 512, 3]] # stage 2

  - [-1, 1, DWConv, [512, 3, 2, 1, False]] # 4-P4/16
  - [-1, 6, HGBlock, [192, 1024, 5, True, False]] # cm, c2, k, light, shortcut
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]]
  - [-1, 6, HGBlock, [192, 1024, 5, True, True]] # stage 3

  - [-1, 1, DWConv, [1024, 3, 2, 1, False]] # 8-P5/32
  - [-1, 6, HGBlock, [384, 2048, 5, True, False]] # stage 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]
  - [-1, 1, Conv, [256, 1, 1]] # 12, Y5, lateral_convs.0

  - [-1, 1, Conv, [256, 1, 1]] # 13
  - [7, 1, Conv, [256, 3, 2]] # 14, downsample_convs.0
  - [[-1, 13], 1, Concat, [1]] # 15
  - [-1, 3, CSPStage, [512]] # F4 (16), pan_blocks.0

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [3, 1, Conv, [256, 3, 2]] # 18 input_proj.1
  - [[17, -1, 7], 1, Concat, [1]]
  - [-1, 3, RepC3, [256]] # 20, fpn_blocks.0

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 3], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, RepC3, [256]] # X3 (23), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]] # 24, downsample_convs.0
  - [[-1, 20], 1, Concat, [1]] # cat Y4
  - [-1, 3, CSPStage, [512]] # F4 (26), pan_blocks.0

  - [20, 1, Conv, [256, 3, 2]] # 27, downsample_convs.1
  - [26, 1, Conv, [256, 3, 2]] # 28, downsample_convs.1
  - [[16, 27, -1], 1, Concat, [1]] # cat Y5
  - [-1, 3, CSPStage, [1024]] # F5 (30), pan_blocks.1

  - [[23, 26, 30], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)


6. Successful Run Results

Printing the network model shows that the neck has been modified successfully and the model is ready for training.
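
For reference, a minimal sketch of building the model from the new yaml (the commented-out training call uses a placeholder dataset yaml):

from ultralytics import RTDETR

# Build the modified model; the layer-by-layer table below is printed when the
# network is constructed.
model = RTDETR('ultralytics/cfg/models/rt-detr/rtdetr-l-CSPStage.yaml')
# model.train(data='your_dataset.yaml', epochs=100, imgsz=640)  # placeholder dataset config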



                   from  n    params  module                                       arguments                     
  0                  -1  1     25248  ultralytics.nn.modules.block.HGStem          [3, 32, 48]                   
  1                  -1  6    155072  ultralytics.nn.modules.block.HGBlock         [48, 48, 128, 3, 6]           
  2                  -1  1      1408  ultralytics.nn.modules.conv.DWConv           [128, 128, 3, 2, 1, False]    
  3                  -1  6    839296  ultralytics.nn.modules.block.HGBlock         [128, 96, 512, 3, 6]          
  4                  -1  1      5632  ultralytics.nn.modules.conv.DWConv           [512, 512, 3, 2, 1, False]    
  5                  -1  6   1695360  ultralytics.nn.modules.block.HGBlock         [512, 192, 1024, 5, 6, True, False]
  6                  -1  6   2055808  ultralytics.nn.modules.block.HGBlock         [1024, 192, 1024, 5, 6, True, True]
  7                  -1  6   2055808  ultralytics.nn.modules.block.HGBlock         [1024, 192, 1024, 5, 6, True, True]
  8                  -1  1     11264  ultralytics.nn.modules.conv.DWConv           [1024, 1024, 3, 2, 1, False]  
  9                  -1  6   6708480  ultralytics.nn.modules.block.HGBlock         [1024, 384, 2048, 5, 6, True, False]
 10                  -1  1    524800  ultralytics.nn.modules.conv.Conv             [2048, 256, 1, 1, None, 1, 1, False]
 11                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]                
 12                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
 13                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
 14                   7  1   2359808  ultralytics.nn.modules.conv.Conv             [1024, 256, 3, 2]             
 15            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 16                  -1  3   5319168  ultralytics.nn.AddModules.CSPStage.CSPStage  [512, 512]                    
 17                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 18                   3  1   1180160  ultralytics.nn.modules.conv.Conv             [512, 256, 3, 2]              
 19         [17, -1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 20                  -1  3   2887680  ultralytics.nn.modules.block.RepC3           [1792, 256, 3]                
 21                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 22             [-1, 3]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 23                  -1  3   2363392  ultralytics.nn.modules.block.RepC3           [768, 256, 3]                 
 24                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 25            [-1, 20]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 26                  -1  3   5319168  ultralytics.nn.AddModules.CSPStage.CSPStage  [512, 512]                    
 27                  20  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 28                  26  1   1180160  ultralytics.nn.modules.conv.Conv             [512, 256, 3, 2]              
 29        [16, 27, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 30                  -1  3  21255168  ultralytics.nn.AddModules.CSPStage.CSPStage  [1024, 1024]                  
 31        [23, 26, 30]  1   7566051  ultralytics.nn.modules.head.RTDETRDecoder    [1, [256, 512, 1024]]         
rtdetr-l-CSPStage summary: 857 layers, 65,611,459 parameters, 65,611,459 gradients, 145.4 GFLOPs