RT-DETR Improvement Strategy [Model Lightweighting] | Replacing the Backbone with CVPR 2023 FasterNet: Efficient and Fast Partial Convolution Blocks

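1. Introduction

This article documents a FasterNet-based approach to making RT-DETR lighter. FasterNet follows CNN design principles. Its proposed PConv lowers the computation and memory cost of inference; it also reduces the channel count and raises the partial ratio to cut latency, and the PWConv that follows compensates for feature information that might otherwise be lost, which improves accuracy. The backbone replacement in this article provides all six models from the original paper, fasternet_t0, fasternet_t1, fasternet_t2, fasternet_s, fasternet_m, and fasternet_l, to suit different needs.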

Model	Parameters	Computation	Inference time
rtdetr-l	32.8M	108.0GFLOPs	11.6ms
Improved	20.7M	66.1GFLOPs	10.3ms
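
To put the table in relative terms, the short sketch below computes the percentage reduction for each column. The numbers are copied from the table; the helper name relative_reduction is only illustrative.

```python
# Figures copied from the comparison table above.
baseline = {"params_M": 32.8, "gflops": 108.0, "latency_ms": 11.6}
improved = {"params_M": 20.7, "gflops": 66.1, "latency_ms": 10.3}


def relative_reduction(before: float, after: float) -> float:
    """Return how much smaller `after` is than `before`, in percent."""
    return (before - after) / before * 100.0


reductions = {
    key: round(relative_reduction(baseline[key], improved[key]), 1)
    for key in baseline
}
for key, value in reductions.items():
    print(f"{key}: -{value}%")
```

Parameters and computation drop by well over a third, while the measured latency improves by a smaller margin.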

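2. FasterNet Architecture in Detail

2.1 Motivation

Fast neural networks are a long-standing goal in computer vision. Many existing networks, however, lower their FLOPs (the number of floating-point operations) without improving their FLOPS (floating-point operations per second), so the FLOPs savings do not translate into a matching speedup in practice. FasterNet is designed to address this: it reduces FLOPs while raising FLOPS, so that it runs fast on a variety of devices without sacrificing accuracy.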
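
The distinction matters because latency is the ratio of the two quantities: latency = FLOPs / FLOPS. The sketch below illustrates this with made-up numbers (both operators and their throughput figures are hypothetical, not measurements): an operator with four times fewer FLOPs still ends up slower when its achieved FLOPS is ten times lower.

```python
def latency_ms(flops: float, flops_per_second: float) -> float:
    """Latency in milliseconds: work divided by achieved throughput."""
    return flops / flops_per_second * 1e3


# Hypothetical operator A: more work, but the hardware is kept busy.
latency_a = latency_ms(flops=100e6, flops_per_second=50e9)

# Hypothetical operator B: 4x fewer FLOPs, but memory-bound at 10x lower FLOPS.
latency_b = latency_ms(flops=25e6, flops_per_second=5e9)

print(f"A: {latency_a:.1f} ms, B: {latency_b:.1f} ms")
```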

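2.2 Principles

2.2.1 How PConv (Partial Convolution) Works

Reducing redundant computation and memory access: Existing operators such as DWConv achieve low FLOPS because of frequent memory access, which motivates PConv. PConv exploits the redundancy between feature-map channels: it applies a regular convolution to only a subset of the input channels for spatial feature extraction and leaves the remaining channels untouched.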

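For example, with equal input and output channel counts and the typical partial ratio $r=\frac{1}{4}$, PConv needs only $\frac{1}{16}$ of the FLOPs of a regular convolution and only about $\frac{1}{4}$ of its memory access.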
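
These ratios can be checked with the usual cost model for a convolution: FLOPs of $h \cdot w \cdot k^2 \cdot c_{in} \cdot c_{out}$, and feature-map memory access of roughly $h \cdot w \cdot 2c$ when the small weight term is ignored. The sketch below is a minimal check under those assumptions; the feature-map size (56 x 56) and channel count (64) are arbitrary illustrative values.

```python
def conv_flops(h: int, w: int, k: int, c_in: int, c_out: int) -> int:
    """FLOPs of a k x k convolution on an h x w feature map."""
    return h * w * k * k * c_in * c_out


def feature_map_access(h: int, w: int, channels: int) -> int:
    """Read the input and write the output feature map (weights ignored)."""
    return h * w * 2 * channels


h, w, k, c, n_div = 56, 56, 3, 64, 4   # n_div = 4 corresponds to r = 1/4
c_partial = c // n_div                 # channels actually convolved by PConv

flops_ratio = conv_flops(h, w, k, c_partial, c_partial) / conv_flops(h, w, k, c, c)
memory_ratio = feature_map_access(h, w, c_partial) / feature_map_access(h, w, c)

print(flops_ratio, memory_ratio)
```

The FLOPs ratio is $r^2$ because both the input and the output channel counts shrink by $r$, while the memory ratio is only $r$.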

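Combination with PWConv: PConv is followed by a PWConv (pointwise convolution). Together, their effective receptive field resembles a T-shaped convolution that focuses more on the center position. Compared with using a T-shaped convolution directly, this decomposition better exploits the redundancy between filters and saves further FLOPs.

2.2.2 How FasterNet Is Built

FasterNet uses PConv and PWConv as its main building operators. The architecture is kept simple and hardware-friendly. It has four hierarchical stages, each preceded by an embedding layer or a merging layer that downsamples spatially and expands the channel count. Each stage contains several FasterNet blocks, and each block consists of one PConv layer followed by two PWConv layers. Like an inverted residual block, the middle layer has an expanded channel count, and a shortcut connection reuses the input features. Normalization and activation layers are placed only after the middle PWConv layer of each block (the first of the two PWConv layers), which preserves feature diversity and lowers latency.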

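2.3 Structure

2.3.1 Overall Architecture

Four stages: The network has four hierarchical stages. Before each stage, an embedding layer (a $4 \times 4$ convolution with stride 4) or a merging layer (a $2 \times 2$ convolution with stride 2) downsamples spatially and expands the channel count.
FasterNet blocks: Each stage contains several FasterNet blocks, each consisting of one PConv layer and two PWConv layers.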
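
The effect of the embedding and merging layers on tensor shapes can be traced with a few lines of arithmetic. The sketch below assumes a 640 x 640 input and the embed_dim of 40 used by fasternet_t0; the function name is illustrative only.

```python
from typing import List, Tuple


def fasternet_stage_shapes(
    embed_dim: int,
    img_size: int = 640,
    embed_stride: int = 4,
    merge_stride: int = 2,
    num_stages: int = 4,
) -> List[Tuple[int, int, int]]:
    """Return (channels, height, width) of the output of each stage."""
    shapes = []
    channels, size = embed_dim, img_size // embed_stride  # embedding layer
    for _ in range(num_stages):
        shapes.append((channels, size, size))
        # A merging layer doubles the channels and halves the resolution.
        channels, size = channels * 2, size // merge_stride
    return shapes


shapes = fasternet_stage_shapes(embed_dim=40)
for stage, shape in enumerate(shapes):
    print(f"stage {stage}: {shape}")
```

The four outputs therefore sit at strides 4, 8, 16, and 32 relative to the input, which is the layout a detection neck expects.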


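2.3.2 Layer Details

PConv layer: Convolves only the fraction of input channels given by the partial ratio; with $r=\frac{1}{4}$, for example, only $\frac{1}{4}$ of the input channels are convolved.
PWConv layer: Follows the PConv layer and mixes information across all channels.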


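Normalization and activation layers: Batch normalization (BN) is used for normalization. For activation, the smaller FasterNet variants use GELU and the larger ones use ReLU. Both are placed only after the middle PWConv layer, which limits their impact on feature diversity and lowers latency.

2.4 Advantages

Speed:
- High FLOPS of PConv: On GPU, CPU, and ARM processors, PConv achieves $10.5\times$, $6.2\times$, and $22.8\times$ higher FLOPS than DWConv, respectively, while needing far fewer FLOPs than a regular convolution. For example, a stack of 10 pure PConv layers runs fast across these processors.
- Efficient FasterNet: On devices such as GPU, CPU, and ARM processors, FasterNet runs faster than other networks (for example MobileViT and ResNet) at similar or higher accuracy.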

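For example, on ImageNet-1k, FasterNet-T0 is $2.8\times$, $3.3\times$, and $2.4\times$ faster than MobileViT-XXS on GPU, CPU, and ARM processors, respectively, while also being more accurate. FasterNet-L reaches 83.5% top-1 accuracy, on par with Swin-B, with 36% higher inference throughput on GPU and 37% less compute time on CPU.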

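Accuracy: FasterNet achieves strong results on classification, detection, and segmentation. On ImageNet-1k classification, every FasterNet variant reaches competitive accuracy, and on downstream COCO object detection and instance segmentation it attains higher average precision (AP) than models such as ResNet and ResNeXt.
Simplicity: The architecture is simpler than that of many other models, showing that a simple yet powerful network is feasible. This simplicity helps hardware implementation and makes the model easier to understand and apply.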

Paper: https://arxiv.org/pdf/2303.03667
Source code: https://github.com/JierunChen/FasterNet

3. FasterNet Implementation

The FasterNet implementation is as follows:

# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch, yaml
import torch.nn as nn
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from functools import partial
from typing import List
from torch import Tensor
import copy
import os
import numpy as np

__all__ = ['fasternet_t0', 'fasternet_t1', 'fasternet_t2', 'fasternet_s', 'fasternet_m', 'fasternet_l']

class Partial_conv3(nn.Module):

    def __init__(self, dim, n_div, forward):
        super().__init__()
        self.dim_conv3 = dim // n_div
        self.dim_untouched = dim - self.dim_conv3
        self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)

        if forward == 'slicing':
            self.forward = self.forward_slicing
        elif forward == 'split_cat':
            self.forward = self.forward_split_cat
        else:
            raise NotImplementedError

    def forward_slicing(self, x: Tensor) -> Tensor:
        # only for inference
        x = x.clone()   # !!! Keep the original input intact for the residual connection later
        x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])

        return x

    def forward_split_cat(self, x: Tensor) -> Tensor:
        # for training/inference
        x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)
        x1 = self.partial_conv3(x1)
        x = torch.cat((x1, x2), 1)

        return x

class MLPBlock(nn.Module):

    def __init__(self,
                 dim,
                 n_div,
                 mlp_ratio,
                 drop_path,
                 layer_scale_init_value,
                 act_layer,
                 norm_layer,
                 pconv_fw_type
                 ):

        super().__init__()
        self.dim = dim
        self.mlp_ratio = mlp_ratio
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.n_div = n_div

        mlp_hidden_dim = int(dim * mlp_ratio)

        mlp_layer: List[nn.Module] = [
            nn.Conv2d(dim, mlp_hidden_dim, 1, bias=False),
            norm_layer(mlp_hidden_dim),
            act_layer(),
            nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)
        ]

        self.mlp = nn.Sequential(*mlp_layer)

        self.spatial_mixing = Partial_conv3(
            dim,
            n_div,
            pconv_fw_type
        )

        if layer_scale_init_value > 0:
            self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
            self.forward = self.forward_layer_scale
        else:
            self.forward = self.forward

    def forward(self, x: Tensor) -> Tensor:
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(self.mlp(x))
        return x

    def forward_layer_scale(self, x: Tensor) -> Tensor:
        shortcut = x
        x = self.spatial_mixing(x)
        x = shortcut + self.drop_path(
            self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))
        return x

class BasicStage(nn.Module):

    def __init__(self,
                 dim,
                 depth,
                 n_div,
                 mlp_ratio,
                 drop_path,
                 layer_scale_init_value,
                 norm_layer,
                 act_layer,
                 pconv_fw_type
                 ):

        super().__init__()

        blocks_list = [
            MLPBlock(
                dim=dim,
                n_div=n_div,
                mlp_ratio=mlp_ratio,
                drop_path=drop_path[i],
                layer_scale_init_value=layer_scale_init_value,
                norm_layer=norm_layer,
                act_layer=act_layer,
                pconv_fw_type=pconv_fw_type
            )
            for i in range(depth)
        ]

        self.blocks = nn.Sequential(*blocks_list)

    def forward(self, x: Tensor) -> Tensor:
        x = self.blocks(x)
        return x

class PatchEmbed(nn.Module):

    def __init__(self, patch_size, patch_stride, in_chans, embed_dim, norm_layer):
        super().__init__()
        self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_stride, bias=False)
        if norm_layer is not None:
            self.norm = norm_layer(embed_dim)
        else:
            self.norm = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        x = self.norm(self.proj(x))
        return x

class PatchMerging(nn.Module):

    def __init__(self, patch_size2, patch_stride2, dim, norm_layer):
        super().__init__()
        self.reduction = nn.Conv2d(dim, 2 * dim, kernel_size=patch_size2, stride=patch_stride2, bias=False)
        if norm_layer is not None:
            self.norm = norm_layer(2 * dim)
        else:
            self.norm = nn.Identity()

    def forward(self, x: Tensor) -> Tensor:
        x = self.norm(self.reduction(x))
        return x

class FasterNet(nn.Module):
    def __init__(self,
                 in_chans=3,
                 num_classes=1000,
                 embed_dim=96,
                 depths=(1, 2, 8, 2),
                 mlp_ratio=2.,
                 n_div=4,
                 patch_size=4,
                 patch_stride=4,
                 patch_size2=2,  # for subsequent layers
                 patch_stride2=2,
                 patch_norm=True,
                 feature_dim=1280,
                 drop_path_rate=0.1,
                 layer_scale_init_value=0,
                 norm_layer='BN',
                 act_layer='RELU',
                 init_cfg=None,
                 pretrained=None,
                 pconv_fw_type='split_cat',
                 **kwargs):
        super().__init__()

        if norm_layer == 'BN':
            norm_layer = nn.BatchNorm2d
        else:
            raise NotImplementedError

        if act_layer == 'GELU':
            act_layer = nn.GELU
        elif act_layer == 'RELU':
            act_layer = partial(nn.ReLU, inplace=True)
        else:
            raise NotImplementedError

        self.num_stages = len(depths)
        self.embed_dim = embed_dim
        self.patch_norm = patch_norm
        self.num_features = int(embed_dim * 2 ** (self.num_stages - 1))
        self.mlp_ratio = mlp_ratio
        self.depths = depths

        # split image into non-overlapping patches
        self.patch_embed = PatchEmbed(
            patch_size=patch_size,
            patch_stride=patch_stride,
            in_chans=in_chans,
            embed_dim=embed_dim,
            norm_layer=norm_layer if self.patch_norm else None
        )

        # stochastic depth decay rule
        dpr = [x.item()
               for x in torch.linspace(0, drop_path_rate, sum(depths))]

        # build layers
        stages_list = []
        for i_stage in range(self.num_stages):
            stage = BasicStage(dim=int(embed_dim * 2 ** i_stage),
                               n_div=n_div,
                               depth=depths[i_stage],
                               mlp_ratio=self.mlp_ratio,
                               drop_path=dpr[sum(depths[:i_stage]):sum(depths[:i_stage + 1])],
                               layer_scale_init_value=layer_scale_init_value,
                               norm_layer=norm_layer,
                               act_layer=act_layer,
                               pconv_fw_type=pconv_fw_type
                               )
            stages_list.append(stage)

            # patch merging layer
            if i_stage < self.num_stages - 1:
                stages_list.append(
                    PatchMerging(patch_size2=patch_size2,
                                 patch_stride2=patch_stride2,
                                 dim=int(embed_dim * 2 ** i_stage),
                                 norm_layer=norm_layer)
                )

        self.stages = nn.Sequential(*stages_list)

        # add a norm layer for each output
        self.out_indices = [0, 2, 4, 6]
        for i_emb, i_layer in enumerate(self.out_indices):
            if i_emb == 0 and os.environ.get('FORK_LAST3', None):
                raise NotImplementedError
            else:
                layer = norm_layer(int(embed_dim * 2 ** i_emb))
            layer_name = f'norm{i_layer}'
            self.add_module(layer_name, layer)
        
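        # Record the channel count of each stage output with a dummy forward pass;
        # parse_model reads this list to wire up the head.
        self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

    def forward(self, x: Tensor) -> List[Tensor]: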
        # output the features of four stages for dense prediction
        x = self.patch_embed(x)
        outs = []
        for idx, stage in enumerate(self.stages):
            x = stage(x)
            if idx in self.out_indices:
                norm_layer = getattr(self, f'norm{idx}')
                x_out = norm_layer(x)
                outs.append(x_out)
        return outs

def update_weight(model_dict, weight_dict):
    idx, temp_dict = 0, {}
    for k, v in weight_dict.items():
        if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):
            temp_dict[k] = v
            idx += 1
    model_dict.update(temp_dict)
    print(f'loading weights... {idx}/{len(model_dict)} items')
    return model_dict

def fasternet_t0(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_t0.yaml'):
    with open(cfg) as f:
        cfg = yaml.load(f, Loader=yaml.SafeLoader)
    model = FasterNet(**cfg)
    if weights is not None:
        pretrain_weight = torch.load(weights, map_location='cpu')
        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
    return model

def fasternet_t1(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_t1.yaml'):
    with open(cfg) as f:
        cfg = yaml.load(f, Loader=yaml.SafeLoader)
    model = FasterNet(**cfg)
    if weights is not None:
        pretrain_weight = torch.load(weights, map_location='cpu')
        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
    return model

def fasternet_t2(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_t2.yaml'):
    with open(cfg) as f:
        cfg = yaml.load(f, Loader=yaml.SafeLoader)
    model = FasterNet(**cfg)
    if weights is not None:
        pretrain_weight = torch.load(weights, map_location='cpu')
        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
    return model

def fasternet_s(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_s.yaml'):
    with open(cfg) as f:
        cfg = yaml.load(f, Loader=yaml.SafeLoader)
    model = FasterNet(**cfg)
    if weights is not None:
        pretrain_weight = torch.load(weights, map_location='cpu')
        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
    return model

def fasternet_m(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_m.yaml'):
    with open(cfg) as f:
        cfg = yaml.load(f, Loader=yaml.SafeLoader)
    model = FasterNet(**cfg)
    if weights is not None:
        pretrain_weight = torch.load(weights, map_location='cpu')
        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
    return model

def fasternet_l(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_l.yaml'):
    with open(cfg) as f:
        cfg = yaml.load(f, Loader=yaml.SafeLoader)
    model = FasterNet(**cfg)
    if weights is not None:
        pretrain_weight = torch.load(weights, map_location='cpu')
        model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
    return model

if __name__ == '__main__':
    import yaml
    model = fasternet_t0(weights='fasternet_t0-epoch.281-val_acc1.71.9180.pth', cfg='cfg/fasternet_t0.yaml')
    print(model.channel)
    inputs = torch.randn((1, 3, 640, 640))
    for i in model(inputs):
        print(i.size())
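
One detail of FasterNet.__init__ that is easy to miss is the stochastic depth decay rule: drop_path_rate is spread linearly over all blocks and then sliced per stage. The sketch below reproduces that logic in plain Python, replacing torch.linspace with manual interpolation; the function name is illustrative, and the depths and rate are those of fasternet_t2.

```python
from typing import List, Sequence


def per_stage_drop_path(depths: Sequence[int], drop_path_rate: float) -> List[List[float]]:
    """Split linearly increasing drop-path rates into one list per stage."""
    total = sum(depths)
    if total == 1:
        rates = [0.0]
    else:
        # Plain-Python stand-in for torch.linspace(0, drop_path_rate, total).
        rates = [drop_path_rate * i / (total - 1) for i in range(total)]

    stages, start = [], 0
    for depth in depths:
        stages.append(rates[start:start + depth])
        start += depth
    return stages


stages = per_stage_drop_path(depths=(1, 2, 8, 2), drop_path_rate=0.05)
for index, stage_rates in enumerate(stages):
    print(index, [round(rate, 4) for rate in stage_rates])
```

The first block is never dropped, and the rate grows with depth until the last block reaches drop_path_rate.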

4. Modification Steps

4.1 Modification 1

① Create a new folder named AddModules under ultralytics/nn/ to hold the module code.

② Create FasterNet.py in the AddModules folder and paste the code from Section 3 into it.

4.2 Modification 2

Create __init__.py in the AddModules folder (skip this if it already exists) and import the module in it: from .FasterNet import *

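4.3 Modification 3

In ultralytics/nn/tasks.py, register the new module classes.

① First, import the modules. With the package set up in 4.1 and 4.2, an import such as from ultralytics.nn.AddModules import * placed next to the existing imports should work (the exact line is an assumption based on that package layout).

② In the predict function of the BaseModel class, remove the embed argument in the two places where the code uses it, so that the call matches the _predict_once signature below.

③ In the _predict_once function of the BaseModel class, replace the code with the following: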

    def _predict_once(self, x, profile=False, visualize=False):
        """
        Perform a forward pass through the network.

        Args:
            x (torch.Tensor): The input tensor to the model.
            profile (bool):  Print the computation time of each layer if True, defaults to False.
            visualize (bool): Save the feature maps of the model if True, defaults to False.

        Returns:
            (torch.Tensor): The last output of the model.
        """
        y, dt = [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x


④ Replace the predict function of the RTDETRDetectionModel class in full:

    def predict(self, x, profile=False, visualize=False, batch=None, augment=False):
        """
        Perform a forward pass through the model.

        Args:
            x (torch.Tensor): The input tensor.
            profile (bool, optional): If True, profile the computation time for each layer. Defaults to False.
            visualize (bool, optional): If True, save feature maps for visualization. Defaults to False.
            batch (dict, optional): Ground truth data for evaluation. Defaults to None.
            augment (bool, optional): If True, perform data augmentation during inference. Defaults to False.

        Returns:
            (torch.Tensor): Model's output tensor.
        """
        y, dt = [], []  # outputs
        for m in self.model[:-1]:  # except the head part
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                for _ in range(5 - len(x)):
                    x.insert(0, None)
                for i_idx, i in enumerate(x):
                    if i_idx in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                # for i in x:
                #     if i is not None:
                #         print(i.size())
                x = x[-1]
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        head = self.model[-1]
        x = head([y[j] for j in head.f], batch)  # head inference
        return x
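
The backbone branch of this predict function is what lets a single backbone module stand in for several layers. The sketch below isolates that bookkeeping with strings in place of tensors; the function name and the save set {2, 3} are illustrative assumptions.

```python
from typing import Any, List, Optional, Set, Tuple


def collect_backbone_outputs(
    features: List[Any], save: Set[int], slots: int = 5
) -> Tuple[List[Optional[Any]], Any]:
    """Pad the backbone outputs to `slots` entries and keep only saved ones."""
    padded: List[Optional[Any]] = list(features)
    for _ in range(slots - len(padded)):
        padded.insert(0, None)  # left-pad so the last feature lands at index 4
    y = [feat if idx in save else None for idx, feat in enumerate(padded)]
    return y, padded[-1]  # the deepest feature flows on as x


features = ["stride4", "stride8", "stride16", "stride32"]
y, x = collect_backbone_outputs(features, save={2, 3})
print(y)
print(x)
```

After padding, the four stage outputs occupy indices 1 to 4, which is why the head in the model YAML refers to the stride-8 and stride-16 features as layers 2 and 3 and why the first head layer is numbered 5.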


⑤ In the parse_model function, replace the corresponding code with the following:

    if verbose:
        LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10}  {'module':<45}{'arguments':<30}")
    ch = [ch]
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    is_backbone = False
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        try:
            if m == 'node_mode':
                m = d[m]
                if len(args) > 0:
                    if args[0] == 'head_channel':
                        args[0] = int(d[args[0]])
            t = m
            m = getattr(torch.nn, m[3:]) if 'nn.' in m else globals()[m]  # get module
        except:
            pass
        for j, a in enumerate(args):
            if isinstance(a, str):
                with contextlib.suppress(ValueError):
                    try:
                        args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
                    except:
                        args[j] = a


⑥ In the parse_model function, add the following branch to the if/elif chain that handles module types:

elif m in {fasternet_t0, fasternet_t1, fasternet_t2, fasternet_s, fasternet_m, fasternet_l,}:
    m = m(*args)
    c2 = m.channel


⑦ In the parse_model function, replace the corresponding code with the following:

        if isinstance(c2, list):
            is_backbone = True
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            t = str(m)[8:-2].replace('__main__.', '')  # module type
        
        m_.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f}  {t:<45}{str(args):<30}')  # print
        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            for _ in range(5 - len(ch)):
                ch.insert(0, 0)
        else:
            ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
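
The channel handling in step ⑦ mirrors the runtime padding in step ④: the backbone's channel list is left-padded to five entries so that later layers can look up their input width by index. A minimal sketch of that bookkeeping, using the fasternet_t0 channel list (the function name is illustrative):

```python
from typing import List


def register_backbone_channels(backbone_channels: List[int], slots: int = 5) -> List[int]:
    """Left-pad the backbone channel list with zeros up to `slots` entries."""
    ch = list(backbone_channels)
    for _ in range(slots - len(ch)):
        ch.insert(0, 0)
    return ch


ch = register_backbone_channels([40, 80, 160, 320])

# Input widths seen by head layers whose 'from' field is -1, 3, or 2.
lookups = {f: ch[f] for f in (-1, 3, 2)}

# Layer indices are shifted by 4 once the backbone has been registered.
shifted = [i + 4 for i in range(4)]

print(ch, lookups, shifted)
```

These lookups match the Conv input widths (320, 160, 80) that appear in the printed model summary in Section 6.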


⑧ In ultralytics/nn/autobackend.py, replace the forward function of the AutoBackend class in full with the following:

    def forward(self, im, augment=False, visualize=False):
        """
        Runs inference on the YOLOv8 MultiBackend model.

        Args:
            im (torch.Tensor): The image tensor to perform inference on.
            augment (bool): whether to perform data augmentation during inference, defaults to False
            visualize (bool): whether to visualize the output predictions, defaults to False

        Returns:
            (tuple): Tuple containing the raw output tensor, and processed output for visualization (if visualize=True)
        """
        b, ch, h, w = im.shape  # batch, channel, height, width
        if self.fp16 and im.dtype != torch.float16:
            im = im.half()  # to FP16
        if self.nhwc:
            im = im.permute(0, 2, 3, 1)  # torch BCHW to numpy BHWC shape(1,320,192,3)

        if self.pt or self.nn_module:  # PyTorch
            y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)
        elif self.jit:  # TorchScript
            y = self.model(im)
        elif self.dnn:  # ONNX OpenCV DNN
            im = im.cpu().numpy()  # torch to numpy
            self.net.setInput(im)
            y = self.net.forward()
        elif self.onnx:  # ONNX Runtime
            im = im.cpu().numpy()  # torch to numpy
            y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im})
        elif self.xml:  # OpenVINO
            im = im.cpu().numpy()  # FP32
            y = list(self.ov_compiled_model(im).values())
        elif self.engine:  # TensorRT
            if self.dynamic and im.shape != self.bindings['images'].shape:
                i = self.model.get_binding_index('images')
                self.context.set_binding_shape(i, im.shape)  # reshape if dynamic
                self.bindings['images'] = self.bindings['images']._replace(shape=im.shape)
                for name in self.output_names:
                    i = self.model.get_binding_index(name)
                    self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))
            s = self.bindings['images'].shape
            assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
            self.binding_addrs['images'] = int(im.data_ptr())
            self.context.execute_v2(list(self.binding_addrs.values()))
            y = [self.bindings[x].data for x in sorted(self.output_names)]
        elif self.coreml:  # CoreML
            im = im[0].cpu().numpy()
            im_pil = Image.fromarray((im * 255).astype('uint8'))
            # im = im.resize((192, 320), Image.BILINEAR)
            y = self.model.predict({'image': im_pil})  # coordinates are xywh normalized
            if 'confidence' in y:
                raise TypeError('Ultralytics only supports inference of non-pipelined CoreML models exported with '
                                f"'nms=False', but 'model={w}' has an NMS pipeline created by an 'nms=True' export.")
                # TODO: CoreML NMS inference handling
                # from ultralytics.utils.ops import xywh2xyxy
                # box = xywh2xyxy(y['coordinates'] * [[w, h, w, h]])  # xyxy pixels
                # conf, cls = y['confidence'].max(1), y['confidence'].argmax(1).astype(np.float32)
                # y = np.concatenate((box, conf.reshape(-1, 1), cls.reshape(-1, 1)), 1)
            elif len(y) == 1:  # classification model
                y = list(y.values())
            elif len(y) == 2:  # segmentation model
                y = list(reversed(y.values()))  # reversed for segmentation models (pred, proto)
        elif self.paddle:  # PaddlePaddle
            im = im.cpu().numpy().astype(np.float32)
            self.input_handle.copy_from_cpu(im)
            self.predictor.run()
            y = [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
        elif self.ncnn:  # ncnn
            mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
            ex = self.net.create_extractor()
            input_names, output_names = self.net.input_names(), self.net.output_names()
            ex.input(input_names[0], mat_in)
            y = []
            for output_name in output_names:
                mat_out = self.pyncnn.Mat()
                ex.extract(output_name, mat_out)
                y.append(np.array(mat_out)[None])
        elif self.triton:  # NVIDIA Triton Inference Server
            im = im.cpu().numpy()  # torch to numpy
            y = self.model(im)
        else:  # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)
            im = im.cpu().numpy()
            if self.saved_model:  # SavedModel
                y = self.model(im, training=False) if self.keras else self.model(im)
                if not isinstance(y, list):
                    y = [y]
            elif self.pb:  # GraphDef
                y = self.frozen_func(x=self.tf.constant(im))
                if len(y) == 2 and len(self.names) == 999:  # segments and names not defined
                    ip, ib = (0, 1) if len(y[0].shape) == 4 else (1, 0)  # index of protos, boxes
                    nc = y[ib].shape[1] - y[ip].shape[3] - 4  # y = (1, 160, 160, 32), (1, 116, 8400)
                    self.names = {i: f'class{i}' for i in range(nc)}
            else:  # Lite or Edge TPU
                details = self.input_details[0]
                integer = details['dtype'] in (np.int8, np.int16)  # is TFLite quantized int8 or int16 model
                if integer:
                    scale, zero_point = details['quantization']
                    im = (im / scale + zero_point).astype(details['dtype'])  # de-scale
                self.interpreter.set_tensor(details['index'], im)
                self.interpreter.invoke()
                y = []
                for output in self.output_details:
                    x = self.interpreter.get_tensor(output['index'])
                    if integer:
                        scale, zero_point = output['quantization']
                        x = (x.astype(np.float32) - zero_point) * scale  # re-scale
                    if x.ndim > 2:  # if task is not classification
                        # Denormalize xywh by image size. See https://github.com/ultralytics/ultralytics/pull/1695
                        # xywh are normalized in TFLite/EdgeTPU to mitigate quantization error of integer models
                        x[:, [0, 2]] *= w
                        x[:, [1, 3]] *= h
                    y.append(x)
            # TF segment fixes: export is reversed vs ONNX export and protos are transposed
            if len(y) == 2:  # segment with (det, proto) output order reversed
                if len(y[1].shape) != 4:
                    y = list(reversed(y))  # should be y = (1, 116, 8400), (1, 160, 160, 32)
                y[1] = np.transpose(y[1], (0, 3, 1, 2))  # should be y = (1, 116, 8400), (1, 32, 160, 160)
            y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]

        # for x in y:
        #     print(type(x), len(x)) if isinstance(x, (list, tuple)) else print(type(x), x.shape)  # debug shapes
        if isinstance(y, (list, tuple)):
            return self.from_numpy(y[0]) if len(y) == 1 else [self.from_numpy(x) for x in y]
        else:
            return self.from_numpy(y)


4.4 Modification 4

Create a new folder named faster_cfg under ultralytics/nn/AddModules/.

Create six YAML files in this folder.

① Paste the following into fasternet_l.yaml:

mlp_ratio: 2
embed_dim: 192
depths: [3, 4, 18, 3]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.3
norm_layer:  BN
act_layer: RELU
n_div: 4

② Paste the following into fasternet_m.yaml:

mlp_ratio: 2
embed_dim: 144
depths: [3, 4, 18, 3]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.2
norm_layer:  BN
act_layer: RELU
n_div: 4

③ Paste the following into fasternet_s.yaml:

mlp_ratio: 2
embed_dim: 128
depths: [1, 2, 13, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.1
norm_layer:  BN
act_layer: RELU
n_div: 4

④ Paste the following into fasternet_t0.yaml:

mlp_ratio: 2
embed_dim: 40
depths: [1, 2, 8, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.
norm_layer:  BN
act_layer: GELU
n_div: 4

⑤ Paste the following into fasternet_t1.yaml:

mlp_ratio: 2
embed_dim: 64
depths: [1, 2, 8, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.02
norm_layer:  BN
act_layer: GELU
n_div: 4

⑥ Paste the following into fasternet_t2.yaml:

mlp_ratio: 2
embed_dim: 96
depths: [1, 2, 8, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.05
norm_layer:  BN
act_layer: RELU
n_div: 4
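
The six files differ only in embed_dim, depths, drop_path_rate, and act_layer. The sketch below copies embed_dim and depths from the YAML files above and derives the per-stage output channels and the total block count of each variant; the dictionary and function names are illustrative.

```python
from typing import Dict, List, Tuple

# embed_dim and depths copied from the six YAML files above.
VARIANTS: Dict[str, Dict[str, object]] = {
    "fasternet_t0": {"embed_dim": 40, "depths": [1, 2, 8, 2]},
    "fasternet_t1": {"embed_dim": 64, "depths": [1, 2, 8, 2]},
    "fasternet_t2": {"embed_dim": 96, "depths": [1, 2, 8, 2]},
    "fasternet_s": {"embed_dim": 128, "depths": [1, 2, 13, 2]},
    "fasternet_m": {"embed_dim": 144, "depths": [3, 4, 18, 3]},
    "fasternet_l": {"embed_dim": 192, "depths": [3, 4, 18, 3]},
}


def stage_channels(embed_dim: int, num_stages: int = 4) -> List[int]:
    """Channels double at every stage, starting from embed_dim."""
    return [embed_dim * 2 ** i for i in range(num_stages)]


summary: Dict[str, Tuple[List[int], int]] = {
    name: (stage_channels(cfg["embed_dim"]), sum(cfg["depths"]))
    for name, cfg in VARIANTS.items()
}
for name, (channels, blocks) in summary.items():
    print(f"{name}: channels={channels}, blocks={blocks}")
```

The last three entries of each channel list are the widths that the RT-DETR head receives, so switching variants changes the input width of the head's 1 x 1 projection layers.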

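4.5 Modification 5

In FasterNet.py, change the model configuration paths to your own; mine is ultralytics/nn/AddModules/faster_cfg.

There are six of them, and each one must be updated.

With that, the modifications are complete and the model can be configured for training.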

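5. YAML Model File

5.1 Model Modification ⭐

Once the code is in place, configure the model YAML file.

Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the example, create a model file named rtdetr-FasterNet.yaml in the same directory for training on your own dataset.

Copy the contents of rtdetr-l.yaml into rtdetr-FasterNet.yaml and set nc to the number of classes in your dataset.

📌 The modification replaces the backbone with fasternet_t0.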

# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, fasternet_t0, []]  # 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 5 input_proj.2
  - [-1, 1, AIFI, [1024, 8]] # 6
  - [-1, 1, Conv, [256, 1, 1]]  # 7, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 8
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 9 input_proj.1
  - [[-2, -1], 1, Concat, [1]] # 10
  - [-1, 3, RepC3, [256]]  # 11, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]]   # 12, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 13
  - [2, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 14 input_proj.0
  - [[-2, -1], 1, Concat, [1]]  # 15 cat backbone P4
  - [-1, 3, RepC3, [256]]    # X3 (16), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]]   # 17, downsample_convs.0
  - [[-1, 12], 1, Concat, [1]]  # 18 cat Y4
  - [-1, 3, RepC3, [256]]    # F4 (19), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]]   # 20, downsample_convs.1
  - [[-1, 7], 1, Concat, [1]]  # 21 cat Y5
  - [-1, 3, RepC3, [256]]    # F5 (22), pan_blocks.1

  - [[16, 19, 22], 1, RTDETRDecoder, [nc]]  # Detect(P3, P4, P5)
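
With the YAML in place, training can be started through the Ultralytics Python API. The following is a minimal sketch rather than the author's script: the dataset path is a hypothetical placeholder, and the epoch, image size, and batch values are arbitrary defaults chosen for illustration.

```python
def train_rtdetr_fasternet(
    model_yaml: str = "ultralytics/cfg/models/rt-detr/rtdetr-FasterNet.yaml",
    data_yaml: str = "path/to/your_dataset.yaml",  # hypothetical placeholder
    epochs: int = 100,
    imgsz: int = 640,
    batch: int = 4,
):
    """Build RT-DETR from the modified YAML and start training."""
    # Imported lazily so that defining this function does not by itself
    # require Ultralytics to be installed.
    from ultralytics import RTDETR

    model = RTDETR(model_yaml)  # the network is built through parse_model
    return model.train(data=data_yaml, epochs=epochs, imgsz=imgsz, batch=batch)
```

Calling train_rtdetr_fasternet(data_yaml=...) with a real dataset file runs from the repository root, so that the relative faster_cfg paths set in 4.5 resolve.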

6. Successful Run Results

Printing the network shows that the FasterNet module has been added to the model, which is now ready for training.

rtdetr-FasterNet:

rtdetr-FasterNet summary: 513 layers, 20,696,711 parameters, 20,696,711 gradients, 66.1 GFLOPs

                   from  n    params  module                                       arguments                     
  0                  -1  1   2216100  fasternet_t0                                 []                            
  1                  -1  1     82432  ultralytics.nn.modules.conv.Conv             [320, 256, 1, 1, None, 1, 1, False]
  2                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]                
  3                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
  4                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
  5                   3  1     41472  ultralytics.nn.modules.conv.Conv             [160, 256, 1, 1, None, 1, 1, False]
  6            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
  7                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
  8                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
  9                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 10                   2  1     20992  ultralytics.nn.modules.conv.Conv             [80, 256, 1, 1, None, 1, 1, False]
 11            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 13                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 14            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 16                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 17             [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 19        [16, 19, 22]  1   7303907  ultralytics.nn.modules.head.RTDETRDecoder    [1, [256, 256, 256]]          
rtdetr-FasterNet summary: 513 layers, 20,696,711 parameters, 20,696,711 gradients, 66.1 GFLOPs
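
As a quick sanity check on this output, the per-layer parameter counts can be summed and compared with the reported total. The list below is copied from the params column of the table above.

```python
# Values copied from the 'params' column of the printed model table.
layer_params = [
    2216100, 82432, 789760, 66048, 0, 41472, 0, 2232320, 66048, 0,
    20992, 0, 2232320, 590336, 0, 2232320, 590336, 0, 2232320, 7303907,
]

total = sum(layer_params)
backbone_share = layer_params[0] / total  # row 0 is the fasternet_t0 backbone

print(f"total parameters: {total:,}")
print(f"backbone share: {backbone_share:.1%}")
```

The sum agrees with the 20,696,711 parameters in the summary line, and the fasternet_t0 backbone accounts for only a small part of it; most parameters sit in the neck and the decoder.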
