RT-DETR Improvement Strategy [Model Lightweighting] | Replacing the Backbone with a Lightweight Network: ShuffleNet V1

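1. Introduction

This article describes a lightweight RT-DETR variant built on ShuffleNet V1. ShuffleNet uses pointwise group convolution and channel shuffle to cut computation cost, addressing the inefficiency of state-of-the-art architectures in very small networks, where dense 1×1 convolutions are expensive. Compared with conventional architectures, it makes better use of a limited compute budget. Four models from the original paper are configured here, shufflenet_v1_x0_5, shufflenet_v1_x1_0, shufflenet_v1_x1_5 and shufflenet_v1_x2_0, to cover different requirements.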

Model	Parameters	FLOPs	Inference time
rtdetr-l	32.8M	108.0 GFLOPs	11.6 ms
Improved	19.7M	62.0 GFLOPs	10.5 ms
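
As a quick sanity check on the table, the relative savings can be computed directly from its two rows. The helper name relative_reduction is illustrative and not part of any library.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Return the percentage reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline * 100.0


# Values copied from the table above.
metrics = {
    "parameters (M)": (32.8, 19.7),
    "GFLOPs": (108.0, 62.0),
    "inference time (ms)": (11.6, 10.5),
}

for name, (baseline, improved) in metrics.items():
    print(f"{name}: {relative_reduction(baseline, improved):.1f}% lower")
```

With these numbers, parameters and FLOPs each drop by roughly 40%, while the inference time drops by just under 10%.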

2. ShuffleNet V1: Lightweight Model Design

ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices

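2.1 Motivation

Limited compute on mobile devices: deep learning models running on mobile platforms (drones, robots, smartphones, and so on) must reach the best possible accuracy within a very small compute budget (for example 10-150 MFLOPs). Much existing work focuses on pruning, compressing, or low-bit representation of existing architectures; few base architectures are designed specifically for this compute range.
Limitations of existing architectures: state-of-the-art base architectures such as Xception and ResNeXt become less efficient in extremely small networks because their dense 1×1 convolutions are costly.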

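2.2 Principles

2.2.1 Pointwise Group Convolution

To reduce the computational complexity of 1×1 convolutions, the paper proposes pointwise group convolution.

In ResNeXt, for example, only the 3×3 layers use group convolution, so the 1×1 convolutions still account for most of the computation in each residual unit (93.4% of the multiply-adds).

Applying group convolution to the 1×1 layers as well restricts each convolution to its own group of input channels, which cuts the cost of that layer by a factor equal to the number of groups.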
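
To make the saving concrete, here is a small sketch that counts weights and multiply-adds for a 1×1 convolution with and without groups. The function name and the example sizes (240 channels, a 28×28 feature map, 3 groups) are assumptions chosen for illustration.

```python
def pointwise_conv_cost(c_in: int, c_out: int, h: int, w: int, groups: int = 1):
    """Return (weights, multiply-adds) of a bias-free 1x1 convolution.

    With `groups` groups, each output channel only sees c_in / groups inputs.
    """
    assert c_in % groups == 0 and c_out % groups == 0
    weights = (c_in // groups) * c_out
    mult_adds = weights * h * w
    return weights, mult_adds


dense = pointwise_conv_cost(240, 240, 28, 28, groups=1)
grouped = pointwise_conv_cost(240, 240, 28, 28, groups=3)
print(dense)    # weights and multiply-adds of the dense layer
print(grouped)  # the grouped layer needs one third of both
```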

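2.2.2 Channel Shuffle

Why it is needed:
When several group convolution layers are stacked, a side effect appears: each output channel is derived from only a small fraction of the input channels. This blocks information flow between channel groups and weakens representation. The channel shuffle operation addresses this.

How it works:
For a convolutional layer with g groups whose output has g×n channels, first reshape the output channel dimension into (g, n), then transpose it and flatten it back as the input of the next layer. The following group convolution then receives inputs from different groups, so input and output channels are fully related. The operation is differentiable, so it can be embedded in the network for end-to-end training.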
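
The reshape, transpose and flatten steps can be traced on a plain Python list standing in for the channel dimension. This is a minimal sketch of the permutation as described in the paper, not the tensor implementation; the function name and the labels are made up for the example.

```python
from typing import List, Sequence


def channel_shuffle(channels: Sequence, groups: int) -> List:
    """Shuffle a flat channel list: reshape to (g, n), transpose, flatten."""
    assert len(channels) % groups == 0
    n = len(channels) // groups
    # Reshape: row i holds the n channels produced by group i.
    rows = [list(channels[i * n:(i + 1) * n]) for i in range(groups)]
    # Transpose to (n, g) and flatten row by row.
    return [rows[i][j] for j in range(n) for i in range(groups)]


# Two groups of three channels; the letter marks the group a channel came from.
before = ["a0", "a1", "a2", "b0", "b1", "b2"]
after = channel_shuffle(before, groups=2)
print(after)
```

After the shuffle, every block of three consecutive channels contains channels from both groups, so the next group convolution mixes information across groups.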

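2.3 Structure

2.3.1 ShuffleNet Unit

The unit follows the bottleneck design. In the residual branch, the 3×3 layer is a computationally economical 3×3 depthwise convolution. The first 1×1 layer is replaced by a pointwise group convolution followed by a channel shuffle, and a second pointwise group convolution restores the channel dimension to match the shortcut path. When a stride is applied, a 3×3 average pooling is added on the shortcut path and the element-wise addition is replaced by channel concatenation.

Figure: ShuffleNet units. (a) Bottleneck unit with depthwise convolution (DWConv); (b) ShuffleNet unit with pointwise group convolution (GConv) and channel shuffle; (c) ShuffleNet unit with stride = 2.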
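
The channel bookkeeping of the unit can be sketched in a few lines. The helper below is hypothetical; it assumes the convention used by the implementation later in this article, where the bottleneck width is a quarter of the unit's output width and a stride-2 unit concatenates the pooled shortcut with the residual branch.

```python
def unit_channels(in_channels: int, out_channels: int, stride: int) -> dict:
    """Channel widths inside one ShuffleNet V1 unit (bottleneck ratio 1/4)."""
    hidden = out_channels // 4
    if stride == 2:
        # Concatenation: the residual branch only has to supply the channels
        # that the average-pooled shortcut does not already provide.
        branch_out = out_channels - in_channels
    else:
        # Element-wise addition: both paths must have the same width.
        assert in_channels == out_channels
        branch_out = out_channels
    return {"hidden": hidden, "branch_out": branch_out, "unit_out": out_channels}


print(unit_channels(16, 192, stride=2))   # first unit of stage 1 in the x0.5 model
print(unit_channels(192, 192, stride=1))  # a following stride-1 unit
```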

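2.3.2 Network Architecture

The network is a stack of ShuffleNet units grouped into three stages. The first building block of each stage uses stride = 2, the other hyperparameters stay the same within a stage, and the output channels double in the next stage. The number of groups g and the channel scale factor s control the connection sparsity and the complexity of the network.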
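
A short calculation shows which resolutions the three stages produce. The sketch assumes the stem used by the implementation in this article (a stride-2 3×3 convolution followed by a stride-2 3×3 max pooling, both with padding 1); the function names are illustrative.

```python
def conv_out_size(size: int, kernel: int = 3, stride: int = 2, padding: int = 1) -> int:
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1


def stage_resolutions(input_size: int, num_stages: int = 3) -> list:
    """Feature-map side length after each stage for a square input."""
    size = conv_out_size(input_size)  # stem 3x3 convolution, stride 2
    size = conv_out_size(size)        # 3x3 max pooling, stride 2
    sizes = []
    for _ in range(num_stages):
        size = conv_out_size(size)    # the first unit of each stage uses stride 2
        sizes.append(size)
    return sizes


print(stage_resolutions(640))
```

For a 640×640 input the stages therefore run at strides 8, 16 and 32, which are the three feature levels the RT-DETR head consumes.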

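2.4 Advantages

High computational efficiency
- For the same settings, a ShuffleNet unit is cheaper than a ResNet or ResNeXt unit. Given an input of size $c \times h \times w$ and $m$ bottleneck channels, the ResNet, ResNeXt and ShuffleNet units require $hw(2cm + 9m^{2})$, $hw(2cm + 9m^{2}/g)$ and $hw(2cm/g + 9m)$ FLOPs respectively, where $g$ is the number of groups.
Strong performance
- ShuffleNet performs well on ImageNet classification and MS COCO object detection. At a budget of 40 MFLOPs, its ImageNet top-1 error is 7.8% (absolute) lower than MobileNet's; on an ARM-based mobile device it achieves about 13× actual speedup over AlexNet at comparable accuracy.
Better use of a limited compute budget
- For a given budget, a small network can use wider feature maps, which matters because small networks usually have too few channels to process information. In addition, the depthwise convolution in a ShuffleNet unit is applied only to the bottleneck feature map, which limits its overhead on low-power mobile devices, where depthwise convolution is hard to implement efficiently.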
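
The three cost formulas are easy to evaluate for a concrete case. The sizes below (a 28×28 feature map, 240 channels, 60 bottleneck channels, 3 groups) are assumed purely for illustration; only the formulas come from the text.

```python
def unit_flops(h: int, w: int, c: int, m: int, g: int) -> dict:
    """Multiply-adds of one unit for a c x h x w input and m bottleneck channels."""
    return {
        "ResNet": h * w * (2 * c * m + 9 * m * m),
        "ResNeXt": h * w * (2 * c * m + 9 * m * m // g),
        "ShuffleNet": h * w * (2 * c * m // g + 9 * m),
    }


# Example sizes (assumed): 28x28 map, 240 channels, 60 bottleneck channels, 3 groups.
for name, flops in unit_flops(h=28, w=28, c=240, m=60, g=3).items():
    print(f"{name}: {flops / 1e6:.1f} M multiply-adds")
```

In this setting the ShuffleNet unit needs about one sixth of the ResNet unit's multiply-adds, which is the headroom that allows wider feature maps under the same budget.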

Paper: https://arxiv.org/pdf/1707.01083.pdf
Source code: https://github.com/Lornatang/ShuffleNetV1-PyTorch/blob/main/model.py

3. Implementation of the ShuffleNet V1 Module

The implementation of the ShuffleNetV1 module is as follows:

# Copyright 2022 Dakewe Biotech Corporation. All Rights Reserved.
# Licensed under the Apache License, Version 2.0 (the "License");
#   you may not use this file except in compliance with the License.
#   You may obtain a copy of the License at
#
#       http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
from typing import Any, List, Optional
 
import torch
from torch import Tensor
from torch import nn
 
__all__ = [
    "shufflenet_v1_x0_5", "shufflenet_v1_x1_0", "shufflenet_v1_x1_5", "shufflenet_v1_x2_0",
]

class ShuffleNetV1(nn.Module):
 
    def __init__(
            self,
            repeats_times: List[int],
            stages_out_channels: List[int],
            groups: int = 8,
            num_classes: int = 1000,
    ) -> None:
        super(ShuffleNetV1, self).__init__()
        in_channels = stages_out_channels[0]
 
        self.first_conv = nn.Sequential(
            nn.Conv2d(3, in_channels, (3, 3), (2, 2), (1, 1), bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(True),
        )
        self.maxpool = nn.MaxPool2d((3, 3), (2, 2), (1, 1))
 
        features = []
        for state_repeats_times_index in range(len(repeats_times)):
            out_channels = stages_out_channels[state_repeats_times_index + 1]
 
            for i in range(repeats_times[state_repeats_times_index]):
                stride = 2 if i == 0 else 1
                first_group = state_repeats_times_index == 0 and i == 0
                features.append(
                    ShuffleNetV1Unit(
                        in_channels,
                        out_channels,
                        stride,
                        groups,
                        first_group,
                    )
                )
                in_channels = out_channels
        self.features = nn.Sequential(*features)
 
        self.globalpool = nn.AvgPool2d((7, 7))
 
        self.classifier = nn.Sequential(
            nn.Linear(stages_out_channels[-1], num_classes, bias=False),
        )
 
        # Initialize neural network weights
        self._initialize_weights()
        self.index = stages_out_channels[-3:]
        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
 
    def forward(self, x: Tensor) -> List[Optional[Tensor]]:
        """Return four feature maps for the detector instead of class logits."""
        x = self.first_conv(x)
        x = self.maxpool(x)
        # Slot 0: output of the first unit; slots 1-3: last output of each stage,
        # matched by channel count against self.index (the three stage widths).
        results = [None, None, None, None]
        for index, model in enumerate(self.features):
            x = model(x)
            if index == 0:
                results[index] = x
            if x.size(1) in self.index:
                position = self.index.index(x.size(1))  # stage number from channel width
                results[position + 1] = x
        return results
 
    def _initialize_weights(self) -> None:
        for name, module in self.named_modules():
            if isinstance(module, nn.Conv2d):
                if 'first' in name:
                    nn.init.normal_(module.weight, 0, 0.01)
                else:
                    nn.init.normal_(module.weight, 0, 1.0 / module.weight.shape[1])
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0)
            elif isinstance(module, nn.BatchNorm2d):
                nn.init.constant_(module.weight, 1)
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0.0001)
                nn.init.constant_(module.running_mean, 0)
            elif isinstance(module, nn.BatchNorm1d):
                nn.init.constant_(module.weight, 1)
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0.0001)
                nn.init.constant_(module.running_mean, 0)
            elif isinstance(module, nn.Linear):
                nn.init.normal_(module.weight, 0, 0.01)
                if module.bias is not None:
                    nn.init.constant_(module.bias, 0)

class ShuffleNetV1Unit(nn.Module):
    def __init__(
            self,
            in_channels: int,
            out_channels: int,
            stride: int,
            groups: int,
            first_groups: bool = False,
    ) -> None:
        super(ShuffleNetV1Unit, self).__init__()
        self.stride = stride
        self.groups = groups
        self.first_groups = first_groups
        hidden_channels = out_channels // 4
 
        if stride == 2:
            out_channels -= in_channels
            self.branch_proj = nn.AvgPool2d((3, 3), (2, 2), (1, 1))
 
        self.branch_main_1 = nn.Sequential(
            # pw
            nn.Conv2d(in_channels, hidden_channels, (1, 1), (1, 1), (0, 0), groups=1 if first_groups else groups,
                      bias=False),
            nn.BatchNorm2d(hidden_channels),
            nn.ReLU(True),
            # dw
            nn.Conv2d(hidden_channels, hidden_channels, (3, 3), (stride, stride), (1, 1), groups=hidden_channels,
                      bias=False),
            nn.BatchNorm2d(hidden_channels),
        )
        self.branch_main_2 = nn.Sequential(
            # pw-linear
            nn.Conv2d(hidden_channels, out_channels, (1, 1), (1, 1), (0, 0), groups=groups, bias=False),
            nn.BatchNorm2d(out_channels),
        )
 
        self.relu = nn.ReLU(True)
 
    def channel_shuffle(self, x):
        # Permute channels so the next grouped convolution sees inputs from every group.
        batch_size, channels, height, width = x.size()
        assert channels % self.groups == 0
        group_channels = channels // self.groups

        out = x.reshape(batch_size, group_channels, self.groups, height, width)
        out = out.permute(0, 2, 1, 3, 4)
        out = out.reshape(batch_size, channels, height, width)

        return out
 
    def forward(self, x: Tensor) -> Tensor:
        identity = x

        out = self.branch_main_1(x)
        out = self.channel_shuffle(out)
        out = self.branch_main_2(out)

        if self.stride == 2:
            # Downsampling unit: concatenate the pooled shortcut with the residual branch.
            branch_proj = self.branch_proj(x)
            out = self.relu(out)
            out = torch.cat([branch_proj, out], 1)
            return out
        else:
            # Regular unit: residual addition followed by ReLU.
            out = torch.add(out, identity)
            out = self.relu(out)
            return out

def shufflenet_v1_x0_5(**kwargs: Any) -> ShuffleNetV1:
    model = ShuffleNetV1([4, 8, 4], [16, 192, 384, 768], 8, **kwargs)
 
    return model

def shufflenet_v1_x1_0(**kwargs: Any) -> ShuffleNetV1:
    model = ShuffleNetV1([4, 8, 4], [24, 384, 768, 1536], 8, **kwargs)
 
    return model

def shufflenet_v1_x1_5(**kwargs: Any) -> ShuffleNetV1:
    model = ShuffleNetV1([4, 8, 4], [24, 576, 1152, 2304], 8, **kwargs)
 
    return model

def shufflenet_v1_x2_0(**kwargs: Any) -> ShuffleNetV1:
    model = ShuffleNetV1([4, 8, 4], [48, 768, 1536, 3072], 8, **kwargs)
 
    return model

if __name__ == "__main__":
 
    # Generating Sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)
 
    # Model
    model = shufflenet_v1_x0_5()
 
    out = model(image)
    print(out)
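
Because the backbone's forward pass returns a list of feature maps rather than logits, it helps to see which unit fills which slot. The following sketch replays only the channel bookkeeping of ShuffleNetV1.forward without running any convolutions; the function name backbone_widths is an assumption made for this example.

```python
def backbone_widths(repeats_times, stages_out_channels):
    """Replay the slot-filling logic of ShuffleNetV1.forward on channel counts."""
    index = stages_out_channels[-3:]  # widths of the three stages
    results = [None, None, None, None]
    unit = 0
    for stage, repeats in enumerate(repeats_times):
        width = stages_out_channels[stage + 1]
        for _ in range(repeats):
            if unit == 0:
                results[0] = width  # output of the very first unit
            if width in index:
                # Later units of the same stage overwrite the slot, so the
                # last unit of each stage is the one that is kept.
                results[index.index(width) + 1] = width
            unit += 1
    return results


print(backbone_widths([4, 8, 4], [16, 192, 384, 768]))   # shufflenet_v1_x0_5
print(backbone_widths([4, 8, 4], [24, 384, 768, 1536]))  # shufflenet_v1_x1_0
```

The resulting list corresponds to what the model stores in width_list; its last three entries are the channel counts the detection head is built against.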

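4. Modification Steps

4.1 Modification 1

① Create an AddModules folder under ultralytics/nn/ to hold the module code.

② In the AddModules folder, create ShuffleNetV1.py and paste the code from Section 3 into it.

4.2 Modification 2

In the AddModules folder, create __init__.py (skip this if it already exists) and import the module in it: from .ShuffleNetV1 import *

4.3 Modification 3

The new module classes must be registered in ultralytics/nn/tasks.py.

① First, import the modules from the AddModules package at the top of tasks.py.

② In the predict function of the BaseModel class, remove the embed argument in both places where it appears.

③ In the BaseModel class, replace the _predict_once function with the following code: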

    def _predict_once(self, x, profile=False, visualize=False):
        """
        Perform a forward pass through the network.

        Args:
            x (torch.Tensor): The input tensor to the model.
            profile (bool):  Print the computation time of each layer if True, defaults to False.
            visualize (bool): Save the feature maps of the model if True, defaults to False.

        Returns:
            (torch.Tensor): The last output of the model.
        """
        y, dt = [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x

④ In the RTDETRDetectionModel class, replace the predict function in full:

    def predict(self, x, profile=False, visualize=False, batch=None, augment=False):
        """
        Perform a forward pass through the model.

        Args:
            x (torch.Tensor): The input tensor.
            profile (bool, optional): If True, profile the computation time for each layer. Defaults to False.
            visualize (bool, optional): If True, save feature maps for visualization. Defaults to False.
            batch (dict, optional): Ground truth data for evaluation. Defaults to None.
            augment (bool, optional): If True, perform data augmentation during inference. Defaults to False.

        Returns:
            (torch.Tensor): Model's output tensor.
        """
        y, dt = [], []  # outputs
        for m in self.model[:-1]:  # except the head part
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                for _ in range(5 - len(x)):
                    x.insert(0, None)
                for i_idx, i in enumerate(x):
                    if i_idx in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                # for i in x:
                #     if i is not None:
                #         print(i.size())
                x = x[-1]
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        head = self.model[-1]
        x = head([y[j] for j in head.f], batch)  # head inference
        return x
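
The backbone branch of predict is the least obvious part: the multi-output backbone is expanded into five cache slots so that the layer indices used in the YAML still line up. The sketch below mimics that step with placeholder strings instead of tensors; the function name and the sample save set are assumptions for illustration.

```python
def scatter_backbone_outputs(outputs, save):
    """Mimic how predict() stores a multi-output backbone in the cache `y`."""
    outputs = list(outputs)
    # Left-pad to five entries so the last feature map sits at layer index 4.
    for _ in range(5 - len(outputs)):
        outputs.insert(0, None)
    # Keep only the entries that a later layer reads (those listed in `save`).
    y = [out if idx in save else None for idx, out in enumerate(outputs)]
    return y, outputs[-1]  # the cache, and the tensor passed on to the next layer


# Placeholder strings stand in for the four tensors returned by the backbone.
features = ["unit0", "stage1", "stage2", "stage3"]
y, x = scatter_backbone_outputs(features, save={2, 3, 7, 8})
print(y)
print(x)
```

Only the maps at indices 2 and 3 are cached here, because the head reads them by index later; the deepest map is handed directly to the next layer.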

⑤ In the parse_model function, replace the corresponding code with the following:

    if verbose:
        LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10}  {'module':<45}{'arguments':<30}")
    ch = [ch]
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    is_backbone = False
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        try:
            if m == 'node_mode':
                m = d[m]
                if len(args) > 0:
                    if args[0] == 'head_channel':
                        args[0] = int(d[args[0]])
            t = m
            m = getattr(torch.nn, m[3:]) if 'nn.' in m else globals()[m]  # get module
        except:
            pass
        for j, a in enumerate(args):
            if isinstance(a, str):
                with contextlib.suppress(ValueError):
                    try:
                        args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
                    except:
                        args[j] = a

⑥ In the parse_model function, add the following branch to the if/elif chain that dispatches on the module type m:

elif m in {
           shufflenet_v1_x0_5, shufflenet_v1_x1_0, shufflenet_v1_x1_5, shufflenet_v1_x2_0,
           }:
    m = m(*args)
    c2 = m.width_list

⑦ In the parse_model function, replace the corresponding code with the following:

        if isinstance(c2, list):
            is_backbone = True
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            t = str(m)[8:-2].replace('__main__.', '')  # module type
        
        m_.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f}  {t:<45}{str(args):<30}')  # print
        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            for _ in range(5 - len(ch)):
                ch.insert(0, 0)
        else:
            ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
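
The i + 4 offset in this block shifts every layer index so that the single backbone entry occupies indices 0 to 4. To see the effect, the sketch below recomputes the savelist from the "from" column of the YAML in Section 5. It is a simplified stand-in that assumes the first entry is the multi-output backbone; the function name is hypothetical.

```python
def build_savelist(froms, backbone_slots: int = 5):
    """Recompute the savelist for a model whose first entry is a multi-output backbone."""
    offset = backbone_slots - 1  # the backbone fills indices 0..4, so layers shift by 4
    save = []
    for i, f in enumerate(froms):
        layer_index = i + offset
        sources = [f] if isinstance(f, int) else f
        # Negative references are relative to the current layer; -1 needs no caching.
        save.extend(x % layer_index for x in sources if x != -1)
    return sorted(save)


# "from" column of rtdetr-ShuffleNetV1.yaml: one backbone entry, then 19 head entries.
froms = [-1,
         -1, -1, -1, -1, 3, [-2, -1], -1, -1,
         -1, 2, [-2, -1], -1,
         -1, [-1, 12], -1,
         -1, [-1, 7], -1,
         [16, 19, 22]]
print(build_savelist(froms))
```

Indices 2 and 3 in the result are the two shallower backbone feature maps, which is why the head can refer to them directly.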

⑧ In ultralytics/nn/autobackend.py, replace the forward function of the AutoBackend class in full with the following code:

    def forward(self, im, augment=False, visualize=False):
        """
        Runs inference on the YOLOv8 MultiBackend model.

        Args:
            im (torch.Tensor): The image tensor to perform inference on.
            augment (bool): whether to perform data augmentation during inference, defaults to False
            visualize (bool): whether to visualize the output predictions, defaults to False

        Returns:
            (tuple): Tuple containing the raw output tensor, and processed output for visualization (if visualize=True)
        """
        b, ch, h, w = im.shape  # batch, channel, height, width
        if self.fp16 and im.dtype != torch.float16:
            im = im.half()  # to FP16
        if self.nhwc:
            im = im.permute(0, 2, 3, 1)  # torch BCHW to numpy BHWC shape(1,320,192,3)

        if self.pt or self.nn_module:  # PyTorch
            y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)
        elif self.jit:  # TorchScript
            y = self.model(im)
        elif self.dnn:  # ONNX OpenCV DNN
            im = im.cpu().numpy()  # torch to numpy
            self.net.setInput(im)
            y = self.net.forward()
        elif self.onnx:  # ONNX Runtime
            im = im.cpu().numpy()  # torch to numpy
            y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im})
        elif self.xml:  # OpenVINO
            im = im.cpu().numpy()  # FP32
            y = list(self.ov_compiled_model(im).values())
        elif self.engine:  # TensorRT
            if self.dynamic and im.shape != self.bindings['images'].shape:
                i = self.model.get_binding_index('images')
                self.context.set_binding_shape(i, im.shape)  # reshape if dynamic
                self.bindings['images'] = self.bindings['images']._replace(shape=im.shape)
                for name in self.output_names:
                    i = self.model.get_binding_index(name)
                    self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))
            s = self.bindings['images'].shape
            assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
            self.binding_addrs['images'] = int(im.data_ptr())
            self.context.execute_v2(list(self.binding_addrs.values()))
            y = [self.bindings[x].data for x in sorted(self.output_names)]
        elif self.coreml:  # CoreML
            im = im[0].cpu().numpy()
            im_pil = Image.fromarray((im * 255).astype('uint8'))
            # im = im.resize((192, 320), Image.BILINEAR)
            y = self.model.predict({'image': im_pil})  # coordinates are xywh normalized
            if 'confidence' in y:
                raise TypeError('Ultralytics only supports inference of non-pipelined CoreML models exported with '
                                f"'nms=False', but 'model={w}' has an NMS pipeline created by an 'nms=True' export.")
                # TODO: CoreML NMS inference handling
                # from ultralytics.utils.ops import xywh2xyxy
                # box = xywh2xyxy(y['coordinates'] * [[w, h, w, h]])  # xyxy pixels
                # conf, cls = y['confidence'].max(1), y['confidence'].argmax(1).astype(np.float32)
                # y = np.concatenate((box, conf.reshape(-1, 1), cls.reshape(-1, 1)), 1)
            elif len(y) == 1:  # classification model
                y = list(y.values())
            elif len(y) == 2:  # segmentation model
                y = list(reversed(y.values()))  # reversed for segmentation models (pred, proto)
        elif self.paddle:  # PaddlePaddle
            im = im.cpu().numpy().astype(np.float32)
            self.input_handle.copy_from_cpu(im)
            self.predictor.run()
            y = [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
        elif self.ncnn:  # ncnn
            mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
            ex = self.net.create_extractor()
            input_names, output_names = self.net.input_names(), self.net.output_names()
            ex.input(input_names[0], mat_in)
            y = []
            for output_name in output_names:
                mat_out = self.pyncnn.Mat()
                ex.extract(output_name, mat_out)
                y.append(np.array(mat_out)[None])
        elif self.triton:  # NVIDIA Triton Inference Server
            im = im.cpu().numpy()  # torch to numpy
            y = self.model(im)
        else:  # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)
            im = im.cpu().numpy()
            if self.saved_model:  # SavedModel
                y = self.model(im, training=False) if self.keras else self.model(im)
                if not isinstance(y, list):
                    y = [y]
            elif self.pb:  # GraphDef
                y = self.frozen_func(x=self.tf.constant(im))
                if len(y) == 2 and len(self.names) == 999:  # segments and names not defined
                    ip, ib = (0, 1) if len(y[0].shape) == 4 else (1, 0)  # index of protos, boxes
                    nc = y[ib].shape[1] - y[ip].shape[3] - 4  # y = (1, 160, 160, 32), (1, 116, 8400)
                    self.names = {i: f'class{i}' for i in range(nc)}
            else:  # Lite or Edge TPU
                details = self.input_details[0]
                integer = details['dtype'] in (np.int8, np.int16)  # is TFLite quantized int8 or int16 model
                if integer:
                    scale, zero_point = details['quantization']
                    im = (im / scale + zero_point).astype(details['dtype'])  # de-scale
                self.interpreter.set_tensor(details['index'], im)
                self.interpreter.invoke()
                y = []
                for output in self.output_details:
                    x = self.interpreter.get_tensor(output['index'])
                    if integer:
                        scale, zero_point = output['quantization']
                        x = (x.astype(np.float32) - zero_point) * scale  # re-scale
                    if x.ndim > 2:  # if task is not classification
                        # Denormalize xywh by image size. See https://github.com/ultralytics/ultralytics/pull/1695
                        # xywh are normalized in TFLite/EdgeTPU to mitigate quantization error of integer models
                        x[:, [0, 2]] *= w
                        x[:, [1, 3]] *= h
                    y.append(x)
            # TF segment fixes: export is reversed vs ONNX export and protos are transposed
            if len(y) == 2:  # segment with (det, proto) output order reversed
                if len(y[1].shape) != 4:
                    y = list(reversed(y))  # should be y = (1, 116, 8400), (1, 160, 160, 32)
                y[1] = np.transpose(y[1], (0, 3, 1, 2))  # should be y = (1, 116, 8400), (1, 32, 160, 160)
            y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]

        # for x in y:
        #     print(type(x), len(x)) if isinstance(x, (list, tuple)) else print(type(x), x.shape)  # debug shapes
        if isinstance(y, (list, tuple)):
            return self.from_numpy(y[0]) if len(y) == 1 else [self.from_numpy(x) for x in y]
        else:
            return self.from_numpy(y)

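The modifications are now complete; the model can be configured and trained.

5. YAML Model File

5.1 Model Modification ⭐

Once the code is in place, configure the model's YAML file.

Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the example, create a model file named rtdetr-ShuffleNetV1.yaml in the same directory for training on your own dataset.

Copy the contents of rtdetr-l.yaml into rtdetr-ShuffleNetV1.yaml and set nc to the number of object classes in your dataset.

📌 The model change consists of replacing the backbone with shufflenet_v1_x0_5.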

# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, shufflenet_v1_x0_5, []]  # 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 5 input_proj.2
  - [-1, 1, AIFI, [1024, 8]] # 6
  - [-1, 1, Conv, [256, 1, 1]]  # 7, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 8
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 9 input_proj.1
  - [[-2, -1], 1, Concat, [1]] # 10
  - [-1, 3, RepC3, [256]]  # 11, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]]   # 12, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 13
  - [2, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 14 input_proj.0
  - [[-2, -1], 1, Concat, [1]]  # 15 cat backbone P4
  - [-1, 3, RepC3, [256]]    # X3 (16), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]]   # 17, downsample_convs.0
  - [[-1, 12], 1, Concat, [1]]  # 18 cat Y4
  - [-1, 3, RepC3, [256]]    # F4 (19), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]]   # 20, downsample_convs.1
  - [[-1, 7], 1, Concat, [1]]  # 21 cat Y5
  - [-1, 3, RepC3, [256]]    # F5 (22), pan_blocks.1

  - [[16, 19, 22], 1, RTDETRDecoder, [nc]]  # Detect(P3, P4, P5)

6. Successful Run Output

Printing the network shows that the ShuffleNetV1 module has been added to the model, which is now ready for training.

rtdetr-ShuffleNetV1:

rtdetr-ShuffleNetV1 summary: 559 layers, 19,691,763 parameters, 19,691,763 gradients, 62.0 GFLOPs

                   from  n    params  module                                       arguments                     
  0                  -1  1   1010448  shufflenet_v1_x0_5                           []                            
  1                  -1  1    197120  ultralytics.nn.modules.conv.Conv             [768, 256, 1, 1, None, 1, 1, False]
  2                  -1  1    789760  ultralytics.nn.modules.transformer.AIFI      [256, 1024, 8]                
  3                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
  4                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
  5                   3  1     98816  ultralytics.nn.modules.conv.Conv             [384, 256, 1, 1, None, 1, 1, False]
  6            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
  7                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
  8                  -1  1     66048  ultralytics.nn.modules.conv.Conv             [256, 256, 1, 1]              
  9                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']          
 10                   2  1     49664  ultralytics.nn.modules.conv.Conv             [192, 256, 1, 1, None, 1, 1, False]
 11            [-2, -1]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 12                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 13                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 14            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 15                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 16                  -1  1    590336  ultralytics.nn.modules.conv.Conv             [256, 256, 3, 2]              
 17             [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]                           
 18                  -1  3   2232320  ultralytics.nn.modules.block.RepC3           [512, 256, 3]                 
 19        [16, 19, 22]  1   7303907  ultralytics.nn.modules.head.RTDETRDecoder    [1, [256, 256, 256]]          
rtdetr-ShuffleNetV1 summary: 559 layers, 19,691,763 parameters, 19,691,763 gradients, 62.0 GFLOPs
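
The params column can be cross-checked by hand for the Conv rows. The sketch assumes each Conv entry is a bias-free convolution followed by batch normalization (one weight and one bias per output channel); the helper name is illustrative.

```python
def conv_bn_params(c_in: int, c_out: int, kernel: int = 1) -> int:
    """Parameters of a bias-free convolution followed by BatchNorm (weight + bias)."""
    return c_in * c_out * kernel * kernel + 2 * c_out


print(conv_bn_params(768, 256))     # projection of the 768-channel backbone output
print(conv_bn_params(384, 256))     # projection of the 384-channel output
print(conv_bn_params(192, 256))     # projection of the 192-channel output
print(conv_bn_params(256, 256, 3))  # 3x3 downsampling convolution
```

These values agree with the corresponding rows of the log, which confirms that the head sees 192, 384 and 768 input channels from the shufflenet_v1_x0_5 backbone.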
