YOLOv5 Improvement Series (5): Replacing the Backbone with MobileNetV3

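🚀 1. How MobileNetV3 Works

1.1 Introduction to MobileNetV3

MobileNetV3 is a lightweight network architecture introduced by Google in 2019. Building on the two earlier versions, it adds neural architecture search (NAS) and the h-swish activation function, and brings in the SE channel-attention mechanism. It does well on both accuracy and speed, and has been widely adopted in academia and industry.

To borrow a popular summary: MobileNetV3 = MobileNetV2 + SE blocks + hard-swish activation + tweaks to the head and tail of the network.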

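Summary of MobileNetV1, MobileNetV2 and MobileNetV3

MobileNetV1
(1) Replaces standard convolutions with depthwise separable convolutions to reduce computation.
(2) Uses ReLU6 in place of ReLU.
(3) Introduces the Width Multiplier (α) and the Resolution Multiplier (ρ) to scale the model width (number of convolution kernels) and the input image resolution.

MobileNetV2
(1) Linear bottleneck: the ReLU after the 1×1 convolution of the depthwise separable convolution is replaced with a linear activation.
(2) Inverted residual structure: an expansion layer uses a 1×1 convolution to raise the channel count before the depthwise convolution.
(3) Shortcut: a skip connection runs from before the expanding 1×1 convolution to after the 1×1 convolution of the depthwise separable convolution.

MobileNetV3
(1) Uses a bottleneck block with an added SE mechanism.
(2) Uses a new activation function, h-swish(x), in place of the ReLU6 used in MobileNetV2 (in the deeper layers of the network).
(3) The architecture search combines two techniques: resource-constrained, platform-aware NAS and NetAdapt.
(4) Redesigns the final stage at the end of the MobileNetV2 network.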
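To make the "reduces computation" point concrete, here is a minimal sketch that counts multiply-adds for a standard convolution and for its depthwise separable replacement. The function names and the example layer size are our own choices for illustration; the formulas are the usual textbook counts and ignore bias and BatchNorm.

```python
def standard_conv_madds(k, c_in, c_out, h, w):
    """Multiply-adds of a k x k standard convolution producing an h x w output map."""
    return k * k * c_in * c_out * h * w


def depthwise_separable_madds(k, c_in, c_out, h, w):
    """Multiply-adds of a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in * h * w   # one k x k filter per input channel
    pointwise = c_in * c_out * h * w   # the 1 x 1 conv mixes channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 3x3 kernel, 64 -> 128 channels, 80x80 output map.
    k, c_in, c_out, h, w = 3, 64, 128, 80, 80
    std = standard_conv_madds(k, c_in, c_out, h, w)
    sep = depthwise_separable_madds(k, c_in, c_out, h, w)
    print(f"standard:  {std:,}")
    print(f"separable: {sep:,}")
    # The ratio simplifies to 1/c_out + 1/k^2.
    print(f"ratio:     {sep / std:.4f}")
    print(f"formula:   {1 / c_out + 1 / k ** 2:.4f}")
```

For a 3×3 kernel the separable version costs a little more than one ninth of the standard convolution, which is where most of the MobileNet savings come from.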

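1.2 Techniques Used in MobileNetV3

(1) Depthwise separable convolutions, carried over from MobileNetV1
(2) The inverted residual structure with linear bottlenecks, carried over from MobileNetV2
(3) A lightweight attention module (SE) based on the squeeze-and-excitation structure
(4) A new activation function, h-swish(x)
(5) An architecture search that combines two techniques: resource-constrained, platform-aware NAS and NetAdapt
(6) A redesigned final stage at the end of the MobileNetV2 network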
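Item (4) is easy to see with plain numbers. The sketch below implements h-sigmoid and h-swish on scalars, following the definition h-swish(x) = x · ReLU6(x + 3) / 6, and prints them next to the ordinary swish for comparison. It uses only the standard library; the scalar helper functions are ours and are not the PyTorch modules used later in this post.

```python
import math


def relu6(x):
    """Clamp x to the range [0, 6]."""
    return min(max(x, 0.0), 6.0)


def h_sigmoid(x):
    """Piecewise-linear approximation of the sigmoid: ReLU6(x + 3) / 6."""
    return relu6(x + 3.0) / 6.0


def h_swish(x):
    """Hard swish: x * h_sigmoid(x)."""
    return x * h_sigmoid(x)


def swish(x):
    """Ordinary swish, x * sigmoid(x), for comparison."""
    return x / (1.0 + math.exp(-x))


if __name__ == "__main__":
    for x in (-4.0, -3.0, -1.0, 0.0, 1.0, 3.0, 4.0):
        print(f"x={x:+.1f}  h_swish={h_swish(x):+.4f}  swish={swish(x):+.4f}")
```

h-swish is exactly zero for x ≤ -3 and exactly x for x ≥ 3, and it needs no exponential, which is why it is cheaper on mobile hardware.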

For more background, see the link above.

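🚀 2. Combining YOLOv5 with MobileNetV3_small

2.1 Order of Changes

When we covered adding attention mechanisms earlier in this series, we went through the order in which to modify the network. Replacing the backbone follows much the same order.
(1) models/common.py --> add the new network modules

(2) models/yolo.py --> set up how arguments are passed to the new modules by adding the MobileNetV3 class name to parse_model. (When a new custom module takes input and output channel arguments, the output channel count is scaled with the width multiplier gw.)
(3) models/yolov5*.yaml --> modify the existing model configuration file

When a new layer is introduced, update the from arguments of the layers that come after it.
When only the backbone is replaced, keep track of where the feature maps are downsampled to /8, /16 and /32.

(4) train.py --> change the default of '--cfg' so that training uses the new model configuration file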

2.2 Step-by-Step Instructions

Step ①: add the MobileNetV3 modules to common.py

Copy the following code and paste it at the end of common.py.

# Mobilenetv3Small
# ------ MobileNetV3 ------
# common.py already imports torch.nn as nn; the import is repeated here so the
# snippet also runs on its own.
import torch.nn as nn


class h_sigmoid(nn.Module):
    """Hard sigmoid: ReLU6(x + 3) / 6."""

    def __init__(self, inplace=True):
        super().__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


class h_swish(nn.Module):
    """Hard swish: x * h_sigmoid(x)."""

    def __init__(self, inplace=True):
        super().__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)


class SELayer(nn.Module):
    """Squeeze-and-excitation channel attention."""

    def __init__(self, channel, reduction=4):
        super().__init__()
        # Squeeze: global average pooling down to one value per channel
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: FC + ReLU + FC + h-sigmoid
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel),
            h_sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x)
        y = y.view(b, c)
        y = self.fc(y).view(b, c, 1, 1)  # learned weight for each channel
        return x * y


class conv_bn_hswish(nn.Module):
    """
    3x3 conv + BatchNorm + h-swish stem. Equivalent to:
    def conv_3x3_bn(inp, oup, stride):
        return nn.Sequential(
            nn.Conv2d(inp, oup, 3, stride, 1, bias=False),
            nn.BatchNorm2d(oup),
            h_swish()
        )
    """

    def __init__(self, c1, c2, stride):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, 3, stride, 1, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = h_swish()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        # Forward pass for use after BatchNorm has been folded into the conv
        return self.act(self.conv(x))


class MobileNetV3(nn.Module):
    """One MobileNetV3 inverted residual block."""

    def __init__(self, inp, oup, hidden_dim, kernel_size, stride, use_se, use_hs):
        super().__init__()
        assert stride in [1, 2]
        # The residual shortcut is only used when the block keeps shape
        self.identity = stride == 1 and inp == oup
        # If the input channels already equal the expanded channels, skip the expansion
        if inp == hidden_dim:
            self.conv = nn.Sequential(
                # dw: depthwise conv, "same" padding for odd kernel sizes
                nn.Conv2d(hidden_dim, hidden_dim, kernel_size, stride, (kernel_size - 1) // 2, groups=hidden_dim,
                          bias=False),
                nn.BatchNorm2d(hidden_dim),
                h_swish() if use_hs else nn.ReLU(inplace=True),
                # Squeeze-and-Excite
                SELayer(hidden_dim) if use_se else nn.Sequential(),
                # pw-linear: 1x1 projection with no activation
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )
        else:
            # Otherwise expand the channels first
            self.conv = nn.Sequential(
                # pw: 1x1 expansion
                nn.Conv2d(inp, hidden_dim, 1, 1, 0, bias=False),
                nn.BatchNorm2d(hidden_dim),
                h_swish() if use_hs else nn.ReLU(inplace=True),
                # dw: depthwise conv, "same" padding for odd kernel sizes
                nn.Conv2d(hidden_dim, hidden_dim, kernel_size, stride, (kernel_size - 1) // 2, groups=hidden_dim,
                          bias=False),
                nn.BatchNorm2d(hidden_dim),
                # Squeeze-and-Excite
                SELayer(hidden_dim) if use_se else nn.Sequential(),
                h_swish() if use_hs else nn.ReLU(inplace=True),
                # pw-linear: 1x1 projection with no activation
                nn.Conv2d(hidden_dim, oup, 1, 1, 0, bias=False),
                nn.BatchNorm2d(oup),
            )

    def forward(self, x):
        y = self.conv(x)
        if self.identity:
            return x + y
        else:
            return y
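As a sanity check on the block above, the sketch below counts its learnable parameters by hand from the layer definitions: bias-free convolutions, BatchNorm scale and shift, and the two biased Linear layers inside SELayer. The helper names are ours. If the block was pasted correctly, these totals should line up with the per-layer parameter counts that yolo.py prints, assuming that column counts learnable parameters only.

```python
def bn_params(channels):
    """BatchNorm2d has one learnable scale and one shift per channel."""
    return 2 * channels


def se_params(channels, reduction=4):
    """Two Linear layers with bias: channels -> channels // reduction -> channels."""
    squeezed = channels // reduction
    fc1 = channels * squeezed + squeezed
    fc2 = squeezed * channels + channels
    return fc1 + fc2


def mobilenetv3_block_params(inp, oup, hidden_dim, kernel_size, use_se):
    """Learnable parameters of one MobileNetV3 block as defined in this post."""
    total = 0
    if inp != hidden_dim:
        # 1x1 expansion conv (no bias) + BatchNorm
        total += inp * hidden_dim + bn_params(hidden_dim)
    # depthwise conv: one k x k filter per channel (no bias) + BatchNorm
    total += kernel_size * kernel_size * hidden_dim + bn_params(hidden_dim)
    if use_se:
        total += se_params(hidden_dim)
    # 1x1 linear projection (no bias) + BatchNorm
    total += hidden_dim * oup + bn_params(oup)
    return total


if __name__ == "__main__":
    # Two blocks from the small config: (inp, oup, hidden_dim, kernel_size, use_se)
    print(mobilenetv3_block_params(16, 16, 16, 3, True))
    print(mobilenetv3_block_params(16, 24, 72, 3, False))
```

Stride and the choice of activation do not appear in the function because neither adds parameters.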


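Step ②: add the class names to the parse_model function in yolo.py

First, find the line in the parse_model function in yolo.py that lists the modules whose channel arguments are handled automatically (it starts with "if m in").

Add the five modules h_sigmoid, h_swish, SELayer, conv_bn_hswish and MobileNetV3 to that list. Only conv_bn_hswish and MobileNetV3 are referenced by name in the yaml file, so those are the two entries parse_model actually relies on; the other three are only built inside them.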
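Why the class name has to be in that list: for listed modules, parse_model prepends the previous layer's channel count to the yaml arguments and scales the first argument (the output channels) by the width multiplier. The sketch below imitates that behavior as we understand it from YOLOv5 v6.x. The function rewrite_args and its parameter names are hypothetical stand-ins, not code from the repository, so check them against your own checkout.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def rewrite_args(prev_channels, yaml_args, width_multiple, num_outputs=None):
    """Hypothetical stand-in for the channel handling applied to listed modules.

    prev_channels: output channels of the layer this one reads from
    yaml_args:     the args list from the yaml, whose first entry is out_ch
    num_outputs:   channel count of the detection output, which is left unscaled
    """
    c1, c2 = prev_channels, yaml_args[0]
    if c2 != num_outputs:
        c2 = make_divisible(c2 * width_multiple, 8)
    return [c1, c2, *yaml_args[1:]]


if __name__ == "__main__":
    # yaml entry [24, 72, 3, 2, 0, 0] following a 16-channel layer
    print(rewrite_args(16, [24, 72, 3, 2, 0, 0], width_multiple=1.0))
    # With a smaller width multiple only out_ch is scaled; hidden_ch stays as written
    print(rewrite_args(16, [24, 72, 3, 2, 0, 0], width_multiple=0.5))
```

The rewritten list is what gets passed to MobileNetV3(inp, oup, hidden_dim, kernel_size, stride, use_se, use_hs), which is why the yaml only carries six of the seven constructor arguments.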

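Step ③: create a custom yaml file

First, copy yolov5s.yaml in the models folder, paste it, and rename the copy yolov5s_MobileNetv3.yaml.

Then edit the configuration to follow the MobileNetV3 network structure.

From the module definition we can see that each MobileNetV3 entry in the yaml takes six arguments, [out_ch, hidden_ch, kernel_size, stride, use_se, use_hs] (the input channel count is filled in by parse_model from the previous layer):

out_ch: number of output channels
hidden_ch: number of expanded channels inside the inverted residual
kernel_size: kernel size of the depthwise convolution
stride: stride
use_se: whether to use SELayer (1 = yes, 0 = no)
use_hs: which activation to use (1 = h_swish, 0 = ReLU)

When editing, pay attention to where the feature maps reach the /8, /16 and /32 scales, because the head takes its inputs from those layers.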
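One way to find the /8, /16 and /32 layers without guessing is to walk the backbone and track the feature-map size. The sketch below does this for the (kernel_size, stride) pairs of the small backbone used in this section, assuming a 640×640 input and the same padding rule as the module code, (kernel_size - 1) // 2. The list and function names are ours.

```python
def conv_out_size(size, kernel_size, stride):
    """Output size of a conv with padding (kernel_size - 1) // 2."""
    padding = (kernel_size - 1) // 2
    return (size + 2 * padding - kernel_size) // stride + 1


# (kernel_size, stride) of layers 0..11 in the small backbone
BACKBONE_SMALL = [
    (3, 2), (3, 2), (3, 2), (3, 1),
    (5, 2), (5, 1), (5, 1), (5, 1), (5, 1),
    (5, 2), (5, 1), (5, 1),
]


def trace_backbone(layers, input_size=640):
    """Return (layer index, feature-map size, cumulative stride) per layer."""
    rows = []
    size = input_size
    for index, (kernel_size, stride) in enumerate(layers):
        size = conv_out_size(size, kernel_size, stride)
        rows.append((index, size, input_size // size))
    return rows


if __name__ == "__main__":
    for index, size, total_stride in trace_backbone(BACKBONE_SMALL):
        print(f"layer {index:2d}: {size}x{size}  (/{total_stride})")
```

The last layer at each scale (3 for /8, 8 for /16, 11 for /32) is the natural one to feed into the head, which is what the Concat indices in the config do.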

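Likewise, the Concat layers in the head have to be updated: their from indices must point at the new backbone layers, and the channel counts after each Concat change accordingly.

The yaml file after these changes looks like this: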

# YOLOv5 🚀 by Ultralytics, GPL-3.0 license
# Parameters
nc: 80  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
# Mobilenetv3-small backbone
# MobileNetV3_InvertedResidual args: [out_ch, hid_ch, k_s, stride, SE, HardSwish]
backbone:
  # [from, number, module, args]
  [[-1, 1, conv_bn_hswish, [16, 2]],             # 0-p1/2   320*320
   [-1, 1, MobileNetV3, [16,  16, 3, 2, 1, 0]],  # 1-p2/4   160*160
   [-1, 1, MobileNetV3, [24,  72, 3, 2, 0, 0]],  # 2-p3/8   80*80
   [-1, 1, MobileNetV3, [24,  88, 3, 1, 0, 0]],  # 3        80*80
   [-1, 1, MobileNetV3, [40,  96, 5, 2, 1, 1]],  # 4-p4/16  40*40
   [-1, 1, MobileNetV3, [40, 240, 5, 1, 1, 1]],  # 5        40*40
   [-1, 1, MobileNetV3, [40, 240, 5, 1, 1, 1]],  # 6        40*40
   [-1, 1, MobileNetV3, [48, 120, 5, 1, 1, 1]],  # 7        40*40
   [-1, 1, MobileNetV3, [48, 144, 5, 1, 1, 1]],  # 8        40*40
   [-1, 1, MobileNetV3, [96, 288, 5, 2, 1, 1]],  # 9-p5/32  20*20
   [-1, 1, MobileNetV3, [96, 576, 5, 1, 1, 1]],  # 10       20*20
   [-1, 1, MobileNetV3, [96, 576, 5, 1, 1, 1]],  # 11       20*20
  ]
# YOLOv5 v6.0 head
head:
  [[-1, 1, Conv, [96, 1, 1]],  # 12                         20*20
   [-1, 1, nn.Upsample, [None, 2, 'nearest']], # 13         40*40
   [[-1, 8], 1, Concat, [1]],  # cat backbone P4            40*40
   [-1, 3, C3, [144, False]],  # 15                         40*40
   [-1, 1, Conv, [144, 1, 1]], # 16                         40*40
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],# 17          80*80
   [[-1, 3], 1, Concat, [1]],  # cat backbone P3            80*80
   [-1, 3, C3, [168, False]],  # 19 (P3/8-small)            80*80
   [-1, 1, Conv, [168, 3, 2]], # 20                         40*40
   [[-1, 16], 1, Concat, [1]], # cat head P4                40*40
   [-1, 3, C3, [312, False]],  # 22 (P4/16-medium)          40*40
   [-1, 1, Conv, [312, 3, 2]], # 23                         20*20
   [[-1, 12], 1, Concat, [1]], # cat head P5                20*20
   [-1, 3, C3, [408, False]],  # 25 (P5/32-large)           20*20
   [[19, 22, 25], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
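The head widths 144, 168, 312 and 408 in this config are not arbitrary: each equals the channel count coming out of the Concat in front of it. The sketch below recomputes them from a simplified copy of the layer list (only the from field, the module name and the output channels are kept; width_multiple is taken as 1.0). The tuple layout and function name are our own simplification.

```python
# (from, module, out_channels); out_channels is None where the module
# does not choose its own width.
BACKBONE = [(-1, "conv_bn_hswish", 16)] + [
    (-1, "MobileNetV3", c) for c in (16, 24, 24, 40, 40, 40, 48, 48, 96, 96, 96)
]
HEAD = [
    (-1, "Conv", 96),            # 12
    (-1, "Upsample", None),      # 13
    ([-1, 8], "Concat", None),   # 14
    (-1, "C3", 144),             # 15
    (-1, "Conv", 144),           # 16
    (-1, "Upsample", None),      # 17
    ([-1, 3], "Concat", None),   # 18
    (-1, "C3", 168),             # 19
    (-1, "Conv", 168),           # 20
    ([-1, 16], "Concat", None),  # 21
    (-1, "C3", 312),             # 22
    (-1, "Conv", 312),           # 23
    ([-1, 12], "Concat", None),  # 24
    (-1, "C3", 408),             # 25
]


def trace_channels(layers):
    """Return the output channel count of every layer."""
    out = []
    for src, module, channels in layers:
        if module == "Concat":
            # Concatenation along the channel axis adds the widths;
            # an index of -1 means the previous layer.
            channels = sum(out[i] for i in src)
        elif module == "Upsample":
            channels = out[src]
        out.append(channels)
    return out


if __name__ == "__main__":
    channels = trace_channels(BACKBONE + HEAD)
    for index in (14, 18, 21, 24):
        print(f"Concat at layer {index}: {channels[index]} channels")
```

If you change which backbone layer a Concat reads from, rerun a check like this to see what width the layers after it will receive.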

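Step ④: check that the modules were added correctly

In yolo.py, point the configuration at the yolov5s_MobileNetv3.yaml file we just created.

Then run yolo.py.

Compare the output with that of the original yolov5s.yaml.

After swapping the backbone for MobileNetV3, the network has more layers (which in principle gives it more stages to learn features from); the parameter count drops from a little over 7 million to a little over 5 million, a substantial reduction; and GFLOPs fall from 16.6 to 12.2.

Step ⑤: change the default of '--cfg' in train.py

Find the parse_opt function in train.py and change the default of the '--cfg' argument (the second argument defined there) to 'models/yolov5s_MobileNetv3.yaml'. Then training can begin.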
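The change in this step amounts to one default value. Here is a stripped-down sketch of an argument parser with only the '--cfg' option; it is a stand-in for illustration and omits every other argument the real train.py defines.

```python
import argparse


def parse_opt(argv=None):
    """Minimal stand-in for train.py's parse_opt with only the --cfg argument."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--cfg",
        type=str,
        default="models/yolov5s_MobileNetv3.yaml",  # the new default
        help="model.yaml path",
    )
    return parser.parse_args(argv)


if __name__ == "__main__":
    # No arguments: the new default is used.
    print(parse_opt([]).cfg)
    # An explicit --cfg still overrides the default.
    print(parse_opt(["--cfg", "models/yolov5s.yaml"]).cfg)
```

Because an explicit value overrides the default, you can also leave train.py untouched and pass --cfg models/yolov5s_MobileNetv3.yaml on the command line. Make sure the path has no stray spaces inside the quotes, or the file will not be found.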

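🚀 3. Combining YOLOv5 with MobileNetV3_large

MobileNetV3_large differs from MobileNetV3_small in the yaml file: the backbone stacks more blocks, the Concat connections in the head point at different layers, and the head uses different channel widths and C3 repeat counts. Both configs shown in this post keep depth_multiple and width_multiple at 1.0.

Next we go straight to the yaml changes; for everything else, follow the steps above.

Step ③: create a custom yaml file

As before, copy yolov5s.yaml in the models folder, paste it, and rename the copy yolov5s_MobileNetv3_large.yaml.

Then edit the configuration to follow the MobileNetV3 network structure.

The modified file looks like this: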

# Parameters
nc: 20  # number of classes
depth_multiple: 1.0  # model depth multiple
width_multiple: 1.0  # layer channel multiple
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32
# MobileNetV3-large backbone
backbone:
  [[-1, 1, conv_bn_hswish, [16, 2]],                  # 0-p1/2
   [-1, 1, MobileNetV3, [ 16,  16, 3, 1, 0, 0]],  # 1-p1/2
   [-1, 1, MobileNetV3, [ 24,  64, 3, 2, 0, 0]],  # 2-p2/4
   [-1, 1, MobileNetV3, [ 24,  72, 3, 1, 0, 0]],  # 3-p2/4
   [-1, 1, MobileNetV3, [ 40,  72, 5, 2, 1, 0]],  # 4-p3/8
   [-1, 1, MobileNetV3, [ 40, 120, 5, 1, 1, 0]],  # 5-p3/8
   [-1, 1, MobileNetV3, [ 40, 120, 5, 1, 1, 0]],  # 6-p3/8
   [-1, 1, MobileNetV3, [ 80, 240, 3, 2, 0, 1]],  # 7-p4/16
   [-1, 1, MobileNetV3, [ 80, 200, 3, 1, 0, 1]],  # 8-p4/16
   [-1, 1, MobileNetV3, [ 80, 184, 3, 1, 0, 1]],  # 9-p4/16
   [-1, 1, MobileNetV3, [ 80, 184, 3, 1, 0, 1]],  # 10-p4/16
   [-1, 1, MobileNetV3, [112, 480, 3, 1, 1, 1]],  # 11-p4/16
   [-1, 1, MobileNetV3, [112, 672, 3, 1, 1, 1]],  # 12-p4/16
   [-1, 1, MobileNetV3, [160, 672, 5, 1, 1, 1]],  # 13-p4/16
   [-1, 1, MobileNetV3, [160, 960, 5, 2, 1, 1]],  # 14-p5/32   hidden_ch changed from 672 to 960, as in the original architecture
   [-1, 1, MobileNetV3, [160, 960, 5, 1, 1, 1]],  # 15-p5/32
  ]
# YOLOv5 v6.0 head
head:
  [ [ -1, 1, Conv, [ 256, 1, 1 ] ],  # 16
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],  # 17
    [ [ -1, 13], 1, Concat, [ 1 ] ],  # 18 cat backbone P4
    [ -1, 1, C3, [ 256, False ] ],  # 19
    [ -1, 1, Conv, [ 128, 1, 1 ] ],  # 20
    [ -1, 1, nn.Upsample, [ None, 2, 'nearest' ] ],  # 21
    [ [ -1, 6 ], 1, Concat, [ 1 ] ],  # 22 cat backbone P3
    [ -1, 1, C3, [ 128, False ] ],  # 23 (P3/8-small)
    [ -1, 1, Conv, [ 128, 3, 2 ] ],  # 24
    [ [ -1, 20 ], 1, Concat, [ 1 ] ],  # 25 cat head P4
    [ -1, 1, C3, [ 256, False ] ],  # 26 (P4/16-medium)
    [ -1, 1, Conv, [ 256, 3, 2 ] ],  # 27
    [ [ -1, 16 ], 1, Concat, [ 1 ] ],  # 28 cat head P5
    [ -1, 1, C3, [ 512, False ] ],  # 29 (P5/32-large)
    [ [ 23, 26, 29 ], 1, Detect, [ nc, anchors ] ],  # Detect(P3, P4, P5)
  ]
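When a head is copied from another config, the index comments on its layers are easy to leave stale, so it helps to recompute the absolute layer numbers. The sketch below does that for the large config: head layers are numbered from the backbone length (16) upward, and we check that the three Detect inputs land on C3 layers. The list layout and helper names are ours.

```python
BACKBONE_LEN = 16  # backbone layers 0..15 in the large config

# (module, from) for each head layer before Detect, in file order
HEAD = [
    ("Conv", -1), ("Upsample", -1), ("Concat", [-1, 13]), ("C3", -1),
    ("Conv", -1), ("Upsample", -1), ("Concat", [-1, 6]), ("C3", -1),
    ("Conv", -1), ("Concat", [-1, 20]), ("C3", -1),
    ("Conv", -1), ("Concat", [-1, 16]), ("C3", -1),
]
DETECT_FROM = [23, 26, 29]


def absolute_indices(head, offset):
    """Map each head layer to its absolute index in the full model."""
    return {offset + i: module for i, (module, _) in enumerate(head)}


def detect_inputs_are_c3(head, offset, detect_from):
    """True if every layer that Detect reads from is a C3 block."""
    modules = absolute_indices(head, offset)
    return all(modules.get(index) == "C3" for index in detect_from)


if __name__ == "__main__":
    for index, module in absolute_indices(HEAD, BACKBONE_LEN).items():
        print(f"{index}: {module}")
    print("Detect inputs are C3:", detect_inputs_are_c3(HEAD, BACKBONE_LEN, DETECT_FROM))
```

The same numbering confirms that the two head-to-head Concat sources, 20 and 16, are the 1×1 Conv layers placed before each Upsample.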

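Output of running the network:

Compared with MobileNetV3-small, the MobileNetV3-large model stacks more MobileNet_Block units, and the channel widths inside the inverted residual blocks are much larger. It runs nearly 50% slower than YOLOv5s but has fewer parameters, and its accuracy falls between MobileNetV3-small and YOLOv5s, so it works well as a comparison model for highlighting the strengths of your own model.

PS: if accuracy drops after training, that is expected. A lightweight network trades some accuracy for higher speed and less computation.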
