学习资源站

【YOLOv8多模态融合改进】_引入轻量化特征提取模块,解决多模态中的双模型参数量,计算量增加问题(适用不同的轻量化模块)_yolov8轻量化-

【YOLOv8多模态融合改进】| 引入轻量化特征提取模块,解决多模态中的双模型参数量、计算量增加问题(适用不同的轻量化模块)

一、本文介绍

本文记录的是 利用轻量化模块改进 YOLOv8 的多模态目标检测网络模型 。由于多模态模型在训练过程中使用的是两个模型, 整体的参数量计算量相比一般的单模态模型大 ,所以 轻量化也是多模态模型改进过程中常见的一个改进方向 。本文介绍如何使用轻量化模块改进 YOLOv8 中的主要特征提取模块 C2f 。效果如下,也可替换成其它轻量化模块。

模型 参数量(验证) 计算量(验证)
YOLOv8n前期融合 2.7 M 7.6 GFLOPs
轻量后的前期融合 1.8 M( -0.9 5.7 GFLOPs( -1.9
YOLOv8n中期融合 4.0 M 10.1 GFLOPs
轻量后的中期融合 2.8 M( -1.2 7.2 GFLOPs( -2.9
YOLOv8n中-后期融合 4.4 M 12.0 GFLOPs
轻量后的中-后期融合 3.0 M ( -1.4 8.6 GFLOPs( -3.4
YOLOv8n后期融合 5.3 M 13.3 GFLOPs
轻量后的后期融合 3.6 M( -1.6 9.4 GFLOPs( -3.9


二、轻量化模块介绍

本文以EfficientNet中的MBConv为例,介绍其原理,在后续的轻量化过程中,可按照同样的步骤替换成其它轻量化模块。

EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks

2.1 结构组成

  • 逐点卷积(1×1卷积)升维 :首先通过一个1×1的逐点卷积对输入特征图进行通道数扩展。目的是增加特征的维度,为后续的深度可分离卷积提供更多的特征信息,以便更好地提取特征。
  • 深度可分离卷积 :包括深度卷积(Depthwise Convolution)和逐点卷积(Pointwise Convolution)。深度卷积是逐通道进行的卷积运算,每个卷积核负责一个通道,它可以在不增加太多计算量的情况下,提取特征的空间信息。之后的逐点卷积则是在通道维度上对深度卷积产生的特征图进行加权运算,两者结合可有效降低模型的计算量与参数量。
  • SE模块(Squeeze-and-Excitation) :有助于模型在通道维度上对重要的特征信息产生更多的关注。
  • 逐点卷积(1×1卷积)降维 :最后再通过一个1×1的逐点卷积将特征图的通道数恢复到与输入相近的维度,实现特征的融合和压缩,减少模型的参数量和计算量。
  • Shortcut连接 :当输入MBConv结构的特征矩阵与输出的特征矩阵shape相同时存在shortcut连接,将输入直接与经过上述卷积操作后的输出相加,实现特征的复用,有助于解决梯度消失问题,使网络更容易训练。

在这里插入图片描述

2.2 工作原理

  • 特征提取 :先利用1×1卷积升维扩展通道,让网络有更多的维度去学习特征。然后深度可分离卷积中的深度卷积负责提取空间特征,逐点卷积负责融合通道特征。
  • 注意力机制 :SE模块通过对通道特征进行加权,使得网络能够自动关注到更重要的特征通道,抑制不重要的通道,从而提升模型的特征表达能力。
  • 特征融合与输出 :最后的1×1卷积降维将特征进行融合和压缩,得到最终的输出特征图。有shortcut连接时,将输入特征与输出特征相加,使网络能够更好地学习到输入与输出之间的映射关系。

2.3 优势

  • 高效的计算性能 :深度可分离卷积的使用大大减少了计算量,相比传统的卷积操作,能在降低计算成本的同时保持较好的特征提取能力,适用于移动设备等计算资源有限的场景。
  • 强大的特征表达能力 :通过倒置瓶颈结构,先升维再降维,以及SE模块的注意力机制,能够更有效地提取和利用特征,提高模型的准确性和泛化能力。
  • 轻量化模型 :减少了模型的参数量,降低了模型的存储需求和过拟合的风险,使模型更加轻量化,便于部署和应用。

论文: https://arxiv.org/pdf/1905.11946
源码: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet

三、轻量化改进的实现代码

实现代码如下:

import math
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import DWConv
from ultralytics.utils.tal import dist2bbox, make_anchors
from ultralytics.utils.torch_utils import fuse_conv_and_bn

class SeBlock(nn.Module):
    def __init__(self, in_channel, reduction=4):
        super().__init__()
        self.Squeeze = nn.AdaptiveAvgPool2d(1)

        self.Excitation = nn.Sequential()
        self.Excitation.add_module('FC1', nn.Conv2d(in_channel, in_channel // reduction, kernel_size=1))  # 1*1卷积与此效果相同
        self.Excitation.add_module('ReLU', nn.ReLU())
        self.Excitation.add_module('FC2', nn.Conv2d(in_channel // reduction, in_channel, kernel_size=1))
        self.Excitation.add_module('Sigmoid', nn.Sigmoid())

    def forward(self, x):
        y = self.Squeeze(x)
        ouput = self.Excitation(y)
        return x*(ouput.expand_as(x))

class drop_connect:
    def __init__(self, drop_connect_rate):
        self.drop_connect_rate = drop_connect_rate

    def forward(self, x, training):
        if not training:
            return x
        keep_prob = 1.0 - self.drop_connect_rate
        batch_size = x.shape[0]
        random_tensor = keep_prob
        random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device)
        binary_mask = torch.floor(random_tensor) # 1
        x = (x / keep_prob) * binary_mask
        return x

class stem(nn.Module):
    def __init__(self, c1, c2, act='ReLU6'):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(num_features=c2)
        if act == 'ReLU6':
            self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class MBConvBlock(nn.Module):
    def __init__(self, inp, final_oup, k=3, s=1, expand_ratio=1, drop_connect_rate=0.057, has_se=False):
        super(MBConvBlock, self).__init__()

        self._momentum = 0.01
        self._epsilon = 1e-3
        self.input_filters = inp
        self.output_filters = final_oup
        self.stride = s
        self.expand_ratio = expand_ratio
        self.has_se = has_se
        self.id_skip = True  # skip connection and drop connect
        se_ratio = 0.25

        # Expansion phase
        oup = inp * expand_ratio  # number of output channels
        if expand_ratio != 1:
            self._expand_conv = nn.Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False)
            self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._momentum, eps=self._epsilon)

        # Depthwise convolution phase
        self._depthwise_conv = nn.Conv2d(
            in_channels=oup, out_channels=oup, groups=oup,  # groups makes it depthwise
            kernel_size=k, padding=(k - 1) // 2, stride=s, bias=False)
        self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._momentum, eps=self._epsilon)

        # Squeeze and Excitation layer, if desired
        if self.has_se:
            num_squeezed_channels = max(1, int(inp * se_ratio))
            self.se = SeBlock(oup, 4)

        # Output phase
        self._project_conv = nn.Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False)
        self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._momentum, eps=self._epsilon)
        self._relu = nn.ReLU6(inplace=True)

        self.drop_connect = drop_connect(drop_connect_rate)

    def forward(self, x, drop_connect_rate=None):
        """
        :param x: input tensor
        :param drop_connect_rate: drop connect rate (float, between 0 and 1)
        :return: output of block
        """

        # Expansion and Depthwise Convolution
        identity = x
        if self.expand_ratio != 1:
            x = self._relu(self._bn0(self._expand_conv(x)))
        x = self._relu(self._bn1(self._depthwise_conv(x)))

        # Squeeze and Excitation
        if self.has_se:
            x = self.se(x)

        x = self._bn2(self._project_conv(x))

        # Skip connection and drop connect
        if self.id_skip and self.stride == 1  and self.input_filters == self.output_filters:
            if drop_connect_rate:
                x = self.drop_connect(x, training=self.training)
            x += identity  # skip connection
        return x

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p

class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Perform transposed convolution of 2D data."""
        return self.act(self.conv(x))

class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Applies the YOLO FPN to input data."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 2 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))

class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))

class C3k2_MBConv(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else MBConvBlock(self.c, self.c) for _ in range(n)
        )

class A2C2f_MBConv(nn.Module):
    """
    Area-Attention C2f module for enhanced feature extraction with area-based attention mechanisms.

    This module extends the C2f architecture by incorporating area-attention and ABlock layers for improved feature
    processing. It supports both area-attention and standard convolution modes.

    Attributes:
        cv1 (Conv): Initial 1x1 convolution layer that reduces input channels to hidden channels.
        cv2 (Conv): Final 1x1 convolution layer that processes concatenated features.
        gamma (nn.Parameter | None): Learnable parameter for residual scaling when using area attention.
        m (nn.ModuleList): List of either ABlock or C3k modules for feature processing.

    Methods:
        forward: Processes input through area-attention or standard convolution pathway.

    Examples:
        >>> m = A2C2f(512, 512, n=1, a2=True, area=1)
        >>> x = torch.randn(1, 512, 32, 32)
        >>> output = m(x)
        >>> print(output.shape)
        torch.Size([1, 512, 32, 32])
    """

    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        """
        Initialize Area-Attention C2f module.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            n (int): Number of ABlock or C3k modules to stack.
            a2 (bool): Whether to use area attention blocks. If False, uses C3k blocks instead.
            area (int): Number of areas the feature map is divided.
            residual (bool): Whether to use residual connections with learnable gamma parameter.
            mlp_ratio (float): Expansion ratio for MLP hidden dimension.
            e (float): Channel expansion ratio for hidden channels.
            g (int): Number of groups for grouped convolutions.
            shortcut (bool): Whether to use shortcut connections in C3k blocks.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        assert c_ % 32 == 0, "Dimension of ABlock be a multiple of 32."

        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1)

        self.gamma = nn.Parameter(0.01 * torch.ones(c2), requires_grad=True) if a2 and residual else None
        self.m = nn.ModuleList(
            nn.Sequential(*(MBConvBlock(c_, c_) for _ in range(2)))
            if a2
            else C3k(c_, c_, 2, shortcut, g)
            for _ in range(n)
        )

    def forward(self, x):
        """
        Forward pass through A2C2f layer.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after processing.
        """
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv2(torch.cat(y, 1))
        if self.gamma is not None:
            return x + self.gamma.view(-1, len(self.gamma), 1, 1) * y
        return y

class C2f_MBConv(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(MBConvBlock(self.c, self.c) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

class RepVGGDW(torch.nn.Module):
    """RepVGGDW is a class that represents a depth wise separable convolutional block in RepVGG architecture."""

    def __init__(self, ed) -> None:
        """Initializes RepVGGDW with depthwise separable convolutional layers for efficient processing."""
        super().__init__()
        self.conv = Conv(ed, ed, 7, 1, 3, g=ed, act=False)
        self.conv1 = Conv(ed, ed, 3, 1, 1, g=ed, act=False)
        self.dim = ed
        self.act = nn.SiLU()

    def forward(self, x):
        """
        Performs a forward pass of the RepVGGDW block.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after applying the depth wise separable convolution.
        """
        return self.act(self.conv(x) + self.conv1(x))

    def forward_fuse(self, x):
        """
        Performs a forward pass of the RepVGGDW block without fusing the convolutions.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after applying the depth wise separable convolution.
        """
        return self.act(self.conv(x))

    @torch.no_grad()
    def fuse(self):
        """
        Fuses the convolutional layers in the RepVGGDW block.

        This method fuses the convolutional layers and updates the weights and biases accordingly.
        """
        conv = fuse_conv_and_bn(self.conv.conv, self.conv.bn)
        conv1 = fuse_conv_and_bn(self.conv1.conv, self.conv1.bn)

        conv_w = conv.weight
        conv_b = conv.bias
        conv1_w = conv1.weight
        conv1_b = conv1.bias

        conv1_w = torch.nn.functional.pad(conv1_w, [2, 2, 2, 2])

        final_conv_w = conv_w + conv1_w
        final_conv_b = conv_b + conv1_b

        conv.weight.data.copy_(final_conv_w)
        conv.bias.data.copy_(final_conv_b)

        self.conv = conv
        del self.conv1

class CIB(nn.Module):
    """
    Conditional Identity Block (CIB) module.

    Args:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        shortcut (bool, optional): Whether to add a shortcut connection. Defaults to True.
        e (float, optional): Scaling factor for the hidden channels. Defaults to 0.5.
        lk (bool, optional): Whether to use RepVGGDW for the third convolutional layer. Defaults to False.
    """

    def __init__(self, c1, c2, shortcut=True, e=0.5, lk=False):
        """Initializes the custom model with optional shortcut, scaling factor, and RepVGGDW layer."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = nn.Sequential(
            Conv(c1, c1, 3, g=c1),
            Conv(c1, 2 * c_, 1),
            RepVGGDW(2 * c_) if lk else Conv(2 * c_, 2 * c_, 3, g=2 * c_),
            Conv(2 * c_, c2, 1),
            Conv(c2, c2, 3, g=c2),
        )

        self.add = shortcut and c1 == c2

    def forward(self, x):
        """
        Forward pass of the CIB module.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor.
        """
        return x + self.cv1(x) if self.add else self.cv1(x)

class C2fCIB_MBConv(C2f_MBConv):
    """
    C2fCIB class represents a convolutional block with C2f and CIB modules.

    Args:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        n (int, optional): Number of CIB modules to stack. Defaults to 1.
        shortcut (bool, optional): Whether to use shortcut connection. Defaults to False.
        lk (bool, optional): Whether to use local key connection. Defaults to False.
        g (int, optional): Number of groups for grouped convolution. Defaults to 1.
        e (float, optional): Expansion ratio for CIB modules. Defaults to 0.5.
    """

    def __init__(self, c1, c2, n=1, shortcut=False, lk=False, g=1, e=0.5):
        """Initializes the module with specified parameters for channel, shortcut, local key, groups, and expansion."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(CIB(self.c, self.c, shortcut, e=1.0, lk=lk) for _ in range(n))

四、改进点

YOLOv8 中的 C2f模块 进行改进,并将 MBConv 在加入到 C2f 模块中。 (第五节讲解添加步骤)

改进代码如下:

C2f 模块进行改进,加入 MBConv模块 。,替换原本的 Bottleneck

class C2f_MBConv(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(MBConvBlock(self.c, self.c) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

在这里插入图片描述

五、配置步骤

5.1 修改一

① 在 ultralytics/nn/ 目录下新建 AddModules 文件夹用于存放模块代码(提供的项目包里已存在 AddModules 文件夹)

② 在 AddModules 文件夹下新建 MBConv.py ,将 第三节 中的代码粘贴到此处

在这里插入图片描述

5.2 修改二

AddModules 文件夹下新建 __init__.py (提供的项目包里已存在 __init__.py ),在文件内导入模块: from .MBConv import *

在这里插入图片描述

5.3 修改三

ultralytics/nn/modules/tasks.py 文件中,需要在两处位置添加各模块类名称。

首先:导入模块

在这里插入图片描述

其次:在 parse_model函数 中注册 C2f_MBConv 模块

在这里插入图片描述

在这里插入图片描述


六、yaml模型文件

!!! 获取的项目包就已经把相关的多模态输入、训练等改动都已经配好了,只需要新建模型yaml文件,粘贴对应的模型,进行训练即可。 项目包获取及使用教程可参考链接: 《YOLO系列模型的多模态项目》配置使用教程

在什么地方新建,n,s,m,l,x,用哪个版本按自己的需求来即可,和普通的训练步骤一致。

除了模型结构方面的改动,在yaml文件中还传入了一个通道数 ch: 6 表示传入的是双模态,6通道 ,前三个是可见光,后三个是红外。
在default.yaml中也配置了这个参数。

下面便是在前期、中期、中后期、后期融合中添加 C2f_MBConv

6.1 前期融合

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
ch: 6
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, MF, [64]]  # 0
  - [-1, 1, Conv, [64, 3, 2]]  # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 2-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 4-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 6-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 8-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 10

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 7], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_MBConv, [512]]  # 13

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 5], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_MBConv, [256]]  # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_MBConv, [512]]  # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_MBConv, [1024]]  # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]]  # Detect(P3, P4, P5)

6.2 中期融合

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
ch: 6
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
   n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
   s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
   m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
   l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
   x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, IN, []]  # 0
  - [-1, 1, Multiin, [1]]  # 1
  - [-2, 1, Multiin, [2]]  # 2

  - [1, 1, Conv, [64, 3, 2]] # 3-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 8-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 10-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]

  - [2, 1, Conv, [64, 3, 2]] # 12-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 13-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 15-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 17-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 19-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]

  - [[7, 16], 1, Concat, [1]]  # 21 cat backbone P3
  - [[9, 18], 1, Concat, [1]]  # 22 cat backbone P4
  - [[11, 20], 1, Concat, [1]]  # 23 cat backbone P5

  - [-1, 1, SPPF, [1024, 5]] # 24

 # YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 22], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_MBConv, [512]]  # 27

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 21], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_MBConv, [256]]  # 30 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 27], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_MBConv, [512]]  # 33 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 24], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_MBConv, [1024]]  # 36 (P5/32-large)

  - [[30, 33, 36], 1, Detect, [nc]]  # Detect(P3, P4, P5)

6.3 中-后期融合

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
ch: 6
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
   n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
   s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
   m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
   l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
   x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, IN, []]  # 0
  - [-1, 1, Multiin, [1]]  # 1
  - [-2, 1, Multiin, [2]]  # 2

  - [1, 1, Conv, [64, 3, 2]] # 3-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 8-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 10-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 12

  - [2, 1, Conv, [64, 3, 2]] # 13-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 14-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 16-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 18-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 20-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 22

 # YOLOv8.0n head
head:
  - [12, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 9], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_MBConv, [512]]  # 25

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 7], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_MBConv, [256]]  # 28 (P3/8-small)

  - [22, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 19], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_MBConv, [512]]  # 31

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 17], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_MBConv, [256]]  # 34 (P3/8-small)

  - [ [ 12, 22 ], 1, Concat, [ 1 ] ]  # cat head P3  35
  - [ [ 25, 31 ], 1, Concat, [ 1 ] ]  # cat head P4  36
  - [ [ 28, 34 ], 1, Concat, [ 1 ] ]  # cat head P5  37

  - [37, 1, Conv, [256, 3, 2]]
  - [[-1, 36], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_MBConv, [512]]  # 40 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 35], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_MBConv, [1024]]  # 43 (P5/32-large)

  - [[37, 40, 43], 1, Detect, [nc]]  # Detect(P3, P4, P5)

6.4 后期融合

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
ch: 6
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
   n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
   s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
   m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
   l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
   x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, IN, []]  # 0
  - [-1, 1, Multiin, [1]]  # 1
  - [-2, 1, Multiin, [2]]  # 2

  - [1, 1, Conv, [64, 3, 2]] # 3-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 8-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 10-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 12

  - [2, 1, Conv, [64, 3, 2]] # 13-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 14-P2/4
  - [-1, 3, C2f_MBConv, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 16-P3/8
  - [-1, 6, C2f_MBConv, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 18-P4/16
  - [-1, 6, C2f_MBConv, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 20-P5/32
  - [-1, 3, C2f_MBConv, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 22

 # YOLOv8.0n head
head:
  - [12, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 9], 1, Concat, [1] ]  # cat backbone P4
  - [-1, 3, C2f_MBConv, [512]]  # 25

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[ -1, 7], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_MBConv, [256]]  # 28 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 25], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_MBConv, [512]]  # 31 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_MBConv, [1024]]  # 34 (P5/32-large)

  - [22, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 19], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f_MBConv, [512]]  # 37

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[ -1, 17 ], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f_MBConv, [256]]  # 40 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 37], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f_MBConv, [512]]  # 43 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 22], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f_MBConv, [1024]]  # 46 (P5/32-large)

  - [[28, 40], 1, Concat, [1]]  # cat head P5  47
  - [[31, 43], 1, Concat, [1]]  # cat head P5  48
  - [[34, 46], 1, Concat, [1]]  # cat head P5  49

  - [[47, 48, 49], 1, Detect, [nc]]  # Detect(P3, P4, P5)

七、成功运行结果

前期融合结果: 可以看到输入的通道数为6,表明可见光图像和红外图像均输入到了模型中进行融合训练。

YOLOv8-early-MBConv summary: 271 layers, 1,847,931 parameters, 1,847,915 gradients, 5.7 GFLOPs

                   from  n    params  module                                       arguments
  0                  -1  1       472  ultralytics.nn.AddModules.multimodal.MF      [6, 16]
  1                  -1  1      2336  ultralytics.nn.modules.conv.Conv             [16, 16, 3, 2]
  2                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  3                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
  4                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  5                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
  6                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  7                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
  8                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
  9                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 10                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 11                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 12             [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 13                  -1  1     79168  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 128]
 14                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 15             [-1, 5]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 16                  -1  1     20128  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 64]
 17                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 18            [-1, 13]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 19                  -1  1     54592  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 128]
 20                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 21            [-1, 10]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 22                  -1  1    215680  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 256]
 23        [16, 19, 22]  1    430867  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]
YOLOv8-early-MBConv summary: 271 layers, 1,847,931 parameters, 1,847,915 gradients, 5.7 GFLOPs

中期融合结果:

YOLOv8-mid-MBConv summary: 365 layers, 2,791,539 parameters, 2,791,523 gradients, 7.2 GFLOPs

                   from  n    params  module                                       arguments
  0                  -1  1         0  ultralytics.nn.AddModules.multimodal.IN      []
  1                  -1  1         0  ultralytics.nn.AddModules.multimodal.Multiin [1]
  2                  -2  1         0  ultralytics.nn.AddModules.multimodal.Multiin [2]
  3                   1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  4                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  5                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
  6                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  7                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
  8                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  9                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
 10                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 11                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 12                   2  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
 13                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
 14                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
 15                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
 16                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
 17                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
 18                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
 19                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 20                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 21             [7, 16]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 22             [9, 18]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 23            [11, 20]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 24                  -1  1    394240  ultralytics.nn.modules.block.SPPF            [512, 256, 5]
 25                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 26            [-1, 22]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 27                  -1  1     95552  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [512, 128]
 28                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 29            [-1, 21]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 30                  -1  1     24224  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 64]
 31                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 32            [-1, 27]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 33                  -1  1     54592  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 128]
 34                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 35            [-1, 24]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 36                  -1  1    215680  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 256]
 37        [30, 33, 36]  1    430867  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]
YOLOv8-mid-MBConv summary: 365 layers, 2,791,539 parameters, 2,791,523 gradients, 7.2 GFLOPs

中-后期融合结果:

YOLOv8-mid-to-late-MBConv summary: 405 layers, 3,038,483 parameters, 3,038,467 gradients, 8.7 GFLOPs

                   from  n    params  module                                       arguments
  0                  -1  1         0  ultralytics.nn.AddModules.multimodal.IN      []
  1                  -1  1         0  ultralytics.nn.AddModules.multimodal.Multiin [1]
  2                  -2  1         0  ultralytics.nn.AddModules.multimodal.Multiin [2]
  3                   1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  4                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  5                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
  6                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  7                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
  8                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  9                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
 10                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 11                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 12                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 13                   2  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
 14                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
 15                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
 16                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
 17                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
 18                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
 19                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
 20                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 21                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 22                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 23                  12  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 24             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 25                  -1  1     79168  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 128]
 26                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 27             [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 28                  -1  1     20128  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 64]
 29                  22  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 30            [-1, 19]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 31                  -1  1     79168  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 128]
 32                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 33            [-1, 17]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 34                  -1  1     20128  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 64]
 35            [12, 22]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 36            [25, 31]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 37            [28, 34]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 38                  37  1     73856  ultralytics.nn.modules.conv.Conv             [128, 64, 3, 2]
 39            [-1, 36]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 40                  -1  1     70976  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [320, 128]
 41                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 42            [-1, 35]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 43                  -1  1    281216  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [640, 256]
 44        [37, 40, 43]  1    545235  ultralytics.nn.modules.head.Detect           [1, [128, 128, 256]]
YOLOv8-mid-to-late-MBConv summary: 405 layers, 3,038,483 parameters, 3,038,467 gradients, 8.7 GFLOPs

后期融合结果:

YOLOv8-late-MBConv summary: 441 layers, 3,649,235 parameters, 3,649,219 gradients, 9.5 GFLOPs

                  from  n    params  module                                       arguments
  0                  -1  1         0  ultralytics.nn.AddModules.multimodal.IN      []
  1                  -1  1         0  ultralytics.nn.AddModules.multimodal.Multiin [1]
  2                  -2  1         0  ultralytics.nn.AddModules.multimodal.Multiin [2]
  3                   1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  4                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  5                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
  6                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  7                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
  8                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  9                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
 10                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 11                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 12                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 13                   2  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
 14                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
 15                  -1  1      3152  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [32, 32, True]
 16                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
 17                  -1  2     23872  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [64, 64, True]
 18                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
 19                  -1  2     92800  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [128, 128, True]
 20                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 21                  -1  1    182912  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [256, 256, True]
 22                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 23                  12  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 24             [-1, 9]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 25                  -1  1     79168  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 128]
 26                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 27             [-1, 7]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 28                  -1  1     20128  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 64]
 29                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 30            [-1, 25]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 31                  -1  1     54592  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 128]
 32                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 33            [-1, 12]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 34                  -1  1    215680  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 256]
 35                  22  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 36            [-1, 19]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 37                  -1  1     79168  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 128]
 38                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 39            [-1, 17]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 40                  -1  1     20128  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 64]
 41                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 42            [-1, 37]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 43                  -1  1     54592  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [192, 128]
 44                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 45            [-1, 22]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 46                  -1  1    215680  ultralytics.nn.AddModules.MBConv.C2f_MBConv  [384, 256]
 47            [28, 40]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 48            [31, 43]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 49            [34, 46]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 50        [47, 48, 49]  1    819795  ultralytics.nn.modules.head.Detect           [1, [128, 256, 512]]
YOLOv8-late-MBConv summary: 441 layers, 3,649,235 parameters, 3,649,219 gradients, 9.5 GFLOPs