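[YOLOv10 Multimodal Fusion Improvement] | Introducing a lightweight feature extraction module to curb the parameter and computation growth caused by the two models in a multimodal network (works with other lightweight modules too)
1. Introduction
This article documents how to use a lightweight module to improve a YOLOv10-based multimodal object detection network. Because a multimodal model uses two models during training, its total parameter count and computation are larger than those of a typical single-modal model, so lightweighting is a common direction when improving multimodal models. This article shows how to use a lightweight module to improve C2fCIB, the main feature extraction module in YOLOv10. The results are shown below; other lightweight modules can be substituted in the same way.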
| Model | Parameters (validation) | Computation (validation) |
|---|---|---|
| YOLOv10n early fusion | 2.7 M | 9.0 GFLOPs |
| Early fusion after lightweighting | 2.1 M (-0.6) | 7.3 GFLOPs (-1.7) |
| YOLOv10n mid fusion | 3.7 M | 11.3 GFLOPs |
| Mid fusion after lightweighting | 2.7 M (-1.0) | 8.5 GFLOPs (-2.8) |
| YOLOv10n mid-to-late fusion | 4.4 M | 14.3 GFLOPs |
| Mid-to-late fusion after lightweighting | 3.4 M (-1.0) | 11.1 GFLOPs (-3.2) |
| YOLOv10n late fusion | 5.3 M | 15.7 GFLOPs |
| Late fusion after lightweighting | 4.2 M (-1.1) | 12.3 GFLOPs (-3.4) |
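To put the absolute differences in the table into perspective, the short sketch below converts them into relative reductions. The numbers are copied from the table; the helper name relative_reduction is only illustrative.

```python
# (params in M, GFLOPs) before and after lightweighting, copied from the table above
results = {
    "early": ((2.7, 9.0), (2.1, 7.3)),
    "mid": ((3.7, 11.3), (2.7, 8.5)),
    "mid-to-late": ((4.4, 14.3), (3.4, 11.1)),
    "late": ((5.3, 15.7), (4.2, 12.3)),
}


def relative_reduction(before, after):
    """Return the reduction as a percentage of the baseline value."""
    return 100.0 * (before - after) / before


summary = {}
for name, ((p0, f0), (p1, f1)) in results.items():
    summary[name] = (relative_reduction(p0, p1), relative_reduction(f0, f1))
    print(f"{name:12s} params -{summary[name][0]:.1f}%  FLOPs -{summary[name][1]:.1f}%")
```

In every fusion variant the parameter count drops by roughly a fifth to a quarter of the baseline.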
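2. The Lightweight Module
This article uses MBConv from EfficientNet as the example and explains how it works. In the lightweighting steps that follow, other lightweight modules can be swapped in by the same procedure.
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks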
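2.1 Structure
- Pointwise (1×1) convolution for expansion: a 1×1 pointwise convolution first expands the number of channels of the input feature map. This raises the feature dimension so that the following depthwise separable convolution has more feature information to work with.
- Depthwise separable convolution: consists of a depthwise convolution and a pointwise convolution. The depthwise convolution operates channel by channel, with one kernel per channel, and extracts spatial information at little computational cost. The pointwise convolution that follows mixes the depthwise outputs along the channel dimension. Together they reduce both computation and parameter count.
- SE (Squeeze-and-Excitation) module: helps the model focus on the more important features along the channel dimension.
- Pointwise (1×1) convolution for projection: a final 1×1 pointwise convolution brings the channel count back to roughly that of the input, fusing and compressing the features and reducing parameters and computation.
- Shortcut connection: when the input and output feature maps of the MBConv block have the same shape, a shortcut adds the input directly to the output of the convolutions above. This reuses features, helps mitigate vanishing gradients, and makes the network easier to train.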
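The saving from the depthwise separable convolution can be checked with simple arithmetic. The sketch below counts the weights of a bias-free standard convolution and of a depthwise plus pointwise pair; the function names and the example width of 128 channels are illustrative choices, not taken from the original code.

```python
def conv_params(c_in, c_out, k, groups=1):
    """Number of weights in a bias-free 2D convolution with a k x k kernel."""
    return (c_in // groups) * c_out * k * k


def standard_conv(c_in, c_out, k=3):
    return conv_params(c_in, c_out, k)


def depthwise_separable(c_in, c_out, k=3):
    depthwise = conv_params(c_in, c_in, k, groups=c_in)  # one k x k filter per channel
    pointwise = conv_params(c_in, c_out, 1)              # 1x1 conv mixes the channels
    return depthwise + pointwise


c = 128
std = standard_conv(c, c)
sep = depthwise_separable(c, c)
print(std, sep, f"{std / sep:.1f}x fewer weights")
```

For a 3×3 kernel the ratio approaches 9 as the channel count grows, because the 1×1 pointwise term dominates.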
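2.2 How It Works
- Feature extraction: the 1×1 expansion convolution first widens the channels so the network has more dimensions in which to learn features. The depthwise convolution then extracts spatial features, and the pointwise convolution fuses features across channels.
- Attention: the SE module reweights the channel features so that the network automatically emphasizes the more important channels and suppresses the less important ones, improving its representational power.
- Feature fusion and output: the final 1×1 projection convolution fuses and compresses the features to produce the output feature map. When a shortcut is present, the input is added to the output, which helps the network learn the mapping between input and output.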
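The channel reweighting performed by the SE module can be sketched in plain Python. This is a minimal illustration with hand-picked weights and no biases, not the SeBlock used later in the article; se_gate and the toy values are assumptions made for the example.

```python
import math


def se_gate(feature_map, w1, w2):
    """feature_map: list of channels, each a flat list of spatial values.
    w1 (C/r x C) and w2 (C x C/r) are weight matrices as nested lists; biases are omitted."""
    # Squeeze: global average pooling gives one scalar per channel
    squeezed = [sum(ch) / len(ch) for ch in feature_map]
    # Excitation: FC -> ReLU -> FC -> sigmoid
    hidden = [max(0.0, sum(w * s for w, s in zip(row, squeezed))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden)))) for row in w2]
    # Scale: every value in a channel is multiplied by that channel's gate in (0, 1)
    scaled = [[g * v for v in ch] for g, ch in zip(gates, feature_map)]
    return gates, scaled


fmap = [[1.0, 3.0], [0.0, 0.0], [2.0, 2.0], [4.0, 0.0]]  # 4 channels, 2 pixels each
w1 = [[0.5, 0.0, 0.5, 0.0]]                               # reduce 4 -> 1
w2 = [[1.0], [-1.0], [0.0], [2.0]]                        # expand 1 -> 4
gates, scaled = se_gate(fmap, w1, w2)
print([round(g, 3) for g in gates])
```

Channels with a larger gate pass through almost unchanged, while channels with a small gate are suppressed.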
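2.3 Advantages
- Efficient computation: depthwise separable convolution greatly reduces computation. Compared with standard convolution it lowers the cost while keeping good feature extraction ability, which suits mobile devices and other resource-constrained settings.
- Strong feature representation: the inverted bottleneck (expand first, then project) combined with the SE attention mechanism extracts and uses features more effectively, improving accuracy and generalization.
- Lightweight model: fewer parameters lower the storage requirement and the risk of overfitting, making the model lighter and easier to deploy.
Paper: https://arxiv.org/pdf/1905.11946
Source code: https://github.com/tensorflow/tpu/tree/master/models/official/efficientnet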
3. Implementation Code of the Lightweight Improvement
The implementation is as follows:
import math
import torch
import torch.nn as nn
from ultralytics.nn.modules.conv import DWConv
from ultralytics.utils.tal import dist2bbox, make_anchors
from ultralytics.utils.torch_utils import fuse_conv_and_bn


class SeBlock(nn.Module):
    """Squeeze-and-Excitation block that reweights channels with a learned gate."""

    def __init__(self, in_channel, reduction=4):
        super().__init__()
        self.Squeeze = nn.AdaptiveAvgPool2d(1)
        self.Excitation = nn.Sequential()
        # A 1x1 convolution on a 1x1 feature map is equivalent to a fully connected layer
        self.Excitation.add_module('FC1', nn.Conv2d(in_channel, in_channel // reduction, kernel_size=1))
        self.Excitation.add_module('ReLU', nn.ReLU())
        self.Excitation.add_module('FC2', nn.Conv2d(in_channel // reduction, in_channel, kernel_size=1))
        self.Excitation.add_module('Sigmoid', nn.Sigmoid())

    def forward(self, x):
        y = self.Squeeze(x)
        output = self.Excitation(y)
        return x * output.expand_as(x)
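

class drop_connect:
    """Drop connect (stochastic depth): randomly zeroes the branch per sample during training."""

    def __init__(self, drop_connect_rate):
        self.drop_connect_rate = drop_connect_rate

    def __call__(self, x, training):
        # Make the object callable, since MBConvBlock invokes it as self.drop_connect(...)
        return self.forward(x, training)

    def forward(self, x, training):
        if not training:
            return x
        keep_prob = 1.0 - self.drop_connect_rate
        batch_size = x.shape[0]
        random_tensor = keep_prob
        random_tensor += torch.rand([batch_size, 1, 1, 1], dtype=x.dtype, device=x.device)
        binary_mask = torch.floor(random_tensor)  # 1 with probability keep_prob, otherwise 0
        x = (x / keep_prob) * binary_mask
        return x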
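

class stem(nn.Module):
    """Stem layer: stride-2 3x3 convolution, batch normalization, activation."""

    def __init__(self, c1, c2, act='ReLU6'):
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, kernel_size=3, stride=2, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(num_features=c2)
        if act == 'ReLU6':
            self.act = nn.ReLU6(inplace=True)
        else:
            # Fall back to identity so that self.act is always defined
            self.act = nn.Identity()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))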


class MBConvBlock(nn.Module):
    """MBConv block: optional 1x1 expansion, depthwise conv, optional SE, 1x1 projection."""

    def __init__(self, inp, final_oup, k=3, s=1, expand_ratio=1, drop_connect_rate=0.057, has_se=False):
        super(MBConvBlock, self).__init__()
        self._momentum = 0.01
        self._epsilon = 1e-3
        self.input_filters = inp
        self.output_filters = final_oup
        self.stride = s
        self.expand_ratio = expand_ratio
        self.has_se = has_se
        self.id_skip = True  # skip connection and drop connect

        # Expansion phase
        oup = inp * expand_ratio  # number of output channels
        if expand_ratio != 1:
            self._expand_conv = nn.Conv2d(in_channels=inp, out_channels=oup, kernel_size=1, bias=False)
            self._bn0 = nn.BatchNorm2d(num_features=oup, momentum=self._momentum, eps=self._epsilon)

        # Depthwise convolution phase
        self._depthwise_conv = nn.Conv2d(
            in_channels=oup, out_channels=oup, groups=oup,  # groups makes it depthwise
            kernel_size=k, padding=(k - 1) // 2, stride=s, bias=False)
        self._bn1 = nn.BatchNorm2d(num_features=oup, momentum=self._momentum, eps=self._epsilon)

        # Squeeze and Excitation layer, if desired (reduction=4, i.e. an SE ratio of 0.25)
        if self.has_se:
            self.se = SeBlock(oup, 4)

        # Output phase
        self._project_conv = nn.Conv2d(in_channels=oup, out_channels=final_oup, kernel_size=1, bias=False)
        self._bn2 = nn.BatchNorm2d(num_features=final_oup, momentum=self._momentum, eps=self._epsilon)
        self._relu = nn.ReLU6(inplace=True)
        self.drop_connect = drop_connect(drop_connect_rate)

    def forward(self, x, drop_connect_rate=None):
        """
        :param x: input tensor
        :param drop_connect_rate: drop connect rate (float, between 0 and 1)
        :return: output of block
        """
        # Expansion and Depthwise Convolution
        identity = x
        if self.expand_ratio != 1:
            x = self._relu(self._bn0(self._expand_conv(x)))
        x = self._relu(self._bn1(self._depthwise_conv(x)))

        # Squeeze and Excitation
        if self.has_se:
            x = self.se(x)
        x = self._bn2(self._project_conv(x))

        # Skip connection and drop connect
        if self.id_skip and self.stride == 1 and self.input_filters == self.output_filters:
            if drop_connect_rate:
                x = self.drop_connect(x, training=self.training)
            x += identity  # skip connection
        return x


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only, for use after batch normalization has been fused."""
        return self.act(self.conv(x))


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Apply the two convolutions and add the input when the shortcut is enabled."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 3 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k2_MBConv(C2f):
    """C3k2 variant whose inner blocks are MBConvBlock layers (or C3k blocks when c3k is True)."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else MBConvBlock(self.c, self.c) for _ in range(n)
        )


class A2C2f_MBConv(nn.Module):
    """
    A2C2f-style module in which the area-attention blocks are replaced by MBConv blocks.

    This module follows the A2C2f layout. When a2 is True, each stage is a pair of MBConvBlock layers;
    otherwise a C3k block is used.

    Attributes:
        cv1 (Conv): Initial 1x1 convolution layer that reduces input channels to hidden channels.
        cv2 (Conv): Final 1x1 convolution layer that processes concatenated features.
        gamma (nn.Parameter | None): Learnable parameter for residual scaling when a2 and residual are both set.
        m (nn.ModuleList): List of either MBConvBlock pairs or C3k modules for feature processing.

    Methods:
        forward: Processes input through the MBConv or C3k pathway.

    Examples:
        >>> m = A2C2f_MBConv(512, 512, n=1, a2=True, area=1)
        >>> x = torch.randn(1, 512, 32, 32)
        >>> output = m(x)
        >>> print(output.shape)
        torch.Size([1, 512, 32, 32])
    """

    def __init__(self, c1, c2, n=1, a2=True, area=1, residual=False, mlp_ratio=2.0, e=0.5, g=1, shortcut=True):
        """
        Initialize the A2C2f_MBConv module.

        Args:
            c1 (int): Number of input channels.
            c2 (int): Number of output channels.
            n (int): Number of MBConvBlock pairs or C3k modules to stack.
            a2 (bool): Whether to use MBConvBlock pairs. If False, uses C3k blocks instead.
            area (int): Kept for signature compatibility with A2C2f; unused here.
            residual (bool): Whether to use residual connections with learnable gamma parameter.
            mlp_ratio (float): Kept for signature compatibility with A2C2f; unused here.
            e (float): Channel expansion ratio for hidden channels.
            g (int): Number of groups for grouped convolutions.
            shortcut (bool): Whether to use shortcut connections in C3k blocks.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        assert c_ % 32 == 0, "Hidden channels must be a multiple of 32."
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv((1 + n) * c_, c2, 1)
        self.gamma = nn.Parameter(0.01 * torch.ones(c2), requires_grad=True) if a2 and residual else None
        self.m = nn.ModuleList(
            nn.Sequential(*(MBConvBlock(c_, c_) for _ in range(2)))
            if a2
            else C3k(c_, c_, 2, shortcut, g)
            for _ in range(n)
        )

    def forward(self, x):
        """
        Forward pass through the A2C2f_MBConv layer.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after processing.
        """
        y = [self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv2(torch.cat(y, 1))
        if self.gamma is not None:
            return x + self.gamma.view(-1, len(self.gamma), 1, 1) * y
        return y


class C2f_MBConv(nn.Module):
    """C2f variant whose inner Bottleneck blocks are replaced by MBConvBlock layers."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n MBConvBlock layers (shortcut and g are unused)."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(MBConvBlock(self.c, self.c) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class RepVGGDW(torch.nn.Module):
    """RepVGGDW is a class that represents a depth wise separable convolutional block in RepVGG architecture."""

    def __init__(self, ed) -> None:
        """Initializes RepVGGDW with depthwise separable convolutional layers for efficient processing."""
        super().__init__()
        self.conv = Conv(ed, ed, 7, 1, 3, g=ed, act=False)
        self.conv1 = Conv(ed, ed, 3, 1, 1, g=ed, act=False)
        self.dim = ed
        self.act = nn.SiLU()

    def forward(self, x):
        """
        Performs a forward pass of the RepVGGDW block.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after applying the depth wise separable convolution.
        """
        return self.act(self.conv(x) + self.conv1(x))

    def forward_fuse(self, x):
        """
        Performs a forward pass of the RepVGGDW block after the two branches have been fused.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor after applying the depth wise separable convolution.
        """
        return self.act(self.conv(x))

    @torch.no_grad()
    def fuse(self):
        """
        Fuses the convolutional layers in the RepVGGDW block.

        This method fuses the convolutional layers and updates the weights and biases accordingly.
        """
        conv = fuse_conv_and_bn(self.conv.conv, self.conv.bn)
        conv1 = fuse_conv_and_bn(self.conv1.conv, self.conv1.bn)

        conv_w = conv.weight
        conv_b = conv.bias
        conv1_w = conv1.weight
        conv1_b = conv1.bias

        # Zero-pad the 3x3 kernel to 7x7 so the two kernels can be summed
        conv1_w = torch.nn.functional.pad(conv1_w, [2, 2, 2, 2])

        final_conv_w = conv_w + conv1_w
        final_conv_b = conv_b + conv1_b

        conv.weight.data.copy_(final_conv_w)
        conv.bias.data.copy_(final_conv_b)

        self.conv = conv
        del self.conv1


class CIB(nn.Module):
    """
    Compact Inverted Block (CIB) module.

    Args:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        shortcut (bool, optional): Whether to add a shortcut connection. Defaults to True.
        e (float, optional): Scaling factor for the hidden channels. Defaults to 0.5.
        lk (bool, optional): Whether to use RepVGGDW for the third convolutional layer. Defaults to False.
    """

    def __init__(self, c1, c2, shortcut=True, e=0.5, lk=False):
        """Initializes the custom model with optional shortcut, scaling factor, and RepVGGDW layer."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = nn.Sequential(
            Conv(c1, c1, 3, g=c1),
            Conv(c1, 2 * c_, 1),
            RepVGGDW(2 * c_) if lk else Conv(2 * c_, 2 * c_, 3, g=2 * c_),
            Conv(2 * c_, c2, 1),
            Conv(c2, c2, 3, g=c2),
        )
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """
        Forward pass of the CIB module.

        Args:
            x (torch.Tensor): Input tensor.

        Returns:
            (torch.Tensor): Output tensor.
        """
        return x + self.cv1(x) if self.add else self.cv1(x)


class C2fCIB_MBConv(C2f_MBConv):
    """
    C2fCIB variant built on C2f_MBConv; the inner blocks in self.m are CIB modules.

    Args:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        n (int, optional): Number of CIB modules to stack. Defaults to 1.
        shortcut (bool, optional): Whether to use shortcut connection. Defaults to False.
        lk (bool, optional): Whether to use the large-kernel RepVGGDW layer inside CIB. Defaults to False.
        g (int, optional): Number of groups for grouped convolution. Defaults to 1.
        e (float, optional): Expansion ratio for CIB modules. Defaults to 0.5.
    """

    def __init__(self, c1, c2, n=1, shortcut=False, lk=False, g=1, e=0.5):
        """Initializes the module with specified parameters for channel, shortcut, large kernel, groups, and expansion."""
        super().__init__(c1, c2, n, shortcut, g, e)
        # Replaces the MBConvBlock list created by the parent class with CIB blocks
        self.m = nn.ModuleList(CIB(self.c, self.c, shortcut, e=1.0, lk=lk) for _ in range(n))
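The mask trick used by the drop_connect helper in the listing above, floor(keep_prob + U[0, 1)), can be checked without PyTorch. The sketch below is a scalar re-implementation for illustration only; drop_connect_sample and the rate of 0.2 are assumptions made for the example.

```python
import math
import random


def drop_connect_sample(value, drop_rate, rng):
    """Per-sample drop connect: zero the value with probability drop_rate, rescale otherwise."""
    keep_prob = 1.0 - drop_rate
    # floor(keep_prob + U[0, 1)) is 1 with probability keep_prob, otherwise 0
    mask = math.floor(keep_prob + rng.random())
    return value / keep_prob * mask


rng = random.Random(0)
drop_rate = 0.2
samples = [drop_connect_sample(1.0, drop_rate, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
kept = sum(1 for s in samples if s > 0) / len(samples)
print(f"kept fraction ~ {kept:.3f}, mean ~ {mean:.3f}")
```

Dividing by keep_prob keeps the expected value of the branch unchanged, which is why no rescaling is needed at inference time.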
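4. Improvements
The C2f module in YOLOv10 is improved by adding MBConv to it (Section 5 explains how to add it to the project).
The modified code is shown below: the C2f module now uses MBConvBlock in place of the original Bottleneck.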
class C2f_MBConv(nn.Module):
    """C2f variant whose inner Bottleneck blocks are replaced by MBConvBlock layers."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n MBConvBlock layers (shortcut and g are unused)."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(MBConvBlock(self.c, self.c) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))
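The channel bookkeeping inside C2f_MBConv explains why cv2 takes (2 + n) * c input channels. The following sketch traces the widths step by step; c2f_channel_flow is a hypothetical helper written for this illustration and is not part of the module code.

```python
def c2f_channel_flow(c1, c2, n=1, e=0.5):
    """Trace channel counts through a C2f-style block (illustrative helper)."""
    c = int(c2 * e)              # hidden channels per branch
    cv1_out = 2 * c              # cv1 output, later chunked into two halves of c channels
    branches = [c, c]            # the two halves produced by chunk(2, 1)
    for _ in range(n):
        branches.append(c)       # each MBConvBlock(c, c) output is appended to the list
    concat = sum(branches)       # equals (2 + n) * c, the input width of cv2
    return {"hidden": c, "cv1_out": cv1_out, "concat": concat, "out": c2}


flow = c2f_channel_flow(64, 64, n=1)
print(flow)
```

Because every inner block maps c channels to c channels at stride 1, the shortcut condition inside MBConvBlock is always satisfied in this module.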
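The C2fCIB module in YOLOv10 is improved in the same way, by combining it with MBConv (Section 5 explains how to add it to the project).
The modified code is shown below: C2fCIB_MBConv inherits from C2f_MBConv. Note that, as written, its constructor then replaces self.m with CIB blocks.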
class C2fCIB_MBConv(C2f_MBConv):
    """
    C2fCIB variant built on C2f_MBConv; the inner blocks in self.m are CIB modules.

    Args:
        c1 (int): Number of input channels.
        c2 (int): Number of output channels.
        n (int, optional): Number of CIB modules to stack. Defaults to 1.
        shortcut (bool, optional): Whether to use shortcut connection. Defaults to False.
        lk (bool, optional): Whether to use the large-kernel RepVGGDW layer inside CIB. Defaults to False.
        g (int, optional): Number of groups for grouped convolution. Defaults to 1.
        e (float, optional): Expansion ratio for CIB modules. Defaults to 0.5.
    """

    def __init__(self, c1, c2, n=1, shortcut=False, lk=False, g=1, e=0.5):
        """Initializes the module with specified parameters for channel, shortcut, large kernel, groups, and expansion."""
        super().__init__(c1, c2, n, shortcut, g, e)
        # Replaces the MBConvBlock list created by the parent class with CIB blocks
        self.m = nn.ModuleList(CIB(self.c, self.c, shortcut, e=1.0, lk=lk) for _ in range(n))
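5. Configuration Steps
5.1 Change 1
① Under ultralytics/nn/, create an AddModules folder to hold the module code (the provided project package already contains an AddModules folder).
② In the AddModules folder, create MBConv.py and paste the code from Section 3 into it.
5.2 Change 2
In the AddModules folder, create __init__.py (already present in the provided project package) and import the module in that file: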
from .MBConv import *
5.3 Change 3
In ultralytics/nn/tasks.py, the module class names need to be added in two places.
First, import the modules.
Second, register the C2f_MBConv and C2fCIB_MBConv modules in the parse_model function.
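6. YAML Model Files
Note: the project package already has the multimodal input and training changes configured. You only need to create a new model YAML file, paste in the corresponding model, and train. For how to obtain and use the project package, see the configuration and usage tutorial for the "Multimodal Project for YOLO-Series Models".
Where you create the file and which scale you use (n, s, m, l, x) is up to you; the procedure is the same as for ordinary training.
Besides the structural changes, the YAML file also passes in a channel count, ch: 6, which indicates a dual-modal, 6-channel input: the first three channels are visible light and the last three are infrared. This parameter is also set in default.yaml.
Below, C2f_MBConv and C2fCIB_MBConv are added to the early, mid, mid-to-late, and late fusion models.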
6.1 Early fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv10 object detection model. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov10n.yaml' will call yolov10.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, MF, [64]] # 0
- [-1, 1, Conv, [64, 3, 2]] # 1-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 2-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 4-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 6-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 8-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 10
- [-1, 1, PSA, [1024]] # 11
# YOLOv10.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 7], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f_MBConv, [512]] # 14
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 5], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f_MBConv, [256]] # 17 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat layer 12 (upsampled P5 features at P4 resolution), as in the logged run
- [-1, 3, C2f_MBConv, [512]] # 20 (P4/16-medium)
- [-1, 1, SCDown, [512, 3, 2]]
- [[-1, 11], 1, Concat, [1]] # cat head P5
- [-1, 3, C2fCIB_MBConv, [1024, True, True]] # 23 (P5/32-large)
- [[17, 20, 23], 1, v10Detect, [nc]] # Detect(P3, P4, P5)
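The scale constants [depth, width, max_channels] determine how the repeat counts and channel numbers written in the YAML are shrunk for the n model. The sketch below is a simplified illustration of that scaling, assuming the usual rule of rounding repeats and rounding channels up to a multiple of 8; scale_layer is a hypothetical helper, not the project's parse_model.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def scale_layer(repeats, channels, depth, width, max_channels):
    """Apply [depth, width, max_channels] to one YAML row (simplified sketch)."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c = make_divisible(min(channels, max_channels) * width, 8)
    return n, c


depth, width, max_channels = 0.33, 0.25, 1024  # scale "n" from the YAML above
for repeats, channels in [(3, 128), (6, 256), (6, 512), (3, 1024)]:
    print((repeats, channels), "->", scale_layer(repeats, channels, depth, width, max_channels))
```

These values agree with the run log in Section 7, where the backbone C2f_MBConv layers appear with 32, 64, 128, and 256 output channels and repeat counts of 1, 2, 2, and 1.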
6.2 Mid fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv10 object detection model. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov10n.yaml' will call yolov10.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, Conv, [64, 3, 2]] # 3-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 8-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 10-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [2, 1, Conv, [64, 3, 2]] # 12-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 13-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 15-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 17-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 19-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [[7, 16], 1, Concat, [1]] # 21 cat backbone P3
- [[9, 18], 1, Concat, [1]] # 22 cat backbone P4
- [[11, 20], 1, Concat, [1]] # 23 cat backbone P5
- [-1, 1, SPPF, [1024, 5]] # 24
- [-1, 1, PSA, [1024]] # 25
# YOLOv10.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 22], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f_MBConv, [512]] # 28
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 21], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f_MBConv, [256]] # 31 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 28], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f_MBConv, [512]] # 34 (P4/16-medium)
- [-1, 1, SCDown, [512, 3, 2]]
- [[-1, 25], 1, Concat, [1]] # cat head P5
- [-1, 3, C2fCIB_MBConv, [1024, True, True]] # 37 (P5/32-large)
- [[31, 34, 37], 1, v10Detect, [nc]] # Detect(P3, P4, P5)
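In the mid fusion model, the two backbones are merged by Concat at P3, P4, and P5, so the channel count doubles at each fusion point. A minimal sketch of that arithmetic follows; the per-layer channel numbers are the scale-n values that appear in the run log in Section 7, and concat_channels is an illustrative helper.

```python
def concat_channels(layer_channels, sources):
    """Channel count after concatenating the listed layers along the channel axis."""
    return sum(layer_channels[i] for i in sources)


# Output channels of the fused backbone layers at scale n (from the run log)
layer_channels = {7: 64, 16: 64, 9: 128, 18: 128, 11: 256, 20: 256}

fused = {
    21: concat_channels(layer_channels, [7, 16]),   # P3
    22: concat_channels(layer_channels, [9, 18]),   # P4
    23: concat_channels(layer_channels, [11, 20]),  # P5
}
print(fused)
```

The 512 channels at layer 23 match the SPPF input width [512, 256, 5] reported for layer 24 in the log.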
6.3 Mid-to-late fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv10 object detection model. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov10n.yaml' will call yolov10.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, Conv, [64, 3, 2]] # 3-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 8-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 10-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 12
- [-1, 1, PSA, [1024]] # 13
- [2, 1, Conv, [64, 3, 2]] # 14-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 15-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 17-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 19-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 21-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 23
- [-1, 1, PSA, [1024]] # 24
# YOLOv10.0n head
head:
- [13, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 9], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f_MBConv, [512]] # 27
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 7], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f_MBConv, [256]] # 30 (P3/8-small)
- [24, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 20], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f_MBConv, [512]] # 33
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 18], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f_MBConv, [256]] # 36 (P3/8-small)
- [[13, 24], 1, Concat, [1]] # 37 cat P5 of both branches
- [[27, 33], 1, Concat, [1]] # 38 cat P4 of both branches
- [[30, 36], 1, Concat, [1]] # 39 cat P3 of both branches
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 38], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f_MBConv, [512]] # 42 (P4/16-medium)
- [-1, 1, SCDown, [512, 3, 2]]
- [[-1, 37], 1, Concat, [1]] # cat head P5
- [-1, 3, C2fCIB_MBConv, [1024, True, True]] # 45 (P5/32-large)
- [[39, 42, 45], 1, v10Detect, [nc]] # Detect(P3, P4, P5)
6.4 Late fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv10 object detection model. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov10n.yaml' will call yolov10.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, Conv, [64, 3, 2]] # 3-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 8-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 10-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 12
- [-1, 1, PSA, [1024]] # 13
- [2, 1, Conv, [64, 3, 2]] # 14-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 15-P2/4
- [-1, 3, C2f_MBConv, [128, True]]
- [-1, 1, Conv, [256, 3, 2]] # 17-P3/8
- [-1, 6, C2f_MBConv, [256, True]]
- [-1, 1, SCDown, [512, 3, 2]] # 19-P4/16
- [-1, 6, C2f_MBConv, [512, True]]
- [-1, 1, SCDown, [1024, 3, 2]] # 21-P5/32
- [-1, 3, C2f_MBConv, [1024, True]]
- [-1, 1, SPPF, [1024, 5]] # 23
- [-1, 1, PSA, [1024]] # 24
# YOLOv10.0n head
head:
- [13, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 9], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f_MBConv, [512]] # 27
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 7], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f_MBConv, [256]] # 30 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 27], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f_MBConv, [512]] # 33 (P4/16-medium)
- [-1, 1, SCDown, [512, 3, 2]]
- [[-1, 13], 1, Concat, [1]] # cat head P5
- [-1, 3, C2fCIB_MBConv, [1024, True, True]] # 36 (P5/32-large)
- [24, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 20], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f_MBConv, [512]] # 39
- [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- [[-1, 18], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f_MBConv, [256]] # 42 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 39], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f_MBConv, [512]] # 45 (P4/16-medium)
- [-1, 1, SCDown, [512, 3, 2]]
- [[-1, 24], 1, Concat, [1]] # cat head P5
- [-1, 3, C2fCIB_MBConv, [1024, True, True]] # 48 (P5/32-large)
- [[30, 42], 1, Concat, [1]] # 49 cat backbone P3
- [[33, 45], 1, Concat, [1]] # 50 cat backbone P4
- [[36, 48], 1, Concat, [1]] # 51 cat backbone P5
- [[49, 50, 51], 1, v10Detect, [nc]] # Detect(P3, P4, P5)
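7. Successful Run Results
Early fusion result: the input has 6 channels, which shows that both the visible-light and the infrared images are fed into the model for fusion training.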
YOLOv10n-early-MBConv summary: 401 layers, 2,145,662 parameters, 2,145,646 gradients, 7.4 GFLOPs
from n params module arguments
0 -1 1 472 ultralytics.nn.AddModules.multimodal.MF [6, 16]
1 -1 1 2336 ultralytics.nn.modules.conv.Conv [16, 16, 3, 2]
2 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
3 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
4 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
5 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
6 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
7 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
8 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
9 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
10 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
11 -1 1 249728 ultralytics.nn.modules.block.PSA [256, 256]
12 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
13 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
14 -1 1 79168 ultralytics.nn.AddModules.MBConv.C2f_MBConv [384, 128]
15 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
16 [-1, 5] 1 0 ultralytics.nn.modules.conv.Concat [1]
17 -1 1 20128 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 64]
18 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
19 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
20 -1 1 70976 ultralytics.nn.AddModules.MBConv.C2f_MBConv [320, 128]
21 -1 1 18048 ultralytics.nn.modules.block.SCDown [128, 128, 3, 2]
22 [-1, 11] 1 0 ultralytics.nn.modules.conv.Concat [1]
23 -1 1 269568 ultralytics.nn.AddModules.MBConv.C2fCIB_MBConv[384, 256, True, True]
24 [17, 20, 23] 1 861718 ultralytics.nn.modules.head.v10Detect [1, [64, 128, 256]]
YOLOv10n-early-MBConv summary: 401 layers, 2,145,662 parameters, 2,145,646 gradients, 7.4 GFLOPs
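The per-layer parameter counts in the log above can be reproduced by hand from the module definitions. The sketch below counts bias-free convolution weights plus the BatchNorm weight and bias; it assumes each logged C2f_MBConv holds a single MBConvBlock (expand_ratio=1, no SE) and that a repeat count of 2 stacks two such modules, which is what the logged numbers are consistent with. The helper names are illustrative.

```python
def conv_bn(c_in, c_out, k=1, groups=1):
    """Parameters of a bias-free conv followed by BatchNorm (weight + bias)."""
    return (c_in // groups) * c_out * k * k + 2 * c_out


def mbconv_params(c):
    """MBConvBlock(c, c) with expand_ratio=1 and no SE: depthwise 3x3 plus 1x1 projection."""
    return conv_bn(c, c, k=3, groups=c) + conv_bn(c, c, k=1)


def c2f_mbconv_params(c1, c2, n=1, e=0.5):
    c = int(c2 * e)
    cv1 = conv_bn(c1, 2 * c)
    cv2 = conv_bn((2 + n) * c, c2)
    return cv1 + cv2 + n * mbconv_params(c)


print(c2f_mbconv_params(32, 32))      # layer 3 in the log: 3152
print(2 * c2f_mbconv_params(64, 64))  # layer 5 (two stacked modules): 23872
print(c2f_mbconv_params(384, 128))    # layer 14: 79168
```

The same calculation gives 182912 for layer 9, so the MBConv branch itself accounts for only a small share of each module; most parameters sit in the two 1×1 convolutions cv1 and cv2.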
Mid fusion result:
YOLOv10n-mid-MBConv summary: 505 layers, 2,749,430 parameters, 2,749,414 gradients, 8.7 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
4 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
5 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
6 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
7 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
8 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
9 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
10 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
11 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
12 2 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
13 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
14 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
15 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
16 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
17 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
18 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
19 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
20 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
21 [7, 16] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 [9, 18] 1 0 ultralytics.nn.modules.conv.Concat [1]
23 [11, 20] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 1 394240 ultralytics.nn.modules.block.SPPF [512, 256, 5]
25 -1 1 249728 ultralytics.nn.modules.block.PSA [256, 256]
26 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
27 [-1, 22] 1 0 ultralytics.nn.modules.conv.Concat [1]
28 -1 1 95552 ultralytics.nn.AddModules.MBConv.C2f_MBConv [512, 128]
29 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
30 [-1, 21] 1 0 ultralytics.nn.modules.conv.Concat [1]
31 -1 1 24224 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 64]
32 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
33 [-1, 28] 1 0 ultralytics.nn.modules.conv.Concat [1]
34 -1 1 54592 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 128]
35 -1 1 18048 ultralytics.nn.modules.block.SCDown [128, 128, 3, 2]
36 [-1, 25] 1 0 ultralytics.nn.modules.conv.Concat [1]
37 -1 1 269568 ultralytics.nn.AddModules.MBConv.C2fCIB_MBConv[384, 256, True, True]
38 [31, 34, 37] 1 861718 ultralytics.nn.modules.head.v10Detect [1, [64, 128, 256]]
YOLOv10n-mid-MBConv summary: 505 layers, 2,749,430 parameters, 2,749,414 gradients, 8.7 GFLOPs
Mid-to-late fusion result:
YOLOv10n-mid-to-late-MBConv summary: 573 layers, 3,360,470 parameters, 3,360,454 gradients, 11.3 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
4 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
5 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
6 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
7 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
8 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
9 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
10 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
11 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
12 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
13 -1 1 249728 ultralytics.nn.modules.block.PSA [256, 256]
14 2 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
15 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
16 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
17 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
18 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
19 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
20 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
21 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
22 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
23 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
24 -1 1 249728 ultralytics.nn.modules.block.PSA [256, 256]
25 13 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
26 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 1 79168 ultralytics.nn.AddModules.MBConv.C2f_MBConv [384, 128]
28 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
29 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
30 -1 1 20128 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 64]
31 24 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
32 [-1, 20] 1 0 ultralytics.nn.modules.conv.Concat [1]
33 -1 1 79168 ultralytics.nn.AddModules.MBConv.C2f_MBConv [384, 128]
34 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
35 [-1, 18] 1 0 ultralytics.nn.modules.conv.Concat [1]
36 -1 1 20128 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 64]
37 [13, 24] 1 0 ultralytics.nn.modules.conv.Concat [1]
38 [27, 33] 1 0 ultralytics.nn.modules.conv.Concat [1]
39 [30, 36] 1 0 ultralytics.nn.modules.conv.Concat [1]
40 -1 1 73856 ultralytics.nn.modules.conv.Conv [128, 64, 3, 2]
41 [-1, 38] 1 0 ultralytics.nn.modules.conv.Concat [1]
42 -1 1 70976 ultralytics.nn.AddModules.MBConv.C2f_MBConv [320, 128]
43 -1 1 18048 ultralytics.nn.modules.block.SCDown [128, 128, 3, 2]
44 [-1, 37] 1 0 ultralytics.nn.modules.conv.Concat [1]
45 -1 1 335104 ultralytics.nn.AddModules.MBConv.C2fCIB_MBConv [640, 256, True, True]
46 [39, 42, 45] 1 1090454 ultralytics.nn.modules.head.v10Detect [1, [128, 128, 256]]
YOLOv10n-mid-to-late-MBConv summary: 573 layers, 3,360,470 parameters, 3,360,454 gradients, 11.3 GFLOPs
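As a sanity check, the `params` column of the mid-to-late fusion table can be summed and compared with the total in the summary line. The sketch below copies the non-zero values from the table by hand; the grouping into "backbone branch", "neck per branch", and "shared neck" is only a reading aid chosen here, not terminology from the log.

```python
# Layers 3-13 (first modality); layers 14-24 (second modality) log identical values.
backbone_branch = [464, 4672, 3152, 18560, 23872, 9856,
                   92800, 36096, 182912, 164608, 249728]

# Layers 27 and 30 (first modality); layers 33 and 36 log identical values.
neck_per_branch = [79168, 20128]

# Layers 40, 42, 43, 45: the layers that run after the two branches are concatenated.
shared_neck = [73856, 70976, 18048, 335104]

# Layer 46: v10Detect.
head = 1090454

total = (2 * sum(backbone_branch)
         + 2 * sum(neck_per_branch)
         + sum(shared_neck)
         + head)

print(f"{total:,}")  # 3,360,470, matching the summary line
```

Written this way, the sum also shows that the duplicated backbone accounts for close to half of the parameters, which is why replacing the backbone blocks with a lighter module has a visible effect on the total.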
Late fusion results:
YOLOv10n-late-MBConv summary: 625 layers, 4,170,006 parameters, 4,169,990 gradients, 12.5 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
4 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
5 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
6 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
7 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
8 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
9 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
10 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
11 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
12 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
13 -1 1 249728 ultralytics.nn.modules.block.PSA [256, 256]
14 2 1 464 ultralytics.nn.modules.conv.Conv [3, 16, 3, 2]
15 -1 1 4672 ultralytics.nn.modules.conv.Conv [16, 32, 3, 2]
16 -1 1 3152 ultralytics.nn.AddModules.MBConv.C2f_MBConv [32, 32, True]
17 -1 1 18560 ultralytics.nn.modules.conv.Conv [32, 64, 3, 2]
18 -1 2 23872 ultralytics.nn.AddModules.MBConv.C2f_MBConv [64, 64, True]
19 -1 1 9856 ultralytics.nn.modules.block.SCDown [64, 128, 3, 2]
20 -1 2 92800 ultralytics.nn.AddModules.MBConv.C2f_MBConv [128, 128, True]
21 -1 1 36096 ultralytics.nn.modules.block.SCDown [128, 256, 3, 2]
22 -1 1 182912 ultralytics.nn.AddModules.MBConv.C2f_MBConv [256, 256, True]
23 -1 1 164608 ultralytics.nn.modules.block.SPPF [256, 256, 5]
24 -1 1 249728 ultralytics.nn.modules.block.PSA [256, 256]
25 13 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
26 [-1, 9] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 1 79168 ultralytics.nn.AddModules.MBConv.C2f_MBConv [384, 128]
28 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
29 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
30 -1 1 20128 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 64]
31 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
32 [-1, 27] 1 0 ultralytics.nn.modules.conv.Concat [1]
33 -1 1 54592 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 128]
34 -1 1 18048 ultralytics.nn.modules.block.SCDown [128, 128, 3, 2]
35 [-1, 13] 1 0 ultralytics.nn.modules.conv.Concat [1]
36 -1 1 269568 ultralytics.nn.AddModules.MBConv.C2fCIB_MBConv [384, 256, True, True]
37 24 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
38 [-1, 20] 1 0 ultralytics.nn.modules.conv.Concat [1]
39 -1 1 79168 ultralytics.nn.AddModules.MBConv.C2f_MBConv [384, 128]
40 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
41 [-1, 18] 1 0 ultralytics.nn.modules.conv.Concat [1]
42 -1 1 20128 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 64]
43 -1 1 36992 ultralytics.nn.modules.conv.Conv [64, 64, 3, 2]
44 [-1, 39] 1 0 ultralytics.nn.modules.conv.Concat [1]
45 -1 1 54592 ultralytics.nn.AddModules.MBConv.C2f_MBConv [192, 128]
46 -1 1 18048 ultralytics.nn.modules.block.SCDown [128, 128, 3, 2]
47 [-1, 24] 1 0 ultralytics.nn.modules.conv.Concat [1]
48 -1 1 269568 ultralytics.nn.AddModules.MBConv.C2fCIB_MBConv [384, 256, True, True]
49 [30, 42] 1 0 ultralytics.nn.modules.conv.Concat [1]
50 [33, 45] 1 0 ultralytics.nn.modules.conv.Concat [1]
51 [36, 48] 1 0 ultralytics.nn.modules.conv.Concat [1]
52 [49, 50, 51] 1 1639574 ultralytics.nn.modules.head.v10Detect [1, [128, 256, 512]]
YOLOv10n-late-MBConv summary: 625 layers, 4,170,006 parameters, 4,169,990 gradients, 12.5 GFLOPs
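The `from` column and the channel arguments can be used to confirm how the late fusion head is wired. The sketch below assumes two conventions: a negative `from` value is relative to the current layer while a non-negative one is an absolute layer index, and the second argument of each `C2f_MBConv` / `C2fCIB_MBConv` row is its output channel count. `resolve` is a hypothetical helper, not an Ultralytics function.

```python
def resolve(layer_index: int, source: int) -> int:
    """Convert a `from` entry to an absolute layer index."""
    return layer_index + source if source < 0 else source


# Output channels of the six neck outputs in the late fusion table,
# read from the second argument of rows 30, 33, 36 (first modality)
# and rows 42, 45, 48 (second modality).
out_channels = {30: 64, 33: 128, 36: 256, 42: 64, 45: 128, 48: 256}

# The three final Concat rows and their `from` lists.
concats = {49: [30, 42], 50: [33, 45], 51: [36, 48]}

# Concat along the channel dimension adds the channel counts of its inputs.
concat_channels = {
    idx: sum(out_channels[resolve(idx, src)] for src in sources)
    for idx, sources in concats.items()
}

head_inputs = [concat_channels[i] for i in (49, 50, 51)]
print(head_inputs)  # [128, 256, 512]

# Relative example from the table: layer 2 has from = -2, i.e. it reads layer 0 (IN).
assert resolve(2, -2) == 0
```

The result agrees with the `[128, 256, 512]` channel list logged for `v10Detect` in row 52. Each detection scale receives twice the channels of a single-branch model, which is consistent with the head being larger here (1,639,574 parameters) than in the mid-to-late variant (1,090,454).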