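[YOLOv13 Single-Modality Fusion Improvement] Dual-model fusion for ordinary datasets: complete configuration steps for mid-level and late fusion, plus further improvement options
Preface
Topic: single-modality fusion improvement for YOLOv13, i.e. dual-model fusion on an ordinary dataset (the two models are improved together).
Methods: mid-level fusion and late fusion.
Content: a detailed explanation of each fusion method, the complete configuration steps, and suggestions for further improvement. The aim is to raise accuracy by combining the strengths of more than one model.
1. Fusion Methods
The input is data from a single modality, so there is nothing to merge at the input and early fusion does not apply.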
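1.1 Mid-level fusion: method and structure diagram
Definition: the features of the two branches are fused at an intermediate layer of the network (between the backbone and the neck).
Implementation: the same input is passed through two independent backbones; their feature maps are merged with an Add operation and sent to a single neck.
Structure diagram: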
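1.2 Late fusion: method and structure diagram
Definition: the features of the two branches are fused at the output stage of the network (e.g. just before the detection head or classifier).
Implementation: the same input is passed through two independent backbone-plus-neck branches; their feature maps are merged with an Add operation and sent to the detection head.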
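The two schemes in this section differ only in where the Add sits. The following toy PyTorch sketch makes that concrete. It uses two-layer stand-ins for the real backbone and neck, and every class and function name in it (`tiny_backbone`, `tiny_neck`, `MidFusion`, `LateFusion`) is illustrative rather than part of YOLOv13 or Ultralytics.

```python
import torch
import torch.nn as nn


def tiny_backbone():
    # Stand-in for one YOLOv13 backbone: two stride-2 convolutions.
    return nn.Sequential(
        nn.Conv2d(3, 8, 3, stride=2, padding=1),
        nn.SiLU(),
        nn.Conv2d(8, 16, 3, stride=2, padding=1),
        nn.SiLU(),
    )


def tiny_neck():
    # Stand-in for a neck: keeps the spatial size and channel count.
    return nn.Conv2d(16, 16, 3, padding=1)


class MidFusion(nn.Module):
    """Two backbones, Add, then one shared neck."""

    def __init__(self):
        super().__init__()
        self.backbone_a, self.backbone_b = tiny_backbone(), tiny_backbone()
        self.neck = tiny_neck()

    def forward(self, x):
        fused = self.backbone_a(x) + self.backbone_b(x)  # fuse before the neck
        return self.neck(fused)


class LateFusion(nn.Module):
    """Two backbone+neck branches, Add just before the detection head."""

    def __init__(self):
        super().__init__()
        self.backbone_a, self.backbone_b = tiny_backbone(), tiny_backbone()
        self.neck_a, self.neck_b = tiny_neck(), tiny_neck()

    def forward(self, x):
        a = self.neck_a(self.backbone_a(x))
        b = self.neck_b(self.backbone_b(x))
        return a + b  # fuse after both necks


x = torch.randn(1, 3, 64, 64)  # the same single-modality image feeds both branches
mid_out = MidFusion()(x)
late_out = LateFusion()(x)
print(mid_out.shape, late_out.shape)
```

Both variants produce a tensor of the same shape for the head. Late fusion duplicates the neck as well as the backbone, so it carries more parameters than mid-level fusion.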
Structure diagram:
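2. Complete Configuration Steps
The configuration involves only a single modality, so it can be set up and run in the original project package; the multimodal project package I provide is not needed.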
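①: In ultralytics/nn/modules/block.py, add the following code, and add "IN" to __all__:
class IN(nn.Module):
    """Identity layer: returns its input unchanged so both backbones can read the image from layer 0."""

    def __init__(self):
        super().__init__()

    def forward(self, x):
        return x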
②: In ultralytics/nn/modules/__init__.py, add IN to the from .block import (...) list.
③: In ultralytics/nn/tasks.py, add IN to the from ultralytics.nn.modules import (...) list.
That completes the additions.
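The YAML files in the next section also use a layer named Add, which the steps above do not define. If your project does not already provide one, a minimal sketch could look like the following. It assumes that Add receives a list of same-shape tensors (one per index in the layer's `from` field) and sums them element-wise, and that the `[1]` argument in the YAML can be accepted and ignored. Both points are assumptions, not confirmed by the original steps.

```python
import torch
import torch.nn as nn


class Add(nn.Module):
    """Element-wise sum of a list of same-shape feature maps (assumed behaviour)."""

    def __init__(self, arg=None):
        super().__init__()
        self.arg = arg  # the YAML passes [1]; this sketch stores it but does not use it

    def forward(self, x):
        # x is a list of tensors, one per entry in the layer's `from` field
        return torch.stack(x, dim=0).sum(dim=0)


a = torch.ones(1, 4, 8, 8)
b = torch.full((1, 4, 8, 8), 2.0)
out = Add(1)([a, b])
print(out.shape)
```

Such a class would be registered the same way as IN (steps ① to ③). parse_model in ultralytics/nn/tasks.py would probably also need a branch that sets the output channel count of Add to that of its first input; that wiring depends on your Ultralytics version and is not shown here.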
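3. YAML Model Structure
Taking ultralytics/cfg/models/v13/yolov13.yaml as an example, create a dual-model fusion file in the same directory for training on your own dataset, paste in one of the model definitions below, and train with it.
3.1 Mid-level fusion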
ch: 3 # input channels of an ordinary RGB image; both backbones read the same input
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov13n.yaml' will call yolov13.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # Nano
  s: [0.50, 0.50, 1024] # Small
  l: [1.00, 1.00, 512] # Large
  x: [1.00, 1.50, 512] # Extra Large
backbone:
  # [from, repeats, module, args]
  - [-1, 1, IN, []] # 0
  - [0, 1, Conv, [64, 3, 2]] # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2, 1, 2]] # 2-P2/4
  - [-1, 2, DSC3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2, 1, 4]] # 4-P3/8
  - [-1, 2, DSC3k2, [512, True]]
  - [-1, 1, DSConv, [512, 3, 2]] # 6-P4/16
  - [-1, 4, A2C2f, [512, True, 4]]
  - [-1, 1, DSConv, [1024, 3, 2]] # 8-P5/32
  - [-1, 4, A2C2f, [1024, True, 1]] # 9
  - [0, 1, Conv, [64, 3, 2]] # 10-P1/2
  - [-1, 1, Conv, [128, 3, 2, 1, 2]] # 11-P2/4
  - [-1, 2, DSC3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2, 1, 4]] # 13-P3/8
  - [-1, 2, DSC3k2, [512, True]]
  - [-1, 1, DSConv, [512, 3, 2]] # 15-P4/16
  - [-1, 4, A2C2f, [512, True, 4]]
  - [-1, 1, DSConv, [1024, 3, 2]] # 17-P5/32
  - [-1, 4, A2C2f, [1024, True, 1]] # 18
  - [[5, 14], 1, Add, [1]] # 19 add backbone P3
  - [[7, 16], 1, Add, [1]] # 20 add backbone P4
  - [[9, 18], 1, Add, [1]] # 21 add backbone P5
head:
  - [[19, 20, 21], 2, HyperACE, [512, 8, True, True, 0.5, 1, "both"]]
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [22, 1, DownsampleConv, []]
  - [[20, 22], 1, FullPAD_Tunnel, []] # 25
  - [[19, 23], 1, FullPAD_Tunnel, []] # 26
  - [[21, 24], 1, FullPAD_Tunnel, []] # 27
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 25], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, DSC3k2, [512, True]] # 30
  - [[-1, 22], 1, FullPAD_Tunnel, []] # 31
  - [30, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 26], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, DSC3k2, [256, True]] # 34
  - [23, 1, Conv, [256, 1, 1]]
  - [[34, 35], 1, FullPAD_Tunnel, []] # 36
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 31], 1, Concat, [1]] # cat head P4
  - [-1, 2, DSC3k2, [512, True]] # 39
  - [[-1, 22], 1, FullPAD_Tunnel, []]
  - [39, 1, Conv, [512, 3, 2]]
  - [[-1, 27], 1, Concat, [1]] # cat head P5
  - [-1, 2, DSC3k2, [1024, True]] # 43 (P5/32-large)
  - [[-1, 24], 1, FullPAD_Tunnel, []]
  - [[36, 40, 44], 1, Detect, [nc]] # Detect(P3, P4, P5)
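Layers are numbered by their position in the combined backbone and head list, starting at 0, so the trailing index comments in a hand-edited config can easily drift away from the real positions. The short script below is one way to check the wiring before training. It is a sketch: `ROWS` is a hand-copied list of the (from, module) pairs from the mid-level fusion config above, and `resolve` is a hypothetical helper, not an Ultralytics function.

```python
# (from, module) pairs copied row by row from the mid-level fusion config.
ROWS = [
    (-1, "IN"),
    # first backbone
    (0, "Conv"), (-1, "Conv"), (-1, "DSC3k2"), (-1, "Conv"), (-1, "DSC3k2"),
    (-1, "DSConv"), (-1, "A2C2f"), (-1, "DSConv"), (-1, "A2C2f"),
    # second backbone
    (0, "Conv"), (-1, "Conv"), (-1, "DSC3k2"), (-1, "Conv"), (-1, "DSC3k2"),
    (-1, "DSConv"), (-1, "A2C2f"), (-1, "DSConv"), (-1, "A2C2f"),
    # fusion of P3, P4, P5
    ([5, 14], "Add"), ([7, 16], "Add"), ([9, 18], "Add"),
    # head
    ([19, 20, 21], "HyperACE"), (-1, "nn.Upsample"), (22, "DownsampleConv"),
    ([20, 22], "FullPAD_Tunnel"), ([19, 23], "FullPAD_Tunnel"),
    ([21, 24], "FullPAD_Tunnel"),
    (-1, "nn.Upsample"), ([-1, 25], "Concat"), (-1, "DSC3k2"),
    ([-1, 22], "FullPAD_Tunnel"),
    (30, "nn.Upsample"), ([-1, 26], "Concat"), (-1, "DSC3k2"), (23, "Conv"),
    ([34, 35], "FullPAD_Tunnel"),
    (-1, "Conv"), ([-1, 31], "Concat"), (-1, "DSC3k2"),
    ([-1, 22], "FullPAD_Tunnel"),
    (39, "Conv"), ([-1, 27], "Concat"), (-1, "DSC3k2"),
    ([-1, 24], "FullPAD_Tunnel"),
    ([36, 40, 44], "Detect"),
]


def resolve(rows):
    """Turn relative `from` entries (negative values) into absolute layer indices."""
    resolved = []
    for i, (src, module) in enumerate(rows):
        sources = src if isinstance(src, list) else [src]
        absolute = [s if s >= 0 else i + s for s in sources]
        for s in absolute:
            # Layer 0 reads the input image, so its -1 is left as -1.
            if i > 0 and not 0 <= s < i:
                raise ValueError(f"layer {i} ({module}) reads from invalid layer {s}")
        resolved.append((i, module, absolute))
    return resolved


layers = resolve(ROWS)
for i, module, sources in layers:
    print(f"{i:>2}  {module:<15} <- {sources}")

detect_inputs = layers[-1][2]
print("Detect reads from:", [(s, layers[s][1]) for s in detect_inputs])
```

Running it lists each layer with its true index and confirms that every `from` entry points at an earlier layer and that Detect reads from the three FullPAD_Tunnel outputs.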
3.2 Late fusion
ch: 3 # input channels of an ordinary RGB image; both backbones read the same input
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov13n.yaml' will call yolov13.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # Nano
  s: [0.50, 0.50, 1024] # Small
  l: [1.00, 1.00, 512] # Large
  x: [1.00, 1.50, 512] # Extra Large
backbone:
  # [from, repeats, module, args]
  - [-1, 1, IN, []] # 0
  - [0, 1, Conv, [64, 3, 2]] # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2, 1, 2]] # 2-P2/4
  - [-1, 2, DSC3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2, 1, 4]] # 4-P3/8
  - [-1, 2, DSC3k2, [512, True]]
  - [-1, 1, DSConv, [512, 3, 2]] # 6-P4/16
  - [-1, 4, A2C2f, [512, True, 4]]
  - [-1, 1, DSConv, [1024, 3, 2]] # 8-P5/32
  - [-1, 4, A2C2f, [1024, True, 1]] # 9
  - [0, 1, Conv, [64, 3, 2]] # 10-P1/2
  - [-1, 1, Conv, [128, 3, 2, 1, 2]] # 11-P2/4
  - [-1, 2, DSC3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2, 1, 4]] # 13-P3/8
  - [-1, 2, DSC3k2, [512, True]]
  - [-1, 1, DSConv, [512, 3, 2]] # 15-P4/16
  - [-1, 4, A2C2f, [512, True, 4]]
  - [-1, 1, DSConv, [1024, 3, 2]] # 17-P5/32
  - [-1, 4, A2C2f, [1024, True, 1]] # 18
head:
  - [[5, 7, 9], 2, HyperACE, [512, 8, True, True, 0.5, 1, "both"]]
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [19, 1, DownsampleConv, []]
  - [[7, 19], 1, FullPAD_Tunnel, []] # 22
  - [[5, 20], 1, FullPAD_Tunnel, []] # 23
  - [[9, 21], 1, FullPAD_Tunnel, []] # 24
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 22], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, DSC3k2, [512, True]] # 27
  - [[-1, 19], 1, FullPAD_Tunnel, []] # 28
  - [27, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 23], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, DSC3k2, [256, True]] # 31
  - [20, 1, Conv, [256, 1, 1]]
  - [[31, 32], 1, FullPAD_Tunnel, []] # 33
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 28], 1, Concat, [1]] # cat head P4
  - [-1, 2, DSC3k2, [512, True]] # 36
  - [[-1, 19], 1, FullPAD_Tunnel, []]
  - [36, 1, Conv, [512, 3, 2]]
  - [[-1, 24], 1, Concat, [1]] # cat head P5
  - [-1, 2, DSC3k2, [1024, True]] # 40 (P5/32-large)
  - [[-1, 21], 1, FullPAD_Tunnel, []]
  - [[14, 16, 18], 2, HyperACE, [512, 8, True, True, 0.5, 1, "both"]]
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [42, 1, DownsampleConv, []]
  - [[16, 42], 1, FullPAD_Tunnel, []] # 45
  - [[14, 43], 1, FullPAD_Tunnel, []] # 46
  - [[18, 44], 1, FullPAD_Tunnel, []] # 47
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 45], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, DSC3k2, [512, True]] # 50
  - [[-1, 42], 1, FullPAD_Tunnel, []] # 51
  - [50, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 46], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, DSC3k2, [256, True]] # 54
  - [43, 1, Conv, [256, 1, 1]]
  - [[54, 55], 1, FullPAD_Tunnel, []] # 56
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 51], 1, Concat, [1]] # cat head P4
  - [-1, 2, DSC3k2, [512, True]] # 59
  - [[-1, 42], 1, FullPAD_Tunnel, []]
  - [59, 1, Conv, [512, 3, 2]]
  - [[-1, 47], 1, Concat, [1]] # cat head P5
  - [-1, 2, DSC3k2, [1024, True]] # 63 (P5/32-large)
  - [[-1, 44], 1, FullPAD_Tunnel, []]
  - [[33, 56], 1, Add, [1]] # 65 add P3 outputs of both branches
  - [[37, 60], 1, Add, [1]] # 66 add P4 outputs of both branches
  - [[41, 64], 1, Add, [1]] # 67 add P5 outputs of both branches
  - [[65, 66, 67], 1, Detect, [nc]] # Detect(P3, P4, P5)
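4. Further Improvement Options
- Further improvements to the dual model work the same way as for an ordinary model. They mainly involve DSC3k2, A2C2f, the neck structure, upsampling and downsampling: modules can be added or swapped for others, and a different neck structure can be substituted before fusing.
- Other modules can also be added to either backbone. Note that the feature maps being fused must have the same size.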
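The size constraint in the last point can be checked on paper before touching the config. The sketch below assumes 3x3 stride-2 downsampling with padding of kernel // 2, as in the configs above. The helper names and the channel count of 128 are illustrative only.

```python
def conv_out(size, kernel=3, stride=2):
    """Output side length of a convolution padded with kernel // 2."""
    padding = kernel // 2
    return (size + 2 * padding - kernel) // stride + 1


def pyramid_sizes(input_size, num_downsamples):
    """Side length after each stride-2 stage, e.g. P1..P5 for five stages."""
    sizes, size = [], input_size
    for _ in range(num_downsamples):
        size = conv_out(size)
        sizes.append(size)
    return sizes


def check_add(shape_a, shape_b):
    """Raise if two (channels, height, width) shapes cannot be added element-wise."""
    if shape_a != shape_b:
        raise ValueError(f"cannot Add {shape_a} and {shape_b}")
    return shape_a


sizes = pyramid_sizes(640, 5)  # P1..P5 for a 640x640 input
p3 = sizes[2]

# Both branches unchanged: the P3 maps line up.
fused = check_add((128, p3, p3), (128, p3, p3))
print("P1..P5:", sizes, "fused P3 shape:", fused)

# One branch with an extra stride-2 module before P3: the sizes no longer match.
p3_extra = conv_out(p3)
try:
    check_add((128, p3, p3), (128, p3_extra, p3_extra))
except ValueError as err:
    print("mismatch:", err)
```

The same check applies to channel counts: a replacement module that changes the output width of one branch needs a matching change, or a 1x1 convolution, on the other branch before the Add.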