[YOLOv8 Multimodal Fusion Improvement] | Arxiv 2024 DEYOLO: a dual-enhancement mechanism and a bi-direction decoupled focus module build a complete framework for cross-modality feature fusion and single-modality optimization

1. Introduction

This article documents how the multimodal fusion modules from DEYOLO are used to improve the multimodal fusion part of YOLOv8.

DEYOLO designs a dual feature-enhancement mechanism for RGB-infrared multimodal object detection. Through DECA (the dual semantic enhancing channel weight assignment module), DEPA (the dual spatial enhancing pixel weight assignment module), and the Bi-direction Decoupled Focus module, it performs cross-modality feature fusion together with single-modality feature optimization, effectively addressing information complementarity and interference suppression in multimodal detection.



2. Introduction to the DEYOLO Modules

DEYOLO: Dual-Feature-Enhancement YOLO for Cross-Modality Object Detection

2.1 Dual semantic enhancing channel weight assignment module (DECA)

  1. Module structure

    • Input: the RGB feature map $F_{V_0}$ and the infrared feature map $F_{IR_0}$ extracted by the backbone (both in $\mathbb{R}^{b \times c \times h \times w}$).
    • Cross-modality mixing: $F_{V_0}$ and $F_{IR_0}$ are concatenated along the channel dimension and convolved to produce the mixed feature map $F_{Mix_0}$, filtering out redundant information.
    • Weight encoding
      • The cross-modality weight extraction operation (CMWE) compresses $F_{Mix_0}$ along the spatial dimensions into channel weights $W_{Mix_0} \in \mathbb{R}^{b \times c \times 1 \times 1}$, capturing cross-modality channel dependencies.
      • Channel weight extraction blocks (CWE) separately extract the single-modality channel weights $W_{V_0}$ and $W_{IR_0}$, in a mechanism similar to the SE module.
    • Bi-directional enhancement
      • First enhancement: the single-modality weights and the cross-modality weights are Softmax-normalized and multiplied element-wise, producing the enhanced weights $W_{enV_0}$ and $W_{enIR_0}$ that highlight channels where the modalities complement each other.
      • Second enhancement: each single-modality feature map is multiplied by its enhanced weights, fusing in the other modality's semantic information; the outputs $F_{V_1}$ and $F_{IR_1}$ are passed on to DEPA.
  2. Core advantages

    • Semantic enhancement: channel weight re-assignment reinforces cross-modality semantic complementarity (e.g., RGB texture with infrared structure) and suppresses inter-modality interference (e.g., infrared brightness masking RGB detail).
    • Bi-directional interaction: single-modality features refine the cross-modality fusion result, and the fusion result in turn feeds back into the single-modality features, forming a closed enhancement loop.
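
To make the two-stage weighting concrete, here is a minimal usage sketch of the DECA class from the Section 3 code (shapes are illustrative; the convolution pyramid inside DECA assumes the spatial size matches kernel_size):

import torch

# Paired RGB/IR feature maps: batch 2, 64 channels, 80x80 spatial size.
f_rgb = torch.randn(2, 64, 80, 80)
f_ir = torch.randn(2, 64, 80, 80)

deca = DECA(channel=64, kernel_size=80)
out_rgb, out_ir = deca([f_rgb, f_ir])
print(out_rgb.shape, out_ir.shape)  # both torch.Size([2, 64, 80, 80])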


2.2 Dual spatial enhancing pixel weight assignment module (DEPA)

  1. Module structure

    • Input: the feature maps $F_{V_1}$ and $F_{IR_1}$ output by DECA.
    • Cross-modality mixing: convolution transforms and element-wise multiplication produce the global mixed feature $W_{Mix_1}$, capturing cross-modality spatial dependencies.
    • Multi-scale spatial weight extraction
      • Convolution kernels of different sizes (e.g., 3×3 and 5×5) extract single-modality spatial features, which are concatenated and channel-compressed into the single-modality pixel weights $W_{V_1}$ and $W_{IR_1}$.
    • Bi-directional enhancement
      • First enhancement: the single-modality spatial weights are multiplied by the Softmax of the cross-modality mixed feature, producing enhanced spatial weights $W_{enV_1}$ and $W_{enIR_1}$ that highlight key pixel locations.
      • Second enhancement: each single-modality feature map is multiplied by its enhanced spatial weights, fusing in the other modality's structural information; the results are finally added element-wise to give the fused feature.
  2. Core advantages

    • Spatial alignment: multi-scale convolutions capture spatial structure at different granularities (e.g., edges, contours), easing the pixel-level alignment problem between infrared and RGB.
    • Interference suppression: the Softmax weights of the cross-modality mixed feature suppress conflicting regions (e.g., texture-less infrared regions corrupting RGB detail), improving feature consistency.
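
Continuing the sketch above, DECA's outputs feed directly into the DEPA class from Section 3, which reweights pixels rather than channels:

import torch

depa = DEPA(channel=64)
out_rgb2, out_ir2 = depa([out_rgb, out_ir])  # outputs of the DECA sketch above
print(out_rgb2.shape, out_ir2.shape)  # spatial reweighting preserves the shapes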

2.3 Bi-direction Decoupled Focus module

  1. Module structure

    • Design inspiration: based on YOLOv5's Focus module, it uses slice sampling along the horizontal and vertical directions to avoid information loss during downsampling.
    • Bi-directional sampling: the input feature map is split into two groups that are sampled at alternating positions along the horizontal and vertical directions, capturing both neighboring and long-range pixel information.
    • Feature fusion: the sampled feature maps are concatenated with the original feature map along the channel dimension, and a depthwise separable convolution integrates the multi-directional information, enlarging the receptive field.
  2. Core advantages

    • Multi-directional feature capture: avoids the edge-information loss of conventional downsampling, which especially benefits weak-texture targets in infrared images.
    • Lightweight design: decoupled sampling plus depthwise convolution improves the backbone's representational power at almost no extra computational cost.
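
A small sketch with the BiFocus class from Section 3, showing that the module preserves the spatial resolution of its input while mixing the checkerboard-sampled features (height and width must be even):

import torch

x = torch.randn(1, 32, 64, 64)
bifocus = BiFocus(32, 32)
print(bifocus(x).shape)  # torch.Size([1, 32, 64, 64]): resolution is preserved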


2.4 Synergy between the modules

  1. Unified cross-modality fusion and single-modality optimization

    • DECA and DEPA perform bi-directional enhancement across the channel and spatial dimensions of feature space, fusing complementary cross-modality information (e.g., RGB semantics + infrared structure) while suppressing inter-modality interference through single-modality enhancement.
    • The Bi-direction Decoupled Focus module feeds the backbone richer multi-directional features, raising the quality of the inputs that DECA/DEPA receive.
  2. Detection-oriented design
    Unlike conventional image-fusion methods (which only pursue pixel-level fusion quality), DEYOLO's modules are designed entirely around object detection; for example, the channel/spatial weights explicitly strengthen the feature response in target regions.

Paper: https://arxiv.org/abs/2412.04931
Code: https://github.com/chips96/DEYOLO

3. DEYOLO Implementation Code

The DEYOLO implementation code is as follows:

"""
DEA (DECA and DEPA) module
"""

import torch
import torch.nn as nn

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p

class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Perform transposed convolution of 2D data."""
        return self.act(self.conv(x))

class DEA(nn.Module):
    """x0 --> RGB feature map,  x1 --> IR feature map"""

    def __init__(self, channel=512, kernel_size=80, p_kernel=None, m_kernel=None, reduction=16):
        super().__init__()
        self.deca = DECA(channel, kernel_size, p_kernel, reduction)
        self.depa = DEPA(channel, m_kernel)
        self.act = nn.Sigmoid()

    def forward(self, x):
        result_vi, result_ir = self.depa(self.deca(x))
        return self.act(result_vi + result_ir)

class DECA(nn.Module):
    """x0 --> RGB feature map,  x1 --> IR feature map"""

    def __init__(self, channel=512, kernel_size=80, p_kernel=None, reduction=16):
        super().__init__()
        self.kernel_size = kernel_size
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Linear(channel, channel // reduction, bias=False),
            nn.ReLU(inplace=True),
            nn.Linear(channel // reduction, channel, bias=False),
            nn.Sigmoid()
        )
        self.act = nn.Sigmoid()
        self.compress = Conv(channel * 2, channel, 3)

        """convolution pyramid"""
        if p_kernel is None:
            p_kernel = [5, 4]
        kernel1, kernel2 = p_kernel
        self.conv_c1 = nn.Sequential(nn.Conv2d(channel, channel, kernel1, kernel1, 0, groups=channel), nn.SiLU())
        self.conv_c2 = nn.Sequential(nn.Conv2d(channel, channel, kernel2, kernel2, 0, groups=channel), nn.SiLU())
        self.conv_c3 = nn.Sequential(
            nn.Conv2d(channel, channel, int(self.kernel_size/kernel1/kernel2), int(self.kernel_size/kernel1/kernel2), 0,
                      groups=channel),
            nn.SiLU()
        )

    def forward(self, x):
        b, c, h, w = x[0].size()
        w_vi = self.avg_pool(x[0]).view(b, c)
        w_ir = self.avg_pool(x[1]).view(b, c)
        w_vi = self.fc(w_vi).view(b, c, 1, 1)
        w_ir = self.fc(w_ir).view(b, c, 1, 1)

        glob_t = self.compress(torch.cat([x[0], x[1]], 1))
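        # Collapse the fused map into a 1x1 global descriptor: use the depthwise
        # conv pyramid when the spatial size is at least kernel_size, otherwise
        # fall back to a global spatial mean.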
        glob = self.conv_c3(self.conv_c2(self.conv_c1(glob_t))) if min(h, w) >= self.kernel_size else torch.mean(
                                                                                    glob_t, dim=[2, 3], keepdim=True)
        result_vi = x[0] * (self.act(w_ir * glob)).expand_as(x[0])
        result_ir = x[1] * (self.act(w_vi * glob)).expand_as(x[1])

        return result_vi, result_ir

class DEPA(nn.Module):
    """x0 --> RGB feature map,  x1 --> IR feature map"""
    def __init__(self, channel=512, m_kernel=None):
        super().__init__()
        self.conv1 = Conv(2, 1, 5)
        self.conv2 = Conv(2, 1, 5)
        self.compress1 = Conv(channel, 1, 3)
        self.compress2 = Conv(channel, 1, 3)
        self.act = nn.Sigmoid()

        """convolution merge"""
        if m_kernel is None:
            m_kernel = [3, 7]
        self.cv_v1 = Conv(channel, 1, m_kernel[0])
        self.cv_v2 = Conv(channel, 1, m_kernel[1])
        self.cv_i1 = Conv(channel, 1, m_kernel[0])
        self.cv_i2 = Conv(channel, 1, m_kernel[1])

    def forward(self, x):
        w_vi = self.conv1(torch.cat([self.cv_v1(x[0]), self.cv_v2(x[0])], 1))
        w_ir = self.conv2(torch.cat([self.cv_i1(x[1]), self.cv_i2(x[1])], 1))
        glob = self.act(self.compress1(x[0]) + self.compress2(x[1]))
        w_vi = self.act(glob + w_vi)
        w_ir = self.act(glob + w_ir)
        result_vi = x[0] * w_ir.expand_as(x[0])
        result_ir = x[1] * w_vi.expand_as(x[1])

        return result_vi, result_ir

class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Applies the YOLO FPN to input data."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))

class BiFocus(nn.Module):
    def __init__(self, c1, c2):
        super().__init__()
        self.focus_h = FocusH(c1, c1, 3, 1)
        self.focus_v = FocusV(c1, c1, 3, 1)
        self.depth_wise = DepthWiseConv(3 * c1, c2, 3)

    def forward(self, x):
        return self.depth_wise(torch.cat([x, self.focus_h(x), self.focus_v(x)], dim=1))

class FocusH(nn.Module):

    def __init__(self, c1, c2, kernel=3, stride=1):
        super().__init__()
        self.c2 = c2
        self.conv1 = Conv(c1, c2, kernel, stride)
        self.conv2 = Conv(c1, c2, kernel, stride)

    def forward(self, x):
        b, _, h, w = x.shape
        result = torch.zeros(size=[b, self.c2, h, w], device=x.device, dtype=x.dtype)
        x1 = torch.zeros(size=[b, self.c2, h, w // 2], device=x.device, dtype=x.dtype)
        x2 = torch.zeros(size=[b, self.c2, h, w // 2], device=x.device, dtype=x.dtype)

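        # Checkerboard sampling along the width: x1 packs the (even-row, even-col)
        # and (odd-row, odd-col) pixels of x; x2 packs the complementary phase.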
        x1[..., ::2, :], x1[..., 1::2, :] = x[..., ::2, ::2], x[..., 1::2, 1::2]
        x2[..., ::2, :], x2[..., 1::2, :] = x[..., ::2, 1::2], x[..., 1::2, ::2]

        x1 = self.conv1(x1)
        x2 = self.conv2(x2)

        result[..., ::2, ::2] = x1[..., ::2, :]
        result[..., 1::2, 1::2] = x1[..., 1::2, :]
        result[..., ::2, 1::2] = x2[..., ::2, :]
        result[..., 1::2, ::2] = x2[..., 1::2, :]

        return result

class FocusV(nn.Module):

    def __init__(self, c1, c2, kernel=3, stride=1):
        super().__init__()
        self.c2 = c2
        self.conv1 = Conv(c1, c2, kernel, stride)
        self.conv2 = Conv(c1, c2, kernel, stride)

    def forward(self, x):
        b, _, h, w = x.shape
        result = torch.zeros(size=[b, self.c2, h, w], device=x.device, dtype=x.dtype)
        x1 = torch.zeros(size=[b, self.c2, h // 2, w], device=x.device, dtype=x.dtype)
        x2 = torch.zeros(size=[b, self.c2, h // 2, w], device=x.device, dtype=x.dtype)

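        # Checkerboard sampling along the height: x1 packs the (even-row, even-col)
        # and (odd-row, odd-col) pixels of x; x2 packs the complementary phase.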
        x1[..., ::2], x1[..., 1::2] = x[..., ::2, ::2], x[..., 1::2, 1::2]
        x2[..., ::2], x2[..., 1::2] = x[..., 1::2, ::2], x[..., ::2, 1::2]

        x1 = self.conv1(x1)
        x2 = self.conv2(x2)

        result[..., ::2, ::2] = x1[..., ::2]
        result[..., 1::2, 1::2] = x1[..., 1::2]
        result[..., 1::2, ::2] = x2[..., ::2]
        result[..., ::2, 1::2] = x2[..., 1::2]

        return result

class DepthWiseConv(nn.Module):

    def __init__(self, in_channel, out_channel, kernel):
        super(DepthWiseConv, self).__init__()
        self.depth_conv = Conv(in_channel, in_channel, kernel, 1, 1, in_channel)
        self.point_conv = Conv(in_channel, out_channel, 1, 1, 0, 1)

    def forward(self, x):
        out = self.depth_conv(x)
        out = self.point_conv(out)

        return out

class C2f_BiFocus(nn.Module):

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):  # ch_in, ch_out, number, shortcut, groups, expansion
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

        self.bifocus = BiFocus(c2, c2)

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv2(torch.cat(y, 1))

        return self.bifocus(y)

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        y = self.cv2(torch.cat(y, 1))
        return self.bifocus(y)

class C3f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initialize CSP bottleneck layer with two convolutions with arguments ch_in, ch_out, number, shortcut, groups,
        expansion.
        """
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv((2 + n) * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(c_, c_, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = [self.cv2(x), self.cv1(x)]
        y.extend(m(y[-1]) for m in self.m)
        return self.cv3(torch.cat(y, 1))

class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 2 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))

class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))

class C3k2_BiFocus(C2f_BiFocus):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n)
        )
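
Before wiring the modules into the framework, a quick shape check (sizes are illustrative) can be appended at the bottom of the file:

if __name__ == "__main__":
    # DEA fuses a paired [RGB, IR] feature list into one map of the same shape.
    rgb, ir = torch.randn(1, 128, 40, 40), torch.randn(1, 128, 40, 40)
    dea = DEA(channel=128, kernel_size=40)
    print(dea([rgb, ir]).shape)  # torch.Size([1, 128, 40, 40])

    # C2f_BiFocus is a drop-in replacement for C2f with BiFocus appended.
    block = C2f_BiFocus(64, 128, n=1)
    print(block(torch.randn(1, 64, 80, 80)).shape)  # torch.Size([1, 128, 80, 80])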

4. Integration Steps

4.1 Step 1

① Create an AddModules folder under the ultralytics/nn/ directory to hold the module code.

② Create DEYOLO.py inside the AddModules folder and paste the code from Section 3 into it.


4.2 Step 2

Create __init__.py in the AddModules folder (skip if it already exists) and import the modules in it: from .DEYOLO import *


4.3 Step 3

In the ultralytics/nn/tasks.py file, the module class names must be added in two places.

First, import the modules:
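
For example (this import is an assumption matching the package layout from step 4.1; adjust it to your setup):

from ultralytics.nn.AddModules import C2f_BiFocus, C3k2_BiFocus, DEA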


Second, register the C2f_BiFocus, C3k2_BiFocus, and DEA modules in the parse_model function:


        elif m in {C2f_BiFocus, C3k2_BiFocus}:
            c1, c2 = ch[f], args[0]
            if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
                c2 = make_divisible(min(c2, max_channels) * width, 8)
            args = [c1, c2, *args[1:]]
        elif m is DEA:
            c1, c2 = ch[f[0]], args[0]
            if c2 != nc:
                c2 = make_divisible(min(c2, max_channels) * width, 8)
            args = [c1, *args[1:]]
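
Here nc, width, max_channels, and make_divisible are already in scope inside parse_model; the branches scale the declared output channels by the width multiplier exactly as the built-in C2f branch does, while the DEA branch reads its input width from the first of its two source layers via ch[f[0]].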



5. YAML Model File

5.1 Mid-stage Fusion ⭐

📌 This configuration applies DEYOLO's core modules to YOLOv8.
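
Note that the yaml references IN and Multiin input modules, which the Section 3 code does not include; judging from the Section 6 printout (module path ultralytics.nn.AddModules.multimodal, zero parameters each), they split the stacked 6-channel RGB+IR input into the two backbone branches. A minimal sketch, assuming the input is simply the RGB and IR images concatenated along the channel dimension:

import torch.nn as nn

class IN(nn.Module):
    """Pass the stacked 6-channel RGB+IR tensor through unchanged (sketch)."""
    def forward(self, x):
        return x

class Multiin(nn.Module):
    """Select one modality: index 1 -> first 3 channels (assumed RGB),
    index 2 -> last 3 channels (assumed IR). Assumed behavior, not the original code."""
    def __init__(self, index=1):
        super().__init__()
        self.index = index

    def forward(self, x):
        return x[:, :3] if self.index == 1 else x[:, 3:]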

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
ch: 6
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, IN, []]  # 0
  - [-1, 1, Multiin, [1]]  # 1
  - [-2, 1, Multiin, [2]]  # 2

  - [1, 1, Conv, [64, 3, 2]] # 3-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 4-P2/4
  - [-1, 3, C2f_BiFocus, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 6-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 8-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 10-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 12

  - [2, 1, Conv, [64, 3, 2]] # 13-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 14-P2/4
  - [-1, 3, C2f_BiFocus, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 16-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 18-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 20-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 22

  - [[7, 17], 1, DEA, [256, 80]]  # 23 cat backbone P3
  - [[9, 19], 1, DEA, [512, 40]]  # 24 cat backbone P4
  - [[12, 22], 1, DEA, [1024, 20]]  # 25 cat backbone P5

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 24], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 28

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 23], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 31 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 28], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 34 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 25], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 37 (P5/32-large)

  - [[31, 34, 37], 1, Detect, [nc]]  # Detect(P3, P4, P5)
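
With the yaml saved (the file name below is a placeholder) and a data pipeline that stacks RGB and IR into a 6-channel input, training follows the usual Ultralytics workflow; a sketch:

from ultralytics import YOLO

model = YOLO("yolov8-DEYOLO.yaml")  # placeholder yaml file name
model.train(data="rgb_ir_dataset.yaml", epochs=100, imgsz=640)  # placeholder dataset config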


6. Successful Run Results

Printing the network confirms that the fusion layers have been added to the model and that training can proceed.

YOLOv8-DEYOLO


                   from  n    params  module                                       arguments
  0                  -1  1         0  ultralytics.nn.AddModules.multimodal.IN      []
  1                  -1  1         0  ultralytics.nn.AddModules.multimodal.Multiin [1]
  2                  -2  1         0  ultralytics.nn.AddModules.multimodal.Multiin [2]
  3                   1  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
  4                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
  5                  -1  1     48672  ultralytics.nn.AddModules.DEYOLO.C2f_BiFocus [32, 32, True]
  6                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
  7                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
  8                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
  9                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
 10                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 11                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
 12                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 13                   2  1       464  ultralytics.nn.modules.conv.Conv             [3, 16, 3, 2]
 14                  -1  1      4672  ultralytics.nn.modules.conv.Conv             [16, 32, 3, 2]
 15                  -1  1     48672  ultralytics.nn.AddModules.DEYOLO.C2f_BiFocus [32, 32, True]
 16                  -1  1     18560  ultralytics.nn.modules.conv.Conv             [32, 64, 3, 2]
 17                  -1  2     49664  ultralytics.nn.modules.block.C2f             [64, 64, 2, True]
 18                  -1  1     73984  ultralytics.nn.modules.conv.Conv             [64, 128, 3, 2]
 19                  -1  2    197632  ultralytics.nn.modules.block.C2f             [128, 128, 2, True]
 20                  -1  1    295424  ultralytics.nn.modules.conv.Conv             [128, 256, 3, 2]
 21                  -1  1    460288  ultralytics.nn.modules.block.C2f             [256, 256, 1, True]
 22                  -1  1    164608  ultralytics.nn.modules.block.SPPF            [256, 256, 5]
 23             [7, 17]  1     86900  ultralytics.nn.AddModules.DEYOLO.DEA         [64, 80]
 24             [9, 19]  1    320628  ultralytics.nn.AddModules.DEYOLO.DEA         [128, 40]
 25            [12, 22]  1   1234292  ultralytics.nn.AddModules.DEYOLO.DEA         [256, 20]
 26                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 27            [-1, 24]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 28                  -1  1    148224  ultralytics.nn.modules.block.C2f             [384, 128, 1]
 29                  -1  1         0  torch.nn.modules.upsampling.Upsample         [None, 2, 'nearest']
 30            [-1, 23]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 31                  -1  1     37248  ultralytics.nn.modules.block.C2f             [192, 64, 1]
 32                  -1  1     36992  ultralytics.nn.modules.conv.Conv             [64, 64, 3, 2]
 33            [-1, 28]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 34                  -1  1    123648  ultralytics.nn.modules.block.C2f             [192, 128, 1]
 35                  -1  1    147712  ultralytics.nn.modules.conv.Conv             [128, 128, 3, 2]
 36            [-1, 25]  1         0  ultralytics.nn.modules.conv.Concat           [1]
 37                  -1  1    493056  ultralytics.nn.modules.block.C2f             [384, 256, 1]
 38        [31, 34, 37]  1    430867  ultralytics.nn.modules.head.Detect           [1, [64, 128, 256]]
YOLOv8-DEYOLO summary: 538 layers, 5,687,503 parameters, 5,687,487 gradients, 15.6 GFLOPs