YOLOv8 Exclusive Original Improvements: Multiple Novel Approaches | Maintain-Original-information Depthwise Separable Convolution (MODSConv) | Global-Receptive-Field Spatial Pyramid Pooling (GRF-SPPF) | Improved CA Attention (DAF-CA) | MODSLayer

💡💡💡 Original improvements in this article: Improvement 1) Maintain-Original-information Depthwise Separable Convolution (MODSConv), which solves the problem that classic depthwise separable convolution cannot exchange information with the original feature-layer channels;

Improvement 2) a fast Global-Receptive-Field Spatial Pyramid Pooling module (GRF-SPPF) that fuses local and global receptive fields to reduce the influence of different scales;

Improvement 3) an improved CA: the CA attention mechanism does not make good use of salient information, so a plug-and-play coordinate attention that combines average pooling and max pooling (DAF-CA) is designed;

Improvement 4) based on MODSConv and the improved CA, a Maintain-Original-information Depthwise Separable Layer (MODSLayer) is constructed, protecting the rich information between channels without dimensionality reduction;

Included in the column: YOLOv8 Original Self-Developed Improvements

💡💡💡 Exclusive, first-published original work, well suited for papers!!!

💡💡💡 Innovations in the spirit of 2024 computer-vision top conferences, applicable to Yolov5, Yolov7, Yolov8 and the other YOLO series; the column articles provide every step and the source code, so you can easily start modifying the networks!!!

💡💡💡 Key point: after reading this column, you will be able to design your own modified networks, changing different positions (Backbone, head, detect, loss, etc.) to create innovations!!!

1. Principle

Abstract: Civil infrastructure plays an important role in daily life; cracks that are not found in time can cause immeasurable losses of life and property, so timely and accurate crack detection and localization are of great significance. Considering that previous YOLO (You Only Look Once) algorithms may lose channel information and lack receptive field, we design a Maintain-Original-Dimension YOLO (MOD-YOLO) and apply it to crack detection in civil infrastructure. All improvements in the algorithm are plug-and-play. First, we propose Maintain-Original-information Depthwise Separable Convolution (MODSConv), which solves the problem that classic depthwise separable convolution cannot exchange information with the original feature-layer channels. Second, we propose Global-Receptive-Field Spatial Pyramid Pooling-Fast (GRF-SPPF) to obtain global-view information and mitigate the influence of different scales. Third, we propose Distinctive-and-Average-Feature Coordinate Attention (DAF-CA), which processes not only the average information used as a reference but also salient information, so key information can be located and enhanced more accurately. In addition, we design the Maintain-Original-information Depthwise Separable Layer (MODSLayer), which protects the rich information between channels without reducing the channel dimension; MODSLayer builds the backbone and neck of the network, which is named the Maintain-Original-information Depthwise Separable Network (MODSNet). Finally, a lightweight head that maintains the original dimension is designed so that, while staying as lightweight as possible, as much feature-layer information as possible is kept before prediction, significantly improving detection accuracy and speed. Experimental results show that, compared with YOLOX, accuracy on the crack dataset improves by 27.5%~91.1% while crack-detection time stays essentially the same, with 19.7% fewer parameters and 35.9% less computation. Experiments on COCO2017, VOC2007 and other datasets verify good generalization. A vehicle-mounted deployment scheme for crack detection is also proposed and used to run the algorithm from a moving vehicle; the accompanying experiments show that it completes the in-motion crack-detection task well.

Paper link: https://www.sciencedirect.com/science/article/abs/pii/S0957417423018481?via%3Dihub

Main contributions of the paper:

1) We propose Distinctive-and-Average-Feature Coordinate Attention (DAF-CA), which attends to important information in the spatial dimensions and combines it with spatial background information. When assigning weights, it refers not only to the average information but also to the salient information, so key information is located and enhanced more accurately.

2) We propose the fast Global-Receptive-Field Spatial Pyramid Pooling module (GRF-SPPF), which combines the important information of the feature layer with background information from the global receptive field to obtain more features from a global perspective. It also fuses local and global receptive fields to reduce the influence of different scales.

3) MODSConv is proposed to solve the problem that traditional depthwise separable convolution cannot exchange information with the original feature-layer channels, so the extracted features contain more relevant information. On the basis of MODSConv, the head is redesigned (MODL-Head) to maintain the original structure while staying lightweight, keeping as much feature-layer information as possible before prediction. This greatly improves detection accuracy and speed.

4) Based on MODSConv and the DAF-CA mechanism, the Maintain-Original-information Depthwise Separable Layer (MODSLayer) is constructed, which protects the rich information between channels without dimensionality reduction and provides better room for the subsequent feature-extraction operations; by extracting more accurate features, it greatly improves the model's detection performance. MODSLayer is then used to build the backbone and neck of the network, which is named the Maintain-Original-information Depthwise Separable Network (MODSNet).

1.1 MODSConv

MODSConv addresses the missing inter-channel information after classic depthwise convolution and the information loss caused by dimensionality reduction.

Depthwise separable convolution

Paper: https://arxiv.org/pdf/1704.04861.pdf

The figure below illustrates depthwise separable convolution, which splits a traditional convolution into two steps. Suppose the original convolution is 3*3 over M input feature maps: depthwise separable convolution first convolves the M input feature maps one-to-one with M 3*3 kernels, without summation, producing M intermediate results; it then convolves these M results normally with N 1*1 kernels, with summation, producing N final outputs. The paper accordingly divides depthwise separable convolution into two steps: depthwise convolution ((b) in the figure) and pointwise convolution ((c) in the figure).
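As a quick check on this factorization, the short PyTorch sketch below (my own illustration; M and N are arbitrary example sizes, not values from the paper) compares the parameter count of a standard 3*3 convolution with that of its depthwise-separable equivalent:

from torch import nn

M, N = 64, 128  # example input/output channel counts

standard  = nn.Conv2d(M, N, kernel_size=3, padding=1, bias=False)            # 3*3*M*N weights
depthwise = nn.Conv2d(M, M, kernel_size=3, padding=1, groups=M, bias=False)  # 3*3*M weights
pointwise = nn.Conv2d(M, N, kernel_size=1, bias=False)                       # M*N weights

n_std = sum(p.numel() for p in standard.parameters())
n_sep = sum(p.numel() for p in depthwise.parameters()) + sum(p.numel() for p in pointwise.parameters())
print(n_std, n_sep)  # 73728 vs 8768, i.e. roughly a (1/N + 1/9) fraction of the standard cost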
 

Difference between MODSConv and depthwise separable convolution

Pseudocode for the two forward passes:

MODSConv

    def forward(self, x):
        x1 = self.dconv(x) + x
        return self.pconv(x1)

Depthwise separable convolution

    def forward(self, x):
        x1 = self.dconv(x)
        return self.pconv(x1)
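To make the difference concrete, the toy check below (my own illustration, not from the paper) zeroes the depthwise weights: the classic depthwise separable branch then loses the input entirely, while the MODSConv-style residual still carries the original features forward to the pointwise step:

import torch
from torch import nn

c = 8
x = torch.randn(1, c, 16, 16)
dconv = nn.Conv2d(c, c, 3, padding=1, groups=c, bias=False)
nn.init.zeros_(dconv.weight)  # a degenerate depthwise kernel that destroys all information

plain = dconv(x)       # classic branch: everything is lost
kept  = dconv(x) + x   # MODSConv branch: the original features survive
print(plain.abs().max().item(), torch.allclose(kept, x))  # 0.0 True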

1.2 DAF-CA

Inspired by CBAM attention, we found that the CA attention mechanism does not make good use of salient information. We therefore designed a plug-and-play coordinate attention that combines average pooling and max pooling, and named it DAF-CA (see the DAF_CA implementation in Section 2.1).

1.3 GRF-SPPF

In many object-detection tasks, looking only at edge information while ignoring background information would inevitably miss important semantic cues when judging features. Therefore, on top of the SPPF module, we use a global average pooling layer and a global max pooling layer to add global background and edge information, helping the network make better judgments.

The GRF-SPPF output can be written as

X_out = Conv_1x1( Concat( X, X_1, X_2, X_3, E(v), E(u) ) )

where X_1, X_2, X_3 denote the three feature layers output by the three max-pooling layers on the main path; v and u denote the input feature X after the global average pooling layer and the global max pooling layer, respectively; E(v) and E(u) expand them to feature layers of the same spatial size as the input; and Concat stacks these six feature layers along the channel dimension. After channel concatenation, an ordinary convolution with a 1*1 kernel is applied, and X_out is the feature layer output by the GRF-SPPF module.
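The channel bookkeeping can be verified in a few lines (an illustrative sketch; the 512-channel, 20x20 sizes are made up, and the three extra copies of x merely stand in for X_1, X_2, X_3):

import torch
from torch import nn

x = torch.randn(1, 512, 20, 20)           # e.g. the cv1 output inside GRF-SPPF
u = nn.AdaptiveMaxPool2d(1)(x)            # global max descriptor, (1, 512, 1, 1)
v = nn.AdaptiveAvgPool2d(1)(x)            # global average descriptor, (1, 512, 1, 1)
E_u = u.expand_as(x)                      # broadcast back to (1, 512, 20, 20)
E_v = v.expand_as(x)
y = torch.cat((x, x, x, x, E_u, E_v), 1)  # six streams along the channel dimension
print(y.shape)                            # torch.Size([1, 3072, 20, 20]) = 6 * 512 channels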

1.4 MODL-Head

The original intent of our design is to solve the lack of channel information caused by dimensionality reduction of the extracted features while keeping the model as lightweight as possible; our MODL-Head module satisfies these requirements well. As shown in Fig. 13 of the paper, we process the input feature layers differently from YOLOX's decoupled head: the head is designed around the idea of handling classification and regression separately, so the input features are split into two routes, one for the classification task and one for the regression task (see the MODL_Head implementation in Section 2.1).

1.5 Analysis of Experimental Results

2. How to Add to YOLOv8

2.1 Create ultralytics/nn/block/MOD_YOLO.py

The block folder is newly created; it does not come with the stock source tree.

import torch
from torch import nn
import warnings

class BaseConv(nn.Module):
    def __init__(self, in_channels, out_channels, ksize, stride, groups=1, bias=False, act="silu"):
        super().__init__()
        pad         = (ksize - 1) // 2
        self.conv   = nn.Conv2d(in_channels, out_channels, kernel_size=ksize, stride=stride, padding=pad, groups=groups, bias=bias)
        self.bn     = nn.BatchNorm2d(out_channels, eps=0.001, momentum=0.03)
        self.act    = get_activation(act, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def fuseforward(self, x):
        return self.act(self.conv(x))

class SiLU(nn.Module):
    @staticmethod
    def forward(x):
        return x * torch.sigmoid(x)

class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6

class h_swish(nn.Module):
    def __init__(self, inplace=True):
        super(h_swish, self).__init__()
        self.sigmoid = h_sigmoid(inplace=inplace)

    def forward(self, x):
        return x * self.sigmoid(x)

def get_activation(name="silu", inplace=True):
    if name == "silu":
        module = SiLU()
    elif name == "relu":
        module = nn.ReLU(inplace=inplace)
    elif name == "lrelu":
        module = nn.LeakyReLU(0.1, inplace=inplace)
    else:
        raise AttributeError("Unsupported act type: {}".format(name))
    return module

class MODSConv(nn.Module):
    # Maintain-Original-information Depthwise Separable Convolution: a residual
    # connection adds the input back onto the depthwise output before the 1x1
    # pointwise convolution, so the original channel information still reaches
    # the channel-mixing step. The residual add requires stride=1.
    def __init__(self, in_channels, out_channels, ksize, stride=1, act="silu"):
        super().__init__()
        self.dconv = BaseConv(in_channels, in_channels, ksize=ksize, stride=stride, groups=in_channels, act=act)
        self.pconv = BaseConv(in_channels, out_channels, ksize=1, stride=1, groups=1, act=act)

    def forward(self, x):
        x1 = self.dconv(x) + x  # keep the original feature information
        return self.pconv(x1)

class GRF_SPPF(nn.Module):
    # SPPF extended with a global receptive field: besides the three cascaded
    # max-pooling branches, global max/average descriptors are broadcast back
    # to the input size and concatenated in as well.
    def __init__(self, c1, c2, k=5):
        super().__init__()
        c_ = c1 // 2  # hidden channels
        Conv = BaseConv
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c_ * 6, c2, 1, 1)  # 6 concatenated streams of c_ channels each
        self.m = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)
        self.am = nn.AdaptiveMaxPool2d(1)  # global max pooling
        self.aa = nn.AdaptiveAvgPool2d(1)  # global average pooling

    def forward(self, x):
        x = self.cv1(x)
        with warnings.catch_warnings():
            warnings.simplefilter('ignore')  # suppress torch max_pool2d warnings
            y1 = self.m(x)
            y2 = self.m(y1)
            # cat: x, the three max-pool outputs, and the two global descriptors
            # expanded back to the spatial size of x
            return self.cv2(torch.cat((x, y1, y2, self.m(y2), self.am(x).expand_as(x), self.aa(x).expand_as(x)), 1))

class DAF_CA(nn.Module):
    # Distinctive-and-Average-Feature Coordinate Attention: coordinate attention
    # that pools along each spatial axis with both average and max pooling.
    def __init__(self, inp, oup, reduction=32):
        super(DAF_CA, self).__init__()
        mip = max(8, inp // reduction)

        # the (1, 2) kernel fuses the paired (avg, max) statistics into one column
        self.conv1 = nn.Conv2d(inp, mip, kernel_size=(1, 2), stride=1, padding=(0, 0), bias=False)
        self.bn1 = nn.BatchNorm2d(mip)
        self.act = h_swish()
        self.conv_h = nn.Conv2d(mip, oup, 1, 1, bias=False)
        self.conv_w = nn.Conv2d(mip, oup, 1, 1, bias=False)

    def forward(self, x):
        identity = x
        _, _, h, w = x.size()
        # pool along the width: average and max descriptors of shape (b, c, h, 1)
        pool_ha = nn.AdaptiveAvgPool2d((h, 1))
        pool_hm = nn.AdaptiveMaxPool2d((h, 1))
        x_ha = pool_ha(x)
        x_hm = pool_hm(x)
        x_h = torch.cat([x_ha, x_hm], dim=3)  # (b, c, h, 2)
        # pool along the height, then transpose to the same (b, c, w, 2) layout
        pool_wa = nn.AdaptiveAvgPool2d((1, w))
        pool_wm = nn.AdaptiveMaxPool2d((1, w))
        x_wa = pool_wa(x).permute(0, 1, 3, 2)
        x_wm = pool_wm(x).permute(0, 1, 3, 2)
        x_w = torch.cat([x_wa, x_wm], dim=3)
        y1 = torch.cat([x_h, x_w], dim=2)  # (b, c, h+w, 2)
        y1 = self.conv1(y1)  # -> (b, mip, h+w, 1)
        y1 = self.bn1(y1)
        y1 = self.act(y1)
        y_h, y_w = torch.split(y1, [h, w], dim=2)
        y_w = y_w.permute(0, 1, 3, 2)
        a_h = self.conv_h(y_h).sigmoid()  # attention weights along height
        a_w = self.conv_w(y_w).sigmoid()  # attention weights along width

        return identity * a_h * a_w

class MODSLayer(nn.Module):
    # Maintain-Original-information Depthwise Separable Layer: n stacked MODSConv
    # blocks with a residual connection, followed by a 1x1 conv and DAF-CA.
    def __init__(self, in_channels, out_channels, n=1, act="silu"):
        super().__init__()
        self.s = BaseConv(in_channels, out_channels, ksize=1, stride=1, act=act)
        self.conv3 = DAF_CA(out_channels, out_channels, reduction=32)
        module_list = [MODSConv(in_channels, in_channels, ksize=3) for _ in range(n)]
        self.m      = nn.Sequential(*module_list)

    def forward(self, x):
        x_1 = self.m(x)
        x = x + x_1  # residual over the MODSConv stack
        return self.conv3(self.s(x))

class MODL_Head(nn.Module):
    # Decoupled lightweight head built from MODSConv: classification and
    # regression are handled by separate branches for each feature level.
    def __init__(self, num_classes, width = 1.0, in_channels = [256, 512, 1024], act = "silu"):
        super().__init__()
        self.cls_convs  = nn.ModuleList()  # classification branch
        self.reg_convs  = nn.ModuleList()  # regression branch
        self.cls_preds  = nn.ModuleList()
        self.reg_preds  = nn.ModuleList()
        self.obj_preds  = nn.ModuleList()
        self.stems      = nn.ModuleList()

        for i in range(len(in_channels)):
            self.cls_convs.append(nn.Sequential(*[
                MODSConv(in_channels = int(in_channels[i] * width), out_channels = int(in_channels[i] * width), ksize = 3, stride = 1, act = act),
            ]))
            self.cls_preds.append(
                nn.Conv2d(in_channels = int(in_channels[i] * width), out_channels = num_classes, kernel_size = 1, stride = 1, padding = 0)
            )

            self.reg_convs.append(nn.Sequential(*[
                MODSConv(in_channels = int(in_channels[i] * width), out_channels = int(in_channels[i] * width), ksize = 3, stride = 1, act = act),
            ]))
            self.reg_preds.append(
                nn.Conv2d(in_channels =int(in_channels[i] * width), out_channels = 4, kernel_size = 1, stride = 1, padding = 0)
            )
            self.obj_preds.append(
                nn.Conv2d(in_channels = int(in_channels[i] * width), out_channels = 1, kernel_size = 1, stride = 1, padding = 0)
            )

    def forward(self, inputs):
        outputs = []
        for k, x in enumerate(inputs):
            cls_feat    = self.cls_convs[k](x)
            cls_output  = self.cls_preds[k](cls_feat)
            reg_feat    = self.reg_convs[k](x)
            reg_output  = self.reg_preds[k](reg_feat)
            obj_output  = self.obj_preds[k](reg_feat)
            output      = torch.cat([reg_output, obj_output, cls_output], 1)  # [4 box, 1 obj, num_classes]
            outputs.append(output)

        return outputs
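Before wiring the modules into a config, a standalone smoke test is worthwhile. The snippet below (my own check, run from the repository root and assuming the file was created exactly as above) confirms that each module produces the expected output shape:

import torch
from ultralytics.nn.block.MOD_YOLO import MODSConv, GRF_SPPF, DAF_CA, MODSLayer

x = torch.randn(2, 64, 40, 40)
print(MODSConv(64, 128, ksize=3)(x).shape)  # torch.Size([2, 128, 40, 40])
print(GRF_SPPF(64, 64)(x).shape)            # torch.Size([2, 64, 40, 40])
print(DAF_CA(64, 64)(x).shape)              # torch.Size([2, 64, 40, 40])
print(MODSLayer(64, 64)(x).shape)           # torch.Size([2, 64, 40, 40])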

2.2 Modify tasks.py

1) First change: import MODSConv, GRF_SPPF, DAF_CA and MODSLayer to register them

from ultralytics.nn.block.MOD_YOLO import MODSConv, GRF_SPPF, DAF_CA, MODSLayer

2) Second change

Modify def parse_model(d, ch, verbose=True) so that the module set includes the new classes:

        if m in {
                Conv, GhostConv, Bottleneck, GhostBottleneck, SPP, SPPF,DWConv, MixConv2d, Focus, CrossConv,
                BottleneckCSP, C3, C3TR, C3SPP, C3Ghost, CNeB, nn.ConvTranspose2d, DWConvTranspose2d, C3x, C2f,
                MODSConv, GRF_SPPF, DAF_CA, MODSLayer,  # newly registered modules
                }:

2.3 yolov8_GRF_SPPF.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, GRF_SPPF, [1024, 5]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 21 (P5/32-large)

  - [[15, 18, 21], 1, Detect, [nc]]  # Detect(P3, P4, P5)
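With the registration from Section 2.2 in place, the config can be sanity-checked by building the model from the yaml (a minimal sketch; the path assumes the yaml was saved where ultralytics can resolve it, e.g. ultralytics/cfg/models/v8/):

from ultralytics import YOLO

model = YOLO('yolov8_GRF_SPPF.yaml')  # build the modified architecture
model.info()                          # check that GRF_SPPF appears at layer 9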

2.4 yolov8_DAF_CA.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9
  - [-1, 1, DAF_CA, [1024]]  # 10

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 13

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]]  # Detect(P3, P4, P5)

2.5 yolov8_MODSConv.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 12

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 21 (P5/32-large)
  
  - [15, 1, MODSConv, [64,3]]  # 22
  - [18, 1, MODSConv, [128,3]]  # 23
  - [21, 1, MODSConv, [256,3]]  # 24

  - [[22, 23, 24], 1, Detect, [nc]]  # Detect(P3, P4, P5)

2.6 yolov8_MODSLayer.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]  # YOLOv8n summary: 225 layers,  3157200 parameters,  3157184 gradients,   8.9 GFLOPs
  s: [0.33, 0.50, 1024]  # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients,  28.8 GFLOPs
  m: [0.67, 0.75, 768]   # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients,  79.3 GFLOPs
  l: [1.00, 1.00, 512]   # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512]   # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]]  # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]]  # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]]  # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]]  # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]]  # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]]  # 9
  - [-1, 1, MODSLayer, [1024]]  # 10

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 6], 1, Concat, [1]]  # cat backbone P4
  - [-1, 3, C2f, [512]]  # 13

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]
  - [[-1, 4], 1, Concat, [1]]  # cat backbone P3
  - [-1, 3, C2f, [256]]  # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]]  # cat head P4
  - [-1, 3, C2f, [512]]  # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]]  # cat head P5
  - [-1, 3, C2f, [1024]]  # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]]  # Detect(P3, P4, P5)
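Any of the four configs can then be trained in the usual way, for example (dataset, epochs and image size are placeholders; adjust them to your own setup):

from ultralytics import YOLO

model = YOLO('yolov8_MODSLayer.yaml')  # or any of the other configs above
model.train(data='coco128.yaml', epochs=100, imgsz=640)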