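1. Introduction
This post introduces iRMB, an improvement proposed in the paper Rethinking Mobile Block for Efficient Attention-based Models. The paper also proposes a new backbone network, EMO (I will cover how to use that backbone in a later post; this one focuses on the attention mechanism from the paper). The core idea is to combine a lightweight CNN architecture with an attention-based model structure (somewhat similar to ACmix). I combined iRMB with C2PSA and also tried it in the detection head. I compared three configurations; each serves a different purpose, but every experiment showed some accuracy gain. The mechanism is also lightweight: it has few parameters and trains quickly. Later in this post I walk through each way of adding it so you can reproduce it in your own model.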
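2. How iRMB Works
Official paper: Rethinking Mobile Block for Efficient Attention-based Models (linked in the original post)
Official code: linked in the original post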
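The core idea of iRMB (Inverted Residual Mobile Block) is to combine a lightweight CNN architecture with an attention-based model structure (somewhat similar to ACmix) in order to build efficient mobile networks. By revisiting the effective components of the inverted residual block (IRB) and of the Transformer, iRMB takes a unified view that extends the CNN-style IRB to attention-based models. Its design goal is to keep the model lightweight while using compute efficiently and reaching high accuracy. The paper validates the approach with extensive experiments on downstream tasks, where it performs strongly among lightweight models.
The main contributions of iRMB are the following three points:
1. It combines the lightweight nature of CNNs with the dynamic modeling capability of Transformers into the iRMB structure, which suits dense prediction tasks on mobile devices.
2. It uses an inverted residual design to extend the traditional CNN IRB to attention-based models, which strengthens the model's handling of long-range information.
3. It proposes the Meta-Mobile Block, which uses different expansion ratios and efficient operators to make the model modular, and therefore more flexible and efficient.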
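2.1 iRMB structure
The key innovation of the iRMB structure is that it combines the lightweight nature of convolutional neural networks (CNNs) with the dynamic processing capability of Transformer models. The structure is particularly suited to dense prediction tasks on mobile devices, because it aims to deliver efficient performance where compute is limited. Through its inverted residual design, iRMB improves how information flows through the block, so the model can capture and use long-range dependencies while staying lightweight. This matters for tasks such as image classification, object detection, and semantic segmentation. The design lets the model run efficiently on resource-constrained devices while maintaining or improving prediction accuracy.
The figure above, Figure 2 of the paper, shows the design idea and structure of iRMB (Inverted Residual Mobile Block). On the left is the unified Meta-Mobile Block abstracted from multi-head self-attention and the feed-forward network; it combines different expansion ratios and efficient operators to form specific modules. On the right is EMO, a ResNet-like efficient model built only from the derived iRMB and used for various downstream tasks such as classification (CLS), detection (Det), and segmentation (Seg). This design keeps the model lightweight while retaining good performance and efficiency.
This figure shows the structural paradigm of iRMB (Inverted Residual Mobile Block). iRMB is a hybrid network module that combines a depthwise convolution (3x3 DW-Conv) with a self-attention mechanism. The 1x1 convolutions expand and compress the channel count to keep computation efficient. The depthwise convolution (DW-Conv) captures local spatial features, while the attention mechanism captures global dependencies between features.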
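2.2 Inverted residual block
In the iRMB design, the concept of the inverted residual block (IRB) is extended to attention-based models. This lets the model handle long-range information more effectively, because self-attention can capture global dependencies between different parts of the input. A traditional CNN usually captures only local features; by introducing attention, iRMB can take the whole input space into account when extracting features, which strengthens the model's understanding of complex data patterns, especially for visual and sequence data. This design, which pairs the light weight of a traditional CNN with the long-range modeling capability of a Transformer, opens up new possibilities for efficient deep learning models in resource-constrained environments (the paper does not include a structure diagram of the IRB).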
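To make the "lightweight" part of the inverted residual idea concrete, here is a small back-of-the-envelope sketch that counts weights for a plain 3x3 convolution versus a 1x1 expand, 3x3 depthwise, 1x1 project stack. The function names are my own, and the count ignores biases and normalization layers, so treat it as a rough illustration rather than the paper's accounting.

```python
def dense_conv_params(channels, kernel=3):
    """Weights of a standard kernel x kernel conv with equal in/out channels."""
    return channels * channels * kernel * kernel


def inverted_residual_params(channels, expand_ratio=4, kernel=3):
    """Weights of 1x1 expand -> depthwise kernel x kernel -> 1x1 project."""
    mid = channels * expand_ratio
    expand = channels * mid            # 1x1 conv: channels -> mid
    depthwise = mid * kernel * kernel  # one kernel x kernel filter per channel
    project = mid * channels           # 1x1 conv: mid -> channels
    return expand + depthwise + project


print(dense_conv_params(64))            # 36864
print(inverted_residual_params(64, 4))  # 35072
print(inverted_residual_params(64, 1))  # 8768
print(dense_conv_params(256))           # 589824
```

With an expansion ratio of 4 the block works on 256 hidden channels for roughly the cost of a dense 3x3 convolution on 64 channels, while a dense 3x3 convolution at 256 channels would need about 17 times more weights. With a ratio of 1 (the default `exp_ratio=1.0` in the code in Section 3) the stack is about four times smaller than the dense convolution.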
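2.3 Meta-Mobile Block
The Meta-Mobile Block achieves a modular design through different expansion ratios and efficient operators. With this approach the model's capacity can be adjusted as needed without redesigning the whole network. The core idea is to integrate different operations, such as convolution and self-attention, into one unified framework in a pluggable way, which improves the model's efficiency and flexibility. This gives a better trade-off between complexity and computational efficiency, and is especially useful for applications that must run with limited resources.
The figure shows the design of the Meta-Mobile Block. In this component, 1x1 convolution layers change the number of feature-map channels and thereby control the network's capacity. The "Efficient Operator" in the middle can be self-attention or any other efficient layer or operation. This design lets the Meta-Mobile Block adapt flexibly to different task requirements while keeping computation efficient. Thanks to this modularity, the network can be adjusted and optimized quickly for different environments and tasks.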
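The description above can be sketched in a few lines of PyTorch. This is only my minimal reading of the idea (1x1 expand, a pluggable operator, 1x1 shrink, plus a residual); the class name `MetaMobileBlockSketch` and the default depthwise operator are assumptions for illustration, not the paper's code. The full iRMB implementation is in Section 3.

```python
import torch
import torch.nn as nn


class MetaMobileBlockSketch(nn.Module):
    """1x1 expand -> pluggable efficient operator -> 1x1 shrink, with a residual."""

    def __init__(self, dim, expand_ratio=2.0, operator=None):
        super().__init__()
        mid = int(dim * expand_ratio)
        # 1x1 conv that widens the features by the expansion ratio.
        self.expand = nn.Conv2d(dim, mid, kernel_size=1, bias=False)
        # Assumed default operator: a 3x3 depthwise conv. Any module that
        # keeps the (B, mid, H, W) shape can be plugged in instead.
        if operator is None:
            operator = nn.Conv2d(mid, mid, kernel_size=3, padding=1, groups=mid, bias=False)
        self.operator = operator
        # 1x1 conv that brings the channel count back to dim.
        self.shrink = nn.Conv2d(mid, dim, kernel_size=1, bias=False)

    def forward(self, x):
        y = self.shrink(self.operator(self.expand(x)))
        return x + y  # residual connection, so input and output shapes match


block = MetaMobileBlockSketch(dim=32, expand_ratio=2.0)
out = block(torch.randn(1, 32, 20, 20))
print(out.shape)  # torch.Size([1, 32, 20, 20])

# Swapping the operator does not require touching the rest of the block.
identity_block = MetaMobileBlockSketch(dim=32, operator=nn.Identity())
print(identity_block(torch.randn(1, 32, 20, 20)).shape)  # torch.Size([1, 32, 20, 20])
```

The expansion ratio only changes the hidden width `mid`, which is why capacity can be tuned without redesigning the surrounding network.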
3. Core iRMB Code
See Section 4 for how to use this code.
import math
import torch
import torch.nn as nn
import torch.nn.functional as F
from functools import partial
from einops import rearrange
from timm.models._efficientnet_blocks import SqueezeExcite
from timm.models.layers import DropPath

__all__ = ['iRMB', 'C2PSA_iRMB']

inplace = True  # global variable: default for in-place activations

class LayerNorm2d(nn.Module):
    def __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):
        super().__init__()
        self.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)

    def forward(self, x):
        x = rearrange(x, 'b c h w -> b h w c').contiguous()
        x = self.norm(x)
        x = rearrange(x, 'b h w c -> b c h w').contiguous()
        return x


def get_norm(norm_layer='in_1d'):
    eps = 1e-6
    norm_dict = {
        'none': nn.Identity,
        'in_1d': partial(nn.InstanceNorm1d, eps=eps),
        'in_2d': partial(nn.InstanceNorm2d, eps=eps),
        'in_3d': partial(nn.InstanceNorm3d, eps=eps),
        'bn_1d': partial(nn.BatchNorm1d, eps=eps),
        'bn_2d': partial(nn.BatchNorm2d, eps=eps),
        # 'bn_2d': partial(nn.SyncBatchNorm, eps=eps),
        'bn_3d': partial(nn.BatchNorm3d, eps=eps),
        'gn': partial(nn.GroupNorm, eps=eps),
        'ln_1d': partial(nn.LayerNorm, eps=eps),
        'ln_2d': partial(LayerNorm2d, eps=eps),
    }
    return norm_dict[norm_layer]


def get_act(act_layer='relu'):
    act_dict = {
        'none': nn.Identity,
        'relu': nn.ReLU,
        'relu6': nn.ReLU6,
        'silu': nn.SiLU
    }
    return act_dict[act_layer]

class ConvNormAct(nn.Module):
    def __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,
                 skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):
        super(ConvNormAct, self).__init__()
        self.has_skip = skip and dim_in == dim_out
        padding = math.ceil((kernel_size - stride) / 2)
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)
        self.norm = get_norm(norm_layer)(dim_out)
        self.act = get_act(act_layer)(inplace=inplace)
        self.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.conv(x)
        x = self.norm(x)
        x = self.act(x)
        if self.has_skip:
            x = self.drop_path(x) + shortcut
        return x

class iRMB(nn.Module):
    def __init__(self, dim_in, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',
                 act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=8, window_size=7,
                 attn_s=True, qkv_bias=False, attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False):
        super().__init__()
        dim_out = dim_in
        self.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()
        dim_mid = int(dim_in * exp_ratio)
        self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
        self.attn_s = attn_s
        if self.attn_s:
            assert dim_in % dim_head == 0, 'dim should be divisible by num_heads'
            self.dim_head = dim_head
            self.window_size = window_size
            self.num_head = dim_in // dim_head
            self.scale = self.dim_head ** -0.5
            self.attn_pre = attn_pre
            self.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none',
                                  act_layer='none')
            self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias=qkv_bias,
                                 norm_layer='none', act_layer=act_layer, inplace=inplace)
            self.attn_drop = nn.Dropout(attn_drop)
        else:
            if v_proj:
                self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none',
                                     act_layer=act_layer, inplace=inplace)
            else:
                self.v = nn.Identity()
        self.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation,
                                      groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)
        self.se = SqueezeExcite(dim_mid, rd_ratio=se_ratio, act_layer=get_act(act_layer)) if se_ratio > 0.0 else nn.Identity()
        self.proj_drop = nn.Dropout(drop)
        self.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)
        self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.norm(x)
        B, C, H, W = x.shape
        if self.attn_s:
            # padding
            if self.window_size <= 0:
                window_size_W, window_size_H = W, H
            else:
                window_size_W, window_size_H = self.window_size, self.window_size
            pad_l, pad_t = 0, 0
            pad_r = (window_size_W - W % window_size_W) % window_size_W
            pad_b = (window_size_H - H % window_size_H) % window_size_H
            x = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))
            n1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W
            x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()
            # attention
            b, c, h, w = x.shape
            qk = self.qk(x)
            qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head,
                           dim_head=self.dim_head).contiguous()
            q, k = qk[0], qk[1]
            attn_spa = (q @ k.transpose(-2, -1)) * self.scale
            attn_spa = attn_spa.softmax(dim=-1)
            attn_spa = self.attn_drop(attn_spa)
            if self.attn_pre:
                x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ x
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                  w=w).contiguous()
                x_spa = self.v(x_spa)
            else:
                v = self.v(x)
                v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ v
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                  w=w).contiguous()
            # unpadding
            x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()
            if pad_r > 0 or pad_b > 0:
                x = x[:, :, :H, :W].contiguous()
        else:
            x = self.v(x)
        x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))
        x = self.proj_drop(x)
        x = self.proj(x)
        x = (shortcut + self.drop_path(x)) if self.has_skip else x
        return x

def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation only, skipping batch normalization."""
        return self.act(self.conv(x))

class PSABlock(nn.Module):
    """
    PSABlock class implementing a Position-Sensitive Attention block for neural networks.

    This class encapsulates the functionality for applying attention and feed-forward neural network layers
    with optional shortcut connections. In this variant the attention module is an iRMB.

    Attributes:
        attn (iRMB): iRMB attention module.
        ffn (nn.Sequential): Feed-forward neural network module.
        add (bool): Flag indicating whether to add shortcut connections.

    Methods:
        forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers.

    Examples:
        Create a PSABlock and perform a forward pass
        >>> psablock = PSABlock(c=128, attn_ratio=0.5, num_heads=4, shortcut=True)
        >>> input_tensor = torch.randn(1, 128, 32, 32)
        >>> output_tensor = psablock(input_tensor)
    """

    def __init__(self, c, attn_ratio=0.5, num_heads=4, shortcut=True) -> None:
        """Initializes the PSABlock with attention and feed-forward layers for enhanced feature extraction."""
        super().__init__()
        # attn_ratio and num_heads are kept in the signature but are not used by iRMB.
        self.attn = iRMB(c)
        self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
        self.add = shortcut

    def forward(self, x):
        """Executes a forward pass through PSABlock, applying attention and feed-forward layers to the input tensor."""
        x = x + self.attn(x) if self.add else self.attn(x)
        x = x + self.ffn(x) if self.add else self.ffn(x)
        return x

class C2PSA_iRMB(nn.Module):
    """
    C2PSA module with attention mechanism for enhanced feature extraction and processing.

    This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing
    capabilities. It includes a series of PSABlock modules for attention and feed-forward operations.

    Attributes:
        c (int): Number of hidden channels.
        cv1 (Conv): 1x1 convolution layer to reduce the number of input channels to 2*c.
        cv2 (Conv): 1x1 convolution layer to reduce the number of output channels to c.
        m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations.

    Methods:
        forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations.

    Notes:
        This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules.

    Examples:
        >>> c2psa = C2PSA_iRMB(c1=256, c2=256, n=3, e=0.5)
        >>> input_tensor = torch.randn(1, 256, 64, 64)
        >>> output_tensor = c2psa(input_tensor)
    """

    def __init__(self, c1, c2, n=1, e=0.5):
        """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
        super().__init__()
        assert c1 == c2
        self.c = int(c1 * e)
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv(2 * self.c, c1, 1)
        self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))

    def forward(self, x):
        """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor."""
        a, b = self.cv1(x).split((self.c, self.c), dim=1)
        b = self.m(b)
        return self.cv2(torch.cat((a, b), 1))

if __name__ == "__main__":
    # Generating Sample image
    image_size = (1, 64, 224, 224)
    image = torch.rand(*image_size)
    # Model
    model = C2PSA_iRMB(64, 64)
    out = model(image)
    print(out.size())
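The padding step in `iRMB.forward` is the part people most often ask about, so here is the same arithmetic pulled out into plain Python. `window_layout` and `group_of` are helper names I made up for this sketch. The second helper reflects my reading of the einops pattern `(h1 n1)`, where `n1` is the fast-moving index, so the pixels grouped together are spaced `n1` apart rather than forming a contiguous patch.

```python
def window_layout(height, width, window_size=7):
    """Mirror the padding arithmetic of iRMB.forward.

    Returns (pad_r, pad_b, n1, n2): right/bottom padding and the number of
    attention groups along height and width.
    """
    if window_size <= 0:
        win_h, win_w = height, width  # one group covering the whole map
    else:
        win_h = win_w = window_size
    pad_b = (win_h - height % win_h) % win_h
    pad_r = (win_w - width % win_w) % win_w
    n1 = (height + pad_b) // win_h
    n2 = (width + pad_r) // win_w
    return pad_r, pad_b, n1, n2


def group_of(row, col, n1, n2):
    """Group index of a pixel under '(h1 n1) (w1 n2)': row = h1 * n1 + group."""
    return row % n1, col % n2


print(window_layout(20, 20))                 # (1, 1, 3, 3)
print(window_layout(224, 224))               # (0, 0, 32, 32)
print(window_layout(20, 20, window_size=0))  # (0, 0, 1, 1)
print(group_of(0, 0, 3, 3), group_of(3, 6, 3, 3), group_of(1, 0, 3, 3))
# (0, 0) (0, 0) (1, 0)
```

A 20x20 map (for example the P5 level of a 640x640 input) is padded to 21x21 and split into 3x3 groups of 7x7 positions each, and the extra row and column are cropped away again after attention.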
4. Step-by-Step: Adding iRMB and C2PSA_iRMB
4.1 Step 1
First, locate the directory 'ultralytics/nn' and create a .py file in it. Name it however you like, then paste the core code into it.
4.2 Step 2
Next, open 'ultralytics/nn/tasks.py', where we register the iRMB and C2PSA_iRMB modules.
We first need to import iRMB and C2PSA_iRMB at the top of the file, as shown in the screenshot below.
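4.3 Step 3
In step three, open 'ultralytics/nn/tasks.py' to import and register our modules (if you are using the files shared in my group, this is already done, so skip straight to step four).
From now on all my tutorials follow this same layout, because I assume you are modifying the files shared in my group.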
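4.4 Step 4
Find the parse_model function, either by searching (Ctrl + F) or by scrolling to it. At the location shown below, add iRMB and C2PSA_iRMB, following the way I added them.
That completes the registration.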
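Since the screenshots are not reproduced here, the following toy function sketches what the registration has to achieve. It is a simplified stand-in, not Ultralytics' actual parse_model code: `build_args` is a hypothetical name, and width scaling of the output channels is left out. The point is only that each module must receive its channel arguments from the previous layer, because the YAML gives `[1024]` for C2PSA_iRMB and an empty list for iRMB.

```python
def build_args(module_name, in_channels, yaml_args, repeats=1):
    """Return (out_channels, constructor_args) for the two custom modules.

    in_channels is the channel count of the layer the module reads from;
    yaml_args is the args list from the YAML row (assumed already scaled).
    """
    if module_name == "C2PSA_iRMB":
        # Signature is C2PSA_iRMB(c1, c2, n, e): prepend c1, pass repeats as n.
        out_channels = yaml_args[0]
        return out_channels, [in_channels, out_channels, repeats, *yaml_args[1:]]
    if module_name == "iRMB":
        # Signature is iRMB(dim_in, ...): the block keeps the channel count.
        return in_channels, [in_channels, *yaml_args]
    raise ValueError(f"unhandled module: {module_name}")


print(build_args("C2PSA_iRMB", 256, [256], repeats=1))  # (256, [256, 256, 1])
print(build_args("iRMB", 64, []))                        # (64, [64])
```

If the iRMB branch is missing, the module would be constructed with no `dim_in` at all, which is the usual cause of a TypeError when the YAML in Section 5.2 is loaded.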
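5. YAML Files and Run Logs
Below are a few YAML versions you can copy and train with. There are many possible combinations, and which one works best is not guaranteed; results also vary by dataset. The versions below are combinations I think are likely to help, and you can also build your own.
5.1 YAML version 1 (recommended)
Training summary for this version: YOLO11-C2PSA-iRMB summary: 333 layers, 2,610,587 parameters, 2,610,571 gradients, 8.4 GFLOPs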
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA_iRMB, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
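The numbers in the YAML are not the sizes the model actually gets, because the `scales` entry rescales them. Here is a small sketch of that arithmetic as I understand it from Ultralytics' model parser (repeats multiplied by depth and rounded, channels capped by max_channels, multiplied by width, and rounded up to a multiple of 8). `scale_layer` is my own helper name, so treat the details as an approximation of the real code.

```python
import math


def make_divisible(value, divisor=8):
    """Round value up to the nearest multiple of divisor."""
    return math.ceil(value / divisor) * divisor


def scale_layer(repeats, out_channels, depth, width, max_channels):
    """Apply a [depth, width, max_channels] scale to one YAML row."""
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    c2 = make_divisible(min(out_channels, max_channels) * width, 8)
    return n, c2


# The C2PSA_iRMB row is [-1, 2, C2PSA_iRMB, [1024]].
print(scale_layer(2, 1024, 0.50, 0.25, 1024))  # scale n -> (1, 256)
print(scale_layer(2, 1024, 0.50, 1.00, 512))   # scale m -> (1, 512)
```

At scale n the module therefore runs once with 256 channels. Inside C2PSA_iRMB the hidden width is `int(256 * 0.5) = 128`, which is divisible by the default `dim_head=8`, so the assertion in iRMB passes.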
5.2 YAML version 2
Training summary for this version: YOLO11-iRMB summary: 397 layers, 2,944,603 parameters, 2,944,587 gradients, 16.8 GFLOPs
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, iRMB, []] # 17 (P3/8-small) attention added at the small-object detection output

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 20 (P4/16-medium)
  - [-1, 1, iRMB, []] # 21 (P4/16-medium) attention added at the medium-object detection output

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 24 (P5/32-large)
  - [-1, 1, iRMB, []] # 25 (P5/32-large) attention added at the large-object detection output

  # I added three attention layers here, but in practice one is usually enough, so comment some out and experiment.
  # Choose which layer gets the attention mechanism based on your own dataset and scenario.
  # If you change the attention placement, make sure the from indices [17, 21, 25] still point at the matching detection layers!
  - [[17, 21, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)
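The index warning in the YAML comments is easy to get wrong, so here is a quick way to check it in plain Python before training. The layer list is just the module column of the YAML above typed out by hand, and `mismatched_inputs` is a hypothetical helper, not part of Ultralytics.

```python
backbone = ["Conv", "Conv", "C3k2", "Conv", "C3k2", "Conv", "C3k2", "Conv",
            "C3k2", "SPPF", "C2PSA"]
head = ["Upsample", "Concat", "C3k2",
        "Upsample", "Concat", "C3k2", "iRMB",
        "Conv", "Concat", "C3k2", "iRMB",
        "Conv", "Concat", "C3k2", "iRMB",
        "Detect"]
layers = backbone + head  # list position == layer index used by 'from'


def mismatched_inputs(layers, detect_from, expected="iRMB"):
    """Return the Detect input indices that do not point at the expected module."""
    return [i for i in detect_from if i >= len(layers) or layers[i] != expected]


print(mismatched_inputs(layers, [17, 21, 25]))  # []

# Comment out the P5 iRMB (index 25): every later layer moves up by one.
trimmed = layers[:25] + layers[26:]
print(mismatched_inputs(trimmed, [17, 21, 25]))  # [25]
print(trimmed[24], trimmed[25])                  # C3k2 Detect
```

After removing the P5 iRMB, index 25 is the Detect layer itself, so the last entry of `from` has to become 24 (the C3k2 output), giving `[17, 21, 24]`.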
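5.3 iRMB training screenshots
The screenshots below show the run output and where the modules were added, so the code I posted is complete and does run. If you hit a problem, leave a comment; I answer every one I see (as far as I know the answer).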
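For reference, a training run with one of the YAML files above can be launched roughly like this. It is a minimal sketch that assumes the modules are already registered as in Section 4. The function name and the file names in the usage comment are placeholders of mine, and the hyperparameters are arbitrary defaults rather than the settings behind my screenshots.

```python
def train_custom_model(cfg_path, data_path, epochs=100, imgsz=640, batch=16):
    """Build a model from a custom YAML and train it with Ultralytics."""
    # Imported lazily so this file can be loaded without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(cfg_path)  # builds the network from the YAML definition
    return model.train(data=data_path, epochs=epochs, imgsz=imgsz, batch=batch)


# Example call (placeholder file names, adjust to your own paths):
# train_custom_model("yolo11-C2PSA-iRMB.yaml", "my_dataset.yaml", epochs=100)
```

If the layer count printed at startup does not match the summary lines in Section 5, the YAML is probably not the one being loaded.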
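6. Summary
That wraps up the main content of this post. I also recommend my YOLOv11 improvement column: it is newly opened and currently has an average quality score of 98. I will keep reproducing papers from the latest top conferences and filling in some older improvement mechanisms. If this post helped you, subscribe to the column and follow along for more updates.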