1. Introduction
This post walks you through improving YOLOv11 with DAT (Vision Transformer with Deformable Attention), a mechanism published at CVPR 2022, which makes it a well-vetted and effective improvement. Its core idea is to introduce deformable attention with dynamically learned sampling points (which, as you may notice, sounds quite similar to deformable/dynamic convolution, DCN). The article covers three parts: the ideas behind DAT's network structure, a reproduction of the DAttention code, and how to add DAttention to your own model to gain accuracy. First, the comparison chart from my own tests.
2. The Ideas Behind DAT's Network Structure
Paper: DAT paper link
Official code: official repository link
2.1 DAT's Main Ideas and Improvements
DAT (Vision Transformer with Deformable Attention) is a vision Transformer that introduces a deformable attention mechanism. Its core ideas include the following:
- Deformable Attention: a standard Transformer applies self-attention over all pixels of the image, which is computationally expensive. DAT instead introduces deformable attention, which attends only to a small set of key regions in the image. This significantly reduces computation while maintaining good performance.
- Dynamic sampling points: within the deformable attention mechanism, DAT selects its sampling points dynamically instead of processing the whole image at fixed locations. This lets the model concentrate on the regions that matter most for the task at hand.
- Plug and play: DAT is designed to adapt to different image sizes and contents, so it works well across a range of vision tasks such as image classification and object detection.
Summary: by introducing deformable attention, DAT improves both the efficiency and the accuracy of vision Transformers, making them more effective on complex visual tasks.
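To make the mechanism concrete, here is a minimal sketch of deformable attention (single head, single group, no positional bias, offsets predicted at full resolution; every name in it is illustrative and simplified, and the full implementation used in this post follows in Section 3):

import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyDeformableAttention(nn.Module):
    """Minimal single-head deformable attention: every query attends only to
    features sampled at (uniform reference point + learned offset) locations."""

    def __init__(self, dim):
        super().__init__()
        self.proj_q = nn.Conv2d(dim, dim, 1)
        self.proj_k = nn.Conv2d(dim, dim, 1)
        self.proj_v = nn.Conv2d(dim, dim, 1)
        self.offset_net = nn.Conv2d(dim, 2, 3, padding=1)  # one (x, y) offset per position
        self.scale = dim ** -0.5

    def forward(self, x):
        B, C, H, W = x.shape
        q = self.proj_q(x)
        # Uniform reference grid in normalized [-1, 1] coordinates (x, y order for grid_sample).
        ys, xs = torch.meshgrid(
            torch.linspace(-1, 1, H, device=x.device),
            torch.linspace(-1, 1, W, device=x.device),
            indexing='ij')
        ref = torch.stack((xs, ys), dim=-1).expand(B, H, W, 2)
        # Offsets are predicted from the queries; tanh keeps the deformed points bounded.
        offset = self.offset_net(q).permute(0, 2, 3, 1).tanh()
        pos = (ref + offset).clamp(-1.0, 1.0)
        # Deformed keys/values: bilinearly sample the feature map at the shifted points.
        x_sampled = F.grid_sample(x, pos, mode='bilinear', align_corners=True)
        k = self.proj_k(x_sampled).flatten(2)               # B, C, N
        v = self.proj_v(x_sampled).flatten(2)
        q = q.flatten(2)                                     # B, C, M
        attn = torch.einsum('bcm,bcn->bmn', q, k) * self.scale
        out = torch.einsum('bmn,bcn->bcm', attn.softmax(dim=-1), v)
        return out.reshape(B, C, H, W)

# Quick shape check.
print(TinyDeformableAttention(32)(torch.rand(1, 32, 16, 16)).shape)  # torch.Size([1, 32, 16, 16])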
2.2 DAT's Network Structure Diagram
(a) shows the information flow of deformable attention. On the left, a set of reference points is placed uniformly over the feature map, and their offsets are learned from the queries by an offset network. Then, as shown on the right, deformed keys and values are projected from the features sampled at the deformed points. A relative position bias is also computed from the deformed points, enhancing the multi-head attention that produces the transformed output features. For clarity only 4 reference points are drawn, but in practice there are many more.
(b) shows the detailed structure of the offset generation network, with the size of the input and output feature maps annotated for each layer (in the code, this offset network can be enabled or disabled); a rough sketch of it is given right below.
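As a rough sketch of that offset sub-network (the same depthwise conv → normalization → GELU → 1×1 conv pattern as the conv_offset block in Section 3; GroupNorm here is only a stand-in for the channel-wise LayerNormProxy used in the full code):

import torch
import torch.nn as nn

def make_offset_network(channels, ksize=9, stride=1):
    """Offset generation network: a depthwise conv aggregates local context,
    then a 1x1 conv predicts a 2-channel offset map (one offset pair per point)."""
    pad = ksize // 2 if ksize != stride else 0
    return nn.Sequential(
        nn.Conv2d(channels, channels, ksize, stride, pad, groups=channels),  # depthwise conv
        nn.GroupNorm(1, channels),  # stand-in for the channel-wise LayerNormProxy in the full code
        nn.GELU(),
        nn.Conv2d(channels, 2, 1, bias=False),  # 2 output channels = per-point (y, x) offset
    )

print(make_offset_network(64)(torch.rand(1, 64, 32, 32)).shape)  # torch.Size([1, 2, 32, 32])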
The mechanism above spreads a variety of reference points over the image, which improves detection efficiency. The final effect is shown below ->
2.3 Comparing DAT with Other Mechanisms
The figure below compares DAT with other vision Transformer models and with DCN (deformable convolution networks) from the CNN side, highlighting how differently they handle queries (the figure is quite self-explanatory, so I won't describe the process in detail):
3. Plug-and-Play DAT Code
The code below implements DAT's network structure.
- import numpy as np
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
- import einops
- from timm.models.layers import trunc_normal_
- __all__ = ['DAttentionBaseline', 'C2PSA_DAT']
- class LayerNormProxy(nn.Module):
- def __init__(self, dim):
- super().__init__()
- self.norm = nn.LayerNorm(dim)
- def forward(self, x):
- x = einops.rearrange(x, 'b c h w -> b h w c')
- x = self.norm(x)
- return einops.rearrange(x, 'b h w c -> b c h w')
- class DAttentionBaseline(nn.Module):
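- """DAT deformable attention: queries attend to key/value features sampled at reference points shifted by learned offsets."""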
- def __init__(
- self, q_size=224, kv_size=(224,224), n_heads=8, n_head_channels=32, n_groups=1,
- attn_drop=0.0, proj_drop=0.0, stride=1,
- offset_range_factor=-1, use_pe=True, dwc_pe=True,
- no_off=False, fixed_pe=False, ksize=9, log_cpb=False
- ):
- super().__init__()
- # In this YOLO integration, q_size receives the channel count (see PSABlock below),
- # so the head dimension and the nominal q_size grid are both derived from it.
- n_head_channels = int(q_size / 8)
- q_size = (q_size, q_size)
- self.dwc_pe = dwc_pe
- self.n_head_channels = n_head_channels
- self.scale = self.n_head_channels ** -0.5
- self.n_heads = n_heads
- self.q_h, self.q_w = q_size
- # self.kv_h, self.kv_w = kv_size
- self.kv_h, self.kv_w = self.q_h // stride, self.q_w // stride
- self.nc = n_head_channels * n_heads
- self.n_groups = n_groups
- self.n_group_channels = self.nc // self.n_groups
- self.n_group_heads = self.n_heads // self.n_groups
- self.use_pe = use_pe
- self.fixed_pe = fixed_pe
- self.no_off = no_off
- self.offset_range_factor = offset_range_factor
- self.ksize = ksize
- self.log_cpb = log_cpb
- self.stride = stride
- kk = self.ksize
- pad_size = kk // 2 if kk != stride else 0
- self.conv_offset = nn.Sequential(
- nn.Conv2d(self.n_group_channels, self.n_group_channels, kk, stride, pad_size, groups=self.n_group_channels),
- LayerNormProxy(self.n_group_channels),
- nn.GELU(),
- nn.Conv2d(self.n_group_channels, 2, 1, 1, 0, bias=False)
- )
- if self.no_off:
- for m in self.conv_offset.parameters():
- m.requires_grad_(False)
- self.proj_q = nn.Conv2d(
- self.nc, self.nc,
- kernel_size=1, stride=1, padding=0
- )
- self.proj_k = nn.Conv2d(
- self.nc, self.nc,
- kernel_size=1, stride=1, padding=0)
- self.proj_v = nn.Conv2d(
- self.nc, self.nc,
- kernel_size=1, stride=1, padding=0
- )
- self.proj_out = nn.Conv2d(
- self.nc, self.nc,
- kernel_size=1, stride=1, padding=0
- )
- self.proj_drop = nn.Dropout(proj_drop, inplace=True)
- self.attn_drop = nn.Dropout(attn_drop, inplace=True)
- if self.use_pe and not self.no_off:
- if self.dwc_pe:
- self.rpe_table = nn.Conv2d(
- self.nc, self.nc, kernel_size=3, stride=1, padding=1, groups=self.nc)
- elif self.fixed_pe:
- self.rpe_table = nn.Parameter(
- torch.zeros(self.n_heads, self.q_h * self.q_w, self.kv_h * self.kv_w)
- )
- trunc_normal_(self.rpe_table, std=0.01)
- elif self.log_cpb:
- # Borrowed from Swin-V2
- self.rpe_table = nn.Sequential(
- nn.Linear(2, 32, bias=True),
- nn.ReLU(inplace=True),
- nn.Linear(32, self.n_group_heads, bias=False)
- )
- else:
- self.rpe_table = nn.Parameter(
- torch.zeros(self.n_heads, self.q_h * 2 - 1, self.q_w * 2 - 1)
- )
- trunc_normal_(self.rpe_table, std=0.01)
- else:
- self.rpe_table = None
- @torch.no_grad()
- def _get_ref_points(self, H_key, W_key, B, dtype, device):
- ref_y, ref_x = torch.meshgrid(
- torch.linspace(0.5, H_key - 0.5, H_key, dtype=dtype, device=device),
- torch.linspace(0.5, W_key - 0.5, W_key, dtype=dtype, device=device),
- indexing='ij'
- )
- ref = torch.stack((ref_y, ref_x), -1)
- ref[..., 1].div_(W_key - 1.0).mul_(2.0).sub_(1.0)
- ref[..., 0].div_(H_key - 1.0).mul_(2.0).sub_(1.0)
- ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1) # B * g H W 2
- return ref
- @torch.no_grad()
- def _get_q_grid(self, H, W, B, dtype, device):
- ref_y, ref_x = torch.meshgrid(
- torch.arange(0, H, dtype=dtype, device=device),
- torch.arange(0, W, dtype=dtype, device=device),
- indexing='ij'
- )
- ref = torch.stack((ref_y, ref_x), -1)
- ref[..., 1].div_(W - 1.0).mul_(2.0).sub_(1.0)
- ref[..., 0].div_(H - 1.0).mul_(2.0).sub_(1.0)
- ref = ref[None, ...].expand(B * self.n_groups, -1, -1, -1) # B * g H W 2
- return ref
- def forward(self, x):
- B, C, H, W = x.size()
- dtype, device = x.dtype, x.device
- q = self.proj_q(x)
- q_off = einops.rearrange(q, 'b (g c) h w -> (b g) c h w', g=self.n_groups, c=self.n_group_channels)
- offset = self.conv_offset(q_off).contiguous() # B * g 2 Hg Wg
- Hk, Wk = offset.size(2), offset.size(3)
- n_sample = Hk * Wk
- if self.offset_range_factor >= 0 and not self.no_off:
- offset_range = torch.tensor([1.0 / (Hk - 1.0), 1.0 / (Wk - 1.0)], device=device).reshape(1, 2, 1, 1)
- offset = offset.tanh().mul(offset_range).mul(self.offset_range_factor)
- offset = einops.rearrange(offset, 'b p h w -> b h w p')
- reference = self._get_ref_points(Hk, Wk, B, dtype, device)
- if self.no_off:
- offset = offset.fill_(0.0)
- if self.offset_range_factor >= 0:
- pos = offset + reference
- else:
- pos = (offset + reference).clamp(-1., +1.)
- if self.no_off:
- x_sampled = F.avg_pool2d(x, kernel_size=self.stride, stride=self.stride)
- assert x_sampled.size(2) == Hk and x_sampled.size(3) == Wk, f"Size is {x_sampled.size()}"
- else:
- x_sampled = F.grid_sample(
- input=x.reshape(B * self.n_groups, self.n_group_channels, H, W),
- grid=pos[..., (1, 0)], # y, x -> x, y
- mode='bilinear', align_corners=True) # B * g, Cg, Hg, Wg
- x_sampled = x_sampled.reshape(B, C, 1, n_sample)
- q = q.reshape(B * self.n_heads, self.n_head_channels, H * W)
- k = self.proj_k(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)
- v = self.proj_v(x_sampled).reshape(B * self.n_heads, self.n_head_channels, n_sample)
- attn = torch.einsum('b c m, b c n -> b m n', q, k) # B * h, HW, Ns
- attn = attn.mul(self.scale)
- if self.use_pe and (not self.no_off):
- if self.dwc_pe:
- residual_lepe = self.rpe_table(q.reshape(B, C, H, W)).reshape(B * self.n_heads, self.n_head_channels,
- H * W)
- elif self.fixed_pe:
- rpe_table = self.rpe_table
- attn_bias = rpe_table[None, ...].expand(B, -1, -1, -1)
- attn = attn + attn_bias.reshape(B * self.n_heads, H * W, n_sample)
- elif self.log_cpb:
- q_grid = self._get_q_grid(H, W, B, dtype, device)
- displacement = (
- q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups,
- n_sample,
- 2).unsqueeze(1)).mul(
- 4.0) # d_y, d_x [-8, +8]
- displacement = torch.sign(displacement) * torch.log2(torch.abs(displacement) + 1.0) / np.log2(8.0)
- attn_bias = self.rpe_table(displacement) # B * g, H * W, n_sample, h_g
- attn = attn + einops.rearrange(attn_bias, 'b m n h -> (b h) m n', h=self.n_group_heads)
- else:
- rpe_table = self.rpe_table
- rpe_bias = rpe_table[None, ...].expand(B, -1, -1, -1)
- q_grid = self._get_q_grid(H, W, B, dtype, device)
- displacement = (
- q_grid.reshape(B * self.n_groups, H * W, 2).unsqueeze(2) - pos.reshape(B * self.n_groups,
- n_sample,
- 2).unsqueeze(1)).mul(
- 0.5)
- attn_bias = F.grid_sample(
- input=einops.rearrange(rpe_bias, 'b (g c) h w -> (b g) c h w', c=self.n_group_heads,
- g=self.n_groups),
- grid=displacement[..., (1, 0)],
- mode='bilinear', align_corners=True) # B * g, h_g, HW, Ns
- attn_bias = attn_bias.reshape(B * self.n_heads, H * W, n_sample)
- attn = attn + attn_bias
- attn = F.softmax(attn, dim=2)
- attn = self.attn_drop(attn)
- out = torch.einsum('b m n, b c n -> b c m', attn, v)
- if self.use_pe and self.dwc_pe:
- out = out + residual_lepe
- out = out.reshape(B, C, H, W)
- y = self.proj_drop(self.proj_out(out))
- return y
- def autopad(k, p=None, d=1): # kernel, padding, dilation
- """Pad to 'same' shape outputs."""
- if d > 1:
- k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k] # actual kernel-size
- if p is None:
- p = k // 2 if isinstance(k, int) else [x // 2 for x in k] # auto-pad
- return p
- class Conv(nn.Module):
- """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
- default_act = nn.SiLU() # default activation
- def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
- """Initialize Conv layer with given arguments including activation."""
- super().__init__()
- self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
- self.bn = nn.BatchNorm2d(c2)
- self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
- def forward(self, x):
- """Apply convolution, batch normalization and activation to input tensor."""
- return self.act(self.bn(self.conv(x)))
- def forward_fuse(self, x):
- """Perform transposed convolution of 2D data."""
- return self.act(self.conv(x))
- class PSABlock(nn.Module):
- """
- PSABlock class implementing a Position-Sensitive Attention block for neural networks.
- This class encapsulates the functionality for applying multi-head attention and feed-forward neural network layers
- with optional shortcut connections.
- Attributes:
- attn (Attention): Multi-head attention module.
- ffn (nn.Sequential): Feed-forward neural network module.
- add (bool): Flag indicating whether to add shortcut connections.
- Methods:
- forward: Performs a forward pass through the PSABlock, applying attention and feed-forward layers.
- Examples:
- Create a PSABlock and perform a forward pass
- >>> psablock = PSABlock(c=128, attn_ratio=0.5, num_heads=4, shortcut=True)
- >>> input_tensor = torch.randn(1, 128, 32, 32)
- >>> output_tensor = psablock(input_tensor)
- """
- def __init__(self, c, attn_ratio=0.5, num_heads=4, shortcut=True) -> None:
- """Initializes the PSABlock with attention and feed-forward layers for enhanced feature extraction."""
- super().__init__()
- self.attn = DAttentionBaseline(c)
- self.ffn = nn.Sequential(Conv(c, c * 2, 1), Conv(c * 2, c, 1, act=False))
- self.add = shortcut
- def forward(self, x):
- """Executes a forward pass through PSABlock, applying attention and feed-forward layers to the input tensor."""
- x = x + self.attn(x) if self.add else self.attn(x)
- x = x + self.ffn(x) if self.add else self.ffn(x)
- return x
- class C2PSA_DAT(nn.Module):
- """
- C2PSA module with attention mechanism for enhanced feature extraction and processing.
- This module implements a convolutional block with attention mechanisms to enhance feature extraction and processing
- capabilities. It includes a series of PSABlock modules for self-attention and feed-forward operations.
- Attributes:
- c (int): Number of hidden channels.
- cv1 (Conv): 1x1 convolution layer to reduce the number of input channels to 2*c.
- cv2 (Conv): 1x1 convolution layer to reduce the number of output channels to c.
- m (nn.Sequential): Sequential container of PSABlock modules for attention and feed-forward operations.
- Methods:
- forward: Performs a forward pass through the C2PSA module, applying attention and feed-forward operations.
- Notes:
- This module essentially is the same as PSA module, but refactored to allow stacking more PSABlock modules.
- Examples:
- >>> c2psa = C2PSA(c1=256, c2=256, n=3, e=0.5)
- >>> input_tensor = torch.randn(1, 256, 64, 64)
- >>> output_tensor = c2psa(input_tensor)
- """
- def __init__(self, c1, c2, n=1, e=0.5):
- """Initializes the C2PSA module with specified input/output channels, number of layers, and expansion ratio."""
- super().__init__()
- assert c1 == c2
- self.c = int(c1 * e)
- self.cv1 = Conv(c1, 2 * self.c, 1, 1)
- self.cv2 = Conv(2 * self.c, c1, 1)
- self.m = nn.Sequential(*(PSABlock(self.c, attn_ratio=0.5, num_heads=self.c // 64) for _ in range(n)))
- def forward(self, x):
- """Processes the input tensor 'x' through a series of PSA blocks and returns the transformed tensor."""
- a, b = self.cv1(x).split((self.c, self.c), dim=1)
- b = self.m(b)
- return self.cv2(torch.cat((a, b), 1))
- if __name__ == "__main__":
- # Generating Sample image
- image_size = (1, 64, 224, 224)
- image = torch.rand(*image_size)
- # Model
- model = C2PSA_DAT(64, 64)
- out = model(image)
- print(out.size())
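Since the second yaml below also drops DAttentionBaseline into the network on its own, a quick standalone sanity check (which you can append to the test block above) looks like this; in this integration the module only needs the incoming channel count, and it preserves the input shape:

# Standalone check of the attention module itself (shapes are illustrative).
attn = DAttentionBaseline(256)
feat = torch.rand(1, 256, 40, 40)
print(attn(feat).size())  # expected: torch.Size([1, 256, 40, 40])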
4. Adding DAT to Your Network
4.1 Modification 1
The first step, as always, is creating the file. Go to the ultralytics/nn folder and create a directory named 'Addmodules' (if you are using the files from my group, it already exists, so there is no need to create it)! Then create a new py file inside it and paste the core code above into it.
4.2 Modification 2
In the second step, create a new py file named '__init__.py' in that directory (already present if you use the group files), and import our module in it as shown in the figure below and sketched right after this paragraph.
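If you named the file from step one, say, DAT.py (any name works as long as the import matches), the '__init__.py' entry looks roughly like this:

# ultralytics/nn/Addmodules/__init__.py  (sketch; 'DAT' is whatever you named your file)
from .DAT import DAttentionBaseline, C2PSA_DAT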
4.3 Modification 3
In the third step, go to the file 'ultralytics/nn/tasks.py' to import and register our module (if you use the group files it is already imported, so you can skip straight to step four)!
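For reference, the import near the top of tasks.py is a one-liner along these lines (adjust the path to wherever you placed the module):

# at the top of ultralytics/nn/tasks.py, next to the other module imports
from ultralytics.nn.Addmodules import DAttentionBaseline, C2PSA_DAT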
From now on all tutorials will follow this format, because I assume everyone is working from the files in my group!!
4.4 Modification 4
Register the modules inside parse_model following my additions; a rough sketch is shown below.
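A sketch of what the parse_model additions look like (the exact contents of these module tuples differ between ultralytics versions, so treat this as a guide rather than an exact diff):

# inside parse_model() in ultralytics/nn/tasks.py (sketch)

# C2PSA_DAT takes (c1, c2, n) just like C2PSA: add it to the branch that rewrites
# args with the input/output channels and inserts the repeat count.
if m in (Conv, C3k2, SPPF, C2PSA, C2PSA_DAT):          # ...plus the modules already listed there
    c1, c2 = ch[f], args[0]
    if c2 != nc:
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, *args[1:]]
    if m in (C3k2, C2PSA, C2PSA_DAT):                  # modules that also take a repeat count
        args.insert(2, n)
        n = 1

# DAttentionBaseline only needs the number of incoming channels.
elif m is DAttentionBaseline:
    args = [ch[f], *args]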
With that, the modifications are complete; you can copy one of the yaml files below and run it.
5. DAT yaml Files and Training Records
5.1 DAT yaml File 1
Training info for this version: YOLO11-C2PSA-DAT summary: 321 layers, 2,621,723 parameters, 2,621,707 gradients, 6.5 GFLOPs
- # Ultralytics YOLO 🚀, AGPL-3.0 license
- # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
- # Parameters
- nc: 80 # number of classes
- scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
- s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
- m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
- l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
- x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
- # YOLO11n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 2, C3k2, [256, False, 0.25]]
- - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- - [-1, 2, C3k2, [512, False, 0.25]]
- - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- - [-1, 2, C3k2, [512, True]]
- - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- - [-1, 2, C3k2, [1024, True]]
- - [-1, 1, SPPF, [1024, 5]] # 9
- - [-1, 2, C2PSA_DAT, [1024]] # 10
- # YOLO11n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- - [[-1, 6], 1, Concat, [1]] # cat backbone P4
- - [-1, 2, C3k2, [512, False]] # 13
- - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- - [[-1, 4], 1, Concat, [1]] # cat backbone P3
- - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 13], 1, Concat, [1]] # cat head P4
- - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 10], 1, Concat, [1]] # cat head P5
- - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
- - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
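Before launching a full training run you can verify that the yaml parses and the new module is registered by simply building the model and printing its summary (assuming you saved the config as, e.g., yolo11-C2PSA-DAT.yaml):

from ultralytics import YOLO

# Build the model from the yaml only (no weights) and print the layer/parameter summary;
# it should roughly match the numbers quoted above.
model = YOLO('yolo11-C2PSA-DAT.yaml')
model.info()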
5.2 DAT yaml File 2
Although the attention is added in three places in this yaml, usually only one of them needs to be active in practice, so feel free to comment the others out and experiment.
Training info for this version: YOLO11-DAT summary: 361 layers, 2,983,579 parameters, 2,983,563 gradients, 7.2 GFLOPs
- # Ultralytics YOLO 🚀, AGPL-3.0 license
- # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
- # Parameters
- nc: 80 # number of classes
- scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
- s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
- m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
- l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
- x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
- # YOLO11n backbone
- backbone:
- # [from, repeats, module, args]
- - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- - [-1, 2, C3k2, [256, False, 0.25]]
- - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- - [-1, 2, C3k2, [512, False, 0.25]]
- - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- - [-1, 2, C3k2, [512, True]]
- - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- - [-1, 2, C3k2, [1024, True]]
- - [-1, 1, SPPF, [1024, 5]] # 9
- - [-1, 2, C2PSA, [1024]] # 10
- # YOLO11n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- - [[-1, 6], 1, Concat, [1]] # cat backbone P4
- - [-1, 2, C3k2, [512, False]] # 13
- - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- - [[-1, 4], 1, Concat, [1]] # cat backbone P3
- - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
- - [-1, 1, DAttentionBaseline, []] # 17 (P3/8-small) attention added at the small-object detection branch output
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 13], 1, Concat, [1]] # cat head P4
- - [-1, 2, C3k2, [512, False]] # 20 (P4/16-medium)
- - [-1, 1, DAttentionBaseline, []] # 21 (P4/16-medium) attention added at the medium-object detection branch output
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 10], 1, Concat, [1]] # cat head P5
- - [-1, 2, C3k2, [1024, True]] # 24 (P5/32-large)
- - [-1, 1, DAttentionBaseline, []] # 25 (P5/32-large) attention added at the large-object detection branch output
- # The attention is added in three places here, but usually only one needs to be active; comment the others out and experiment.
- # Which layer benefits most from the attention depends on your dataset and scenario.
- # If you move the attention yourself, make sure the 'from' indices [17, 21, 25] in Detect still point at the corresponding detection layers!
- - [[17, 21, 25], 1, Detect, [nc]] # Detect(P3, P4, P5)
5.3 Training Code
Create a py file, paste the code below into it, fill in your own file paths, and run it.
- import warnings
- warnings.filterwarnings('ignore')
- from ultralytics import YOLO
- if __name__ == '__main__':
- model = YOLO('path/to/yolo11-DAT.yaml')  # replace with the path to the DAT yaml you created above (e.g. the one from 5.1 or 5.2)
- # model.load('yolov8n.pt') # loading pretrain weights
- model.train(data=r'path/to/dataset.yaml',  # replace with the path to your dataset yaml
- # for other tasks, set 'task' in 'ultralytics/cfg/default.yaml' to detect, segment, classify or pose
- cache=False,
- imgsz=640,
- epochs=150,
- single_cls=False, # whether this is single-class detection
- batch=4,
- close_mosaic=10,
- workers=0,
- device='0',
- optimizer='SGD', # using SGD
- # resume='', # to resume training, set this to the path of your last.pt
- amp=False, # turn off AMP (mixed precision) if the training loss becomes NaN
- project='runs/train',
- name='exp',
- )
5.4 Screenshots of the DAT Training Run
6. Conclusion
That wraps up the content of this post. Here I'd like to recommend my YOLOv11 effective-improvements column. It is newly opened with an average quality score of 98; going forward I will keep reproducing papers from the latest top conferences and also fill in some older improvement mechanisms. The column is currently free to read (for now, so follow early so you don't miss it~). If this article helped you, please subscribe to the column and follow for more updates~