
YOLOv11 Improvements - Conv (Convolution) Series - Secondary Innovation of C3k2 with ModulatedDeformConv (Reduces Network Layers and Computation)

1. Introduction

The improvement presented in this article uses ModulatedDeformConv to replace the model's downsampling operations, together with a secondary-innovation C3k2 mechanism. Its main idea is to introduce learnable spatial offsets that dynamically adjust the receptive field, strengthening the convolutional neural network's ability to adapt to geometric transformations in images. Unlike ordinary convolutions, this deformable convolution improves detection accuracy mainly by learning where the downsampling samples are taken; in addition, the method can reduce computation and the number of network layers, so it is well worth trying on your own dataset, since mechanisms that actually reduce the layer count are rare.

Welcome to subscribe to my column and learn YOLO together!

Training info 1: YOLO11-C3k2-MDConv-1 summary: 315 layers, 2,616,209 parameters, 2,616,193 gradients, 6.4 GFLOPs
Training info 2: YOLO11-C3k2-MDConv-2 summary: 314 layers, 2,668,612 parameters, 2,668,596 gradients, 6.4 GFLOPs
Training info 3: YOLO11-MDConv summary: 313 layers, 2,718,804 parameters, 2,718,788 gradients, 5.4 GFLOPs
Baseline: YOLO11 summary: 319 layers, 2,591,010 parameters, 2,590,994 gradients, 6.4 GFLOPs

2. Principle

Official paper address: click here to jump to the paper

Official code address: click here to jump to the code


The main ideas of deformable convolution (Deformable Convolution) can be summarized as follows:

1. Learnable offsets: a standard convolution samples the input feature map on a fixed regular grid; a 3×3 kernel, for example, always samples at the same predefined positions. Deformable convolution introduces learnable 2D offsets so that each sampling position can adapt to the input features. The kernel's sampling locations are therefore no longer fixed and can change with the image content.

2. Stronger modeling of geometric transformations: with these offsets, deformable convolution can dynamically adjust the kernel's receptive field to match objects of different scales, shapes, and poses. This flexibility lets the network capture complex geometric variations such as deformation, rotation, and scaling, improving its feature representation of the target objects.

3. Adaptive spatial transformation: the offsets are learned from the preceding feature map through an additional convolution layer and adjust adaptively to local features. This adjustment is local, dense, and adaptive with respect to the input, and it requires no extra supervision. As a result, deformable convolution can handle complex geometric transformations flexibly while remaining lightweight.

4. Easy integration: deformable convolution can directly replace standard convolution layers in existing CNNs and can be trained end to end with standard back-propagation. It is therefore easy to plug into existing deep learning models and brings clear gains, especially on vision tasks such as object detection and semantic segmentation.

In short, the core idea of deformable convolution is to learn variable spatial offsets that dynamically adjust the kernel's sampling positions, making the convolutional neural network more adaptable to geometric variation in images and better at modeling complex visual tasks.
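For a compact reference, the modulated (DCNv2-style) form used by the code below can be written as the following sampling formula, where the output at location p is

y(p) = \sum_{k=1}^{K} w_k \cdot x\left(p + p_k + \Delta p_k\right) \cdot \Delta m_k

Here K is the number of kernel sampling points (9 for a 3×3 kernel), p_k are the fixed grid offsets, \Delta p_k the learned 2D offsets, and \Delta m_k \in [0, 1] the learned modulation scalar. In the code this corresponds to the 3K channels produced by conv_offset per deformable group: 2K offset channels plus K mask channels passed through a sigmoid.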


3. Core Code

See Section 4 for how to use the core code. Note that this method requires the mmcv-full package, a third-party library that is notoriously difficult to install!
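One commonly used installation route (my suggestion, not part of the original post) is OpenMMLab's mim tool: run "pip install -U openmim" and then "mim install mmcv-full". A minimal sanity check, before touching YOLO, is to confirm that the compiled extension the code below relies on is actually importable:

# Minimal sanity check (my own addition): confirm mmcv-full and its compiled
# C++/CUDA extension can be imported before using the core code below.
import importlib

try:
    ext_module = importlib.import_module('mmcv._ext')
    print('mmcv._ext imported successfully')
except ImportError as e:
    print('mmcv-full is missing or was installed without compiled ops:', e)

With mmcv in place, the full core code is as follows.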

import math
from typing import Optional, Tuple, Union
import torch
import torch.nn as nn
from torch.autograd import Function
from torch.autograd.function import once_differentiable
from torch.nn.modules.utils import _pair, _single
import importlib

__all__ = ['C3k2_MDConv1', 'C3k2_MDConv2']

ext_module = importlib.import_module('mmcv.' + "_ext")


class ModulatedDeformConv2dFunction(Function):

    @staticmethod
    def symbolic(g, input, offset, mask, weight, bias, stride, padding,
                 dilation, groups, deform_groups):
        input_tensors = [input, offset, mask, weight]
        if bias is not None:
            input_tensors.append(bias)
        return g.op(
            'mmcv::MMCVModulatedDeformConv2d',
            *input_tensors,
            stride_i=stride,
            padding_i=padding,
            dilation_i=dilation,
            groups_i=groups,
            deform_groups_i=deform_groups)

    @staticmethod
    def forward(ctx,
                input: torch.Tensor,
                offset: torch.Tensor,
                mask: torch.Tensor,
                weight: nn.Parameter,
                bias: Optional[nn.Parameter] = None,
                stride: int = 1,
                padding: int = 0,
                dilation: int = 1,
                groups: int = 1,
                deform_groups: int = 1) -> torch.Tensor:
        if input is not None and input.dim() != 4:
            raise ValueError(
                f'Expected 4D tensor as input, got {input.dim()}D tensor instead.')
        ctx.stride = _pair(stride)
        ctx.padding = _pair(padding)
        ctx.dilation = _pair(dilation)
        ctx.groups = groups
        ctx.deform_groups = deform_groups
        ctx.with_bias = bias is not None
        if not ctx.with_bias:
            bias = input.new_empty(0)  # fake tensor
        # When pytorch version >= 1.6.0, amp is adopted for fp16 mode;
        # amp won't cast the type of model (float32), but "offset" is cast
        # to float16 by nn.Conv2d automatically, leading to the type
        # mismatch with input (when it is float32) or weight.
        # The flag for whether to use fp16 or amp is the type of "offset",
        # we cast weight and input to temporarily support fp16 and amp
        # whatever the pytorch version is.
        input = input.type_as(offset)
        weight = weight.type_as(input)
        bias = bias.type_as(input)  # type: ignore
        ctx.save_for_backward(input, offset, mask, weight, bias)
        output = input.new_empty(
            ModulatedDeformConv2dFunction._output_size(ctx, input, weight))
        ctx._bufs = [input.new_empty(0), input.new_empty(0)]
        ext_module.modulated_deform_conv_forward(
            input,
            weight,
            bias,
            ctx._bufs[0],
            offset,
            mask,
            output,
            ctx._bufs[1],
            kernel_h=weight.size(2),
            kernel_w=weight.size(3),
            stride_h=ctx.stride[0],
            stride_w=ctx.stride[1],
            pad_h=ctx.padding[0],
            pad_w=ctx.padding[1],
            dilation_h=ctx.dilation[0],
            dilation_w=ctx.dilation[1],
            group=ctx.groups,
            deformable_group=ctx.deform_groups,
            with_bias=ctx.with_bias)
        return output

    @staticmethod
    @once_differentiable
    def backward(ctx, grad_output: torch.Tensor) -> tuple:
        input, offset, mask, weight, bias = ctx.saved_tensors
        grad_input = torch.zeros_like(input)
        grad_offset = torch.zeros_like(offset)
        grad_mask = torch.zeros_like(mask)
        grad_weight = torch.zeros_like(weight)
        grad_bias = torch.zeros_like(bias)
        grad_output = grad_output.contiguous()
        ext_module.modulated_deform_conv_backward(
            input,
            weight,
            bias,
            ctx._bufs[0],
            offset,
            mask,
            ctx._bufs[1],
            grad_input,
            grad_weight,
            grad_bias,
            grad_offset,
            grad_mask,
            grad_output,
            kernel_h=weight.size(2),
            kernel_w=weight.size(3),
            stride_h=ctx.stride[0],
            stride_w=ctx.stride[1],
            pad_h=ctx.padding[0],
            pad_w=ctx.padding[1],
            dilation_h=ctx.dilation[0],
            dilation_w=ctx.dilation[1],
            group=ctx.groups,
            deformable_group=ctx.deform_groups,
            with_bias=ctx.with_bias)
        if not ctx.with_bias:
            grad_bias = None
        return (grad_input, grad_offset, grad_mask, grad_weight, grad_bias,
                None, None, None, None, None)

    @staticmethod
    def _output_size(ctx, input, weight):
        channels = weight.size(0)
        output_size = (input.size(0), channels)
        for d in range(input.dim() - 2):
            in_size = input.size(d + 2)
            pad = ctx.padding[d]
            kernel = ctx.dilation[d] * (weight.size(d + 2) - 1) + 1
            stride_ = ctx.stride[d]
            output_size += ((in_size + (2 * pad) - kernel) // stride_ + 1, )
        if not all(map(lambda s: s > 0, output_size)):
            raise ValueError(
                'convolution input is too small (output would be ' +
                'x'.join(map(str, output_size)) + ')')
        return output_size


modulated_deform_conv2d = ModulatedDeformConv2dFunction.apply


class ModulatedDeformConv2d(nn.Module):

    def __init__(self,
                 in_channels: int,
                 out_channels: int,
                 kernel_size: Union[int, Tuple[int]],
                 stride: int = 1,
                 padding: int = 1,
                 dilation: int = 1,
                 groups: int = 1,
                 deform_groups: int = 1,
                 bias: Union[bool, str] = True):
        super().__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.kernel_size = _pair(kernel_size)
        self.stride = _pair(stride)
        self.padding = _pair(padding)
        self.dilation = _pair(dilation)
        self.groups = groups
        self.deform_groups = deform_groups
        # enable compatibility with nn.Conv2d
        self.transposed = False
        self.output_padding = _single(0)
        self.weight = nn.Parameter(
            torch.Tensor(out_channels, in_channels // groups,
                         *self.kernel_size))
        if bias:
            self.bias = nn.Parameter(torch.Tensor(out_channels))
        else:
            self.register_parameter('bias', None)
        self.init_weights()

    def init_weights(self):
        n = self.in_channels
        for k in self.kernel_size:
            n *= k
        stdv = 1. / math.sqrt(n)
        self.weight.data.uniform_(-stdv, stdv)
        if self.bias is not None:
            self.bias.data.zero_()

    def forward(self, x: torch.Tensor, offset: torch.Tensor,
                mask: torch.Tensor) -> torch.Tensor:
        return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
                                       self.stride, self.padding,
                                       self.dilation, self.groups,
                                       self.deform_groups)


class ModulatedDeformConv2dPack(ModulatedDeformConv2d):
    """A ModulatedDeformable Conv Encapsulation that acts as normal Conv
    layers.

    Args:
        in_channels (int): Same as nn.Conv2d.
        out_channels (int): Same as nn.Conv2d.
        kernel_size (int or tuple[int]): Same as nn.Conv2d.
        stride (int): Same as nn.Conv2d, while tuple is not supported.
        padding (int): Same as nn.Conv2d, while tuple is not supported.
        dilation (int): Same as nn.Conv2d, while tuple is not supported.
        groups (int): Same as nn.Conv2d.
        bias (bool or str): If specified as `auto`, it will be decided by the
            norm_cfg. Bias will be set as True if norm_cfg is None, otherwise
            False.
    """

    _version = 2

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.conv_offset = nn.Conv2d(
            self.in_channels,
            self.deform_groups * 3 * self.kernel_size[0] * self.kernel_size[1],
            kernel_size=self.kernel_size,
            stride=self.stride,
            padding=self.padding,
            dilation=self.dilation,
            bias=True)
        self.init_weights()

    def init_weights(self) -> None:
        super().init_weights()
        if hasattr(self, 'conv_offset'):
            self.conv_offset.weight.data.zero_()
            self.conv_offset.bias.data.zero_()

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # type: ignore
        out = self.conv_offset(x)
        o1, o2, mask = torch.chunk(out, 3, dim=1)
        offset = torch.cat((o1, o2), dim=1)
        mask = torch.sigmoid(mask)
        return modulated_deform_conv2d(x, offset, mask, self.weight, self.bias,
                                       self.stride, self.padding,
                                       self.dilation, self.groups,
                                       self.deform_groups)

    def _load_from_state_dict(self, state_dict, prefix, local_metadata, strict,
                              missing_keys, unexpected_keys, error_msgs):
        version = local_metadata.get('version', None)
        if version is None or version < 2:
            # the key is different in early versions
            # In version < 2, ModulatedDeformConvPack
            # loads previous benchmark models.
            if (prefix + 'conv_offset.weight' not in state_dict
                    and prefix[:-1] + '_offset.weight' in state_dict):
                state_dict[prefix + 'conv_offset.weight'] = state_dict.pop(
                    prefix[:-1] + '_offset.weight')
            if (prefix + 'conv_offset.bias' not in state_dict
                    and prefix[:-1] + '_offset.bias' in state_dict):
                state_dict[prefix +
                           'conv_offset.bias'] = state_dict.pop(prefix[:-1] +
                                                                '_offset.bias')
        super()._load_from_state_dict(state_dict, prefix, local_metadata,
                                      strict, missing_keys, unexpected_keys,
                                      error_msgs)


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (used after layer fusion)."""
        return self.act(self.conv(x))


class Bottleneck_MDConv(nn.Module):
    """Standard bottleneck whose second conv is replaced by modulated deformable convolution (DCN)."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = ModulatedDeformConv2dPack(c_, c2, 3)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Applies the YOLO FPN to input data."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 2 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3kMDConv(C3):
    """C3k variant whose bottlenecks use modulated deformable convolution."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3kMDConv module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck_MDConv(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k2_MDConv1(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck_MDConv(self.c, self.c, shortcut, g) for _ in range(n)
        )
# Variant 1: replace the plain Bottleneck in C3k2 with Bottleneck_MDConv


class C3k2_MDConv2(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3kMDConv(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in
            range(n)
        )
# Variant 2: replace the Bottleneck inside C3k with Bottleneck_MDConv


if __name__ == '__main__':
    x1 = torch.randn(1, 32, 16, 16)
    x2 = torch.randn(1, 32, 16, 16)  # unused in this quick test
    model = ModulatedDeformConv2dPack(32, 16, 3, 2, 1)
    x = model(x1)
    print(x.shape)
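The quick test in the __main__ block above only exercises ModulatedDeformConv2dPack. If you also want to sanity-check the two C3k2 variants before integrating them, a minimal extension of that block (my own addition; the mmcv DCN op generally expects a CUDA-enabled build of mmcv-full) could look like this:

# appended inside the same `if __name__ == '__main__':` block (sketch)
device = 'cuda' if torch.cuda.is_available() else 'cpu'
x = torch.randn(1, 64, 32, 32, device=device)
m1 = C3k2_MDConv1(64, 64, n=1, c3k=False).to(device)  # Bottleneck -> Bottleneck_MDConv
m2 = C3k2_MDConv2(64, 64, n=1, c3k=True).to(device)   # C3k -> C3kMDConv (DCN inside C3k)
print(m1(x).shape, m2(x).shape)  # both expected to stay torch.Size([1, 64, 32, 32])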


4. Step-by-Step Guide to Adding This Mechanism


4.1 Modification 1

The first step is still to create the files. Go to the ultralytics/nn folder and create a directory named 'Addmodules' (if you are using the files from the group, it already exists and there is no need to create it)! Then create a new .py file inside it and copy the core code into it.


4.2 Modification 2

For the second step, create a new .py file named '__init__.py' in that directory (already there if you use the group's files), and import our modules inside it; a minimal sketch of what this can look like is given below.
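Assuming the core code was saved as, say, MDConv.py (the filename is your choice, so treat this name as illustrative only), the '__init__.py' can simply re-export it:

# ultralytics/nn/Addmodules/__init__.py (sketch; replace MDConv with your own filename)
from .MDConv import *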



4.3 Modification 3

For the third step, open the file 'ultralytics/nn/tasks.py' and import and register our modules there (if you use the group's files, the import is already done and you can go straight to step four). A sketch of the import is given below.
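A minimal sketch of that import (my wording, not a screenshot of the original): note that the __all__ list in the core code only exports the two C3k2 variants, so if you plan to use the third yaml below, which places ModulatedDeformConv2dPack directly in the network, you will need to add that name to __all__ or import it explicitly as well.

# near the other module imports at the top of ultralytics/nn/tasks.py (sketch)
from ultralytics.nn.Addmodules import C3k2_MDConv1, C3k2_MDConv2, ModulatedDeformConv2dPack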



4.4 Modification 4

Add the new modules in parse_model following my pattern; a rough sketch of the relevant branch is given below.
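For reference only, here is a rough sketch of that registration. The surrounding code comes from ultralytics' parse_model and differs between versions, so treat the set contents and placement as assumptions and simply mirror however C3k2 is already handled in your copy (ModulatedDeformConv2dPack is included so that yaml 3 below also parses):

# sketch of the parse_model() branch in ultralytics/nn/tasks.py (adapt to your version)
if m in {Classify, Conv, ConvTranspose, GhostConv, Bottleneck, SPPF, C2f, C3, C3k2,
         C3k2_MDConv1, C3k2_MDConv2, ModulatedDeformConv2dPack}:  # new names added alongside C3k2
    c1, c2 = ch[f], args[0]
    if c2 != nc:
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, *args[1:]]
    if m in {BottleneckCSP, C1, C2, C2f, C3, C3k2, C3k2_MDConv1, C3k2_MDConv2}:  # modules that take a repeat count
        args.insert(2, n)  # number of repeats
        n = 1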



That completes the modifications. You can copy the yaml files below and run them.
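Before launching a full training run, you can optionally confirm that a yaml parses and that the layer and parameter counts match the summaries quoted in this article (the filename here is only a placeholder for wherever you saved the yaml):

from ultralytics import YOLO

model = YOLO('yolo11-C3k2-MDConv-1.yaml')  # hypothetical filename; point it at your saved yaml
model.info()  # for yaml 1 this should report roughly 315 layers, 2,616,209 parameters, 6.4 GFLOPs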


5. Training


5.1 YAML file 1

Training info: YOLO11-C3k2-MDConv-1 summary: 315 layers, 2,616,209 parameters, 2,616,193 gradients, 6.4 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_MDConv1, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_MDConv1, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_MDConv1, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_MDConv1, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_MDConv1, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_MDConv1, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_MDConv1, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_MDConv1, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


5.2 YAML file 2

Training info: YOLO11-C3k2-MDConv-2 summary: 314 layers, 2,668,612 parameters, 2,668,596 gradients, 6.4 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_MDConv2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_MDConv2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_MDConv2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_MDConv2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_MDConv2, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_MDConv2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_MDConv2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_MDConv2, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


5.3 YAML file 3

Training info: YOLO11-MDConv summary: 313 layers, 2,718,804 parameters, 2,718,788 gradients, 5.4 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, ModulatedDeformConv2dPack, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, ModulatedDeformConv2dPack, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, ModulatedDeformConv2dPack, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, ModulatedDeformConv2dPack, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, ModulatedDeformConv2dPack, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, ModulatedDeformConv2dPack, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


5.4 Training Code

Create a .py file, paste the code below into it, set your own file paths, and it is ready to run.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolo11-C3k2-MDConv-1.yaml')  # point this at the yaml you saved from section 5.1/5.2/5.3
    # To switch model scale, change the name passed to YOLO above. For an improvement yaml named
    # yolo11-XXX.yaml, using yolo11s-XXX.yaml trains the "s" scale, yolo11l-XXX.yaml the "l" scale, etc.
    # (change the name in YOLO(), not the name of the configuration file itself)!
    # model.load('yolo11n.pt')  # whether to load pretrained weights; for research I do not recommend it, otherwise it is hard to show a gain
    model.train(data=r"C:\Users\Administrator\PycharmProjects\yolov5-master\yolov5-master\Construction Site Safety.v30-raw-images_latestversion.yolov8\data.yaml",
                # for other tasks, find 'ultralytics/cfg/default.yaml' and change task to detect, segment, classify or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=16,
                close_mosaic=0,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='runs/train/exp21/weights/last.pt',  # set to the last.pt path if you want to resume training
                amp=True,  # disable amp if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )


5.5 Training Screenshots


6. Summary

This concludes the main content of this post. I would like to recommend my YOLOv11 effective-improvement column: it is newly opened with an average quality score of 98, and going forward I will reproduce papers from the latest top conferences and supplement some older improvement mechanisms. If this post helped you, please subscribe to the column and follow the upcoming updates~