
YOLOv11 Improvements - Fusion Series - Fusing the Scale-Unifying Detection Head DynamicHead with an Added P2 Small-Object Detection Layer (Leaving Small Targets Nowhere to Hide)

1. Introduction

The improvement presented in this article is a targeted one: we add a P2 layer for small-object detection and a P6 layer for large-object detection, and perform detection with DynamicHead (a faithful one-to-one reproduction of the original version, unlike the modified versions circulating online). The added P2 layer has a higher resolution, which lets the model capture the details of small targets better. The added P6 layer is a lower-resolution feature layer with a larger receptive field; for large targets, this means the model can capture the overall structural information more effectively. On top of these, DynamicHead lets the model dynamically adjust its detection strategy for targets of different sizes, further improving accuracy. This topic was requested by a subscriber to the column, so once you subscribe, you too can request any mechanism you are interested in.

Welcome to subscribe to my column and learn YOLO together!




2. Benefits of Adding the P2 and P6 Layers

We add the P2 and P6 layers to improve the object detection model, particularly its ability to handle targets of different sizes.

1. Benefits of adding the P2 layer:

  • Better small-object detection: The P2 layer typically has a higher resolution, which allows the model to capture the details of small targets better. Higher-resolution feature maps provide more spatial information, which helps in detecting small objects.
  • Finer-grained features: Because the P2 layer sits in the shallower part of the network, it captures more fine-grained features, which is important for understanding the shape and texture of small targets.

2. Benefits of adding the P6 layer:

  • Better large-object detection: The P6 layer is a lower-resolution feature layer with a larger receptive field. For large targets, this means the model can capture the overall structural information more effectively.
  • Lower computational cost: For large targets, using lower-resolution feature maps reduces computation, since fewer pixels are needed to cover each large object.

3. Improved adaptability:

  • Using DynamicHead lets the model dynamically adjust its detection strategy for targets of different sizes, further improving the model's generalization and adaptability, and hence its accuracy.

Summary: The P2 and P6 layers are added so that the model handles targets of different sizes more efficiently and accurately. This strategy is especially suitable for datasets from applications that must handle targets of many sizes at once, such as street-scene image analysis and drone-based visual surveillance. The sketch after this section makes the resolution gap between the levels concrete.
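To see why P2 helps small objects and P6 helps large ones, here is a minimal sketch computing the feature-map size of each pyramid level, assuming a 640x640 input and the usual FPN strides (P2=4 through P6=64):

# Minimal sketch: feature-map sizes per pyramid level for a 640x640 input.
# Strides follow the standard FPN convention (P2=4, P3=8, P4=16, P5=32, P6=64).
input_size = 640
for level, stride in {"P2": 4, "P3": 8, "P4": 16, "P5": 32, "P6": 64}.items():
    side = input_size // stride
    print(f"{level}: stride {stride:2d} -> {side}x{side} = {side * side} grid cells")
# P2: stride  4 -> 160x160 = 25600 grid cells (fine detail, small objects)
# P6: stride 64 -> 10x10  =   100 grid cells (large receptive field, big objects)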


3. Core Code of DynamicHead

See Section 4 for how to use this code; if you have already added this detection head following an earlier article, there is no need to add it again! Note that the code imports ModulatedDeformConv2d from mmcv.ops, so the mmcv package must be installed in your environment.

import copy
import math

import torch
import torch.nn as nn
import torch.nn.functional as F
from mmcv.ops import ModulatedDeformConv2d
from ultralytics.utils.tal import dist2bbox, make_anchors

__all__ = ['DynamicHead']


def _make_divisible(v, divisor, min_value=None):
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class h_swish(nn.Module):
    def __init__(self, inplace=False):
        super(h_swish, self).__init__()
        self.inplace = inplace

    def forward(self, x):
        return x * F.relu6(x + 3.0, inplace=self.inplace) / 6.0


class h_sigmoid(nn.Module):
    def __init__(self, inplace=True, h_max=1):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)
        self.h_max = h_max

    def forward(self, x):
        return self.relu(x + 3) * self.h_max / 6


class DYReLU(nn.Module):
    def __init__(self, inp, oup, reduction=4, lambda_a=1.0, K2=True, use_bias=True, use_spatial=False,
                 init_a=[1.0, 0.0], init_b=[0.0, 0.0]):
        super(DYReLU, self).__init__()
        self.oup = oup
        self.lambda_a = lambda_a * 2
        self.K2 = K2
        self.avg_pool = nn.AdaptiveAvgPool2d(1)

        self.use_bias = use_bias
        if K2:
            self.exp = 4 if use_bias else 2
        else:
            self.exp = 2 if use_bias else 1
        self.init_a = init_a
        self.init_b = init_b

        # determine squeeze
        if reduction == 4:
            squeeze = inp // reduction
        else:
            squeeze = _make_divisible(inp // reduction, 4)
        # print('reduction: {}, squeeze: {}/{}'.format(reduction, inp, squeeze))
        # print('init_a: {}, init_b: {}'.format(self.init_a, self.init_b))

        self.fc = nn.Sequential(
            nn.Linear(inp, squeeze),
            nn.ReLU(inplace=True),
            nn.Linear(squeeze, oup * self.exp),
            h_sigmoid()
        )
        if use_spatial:
            self.spa = nn.Sequential(
                nn.Conv2d(inp, 1, kernel_size=1),
                nn.BatchNorm2d(1),
            )
        else:
            self.spa = None

    def forward(self, x):
        if isinstance(x, list):
            x_in = x[0]
            x_out = x[1]
        else:
            x_in = x
            x_out = x
        b, c, h, w = x_in.size()
        y = self.avg_pool(x_in).view(b, c)
        y = self.fc(y).view(b, self.oup * self.exp, 1, 1)
        if self.exp == 4:
            a1, b1, a2, b2 = torch.split(y, self.oup, dim=1)
            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
            a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]
            b1 = b1 - 0.5 + self.init_b[0]
            b2 = b2 - 0.5 + self.init_b[1]
            out = torch.max(x_out * a1 + b1, x_out * a2 + b2)
        elif self.exp == 2:
            if self.use_bias:  # bias but not PL
                a1, b1 = torch.split(y, self.oup, dim=1)
                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
                b1 = b1 - 0.5 + self.init_b[0]
                out = x_out * a1 + b1
            else:
                a1, a2 = torch.split(y, self.oup, dim=1)
                a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
                a2 = (a2 - 0.5) * self.lambda_a + self.init_a[1]
                out = torch.max(x_out * a1, x_out * a2)
        elif self.exp == 1:
            a1 = y
            a1 = (a1 - 0.5) * self.lambda_a + self.init_a[0]  # 1.0
            out = x_out * a1
        if self.spa:
            ys = self.spa(x_in).view(b, -1)
            ys = F.softmax(ys, dim=1).view(b, 1, h, w) * h * w
            ys = F.hardtanh(ys, 0, 3, inplace=True) / 3
            out = out * ys
        return out


class Conv3x3Norm(torch.nn.Module):
    def __init__(self, in_channels, out_channels, stride):
        super(Conv3x3Norm, self).__init__()
        self.conv = ModulatedDeformConv2d(in_channels, out_channels, kernel_size=3, stride=stride, padding=1)
        self.bn = nn.GroupNorm(num_groups=16, num_channels=out_channels)

    def forward(self, input, **kwargs):
        x = self.conv(input.contiguous(), **kwargs)
        x = self.bn(x)
        return x


class DyConv(nn.Module):
    def __init__(self, in_channels=256, out_channels=256, conv_func=Conv3x3Norm):
        super(DyConv, self).__init__()

        self.DyConv = nn.ModuleList()
        self.DyConv.append(conv_func(in_channels, out_channels, 1))  # for the higher-resolution neighbor level
        self.DyConv.append(conv_func(in_channels, out_channels, 1))  # for the current level
        self.DyConv.append(conv_func(in_channels, out_channels, 2))  # downsamples the lower-resolution neighbor level

        self.AttnConv = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_channels, 1, kernel_size=1),
            nn.ReLU(inplace=True))

        self.h_sigmoid = h_sigmoid()
        self.relu = DYReLU(in_channels, out_channels)
        # 3x3 deformable conv: 18 offset channels (2 * 9) + 9 mask channels = 27
        self.offset = nn.Conv2d(in_channels, 27, kernel_size=3, stride=1, padding=1)
        self.init_weights()

    def init_weights(self):
        for m in self.DyConv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data, 0, 0.01)
                if m.bias is not None:
                    m.bias.data.zero_()
        for m in self.AttnConv.modules():
            if isinstance(m, nn.Conv2d):
                nn.init.normal_(m.weight.data, 0, 0.01)
                if m.bias is not None:
                    m.bias.data.zero_()

    def forward(self, x):
        next_x = {}
        feature_names = list(x.keys())
        for level, name in enumerate(feature_names):
            feature = x[name]

            offset_mask = self.offset(feature)
            offset = offset_mask[:, :18, :, :]
            mask = offset_mask[:, 18:, :, :].sigmoid()
            conv_args = dict(offset=offset, mask=mask)

            temp_fea = [self.DyConv[1](feature, **conv_args)]
            if level > 0:
                temp_fea.append(self.DyConv[2](x[feature_names[level - 1]], **conv_args))
            if level < len(x) - 1:
                input = x[feature_names[level + 1]]
                temp_fea.append(F.interpolate(self.DyConv[0](input, **conv_args),
                                              size=[feature.size(2), feature.size(3)]))
            attn_fea = []
            res_fea = []
            for fea in temp_fea:
                res_fea.append(fea)
                attn_fea.append(self.AttnConv(fea))

            res_fea = torch.stack(res_fea)
            spa_pyr_attn = self.h_sigmoid(torch.stack(attn_fea))

            mean_fea = torch.mean(res_fea * spa_pyr_attn, dim=0, keepdim=False)
            next_x[name] = self.relu(mean_fea)

        return next_x


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply convolution and activation without batch normalization (fused inference path)."""
        return self.act(self.conv(x))


class DFL(nn.Module):
    """
    Integral module of Distribution Focal Loss (DFL).

    Proposed in Generalized Focal Loss https://ieeexplore.ieee.org/document/9792391
    """

    def __init__(self, c1=16):
        """Initialize a convolutional layer with a given number of input channels."""
        super().__init__()
        self.conv = nn.Conv2d(c1, 1, 1, bias=False).requires_grad_(False)
        x = torch.arange(c1, dtype=torch.float)
        self.conv.weight.data[:] = nn.Parameter(x.view(1, c1, 1, 1))
        self.c1 = c1

    def forward(self, x):
        """Apply a softmax over the distribution bins and project each side to its expected distance."""
        b, c, a = x.shape  # batch, channels, anchors
        return self.conv(x.view(b, 4, self.c1, a).transpose(2, 1).softmax(1)).view(b, 4, a)
        # return self.conv(x.view(b, self.c1, 4, a).softmax(1)).view(b, 4, a)


class DWConv(Conv):
    """Depth-wise convolution."""

    def __init__(self, c1, c2, k=1, s=1, d=1, act=True):  # ch_in, ch_out, kernel, stride, dilation, activation
        """Initialize Depth-wise convolution with given parameters."""
        super().__init__(c1, c2, k, s, g=math.gcd(c1, c2), d=d, act=act)


class DynamicHead(nn.Module):
    """YOLO Detect head with a DyHead (DynamicHead) attention tower. CSDN Snu77"""

    dynamic = False  # force grid reconstruction
    export = False  # export mode
    format = None  # export format, set by the exporter (read in _inference); added here so the attribute exists
    end2end = False  # end2end
    max_det = 300  # max_det
    shape = None
    anchors = torch.empty(0)  # init
    strides = torch.empty(0)  # init

    def __init__(self, nc=80, ch=()):
        """Initializes the YOLOv8 detection layer with specified number of classes and channels."""
        super().__init__()
        self.nc = nc  # number of classes
        self.nl = len(ch)  # number of detection layers
        self.reg_max = 16  # DFL channels (ch[0] // 16 to scale 4/8/12/16/20 for n/s/m/l/x)
        self.no = nc + self.reg_max * 4  # number of outputs per anchor
        self.stride = torch.zeros(self.nl)  # strides computed during build
        c2, c3 = max((16, ch[0] // 4, self.reg_max * 4)), max(ch[0], min(self.nc, 100))  # channels
        self.cv2 = nn.ModuleList(
            nn.Sequential(Conv(x, c2, 3), Conv(c2, c2, 3), nn.Conv2d(c2, 4 * self.reg_max, 1)) for x in ch
        )
        self.cv3 = nn.ModuleList(
            nn.Sequential(
                nn.Sequential(DWConv(x, x, 3), Conv(x, c3, 1)),
                nn.Sequential(DWConv(c3, c3, 3), Conv(c3, c3, 1)),
                nn.Conv2d(c3, self.nc, 1),
            )
            for x in ch
        )
        self.dfl = DFL(self.reg_max) if self.reg_max > 1 else nn.Identity()

        # Build one DyConv block per detection level; together they form the DyHead tower.
        dyhead_tower = []
        for i in range(self.nl):
            channel = ch[i]
            dyhead_tower.append(
                DyConv(
                    channel,
                    channel,
                    conv_func=Conv3x3Norm,
                )
            )
        self.add_module('dyhead_tower', nn.Sequential(*dyhead_tower))

        if self.end2end:
            self.one2one_cv2 = copy.deepcopy(self.cv2)
            self.one2one_cv3 = copy.deepcopy(self.cv3)

    def forward(self, x):
        """Concatenates and returns predicted bounding boxes and class probabilities."""
        tensor_dict = {i: tensor for i, tensor in enumerate(x)}
        x = self.dyhead_tower(tensor_dict)  # run the DyHead attention tower over all levels
        x = list(x.values())
        if self.end2end:
            return self.forward_end2end(x)
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:  # Training path
            return x
        y = self._inference(x)
        return y if self.export else (y, x)

    def forward_end2end(self, x):
        """
        Performs forward pass of the v10Detect module.

        Args:
            x (tensor): Input tensor.

        Returns:
            (dict, tensor): If not in training mode, returns a dictionary containing the outputs of both one2many
                and one2one detections. If in training mode, returns a dictionary containing the outputs of one2many
                and one2one detections separately.
        """
        x_detach = [xi.detach() for xi in x]
        one2one = [
            torch.cat((self.one2one_cv2[i](x_detach[i]), self.one2one_cv3[i](x_detach[i])), 1) for i in range(self.nl)
        ]
        for i in range(self.nl):
            x[i] = torch.cat((self.cv2[i](x[i]), self.cv3[i](x[i])), 1)
        if self.training:  # Training path
            return {"one2many": x, "one2one": one2one}
        y = self._inference(one2one)
        y = self.postprocess(y.permute(0, 2, 1), self.max_det, self.nc)
        return y if self.export else (y, {"one2many": x, "one2one": one2one})

    def _inference(self, x):
        """Decode predicted bounding boxes and class probabilities based on multiple-level feature maps."""
        # Inference path
        shape = x[0].shape  # BCHW
        x_cat = torch.cat([xi.view(shape[0], self.no, -1) for xi in x], 2)
        if self.dynamic or self.shape != shape:
            self.anchors, self.strides = (x.transpose(0, 1) for x in make_anchors(x, self.stride, 0.5))
            self.shape = shape

        if self.export and self.format in {"saved_model", "pb", "tflite", "edgetpu", "tfjs"}:  # avoid TF FlexSplitV ops
            box = x_cat[:, : self.reg_max * 4]
            cls = x_cat[:, self.reg_max * 4 :]
        else:
            box, cls = x_cat.split((self.reg_max * 4, self.nc), 1)

        if self.export and self.format in {"tflite", "edgetpu"}:
            # Precompute normalization factor to increase numerical stability
            # See https://github.com/ultralytics/ultralytics/issues/7371
            grid_h = shape[2]
            grid_w = shape[3]
            grid_size = torch.tensor([grid_w, grid_h, grid_w, grid_h], device=box.device).reshape(1, 4, 1)
            norm = self.strides / (self.stride[0] * grid_size)
            dbox = self.decode_bboxes(self.dfl(box) * norm, self.anchors.unsqueeze(0) * norm[:, :2])
        else:
            dbox = self.decode_bboxes(self.dfl(box), self.anchors.unsqueeze(0)) * self.strides

        return torch.cat((dbox, cls.sigmoid()), 1)

    def bias_init(self):
        """Initialize Detect() biases, WARNING: requires stride availability."""
        m = self  # self.model[-1]  # Detect() module
        # cf = torch.bincount(torch.tensor(np.concatenate(dataset.labels, 0)[:, 0]).long(), minlength=nc) + 1
        # ncf = math.log(0.6 / (m.nc - 0.999999)) if cf is None else torch.log(cf / cf.sum())  # nominal class frequency
        for a, b, s in zip(m.cv2, m.cv3, m.stride):  # from
            a[-1].bias.data[:] = 1.0  # box
            b[-1].bias.data[: m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)
        if self.end2end:
            for a, b, s in zip(m.one2one_cv2, m.one2one_cv3, m.stride):  # from
                a[-1].bias.data[:] = 1.0  # box
                b[-1].bias.data[: m.nc] = math.log(5 / m.nc / (640 / s) ** 2)  # cls (.01 objects, 80 classes, 640 img)

    def decode_bboxes(self, bboxes, anchors):
        """Decode bounding boxes."""
        return dist2bbox(bboxes, anchors, xywh=not self.end2end, dim=1)

    @staticmethod
    def postprocess(preds: torch.Tensor, max_det: int, nc: int = 80):
        """
        Post-processes YOLO model predictions.

        Args:
            preds (torch.Tensor): Raw predictions with shape (batch_size, num_anchors, 4 + nc) with last dimension
                format [x, y, w, h, class_probs].
            max_det (int): Maximum detections per image.
            nc (int, optional): Number of classes. Default: 80.

        Returns:
            (torch.Tensor): Processed predictions with shape (batch_size, min(max_det, num_anchors), 6) and last
                dimension format [x, y, w, h, max_class_prob, class_index].
        """
        batch_size, anchors, _ = preds.shape  # i.e. shape(16,8400,84)
        boxes, scores = preds.split([4, nc], dim=-1)
        index = scores.amax(dim=-1).topk(min(max_det, anchors))[1].unsqueeze(-1)
        boxes = boxes.gather(dim=1, index=index.repeat(1, 1, 4))
        scores = scores.gather(dim=1, index=index.repeat(1, 1, nc))
        scores, index = scores.flatten(1).topk(min(max_det, anchors))
        i = torch.arange(batch_size)[..., None]  # batch indices
        return torch.cat([boxes[i, index // nc], scores[..., None], (index % nc)[..., None].float()], dim=-1)


if __name__ == "__main__":
    # Generate sample multi-scale feature maps (CSDN Snu77)
    image1 = (1, 64, 32, 32)
    image2 = (1, 64, 16, 16)
    image3 = (1, 64, 8, 8)
    image1 = torch.rand(image1)
    image2 = torch.rand(image2)
    image3 = torch.rand(image3)
    image = [image1, image2, image3]
    channel = (64, 64, 64)

    # Model
    model = DynamicHead(nc=80, ch=channel)  # scale-unifying detection head
    out = model(image)
    print(out)


4. Step-by-Step: Adding the DynamicHead Detection Head

4.1 Modification 1

First, create a new .py file under the 'ultralytics/nn' directory and paste in the code above. The name is up to you; I named it DynamicHead.py.


4.2 Modification 2

Second, create a py file named '__init__.py' in that directory (if you are using the files from the reader group, it already exists and does not need to be created), then import our detection head inside it, as in the sketch below.
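The original screenshot is not reproduced here; assuming the file from step 4.1 is named DynamicHead.py, the added import would look roughly like this (keep whatever the file already contains):

# ultralytics/nn/__init__.py (sketch; keep the file's existing imports)
from .DynamicHead import DynamicHead  # file name from step 4.1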



4.3 Modification 3

Third, locate the file 'ultralytics/nn/tasks.py' and import and register our module there (if you are using the files from the reader group, it is already imported; go straight to step four). A sketch of the import follows.
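A sketch of the import to add near the top of tasks.py (the surrounding import block differs between ultralytics versions):

# ultralytics/nn/tasks.py (sketch; add alongside the existing imports)
from ultralytics.nn.DynamicHead import DynamicHead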



4.4 Modification 4

Fourth, still in 'ultralytics/nn/tasks.py', find the following code and add our detection head to it. A quick way to locate it: press Ctrl+F and search for Detect. A sketch of the spot follows.
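A hedged sketch of what this spot looks like: inside parse_model in tasks.py there is a branch that collects the input channels for detection heads, and DynamicHead needs to join that set (the other members are stock ultralytics heads and vary by version):

# Inside parse_model() in ultralytics/nn/tasks.py (fragment sketch; your version's set may differ):
elif m in {Detect, Segment, Pose, OBB, DynamicHead}:  # <- DynamicHead added here
    args.append([ch[x] for x in f])  # pass the channel count of each input feature map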



4.5 Modification 5

Likewise; a sketch of the presumed location follows.
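Without the screenshot I can only infer the location; "likewise" most plausibly refers to the stride-initialization check in DetectionModel.__init__ in tasks.py, where DynamicHead must be treated like Detect so that the strides of the output levels get computed (a simplified sketch, assuming a recent ultralytics layout):

# Inside DetectionModel.__init__() in ultralytics/nn/tasks.py (fragment sketch):
m = self.model[-1]  # the last module is the detection head
if isinstance(m, (Detect, DynamicHead)):  # <- DynamicHead added next to Detect
    s = 256  # 2x min stride
    m.stride = torch.tensor([s / x.shape[-2] for x in self.forward(torch.zeros(1, ch, s, s))])
    self.stride = m.stride
    m.bias_init()  # only run once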



4.6 Modification 6

This step is a little different; we need to add a small piece of code:

else:
    return 'detect'

Why is this one different? Because the m here is automatically converted to lowercase during execution, so a bare else branch is the convenient choice. When tutorials for segmentation or other tasks come up later, I will provide the corresponding modifications.
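For context, here is roughly where that else branch lands: the helper inside guess_model_task in tasks.py that infers the task from the last module named in the YAML head. The name is lowercased ('DynamicHead' becomes 'dynamichead'), which is why a plain else fallback is the easy fix (a sketch; your version may list more tasks):

# Inside guess_model_task() in ultralytics/nn/tasks.py (sketch; version-dependent):
def cfg2task(cfg):
    """Guess the task from the YAML dictionary."""
    m = cfg["head"][-1][-2].lower()  # name of the output module, lowercased
    if m in {"classify", "classifier", "cls", "fc"}:
        return "classify"
    if "detect" in m:
        return "detect"
    if m == "segment":
        return "segment"
    if m == "pose":
        return "pose"
    else:
        return "detect"  # the added fallback: 'dynamichead' falls through to here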



4.7 Modification 7

Likewise; a sketch of the presumed location follows.
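Again inferring from "likewise": guess_model_task also walks the instantiated model's modules, and DynamicHead should be reported as a detection head there too (a sketch, assuming a standard ultralytics layout):

# Inside guess_model_task() in ultralytics/nn/tasks.py (fragment sketch):
for m in model.modules():
    if isinstance(m, Segment):
        return "segment"
    elif isinstance(m, Classify):
        return "classify"
    elif isinstance(m, Pose):
        return "pose"
    elif isinstance(m, (Detect, DynamicHead)):  # <- DynamicHead added here
        return "detect"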



That completes the modifications. You can now copy one of the yaml files below and run it.


5. YAML Files for the DynamicHead Detection Head


5.1 YAML File Fusing DynamicHead with P2

Training info for this version: YOLO11-P2-DynamicHead summary: 485 layers, 2,515,396 parameters, 2,515,380 gradients, 16.0 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P2-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLOv11.0-p2 head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [256, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P2
  - [-1, 2, C3k2, [256, False]] # 19 (P2/4-xsmall) # for small objects, try setting this False to True

  - [-1, 1, Conv, [128, 3, 2]]
  - [[-1, 16], 1, Concat, [1]] # cat head P3
  - [-1, 2, C3k2, [256, False]] # 22 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [256, False]] # 25 (P4/16-medium)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [256, True]] # 28 (P5/32-large)

  - [[19, 22, 25, 28], 1, DynamicHead, [nc]] # Detect(P2, P3, P4, P5)


5.2 YAML File Fusing DynamicHead with P6

Training info for this version: YOLO11-P6-DynamicHead summary: 491 layers, 2,725,596 parameters, 2,725,580 gradients, 5.0 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P6 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [128, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [768, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [768, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 9-P6/64
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 11

# YOLOv11.0x6 head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 8], 1, Concat, [1]] # cat backbone P5
  - [-1, 2, C3k2, [256, False]] # 14

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [256, False]] # 17

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 20 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 17], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [256, False]] # 23 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [256, True]] # 26 (P5/32-large)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 11], 1, Concat, [1]] # cat head P6
  - [-1, 2, C3k2, [256, True]] # 29 (P6/64-xlarge) # you can also try setting this True to False

  - [[20, 23, 26, 29], 1, DynamicHead, [nc]] # Detect(P3, P4, P5, P6)
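With the registration done, a minimal sketch of building and training from one of these configs through the standard ultralytics API (the yaml filename and the dataset 'coco128.yaml' are placeholders; use your own paths and hyperparameters):

from ultralytics import YOLO

# Build the model from the modified config saved under a name of your choice
# (a trailing scale letter in the filename, e.g. '...11n-...', selects that scale).
model = YOLO("yolo11-p2-DynamicHead.yaml")

# Train; dataset yaml, epochs, image size and batch size are placeholders.
model.train(data="coco128.yaml", epochs=150, imgsz=640, batch=16)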


6. Successful Run Record

Finally, here is a screenshot of a successful run.

(screenshot: successful training run)


7. Conclusion

This concludes the formal content of this article. Here I recommend my YOLOv11 effective-improvement column, newly opened with an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also supplement some of the older improvement mechanisms. If this article helped you, subscribe to the column and follow along for more updates~