
YOLOv11 Improvements · Neck Series: Using the Gold-YOLO Neck to Improve YOLOv11's Small-Object Detection (Exclusive First Release)

1. Introduction

This post presents the latest improvement: using the Neck from Gold-YOLO to improve the YOLOv11 Neck. Gold-YOLO introduces a new mechanism, Gather-and-Distribute (GD), which globally fuses features from different levels and injects the fused global information back into each level, enabling more efficient information exchange and fusion. This strengthens the information-fusion ability of the model's neck (somewhat like a giraffe: the Neck part here is long) without significantly increasing latency, and improves detection performance across object sizes. You are welcome to subscribe to this column: it is updated with 3-5 new mechanisms every week, and subscribers get all of my modified files plus access to the discussion group.

Welcome to subscribe to my column and learn YOLO together!



2. Gold-YOLO Model Principles

Paper: official paper link

Code: official code link


2.1 The Basic Principles of Gold-YOLO

Gold-YOLO is an advanced object detection model that improves information-fusion efficiency through an innovative Gather-and-Distribute (GD) mechanism. The mechanism uses convolution and self-attention operations to process information from different layers of the network, allowing Gold-YOLO to fuse multi-scale features more effectively and strike an ideal balance between low latency and high accuracy. In addition, Gold-YOLO is the first model in the YOLO series to adopt MAE-style pretraining, which further improves its learning efficiency and accuracy.

The basic principles of Gold-YOLO can be summarized as follows:

1. Gather-and-Distribute mechanism (GD): implemented with convolution and self-attention operations, it effectively fuses information from different layers of the network.

2. Multi-scale feature fusion: the GD mechanism improves the fusion of multi-scale features, which in turn improves detection accuracy.

3. MAE-style pretraining: used for the first time in the YOLO series, it improves the model's learning efficiency and accuracy.

Below is the Gold-YOLO architecture, which mainly consists of the following parts:

1. Backbone: performs the initial processing of the input image and extracts features.
2. Low-stage gather-and-distribute (Low-GD) branch: aligns (Low-FAM) and fuses (Low-IFM) the larger feature maps.
3. High-stage gather-and-distribute (High-GD) branch: aligns (High-FAM) and fuses (High-IFM) the smaller feature maps.
4. Injection module (Inject): integrates the fused information and passes it on toward the detection head.
5. Head: performs object detection on the fused features.

Summary: in this figure, Gold-YOLO's multi-scale feature fusion is embodied in the design of the low-stage (Low-GD) and high-stage (High-GD) gather-and-distribute branches. Both branches process feature maps of different sizes through a feature alignment module (FAM) and an information fusion module (IFM). With this structure, Gold-YOLO can effectively fuse information from different depths of the network, which is essential for accurately detecting objects of different sizes.


2.2 The Gather-and-Distribute (GD) Mechanism

The Gather-and-Distribute (GD) mechanism is one of the core features of the Gold-YOLO model, and its purpose is to solve the information-fusion problem. It aggregates features from different levels with a Feature Alignment Module (FAM) and an Information Fusion Module (IFM), then distributes the fused information back to every level through an Information Injection Module (Inject). In this way the model makes better use of multi-scale features, improving detection accuracy while keeping latency low.

The figure below shows two key modules of the Gold-YOLO architecture:

(a) Information Injection Module (Inject): this module combines local and global features using operations such as convolution and a sigmoid activation, enriching the feature maps with global context, which is essential for accurate object detection.

(b) Lightweight Adjacent-Layer Fusion (LAF) module: this module improves the fusion of feature maps from neighboring levels. It aligns and merges them with operations such as average pooling and bilinear up/down-sampling, so that the local features of each level are enriched with information from its immediate neighbors.

Summary: the Information Injection module and the lightweight adjacent-layer fusion (LAF) module shown in the figure are the key components of efficient information fusion; by combining local and global features from different levels, they improve the model's detection performance.
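To make the injection step concrete, here is a minimal sketch of the gating idea in plain PyTorch (my own illustration with made-up shapes; it omits the 1x1 embedding convolutions that the real Inject module applies to both inputs):

import torch
import torch.nn.functional as F

# Local features from one neck level and fused global features from the GD branch
# (shapes are arbitrary, for illustration only).
x_local = torch.randn(1, 64, 40, 40)
x_global = torch.randn(1, 64, 20, 20)

# Resize the global information to the local resolution, turn it into a 0-1 gate,
# and inject it: gated local features plus a residual global term.
x_global_up = F.interpolate(x_global, size=x_local.shape[-2:], mode='bilinear', align_corners=False)
gate = torch.sigmoid(x_global_up)
out = x_local * gate + x_global_up
print(out.shape)  # torch.Size([1, 64, 40, 40])

This mirrors the `local_feat * sig_act + global_feat` computation in the Inject forward pass shown in Section 3.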


2.3 Multi-Scale Feature Fusion

Multi-scale feature fusion is a technique commonly used in object detectors to improve detection across object sizes. By combining features from different levels of the network, it captures information at scales from coarse to fine: low-level features usually carry more detail about small objects, while high-level features capture the semantics of large objects. Aggregating features across these levels strengthens the model's representation, allowing it to recognize and localize objects of all sizes more accurately.
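As a rough illustration of what "aggregating features across levels" means in practice, the sketch below (my own example with made-up shapes, not the official code) aligns four pyramid levels to one target resolution by pooling the large maps down and upsampling the small one, then concatenates them for fusion. This is essentially the alignment the FAM modules perform:

import torch
import torch.nn.functional as F

# Four backbone levels at strides 4/8/16/32; channel counts are illustrative.
b2 = torch.randn(1, 32, 160, 160)
b3 = torch.randn(1, 64, 80, 80)
b4 = torch.randn(1, 128, 40, 40)   # target resolution
b5 = torch.randn(1, 256, 20, 20)

target = b4.shape[-2:]
aligned = [
    F.adaptive_avg_pool2d(b2, target),  # larger maps are pooled down
    F.adaptive_avg_pool2d(b3, target),
    b4,
    F.interpolate(b5, size=target, mode='bilinear', align_corners=False),  # smaller map is upsampled
]
fused = torch.cat(aligned, dim=1)  # (1, 480, 40, 40), ready for the fusion module (IFM)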

The gather-and-distribute structures of the Gold-YOLO model are shown below.

The low-stage gather-and-distribute (Low-GD) branch in figure (a) consists of the low-stage feature alignment module (Low-FAM) and the low-stage information fusion module (Low-IFM). The high-stage gather-and-distribute (High-GD) branch in figure (b) consists of the high-stage feature alignment module (High-FAM) and the high-stage information fusion module (High-IFM).

Summary: these two branches are the key parts of Gold-YOLO for handling feature maps of different sizes; through their per-scale alignment (FAM) and fusion (IFM) modules, they strengthen the model's ability to process multi-scale features and thereby improve detection performance.


2.4 MAE-Style Pretraining

MAE-style pretraining (Masked Autoencoder for self-supervised learning) is a self-supervised learning method used to improve learning efficiency and accuracy on large datasets. The model is trained to reconstruct randomly masked parts of its input, and through this process it learns an internal representation of the data. Because the training requires no labels, the model can learn rich data representations. In computer vision, MAE-style pretraining is particularly effective: it forces the model to capture the structure and content of images, so downstream supervised tasks such as object detection or image classification converge faster and perform better. In Gold-YOLO, MAE pretraining further improves the model's understanding of image features, yielding higher detection accuracy.
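For readers unfamiliar with the idea, here is a minimal sketch of the general MAE masking recipe (my own illustration, not Gold-YOLO's actual pretraining code): patchify the image, drop a random 75% of the patches, feed only the visible ones to the encoder, and compute the reconstruction loss only on the masked positions.

import torch

imgs = torch.randn(8, 3, 224, 224)
# Split into 16x16 patches: (B, num_patches, patch_dim) = (8, 196, 768)
patches = imgs.unfold(2, 16, 16).unfold(3, 16, 16).reshape(8, 3, 196, 256)
patches = patches.permute(0, 2, 1, 3).reshape(8, 196, -1)

mask_ratio = 0.75
num_keep = int(196 * (1 - mask_ratio))
ids = torch.rand(8, 196).argsort(dim=1)        # a random permutation per sample
ids_keep, ids_masked = ids[:, :num_keep], ids[:, num_keep:]
visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, patches.shape[-1]))
# encoder(visible) -> decoder predicts the patches at ids_masked; the loss is computed
# only on those masked positions, e.g. F.mse_loss(pred_masked, target_masked)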


3. Gold-YOLO Core Code

See Chapter 4 for how to use this code. It depends on the mmcv library, which is extremely picky about version compatibility. If installation via pip fails, you can download a pre-built wheel from the link below and install that file locally with pip. Most of the detection-head posts I publish from now on will also need mmcv.

import torch
from torch import nn
import torch.nn.functional as F
import numpy as np
from mmcv.cnn import ConvModule, build_norm_layer

__all__ = ('Low_FAM', 'Low_IFM', 'Split', 'SimConv', 'Low_LAF', 'Inject', 'RepBlock', 'High_FAM', 'High_IFM', 'High_LAF')


class High_LAF(nn.Module):
    def forward(self, x1, x2):
        if torch.onnx.is_in_onnx_export():
            self.pool = onnx_AdaptiveAvgPool2d
        else:
            self.pool = nn.functional.adaptive_avg_pool2d
        N, C, H, W = x2.shape
        output_size = [H, W]
        x1 = self.pool(x1, output_size)
        return torch.cat([x1, x2], 1)


class High_IFM(nn.Module):
    def __init__(self, block_num, embedding_dim, key_dim, num_heads,
                 mlp_ratio=4., attn_ratio=2., drop=0., attn_drop=0., drop_path=0.,
                 norm_cfg=dict(type='BN', requires_grad=True),
                 act_layer=nn.ReLU6):
        super().__init__()
        self.block_num = block_num
        # drop_path arrives as [final_rate, num_steps], e.g. [0.1, 2]
        drop_path = [x.item() for x in torch.linspace(0, drop_path[0], drop_path[1])]
        self.transformer_blocks = nn.ModuleList()
        for i in range(self.block_num):
            self.transformer_blocks.append(top_Block(
                embedding_dim, key_dim=key_dim, num_heads=num_heads,
                mlp_ratio=mlp_ratio, attn_ratio=attn_ratio,
                drop=drop, drop_path=drop_path[i] if isinstance(drop_path, list) else drop_path,
                norm_cfg=norm_cfg, act_layer=act_layer))

    def forward(self, x):
        # token * N
        for i in range(self.block_num):
            x = self.transformer_blocks[i](x)
        return x


class Mlp(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.ReLU, drop=0.,
                 norm_cfg=dict(type='BN', requires_grad=True)):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        self.fc1 = Conv2d_BN(in_features, hidden_features, norm_cfg=norm_cfg)
        self.dwconv = nn.Conv2d(hidden_features, hidden_features, 3, 1, 1, bias=True, groups=hidden_features)
        self.act = act_layer()
        self.fc2 = Conv2d_BN(hidden_features, out_features, norm_cfg=norm_cfg)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.dwconv(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x


class top_Block(nn.Module):
    def __init__(self, dim, key_dim, num_heads, mlp_ratio=4., attn_ratio=2., drop=0.,
                 drop_path=0., act_layer=nn.ReLU, norm_cfg=dict(type='BN2d', requires_grad=True)):
        super().__init__()
        self.dim = dim
        self.num_heads = num_heads
        self.mlp_ratio = mlp_ratio
        self.attn = Attention(dim, key_dim=key_dim, num_heads=num_heads, attn_ratio=attn_ratio,
                              activation=act_layer, norm_cfg=norm_cfg)
        # NOTE: drop path for stochastic depth, we shall see if this is better than dropout here
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        mlp_hidden_dim = int(dim * mlp_ratio)
        self.mlp = Mlp(in_features=dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop,
                       norm_cfg=norm_cfg)

    def forward(self, x1):
        x1 = x1 + self.drop_path(self.attn(x1))
        x1 = x1 + self.drop_path(self.mlp(x1))
        return x1


def drop_path(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""

    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path(x, self.drop_prob, self.training)


class Attention(torch.nn.Module):
    def __init__(self, dim, key_dim, num_heads,
                 attn_ratio=4,
                 activation=None,
                 norm_cfg=dict(type='BN', requires_grad=True), ):
        super().__init__()
        self.num_heads = num_heads
        self.scale = key_dim ** -0.5
        self.key_dim = key_dim
        self.nh_kd = nh_kd = key_dim * num_heads  # num_head key_dim
        self.d = int(attn_ratio * key_dim)
        self.dh = int(attn_ratio * key_dim) * num_heads
        self.attn_ratio = attn_ratio
        self.to_q = Conv2d_BN(dim, nh_kd, 1, norm_cfg=norm_cfg)
        self.to_k = Conv2d_BN(dim, nh_kd, 1, norm_cfg=norm_cfg)
        self.to_v = Conv2d_BN(dim, self.dh, 1, norm_cfg=norm_cfg)
        self.proj = torch.nn.Sequential(activation(), Conv2d_BN(
            self.dh, dim, bn_weight_init=0, norm_cfg=norm_cfg))

    def forward(self, x):  # x (B, C, H, W)
        B, C, H, W = get_shape(x)
        qq = self.to_q(x).reshape(B, self.num_heads, self.key_dim, H * W).permute(0, 1, 3, 2)
        kk = self.to_k(x).reshape(B, self.num_heads, self.key_dim, H * W)
        vv = self.to_v(x).reshape(B, self.num_heads, self.d, H * W).permute(0, 1, 3, 2)
        attn = torch.matmul(qq, kk)
        attn = attn.softmax(dim=-1)  # dim = k
        xx = torch.matmul(attn, vv)
        xx = xx.permute(0, 1, 3, 2).reshape(B, self.dh, H, W)
        xx = self.proj(xx)
        return xx


def get_shape(tensor):
    shape = tensor.shape
    if torch.onnx.is_in_onnx_export():
        shape = [i.cpu().numpy() for i in shape]
    return shape


class Conv2d_BN(nn.Sequential):
    def __init__(self, a, b, ks=1, stride=1, pad=0, dilation=1,
                 groups=1, bn_weight_init=1,
                 norm_cfg=dict(type='BN', requires_grad=True)):
        super().__init__()
        self.inp_channel = a
        self.out_channel = b
        self.ks = ks
        self.pad = pad
        self.stride = stride
        self.dilation = dilation
        self.groups = groups
        self.add_module('c', nn.Conv2d(
            a, b, ks, stride, pad, dilation, groups, bias=False))
        bn = build_norm_layer(norm_cfg, b)[1]
        nn.init.constant_(bn.weight, bn_weight_init)
        nn.init.constant_(bn.bias, 0)
        self.add_module('bn', bn)


class High_FAM(nn.Module):
    def __init__(self, stride, pool_mode='onnx'):
        super().__init__()
        self.stride = stride
        if pool_mode == 'torch':
            self.pool = nn.functional.adaptive_avg_pool2d
        elif pool_mode == 'onnx':
            self.pool = onnx_AdaptiveAvgPool2d

    def forward(self, inputs):
        B, C, H, W = get_shape(inputs[-1])
        H = (H - 1) // self.stride + 1
        W = (W - 1) // self.stride + 1
        output_size = [H, W]
        if not hasattr(self, 'pool'):
            self.pool = nn.functional.adaptive_avg_pool2d
        if torch.onnx.is_in_onnx_export():
            self.pool = onnx_AdaptiveAvgPool2d
        out = [self.pool(inp, output_size) for inp in inputs]
        return torch.cat(out, dim=1)


class RepVGGBlock(nn.Module):
    '''RepVGGBlock is a basic rep-style block, including training and deploy status
    This code is based on https://github.com/DingXiaoH/RepVGG/blob/main/repvgg.py
    '''

    def __init__(self, in_channels, out_channels, kernel_size=3,
                 stride=1, padding=1, dilation=1, groups=1, padding_mode='zeros', deploy=False, use_se=False):
        """ Initialization of the class.
        Args:
            in_channels (int): Number of channels in the input image
            out_channels (int): Number of channels produced by the convolution
            kernel_size (int or tuple): Size of the convolving kernel
            stride (int or tuple, optional): Stride of the convolution. Default: 1
            padding (int or tuple, optional): Zero-padding added to both sides of
                the input. Default: 1
            dilation (int or tuple, optional): Spacing between kernel elements. Default: 1
            groups (int, optional): Number of blocked connections from input
                channels to output channels. Default: 1
            padding_mode (string, optional): Default: 'zeros'
            deploy: Whether to be deploy status or training status. Default: False
            use_se: Whether to use se. Default: False
        """
        super(RepVGGBlock, self).__init__()
        self.deploy = deploy
        self.groups = groups
        self.in_channels = in_channels
        self.out_channels = out_channels
        assert kernel_size == 3
        assert padding == 1
        padding_11 = padding - kernel_size // 2
        self.nonlinearity = nn.ReLU()
        if use_se:
            raise NotImplementedError("se block not supported yet")
        else:
            self.se = nn.Identity()
        if deploy:
            self.rbr_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                         stride=stride, padding=padding, dilation=dilation, groups=groups, bias=True,
                                         padding_mode=padding_mode)
        else:
            self.rbr_identity = nn.BatchNorm2d(
                num_features=in_channels) if out_channels == in_channels and stride == 1 else None
            self.rbr_dense = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                     stride=stride, padding=padding, groups=groups)
            self.rbr_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride,
                                   padding=padding_11, groups=groups)

    def forward(self, inputs):
        '''Forward process'''
        if hasattr(self, 'rbr_reparam'):
            return self.nonlinearity(self.se(self.rbr_reparam(inputs)))
        if self.rbr_identity is None:
            id_out = 0
        else:
            id_out = self.rbr_identity(inputs)
        return self.nonlinearity(self.se(self.rbr_dense(inputs) + self.rbr_1x1(inputs) + id_out))

    def get_equivalent_kernel_bias(self):
        kernel3x3, bias3x3 = self._fuse_bn_tensor(self.rbr_dense)
        kernel1x1, bias1x1 = self._fuse_bn_tensor(self.rbr_1x1)
        kernelid, biasid = self._fuse_bn_tensor(self.rbr_identity)
        return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

    def _pad_1x1_to_3x3_tensor(self, kernel1x1):
        if kernel1x1 is None:
            return 0
        else:
            return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

    def _fuse_bn_tensor(self, branch):
        if branch is None:
            return 0, 0
        if isinstance(branch, nn.Sequential):
            kernel = branch.conv.weight
            running_mean = branch.bn.running_mean
            running_var = branch.bn.running_var
            gamma = branch.bn.weight
            beta = branch.bn.bias
            eps = branch.bn.eps
        else:
            assert isinstance(branch, nn.BatchNorm2d)
            if not hasattr(self, 'id_tensor'):
                input_dim = self.in_channels // self.groups
                kernel_value = np.zeros((self.in_channels, input_dim, 3, 3), dtype=np.float32)
                for i in range(self.in_channels):
                    kernel_value[i, i % input_dim, 1, 1] = 1
                self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
            kernel = self.id_tensor
            running_mean = branch.running_mean
            running_var = branch.running_var
            gamma = branch.weight
            beta = branch.bias
            eps = branch.eps
        std = (running_var + eps).sqrt()
        t = (gamma / std).reshape(-1, 1, 1, 1)
        return kernel * t, beta - running_mean * gamma / std

    def switch_to_deploy(self):
        if hasattr(self, 'rbr_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.rbr_reparam = nn.Conv2d(in_channels=self.rbr_dense.conv.in_channels,
                                     out_channels=self.rbr_dense.conv.out_channels,
                                     kernel_size=self.rbr_dense.conv.kernel_size, stride=self.rbr_dense.conv.stride,
                                     padding=self.rbr_dense.conv.padding, dilation=self.rbr_dense.conv.dilation,
                                     groups=self.rbr_dense.conv.groups, bias=True)
        self.rbr_reparam.weight.data = kernel
        self.rbr_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('rbr_dense')
        self.__delattr__('rbr_1x1')
        if hasattr(self, 'rbr_identity'):
            self.__delattr__('rbr_identity')
        if hasattr(self, 'id_tensor'):
            self.__delattr__('id_tensor')
        self.deploy = True


class RepBlock(nn.Module):
    '''RepBlock is a stage block with rep-style basic block'''

    def __init__(self, in_channels, out_channels, n=1, block=RepVGGBlock, basic_block=RepVGGBlock):
        super().__init__()
        self.conv1 = block(in_channels, out_channels)
        self.block = nn.Sequential(*(block(out_channels, out_channels) for _ in range(n - 1))) if n > 1 else None

    def forward(self, x):
        x = self.conv1(x)
        if self.block is not None:
            x = self.block(x)
        return x


class Inject(nn.Module):
    def __init__(
            self,
            inp: int,
            oup: int,
            global_index: int,
            norm_cfg=dict(type='BN', requires_grad=True),
            activations=nn.ReLU6,
            global_inp=None,
    ) -> None:
        super().__init__()
        self.global_index = global_index
        self.norm_cfg = norm_cfg
        if not global_inp:
            global_inp = inp
        self.local_embedding = ConvModule(inp, oup, kernel_size=1, norm_cfg=self.norm_cfg, act_cfg=None)
        self.global_embedding = ConvModule(global_inp, oup, kernel_size=1, norm_cfg=self.norm_cfg, act_cfg=None)
        self.global_act = ConvModule(global_inp, oup, kernel_size=1, norm_cfg=self.norm_cfg, act_cfg=None)
        self.act = h_sigmoid()

    def forward(self, x_l, x_g):
        '''
        x_l: local features
        x_g: global features
        '''
        x_g = x_g[self.global_index]
        B, C, H, W = x_l.shape
        g_B, g_C, g_H, g_W = x_g.shape
        use_pool = H < g_H
        local_feat = self.local_embedding(x_l)
        global_act = self.global_act(x_g)
        global_feat = self.global_embedding(x_g)
        if use_pool:
            avg_pool = get_avg_pool()
            output_size = [H, W]
            sig_act = avg_pool(global_act, output_size)
            global_feat = avg_pool(global_feat, output_size)
        else:
            sig_act = F.interpolate(self.act(global_act), size=(H, W), mode='bilinear', align_corners=False)
            global_feat = F.interpolate(global_feat, size=(H, W), mode='bilinear', align_corners=False)
        out = local_feat * sig_act + global_feat
        return out


class h_sigmoid(nn.Module):
    def __init__(self, inplace=True):
        super(h_sigmoid, self).__init__()
        self.relu = nn.ReLU6(inplace=inplace)

    def forward(self, x):
        return self.relu(x + 3) / 6


def get_avg_pool():
    if torch.onnx.is_in_onnx_export():
        avg_pool = onnx_AdaptiveAvgPool2d
    else:
        avg_pool = nn.functional.adaptive_avg_pool2d
    return avg_pool


class Low_LAF(nn.Module):
    def __init__(self, in_channels, out_channels):
        super().__init__()
        self.cv1 = SimConv(in_channels, out_channels, 1, 1)
        self.cv_fuse = SimConv(round(out_channels * 2.5), out_channels, 1, 1)
        self.downsample = nn.functional.adaptive_avg_pool2d

    def forward(self, x):
        N, C, H, W = x[1].shape
        output_size = [H, W]
        if torch.onnx.is_in_onnx_export():
            self.downsample = onnx_AdaptiveAvgPool2d
            output_size = np.array([H, W])
        x0 = self.downsample(x[0], output_size)
        x1 = self.cv1(x[1])
        x2 = F.interpolate(x[2], size=(H, W), mode='bilinear', align_corners=False)
        return self.cv_fuse(torch.cat((x0, x1, x2), dim=1))


class SimConv(nn.Module):
    '''Normal Conv with ReLU VAN_activation'''

    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False, padding=None):
        super().__init__()
        if padding is None:
            padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.ReLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        return self.act(self.conv(x))


class Split(nn.Module):
    def __init__(self, trans_channels):
        super().__init__()
        self.trans_channels = trans_channels

    def forward(self, x):
        return x.split(self.trans_channels, dim=1)


class Low_IFM(nn.Module):
    def __init__(self, in_channels, embed_dims, fuse_block_num, out_channels):
        super().__init__()
        self.conv1 = Conv(in_channels, embed_dims, kernel_size=1, stride=1, padding=0)
        # NOTE: an empty ModuleList (instead of the original nn.Identity) keeps the loop in
        # forward() valid when fuse_block_num == 0, since nn.Identity is not iterable.
        self.block = nn.ModuleList(
            [RepVGGBlock(embed_dims, embed_dims) for _ in range(fuse_block_num)]
        ) if fuse_block_num > 0 else nn.ModuleList()
        self.conv2 = Conv(embed_dims, out_channels, kernel_size=1, stride=1, padding=0)

    def forward(self, x):
        x = self.conv1(x)
        for b in self.block:
            x = b(x)
        out = self.conv2(x)
        return out


class Low_FAM(nn.Module):
    def __init__(self):
        super().__init__()
        self.avg_pool = nn.functional.adaptive_avg_pool2d

    def forward(self, x):
        x_l, x_m, x_s, x_n = x
        B, C, H, W = x_s.shape
        output_size = [H, W]
        if torch.onnx.is_in_onnx_export():
            self.avg_pool = onnx_AdaptiveAvgPool2d
        x_l = self.avg_pool(x_l, output_size)
        x_m = self.avg_pool(x_m, output_size)
        x_n = F.interpolate(x_n, size=(H, W), mode='bilinear', align_corners=False)
        out = torch.cat([x_l, x_m, x_s, x_n], 1)
        return out


def conv_bn(in_channels, out_channels, kernel_size, stride, padding, groups=1, bias=False):
    '''Basic cell for rep-style block, including conv and bn'''
    result = nn.Sequential()
    result.add_module('conv', nn.Conv2d(in_channels=in_channels, out_channels=out_channels,
                                        kernel_size=kernel_size, stride=stride, padding=padding, groups=groups,
                                        bias=bias))
    result.add_module('bn', nn.BatchNorm2d(num_features=out_channels))
    return result


class Conv(nn.Module):
    '''Normal Conv with SiLU VAN_activation'''

    def __init__(self, in_channels, out_channels, kernel_size, stride, groups=1, bias=False, padding=None):
        super().__init__()
        if padding is None:
            padding = kernel_size // 2
        self.conv = nn.Conv2d(
            in_channels,
            out_channels,
            kernel_size=kernel_size,
            stride=stride,
            padding=padding,
            groups=groups,
            bias=bias,
        )
        self.bn = nn.BatchNorm2d(out_channels)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))


def onnx_AdaptiveAvgPool2d(x, output_size):
    # NOTE: converting here lets callers pass either a list or an np.ndarray;
    # the original failed on plain lists because of the (output_size - 1) arithmetic.
    output_size = np.array(output_size)
    stride_size = np.floor(np.array(x.shape[-2:]) / output_size).astype(np.int32)
    kernel_size = np.array(x.shape[-2:]) - (output_size - 1) * stride_size
    avg = nn.AvgPool2d(kernel_size=list(kernel_size), stride=list(stride_size))
    x = avg(x)
    return x
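Before wiring these modules into YOLOv11, you can sanity-check that they run standalone. The following quick shape test is my own sketch (channel sizes are arbitrary); it mirrors the low-stage gather path: Low_FAM aligns and concatenates four levels, Low_IFM fuses them, and Split carves the result into the two global-information tensors.

if __name__ == '__main__':
    x2 = torch.randn(1, 32, 160, 160)
    x3 = torch.randn(1, 64, 80, 80)
    x4 = torch.randn(1, 128, 40, 40)
    x5 = torch.randn(1, 256, 20, 20)

    fused = Low_FAM()([x2, x3, x4, x5])   # aligned to x4's 40x40 grid and concatenated
    print(fused.shape)                    # torch.Size([1, 480, 40, 40])

    global_info = Low_IFM(480, 96, 3, 384)(fused)
    print(global_info.shape)              # torch.Size([1, 384, 40, 40])

    low, high = Split([256, 128])(global_info)
    print(low.shape, high.shape)          # (1, 256, 40, 40) and (1, 128, 40, 40)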


4. How to Use Gold-YOLO

4.1 Modification 1

Step one, as always, is to create the file. Go to the ultralytics/nn folder and create a directory named 'Addmodules' (if you are using the files from the group, it already exists and does not need to be created). Then create a new .py file inside it and paste the core code above into it.


4.2 Modification 2

Step two, create a new py file named '__init__.py' in that directory (already present if you are using the group files), then import our modules inside it as shown in the figure below.
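For example, if you saved the core code as GoldYOLO.py (the file name is your choice; adjust the import to match), the '__init__.py' only needs a re-export along these lines:

from .GoldYOLO import *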


4.3 Modification 3

Step three, go to the file 'ultralytics/nn/tasks.py' and import and register our modules there (if you are using the group files, they are already imported; skip straight to step four).
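The import at the top of tasks.py would look something like this (the exact form depends on how your Addmodules package is laid out); the class names then need to be visible inside parse_model for the registration code in the next steps:

from ultralytics.nn.Addmodules import (Low_FAM, Low_IFM, Split, SimConv, Low_LAF,
                                       Inject, RepBlock, High_FAM, High_IFM, High_LAF)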

From today on, every tutorial will follow this format, because I assume everyone is working from the files in my group!!



4.4 Modification 4

Modify it as shown in the figure.


4.5 Modification 5

Add the code below as shown in the figure. Watch the indentation: copying it in with the wrong indentation will cause errors.

# --------------GOLD-YOLO--------------
elif m in (nn.Conv2d, SimConv):
    c1, c2 = ch[f], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(c2 * width, 8)
    args = [c1, c2, *args[1:]]
elif m in (High_IFM,):
    c2 = args[1]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(c2 * width, 8)
    args = [args[0], c2, *args[2:]]
elif m in (Low_FAM, High_FAM, High_LAF):
    c2 = sum(ch[x] for x in f)
elif m is Low_IFM:
    c1, c2 = ch[f], args[2]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, *args[:-1], c2]
elif m is Low_LAF:
    c1, c2 = ch[f[1]], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, *args[1:]]
elif m is Inject:
    global_index = args[1]
    c1, c2 = ch[f[1]][global_index], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, global_index]
elif m is RepBlock:
    c1, c2 = ch[f], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    nums_repeat = max(round(args[1] * depth), 1) if args[1] > 1 else args[1]  # depth gain
    args = [c1, c2, nums_repeat]
elif m is Split:
    goldyolo = True
    c2 = []
    for arg in args:
        if arg != nc:  # if arg not equal to number of classes (i.e. for Classify() output)
            c2.append(make_divisible(min(arg, max_channels) * width, 8))
    args = [c2]
# --------------GOLD-YOLO--------------

4.6 Modification 6

This step is only needed if you have applied the modifications from my backbone posts; if not, skip it.


4.7 Modification 7

Likewise, add the code as shown in the figure. This step is mandatory!

if m in [Inject, High_LAF]:
    # these modules take multiple inputs
    m_.input_nums = len(f)
else:
    m_.input_nums = 1


4.8 Modification 8

This step is only needed if you have applied the modifications from my backbone posts; if not, skip it.


4.9 Modification 9

If you did not modify the backbone, see 4.9.1; if you did, see 4.9.2.


4.9.1 Backbone Not Modified

The code changes so far have followed the order of the file; this one does not. Go back to the 'ultralytics/nn/tasks.py' file, near its beginning, and make the change below.

try:
    if m.input_nums > 1:
        # module takes more than one input
        x = m(*x)  # run
    else:
        x = m(x)
except AttributeError:
    # AttributeError: 'Conv' object has no attribute 'input_nums'
    x = m(x)

After the modification, it should look like the figure below!


4.9.2 Backbone Modified

The code changes so far have followed the order of the file; this one does not. Go back to the 'ultralytics/nn/tasks.py' file, near its beginning, and make the change below.

def _predict_once(self, x, profile=False, visualize=False, embed=None):
    """
    Perform a forward pass through the network.

    Args:
        x (torch.Tensor): The input tensor to the model.
        profile (bool): Print the computation time of each layer if True, defaults to False.
        visualize (bool): Save the feature maps of the model if True, defaults to False.
        embed (list, optional): A list of feature vectors/embeddings to return.

    Returns:
        (torch.Tensor): The last output of the model.
    """
    y, dt, embeddings = [], [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
        if profile:
            self._profile_one_layer(m, x, dt)
        if hasattr(m, 'backbone'):
            try:
                if m.input_nums > 1:
                    # module takes more than one input
                    x = m(*x)  # run
                else:
                    x = m(x)
            except AttributeError:
                # AttributeError: 'Conv' object has no attribute 'input_nums'
                x = m(x)
            if len(x) != 5:  # 0 - 5
                x.insert(0, None)
            for index, i in enumerate(x):
                if index in self.save:
                    y.append(i)
                else:
                    y.append(None)
            x = x[-1]  # pass the last output on to the next layer
        else:
            try:
                if m.input_nums > 1:
                    # module takes more than one input
                    x = m(*x)  # run
                else:
                    x = m(x)
            except AttributeError:
                # AttributeError: 'Conv' object has no attribute 'input_nums'
                x = m(x)
            y.append(x if m.i in self.save else None)  # save output
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
        if embed and m.i in embed:
            embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
            if m.i == max(embed):
                return torch.unbind(torch.cat(embeddings, 1), dim=0)
    return x


5. The Gold-YOLO yaml File

5.1 The yaml File

Note that the code in this post only supports YOLOv11n and YOLOv11s; other scales will throw errors because of incompatibilities.

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [[2, 4, 6, -1], 1, Low_FAM, []]
  - [-1, 1, Low_IFM, [96, 3, 768]]
  - [-1, 1, Split, [512, 256]] # 13-low_global_info
  - [10, 1, SimConv, [512, 1, 1]] # 14-c5_half
  - [[4, 6, -1], 1, Low_LAF, [512]]
  - [[-1, 13], 1, Inject, [512, 0]]
  - [-1, 1, RepBlock, [512, 12]] # 17-p4
  - [-1, 1, SimConv, [256, 1, 1]] # 18-p4_half
  - [[2, 4, -1], 1, Low_LAF, [256]]
  - [[-1, 13], 1, Inject, [256, 1]]
  - [-1, 1, RepBlock, [256, 12]] # 21-p3
  - [[-1, 17, 10], 1, High_FAM, [1, 'torch']]
  - [-1, 1, High_IFM, [2, 1792, 8, 4, 1, 2, 0, 0, [0.1, 2]]]
  - [-1, 1, nn.Conv2d, [1536, 1, 1, 0]]
  - [-1, 1, Split, [512, 1024]] # 25-high_global_info
  - [[21, 18], 1, High_LAF, []]
  - [[-1, 25], 1, Inject, [512, 0]]
  - [-1, 1, RepBlock, [512, 12]] # 28-n4
  - [[-1, 14], 1, High_LAF, []]
  - [[-1, 25], 1, Inject, [1024, 1]]
  - [-1, 1, RepBlock, [1024, 12]] # 31-n5
  - [[21, 28, 31], 1, Detect, [nc]] # Detect(P3, N4, N5)

5.2 Running the Code

Create a run.py file in the root directory of the v11 project.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO("replace this with the path to your model yaml file")
    model.load('yolov8n.pt')  # n-scale weights are used here; swap in the weights for your own scale
    model.train(data=r'replace this with the path to your dataset yaml file',
                cache=False,
                imgsz=640,
                epochs=150,
                batch=16,
                close_mosaic=0,
                workers=0,
                device=0,
                optimizer='SGD',  # using SGD
                amp=False,  # close amp
                )


5.3 Screenshot of a Successful Run


6. Conclusion

That concludes the main content of this post. Let me recommend my YOLOv11 effective-improvement column: it is newly opened with an average quality score of 98, and going forward I will reproduce papers from the latest top conferences and keep supplementing older improvement mechanisms. The column is currently free to read (for now, so follow early and don't lose track of it). If this post helped you, please subscribe to the column and follow for more updates~
