YOLOv11 Improvements, Conv Series: Reparameterized Diverse Branch Block with a Second-Round C3k2 Innovation (Effective Accuracy Gains; Reparameterized Module for Efficient Inference)

I. Introduction

This post combines the YOLOv11 model with the Diverse Branch Block (DBB), a structural re-parameterization technique for boosting the performance of convolutional neural networks. Its core idea is to combine diverse branches of different scales and complexities, thereby enriching the feature space. I have placed it at several positions in YOLOv11 and observed accuracy gains at each of them, while the DBB module adds very few parameters: adding three of these blocks to the model raised GFLOPs by only 0.04. This post also includes my exclusive second-round innovation on the C3k2 module.



II. The Principle of Diverse Branch Block

Paper: official paper link

Code: official code repository link


2.1 The Basic Principle of Diverse Branch Block

The basic idea of the Diverse Branch Block (DBB) is to increase the complexity of convolutional layers during training by introducing convolutional branches of different sizes and structures, which enriches the network's feature representations. The basic principle can be summarized in three points:

1. Diverse branch structure: DBB combines branches of different scales and complexities, such as convolutions with different kernel sizes and average pooling, to increase the representational power of a single convolution.
2. Separation of training and inference: during training DBB uses a complex branch structure; at inference these branches are equivalently converted into a single convolutional layer, preserving efficient inference.
3. Unchanged macro architecture: DBB can be inserted into existing networks as a drop-in replacement for a regular convolutional layer, without altering the overall architecture.

Below is a design example of the Diverse Branch Block (DBB):

At training time (left), a DBB consists of convolutional layers of different sizes and an average-pooling layer, arranged in parallel in a fairly intricate way and summed at the output. Once training is finished, this complex structure is converted into a single convolutional layer for inference (right), preserving inference efficiency. The conversion lets DBB add training-time micro-structural complexity while keeping the macro architecture unchanged.
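To make this train-to-deploy conversion concrete, here is a minimal self-contained sketch (my own illustration, not the full DBB): a parallel 3x3 branch and 1x1 branch collapse into a single 3x3 convolution by zero-padding the 1x1 kernel to 3x3 (Transform VI) and adding the two kernels (Transform II):

import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 8, 32, 32)
k3 = torch.randn(16, 8, 3, 3)  # 3x3 branch kernel
k1 = torch.randn(16, 8, 1, 1)  # 1x1 branch kernel

# Training-time view: two parallel branches whose outputs are summed
y_branches = F.conv2d(x, k3, padding=1) + F.conv2d(x, k1)

# Deploy-time view: pad the 1x1 kernel to 3x3 and merge the kernels by addition
k_merged = k3 + F.pad(k1, [1, 1, 1, 1])
y_merged = F.conv2d(x, k_merged, padding=1)

print(torch.allclose(y_branches, y_merged, atol=1e-5))  # True

Because convolution is linear, the merged kernel reproduces the branch sum exactly; DBB applies the same logic to more elaborate branches.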


2.2 The Diverse Branch Structure

The diverse branch structure is a construct introduced into convolutional neural networks to strengthen feature extraction through diverse branches. These branches contain convolutions and pooling layers of different sizes, plus other potential operations, all working in parallel to capture different feature representations. After training, this complex structure can be merged and simplified into a single convolutional layer, so inference carries no extra computational burden. This design makes DBB a direct replacement for existing convolutional layers, improving the performance of existing architectures without modifying the overall architecture.

Below I show in detail how the six transformations convert a training-time Diverse Branch Block (DBB) into a regular inference-time convolution, each transformation corresponding to a specific operation:

1. Transform I: fuse a convolutional layer with its batch normalization.
2. Transform II: merge the outputs of convolutional layers with identical configurations.
3. Transform III: merge sequential convolutional layers.
4. Transform IV: merge convolutional layers via depth concatenation (concat).
5. Transform V: fold an average-pooling (AVG) operation into a convolution.
6. Transform VI: combine convolutional layers of different kernel scales.

The box on the right shows the inference-time DBB that these transformations produce, containing only a regular convolution, average pooling, and batch normalization. The transformations guarantee that the diverse training-time feature extraction adds no burden at inference.
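As a sanity check for Transform I, the following hedged sketch folds a BatchNorm into its preceding convolution, mirroring the transI_fusebn helper in Section III; note the fusion is only exact in eval mode, where BN uses its running statistics:

import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
conv = nn.Conv2d(8, 16, 3, padding=1, bias=False)
bn = nn.BatchNorm2d(16)
bn.running_mean.uniform_(-1, 1)   # give the running stats non-trivial values
bn.running_var.uniform_(0.5, 1.5)
bn.eval()

# Transform I: w' = w * gamma / std, b' = beta - mean * gamma / std
std = (bn.running_var + bn.eps).sqrt()
w_fused = conv.weight * (bn.weight / std).reshape(-1, 1, 1, 1)
b_fused = bn.bias - bn.running_mean * bn.weight / std

x = torch.randn(1, 8, 32, 32)
print(torch.allclose(bn(conv(x)), F.conv2d(x, w_fused, b_fused, padding=1), atol=1e-5))  # True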


2.3 Separation of Training and Inference

Separating training from inference means the model uses the complex DBB structure during training and converts it into a simplified convolutional structure at inference. This lets the model exploit DBB's branch diversity for richer feature extraction and learning during training, while keeping inference fast by cutting the computation in deployment. The model thus retains high performance without sacrificing runtime speed or resource efficiency.

The figure illustrates how different convolution combinations (such as the 1x1 and KxK convolutions shown) are used during training, and how they are converted into a simplified structure at inference (the concatenation shown for Transform IV):

Looking closely, it illustrates three different cases:

A) Groupwise conv: the input is split into groups, each processed with its own kernels.
B) Training-time 1x1-KxK structure: first a 1x1 convolution (reducing the feature dimension), then a grouped KxK convolution (verified in the sketch after this list).
C) The Transform IV view: the perspective of merging the outputs of several grouped convolutions. Here, the per-group feature maps are first processed by 1x1 convolutions and then concatenated.
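For case B, the two sequential convolutions can be merged offline. The hedged sketch below verifies Transform III for the simple groups=1, bias-free case; the grouped cases slice the kernels per group exactly as transIII_1x1_kxk in Section III does:

import torch
import torch.nn.functional as F

torch.manual_seed(0)
C, D, O, K = 4, 8, 6, 3
x = torch.randn(1, C, 16, 16)
k1 = torch.randn(D, C, 1, 1)  # 1x1 conv: C -> D channels
k2 = torch.randn(O, D, K, K)  # KxK conv: D -> O channels

# Sequential application (bias-free, so zero-padding commutes with the 1x1 conv)
y_seq = F.conv2d(F.conv2d(x, k1), k2, padding=1)

# Transform III: convolve the KxK kernel with the transposed 1x1 kernel
k_merged = F.conv2d(k2, k1.permute(1, 0, 2, 3))  # shape (O, C, K, K)
y_one = F.conv2d(x, k_merged, padding=1)

print(torch.allclose(y_seq, y_one, atol=1e-4))  # True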


2.4 Unchanged Macro Architecture

Unchanged macro architecture means DBB was designed for compatibility with existing network architectures: it can be embedded as a module without changing the overall network (including popular architectures such as ResNet). DBB therefore strengthens feature extraction while keeping the original structural layout, preserving efficiency and performance at inference. This lets researchers and developers apply DBB directly to existing deep-learning models without large-scale architectural changes.
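In practice, the drop-in property means you only need matching channel and stride arguments. A quick shape check (a sketch assuming the DiverseBranchBlock implementation given in Section III below):

import torch

# From the outside, DBB behaves exactly like a 3x3 stride-2 convolution
conv_like = DiverseBranchBlock(in_channels=64, out_channels=128, kernel_size=3, stride=2)
x = torch.randn(1, 64, 80, 80)
print(conv_like(x).shape)  # torch.Size([1, 128, 40, 40]), same as Conv(64, 128, 3, 2)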


III. Core Code of Diverse Branch Block

import torch
import torch.nn as nn
import torch.nn.functional as F
import numpy as np

__all__ = ['DiverseBranchBlock', 'C3k2_DBB_backbone', 'C3k2_DBB_neck']


def autopad(k, p=None, d=1):  # kernel, padding, dilation
    """Pad to 'same' shape outputs."""
    if d > 1:
        k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
    if p is None:
        p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
    return p


class Conv(nn.Module):
    """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""

    default_act = nn.SiLU()  # default activation

    def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
        """Initialize Conv layer with given arguments including activation."""
        super().__init__()
        self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
        self.bn = nn.BatchNorm2d(c2)
        self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

    def forward(self, x):
        """Apply convolution, batch normalization and activation to input tensor."""
        return self.act(self.bn(self.conv(x)))

    def forward_fuse(self, x):
        """Apply the fused convolution and activation (batch norm already folded into the conv)."""
        return self.act(self.conv(x))


# ---- The six re-parameterization transforms described in Section II ----

def transI_fusebn(kernel, bn):
    gamma = bn.weight
    std = (bn.running_var + bn.eps).sqrt()
    return kernel * ((gamma / std).reshape(-1, 1, 1, 1)), bn.bias - bn.running_mean * gamma / std


def transII_addbranch(kernels, biases):
    return sum(kernels), sum(biases)


def transIII_1x1_kxk(k1, b1, k2, b2, groups):
    if groups == 1:
        k = F.conv2d(k2, k1.permute(1, 0, 2, 3))
        b_hat = (k2 * b1.reshape(1, -1, 1, 1)).sum((1, 2, 3))
    else:
        k_slices = []
        b_slices = []
        k1_T = k1.permute(1, 0, 2, 3)
        k1_group_width = k1.size(0) // groups
        k2_group_width = k2.size(0) // groups
        for g in range(groups):
            k1_T_slice = k1_T[:, g * k1_group_width:(g + 1) * k1_group_width, :, :]
            k2_slice = k2[g * k2_group_width:(g + 1) * k2_group_width, :, :, :]
            k_slices.append(F.conv2d(k2_slice, k1_T_slice))
            b_slices.append(
                (k2_slice * b1[g * k1_group_width:(g + 1) * k1_group_width].reshape(1, -1, 1, 1)).sum((1, 2, 3)))
        k, b_hat = transIV_depthconcat(k_slices, b_slices)
    return k, b_hat + b2


def transIV_depthconcat(kernels, biases):
    return torch.cat(kernels, dim=0), torch.cat(biases)


def transV_avg(channels, kernel_size, groups):
    input_dim = channels // groups
    k = torch.zeros((channels, input_dim, kernel_size, kernel_size))
    k[np.arange(channels), np.tile(np.arange(input_dim), groups), :, :] = 1.0 / kernel_size ** 2
    return k


# This has not been tested with non-square kernels (kernel.size(2) != kernel.size(3)) nor even-size kernels
def transVI_multiscale(kernel, target_kernel_size):
    H_pixels_to_pad = (target_kernel_size - kernel.size(2)) // 2
    W_pixels_to_pad = (target_kernel_size - kernel.size(3)) // 2
    return F.pad(kernel, [H_pixels_to_pad, H_pixels_to_pad, W_pixels_to_pad, W_pixels_to_pad])


def conv_bn(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, groups=1,
            padding_mode='zeros'):
    """Conv + BN packaged as an nn.Sequential; later fused by transI_fusebn."""
    conv_layer = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                           stride=stride, padding=padding, dilation=dilation, groups=groups,
                           bias=False, padding_mode=padding_mode)
    bn_layer = nn.BatchNorm2d(num_features=out_channels, affine=True)
    se = nn.Sequential()
    se.add_module('conv', conv_layer)
    se.add_module('bn', bn_layer)
    return se


class IdentityBasedConv1x1(nn.Conv2d):
    """A 1x1 conv whose effective kernel always contains an identity mapping."""

    def __init__(self, channels, groups=1):
        super(IdentityBasedConv1x1, self).__init__(in_channels=channels, out_channels=channels, kernel_size=1, stride=1,
                                                   padding=0, groups=groups, bias=False)
        assert channels % groups == 0
        input_dim = channels // groups
        id_value = np.zeros((channels, input_dim, 1, 1))
        for i in range(channels):
            id_value[i, i % input_dim, 0, 0] = 1
        self.id_tensor = torch.from_numpy(id_value).type_as(self.weight)
        nn.init.zeros_(self.weight)

    def forward(self, input):
        kernel = self.weight + self.id_tensor.to(self.weight.device).type_as(self.weight)
        result = F.conv2d(input, kernel, None, stride=1, padding=0, dilation=self.dilation, groups=self.groups)
        return result

    def get_actual_kernel(self):
        return self.weight + self.id_tensor.to(self.weight.device)


class BNAndPadLayer(nn.Module):
    """BN followed by padding with the value BN would produce for a zero input, so fusion stays exact at borders."""

    def __init__(self,
                 pad_pixels,
                 num_features,
                 eps=1e-5,
                 momentum=0.1,
                 affine=True,
                 track_running_stats=True):
        super(BNAndPadLayer, self).__init__()
        self.bn = nn.BatchNorm2d(num_features, eps, momentum, affine, track_running_stats)
        self.pad_pixels = pad_pixels

    def forward(self, input):
        output = self.bn(input)
        if self.pad_pixels > 0:
            if self.bn.affine:
                pad_values = self.bn.bias.detach() - self.bn.running_mean * self.bn.weight.detach() / torch.sqrt(
                    self.bn.running_var + self.bn.eps)
            else:
                pad_values = - self.bn.running_mean / torch.sqrt(self.bn.running_var + self.bn.eps)
            output = F.pad(output, [self.pad_pixels] * 4)
            pad_values = pad_values.view(1, -1, 1, 1)
            output[:, :, 0:self.pad_pixels, :] = pad_values
            output[:, :, -self.pad_pixels:, :] = pad_values
            output[:, :, :, 0:self.pad_pixels] = pad_values
            output[:, :, :, -self.pad_pixels:] = pad_values
        return output

    @property
    def weight(self):
        return self.bn.weight

    @property
    def bias(self):
        return self.bn.bias

    @property
    def running_mean(self):
        return self.bn.running_mean

    @property
    def running_var(self):
        return self.bn.running_var

    @property
    def eps(self):
        return self.bn.eps


class DiverseBranchBlock(nn.Module):
    def __init__(self, in_channels, out_channels, kernel_size,
                 stride=1, padding=None, dilation=1, groups=1,
                 internal_channels_1x1_3x3=None,
                 deploy=False, single_init=False):
        super(DiverseBranchBlock, self).__init__()
        self.deploy = deploy
        self.nonlinear = Conv.default_act
        self.kernel_size = kernel_size
        self.out_channels = out_channels
        self.groups = groups

        if padding is None:
            padding = autopad(kernel_size, padding, dilation)
        assert padding == kernel_size // 2

        if deploy:
            self.dbb_reparam = nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                         stride=stride,
                                         padding=padding, dilation=dilation, groups=groups, bias=True)
        else:
            self.dbb_origin = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=kernel_size,
                                      stride=stride, padding=padding, dilation=dilation, groups=groups)

            self.dbb_avg = nn.Sequential()
            if groups < out_channels:
                self.dbb_avg.add_module('conv',
                                        nn.Conv2d(in_channels=in_channels, out_channels=out_channels, kernel_size=1,
                                                  stride=1, padding=0, groups=groups, bias=False))
                self.dbb_avg.add_module('bn', BNAndPadLayer(pad_pixels=padding, num_features=out_channels))
                self.dbb_avg.add_module('avg', nn.AvgPool2d(kernel_size=kernel_size, stride=stride, padding=0))
                self.dbb_1x1 = conv_bn(in_channels=in_channels, out_channels=out_channels, kernel_size=1, stride=stride,
                                       padding=0, groups=groups)
            else:
                self.dbb_avg.add_module('avg', nn.AvgPool2d(kernel_size=kernel_size, stride=stride, padding=padding))

            self.dbb_avg.add_module('avgbn', nn.BatchNorm2d(out_channels))

            if internal_channels_1x1_3x3 is None:
                internal_channels_1x1_3x3 = in_channels if groups < out_channels else 2 * in_channels  # For mobilenet, it is better to have 2X internal channels

            self.dbb_1x1_kxk = nn.Sequential()
            if internal_channels_1x1_3x3 == in_channels:
                self.dbb_1x1_kxk.add_module('idconv1', IdentityBasedConv1x1(channels=in_channels, groups=groups))
            else:
                self.dbb_1x1_kxk.add_module('conv1',
                                            nn.Conv2d(in_channels=in_channels, out_channels=internal_channels_1x1_3x3,
                                                      kernel_size=1, stride=1, padding=0, groups=groups, bias=False))
            self.dbb_1x1_kxk.add_module('bn1', BNAndPadLayer(pad_pixels=padding, num_features=internal_channels_1x1_3x3,
                                                             affine=True))
            self.dbb_1x1_kxk.add_module('conv2',
                                        nn.Conv2d(in_channels=internal_channels_1x1_3x3, out_channels=out_channels,
                                                  kernel_size=kernel_size, stride=stride, padding=0, groups=groups,
                                                  bias=False))
            self.dbb_1x1_kxk.add_module('bn2', nn.BatchNorm2d(out_channels))

        # The experiments reported in the paper used the default initialization of bn.weight (all as 1). But changing the initialization may be useful in some cases.
        if single_init:
            # Initialize the bn.weight of dbb_origin as 1 and others as 0. This is not the default setting.
            self.single_init()

    def get_equivalent_kernel_bias(self):
        k_origin, b_origin = transI_fusebn(self.dbb_origin.conv.weight, self.dbb_origin.bn)

        if hasattr(self, 'dbb_1x1'):
            k_1x1, b_1x1 = transI_fusebn(self.dbb_1x1.conv.weight, self.dbb_1x1.bn)
            k_1x1 = transVI_multiscale(k_1x1, self.kernel_size)
        else:
            k_1x1, b_1x1 = 0, 0

        if hasattr(self.dbb_1x1_kxk, 'idconv1'):
            k_1x1_kxk_first = self.dbb_1x1_kxk.idconv1.get_actual_kernel()
        else:
            k_1x1_kxk_first = self.dbb_1x1_kxk.conv1.weight
        k_1x1_kxk_first, b_1x1_kxk_first = transI_fusebn(k_1x1_kxk_first, self.dbb_1x1_kxk.bn1)
        k_1x1_kxk_second, b_1x1_kxk_second = transI_fusebn(self.dbb_1x1_kxk.conv2.weight, self.dbb_1x1_kxk.bn2)
        k_1x1_kxk_merged, b_1x1_kxk_merged = transIII_1x1_kxk(k_1x1_kxk_first, b_1x1_kxk_first, k_1x1_kxk_second,
                                                              b_1x1_kxk_second, groups=self.groups)

        k_avg = transV_avg(self.out_channels, self.kernel_size, self.groups)
        k_1x1_avg_second, b_1x1_avg_second = transI_fusebn(k_avg.to(self.dbb_avg.avgbn.weight.device),
                                                           self.dbb_avg.avgbn)
        if hasattr(self.dbb_avg, 'conv'):
            k_1x1_avg_first, b_1x1_avg_first = transI_fusebn(self.dbb_avg.conv.weight, self.dbb_avg.bn)
            k_1x1_avg_merged, b_1x1_avg_merged = transIII_1x1_kxk(k_1x1_avg_first, b_1x1_avg_first, k_1x1_avg_second,
                                                                  b_1x1_avg_second, groups=self.groups)
        else:
            k_1x1_avg_merged, b_1x1_avg_merged = k_1x1_avg_second, b_1x1_avg_second

        return transII_addbranch((k_origin, k_1x1, k_1x1_kxk_merged, k_1x1_avg_merged),
                                 (b_origin, b_1x1, b_1x1_kxk_merged, b_1x1_avg_merged))

    def switch_to_deploy(self):
        if hasattr(self, 'dbb_reparam'):
            return
        kernel, bias = self.get_equivalent_kernel_bias()
        self.dbb_reparam = nn.Conv2d(in_channels=self.dbb_origin.conv.in_channels,
                                     out_channels=self.dbb_origin.conv.out_channels,
                                     kernel_size=self.dbb_origin.conv.kernel_size, stride=self.dbb_origin.conv.stride,
                                     padding=self.dbb_origin.conv.padding, dilation=self.dbb_origin.conv.dilation,
                                     groups=self.dbb_origin.conv.groups, bias=True)
        self.dbb_reparam.weight.data = kernel
        self.dbb_reparam.bias.data = bias
        for para in self.parameters():
            para.detach_()
        self.__delattr__('dbb_origin')
        self.__delattr__('dbb_avg')
        if hasattr(self, 'dbb_1x1'):
            self.__delattr__('dbb_1x1')
        self.__delattr__('dbb_1x1_kxk')

    def forward(self, inputs):
        if hasattr(self, 'dbb_reparam'):
            return self.nonlinear(self.dbb_reparam(inputs))

        out = self.dbb_origin(inputs)
        if hasattr(self, 'dbb_1x1'):
            out += self.dbb_1x1(inputs)
        out += self.dbb_avg(inputs)
        out += self.dbb_1x1_kxk(inputs)
        return self.nonlinear(out)

    def init_gamma(self, gamma_value):
        if hasattr(self, "dbb_origin"):
            torch.nn.init.constant_(self.dbb_origin.bn.weight, gamma_value)
        if hasattr(self, "dbb_1x1"):
            torch.nn.init.constant_(self.dbb_1x1.bn.weight, gamma_value)
        if hasattr(self, "dbb_avg"):
            torch.nn.init.constant_(self.dbb_avg.avgbn.weight, gamma_value)
        if hasattr(self, "dbb_1x1_kxk"):
            torch.nn.init.constant_(self.dbb_1x1_kxk.bn2.weight, gamma_value)

    def single_init(self):
        self.init_gamma(0.0)
        if hasattr(self, "dbb_origin"):
            torch.nn.init.constant_(self.dbb_origin.bn.weight, 1.0)


class Bottleneck_DBB(nn.Module):
    # Standard bottleneck with the second conv replaced by a DiverseBranchBlock
    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):  # ch_in, ch_out, shortcut, groups, kernels, expand
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = DiverseBranchBlock(c_, c2, 3, stride=1, groups=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class Bottleneck(nn.Module):
    """Standard bottleneck."""

    def __init__(self, c1, c2, shortcut=True, g=1, k=(3, 3), e=0.5):
        """Initializes a standard bottleneck module with optional shortcut connection and configurable parameters."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, k[0], 1)
        self.cv2 = Conv(c_, c2, k[1], 1, g=g)
        self.add = shortcut and c1 == c2

    def forward(self, x):
        """Applies the YOLO FPN to input data."""
        return x + self.cv2(self.cv1(x)) if self.add else self.cv2(self.cv1(x))


class C2f(nn.Module):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=False, g=1, e=0.5):
        """Initializes a CSP bottleneck with 2 convolutions and n Bottleneck blocks for faster processing."""
        super().__init__()
        self.c = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, 2 * self.c, 1, 1)
        self.cv2 = Conv((2 + n) * self.c, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.ModuleList(Bottleneck(self.c, self.c, shortcut, g, k=((3, 3), (3, 3)), e=1.0) for _ in range(n))

    def forward(self, x):
        """Forward pass through C2f layer."""
        y = list(self.cv1(x).chunk(2, 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))

    def forward_split(self, x):
        """Forward pass using split() instead of chunk()."""
        y = list(self.cv1(x).split((self.c, self.c), 1))
        y.extend(m(y[-1]) for m in self.m)
        return self.cv2(torch.cat(y, 1))


class C3(nn.Module):
    """CSP Bottleneck with 3 convolutions."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5):
        """Initialize the CSP Bottleneck with given channels, number, shortcut, groups, and expansion values."""
        super().__init__()
        c_ = int(c2 * e)  # hidden channels
        self.cv1 = Conv(c1, c_, 1, 1)
        self.cv2 = Conv(c1, c_, 1, 1)
        self.cv3 = Conv(2 * c_, c2, 1)  # optional act=FReLU(c2)
        self.m = nn.Sequential(*(Bottleneck(c_, c_, shortcut, g, k=((1, 1), (3, 3)), e=1.0) for _ in range(n)))

    def forward(self, x):
        """Forward pass through the CSP bottleneck with 2 convolutions."""
        return self.cv3(torch.cat((self.m(self.cv1(x)), self.cv2(x)), 1))


class C3k(C3):
    """C3k is a CSP bottleneck module with customizable kernel sizes for feature extraction in neural networks."""

    def __init__(self, c1, c2, n=1, shortcut=True, g=1, e=0.5, k=3):
        """Initializes the C3k module with specified channels, number of layers, and configurations."""
        super().__init__(c1, c2, n, shortcut, g, e)
        c_ = int(c2 * e)  # hidden channels
        # self.m = nn.Sequential(*(RepBottleneck(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))
        self.m = nn.Sequential(*(Bottleneck_DBB(c_, c_, shortcut, g, k=(k, k), e=1.0) for _ in range(n)))


class C3k2_DBB_backbone(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck_DBB(self.c, self.c, shortcut, g) for _ in range(n)
        )


class C3k2_DBB_neck(C2f):
    """Faster Implementation of CSP Bottleneck with 2 convolutions."""

    def __init__(self, c1, c2, n=1, c3k=False, e=0.5, g=1, shortcut=True):
        """Initializes the C3k2 module, a faster CSP Bottleneck with 2 convolutions and optional C3k blocks."""
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(
            C3k(self.c, self.c, 2, shortcut, g) if c3k else Bottleneck(self.c, self.c, shortcut, g) for _ in range(n)
        )


if __name__ == "__main__":
    # Generating Sample image
    image_size = (1, 64, 224, 224)
    image = torch.rand(*image_size)

    # Model
    model = C3k2_DBB_backbone(64, 64)
    out = model(image)
    print(out.size())
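After training, each DiverseBranchBlock should be reparameterized before deployment. The hedged sketch below (using the classes above) checks that switch_to_deploy leaves the outputs unchanged; note the model must be in eval mode, since the fusion relies on BN running statistics:

model = C3k2_DBB_backbone(64, 64)
model.eval()
x = torch.rand(1, 64, 224, 224)
with torch.no_grad():
    y_train = model(x)
    for m in model.modules():
        if hasattr(m, 'switch_to_deploy'):
            m.switch_to_deploy()
    y_deploy = model(x)
print(torch.allclose(y_train, y_deploy, atol=1e-4))  # True, up to float tolerance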

IV. Hands-On: Adding the Diverse Branch Block Mechanism

4.1 How to Add Diverse Branch Block

The procedure here differs slightly from my earlier posts; all future additions will follow this workflow, so that it stays compatible with the files shared in the reader group.


4.1.1 Modification One

First, create the files: in the ultralytics/nn directory, create a folder named 'Addmodules' (if you are using the files from the reader group, it already exists and there is no need to create it). Then create a new .py file inside it and paste in the core code above.


4.1.2 Modification Two

Second, in that same directory create a new file named '__init__.py' (already present if you use the group files), and import our new modules inside it, as shown in the figure.
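Here is a minimal sketch of what the '__init__.py' might contain; I am assuming you named the core-code file 'dbb.py', so adjust the module name to whatever you actually used:

# ultralytics/nn/Addmodules/__init__.py
from .dbb import DiverseBranchBlock, C3k2_DBB_backbone, C3k2_DBB_neck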


4.1.3 Modification Three

Third, open the file 'ultralytics/nn/tasks.py' to import and register our modules (if you use the group files, this is already done, so you can skip straight to step four).
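The import itself is one line near the top of tasks.py (a sketch, assuming the Addmodules package from the previous steps):

# near the other imports in ultralytics/nn/tasks.py
from .Addmodules import DiverseBranchBlock, C3k2_DBB_backbone, C3k2_DBB_neck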

From today onward the tutorials all follow this unified format, because I assume everyone is working from the files shared in the group!


4.1.4 Modification Four

Finally, register the modules inside parse_model as I do; a hedged sketch follows.
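For reference, the registration looks roughly like the sketch below. The exact names in this function vary between ultralytics versions, so treat it as a guide rather than a paste-ready patch: add our classes to the set that maps channel arguments, and add the C3k2 variants to the set that receives the repeat count n:

# inside parse_model() in ultralytics/nn/tasks.py (sketch)
if m in {Conv, C3k2, DiverseBranchBlock, C3k2_DBB_backbone, C3k2_DBB_neck}:  # plus the existing entries
    c1, c2 = ch[f], args[0]
    if c2 != nc:
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, *args[1:]]
    if m in {C3k2, C3k2_DBB_backbone, C3k2_DBB_neck}:  # modules that take a repeat count
        args.insert(2, n)
        n = 1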


That completes the modification; you can now copy one of the yaml files below and run it.
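Once a yaml file is saved, training follows the usual ultralytics workflow; a minimal script looks like this (the yaml filename below is a placeholder for whichever version you saved):

from ultralytics import YOLO

model = YOLO('yolo11-C3k2-DBB.yaml')  # placeholder path to your modified yaml
model.train(data='coco128.yaml', epochs=100, imgsz=640)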


4.2 Diverse Branch Block yaml Files and Training Screenshots

Below I recommend several yaml variants you can copy and train with. There are many possible combinations and it is not certain which works best; the effect also varies across datasets, and I cannot run experiments for every option, so I recommend three configurations that I believe are likely to help. You can of course build your own combinations too.


4.2.1 Diverse Branch Block yaml, Version One (Recommended)

Training summary for this version: YOLO11-C3k2-DBB-backbone summary: 496 layers, 2,877,423 parameters, 2,877,407 gradients, 7.1 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_DBB_backbone, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_DBB_backbone, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_DBB_backbone, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_DBB_backbone, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_DBB_backbone, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_DBB_backbone, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_DBB_backbone, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_DBB_backbone, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


4.2.2 Diverse Branch Block yaml, Version Two

This is version two of the additions; which variant suits you best requires experimentation on your own data.

Training summary for this version: YOLO11-C3k2-DBB-neck summary: 416 layers, 2,815,199 parameters, 2,815,183 gradients, 6.7 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2_DBB_neck, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2_DBB_neck, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2_DBB_neck, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2_DBB_neck, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2_DBB_neck, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2_DBB_neck, [256, False]] # 16 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2_DBB_neck, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2_DBB_neck, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


4.2.3 Diverse Branch Block yaml, Version Three

Training summary for this version: YOLO11-DBB summary: 416 layers, 3,471,487 parameters, 3,471,471 gradients, 9.2 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, DiverseBranchBlock, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, DiverseBranchBlock, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, DiverseBranchBlock, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, DiverseBranchBlock, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)
  - [-1, 1, DiverseBranchBlock, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)
  - [-1, 1, DiverseBranchBlock, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)
  - [[16, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)


4.2.4 Diverse Branch Block Training Screenshots

Below is a screenshot of training with the Diverse Branch Block added.


V. Summary

That concludes this post. Let me recommend my YOLOv11 effective-improvement column: it is newly opened with an average quality score of 98, and I will keep reproducing papers from the latest top conferences as well as supplementing older improvement mechanisms. If this post helped you, consider subscribing to the column and following for more updates~