YOLOv11 Improvements, Neck Series: Improving the Feature Fusion Network with Damo-YOLO's RepGFPN (Including My Own Curated Version)

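1. Introduction

This article introduces RepGFPN (Reparameterized Generalized Feature Pyramid Network) from Damo-YOLO and uses it to optimize the Neck of YOLOv11. In my own tests it gave a clear accuracy gain on datasets with both small and large objects, at only a modest cost in computation. Unlike most previously proposed improvement modules, RepGFPN is better understood as a structure, or a way of approaching feature fusion, than as a single block, and it improves on both BiFPN and the earlier FPN. This article covers two versions: one is my own summarized usage, and the other is the official usage.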



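2. How RepGFPN Works

Official paper: official paper link

Official code: official code link


RepGFPN (Reparameterized Generalized Feature Pyramid Network) is the neck used in the DAMO-YOLO framework for real-time object detection. Its core idea is to refine the Feature Pyramid Network (FPN) concept so that multi-scale features are fused more efficiently, which is essential for capturing both high-level semantics and low-level spatial detail.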

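Its main improvements are:

  1. Different channel widths per scale: it uses a different channel dimension for the feature map at each scale, which gives better performance under a fixed compute budget.
  2. Optimized Queen-Fusion: it strengthens feature interaction through a modified Queen-Fusion mechanism and reduces latency by removing the extra upsampling operations.
  3. CSPNet and ELAN integration: it combines CSPNet with Efficient Layer Aggregation Networks (ELAN) and re-parameterization, improving feature fusion without a significant increase in computation.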
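
To see why the channel width per scale matters, the rough estimate below counts multiply-accumulate operations for one 3x3 convolution at each pyramid level. The 640x640 input, the strides 8/16/32, and both channel layouts are illustrative assumptions, not the widths DAMO-YOLO actually uses.

```python
def conv3x3_macs(height, width, c_in, c_out):
    """Multiply-accumulates for one 3x3 convolution with 'same' padding."""
    return height * width * c_in * c_out * 9


def neck_macs(image_size, strides, channels):
    """Total MACs of one 3x3 conv per pyramid level (c_in == c_out on each level)."""
    total = 0
    for stride, ch in zip(strides, channels):
        side = image_size // stride
        total += conv3x3_macs(side, side, ch, ch)
    return total


STRIDES = (8, 16, 32)          # P3, P4, P5
UNIFORM = (256, 256, 256)      # same width on every level (assumed)
PER_SCALE = (128, 256, 512)    # narrow at high resolution, wide at low (assumed)

uniform_macs = neck_macs(640, STRIDES, UNIFORM)
per_scale_macs = neck_macs(640, STRIDES, PER_SCALE)
print(f"uniform:   {uniform_macs / 1e9:.2f} GMACs")    # 4.95
print(f"per-scale: {per_scale_macs / 1e9:.2f} GMACs")  # 2.83
```

With these assumed widths the per-scale layout spends the same amount on every level and less in total, because the high-resolution P3 map no longer carries wide channels.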

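Summary: RepGFPN is more of a structure and a way of thinking, and the modules inside it can be swapped for other mechanisms.

The figure below shows the Damo-YOLO network structure; the part I marked with a red box is the path aggregation diagram of RepGFPN.

Based on the figure, let's look at RepGFPN (Reparameterized Generalized Feature Pyramid Network): it acts as the "neck" (the same role as the neck in YOLOv11) and refines and fuses high-level semantic features with low-level spatial features.

In the Fusion Block at the top left, we can see a repeating structural unit made up of several 1x1 convolutions and a 3x3 convolution, each usually followed by batch normalization (BN) and an activation function (Act). This unit differs between training and inference, which is implemented through the "simplified Rep 3x3" structure: during training it runs a 3x3 convolution in parallel with a 1x1 convolution branch, and for inference the branches are merged into a single 3x3 convolution for efficiency, which is what RepConv.fuse_convs does in the code of Section 3. (Many recent architectures follow this idea of training with a more complex module and switching to a simpler one for inference; it is worth considering in your own improvements.)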
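
In a re-parameterized 3x3 block such as the RepConv class in Section 3, the parallel branches used during training are merged into one convolution for inference. This relies on the linearity of convolution. The single-channel sketch below, in plain Python with no BN, no bias, and hand-picked integer weights (all assumptions made for illustration), checks that a 3x3 branch plus a 1x1 branch gives the same output as one 3x3 convolution whose centre weight absorbs the 1x1 weight.

```python
def conv2d_same(image, kernel):
    """Single-channel 2D convolution (cross-correlation) with zero padding."""
    k = len(kernel)
    pad = k // 2
    h, w = len(image), len(image[0])
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0
            for di in range(k):
                for dj in range(k):
                    y, x = i + di - pad, j + dj - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[di][dj] * image[y][x]
            out[i][j] = acc
    return out


def add_maps(a, b):
    """Element-wise sum of two feature maps of equal size."""
    return [[p + q for p, q in zip(ra, rb)] for ra, rb in zip(a, b)]


def merge_3x3_and_1x1(k3, k1):
    """Pad the 1x1 kernel to 3x3 (weight in the centre) and add it to the 3x3 kernel."""
    merged = [row[:] for row in k3]
    merged[1][1] += k1[0][0]
    return merged


image = [[1, 2, 0, 3], [4, 1, 1, 0], [2, 5, 3, 1], [0, 2, 4, 2]]
k3 = [[1, 0, -1], [2, 1, 0], [0, -2, 1]]  # training-time 3x3 branch
k1 = [[3]]                                 # training-time 1x1 branch

two_branch = add_maps(conv2d_same(image, k3), conv2d_same(image, k1))
fused = conv2d_same(image, merge_3x3_and_1x1(k3, k1))
print(two_branch == fused)  # True
```

The real RepConv does the same thing per channel and additionally folds each branch's BN statistics into the kernel and bias before adding them.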


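3. Core Code of RepGFPN

Below is the core code of RepGFPN. Copy it into the 'ultralytics/nn/modules' directory by creating a new file there (I named mine GFPN) and pasting the code in; see Section 4 for the remaining steps.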

    import numpy as np
    import torch
    import torch.nn as nn


    class swish(nn.Module):
        """Swish activation: x * sigmoid(x)."""

        def forward(self, x):
            return x * torch.sigmoid(x)


    def autopad(k, p=None, d=1):  # kernel, padding, dilation
        """Pad to 'same' shape outputs."""
        if d > 1:
            k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
        if p is None:
            p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
        return p


    class Conv(nn.Module):
        """Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)."""
        default_act = swish()  # default activation

        def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
            """Initialize Conv layer with given arguments including activation."""
            super().__init__()
            self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
            self.bn = nn.BatchNorm2d(c2)
            self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

        def forward(self, x):
            """Apply convolution, batch normalization and activation to input tensor."""
            return self.act(self.bn(self.conv(x)))

        def forward_fuse(self, x):
            """Apply convolution and activation only (used once BN has been fused into the conv)."""
            return self.act(self.conv(x))


    class RepConv(nn.Module):
        """Re-parameterizable 3x3 conv: 3x3 + 1x1 (+ optional identity BN) branches that fuse into one 3x3 conv."""
        default_act = swish()  # default activation

        def __init__(self, c1, c2, k=3, s=1, p=1, g=1, d=1, act=True, bn=False, deploy=False):
            """Initialize the training-time branches; only k=3, p=1 is supported."""
            super().__init__()
            assert k == 3 and p == 1
            self.g = g
            self.c1 = c1
            self.c2 = c2
            self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()
            # Identity branch (BN only), available when input and output shapes match.
            self.bn = nn.BatchNorm2d(num_features=c1) if bn and c2 == c1 and s == 1 else None
            self.conv1 = Conv(c1, c2, k, s, p=p, g=g, act=False)             # 3x3 branch
            self.conv2 = Conv(c1, c2, 1, s, p=(p - k // 2), g=g, act=False)  # 1x1 branch

        def forward_fuse(self, x):
            """Forward pass after fuse_convs(): a single fused 3x3 conv."""
            return self.act(self.conv(x))

        def forward(self, x):
            """Training-time forward pass: sum of all branches."""
            id_out = 0 if self.bn is None else self.bn(x)
            return self.act(self.conv1(x) + self.conv2(x) + id_out)

        def get_equivalent_kernel_bias(self):
            """Returns equivalent kernel and bias by adding 3x3 kernel, 1x1 kernel and identity kernel with their biases."""
            kernel3x3, bias3x3 = self._fuse_bn_tensor(self.conv1)
            kernel1x1, bias1x1 = self._fuse_bn_tensor(self.conv2)
            kernelid, biasid = self._fuse_bn_tensor(self.bn)
            return kernel3x3 + self._pad_1x1_to_3x3_tensor(kernel1x1) + kernelid, bias3x3 + bias1x1 + biasid

        def _pad_1x1_to_3x3_tensor(self, kernel1x1):
            """Pads a 1x1 tensor to a 3x3 tensor."""
            if kernel1x1 is None:
                return 0
            else:
                return torch.nn.functional.pad(kernel1x1, [1, 1, 1, 1])

        def _fuse_bn_tensor(self, branch):
            """Fold a branch's BN statistics into an equivalent conv kernel and bias."""
            if branch is None:
                return 0, 0
            if isinstance(branch, Conv):
                kernel = branch.conv.weight
                running_mean = branch.bn.running_mean
                running_var = branch.bn.running_var
                gamma = branch.bn.weight
                beta = branch.bn.bias
                eps = branch.bn.eps
            elif isinstance(branch, nn.BatchNorm2d):
                if not hasattr(self, 'id_tensor'):
                    # Build a 3x3 kernel that acts as the identity mapping.
                    input_dim = self.c1 // self.g
                    kernel_value = np.zeros((self.c1, input_dim, 3, 3), dtype=np.float32)
                    for i in range(self.c1):
                        kernel_value[i, i % input_dim, 1, 1] = 1
                    self.id_tensor = torch.from_numpy(kernel_value).to(branch.weight.device)
                kernel = self.id_tensor
                running_mean = branch.running_mean
                running_var = branch.running_var
                gamma = branch.weight
                beta = branch.bias
                eps = branch.eps
            std = (running_var + eps).sqrt()
            t = (gamma / std).reshape(-1, 1, 1, 1)
            return kernel * t, beta - running_mean * gamma / std

        def fuse_convs(self):
            """Combines the branches into a single conv layer and removes unused attributes from the class."""
            if hasattr(self, 'conv'):
                return
            kernel, bias = self.get_equivalent_kernel_bias()
            self.conv = nn.Conv2d(in_channels=self.conv1.conv.in_channels,
                                  out_channels=self.conv1.conv.out_channels,
                                  kernel_size=self.conv1.conv.kernel_size,
                                  stride=self.conv1.conv.stride,
                                  padding=self.conv1.conv.padding,
                                  dilation=self.conv1.conv.dilation,
                                  groups=self.conv1.conv.groups,
                                  bias=True).requires_grad_(False)
            self.conv.weight.data = kernel
            self.conv.bias.data = bias
            for para in self.parameters():
                para.detach_()
            self.__delattr__('conv1')
            self.__delattr__('conv2')
            if hasattr(self, 'nm'):
                self.__delattr__('nm')
            if hasattr(self, 'bn'):
                self.__delattr__('bn')
            if hasattr(self, 'id_tensor'):
                self.__delattr__('id_tensor')


    class BasicBlock_3x3_Reverse(nn.Module):
        """Residual block: RepConv 3x3 followed by a plain 3x3 Conv."""

        def __init__(self,
                     ch_in,
                     ch_hidden_ratio,
                     ch_out,
                     shortcut=True):
            super().__init__()
            assert ch_in == ch_out
            ch_hidden = int(ch_in * ch_hidden_ratio)
            self.conv1 = Conv(ch_hidden, ch_out, 3, s=1)
            self.conv2 = RepConv(ch_in, ch_hidden, 3, s=1)
            self.shortcut = shortcut

        def forward(self, x):
            y = self.conv2(x)  # RepConv runs first, hence "Reverse"
            y = self.conv1(y)
            if self.shortcut:
                return x + y
            else:
                return y


    class SPP(nn.Module):
        """Spatial pyramid pooling: concat the input with several max-pooled copies, then a Conv."""

        def __init__(
                self,
                ch_in,
                ch_out,
                k,
                pool_size
        ):
            super().__init__()
            self.pool = []  # plain list for iteration; each pool is registered via add_module
            for i, size in enumerate(pool_size):
                pool = nn.MaxPool2d(kernel_size=size,
                                    stride=1,
                                    padding=size // 2,
                                    ceil_mode=False)
                self.add_module('pool{}'.format(i), pool)
                self.pool.append(pool)
            self.conv = Conv(ch_in, ch_out, k)

        def forward(self, x):
            outs = [x]
            for pool in self.pool:
                outs.append(pool(x))
            y = torch.cat(outs, dim=1)
            y = self.conv(y)
            return y


    class CSPStage(nn.Module):
        """CSP-style fusion stage: one shortcut branch plus n stacked blocks whose outputs are all concatenated."""

        def __init__(self,
                     ch_in,
                     ch_out,
                     n,
                     block_fn='BasicBlock_3x3_Reverse',
                     ch_hidden_ratio=1.0,
                     act='silu',  # not used below; kept so the signature stays unchanged
                     spp=False):
            super().__init__()
            split_ratio = 2
            ch_first = int(ch_out // split_ratio)
            ch_mid = int(ch_out - ch_first)
            self.conv1 = Conv(ch_in, ch_first, 1)
            self.conv2 = Conv(ch_in, ch_mid, 1)
            self.convs = nn.Sequential()
            next_ch_in = ch_mid
            for i in range(n):
                if block_fn == 'BasicBlock_3x3_Reverse':
                    self.convs.add_module(
                        str(i),
                        BasicBlock_3x3_Reverse(next_ch_in,
                                               ch_hidden_ratio,
                                               ch_mid,
                                               shortcut=True))
                else:
                    raise NotImplementedError
                if i == (n - 1) // 2 and spp:
                    self.convs.add_module('spp', SPP(ch_mid * 4, ch_mid, 1, [5, 9, 13]))
                next_ch_in = ch_mid
            # conv3 sees the shortcut branch plus the output of each of the n blocks.
            self.conv3 = Conv(ch_mid * n + ch_first, ch_out, 1)

        def forward(self, x):
            y1 = self.conv1(x)
            y2 = self.conv2(x)
            mid_out = [y1]
            for conv in self.convs:
                y2 = conv(y2)
                mid_out.append(y2)
            y = torch.cat(mid_out, dim=1)
            y = self.conv3(y)
            return y
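
The `_fuse_bn_tensor` method in the code above folds the batch-norm statistics into the convolution weights. The scalar sketch below checks that algebra on a single weight; the numeric values are arbitrary assumptions chosen for illustration.

```python
import math


def conv_then_bn(x, w, gamma, beta, mean, var, eps=1e-5):
    """Reference path: a one-weight 'convolution' followed by batch norm in eval mode."""
    return gamma * (w * x - mean) / math.sqrt(var + eps) + beta


def fold_bn(w, gamma, beta, mean, var, eps=1e-5):
    """Return (w_fused, b_fused), mirroring `kernel * t` and `beta - running_mean * gamma / std`."""
    std = math.sqrt(var + eps)
    return w * gamma / std, beta - mean * gamma / std


w, gamma, beta, mean, var = 0.8, 1.5, 0.1, 0.3, 4.0
w_fused, b_fused = fold_bn(w, gamma, beta, mean, var)

for x in (-2.0, 0.0, 3.5):
    reference = conv_then_bn(x, w, gamma, beta, mean, var)
    fused = w_fused * x + b_fused
    assert math.isclose(reference, fused, rel_tol=1e-9, abs_tol=1e-12)
print("folded weight and bias reproduce conv + BN")
```

Because each fused branch is then just a kernel and a bias, RepConv can add the branches together and load the result into a single `nn.Conv2d`.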


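4. Adding RepGFPN Step by Step

4.1 Modification 1

Step one, as usual, is creating the files. Under the 'ultralytics/nn' folder, create a directory named 'Addmodules' (if you are using the files from my group, it already exists and you do not need to create it). Then create a new py file inside it and paste the core code in.


4.2 Modification 2

Step two: in that directory, create a new py file named '__init__.py' (if you are using the files from my group, it already exists and you do not need to create it), then import our module inside it, as shown in the figure below.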
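
A minimal sketch of what such an `__init__.py` can contain, and why it works, is given below. It builds a throwaway package in a temporary directory so that it runs on its own; the names `Addmodules_demo` and `GFPN.py` and the stand-in `CSPStage` class are assumptions for illustration, so adapt the import line to whatever you named your file.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "Addmodules_demo"  # hypothetical stand-in for ultralytics/nn/Addmodules
    pkg.mkdir()
    # Stand-in for the file holding the core code from Section 3.
    (pkg / "GFPN.py").write_text("class CSPStage:\n    pass\n")
    # The one line __init__.py needs: re-export everything from the module file.
    (pkg / "__init__.py").write_text("from .GFPN import *\n")

    sys.path.insert(0, tmp)
    importlib.invalidate_caches()
    try:
        demo = importlib.import_module("Addmodules_demo")
        exported = hasattr(demo, "CSPStage")
    finally:
        # Clean up so the demo leaves no trace in the interpreter.
        sys.path.remove(tmp)
        sys.modules.pop("Addmodules_demo", None)
        sys.modules.pop("Addmodules_demo.GFPN", None)

print(exported)  # True
```

In the real project, the line written into `__init__.py` here is the only part you need; once it is in place, the package exposes `CSPStage` to anything that imports from it.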


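4.3 Modification 3

Step three: open 'ultralytics/nn/tasks.py' and import and register our module there (if you are using the files from my group, this is already done and you can go straight to step four).

From today on, all of my tutorials will follow this same format, because I assume by default that you are modifying the files from my group.


4.4 Modification 4

Add the module inside parse_model, following my example.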
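
As a rough guide to what that `parse_model` addition has to achieve for `CSPStage(ch_in, ch_out, n)`, here is a standalone sketch of the argument handling. It is written as a plain function so that it runs by itself; `resolve_cspstage_args` is a hypothetical helper, and it imitates the width and depth scaling Ultralytics applies rather than reproducing my exact patch.

```python
import math


def make_divisible(x, divisor=8):
    """Round x up to the nearest multiple of divisor."""
    return math.ceil(x / divisor) * divisor


def resolve_cspstage_args(c_in, yaml_args, repeats, depth, width, max_channels):
    """Turn a yaml row such as [-1, 3, CSPStage, [512]] into CSPStage(ch_in, ch_out, n) arguments."""
    c_out = make_divisible(min(yaml_args[0], max_channels) * width, 8)
    n = max(round(repeats * depth), 1) if repeats > 1 else repeats
    # The repeat count is passed into the module instead of stacking n copies of it.
    return [c_in, c_out, n, *yaml_args[1:]]


# Scale 'n' from the yaml files in Section 5: [depth, width, max_channels] = [0.50, 0.25, 1024].
# c_in=256 assumes the preceding Concat joins two 128-channel maps.
args = resolve_cspstage_args(c_in=256, yaml_args=[512], repeats=3,
                             depth=0.50, width=0.25, max_channels=1024)
print(args)  # [256, 128, 2]
```

Inside the real `parse_model`, the usual pattern is to add `CSPStage` to the group of modules that receive scaled `(c1, c2)` channels and to the group whose repeat count is inserted as an argument; the exact names of those groups depend on your Ultralytics version.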

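That completes the modifications; you can copy the yaml files below and run them.


5. RepGFPN yaml Files

5.1 yaml File 1

Training info for this version: YOLO11-RepGFPN-1 summary: 347 layers, 2,903,259 parameters, 2,903,243 gradients, 6.9 GFLOPs

# Version note: this version only imitates the layout of RepGFPN without using the mechanism it proposes, and it is the one more likely to improve accuracy.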

    # Ultralytics YOLO 🚀, AGPL-3.0 license
    # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
      m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
      l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
      x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

    # Version note: this version only imitates the layout of RepGFPN without using the mechanism it proposes, and it is the one more likely to improve accuracy.

    # YOLO11n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2, [1024, True]]
      - [-1, 1, SPPF, [1024, 5]] # 9
      - [-1, 2, C2PSA, [1024]] # 10

    # DAMO-YOLO GFPN Head
    head:
      - [-1, 1, Conv, [512, 1, 1]] # 11
      - [6, 1, Conv, [512, 3, 2]] # 12
      - [[-1, 11], 1, Concat, [1]] # 13
      - [-1, 2, C3k2, [512, False]] # 14

      - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15
      - [4, 1, Conv, [256, 3, 2]] # 16
      - [[15, -1, 6], 1, Concat, [1]] # 17
      - [-1, 2, C3k2, [512, False]] # 18

      - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19
      - [[-1, 4], 1, Concat, [1]] # 20
      - [-1, 2, C3k2, [256, False]] # 21

      - [-1, 1, Conv, [256, 3, 2]] # 22
      - [[-1, 18], 1, Concat, [1]] # 23
      - [-1, 2, C3k2, [512, False]] # 24

      - [18, 1, Conv, [256, 3, 2]] # 25
      - [24, 1, Conv, [256, 3, 2]] # 26
      - [[14, 25, -1], 1, Concat, [1]] # 27
      - [-1, 2, C3k2, [1024, True]] # 28

      - [[21, 24, 28], 1, Detect, [nc]] # Detect(P3, P4, P5)
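
A quick way to sanity-check a hand-written head like this one is to trace the stride of every layer and confirm that each Concat joins maps of the same resolution. The sketch below does that in plain Python; `trace_strides` is a hypothetical helper written for this article, and it only models the stride behaviour of Conv and nn.Upsample.

```python
# (from, module, args) rows copied from the yaml above; repeats and extra args
# are dropped because they do not affect the spatial stride.
LAYERS = [
    (-1, "Conv", [64, 3, 2]), (-1, "Conv", [128, 3, 2]), (-1, "C3k2", [256]),
    (-1, "Conv", [256, 3, 2]), (-1, "C3k2", [512]), (-1, "Conv", [512, 3, 2]),
    (-1, "C3k2", [512]), (-1, "Conv", [1024, 3, 2]), (-1, "C3k2", [1024]),
    (-1, "SPPF", [1024, 5]), (-1, "C2PSA", [1024]),                       # 0-10 backbone
    (-1, "Conv", [512, 1, 1]), (6, "Conv", [512, 3, 2]), ([-1, 11], "Concat", [1]),
    (-1, "C3k2", [512]), (-1, "nn.Upsample", [None, 2, "nearest"]),       # 11-15
    (4, "Conv", [256, 3, 2]), ([15, -1, 6], "Concat", [1]), (-1, "C3k2", [512]),
    (-1, "nn.Upsample", [None, 2, "nearest"]), ([-1, 4], "Concat", [1]),  # 16-20
    (-1, "C3k2", [256]), (-1, "Conv", [256, 3, 2]), ([-1, 18], "Concat", [1]),
    (-1, "C3k2", [512]), (18, "Conv", [256, 3, 2]), (24, "Conv", [256, 3, 2]),
    ([14, 25, -1], "Concat", [1]), (-1, "C3k2", [1024]),                  # 21-28
    ([21, 24, 28], "Detect", ["nc"]),                                     # 29
]


def trace_strides(layers):
    """Return the output stride of every layer, checking that Concat inputs agree."""
    strides = []
    for i, (src, module, args) in enumerate(layers):
        sources = src if isinstance(src, list) else [src]
        inputs = []
        for s in sources:
            idx = s if s >= 0 else i + s                    # negative 'from' is relative
            inputs.append(strides[idx] if idx >= 0 else 1)  # index -1 is the input image
        if module == "Concat" and len(set(inputs)) != 1:
            raise ValueError(f"layer {i}: Concat inputs have strides {inputs}")
        stride = inputs[0]
        if module == "Conv":
            stride *= args[2]       # args = [channels, kernel, stride]
        elif module == "nn.Upsample":
            stride //= args[1]      # args = [size, scale_factor, mode]
        strides.append(stride)
    return strides


strides = trace_strides(LAYERS)
detect_sources = LAYERS[-1][0]
print([strides[i] for i in detect_sources])  # [8, 16, 32]
```

No Concat raises an error, and the three Detect inputs come out at strides 8, 16 and 32, matching the P3, P4, P5 comment on the last row.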


5.2 yaml File 2

Training info: YOLO11-RepGFPN-2 summary: 441 layers, 3,734,203 parameters, 3,734,187 gradients, 8.5 GFLOPs

    # Ultralytics YOLO 🚀, AGPL-3.0 license
    # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
      m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
      l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
      x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

    # Version note: this version keeps the same layout and uses the CSPStage module from Section 3 as the fusion block.

    # YOLO11n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2, [1024, True]]
      - [-1, 1, SPPF, [1024, 5]] # 9
      - [-1, 2, C2PSA, [1024]] # 10

    # DAMO-YOLO GFPN Head
    head:
      - [-1, 1, Conv, [512, 1, 1]] # 11
      - [6, 1, Conv, [512, 3, 2]] # 12
      - [[-1, 11], 1, Concat, [1]] # 13
      - [-1, 3, CSPStage, [512]] # 14

      - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 15
      - [4, 1, Conv, [256, 3, 2]] # 16
      - [[15, -1, 6], 1, Concat, [1]] # 17
      - [-1, 3, CSPStage, [512]] # 18

      - [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 19
      - [[-1, 4], 1, Concat, [1]] # 20
      - [-1, 3, CSPStage, [256]] # 21

      - [-1, 1, Conv, [256, 3, 2]] # 22
      - [[-1, 18], 1, Concat, [1]] # 23
      - [-1, 3, CSPStage, [512]] # 24

      - [18, 1, Conv, [256, 3, 2]] # 25
      - [24, 1, Conv, [256, 3, 2]] # 26
      - [[14, 25, -1], 1, Concat, [1]] # 27
      - [-1, 3, CSPStage, [1024]] # 28

      - [[21, 24, 28], 1, Detect, [nc]] # Detect(P3, P4, P5)
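
Putting the reported summaries side by side makes the cost of each version concrete. The numbers are copied from the two training info lines and from the scale 'n' comment in the yaml files; treating that comment as the unmodified YOLO11n baseline is my assumption.

```python
BASELINE = {"params": 2_624_080, "gflops": 6.6}  # scale 'n' comment in the yaml (assumed baseline)
VERSIONS = {
    "YOLO11-RepGFPN-1": {"params": 2_903_259, "gflops": 6.9},
    "YOLO11-RepGFPN-2": {"params": 3_734_203, "gflops": 8.5},
}


def percent_increase(new, old):
    """Relative increase of new over old, in percent."""
    return (new - old) / old * 100


overhead = {
    name: {key: round(percent_increase(stats[key], BASELINE[key]), 1) for key in BASELINE}
    for name, stats in VERSIONS.items()
}
for name, extra in overhead.items():
    print(f"{name}: +{extra['params']}% parameters, +{extra['gflops']}% GFLOPs")
```

On these figures the first version adds roughly a tenth more parameters, while the CSPStage version adds over 40%, so it is worth checking which one pays off on your own dataset.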

Training code

    import warnings

    from ultralytics import YOLO

    warnings.filterwarnings('ignore')

    if __name__ == '__main__':
        # Point this at one of the RepGFPN yaml files from Section 5.
        model = YOLO(r'replace with the path to your model yaml file')
        # model.load('yolo11n.pt')  # loading pretrain weights
        model.train(data=r'replace with the path to your dataset yaml file',
                    # For other tasks, find 'task' in 'ultralytics/cfg/default.yaml' and change it to detect, segment, classify, or pose
                    cache=False,
                    imgsz=640,
                    epochs=150,
                    single_cls=False,  # whether this is single-class detection
                    batch=4,
                    close_mosaic=10,
                    workers=0,
                    device='0',
                    optimizer='SGD',  # using SGD
                    # resume='',  # to resume training, set this to the path of last.pt
                    amp=False,  # turn amp off if the training loss becomes NaN
                    project='runs/train',
                    name='exp',
                    )


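6. Screenshot of a Successful Run

Below is a screenshot of a successful run, as proof that the mechanism I added runs correctly.


7. Summary

That wraps up the main content of this article. I would also like to recommend my column on effective YOLOv11 improvements. The column is newly opened and currently has an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also fill in some of the older improvement mechanisms. If this article helped you, please subscribe to the column and follow along for more updates.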