
YOLOv11 Improvements - Neck Edition - An Original HFPN: Fusing Multi-Level Features in YOLOv11 with the Hierarchical Feature Fusion Block (HFFB) (Exclusive Original Work)

I. Introduction

This article presents my latest improvement: using the Hierarchical Feature Fusion Block (HFFB) to redesign the neck of YOLOv11, which I call HFPN. The module fuses local features, global features, and intermediate features into a single representation that supports YOLOv11's detection. I designed three configurations, each targeting a different object scale (large, small, and standard), so you can choose the one that fits your dataset. The content of this article is my own original work.


I. Introduction

II. Principle

III. Core Code

IV. How to Add the Module

4.1 Modification 1

4.2 Modification 2

4.3 Modification 3

4.4 Modification 4

V. Training

5.1 YAML File 1

5.2 Training Script

5.3 Training Screenshot

VI. Summary


II. Principle

Official paper: click here to open the paper

Official code: click here to open the repository


HiFuse adopts a three-branch hierarchical multi-scale feature fusion network that combines the strengths of CNNs and Transformers:

Local branch (Local Feature Block): extracts local features with 3×3 depthwise separable convolutions.

Global branch (Global Feature Block): based on the Swin Transformer, uses window-based multi-head self-attention (W-MSA) to capture global information.

Adaptive hierarchical feature fusion block (HFF Block): fuses local and global features across levels; its components (sketched as equations after this list) are:

Spatial attention (SA): enhances local detail.

Channel attention (CA): emphasizes semantically important channels.

Inverted residual MLP (IRMLP): mitigates vanishing gradients and improves information flow.

Shortcut connections: stabilize the fusion and improve its effect.
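In my own notation (matching what the forward pass in Section III actually computes, in the CBAM style), the two attentions act on a feature map $x \in \mathbb{R}^{C \times H \times W}$ as

$$\mathrm{SA}(x)=\sigma\!\left(f^{7\times 7}\big([\mathrm{max}_c(x)\,;\,\mathrm{avg}_c(x)]\big)\right)\odot x,\qquad \mathrm{CA}(x)=\sigma\big(\mathrm{MLP}(\mathrm{MaxPool}(x))+\mathrm{MLP}(\mathrm{AvgPool}(x))\big)\odot x,$$

where $\mathrm{max}_c/\mathrm{avg}_c$ pool across the channel axis, $f^{7\times 7}$ is a 7×7 convolution, MaxPool/AvgPool are global spatial poolings, the MLP is a shared two-layer bottleneck with reduction ratio $r$ (16 in the code), $\sigma$ is the sigmoid, and $\odot$ is element-wise multiplication.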


III. Core Code

See Section IV for how to use this core code!

import torch
import torch.nn as nn
import torch.nn.functional as F


class Conv(nn.Module):
    def __init__(self, inp_dim, out_dim, kernel_size=3, stride=1, bn=False, relu=True, bias=True, group=1):
        super(Conv, self).__init__()
        self.inp_dim = inp_dim
        # pass `groups` through so grouped/depthwise convolutions (e.g. in IRMLP) take effect
        self.conv = nn.Conv2d(inp_dim, out_dim, kernel_size, stride,
                              padding=(kernel_size - 1) // 2, bias=bias, groups=group)
        self.relu = None
        self.bn = None
        if relu:
            self.relu = nn.ReLU(inplace=True)
        if bn:
            self.bn = nn.BatchNorm2d(out_dim)

    def forward(self, x):
        assert x.size()[1] == self.inp_dim, "{} {}".format(x.size()[1], self.inp_dim)
        x = self.conv(x)
        if self.bn is not None:
            x = self.bn(x)
        if self.relu is not None:
            x = self.relu(x)
        return x


def drop_path_f(x, drop_prob: float = 0., training: bool = False):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks).

    This is the same as the DropConnect impl I created for EfficientNet, etc networks, however,
    the original name is misleading as 'Drop Connect' is a different form of dropout in a separate paper...
    See discussion: https://github.com/tensorflow/tpu/issues/494#issuecomment-532968956 ... I've opted for
    changing the layer and argument names to 'drop path' rather than mix DropConnect as a layer name and use
    'survival rate' as the argument.
    """
    if drop_prob == 0. or not training:
        return x
    keep_prob = 1 - drop_prob
    shape = (x.shape[0],) + (1,) * (x.ndim - 1)  # work with diff dim tensors, not just 2D ConvNets
    random_tensor = keep_prob + torch.rand(shape, dtype=x.dtype, device=x.device)
    random_tensor.floor_()  # binarize
    output = x.div(keep_prob) * random_tensor
    return output


class DropPath(nn.Module):
    """Drop paths (Stochastic Depth) per sample (when applied in main path of residual blocks)."""

    def __init__(self, drop_prob=None):
        super(DropPath, self).__init__()
        self.drop_prob = drop_prob

    def forward(self, x):
        return drop_path_f(x, self.drop_prob, self.training)


##### Local Feature Block Component #####
class LayerNorm(nn.Module):
    r""" LayerNorm that supports two data formats: channels_last (default) or channels_first.
    The ordering of the dimensions in the inputs. channels_last corresponds to inputs with
    shape (batch_size, height, width, channels) while channels_first corresponds to inputs
    with shape (batch_size, channels, height, width).
    """

    def __init__(self, normalized_shape, eps=1e-6, data_format="channels_last"):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(normalized_shape), requires_grad=True)
        self.bias = nn.Parameter(torch.zeros(normalized_shape), requires_grad=True)
        self.eps = eps
        self.data_format = data_format
        if self.data_format not in ["channels_last", "channels_first"]:
            raise ValueError(f"unsupported data format '{self.data_format}'")
        self.normalized_shape = (normalized_shape,)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.data_format == "channels_last":
            return F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        elif self.data_format == "channels_first":
            # [batch_size, channels, height, width]
            mean = x.mean(1, keepdim=True)
            var = (x - mean).pow(2).mean(1, keepdim=True)
            x = (x - mean) / torch.sqrt(var + self.eps)
            x = self.weight[:, None, None] * x + self.bias[:, None, None]
            return x


class Local_block(nn.Module):
    r""" Local Feature Block. There are two equivalent implementations:
    (1) DwConv -> LayerNorm (channels_first) -> 1x1 Conv -> GELU -> 1x1 Conv; all in (N, C, H, W)
    (2) DwConv -> Permute to (N, H, W, C); LayerNorm (channels_last) -> Linear -> GELU -> Linear; Permute back
    We use (2) as we find it slightly faster in PyTorch

    Args:
        dim (int): Number of input channels.
        drop_rate (float): Stochastic depth rate. Default: 0.0
    """

    def __init__(self, dim, drop_rate=0.):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=3, padding=1, groups=dim)  # depthwise conv
        self.norm = LayerNorm(dim, eps=1e-6, data_format="channels_last")
        self.pwconv = nn.Linear(dim, dim)  # pointwise/1x1 convs, implemented with linear layers
        self.act = nn.GELU()
        self.drop_path = DropPath(drop_rate) if drop_rate > 0. else nn.Identity()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        shortcut = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)  # [N, C, H, W] -> [N, H, W, C]
        x = self.norm(x)
        x = self.pwconv(x)
        x = self.act(x)
        x = x.permute(0, 3, 1, 2)  # [N, H, W, C] -> [N, C, H, W]
        x = shortcut + self.drop_path(x)
        return x


class IRMLP(nn.Module):
    """Inverted Residual MLP: depthwise 3x3 -> expand 1x1 (4x) -> project 1x1."""

    def __init__(self, inp_dim, out_dim):
        super(IRMLP, self).__init__()
        self.conv1 = Conv(inp_dim, inp_dim, 3, relu=False, bias=False, group=inp_dim)
        self.conv2 = Conv(inp_dim, inp_dim * 4, 1, relu=False, bias=False)
        self.conv3 = Conv(inp_dim * 4, out_dim, 1, relu=False, bias=False, bn=True)
        self.gelu = nn.GELU()
        self.bn1 = nn.BatchNorm2d(inp_dim)

    def forward(self, x):
        residual = x
        out = self.conv1(x)
        out = self.gelu(out)
        out += residual
        out = self.bn1(out)
        out = self.conv2(out)
        out = self.gelu(out)
        out = self.conv3(out)
        return out


# Hierarchical Feature Fusion Block
class HFFB(nn.Module):
    def __init__(self, ch_1, r_2=16, drop_rate=0.):
        super(HFFB, self).__init__()
        ch_2 = ch_1
        ch_int = ch_1
        ch_out = ch_2

        # channel attention for the transformer branch (SE-style MLP over max + avg pooled features)
        self.maxpool = nn.AdaptiveMaxPool2d(1)
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.se = nn.Sequential(
            nn.Conv2d(ch_2, ch_2 // r_2, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(ch_2 // r_2, ch_2, 1, bias=False)
        )
        self.sigmoid = nn.Sigmoid()

        # spatial attention for the ConvNeXt (local) branch
        self.spatial = Conv(2, 1, 7, bn=True, relu=False, bias=False)

        self.W_l = Conv(ch_1, ch_int, 1, bn=True, relu=False)
        self.W_g = Conv(ch_2, ch_int, 1, bn=True, relu=False)
        self.Avg = nn.AvgPool2d(2, stride=2)
        self.Updim = Conv(ch_int // 2, ch_int, 1, bn=True, relu=True)

        self.norm1 = LayerNorm(ch_int * 3, eps=1e-6, data_format="channels_first")
        self.norm2 = LayerNorm(ch_int * 2, eps=1e-6, data_format="channels_first")
        self.norm3 = LayerNorm(ch_1 + ch_2 + ch_int, eps=1e-6, data_format="channels_first")
        self.W3 = Conv(ch_int * 3, ch_int, 1, bn=True, relu=False)
        self.W = Conv(ch_int * 2, ch_int, 1, bn=True, relu=False)

        self.gelu = nn.GELU()
        self.residual = IRMLP(ch_1 + ch_2 + ch_int, ch_out)
        self.drop_path = DropPath(drop_rate) if drop_rate > 0. else nn.Identity()

    def forward(self, x):
        l, g, f = x
        W_local = self.W_l(l)   # local feature from Local Feature Block
        W_global = self.W_g(g)  # global feature from Global Feature Block
        if f is not None:
            W_f = self.Updim(f)
            W_f = self.Avg(W_f)
            shortcut = W_f
            X_f = torch.cat([W_f, W_local, W_global], 1)
            X_f = self.norm1(X_f)
            X_f = self.W3(X_f)
            X_f = self.gelu(X_f)
        else:
            shortcut = 0
            X_f = torch.cat([W_local, W_global], 1)
            X_f = self.norm2(X_f)
            X_f = self.W(X_f)
            X_f = self.gelu(X_f)

        # spatial attention for ConvNeXt branch
        l_jump = l
        max_result, _ = torch.max(l, dim=1, keepdim=True)
        avg_result = torch.mean(l, dim=1, keepdim=True)
        result = torch.cat([max_result, avg_result], 1)
        l = self.spatial(result)
        l = self.sigmoid(l) * l_jump

        # channel attention for transformer branch
        g_jump = g
        max_result = self.maxpool(g)
        avg_result = self.avgpool(g)
        max_out = self.se(max_result)
        avg_out = self.se(avg_result)
        g = self.sigmoid(max_out + avg_out) * g_jump

        fuse = torch.cat([g, l, X_f], 1)
        fuse = self.norm3(fuse)
        fuse = self.residual(fuse)
        fuse = shortcut + self.drop_path(fuse)
        return fuse
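As a quick sanity check of the block itself, here is a minimal shape test you can append at the bottom of the same file (the sizes are my arbitrary choice; note that the third input f must carry half the channels and twice the resolution of the other two, because Updim projects ch_int // 2 -> ch_int and Avg then halves the spatial size):

if __name__ == '__main__':
    l = torch.randn(1, 256, 40, 40)   # local / lower-level feature
    g = torch.randn(1, 256, 40, 40)   # global / higher-level feature
    f = torch.randn(1, 128, 80, 80)   # previous fusion output: half channels, double resolution
    block = HFFB(ch_1=256)
    out = block([l, g, f])
    print(out.shape)  # expected: torch.Size([1, 256, 40, 40])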


IV. How to Add the Module

4.1 Modification 1

Step one, as always, is to create the file. Go to the ultralytics/nn folder and create a directory named 'Addmodules' (if you are using the files from the group, it already exists and there is no need to create it!). Then create a new .py file inside it and paste the core code from Section III into it; the resulting layout is sketched below.
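For reference, a sketch of the resulting layout (the file name HFPN.py is my own choice, not fixed; any name works as long as the import in Modification 2 matches):

ultralytics/
└── nn/
    └── Addmodules/
        ├── __init__.py   # created in Modification 2
        └── HFPN.py       # paste the core code from Section III here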


4.2 Modification 2

Step two: create a new py file named '__init__.py' in the same directory (if you are using the files from the group, it already exists and there is no need to create it), then import our module inside it, as shown below.

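A minimal sketch of the __init__.py contents, assuming the core code was saved as HFPN.py in Modification 1 (adjust the module path if you chose a different file name):

from .HFPN import HFFB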


4.3 Modification 3

Step three: open 'ultralytics/nn/tasks.py' and import and register our module there (if you are using the files from the group, the import already exists, so skip straight to step four).

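A sketch of the import to add near the top of 'ultralytics/nn/tasks.py', alongside its existing imports (the exact form depends on the names you chose above):

from ultralytics.nn.Addmodules import HFFB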


4.4 Modification 4

In the same file, 'ultralytics/nn/tasks.py', add the module inside the parse_model function (locate the spot by the surrounding code; if you can't find it, there is a video walkthrough in the group).

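A hedged sketch of the registration branch inside parse_model (the surrounding if/elif chain differs between ultralytics versions, so locate it by the neighboring module branches; HFFB outputs the same number of channels as its first input, since ch_out = ch_1 in the code above):

        elif m is HFFB:
            c2 = ch[f[0]]       # output channels = channels of the first 'from' layer
            args = [c2, *args]  # HFFB receives its channel count as the first constructor argument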


That completes the modifications. You can copy the yaml file below and run it. If you have trouble adding the module, contact the author to join the group and watch the video tutorial.


V. Training


5.1 YAML File 1

Training info: YOLO11-HFPN summary: 362 layers, 11,554,950 parameters, 11,554,934 gradients, 13.6 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9
  - [-1, 2, C2PSA, [1024]] # 10

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 13

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 13], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 10], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)

  # The three pairs below target different object sizes, in order: large, medium, small.
  # Uncomment exactly one pair; the small-object pair is active by default.
  # - [[22, 10, 19], 1, HFFB, []] # 23 (P5/32-large)
  # - [[16, 19, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)

  # - [[19, 6, 16], 1, HFFB, []] # 23 (P4/16-medium)
  # - [[16, 23, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)

  - [[16, 3, 1], 1, HFFB, []] # 23 (P3/8-small)
  - [[23, 19, 22], 1, Detect, [nc]] # Detect(P3, P4, P5)
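Before a full training run, it can be worth a quick sanity check that the yaml parses and the printed summary matches the training info above (the file name yolo11-HFPN.yaml below is simply whatever you saved this config as):

from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('yolo11-HFPN.yaml')  # hypothetical path to the config above
    model.info()  # prints layers / parameters / gradients / GFLOPs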


5.2 Training Script

Create a py file, paste in the code below, set your own file paths, and run it.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('replace with the path to your model config yaml')
    # To switch model scales: the yaml name above can be changed, e.g. yolov11s.yaml uses v11s.
    # Likewise, for a modified config named yolov11-XXX.yaml, use yolov11l-XXX.yaml for another scale
    # (change the name passed to YOLO above, not the config file itself)!
    # model.load('yolo11n.pt')  # whether to load pretrained weights; for research I advise against it, as it often makes accuracy gains hard to demonstrate
    model.train(data=r"replace with the path to your dataset config",
                # For other tasks, find task in 'ultralytics/cfg/default.yaml' and set it to detect, segment, classify, or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=16,
                close_mosaic=0,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='runs/train/exp21/weights/last.pt',  # to resume training, point this at your last.pt
                amp=False,  # disable AMP if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )


5.3 Training Screenshot


VI. Summary

That wraps up the main content of this article. Here I would like to recommend my column on effective YOLOv11 improvements. It is newly opened, with an average quality score of 98; going forward I will reproduce papers from the latest top-tier conferences and also supplement some older improvement mechanisms. If this article helped you, subscribe to the column and follow for more updates to come~