
YOLOv11 Improvements, Neck Series: Combining SDI with ASF-YOLO into a New Feature Fusion Network (Efficient Gains for Segmentation, Secondary Innovation)

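1. Introduction

The improvement in this post pairs the multi-level feature fusion module SDI with the Neck of ASF-YOLO to form a new Neck structure, a secondary innovation built from two existing mechanisms. The main idea of the SDI module is to integrate the hierarchical feature maps produced by the encoder so that both the semantic information and the detail information in the image are strengthened. ASF-YOLO, released in December 2023, was designed specifically for cell instance segmentation; by combining spatial and scale features, it improves both accuracy and speed on cell images. In the reported experiments on the 2018 Data Science Bowl dataset, ASF-YOLO reached a box mAP of 0.91, a mask mAP of 0.887, and an inference speed of 47.3 FPS. The fused structure is also easy to draw as an architecture diagram when you write a paper, and because the SDI module delivers efficient gains in segmentation, the combination is well suited to segmentation tasks.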

You are welcome to subscribe to my column and learn YOLO together!




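2. Principles

If you want the principles behind each mechanism, see my separate posts on them.

ASF-YOLO: see my post on the principles of ASF-YOLO.

SDI: see my post on the principles of SDI.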


3. Core Code of ASF-YOLO

See Section 4 for how to use it.

    import math

    import torch
    import torch.nn as nn
    import torch.nn.functional as F


    def autopad(k, p=None, d=1):  # kernel, padding, dilation
        # Pad to 'same' shape outputs
        if d > 1:
            k = d * (k - 1) + 1 if isinstance(k, int) else [d * (x - 1) + 1 for x in k]  # actual kernel-size
        if p is None:
            p = k // 2 if isinstance(k, int) else [x // 2 for x in k]  # auto-pad
        return p


    class Conv(nn.Module):
        # Standard convolution with args(ch_in, ch_out, kernel, stride, padding, groups, dilation, activation)
        default_act = nn.SiLU()  # default activation

        def __init__(self, c1, c2, k=1, s=1, p=None, g=1, d=1, act=True):
            super().__init__()
            self.conv = nn.Conv2d(c1, c2, k, s, autopad(k, p, d), groups=g, dilation=d, bias=False)
            self.bn = nn.BatchNorm2d(c2)
            self.act = self.default_act if act is True else act if isinstance(act, nn.Module) else nn.Identity()

        def forward(self, x):
            return self.act(self.bn(self.conv(x)))

        def forward_fuse(self, x):
            return self.act(self.conv(x))


    class Zoom_cat(nn.Module):
        def __init__(self):
            super().__init__()
            # self.conv_l_post_down = Conv(in_dim, 2*in_dim, 3, 1, 1)

        def forward(self, x):
            """l, m, s are the large, medium and small scales; all three are merged at the scale of m."""
            l, m, s = x[0], x[1], x[2]
            tgt_size = m.shape[2:]
            # Downsample the large map with max pooling plus average pooling
            l = F.adaptive_max_pool2d(l, tgt_size) + F.adaptive_avg_pool2d(l, tgt_size)
            # l = self.conv_l_post_down(l)
            # m = self.conv_m(m)
            # s = self.conv_s_pre_up(s)
            # Upsample the small map to the size of m
            s = F.interpolate(s, m.shape[2:], mode='nearest')
            # s = self.conv_s_post_up(s)
            lms = torch.cat([l, m, s], dim=1)
            return lms


    class ScalSeq(nn.Module):
        def __init__(self, inc, channel):
            super(ScalSeq, self).__init__()
            self.conv0 = Conv(inc[0], channel, 1)
            self.conv1 = Conv(inc[1], channel, 1)
            self.conv2 = Conv(inc[2], channel, 1)
            self.conv3d = nn.Conv3d(channel, channel, kernel_size=(1, 1, 1))
            self.bn = nn.BatchNorm3d(channel)
            self.act = nn.LeakyReLU(0.1)
            self.pool_3d = nn.MaxPool3d(kernel_size=(3, 1, 1))

        def forward(self, x):
            p3, p4, p5 = x[0], x[1], x[2]
            p3 = self.conv0(p3)
            # Project P4 and P5 to the same width, then upsample them to the P3 size
            p4_2 = self.conv1(p4)
            p4_2 = F.interpolate(p4_2, p3.size()[2:], mode='nearest')
            p5_2 = self.conv2(p5)
            p5_2 = F.interpolate(p5_2, p3.size()[2:], mode='nearest')
            # Stack the three maps along a new scale axis: (B, C, 3, H, W)
            p3_3d = torch.unsqueeze(p3, -3)
            p4_3d = torch.unsqueeze(p4_2, -3)
            p5_3d = torch.unsqueeze(p5_2, -3)
            combine = torch.cat([p3_3d, p4_3d, p5_3d], dim=2)
            conv_3d = self.conv3d(combine)
            bn = self.bn(conv_3d)
            act = self.act(bn)
            # Max-pool over the scale axis and drop it again: (B, C, H, W)
            x = self.pool_3d(act)
            x = torch.squeeze(x, 2)
            return x


    class Add(nn.Module):
        # Element-wise sum of two tensors with identical shapes (ch is accepted but unused)
        def __init__(self, ch=256):
            super().__init__()

        def forward(self, x):
            input1, input2 = x[0], x[1]
            x = input1 + input2
            return x


    class channel_att(nn.Module):
        def __init__(self, channel, b=1, gamma=2):
            super(channel_att, self).__init__()
            # Kernel size grows with log2(channel) and is forced to be odd
            kernel_size = int(abs((math.log(channel, 2) + b) / gamma))
            kernel_size = kernel_size if kernel_size % 2 else kernel_size + 1
            self.avg_pool = nn.AdaptiveAvgPool2d(1)
            self.conv = nn.Conv1d(1, 1, kernel_size=kernel_size, padding=(kernel_size - 1) // 2, bias=False)
            self.sigmoid = nn.Sigmoid()

        def forward(self, x):
            y = self.avg_pool(x)   # (B, C, 1, 1)
            y = y.squeeze(-1)      # (B, C, 1)
            y = y.transpose(-1, -2)  # (B, 1, C)
            y = self.conv(y).transpose(-1, -2).unsqueeze(-1)  # back to (B, C, 1, 1)
            y = self.sigmoid(y)
            return x * y.expand_as(x)


    class local_att(nn.Module):
        def __init__(self, channel, reduction=16):
            super(local_att, self).__init__()
            self.conv_1x1 = nn.Conv2d(in_channels=channel, out_channels=channel // reduction, kernel_size=1, stride=1,
                                      bias=False)
            self.relu = nn.ReLU()
            self.bn = nn.BatchNorm2d(channel // reduction)
            self.F_h = nn.Conv2d(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1,
                                 bias=False)
            self.F_w = nn.Conv2d(in_channels=channel // reduction, out_channels=channel, kernel_size=1, stride=1,
                                 bias=False)
            self.sigmoid_h = nn.Sigmoid()
            self.sigmoid_w = nn.Sigmoid()

        def forward(self, x):
            _, _, h, w = x.size()
            # Pool along width and along height, then process both strips together
            x_h = torch.mean(x, dim=3, keepdim=True).permute(0, 1, 3, 2)
            x_w = torch.mean(x, dim=2, keepdim=True)
            x_cat_conv_relu = self.relu(self.bn(self.conv_1x1(torch.cat((x_h, x_w), 3))))
            x_cat_conv_split_h, x_cat_conv_split_w = x_cat_conv_relu.split([h, w], 3)
            s_h = self.sigmoid_h(self.F_h(x_cat_conv_split_h.permute(0, 1, 3, 2)))
            s_w = self.sigmoid_w(self.F_w(x_cat_conv_split_w))
            out = x * s_h.expand_as(x) * s_w.expand_as(x)
            return out


    class attention_model(nn.Module):
        # Channel attention on the first input, element-wise sum with the second, then local attention
        def __init__(self, ch=256):
            super().__init__()
            self.channel_att = channel_att(ch)
            self.local_att = local_att(ch)

        def forward(self, x):
            input1, input2 = x[0], x[1]
            input1 = self.channel_att(input1)
            x = input1 + input2
            x = self.local_att(x)
            return x
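
The 1D convolution in `channel_att` picks its kernel size from the channel count, and the rule is easy to check by hand. Below is a minimal sketch in plain Python that mirrors the two kernel-size lines of `channel_att.__init__`; the helper name `adaptive_kernel_size` is my own and does not appear in the core code.

```python
import math


def adaptive_kernel_size(channel, b=1, gamma=2):
    """Kernel size as computed in channel_att: grows with log2(channel), forced to be odd."""
    k = int(abs((math.log(channel, 2) + b) / gamma))
    return k if k % 2 else k + 1


# Print the kernel size for a few typical channel counts.
for c in (64, 256, 1024):
    k = adaptive_kernel_size(c)
    # padding=(k - 1) // 2 keeps the channel axis the same length after the Conv1d
    print(f"channels={c:5d}  kernel_size={k}  padding={(k - 1) // 2}")
```

For 64 channels the rule gives a kernel of size 3, and for 256 or 1024 channels it gives 5, so the attention window over neighbouring channels widens only slowly as the layer gets wider.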


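4. Step by Step: Adding ASF-YOLO

4.1 Modification 1

First, create the files. Under the ultralytics/nn folder, create a directory named 'Addmodules' (if you use the files from my group, it already exists and you do not need to create it). Then create a new .py file inside it and paste the core code in.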


4.2 Modification 2

Second, in that directory create a new .py file named '__init__.py' (already present if you use the files from my group), and import the new modules inside it.
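
How the '__init__.py' import exposes the module classes can be tried out without touching a real checkout. The sketch below assumes the common star-import pattern and builds a throwaway package in a temporary directory; the file name `asf_demo.py` and the empty placeholder classes are hypothetical stand-ins for the file that holds the core code.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical layout, created in a temp dir:
#   <root>/Addmodules/__init__.py
#   <root>/Addmodules/asf_demo.py   (stand-in for the file holding the core code)
root = Path(tempfile.mkdtemp())
pkg = root / "Addmodules"
pkg.mkdir()

# The module file lists its public names in __all__.
(pkg / "asf_demo.py").write_text(
    "__all__ = ['Zoom_cat', 'ScalSeq']\n"
    "class Zoom_cat: ...\n"
    "class ScalSeq: ...\n"
    "class _Helper: ...\n"
)

# __init__.py re-exports whatever the module declares public.
(pkg / "__init__.py").write_text("from .asf_demo import *\n")

sys.path.insert(0, str(root))
addmodules = importlib.import_module("Addmodules")

print(hasattr(addmodules, "Zoom_cat"), hasattr(addmodules, "_Helper"))
```

A module that defines `__all__`, as the SDI file in Section 5 does, exports only the listed names through a star import. A module without `__all__`, like the ASF-YOLO core code, exports every top-level name that does not start with an underscore.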



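4.3 Modification 3

Third, open 'ultralytics/nn/tasks.py' and import and register our modules there (if you use the files from my group, the import is already in place and you can go straight to step four).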



4.4 Modification 4

Add the following branches inside parse_model, as I have done.

    # ------------------------------ASF-YOLO--------------------------------
    elif m is Zoom_cat:
        c2 = sum(ch[x] for x in f)  # concatenation: input channels add up
    elif m is Add:
        c2 = ch[f[-1]]  # element-wise sum: same width as the inputs
    elif m is ScalSeq:
        c1 = [ch[x] for x in f]
        c2 = make_divisible(args[0] * width, 8)
        args = [c1, c2]
    elif m is attention_model:
        c2 = ch[f[-1]]  # set explicitly so c2 is not carried over from the previous layer
        args = [c2]
    # ------------------------------ASF-YOLO--------------------------------
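
To see what these branches compute, here is a minimal stand-alone sketch of the channel bookkeeping in plain Python. The function `out_channels` and the local `make_divisible` are illustrative stand-ins rather than Ultralytics code, and the channel list `ch` is what I derived for the `n` scale (width 0.25) up to layer 12 of the yaml in Section 7, so treat those numbers as an assumption.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor (local stand-in for the library helper)."""
    return math.ceil(x / divisor) * divisor


def out_channels(module, f, ch, args, width):
    """Return (args, c2): constructor args and output channels for each ASF-YOLO module."""
    if module == "Zoom_cat":         # concatenation: channels add up
        return args, sum(ch[x] for x in f)
    if module == "Add":              # element-wise sum: width of the last input
        return args, ch[f[-1]]
    if module == "ScalSeq":          # width-scaled target, input widths passed as a list
        c2 = make_divisible(args[0] * width, 8)
        return [[ch[x] for x in f], c2], c2
    if module == "attention_model":  # same width in and out
        return [ch[f[-1]]], ch[f[-1]]
    raise ValueError(f"unknown module: {module}")


# Assumed output channels of layers 0..12 at scale n (width=0.25).
ch = [16, 32, 64, 64, 128, 128, 128, 256, 256, 256, 256, 128, 128]

print(out_channels("Zoom_cat", [-1, 6, -2], ch, [], 0.25))
print(out_channels("ScalSeq", [4, 6, 8], ch, [256], 0.25))
```

Under these assumptions the first `Zoom_cat` concatenates three 128-channel maps into 384 channels, and `ScalSeq, [256]` is built as `ScalSeq([128, 128, 256], 64)`.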



5. Core Code of SDI

See Section 7 for how to use the code!

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    __all__ = ['SDI']


    class SDI(nn.Module):
        def __init__(self, channel):
            super().__init__()
            # One 3x3 conv per input, each projecting to the channel count of the first input
            self.convs = nn.ModuleList(
                [nn.Conv2d(c, channel[0], kernel_size=3, stride=1, padding=1) for c in channel])

        def forward(self, xs):
            ans = torch.ones_like(xs[0])
            target_size = xs[0].shape[-2:]  # (H, W) of the first input
            for i, x in enumerate(xs):
                # Compare width with width (the original compared width with the target height)
                if x.shape[-1] > target_size[-1]:
                    x = F.adaptive_avg_pool2d(x, (target_size[0], target_size[1]))
                elif x.shape[-1] < target_size[-1]:
                    x = F.interpolate(x, size=(target_size[0], target_size[1]),
                                      mode='bilinear', align_corners=True)
                # Fuse by element-wise multiplication
                ans = ans * self.convs[i](x)
            return ans
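
The branch logic in `SDI.forward` decides, per input, whether the map is pooled down, upsampled, or passed through before the element-wise product. The sketch below replays only that decision on (H, W) pairs in plain Python; `sdi_resize_plan` is a hypothetical helper of mine, and it compares the width of each input against the width of the first input.

```python
def sdi_resize_plan(shapes):
    """For each (H, W) input, name the resize step needed to reach the size of shapes[0]."""
    _, target_w = shapes[0]
    plan = []
    for _, w in shapes:
        if w > target_w:
            plan.append("adaptive_avg_pool2d")  # larger map: pool down to the target
        elif w < target_w:
            plan.append("interpolate")          # smaller map: bilinear upsample
        else:
            plan.append("keep")                 # already at the target size
    return plan


# Three pyramid levels, with the 40x40 map as the target.
print(sdi_resize_plan([(40, 40), (80, 80), (20, 20)]))
# Two maps at the same resolution: no resizing at all.
print(sdi_resize_plan([(40, 40), (40, 40)]))
```

In the yaml files of Section 7, each SDI layer receives two maps of the same stride, so neither resize branch fires there and the layer reduces to the product of two 3x3 convolutions.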


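6. Step by Step: Adding the SDI Mechanism


6.1 Modification 1

First, create the files. Under the ultralytics/nn folder, create a directory named 'Addmodules' (if you use the files from my group, it already exists and you do not need to create it). Then create a new .py file inside it and paste the core code in.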


6.2 Modification 2

Second, in that directory create a new .py file named '__init__.py' (already present if you use the files from my group), and import the SDI module inside it.


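6.3 Modification 3

Third, open 'ultralytics/nn/tasks.py' and import and register our module there (if you use the files from my group, the import is already in place and you can go straight to step four).

From today on, all my tutorials will follow this same format, because I assume you are working from the files in my group!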


6.4 Modification 4

Add the following branch inside parse_model, as I have done.

    elif m is SDI:
        c2 = ch[f[0]]  # SDI outputs channel[0] channels, i.e. the width of its first input
        args = [[ch[x] for x in f]]

That completes the modifications. You can now copy the yaml files below and run them.


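7. The Fused yaml Files

Copy one of the yaml files below and run it in whatever way you are used to. I will not cover how to run it here; if you are unsure, watch the videos in my group or read my running tutorial.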
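
For readers who have no preferred way yet, one common route is the Python API of the `ultralytics` package. The sketch below is only an illustration under assumptions: the function name `train_fused_model` is mine, the yaml file name in the comment is hypothetical, and `coco8.yaml` is used as a small placeholder dataset config. It also assumes the modules from Sections 4 and 6 are already registered.

```python
def train_fused_model(cfg_path, data="coco8.yaml", epochs=100, imgsz=640):
    """Build the model from a yaml file and start training (sketch; paths are placeholders)."""
    # Imported lazily so this sketch can be loaded even where ultralytics is not installed.
    from ultralytics import YOLO

    model = YOLO(cfg_path)  # a .yaml path builds an untrained model from that config
    return model.train(data=data, epochs=epochs, imgsz=imgsz)


# Example call (hypothetical file name; save one of the yaml versions under a name you choose):
# train_fused_model("yolo11-ASFYOLO-SDI-1.yaml")
```

Swap `data` for your own dataset yaml; the other arguments are ordinary training settings that you can adjust.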

7.1 yaml File, Version 1

Training summary for this version: YOLO11-ASFYOLO-SDI-1 summary: 397 layers, 3,293,634 parameters, 3,293,618 gradients, 8.2 GFLOPs

    # Ultralytics YOLO 🚀, AGPL-3.0 license
    # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
      m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
      l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
      x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

    # YOLO11n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2, [1024, True]]
      - [-1, 1, SPPF, [1024, 5]] # 9
      - [-1, 2, C2PSA, [1024]] # 10

    # YOLO11n head
    head:
      - [-1, 1, Conv, [512, 1, 1]] # 11
      - [4, 1, Conv, [512, 1, 1]] # 12
      - [[-1, 6, -2], 1, Zoom_cat, []] # 13 cat backbone P4
      - [-1, 3, C3k2, [512, False]] # 14

      - [-1, 1, Conv, [256, 1, 1]] # 15
      - [2, 1, Conv, [256, 1, 1]] # 16
      - [[-1, 4, -2], 1, Zoom_cat, []] # 17 cat backbone P3
      - [-1, 3, C3k2, [256, False]] # 18 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]] # 19
      - [[-1, 15], 1, SDI, []] # 20 SDI fusion with head P4
      - [-1, 3, C3k2, [512, False]] # 21 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]] # 22
      - [[-1, 11], 1, SDI, []] # 23 SDI fusion with head P5
      - [-1, 3, C3k2, [1024, True]] # 24 (P5/32-large)

      - [[4, 6, 8], 1, ScalSeq, [256]] # 25 args: [channel]
      - [[18, -1], 1, Add, [64]] # 26

      - [[26, 21, 24], 1, Detect, [nc]] # Detect(P3, P4, P5)
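
A quick way to sanity-check the wiring of this head is to trace the cumulative stride of every layer. The sketch below does that in plain Python for the layer list above; `trace_strides` and the simplified `(from, module, conv_stride)` tuples are my own illustration, and the per-module rules are assumptions read off the core code (`Zoom_cat` resizes to its middle input, while `SDI`, `ScalSeq` and `Add` follow their first input).

```python
# (from, module, conv_stride) per layer; conv_stride is only meaningful for Conv.
LAYERS = [
    (-1, "Conv", 2), (-1, "Conv", 2), (-1, "C3k2", 1), (-1, "Conv", 2), (-1, "C3k2", 1),
    (-1, "Conv", 2), (-1, "C3k2", 1), (-1, "Conv", 2), (-1, "C3k2", 1), (-1, "SPPF", 1),
    (-1, "C2PSA", 1),                                                          # 0-10 backbone
    (-1, "Conv", 1), (4, "Conv", 1), ([-1, 6, -2], "Zoom_cat", 1), (-1, "C3k2", 1),   # 11-14
    (-1, "Conv", 1), (2, "Conv", 1), ([-1, 4, -2], "Zoom_cat", 1), (-1, "C3k2", 1),   # 15-18
    (-1, "Conv", 2), ([-1, 15], "SDI", 1), (-1, "C3k2", 1),                           # 19-21
    (-1, "Conv", 2), ([-1, 11], "SDI", 1), (-1, "C3k2", 1),                           # 22-24
    ([4, 6, 8], "ScalSeq", 1), ([18, -1], "Add", 1),                                  # 25-26
]


def trace_strides(layers):
    """Return the cumulative stride (input size / feature-map size) of every layer output."""
    strides = []
    for f, module, s in layers:
        srcs = [f] if isinstance(f, int) else f
        # The list only holds earlier layers, so -1 and -2 count back from the current layer.
        src = [strides[j] if strides else 1 for j in srcs]
        if module == "Zoom_cat":
            out = src[1]          # all inputs are resized to the middle one
        elif module in ("SDI", "ScalSeq", "Add"):
            out = src[0]          # output follows the first input
        else:
            out = src[0] * s      # plain layers: the conv stride multiplies through
        strides.append(out)
    return strides


strides = trace_strides(LAYERS)
print("Detect inputs (layers 26, 21, 24):", [strides[i] for i in (26, 21, 24)])
```

Under these rules the three Detect inputs come out at strides 8, 16 and 32, which matches the P3, P4, P5 labels in the yaml comments.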

7.2 yaml File, Version 2

Training summary for this version: YOLO11-ASFYOLO-SDI-2 summary: 409 layers, 3,294,413 parameters, 3,294,397 gradients, 8.2 GFLOPs

    # Ultralytics YOLO 🚀, AGPL-3.0 license
    # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

    # Parameters
    nc: 80 # number of classes
    scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
      # [depth, width, max_channels]
      n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
      s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
      m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
      l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
      x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

    # YOLO11n backbone
    backbone:
      # [from, repeats, module, args]
      - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
      - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
      - [-1, 2, C3k2, [256, False, 0.25]]
      - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
      - [-1, 2, C3k2, [512, False, 0.25]]
      - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
      - [-1, 2, C3k2, [512, True]]
      - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
      - [-1, 2, C3k2, [1024, True]]
      - [-1, 1, SPPF, [1024, 5]] # 9
      - [-1, 2, C2PSA, [1024]] # 10

    # YOLO11n head
    head:
      - [-1, 1, Conv, [512, 1, 1]] # 11
      - [4, 1, Conv, [512, 1, 1]] # 12
      - [[-1, 6, -2], 1, Zoom_cat, []] # 13 cat backbone P4
      - [-1, 3, C3k2, [512, False]] # 14

      - [-1, 1, Conv, [256, 1, 1]] # 15
      - [2, 1, Conv, [256, 1, 1]] # 16
      - [[-1, 4, -2], 1, Zoom_cat, []] # 17 cat backbone P3
      - [-1, 3, C3k2, [256, False]] # 18 (P3/8-small)

      - [-1, 1, Conv, [256, 3, 2]] # 19
      - [[-1, 15], 1, SDI, []] # 20 SDI fusion with head P4
      - [-1, 3, C3k2, [512, False]] # 21 (P4/16-medium)

      - [-1, 1, Conv, [512, 3, 2]] # 22
      - [[-1, 11], 1, SDI, []] # 23 SDI fusion with head P5
      - [-1, 3, C3k2, [1024, True]] # 24 (P5/32-large)

      - [[4, 6, 8], 1, ScalSeq, [256]] # 25 args: [channel]
      - [[18, -1], 1, attention_model, [256]] # 26

      - [[26, 21, 24], 1, Detect, [nc]] # Detect(P3, P4, P5)


8. Run Screenshot


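9. Summary

That concludes the main content of this post. I recommend my column on effective YOLOv11 improvements; it is newly opened and currently has an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also fill in some of the older improvement mechanisms. The column is free to read for now, so follow early so you do not lose track of it. If this post helped you, subscribe to the column and follow for more updates.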