
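YOLOv11 Improvements, Backbone Series: Improving YOLOv11 Feature Extraction for Object Detection with the EfficientNetV2 Balanced-Scaling Network (lightweight, works with the whole YOLOv11 family)

I. Introduction

The improvement in this post is EfficientNetV2. V1 improved the performance of convolutional neural networks by scaling network depth, width and resolution in a balanced way. Building on that, V2 proposes an improved progressive learning method: the image size is increased gradually during training and the regularization is adjusted adaptively, which speeds up training while keeping accuracy. V2's gains over V1 are therefore mainly in speed and efficiency. (In my own experiments V2 was not faster than V1, possibly because I did not compare variants of the same tier; feel free to run the comparison yourself.) This post first explains the main ideas behind the architecture and then shows how to add the network to the model.

(The backbone in this post can be scaled a second time according to the YOLOv11 N, S, M, L and X scales, which makes it even more lightweight.)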



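II. How EfficientNetV2 Works

Official paper: https://arxiv.org/abs/2104.00298

Official code: https://github.com/google/automl/tree/master/efficientnetv2


The paper introduces EfficientNetV2, a family of convolutional neural networks with faster training and better parameter efficiency. The models are optimized for both by combining training-aware neural architecture search with scaling. The paper also proposes an improved progressive learning method that gradually increases the image size during training and adjusts regularization adaptively, which speeds up training while keeping accuracy.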

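The main innovations of EfficientNetV2 are:

1. Structural innovation: EfficientNetV2 uses the fused-MBConv structure in its early layers, which helps reduce memory access overhead. It also prefers smaller expansion ratios and 3x3 kernels, and adds more layers to compensate for the smaller receptive field that smaller kernels bring. Finally, EfficientNetV2 completely removes the last stride-1 stage of the original EfficientNet, probably because of its large parameter count and memory access overhead.

2. Faster training: The paper compares the training step time of EfficientNetV2 with other models at a fixed image size. Through training-aware neural architecture search and model scaling, EfficientNetV2 trains faster than other recent models.

3. Progressive learning with adaptive regularization: EfficientNetV2 uses an improved progressive learning method. Early in training it uses smaller images and weaker regularization, so the network can learn simple representations more easily and quickly. As training proceeds, the image size is increased gradually and the regularization is strengthened to make learning harder.

4. Why adaptive regularization matters: The paper stresses the importance of adaptive regularization, which adjusts the regularization strength dynamically according to the image size. The method is simple but effective and can be combined with other methods.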
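The schedule described in items 3 and 4 can be sketched as a linear interpolation between a starting and a final setting. This is a minimal illustration and not the paper's training code: the function name `progressive_schedule`, the number of stages and the endpoint values are all assumptions picked for the example.

```python
def progressive_schedule(num_stages, size_start, size_end, reg_start, reg_end):
    """Linearly interpolate image size and regularization strength per stage."""
    if num_stages < 2:
        raise ValueError("need at least two stages to interpolate")
    schedule = []
    for i in range(num_stages):
        t = i / (num_stages - 1)  # 0.0 at the first stage, 1.0 at the last
        size = int(size_start + (size_end - size_start) * t)
        reg = reg_start + (reg_end - reg_start) * t
        schedule.append((size, round(reg, 3)))
    return schedule


# Hypothetical endpoints: image size 128 -> 300, dropout rate 0.1 -> 0.3
schedule = progressive_schedule(4, 128, 300, 0.1, 0.3)
for stage, (size, dropout) in enumerate(schedule):
    print(f"stage {stage}: image size {size}, dropout {dropout}")
```

Small images are paired with weak regularization and large images with strong regularization, so both values grow together from stage to stage.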

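The figure shows the structure of two convolutional building blocks: MBConv and Fused-MBConv.

MBConv: This block is built around a depthwise convolution (depthwise conv3x3). A 1x1 convolution first expands the number of channels, the depthwise convolution then captures spatial features, and a final 1x1 convolution projects the channels back. It also contains an SE (Squeeze-and-Excitation) module, which improves the representational power of the network by learning weights for the important channels.

Fused-MBConv: Like MBConv, this block contains an SE module and a 1x1 convolution, but it replaces the 1x1 expansion convolution and the depthwise 3x3 convolution with a single regular 3x3 convolution. A regular convolution has more parameters and FLOPs than the depthwise version, but it makes better use of modern accelerators, so it often runs faster when used in the early stages.

Both blocks are commonly used to build efficient deep learning models, especially when compute is limited. The efficiency gain of Fused-MBConv comes from its simpler structure.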
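To make the trade-off between the two blocks concrete, the sketch below counts convolution weights for one block of each kind. It is a rough illustration under stated assumptions: only convolution weights are counted (BatchNorm and SE are ignored), the helper names are made up for this example, and the channel numbers are taken from the second row of the efficientnet_v2_s table (e=4, in=24, out=48).

```python
def mbconv_weights(c_in, c_out, expand, k=3):
    """1x1 expansion + depthwise kxk + 1x1 projection (weights only)."""
    c_mid = c_in * expand
    return c_in * c_mid + c_mid * k * k + c_mid * c_out


def fused_mbconv_weights(c_in, c_out, expand, k=3):
    """Regular kxk convolution that also expands + 1x1 projection (weights only)."""
    c_mid = c_in * expand
    return c_in * c_mid * k * k + c_mid * c_out


mb = mbconv_weights(24, 48, expand=4)
fused = fused_mbconv_weights(24, 48, expand=4)
print(f"MBConv: {mb} weights, Fused-MBConv: {fused} weights")
```

The fused block has more weights, yet it replaces two layers with one dense convolution, which is why it tends to pay off in the early, high-resolution stages where channel counts are small.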


III. Core Code of EfficientNetV2

import copy
import os
import re
import subprocess
from collections import OrderedDict
from functools import partial
from pathlib import Path

import numpy as np
import torch
from torch import nn

__all__ = ['efficientnet_v2']


def get_efficientnet_v2_structure(model_name):
    """Return the per-stage configuration table of the requested variant."""
    if 'efficientnet_v2_s' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 24, 24, 2, False, True),
            (4, 3, 2, 24, 48, 4, False, True),
            (4, 3, 2, 48, 64, 4, False, True),
            (4, 3, 2, 64, 128, 6, True, False),
            (6, 3, 1, 128, 160, 9, True, False),
            (6, 3, 2, 160, 256, 15, True, False),
        ]
    elif 'efficientnet_v2_m' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 24, 24, 3, False, True),
            (4, 3, 2, 24, 48, 5, False, True),
            (4, 3, 2, 48, 80, 5, False, True),
            (4, 3, 2, 80, 160, 7, True, False),
            (6, 3, 1, 160, 176, 14, True, False),
            (6, 3, 2, 176, 304, 18, True, False),
            (6, 3, 1, 304, 512, 5, True, False),
        ]
    elif 'efficientnet_v2_l' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 32, 32, 4, False, True),
            (4, 3, 2, 32, 64, 7, False, True),
            (4, 3, 2, 64, 96, 7, False, True),
            (4, 3, 2, 96, 192, 10, True, False),
            (6, 3, 1, 192, 224, 19, True, False),
            (6, 3, 2, 224, 384, 25, True, False),
            (6, 3, 1, 384, 640, 7, True, False),
        ]
    elif 'efficientnet_v2_xl' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 32, 32, 4, False, True),
            (4, 3, 2, 32, 64, 8, False, True),
            (4, 3, 2, 64, 96, 8, False, True),
            (4, 3, 2, 96, 192, 16, True, False),
            (6, 3, 1, 192, 256, 24, True, False),
            (6, 3, 2, 256, 512, 32, True, False),
            (6, 3, 1, 512, 640, 8, True, False),
        ]
    # Fail early instead of silently returning None for an unknown name
    raise ValueError(f'Unknown EfficientNetV2 variant: {model_name}')


class ConvBNAct(nn.Sequential):
    """Convolution-Normalization-Activation Module"""

    def __init__(self, in_channel, out_channel, kernel_size, stride, groups, norm_layer, act, conv_layer=nn.Conv2d):
        super(ConvBNAct, self).__init__(
            conv_layer(in_channel, out_channel, kernel_size, stride=stride, padding=(kernel_size - 1) // 2,
                       groups=groups, bias=False),
            norm_layer(out_channel),
            act()
        )


class SEUnit(nn.Module):
    """Squeeze-Excitation Unit
    paper: https://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper
    """

    def __init__(self, in_channel, reduction_ratio=4, act1=partial(nn.SiLU, inplace=True), act2=nn.Sigmoid):
        super(SEUnit, self).__init__()
        hidden_dim = in_channel // reduction_ratio
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc1 = nn.Conv2d(in_channel, hidden_dim, (1, 1), bias=True)
        self.fc2 = nn.Conv2d(hidden_dim, in_channel, (1, 1), bias=True)
        self.act1 = act1()
        self.act2 = act2()

    def forward(self, x):
        # Global pooling -> bottleneck -> per-channel gate in [0, 1]
        return x * self.act2(self.fc2(self.act1(self.fc1(self.avg_pool(x)))))


class StochasticDepth(nn.Module):
    """StochasticDepth
    paper: https://link.springer.com/chapter/10.1007/978-3-319-46493-0_39
    :arg
    - prob: Probability of dying
    - mode: "row" or "all". "row" means that each row survives with different probability
    """

    def __init__(self, prob, mode):
        super(StochasticDepth, self).__init__()
        self.prob = prob
        self.survival = 1.0 - prob
        self.mode = mode

    def forward(self, x):
        if self.prob == 0.0 or not self.training:
            return x
        shape = [x.size(0)] + [1] * (x.ndim - 1) if self.mode == 'row' else [1]
        # Build the mask directly on the input's device and dtype
        mask = torch.empty(shape, dtype=x.dtype, device=x.device).bernoulli_(self.survival)
        return x * mask.div_(self.survival)


class MBConvConfig:
    """EfficientNet Building block configuration"""

    def __init__(self, expand_ratio: float, kernel: int, stride: int, in_ch: int, out_ch: int, layers: int,
                 use_se: bool, fused: bool, act=nn.SiLU, norm_layer=nn.BatchNorm2d):
        self.expand_ratio = expand_ratio
        self.kernel = kernel
        self.stride = stride
        self.in_ch = in_ch
        self.out_ch = out_ch
        self.num_layers = layers
        self.act = act
        self.norm_layer = norm_layer
        self.use_se = use_se
        self.fused = fused

    @staticmethod
    def adjust_channels(channel, factor, divisible=8):
        new_channel = channel * factor
        divisible_channel = max(divisible, (int(new_channel + divisible / 2) // divisible) * divisible)
        divisible_channel += divisible if divisible_channel < 0.9 * new_channel else 0
        return divisible_channel


class MBConv(nn.Module):
    """EfficientNet main building blocks
    :arg
    - c: MBConvConfig instance
    - sd_prob: stochastic path probability
    """

    def __init__(self, c, sd_prob=0.0):
        super(MBConv, self).__init__()
        inter_channel = c.adjust_channels(c.in_ch, c.expand_ratio)
        block = []
        if c.expand_ratio == 1:
            block.append(('fused', ConvBNAct(c.in_ch, inter_channel, c.kernel, c.stride, 1, c.norm_layer, c.act)))
        elif c.fused:
            block.append(('fused', ConvBNAct(c.in_ch, inter_channel, c.kernel, c.stride, 1, c.norm_layer, c.act)))
            block.append(('fused_point_wise', ConvBNAct(inter_channel, c.out_ch, 1, 1, 1, c.norm_layer, nn.Identity)))
        else:
            block.append(('linear_bottleneck', ConvBNAct(c.in_ch, inter_channel, 1, 1, 1, c.norm_layer, c.act)))
            block.append(('depth_wise', ConvBNAct(inter_channel, inter_channel, c.kernel, c.stride, inter_channel, c.norm_layer, c.act)))
            block.append(('se', SEUnit(inter_channel, 4 * c.expand_ratio)))
            block.append(('point_wise', ConvBNAct(inter_channel, c.out_ch, 1, 1, 1, c.norm_layer, nn.Identity)))
        self.block = nn.Sequential(OrderedDict(block))
        # The residual connection is only valid when the shape is unchanged
        self.use_skip_connection = c.stride == 1 and c.in_ch == c.out_ch
        self.stochastic_path = StochasticDepth(sd_prob, "row")

    def forward(self, x):
        out = self.block(x)
        if self.use_skip_connection:
            out = x + self.stochastic_path(out)
        return out


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class EfficientNetV2(nn.Module):
    """Pytorch Implementation of EfficientNetV2
    paper: https://arxiv.org/abs/2104.00298
    - reference 1 (pytorch): https://github.com/d-li14/efficientnetv2.pytorch/blob/main/effnetv2.py
    - reference 2 (official): https://github.com/google/automl/blob/master/efficientnetv2/effnetv2_configs.py
    :arg
    - factor: channel (width) scaling factor
    - depth: depth scaling factor
    - layer_infos: list of MBConvConfig
    - nclass: number of class
    - dropout: dropout probability before classifier layer
    - stochastic depth: stochastic depth probability
    """

    def __init__(self, factor, depth, layer_infos, nclass=0, dropout=0.2, stochastic_depth=0.0,
                 block=MBConv, act_layer=nn.SiLU, norm_layer=nn.BatchNorm2d):
        super(EfficientNetV2, self).__init__()
        # Second-level scaling so the backbone follows the YOLO n/s/m/l/x scales
        for layer in layer_infos:
            layer.in_ch = _make_divisible(int(layer.in_ch * factor), 8)
            layer.out_ch = _make_divisible(int(layer.out_ch * factor), 8)
            layer.num_layers = max(1, int(layer.num_layers * depth))
        self.layer_infos = layer_infos
        self.norm_layer = norm_layer
        self.act = act_layer
        self.in_channel = layer_infos[0].in_ch
        self.final_stage_channel = layer_infos[-1].out_ch
        self.cur_block = 0
        self.num_block = sum(stage.num_layers for stage in layer_infos)
        self.stochastic_depth = stochastic_depth
        self.stem = ConvBNAct(3, self.in_channel, 3, 2, 1, self.norm_layer, self.act)
        self.blocks = nn.Sequential(*self.make_stages(layer_infos, block))
        # Probe once with a dummy image to record the output channels.
        # eval() + no_grad keeps the probe from touching BatchNorm statistics.
        was_training = self.training
        self.eval()
        with torch.no_grad():
            self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
        self.train(was_training)

    def make_stages(self, layer_infos, block):
        return [layer for layer_info in layer_infos for layer in self.make_layers(copy.copy(layer_info), block)]

    def make_layers(self, layer_info, block):
        layers = []
        for i in range(layer_info.num_layers):
            layers.append(block(layer_info, sd_prob=self.get_sd_prob()))
            # Only the first block of a stage changes channels and stride
            layer_info.in_ch = layer_info.out_ch
            layer_info.stride = 1
        return layers

    def get_sd_prob(self):
        # Drop probability grows linearly with block depth
        sd_prob = self.stochastic_depth * (self.cur_block / self.num_block)
        self.cur_block += 1
        return sd_prob

    def forward(self, x):
        x = self.stem(x)
        unique_tensors = {}
        for idx, block in enumerate(self.blocks):
            x = block(x)
            # Keep only the latest feature map for every spatial resolution
            height, width = x.shape[2], x.shape[3]
            unique_tensors[(height, width)] = x
        # Return the four lowest-resolution feature maps
        result_list = list(unique_tensors.values())[-4:]
        return result_list


def efficientnet_v2_init(model):
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, mean=0.0, std=0.01)
            nn.init.zeros_(m.bias)


model_urls = {
    "efficientnet_v2_s": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-s.npy",
    "efficientnet_v2_m": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-m.npy",
    "efficientnet_v2_l": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-l.npy",
    "efficientnet_v2_s_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-s-21k.npy",
    "efficientnet_v2_m_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-m-21k.npy",
    "efficientnet_v2_l_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-l-21k.npy",
    "efficientnet_v2_xl_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-xl-21k.npy",
}


def load_from_zoo(model, model_name, pretrained_path='pretrained/official'):
    Path(os.path.join(pretrained_path, model_name)).mkdir(parents=True, exist_ok=True)
    file_name = os.path.join(pretrained_path, model_name, os.path.basename(model_urls[model_name]))
    load_npy(model, load_npy_from_url(url=model_urls[model_name], file_name=file_name))


def load_npy_from_url(url, file_name):
    if not Path(file_name).exists():
        subprocess.run(["wget", "-r", "-nc", '-O', file_name, url])
    return np.load(file_name, allow_pickle=True).item()


def npz_dim_convertor(name, weight):
    weight = torch.from_numpy(weight)
    if 'kernel' in name:
        if weight.dim() == 4:
            if weight.shape[3] == 1:
                # depth-wise convolution 'h w in_c out_c -> in_c out_c h w'
                weight = torch.permute(weight, (2, 3, 0, 1))
            else:
                # 'h w in_c out_c -> out_c in_c h w'
                weight = torch.permute(weight, (3, 2, 0, 1))
        elif weight.dim() == 2:
            weight = weight.transpose(1, 0)
    elif 'scale' in name or 'bias' in name:
        weight = weight.squeeze()
    return weight


def load_npy(model, weight):
    name_convertor = [
        # stem
        ('stem.0.weight', 'stem/conv2d/kernel/ExponentialMovingAverage'),
        ('stem.1.weight', 'stem/tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('stem.1.bias', 'stem/tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('stem.1.running_mean', 'stem/tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('stem.1.running_var', 'stem/tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
        # fused layer
        ('block.fused.0.weight', 'conv2d/kernel/ExponentialMovingAverage'),
        ('block.fused.1.weight', 'tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('block.fused.1.bias', 'tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('block.fused.1.running_mean', 'tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('block.fused.1.running_var', 'tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
        # linear bottleneck
        ('block.linear_bottleneck.0.weight', 'conv2d/kernel/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.weight', 'tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.bias', 'tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.running_mean', 'tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.running_var', 'tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
        # depth wise layer
        ('block.depth_wise.0.weight', 'depthwise_conv2d/depthwise_kernel/ExponentialMovingAverage'),
        ('block.depth_wise.1.weight', 'tpu_batch_normalization_1/gamma/ExponentialMovingAverage'),
        ('block.depth_wise.1.bias', 'tpu_batch_normalization_1/beta/ExponentialMovingAverage'),
        ('block.depth_wise.1.running_mean', 'tpu_batch_normalization_1/moving_mean/ExponentialMovingAverage'),
        ('block.depth_wise.1.running_var', 'tpu_batch_normalization_1/moving_variance/ExponentialMovingAverage'),
        # se layer
        ('block.se.fc1.weight', 'se/conv2d/kernel/ExponentialMovingAverage'), ('block.se.fc1.bias', 'se/conv2d/bias/ExponentialMovingAverage'),
        ('block.se.fc2.weight', 'se/conv2d_1/kernel/ExponentialMovingAverage'), ('block.se.fc2.bias', 'se/conv2d_1/bias/ExponentialMovingAverage'),
        # point wise layer
        ('block.fused_point_wise.0.weight', 'conv2d_1/kernel/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.weight', 'tpu_batch_normalization_1/gamma/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.bias', 'tpu_batch_normalization_1/beta/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.running_mean', 'tpu_batch_normalization_1/moving_mean/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.running_var', 'tpu_batch_normalization_1/moving_variance/ExponentialMovingAverage'),
        ('block.point_wise.0.weight', 'conv2d_1/kernel/ExponentialMovingAverage'),
        ('block.point_wise.1.weight', 'tpu_batch_normalization_2/gamma/ExponentialMovingAverage'),
        ('block.point_wise.1.bias', 'tpu_batch_normalization_2/beta/ExponentialMovingAverage'),
        ('block.point_wise.1.running_mean', 'tpu_batch_normalization_2/moving_mean/ExponentialMovingAverage'),
        ('block.point_wise.1.running_var', 'tpu_batch_normalization_2/moving_variance/ExponentialMovingAverage'),
        # head
        ('head.bottleneck.0.weight', 'head/conv2d/kernel/ExponentialMovingAverage'),
        ('head.bottleneck.1.weight', 'head/tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('head.bottleneck.1.bias', 'head/tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('head.bottleneck.1.running_mean', 'head/tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('head.bottleneck.1.running_var', 'head/tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
        # classifier
        ('head.classifier.weight', 'head/dense/kernel/ExponentialMovingAverage'),
        ('head.classifier.bias', 'head/dense/bias/ExponentialMovingAverage'),
        ('\\.(\\d+)\\.', lambda x: f'_{int(x.group(1))}/'),
    ]
    for name, param in list(model.named_parameters()) + list(model.named_buffers()):
        # Translate the PyTorch parameter name into the TensorFlow checkpoint name
        for pattern, sub in name_convertor:
            name = re.sub(pattern, sub, name)
        if 'dense/kernel' in name and list(param.shape) not in [[1000, 1280], [21843, 1280]]:
            continue
        if 'dense/bias' in name and list(param.shape) not in [[1000], [21843]]:
            continue
        if 'num_batches_tracked' in name:
            continue
        param.data.copy_(npz_dim_convertor(name, weight.get(name)))


def efficientnet_v2(model_name='efficientnet_v2_s', factor=0.5, depth=0.5, pretrained=False, nclass=0, dropout=0.1, stochastic_depth=0.2, **kwargs):
    residual_config = [MBConvConfig(*layer_config) for layer_config in get_efficientnet_v2_structure(model_name)]
    model = EfficientNetV2(factor, depth, residual_config, nclass, dropout=dropout, stochastic_depth=stochastic_depth, block=MBConv, act_layer=nn.SiLU)
    efficientnet_v2_init(model)
    if pretrained:
        load_from_zoo(model, model_name)
    return model


if __name__ == "__main__":
    # Generating Sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)
    # Model
    model = efficientnet_v2('efficientnet_v2_s')
    out = model(image)
    print(len(out))
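The forward pass above keeps one feature map per spatial resolution and returns the last four. The sketch below replays that bookkeeping without PyTorch, so the selection is easy to follow. It is a simplified model of the real code: it works per stage instead of per block, ignores the channel scaling done by factor, and the helper names are invented for the example. The strides and channel numbers come from the efficientnet_v2_s table.

```python
def conv_out_size(size, kernel=3, stride=1):
    """Output size of a convolution with 'same'-style padding (kernel - 1) // 2."""
    padding = (kernel - 1) // 2
    return (size + 2 * padding - kernel) // stride + 1


# (stride, out_channels) of each stage, copied from the efficientnet_v2_s table
STAGES_S = [(1, 24), (2, 48), (2, 64), (2, 128), (1, 160), (2, 256)]


def returned_features(input_size=640, stages=STAGES_S):
    """Return (resolution, channels) of the four feature maps the backbone outputs."""
    size = conv_out_size(input_size, stride=2)  # the stem halves the resolution
    latest = {}
    for stride, out_ch in stages:
        size = conv_out_size(size, stride=stride)
        latest[size] = out_ch  # a later stage at the same resolution overwrites
    return list(latest.items())[-4:]


features = returned_features()
print(features)
```

The stride-1 stage with 160 channels overwrites the 128-channel stage at the same resolution, which is why 128 never shows up in the returned list.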


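IV. Step-by-Step: Adding EfficientNetV2

4.1 Modification 1

The first step, as usual, is to create the file. Under the ultralytics/nn folder, create a directory named 'Addmodules' (if you use the files from my group it already exists, so there is no need to create it). Then create a new .py file inside it and paste the core code into it.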


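4.2 Modification 2

In the second step, create a new .py file named '__init__.py' in that directory (if you use the files from my group it already exists, so there is no need to create it), then import our module inside it as shown in the picture.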
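Because the core code sets `__all__ = ['efficientnet_v2']`, a star import in `__init__.py` re-exports only that one function. The sketch below shows the mechanism on a throwaway package created in a temporary directory. The names `Addmodules_demo` and `EfficientNetV2_demo.py` are hypothetical and exist only for this demonstration; use whatever file name you chose in Modification 1.

```python
import importlib
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    pkg = Path(tmp) / "Addmodules_demo"  # hypothetical package name
    pkg.mkdir()
    # Stand-in for the file that holds the core code
    (pkg / "EfficientNetV2_demo.py").write_text(
        "__all__ = ['efficientnet_v2']\n"
        "def efficientnet_v2():\n"
        "    return 'backbone'\n"
        "def _helper():\n"
        "    return 'private'\n"
    )
    # The single line that __init__.py needs
    (pkg / "__init__.py").write_text("from .EfficientNetV2_demo import *\n")

    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        demo = importlib.import_module("Addmodules_demo")
    finally:
        sys.path.remove(tmp)

print(hasattr(demo, "efficientnet_v2"), hasattr(demo, "_helper"))
```

Only the name listed in `__all__` becomes visible on the package, which is what the registration in tasks.py relies on.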


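4.3 Modification 3

In the third step, open the file 'ultralytics/nn/tasks.py' to import and register our module (if you use the files from my group this is already done; skip straight to step four).

From now on all my tutorials will follow this format, because I assume you are modifying the files from my group.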


4.4 Modification 4

Add the following two lines of code.


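4.5 Modification 5

Go to roughly line 700 (see the picture for the exact place) and modify the code as the picture shows by adding the part in the red box. Note that there are no parentheses; only the function name is used.

elif m in {efficientnet_v2}:  # add further backbone functions to this set as needed; the rest stays the same
    m = m(*args)
    c2 = m.width_list  # list of output channels
    backbone = True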
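The branch above works because the set holds the function object itself, and the function is only called once inside the branch. Here is a toy version of that dispatch with no ultralytics dependency. `FakeBackbone`, `build` and the channel values are stand-ins made up for the illustration; the real `parse_model` does much more.

```python
class FakeBackbone:
    """Stand-in for the real nn.Module; only width_list matters here."""

    def __init__(self, name, factor, depth):
        self.name = name
        self.width_list = [16, 16, 40, 64]  # illustrative channel counts


def efficientnet_v2(name, factor, depth):
    """Toy factory with the same call shape as the YAML args."""
    return FakeBackbone(name, factor, depth)


BACKBONE_FACTORIES = {efficientnet_v2}  # function objects, no parentheses


def build(m, args):
    backbone = False
    if m in BACKBONE_FACTORIES:
        m = m(*args)       # call the factory once to get the module instance
        c2 = m.width_list  # a list of channels instead of a single int
        backbone = True
    else:
        c2 = args[0]       # ordinary layers report one output channel count
    return m, c2, backbone


module, c2, backbone = build(efficientnet_v2, ["efficientnet_v2_s", 0.25, 0.5])
print(type(module).__name__, c2, backbone)
```

Writing `efficientnet_v2()` inside the set would call the function while the set is being built, which is why the tutorial insists on the bare name.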


4.6 Modification 6

Both red boxes below need to be changed.

if isinstance(c2, list):
    m_ = m
    m_.backbone = True
else:
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
t = str(m)[8:-2].replace('__main__.', '')  # module type
m.np = sum(x.numel() for x in m_.parameters())  # number params
m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type
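The expression `i + 4 if backbone else i` shifts every layer index by four once the backbone has been parsed, because a single YAML row now stands for five outputs (indices 0 to 4). A short sketch of the resulting numbering, using the first rows of the YAML from section V; `layer_index` is a helper written just for this example:

```python
def layer_index(i, backbone):
    # Parsed as (i + 4) if backbone else i: the conditional binds looser than +
    return i + 4 if backbone else i


yaml_rows = ["efficientnet_v2", "SPPF", "C2PSA", "nn.Upsample", "Concat"]
# The backbone is row 0, so the flag is already True for every row
indices = [layer_index(i, backbone=True) for i in range(len(yaml_rows))]
for name, index in zip(yaml_rows, indices):
    print(f"{index:>2}  {name}")
```

This matches the comments in the YAML file, where the backbone is labelled 0-4, SPPF is 5 and C2PSA is 6.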


4.7 Modification 7

The following part also needs to be changed; follow my version exactly.

The code is below; replace the original code with it.

if verbose:
    LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}')  # print
save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
layers.append(m_)
if i == 0:
    ch = []
if isinstance(c2, list):
    ch.extend(c2)
    if len(c2) != 5:
        ch.insert(0, 0)
else:
    ch.append(c2)
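The channel list `ch` is what later layers index into with their `from` field, so the backbone has to contribute one entry per output slot. The sketch below isolates that bookkeeping. `update_channels` is a hypothetical wrapper around the same statements, and the channel numbers are illustrative.

```python
def update_channels(ch, c2, i):
    """Mirror of the ch bookkeeping in Modification 7."""
    if i == 0:
        ch = []
    if isinstance(c2, list):
        ch.extend(c2)
        if len(c2) != 5:
            ch.insert(0, 0)  # placeholder so that slot k lines up with Pk
    else:
        ch.append(c2)
    return ch


ch = update_channels([3], [16, 16, 40, 64], i=0)  # backbone row, four outputs
ch = update_channels(ch, 256, i=1)                # an ordinary layer after it
print(ch)
```

With the placeholder in front, `ch[2]` and `ch[3]` are the P3 and P4 channels, which is how the `[[-1, 2], 1, Concat, [1]]` and `[[-1, 3], 1, Concat, [1]]` rows of the YAML find the right inputs.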


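4.8 Modification 8

Modification 8 differs from the previous ones: it changes part of the forward pass and is no longer inside the parse_model method.

You can see the line numbers in the picture. We are still in the same file, tasks.py. Several forward-pass methods in this area look alike, so do not pick the wrong one: it is the one at around line 70. I provide the code below so you can copy and paste it directly. When I have time I will make a video about this step.

The code is as follows: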

def _predict_once(self, x, profile=False, visualize=False, embed=None):
    """
    Perform a forward pass through the network.

    Args:
        x (torch.Tensor): The input tensor to the model.
        profile (bool): Print the computation time of each layer if True, defaults to False.
        visualize (bool): Save the feature maps of the model if True, defaults to False.
        embed (list, optional): A list of feature vectors/embeddings to return.

    Returns:
        (torch.Tensor): The last output of the model.
    """
    y, dt, embeddings = [], [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
        if profile:
            self._profile_one_layer(m, x, dt)
        if hasattr(m, 'backbone'):
            x = m(x)
            if len(x) != 5:  # 0 - 5
                x.insert(0, None)
            for index, i in enumerate(x):
                if index in self.save:
                    y.append(i)
                else:
                    y.append(None)
            x = x[-1]  # pass the last output on to the next layer
        else:
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
        if embed and m.i in embed:
            embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
            if m.i == max(embed):
                return torch.unbind(torch.cat(embeddings, 1), dim=0)
    return x
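The backbone branch in `_predict_once` can be tried out without a model. In the sketch below strings stand in for tensors, and `collect_backbone_outputs` is a hypothetical helper that repeats the same steps: pad the list to five slots, keep only the slots listed in `save`, and pass the last feature map on. The `save` set is an example; in ultralytics it is built by `parse_model`.

```python
def collect_backbone_outputs(features, save):
    """Mimic the backbone branch: returns (next input, entries appended to y)."""
    x = list(features)
    if len(x) != 5:
        x.insert(0, None)  # slot 0 is a placeholder for the unused P1 output
    y = [feat if index in save else None for index, feat in enumerate(x)]
    return x[-1], y


# Slots 2 and 3 are read later by the Concat layers, slot 6 belongs to C2PSA
last, y = collect_backbone_outputs(["P2", "P3", "P4", "P5"], save={2, 3, 6})
print(last, y)
```

P5 is not stored in `y` because the next layer consumes it directly as `x`; only feature maps that a later layer refers to by index need to be kept.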

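That completes the modifications. There are many details here, so be careful: do not replace more code than shown, and do not skip any step. Either mistake makes the run fail, and the resulting errors are very hard to track down.


Note: one extra modification!

Those who follow me know that most of my modifications are the same. This network needs one extra step: there is a parameter s, and you need to change the s shown below to 640. After that it runs without problems.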


Fixing the GFLOPs printout

Open the file 'ultralytics/utils/torch_utils.py' and modify it as shown in the picture below; otherwise the GFLOPs may not be printed.


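Important notes

If you get a shape mismatch error during validation, you can fix the image size of the validation set as follows.

Open the file ultralytics/models/yolo/detect/train.py. In the build_dataset function of the DetectionTrainer class, change the argument rect=mode == 'val' to rect=False.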


V. The EfficientNetV2 yaml File

Copy the yaml file below and run it.


5.1 EfficientNetV2 yaml file, version 1

Training info for this version: YOLO11-EfficientNetV2 summary: 559 layers, 2,096,663 parameters, 2,096,647 gradients, 5.3 GFLOPs

Usage: in [-1, 1, efficientnet_v2, [efficientnet_v2_s, 0.25, 0.5]] the 0.25 is the channel scaling factor. Use 0.25 for YOLOv11n, 0.5 for YOLOv11s, 1 for YOLOv11m, 1 for YOLOv11l and 1.5 for YOLOv11x, depending on the YOLO scale you train.
#  0.5 is the depth factor of the model
#  efficientnet_v2_s is the model variant

# Supported variants: efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l, efficientnet_v2_xl
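To see what the two factors do to the backbone, the sketch below applies the same rounding rule as `_make_divisible` in the core code to the output channels and layer counts of efficientnet_v2_s. `scale_stages` is a helper written for this example, and it only looks at output channels and repeats, not at the full stage configuration.

```python
def make_divisible(v, divisor=8, min_value=None):
    """Round v to a multiple of divisor without dropping more than 10%."""
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


# (out_channels, num_layers) of each stage in the efficientnet_v2_s table
STAGES_S = [(24, 2), (48, 4), (64, 4), (128, 6), (160, 9), (256, 15)]


def scale_stages(factor, depth, stages=STAGES_S):
    """Apply the channel factor and the depth factor to every stage."""
    return [(make_divisible(int(ch * factor)), max(1, int(n * depth))) for ch, n in stages]


scaled = scale_stages(factor=0.25, depth=0.5)
for (ch, n), (new_ch, new_n) in zip(STAGES_S, scaled):
    print(f"channels {ch:>3} -> {new_ch:>3}, layers {n:>2} -> {new_n}")
```

With factor 0.25 and depth 0.5 the last stage shrinks from 256 channels and 15 blocks to 64 channels and 7 blocks, which is where most of the parameter savings come from.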

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# In [-1, 1, efficientnet_v2, [efficientnet_v2_s, 0.25, 0.5]] below, 0.25 is the channel scaling factor:
# 0.25 for YOLOv11n, 0.5 for YOLOv11s, 1 for YOLOv11m, 1 for YOLOv11l, 1.5 for YOLOv11x. Set it to match the YOLO scale you train.
# 0.5 is the depth factor of the model
# efficientnet_v2_s is the model variant
# Supported variants: efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l, efficientnet_v2_xl

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, efficientnet_v2, [efficientnet_v2_s, 0.25, 0.5]] # 0-4 P1/2
  - [-1, 1, SPPF, [1024, 5]] # 5
  - [-1, 2, C2PSA, [1024]] # 6

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 3], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 9

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 12 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 15 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 6], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 18 (P5/32-large)

  - [[12, 15, 18], 1, Detect, [nc]] # Detect(P3, P4, P5)
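Because the backbone row occupies indices 0 to 4, the `from` numbers in the head are easy to get wrong when you adapt this file. The sketch below is a small sanity check that resolves every non-negative `from` index to the layer it points at. The row list is transcribed by hand from the YAML above rather than loaded from the file, and the names `ROWS`, `OFFSET` and `sources` are made up for this check.

```python
# (from, module) for each YAML row, in order
ROWS = [
    (-1, "efficientnet_v2"), (-1, "SPPF"), (-1, "C2PSA"),
    (-1, "nn.Upsample"), ([-1, 3], "Concat"), (-1, "C3k2"),
    (-1, "nn.Upsample"), ([-1, 2], "Concat"), (-1, "C3k2"),
    (-1, "Conv"), ([-1, 9], "Concat"), (-1, "C3k2"),
    (-1, "Conv"), ([-1, 6], "Concat"), (-1, "C3k2"),
    ([12, 15, 18], "Detect"),
]
OFFSET = 4  # the backbone row expands into indices 0-4

# Absolute layer index -> what lives there
index_to_module = {0: "unused", 1: "backbone P2", 2: "backbone P3",
                   3: "backbone P4", 4: "backbone P5"}
for row, (_, module) in enumerate(ROWS[1:], start=1):
    index_to_module[row + OFFSET] = module


def sources(row):
    """Names of the layers a row reads from, ignoring -1 (previous layer)."""
    f, _ = ROWS[row]
    refs = [f] if isinstance(f, int) else f
    return [index_to_module[r] for r in refs if r != -1]


for row, (f, module) in enumerate(ROWS):
    if sources(row):
        print(f"row {row:>2} {module:<8} reads {sources(row)}")
```

The output confirms that the two backbone Concat rows pick up P4 and P3, and that Detect reads the three C3k2 outputs at indices 12, 15 and 18.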


5.2 Training script

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO(r'replace with the path to the yaml file from section 5.1')
    # model.load('yolov8n.pt')  # loading pretrain weights
    model.train(data=r'replace with the path to your dataset yaml file',
                # for other tasks, open 'ultralytics/cfg/default.yaml' and change task; it can be detect, segment, classify or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=4,
                close_mosaic=10,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='',  # to resume training, set this to the path of last.pt
                amp=False,  # if the training loss becomes NaN, turning amp off can help
                project='runs/train',
                name='exp',
                )


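VI. Successful Run Record

Below is a screenshot of a successful run. One epoch of training had finished; the picture is too large to also capture the second epoch. After these changes the printed model summary comes out slightly wrong, but this does not affect any functionality. I will fix it later when I find the time.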


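VII. Summary

That is the end of this post. I recommend my column on effective YOLOv11 improvements. The column is newly opened and currently has an average quality score of 98. I will keep reproducing papers from the latest top conferences and will also extend some of the older improvement mechanisms. The column is free to read for now, so follow early. If this post helped you, subscribe to the column and follow it for further updates.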