
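YOLOv11 Improvements, Backbone Series: EMO, a Lightweight CNN Architecture Built from Inverted Residual Blocks for Object Detection (Supports Lightweighting the Whole YOLOv11 Family)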

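1. Introduction

This article presents the inverted-residual-block network EMO as a replacement backbone. I covered its building block, iRMB, in an earlier post, together with a second-round modification of it (iEMA). EMO is the network assembled from iRMB blocks, so the modified iEMA block can also be dropped into this network for a further round of modification. The backbone itself is a lightweight CNN architecture. This article supports scaling across the whole YOLOv11 family, that is, the five versions n, s, m, l, and x.

Before we start, a word about my column: it is updated with 3-10 articles on recent mechanisms every week, including second-round modifications not published elsewhere and fused improvements. It also includes my integrated YOLOv11 repository files (all of my improvement mechanisms are already registered in them and run as-is), a discussion group, and video walkthroughs. The content of this article is my own original work; plagiarism will be pursued.

You are welcome to subscribe to my column and learn YOLO together!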



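2. How the EMO Model Works

Paper: official paper link

Code: official code link


Efficient MOdel (EMO) is built on the Inverted Residual Block (IRB), the basic building block of lightweight CNNs, and folds in the effective components of the Transformer. Combining the two gives EMO a unified view of lightweight model design that joins CNNs with attention. The authors report strong results on ImageNet-1K, COCO2017, and ADE20K, with a good balance between efficiency and accuracy for a lightweight design.

The basic principles of EMO come down to the following points:

1. Applying the Inverted Residual Block (IRB): the IRB is the basic building block of lightweight CNNs, and EMO extends it to attention-based models.

2. Abstracting the Meta-Mobile Block (MMB): EMO proposes a new lightweight design, the single-residual Meta-Mobile Block, abstracted from the IRB and the effective components of the Transformer.

3. Building the modern Inverted Residual Mobile Block (iRMB): from simple but effective design criteria, EMO derives the iRMB and uses it to build an efficient, ResNet-like model (EMO).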
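
To make the third point concrete, here is a rough summary of what the iRMB forward pass in the section 3 code computes when attention and the skip connection are enabled and `attn_pre=True`. The symbols are my own shorthand rather than the paper's notation, and dropout, DropPath, and activations are left out.

```latex
\begin{aligned}
\hat{X} &= \mathrm{Norm}(X) \\
[Q, K] &= \mathrm{Conv}_{1\times 1}(\hat{X}), \qquad
A = \mathrm{softmax}\!\left(\frac{Q K^{\top}}{\sqrt{d}}\right) \\
X_e &= \mathrm{Conv}^{V}_{1\times 1}\!\left(A\,\hat{X}\right) \\
X_f &= X_e + \mathrm{SE}\!\left(\mathrm{DWConv}(X_e)\right) \\
Y &= X + \mathrm{Proj}_{1\times 1}(X_f)
\end{aligned}
```

Here $d$ is `dim_head`, the attention weights $A$ are computed per window and per head, and $\mathrm{Conv}^{V}_{1\times 1}$ expands the channels from $C$ to $\lambda C$ with $\lambda$ = `exp_ratio`. With `attn_pre=False` the code applies the value projection first and the attention weights second.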

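The architecture figure shows the structure of the EMO model in detail:

The left side shows the abstract, unified Meta-Mobile Block, which merges Multi-Head Self-Attention, the Feed-Forward Network, and the Inverted Residual Block. This composite module is made concrete by choosing different expansion ratios and efficient operators.

The right side shows the ResNet-like EMO architecture, built entirely from the derived iRMB. The figure highlights the combination of micro-operations inside EMO (such as depthwise-separable convolution and window Transformer) and the network stages at different scales, which serve classification (CLS), detection (Det), and segmentation (Seg) tasks. The design stresses EMO's flexibility and efficiency across different downstream tasks.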
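
The window Transformer mentioned above needs feature maps whose sides are multiples of the window size. The sketch below replays only the padding arithmetic used in `iRMB.forward` in section 3; the function name `window_padding` is hypothetical, and the 40x40 and 20x20 sizes assume a 640x640 input reaching the two attention stages.

```python
def window_padding(height: int, width: int, window_size: int = 7):
    """Return (pad_bottom, pad_right, n_h, n_w) for window attention.

    Mirrors the arithmetic in iRMB.forward: the feature map is padded on the
    bottom and right up to the next multiple of the window size.
    """
    if window_size <= 0:
        # A non-positive size means a single window covering the whole map.
        win_h, win_w = height, width
    else:
        win_h = win_w = window_size
    pad_b = (win_h - height % win_h) % win_h
    pad_r = (win_w - width % win_w) % win_w
    n_h = (height + pad_b) // win_h
    n_w = (width + pad_r) // win_w
    return pad_b, pad_r, n_h, n_w


if __name__ == "__main__":
    # Assumed feature-map sizes of the two attention stages for a 640x640 input.
    for size in (40, 20):
        print(size, window_padding(size, size))
```

The padding is cropped away again after attention, so the output keeps the input's spatial size.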


3. Core Code of EMO

The core code of EMO is given below; see section 4 for how to use it.

import math
from functools import partial

import torch
import torch.nn as nn
import torch.nn.functional as F
from einops import rearrange, reduce
from timm.models.layers import DropPath, trunc_normal_

inplace = True

__all__ = ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']


class SELayerV2(nn.Module):
    """Squeeze-and-excitation with four parallel FC branches."""

    def __init__(self, in_channel, reduction=1):
        super(SELayerV2, self).__init__()
        assert in_channel >= reduction and in_channel % reduction == 0, 'invalid in_channel in SaElayer'
        self.reduction = reduction
        self.cardinality = 4
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        # cardinality 1
        self.fc1 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 2
        self.fc2 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 3
        self.fc3 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        # cardinality 4
        self.fc4 = nn.Sequential(
            nn.Linear(in_channel, in_channel // self.reduction, bias=False),
            nn.ReLU(inplace=True)
        )
        self.fc = nn.Sequential(
            nn.Linear(in_channel // self.reduction * self.cardinality, in_channel, bias=False),
            nn.Sigmoid()
        )

    def forward(self, x):
        b, c, _, _ = x.size()
        y = self.avg_pool(x).view(b, c)
        y1 = self.fc1(y)
        y2 = self.fc2(y)
        y3 = self.fc3(y)
        y4 = self.fc4(y)
        y_concate = torch.cat([y1, y2, y3, y4], dim=1)
        y_ex_dim = self.fc(y_concate).view(b, c, 1, 1)
        return x * y_ex_dim.expand_as(x)


def get_act(act_layer='relu'):
    act_dict = {
        'none': nn.Identity,
        'relu': nn.ReLU,
        'relu6': nn.ReLU6,
        'silu': nn.SiLU,
        'gelu': nn.GELU
    }
    return act_dict[act_layer]


class LayerNorm2d(nn.Module):
    """LayerNorm over the channel axis of a (B, C, H, W) tensor."""

    def __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):
        super().__init__()
        self.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)

    def forward(self, x):
        x = rearrange(x, 'b c h w -> b h w c').contiguous()
        x = self.norm(x)
        x = rearrange(x, 'b h w c -> b c h w').contiguous()
        return x


def get_norm(norm_layer='in_1d'):
    eps = 1e-6
    norm_dict = {
        'none': nn.Identity,
        'in_1d': partial(nn.InstanceNorm1d, eps=eps),
        'in_2d': partial(nn.InstanceNorm2d, eps=eps),
        'in_3d': partial(nn.InstanceNorm3d, eps=eps),
        'bn_1d': partial(nn.BatchNorm1d, eps=eps),
        'bn_2d': partial(nn.BatchNorm2d, eps=eps),
        # 'bn_2d': partial(nn.SyncBatchNorm, eps=eps),
        'bn_3d': partial(nn.BatchNorm3d, eps=eps),
        'gn': partial(nn.GroupNorm, eps=eps),
        'ln_1d': partial(nn.LayerNorm, eps=eps),
        'ln_2d': partial(LayerNorm2d, eps=eps),
    }
    return norm_dict[norm_layer]


class LayerScale(nn.Module):
    def __init__(self, dim, init_values=1e-5, inplace=True):
        super().__init__()
        self.inplace = inplace
        self.gamma = nn.Parameter(init_values * torch.ones(1, 1, dim))

    def forward(self, x):
        return x.mul_(self.gamma) if self.inplace else x * self.gamma


class LayerScale2D(nn.Module):
    def __init__(self, dim, init_values=1e-5, inplace=True):
        super().__init__()
        self.inplace = inplace
        self.gamma = nn.Parameter(init_values * torch.ones(1, dim, 1, 1))

    def forward(self, x):
        return x.mul_(self.gamma) if self.inplace else x * self.gamma


class ConvNormAct(nn.Module):
    def __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,
                 skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):
        super(ConvNormAct, self).__init__()
        self.has_skip = skip and dim_in == dim_out
        padding = math.ceil((kernel_size - stride) / 2)
        self.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)
        self.norm = get_norm(norm_layer)(dim_out)
        # Note: act_layer and inplace are accepted but ignored here; GELU is always applied.
        self.act = nn.GELU()
        self.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.conv(x)
        x = self.norm(x)
        x = self.act(x)
        if self.has_skip:
            x = self.drop_path(x) + shortcut
        return x


# ========== Multi-Scale Populations, for down-sampling and inductive bias ==========
class MSPatchEmb(nn.Module):
    def __init__(self, dim_in, emb_dim, kernel_size=2, c_group=-1, stride=1, dilations=[1, 2, 3],
                 norm_layer='bn_2d', act_layer='silu'):
        super().__init__()
        self.dilation_num = len(dilations)
        # Resolve the default group count first, then validate it.
        c_group = math.gcd(dim_in, emb_dim) if c_group == -1 else c_group
        assert dim_in % c_group == 0
        self.convs = nn.ModuleList()
        for i in range(len(dilations)):
            padding = math.ceil(((kernel_size - 1) * dilations[i] + 1 - stride) / 2)
            self.convs.append(nn.Sequential(
                nn.Conv2d(dim_in, emb_dim, kernel_size, stride, padding, dilations[i], groups=c_group),
                get_norm(norm_layer)(emb_dim),
                # Activations take no channel argument, so construct them without one.
                get_act(act_layer)()))

    def forward(self, x):
        if self.dilation_num == 1:
            x = self.convs[0](x)
        else:
            # Average the outputs of the parallel dilated convolutions.
            x = torch.cat([self.convs[i](x).unsqueeze(dim=-1) for i in range(self.dilation_num)], dim=-1)
            x = reduce(x, 'b c h w n -> b c h w', 'mean').contiguous()
        return x


class iRMB(nn.Module):
    def __init__(self, dim_in, dim_out, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',
                 act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=64, window_size=7,
                 attn_s=True, qkv_bias=False, attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False):
        super().__init__()
        self.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()
        dim_mid = int(dim_in * exp_ratio)
        self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
        self.attn_s = attn_s
        if self.attn_s:
            assert dim_in % dim_head == 0, 'dim should be divisible by num_heads'
            self.dim_head = dim_head
            self.window_size = window_size
            self.num_head = dim_in // dim_head
            self.scale = self.dim_head ** -0.5
            self.attn_pre = attn_pre
            self.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none',
                                  act_layer='none')
            self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias=qkv_bias,
                                 norm_layer='none', act_layer=act_layer, inplace=inplace)
            self.attn_drop = nn.Dropout(attn_drop)
        else:
            if v_proj:
                self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none',
                                     act_layer=act_layer, inplace=inplace)
            else:
                self.v = nn.Identity()
        self.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation,
                                      groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)
        # se_ratio is accepted but not used: SELayerV2 is always applied.
        self.se = SELayerV2(dim_mid)
        self.proj_drop = nn.Dropout(drop)
        self.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)
        self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()

    def forward(self, x):
        shortcut = x
        x = self.norm(x)
        B, C, H, W = x.shape
        if self.attn_s:
            # padding
            if self.window_size <= 0:
                window_size_W, window_size_H = W, H
            else:
                window_size_W, window_size_H = self.window_size, self.window_size
            pad_l, pad_t = 0, 0
            pad_r = (window_size_W - W % window_size_W) % window_size_W
            pad_b = (window_size_H - H % window_size_H) % window_size_H
            x = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))
            n1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W
            x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()
            # attention
            b, c, h, w = x.shape
            qk = self.qk(x)
            qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head,
                           dim_head=self.dim_head).contiguous()
            q, k = qk[0], qk[1]
            attn_spa = (q @ k.transpose(-2, -1)) * self.scale
            attn_spa = attn_spa.softmax(dim=-1)
            attn_spa = self.attn_drop(attn_spa)
            if self.attn_pre:
                # Apply the attention weights first, then the value projection.
                x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ x
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                  w=w).contiguous()
                x_spa = self.v(x_spa)
            else:
                # Apply the value projection first, then the attention weights.
                v = self.v(x)
                v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
                x_spa = attn_spa @ v
                x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
                                  w=w).contiguous()
            # unpadding
            x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()
            if pad_r > 0 or pad_b > 0:
                x = x[:, :, :H, :W].contiguous()
        else:
            x = self.v(x)
        x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))
        x = self.proj_drop(x)
        x = self.proj(x)
        x = (shortcut + self.drop_path(x)) if self.has_skip else x
        return x


class EMO(nn.Module):
    def __init__(self, dim_in=3, factor=1,
                 depths=[1, 2, 4, 2], stem_dim=16, embed_dims=[64, 128, 256, 512], exp_ratios=[4., 4., 4., 4.],
                 norm_layers=['bn_2d', 'bn_2d', 'bn_2d', 'bn_2d'], act_layers=['relu', 'relu', 'relu', 'relu'],
                 dw_kss=[3, 3, 5, 5], se_ratios=[0.0, 0.0, 0.0, 0.0], dim_heads=[32, 32, 32, 32],
                 window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True], qkv_bias=True,
                 attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False, pre_dim=0):
        super().__init__()
        # Width scaling factor (for example 1.5 to widen the network).
        scale_factor = factor
        # exp_ratios are not scaled.
        # Scaled embed_dims: each entry is multiplied by scale_factor and truncated to int.
        embed_dims = [int(dim * scale_factor) for dim in embed_dims]
        dprs = [x.item() for x in torch.linspace(0, drop_path, sum(depths))]
        self.stage0 = nn.ModuleList([
            MSPatchEmb(  # down to 112
                dim_in, stem_dim, kernel_size=dw_kss[0], c_group=1, stride=2, dilations=[1],
                norm_layer=norm_layers[0], act_layer='none'),
            iRMB(  # ds
                stem_dim, stem_dim, norm_in=False, has_skip=False, exp_ratio=1,
                norm_layer=norm_layers[0], act_layer=act_layers[0], v_proj=False, dw_ks=dw_kss[0],
                stride=1, dilation=1, se_ratio=1,
                dim_head=dim_heads[0], window_size=window_sizes[0], attn_s=False,
                qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=0.,
                attn_pre=attn_pre
            )
        ])
        emb_dim_pre = stem_dim
        for i in range(len(depths)):
            layers = []
            dpr = dprs[sum(depths[:i]):sum(depths[:i + 1])]
            for j in range(depths[i]):
                if j == 0:
                    # The first block of every stage downsamples and never uses attention.
                    stride, has_skip, attn_s, exp_ratio = 2, False, False, exp_ratios[i] * 2
                else:
                    stride, has_skip, attn_s, exp_ratio = 1, True, attn_ss[i], exp_ratios[i]
                layers.append(iRMB(
                    emb_dim_pre, embed_dims[i], norm_in=True, has_skip=has_skip, exp_ratio=exp_ratio,
                    norm_layer=norm_layers[i], act_layer=act_layers[i], v_proj=True, dw_ks=dw_kss[i],
                    stride=stride, dilation=1, se_ratio=se_ratios[i],
                    dim_head=dim_heads[i], window_size=window_sizes[i], attn_s=attn_s,
                    qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=dpr[j], v_group=v_group,
                    attn_pre=attn_pre
                ))
                emb_dim_pre = embed_dims[i]
            self.__setattr__(f'stage{i + 1}', nn.ModuleList(layers))
        self.norm = get_norm(norm_layers[-1])(embed_dims[-1])
        if pre_dim > 0:
            self.pre_head = nn.Sequential(nn.Linear(embed_dims[-1], pre_dim), get_act(act_layers[-1])(inplace=inplace))
            self.pre_dim = pre_dim
        else:
            self.pre_head = nn.Identity()
            self.pre_dim = embed_dims[-1]
        self.apply(self._init_weights)
        # One dummy forward pass records the channel count of each returned feature map;
        # parse_model reads this list in step 4.5.
        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, dim_in, 640, 640))]

    def _init_weights(self, m):
        if isinstance(m, nn.Linear):
            trunc_normal_(m.weight, std=.02)
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.LayerNorm, nn.GroupNorm,
                            nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
                            nn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d)):
            nn.init.zeros_(m.bias)
            nn.init.ones_(m.weight)

    @torch.jit.ignore
    def no_weight_decay(self):
        return {'token'}

    @torch.jit.ignore
    def no_weight_decay_keywords(self):
        return {'alpha', 'gamma', 'beta'}

    @torch.jit.ignore
    def no_ft_keywords(self):
        # return {'head.weight', 'head.bias'}
        return {}

    @torch.jit.ignore
    def ft_head_keywords(self):
        return {'head.weight', 'head.bias'}, self.num_classes

    def get_classifier(self):
        return self.head

    def reset_classifier(self, num_classes):
        self.num_classes = num_classes
        self.head = nn.Linear(self.pre_dim, num_classes) if num_classes > 0 else nn.Identity()

    def check_bn(self):
        for name, m in self.named_modules():
            if isinstance(m, nn.modules.batchnorm._NormBase):
                m.running_mean = torch.nan_to_num(m.running_mean, nan=0, posinf=1, neginf=-1)
                m.running_var = torch.nan_to_num(m.running_var, nan=0, posinf=1, neginf=-1)

    def forward(self, x):
        # Keep the last feature map seen at each spatial size; dict order follows network depth.
        unique_tensors = {}
        for stage in (self.stage0, self.stage1, self.stage2, self.stage3, self.stage4):
            for blk in stage:
                x = blk(x)
            unique_tensors[(x.shape[2], x.shape[3])] = x
        # Return the four deepest feature maps as a list.
        result_list = list(unique_tensors.values())[-4:]
        return result_list


def EMO_1M(factor=1):
    model = EMO(
        factor=factor,
        depths=[2, 2, 8, 3], stem_dim=24, embed_dims=[32, 48, 80, 168], exp_ratios=[2., 2.5, 3.0, 3.5],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 21], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.04036, v_group=False, attn_pre=True, pre_dim=0)
    return model


def EMO_2M(factor=1):
    model = EMO(
        factor=factor,
        depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[32, 48, 120, 200], exp_ratios=[2., 2.5, 3.0, 3.5],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 20], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
    return model


def EMO_5M(factor=1):
    model = EMO(
        factor=factor,
        depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 288], exp_ratios=[2., 3., 4., 4.],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[24, 24, 32, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
    return model


def EMO_6M(factor=1):
    model = EMO(
        factor=factor,
        depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 320], exp_ratios=[2., 3., 4., 5.],
        norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
        dw_kss=[3, 3, 5, 5], dim_heads=[16, 24, 20, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
        qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
    return model


if __name__ == "__main__":
    # Generate a sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)
    # Model
    model = EMO_6M()
    out = model(image)
    print(len(out))
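
One thing worth checking before you pick a `factor`: the attention blocks assert `dim_in % dim_head == 0`, and `embed_dims` are scaled while `dim_heads` are not. The sketch below is a pure-Python check of just that one assertion, with the settings copied from the four factory functions above. The helper names `scaled_dims` and `factor_is_valid` are hypothetical, and passing this check does not guarantee that a configuration trains well.

```python
# Settings copied from the four factory functions in the core code.
VARIANTS = {
    "EMO_1M": {"embed_dims": [32, 48, 80, 168], "dim_heads": [16, 16, 20, 21]},
    "EMO_2M": {"embed_dims": [32, 48, 120, 200], "dim_heads": [16, 16, 20, 20]},
    "EMO_5M": {"embed_dims": [48, 72, 160, 288], "dim_heads": [24, 24, 32, 32]},
    "EMO_6M": {"embed_dims": [48, 72, 160, 320], "dim_heads": [16, 24, 20, 32]},
}
ATTN_STAGES = [False, False, True, True]  # attn_ss, identical in every factory


def scaled_dims(name, factor):
    """Apply the same int(dim * factor) scaling as EMO.__init__."""
    return [int(dim * factor) for dim in VARIANTS[name]["embed_dims"]]


def factor_is_valid(name, factor):
    """True if every attention stage keeps a width divisible by its dim_head."""
    dims = scaled_dims(name, factor)
    heads = VARIANTS[name]["dim_heads"]
    return all(
        dim > 0 and dim % head == 0
        for dim, head, uses_attn in zip(dims, heads, ATTN_STAGES)
        if uses_attn
    )


if __name__ == "__main__":
    for name in VARIANTS:
        for factor in (0.25, 0.5, 1.0, 1.5):
            print(name, factor, scaled_dims(name, factor), factor_is_valid(name, factor))
```

Under this check, `EMO_1M` passes at a factor of 0.25 while the other three variants do not, so the factor has to be chosen per variant.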

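4. Adding EMO Step by Step

4.1 Modification 1

First, create the file. Under the ultralytics/nn/modules folder, create a directory named 'Addmodules' (if you use the files from my group it already exists, so there is no need to create it). Then create a new .py file inside it and paste the core code into it.


4.2 Modification 2

Second, create a new file named '__init__.py' in that directory (again, it already exists if you use the group files), and import our module inside it, as shown in the figure.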
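
Since the figure is not reproduced here, the following toy script sketches how such an `__init__.py` re-exports the names listed in `__all__`. Everything in it is an assumption for illustration: the module file name `EMO.py`, the stand-in function body, and the temporary package it builds. It does not touch a real ultralytics installation.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Stand-in for the real EMO.py: only names listed in __all__ are re-exported.
FAKE_EMO = """
__all__ = ['EMO_1M']

def EMO_1M(factor=1):
    return f'EMO_1M(factor={factor})'

def _private_helper():
    return 'not exported'
"""


def build_demo_package(root: Path) -> None:
    """Create Addmodules/EMO.py and Addmodules/__init__.py under root."""
    pkg = root / "Addmodules"
    pkg.mkdir(parents=True)
    (pkg / "EMO.py").write_text(FAKE_EMO)
    # This single line is the kind of import the real __init__.py would hold.
    (pkg / "__init__.py").write_text("from .EMO import *\n")


def load_demo_package():
    """Build the toy package in a temporary directory and import it."""
    with tempfile.TemporaryDirectory() as tmp:
        build_demo_package(Path(tmp))
        sys.path.insert(0, tmp)
        try:
            return importlib.import_module("Addmodules")
        finally:
            sys.path.remove(tmp)


if __name__ == "__main__":
    pkg = load_demo_package()
    print(pkg.EMO_1M(0.25))
    print(hasattr(pkg, "_private_helper"))
```

In the real project, use whatever file name you chose in step 4.1 in place of `EMO`.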


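4.3 Modification 3

Third, open 'ultralytics/nn/tasks.py' and import and register our module there (if you use the group files this is already done; go straight to step 4).

From now on all my tutorials follow this layout, because I assume you are modifying the files from my group.


4.4 Modification 4

Add the two lines of code shown in the figure.


4.5 Modification 5

Go to roughly line 700 (see the figure for the exact position) and add the part in the red box, following the figure. Note that the entries are function names only, without ().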

elif m in {EMO_1M, EMO_2M, EMO_5M, EMO_6M}:  # add your own backbone functions here; the lines below stay the same
    m = m(*args)
    c2 = m.width_list  # list of output channels
    backbone = True


4.6 Modification 6

Both red boxes below need to be changed.

if isinstance(c2, list):
    # Backbone: m was already instantiated in step 4.5, so reuse it
    # (t is expected to have been set earlier in the loop).
    m_ = m
    m_.backbone = True
else:
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
    t = str(m)[8:-2].replace('__main__.', '')  # module type
m.np = sum(x.numel() for x in m_.parameters())  # number params
m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type


4.7 Modification 7

The following also needs to be changed; follow my version exactly.

The code is below; replace the original code with it.

if verbose:
    LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<45}{str(args):<30}')  # print
save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
layers.append(m_)
if i == 0:
    ch = []
if isinstance(c2, list):
    ch.extend(c2)
    if len(c2) != 5:
        ch.insert(0, 0)
else:
    ch.append(c2)
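
To see what the channel bookkeeping above does, here is a small stand-alone replay of it. The function name `update_channels` is hypothetical, the widths `[8, 12, 20, 42]` are what `EMO_1M` would report at a factor of 0.25, and the value 256 for the next layer is only illustrative.

```python
def update_channels(ch, c2, i):
    """Replay the channel bookkeeping that step 4.7 adds to parse_model."""
    if i == 0:
        ch = []  # layer 0 resets the list
    if isinstance(c2, list):
        # A backbone reports one width per returned feature map.
        ch.extend(c2)
        if len(c2) != 5:
            # Placeholder so the indices line up with a five-output backbone.
            ch.insert(0, 0)
    else:
        # Ordinary layers report a single int.
        ch.append(c2)
    return ch


if __name__ == "__main__":
    ch = update_channels([3], [8, 12, 20, 42], i=0)
    ch = update_channels(ch, 256, i=1)
    print(ch)
```

With the placeholder at index 0, `ch[2]` and `ch[3]` hold the widths of the two feature maps that the yaml in section 5 concatenates as P3 and P4.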


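4.8 Modification 8

Modification 8 differs from the earlier ones: it changes part of the forward pass, so we have now left the parse_model method.

You can read the line numbers off the figure; we are still in the same file, tasks.py. Several forward functions in this file look very similar, so be careful not to pick the wrong one: the one to change is around line 70. I provide the code below so you can copy and paste it directly, and I will record a video on this step when I have time.

The code is as follows: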

def _predict_once(self, x, profile=False, visualize=False, embed=None):
    """
    Perform a forward pass through the network.

    Args:
        x (torch.Tensor): The input tensor to the model.
        profile (bool): Print the computation time of each layer if True, defaults to False.
        visualize (bool): Save the feature maps of the model if True, defaults to False.
        embed (list, optional): A list of feature vectors/embeddings to return.

    Returns:
        (torch.Tensor): The last output of the model.
    """
    y, dt, embeddings = [], [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
        if profile:
            self._profile_one_layer(m, x, dt)
        if hasattr(m, 'backbone'):
            x = m(x)
            if len(x) != 5:  # 0 - 5
                x.insert(0, None)
            for index, i in enumerate(x):
                if index in self.save:
                    y.append(i)
                else:
                    y.append(None)
            x = x[-1]  # pass the last output on to the next layer
        else:
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
        if visualize:
            feature_visualization(x, m.type, m.i, save_dir=visualize)
        if embed and m.i in embed:
            embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
            if m.i == max(embed):
                return torch.unbind(torch.cat(embeddings, 1), dim=0)
    return x
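
The backbone branch is the only new part of this function. The sketch below isolates it with strings standing in for tensors; the helper name `collect_backbone_outputs` and the `save` set are assumptions for illustration.

```python
def collect_backbone_outputs(features, save):
    """Mimic the backbone branch of _predict_once for a list of feature maps."""
    features = list(features)  # copy so the caller's list is untouched
    if len(features) != 5:
        features.insert(0, None)  # pad to five slots, indices 0-4
    # Keep only the outputs that later layers refer to; drop the rest.
    y = [feat if index in save else None for index, feat in enumerate(features)]
    return y, features[-1]  # the last feature map feeds the next layer


if __name__ == "__main__":
    y, x = collect_backbone_outputs(["P2", "P3", "P4", "P5"], save={2, 3, 6, 9, 12, 15, 18})
    print(y)
    print(x)
```

Because the backbone fills five slots of `y` at once, every later layer has to be numbered four higher, which is what the `i + 4` in steps 4.6 and 4.7 takes care of.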

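That completes the modifications. There are many details here, so take care not to replace more code than necessary, and do not skip any step. Either mistake makes the run fail, and the resulting errors are very hard to track down.


Note: an extra modification

Those who follow me know that most of my modifications use the same steps. This network needs one extra step, which concerns the parameter s: change the s shown below to 640 and it runs correctly.


Fixing the missing GFLOPs printout

Open 'ultralytics/utils/torch_utils.py' and modify it as shown in the figure; otherwise the computation cost (GFLOPs) often fails to print.


Notes

If validation fails with a shape-mismatch error, you can fix the image size of the validation set as follows.

Open ultralytics/models/yolo/detect/train.py. In the build_dataset function of the DetectionTrainer class, change the argument rect=mode == 'val' to rect=False.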


5. The EMO yaml File

5.1 The EMO yaml file

Training info: YOLO11-EMO summary: 860 layers, 2,423,567 parameters, 2,423,551 gradients, 6.5 GFLOPs

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# Available variants: ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']
# The args value follows YOLO's per-scale channel (width) scaling; the variants themselves are the sizes the official model provides.

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, EMO_1M, [0.25]] # 0-4 P1/2 This single entry yields four feature layers, so do not let the yaml layout limit your thinking; see the group videos if you need help drawing the diagram.
  # The value in args is the channel scaling factor and should match the width entry in scales above: 0.25 for yolov11n, 0.5 for yolov11s.
  - [-1, 1, SPPF, [1024, 5]] # 5
  - [-1, 2, C2PSA, [1024]] # 6

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 3], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 9

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 12 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 15 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 6], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 18 (P5/32-large)

  - [[12, 15, 18], 1, Detect, [nc]] # Detect(P3, P4, P5)
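
The index comments in this yaml (5, 6, 9, 12, ...) only make sense once the four-slot shift from the backbone is applied. The sketch below recomputes them; the function `number_layers` is hypothetical and assumes the `backbone` flag stays set after the first layer, as the `i + 4 if backbone else i` expression in section 4 suggests.

```python
# Module names in the order they appear in the yaml (backbone, then head).
YAML_MODULES = [
    "EMO_1M", "SPPF", "C2PSA",
    "Upsample", "Concat", "C3k2",
    "Upsample", "Concat", "C3k2",
    "Conv", "Concat", "C3k2",
    "Conv", "Concat", "C3k2",
    "Detect",
]


def number_layers(modules, shift=4):
    """Map each yaml entry to the layer index it gets after the backbone shift."""
    return {position + shift: name for position, name in enumerate(modules)}


if __name__ == "__main__":
    for index, name in number_layers(YAML_MODULES).items():
        print(index, name)
```

The backbone entry itself is stored under index 4 while its outputs occupy slots 0-4, so the `from` values 2 and 3 in the head refer to backbone feature maps and not to separate layers.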


5.2 Training script

You can copy my run script and run it.

import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    # Path to the yaml from section 5.1. The file name here is inferred from the
    # "YOLO11-EMO" training summary above; change it to match your own file.
    model = YOLO('yolo11-EMO.yaml')
    # To switch the model scale, change the yaml name above, e.g. yolo11s.yaml uses the s scale.
    # Likewise, for a modified yaml named yolo11-XXX.yaml, pass yolo11l-XXX.yaml to use the l scale
    # (change only the name passed to YOLO() above, not the name of the config file on disk).
    # model.load('yolo11n.pt')  # optionally load pretrained weights; for research I advise against it, otherwise it is hard to show an accuracy gain
    model.train(data=r"C:\Users\Administrator\PycharmProjects\yolov5-master\yolov5-master\Construction Site Safety.v30-raw-images_latestversion.yolov8\data.yaml",
                # For other tasks, change task in 'ultralytics/cfg/default.yaml' to detect, segment, classify, or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=16,
                close_mosaic=0,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='runs/train/exp21/weights/last.pt',  # to resume training, point this at last.pt
                amp=True,  # turn amp off if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )


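6. Successful Run Record

Below is a screenshot of a successful run. One epoch of training had finished; the image was too large to capture the second epoch as well.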


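7. Summary

That concludes this article. I recommend my YOLOv11 improvement column here: it is newly opened, with an average quality score of 98. Going forward I will reproduce papers from the latest top conferences and also add to some of the older improvement mechanisms. If this article helped you, subscribe to the column and follow along for further updates.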