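1. Introduction
This post covers EMO, a network built from inverted residual blocks. I have already written about its building block, iRMB, and about a modified version of it (iEMA). Since EMO is made up entirely of iRMB blocks, the modified iEMA block can also be dropped into this network as a further variation. The backbone presented here is a lightweight CNN architecture. Before we start, a quick plug for my column: it is updated with 3-10 posts a week on recent mechanisms, including original variations and fusion improvements, and subscribers get my integrated YOLOv11 repository (every improvement already registered and ready to run), a discussion group, and video walkthroughs. This post supports scaling across the full YOLOv11 family, i.e. the n, s, m, l, and x variants. The content is my own original work; please do not copy it.
You are welcome to subscribe to my column and learn YOLO together!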
2. How the EMO Model Works
Paper: official paper link
Code: official code link
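Efficient MOdel (EMO) is built on the Inverted Residual Block (IRB), the basic building block of lightweight CNNs, and folds in the effective components of the Transformer. This gives a unified view of lightweight model design that combines convolution with attention. The paper reports strong results on ImageNet-1K, COCO2017, and ADE20K, with a good balance between efficiency and accuracy for a lightweight model.
The core ideas of EMO are:
1. Inverted Residual Block (IRB): the IRB is the basic building block of lightweight CNNs, and EMO extends it to attention-based models.
2. Meta-Mobile Block (MMB): EMO proposes a single-residual Meta-Mobile Block, abstracted from the IRB and from the effective components of the Transformer, as a general template for lightweight design.
3. Modern Inverted Residual Mobile Block (iRMB): following simple but effective design criteria, EMO derives the iRMB and stacks it into a ResNet-like efficient model (EMO).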
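As a rough illustration of why the expand, depthwise, project layout of an inverted residual block stays light, the sketch below counts only the convolution weights of such a block and compares them with one dense k x k convolution of the same width. The function names are made up for this example, and bias, normalization, SE, and attention parameters are deliberately ignored, so the numbers are not the parameter counts of the real iRMB.

```python
def irb_conv_params(dim_in: int, dim_out: int, exp_ratio: float, dw_ks: int) -> int:
    """Weights in the three convolutions of an inverted residual block (no bias, norm, or SE)."""
    dim_mid = int(dim_in * exp_ratio)        # expanded width, computed the same way as in iRMB
    expand = dim_in * dim_mid                # 1x1 pointwise expansion
    depthwise = dim_mid * dw_ks * dw_ks      # one k x k filter per expanded channel
    project = dim_mid * dim_out              # 1x1 pointwise projection back down
    return expand + depthwise + project


def dense_conv_params(dim_in: int, dim_out: int, ks: int) -> int:
    """Weights in one ordinary k x k convolution (no bias)."""
    return dim_in * dim_out * ks * ks


if __name__ == "__main__":
    # 80 channels, expansion 3.0, 5x5 depthwise kernel (values chosen for illustration)
    print(irb_conv_params(80, 80, exp_ratio=3.0, dw_ks=5))
    print(dense_conv_params(80, 80, ks=5))
```

With these illustrative values the three-convolution block needs 44,400 weights against 160,000 for the dense 5x5 convolution, even though its middle layer is three times wider.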
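The figure below shows the structure of the EMO model in detail:
Left: an abstract, unified Meta-Mobile Block that merges Multi-Head Self-Attention, the Feed-Forward Network, and the Inverted Residual Block. It is instantiated into concrete modules by choosing different expansion ratios and efficient operators.
Right: a ResNet-like EMO architecture built entirely from the derived iRMB. The figure highlights the micro-operation composition inside EMO (depthwise separable convolution, window Transformer, and so on) and the multi-scale stages that serve classification (CLS), detection (Det), and segmentation (Seg). This layout is what lets a single backbone serve different downstream tasks.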
3. Core Code of EMO
The core code of EMO is given below; see Section 4 for how to use it.
- from timm.models.layers import trunc_normal_
- import math
- import torch
- import torch.nn as nn
- import torch.nn.functional as F
- from functools import partial
- from einops import rearrange, reduce
- from timm.models.layers import DropPath
- inplace = True
- __all__ = ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']
- class SELayerV2(nn.Module):
- def __init__(self, in_channel, reduction=1):
- super(SELayerV2, self).__init__()
- assert in_channel >= reduction and in_channel % reduction == 0, 'invalid in_channel in SaElayer'
- self.reduction = reduction
- self.cardinality = 4
- self.avg_pool = nn.AdaptiveAvgPool2d(1)
- # cardinality 1
- self.fc1 = nn.Sequential(
- nn.Linear(in_channel, in_channel // self.reduction, bias=False),
- nn.ReLU(inplace=True)
- )
- # cardinality 2
- self.fc2 = nn.Sequential(
- nn.Linear(in_channel, in_channel // self.reduction, bias=False),
- nn.ReLU(inplace=True)
- )
- # cardinality 3
- self.fc3 = nn.Sequential(
- nn.Linear(in_channel, in_channel // self.reduction, bias=False),
- nn.ReLU(inplace=True)
- )
- # cardinality 4
- self.fc4 = nn.Sequential(
- nn.Linear(in_channel, in_channel // self.reduction, bias=False),
- nn.ReLU(inplace=True)
- )
- self.fc = nn.Sequential(
- nn.Linear(in_channel // self.reduction * self.cardinality, in_channel, bias=False),
- nn.Sigmoid()
- )
- def forward(self, x):
- b, c, _, _ = x.size()
- y = self.avg_pool(x).view(b, c)
- y1 = self.fc1(y)
- y2 = self.fc2(y)
- y3 = self.fc3(y)
- y4 = self.fc4(y)
- y_concate = torch.cat([y1, y2, y3, y4], dim=1)
- y_ex_dim = self.fc(y_concate).view(b, c, 1, 1)
- return x * y_ex_dim.expand_as(x)
- def get_act(act_layer='relu'):
- act_dict = {
- 'none': nn.Identity,
- 'relu': nn.ReLU,
- 'relu6': nn.ReLU6,
- 'silu': nn.SiLU,
- 'gelu': nn.GELU
- }
- return act_dict[act_layer]
- class LayerNorm2d(nn.Module):
- def __init__(self, normalized_shape, eps=1e-6, elementwise_affine=True):
- super().__init__()
- self.norm = nn.LayerNorm(normalized_shape, eps, elementwise_affine)
- def forward(self, x):
- x = rearrange(x, 'b c h w -> b h w c').contiguous()
- x = self.norm(x)
- x = rearrange(x, 'b h w c -> b c h w').contiguous()
- return x
- def get_norm(norm_layer='in_1d'):
- eps = 1e-6
- norm_dict = {
- 'none': nn.Identity,
- 'in_1d': partial(nn.InstanceNorm1d, eps=eps),
- 'in_2d': partial(nn.InstanceNorm2d, eps=eps),
- 'in_3d': partial(nn.InstanceNorm3d, eps=eps),
- 'bn_1d': partial(nn.BatchNorm1d, eps=eps),
- 'bn_2d': partial(nn.BatchNorm2d, eps=eps),
- # 'bn_2d': partial(nn.SyncBatchNorm, eps=eps),
- 'bn_3d': partial(nn.BatchNorm3d, eps=eps),
- 'gn': partial(nn.GroupNorm, eps=eps),
- 'ln_1d': partial(nn.LayerNorm, eps=eps),
- 'ln_2d': partial(LayerNorm2d, eps=eps),
- }
- return norm_dict[norm_layer]
- class LayerScale(nn.Module):
- def __init__(self, dim, init_values=1e-5, inplace=True):
- super().__init__()
- self.inplace = inplace
- self.gamma = nn.Parameter(init_values * torch.ones(1, 1, dim))
- def forward(self, x):
- return x.mul_(self.gamma) if self.inplace else x * self.gamma
- class LayerScale2D(nn.Module):
- def __init__(self, dim, init_values=1e-5, inplace=True):
- super().__init__()
- self.inplace = inplace
- self.gamma = nn.Parameter(init_values * torch.ones(1, dim, 1, 1))
- def forward(self, x):
- return x.mul_(self.gamma) if self.inplace else x * self.gamma
- class ConvNormAct(nn.Module):
- def __init__(self, dim_in, dim_out, kernel_size, stride=1, dilation=1, groups=1, bias=False,
- skip=False, norm_layer='bn_2d', act_layer='relu', inplace=True, drop_path_rate=0.):
- super(ConvNormAct, self).__init__()
- self.has_skip = skip and dim_in == dim_out
- padding = math.ceil((kernel_size - stride) / 2)
- self.conv = nn.Conv2d(dim_in, dim_out, kernel_size, stride, padding, dilation, groups, bias)
- self.norm = get_norm(norm_layer)(dim_out)
- self.act = nn.GELU()  # note: act_layer and inplace are accepted but unused here; GELU is always applied
- self.drop_path = DropPath(drop_path_rate) if drop_path_rate else nn.Identity()
- def forward(self, x):
- shortcut = x
- x = self.conv(x)
- x = self.norm(x)
- x = self.act(x)
- if self.has_skip:
- x = self.drop_path(x) + shortcut
- return x
- # ========== Multi-Scale Populations, for down-sampling and inductive bias ==========
- class MSPatchEmb(nn.Module):
- def __init__(self, dim_in, emb_dim, kernel_size=2, c_group=-1, stride=1, dilations=[1, 2, 3],
- norm_layer='bn_2d', act_layer='silu'):
- super().__init__()
- self.dilation_num = len(dilations)
- c_group = math.gcd(dim_in, emb_dim) if c_group == -1 else c_group  # resolve the -1 default before validating
- assert dim_in % c_group == 0
- self.convs = nn.ModuleList()
- for i in range(len(dilations)):
- padding = math.ceil(((kernel_size - 1) * dilations[i] + 1 - stride) / 2)
- self.convs.append(nn.Sequential(
- nn.Conv2d(dim_in, emb_dim, kernel_size, stride, padding, dilations[i], groups=c_group),
- get_norm(norm_layer)(emb_dim),
- get_act(act_layer)()))  # activation layers take no channel argument
- def forward(self, x):
- if self.dilation_num == 1:
- x = self.convs[0](x)
- else:
- x = torch.cat([self.convs[i](x).unsqueeze(dim=-1) for i in range(self.dilation_num)], dim=-1)
- x = reduce(x, 'b c h w n -> b c h w', 'mean').contiguous()
- return x
- class iRMB(nn.Module):
- def __init__(self, dim_in, dim_out, norm_in=True, has_skip=True, exp_ratio=1.0, norm_layer='bn_2d',
- act_layer='relu', v_proj=True, dw_ks=3, stride=1, dilation=1, se_ratio=0.0, dim_head=64, window_size=7,
- attn_s=True, qkv_bias=False, attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False):
- super().__init__()
- self.norm = get_norm(norm_layer)(dim_in) if norm_in else nn.Identity()
- dim_mid = int(dim_in * exp_ratio)
- self.has_skip = (dim_in == dim_out and stride == 1) and has_skip
- self.attn_s = attn_s
- if self.attn_s:
- assert dim_in % dim_head == 0, 'dim should be divisible by num_heads'
- self.dim_head = dim_head
- self.window_size = window_size
- self.num_head = dim_in // dim_head
- self.scale = self.dim_head ** -0.5
- self.attn_pre = attn_pre
- self.qk = ConvNormAct(dim_in, int(dim_in * 2), kernel_size=1, bias=qkv_bias, norm_layer='none',
- act_layer='none')
- self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, groups=self.num_head if v_group else 1, bias=qkv_bias,
- norm_layer='none', act_layer=act_layer, inplace=inplace)
- self.attn_drop = nn.Dropout(attn_drop)
- else:
- if v_proj:
- self.v = ConvNormAct(dim_in, dim_mid, kernel_size=1, bias=qkv_bias, norm_layer='none',
- act_layer=act_layer, inplace=inplace)
- else:
- self.v = nn.Identity()
- self.conv_local = ConvNormAct(dim_mid, dim_mid, kernel_size=dw_ks, stride=stride, dilation=dilation,
- groups=dim_mid, norm_layer='bn_2d', act_layer='silu', inplace=inplace)
- self.se = SELayerV2(dim_mid)
- self.proj_drop = nn.Dropout(drop)
- self.proj = ConvNormAct(dim_mid, dim_out, kernel_size=1, norm_layer='none', act_layer='none', inplace=inplace)
- self.drop_path = DropPath(drop_path) if drop_path else nn.Identity()
- def forward(self, x):
- shortcut = x
- x = self.norm(x)
- B, C, H, W = x.shape
- if self.attn_s:
- # padding
- if self.window_size <= 0:
- window_size_W, window_size_H = W, H
- else:
- window_size_W, window_size_H = self.window_size, self.window_size
- pad_l, pad_t = 0, 0
- pad_r = (window_size_W - W % window_size_W) % window_size_W
- pad_b = (window_size_H - H % window_size_H) % window_size_H
- x = F.pad(x, (pad_l, pad_r, pad_t, pad_b, 0, 0,))
- n1, n2 = (H + pad_b) // window_size_H, (W + pad_r) // window_size_W
- x = rearrange(x, 'b c (h1 n1) (w1 n2) -> (b n1 n2) c h1 w1', n1=n1, n2=n2).contiguous()
- # attention
- b, c, h, w = x.shape
- qk = self.qk(x)
- qk = rearrange(qk, 'b (qk heads dim_head) h w -> qk b heads (h w) dim_head', qk=2, heads=self.num_head,
- dim_head=self.dim_head).contiguous()
- q, k = qk[0], qk[1]
- attn_spa = (q @ k.transpose(-2, -1)) * self.scale
- attn_spa = attn_spa.softmax(dim=-1)
- attn_spa = self.attn_drop(attn_spa)
- if self.attn_pre:
- x = rearrange(x, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
- x_spa = attn_spa @ x
- x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
- w=w).contiguous()
- x_spa = self.v(x_spa)
- else:
- v = self.v(x)
- v = rearrange(v, 'b (heads dim_head) h w -> b heads (h w) dim_head', heads=self.num_head).contiguous()
- x_spa = attn_spa @ v
- x_spa = rearrange(x_spa, 'b heads (h w) dim_head -> b (heads dim_head) h w', heads=self.num_head, h=h,
- w=w).contiguous()
- # unpadding
- x = rearrange(x_spa, '(b n1 n2) c h1 w1 -> b c (h1 n1) (w1 n2)', n1=n1, n2=n2).contiguous()
- if pad_r > 0 or pad_b > 0:
- x = x[:, :, :H, :W].contiguous()
- else:
- x = self.v(x)
- x = x + self.se(self.conv_local(x)) if self.has_skip else self.se(self.conv_local(x))
- x = self.proj_drop(x)
- x = self.proj(x)
- x = (shortcut + self.drop_path(x)) if self.has_skip else x
- return x
- class EMO(nn.Module):
- def __init__(self, dim_in=3,factor=1,
- depths=[1, 2, 4, 2], stem_dim=16, embed_dims=[64, 128, 256, 512], exp_ratios=[4., 4., 4., 4.],
- norm_layers=['bn_2d', 'bn_2d', 'bn_2d', 'bn_2d'], act_layers=['relu', 'relu', 'relu', 'relu'],
- dw_kss=[3, 3, 5, 5], se_ratios=[0.0, 0.0, 0.0, 0.0], dim_heads=[32, 32, 32, 32],
- window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True], qkv_bias=True,
- attn_drop=0., drop=0., drop_path=0., v_group=False, attn_pre=False, pre_dim=0):
- super().__init__()
- # scaling factor
- scale_factor = factor  # e.g. 1.5 widens every stage by 1.5x
- # exp_ratios are left unscaled
- # scaled embed_dims: each entry is multiplied by scale_factor and truncated to int
- embed_dims = [int(dim * scale_factor) for dim in embed_dims]
- dprs = [x.item() for x in torch.linspace(0, drop_path, sum(depths))]
- self.stage0 = nn.ModuleList([
- MSPatchEmb( # down to 112
- dim_in, stem_dim, kernel_size=dw_kss[0], c_group=1, stride=2, dilations=[1],
- norm_layer=norm_layers[0], act_layer='none'),
- iRMB( # ds
- stem_dim, stem_dim, norm_in=False, has_skip=False, exp_ratio=1,
- norm_layer=norm_layers[0], act_layer=act_layers[0], v_proj=False, dw_ks=dw_kss[0],
- stride=1, dilation=1, se_ratio=1,
- dim_head=dim_heads[0], window_size=window_sizes[0], attn_s=False,
- qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=0.,
- attn_pre=attn_pre
- )
- ])
- emb_dim_pre = stem_dim
- for i in range(len(depths)):
- layers = []
- dpr = dprs[sum(depths[:i]):sum(depths[:i + 1])]
- for j in range(depths[i]):
- if j == 0:
- stride, has_skip, attn_s, exp_ratio = 2, False, False, exp_ratios[i] * 2
- else:
- stride, has_skip, attn_s, exp_ratio = 1, True, attn_ss[i], exp_ratios[i]
- layers.append(iRMB(
- emb_dim_pre, embed_dims[i], norm_in=True, has_skip=has_skip, exp_ratio=exp_ratio,
- norm_layer=norm_layers[i], act_layer=act_layers[i], v_proj=True, dw_ks=dw_kss[i],
- stride=stride, dilation=1, se_ratio=se_ratios[i],
- dim_head=dim_heads[i], window_size=window_sizes[i], attn_s=attn_s,
- qkv_bias=qkv_bias, attn_drop=attn_drop, drop=drop, drop_path=dpr[j], v_group=v_group,
- attn_pre=attn_pre
- ))
- emb_dim_pre = embed_dims[i]
- self.__setattr__(f'stage{i + 1}', nn.ModuleList(layers))
- self.norm = get_norm(norm_layers[-1])(embed_dims[-1])
- if pre_dim > 0:
- self.pre_head = nn.Sequential(nn.Linear(embed_dims[-1], pre_dim), get_act(act_layers[-1])(inplace=inplace))
- self.pre_dim = pre_dim
- else:
- self.pre_head = nn.Identity()
- self.pre_dim = embed_dims[-1]
- self.apply(self._init_weights)
- self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
- def _init_weights(self, m):
- if isinstance(m, nn.Linear):
- trunc_normal_(m.weight, std=.02)
- if m.bias is not None:
- nn.init.zeros_(m.bias)
- elif isinstance(m, (nn.LayerNorm, nn.GroupNorm,
- nn.BatchNorm1d, nn.BatchNorm2d, nn.BatchNorm3d,
- nn.InstanceNorm1d, nn.InstanceNorm2d, nn.InstanceNorm3d)):
- nn.init.zeros_(m.bias)
- nn.init.ones_(m.weight)
- @torch.jit.ignore
- def no_weight_decay(self):
- return {'token'}
- @torch.jit.ignore
- def no_weight_decay_keywords(self):
- return {'alpha', 'gamma', 'beta'}
- @torch.jit.ignore
- def no_ft_keywords(self):
- # return {'head.weight', 'head.bias'}
- return {}
- @torch.jit.ignore
- def ft_head_keywords(self):
- return {'head.weight', 'head.bias'}, self.num_classes
- def get_classifier(self):
- return self.head
- def reset_classifier(self, num_classes):
- self.num_classes = num_classes
- self.head = nn.Linear(self.pre_dim, num_classes) if num_classes > 0 else nn.Identity()
- def check_bn(self):
- for name, m in self.named_modules():
- if isinstance(m, nn.modules.batchnorm._NormBase):
- m.running_mean = torch.nan_to_num(m.running_mean, nan=0, posinf=1, neginf=-1)
- m.running_var = torch.nan_to_num(m.running_var, nan=0, posinf=1, neginf=-1)
- def forward(self, x):
- unique_tensors = {}
- for blk in self.stage0:
- x = blk(x)
- width, height = x.shape[2], x.shape[3]
- unique_tensors[(width, height)] = x
- for blk in self.stage1:
- x = blk(x)
- width, height = x.shape[2], x.shape[3]
- unique_tensors[(width, height)] = x
- for blk in self.stage2:
- x = blk(x)
- width, height = x.shape[2], x.shape[3]
- unique_tensors[(width, height)] = x
- for blk in self.stage3:
- x = blk(x)
- width, height = x.shape[2], x.shape[3]
- unique_tensors[(width, height)] = x
- for blk in self.stage4:
- x = blk(x)
- width, height = x.shape[2], x.shape[3]
- unique_tensors[(width, height)] = x
- result_list = list(unique_tensors.values())[-4:]
- return result_list
- def EMO_1M(factor=1):
- model = EMO(
- factor=factor,
- depths=[2, 2, 8, 3], stem_dim=24, embed_dims=[32, 48, 80, 168], exp_ratios=[2., 2.5, 3.0, 3.5],
- norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
- dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 21], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
- qkv_bias=True, attn_drop=0., drop=0., drop_path=0.04036, v_group=False, attn_pre=True, pre_dim=0)
- return model
- def EMO_2M(factor=1):
- model = EMO(
- factor=factor,
- depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[32, 48, 120, 200], exp_ratios=[2., 2.5, 3.0, 3.5],
- norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
- dw_kss=[3, 3, 5, 5], dim_heads=[16, 16, 20, 20], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
- qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
- return model
- def EMO_5M(factor=1):
- model = EMO(
- factor=factor,
- depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 288], exp_ratios=[2., 3., 4., 4.],
- norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
- dw_kss=[3, 3, 5, 5], dim_heads=[24, 24, 32, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
- qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
- return model
- def EMO_6M(factor=1):
- model = EMO(
- factor=factor,
- depths=[3, 3, 9, 3], stem_dim=24, embed_dims=[48, 72, 160, 320], exp_ratios=[2., 3., 4., 5.],
- norm_layers=['bn_2d', 'bn_2d', 'ln_2d', 'ln_2d'], act_layers=['silu', 'silu', 'gelu', 'gelu'],
- dw_kss=[3, 3, 5, 5], dim_heads=[16, 24, 20, 32], window_sizes=[7, 7, 7, 7], attn_ss=[False, False, True, True],
- qkv_bias=True, attn_drop=0., drop=0., drop_path=0.05, v_group=False, attn_pre=True, pre_dim=0)
- return model
- if __name__ == "__main__":
- # Generating Sample image
- image_size = (1, 3, 640, 640)
- image = torch.rand(*image_size)
- # Model
- model = EMO_6M()
- out = model(image)
- print(len(out))
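The window attention in `iRMB.forward` pads each feature map on the right and bottom until it divides evenly into windows. The sketch below replays only that arithmetic in plain Python, assuming a square 640x640 input whose resolution halves at each of stage0 to stage4; `window_padding` is a helper name invented for this example and is not part of the EMO code.

```python
def window_padding(h: int, w: int, window_size: int):
    """Mirror the padding arithmetic of iRMB.forward for one feature map."""
    # window_size <= 0 means "one window covering the whole map"
    ws_h, ws_w = (h, w) if window_size <= 0 else (window_size, window_size)
    pad_b = (ws_h - h % ws_h) % ws_h         # rows added at the bottom
    pad_r = (ws_w - w % ws_w) % ws_w         # columns added on the right
    n1 = (h + pad_b) // ws_h                 # windows along the height
    n2 = (w + pad_r) // ws_w                 # windows along the width
    return pad_b, pad_r, n1, n2


if __name__ == "__main__":
    size = 640
    for stage in range(5):                   # stage0 .. stage4 each halve the resolution
        size //= 2
        if stage >= 3:                       # attn_ss enables attention in stage3 and stage4 only
            print(f"stage{stage}: {size}x{size} ->", window_padding(size, size, 7))
```

For the 40x40 map this gives two padded rows and columns and a 6x6 grid of windows; for the 20x20 map, one padded row and column and a 3x3 grid. The padding is cropped away again after attention.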
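4. Adding EMO Step by Step
4.1 Change 1
First, create the file. Under 'ultralytics/nn/modules', create a folder named 'Addmodules' (if you use the files from my group, it already exists). Inside it, create a new .py file and paste the core code into it.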
4.2 Change 2
Next, create a file named '__init__.py' in the same folder (again, already present in the group files) and import our module in it, as shown in the figure below.
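4.3 Change 3
Third, open 'ultralytics/nn/tasks.py' and import and register our module there (if you use the group files this is already done, so skip straight to step 4).
From now on all my tutorials follow this same layout, because I assume you are working from the files shared in my group.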
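4.4 Change 4
Add the following two lines of code.
4.5 Change 5
Go to roughly line 700 and follow the screenshot: add the part inside the red box. Note that the entries are bare function names, without ().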
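- elif m in {EMO_1M, EMO_2M, EMO_5M, EMO_6M}:  # list whichever backbone builders you registered; the rest is identical
- m = m(*args)
- c2 = m.width_list  # list of output channels, one per returned feature map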
- backbone = True
4.6 Change 6
Both red boxes below need to be changed.
- if isinstance(c2, list):
- m_ = m
- m_.backbone = True
- else:
- m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module
- t = str(m)[8:-2].replace('__main__.', '') # module type
- m.np = sum(x.numel() for x in m_.parameters()) # number params
- m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t # attach index, 'from' index, type
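The expression `i + 4 if backbone else i` is what makes one yaml entry occupy layer indices 0-4. The sketch below replays that bookkeeping on module names only. It assumes that `backbone` starts as False before the loop and becomes True once the backbone is built (the screenshot for step 4.4 is not reproduced here, so that initial value is a guess), and `assign_layer_indices` is a name invented for this example.

```python
def assign_layer_indices(modules, backbone_names):
    """Replay the index bookkeeping of the patched parse_model loop on module names only."""
    backbone = False                         # assumed initial value, set before the loop
    indices = []
    for i, name in enumerate(modules):
        if name in backbone_names:
            backbone = True                  # the backbone returns a list of feature maps
        indices.append((name, i + 4 if backbone else i))
    return indices


if __name__ == "__main__":
    layers = ["EMO_1M", "SPPF", "C2PSA", "nn.Upsample", "Concat", "C3k2"]
    for name, index in assign_layer_indices(layers, {"EMO_1M"}):
        print(index, name)
```

Under these assumptions the backbone module itself is stored with index 4, SPPF becomes 5, C2PSA becomes 6, and the first C3k2 in the head becomes 9, which matches the index comments in the yaml file of Section 5.1.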
4.7 Change 7
The following also needs to change; follow my version exactly.
The code is below; replace the original code with it.
- if verbose:
- LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}') # print
- save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist
- layers.append(m_)
- if i == 0:
- ch = []
- if isinstance(c2, list):
- ch.extend(c2)
- if len(c2) != 5:
- ch.insert(0, 0)
- else:
- ch.append(c2)
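To see what the channel bookkeeping above does with a list-valued `c2`, here is a plain-Python replay of the same branch. `update_channels` is a name invented for this example, and the widths in the demo (four backbone outputs, then a 256-channel SPPF) are illustrative values rather than numbers read from a real run.

```python
def update_channels(ch, i, c2):
    """Replay the channel bookkeeping from step 4.7 for one parsed layer."""
    if i == 0:
        ch = []                              # the first layer resets the list
    if isinstance(c2, list):                 # backbone: one entry per returned feature map
        ch.extend(c2)
        if len(c2) != 5:
            ch.insert(0, 0)                  # placeholder so the list lines up with indices 0-4
    else:
        ch.append(c2)                        # ordinary layer: a single output width
    return ch


if __name__ == "__main__":
    ch = update_channels([3], 0, [8, 12, 20, 42])   # backbone with four outputs
    ch = update_channels(ch, 1, 256)                # a following single-output layer
    print(ch)
```

The placeholder 0 at position 0 keeps `ch[j]` aligned with layer index `j`, so that a later `Concat` reading from index 3 finds the width of the stride-16 feature map.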
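4.8 Change 8
Change 8 differs from the previous ones: it modifies part of the forward pass, so we are now outside the parse_model method.
You can read the line numbers off the screenshot. We are still in the same file, tasks.py. Several forward methods in this file look very similar, so make sure you pick the right one: it is the one at around line 70. The code is provided below for you to copy and paste; when I have time I will record a video on this step.
The code is as follows: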
- def _predict_once(self, x, profile=False, visualize=False, embed=None):
- """
- Perform a forward pass through the network.
- Args:
- x (torch.Tensor): The input tensor to the model.
- profile (bool): Print the computation time of each layer if True, defaults to False.
- visualize (bool): Save the feature maps of the model if True, defaults to False.
- embed (list, optional): A list of feature vectors/embeddings to return.
- Returns:
- (torch.Tensor): The last output of the model.
- """
- y, dt, embeddings = [], [], [] # outputs
- for m in self.model:
- if m.f != -1: # if not from previous layer
- x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers
- if profile:
- self._profile_one_layer(m, x, dt)
- if hasattr(m, 'backbone'):
- x = m(x)
- if len(x) != 5: # 0 - 5
- x.insert(0, None)
- for index, i in enumerate(x):
- if index in self.save:
- y.append(i)
- else:
- y.append(None)
- x = x[-1] # pass the last output on to the next layer
- else:
- x = m(x) # run
- y.append(x if m.i in self.save else None) # save output
- if visualize:
- feature_visualization(x, m.type, m.i, save_dir=visualize)
- if embed and m.i in embed:
- embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1)) # flatten
- if m.i == max(embed):
- return torch.unbind(torch.cat(embeddings, 1), dim=0)
- return x
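The backbone branch of `_predict_once` can be replayed on placeholder values to see which feature maps are kept. The sketch below uses strings instead of tensors; `route_backbone_outputs` is a name invented for this example, and the `save` set is read off the `from` columns of the yaml file in Section 5.1 under the assumption that no other layers are saved.

```python
def route_backbone_outputs(features, save):
    """Replay the backbone branch of the patched _predict_once on placeholder values."""
    y = []
    x = list(features)
    if len(x) != 5:
        x.insert(0, None)                    # pad so positions match layer indices 0-4
    for index, feat in enumerate(x):
        y.append(feat if index in save else None)   # keep only what later layers read
    return y, x[-1]                          # the last map feeds the next layer


if __name__ == "__main__":
    save = {2, 3, 6, 9, 12, 15, 18}          # indices referenced by Concat and Detect
    y, x = route_backbone_outputs(["P2", "P3", "P4", "P5"], save)
    print(y, x)
```

Only the maps at indices 2 and 3 (P3 and P4) are stored for the head's `Concat` layers, while the last map (P5) is handed straight to SPPF.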
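That completes the modifications. There are many details here, so be careful not to replace more code than shown, and do not skip any step: either mistake makes the run fail, and the resulting errors are very hard to track down.
Note: one extra change!
Regular readers know that most of my modifications follow the same recipe. This network needs one additional step: there is a parameter s, and you need to change the s shown below to 640. After that it runs without problems.
Fixing the missing FLOPs printout
Open 'ultralytics/utils/torch_utils.py' and modify it as shown in the figure below; otherwise the FLOPs count often fails to print.
Caveat
If validation fails with a shape-mismatch error, you can fix the image size of the validation set as follows:
Open ultralytics/models/yolo/detect/train.py, find the build_dataset function of the DetectionTrainer class, and change the argument rect=mode == 'val' to rect=False.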
5. The EMO yaml File
5.1 The EMO yaml file
Training info: YOLO11-EMO summary: 860 layers, 2,423,567 parameters, 2,423,551 gradients, 6.5 GFLOPs
- # Ultralytics YOLO 🚀, AGPL-3.0 license
- # YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
- # Parameters
- nc: 80 # number of classes
- scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
- # [depth, width, max_channels]
- n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
- s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
- m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
- l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
- x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs
- # The available versions are ['EMO_1M', 'EMO_2M', 'EMO_5M', 'EMO_6M']
- # 'n' refers to YOLO's channel (width) scaling; the size variants listed above are the ones shipped with the official model
- # YOLO11n backbone
- backbone:
- # [from, repeats, module, args]
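- - [-1, 1, EMO_1M, [0.25]] # 0-4 P1/2 this one entry yields four feature layers, so do not read the yaml as one layer per line; see the group videos if you need help drawing the architecture.
- # The value in args is the channel (width) scaling factor from the scales table above: use 0.25 with yolov11n, 0.5 with yolov11s, and so on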
- - [-1, 1, SPPF, [1024, 5]] # 5
- - [-1, 2, C2PSA, [1024]] # 6
- # YOLO11n head
- head:
- - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- - [[-1, 3], 1, Concat, [1]] # cat backbone P4
- - [-1, 2, C3k2, [512, False]] # 9
- - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
- - [[-1, 2], 1, Concat, [1]] # cat backbone P3
- - [-1, 2, C3k2, [256, False]] # 12 (P3/8-small)
- - [-1, 1, Conv, [256, 3, 2]]
- - [[-1, 9], 1, Concat, [1]] # cat head P4
- - [-1, 2, C3k2, [512, False]] # 15 (P4/16-medium)
- - [-1, 1, Conv, [512, 3, 2]]
- - [[-1, 6], 1, Concat, [1]] # cat head P5
- - [-1, 2, C3k2, [1024, True]] # 18 (P5/32-large)
- - [[12, 15, 18], 1, Detect, [nc]] # Detect(P3, P4, P5)
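Because the factor in `args` scales `embed_dims` but not `dim_heads`, not every combination of variant and factor passes the `dim_in % dim_head == 0` assert in `iRMB`. The sketch below checks only that one condition in plain Python, using the `EMO_1M` and `EMO_6M` settings from Section 3. `attention_compatible` and the two config dictionaries are names invented for this example, and passing this check does not guarantee that a configuration trains.

```python
EMO_1M_CFG = dict(embed_dims=[32, 48, 80, 168], dim_heads=[16, 16, 20, 21],
                  attn_ss=[False, False, True, True], depths=[2, 2, 8, 3])
EMO_6M_CFG = dict(embed_dims=[48, 72, 160, 320], dim_heads=[16, 24, 20, 32],
                  attn_ss=[False, False, True, True], depths=[3, 3, 9, 3])


def attention_compatible(embed_dims, dim_heads, attn_ss, depths, factor):
    """Return (scaled_dims, ok) for the divisibility assert in iRMB when attn_s is True."""
    scaled = [int(d * factor) for d in embed_dims]   # same truncation as EMO.__init__
    ok = all(
        dim % head == 0
        for dim, head, attn, depth in zip(scaled, dim_heads, attn_ss, depths)
        if attn and depth > 1                # only blocks after the first one use attention
    )
    return scaled, ok


if __name__ == "__main__":
    for name, cfg in (("EMO_1M", EMO_1M_CFG), ("EMO_6M", EMO_6M_CFG)):
        for factor in (0.25, 0.5, 1.0, 1.5):
            print(name, factor, attention_compatible(factor=factor, **cfg))
```

By this check `EMO_1M` passes at all four factors, while `EMO_6M` at 0.25 ends up with 80 channels and a head size of 32 in the last stage, which would trip the assert; it is worth running a check like this before switching variants in the yaml.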
5.2 Training script
You can copy my run script and use it as is.
- import warnings
- warnings.filterwarnings('ignore')
- from ultralytics import YOLO
- if __name__ == '__main__':
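- model = YOLO('yolo11-EMO.yaml')  # the yaml from Section 5.1; the file name is assumed here, use whatever name and path you saved it under
- # To switch model size, put the scale letter in the name passed to YOLO(): e.g. 'yolo11s-EMO.yaml' selects the s scale.
- # Only the name inside the YOLO(...) call changes; the configuration file on disk keeps its name.
- # model.load('yolo11n.pt') # optionally load pretrained weights; not recommended for research, since it makes it hard to show an accuracy gain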
- model.train(data=r"C:\Users\Administrator\PycharmProjects\yolov5-master\yolov5-master\Construction Site Safety.v30-raw-images_latestversion.yolov8\data.yaml",
- # For other tasks, open 'ultralytics/cfg/default.yaml' and set task to detect, segment, classify, or pose
- cache=False,
- imgsz=640,
- epochs=150,
- single_cls=False, # whether to train as single-class detection
- batch=16,
- close_mosaic=0,
- workers=0,
- device='0',
- optimizer='SGD', # using SGD
- # resume='runs/train/exp21/weights/last.pt', # to resume training, point this at your last.pt
- amp=True, # turn amp off if the training loss becomes NaN
- project='runs/train',
- name='exp',
- )
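6. Successful Run
Below is a screenshot of a successful run. One epoch had finished; the image was too large to also capture the second epoch.
7. Summary
That concludes this post. If you found it useful, have a look at my YOLOv11 improvement column. It is newly opened and currently averages a quality score of 98; going forward I will keep reproducing papers from the latest top conferences and fill in some older improvement mechanisms as well. Subscribe to the column to follow the updates.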