1. Introduction

The improvement introduced this time is EfficientNetV2. Building on V1, which improved convolutional neural network performance by scaling network depth, width and resolution in a balanced way, V2 adds an improved progressive learning method: the image size is increased step by step during training while the regularization is adaptively adjusted, which speeds up training without losing accuracy. Its advantage over V1 is therefore mainly speed and efficiency (although in my own experiments V2 did not feel faster than V1; I may not have compared variants of the same scale, so feel free to run your own comparison). This article first covers the main framework and principles, then shows how to add the network structure to your model.

(The content of this article can be re-scaled against YOLOv11's N, S, M, L and X variants, taking the lightweighting one step further.)
2. The Framework and Principles of EfficientNetV2

Official paper address: click to open

Official code address: click to open

The paper introduces EfficientNetV2, a new family of convolutional neural networks characterized by faster training and better parameter efficiency. By combining training-aware neural architecture search with scaling, these models are optimized for both training speed and parameter efficiency. The paper also proposes an improved progressive learning method, which accelerates training by gradually increasing the image size during training and adaptively adjusting the regularization, while maintaining accuracy.

The main innovations of EfficientNetV2 are:
1. Structural innovations: EfficientNetV2 uses the fused-MBConv structure in its early layers, which helps reduce memory access overhead. It also prefers smaller expansion ratios and 3x3 kernel sizes, adding more layers to compensate for the smaller receptive field that 3x3 kernels produce. Finally, EfficientNetV2 completely removes the last stride-1 stage of the original EfficientNet, presumably because of its large parameter count and memory access overhead.

2. Training-speed optimization: the paper compares the per-step training time of EfficientNetV2 and other models at a fixed image size. Through training-aware neural architecture search and model scaling, EfficientNetV2 trains faster than other state-of-the-art models.

3. Progressive learning with adaptive regularization: EfficientNetV2 adopts an improved progressive learning method that uses smaller images and weaker regularization early in training, letting the network learn simple representations quickly and easily; as training progresses, the image size is gradually increased and the regularization is strengthened to raise the learning difficulty (see the sketch after this list).

4. The importance of adaptive regularization: the paper stresses adaptive regularization, which dynamically adjusts the regularization strength according to the image size. The method is simple but effective and can be combined with other techniques.
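To make point 3 concrete, below is a minimal sketch of such a progressive-learning schedule. The linear interpolation of image size and regularization strength across training stages follows the paper's description; the stage count and the start/end values are illustrative assumptions, not the paper's exact settings.

```python
def progressive_schedule(stage, num_stages=4,
                         size_range=(128, 300),      # image size: small -> large
                         dropout_range=(0.1, 0.3),   # dropout rate: weak -> strong
                         randaug_range=(5, 15),      # RandAugment magnitude
                         mixup_range=(0.0, 0.2)):    # mixup alpha
    """Return (image_size, dropout, randaug_magnitude, mixup_alpha) for one stage."""
    t = stage / max(num_stages - 1, 1)  # 0.0 at the first stage, 1.0 at the last
    lerp = lambda lo, hi: lo + (hi - lo) * t
    return (int(lerp(*size_range)), lerp(*dropout_range),
            lerp(*randaug_range), lerp(*mixup_range))

for stage in range(4):
    size, drop, randaug, mixup = progressive_schedule(stage)
    print(f"stage {stage}: img={size}, dropout={drop:.2f}, "
          f"randaug={randaug:.1f}, mixup={mixup:.2f}")
    # ...train for a fixed number of epochs with these settings...
```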
The figure shows the structure of the two building blocks used in the network: MBConv and Fused-MBConv.

MBConv: a module built around a depthwise separable convolution (depthwise conv3x3). It consists of a 1x1 convolution that expands the channel count, the depthwise convolution that captures spatial features, and a final 1x1 convolution that projects the channels back. It also contains an SE (Squeeze-and-Excitation) module, which improves the network's representational power by learning per-channel weights.

Fused-MBConv: similar to MBConv, this structure also keeps the SE module and the projection 1x1 convolution, but it fuses the expansion 1x1 convolution and the depthwise convolution into a single standard 3x3 convolution, which makes better use of modern accelerators and runs faster in the early, high-resolution stages.

Both structures are used to build efficient deep-learning models, especially when compute is limited; thanks to its simpler structure, Fused-MBConv can bring a gain in computational efficiency. A minimal sketch contrasting the two operator orders follows.
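The sketch below shows only the difference in operator order, with BN, activation, SE and the residual omitted for brevity; the complete versions appear in the core code in Section 3, and the function names here are purely illustrative.

```python
import torch
from torch import nn

def mbconv_ops(c_in, c_mid, c_out, k=3):
    # MBConv: 1x1 expansion -> k x k depthwise -> 1x1 projection
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 1, bias=False),                                 # expand channels
        nn.Conv2d(c_mid, c_mid, k, padding=k // 2, groups=c_mid, bias=False),  # depthwise conv
        nn.Conv2d(c_mid, c_out, 1, bias=False),                                # project back
    )

def fused_mbconv_ops(c_in, c_mid, c_out, k=3):
    # Fused-MBConv: the expansion 1x1 and the depthwise k x k are fused
    # into one regular k x k convolution
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, k, padding=k // 2, bias=False),  # fused expand + spatial conv
        nn.Conv2d(c_mid, c_out, 1, bias=False),                 # project back
    )

x = torch.randn(1, 24, 56, 56)
print(mbconv_ops(24, 96, 24)(x).shape, fused_mbconv_ops(24, 96, 24)(x).shape)
```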
3. Core Code of EfficientNetV2
```python
import copy
import os
import re
import subprocess
from collections import OrderedDict
from functools import partial
from pathlib import Path

import numpy as np
import torch
from torch import nn

__all__ = ['efficientnet_v2']


def get_efficientnet_v2_structure(model_name):
    if 'efficientnet_v2_s' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 24, 24, 2, False, True),
            (4, 3, 2, 24, 48, 4, False, True),
            (4, 3, 2, 48, 64, 4, False, True),
            (4, 3, 2, 64, 128, 6, True, False),
            (6, 3, 1, 128, 160, 9, True, False),
            (6, 3, 2, 160, 256, 15, True, False),
        ]
    elif 'efficientnet_v2_m' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 24, 24, 3, False, True),
            (4, 3, 2, 24, 48, 5, False, True),
            (4, 3, 2, 48, 80, 5, False, True),
            (4, 3, 2, 80, 160, 7, True, False),
            (6, 3, 1, 160, 176, 14, True, False),
            (6, 3, 2, 176, 304, 18, True, False),
            (6, 3, 1, 304, 512, 5, True, False),
        ]
    elif 'efficientnet_v2_l' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 32, 32, 4, False, True),
            (4, 3, 2, 32, 64, 7, False, True),
            (4, 3, 2, 64, 96, 7, False, True),
            (4, 3, 2, 96, 192, 10, True, False),
            (6, 3, 1, 192, 224, 19, True, False),
            (6, 3, 2, 224, 384, 25, True, False),
            (6, 3, 1, 384, 640, 7, True, False),
        ]
    elif 'efficientnet_v2_xl' in model_name:
        return [
            # e k s in out xN se fused
            (1, 3, 1, 32, 32, 4, False, True),
            (4, 3, 2, 32, 64, 8, False, True),
            (4, 3, 2, 64, 96, 8, False, True),
            (4, 3, 2, 96, 192, 16, True, False),
            (6, 3, 1, 192, 256, 24, True, False),
            (6, 3, 2, 256, 512, 32, True, False),
            (6, 3, 1, 512, 640, 8, True, False),
        ]


class ConvBNAct(nn.Sequential):
    """Convolution-Normalization-Activation Module"""

    def __init__(self, in_channel, out_channel, kernel_size, stride, groups, norm_layer, act, conv_layer=nn.Conv2d):
        super(ConvBNAct, self).__init__(
            conv_layer(in_channel, out_channel, kernel_size, stride=stride, padding=(kernel_size - 1) // 2,
                       groups=groups, bias=False),
            norm_layer(out_channel),
            act()
        )


class SEUnit(nn.Module):
    """Squeeze-Excitation Unit

    paper: https://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper
    """

    def __init__(self, in_channel, reduction_ratio=4, act1=partial(nn.SiLU, inplace=True), act2=nn.Sigmoid):
        super(SEUnit, self).__init__()
        hidden_dim = in_channel // reduction_ratio
        self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc1 = nn.Conv2d(in_channel, hidden_dim, (1, 1), bias=True)
        self.fc2 = nn.Conv2d(hidden_dim, in_channel, (1, 1), bias=True)
        self.act1 = act1()
        self.act2 = act2()

    def forward(self, x):
        return x * self.act2(self.fc2(self.act1(self.fc1(self.avg_pool(x)))))


class StochasticDepth(nn.Module):
    """StochasticDepth

    paper: https://link.springer.com/chapter/10.1007/978-3-319-46493-0_39

    :arg
        - prob: Probability of dying
        - mode: "row" or "all". "row" means that each row survives with different probability
    """

    def __init__(self, prob, mode):
        super(StochasticDepth, self).__init__()
        self.prob = prob
        self.survival = 1.0 - prob
        self.mode = mode

    def forward(self, x):
        if self.prob == 0.0 or not self.training:
            return x
        else:
            shape = [x.size(0)] + [1] * (x.ndim - 1) if self.mode == 'row' else [1]
            return x * torch.empty(shape).bernoulli_(self.survival).div_(self.survival).to(x.device)


class MBConvConfig:
    """EfficientNet Building block configuration"""

    def __init__(self, expand_ratio: float, kernel: int, stride: int, in_ch: int, out_ch: int, layers: int,
                 use_se: bool, fused: bool, act=nn.SiLU, norm_layer=nn.BatchNorm2d):
        self.expand_ratio = expand_ratio
        self.kernel = kernel
        self.stride = stride
        self.in_ch = in_ch
        self.out_ch = out_ch
        self.num_layers = layers
        self.act = act
        self.norm_layer = norm_layer
        self.use_se = use_se
        self.fused = fused

    @staticmethod
    def adjust_channels(channel, factor, divisible=8):
        new_channel = channel * factor
        divisible_channel = max(divisible, (int(new_channel + divisible / 2) // divisible) * divisible)
        divisible_channel += divisible if divisible_channel < 0.9 * new_channel else 0
        return divisible_channel


class MBConv(nn.Module):
    """EfficientNet main building blocks

    :arg
        - c: MBConvConfig instance
        - sd_prob: stochastic path probability
    """

    def __init__(self, c, sd_prob=0.0):
        super(MBConv, self).__init__()
        inter_channel = c.adjust_channels(c.in_ch, c.expand_ratio)
        block = []

        if c.expand_ratio == 1:
            block.append(('fused', ConvBNAct(c.in_ch, inter_channel, c.kernel, c.stride, 1, c.norm_layer, c.act)))
        elif c.fused:
            block.append(('fused', ConvBNAct(c.in_ch, inter_channel, c.kernel, c.stride, 1, c.norm_layer, c.act)))
            block.append(('fused_point_wise', ConvBNAct(inter_channel, c.out_ch, 1, 1, 1, c.norm_layer, nn.Identity)))
        else:
            block.append(('linear_bottleneck', ConvBNAct(c.in_ch, inter_channel, 1, 1, 1, c.norm_layer, c.act)))
            block.append(('depth_wise', ConvBNAct(inter_channel, inter_channel, c.kernel, c.stride, inter_channel,
                                                  c.norm_layer, c.act)))
            block.append(('se', SEUnit(inter_channel, 4 * c.expand_ratio)))
            block.append(('point_wise', ConvBNAct(inter_channel, c.out_ch, 1, 1, 1, c.norm_layer, nn.Identity)))

        self.block = nn.Sequential(OrderedDict(block))
        self.use_skip_connection = c.stride == 1 and c.in_ch == c.out_ch
        self.stochastic_path = StochasticDepth(sd_prob, "row")

    def forward(self, x):
        out = self.block(x)
        if self.use_skip_connection:
            out = x + self.stochastic_path(out)
        return out


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    :param v:
    :param divisor:
    :param min_value:
    :return:
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


class EfficientNetV2(nn.Module):
    """Pytorch Implementation of EfficientNetV2

    paper: https://arxiv.org/abs/2104.00298

    - reference 1 (pytorch): https://github.com/d-li14/efficientnetv2.pytorch/blob/main/effnetv2.py
    - reference 2 (official): https://github.com/google/automl/blob/master/efficientnetv2/effnetv2_configs.py

    :arg
        - layer_infos: list of MBConvConfig
        - out_channels: bottleneck channel
        - nclass: number of class
        - dropout: dropout probability before classifier layer
        - stochastic depth: stochastic depth probability
    """

    def __init__(self, factor, depth, layer_infos, nclass=0, dropout=0.2, stochastic_depth=0.0,
                 block=MBConv, act_layer=nn.SiLU, norm_layer=nn.BatchNorm2d):
        super(EfficientNetV2, self).__init__()
        # re-scale channel widths and stage depths so the backbone matches the YOLO scale in use
        for layer in layer_infos:
            layer.in_ch = _make_divisible(int(layer.in_ch * factor), 8)
            layer.out_ch = _make_divisible(int(layer.out_ch * factor), 8)
            layer.num_layers = max(1, int(layer.num_layers * depth))
        self.layer_infos = layer_infos
        self.norm_layer = norm_layer
        self.act = act_layer

        self.in_channel = layer_infos[0].in_ch
        self.final_stage_channel = layer_infos[-1].out_ch

        self.cur_block = 0
        self.num_block = sum(stage.num_layers for stage in layer_infos)
        self.stochastic_depth = stochastic_depth

        self.stem = ConvBNAct(3, self.in_channel, 3, 2, 1, self.norm_layer, self.act)
        self.blocks = nn.Sequential(*self.make_stages(layer_infos, block))
        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

    def make_stages(self, layer_infos, block):
        return [layer for layer_info in layer_infos for layer in self.make_layers(copy.copy(layer_info), block)]

    def make_layers(self, layer_info, block):
        layers = []
        for i in range(layer_info.num_layers):
            layers.append(block(layer_info, sd_prob=self.get_sd_prob()))
            layer_info.in_ch = layer_info.out_ch
            layer_info.stride = 1
        return layers

    def get_sd_prob(self):
        sd_prob = self.stochastic_depth * (self.cur_block / self.num_block)
        self.cur_block += 1
        return sd_prob

    def forward(self, x):
        x = self.stem(x)
        unique_tensors = {}
        for idx, block in enumerate(self.blocks):
            x = block(x)
            height, width = x.shape[2], x.shape[3]
            unique_tensors[(height, width)] = x  # keep one tensor per spatial resolution
        result_list = list(unique_tensors.values())[-4:]
        return result_list


def efficientnet_v2_init(model):
    for m in model.modules():
        if isinstance(m, nn.Conv2d):
            nn.init.kaiming_normal_(m.weight, mode='fan_out')
            if m.bias is not None:
                nn.init.zeros_(m.bias)
        elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
            nn.init.ones_(m.weight)
            nn.init.zeros_(m.bias)
        elif isinstance(m, nn.Linear):
            nn.init.normal_(m.weight, mean=0.0, std=0.01)
            nn.init.zeros_(m.bias)


model_urls = {
    "efficientnet_v2_s": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-s.npy",
    "efficientnet_v2_m": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-m.npy",
    "efficientnet_v2_l": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-l.npy",
    "efficientnet_v2_s_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-s-21k.npy",
    "efficientnet_v2_m_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-m-21k.npy",
    "efficientnet_v2_l_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-l-21k.npy",
    "efficientnet_v2_xl_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-xl-21k.npy",
}


def load_from_zoo(model, model_name, pretrained_path='pretrained/official'):
    Path(os.path.join(pretrained_path, model_name)).mkdir(parents=True, exist_ok=True)
    file_name = os.path.join(pretrained_path, model_name, os.path.basename(model_urls[model_name]))
    load_npy(model, load_npy_from_url(url=model_urls[model_name], file_name=file_name))


def load_npy_from_url(url, file_name):
    if not Path(file_name).exists():
        subprocess.run(["wget", "-r", "-nc", '-O', file_name, url])
    return np.load(file_name, allow_pickle=True).item()


def npz_dim_convertor(name, weight):
    weight = torch.from_numpy(weight)
    if 'kernel' in name:
        if weight.dim() == 4:
            if weight.shape[3] == 1:
                # depth-wise convolution 'h w in_c out_c -> in_c out_c h w'
                weight = torch.permute(weight, (2, 3, 0, 1))
            else:
                # 'h w in_c out_c -> out_c in_c h w'
                weight = torch.permute(weight, (3, 2, 0, 1))
        elif weight.dim() == 2:
            weight = weight.transpose(1, 0)
    elif 'scale' in name or 'bias' in name:
        weight = weight.squeeze()
    return weight


def load_npy(model, weight):
    name_convertor = [
        # stem
        ('stem.0.weight', 'stem/conv2d/kernel/ExponentialMovingAverage'),
        ('stem.1.weight', 'stem/tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('stem.1.bias', 'stem/tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('stem.1.running_mean', 'stem/tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('stem.1.running_var', 'stem/tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),

        # fused layer
        ('block.fused.0.weight', 'conv2d/kernel/ExponentialMovingAverage'),
        ('block.fused.1.weight', 'tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('block.fused.1.bias', 'tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('block.fused.1.running_mean', 'tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('block.fused.1.running_var', 'tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),

        # linear bottleneck
        ('block.linear_bottleneck.0.weight', 'conv2d/kernel/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.weight', 'tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.bias', 'tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.running_mean', 'tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('block.linear_bottleneck.1.running_var', 'tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),

        # depth wise layer
        ('block.depth_wise.0.weight', 'depthwise_conv2d/depthwise_kernel/ExponentialMovingAverage'),
        ('block.depth_wise.1.weight', 'tpu_batch_normalization_1/gamma/ExponentialMovingAverage'),
        ('block.depth_wise.1.bias', 'tpu_batch_normalization_1/beta/ExponentialMovingAverage'),
        ('block.depth_wise.1.running_mean', 'tpu_batch_normalization_1/moving_mean/ExponentialMovingAverage'),
        ('block.depth_wise.1.running_var', 'tpu_batch_normalization_1/moving_variance/ExponentialMovingAverage'),

        # se layer
        ('block.se.fc1.weight', 'se/conv2d/kernel/ExponentialMovingAverage'),
        ('block.se.fc1.bias', 'se/conv2d/bias/ExponentialMovingAverage'),
        ('block.se.fc2.weight', 'se/conv2d_1/kernel/ExponentialMovingAverage'),
        ('block.se.fc2.bias', 'se/conv2d_1/bias/ExponentialMovingAverage'),

        # point wise layer
        ('block.fused_point_wise.0.weight', 'conv2d_1/kernel/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.weight', 'tpu_batch_normalization_1/gamma/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.bias', 'tpu_batch_normalization_1/beta/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.running_mean', 'tpu_batch_normalization_1/moving_mean/ExponentialMovingAverage'),
        ('block.fused_point_wise.1.running_var', 'tpu_batch_normalization_1/moving_variance/ExponentialMovingAverage'),

        ('block.point_wise.0.weight', 'conv2d_1/kernel/ExponentialMovingAverage'),
        ('block.point_wise.1.weight', 'tpu_batch_normalization_2/gamma/ExponentialMovingAverage'),
        ('block.point_wise.1.bias', 'tpu_batch_normalization_2/beta/ExponentialMovingAverage'),
        ('block.point_wise.1.running_mean', 'tpu_batch_normalization_2/moving_mean/ExponentialMovingAverage'),
        ('block.point_wise.1.running_var', 'tpu_batch_normalization_2/moving_variance/ExponentialMovingAverage'),

        # head
        ('head.bottleneck.0.weight', 'head/conv2d/kernel/ExponentialMovingAverage'),
        ('head.bottleneck.1.weight', 'head/tpu_batch_normalization/gamma/ExponentialMovingAverage'),
        ('head.bottleneck.1.bias', 'head/tpu_batch_normalization/beta/ExponentialMovingAverage'),
        ('head.bottleneck.1.running_mean', 'head/tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
        ('head.bottleneck.1.running_var', 'head/tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),

        # classifier
        ('head.classifier.weight', 'head/dense/kernel/ExponentialMovingAverage'),
        ('head.classifier.bias', 'head/dense/bias/ExponentialMovingAverage'),

        ('\\.(\\d+)\\.', lambda x: f'_{int(x.group(1))}/'),
    ]

    for name, param in list(model.named_parameters()) + list(model.named_buffers()):
        for pattern, sub in name_convertor:
            name = re.sub(pattern, sub, name)
        if 'dense/kernel' in name and list(param.shape) not in [[1000, 1280], [21843, 1280]]:
            continue
        if 'dense/bias' in name and list(param.shape) not in [[1000], [21843]]:
            continue
        if 'num_batches_tracked' in name:
            continue
        param.data.copy_(npz_dim_convertor(name, weight.get(name)))


def efficientnet_v2(model_name='efficientnet_v2_s', factor=0.5, depth=0.5, pretrained=False, nclass=0,
                    dropout=0.1, stochastic_depth=0.2, **kwargs):
    residual_config = [MBConvConfig(*layer_config) for layer_config in get_efficientnet_v2_structure(model_name)]
    model = EfficientNetV2(factor, depth, residual_config, nclass, dropout=dropout,
                           stochastic_depth=stochastic_depth, block=MBConv, act_layer=nn.SiLU)
    efficientnet_v2_init(model)
    if pretrained:
        load_from_zoo(model, model_name)
    return model


if __name__ == "__main__":
    # Generating Sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)

    # Model
    model = efficientnet_v2('efficientnet_v2_s')

    out = model(image)
    print(len(out))
```
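Running this file directly pushes the 640x640 sample image through the backbone, and `print(len(out))` outputs 4: `forward` keeps the last four feature maps with distinct spatial sizes (160, 80, 40 and 20 for a 640 input), which are exactly the multi-scale outputs the YOLO neck later consumes, and the source of the `width_list` channel list read during registration.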
4. How to Add the EfficientNetV2 Mechanism, Step by Step

4.1 Modification 1

The first step, as always, is to create the file. Go to the ultralytics/nn folder and create a directory named 'Addmodules' inside it (if you are using the files shared in my group, it already exists, so there is no need to create it)! Then create a new py file inside that directory and simply paste in the core code above.

4.2 Modification 2

For the second step, create a new py file named '__init__.py' in the same directory (already present if you use the group files), and import our module inside it, as sketched below.
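The original screenshot is not reproduced here; as a sketch, assuming the file created in 4.1 was named `EfficientNetV2.py` (any name works as long as the import matches), the `__init__.py` would contain:

```python
# ultralytics/nn/Addmodules/__init__.py
from .EfficientNetV2 import *  # file name is an assumption -- match the py file you created in 4.1
```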
4.3 Modification 3

For the third step, open the file 'ultralytics/nn/tasks.py' to import and register our module (if you use the group files it is already imported, so skip straight to step four)!
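Again as a sketch (the screenshot is not reproduced here): near the other imports at the top of tasks.py, pull in everything exported by the new package so that `efficientnet_v2` is visible to `parse_model`:

```python
# ultralytics/nn/tasks.py, near the existing imports at the top of the file
from ultralytics.nn.Addmodules import *  # makes efficientnet_v2 resolvable inside parse_model
```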
From today on, all tutorials will follow this format, because I assume you are modifying the files shared in my group!!

4.4 Modification 4

Add the following two lines of code!!!
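The two lines were shown in a screenshot that is not reproduced here. Judging from the snippets in 4.5-4.7, which read a `backbone` flag before any branch has set it, a plausible reconstruction (an assumption, not the original screenshot) is to initialize that flag inside `parse_model` before the layer loop:

```python
# ultralytics/nn/tasks.py, inside parse_model(), before the
# `for i, (f, n, m, args) in enumerate(...)` loop:
backbone = False  # set to True in 4.5 once a multi-output backbone such as efficientnet_v2 is parsed
```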
4.5 Modification 5

Find the spot at around line seven hundred or so (check against the picture) and modify it as shown there, adding the part in the red box. Note that there are no parentheses here, just the function name.
```python
        elif m in {efficientnet_v2}:  # add the corresponding modules here; the rest is the same for all of them
            m = m(*args)
            c2 = m.width_list  # return the list of output channels
            backbone = True
```
4.6 Modification 6

Both parts in the red boxes below need to be changed.
```python
        if isinstance(c2, list):
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
        t = str(m)[8:-2].replace('__main__.', '')  # module type
        m.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if backbone else i, f, t  # attach index, 'from' index, type
```
4.7 Modification 7

The following also needs to be modified; follow mine exactly.

Replace the original code with the code below.
```python
        if verbose:
            LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f}  {t:<45}{str(args):<30}')  # print
        save.extend(x % (i + 4 if backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            if len(c2) != 5:
                ch.insert(0, 0)
        else:
            ch.append(c2)
```
4.8 Modification 8

Modification 8 is unlike the previous ones: it changes part of the forward pass and sits outside the parse_model method.

You can check the line numbers in the picture; we have not left tasks.py, it is all the same file. Several forward methods in this area look very similar, so be careful not to pick the wrong one: it is the one at around line 70!!! The code is provided below, so you can copy and paste it directly; when I have time I will record a video on this part.

The code is as follows ->
```python
    def _predict_once(self, x, profile=False, visualize=False, embed=None):
        """
        Perform a forward pass through the network.

        Args:
            x (torch.Tensor): The input tensor to the model.
            profile (bool): Print the computation time of each layer if True, defaults to False.
            visualize (bool): Save the feature maps of the model if True, defaults to False.
            embed (list, optional): A list of feature vectors/embeddings to return.

        Returns:
            (torch.Tensor): The last output of the model.
        """
        y, dt, embeddings = [], [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                if len(x) != 5:  # 0 - 5
                    x.insert(0, None)
                for index, i in enumerate(x):
                    if index in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                x = x[-1]  # pass the last output on to the next layer
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
            if embed and m.i in embed:
                embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
                if m.i == max(embed):
                    return torch.unbind(torch.cat(embeddings, 1), dim=0)
        return x
```
That completes the modifications, but there are a lot of details here. Be very careful not to replace extra code, which causes errors, and not to miss any step; either will make the run fail, and the resulting errors are very hard to trace!!! Very hard to trace!!!

Attention!!! An extra modification!

Those who follow me know that most of my modifications are the same across articles; this network needs one extra step, concerning a parameter s: change that s to 640!!! and it runs perfectly!! A sketch of where it lives follows.
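For orientation: the parameter sits in `DetectionModel.__init__` in tasks.py, where a dummy forward pass is used to compute the strides; in stock Ultralytics it reads `s = 256` at the time of writing (verify against your version). A sketch of the change:

```python
# ultralytics/nn/tasks.py, DetectionModel.__init__, in the stride-computation block
m = self.model[-1]  # Detect()
if isinstance(m, Detect):
    s = 640  # was 256; the multi-output backbone needs the full training resolution here
```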
Fixing the GFLOPs printout

Open the file 'ultralytics/utils/torch_utils.py' and modify it as shown in the picture, otherwise the GFLOPs can fail to print.
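The exact edit is in the screenshot; the usual culprit is that `get_flops` profiles the model on a stride-sized dummy input (e.g. 32x32), which this backbone cannot process. A hedged sketch of the kind of change involved, assuming the stock thop-based `get_flops` (the surrounding names `p`, `imgsz`, `thop` and `deepcopy` come from that function and may differ between versions):

```python
# ultralytics/utils/torch_utils.py, inside get_flops():
# profile at the real image size instead of the tiny stride-sized tensor.
im = torch.empty((1, p.shape[1], *imgsz), device=p.device)  # p: first model parameter
flops = thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # GFLOPs at imgsz
```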
Notes!!!

If you hit a shape-mismatch error during validation, you can fix the size of the validation images as follows ->

Open the file ultralytics/models/yolo/detect/train.py; in the DetectionTrainer class, inside the build_dataset function, change the parameter rect=mode == 'val' to rect=False.
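For reference, this is roughly what the modified function looks like in stock Ultralytics (the surrounding lines, including `de_parallel` and `build_yolo_dataset`, come from that file and may differ slightly between versions; the only change is the `rect` argument):

```python
# ultralytics/models/yolo/detect/train.py, inside DetectionTrainer
def build_dataset(self, img_path, mode='train', batch=None):
    gs = max(int(de_parallel(self.model).stride.max() if self.model else 0), 32)
    return build_yolo_dataset(self.args, img_path, batch, self.data, mode=mode,
                              rect=False,  # was: rect=mode == 'val'
                              stride=gs)
```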
5. The yaml File for EfficientNetV2

Copy the following yaml file to run it!!!

5.1 EfficientNetV2 yaml file, version 1

Training info for this version: YOLO11-EfficientNetV2 summary: 559 layers, 2,096,663 parameters, 2,096,647 gradients, 5.3 GFLOPs

Usage: in [-1, 1, efficientnet_v2, [efficientnet_v2_s, 0.25, 0.5]], the 0.25 in the args is the channel-width factor: use 0.25 for YOLOv11n, 0.5 for YOLOv11s, 1.0 for YOLOv11m and YOLOv11l, and 1.5 for YOLOv11x, according to the YOLO scale you train.

# the 0.5 is the model's depth factor
# efficientnet_v2_s selects the model variant
# supported variants: efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l, efficientnet_v2_xl
```yaml
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# In [-1, 1, efficientnet_v2, [efficientnet_v2_s, 0.25, 0.5]] the 0.25 is the channel-width factor:
# 0.25 for YOLOv11n, 0.5 for YOLOv11s, 1.0 for YOLOv11m and YOLOv11l, 1.5 for YOLOv11x.
# The 0.5 is the model's depth factor.
# efficientnet_v2_s selects the model variant.
# Supported variants: efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l, efficientnet_v2_xl

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, efficientnet_v2, [efficientnet_v2_s, 0.25, 0.5]] # 0-4 P1/2
  - [-1, 1, SPPF, [1024, 5]] # 5
  - [-1, 2, C2PSA, [1024]] # 6

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 3], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 9

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 2], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 12 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 15 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 6], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 18 (P5/32-large)

  - [[12, 15, 18], 1, Detect, [nc]] # Detect(P3, P4, P5)
```
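To make the two scaling numbers in the yaml concrete, here is a small worked example of how factor=0.25 (width) and depth=0.5 re-shape the efficientnet_v2_s stages; it mirrors the loop at the top of `EfficientNetV2.__init__` from Section 3:

```python
# How [efficientnet_v2_s, 0.25, 0.5] rescales the stage table (same rule as the core code)
def make_divisible(v, divisor=8):
    new_v = max(divisor, int(v + divisor / 2) // divisor * divisor)
    return new_v + divisor if new_v < 0.9 * v else new_v

factor, depth = 0.25, 0.5  # YOLOv11n width factor and the depth factor from the yaml
for e, k, s, c_in, c_out, n, se, fused in [
        (1, 3, 1, 24, 24, 2, False, True),      # first stage of efficientnet_v2_s
        (6, 3, 2, 160, 256, 15, True, False)]:  # last stage
    print(f"in {c_in}->{make_divisible(int(c_in * factor))}, "
          f"out {c_out}->{make_divisible(int(c_out * factor))}, "
          f"layers {n}->{max(1, int(n * depth))}")
# prints: in 24->8, out 24->8, layers 2->1
#         in 160->40, out 256->64, layers 15->7
```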
5.2 Training Script
```python
import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('ultralytics/cfg/models/11/yolo11-EfficientNetV2.yaml')  # example path; point it at the yaml you saved from Section 5
    # model.load('yolov8n.pt')  # loading pretrain weights
    model.train(data=r'replace with the path to your dataset yaml file',
                # For other tasks, find 'ultralytics/cfg/default.yaml' and change `task` to detect, segment, classify or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=4,
                close_mosaic=10,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='',  # set to the last.pt path if you want to resume training
                amp=False,  # disable AMP if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )
```
6. Record of a Successful Run

Below is a screenshot of a successful run; one epoch of training has already completed (the image is too large to also capture the second epoch). After these changes there is a small glitch in the printout, but it does not affect any functionality; I will find time to fix it later.
7. Conclusion

This concludes the formal content of this article. Let me recommend my column of effective YOLOv11 improvements: it is newly opened with an average quality score of 98, and going forward I will reproduce papers from the latest top conferences and supplement some of the older improvement mechanisms. The column is currently free to read (temporarily, so follow early and don't get lost~). If this article helped you, subscribe to the column and follow for more updates~