RT-DETR Improvement Strategies [Model Lightweighting] | Replacing the Backbone with EfficientNet v2 for Faster Training and Quicker Convergence
I. Introduction
This post documents a lightweight RT-DETR improvement based on EfficientNet v2. EfficientNet v2 targets the training bottlenecks of EfficientNet v1, such as slow training with large image sizes, slow depthwise convolutions in the early layers, and the sub-optimal equal scaling of all stages, achieving fast training, high parameter efficiency, and good generalization. Applying it to RT-DETR should therefore improve overall model performance, reducing model complexity and training time while preserving accuracy.
For the backbone replacement, this post configures the four variants from the original paper, efficientnet_v2_s, efficientnet_v2_m, efficientnet_v2_l, and efficientnet_v2_xl, to suit different needs.
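As a quick sanity check, the snippet below (a minimal sketch; it assumes the Section III code has been saved as EfficientNetV2.py) instantiates each variant and prints the per-scale output channels recorded in width_list:

from EfficientNetV2 import efficientnet_v2  # assumes the Section III code is saved as EfficientNetV2.py

# Instantiate each configured variant and print its per-scale output channels.
for name in ['efficientnet_v2_s', 'efficientnet_v2_m', 'efficientnet_v2_l', 'efficientnet_v2_xl']:
    model = efficientnet_v2(name)
    print(name, model.width_list)  # e.g. efficientnet_v2_s -> [48, 64, 160, 256]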
II. EfficientNet v2 Explained
EfficientNetV2: Smaller Models and Faster Training
2.1 Motivation
2.1.1 Training Efficiency
In deep learning, training efficiency becomes increasingly important as models and datasets grow. GPT-3, for example, shows remarkable few-shot learning ability, but training it takes so much time and so many resources that retraining or improving it is hard. EfficientNet v2 aims to improve training speed while maintaining parameter efficiency. To that end, the authors systematically studied the training bottlenecks of EfficientNet (v1) and found several key problems:
- Training with large image sizes is slow: EfficientNet's large image sizes cause significant memory usage; since total GPU/TPU memory is fixed, training has to fall back to smaller batch sizes, which slows it down substantially.
- Depthwise convolutions are slow in the early layers: the depthwise convolutions used extensively in EfficientNet are slow in the early layers; although they have fewer parameters and FLOPs than regular convolutions, they cannot fully utilize modern accelerators.
- Equally scaling up every stage is sub-optimal: EfficientNet scales all stages equally using a simple compound scaling rule, but different stages contribute unequally to training speed and parameter efficiency, and aggressively increasing the image size leads to large memory consumption and slow training.
2.2 Architecture
2.2.1 The Fused-MBConv Block
Based on the analysis of the training bottlenecks, the EfficientNet v2 search space introduces new operations such as Fused-MBConv, which replaces the depthwise conv3x3 and the expansion conv1x1 of MBConv with a single regular conv3x3. Experiments that progressively replaced the original MBConv blocks in EfficientNet-B4 with Fused-MBConv showed that replacing them in the early stages (stages 1-3) improves training speed with only a small increase in parameters and FLOPs, whereas replacing them everywhere (stages 1-7) significantly increases parameters and FLOPs and slows training down. The best combination of the two blocks therefore has to be found, which motivates using neural architecture search to find it automatically. A minimal sketch of the two blocks follows.
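For intuition, here is a minimal sketch of the two blocks (normalization, activations, SE, and residuals omitted); the full implementation used in this post appears in Section III:

import torch
from torch import nn

# MBConv: 1x1 expansion -> depthwise 3x3 -> 1x1 projection
def mbconv_sketch(c_in, c_out, expand=4):
    c_mid = c_in * expand
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 1, bias=False),                            # expansion conv1x1
        nn.Conv2d(c_mid, c_mid, 3, padding=1, groups=c_mid, bias=False),  # depthwise conv3x3
        nn.Conv2d(c_mid, c_out, 1, bias=False),                           # projection conv1x1
    )

# Fused-MBConv: the expansion conv1x1 and depthwise conv3x3 fuse into one regular conv3x3
def fused_mbconv_sketch(c_in, c_out, expand=4):
    c_mid = c_in * expand
    return nn.Sequential(
        nn.Conv2d(c_in, c_mid, 3, padding=1, bias=False),                 # single regular conv3x3
        nn.Conv2d(c_mid, c_out, 1, bias=False),                           # projection conv1x1
    )

x = torch.randn(1, 24, 56, 56)
print(mbconv_sketch(24, 24)(x).shape)        # torch.Size([1, 24, 56, 56])
print(fused_mbconv_sketch(24, 24)(x).shape)  # torch.Size([1, 24, 56, 56])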
2.2.2 Training-Aware NAS and Scaling
- NAS search: the training-aware NAS framework jointly optimizes accuracy, parameter efficiency, and training efficiency on modern accelerators. With EfficientNet as the backbone, the search space is a stage-based factorized space covering design choices such as the convolution type ({MBConv, Fused-MBConv}), the number of layers, the kernel size ({3x3, 5x5}), and the expansion ratio ({1, 4, 6}). The space is shrunk by removing unnecessary options and reusing the backbone's channel sizes; reinforcement learning or random search is then run on networks comparable in size to EfficientNet-B4, sampling up to 1000 models and training each for about 10 epochs. The search reward combines model accuracy, normalized training step time, and parameter size (see the sketch after this list).
- EfficientNet v2 architecture characteristics:
  - It makes extensive use of MBConv and, in the early layers, Fused-MBConv; MBConv blocks prefer smaller expansion ratios to reduce memory access overhead, and smaller 3x3 kernels, adding more layers to compensate for the smaller receptive field.
  - It completely removes the last stride-1 stage of the original EfficientNet, presumably because of its large parameter count and memory access overhead.
  - Taking EfficientNet v2-S as an example, each stage of the architecture has specific settings for its operation, stride, channel count, and number of layers.
- Scaling strategy: EfficientNet v2-S is scaled up to EfficientNet v2-M/L with a similar compound scaling method, plus a few optimizations: the maximum inference image size is capped at 480 to avoid excessive memory and training-speed overhead, and more layers are gradually added to the later stages to increase capacity without adding much runtime overhead.
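The search reward mentioned above is, per the paper, a simple weighted product of accuracy A, normalized training step time S, and parameter size P, i.e. A · S^w · P^v with w = -0.07 and v = -0.05; a one-line sketch:

# Sketch of the training-aware NAS reward described above; S and P are normalized,
# and the negative exponents (values as reported in the paper) penalize slow or large models.
def nas_reward(A, S, P, w=-0.07, v=-0.05):
    return A * (S ** w) * (P ** v)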
2.3 Advantages
- Fast training: thanks to training-aware NAS and scaling, EfficientNet v2 trains much faster than earlier models.
- High parameter efficiency: EfficientNet v2 uses far fewer parameters than earlier models while keeping high accuracy.
- Good generalization: experiments on transfer-learning datasets such as CIFAR-10, CIFAR-100, Flowers, and Cars show that EfficientNet v2 outperforms earlier ConvNets and Vision Transformers, demonstrating good generalization.
Paper: https://arxiv.org/pdf/2104.00298
Source code: https://github.com/google/automl/tree/master/efficientnetv2
III. EfficientNetV2 Implementation Code
The implementation of EfficientNetV2 is as follows:
import copy
from functools import partial
from collections import OrderedDict
from torch import nn
import os
import re
import subprocess
from pathlib import Path
import numpy as np
import torch
__all__ = ['efficientnet_v2']
def get_efficientnet_v2_structure(model_name):
if 'efficientnet_v2_s' in model_name:
return [
# e k s in out xN se fused
(1, 3, 1, 24, 24, 2, False, True),
(4, 3, 2, 24, 48, 4, False, True),
(4, 3, 2, 48, 64, 4, False, True),
(4, 3, 2, 64, 128, 6, True, False),
(6, 3, 1, 128, 160, 9, True, False),
(6, 3, 2, 160, 256, 15, True, False),
]
elif 'efficientnet_v2_m' in model_name:
return [
# e k s in out xN se fused
(1, 3, 1, 24, 24, 3, False, True),
(4, 3, 2, 24, 48, 5, False, True),
(4, 3, 2, 48, 80, 5, False, True),
(4, 3, 2, 80, 160, 7, True, False),
(6, 3, 1, 160, 176, 14, True, False),
(6, 3, 2, 176, 304, 18, True, False),
(6, 3, 1, 304, 512, 5, True, False),
]
elif 'efficientnet_v2_l' in model_name:
return [
# e k s in out xN se fused
(1, 3, 1, 32, 32, 4, False, True),
(4, 3, 2, 32, 64, 7, False, True),
(4, 3, 2, 64, 96, 7, False, True),
(4, 3, 2, 96, 192, 10, True, False),
(6, 3, 1, 192, 224, 19, True, False),
(6, 3, 2, 224, 384, 25, True, False),
(6, 3, 1, 384, 640, 7, True, False),
]
elif 'efficientnet_v2_xl' in model_name:
return [
# e k s in out xN se fused
(1, 3, 1, 32, 32, 4, False, True),
(4, 3, 2, 32, 64, 8, False, True),
(4, 3, 2, 64, 96, 8, False, True),
(4, 3, 2, 96, 192, 16, True, False),
(6, 3, 1, 192, 256, 24, True, False),
(6, 3, 2, 256, 512, 32, True, False),
(6, 3, 1, 512, 640, 8, True, False),
        ]
    else:
        raise ValueError(f'Unsupported model name: {model_name}')
class ConvBNAct(nn.Sequential):
"""Convolution-Normalization-Activation Module"""
def __init__(self, in_channel, out_channel, kernel_size, stride, groups, norm_layer, act, conv_layer=nn.Conv2d):
super(ConvBNAct, self).__init__(
conv_layer(in_channel, out_channel, kernel_size, stride=stride, padding=(kernel_size - 1) // 2,
groups=groups, bias=False),
norm_layer(out_channel),
act()
)
class SEUnit(nn.Module):
"""Squeeze-Excitation Unit
paper: https://openaccess.thecvf.com/content_cvpr_2018/html/Hu_Squeeze-and-Excitation_Networks_CVPR_2018_paper
"""
def __init__(self, in_channel, reduction_ratio=4, act1=partial(nn.SiLU, inplace=True), act2=nn.Sigmoid):
super(SEUnit, self).__init__()
hidden_dim = in_channel // reduction_ratio
self.avg_pool = nn.AdaptiveAvgPool2d((1, 1))
self.fc1 = nn.Conv2d(in_channel, hidden_dim, (1, 1), bias=True)
self.fc2 = nn.Conv2d(hidden_dim, in_channel, (1, 1), bias=True)
self.act1 = act1()
self.act2 = act2()
def forward(self, x):
return x * self.act2(self.fc2(self.act1(self.fc1(self.avg_pool(x)))))
class StochasticDepth(nn.Module):
"""StochasticDepth
paper: https://link.springer.com/chapter/10.1007/978-3-319-46493-0_39
:arg
        - prob: probability of dropping the residual branch
        - mode: "row" or "all". "row" means each sample in the batch gets its own independent survival draw
"""
def __init__(self, prob, mode):
super(StochasticDepth, self).__init__()
self.prob = prob
self.survival = 1.0 - prob
self.mode = mode
def forward(self, x):
if self.prob == 0.0 or not self.training:
return x
else:
shape = [x.size(0)] + [1] * (x.ndim - 1) if self.mode == 'row' else [1]
return x * torch.empty(shape).bernoulli_(self.survival).div_(self.survival).to(x.device)
class MBConvConfig:
"""EfficientNet Building block configuration"""
def __init__(self, expand_ratio: float, kernel: int, stride: int, in_ch: int, out_ch: int, layers: int,
use_se: bool, fused: bool, act=nn.SiLU, norm_layer=nn.BatchNorm2d):
self.expand_ratio = expand_ratio
self.kernel = kernel
self.stride = stride
self.in_ch = in_ch
self.out_ch = out_ch
self.num_layers = layers
self.act = act
self.norm_layer = norm_layer
self.use_se = use_se
self.fused = fused
@staticmethod
    def adjust_channels(channel, factor, divisible=8):
        # scale `channel` by `factor`, then round to the nearest multiple of `divisible`,
        # never rounding down by more than 10% (the standard make_divisible rule)
        new_channel = channel * factor
        divisible_channel = max(divisible, (int(new_channel + divisible / 2) // divisible) * divisible)
        divisible_channel += divisible if divisible_channel < 0.9 * new_channel else 0
        return divisible_channel
class MBConv(nn.Module):
"""EfficientNet main building blocks
:arg
- c: MBConvConfig instance
- sd_prob: stochastic path probability
"""
def __init__(self, c, sd_prob=0.0):
super(MBConv, self).__init__()
inter_channel = c.adjust_channels(c.in_ch, c.expand_ratio)
block = []
if c.expand_ratio == 1:
block.append(('fused', ConvBNAct(c.in_ch, inter_channel, c.kernel, c.stride, 1, c.norm_layer, c.act)))
elif c.fused:
block.append(('fused', ConvBNAct(c.in_ch, inter_channel, c.kernel, c.stride, 1, c.norm_layer, c.act)))
block.append(('fused_point_wise', ConvBNAct(inter_channel, c.out_ch, 1, 1, 1, c.norm_layer, nn.Identity)))
else:
block.append(('linear_bottleneck', ConvBNAct(c.in_ch, inter_channel, 1, 1, 1, c.norm_layer, c.act)))
block.append(('depth_wise',
ConvBNAct(inter_channel, inter_channel, c.kernel, c.stride, inter_channel, c.norm_layer,
c.act)))
block.append(('se', SEUnit(inter_channel, 4 * c.expand_ratio)))
block.append(('point_wise', ConvBNAct(inter_channel, c.out_ch, 1, 1, 1, c.norm_layer, nn.Identity)))
self.block = nn.Sequential(OrderedDict(block))
self.use_skip_connection = c.stride == 1 and c.in_ch == c.out_ch
self.stochastic_path = StochasticDepth(sd_prob, "row")
def forward(self, x):
out = self.block(x)
if self.use_skip_connection:
out = x + self.stochastic_path(out)
return out
class EfficientNetV2(nn.Module):
"""Pytorch Implementation of EfficientNetV2
paper: https://arxiv.org/abs/2104.00298
- reference 1 (pytorch): https://github.com/d-li14/efficientnetv2.pytorch/blob/main/effnetv2.py
- reference 2 (official): https://github.com/google/automl/blob/master/efficientnetv2/effnetv2_configs.py
:arg
- layer_infos: list of MBConvConfig
- out_channels: bottleneck channel
        - nclass: number of classes
- dropout: dropout probability before classifier layer
- stochastic depth: stochastic depth probability
"""
def __init__(self, layer_infos, nclass=0, dropout=0.2, stochastic_depth=0.0,
block=MBConv, act_layer=nn.SiLU, norm_layer=nn.BatchNorm2d):
super(EfficientNetV2, self).__init__()
self.layer_infos = layer_infos
self.norm_layer = norm_layer
self.act = act_layer
self.in_channel = layer_infos[0].in_ch
self.final_stage_channel = layer_infos[-1].out_ch
self.cur_block = 0
self.num_block = sum(stage.num_layers for stage in layer_infos)
self.stochastic_depth = stochastic_depth
self.stem = ConvBNAct(3, self.in_channel, 3, 2, 1, self.norm_layer, self.act)
self.blocks = nn.Sequential(*self.make_stages(layer_infos, block))
        # probe once with a dummy 640x640 input to record the channel width of each output scale
        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
def make_stages(self, layer_infos, block):
return [layer for layer_info in layer_infos for layer in self.make_layers(copy.copy(layer_info), block)]
def make_layers(self, layer_info, block):
layers = []
for i in range(layer_info.num_layers):
layers.append(block(layer_info, sd_prob=self.get_sd_prob()))
layer_info.in_ch = layer_info.out_ch
layer_info.stride = 1
return layers
def get_sd_prob(self):
sd_prob = self.stochastic_depth * (self.cur_block / self.num_block)
self.cur_block += 1
return sd_prob
    def forward(self, x):
        x = self.stem(x)
        unique_tensors = {}
        for idx, block in enumerate(self.blocks):
            x = block(x)
            # keep only the last feature map produced at each spatial resolution
            width, height = x.shape[2], x.shape[3]
            unique_tensors[(width, height)] = x
        # return the four lowest-resolution feature maps (strides 4/8/16/32 for a 640x640 input)
        result_list = list(unique_tensors.values())[-4:]
        return result_list
def efficientnet_v2_init(model):
for m in model.modules():
if isinstance(m, nn.Conv2d):
nn.init.kaiming_normal_(m.weight, mode='fan_out')
if m.bias is not None:
nn.init.zeros_(m.bias)
elif isinstance(m, (nn.BatchNorm2d, nn.GroupNorm)):
nn.init.ones_(m.weight)
nn.init.zeros_(m.bias)
elif isinstance(m, nn.Linear):
nn.init.normal_(m.weight, mean=0.0, std=0.01)
nn.init.zeros_(m.bias)
model_urls = {
"efficientnet_v2_s": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-s.npy",
"efficientnet_v2_m": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-m.npy",
"efficientnet_v2_l": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-l.npy",
"efficientnet_v2_s_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-s-21k.npy",
"efficientnet_v2_m_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-m-21k.npy",
"efficientnet_v2_l_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-l-21k.npy",
"efficientnet_v2_xl_in21k": "https://github.com/hankyul2/EfficientNetV2-pytorch/releases/download/EfficientNetV2-pytorch/efficientnetv2-xl-21k.npy",
}
def load_from_zoo(model, model_name, pretrained_path='pretrained/official'):
Path(os.path.join(pretrained_path, model_name)).mkdir(parents=True, exist_ok=True)
file_name = os.path.join(pretrained_path, model_name, os.path.basename(model_urls[model_name]))
load_npy(model, load_npy_from_url(url=model_urls[model_name], file_name=file_name))
def load_npy_from_url(url, file_name):
if not Path(file_name).exists():
subprocess.run(["wget", "-r", "-nc", '-O', file_name, url])
return np.load(file_name, allow_pickle=True).item()
def npz_dim_convertor(name, weight):
weight = torch.from_numpy(weight)
if 'kernel' in name:
if weight.dim() == 4:
if weight.shape[3] == 1:
# depth-wise convolution 'h w in_c out_c -> in_c out_c h w'
weight = torch.permute(weight, (2, 3, 0, 1))
else:
# 'h w in_c out_c -> out_c in_c h w'
weight = torch.permute(weight, (3, 2, 0, 1))
elif weight.dim() == 2:
weight = weight.transpose(1, 0)
elif 'scale' in name or 'bias' in name:
weight = weight.squeeze()
return weight
def load_npy(model, weight):
name_convertor = [
# stem
('stem.0.weight', 'stem/conv2d/kernel/ExponentialMovingAverage'),
('stem.1.weight', 'stem/tpu_batch_normalization/gamma/ExponentialMovingAverage'),
('stem.1.bias', 'stem/tpu_batch_normalization/beta/ExponentialMovingAverage'),
('stem.1.running_mean', 'stem/tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
('stem.1.running_var', 'stem/tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
# fused layer
('block.fused.0.weight', 'conv2d/kernel/ExponentialMovingAverage'),
('block.fused.1.weight', 'tpu_batch_normalization/gamma/ExponentialMovingAverage'),
('block.fused.1.bias', 'tpu_batch_normalization/beta/ExponentialMovingAverage'),
('block.fused.1.running_mean', 'tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
('block.fused.1.running_var', 'tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
# linear bottleneck
('block.linear_bottleneck.0.weight', 'conv2d/kernel/ExponentialMovingAverage'),
('block.linear_bottleneck.1.weight', 'tpu_batch_normalization/gamma/ExponentialMovingAverage'),
('block.linear_bottleneck.1.bias', 'tpu_batch_normalization/beta/ExponentialMovingAverage'),
('block.linear_bottleneck.1.running_mean', 'tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
('block.linear_bottleneck.1.running_var', 'tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
# depth wise layer
('block.depth_wise.0.weight', 'depthwise_conv2d/depthwise_kernel/ExponentialMovingAverage'),
('block.depth_wise.1.weight', 'tpu_batch_normalization_1/gamma/ExponentialMovingAverage'),
('block.depth_wise.1.bias', 'tpu_batch_normalization_1/beta/ExponentialMovingAverage'),
('block.depth_wise.1.running_mean', 'tpu_batch_normalization_1/moving_mean/ExponentialMovingAverage'),
('block.depth_wise.1.running_var', 'tpu_batch_normalization_1/moving_variance/ExponentialMovingAverage'),
# se layer
('block.se.fc1.weight', 'se/conv2d/kernel/ExponentialMovingAverage'),
('block.se.fc1.bias', 'se/conv2d/bias/ExponentialMovingAverage'),
('block.se.fc2.weight', 'se/conv2d_1/kernel/ExponentialMovingAverage'),
('block.se.fc2.bias', 'se/conv2d_1/bias/ExponentialMovingAverage'),
# point wise layer
('block.fused_point_wise.0.weight', 'conv2d_1/kernel/ExponentialMovingAverage'),
('block.fused_point_wise.1.weight', 'tpu_batch_normalization_1/gamma/ExponentialMovingAverage'),
('block.fused_point_wise.1.bias', 'tpu_batch_normalization_1/beta/ExponentialMovingAverage'),
('block.fused_point_wise.1.running_mean', 'tpu_batch_normalization_1/moving_mean/ExponentialMovingAverage'),
('block.fused_point_wise.1.running_var', 'tpu_batch_normalization_1/moving_variance/ExponentialMovingAverage'),
('block.point_wise.0.weight', 'conv2d_1/kernel/ExponentialMovingAverage'),
('block.point_wise.1.weight', 'tpu_batch_normalization_2/gamma/ExponentialMovingAverage'),
('block.point_wise.1.bias', 'tpu_batch_normalization_2/beta/ExponentialMovingAverage'),
('block.point_wise.1.running_mean', 'tpu_batch_normalization_2/moving_mean/ExponentialMovingAverage'),
('block.point_wise.1.running_var', 'tpu_batch_normalization_2/moving_variance/ExponentialMovingAverage'),
# head
('head.bottleneck.0.weight', 'head/conv2d/kernel/ExponentialMovingAverage'),
('head.bottleneck.1.weight', 'head/tpu_batch_normalization/gamma/ExponentialMovingAverage'),
('head.bottleneck.1.bias', 'head/tpu_batch_normalization/beta/ExponentialMovingAverage'),
('head.bottleneck.1.running_mean', 'head/tpu_batch_normalization/moving_mean/ExponentialMovingAverage'),
('head.bottleneck.1.running_var', 'head/tpu_batch_normalization/moving_variance/ExponentialMovingAverage'),
# classifier
('head.classifier.weight', 'head/dense/kernel/ExponentialMovingAverage'),
('head.classifier.bias', 'head/dense/bias/ExponentialMovingAverage'),
('\\.(\\d+)\\.', lambda x: f'_{int(x.group(1))}/'),
]
for name, param in list(model.named_parameters()) + list(model.named_buffers()):
for pattern, sub in name_convertor:
name = re.sub(pattern, sub, name)
if 'dense/kernel' in name and list(param.shape) not in [[1000, 1280], [21843, 1280]]:
continue
if 'dense/bias' in name and list(param.shape) not in [[1000], [21843]]:
continue
if 'num_batches_tracked' in name:
continue
param.data.copy_(npz_dim_convertor(name, weight.get(name)))
def efficientnet_v2(model_name='efficientnet_v2_s', pretrained=False, nclass=0, dropout=0.1, stochastic_depth=0.2,
**kwargs):
residual_config = [MBConvConfig(*layer_config) for layer_config in get_efficientnet_v2_structure(model_name)]
model = EfficientNetV2(residual_config, nclass, dropout=dropout, stochastic_depth=stochastic_depth, block=MBConv,
act_layer=nn.SiLU)
efficientnet_v2_init(model)
if pretrained:
load_from_zoo(model, model_name)
return model
if __name__ == "__main__":
# Generating Sample image
image_size = (1, 3, 640, 640)
image = torch.rand(*image_size)
# Model
model = efficientnet_v2('efficientnet_v2_s')
out = model(image)
print(len(out))
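If everything is wired up correctly, the demo above prints 4: for a 640×640 input, efficientnet_v2_s returns four feature maps at strides 4/8/16/32 with channel widths [48, 64, 160, 256], the same values recorded in width_list.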
IV. Modification Steps
4.1 Modification 1
① Create an AddModules folder under the ultralytics/nn/ directory to hold the module code.
② Create EfficientNetV2.py inside the AddModules folder and paste the code from Section III into it.
4.2 Modification 2
Create __init__.py in the AddModules folder (skip this if it already exists) and import the module inside it:
from .EfficientNetV2 import *
4.3 Modification 3
In the ultralytics/nn/modules/tasks.py file, the module class names need to be added in two places.
① First, import the module:
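The exact import depends on where AddModules lives in your layout; with the package created above, an absolute import such as the following should work:

from ultralytics.nn.AddModules import *  # pulls in efficientnet_v2 via AddModules/__init__.py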
② Next, add the following two lines at the indicated position in the parse_model function (the flag must be named is_backbone so that it matches the code added in step ④ below):
is_backbone = False
t = m
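For orientation, here is a sketch of where the two lines sit, assuming the standard parse_model loop in tasks.py (only the two marked lines are new; the exact surroundings may differ slightly between versions):

for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # existing loop over the layer definitions
    is_backbone = False  # added: reset the backbone flag for every entry
    ...                  # existing argument parsing
    t = m                # added: keep a reference to the module before it is re-assigned below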
③ Then, add the following code further down in the same function:
elif m in {efficientnet_v2}:
    m = m(*args)       # instantiate the backbone module itself
    c2 = m.width_list  # per-scale output channels probed in __init__
    is_backbone = True
④ Then, replace the corresponding block of code that follows in parse_model entirely with:
if isinstance(c2, list):  # the module returned multi-scale channels, i.e. it is a whole backbone
    is_backbone = True
    m_ = m
    m_.backbone = True  # mark it so _predict_once can detect backbone modules via hasattr
else:
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
t = str(m)[8:-2].replace('__main__.', '')  # module type
m.np = sum(x.numel() for x in m_.parameters())  # number params
m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t  # attach index, 'from' index, type; the backbone occupies indices 0-4
if verbose:
    LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m.np:10.0f} {t:<45}{str(args):<30}')  # print
save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if
            x != -1)  # append to savelist
layers.append(m_)
if i == 0:
    ch = []
if isinstance(c2, list):
    ch.extend(c2)
    for _ in range(5 - len(ch)):
        ch.insert(0, 0)  # zero-pad so backbone channels sit at indices matching the layer numbering
else:
    ch.append(c2)
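The zero-padding at the front of ch keeps channel indexing consistent: the backbone occupies indices 0-4, so with efficientnet_v2_s the list becomes [0, 48, 64, 160, 256], and head layers can reference backbone scales by index (the YAML in Section V taps indices 2 and 3 for the P3 and P4 features).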
⑤ In the same file, find _predict_once in BaseModel and replace it with the following code:
    def _predict_once(self, x, profile=False, visualize=False, embed=None):
        y, dt, embeddings = [], [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                if len(x) != 5:  # pad so the backbone outputs occupy indices 0-4
                    x.insert(0, None)
                for index, i in enumerate(x):
                    if index in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                x = x[-1]  # pass the last output on to the next layer
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
            if embed and m.i in embed:
                embeddings.append(nn.functional.adaptive_avg_pool2d(x, (1, 1)).squeeze(-1).squeeze(-1))  # flatten
                if m.i == max(embed):
                    return torch.unbind(torch.cat(embeddings, 1), dim=0)
        return x
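Note the x.insert(0, None) padding: the backbone returns four feature maps, and padding to five entries keeps their positions in y aligned with the i + 4 index offset applied in parse_model, so later layers can fetch them through m.f.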
That completes the modifications; you can now configure the model and start training.
V. The YAML Model File
5.1 Model Improvement ⭐
With the code in place, configure the model's YAML file.
Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as an example, create a model file named rtdetr-EfficientNetV2.yaml in the same directory for training on your own dataset. Copy the contents of rtdetr-l.yaml into rtdetr-EfficientNetV2.yaml and set nc to the number of object classes in your dataset.
📌 The model modification consists of replacing the backbone with efficientnet_v2.
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, efficientnet_v2, []] # 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 5 input_proj.2
- [-1, 1, AIFI, [1024, 8]] # 6
- [-1, 1, Conv, [256, 1, 1]] # 7, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 8
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 9 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 10
- [-1, 3, RepC3, [256]] # 11, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 12, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 13
- [2, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 15 cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (16), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 17, downsample_convs.0
- [[-1, 12], 1, Concat, [1]] # 18 cat Y4
- [-1, 3, RepC3, [256]] # F4 (19), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.1
- [[-1, 7], 1, Concat, [1]] # 21 cat Y5
- [-1, 3, RepC3, [256]] # F5 (22), pan_blocks.1
- [[16, 19, 22], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
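With the YAML in place, training can be launched through the usual ultralytics API. A minimal sketch (the config path and hyper-parameters below are placeholders to adapt):

from ultralytics import RTDETR

# Load the modified model from the YAML created above and train it.
model = RTDETR('ultralytics/cfg/models/rt-detr/rtdetr-EfficientNetV2.yaml')
model.train(data='your_dataset.yaml', epochs=100, imgsz=640, batch=4)  # adjust to your dataset and GPU memory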
VI. Successful Run Results
Printing the network shows that the efficientnet_v2 module has been added to the model and it is ready for training.
rtdetr-EfficientNetV2:
from n params module arguments
0 -1 1 19847248 efficientnet_v2 []
1 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
2 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
3 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
4 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
5 3 1 41472 ultralytics.nn.modules.conv.Conv [160, 256, 1, 1, None, 1, 1, False]
6 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
7 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
8 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
9 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
10 2 1 16896 ultralytics.nn.modules.conv.Conv [64, 256, 1, 1, None, 1, 1, False]
11 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
13 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
14 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
16 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
17 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
19 [16, 19, 22] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-EfficientNetV2 summary: 1,108 layers, 38,307,379 parameters, 38,307,379 gradients, 107.5 GFLOPs