RT-DETR改进策略【模型轻量化】| 替换骨干网络 CVPR-2023 FasterNet 高效快速的部分卷积块
一、本文介绍
本文记录的是
基于FasterNet的RT-DETR轻量化改进方法研究
。
FasterNet
的网络结构借鉴 CNN 的设计理念,通过提出的
PConv
减少推理时的计算和内存成本,同时减少通道数并增加部分比例,降低延迟
,并通过后续的
PWConv
来
弥补特征信息可能缺失的问题
,提高了准确性。本文在替换骨干网络中配置了原论文中的
fasternet_t0
、
fasternet_t1
、
fasternet_t2
、
fasternet_s
、
fasternet_m
和
fasternet_l
六种模型,以满足不同的需求。
| 模型 | 参数量 | 计算量 | 推理速度 |
|---|---|---|---|
| rtdetr-l | 32.8M | 108.0GFLOPs | 11.6ms |
| Improved | 20.7M | 66.1GFLOPs | 10.3ms |
二、FasterNet结构详解
2.1 出发点
在计算机视觉任务中,追求快速神经网络是一个趋势。然而,现有的一些神经网络在降低FLOPs(浮点运算次数)时,往往忽略了FLOPS(每秒浮点运算次数)的优化,导致实际运行速度不够快。FasterNet的设计出发点是为了克服这一问题,实现既减少FLOPs又提高FLOPS,从而在各种设备上达到快速运行的效果,同时不影响准确性。
2.2 原理
2.2.1 PConv(部分卷积)的原理
-
减少计算冗余和内存访问
:观察到现有算子(如DWConv)因频繁内存访问导致FLOPS低,提出
PConv。它利用特征图在不同通道间的冗余,仅对部分输入通道应用常规卷积进行空间特征提取,其余通道保持不变。
例如,对于输入输出通道数相同且采用典型部分比例 r = 1 4 r=\frac{1}{4} r = 4 1 的情况,
PConv的FLOPs仅为常规卷积的 1 16 \frac{1}{16} 16 1 ,内存访问量也仅为常规卷积的 1 4 \frac{1}{4} 4 1 。
-
与PWConv结合
:
PConv后接PWConv(逐点卷积),其有效感受野类似T形卷积, 更关注中心位置 。这种组合方式比直接使用T形卷积更能利用滤波器间的冗余,进一步节省FLOPs。
2.2.2 FasterNet的构建原理
以
PConv
和
PWConv
为主要构建算子,构建
FasterNet
。它保持架构简单,具有硬件友好性。其整体架构包含四个层次阶段,每个阶段前有嵌入层或合并层用于空间下采样和通道数扩展,每个阶段包含多个
FasterNet块
,每个块由
一个PConv层和两个PWConv层
组成,类似倒置残差块结构,中间层通道数扩展且有 shortcut连接以复用输入特征。同时,仅在每个中间
PWConv层
后放置
归一化和激活层
,以保留特征多样性并降低延迟。
2.3 结构
2.3.1 整体架构
- 四个阶段 :具有四个层次阶段,各阶段通过嵌入层(如 4 × 4 4×4 4 × 4 卷积,步长为4)或合并层(如 2 × 2 2×2 2 × 2 卷积,步长为2)进行空间下采样和通道数扩展。
-
FasterNet块
:每个阶段包含多个
FasterNet块,每个块由PConv层和两个PWConv层组成。
2.3.2 各层细节
- PConv层 :按照 部分比例 对输入通道进行卷积操作,例如 r = 1 4 r=\frac{1}{4} r = 4 1 时仅对 1 4 \frac{1}{4} 4 1 的输入通道进行卷积。
-
PWConv层
:在
PConv层之后,用于进一步处理特征。
- 归一化和激活层 :采用 批量归一化(BN) ,激活层对于较小的FasterNet变体选择GELU,较大变体选择ReLU,且仅在中间PWConv层后放置,以减少对特征多样性的影响并降低延迟。
2.4 优势
-
速度快
:
- PConv的高FLOPS :在GPU、CPU和ARM处理器上,PConv相比DWConv分别实现了 10.5 X 10.5X 10.5 X 、 6.2 X 6.2X 6.2 X 和 22.8 X 22.8X 22.8 X 更高的FLOPS,同时FLOPs显著降低。例如,10层纯PConv的堆叠在不同处理器上展现出良好的计算速度。
- FasterNet的高效运行 :在多种设备上,如GPU、CPU和ARM处理器,FasterNet相比其他神经网络(如MobileViT、ResNet等)在保持相似或更高准确性的情况下,运行速度更快。
例如,在ImageNet - 1k上,FasterNet - T0在GPU、CPU和ARM处理器上分别比MobileViT - XXS快 2.8 × 2.8× 2.8 × 、 3.3 X 3.3X 3.3 X 和 2.4 X 2.4X 2.4 X ,同时精度更高;FasterNet - L达到83.5%的top - 1准确率,与Swin - B相当,在GPU上推理吞吐量提高36%,在CPU上节省37%的计算时间。
- 准确性高 :在分类、检测和分割等视觉任务上取得了先进的性能。在ImageNet - 1k分类任务中,不同变体的FasterNet都取得了较好的准确率,且在下游的COCO数据集上进行目标检测和实例分割任务时,相比ResNet和ResNext等模型,具有更高的平均精度(AP)。
- 结构简单 :架构设计相比许多其他模型更简单,展示了设计简单而强大神经网络的可行性。这种简单性有助于硬件实现和模型的理解与应用。
论文: https://arxiv.org/pdf/2303.03667
源码: https://github.com/JierunChen/FasterNet
三、FasterNet实现代码
FasterNet
的实现代码如下:
# Copyright (c) Microsoft Corporation.
# Licensed under the MIT License.
import torch, yaml
import torch.nn as nn
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from functools import partial
from typing import List
from torch import Tensor
import copy
import os
import numpy as np
__all__ = ['fasternet_t0', 'fasternet_t1', 'fasternet_t2', 'fasternet_s', 'fasternet_m', 'fasternet_l']
class Partial_conv3(nn.Module):
def __init__(self, dim, n_div, forward):
super().__init__()
self.dim_conv3 = dim // n_div
self.dim_untouched = dim - self.dim_conv3
self.partial_conv3 = nn.Conv2d(self.dim_conv3, self.dim_conv3, 3, 1, 1, bias=False)
if forward == 'slicing':
self.forward = self.forward_slicing
elif forward == 'split_cat':
self.forward = self.forward_split_cat
else:
raise NotImplementedError
def forward_slicing(self, x: Tensor) -> Tensor:
# only for inference
x = x.clone() # !!! Keep the original input intact for the residual connection later
x[:, :self.dim_conv3, :, :] = self.partial_conv3(x[:, :self.dim_conv3, :, :])
return x
def forward_split_cat(self, x: Tensor) -> Tensor:
# for training/inference
x1, x2 = torch.split(x, [self.dim_conv3, self.dim_untouched], dim=1)
x1 = self.partial_conv3(x1)
x = torch.cat((x1, x2), 1)
return x
class MLPBlock(nn.Module):
def __init__(self,
dim,
n_div,
mlp_ratio,
drop_path,
layer_scale_init_value,
act_layer,
norm_layer,
pconv_fw_type
):
super().__init__()
self.dim = dim
self.mlp_ratio = mlp_ratio
self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
self.n_div = n_div
mlp_hidden_dim = int(dim * mlp_ratio)
mlp_layer: List[nn.Module] = [
nn.Conv2d(dim, mlp_hidden_dim, 1, bias=False),
norm_layer(mlp_hidden_dim),
act_layer(),
nn.Conv2d(mlp_hidden_dim, dim, 1, bias=False)
]
self.mlp = nn.Sequential(*mlp_layer)
self.spatial_mixing = Partial_conv3(
dim,
n_div,
pconv_fw_type
)
if layer_scale_init_value > 0:
self.layer_scale = nn.Parameter(layer_scale_init_value * torch.ones((dim)), requires_grad=True)
self.forward = self.forward_layer_scale
else:
self.forward = self.forward
def forward(self, x: Tensor) -> Tensor:
shortcut = x
x = self.spatial_mixing(x)
x = shortcut + self.drop_path(self.mlp(x))
return x
def forward_layer_scale(self, x: Tensor) -> Tensor:
shortcut = x
x = self.spatial_mixing(x)
x = shortcut + self.drop_path(
self.layer_scale.unsqueeze(-1).unsqueeze(-1) * self.mlp(x))
return x
class BasicStage(nn.Module):
def __init__(self,
dim,
depth,
n_div,
mlp_ratio,
drop_path,
layer_scale_init_value,
norm_layer,
act_layer,
pconv_fw_type
):
super().__init__()
blocks_list = [
MLPBlock(
dim=dim,
n_div=n_div,
mlp_ratio=mlp_ratio,
drop_path=drop_path[i],
layer_scale_init_value=layer_scale_init_value,
norm_layer=norm_layer,
act_layer=act_layer,
pconv_fw_type=pconv_fw_type
)
for i in range(depth)
]
self.blocks = nn.Sequential(*blocks_list)
def forward(self, x: Tensor) -> Tensor:
x = self.blocks(x)
return x
class PatchEmbed(nn.Module):
def __init__(self, patch_size, patch_stride, in_chans, embed_dim, norm_layer):
super().__init__()
self.proj = nn.Conv2d(in_chans, embed_dim, kernel_size=patch_size, stride=patch_stride, bias=False)
if norm_layer is not None:
self.norm = norm_layer(embed_dim)
else:
self.norm = nn.Identity()
def forward(self, x: Tensor) -> Tensor:
x = self.norm(self.proj(x))
return x
class PatchMerging(nn.Module):
def __init__(self, patch_size2, patch_stride2, dim, norm_layer):
super().__init__()
self.reduction = nn.Conv2d(dim, 2 * dim, kernel_size=patch_size2, stride=patch_stride2, bias=False)
if norm_layer is not None:
self.norm = norm_layer(2 * dim)
else:
self.norm = nn.Identity()
def forward(self, x: Tensor) -> Tensor:
x = self.norm(self.reduction(x))
return x
class FasterNet(nn.Module):
def __init__(self,
in_chans=3,
num_classes=1000,
embed_dim=96,
depths=(1, 2, 8, 2),
mlp_ratio=2.,
n_div=4,
patch_size=4,
patch_stride=4,
patch_size2=2, # for subsequent layers
patch_stride2=2,
patch_norm=True,
feature_dim=1280,
drop_path_rate=0.1,
layer_scale_init_value=0,
norm_layer='BN',
act_layer='RELU',
init_cfg=None,
pretrained=None,
pconv_fw_type='split_cat',
**kwargs):
super().__init__()
if norm_layer == 'BN':
norm_layer = nn.BatchNorm2d
else:
raise NotImplementedError
if act_layer == 'GELU':
act_layer = nn.GELU
elif act_layer == 'RELU':
act_layer = partial(nn.ReLU, inplace=True)
else:
raise NotImplementedError
self.num_stages = len(depths)
self.embed_dim = embed_dim
self.patch_norm = patch_norm
self.num_features = int(embed_dim * 2 ** (self.num_stages - 1))
self.mlp_ratio = mlp_ratio
self.depths = depths
# split image into non-overlapping patches
self.patch_embed = PatchEmbed(
patch_size=patch_size,
patch_stride=patch_stride,
in_chans=in_chans,
embed_dim=embed_dim,
norm_layer=norm_layer if self.patch_norm else None
)
# stochastic depth decay rule
dpr = [x.item()
for x in torch.linspace(0, drop_path_rate, sum(depths))]
# build layers
stages_list = []
for i_stage in range(self.num_stages):
stage = BasicStage(dim=int(embed_dim * 2 ** i_stage),
n_div=n_div,
depth=depths[i_stage],
mlp_ratio=self.mlp_ratio,
drop_path=dpr[sum(depths[:i_stage]):sum(depths[:i_stage + 1])],
layer_scale_init_value=layer_scale_init_value,
norm_layer=norm_layer,
act_layer=act_layer,
pconv_fw_type=pconv_fw_type
)
stages_list.append(stage)
# patch merging layer
if i_stage < self.num_stages - 1:
stages_list.append(
PatchMerging(patch_size2=patch_size2,
patch_stride2=patch_stride2,
dim=int(embed_dim * 2 ** i_stage),
norm_layer=norm_layer)
)
self.stages = nn.Sequential(*stages_list)
# add a norm layer for each output
self.out_indices = [0, 2, 4, 6]
for i_emb, i_layer in enumerate(self.out_indices):
if i_emb == 0 and os.environ.get('FORK_LAST3', None):
raise NotImplementedError
else:
layer = norm_layer(int(embed_dim * 2 ** i_emb))
layer_name = f'norm{i_layer}'
self.add_module(layer_name, layer)
self.channel = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]
def forward(self, x: Tensor) -> Tensor:
# output the features of four stages for dense prediction
x = self.patch_embed(x)
outs = []
for idx, stage in enumerate(self.stages):
x = stage(x)
if idx in self.out_indices:
norm_layer = getattr(self, f'norm{idx}')
x_out = norm_layer(x)
outs.append(x_out)
return outs
def update_weight(model_dict, weight_dict):
idx, temp_dict = 0, {}
for k, v in weight_dict.items():
if k in model_dict.keys() and np.shape(model_dict[k]) == np.shape(v):
temp_dict[k] = v
idx += 1
model_dict.update(temp_dict)
print(f'loading weights... {idx}/{len(model_dict)} items')
return model_dict
def fasternet_t0(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_t0.yaml'):
with open(cfg) as f:
cfg = yaml.load(f, Loader=yaml.SafeLoader)
model = FasterNet(**cfg)
if weights is not None:
pretrain_weight = torch.load(weights, map_location='cpu')
model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
return model
def fasternet_t1(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_t1.yaml'):
with open(cfg) as f:
cfg = yaml.load(f, Loader=yaml.SafeLoader)
model = FasterNet(**cfg)
if weights is not None:
pretrain_weight = torch.load(weights, map_location='cpu')
model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
return model
def fasternet_t2(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_t2.yaml'):
with open(cfg) as f:
cfg = yaml.load(f, Loader=yaml.SafeLoader)
model = FasterNet(**cfg)
if weights is not None:
pretrain_weight = torch.load(weights, map_location='cpu')
model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
return model
def fasternet_s(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_s.yaml'):
with open(cfg) as f:
cfg = yaml.load(f, Loader=yaml.SafeLoader)
model = FasterNet(**cfg)
if weights is not None:
pretrain_weight = torch.load(weights, map_location='cpu')
model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
return model
def fasternet_m(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_m.yaml'):
with open(cfg) as f:
cfg = yaml.load(f, Loader=yaml.SafeLoader)
model = FasterNet(**cfg)
if weights is not None:
pretrain_weight = torch.load(weights, map_location='cpu')
model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
return model
def fasternet_l(weights=None, cfg='ultralytics/nn/AddModules/faster_cfg/fasternet_l.yaml'):
with open(cfg) as f:
cfg = yaml.load(f, Loader=yaml.SafeLoader)
model = FasterNet(**cfg)
if weights is not None:
pretrain_weight = torch.load(weights, map_location='cpu')
model.load_state_dict(update_weight(model.state_dict(), pretrain_weight))
return model
if __name__ == '__main__':
import yaml
model = fasternet_t0(weights='fasternet_t0-epoch.281-val_acc1.71.9180.pth', cfg='cfg/fasternet_t0.yaml')
print(model.channel)
inputs = torch.randn((1, 3, 640, 640))
for i in model(inputs):
print(i.size())
四、修改步骤
4.1 修改一
① 在
ultralytics/nn/
目录下新建
AddModules
文件夹用于存放模块代码
② 在
AddModules
文件夹下新建
FasterNet.py
,将
第三节
中的代码粘贴到此处
4.2 修改二
在
AddModules
文件夹下新建
__init__.py
(已有则不用新建),在文件内导入模块:
from .FasterNet import *
4.3 修改三
在
ultralytics/nn/modules/tasks.py
文件中,需要添加各模块类。
① 首先:导入模块
② 在BaseModel类的predict函数中,在如下两处位置中去掉
embed
参数:
③ 在BaseModel类的_predict_once函数,替换如下代码:
def _predict_once(self, x, profile=False, visualize=False):
"""
Perform a forward pass through the network.
Args:
x (torch.Tensor): The input tensor to the model.
profile (bool): Print the computation time of each layer if True, defaults to False.
visualize (bool): Save the feature maps of the model if True, defaults to False.
Returns:
(torch.Tensor): The last output of the model.
"""
y, dt = [], [] # outputs
for m in self.model:
if m.f != -1: # if not from previous layer
x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers
if profile:
self._profile_one_layer(m, x, dt)
x = m(x) # run
y.append(x if m.i in self.save else None) # save output
if visualize:
feature_visualization(x, m.type, m.i, save_dir=visualize)
return x
④ 将
RTDETRDetectionModel类
中的
predict函数
完整替换:
def predict(self, x, profile=False, visualize=False, batch=None, augment=False):
"""
Perform a forward pass through the model.
Args:
x (torch.Tensor): The input tensor.
profile (bool, optional): If True, profile the computation time for each layer. Defaults to False.
visualize (bool, optional): If True, save feature maps for visualization. Defaults to False.
batch (dict, optional): Ground truth data for evaluation. Defaults to None.
augment (bool, optional): If True, perform data augmentation during inference. Defaults to False.
Returns:
(torch.Tensor): Model's output tensor.
"""
y, dt = [], [] # outputs
for m in self.model[:-1]: # except the head part
if m.f != -1: # if not from previous layer
x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f] # from earlier layers
if profile:
self._profile_one_layer(m, x, dt)
if hasattr(m, 'backbone'):
x = m(x)
for _ in range(5 - len(x)):
x.insert(0, None)
for i_idx, i in enumerate(x):
if i_idx in self.save:
y.append(i)
else:
y.append(None)
# for i in x:
# if i is not None:
# print(i.size())
x = x[-1]
else:
x = m(x) # run
y.append(x if m.i in self.save else None) # save output
if visualize:
feature_visualization(x, m.type, m.i, save_dir=visualize)
head = self.model[-1]
x = head([y[j] for j in head.f], batch) # head inference
return x
⑤ 在
parse_model函数
如下位置替换如下代码:
if verbose:
LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10} {'module':<45}{'arguments':<30}")
ch = [ch]
layers, save, c2 = [], [], ch[-1] # layers, savelist, ch out
is_backbone = False
for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']): # from, number, module, args
try:
if m == 'node_mode':
m = d[m]
if len(args) > 0:
if args[0] == 'head_channel':
args[0] = int(d[args[0]])
t = m
m = getattr(torch.nn, m[3:]) if 'nn.' in m else globals()[m] # get module
except:
pass
for j, a in enumerate(args):
if isinstance(a, str):
with contextlib.suppress(ValueError):
try:
args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
except:
args[j] = a
替换后如下:
⑥ 在
parse_model
函数,添加如下代码。
elif m in {fasternet_t0, fasternet_t1, fasternet_t2, fasternet_s, fasternet_m, fasternet_l,}:
m = m(*args)
c2 = m.channel
⑦ 在
parse_model函数
如下位置替换如下代码:
if isinstance(c2, list):
is_backbone = True
m_ = m
m_.backbone = True
else:
m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args) # module
t = str(m)[8:-2].replace('__main__.', '') # module type
m_.np = sum(x.numel() for x in m_.parameters()) # number params
m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t # attach index, 'from' index, type
if verbose:
LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f} {t:<45}{str(args):<30}') # print
save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1) # append to savelist
layers.append(m_)
if i == 0:
ch = []
if isinstance(c2, list):
ch.extend(c2)
for _ in range(5 - len(ch)):
ch.insert(0, 0)
else:
ch.append(c2)
return nn.Sequential(*layers), sorted(save)
⑧ 在
ultralytics\nn\autobackend.py
文件的
AutoBackend类
中的
forward函数
,完整替换如下代码:
def forward(self, im, augment=False, visualize=False):
"""
Runs inference on the YOLOv8 MultiBackend model.
Args:
im (torch.Tensor): The image tensor to perform inference on.
augment (bool): whether to perform data augmentation during inference, defaults to False
visualize (bool): whether to visualize the output predictions, defaults to False
Returns:
(tuple): Tuple containing the raw output tensor, and processed output for visualization (if visualize=True)
"""
b, ch, h, w = im.shape # batch, channel, height, width
if self.fp16 and im.dtype != torch.float16:
im = im.half() # to FP16
if self.nhwc:
im = im.permute(0, 2, 3, 1) # torch BCHW to numpy BHWC shape(1,320,192,3)
if self.pt or self.nn_module: # PyTorch
y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)
elif self.jit: # TorchScript
y = self.model(im)
elif self.dnn: # ONNX OpenCV DNN
im = im.cpu().numpy() # torch to numpy
self.net.setInput(im)
y = self.net.forward()
elif self.onnx: # ONNX Runtime
im = im.cpu().numpy() # torch to numpy
y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im})
elif self.xml: # OpenVINO
im = im.cpu().numpy() # FP32
y = list(self.ov_compiled_model(im).values())
elif self.engine: # TensorRT
if self.dynamic and im.shape != self.bindings['images'].shape:
i = self.model.get_binding_index('images')
self.context.set_binding_shape(i, im.shape) # reshape if dynamic
self.bindings['images'] = self.bindings['images']._replace(shape=im.shape)
for name in self.output_names:
i = self.model.get_binding_index(name)
self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))
s = self.bindings['images'].shape
assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
self.binding_addrs['images'] = int(im.data_ptr())
self.context.execute_v2(list(self.binding_addrs.values()))
y = [self.bindings[x].data for x in sorted(self.output_names)]
elif self.coreml: # CoreML
im = im[0].cpu().numpy()
im_pil = Image.fromarray((im * 255).astype('uint8'))
# im = im.resize((192, 320), Image.BILINEAR)
y = self.model.predict({'image': im_pil}) # coordinates are xywh normalized
if 'confidence' in y:
raise TypeError('Ultralytics only supports inference of non-pipelined CoreML models exported with '
f"'nms=False', but 'model={w}' has an NMS pipeline created by an 'nms=True' export.")
# TODO: CoreML NMS inference handling
# from ultralytics.utils.ops import xywh2xyxy
# box = xywh2xyxy(y['coordinates'] * [[w, h, w, h]]) # xyxy pixels
# conf, cls = y['confidence'].max(1), y['confidence'].argmax(1).astype(np.float32)
# y = np.concatenate((box, conf.reshape(-1, 1), cls.reshape(-1, 1)), 1)
elif len(y) == 1: # classification model
y = list(y.values())
elif len(y) == 2: # segmentation model
y = list(reversed(y.values())) # reversed for segmentation models (pred, proto)
elif self.paddle: # PaddlePaddle
im = im.cpu().numpy().astype(np.float32)
self.input_handle.copy_from_cpu(im)
self.predictor.run()
y = [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
elif self.ncnn: # ncnn
mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
ex = self.net.create_extractor()
input_names, output_names = self.net.input_names(), self.net.output_names()
ex.input(input_names[0], mat_in)
y = []
for output_name in output_names:
mat_out = self.pyncnn.Mat()
ex.extract(output_name, mat_out)
y.append(np.array(mat_out)[None])
elif self.triton: # NVIDIA Triton Inference Server
im = im.cpu().numpy() # torch to numpy
y = self.model(im)
else: # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)
im = im.cpu().numpy()
if self.saved_model: # SavedModel
y = self.model(im, training=False) if self.keras else self.model(im)
if not isinstance(y, list):
y = [y]
elif self.pb: # GraphDef
y = self.frozen_func(x=self.tf.constant(im))
if len(y) == 2 and len(self.names) == 999: # segments and names not defined
ip, ib = (0, 1) if len(y[0].shape) == 4 else (1, 0) # index of protos, boxes
nc = y[ib].shape[1] - y[ip].shape[3] - 4 # y = (1, 160, 160, 32), (1, 116, 8400)
self.names = {i: f'class{i}' for i in range(nc)}
else: # Lite or Edge TPU
details = self.input_details[0]
integer = details['dtype'] in (np.int8, np.int16) # is TFLite quantized int8 or int16 model
if integer:
scale, zero_point = details['quantization']
im = (im / scale + zero_point).astype(details['dtype']) # de-scale
self.interpreter.set_tensor(details['index'], im)
self.interpreter.invoke()
y = []
for output in self.output_details:
x = self.interpreter.get_tensor(output['index'])
if integer:
scale, zero_point = output['quantization']
x = (x.astype(np.float32) - zero_point) * scale # re-scale
if x.ndim > 2: # if task is not classification
# Denormalize xywh by image size. See https://github.com/ultralytics/ultralytics/pull/1695
# xywh are normalized in TFLite/EdgeTPU to mitigate quantization error of integer models
x[:, [0, 2]] *= w
x[:, [1, 3]] *= h
y.append(x)
# TF segment fixes: export is reversed vs ONNX export and protos are transposed
if len(y) == 2: # segment with (det, proto) output order reversed
if len(y[1].shape) != 4:
y = list(reversed(y)) # should be y = (1, 116, 8400), (1, 160, 160, 32)
y[1] = np.transpose(y[1], (0, 3, 1, 2)) # should be y = (1, 116, 8400), (1, 32, 160, 160)
y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]
# for x in y:
# print(type(x), len(x)) if isinstance(x, (list, tuple)) else print(type(x), x.shape) # debug shapes
if isinstance(y, (list, tuple)):
return self.from_numpy(y[0]) if len(y) == 1 else [self.from_numpy(x) for x in y]
else:
return self.from_numpy(y)
4.4 修改四
在
ultralytics/nn/AddModules/
目录下新建
faster_cfg
在此文件下新建
6个YAML
文件
①
fasternet_l.yaml
文件中粘贴以下内容:
mlp_ratio: 2
embed_dim: 192
depths: [3, 4, 18, 3]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.3
norm_layer: BN
act_layer: RELU
n_div: 4
②
fasternet_m.yaml
文件中粘贴以下内容:
mlp_ratio: 2
embed_dim: 144
depths: [3, 4, 18, 3]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.2
norm_layer: BN
act_layer: RELU
n_div: 4
③
fasternet_s.yaml
文件中粘贴以下内容:
mlp_ratio: 2
embed_dim: 128
depths: [1, 2, 13, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.1
norm_layer: BN
act_layer: RELU
n_div: 4
④
fasternet_t0.yaml
文件中粘贴以下内容:
mlp_ratio: 2
embed_dim: 40
depths: [1, 2, 8, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.
norm_layer: BN
act_layer: GELU
n_div: 4
⑤
fasternet_t1.yaml
文件中粘贴以下内容:
mlp_ratio: 2
embed_dim: 64
depths: [1, 2, 8, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.02
norm_layer: BN
act_layer: GELU
n_div: 4
⑥
fasternet_t2.yaml
文件中粘贴以下内容:
mlp_ratio: 2
embed_dim: 96
depths: [1, 2, 8, 2]
feature_dim: 1280
patch_size: 4
patch_stride: 4
patch_size2: 2
patch_stride2: 2
layer_scale_init_value: 0 # no layer scale
drop_path_rate: 0.05
norm_layer: BN
act_layer: RELU
n_div: 4
4.5 修改五
在
FasterNet.py
中,将模型配置路径修改成自己的路径,我的路径是:
ultralytics/nn/AddModules/faster_cfg
分别有 6 个,均要修改
至此就修改完成了,可以配置模型开始训练了
五、yaml模型文件
5.1 模型改进⭐
在代码配置完成后,配置模型的YAML文件。
此处以
ultralytics/cfg/models/rt-detr/rtdetr-l.yaml
为例,在同目录下创建一个用于自己数据集训练的模型文件
rtdetr-FasterNet.yaml
。
将
rtdetr-l.yaml
中的内容复制到
rtdetr-FasterNet.yaml
文件下,修改
nc
数量等于自己数据中目标的数量。
📌 模型的修改方法是将
骨干网络
替换成
fasternet_t0
。
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr
# Parameters
nc: 1 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, fasternet_t0, []] # 4
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 5 input_proj.2
- [-1, 1, AIFI, [1024, 8]] # 6
- [-1, 1, Conv, [256, 1, 1]] # 7, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 8
- [3, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 9 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 10
- [-1, 3, RepC3, [256]] # 11, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 12, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 13
- [2, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 15 cat backbone P4
- [-1, 3, RepC3, [256]] # X3 (16), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 17, downsample_convs.0
- [[-1, 12], 1, Concat, [1]] # 18 cat Y4
- [-1, 3, RepC3, [256]] # F4 (19), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.1
- [[-1, 7], 1, Concat, [1]] # 21 cat Y5
- [-1, 3, RepC3, [256]] # F5 (22), pan_blocks.1
- [[16, 19, 22], 1, RTDETRDecoder, [nc]] # Detect(P3, P4, P5)
六、成功运行结果
分别打印网络模型可以看到
FasterNet模块
已经加入到模型中,并可以进行训练了。
rtdetr-FasterNet :
rtdetr-FasterNet summary: 513 layers, 20,696,711 parameters, 20,696,711 gradients, 66.1 GFLOPs
from n params module arguments
0 -1 1 2216100 fasternet_t0 []
1 -1 1 82432 ultralytics.nn.modules.conv.Conv [320, 256, 1, 1, None, 1, 1, False]
2 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
3 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
4 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
5 3 1 41472 ultralytics.nn.modules.conv.Conv [160, 256, 1, 1, None, 1, 1, False]
6 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
7 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
8 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
9 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
10 2 1 20992 ultralytics.nn.modules.conv.Conv [80, 256, 1, 1, None, 1, 1, False]
11 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
13 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
14 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
16 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
17 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
19 [16, 19, 22] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-FasterNet summary: 513 layers, 20,696,711 parameters, 20,696,711 gradients, 66.1 GFLOPs