RT-DETR Improvement Strategies [Model Lightweighting] | Replacing the Backbone with GhostNet V1, a Lightweight Architecture Built on Ghost Modules and Ghost Bottlenecks
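1. Introduction
This article documents a way to make the RT-DETR model lighter by using GhostNet v1 as its backbone. The Ghost module and the Ghost bottleneck are the key to GhostNet's efficiency. The Ghost module addresses the heavy compute demand of ordinary convolution layers, while the Ghost bottleneck arranges how the channel count changes and how the block connects to its shortcut. Together they cut computation while keeping accuracy high, which makes the model more practical and efficient on mobile devices.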
| Model | Parameters | FLOPs | Inference time |
|---|---|---|---|
| rtdetr-l | 32.8M | 108.0GFLOPs | 11.6ms |
| Improved | 21.3M | 63.0GFLOPs | 10.9ms |
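The relative savings implied by the table can be worked out directly from the reported figures. The snippet below is only an illustrative calculation; the helper name `relative_reduction` is made up for this example and is not part of any library.

```python
def relative_reduction(baseline, improved):
    """Return the fractional reduction of `improved` relative to `baseline`."""
    return (baseline - improved) / baseline

# Figures copied from the table above: (rtdetr-l, Improved).
rows = {
    "params (M)": (32.8, 21.3),
    "GFLOPs": (108.0, 63.0),
    "inference time (ms)": (11.6, 10.9),
}

for name, (base, new) in rows.items():
    print(f"{name}: {relative_reduction(base, new):.1%} lower")
```

On these numbers the improved model has roughly 35% fewer parameters and 42% fewer FLOPs, while the measured inference time drops by about 6%.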
2. GhostNet V1 Lightweight Design
GhostNet: More Features from Cheap Operations
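2.1 Motivation
- Reduce compute requirements: Conventional deep convolutional networks usually need a large number of parameters and floating-point operations (FLOPs) to reach high accuracy. ResNet-50, for example, has about 25.6 million parameters and needs 4.1 billion FLOPs to process a 224×224 image. That makes deployment on mobile devices (such as smartphones and self-driving cars) difficult, so lighter and more efficient architectures with acceptable accuracy are needed.
- Exploit feature-map redundancy: The feature maps of a well-trained deep network contain abundant, even redundant, information. For example, the feature maps ResNet-50 produces for an input image contain many pairs of similar maps. This redundancy can be treated as a resource and used to generate more features in a cheaper way.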
2.2 Principle
2.2.1 Convolution layer
In an ordinary convolution layer, given input data $X \in \mathbb{R}^{c \times h \times w}$ (where $c$ is the number of input channels and $h$ and $w$ are the height and width of the input), any convolution layer that produces $n$ feature maps can be written as $Y = X * f + b$. Here $*$ is the convolution operation, $b$ is the bias term, $Y \in \mathbb{R}^{h' \times w' \times n}$ is the output feature map, $f \in \mathbb{R}^{c \times k \times k \times n}$ is the layer's convolution filters, $h'$ and $w'$ are the height and width of the output, and $k \times k$ is the kernel size of $f$. This convolution requires $n \cdot h' \cdot w' \cdot c \cdot k \cdot k$ FLOPs. Because the number of filters $n$ and the number of channels $c$ are usually large, this value is often very large.
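To get a feel for the size of this number, the FLOPs expression can be evaluated for one layer. This is a minimal sketch: the function name `conv_flops` and the layer dimensions are chosen for illustration only and do not come from the paper.

```python
def conv_flops(n, h_out, w_out, c, k):
    """FLOPs of an ordinary convolution: n * h' * w' * c * k * k."""
    return n * h_out * w_out * c * k * k

# Hypothetical layer: 256 input channels, 256 filters, 3x3 kernel, 56x56 output.
flops = conv_flops(n=256, h_out=56, w_out=56, c=256, k=3)
print(f"{flops:,} FLOPs (~{flops / 1e9:.2f} GFLOPs)")
```

A single layer of this shape already costs about 1.85 GFLOPs, which is why reducing the per-layer cost matters on mobile hardware.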
2.2.2 Ghost module
- Generating intrinsic feature maps: The ordinary convolution layer is split into two parts. First, an ordinary convolution produces $m$ intrinsic feature maps $Y' \in \mathbb{R}^{h' \times w' \times m}$ (with $m \le n$), computed as $Y' = X * f'$, where $X$ is the input data and $f'$ is the convolution filter. The hyperparameters of this convolution (filter size, stride, padding, and so on) match those of the ordinary convolution layer, so the spatial size of the output feature maps stays the same.
- Generating Ghost feature maps: Starting from these intrinsic feature maps, a series of simple linear operations $\Phi_{i,j}$ generates the Ghost feature maps. For each intrinsic feature map $y_i'$, $s$ Ghost feature maps $y_{ij}$ are produced as $y_{ij} = \Phi_{i,j}(y_i')$, where $i = 1, \cdots, m$ and $j = 1, \cdots, s$. The last linear operation $\Phi_{i,s}$ is the identity mapping, which preserves the intrinsic feature map. In this way the Ghost module outputs $n = m \cdot s$ feature maps.
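The channel bookkeeping behind $n = m \cdot s$ can be sketched in a few lines. The arithmetic mirrors what the `GhostModule` class in Section 3 does with `init_channels`, `new_channels`, and the final slice to `oup`; the helper name `ghost_channel_split` is hypothetical and exists only for this illustration.

```python
import math

def ghost_channel_split(oup, ratio=2):
    """Return (intrinsic, ghost, kept) channel counts for a Ghost module with `oup` outputs."""
    init_channels = math.ceil(oup / ratio)         # m: produced by the primary convolution
    new_channels = init_channels * (ratio - 1)     # m * (s - 1): produced by the cheap operation
    kept = min(oup, init_channels + new_channels)  # the concatenation is sliced down to `oup`
    return init_channels, new_channels, kept

for oup, ratio in [(24, 2), (15, 2), (16, 3)]:
    print(oup, ratio, ghost_channel_split(oup, ratio))
```

When `oup` is not a multiple of `ratio`, the two branches together produce slightly more channels than requested, which is why the implementation slices the concatenated tensor.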
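2.2.3 Complexity analysis
- Theoretical speed-up: Assume there is one identity mapping and $m \cdot (s - 1)$ linear operations, each with an average kernel size of $d \times d$. Ideally these linear operations could have different shapes and parameters, but for the sake of online inference efficiency it is usually recommended to use linear operations of the same size (such as 3×3 or 5×5) within one Ghost module. In theory, the speed-up ratio $r_s$ of replacing an ordinary convolution with a Ghost module is approximately $s$, and the compression ratio $r_c$ is also approximately $s$.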
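The claim that the speed-up is roughly $s$ can be checked numerically. The sketch below counts the FLOPs of an ordinary convolution against a Ghost module made of a primary convolution with $m = n/s$ filters plus $m \cdot (s-1)$ cheap $d \times d$ operations. The function names and the layer sizes are assumptions made for illustration, and the count ignores BatchNorm and activations.

```python
def conv_flops(n, h_out, w_out, c, k):
    """FLOPs of an ordinary convolution with n filters of size c x k x k."""
    return n * h_out * w_out * c * k * k

def ghost_flops(n, h_out, w_out, c, k, s, d):
    """FLOPs of a Ghost module producing n maps (assumes s divides n)."""
    m = n // s                                    # intrinsic feature maps
    primary = m * h_out * w_out * c * k * k       # ordinary convolution on m filters
    cheap = (s - 1) * m * h_out * w_out * d * d   # one d x d operation per ghost map
    return primary + cheap

def speedup(n, h_out, w_out, c, k, s, d):
    """Theoretical speed-up ratio r_s of the Ghost module over the ordinary convolution."""
    return conv_flops(n, h_out, w_out, c, k) / ghost_flops(n, h_out, w_out, c, k, s, d)

for s in (2, 3, 4):
    r = speedup(n=240, h_out=28, w_out=28, c=240, k=3, s=s, d=3)
    print(f"s={s}: speed-up ~ {r:.2f}")
```

With $d = k$ the ratio simplifies to $s \cdot c / (c + s - 1)$, which sits just below $s$ whenever $s \ll c$. The same reasoning applied to parameter counts gives the compression ratio.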
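2.3 Structure
2.3.1 Ghost bottleneck
- Composition: A Ghost bottleneck consists mainly of two stacked Ghost modules. The first Ghost module acts as an expansion layer that increases the number of channels; the ratio of its output channels to its input channels is called the expansion ratio. The second Ghost module reduces the number of channels to match the shortcut path. The shortcut connects the input and the output of these two Ghost modules. Batch normalization (BN) and the ReLU nonlinearity are applied after each layer, except that, following the recommendation of MobileNetV2, no ReLU is used after the second Ghost module.
- Differences between strides: For $stride = 1$ the structure is as described above. For $stride = 2$ the shortcut path is implemented with a downsampling layer, and a depthwise convolution with stride 2 is inserted between the two Ghost modules. In practice, the primary convolution inside the Ghost module is a pointwise convolution for efficiency.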
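2.3.2 Overall GhostNet architecture
- Basic building block: GhostNet uses the Ghost bottleneck as its basic building block and is made up of a series of Ghost bottlenecks. The first layer is a standard convolution layer with 16 filters, followed by Ghost bottlenecks with gradually increasing channel counts. These Ghost bottlenecks are grouped into stages according to the size of their input feature maps. Only the last Ghost bottleneck of each stage uses $stride = 2$; all others use $stride = 1$.
- Classification layers: Finally, global average pooling and a convolution layer turn the feature maps into a 1280-dimensional feature vector for classification. Squeeze-and-excitation (SE) modules are also applied in the residual layer of some Ghost bottlenecks.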
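2.4 Advantages
- Lower computational cost: The Ghost module needs fewer parameters and less computation to produce the same number of feature maps. For example, in experiments on CIFAR-10 with VGG-16 and ResNet-56, the models rebuilt with Ghost modules (Ghost-VGG-16 and Ghost-ResNet-56) keep high accuracy while using significantly fewer FLOPs. On ImageNet with ResNet-50, Ghost-ResNet-50 ($s = 2$) achieves about 2× speed-up and compression while maintaining accuracy.
- Strong performance
- Image classification: On ImageNet classification, GhostNet outperforms modern small architectures such as the MobileNet series, the ShuffleNet series, ProxylessNAS, FBNet, and MnasNet at every level of computational complexity. For example, at a computational cost similar to MobileNetV3, GhostNet reaches higher accuracy (GhostNet-1.3x has 75.7% top-1 accuracy versus 75.2% for MobileNetV3 Large 1.0x). It also has an edge in actual inference speed: at the same latency, GhostNet's top-1 accuracy is about 0.5% higher than MobileNetV3's.
- Object detection: In object detection experiments on MS COCO, GhostNet was used as the backbone feature extractor in the Faster R-CNN and RetinaNet frameworks. At a significantly lower computational cost, it achieved mean average precision (mAP) similar to MobileNetV2 and MobileNetV3.
Paper: https://arxiv.org/pdf/1911.11907.pdf
Source code: https://github.com/huawei-noah/Efficient-AI-Backbones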
3. Ghostnetv1 Module Implementation
The implementation of the Ghostnetv1 module is as follows:
import torch
import torch.nn as nn
import torch.nn.functional as F
import math

__all__ = ['Ghostnetv1']


def _make_divisible(v, divisor, min_value=None):
    """
    This function is taken from the original tf repo.
    It ensures that all layers have a channel number that is divisible by 8
    It can be seen here:
    https://github.com/tensorflow/models/blob/master/research/slim/nets/mobilenet/mobilenet.py
    """
    if min_value is None:
        min_value = divisor
    new_v = max(min_value, int(v + divisor / 2) // divisor * divisor)
    # Make sure that round down does not go down by more than 10%.
    if new_v < 0.9 * v:
        new_v += divisor
    return new_v


def hard_sigmoid(x, inplace: bool = False):
    if inplace:
        return x.add_(3.).clamp_(0., 6.).div_(6.)
    else:
        return F.relu6(x + 3.) / 6.


class SqueezeExcite(nn.Module):
    def __init__(self, in_chs, se_ratio=0.25, reduced_base_chs=None,
                 act_layer=nn.ReLU, gate_fn=hard_sigmoid, divisor=4, **_):
        super(SqueezeExcite, self).__init__()
        self.gate_fn = gate_fn
        reduced_chs = _make_divisible((reduced_base_chs or in_chs) * se_ratio, divisor)
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.conv_reduce = nn.Conv2d(in_chs, reduced_chs, 1, bias=True)
        self.act1 = act_layer(inplace=True)
        self.conv_expand = nn.Conv2d(reduced_chs, in_chs, 1, bias=True)

    def forward(self, x):
        x_se = self.avg_pool(x)
        x_se = self.conv_reduce(x_se)
        x_se = self.act1(x_se)
        x_se = self.conv_expand(x_se)
        x = x * self.gate_fn(x_se)
        return x


class ConvBnAct(nn.Module):
    def __init__(self, in_chs, out_chs, kernel_size,
                 stride=1, act_layer=nn.ReLU):
        super(ConvBnAct, self).__init__()
        self.conv = nn.Conv2d(in_chs, out_chs, kernel_size, stride, kernel_size//2, bias=False)
        self.bn1 = nn.BatchNorm2d(out_chs)
        self.act1 = act_layer(inplace=True)

    def forward(self, x):
        x = self.conv(x)
        x = self.bn1(x)
        x = self.act1(x)
        return x


class GhostModule(nn.Module):
    def __init__(self, inp, oup, kernel_size=1, ratio=2, dw_size=3, stride=1, relu=True):
        super(GhostModule, self).__init__()
        self.oup = oup
        init_channels = math.ceil(oup / ratio)       # intrinsic feature maps
        new_channels = init_channels*(ratio-1)       # ghost feature maps

        # Primary (pointwise by default) convolution producing the intrinsic maps
        self.primary_conv = nn.Sequential(
            nn.Conv2d(inp, init_channels, kernel_size, stride, kernel_size//2, bias=False),
            nn.BatchNorm2d(init_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

        # Cheap depthwise convolution producing the ghost maps
        self.cheap_operation = nn.Sequential(
            nn.Conv2d(init_channels, new_channels, dw_size, 1, dw_size//2, groups=init_channels, bias=False),
            nn.BatchNorm2d(new_channels),
            nn.ReLU(inplace=True) if relu else nn.Sequential(),
        )

    def forward(self, x):
        x1 = self.primary_conv(x)
        x2 = self.cheap_operation(x1)
        out = torch.cat([x1,x2], dim=1)
        # Slice in case init_channels + new_channels exceeds oup
        return out[:,:self.oup,:,:]


class GhostBottleneck(nn.Module):
    """ Ghost bottleneck w/ optional SE"""

    def __init__(self, in_chs, mid_chs, out_chs, dw_kernel_size=3,
                 stride=1, act_layer=nn.ReLU, se_ratio=0.):
        super(GhostBottleneck, self).__init__()
        has_se = se_ratio is not None and se_ratio > 0.
        self.stride = stride

        # Point-wise expansion
        self.ghost1 = GhostModule(in_chs, mid_chs, relu=True)

        # Depth-wise convolution
        if self.stride > 1:
            self.conv_dw = nn.Conv2d(mid_chs, mid_chs, dw_kernel_size, stride=stride,
                                     padding=(dw_kernel_size-1)//2,
                                     groups=mid_chs, bias=False)
            self.bn_dw = nn.BatchNorm2d(mid_chs)

        # Squeeze-and-excitation
        if has_se:
            self.se = SqueezeExcite(mid_chs, se_ratio=se_ratio)
        else:
            self.se = None

        # Point-wise linear projection
        self.ghost2 = GhostModule(mid_chs, out_chs, relu=False)

        # shortcut
        if (in_chs == out_chs and self.stride == 1):
            self.shortcut = nn.Sequential()
        else:
            self.shortcut = nn.Sequential(
                nn.Conv2d(in_chs, in_chs, dw_kernel_size, stride=stride,
                          padding=(dw_kernel_size-1)//2, groups=in_chs, bias=False),
                nn.BatchNorm2d(in_chs),
                nn.Conv2d(in_chs, out_chs, 1, stride=1, padding=0, bias=False),
                nn.BatchNorm2d(out_chs),
            )

    def forward(self, x):
        residual = x

        # 1st ghost bottleneck
        x = self.ghost1(x)

        # Depth-wise convolution
        if self.stride > 1:
            x = self.conv_dw(x)
            x = self.bn_dw(x)

        # Squeeze-and-excitation
        if self.se is not None:
            x = self.se(x)

        # 2nd ghost bottleneck
        x = self.ghost2(x)

        x += self.shortcut(residual)
        return x


class GhostNet(nn.Module):
    def __init__(self, cfgs, num_classes=1000, width=1.0, dropout=0.2):
        super(GhostNet, self).__init__()
        # setting of inverted residual blocks
        self.cfgs = cfgs
        self.dropout = dropout

        # building first layer
        output_channel = _make_divisible(16 * width, 4)
        self.conv_stem = nn.Conv2d(3, output_channel, 3, 2, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(output_channel)
        self.act1 = nn.ReLU(inplace=True)
        input_channel = output_channel

        # building inverted residual blocks
        stages = []
        block = GhostBottleneck
        for cfg in self.cfgs:
            layers = []
            for k, exp_size, c, se_ratio, s in cfg:
                output_channel = _make_divisible(c * width, 4)
                hidden_channel = _make_divisible(exp_size * width, 4)
                layers.append(block(input_channel, hidden_channel, output_channel, k, s,
                                    se_ratio=se_ratio))
                input_channel = output_channel
            stages.append(nn.Sequential(*layers))

        output_channel = _make_divisible(exp_size * width, 4)
        stages.append(nn.Sequential(ConvBnAct(input_channel, output_channel, 1)))
        input_channel = output_channel

        self.blocks = nn.Sequential(*stages)
        # Channel counts of the returned feature maps, read by parse_model
        self.width_list = [i.size(1) for i in self.forward(torch.randn(1, 3, 640, 640))]

    def forward(self, x):
        unique_tensors = {}
        x = self.conv_stem(x)
        x = self.bn1(x)
        x = self.act1(x)
        for model in self.blocks:
            x = model(x)
            if self.dropout > 0.:
                x = F.dropout(x, p=self.dropout, training=self.training)
            # Keep only the last feature map seen at each spatial resolution
            height, width = x.shape[2], x.shape[3]
            unique_tensors[(height, width)] = x
        result_list = list(unique_tensors.values())[-4:]
        return result_list


def Ghostnetv1(**kwargs):
    """
    Constructs a GhostNet model
    """
    cfgs = [
        # k, t, c, SE, s
        # stage1
        [[3, 16, 16, 0, 1]],
        # stage2
        [[3, 48, 24, 0, 2]],
        [[3, 72, 24, 0, 1]],
        # stage3
        [[5, 72, 40, 0.25, 2]],
        [[5, 120, 40, 0.25, 1]],
        # stage4
        [[3, 240, 80, 0, 2]],
        [[3, 200, 80, 0, 1],
         [3, 184, 80, 0, 1],
         [3, 184, 80, 0, 1],
         [3, 480, 112, 0.25, 1],
         [3, 672, 112, 0.25, 1]
         ],
        # stage5
        [[5, 672, 160, 0.25, 2]],
        [[5, 960, 160, 0, 1],
         [5, 960, 160, 0.25, 1],
         [5, 960, 160, 0, 1],
         [5, 960, 160, 0.25, 1]
         ]
    ]
    return GhostNet(cfgs, **kwargs)


if __name__ == '__main__':
    model = Ghostnetv1()
    model.eval()
    x = torch.randn(16, 3, 224, 224)
    y = model(x)
    # forward returns a list of feature maps, so print each shape in turn
    for feat in y:
        print(feat.size())
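To see why `width_list` ends up holding four channel counts, it helps to trace the `forward` method while tracking only spatial size and channels. The sketch below is a simplified stand-in, not the module itself: it assumes `width=1.0` and an input size divisible by 32 (so every stride-2 layer halves the size exactly), and the names `STEM`, `STAGES`, and `trace_feature_maps` are invented for this illustration.

```python
# (out_channels, stride) of the stem and of each stage in the cfgs list above;
# the last entry stands for the final 1x1 ConvBnAct with 960 channels.
STEM = (16, 2)
STAGES = [
    (16, 1), (24, 2), (24, 1), (40, 2), (40, 1),
    (80, 2), (112, 1), (160, 2), (160, 1), (960, 1),
]

def trace_feature_maps(input_size=640):
    """Return [(spatial_size, channels), ...] for the four maps that forward() returns."""
    size = input_size // STEM[1]
    by_resolution = {}  # mirrors unique_tensors: a later stage overwrites an earlier one
    for channels, stride in STAGES:
        size //= stride
        by_resolution[size] = channels
    return list(by_resolution.items())[-4:]

print(trace_feature_maps(640))
```

For a 640×640 input this yields maps at 160, 80, 40, and 20 pixels with 24, 40, 112, and 960 channels, which matches the input channel counts (960, 112, 40) that appear in the printed model summary in Section 6.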
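4. Modification Steps
4.1 Modification 1
① Create a new folder named AddModules under ultralytics/nn/ to hold the module code.
② Create Ghostnetv1.py in the AddModules folder and paste the code from Section 3 into it.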
4.2 Modification 2
In the AddModules folder, create __init__.py (skip this if it already exists) and import the module in it:
from .Ghostnetv1 import *
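4.3 Modification 3
The module classes need to be added in ultralytics/nn/tasks.py.
① First, import the module.
② In the predict function of the BaseModel class, remove the embed parameter in the following two places:
③ In the _predict_once function of the BaseModel class, replace the code with the following: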
    def _predict_once(self, x, profile=False, visualize=False):
        """
        Perform a forward pass through the network.

        Args:
            x (torch.Tensor): The input tensor to the model.
            profile (bool): Print the computation time of each layer if True, defaults to False.
            visualize (bool): Save the feature maps of the model if True, defaults to False.

        Returns:
            (torch.Tensor): The last output of the model.
        """
        y, dt = [], []  # outputs
        for m in self.model:
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            x = m(x)  # run
            y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        return x
④ Replace the predict function of the RTDETRDetectionModel class in full:
    def predict(self, x, profile=False, visualize=False, batch=None, augment=False):
        """
        Perform a forward pass through the model.

        Args:
            x (torch.Tensor): The input tensor.
            profile (bool, optional): If True, profile the computation time for each layer. Defaults to False.
            visualize (bool, optional): If True, save feature maps for visualization. Defaults to False.
            batch (dict, optional): Ground truth data for evaluation. Defaults to None.
            augment (bool, optional): If True, perform data augmentation during inference. Defaults to False.

        Returns:
            (torch.Tensor): Model's output tensor.
        """
        y, dt = [], []  # outputs
        for m in self.model[:-1]:  # except the head part
            if m.f != -1:  # if not from previous layer
                x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
            if profile:
                self._profile_one_layer(m, x, dt)
            if hasattr(m, 'backbone'):
                x = m(x)
                # Left-pad the backbone outputs with None so they occupy indices 0-4
                for _ in range(5 - len(x)):
                    x.insert(0, None)
                for i_idx, i in enumerate(x):
                    if i_idx in self.save:
                        y.append(i)
                    else:
                        y.append(None)
                # for i in x:
                #     if i is not None:
                #         print(i.size())
                x = x[-1]
            else:
                x = m(x)  # run
                y.append(x if m.i in self.save else None)  # save output
            if visualize:
                feature_visualization(x, m.type, m.i, save_dir=visualize)
        head = self.model[-1]
        x = head([y[j] for j in head.f], batch)  # head inference
        return x
⑤ In the parse_model function, replace the code at the following location with:
    if verbose:
        LOGGER.info(f"\n{'':>3}{'from':>20}{'n':>3}{'params':>10}  {'module':<45}{'arguments':<30}")
    ch = [ch]
    layers, save, c2 = [], [], ch[-1]  # layers, savelist, ch out
    is_backbone = False
    for i, (f, n, m, args) in enumerate(d['backbone'] + d['head']):  # from, number, module, args
        try:
            if m == 'node_mode':
                m = d[m]
                if len(args) > 0:
                    if args[0] == 'head_channel':
                        args[0] = int(d[args[0]])
            t = m
            m = getattr(torch.nn, m[3:]) if 'nn.' in m else globals()[m]  # get module
        except:
            pass
        for j, a in enumerate(args):
            if isinstance(a, str):
                with contextlib.suppress(ValueError):
                    try:
                        args[j] = locals()[a] if a in locals() else ast.literal_eval(a)
                    except:
                        args[j] = a
After the replacement it looks like this:
⑥ In the parse_model function, add the following code.
        elif m in {Ghostnetv1}:
            m = m(*args)
            c2 = m.width_list  # list of output channels, one per returned feature map
⑦ In the parse_model function, replace the code at the following location with:
        if isinstance(c2, list):
            is_backbone = True
            m_ = m
            m_.backbone = True
        else:
            m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
            t = str(m)[8:-2].replace('__main__.', '')  # module type
        m_.np = sum(x.numel() for x in m_.parameters())  # number params
        m_.i, m_.f, m_.type = i + 4 if is_backbone else i, f, t  # attach index, 'from' index, type
        if verbose:
            LOGGER.info(f'{i:>3}{str(f):>20}{n_:>3}{m_.np:10.0f}  {t:<45}{str(args):<30}')  # print
        save.extend(x % (i + 4 if is_backbone else i) for x in ([f] if isinstance(f, int) else f) if x != -1)  # append to savelist
        layers.append(m_)
        if i == 0:
            ch = []
        if isinstance(c2, list):
            ch.extend(c2)
            for _ in range(5 - len(ch)):
                ch.insert(0, 0)
        else:
            ch.append(c2)
    return nn.Sequential(*layers), sorted(save)
⑧ In ultralytics/nn/autobackend.py, replace the forward function of the AutoBackend class in full with the following code:
    def forward(self, im, augment=False, visualize=False):
        """
        Runs inference on the YOLOv8 MultiBackend model.

        Args:
            im (torch.Tensor): The image tensor to perform inference on.
            augment (bool): whether to perform data augmentation during inference, defaults to False
            visualize (bool): whether to visualize the output predictions, defaults to False

        Returns:
            (tuple): Tuple containing the raw output tensor, and processed output for visualization (if visualize=True)
        """
        b, ch, h, w = im.shape  # batch, channel, height, width
        if self.fp16 and im.dtype != torch.float16:
            im = im.half()  # to FP16
        if self.nhwc:
            im = im.permute(0, 2, 3, 1)  # torch BCHW to numpy BHWC shape(1,320,192,3)

        if self.pt or self.nn_module:  # PyTorch
            y = self.model(im, augment=augment, visualize=visualize) if augment or visualize else self.model(im)
        elif self.jit:  # TorchScript
            y = self.model(im)
        elif self.dnn:  # ONNX OpenCV DNN
            im = im.cpu().numpy()  # torch to numpy
            self.net.setInput(im)
            y = self.net.forward()
        elif self.onnx:  # ONNX Runtime
            im = im.cpu().numpy()  # torch to numpy
            y = self.session.run(self.output_names, {self.session.get_inputs()[0].name: im})
        elif self.xml:  # OpenVINO
            im = im.cpu().numpy()  # FP32
            y = list(self.ov_compiled_model(im).values())
        elif self.engine:  # TensorRT
            if self.dynamic and im.shape != self.bindings['images'].shape:
                i = self.model.get_binding_index('images')
                self.context.set_binding_shape(i, im.shape)  # reshape if dynamic
                self.bindings['images'] = self.bindings['images']._replace(shape=im.shape)
                for name in self.output_names:
                    i = self.model.get_binding_index(name)
                    self.bindings[name].data.resize_(tuple(self.context.get_binding_shape(i)))
            s = self.bindings['images'].shape
            assert im.shape == s, f"input size {im.shape} {'>' if self.dynamic else 'not equal to'} max model size {s}"
            self.binding_addrs['images'] = int(im.data_ptr())
            self.context.execute_v2(list(self.binding_addrs.values()))
            y = [self.bindings[x].data for x in sorted(self.output_names)]
        elif self.coreml:  # CoreML
            im = im[0].cpu().numpy()
            im_pil = Image.fromarray((im * 255).astype('uint8'))
            # im = im.resize((192, 320), Image.BILINEAR)
            y = self.model.predict({'image': im_pil})  # coordinates are xywh normalized
            if 'confidence' in y:
                raise TypeError('Ultralytics only supports inference of non-pipelined CoreML models exported with '
                                f"'nms=False', but 'model={w}' has an NMS pipeline created by an 'nms=True' export.")
                # TODO: CoreML NMS inference handling
                # from ultralytics.utils.ops import xywh2xyxy
                # box = xywh2xyxy(y['coordinates'] * [[w, h, w, h]])  # xyxy pixels
                # conf, cls = y['confidence'].max(1), y['confidence'].argmax(1).astype(np.float32)
                # y = np.concatenate((box, conf.reshape(-1, 1), cls.reshape(-1, 1)), 1)
            elif len(y) == 1:  # classification model
                y = list(y.values())
            elif len(y) == 2:  # segmentation model
                y = list(reversed(y.values()))  # reversed for segmentation models (pred, proto)
        elif self.paddle:  # PaddlePaddle
            im = im.cpu().numpy().astype(np.float32)
            self.input_handle.copy_from_cpu(im)
            self.predictor.run()
            y = [self.predictor.get_output_handle(x).copy_to_cpu() for x in self.output_names]
        elif self.ncnn:  # ncnn
            mat_in = self.pyncnn.Mat(im[0].cpu().numpy())
            ex = self.net.create_extractor()
            input_names, output_names = self.net.input_names(), self.net.output_names()
            ex.input(input_names[0], mat_in)
            y = []
            for output_name in output_names:
                mat_out = self.pyncnn.Mat()
                ex.extract(output_name, mat_out)
                y.append(np.array(mat_out)[None])
        elif self.triton:  # NVIDIA Triton Inference Server
            im = im.cpu().numpy()  # torch to numpy
            y = self.model(im)
        else:  # TensorFlow (SavedModel, GraphDef, Lite, Edge TPU)
            im = im.cpu().numpy()
            if self.saved_model:  # SavedModel
                y = self.model(im, training=False) if self.keras else self.model(im)
                if not isinstance(y, list):
                    y = [y]
            elif self.pb:  # GraphDef
                y = self.frozen_func(x=self.tf.constant(im))
                if len(y) == 2 and len(self.names) == 999:  # segments and names not defined
                    ip, ib = (0, 1) if len(y[0].shape) == 4 else (1, 0)  # index of protos, boxes
                    nc = y[ib].shape[1] - y[ip].shape[3] - 4  # y = (1, 160, 160, 32), (1, 116, 8400)
                    self.names = {i: f'class{i}' for i in range(nc)}
            else:  # Lite or Edge TPU
                details = self.input_details[0]
                integer = details['dtype'] in (np.int8, np.int16)  # is TFLite quantized int8 or int16 model
                if integer:
                    scale, zero_point = details['quantization']
                    im = (im / scale + zero_point).astype(details['dtype'])  # de-scale
                self.interpreter.set_tensor(details['index'], im)
                self.interpreter.invoke()
                y = []
                for output in self.output_details:
                    x = self.interpreter.get_tensor(output['index'])
                    if integer:
                        scale, zero_point = output['quantization']
                        x = (x.astype(np.float32) - zero_point) * scale  # re-scale
                    if x.ndim > 2:  # if task is not classification
                        # Denormalize xywh by image size. See https://github.com/ultralytics/ultralytics/pull/1695
                        # xywh are normalized in TFLite/EdgeTPU to mitigate quantization error of integer models
                        x[:, [0, 2]] *= w
                        x[:, [1, 3]] *= h
                    y.append(x)
            # TF segment fixes: export is reversed vs ONNX export and protos are transposed
            if len(y) == 2:  # segment with (det, proto) output order reversed
                if len(y[1].shape) != 4:
                    y = list(reversed(y))  # should be y = (1, 116, 8400), (1, 160, 160, 32)
                y[1] = np.transpose(y[1], (0, 3, 1, 2))  # should be y = (1, 116, 8400), (1, 32, 160, 160)
            y = [x if isinstance(x, np.ndarray) else x.numpy() for x in y]

        # for x in y:
        #     print(type(x), len(x)) if isinstance(x, (list, tuple)) else print(type(x), x.shape)  # debug shapes
        if isinstance(y, (list, tuple)):
            return self.from_numpy(y[0]) if len(y) == 1 else [self.from_numpy(x) for x in y]
        else:
            return self.from_numpy(y)
That completes the modifications; you can now configure the model and start training.
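The index bookkeeping in steps ④ and ⑦ is easier to follow with concrete numbers. The sketch below reproduces only the padding and index-shift arithmetic from the code above (the slot count 5 and the offset 4); the helper names `pad_backbone_channels` and `layer_index` are invented for this illustration and do not exist in ultralytics.

```python
def pad_backbone_channels(width_list, slots=5):
    """Left-pad the backbone channel list with zeros, as parse_model does for the first layer."""
    ch = list(width_list)
    for _ in range(slots - len(ch)):
        ch.insert(0, 0)
    return ch

def layer_index(i, is_backbone):
    """Index attached to YAML entry i: shifted by 4 once a list-output backbone is present."""
    return i + 4 if is_backbone else i

# Channel counts returned by Ghostnetv1 for a 640x640 input (see the summary in Section 6).
ch = pad_backbone_channels([24, 40, 112, 960])
print(ch)                    # slots 0-4; slot 0 is an unused placeholder
print(ch[2], ch[3], ch[4])   # channels seen by 'from' indices 2, 3 and -1 after the backbone
print(layer_index(0, True), layer_index(1, True))  # backbone entry -> 4, first head layer -> 5
```

This is why the backbone line in the YAML file is commented `# 4` and the head layers that take `from` values 3 and 2 receive 112 and 40 input channels: the single backbone module stands in for five layer slots, and its four feature maps fill slots 1 to 4.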
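5. YAML Model File
5.1 Model Modification ⭐
Once the code is in place, configure the model's YAML file.
Taking ultralytics/cfg/models/rt-detr/rtdetr-l.yaml as the example, create a model file named rtdetr-Ghostnetv1.yaml in the same directory for training on your own dataset.
Copy the contents of rtdetr-l.yaml into rtdetr-Ghostnetv1.yaml and set nc to the number of target classes in your dataset.
📌 The modification replaces the backbone with Ghostnetv1.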
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-l object detection model with P3-P5 outputs. For details see https://docs.ultralytics.com/models/rtdetr

# Parameters
nc: 1  # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]

backbone:
  # [from, repeats, module, args]
  - [-1, 1, Ghostnetv1, []]  # 4

head:
  - [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 5 input_proj.2
  - [-1, 1, AIFI, [1024, 8]]  # 6
  - [-1, 1, Conv, [256, 1, 1]]  # 7, Y5, lateral_convs.0

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 8
  - [3, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 9 input_proj.1
  - [[-2, -1], 1, Concat, [1]]  # 10
  - [-1, 3, RepC3, [256]]  # 11, fpn_blocks.0
  - [-1, 1, Conv, [256, 1, 1]]  # 12, Y4, lateral_convs.1

  - [-1, 1, nn.Upsample, [None, 2, 'nearest']]  # 13
  - [2, 1, Conv, [256, 1, 1, None, 1, 1, False]]  # 14 input_proj.0
  - [[-2, -1], 1, Concat, [1]]  # 15 cat backbone P4
  - [-1, 3, RepC3, [256]]  # X3 (16), fpn_blocks.1

  - [-1, 1, Conv, [256, 3, 2]]  # 17, downsample_convs.0
  - [[-1, 12], 1, Concat, [1]]  # 18 cat Y4
  - [-1, 3, RepC3, [256]]  # F4 (19), pan_blocks.0

  - [-1, 1, Conv, [256, 3, 2]]  # 20, downsample_convs.1
  - [[-1, 7], 1, Concat, [1]]  # 21 cat Y5
  - [-1, 3, RepC3, [256]]  # F5 (22), pan_blocks.1

  - [[16, 19, 22], 1, RTDETRDecoder, [nc]]  # Detect(P3, P4, P5)
6. Successful Run Output
Printing the network shows that the Ghostnetv1 module has been added to the model and that the model can be trained.
rtdetr-Ghostnetv1:
from n params module arguments
0 -1 1 2671428 Ghostnetv1 []
1 -1 1 246272 ultralytics.nn.modules.conv.Conv [960, 256, 1, 1, None, 1, 1, False]
2 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
3 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
4 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
5 3 1 29184 ultralytics.nn.modules.conv.Conv [112, 256, 1, 1, None, 1, 1, False]
6 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
7 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
8 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
9 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
10 2 1 10752 ultralytics.nn.modules.conv.Conv [40, 256, 1, 1, None, 1, 1, False]
11 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
12 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
13 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
14 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
16 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
17 [-1, 7] 1 0 ultralytics.nn.modules.conv.Concat [1]
18 -1 3 2232320 ultralytics.nn.modules.block.RepC3 [512, 256, 3]
19 [16, 19, 22] 1 7303907 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256]]
rtdetr-Ghostnetv1 summary: 772 layers, 21,293,351 parameters, 21,293,351 gradients, 63.0 GFLOPs