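[RT-DETR Multimodal Fusion Improvements] Adding a P2 Small-Object Detection Layer to Early, Mid, Mid-to-Late, and Late Multimodal Fusion: Complete Steps and Code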
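Preface
Topic: adding a P2 small-object detection layer to the multimodal fusion variants of RT-DETR.
Approach: add a P2 multimodal fusion detection layer to each of early fusion, mid fusion, mid-to-late fusion, and late fusion.
Contents: a detailed explanation of each fusion method plus the complete configuration steps, ready to use out of the box.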
1. Original RT-DETR Model Structure
rt-detr-resnet18
The original model structure is as follows:
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P5 outputs.
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 0-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 1
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 4
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 5-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 6-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 7-P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
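2. Detection Heads and Their Feature Levels
2.1 P3/8 - small detection head
- In the original model, the detection head on the P3/8 feature level mainly detects relatively small objects. Its feature map is comparatively large, with high spatial resolution.
- It suits objects of roughly 8x8 to 32x32 pixels.
2.2 P4/16 - medium detection head
- The P4/16 feature level behind this head has gone through more downsampling. Compared with P3/8 its spatial resolution is lower, but it has more channels, and its features are more abstract and carry more semantic information.
- It mainly detects medium-sized objects, roughly 32x32 to 64x64 pixels.
2.3 P5/32 - large detection head
- P5/32 is the most heavily downsampled feature level. It has the lowest spatial resolution but the strongest semantic information and the largest receptive field.
- This head suits larger objects, generally those above 64x64 pixels.
2.4 Newly added small-object detection head
- The newly added detection head targets even smaller objects: tiny objects of roughly 4x4 to 8x8 pixels.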
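💡 In object detection, the smaller the object, the higher the feature-map resolution needed to capture its features. The new detection head follows from this: a series of convolution, upsampling, and concatenation operations produces a P2-level feature map suited to tiny objects, which improves the model's ability to detect them.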
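To make the resolution argument concrete, the short sketch below computes the grid size of each feature level for a square input. The 640-pixel input size and the helper name `feature_map_sizes` are assumptions chosen for illustration; the article itself does not state the training resolution.

```python
def feature_map_sizes(img_size=640, strides=(4, 8, 16, 32)):
    """Return {level: (grid_side, position_count)} for a square input.

    img_size=640 is an assumed example resolution, not a value from the article.
    """
    sizes = {}
    for level, stride in zip(("P2", "P3", "P4", "P5"), strides):
        side = img_size // stride          # spatial side length at this stride
        sizes[level] = (side, side * side)  # number of grid positions
    return sizes


for level, (side, cells) in feature_map_sizes().items():
    print(f"{level}: {side}x{side} grid, {cells} positions")
```

Under this assumption the P2 level has four times as many positions as P3, which is why it can resolve smaller objects and also why adding it raises the computational cost.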
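3. Multimodal Fusion Methods for the Small-Object Detection Head
- Early fusion: the multimodal data is merged at the network input, and a small-object detection layer is then added.
- Mid fusion: the P2 multimodal features are additionally fused in the backbone, and the small-object detection layer is derived from the fused P2 features.
- Mid-to-late fusion: the P2 multimodal features are additionally fused in the FPN structure of the neck, and the small-object detection layer is derived from the fused P2 features.
- Late fusion: the P2 multimodal features are additionally fused just before the detection head.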
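4. Complete Configuration Steps
!!! The project package available via private message already has the multimodal input, training, and other related changes configured. You only need to create a new model yaml file, paste in the corresponding model, and start training. For how to obtain and use the project package, see the linked tutorial: "Multimodal Project for YOLO-Series Models: Configuration and Usage Tutorial".
Because the other RT-DETR versions have comparatively large parameter counts and computational cost, this column makes its improvements mainly on the resnet18 version.
Besides the changes to the model structure, the yaml file also passes in a channel count
ch: 6
which indicates a dual-modal, 6-channel input: the first three channels are visible light and the last three are infrared.
This parameter is also configured in default.yaml.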
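The channel layout described above can be sketched in plain Python. This is only an illustration of the "first three visible, last three infrared" convention; `split_modalities` is a hypothetical helper and is not the project package's own `Multiin` module, whose code is not shown in this article.

```python
def split_modalities(image, rgb_channels=3):
    """Split a channel-first image (a list of channel planes) into two modalities.

    Hypothetical helper: the first `rgb_channels` planes are treated as visible
    light and the remaining planes as infrared, as the text describes.
    """
    if len(image) != 2 * rgb_channels:
        raise ValueError(f"expected {2 * rgb_channels} channels, got {len(image)}")
    visible = image[:rgb_channels]
    infrared = image[rgb_channels:]
    return visible, infrared


# Toy 6-channel "image": channel c is a 2x2 plane filled with the value c.
image = [[[c] * 2 for _ in range(2)] for c in range(6)]
visible, infrared = split_modalities(image)
print(len(visible), len(infrared))
```

In the two-branch configurations later in the article, each branch starts from a 3-channel input, which is consistent with this kind of split.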
4.1 P2 early fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P2-P5 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, MF, [32]] # 0
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 1-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 2
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 3
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 4-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 5
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 6-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 7-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 8-P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 9 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 11, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 12
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 13 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 15, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 16, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 17
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 18 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 19 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (20), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 21
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 22 input_proj for P2
- [[-2, -1], 1, Concat, [1]] # 23 cat backbone P2
- [-1, 3, RepC3, [256, 0.5]] # X2 (24), FPN block for P2
- [-1, 1, Conv, [256, 3, 2]] # 25, downsample P2 -> P3
- [[-1, 20], 1, Concat, [1]] # 26 cat X3
- [-1, 3, RepC3, [256, 0.5]] # F3 (27), PAN block for P3
- [-1, 1, Conv, [256, 3, 2]] # 28, downsample P3 -> P4
- [[-1, 16], 1, Concat, [1]] # 29 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (30), PAN block for P4
- [-1, 1, Conv, [256, 3, 2]] # 31, downsample P4 -> P5
- [[-1, 11], 1, Concat, [1]] # 32 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (33), PAN block for P5
- [[24, 27, 30, 33], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P2, P3, P4, P5)
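When inserting the extra P2 layers, most mistakes come from the `from` column, because every absolute index after the insertion point shifts. The sketch below resolves a `from` entry to absolute layer indices. It assumes the usual convention for these model YAML files, namely that negative values are relative to the current layer and non-negative values are absolute; `resolve_from` is a hypothetical helper name, not a function of the framework.

```python
def resolve_from(layer_index, from_spec):
    """Convert a layer's `from` field into a list of absolute layer indices.

    Assumption: negative entries count back from the current layer
    (-1 is the previous layer); non-negative entries are absolute.
    """
    refs = from_spec if isinstance(from_spec, list) else [from_spec]
    return [layer_index + f if f < 0 else f for f in refs]


# Entries taken from the early-fusion config above.
print(resolve_from(14, [-2, -1]))  # Concat after the first upsample
print(resolve_from(26, [-1, 20]))  # PAN concat with the P3 FPN output
print(resolve_from(22, 5))         # projection of the backbone P2 feature
```

Checking a few layers this way before training is a cheap way to confirm that each concatenation joins feature maps of the same resolution.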
4.2 P2 mid fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P2-P5 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 3-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 4
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 5
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 6-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 7
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 8-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 9-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 10-P5
- [2, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 11-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 12
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 13
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 14-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 15
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 16-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 17-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 18-P5
- [[7, 15], 1, Concat, [1]] # 19 cat backbone P2
- [[8, 16], 1, Concat, [1]] # 20 cat backbone P3
- [[9, 17], 1, Concat, [1]] # 21 cat backbone P4
- [[10, 18], 1, Concat, [1]] # 22 cat backbone P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 23 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 25, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 26
- [21, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 27 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 29, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 30, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 31
- [20, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 32 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 33 cat fused backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (34), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 35
- [19, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 36 input_proj for fused P2
- [[-2, -1], 1, Concat, [1]] # 37 cat fused backbone P2
- [-1, 3, RepC3, [256, 0.5]] # X2 (38), FPN block for P2
- [-1, 1, Conv, [256, 3, 2]] # 39, downsample P2 -> P3
- [[-1, 34], 1, Concat, [1]] # 40 cat X3
- [-1, 3, RepC3, [256, 0.5]] # F3 (41), PAN block for P3
- [-1, 1, Conv, [256, 3, 2]] # 42, downsample P3 -> P4
- [[-1, 30], 1, Concat, [1]] # 43 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (44), PAN block for P4
- [-1, 1, Conv, [256, 3, 2]] # 45, downsample P4 -> P5
- [[-1, 25], 1, Concat, [1]] # 46 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (47), PAN block for P5
- [[38, 41, 44, 47], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P2, P3, P4, P5)
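In this config the two backbone branches are fused by channel-wise concatenation (layers 19 to 22), so each fused level is twice as wide as a single branch. A small sketch of that arithmetic follows; the per-branch widths are read from the `Blocks` arguments in the YAML, and `fused_channels` is a hypothetical helper used only for illustration.

```python
# Output channels of one ResNet18 branch at each level (from the Blocks args).
BRANCH_CHANNELS = {"P2": 64, "P3": 128, "P4": 256, "P5": 512}


def fused_channels(branch_channels, num_branches=2):
    """Channel count after concatenating `num_branches` equal-width branches."""
    return {level: c * num_branches for level, c in branch_channels.items()}


print(fused_channels(BRANCH_CHANNELS))
```

These doubled widths are what the 1x1 `Conv` projections in the head receive as input, which matches the input channels reported for those layers in the mid-fusion model summary later in the article.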
4.3 P2 mid-to-late fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P2-P5 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 3-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 4
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 5
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 6-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 7
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 8-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 9-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 10-P5
- [2, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 11-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 12
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 13
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 14-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 15
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 16-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 17-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 18-P5
head:
- [10, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 21, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 22
- [9, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 23 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 25, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 26, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 27
- [8, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 28 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 29 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (30), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 31
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 32 input_proj for P2
- [[-2, -1], 1, Concat, [1]] # 33 cat backbone P2
- [-1, 3, RepC3, [256, 0.5]] # X2 (34), FPN block for P2
- [18, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 35 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 37, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 38
- [17, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 39 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 41, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 42, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 43
- [16, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 44 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 45 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (46), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 47
- [15, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 48 input_proj for P2
- [[-2, -1], 1, Concat, [1]] # 49 cat backbone P2
- [-1, 3, RepC3, [256, 0.5]] # X2 (50), FPN block for P2
- [[21, 37], 1, Concat, [1]] # 51 fuse Y5 of both branches
- [[26, 42], 1, Concat, [1]] # 52 fuse Y4 of both branches
- [[30, 46], 1, Concat, [1]] # 53 fuse X3 of both branches
- [[34, 50], 1, Concat, [1]] # 54 fuse X2 of both branches (P2)
- [-1, 1, Conv, [256, 3, 2]] # 55, downsample P2 -> P3
- [[-1, 53], 1, Concat, [1]] # 56 cat fused X3
- [-1, 3, RepC3, [256, 0.5]] # F3 (57), PAN block for P3
- [-1, 1, Conv, [256, 3, 2]] # 58, downsample P3 -> P4
- [[-1, 52], 1, Concat, [1]] # 59 cat fused Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (60), PAN block for P4
- [-1, 1, Conv, [256, 3, 2]] # 61, downsample P4 -> P5
- [[-1, 51], 1, Concat, [1]] # 62 cat fused Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (63), PAN block for P5
- [[54, 57, 60, 63], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P2, P3, P4, P5)
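One consequence of this layout is that the decoder's first input (layer 54) is a raw concatenation, while the other three inputs pass through a `RepC3` block first. The sketch below traces channel counts through the tail of the config to show this. It is a simplified tracer written for this article's layer list, assuming `Concat` sums its input channels and every other module outputs the channel count given as its first argument; it is not the framework's model parser.

```python
# (index, from, module, out_channels) for layers 51-63 of the config above.
TAIL = [
    (51, [21, 37], "Concat", None),
    (52, [26, 42], "Concat", None),
    (53, [30, 46], "Concat", None),
    (54, [34, 50], "Concat", None),
    (55, [-1], "Conv", 256),
    (56, [-1, 53], "Concat", None),
    (57, [-1], "RepC3", 256),
    (58, [-1], "Conv", 256),
    (59, [-1, 52], "Concat", None),
    (60, [-1], "RepC3", 256),
    (61, [-1], "Conv", 256),
    (62, [-1, 51], "Concat", None),
    (63, [-1], "RepC3", 256),
]


def trace_channels(layers, known):
    """Propagate channel counts; Concat sums its inputs, others use out_channels."""
    channels = dict(known)
    for index, sources, module, out_channels in layers:
        absolute = [index + s if s < 0 else s for s in sources]
        if module == "Concat":
            channels[index] = sum(channels[a] for a in absolute)
        else:
            channels[index] = out_channels
    return channels


# Every per-branch neck output feeding layers 51-54 has 256 channels.
known = {i: 256 for i in (21, 26, 30, 34, 37, 42, 46, 50)}
channels = trace_channels(TAIL, known)
decoder_inputs = [channels[i] for i in (54, 57, 60, 63)]
print(decoder_inputs)
```

The uneven widths are not a problem for training, since the decoder receives the list of input channel counts and projects each level separately, but they explain why the P2 input is wider than the rest in the model summary.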
4.4 P2 late fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P2-P5 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
# [depth, width, max_channels]
l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 3-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 4
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 5
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 6-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 7
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 8-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 9-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 10-P5
- [2, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 11-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 12
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 13
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 14-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 15
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 16-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 17-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 18-P5
head:
- [10, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 21, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 22
- [9, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 23 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 25, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 26, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 27
- [8, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 28 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 29 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (30), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 31
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 32 input_proj for P2
- [[-2, -1], 1, Concat, [1]] # 33 cat backbone P2
- [-1, 3, RepC3, [256, 0.5]] # X2 (34), FPN block for P2
- [-1, 1, Conv, [256, 3, 2]] # 35, downsample P2 -> P3
- [[-1, 30], 1, Concat, [1]] # 36 cat X3
- [-1, 3, RepC3, [256, 0.5]] # F3 (37), PAN block for P3
- [-1, 1, Conv, [256, 3, 2]] # 38, downsample P3 -> P4
- [[-1, 26], 1, Concat, [1]] # 39 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (40), PAN block for P4
- [-1, 1, Conv, [256, 3, 2]] # 41, downsample P4 -> P5
- [[-1, 21], 1, Concat, [1]] # 42 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (43), PAN block for P5
- [18, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 44 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 46, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 47
- [17, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 48 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 50, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 51, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 52
- [16, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 53 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 54 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (55), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 56
- [15, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 57 input_proj for P2
- [[-2, -1], 1, Concat, [1]] # 58 cat backbone P2
- [-1, 3, RepC3, [256, 0.5]] # X2 (59), FPN block for P2
- [-1, 1, Conv, [256, 3, 2]] # 60, downsample P2 -> P3
- [[-1, 55], 1, Concat, [1]] # 61 cat X3
- [-1, 3, RepC3, [256, 0.5]] # F3 (62), PAN block for P3
- [-1, 1, Conv, [256, 3, 2]] # 63, downsample P3 -> P4
- [[-1, 51], 1, Concat, [1]] # 64 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (65), PAN block for P4
- [-1, 1, Conv, [256, 3, 2]] # 66, downsample P4 -> P5
- [[-1, 46], 1, Concat, [1]] # 67 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (68), PAN block for P5
- [[34, 59], 1, Concat, [1]] # 69 fuse P2 of both branches
- [[37, 62], 1, Concat, [1]] # 70 fuse P3 of both branches
- [[40, 65], 1, Concat, [1]] # 71 fuse P4 of both branches
- [[43, 68], 1, Concat, [1]] # 72 fuse P5 of both branches
- [[69, 70, 71, 72], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P2, P3, P4, P5)
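5. Successful Run Results
Early fusion result: the input channel count is 6, showing that both the visible-light and infrared images are fed into the model for fusion training.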
rtdetr-resnet18-early-p2 summary: 491 layers, 22,160,188 parameters, 22,160,188 gradients, 128.4 GFLOPs
from n params module arguments
0 -1 1 1000 ultralytics.nn.AddModules.multimodal.MF [6, 32]
1 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 2, 1, 'relu']
2 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
3 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
4 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
5 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
6 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
7 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
8 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
9 -1 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
10 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
11 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
12 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
13 7 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
14 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
15 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
16 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
17 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
18 6 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
19 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
20 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
21 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
22 5 1 16896 ultralytics.nn.modules.conv.Conv [64, 256, 1, 1, None, 1, 1, False]
23 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
25 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
26 [-1, 20] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
28 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
29 [-1, 16] 1 0 ultralytics.nn.modules.conv.Concat [1]
30 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
31 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
32 [-1, 11] 1 0 ultralytics.nn.modules.conv.Concat [1]
33 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
34 [24, 27, 30, 33] 1 4057748 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-early-p2 summary: 491 layers, 22,160,188 parameters, 22,160,188 gradients, 128.4 GFLOPs
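The `params` column for the `Conv` layers can be sanity-checked by hand. The sketch below assumes each such layer is a bias-free convolution followed by a batch-normalization layer with a learnable weight and bias per output channel; `conv_bn_params` is a hypothetical helper for this check.

```python
def conv_bn_params(c_in, c_out, kernel=1):
    """Parameter count of a bias-free k x k convolution plus BatchNorm.

    Assumption: the conv has no bias, and BatchNorm contributes one weight
    and one bias per output channel (running statistics are not parameters).
    """
    return c_in * c_out * kernel * kernel + 2 * c_out


# Compare with layers 9, 18, 22 and 25 of the early-fusion summary above.
print(conv_bn_params(512, 256))     # 1x1 projection of backbone P5
print(conv_bn_params(128, 256))     # 1x1 projection of backbone P3
print(conv_bn_params(64, 256))      # 1x1 projection of backbone P2
print(conv_bn_params(256, 256, 3))  # 3x3 stride-2 downsample conv
```

The four results agree with the logged values 131584, 33280, 16896 and 590336, and the new P2 projection is the cheapest of the three because the backbone P2 feature has only 64 channels.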
Mid fusion result:
rtdetr-resnet18-mid-p2 summary: 571 layers, 33,601,492 parameters, 33,601,492 gradients, 161.5 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
4 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
5 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
6 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
7 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
8 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
9 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
10 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
11 2 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
12 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
13 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
14 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
15 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
16 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
17 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
18 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
19 [7, 15] 1 0 ultralytics.nn.modules.conv.Concat [1]
20 [8, 16] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 [9, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 [10, 18] 1 0 ultralytics.nn.modules.conv.Concat [1]
23 -1 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
24 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
25 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
26 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
27 21 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
28 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
29 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
30 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
31 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
32 20 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
33 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
34 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
35 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
36 19 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
37 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
38 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
39 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
40 [-1, 34] 1 0 ultralytics.nn.modules.conv.Concat [1]
41 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
42 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
43 [-1, 30] 1 0 ultralytics.nn.modules.conv.Concat [1]
44 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
45 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
46 [-1, 25] 1 0 ultralytics.nn.modules.conv.Concat [1]
47 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
48 [38, 41, 44, 47] 1 4057748 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-mid-p2 summary: 571 layers, 33,601,492 parameters, 33,601,492 gradients, 161.5 GFLOPs
Mid-to-late fusion result:
rtdetr-resnet18-mid-to-late-p2 summary: 723 layers, 37,351,124 parameters, 37,351,124 gradients, 218.6 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
4 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
5 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
6 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
7 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
8 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
9 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
10 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
11 2 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
12 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
13 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
14 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
15 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
16 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
17 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
18 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
19 10 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
21 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
22 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
23 9 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
24 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
25 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
26 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
27 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
28 8 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
29 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
30 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
31 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
32 7 1 16896 ultralytics.nn.modules.conv.Conv [64, 256, 1, 1, None, 1, 1, False]
33 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
34 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
35 18 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
36 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
37 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
38 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
39 17 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
40 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
41 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
42 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
43 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
44 16 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
45 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
46 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
47 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
48 15 1 16896 ultralytics.nn.modules.conv.Conv [64, 256, 1, 1, None, 1, 1, False]
49 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
50 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
51 [21, 37] 1 0 ultralytics.nn.modules.conv.Concat [1]
52 [26, 42] 1 0 ultralytics.nn.modules.conv.Concat [1]
53 [30, 46] 1 0 ultralytics.nn.modules.conv.Concat [1]
54 [34, 50] 1 0 ultralytics.nn.modules.conv.Concat [1]
55 -1 1 1180160 ultralytics.nn.modules.conv.Conv [512, 256, 3, 2]
56 [-1, 53] 1 0 ultralytics.nn.modules.conv.Concat [1]
57 -1 3 723456 ultralytics.nn.modules.block.RepC3 [768, 256, 3, 0.5]
58 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
59 [-1, 52] 1 0 ultralytics.nn.modules.conv.Concat [1]
60 -1 3 723456 ultralytics.nn.modules.block.RepC3 [768, 256, 3, 0.5]
61 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
62 [-1, 51] 1 0 ultralytics.nn.modules.conv.Concat [1]
63 -1 3 723456 ultralytics.nn.modules.block.RepC3 [768, 256, 3, 0.5]
64 [54, 57, 60, 63] 1 4123284 ultralytics.nn.modules.head.RTDETRDecoder [1, [512, 256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-mid-to-late-p2 summary: 723 layers, 37,351,124 parameters, 37,351,124 gradients, 218.6 GFLOPs
Late fusion result:
rtdetr-resnet18-late-p2 summary: 849 layers, 40,506,068 parameters, 40,506,068 gradients, 232.1 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
4 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
5 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
6 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
7 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
8 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
9 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
10 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
11 2 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
12 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
13 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
14 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
15 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
16 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
17 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
18 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
19 10 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
20 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
21 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
22 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
23 9 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
24 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
25 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
26 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
27 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
28 8 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
29 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
30 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
31 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
32 7 1 16896 ultralytics.nn.modules.conv.Conv [64, 256, 1, 1, None, 1, 1, False]
33 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
34 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
35 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
36 [-1, 30] 1 0 ultralytics.nn.modules.conv.Concat [1]
37 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
38 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
39 [-1, 26] 1 0 ultralytics.nn.modules.conv.Concat [1]
40 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
41 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
42 [-1, 21] 1 0 ultralytics.nn.modules.conv.Concat [1]
43 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
44 18 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
45 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
46 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
47 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
48 17 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
49 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
50 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
51 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
52 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
53 16 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
54 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
55 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
56 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
57 15 1 16896 ultralytics.nn.modules.conv.Conv [64, 256, 1, 1, None, 1, 1, False]
58 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
59 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
60 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
61 [-1, 55] 1 0 ultralytics.nn.modules.conv.Concat [1]
62 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
63 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
64 [-1, 51] 1 0 ultralytics.nn.modules.conv.Concat [1]
65 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
66 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
67 [-1, 46] 1 0 ultralytics.nn.modules.conv.Concat [1]
68 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
69 [34, 59] 1 0 ultralytics.nn.modules.conv.Concat [1]
70 [37, 62] 1 0 ultralytics.nn.modules.conv.Concat [1]
71 [40, 65] 1 0 ultralytics.nn.modules.conv.Concat [1]
72 [43, 68] 1 0 ultralytics.nn.modules.conv.Concat [1]
73 [69, 70, 71, 72] 1 4319892 ultralytics.nn.modules.head.RTDETRDecoder [1, [512, 512, 512, 512], 256, 300, 4, 8, 3]
rtdetr-resnet18-late-p2 summary: 849 layers, 40,506,068 parameters, 40,506,068 gradients, 232.1 GFLOPs
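The four summary lines above can be compared directly. The sketch below expresses each variant's parameter count and GFLOPs relative to the early-fusion model, using only the numbers printed in the logs; `relative_cost` is a hypothetical helper, and the comparison says nothing about accuracy, which this article does not report.

```python
# (parameters, GFLOPs) copied from the model summaries above.
VARIANTS = {
    "early": (22_160_188, 128.4),
    "mid": (33_601_492, 161.5),
    "mid-to-late": (37_351_124, 218.6),
    "late": (40_506_068, 232.1),
}


def relative_cost(variants, baseline="early"):
    """Return {name: (param_ratio, gflops_ratio)} relative to the baseline variant."""
    base_params, base_gflops = variants[baseline]
    return {
        name: (round(params / base_params, 2), round(gflops / base_gflops, 2))
        for name, (params, gflops) in variants.items()
    }


for name, (p_ratio, g_ratio) in relative_cost(VARIANTS).items():
    print(f"{name:12s} params x{p_ratio:.2f}  GFLOPs x{g_ratio:.2f}")
```

The later the fusion point, the more of the network is duplicated per modality, so both ratios grow from early to late fusion.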