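[RT-DETR Multimodal Fusion Improvement] Adding a P6 Large-Object Detection Layer to Early, Mid, Mid-to-Late, and Late Multimodal Fusion: Complete Steps and Code
Preface
Topic: adding a P6 large-object detection layer to the multimodal fusion variants of RT-DETR.
Approach: add a P6 multimodal fusion detection layer to early fusion, mid fusion, mid-to-late fusion, and late fusion respectively.
Contents: a detailed explanation of each fusion method plus the complete configuration steps, ready to use out of the box.
1. Original RT-DETR Model Structure
rt-detr-resnet18
The original model structure is as follows: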
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P5 outputs.
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 0-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 1
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 2
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 3-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 4
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 5-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 6-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 7-P5
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 8 input_proj.2
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 10, Y5, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 11
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 12 input_proj.1
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 14, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 15, Y4, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 16
- [5, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 17 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 18 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (19), fpn_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 20, downsample_convs.0
- [[-1, 15], 1, Concat, [1]] # 21 cat Y4
- [-1, 3, RepC3, [256, 0.5]] # F4 (22), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 23, downsample_convs.1
- [[-1, 10], 1, Concat, [1]] # 24 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (25), pan_blocks.1
- [[19, 22, 25], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5)
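2. Detection Head Types for Each Effective Feature Layer
2.1 P3/8 - small detection head
- In the original model, the detection head on the P3/8 feature layer mainly detects relatively small objects. Its feature map is comparatively large, with high spatial resolution.
- It suits objects of roughly 8x8 to 16x16 pixels.
2.2 P4/16 - medium detection head
- The P4/16 feature layer behind this head has gone through more downsampling. Compared with P3/8, its spatial resolution is lower but its channel count is higher, and its features are more abstract and carry more semantic information.
- It mainly detects medium-sized objects, roughly 16x16 to 32x32 pixels.
2.3 P5/32 - large detection head
- P5/32 is the most heavily downsampled feature layer of the original model: it has the lowest spatial resolution but the strongest semantic information and the largest receptive field.
- This head suits larger objects, generally those above 32x32 pixels.
2.4 Newly added detection head for large objects
- The newly added head (P6/64) mainly targets even larger objects, that is, very large objects above roughly 64x64 pixels.
💡 The reason is that, in object detection, the larger the object, the more the feature map needs to attend to its overall outline in order to capture large-object features effectively.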
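To make the relationship between stride and feature-map size concrete, here is a minimal sketch. It assumes a square 640x640 input (an illustrative value, not stated in this article) and the strides 8, 16, 32, and 64 implied by the names P3 to P6.

```python
# Feature-map size for each pyramid level, assuming a square input.
# The 640-pixel input size used below is an illustrative assumption.

def feature_map_sizes(img_size, levels=(3, 4, 5, 6)):
    """Return {level_name: (stride, side_length)} for each Pn level."""
    sizes = {}
    for n in levels:
        stride = 2 ** n  # P3 -> 8, P4 -> 16, P5 -> 32, P6 -> 64
        sizes[f"P{n}"] = (stride, img_size // stride)
    return sizes


for name, (stride, side) in feature_map_sizes(640).items():
    print(f"{name}: stride {stride}, feature map {side}x{side}")
```

Under these assumptions each P6 cell covers a 64x64 patch of the input, which is in line with the object size quoted for the new head.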
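3. Multimodal Fusion Methods for the P6 Detection Layer
- Early fusion: the multimodal data is merged at the network input stage, and a detection layer for large objects is then added.
- Mid fusion: multimodal feature fusion at the P6 level is added in the backbone, and the large-object detection layer is led out from it.
- Mid-to-late fusion: multimodal feature fusion at the P6 level is added in the FPN structure of the neck, and the large-object detection layer is led out from it.
- Late fusion: the P6 multimodal features are fused just before the detection head.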
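4. Complete Configuration Steps
!!! The project package available via private message already contains all the changes for multimodal input, training, and so on. You only need to create a new model yaml file, paste in the corresponding model, and start training. For how to obtain and use the project package, see the linked tutorial: Configuration and Usage Tutorial for the "Multimodal Project for YOLO-Series Models"
Because the other RT-DETR versions are comparatively heavy in parameters and computation, this column makes its improvements mainly on the resnet18 version.
Besides the changes to the model structure, the yaml file also passes in a channel count, ch: 6, which indicates dual-modality, 6-channel input: the first three channels are visible light and the last three are infrared.
This parameter is also set in default.yaml.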
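The 6-channel layout described above can be illustrated with a small sketch. It uses plain nested lists in channel-first order so that it runs without any dependencies. The function names are hypothetical and are not part of the project package, whose actual data loader is not shown in this article.

```python
# Assumed layout, taken from the text: channels 0-2 are visible light,
# channels 3-5 are infrared. Images are channel-first lists of planes here;
# a real data loader would use arrays or tensors instead.

def stack_modalities(visible, infrared):
    """Concatenate two 3-channel images along the channel axis."""
    if len(visible) != 3 or len(infrared) != 3:
        raise ValueError("each modality must have exactly 3 channels")
    return list(visible) + list(infrared)


def split_modalities(stacked):
    """Split a 6-channel image back into (visible, infrared)."""
    if len(stacked) != 6:
        raise ValueError("expected 6 channels (ch: 6)")
    return stacked[:3], stacked[3:]


# Tiny 1x1 "images": one value per channel plane.
visible = [[[10]], [[20]], [[30]]]
infrared = [[[7]], [[7]], [[7]]]
stacked = stack_modalities(visible, infrared)
print(len(stacked))  # number of input channels seen by layer 0
```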
4.1 P6 Early Fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P6 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, MF, [32]] # 0
- [-1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 1-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 2
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 3
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 4-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 5
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 6-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 7-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 8-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 9-P6
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 10 input_proj.3
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 12, Y6, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 13
- [8, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 14 input_proj.2
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 16, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 17, Y5, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 18
- [7, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 19 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 20 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X4 (21), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 22
- [6, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 23 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 24 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (25), fpn_blocks.2
- [-1, 1, Conv, [256, 3, 2]] # 26, downsample_convs.0
- [[-1, 21], 1, Concat, [1]] # 27 cat X4
- [-1, 3, RepC3, [256, 0.5]] # F4 (28), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 29, downsample_convs.1
- [[-1, 17], 1, Concat, [1]] # 30 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (31), pan_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 32, downsample_convs.2
- [[-1, 12], 1, Concat, [1]] # 33 cat Y6
- [-1, 3, RepC3, [256, 0.5]] # F6 (34), pan_blocks.2
- [[25, 28, 31, 34], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5, P6)
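Because the extra layers shift every absolute index in the head, it helps to check which layers each `from` entry actually points to. The sketch below assumes the usual convention of Ultralytics-style model YAMLs: a negative value is relative to the current layer, a non-negative value is an absolute layer index, and -1 at layer 0 refers to the input image. The helper name `resolve_from` is hypothetical.

```python
# Resolve the `from` column of a model YAML into absolute layer indices.
# Convention assumed: negative = relative to the current layer,
# non-negative = absolute index; the result -1 stands for the network input.

def resolve_from(layer_index, from_field):
    """Return the absolute indices of the layers feeding `layer_index`."""
    refs = from_field if isinstance(from_field, list) else [from_field]
    return [layer_index + f if f < 0 else f for f in refs]


# A few head layers of the early-fusion config: {layer index: from entry}
early_fusion_examples = {
    14: 8,                  # 1x1 Conv reading backbone P5
    15: [-2, -1],           # Concat of the upsampled map and layer 14
    27: [-1, 21],           # Concat of the downsampled map and layer 21
    35: [25, 28, 31, 34],   # RTDETRDecoder inputs
}

for idx, src in early_fusion_examples.items():
    print(idx, "<-", resolve_from(idx, src))
```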
4.2 P6 Mid Fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P6 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 3-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 4
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 5
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 6-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 7
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 8-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 9-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 10-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 11-P6
- [2, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 12-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 13
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 14
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 15-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 16
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 17-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 18-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 19-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 20-P6
- [[8, 17], 1, Concat, [1]] # 21 cat backbone P3
- [[9, 18], 1, Concat, [1]] # 22 cat backbone P4
- [[10, 19], 1, Concat, [1]] # 23 cat backbone P5
- [[11, 20], 1, Concat, [1]] # 24 cat backbone P6
head:
- [-1, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 25 input_proj.3
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 27, Y6, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 28
- [23, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 29 input_proj.2
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 31, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 32, Y5, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 33
- [22, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 34 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 35 cat fused P4
- [-1, 3, RepC3, [256, 0.5]] # X4 (36), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 37
- [21, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 38 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 39 cat fused P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (40), fpn_blocks.2
- [-1, 1, Conv, [256, 3, 2]] # 41, downsample_convs.0
- [[-1, 36], 1, Concat, [1]] # 42 cat X4
- [-1, 3, RepC3, [256, 0.5]] # F4 (43), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 44, downsample_convs.1
- [[-1, 32], 1, Concat, [1]] # 45 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (46), pan_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 47, downsample_convs.2
- [[-1, 27], 1, Concat, [1]] # 48 cat Y6
- [-1, 3, RepC3, [256, 0.5]] # F6 (49), pan_blocks.2
- [[40, 43, 46, 49], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5, P6)
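The channel widths produced by the four Concat layers at the end of the backbone can be worked out from the Blocks arguments. This is a minimal sketch, assuming each branch outputs 128, 256, 512, and 512 channels at P3 to P6 (read off the YAML above) and that Concat along dimension 1 simply adds the channel counts of its inputs.

```python
# Per-branch output channels at each pyramid level, read from the Blocks
# arguments in the mid-fusion YAML (P6 reuses 512 channels).
BRANCH_CHANNELS = {"P3": 128, "P4": 256, "P5": 512, "P6": 512}


def fused_channels(branch_channels, n_branches=2):
    """Channel count after concatenating the same level from every branch."""
    return {level: c * n_branches for level, c in branch_channels.items()}


for level, width in fused_channels(BRANCH_CHANNELS).items():
    print(f"{level}: {width} channels after Concat")
```

These widths agree with the input channels of the 1x1 Conv layers that consume the fused features in the mid fusion run log later in this article (256, 512, 1024, and 1024).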
4.3 P6 Mid-to-Late Fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P6 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 3-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 4
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 5
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 6-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 7
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 8-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 9-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 10-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 11-P6
- [2, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 12-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 13
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 14
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 15-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 16
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 17-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 18-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 19-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 20-P6
head:
- [11, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 21 input_proj.3
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 23, Y6, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 24
- [10, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 25 input_proj.2
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 27, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 28, Y5, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 29
- [9, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 30 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 31 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X4 (32), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 33
- [8, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 34 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 35 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (36), fpn_blocks.2
- [20, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 37 input_proj.3
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 39, Y6, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 40
- [19, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 41 input_proj.2
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 43, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 44, Y5, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 45
- [18, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 46 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 47 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X4 (48), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 49
- [17, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 50 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 51 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (52), fpn_blocks.2
- [[23, 39], 1, Concat, [1]] # 53 fuse Y6 of both branches
- [[28, 44], 1, Concat, [1]] # 54 fuse Y5 of both branches
- [[32, 48], 1, Concat, [1]] # 55 fuse X4 of both branches
- [[36, 52], 1, Concat, [1]] # 56 fuse X3 of both branches
- [-1, 1, Conv, [256, 3, 2]] # 57, downsample_convs.0
- [[-1, 55], 1, Concat, [1]] # 58 cat fused X4
- [-1, 3, RepC3, [256, 0.5]] # F4 (59), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 60, downsample_convs.1
- [[-1, 54], 1, Concat, [1]] # 61 cat fused Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (62), pan_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 63, downsample_convs.2
- [[-1, 53], 1, Concat, [1]] # 64 cat fused Y6
- [-1, 3, RepC3, [256, 0.5]] # F6 (65), pan_blocks.2
- [[56, 59, 62, 65], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5, P6)
4.4 P6 Late Fusion
# Ultralytics YOLO 🚀, AGPL-3.0 license
# RT-DETR-ResNet18 object detection model with P3-P6 outputs.
# Parameters
ch: 6
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n-cls.yaml' will call yolov8-cls.yaml with scale 'n'
  # [depth, width, max_channels]
  l: [1.00, 1.00, 1024]
backbone:
# [from, repeats, module, args]
- [-1, 1, IN, []] # 0
- [-1, 1, Multiin, [1]] # 1
- [-2, 1, Multiin, [2]] # 2
- [1, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 3-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 4
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 5
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 6-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 7
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 8-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 9-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 10-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 11-P6
- [2, 1, ConvNormLayer, [32, 3, 2, 1, 'relu']] # 12-P1
- [-1, 1, ConvNormLayer, [32, 3, 1, 1, 'relu']] # 13
- [-1, 1, ConvNormLayer, [64, 3, 1, 1, 'relu']] # 14
- [-1, 1, nn.MaxPool2d, [3, 2, 1]] # 15-P2
- [-1, 2, Blocks, [64, BasicBlock, 2, False]] # 16
- [-1, 2, Blocks, [128, BasicBlock, 3, False]] # 17-P3
- [-1, 2, Blocks, [256, BasicBlock, 4, False]] # 18-P4
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 19-P5
- [-1, 2, Blocks, [512, BasicBlock, 5, False]] # 20-P6
head:
- [11, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 21 input_proj.3
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 23, Y6, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 24
- [10, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 25 input_proj.2
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 27, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 28, Y5, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 29
- [9, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 30 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 31 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X4 (32), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 33
- [8, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 34 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 35 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (36), fpn_blocks.2
- [-1, 1, Conv, [256, 3, 2]] # 37, downsample_convs.0
- [[-1, 32], 1, Concat, [1]] # 38 cat X4
- [-1, 3, RepC3, [256, 0.5]] # F4 (39), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 40, downsample_convs.1
- [[-1, 28], 1, Concat, [1]] # 41 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (42), pan_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 43, downsample_convs.2
- [[-1, 23], 1, Concat, [1]] # 44 cat Y6
- [-1, 3, RepC3, [256, 0.5]] # F6 (45), pan_blocks.2
- [20, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 46 input_proj.3
- [-1, 1, AIFI, [1024, 8]]
- [-1, 1, Conv, [256, 1, 1]] # 48, Y6, lateral_convs.0
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 49
- [19, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 50 input_proj.2
- [[-2, -1], 1, Concat, [1]]
- [-1, 3, RepC3, [256, 0.5]] # 52, fpn_blocks.0
- [-1, 1, Conv, [256, 1, 1]] # 53, Y5, lateral_convs.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 54
- [18, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 55 input_proj.1
- [[-2, -1], 1, Concat, [1]] # 56 cat backbone P4
- [-1, 3, RepC3, [256, 0.5]] # X4 (57), fpn_blocks.1
- [-1, 1, nn.Upsample, [None, 2, 'nearest']] # 58
- [17, 1, Conv, [256, 1, 1, None, 1, 1, False]] # 59 input_proj.0
- [[-2, -1], 1, Concat, [1]] # 60 cat backbone P3
- [-1, 3, RepC3, [256, 0.5]] # X3 (61), fpn_blocks.2
- [-1, 1, Conv, [256, 3, 2]] # 62, downsample_convs.0
- [[-1, 57], 1, Concat, [1]] # 63 cat X4
- [-1, 3, RepC3, [256, 0.5]] # F4 (64), pan_blocks.0
- [-1, 1, Conv, [256, 3, 2]] # 65, downsample_convs.1
- [[-1, 53], 1, Concat, [1]] # 66 cat Y5
- [-1, 3, RepC3, [256, 0.5]] # F5 (67), pan_blocks.1
- [-1, 1, Conv, [256, 3, 2]] # 68, downsample_convs.2
- [[-1, 48], 1, Concat, [1]] # 69 cat Y6
- [-1, 3, RepC3, [256, 0.5]] # F6 (70), pan_blocks.2
- [[36, 61], 1, Concat, [1]] # 71 fuse P3 outputs of both branches
- [[39, 64], 1, Concat, [1]] # 72 fuse P4 outputs of both branches
- [[42, 67], 1, Concat, [1]] # 73 fuse P5 outputs of both branches
- [[45, 70], 1, Concat, [1]] # 74 fuse P6 outputs of both branches
- [[71, 72, 73, 74], 1, RTDETRDecoder, [nc, 256, 300, 4, 8, 3]] # Detect(P3, P4, P5, P6)
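Once one of the four configurations is saved as a yaml file, training can be launched along the following lines. This is only a sketch: it assumes the modified project package is installed (stock Ultralytics does not ship the MF, IN, Multiin, or ResNet Blocks modules used here), and the file names and hyperparameter values are hypothetical placeholders. `RTDETR` and its `train` method are the standard Ultralytics entry points.

```python
# Training sketch for one of the P6 configs. The yaml file names and the
# hyperparameter values below are hypothetical placeholders.

def build_train_kwargs(data_yaml, epochs=100, imgsz=640, batch=8):
    """Collect the keyword arguments passed to model.train()."""
    return {"data": data_yaml, "epochs": epochs, "imgsz": imgsz, "batch": batch}


def train(model_yaml, data_yaml):
    """Build the model from a yaml config and start training."""
    # Imported lazily so this file can be inspected without the project installed.
    from ultralytics import RTDETR

    model = RTDETR(model_yaml)
    return model.train(**build_train_kwargs(data_yaml))


# Example call (not executed here):
# train("rtdetr-resnet18-early-p6.yaml", "multimodal-data.yaml")
```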
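5. Successful Run Results
Early fusion result: the input channel count is 6, which shows that both the visible-light image and the infrared image are fed into the model for fused training.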
rtdetr-resnet18-early-p6 summary: 510 layers, 31,981,884 parameters, 31,981,884 gradients, 63.1 GFLOPs
from n params module arguments
0 -1 1 1000 ultralytics.nn.AddModules.multimodal.MF [6, 32]
1 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 2, 1, 'relu']
2 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
3 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
4 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
5 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
6 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
7 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
8 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
9 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
10 -1 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
11 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
12 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
13 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
14 8 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
15 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
16 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
17 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
18 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
19 7 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
20 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
21 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
22 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
23 6 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
24 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
25 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
26 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
27 [-1, 21] 1 0 ultralytics.nn.modules.conv.Concat [1]
28 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
29 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
30 [-1, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
31 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
32 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
33 [-1, 12] 1 0 ultralytics.nn.modules.conv.Concat [1]
34 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
35 [25, 28, 31, 34] 1 4057748 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-early-p6 summary: 510 layers, 31,981,884 parameters, 31,981,884 gradients, 63.1 GFLOPs
Mid fusion result:
rtdetr-resnet18-mid-p6 summary: 609 layers, 53,244,884 parameters, 53,244,884 gradients, 97.3 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
4 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
5 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
6 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
7 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
8 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
9 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
10 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
11 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
12 2 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
13 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
14 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
15 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
16 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
17 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
18 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
19 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
20 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
21 [8, 17] 1 0 ultralytics.nn.modules.conv.Concat [1]
22 [9, 18] 1 0 ultralytics.nn.modules.conv.Concat [1]
23 [10, 19] 1 0 ultralytics.nn.modules.conv.Concat [1]
24 [11, 20] 1 0 ultralytics.nn.modules.conv.Concat [1]
25 -1 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
26 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
27 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
28 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
29 23 1 262656 ultralytics.nn.modules.conv.Conv [1024, 256, 1, 1, None, 1, 1, False]
30 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
31 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
32 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
33 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
34 22 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
35 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
36 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
37 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
38 21 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
39 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
40 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
41 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
42 [-1, 36] 1 0 ultralytics.nn.modules.conv.Concat [1]
43 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
44 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
45 [-1, 32] 1 0 ultralytics.nn.modules.conv.Concat [1]
46 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
47 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
48 [-1, 27] 1 0 ultralytics.nn.modules.conv.Concat [1]
49 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
50 [40, 43, 46, 49] 1 4057748 ultralytics.nn.modules.head.RTDETRDecoder [1, [256, 256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-mid-p6 summary: 609 layers, 53,244,884 parameters, 53,244,884 gradients, 97.3 GFLOPs
Mid-to-late fusion result:
rtdetr-resnet18-mid-to-late-p6 summary: 761 layers, 56,994,516 parameters, 56,994,516 gradients, 111.6 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
4 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
5 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
6 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
7 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
8 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
9 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
10 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
11 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
12 2 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
13 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
14 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
15 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
16 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
17 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
18 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
19 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
20 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
21 11 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
22 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
23 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
24 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
25 10 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
26 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
28 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
29 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
30 9 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
31 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
32 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
33 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
34 8 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
35 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
36 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
37 20 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
38 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
39 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
40 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
41 19 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
42 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
43 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
44 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
45 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
46 18 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
47 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
48 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
49 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
50 17 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
51 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
52 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
53 [23, 39] 1 0 ultralytics.nn.modules.conv.Concat [1]
54 [28, 44] 1 0 ultralytics.nn.modules.conv.Concat [1]
55 [32, 48] 1 0 ultralytics.nn.modules.conv.Concat [1]
56 [36, 52] 1 0 ultralytics.nn.modules.conv.Concat [1]
57 -1 1 1180160 ultralytics.nn.modules.conv.Conv [512, 256, 3, 2]
58 [-1, 55] 1 0 ultralytics.nn.modules.conv.Concat [1]
59 -1 3 723456 ultralytics.nn.modules.block.RepC3 [768, 256, 3, 0.5]
60 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
61 [-1, 54] 1 0 ultralytics.nn.modules.conv.Concat [1]
62 -1 3 723456 ultralytics.nn.modules.block.RepC3 [768, 256, 3, 0.5]
63 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
64 [-1, 53] 1 0 ultralytics.nn.modules.conv.Concat [1]
65 -1 3 723456 ultralytics.nn.modules.block.RepC3 [768, 256, 3, 0.5]
66 [56, 59, 62, 65] 1 4123284 ultralytics.nn.modules.head.RTDETRDecoder [1, [512, 256, 256, 256], 256, 300, 4, 8, 3]
rtdetr-resnet18-mid-to-late-p6 summary: 761 layers, 56,994,516 parameters, 56,994,516 gradients, 111.6 GFLOPs
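The per-layer parameter counts of the Conv rows in the table can be reproduced by hand. This is a minimal sketch, assuming the Conv module is a bias-free convolution followed by BatchNorm with one learnable weight and one learnable bias per output channel (the usual Ultralytics Conv layout).

```python
# Parameter count of a Conv block: bias-free k x k convolution + BatchNorm.
# BatchNorm contributes a weight and a bias per output channel; its running
# statistics are buffers and are not counted as parameters.

def conv_bn_params(c_in, c_out, k):
    """Return the number of learnable parameters of Conv2d(bias=False) + BN."""
    return c_in * c_out * k * k + 2 * c_out


print(conv_bn_params(512, 256, 1))  # 1x1 projection of a 512-channel map
print(conv_bn_params(256, 256, 3))  # 3x3 downsampling conv, single-branch input
print(conv_bn_params(512, 256, 3))  # 3x3 downsampling conv, fused 512-channel input
```

Layer 57 is about twice as large as the other downsampling convolutions because its input is the 512-channel concatenation of both branches rather than a single 256-channel map.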
Late fusion result:
rtdetr-resnet18-late-p6 summary: 887 layers, 60,149,460 parameters, 60,149,460 gradients, 114.9 GFLOPs
from n params module arguments
0 -1 1 0 ultralytics.nn.AddModules.multimodal.IN []
1 -1 1 0 ultralytics.nn.AddModules.multimodal.Multiin [1]
2 -2 1 0 ultralytics.nn.AddModules.multimodal.Multiin [2]
3 1 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
4 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
5 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
6 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
7 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
8 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
9 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
10 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
11 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
12 2 1 960 ultralytics.nn.AddModules.ResNet.ConvNormLayer[3, 32, 3, 2, 1, 'relu']
13 -1 1 9312 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 32, 3, 1, 1, 'relu']
14 -1 1 18624 ultralytics.nn.AddModules.ResNet.ConvNormLayer[32, 64, 3, 1, 1, 'relu']
15 -1 1 0 torch.nn.modules.pooling.MaxPool2d [3, 2, 1]
16 -1 2 152512 ultralytics.nn.AddModules.ResNet.Blocks [64, 64, 2, 'BasicBlock', 2, False]
17 -1 2 526208 ultralytics.nn.AddModules.ResNet.Blocks [64, 128, 2, 'BasicBlock', 3, False]
18 -1 2 2100992 ultralytics.nn.AddModules.ResNet.Blocks [128, 256, 2, 'BasicBlock', 4, False]
19 -1 2 8396288 ultralytics.nn.AddModules.ResNet.Blocks [256, 512, 2, 'BasicBlock', 5, False]
20 -1 2 9707008 ultralytics.nn.AddModules.ResNet.Blocks [512, 512, 2, 'BasicBlock', 5, False]
21 11 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
22 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
23 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
24 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
25 10 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
26 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
27 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
28 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
29 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
30 9 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
31 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
32 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
33 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
34 8 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
35 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
36 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
37 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
38 [-1, 32] 1 0 ultralytics.nn.modules.conv.Concat [1]
39 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
40 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
41 [-1, 28] 1 0 ultralytics.nn.modules.conv.Concat [1]
42 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
43 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
44 [-1, 23] 1 0 ultralytics.nn.modules.conv.Concat [1]
45 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
46 20 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
47 -1 1 789760 ultralytics.nn.modules.transformer.AIFI [256, 1024, 8]
48 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
49 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
50 19 1 131584 ultralytics.nn.modules.conv.Conv [512, 256, 1, 1, None, 1, 1, False]
51 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
52 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
53 -1 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1]
54 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
55 18 1 66048 ultralytics.nn.modules.conv.Conv [256, 256, 1, 1, None, 1, 1, False]
56 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
57 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
58 -1 1 0 torch.nn.modules.upsampling.Upsample [None, 2, 'nearest']
59 17 1 33280 ultralytics.nn.modules.conv.Conv [128, 256, 1, 1, None, 1, 1, False]
60 [-2, -1] 1 0 ultralytics.nn.modules.conv.Concat [1]
61 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
62 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
63 [-1, 57] 1 0 ultralytics.nn.modules.conv.Concat [1]
64 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
65 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
66 [-1, 53] 1 0 ultralytics.nn.modules.conv.Concat [1]
67 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
68 -1 1 590336 ultralytics.nn.modules.conv.Conv [256, 256, 3, 2]
69 [-1, 48] 1 0 ultralytics.nn.modules.conv.Concat [1]
70 -1 3 657920 ultralytics.nn.modules.block.RepC3 [512, 256, 3, 0.5]
71 [36, 61] 1 0 ultralytics.nn.modules.conv.Concat [1]
72 [39, 64] 1 0 ultralytics.nn.modules.conv.Concat [1]
73 [42, 67] 1 0 ultralytics.nn.modules.conv.Concat [1]
74 [45, 70] 1 0 ultralytics.nn.modules.conv.Concat [1]
75 [71, 72, 73, 74] 1 4319892 ultralytics.nn.modules.head.RTDETRDecoder [1, [512, 512, 512, 512], 256, 300, 4, 8, 3]
rtdetr-resnet18-late-p6 summary: 887 layers, 60,149,460 parameters, 60,149,460 gradients, 114.9 GFLOPs
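As a sanity check, the total in the summary line can be recovered by summing the params column of the late fusion table. The values below are copied from that table. The grouping into branch backbone, branch neck, and decoder is only for readability and is not part of the original log.

```python
# Params column of the late-fusion table, grouped per branch.
# Layers 0-2 (IN, Multiin) have no parameters and are omitted.

# Layers 3-11 (identical to layers 12-20 of the second branch)
branch_backbone = [960, 9312, 18624, 0, 152512, 526208, 2100992, 8396288, 9707008]

# Layers 21-45 (identical to layers 46-70 of the second branch)
branch_neck = [
    131584, 789760, 66048, 0, 131584, 0, 657920,   # 21-27
    66048, 0, 66048, 0, 657920,                    # 28-32
    0, 33280, 0, 657920,                           # 33-36
    590336, 0, 657920,                             # 37-39
    590336, 0, 657920,                             # 40-42
    590336, 0, 657920,                             # 43-45
]

decoder = 4319892  # layer 75, RTDETRDecoder; layers 71-74 (Concat) add nothing

total = 2 * (sum(branch_backbone) + sum(branch_neck)) + decoder
print(f"{total:,}")
```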