一、Introduction
This article presents the latest improvement mechanism from March 2024: the CPA-Enhancer chain-of-thought network proposed in the paper "CPA-Enhancer: Chain-of-Thought Prompted Adaptive Enhancer for Object Detection under Unknown Degradations". By introducing a chain-of-thought (CoT) prompting mechanism, CPA-Enhancer adaptively enhances images captured under unknown degradation conditions. The core of the method is its ability to use CoT prompts to dynamically analyze and adapt to image degradations, which significantly improves object detection performance. It applies to a wide range of scenarios: low-light, hazy, rainy, and snowy conditions all show accuracy gains. At the same time its parameter overhead is very small; added to YOLOv8n, the model has only about 3.5M parameters. The content of this article was compiled exclusively by me!
Welcome to subscribe to my column and learn YOLO together!
二、How It Works
Official paper address: click here to jump to it
Official code address: click here to jump to it
The innovations and improvement mechanisms of CPA-Enhancer can be summarized in the following aspects:
1. Chain-of-thought (CoT) prompting: the first work to apply a chain-of-thought prompting mechanism to the object detection task, handling images with unknown degradations through step-by-step guidance.
2. Adaptive enhancement strategy: an adaptive enhancer is proposed that dynamically adjusts its enhancement strategy according to the CoT prompts, without any prior knowledge of the image's degradation type.
3. Plug-and-play design: CPA-Enhancer is designed as a plug-in module that can easily be integrated with any existing generic object detector to improve detection performance on degraded images.
Improvement mechanisms
CoT Prompt Generation Module (CGM): dynamically generates context information related to the image degradation, enabling the model to recognize and adapt to different types of degradation.
Content-driven Prompt Block (CPB): strengthens the interaction between the input features and the CoT prompts, allowing the model to adjust its enhancement strategy according to the degradation type.
End-to-end training: CPA-Enhancer can be trained end-to-end together with the object detector, without a separate pre-training stage or extra supervision signals (a minimal usage sketch follows this list).
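To make the plug-and-play and end-to-end claims concrete, here is a minimal training sketch, not taken from the paper or the official repo: it places a CPA_arch instance (from Section 3) in front of a dummy stand-in detector and lets one detection-style loss update both modules. The detector, loss, and target shapes below are placeholders chosen only so the snippet runs.

import torch
from torch import nn

# Minimal sketch, assuming the CPA_arch class from Section 3 is available in this file.
# The "detector", loss, and targets are dummy placeholders, not the official pipeline.
enhancer = CPA_arch(c_in=3, c_out=3, dim=4)                     # plug-in enhancer
detector = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),  # stand-in for a real detector
                         nn.Conv2d(16, 85, 1))
criterion = nn.MSELoss()                                        # stand-in for a real detection loss

optimizer = torch.optim.SGD(list(enhancer.parameters()) + list(detector.parameters()), lr=0.01)

images = torch.rand(2, 3, 640, 640)        # a batch of (possibly degraded) images
targets = torch.zeros(2, 85, 320, 320)     # dummy targets matching the stand-in detector
optimizer.zero_grad()
enhanced = enhancer(images)                # adaptive enhancement under unknown degradation
preds = detector(enhanced)                 # detection on the enhanced images
loss = criterion(preds, targets)           # a single loss supervises both modules
loss.backward()                            # gradients flow back into the enhancer as well
optimizer.step()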
Summary
By introducing the chain-of-thought prompting mechanism, CPA-Enhancer achieves adaptive enhancement of images under unknown degradation conditions. The core of the method is its ability to use CoT prompts to dynamically analyze and adapt to image degradations, thereby significantly improving object detection performance. Its plug-and-play design lets it be integrated seamlessly into existing detection frameworks, providing an effective solution for the various degradation conditions encountered in real applications. Experiments show that CPA-Enhancer not only sets a new performance standard on object detection but also improves the performance of other downstream vision tasks, demonstrating broad application potential.
三、Core Code
See Section 4 for how to use the core code!
import torch
import torch.nn as nn
import torch.nn.functional as F
import numbers
from einops import rearrange
from einops.layers.torch import Rearrange

__all__ = ['CPA_arch']


class RFAConv(nn.Module):  # RFAConv implemented with grouped convolutions
    def __init__(self, in_channel, out_channel, kernel_size=3, stride=1):
        super().__init__()
        self.kernel_size = kernel_size
        self.get_weight = nn.Sequential(nn.AvgPool2d(kernel_size=kernel_size, padding=kernel_size // 2, stride=stride),
                                        nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=1,
                                                  groups=in_channel, bias=False))
        self.generate_feature = nn.Sequential(
            nn.Conv2d(in_channel, in_channel * (kernel_size ** 2), kernel_size=kernel_size, padding=kernel_size // 2,
                      stride=stride, groups=in_channel, bias=False),
            nn.BatchNorm2d(in_channel * (kernel_size ** 2)),
            nn.ReLU())
        self.conv = nn.Sequential(nn.Conv2d(in_channel, out_channel, kernel_size=kernel_size, stride=kernel_size),
                                  nn.BatchNorm2d(out_channel),
                                  nn.ReLU())

    def forward(self, x):
        b, c = x.shape[0:2]
        weight = self.get_weight(x)
        h, w = weight.shape[2:]
        weighted = weight.view(b, c, self.kernel_size ** 2, h, w).softmax(2)  # b c*k**2 h w -> b c k**2 h w
        feature = self.generate_feature(x).view(b, c, self.kernel_size ** 2, h,
                                                w)  # b c*k**2 h w -> b c k**2 h w, receptive-field spatial features
        weighted_data = feature * weighted
        conv_data = rearrange(weighted_data, 'b c (n1 n2) h w -> b c (h n1) (w n2)', n1=self.kernel_size,
                              # b c k**2 h w -> b c h*k w*k
                              n2=self.kernel_size)
        return self.conv(conv_data)


class Downsample(nn.Module):
    def __init__(self, n_feat):
        super(Downsample, self).__init__()
        self.body = nn.Sequential(nn.Conv2d(n_feat, n_feat // 2, kernel_size=3, stride=1, padding=1, bias=False),
                                  nn.PixelUnshuffle(2))

    def forward(self, x):
        return self.body(x)


class Upsample(nn.Module):
    def __init__(self, n_feat):
        super(Upsample, self).__init__()
        self.body = nn.Sequential(nn.Conv2d(n_feat, n_feat * 2, kernel_size=3, stride=1, padding=1, bias=False),
                                  nn.PixelShuffle(2))

    def forward(self, x):  # (b,c,h,w)
        return self.body(x)  # (b,c/2,h*2,w*2)


class SpatialAttention(nn.Module):
    def __init__(self):
        super(SpatialAttention, self).__init__()
        self.sa = nn.Conv2d(2, 1, 7, padding=3, padding_mode='reflect', bias=True)

    def forward(self, x):  # x:[b,c,h,w]
        x_avg = torch.mean(x, dim=1, keepdim=True)  # (b,1,h,w)
        x_max, _ = torch.max(x, dim=1, keepdim=True)  # (b,1,h,w)
        x2 = torch.concat([x_avg, x_max], dim=1)  # (b,2,h,w)
        sattn = self.sa(x2)  # 7x7 conv (b,1,h,w)
        return sattn * x


class ChannelAttention(nn.Module):
    def __init__(self, dim, reduction=8):
        super(ChannelAttention, self).__init__()
        self.gap = nn.AdaptiveAvgPool2d(1)
        self.ca = nn.Sequential(
            nn.Conv2d(dim, dim // reduction, 1, padding=0, bias=True),
            nn.ReLU(inplace=True),  # ReLU
            nn.Conv2d(dim // reduction, dim, 1, padding=0, bias=True),
        )

    def forward(self, x):  # x:[b,c,h,w]
        x_gap = self.gap(x)  # [b,c,1,1]
        cattn = self.ca(x_gap)  # [b,c,1,1]
        return cattn * x


class Channel_Shuffle(nn.Module):
    def __init__(self, num_groups):
        super(Channel_Shuffle, self).__init__()
        self.num_groups = num_groups

    def forward(self, x):
        batch_size, chs, h, w = x.shape
        chs_per_group = chs // self.num_groups
        x = torch.reshape(x, (batch_size, self.num_groups, chs_per_group, h, w))
        # (batch_size, num_groups, chs_per_group, h, w)
        x = x.transpose(1, 2)  # swap dim_1 and dim_2
        out = torch.reshape(x, (batch_size, -1, h, w))
        return out


class TransformerBlock(nn.Module):
    def __init__(self, dim, num_heads, ffn_expansion_factor, bias, LayerNorm_type):
        super(TransformerBlock, self).__init__()
        self.norm1 = LayerNorm(dim, LayerNorm_type)
        self.attn = Attention(dim, num_heads, bias)
        self.norm2 = LayerNorm(dim, LayerNorm_type)
        self.ffn = FeedForward(dim, ffn_expansion_factor, bias)

    def forward(self, x):
        x = x + self.attn(self.norm1(x))
        x = x + self.ffn(self.norm2(x))
        return x


def to_3d(x):
    return rearrange(x, 'b c h w -> b (h w) c')


def to_4d(x, h, w):
    return rearrange(x, 'b (h w) c -> b c h w', h=h, w=w)


class BiasFree_LayerNorm(nn.Module):
    def __init__(self, normalized_shape):
        super(BiasFree_LayerNorm, self).__init__()
        if isinstance(normalized_shape, numbers.Integral):
            normalized_shape = (normalized_shape,)
        normalized_shape = torch.Size(normalized_shape)
        assert len(normalized_shape) == 1
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.normalized_shape = normalized_shape

    def forward(self, x):
        sigma = x.var(-1, keepdim=True, unbiased=False)
        return x / torch.sqrt(sigma + 1e-5) * self.weight


class WithBias_LayerNorm(nn.Module):
    def __init__(self, normalized_shape):
        super(WithBias_LayerNorm, self).__init__()
        if isinstance(normalized_shape, numbers.Integral):
            normalized_shape = (normalized_shape,)
        normalized_shape = torch.Size(normalized_shape)
        assert len(normalized_shape) == 1
        self.weight = nn.Parameter(torch.ones(normalized_shape))
        self.bias = nn.Parameter(torch.zeros(normalized_shape))
        self.normalized_shape = normalized_shape

    def forward(self, x):
        device = x.device
        mu = x.mean(-1, keepdim=True)
        sigma = x.var(-1, keepdim=True, unbiased=False)
        result = (x - mu) / torch.sqrt(sigma + 1e-5) * self.weight.to(device) + self.bias.to(device)
        return result


class LayerNorm(nn.Module):
    def __init__(self, dim, LayerNorm_type):
        super(LayerNorm, self).__init__()
        if LayerNorm_type == 'BiasFree':
            self.body = BiasFree_LayerNorm(dim)
        else:
            self.body = WithBias_LayerNorm(dim)

    def forward(self, x):
        h, w = x.shape[-2:]
        return to_4d(self.body(to_3d(x)), h, w)


class FeedForward(nn.Module):
    def __init__(self, dim, ffn_expansion_factor, bias):
        super(FeedForward, self).__init__()
        hidden_features = int(dim * ffn_expansion_factor)
        self.project_in = nn.Conv2d(dim, hidden_features * 2, kernel_size=1, bias=bias)
        self.dwconv = nn.Conv2d(hidden_features * 2, hidden_features * 2, kernel_size=3, stride=1, padding=1,
                                groups=hidden_features * 2, bias=bias)
        self.project_out = nn.Conv2d(hidden_features, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        device = x.device
        self.project_in = self.project_in.to(device)
        self.dwconv = self.dwconv.to(device)
        self.project_out = self.project_out.to(device)
        x = self.project_in(x)
        x1, x2 = self.dwconv(x).chunk(2, dim=1)
        x = F.gelu(x1) * x2
        x = self.project_out(x)
        return x


class Attention(nn.Module):
    def __init__(self, dim, num_heads, bias):
        super(Attention, self).__init__()
        self.num_heads = num_heads
        self.temperature = nn.Parameter(torch.ones(num_heads, 1, 1, dtype=torch.float32), requires_grad=True)
        self.qkv = nn.Conv2d(dim, dim * 3, kernel_size=1, bias=bias)
        self.qkv_dwconv = nn.Conv2d(dim * 3, dim * 3, kernel_size=3, stride=1, padding=1, groups=dim * 3,
                                    bias=bias)
        self.project_out = nn.Conv2d(dim, dim, kernel_size=1, bias=bias)

    def forward(self, x):
        b, c, h, w = x.shape
        device = x.device
        self.qkv = self.qkv.to(device)
        self.qkv_dwconv = self.qkv_dwconv.to(device)
        self.project_out = self.project_out.to(device)
        qkv = self.qkv(x)
        qkv = self.qkv_dwconv(qkv)
        q, k, v = qkv.chunk(3, dim=1)
        q = rearrange(q, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        k = rearrange(k, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        v = rearrange(v, 'b (head c) h w -> b head c (h w)', head=self.num_heads)
        q = torch.nn.functional.normalize(q, dim=-1)
        k = torch.nn.functional.normalize(k, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.temperature.to(device)
        attn = attn.softmax(dim=-1)
        out = (attn @ v)
        out = rearrange(out, 'b head c (h w) -> b (head c) h w', head=self.num_heads, h=h, w=w)
        out = self.project_out(out)
        return out


class resblock(nn.Module):
    def __init__(self, dim):
        super(resblock, self).__init__()
        # self.norm = LayerNorm(dim, LayerNorm_type='BiasFree')
        self.body = nn.Sequential(nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1, bias=False),
                                  nn.PReLU(),
                                  nn.Conv2d(dim, dim, kernel_size=3, stride=1, padding=1, bias=False))

    def forward(self, x):
        res = self.body(x)
        res += x
        return res


#########################################################################
# Chain-of-Thought Prompt Generation Module (CGM)
class CotPromptParaGen(nn.Module):
    def __init__(self, prompt_inch, prompt_size, num_path=3):
        super(CotPromptParaGen, self).__init__()
        # (128,32,32)->(64,64,64)->(32,128,128)
        self.chain_prompts = nn.ModuleList([
            nn.ConvTranspose2d(
                in_channels=prompt_inch if idx == 0 else prompt_inch // (2 ** idx),
                out_channels=prompt_inch // (2 ** (idx + 1)),
                kernel_size=3, stride=2, padding=1
            ) for idx in range(num_path)
        ])

    def forward(self, x):
        prompt_params = []
        prompt_params.append(x)
        for pe in self.chain_prompts:
            x = pe(x)
            prompt_params.append(x)
        return prompt_params


#########################################################################
# Content-driven Prompt Block (CPB)
class ContentDrivenPromptBlock(nn.Module):
    def __init__(self, dim, prompt_dim, reduction=8, num_splits=4):
        super(ContentDrivenPromptBlock, self).__init__()
        self.dim = dim
        self.num_splits = num_splits
        self.pa2 = nn.Conv2d(2 * dim, dim, 7, padding=3, padding_mode='reflect', groups=dim, bias=True)
        self.sigmoid = nn.Sigmoid()
        self.conv3x3 = nn.Conv2d(prompt_dim, prompt_dim, kernel_size=3, stride=1, padding=1, bias=False)
        self.conv1x1 = nn.Conv2d(dim, prompt_dim, kernel_size=1, stride=1, bias=False)
        self.sa = SpatialAttention()
        self.ca = ChannelAttention(dim, reduction)
        self.myshuffle = Channel_Shuffle(2)
        self.out_conv1 = nn.Conv2d(prompt_dim + dim, dim, kernel_size=1, stride=1, bias=False)
        self.transformer_block = [
            TransformerBlock(dim=dim // num_splits, num_heads=1, ffn_expansion_factor=2.66, bias=False,
                             LayerNorm_type='WithBias') for _ in range(num_splits)]

    def forward(self, x, prompt_param):
        # latent: (b,dim*8,h/8,w/8)  prompt_param3: (1, 256, 16, 16)
        x_ = x
        B, C, H, W = x.shape
        cattn = self.ca(x)  # channel-wise attention
        sattn = self.sa(x)  # spatial-wise attention
        pattn1 = sattn + cattn
        pattn1 = pattn1.unsqueeze(dim=2)  # [b,c,1,h,w]
        x = x.unsqueeze(dim=2)  # [b,c,1,h,w]
        x2 = torch.cat([x, pattn1], dim=2)  # [b,c,2,h,w]
        x2 = Rearrange('b c t h w -> b (c t) h w')(x2)  # [b,c*2,h,w]
        x2 = self.myshuffle(x2)  # [c1,c1_att,c2,c2_att,...]
        pattn2 = self.pa2(x2)
        pattn2 = self.conv1x1(pattn2)  # [b,prompt_dim,h,w]
        prompt_weight = self.sigmoid(pattn2)  # Sigmoid
        prompt_param = F.interpolate(prompt_param, (H, W), mode="bilinear")
        # (b,prompt_dim,prompt_size,prompt_size) -> (b,prompt_dim,h,w)
        prompt = prompt_weight * prompt_param
        prompt = self.conv3x3(prompt)  # (b,prompt_dim,h,w)
        inter_x = torch.cat([x_, prompt], dim=1)  # (b,prompt_dim+dim,h,w)
        inter_x = self.out_conv1(inter_x)  # (b,dim,h,w) dim=64
        splits = torch.split(inter_x, self.dim // self.num_splits, dim=1)
        transformered_splits = []
        for i, split in enumerate(splits):
            transformered_split = self.transformer_block[i](split)
            transformered_splits.append(transformered_split)
        result = torch.cat(transformered_splits, dim=1)
        return result


#########################################################################
# CPA_Enhancer
class CPA_arch(nn.Module):
    def __init__(self, c_in=3, c_out=3, dim=4, prompt_inch=128, prompt_size=32):
        super(CPA_arch, self).__init__()
        self.conv0 = RFAConv(c_in, dim)
        self.conv1 = RFAConv(dim, dim)
        self.conv2 = RFAConv(dim * 2, dim * 2)
        self.conv3 = RFAConv(dim * 4, dim * 4)
        self.conv4 = RFAConv(dim * 8, dim * 8)
        self.conv5 = RFAConv(dim * 8, dim * 4)
        self.conv6 = RFAConv(dim * 4, dim * 2)
        self.conv7 = RFAConv(dim * 2, c_out)
        self.down1 = Downsample(dim)
        self.down2 = Downsample(dim * 2)
        self.down3 = Downsample(dim * 4)
        self.prompt_param_ini = nn.Parameter(torch.rand(1, prompt_inch, prompt_size, prompt_size))  # (b,c,h,w)
        self.myPromptParamGen = CotPromptParaGen(prompt_inch=prompt_inch, prompt_size=prompt_size)
        self.prompt1 = ContentDrivenPromptBlock(dim=dim * 2 ** 1, prompt_dim=prompt_inch // 4, reduction=8)  # !!!!
        self.prompt2 = ContentDrivenPromptBlock(dim=dim * 2 ** 2, prompt_dim=prompt_inch // 2, reduction=8)
        self.prompt3 = ContentDrivenPromptBlock(dim=dim * 2 ** 3, prompt_dim=prompt_inch, reduction=8)
        self.up3 = Upsample(dim * 8)
        self.up2 = Upsample(dim * 4)
        self.up1 = Upsample(dim * 2)

    def forward(self, x):  # (b,c_in,h,w)
        prompt_params = self.myPromptParamGen(self.prompt_param_ini)
        prompt_param1 = prompt_params[2]  # [1, 64, 64, 64]
        prompt_param2 = prompt_params[1]  # [1, 128, 32, 32]
        prompt_param3 = prompt_params[0]  # [1, 256, 16, 16]
        x0 = self.conv0(x)  # (b,dim,h,w)
        x1 = self.conv1(x0)  # (b,dim,h,w)
        x1_down = self.down1(x1)  # (b,dim,h/2,w/2)
        x2 = self.conv2(x1_down)  # (b,dim,h/2,w/2)
        x2_down = self.down2(x2)
        x3 = self.conv3(x2_down)
        x3_down = self.down3(x3)
        x4 = self.conv4(x3_down)
        device = x4.device
        self.prompt1 = self.prompt1.to(device)
        self.prompt2 = self.prompt2.to(device)
        self.prompt3 = self.prompt3.to(device)
        x4_prompt = self.prompt3(x4, prompt_param3)
        x3_up = self.up3(x4_prompt)
        x5 = self.conv5(torch.cat([x3_up, x3], 1))
        x5_prompt = self.prompt2(x5, prompt_param2)
        x2_up = self.up2(x5_prompt)
        x2_cat = torch.cat([x2_up, x2], 1)
        x6 = self.conv6(x2_cat)
        x6_prompt = self.prompt1(x6, prompt_param1)
        x1_up = self.up1(x6_prompt)
        x7 = self.conv7(torch.cat([x1_up, x1], 1))
        return x7


if __name__ == "__main__":
    # Generate a sample image
    image_size = (1, 3, 640, 640)
    image = torch.rand(*image_size)
    out = CPA_arch(3, 3, 4)
    out = out(image)
    print(out.size())
四、Step-by-Step Guide to Adding This Mechanism
4.1 Modification 1
The first step, as usual, is to create the files: go to the ultralytics/nn/modules folder and create a directory inside it named 'Addmodules' (if you are using the files from my group, it already exists and there is no need to create it). Then create a new .py file inside that directory and copy-paste the core code into it; a sketch of the resulting layout is shown below.
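A minimal sketch of the resulting layout, assuming the new file is named CPA_Enhancer.py (the filename is only an example; use any name, as long as the import in step 4.2 matches):

ultralytics/nn/modules/Addmodules/
    __init__.py        # created in step 4.2
    CPA_Enhancer.py    # paste the core code from Section 3 into this file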
4.2 Modification 2
In the second step, create a new .py file named '__init__.py' in that directory (if you are using the files from my group, it already exists and there is no need to create it), and then import our module inside it, as sketched below.
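A minimal sketch of what this '__init__.py' could contain, assuming the core-code file from step 4.1 was named CPA_Enhancer.py (the filename is an assumption; adjust it to whatever you used):

# ultralytics/nn/modules/Addmodules/__init__.py
from .CPA_Enhancer import *  # re-exports CPA_arch via the __all__ defined in Section 3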
4.3 Modification 3
In the third step, open the file 'ultralytics/nn/tasks.py' and import and register our module there (if you are using the files from my group, the import is already done and you can go straight to step four); a sketch of the import is given below.
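A minimal sketch of the import to add near the top of 'ultralytics/nn/tasks.py', next to the other module imports; the exact package path depends on where the Addmodules folder was created in step 4.1:

# near the top of ultralytics/nn/tasks.py
from ultralytics.nn.modules.Addmodules import CPA_arch  # path assumes the layout from step 4.1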
From today onward all tutorials will follow this unified format, because I assume everyone is working from the files provided in my group!
4.4 Modification 4
Add the module in parse_model following my example; a sketch is shown below.
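Since the original figure is not reproduced here, below is a hedged sketch of one way such a branch in parse_model could look; it is an assumption about the registration, not the author's exact code. CPA_arch enhances the input image without changing its channel count, so its output channels simply equal its input channels:

        # inside parse_model() in ultralytics/nn/tasks.py, in the chain of module checks (sketch only)
        elif m is CPA_arch:
            c2 = ch[f]               # the enhancer keeps the same channel count as its input (3 for RGB)
            args = [c2, c2, *args]   # builds CPA_arch(c_in=3, c_out=3, ...) for the '[-1, 1, CPA_arch, []]' entry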
Disable mixed-precision validation!
Open the file 'ultralytics/engine/validator.py', find 'class BaseValidator:', and in its '__call__' method add self.args.half = False directly below the line self.args.half = self.device.type != 'cpu'  # force FP16 val during training, as shown in the snippet below.
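After the change, that part of BaseValidator.__call__ looks roughly like this (the first line already exists in validator.py; only the second line is new):

            self.args.half = self.device.type != 'cpu'  # force FP16 val during training
            self.args.half = False  # added: keep validation in FP32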
Fixing the GFLOPs printout!
The GFLOPs computation throws an exception and nothing is printed, so one more change is needed: open the file 'ultralytics/utils/torch_utils.py', find the get_flops function, and replace it entirely with the code I provide below.
def get_flops(model, imgsz=640):
    """Return a YOLO model's FLOPs."""
    if not thop:
        return 0.0  # if not installed return 0.0 GFLOPs
    try:
        model = de_parallel(model)
        p = next(model.parameters())
        if not isinstance(imgsz, list):
            imgsz = [imgsz, imgsz]  # expand if int/float
        try:
            # Use stride size for input tensor
            stride = 640
            im = torch.empty((1, 3, stride, stride), device=p.device)  # input image in BCHW format
            flops = thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # stride GFLOPs
            return flops * imgsz[0] / stride * imgsz[1] / stride  # imgsz GFLOPs
        except Exception:
            # Use actual image size for input tensor (i.e. required for RTDETR models)
            im = torch.empty((1, p.shape[1], *imgsz), device=p.device)  # input image in BCHW format
            return thop.profile(deepcopy(model), inputs=[im], verbose=False)[0] / 1e9 * 2  # imgsz GFLOPs
    except Exception:
        return 0.0
That completes all the modifications; you can now copy the yaml file below and run it.
五、Yaml File and Training Records
5.1 Yaml file
Training info for this version: YOLO11-CPAEnhancer summary: 490 layers, 3,087,792 parameters, 3,087,776 gradients, 19.2 GFLOPs
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect

# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPs
  s: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPs
  m: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPs
  l: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPs
  x: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs

# YOLO11n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, CPA_arch, []] # 0 (full-resolution enhancer)
  - [-1, 1, Conv, [64, 3, 2]] # 1-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 2-P2/4
  - [-1, 2, C3k2, [256, False, 0.25]]
  - [-1, 1, Conv, [256, 3, 2]] # 4-P3/8
  - [-1, 2, C3k2, [512, False, 0.25]]
  - [-1, 1, Conv, [512, 3, 2]] # 6-P4/16
  - [-1, 2, C3k2, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 8-P5/32
  - [-1, 2, C3k2, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 10
  - [-1, 2, C2PSA, [1024]] # 11

# YOLO11n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 7], 1, Concat, [1]] # cat backbone P4
  - [-1, 2, C3k2, [512, False]] # 14
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 5], 1, Concat, [1]] # cat backbone P3
  - [-1, 2, C3k2, [256, False]] # 17 (P3/8-small)
  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 14], 1, Concat, [1]] # cat head P4
  - [-1, 2, C3k2, [512, False]] # 20 (P4/16-medium)
  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 11], 1, Concat, [1]] # cat head P5
  - [-1, 2, C3k2, [1024, True]] # 23 (P5/32-large)
  - [[17, 20, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)
5.2 Training code
Create a .py file, copy-paste the code below into it, configure your own file paths, and run it.
import warnings
warnings.filterwarnings('ignore')
from ultralytics import YOLO

if __name__ == '__main__':
    model = YOLO('replace with the path to the model yaml from Section 5.1')
    # model.load('yolov8n.pt') # loading pretrain weights
    model.train(data=r'replace with the path to your dataset yaml',
                # if your task is not detection, open 'ultralytics/cfg/default.yaml' and change task to detect, segment, classify, or pose
                cache=False,
                imgsz=640,
                epochs=150,
                single_cls=False,  # whether this is single-class detection
                batch=4,
                close_mosaic=10,
                workers=0,
                device='0',
                optimizer='SGD',  # using SGD
                # resume='', # to resume training, set this to the path of your last.pt
                amp=False,  # disable amp if the training loss becomes NaN
                project='runs/train',
                name='exp',
                )
5.3 Training process screenshot
六、Summary
This concludes the main content of this article. Here I would like to recommend my column on effective YOLOv11 improvements. The column is newly opened with an average quality score of 98; going forward I will reproduce papers from the latest top conferences and also add more classic improvement mechanisms. If this article helped you, please subscribe to the column and follow the upcoming updates!