💡💡💡 Original improvement in this article: vHeat achieves strong performance on visual tasks, including image classification, object detection and semantic segmentation, while also offering higher inference speed, fewer FLOPs and lower GPU memory usage when processing high-resolution images.
💡💡💡 Validated on multiple datasets, it can deliver accuracy gains.
💡💡💡 By relating the spatial propagation of visual semantics to physical heat conduction, the authors propose a visual Heat Conduction Operator (HCO) and design vHeat, a visual representation model that combines low complexity, a global receptive field and physical interpretability. The computational form and complexity of HCO are compared with self-attention in the original paper. Experiments show vHeat performs well on a range of vision tasks; for example, vHeat-T reaches 82.2% top-1 classification accuracy on ImageNet-1K, 0.9% higher than Swin-T and 1.7% higher than Vim-S.

💡💡💡 Exclusive, first-published original innovation, suitable for papers!!!
💡💡💡 Innovations from 2024 computer-vision conferences, applicable to YOLOv5, YOLOv7, YOLOv8 and the other YOLO series; the column articles provide every step and the source code to get you modifying networks easily!!!
💡💡💡 Key point: after reading this column, you will be able to design your own network modifications at different positions (Backbone, head, detect, loss, etc.) and implement your own innovations!!!
1. Principle

Paper: https://arxiv.org/pdf/2405.16555
Abstract: A fundamental problem in learning robust and expressive visual representations is how to efficiently estimate the spatial relationships of visual semantics across an entire image. In this work we propose vHeat, a novel vision backbone that achieves both high computational efficiency and a global receptive field. Inspired by the physics of heat conduction, the key idea is to conceptualize image patches as heat sources and to model the computation of their correlations as the diffusion of thermal energy. This mechanism is integrated into deep models through the newly proposed Heat Conduction Operator (HCO), which is physically plausible and can be implemented efficiently with DCT and IDCT operations at O(N^1.5) complexity. Extensive experiments show that vHeat surpasses Vision Transformers (ViTs) on various vision tasks, while also offering higher inference speed, fewer FLOPs and lower GPU memory usage for high-resolution images.

1) We propose vHeat, a vision backbone model based on the physics of heat conduction that simultaneously achieves a global receptive field, low computational complexity and high interpretability.
2) We design the Heat Conduction Operator (HCO), a physically plausible module that conceptualizes image patches as heat sources, predicts adaptive thermal diffusivity through frequency value embeddings (FVEs), and propagates information according to the principle of heat conduction (a minimal sketch of this DCT-based computation is given after this list).
3) Without bells and whistles, vHeat achieves strong performance on visual tasks, including image classification, object detection and semantic segmentation, while also offering higher inference speed, fewer FLOPs and lower GPU memory usage when processing high-resolution images.
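
The closed-form solution that HCO relies on can be summarized as: take the 2D DCT of the input features, multiply the coefficients by the decay exp(-[(nπ/a)² + (mπ/b)²]·k·t), and apply the inverse 2D DCT. The following is a minimal standalone sketch of that step (not the authors' code; dct_matrix and heat_conduction are illustrative names), assuming a single-channel (H, W) map:

import math
import torch

def dct_matrix(N):
    # orthonormal DCT-II basis: rows index frequencies n, columns index positions x
    x = (torch.arange(N, dtype=torch.float) + 0.5) / N
    n = torch.arange(N, dtype=torch.float).view(-1, 1)
    w = torch.cos(n * x * math.pi) * math.sqrt(2.0 / N)
    w[0] /= math.sqrt(2.0)
    return w

def heat_conduction(u0, k=1.0, t=1.0):
    # u0: (H, W) map of "heat sources"; returns u(x, y, t) after diffusing for time t
    H, W = u0.shape
    dct_h, dct_w = dct_matrix(H), dct_matrix(W)
    coeff = dct_h @ u0 @ dct_w.t()                    # 2D DCT of the initial condition
    n = torch.arange(H, dtype=torch.float).view(-1, 1) * math.pi / H
    m = torch.arange(W, dtype=torch.float).view(1, -1) * math.pi / W
    decay = torch.exp(-(n ** 2 + m ** 2) * k * t)     # exp(-[(n*pi/a)^2 + (m*pi/b)^2] * k * t)
    return dct_h.t() @ (coeff * decay) @ dct_w        # inverse 2D DCT back to the spatial domain

In vHeat, the diffusivity k is not a single scalar: it is predicted per frequency and per channel from the learnable FVEs, which is exactly what to_k and freq_embed do in the Heat2D module below.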


2. How to add it to YOLOv8
2.1 Create ultralytics/nn/block/vHeat.py
import time
import math
from functools import partial
from typing import Optional, Callable
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.checkpoint as checkpoint
from einops import rearrange, repeat
from timm.models.layers import DropPath, to_2tuple, trunc_normal_
from ultralytics.nn.modules.block import C2f
######################################## vHeat start by AI monsters csdn https://blog.csdn.net/m0_63774211 ########################################
class Mlp_Heat(nn.Module):
    def __init__(self, in_features, hidden_features=None, out_features=None, act_layer=nn.GELU, drop=0.,
                 channels_first=False):
        super().__init__()
        out_features = out_features or in_features
        hidden_features = hidden_features or in_features
        Linear = partial(nn.Conv2d, kernel_size=1, padding=0) if channels_first else nn.Linear
        self.fc1 = Linear(in_features, hidden_features)
        self.act = act_layer()
        self.fc2 = Linear(hidden_features, out_features)
        self.drop = nn.Dropout(drop)

    def forward(self, x):
        x = self.fc1(x)
        x = self.act(x)
        x = self.drop(x)
        x = self.fc2(x)
        x = self.drop(x)
        return x
class LayerNorm2d(nn.LayerNorm):
    def forward(self, x: torch.Tensor):
        x = x.permute(0, 2, 3, 1).contiguous()
        x = F.layer_norm(x, self.normalized_shape, self.weight, self.bias, self.eps)
        x = x.permute(0, 3, 1, 2).contiguous()
        return x
class Heat2D(nn.Module):
    """
    du/dt - k(d2u/dx2 + d2u/dy2) = 0;
    du/dx_{x=0, x=a} = 0
    du/dy_{y=0, y=b} = 0
    =>
    A_{n, m} = C(a, b, n==0, m==0) * sum_{0}^{a}{ sum_{0}^{b}{ \phi(x, y) cos(n\pi/a x) cos(m\pi/b y) dxdy }}
    core = cos(n\pi/a x) cos(m\pi/b y) exp(-[(n\pi/a)^2 + (m\pi/b)^2]kt)
    u_{x, y, t} = sum_{0}^{\infty}{ sum_{0}^{\infty}{ core } }

    assume a = N, b = M; x in [0, N], y in [0, M]; n in [0, N], m in [0, M]; with some slight change
    =>
    (\phi(x, y) = linear(dwconv(input(x, y))))
    A(n, m) = DCT2D(\phi(x, y))
    u(x, y, t) = IDCT2D(A(n, m) * exp(-[(n\pi/a)^2 + (m\pi/b)^2])**kt)
    """

    def __init__(self, infer_mode=False, res=14, dim=96, hidden_dim=96, **kwargs):
        super().__init__()
        self.res = res
        self.dwconv = nn.Conv2d(dim, hidden_dim, kernel_size=3, padding=1, groups=hidden_dim)
        self.hidden_dim = hidden_dim
        self.linear = nn.Linear(hidden_dim, 2 * hidden_dim, bias=True)
        self.out_norm = nn.LayerNorm(hidden_dim)
        self.out_linear = nn.Linear(hidden_dim, hidden_dim, bias=True)
        self.infer_mode = infer_mode
        self.to_k = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim, bias=True),
            nn.ReLU(),
        )

    def infer_init_heat2d(self, freq):
        weight_exp = self.get_decay_map((self.res, self.res), device=freq.device)
        self.k_exp = nn.Parameter(torch.pow(weight_exp[:, :, None], self.to_k(freq)), requires_grad=False)
        # del self.to_k

    @staticmethod
    def get_cos_map(N=224, device=torch.device("cpu"), dtype=torch.float):
        # cos((x + 0.5) / N * n * \pi) which is also the form of DCT and IDCT
        # DCT: F(n) = sum( (sqrt(2/N) if n > 0 else sqrt(1/N)) * cos((x + 0.5) / N * n * \pi) * f(x) )
        # IDCT: f(x) = sum( (sqrt(2/N) if n > 0 else sqrt(1/N)) * cos((x + 0.5) / N * n * \pi) * F(n) )
        # returns: (Res_n, Res_x)
        weight_x = (torch.linspace(0, N - 1, N, device=device, dtype=dtype).view(1, -1) + 0.5) / N
        weight_n = torch.linspace(0, N - 1, N, device=device, dtype=dtype).view(-1, 1)
        weight = torch.cos(weight_n * weight_x * torch.pi) * math.sqrt(2 / N)
        weight[0, :] = weight[0, :] / math.sqrt(2)
        return weight

    @staticmethod
    def get_decay_map(resolution=(224, 224), device=torch.device("cpu"), dtype=torch.float):
        # exp(-[(n\pi/a)^2 + (m\pi/b)^2])
        # returns: (Res_h, Res_w)
        resh, resw = resolution
        weight_n = torch.linspace(0, torch.pi, resh + 1, device=device, dtype=dtype)[:resh].view(-1, 1)
        weight_m = torch.linspace(0, torch.pi, resw + 1, device=device, dtype=dtype)[:resw].view(1, -1)
        weight = torch.pow(weight_n, 2) + torch.pow(weight_m, 2)
        weight = torch.exp(-weight)
        return weight

    def forward(self, x: torch.Tensor, freq_embed=None):
        B, C, H, W = x.shape
        x = self.dwconv(x)
        x = self.linear(x.permute(0, 2, 3, 1).contiguous())  # B, H, W, 2C
        x, z = x.chunk(chunks=2, dim=-1)  # B, H, W, C

        if ((H, W) == getattr(self, "__RES__", (0, 0))) and (getattr(self, "__WEIGHT_COSN__", None).device == x.device):
            weight_cosn = getattr(self, "__WEIGHT_COSN__", None)
            weight_cosm = getattr(self, "__WEIGHT_COSM__", None)
            weight_exp = getattr(self, "__WEIGHT_EXP__", None)
            assert weight_cosn is not None
            assert weight_cosm is not None
            assert weight_exp is not None
        else:
            weight_cosn = self.get_cos_map(H, device=x.device).detach_()
            weight_cosm = self.get_cos_map(W, device=x.device).detach_()
            weight_exp = self.get_decay_map((H, W), device=x.device).detach_()
            setattr(self, "__RES__", (H, W))
            setattr(self, "__WEIGHT_COSN__", weight_cosn)
            setattr(self, "__WEIGHT_COSM__", weight_cosm)
            setattr(self, "__WEIGHT_EXP__", weight_exp)

        N, M = weight_cosn.shape[0], weight_cosm.shape[0]

        # 2D DCT, implemented as two 1D convolutions (along H, then along W)
        x = F.conv1d(x.contiguous().view(B, H, -1), weight_cosn.contiguous().view(N, H, 1).type_as(x))
        x = F.conv1d(x.contiguous().view(-1, W, C),
                     weight_cosm.contiguous().view(M, W, 1).type_as(x)).contiguous().view(B, N, M, -1)

        if not self.training:
            x = torch.einsum("bnmc,nmc->bnmc", x, self.k_exp.type_as(x))
        else:
            weight_exp = torch.pow(weight_exp[:, :, None], self.to_k(freq_embed))
            x = torch.einsum("bnmc,nmc -> bnmc", x, weight_exp)  # exp decay

        # inverse 2D DCT back to the spatial domain
        x = F.conv1d(x.contiguous().view(B, N, -1), weight_cosn.t().contiguous().view(H, N, 1).type_as(x))
        x = F.conv1d(x.contiguous().view(-1, M, C),
                     weight_cosm.t().contiguous().view(W, M, 1).type_as(x)).contiguous().view(B, H, W, -1)

        x = self.out_norm(x)
        x = x * nn.functional.silu(z)
        x = self.out_linear(x)
        x = x.permute(0, 3, 1, 2).contiguous()
        return x
class HeatBlock(nn.Module):
    def __init__(
        self,
        hidden_dim: int = 0,
        res: int = 14,
        infer_mode=False,
        drop_path: float = 0,
        norm_layer: Callable[..., torch.nn.Module] = partial(LayerNorm2d, eps=1e-6),
        use_checkpoint: bool = False,
        drop: float = 0.0,
        act_layer: nn.Module = nn.GELU,
        mlp_ratio: float = 4.0,
        post_norm=True,
        layer_scale=None,
        **kwargs,
    ):
        super().__init__()
        self.use_checkpoint = use_checkpoint
        self.norm1 = norm_layer(hidden_dim)
        self.op = Heat2D(res=res, dim=hidden_dim, hidden_dim=hidden_dim, infer_mode=infer_mode)
        self.drop_path = DropPath(drop_path) if drop_path > 0. else nn.Identity()
        self.mlp_branch = mlp_ratio > 0
        if self.mlp_branch:
            self.norm2 = norm_layer(hidden_dim)
            mlp_hidden_dim = int(hidden_dim * mlp_ratio)
            self.mlp = Mlp_Heat(in_features=hidden_dim, hidden_features=mlp_hidden_dim, act_layer=act_layer, drop=drop,
                                channels_first=True)
        self.post_norm = post_norm
        self.layer_scale = layer_scale is not None
        self.infer_mode = infer_mode
        if self.layer_scale:
            self.gamma1 = nn.Parameter(layer_scale * torch.ones(hidden_dim), requires_grad=True)
            self.gamma2 = nn.Parameter(layer_scale * torch.ones(hidden_dim), requires_grad=True)
        self.freq_embed = nn.Parameter(torch.zeros(res, res, hidden_dim), requires_grad=True)
        trunc_normal_(self.freq_embed, std=0.02)
        self.op.infer_init_heat2d(self.freq_embed)

    def _forward(self, x: torch.Tensor):
        if not self.layer_scale:
            if self.post_norm:
                x = x + self.drop_path(self.norm1(self.op(x, self.freq_embed)))
                if self.mlp_branch:
                    x = x + self.drop_path(self.norm2(self.mlp(x)))  # FFN
            else:
                x = x + self.drop_path(self.op(self.norm1(x), self.freq_embed))
                if self.mlp_branch:
                    x = x + self.drop_path(self.mlp(self.norm2(x)))  # FFN
            return x
        if self.post_norm:
            x = x + self.drop_path(self.gamma1[:, None, None] * self.norm1(self.op(x, self.freq_embed)))
            if self.mlp_branch:
                x = x + self.drop_path(self.gamma2[:, None, None] * self.norm2(self.mlp(x)))  # FFN
        else:
            x = x + self.drop_path(self.gamma1[:, None, None] * self.op(self.norm1(x), self.freq_embed))
            if self.mlp_branch:
                x = x + self.drop_path(self.gamma2[:, None, None] * self.mlp(self.norm2(x)))  # FFN
        return x

    def forward(self, input: torch.Tensor):
        if not self.training:
            self.op.infer_init_heat2d(self.freq_embed)
        if self.use_checkpoint:
            return checkpoint.checkpoint(self._forward, input)
        else:
            return self._forward(input)
class C2f_Heat(C2f):
    # C2f with the Bottleneck modules replaced by HeatBlock; feat_size is the spatial size of the incoming feature map
    def __init__(self, c1, c2, n=1, feat_size=None, shortcut=False, g=1, e=0.5):
        super().__init__(c1, c2, n, shortcut, g, e)
        self.m = nn.ModuleList(HeatBlock(self.c, feat_size) for _ in range(n))
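# Optional quick sanity check (not part of the original post). feat_size must match the
# spatial size of the feature map entering the block, e.g. 80x80 at P3/8 for a 640x640 input.
if __name__ == '__main__':
    block = C2f_Heat(256, 256, n=1, feat_size=80)
    block.eval()
    out = block(torch.randn(1, 256, 80, 80))
    print(out.shape)  # expected: torch.Size([1, 256, 80, 80])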
######################################## vHeat end by AI monsters csdn https://blog.csdn.net/m0_63774211 ########################################

2.2 Register in ultralytics/nn/tasks.py
1) Register C2f_Heat
from ultralytics.nn.block.vHeat import C2f_Heat

2) Modify def parse_model(d, ch, verbose=True):  # model_dict, input_channels(3)
Do not copy the code below verbatim; it is the existing parse_model code, and you only need to add C2f_Heat to the two tuples in your own tasks.py.
n = n_ = max(round(n * depth), 1) if n > 1 else n  # depth gain
if m in (
    Classify,
    Conv,
    ConvTranspose,
    GhostConv,
    Bottleneck,
    GhostBottleneck,
    SPP,
    SPPF,
    DWConv,
    Focus,
    BottleneckCSP,
    C1,
    C2,
    C2f,
    C2fAttn,
    C3,
    C3TR,
    C3Ghost,
    nn.ConvTranspose2d,
    DWConvTranspose2d,
    C3x,
    RepC3,
    C2f_Heat,
):
    c1, c2 = ch[f], args[0]
    if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
        c2 = make_divisible(min(c2, max_channels) * width, 8)
    args = [c1, c2, *args[1:]]
    if m in (BottleneckCSP, C1, C2, C2f, C2fAttn, C3, C3TR, C3Ghost, C3x, RepC3, C2f_Heat):
        args.insert(2, n)  # number of repeats
        n = 1

3) Modify class DetectionModel(BaseModel):
The original code is:
if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect
    s = 256  # 2x min stride

Change it to:
if isinstance(m, Detect): # includes all Detect subclasses like Segment, Pose, OBB, WorldDetect
    s = 640  # 2x min stride

The stride-probing dummy forward has to run at 640×640 here: freq_embed in each HeatBlock is sized to the feature-map resolutions specified in the yaml below (which assume a 640 input), so a 256×256 dummy input would produce mismatched shapes inside Heat2D.

2.3 yolov8_C2f_Heat.yaml

# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLOv8 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect
# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolov8n.yaml' will call yolov8.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024] # YOLOv8n summary: 225 layers, 3157200 parameters, 3157184 gradients, 8.9 GFLOPs
  s: [0.33, 0.50, 1024] # YOLOv8s summary: 225 layers, 11166560 parameters, 11166544 gradients, 28.8 GFLOPs
  m: [0.67, 0.75, 768] # YOLOv8m summary: 295 layers, 25902640 parameters, 25902624 gradients, 79.3 GFLOPs
  l: [1.00, 1.00, 512] # YOLOv8l summary: 365 layers, 43691520 parameters, 43691504 gradients, 165.7 GFLOPs
  x: [1.00, 1.25, 512] # YOLOv8x summary: 365 layers, 68229648 parameters, 68229632 gradients, 258.5 GFLOPs
# YOLOv8.0n backbone
backbone:
# [from, repeats, module, args]
- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
- [-1, 3, C2f_Heat, [128, 160, True]]
- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
- [-1, 6, C2f_Heat, [256, 80, True]]
- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
- [-1, 6, C2f_Heat, [512, 40, True]]
- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
- [-1, 3, C2f_Heat, [1024, 20, True]]
- [-1, 1, SPPF, [1024, 5]] # 9
# YOLOv8.0n head
head:
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 6], 1, Concat, [1]] # cat backbone P4
- [-1, 3, C2f, [512]] # 12
- [-1, 1, nn.Upsample, [None, 2, 'nearest']]
- [[-1, 4], 1, Concat, [1]] # cat backbone P3
- [-1, 3, C2f, [256]] # 15 (P3/8-small)
- [-1, 1, Conv, [256, 3, 2]]
- [[-1, 12], 1, Concat, [1]] # cat head P4
- [-1, 3, C2f, [512]] # 18 (P4/16-medium)
- [-1, 1, Conv, [512, 3, 2]]
- [[-1, 9], 1, Concat, [1]] # cat head P5
- [-1, 3, C2f, [1024]] # 21 (P5/32-large)
- [[15, 18, 21], 1, Detect, [nc]] # Detect(P3, P4, P5)
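
With the three files above in place, the modified model can be built and trained like any other YOLOv8 configuration. A minimal sketch (the yaml and dataset paths are assumptions; point them at your own files):

from ultralytics import YOLO

model = YOLO('ultralytics/cfg/models/v8/yolov8_C2f_Heat.yaml')  # build the model from the modified yaml
model.info()  # print the layer summary to confirm the C2f_Heat blocks are in the backbone
model.train(data='coco128.yaml', imgsz=640, epochs=100, batch=16)  # keep imgsz=640 so feature sizes match feat_size

Note that imgsz should stay at 640, because the feat_size values in the yaml (160/80/40/20) correspond to a 640×640 input.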