
RT-DETR Improvement Strategies [Loss Functions] | Computing IoU with Auxiliary Bounding Boxes to Improve Detection (Inner_GIoU, Inner_DIoU, Inner_CIoU, Inner_EIoU, Inner_SIoU)

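1. Background

  • Existing IoU-based bounding box regression methods mainly speed up convergence by adding new loss terms. They overlook the limitations of the IoU loss term itself and cannot adapt to different detectors and detection tasks, so they generalize poorly.
  • By analyzing the bounding box regression model, the Inner-IoU paper finds that distinguishing between regression samples and computing the loss with auxiliary bounding boxes of different scales effectively accelerates bounding box regression. For high-IoU samples, computing the loss with a smaller auxiliary bounding box speeds up convergence, while a larger auxiliary bounding box suits low-IoU samples.

This article replaces RT-DETR's default CIoU loss with Inner_IoU, Inner_GIoU, Inner_DIoU, Inner_CIoU, Inner_EIoU, or Inner_SIoU.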



2. Principle

Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box

2.1 How Inner-IoU Is Computed

  1. Define the relevant parameters:
    • The ground-truth (GT) box and the anchor box are denoted $B^{gt}$ and $B$, respectively.
    • The center point of the GT box and the inner GT box is $(x_{c}^{gt}, y_{c}^{gt})$; the center point of the anchor box and the inner anchor box is $(x_{c}, y_{c})$.
    • The width and height of the GT box are $w^{gt}$ and $h^{gt}$; the width and height of the anchor box are $w$ and $h$.
    • A scale factor $ratio$ is introduced.
  2. Compute the coordinates of the auxiliary bounding boxes:
    • $b_{l}^{gt} = x_{c}^{gt} - \frac{w^{gt} * ratio}{2}$, $b_{r}^{gt} = x_{c}^{gt} + \frac{w^{gt} * ratio}{2}$
    • $b_{t}^{gt} = y_{c}^{gt} - \frac{h^{gt} * ratio}{2}$, $b_{b}^{gt} = y_{c}^{gt} + \frac{h^{gt} * ratio}{2}$
    • $b_{l} = x_{c} - \frac{w * ratio}{2}$, $b_{r} = x_{c} + \frac{w * ratio}{2}$
    • $b_{t} = y_{c} - \frac{h * ratio}{2}$, $b_{b} = y_{c} + \frac{h * ratio}{2}$
  3. Compute the intersection over union:
    • $inter = (\min(b_{r}^{gt}, b_{r}) - \max(b_{l}^{gt}, b_{l})) * (\min(b_{b}^{gt}, b_{b}) - \max(b_{t}^{gt}, b_{t}))$
    • $union = (w^{gt} * h^{gt}) * (ratio)^{2} + (w * h) * (ratio)^{2} - inter$
    • $IoU^{inner} = \frac{inter}{union}$
  4. The Inner-IoU loss is: $L_{Inner-IoU} = 1 - IoU^{inner}$
  5. Applying Inner-IoU to existing IoU-based bounding box regression losses gives:
    • $L_{Inner-GIoU} = L_{GIoU} + IoU - IoU^{inner}$
    • $L_{Inner-DIoU} = L_{DIoU} + IoU - IoU^{inner}$
    • $L_{Inner-CIoU} = L_{CIoU} + IoU - IoU^{inner}$
    • $L_{Inner-EIoU} = L_{EIoU} + IoU - IoU^{inner}$
    • $L_{Inner-SIoU} = L_{SIoU} + IoU - IoU^{inner}$
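
The steps above can be sketched in plain Python for a single pair of boxes. This is a minimal illustration rather than the article's implementation: the function name inner_iou, the example boxes, and the clamping of negative overlaps to zero (which the formulas leave implicit) are assumptions made here.

```python
def inner_iou(box_gt, box, ratio=1.0, eps=1e-7):
    """Inner-IoU of two (xc, yc, w, h) boxes, following the formulas in Section 2.1."""
    xg, yg, wg, hg = box_gt
    x, y, w, h = box

    # Auxiliary box edges: same centers, width and height scaled by `ratio`.
    g_l, g_r = xg - wg * ratio / 2, xg + wg * ratio / 2
    g_t, g_b = yg - hg * ratio / 2, yg + hg * ratio / 2
    b_l, b_r = x - w * ratio / 2, x + w * ratio / 2
    b_t, b_b = y - h * ratio / 2, y + h * ratio / 2

    # Overlap of the auxiliary boxes; each side is clamped at 0 for disjoint boxes
    # (an assumption added here, the formulas do not spell it out).
    inter = max(0.0, min(g_r, b_r) - max(g_l, b_l)) * max(0.0, min(g_b, b_b) - max(g_t, b_t))
    union = wg * hg * ratio ** 2 + w * h * ratio ** 2 - inter + eps
    return inter / union


gt = (50.0, 50.0, 40.0, 40.0)      # made-up ground-truth box
anchor = (55.0, 50.0, 40.0, 40.0)  # made-up anchor, shifted 5 units to the right

iou = inner_iou(gt, anchor, ratio=1.0)        # ratio = 1 reduces to the plain IoU
iou_inner = inner_iou(gt, anchor, ratio=0.8)  # smaller auxiliary boxes
loss_inner_iou = 1.0 - iou_inner              # L_Inner-IoU
print(f"IoU={iou:.4f}  IoU_inner={iou_inner:.4f}  L_Inner-IoU={loss_inner_iou:.4f}")
```

With ratio=1.0 the auxiliary boxes coincide with the original ones, so the function doubles as a plain IoU reference when comparing ratios.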

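According to the paper, the scale factor ratio in the Inner-IoU loss is usually adjusted within the range [0.5, 1.5].

For high-IoU samples, the scale factor is set to a value below 1 so that a smaller auxiliary bounding box is used to compute the loss, which accelerates their regression. In the simulation experiments, for example, ratio is set to 0.8 to speed up the regression of high-IoU samples.

For low-IoU samples, the scale factor is set to a value above 1 so that a larger auxiliary bounding box is used to compute the loss, which accelerates their regression. In the simulation experiments, for example, ratio is set to 1.2 in the low-IoU regression scenario.

2.2 Advantages

  • Compared with the IoU loss, when ratio is below 1 and the auxiliary bounding box is smaller than the actual bounding box, the effective regression range is smaller than that of the IoU loss, but the absolute value of the gradient is larger than the one obtained from the IoU loss, which accelerates convergence for high-IoU samples.
  • When ratio is above 1, the larger auxiliary bounding box widens the effective regression range and improves regression for low-IoU samples.
  • A series of simulation and comparison experiments shows that the method outperforms existing methods in detection performance and generalization, and that it works well on datasets with different pixel sizes.
  • It suits general detection tasks and also performs well on tasks with very small targets, which supports the method's generalization ability.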
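
The first two points can be checked numerically on a toy case. The sketch below assumes two 2x2 boxes whose centers differ only along x, so the Inner-IoU loss has a simple closed form, and it estimates the gradient with a central finite difference. The function names and box sizes are made up for illustration.

```python
def inner_iou_loss(x_center, ratio, eps=1e-7):
    """1 - Inner-IoU for a 2x2 GT box at the origin and a 2x2 anchor centered at (x_center, 0)."""
    size = 2.0 * ratio                          # side length of both auxiliary boxes
    overlap_w = max(0.0, size - abs(x_center))  # horizontal overlap; vertical overlap is `size`
    inter = overlap_w * size
    union = 2 * size * size - inter + eps
    return 1.0 - inter / union


def grad_x(x_center, ratio, step=1e-5):
    """Central finite-difference estimate of d(loss)/d(x_center)."""
    return (inner_iou_loss(x_center + step, ratio) - inner_iou_loss(x_center - step, ratio)) / (2 * step)


# High-IoU sample: the anchor center is only 0.2 away from the GT center.
g_plain = grad_x(0.2, ratio=1.0)
g_small = grad_x(0.2, ratio=0.8)

# Low-IoU sample: the anchor center is 2.2 away, so the original boxes do not overlap.
g_disjoint_plain = grad_x(2.2, ratio=1.0)
g_disjoint_large = grad_x(2.2, ratio=1.2)

print(f"high-IoU sample: ratio=1.0 -> {g_plain:.3f}, ratio=0.8 -> {g_small:.3f}")
print(f"low-IoU sample:  ratio=1.0 -> {g_disjoint_plain:.3f}, ratio=1.2 -> {g_disjoint_large:.3f}")
```

In this setup the gradient at the 0.2 offset is larger with ratio=0.8 than with ratio=1.0, and at the 2.2 offset the plain IoU loss is flat while ratio=1.2 still yields a nonzero gradient.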

Paper: https://arxiv.org/abs/2311.02877
Source code: https://github.com/malagoutou/Inner-IoU

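3. Steps to Add

3.1 utils/metrics.py

The file to look at here is ultralytics/utils/metrics.py.

metrics.py defines the calculations behind the model's loss functions, so to add a new loss function we only need to put the code in this file.

Add the Inner-IoU code to metrics.py as follows: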

import math  # used by bbox_inner_iou; ultralytics/utils/metrics.py already imports math and torch

import torch


def get_inner_iou(box1, box2, xywh=True, eps=1e-7, ratio=0.7):
    """Return the IoU of the auxiliary boxes: box1 and box2 scaled about their centers by `ratio`."""
    if xywh:  # boxes are (center x, center y, width, height)
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
    else:  # boxes are (x1, y1, x2, y2): recover centers and sizes first
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        x1, y1, w1, h1 = (b1_x1 + b1_x2) / 2, (b1_y1 + b1_y2) / 2, b1_x2 - b1_x1, b1_y2 - b1_y1
        x2, y2, w2, h2 = (b2_x1 + b2_x2) / 2, (b2_y1 + b2_y2) / 2, b2_x2 - b2_x1, b2_y2 - b2_y1

    # Half-width and half-height of the auxiliary boxes
    w1_, h1_, w2_, h2_ = w1 * ratio / 2, h1 * ratio / 2, w2 * ratio / 2, h2 * ratio / 2
    inner_b1_x1, inner_b1_x2, inner_b1_y1, inner_b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
    inner_b2_x1, inner_b2_x2, inner_b2_y1, inner_b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_

    # Intersection area of the auxiliary boxes
    inter = (inner_b1_x2.minimum(inner_b2_x2) - inner_b1_x1.maximum(inner_b2_x1)).clamp_(0) * \
            (inner_b1_y2.minimum(inner_b2_y2) - inner_b1_y1.maximum(inner_b2_y1)).clamp_(0)

    # Union area of the auxiliary boxes
    union = w1 * h1 * ratio * ratio + w2 * h2 * ratio * ratio - inter + eps
    return inter / union
 
def bbox_inner_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, EIoU=False, SIoU=False, eps=1e-7, ratio=0.7):
    """
    Calculate the Inner-IoU of box1(1, 4) to box2(n, 4), optionally combined with a GIoU/DIoU/CIoU/EIoU/SIoU penalty.
    Args:
        box1 (torch.Tensor): A tensor representing a single bounding box with shape (1, 4).
        box2 (torch.Tensor): A tensor representing n bounding boxes with shape (n, 4).
        xywh (bool, optional): If True, input boxes are in (x, y, w, h) format. If False, input boxes are in
                               (x1, y1, x2, y2) format. Defaults to True.
        GIoU (bool, optional): If True, calculate Generalized IoU. Defaults to False.
        DIoU (bool, optional): If True, calculate Distance IoU. Defaults to False.
        CIoU (bool, optional): If True, calculate Complete IoU. Defaults to False.
        EIoU (bool, optional): If True, calculate Efficient IoU. Defaults to False.
        SIoU (bool, optional): If True, calculate Scylla IoU. Defaults to False.
        eps (float, optional): A small value to avoid division by zero. Defaults to 1e-7.
        ratio (float, optional): Scale factor of the auxiliary bounding boxes. Defaults to 0.7.
    Returns:
        (torch.Tensor): Inner-IoU, Inner-GIoU, Inner-DIoU, Inner-CIoU, Inner-EIoU, or Inner-SIoU values
                        depending on the specified flags.
    """
 
    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps
 
    inner_iou = get_inner_iou(box1, box2, xywh=xywh, ratio=ratio)

    # Intersection area
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp_(0) * \
            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp_(0)

    # Union Area
    union = w1 * h1 + w2 * h2 - inter + eps

    # IoU
    iou = inter / union
    if CIoU or DIoU or GIoU or EIoU or SIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height
        if CIoU or DIoU or EIoU or SIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return inner_iou - (rho2 / c2 + v * alpha)  # CIoU
            elif EIoU:
                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2
                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2
                cw2 = cw ** 2 + eps
                ch2 = ch ** 2 + eps
                return inner_iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)  # EIoU
            elif SIoU:
                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps
                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps
                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)
                sin_alpha_1 = torch.abs(s_cw) / sigma
                sin_alpha_2 = torch.abs(s_ch) / sigma
                threshold = pow(2, 0.5) / 2
                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)
                rho_x = (s_cw / cw) ** 2
                rho_y = (s_ch / ch) ** 2
                gamma = angle_cost - 2
                distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)
                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)
                return inner_iou - 0.5 * (distance_cost + shape_cost) + eps  # SIoU
            return inner_iou - rho2 / c2  # DIoU
        c_area = cw * ch + eps  # convex area
        return inner_iou - (c_area - union) / c_area  # GIoU https://arxiv.org/pdf/1902.09630.pdf
    return inner_iou  # IoU
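
The return values of bbox_inner_iou line up with the loss formulas in Section 2.1 once the penalty term of each variant is written out. As a sketch for the CIoU case, write $\mathcal{R}_{CIoU} = \rho^{2}/c^{2} + \alpha v$ for the penalty that the code subtracts (the symbol $\mathcal{R}$ is introduced here for illustration and does not appear in the article), and assume the caller forms the loss as 1 minus the returned value:

```latex
\begin{align}
L_{CIoU} &= 1 - IoU + \mathcal{R}_{CIoU} \\
L_{Inner-CIoU} &= L_{CIoU} + IoU - IoU^{inner} \\
               &= 1 - IoU^{inner} + \mathcal{R}_{CIoU} \\
               &= 1 - \left(IoU^{inner} - \mathcal{R}_{CIoU}\right)
\end{align}
```

The bracketed quantity is what the CIoU branch returns. The other branches follow the same pattern with their own penalty terms.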

3.2 Modify ultralytics/utils/loss.py

utils/loss.py computes the various losses.

In ultralytics/utils/loss.py, add bbox_inner_iou to the imports, then change the following line inside the BboxLoss class so that the model calls bbox_inner_iou.

3.2.1 Inner_CIoU


iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=True, CIoU=True)

3.2.2 Inner_GIoU


iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=True, GIoU=True)

3.2.3 Inner_DIoU


iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=True, DIoU=True)

3.2.4 Inner_EIoU


iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=True, EIoU=True)

3.2.5 Inner_SIoU


iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=True, SIoU=True)
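
Since the five variants differ only in which flag is passed, the choice could be driven by a name instead of editing the call each time. The helper below is hypothetical: INNER_IOU_FLAGS and inner_iou_kwargs do not exist in ultralytics, and the sketch only builds the keyword arguments for the bbox_inner_iou function from Section 3.1.

```python
# Hypothetical helper, not part of ultralytics: maps a variant name to the
# keyword flag that bbox_inner_iou (Section 3.1) expects.
INNER_IOU_FLAGS = {
    "Inner_IoU": {},
    "Inner_GIoU": {"GIoU": True},
    "Inner_DIoU": {"DIoU": True},
    "Inner_CIoU": {"CIoU": True},
    "Inner_EIoU": {"EIoU": True},
    "Inner_SIoU": {"SIoU": True},
}


def inner_iou_kwargs(variant, ratio=0.7):
    """Return the keyword arguments to pass to bbox_inner_iou for the chosen variant."""
    if variant not in INNER_IOU_FLAGS:
        raise ValueError(f"unknown variant {variant!r}; choose from {sorted(INNER_IOU_FLAGS)}")
    return {**INNER_IOU_FLAGS[variant], "ratio": ratio}


kwargs = inner_iou_kwargs("Inner_EIoU", ratio=0.8)
print(kwargs)

# Inside BboxLoss the call would then read (same arguments as in the sections above):
#   iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=True, **kwargs)
```

Keeping the variant name in one place also makes it easier to record which loss and which ratio a given training run used.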

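3.3 Modify ultralytics/utils/tal.py

tal.py contains code that applies these loss-related functions.

In ultralytics/utils/tal.py, add bbox_inner_iou to the imports, then change the following line inside the iou_calculation function so that the model calls bbox_inner_iou.

Only Inner_CIoU is shown here as an example: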

return bbox_inner_iou(gt_bboxes, pd_bboxes, xywh=False, CIoU=True).squeeze(-1).clamp_(0)

4. Screenshot of a Successful Run

[Image: screenshot of a successful run]

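5. Summary

To address the weak generalization and slow convergence of existing IoU losses across detection tasks, Inner-IoU introduces a scale factor ratio that controls the size of the auxiliary bounding boxes, and it computes the loss with auxiliary bounding boxes of different scales to accelerate bounding box regression.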