RT-DETR Improvement Strategy [Loss Functions] | Improving Detection by Computing IoU on Auxiliary Bounding Boxes (Inner_GIoU, Inner_DIoU, Inner_CIoU, Inner_EIoU, Inner_SIoU)

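1. Background

Existing IoU-based bounding box regression methods mainly speed up convergence by adding new loss terms. They overlook the limitations of the IoU term itself, and they cannot adapt themselves to different detectors and detection tasks, so they generalize poorly.
By analyzing the bounding box regression process, the Inner-IoU paper finds that distinguishing between regression samples and computing the loss with auxiliary bounding boxes of different scales effectively accelerates regression. For high-IoU samples, a smaller auxiliary box speeds up convergence, while a larger auxiliary box suits low-IoU samples.

This article replaces RT-DETR's default CIoU loss with Inner-IoU, Inner-GIoU, Inner-DIoU, Inner-CIoU, Inner-EIoU and Inner-SIoU.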

2. Principle

Inner-IoU: More Effective Intersection over Union Loss with Auxiliary Bounding Box

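2.1 How Inner-IoU is computed

Definitions:
- The ground-truth (GT) box and the anchor are denoted $B^{gt}$ and $B$.
- The GT box and the inner GT box share the center $(x_{c}^{gt}, y_{c}^{gt})$; the anchor and the inner anchor share the center $(x_{c}, y_{c})$.
- The width and height of the GT box are $w^{gt}$ and $h^{gt}$; those of the anchor are $w$ and $h$.
- A scale factor $ratio$ is introduced.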

The coordinates of the auxiliary bounding boxes are computed as:
- $b_{l}^{gt} = x_{c}^{gt} - \frac{w^{gt} \cdot ratio}{2}$, $b_{r}^{gt} = x_{c}^{gt} + \frac{w^{gt} \cdot ratio}{2}$
- $b_{t}^{gt} = y_{c}^{gt} - \frac{h^{gt} \cdot ratio}{2}$, $b_{b}^{gt} = y_{c}^{gt} + \frac{h^{gt} \cdot ratio}{2}$
- $b_{l} = x_{c} - \frac{w \cdot ratio}{2}$, $b_{r} = x_{c} + \frac{w \cdot ratio}{2}$
- $b_{t} = y_{c} - \frac{h \cdot ratio}{2}$, $b_{b} = y_{c} + \frac{h \cdot ratio}{2}$
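These four formulas simply scale each box about its own center. A minimal plain-Python sketch for a single box given as $(x_c, y_c, w, h)$; the function name `auxiliary_box` is illustrative and not part of the Inner-IoU code:

```python
def auxiliary_box(xc, yc, w, h, ratio):
    """Return (b_l, b_t, b_r, b_b) of the auxiliary box that shares the center (xc, yc)."""
    half_w, half_h = w * ratio / 2, h * ratio / 2  # half of the scaled width and height
    return (xc - half_w, yc - half_h, xc + half_w, yc + half_h)


# A 20x10 box centered at (50, 50)
print(auxiliary_box(50, 50, 20, 10, ratio=0.8))  # shrunk to 16x8 around the same center
print(auxiliary_box(50, 50, 20, 10, ratio=1.0))  # identical to the original box
```

With `ratio=0.8` the box shrinks to 16×8 around (50, 50); with `ratio=1.0` the auxiliary box is the box itself.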
Compute the intersection, the union and the Inner-IoU:
- $inter = (\min(b_{r}^{gt}, b_{r}) - \max(b_{l}^{gt}, b_{l})) \cdot (\min(b_{b}^{gt}, b_{b}) - \max(b_{t}^{gt}, b_{t}))$, where each factor is clamped at 0 when the auxiliary boxes do not overlap
- $union = (w^{gt} \cdot h^{gt}) \cdot ratio^{2} + (w \cdot h) \cdot ratio^{2} - inter$
- $IoU^{inner} = \frac{inter}{union}$
The Inner-IoU loss is: $L_{Inner\text{-}IoU} = 1 - IoU^{inner}$
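A worked example of these formulas as a plain-Python sketch (the helper `inner_iou` and the two boxes are illustrative assumptions). With `ratio=1.0` the auxiliary boxes coincide with the originals, so the result is the ordinary IoU; 0.7 is the default `ratio` used by the code in section 3.1.

```python
def inner_iou(gt, pred, ratio):
    """IoU^inner of two (xc, yc, w, h) boxes, written to mirror the formulas above."""
    (xg, yg, wg, hg), (x, y, w, h) = gt, pred
    # Auxiliary box edges b_l, b_r, b_t, b_b for the GT box and for the anchor
    bl_g, br_g = xg - wg * ratio / 2, xg + wg * ratio / 2
    bt_g, bb_g = yg - hg * ratio / 2, yg + hg * ratio / 2
    bl, br = x - w * ratio / 2, x + w * ratio / 2
    bt, bb = y - h * ratio / 2, y + h * ratio / 2
    # inter: each side clamped at 0 so non-overlapping auxiliary boxes give 0
    inter = max(0.0, min(br_g, br) - max(bl_g, bl)) * max(0.0, min(bb_g, bb) - max(bt_g, bt))
    # union: each box area is scaled by ratio^2
    union = wg * hg * ratio ** 2 + w * h * ratio ** 2 - inter
    return inter / union


gt, pred = (50, 50, 20, 10), (54, 51, 20, 10)
for ratio in (1.0, 0.7):
    iou_inner = inner_iou(gt, pred, ratio)
    print(f"ratio={ratio}: IoU_inner={iou_inner:.4f}, L_Inner-IoU={1 - iou_inner:.4f}")
```

For these boxes, $IoU^{inner}$ is 0.5625 at ratio 1.0 (the ordinary IoU) and $15/34 \approx 0.4412$ at ratio 0.7.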
Applying Inner-IoU to existing IoU-based bounding box regression losses gives the following, where $IoU$ is the ordinary IoU of the original boxes:
- $L_{Inner\text{-}GIoU} = L_{GIoU} + IoU - IoU^{inner}$
- $L_{Inner\text{-}DIoU} = L_{DIoU} + IoU - IoU^{inner}$
- $L_{Inner\text{-}CIoU} = L_{CIoU} + IoU - IoU^{inner}$
- $L_{Inner\text{-}EIoU} = L_{EIoU} + IoU - IoU^{inner}$
- $L_{Inner\text{-}SIoU} = L_{SIoU} + IoU - IoU^{inner}$
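All five combinations share one algebraic shape. Writing a base loss as $L_X = 1 - IoU + R_X$, where $R_X$ is that loss's penalty term (for example $R_{DIoU} = \rho^2(b, b^{gt}) / c^2$, with $\rho$ the distance between the box centers and $c$ the diagonal of the smallest enclosing box), the Inner variant simplifies to:

$$
L_{Inner\text{-}X} = L_X + IoU - IoU^{inner} = (1 - IoU + R_X) + IoU - IoU^{inner} = 1 - \left(IoU^{inner} - R_X\right)
$$

So an implementation can return $IoU^{inner} - R_X$ and let the training loss take one minus that value; the penalty $R_X$ is still evaluated on the original boxes.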


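According to the paper, the scale factor ratio in the Inner-IoU loss is usually tuned within [0.5, 1.5].

For high-IoU samples, ratio is set below 1 so that the loss is computed with a smaller auxiliary box, which accelerates their regression. In the paper's simulation experiment on high-IoU samples, ratio is set to 0.8.

For low-IoU samples, ratio is set above 1 so that the loss is computed with a larger auxiliary box, which accelerates their regression. In the simulation experiment on low-IoU regression samples, ratio is set to 1.2.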
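The two simulation settings can be compared numerically. A plain-Python sketch; the helper `inner_iou` and both box pairs are illustrative assumptions, not data from the paper's experiments:

```python
def inner_iou(gt, pred, ratio):
    """IoU^inner of two (xc, yc, w, h) boxes; ratio=1.0 reduces to the ordinary IoU."""
    def edges(xc, yc, w, h):
        hw, hh = w * ratio / 2, h * ratio / 2  # half-size of the auxiliary box
        return xc - hw, yc - hh, xc + hw, yc + hh

    l1, t1, r1, b1 = edges(*gt)
    l2, t2, r2, b2 = edges(*pred)
    inter = max(0.0, min(r1, r2) - max(l1, l2)) * max(0.0, min(b1, b2) - max(t1, t2))
    union = (gt[2] * gt[3] + pred[2] * pred[3]) * ratio ** 2 - inter
    return inter / union


high = ((50, 50, 20, 20), (52, 50, 20, 20))  # heavily overlapping pair
low = ((50, 50, 10, 10), (61, 50, 10, 10))   # pair that does not overlap at all
print("high-IoU pair:", inner_iou(*high, 1.0), "->", inner_iou(*high, 0.8))
print("low-IoU pair: ", inner_iou(*low, 1.0), "->", inner_iou(*low, 1.2))
```

For the well-overlapping pair, shrinking the boxes (0.8) lowers $IoU^{inner}$ from about 0.818 to about 0.778. For the non-overlapping pair the ordinary IoU is 0, so the IoU term gives no gradient; enlarging the boxes (1.2) makes them overlap and yields $IoU^{inner} = 12/276 \approx 0.043$.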

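2.2 Advantages

Compared with the IoU loss, when ratio < 1 the auxiliary box is smaller than the actual box: the effective regression range is smaller than that of the IoU loss, but the absolute value of the gradient is larger than that of the IoU loss, which accelerates convergence for high-IoU samples.
When ratio > 1, the larger auxiliary box enlarges the effective regression range, which improves regression for low-IoU samples.
In the paper's simulation and comparison experiments, the method outperforms existing IoU-based losses in detection performance and generalization, and performs well on datasets with different pixel sizes.
It is reported to work well not only for general detection tasks but also for tasks with very small targets, which the authors take as evidence of its generalization.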
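Both effects can be checked in closed form for a simple case chosen for illustration: two boxes of equal size $w \times h$, aligned vertically, with centers $d \ge 0$ apart along x. With $a = ratio \cdot w$, while the auxiliary boxes overlap ($d < a$) we get $inter = (a - d) \cdot ratio \cdot h$ and $union = ratio \cdot h \cdot (a + d)$, so $IoU^{inner} = (a - d)/(a + d)$ and its derivative with respect to $d$ is $-2a/(a + d)^2$. The sketch below evaluates both; the function names are hypothetical:

```python
def shifted_inner_iou(w, d, ratio):
    """IoU^inner of two equal-size boxes whose centers are d >= 0 apart along x only."""
    a = ratio * w  # width of each auxiliary box
    return max(0.0, a - d) / (a + d)  # equals (a - d) / (a + d) while the boxes overlap


def shifted_inner_iou_grad(w, d, ratio):
    """Derivative of shifted_inner_iou with respect to d (0 once the boxes stop overlapping)."""
    a = ratio * w
    return -2.0 * a / (a + d) ** 2 if d < a else 0.0


w, d = 20.0, 2.0
for ratio in (0.8, 1.0, 1.2):
    print(f"ratio={ratio}: non-zero for d < {ratio * w:g}, "
          f"IoU_inner={shifted_inner_iou(w, d, ratio):.4f}, "
          f"grad={shifted_inner_iou_grad(w, d, ratio):.4f}")
```

At $w = 20$, $d = 2$ the gradient magnitude is about 0.099 for ratio 0.8 versus about 0.083 for ratio 1.0, but ratio 0.8 only gives a non-zero value while $d < 16$; ratio 1.2 extends that range to $d < 24$.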

Paper: https://arxiv.org/abs/2311.02877
Code: https://github.com/malagoutou/Inner-IoU

3. Integration steps

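3.1 Modify ultralytics/utils/metrics.py

The file to edit is ultralytics/utils/metrics.py.

metrics.py defines the box metrics (such as bbox_iou) that the loss functions call, so the new Inner-IoU functions are added to this file and then imported where they are needed.

Add the following Inner-IoU code to metrics.py: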

```python
import math  # already imported at the top of ultralytics/utils/metrics.py

import torch  # already imported at the top of ultralytics/utils/metrics.py


def get_inner_iou(box1, box2, xywh=True, eps=1e-7, ratio=0.7):
    """Inner-IoU: IoU of auxiliary boxes that keep each box's center and scale its width/height by `ratio`."""
    if xywh:  # boxes given as (x_center, y_center, w, h)
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
    else:  # boxes given as (x1, y1, x2, y2): convert to center and size
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        x1, y1, w1, h1 = (b1_x1 + b1_x2) / 2, (b1_y1 + b1_y2) / 2, b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
        x2, y2, w2, h2 = (b2_x1 + b2_x2) / 2, (b2_y1 + b2_y2) / 2, b2_x2 - b2_x1, b2_y2 - b2_y1 + eps

    # Auxiliary (inner) boxes: same centers, half-sizes scaled by ratio
    hw1, hh1, hw2, hh2 = w1 * ratio / 2, h1 * ratio / 2, w2 * ratio / 2, h2 * ratio / 2
    inner_b1_x1, inner_b1_x2, inner_b1_y1, inner_b1_y2 = x1 - hw1, x1 + hw1, y1 - hh1, y1 + hh1
    inner_b2_x1, inner_b2_x2, inner_b2_y1, inner_b2_y2 = x2 - hw2, x2 + hw2, y2 - hh2, y2 + hh2

    # Intersection area of the auxiliary boxes (not of the original boxes)
    inter = (inner_b1_x2.minimum(inner_b2_x2) - inner_b1_x1.maximum(inner_b2_x1)).clamp_(0) * \
            (inner_b1_y2.minimum(inner_b2_y2) - inner_b1_y1.maximum(inner_b2_y1)).clamp_(0)

    # Union area of the auxiliary boxes: each box area is scaled by ratio^2
    union = w1 * h1 * ratio * ratio + w2 * h2 * ratio * ratio - inter + eps
    return inter / union
```
 
```python
def bbox_inner_iou(box1, box2, xywh=True, GIoU=False, DIoU=False, CIoU=False, EIoU=False, SIoU=False, eps=1e-7, ratio=0.7):
    """
    Calculate Inner-IoU, optionally combined with a GIoU/DIoU/CIoU/EIoU/SIoU penalty, of box1(1, 4) to box2(n, 4).

    The penalty terms are computed on the original boxes, while the IoU term is replaced by the Inner-IoU
    of the auxiliary boxes, so that 1 - result matches L_X + IoU - IoU^inner from section 2.1.

    Args:
        box1 (torch.Tensor): A tensor representing a single bounding box with shape (1, 4).
        box2 (torch.Tensor): A tensor representing n bounding boxes with shape (n, 4).
        xywh (bool, optional): If True, input boxes are in (x, y, w, h) format. If False, input boxes are in
                               (x1, y1, x2, y2) format. Defaults to True.
        GIoU (bool, optional): If True, calculate Inner-GIoU. Defaults to False.
        DIoU (bool, optional): If True, calculate Inner-DIoU. Defaults to False.
        CIoU (bool, optional): If True, calculate Inner-CIoU. Defaults to False.
        EIoU (bool, optional): If True, calculate Inner-EIoU. Defaults to False.
        SIoU (bool, optional): If True, calculate Inner-SIoU. Defaults to False.
        eps (float, optional): A small value to avoid division by zero. Defaults to 1e-7.
        ratio (float, optional): Scale factor of the auxiliary boxes. Defaults to 0.7.
    Returns:
        (torch.Tensor): Inner-IoU, Inner-GIoU, Inner-DIoU, Inner-CIoU, Inner-EIoU or Inner-SIoU values,
                        depending on the specified flags.
    """

    # Get the coordinates of bounding boxes
    if xywh:  # transform from xywh to xyxy
        (x1, y1, w1, h1), (x2, y2, w2, h2) = box1.chunk(4, -1), box2.chunk(4, -1)
        w1_, h1_, w2_, h2_ = w1 / 2, h1 / 2, w2 / 2, h2 / 2
        b1_x1, b1_x2, b1_y1, b1_y2 = x1 - w1_, x1 + w1_, y1 - h1_, y1 + h1_
        b2_x1, b2_x2, b2_y1, b2_y2 = x2 - w2_, x2 + w2_, y2 - h2_, y2 + h2_
    else:  # x1, y1, x2, y2 = box1
        b1_x1, b1_y1, b1_x2, b1_y2 = box1.chunk(4, -1)
        b2_x1, b2_y1, b2_x2, b2_y2 = box2.chunk(4, -1)
        w1, h1 = b1_x2 - b1_x1, b1_y2 - b1_y1 + eps
        w2, h2 = b2_x2 - b2_x1, b2_y2 - b2_y1 + eps

    # IoU of the auxiliary boxes; replaces the plain IoU term in every returned value
    inner_iou = get_inner_iou(box1, box2, xywh=xywh, eps=eps, ratio=ratio)

    # Intersection area of the original boxes
    inter = (b1_x2.minimum(b2_x2) - b1_x1.maximum(b2_x1)).clamp_(0) * \
            (b1_y2.minimum(b2_y2) - b1_y1.maximum(b2_y1)).clamp_(0)

    # Union area of the original boxes
    union = w1 * h1 + w2 * h2 - inter + eps

    # Plain IoU (still needed for the CIoU trade-off weight alpha)
    iou = inter / union
    if CIoU or DIoU or GIoU or EIoU or SIoU:
        cw = b1_x2.maximum(b2_x2) - b1_x1.minimum(b2_x1)  # convex (smallest enclosing box) width
        ch = b1_y2.maximum(b2_y2) - b1_y1.minimum(b2_y1)  # convex height
        if CIoU or DIoU or EIoU or SIoU:  # Distance or Complete IoU https://arxiv.org/abs/1911.08287v1
            c2 = cw ** 2 + ch ** 2 + eps  # convex diagonal squared
            rho2 = ((b2_x1 + b2_x2 - b1_x1 - b1_x2) ** 2 + (b2_y1 + b2_y2 - b1_y1 - b1_y2) ** 2) / 4  # center dist ** 2
            if CIoU:  # https://github.com/Zzh-tju/DIoU-SSD-pytorch/blob/master/utils/box/box_utils.py#L47
                v = (4 / math.pi ** 2) * (torch.atan(w2 / h2) - torch.atan(w1 / h1)).pow(2)
                with torch.no_grad():
                    alpha = v / (v - iou + (1 + eps))
                return inner_iou - (rho2 / c2 + v * alpha)  # Inner-CIoU
            elif EIoU:
                rho_w2 = ((b2_x2 - b2_x1) - (b1_x2 - b1_x1)) ** 2
                rho_h2 = ((b2_y2 - b2_y1) - (b1_y2 - b1_y1)) ** 2
                cw2 = cw ** 2 + eps
                ch2 = ch ** 2 + eps
                return inner_iou - (rho2 / c2 + rho_w2 / cw2 + rho_h2 / ch2)  # Inner-EIoU
            elif SIoU:
                # SIoU Loss https://arxiv.org/pdf/2205.12740.pdf
                s_cw = (b2_x1 + b2_x2 - b1_x1 - b1_x2) * 0.5 + eps
                s_ch = (b2_y1 + b2_y2 - b1_y1 - b1_y2) * 0.5 + eps
                sigma = torch.pow(s_cw ** 2 + s_ch ** 2, 0.5)
                sin_alpha_1 = torch.abs(s_cw) / sigma
                sin_alpha_2 = torch.abs(s_ch) / sigma
                threshold = pow(2, 0.5) / 2
                sin_alpha = torch.where(sin_alpha_1 > threshold, sin_alpha_2, sin_alpha_1)
                angle_cost = torch.cos(torch.arcsin(sin_alpha) * 2 - math.pi / 2)
                rho_x = (s_cw / cw) ** 2
                rho_y = (s_ch / ch) ** 2
                gamma = angle_cost - 2
                distance_cost = 2 - torch.exp(gamma * rho_x) - torch.exp(gamma * rho_y)
                omiga_w = torch.abs(w1 - w2) / torch.max(w1, w2)
                omiga_h = torch.abs(h1 - h2) / torch.max(h1, h2)
                shape_cost = torch.pow(1 - torch.exp(-1 * omiga_w), 4) + torch.pow(1 - torch.exp(-1 * omiga_h), 4)
                return inner_iou - 0.5 * (distance_cost + shape_cost) + eps  # Inner-SIoU
            return inner_iou - rho2 / c2  # Inner-DIoU
        c_area = cw * ch + eps  # convex area
        return inner_iou - (c_area - union) / c_area  # Inner-GIoU https://arxiv.org/pdf/1902.09630.pdf
    return inner_iou  # Inner-IoU
```


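3.2 Modify ultralytics/utils/loss.py

ultralytics/utils/loss.py computes the various training losses.

Add bbox_inner_iou to the import from the metrics module at the top of ultralytics/utils/loss.py (next to bbox_iou), then, in the forward method of the BboxLoss class, replace the line that computes iou with one of the following so that the model uses the bbox_inner_iou loss.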
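For context: in the ultralytics versions this guide is based on, BboxLoss turns the returned value into the box loss as `loss_iou = ((1.0 - iou) * weight).sum() / target_scores_sum`, where `weight` is the summed target score of each foreground box. Swapping `bbox_iou` for `bbox_inner_iou` therefore turns each per-box term into one minus the Inner variant's value. A plain-Python sketch of the same weighted average (the helper name is hypothetical):

```python
def weighted_iou_loss(ious, weights, target_scores_sum):
    """Plain-list version of ((1.0 - iou) * weight).sum() / target_scores_sum."""
    return sum((1.0 - iou) * w for iou, w in zip(ious, weights)) / target_scores_sum


# Penalised variants (GIoU/DIoU/CIoU/EIoU/SIoU) can return values below 0, so a term may exceed 1.
print(weighted_iou_loss([0.9, 0.5], [1.0, 0.5], target_scores_sum=1.5))
```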


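3.2.1 Inner_CIoU

```python
# pred_bboxes and target_bboxes are (x1, y1, x2, y2) in BboxLoss, so xywh=False as in the stock bbox_iou call
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, CIoU=True)
```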


3.2.2 Inner_GIoU

```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, GIoU=True)
```

3.2.3 Inner_DIoU

```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, DIoU=True)
```

3.2.4 Inner_EIoU

```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, EIoU=True)
```

3.2.5 Inner_SIoU

```python
iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], xywh=False, SIoU=True)
```
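The calls above leave `ratio` at the default of 0.7 from the `bbox_inner_iou` signature. To try the paper's simulation values (0.8 for high-IoU, 1.2 for low-IoU samples), `ratio=` can be passed explicitly. One way to avoid hand-editing the flag for each experiment is a small helper like the one below; `inner_iou_kwargs` and `VARIANT_FLAGS` are hypothetical names, and `xywh=False` assumes the xyxy boxes used by the stock BboxLoss:

```python
# Hypothetical mapping from a short variant name to the bbox_inner_iou flag it enables
VARIANT_FLAGS = {"giou": "GIoU", "diou": "DIoU", "ciou": "CIoU", "eiou": "EIoU", "siou": "SIoU"}


def inner_iou_kwargs(variant, ratio=0.7):
    """Build keyword arguments for bbox_inner_iou; 'iou' selects plain Inner-IoU."""
    key = variant.lower()
    if key == "iou":
        flags = {}
    elif key in VARIANT_FLAGS:
        flags = {VARIANT_FLAGS[key]: True}
    else:
        raise ValueError(f"unknown Inner-IoU variant: {variant!r}")
    # xywh=False assumes xyxy boxes, as in the stock BboxLoss call
    return {"xywh": False, "ratio": ratio, **flags}


print(inner_iou_kwargs("SIoU", ratio=0.8))
```

Inside BboxLoss this would be used as `iou = bbox_inner_iou(pred_bboxes[fg_mask], target_bboxes[fg_mask], **inner_iou_kwargs("siou", ratio=0.8))`.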

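3.3 Modify ultralytics/utils/tal.py

tal.py implements task-aligned label assignment (TaskAlignedAssigner), which uses the IoU between ground-truth and predicted boxes to select positive samples.

Add bbox_inner_iou to the metrics import in ultralytics/utils/tal.py, then change the return statement of the iou_calculation method so that the assigner also uses bbox_inner_iou.

Inner_CIoU is shown here as an example; the other variants are selected the same way by changing the flag: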


```python
return bbox_inner_iou(gt_bboxes, pd_bboxes, xywh=False, CIoU=True).squeeze(-1).clamp_(0)
```
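Why the trailing `.clamp_(0)` matters: in ultralytics' TaskAlignedAssigner the overlaps feed the alignment metric `align_metric = bbox_scores.pow(self.alpha) * overlaps.pow(self.beta)`. An Inner-CIoU value can be negative because of its penalty term, and raising a negative number to an even power such as 6 gives a positive result (and a non-integer power gives NaN), so without clamping a badly placed box could still score. A plain-Python sketch; the helper name and the example values `alpha=0.5`, `beta=6.0` are assumptions, so check what your version passes to the assigner:

```python
def align_metric(score, iou, alpha=0.5, beta=6.0, clamp=True):
    """Task-aligned metric score**alpha * iou**beta for a single prediction (illustrative values)."""
    if clamp:
        iou = max(iou, 0.0)  # plays the role of .clamp_(0) in iou_calculation
    return score ** alpha * iou ** beta


print(align_metric(0.9, -0.5, clamp=False))  # negative overlap still scores > 0
print(align_metric(0.9, -0.5))               # clamped: 0.0
```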


4. Screenshot of a successful run


5. Summary

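To address the weak generalization across detection tasks and the slow convergence of existing IoU losses, Inner-IoU introduces a scale factor, ratio, that controls the size of the auxiliary bounding boxes and computes the loss on auxiliary boxes of different scales, which accelerates bounding box regression.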
