Image Feature Extraction

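Image feature extraction is the task of extracting semantically meaningful features from a given image. It has many use cases, including image similarity and image retrieval. Most computer vision models can be used for image feature extraction: remove the task-specific head (image classification, object detection, etc.) and the model returns features instead of task predictions. These features encode useful visual patterns such as edges and corners. Depending on how deep the model is, they may also capture information about the real world (for example, what a cat looks like). As a result, these outputs can be used to train new classifiers on a specific dataset.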
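Here is a minimal sketch of training a new classifier on extracted features. It assumes you already have one pooled embedding per training image. The arrays below are random placeholders, not real ViT features. The helpers `fit_centroids` and `predict` are hypothetical and not part of transformers. A nearest-centroid classifier is used only because it is the simplest model that works directly on embeddings.

```python
import numpy as np

def fit_centroids(embeddings, labels):
    """Average the embeddings of each class into one centroid per class."""
    classes = np.unique(labels)
    centroids = np.stack([embeddings[labels == c].mean(axis=0) for c in classes])
    return classes, centroids

def predict(embeddings, classes, centroids):
    """Assign each embedding to the class whose centroid is most cosine-similar."""
    e = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    c = centroids / np.linalg.norm(centroids, axis=1, keepdims=True)
    return classes[np.argmax(e @ c.T, axis=1)]

# Hypothetical placeholder data standing in for 768-dim image embeddings:
# two classes whose embeddings cluster around two random directions.
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 768))
labels = np.repeat([0, 1], 20)
train = centers[labels] + 0.1 * rng.normal(size=(40, 768))

classes, centroids = fit_centroids(train, labels)
print(predict(train[:3], classes, centroids))
```

With real data you would replace `train` with the stacked embeddings of your labeled images. A logistic-regression or small neural head trained on the same vectors is a common next step.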

In this guide, you will:

Learn to build a simple image similarity system on top of the `image-feature-extraction` pipeline.
Accomplish the same task with bare model inference.

Image Similarity Using the `image-feature-extraction` Pipeline

We have two images of cats sitting on top of fishing nets; one of them is generated.

In [ ]:

from PIL import Image
import requests

img_urls = [
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.png",
    "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/cats.jpeg",
]
# Download both images and convert them to 3-channel RGB
image_real = Image.open(requests.get(img_urls[0], stream=True).raw).convert("RGB")
image_gen = Image.open(requests.get(img_urls[1], stream=True).raw).convert("RGB")

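Let's see the pipeline in action. First, initialize the pipeline. If you don't pass a model, the pipeline is automatically initialized with google/vit-base-patch16-224. To compute similarity, set `pool=True`. This makes the pipeline return one pooled embedding per image instead of the full last hidden state.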

In [ ]:

import torch
from transformers import pipeline

# Use the GPU if one is available
DEVICE = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
# The checkpoint is selected with `model=`; pool=True returns pooled embeddings
pipe = pipeline(task="image-feature-extraction", model="google/vit-base-patch16-224", device=DEVICE, pool=True)

To run inference with `pipe`, pass both images to it as a list.

In [ ]:

outputs = pipe([image_real, image_gen])

The output contains the pooled embeddings of both images.

In [ ]:

# Get the length of a single output embedding
print(len(outputs[0][0]))
# Show the outputs
print(outputs)

# 768
# [[-0.03909236937761307, 0.43381670117378235, -0.06913255900144577,
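The printout shows that the pipeline returns plain nested Python lists. There is one entry per image, and each entry holds a single pooled vector. The sketch below stacks that into a matrix with one row per image. It uses a made-up placeholder, `fake_outputs`, with the same nesting as `outputs`, so the real variable is not overwritten.

```python
import numpy as np

# Hypothetical placeholder with the same nesting as `outputs`:
# [image][single pooled row][768 features]. The values are made up.
fake_outputs = [[[0.1] * 768], [[0.2] * 768]]

embeddings = np.array(fake_outputs)   # shape (2, 1, 768)
embeddings = embeddings[:, 0, :]      # drop the singleton row -> (2, 768)
print(embeddings.shape)               # (2, 768)
```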

To get a similarity score, we pass the two embeddings to a similarity function; here we use cosine similarity.

In [ ]:

from torch.nn.functional import cosine_similarity

similarity_score = cosine_similarity(torch.Tensor(outputs[0]),
                                     torch.Tensor(outputs[1]), dim=1)

print(similarity_score)

# tensor([0.6043])
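Cosine similarity is the dot product of two vectors divided by the product of their L2 norms: cos(a, b) = a·b / (‖a‖ ‖b‖). The result lies between -1 and 1 and ignores the vectors' lengths. Below is a NumPy sketch of the same computation. `cosine` is a hypothetical helper. The `eps` guard against division by zero is a choice made for this sketch; torch's implementation also uses a small epsilon for the same purpose.

```python
import numpy as np

def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / max(np.linalg.norm(a) * np.linalg.norm(b), eps))

print(cosine([1.0, 0.0], [1.0, 0.0]))    # 1.0, same direction
print(cosine([1.0, 0.0], [0.0, 2.0]))    # 0.0, orthogonal
print(cosine([3.0, 4.0], [-6.0, -8.0]))  # -1.0, opposite direction
```

Applied to the pipeline output as `cosine(outputs[0][0], outputs[1][0])`, it should agree with the torch score up to floating-point precision.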

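If you want the last hidden state before pooling, simply omit the `pool` parameter, since it defaults to `False`. These hidden states are useful for training new classifiers or models on top of the model's features.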

In [ ]:

pipe = pipeline(task="image-feature-extraction", model="google/vit-base-patch16-224", device=DEVICE)
output = pipe(image_real)

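Since the output is not pooled, we get the last hidden state. Its first dimension is the batch size, and the last two are the sequence length (196 image patches plus one [CLS] token) and the hidden size.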

In [ ]:

import numpy as np
print(np.array(output).shape)
# (1, 197, 768)
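The sketch below reproduces the sequence length of 197 from the image and patch sizes of this checkpoint. A 224×224 image cut into 16×16 patches gives (224 / 16)² = 196 patch tokens, and ViT prepends one [CLS] token. The sketch also shows two common ways to reduce the hidden states to one vector per image without the pooler: take the [CLS] token, or average the patch tokens. The hidden states here are a random placeholder with the same shape as `np.array(output)`, not real model output.

```python
import numpy as np

image_size, patch_size, hidden_size = 224, 16, 768
num_patches = (image_size // patch_size) ** 2   # 14 * 14 = 196
seq_len = num_patches + 1                        # +1 for the [CLS] token
print(seq_len)                                   # 197

# Random placeholder standing in for np.array(output) from the unpooled pipeline
hidden = np.random.default_rng(0).normal(size=(1, seq_len, hidden_size))

cls_embedding = hidden[:, 0, :]                  # [CLS] token, shape (1, 768)
mean_embedding = hidden[:, 1:, :].mean(axis=1)   # mean over patch tokens, shape (1, 768)
print(cls_embedding.shape, mean_embedding.shape) # (1, 768) (1, 768)
```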

Getting Features and Similarities Using `AutoModel`

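We can also use the `AutoModel` class of transformers to get the features. `AutoModel` loads any transformers model without a task-specific head (for this checkpoint, a `ViTModel`), and we can use it to get the features.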

In [ ]:

from transformers import AutoImageProcessor, AutoModel

processor = AutoImageProcessor.from_pretrained("google/vit-base-patch16-224")
model = AutoModel.from_pretrained("google/vit-base-patch16-224").to(DEVICE)

Let's write a simple function for inference. It first passes the input to the `processor`, then passes the processor's output to the `model`.

In [ ]:

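def infer(image):
    # Preprocess (resize, normalize) and move the tensors to the model's device
    inputs = processor(image, return_tensors="pt").to(DEVICE)
    outputs = model(**inputs)
    # Pooled representation, shape (batch_size, 768)
    return outputs.pooler_output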
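For ViT models in transformers, `pooler_output` comes from a small pooler layer. That layer takes the final hidden state of the first ([CLS]) token and applies a dense layer followed by tanh. Below is a NumPy sketch of that computation. The weights are random placeholders; in the real model they are the parameters of `model.pooler.dense`.

```python
import numpy as np

rng = np.random.default_rng(0)
hidden_size = 768

# Placeholder for outputs.last_hidden_state: (batch, 197 tokens, 768)
last_hidden_state = rng.normal(size=(1, 197, hidden_size))
# Placeholder dense-layer parameters (stand-ins for the pooler's weight and bias)
weight = rng.normal(scale=0.02, size=(hidden_size, hidden_size))
bias = np.zeros(hidden_size)

cls_token = last_hidden_state[:, 0]             # (1, 768)
pooled = np.tanh(cls_token @ weight.T + bias)   # same form as nn.Linear followed by tanh
print(pooled.shape)                             # (1, 768)
```

Because the pooler adds a learned projection on top of the [CLS] token, `outputs.last_hidden_state[:, 0]` is one alternative if you want the raw [CLS] embedding instead.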

We can pass the images directly to this function and get the embeddings.

In [ ]:

embed_real = infer(image_real)
embed_gen = infer(image_gen)

Let's compute the similarity of the embeddings again.

In [ ]:

from torch.nn.functional import cosine_similarity

similarity_score = cosine_similarity(embed_real, embed_gen, dim=1)
print(similarity_score)

# tensor([0.6061], device='cuda:0', grad_fn=<SumBackward1>)
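The same comparison extends to image retrieval. Embed a gallery of images once, then rank them by cosine similarity to a query embedding. Below is a minimal sketch. `top_k` is a hypothetical helper, and the gallery is made-up 3-dimensional data for illustration only. In practice, each row would be an embedding such as `infer(image)[0].detach().cpu().numpy()`.

```python
import numpy as np

def top_k(query, gallery, k=2):
    """Return indices and scores of the k gallery rows most cosine-similar to query."""
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    scores = g @ q                     # one cosine similarity per gallery row
    order = np.argsort(-scores)[:k]    # highest similarity first
    return order, scores[order]

# Made-up "embeddings" for illustration only
gallery = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.9, 0.1, 0.0],
])
query = np.array([0.8, 0.2, 0.0])

indices, scores = top_k(query, gallery)
print(indices)   # [2 0]
```

For large galleries, normalizing the gallery once and reusing it avoids recomputing the norms for every query.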
