Agents(代理) and tools(工具)¶

什么是 Agent (代理) ?¶

大型语言模型（LLMs）可以处理各种各样的任务，但它们通常难以处理逻辑、计算和搜索等基本任务。当在提示他们在表现不佳的领域进行生成时，往往无法直接生成我们期望的答案。

克服这个缺点的方法之一是创建一个Agent。Agent是一个使用LLM作为引擎的系统，它可以访问称为Tools的功能。这些Tools是用于执行任务的功能，同时也包含使 Agent 能够正确使用它们的描述信息。

Agents的程序可编程为：

设计一系列动作(Actions)和工具(Tools)并一次性运行它们，例如CodeAgent。
逐个计划并执行操作或工具，等待每个操作的结果，根据结果和计划再启动下一个操作，例如ReactJsonAgent。

Agent 的类型¶

CodeAgent¶

该 Agent 有一个确定的规划和执行步骤，会生成 Python 代码来一次性执行所有的操作。它能够为其工具处理不同的输入和输出类型，因此它是多模态任务的推荐选择。

ReactJsonAgent¶

这是解决推理型任务的首选 Agent ，因为ReAct框架（Yao等人，2022年）使 Agent 在执行后会观察之前的结果再决定下一步，使得每一步都建立在有效思考的基础上。

我们实现了两个版本的ReactJsonAgent：

ReactJsonAgent 在其输出中将工具调用生成为 JSON。
ReactCodeAgent 是一种新型的 ReactJsonAgent，它能够通过工具调用生成代码的二进制表示形式(Blob)，这对于具有强大编码能力的 LLMs 非常有效。

阅读将开源 LLMs 作为 LangChain Agents 博客文章，了解更多关于 ReAct Agent 的信息。

Agent_ManimCE

ReAct

下面展示的是 ReactCodeAgent 如何解决问题的过程和输出内容：

In [ ]:

agent.run(
    "How many more blocks (also denoted as layers) in BERT base encoder than the encoder from the architecture proposed in Attention is All You Need?",
)

如何构建 Agent？¶

要初始化 Agent ，你需要以下参数：

LLM（大型语言模型）：作为 Agent 的核心动力源，LLM负责生成 Agent 所需的输出。 Agent 本身并非纯粹的LLM，而是依托LLM进行运作的程序。
提示符：这是提供给LLM引擎的指令，用以指导其生成特定的输出。
工具箱：包含 Agent 可调用的各种工具， Agent 将从中选择合适的工具来执行任务。
解析器：用于从LLM的输出中提取信息，确定需要调用哪些工具以及相应的参数。

在初始化过程中，工具的属性会被用来生成工具描述，这些描述随后会被嵌入到 Agent 的提示符中，以确保 Agent 清楚了解可用的工具及其适用场景。

首先，需要安装 Agent 附加程序，它将自动安装所有的默认依赖项，为后续的 Agent 系统配置奠定基础。

In [ ]:

pip install transformers[agents]

通过定义一个接受消息列表并返回文本的 llm_engine 方法来构建 LLM 引擎。此调用还需要接受一个停止参数，该参数提示模型何时停止生成。

In [ ]:

from huggingface_hub import login, InferenceClient

login("<YOUR_HUGGINGFACEHUB_API_TOKEN>")

client = InferenceClient(model="meta-llama/Meta-Llama-3-70B-Instruct")

def llm_engine(messages, stop_sequences=["Task"]) -> str:
    response = client.chat_completion(messages, stop=stop_sequences, max_tokens=1000)
    answer = response.choices[0].message.content
    return answer

你可以使用 llm_engine 方法，但需要遵守以下几点：

输入格式：输入消息必须是列表，里面包含字典，字典的键和值都是字符串(List[Dict[str，str]])。
输出格式：返回的结果是一个字符串(str)。
停止生成：当遇到指定的停止序列(stop_sequences)时，输出会停止。

此外，llm_engine 还支持一个叫“语法”的参数。如果你在创建 Agent 时设置了语法，这个参数会一起传给 llm_engine，确保生成的输出格式正确。

你还需要提供一个叫 tools 的参数，它接受一个工具列表，这个列表可以是空的。如果你想要添加一些默认工具，可以设置 add_base_tools=True。

现在，你可以创建一个 Agent 并运行它（比如 CodeAgent）。为了方便使用，我们还提供了两种引擎：

TransformersEngine：可以创建一个具有预初始 pipeline 的TransformersEngine，在你的电脑上使用 transformers 库进行推理。
HfApiEngine：HfApiEngine 是一个专门设计的工具或类，它封装了 huggingface_hub.InferenceClient，简化了与 Hugging Face API 的交互过程。由于 Agent 行为通常需要更强大的模型，目前很难在本地运行，通过 Hugging Face 的在线服务，可以直接使用强大的预训练模型，如 Llama-3.1-70B-Instruct，这些模型在本地运行可能需要很高的计算资源。

In [ ]:

from transformers import CodeAgent, HfApiEngine

llm_engine = HfApiEngine(model="meta-llama/Meta-Llama-3-70B-Instruct")
agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)

agent.run(
    "Could you translate this sentence from French, say it out loud and return the audio.",
    sentence="Où est la boulangerie la plus proche?",
)

你可以不定义参数llm_engine，默认情况下将创建HfApiEngine。

In [ ]:

from transformers import CodeAgent

agent = CodeAgent(tools=[], add_base_tools=True)

agent.run(
    "Could you translate this sentence from French, say it out loud and give me the audio.",
    sentence="Où est la boulangerie la plus proche?",
)

请注意，我们这里用了一个特别的参数叫 sentence。这个参数的作用是让你可以把需要处理的文本直接传给模型。

不仅如此，这个 sentence 参数还可以用来告诉模型去使用某个本地或远程文件的路径。这样一来，你可以更灵活地处理各种文件，非常方便！

文本处理：直接传文本给模型。
文件处理：传文件路径给模型。

In [ ]:

from transformers import ReactCodeAgent

agent = ReactCodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)

agent.run("Why does Mike not know many people in New York?", audio="../../resources/record/transformers_recording.mp3")

提示符和输出解析器是自动定义的，但是你可以通过输出 Agent 中的system_prompt_template来查看它们。

In [ ]:

print(agent.system_prompt_template)

清楚地描述你要执行的任务非常重要。每次调用 run() 方法时，它都是独立的操作。因为 Agent 是由大型语言模型（LLM）支持的，所以提示中的微小变化可能会导致结果大不相同。

此外，你还可以连续运行 Agent 来处理不同的任务。每次运行时， Agent 的 agent.task 和 agent.logs 属性都会重新初始化，确保每个任务都是从干净的状态开始。

代码执行¶

Python 解释器会在你提供的工具和输入上执行代码。这样做是安全的，因为解释器只能调用你提供的工具（比如 Hugging Face 的工具）和打印函数。这意味着你能控制它能做什么，避免了潜在的风险。

另外，Python 解释器默认不允许导入不在安全列表中的模块，所以大部分常见的攻击手段都不会构成威胁。

不过，如果你需要在 ReactCodeAgent 或 CodeAgent 中使用额外的模块，你可以在初始化时，通过 additional_authorized_imports 参数来授权。这个参数接受一个字符串列表，列出你允许导入的额外模块。

In [ ]:

from transformers import ReactCodeAgent

agent = ReactCodeAgent(tools=[], additional_authorized_imports=['requests', 'bs4'])
agent.run("Could you get me the title of the page at url 'https://huggingface.co/blog'?")

执行将在任何试图执行非法操作的代码处停止，或者 Agent 生成的代码存在常规的Python错误。

注意！LLM可以生成任意代码，然后执行，所以请不要添加任何不安全的导入！

系统提示¶

Agent ，或者说驱动 Agent 的LLM，基于系统提示生成输出。系统提示可以根据预期的任务进行自定义和定制。例如，查看ReactCodeAgent的系统提示（以下版本略有简化）。

``` You will be given a task to solve as best you can. You have access to the following tools: <<tool_descriptions>> To solve the task, you must plan forward to proceed in a series of steps, in a cycle of 'Thought:', 'Code:', and 'Observation:' sequences. At each step, in the 'Thought:' sequence, you should first explain your reasoning towards solving the task, then the tools that you want to use. Then in the 'Code:' sequence, you shold write the code in simple Python. The code sequence must end with '/End code' sequence. During each intermediate step, you can use 'print()' to save whatever important information you will then need. These print outputs will then be available in the 'Observation:' field, for using this information as input for the next step. In the end you have to return a final answer using the `final_answer` tool. Here are a few examples using notional tools: --- {examples} Above example were using notional tools that might not exist for you. You only have acces to those tools: <<tool_names>> You also can perform computations in the python code you generate. Always provide a 'Thought:' and a 'Code:\n```py' sequence ending with '```<end_code>' sequence. You MUST provide at least the 'Code:' sequence to move forward. Remember to not perform too many operations in a single code block! You should split the task into intermediate code blocks. Print results at the end of each step to save the intermediate results. Then use final_answer() to return the final result. Remember to make sure that variables you use are all defined. Now Begin! ```

系统提示包含以下几个部分：

介绍：说明 Agent 应该怎么工作和使用工具。
工具描述：所有工具的描述由 <<Tool_Exclusions>> 标记定义。在运行时，这个标记会被用户定义或选择的工具动态替换。
- 工具描述信息包括工具的属性、名称、描述、输入和输出类型。
- 这些描述使用一个简单的 Jinja 2 模板生成，你可以根据需要优化这个模板。
预期输出格式：说明期望的输出格式是什么样的。

你可以通过以下方式改进系统提示：

添加更多关于输出格式的说明，让 Agent 更清楚应该怎么生成结果。
为了获得最大的灵活性，你可以自定义整个系统提示模板。只需将你的自定义提示作为参数传递给 system_prompt 参数，就可以覆盖默认的提示模板。

In [ ]:

from transformers import ReactJsonAgent
from transformers.agents import PythonInterpreterTool

agent = ReactJsonAgent(tools=[PythonInterpreterTool()], system_prompt="{your_custom_prompt}")

请确保在模板的某个地方定义一个叫 <<tool_dations>> 的字符串。这样做的目的是让 Agent 知道有哪些工具可以使用。

检查 Agent 运行¶

运行 Agent 后，你可以通过以下属性来检查发生了什么：

logs 属性：
- 这个属性存储了 Agent 的详细日志。每次 Agent 执行一步操作时，所有相关信息都会被记录在一个字典中，并添加到 agent.logs 里。
agent.write_inner_memory_from_logs() 方法：
- 这个方法会将 Agent 的日志转换成内部的内存格式，供大型语言模型（LLM）查看，就像一系列聊天消息一样。
- 它会遍历日志的每一步，只挑选出感兴趣的内容保存为消息。比如，系统提示和任务会被单独保存，每一步的 LLM 输出和工具调用输出也会分别保存为不同的消息。
- 如果你希望从更高层次了解发生了什么，可以使用这个方法。但请注意，并不是每个日志细节都会被转换成消息。

简单来说：

查看详细日志：通过 agent.logs 属性。
转换日志为内存格式：用 agent.write_inner_memory_from_logs() 方法，方便 LLM 查看，但不是所有细节都会被转换。

工具¶

工具是 Agent 用来执行任务的基本功能模块。

举个例子，查看 PythonInterpreterTool：

它有名称、描述、输入描述、输出类型，还有一个 call 方法来实际执行操作。

当你初始化 Agent 时，这些工具的属性会被用来生成工具描述。这个描述会被嵌入到 Agent 的系统提示中。这样， Agent 就能知道它有哪些工具可以用，以及这些工具是干什么的。

默认工具箱¶

Transformers 库自带一个默认的工具箱，可以帮助 Agent 完成各种任务。你可以在初始化 Agent 时，通过设置 add_base_tools=True 来添加这个工具箱。

这个工具箱包含以下功能：

文档问题回答：给定一个PDF等格式的文档，可以回答关于这个文档的问题（比如关于“甜甜圈”的问题）。
图像问题回答：给定一张图片，可以回答关于这张图片的问题（使用VILT模型）。
语音到文本：给定一段语音录音，可以将其转换成文字（使用耳语模型）。
文本转语音：将文字转换成语音（使用SpeechT5模型）。
翻译：将句子从一种语言翻译成另一种语言。
DuckDuckGo 搜索：使用DuckDuckGo浏览器进行网络搜索。
Python 代码解释器：在安全环境中运行LLM生成的Python代码。注意，如果你使用 add_base_tools=True 初始化，这个工具只会添加到 ReactJsonAgent，因为基于代码的 Agent 已经自带 Python 代码执行功能。

你也可以手动使用这些工具，只需调用 load_tool() 函数并指定要执行的任务即可。

In [ ]:

from transformers import load_tool

tool = load_tool("text-to-speech")
audio = tool("This is a text to speech tool")

创建新的工具¶

如果你发现 Hugging Face 提供的默认工具不能满足你的需求，也可以自己创建工具。

比如，我们可以创建一个工具，用来查找并返回在 Hugging Face Hub 上下载次数最多的模型，针对某个特定的任务。你可以从下面这段代码开始动手实现。

In [ ]:

from huggingface_hub import list_models

task = "text-classification"

model = next(iter(list_models(filter=task, sort="downloads", direction=-1)))
print(model.id)

下面这段代码可以快速转换为工具，只需将其包装在函数中并添加工具装饰器：

In [ ]:

from transformers import tool

@tool
def model_download_counter(task: str) -> str:
    """
    这是一个工具，它会返回Hugging Face Hub上给定任务中下载量最高的模型的 Checkpoint 名称。

    Args:
        task: 任务
    """
    model = next(iter(list_models(filter="text-classification", sort="downloads", direction=-1)))
    return model.id

这个功能需要以下几个要素：

明确的名字：这个名字要能清楚地描述工具的功能。由于这个工具是用来返回某个任务下载量最多的模型，我们可以给它起名叫 model_download_counter。
输入和输出的类型提示：需要明确指出输入和输出数据的类型。
详细的描述：包括一个 Args 部分，用来描述每个参数的具体作用。所有这些信息在初始化 Agent 时，会自动嵌入到系统提示中。

此定义的格式与apply_chat_template中使用的工具模式相同，唯一的区别是添加了工具装饰器：在这里阅读有关我们的工具使用 API 的更多信息。

初始化 Agent 并运行它：

In [ ]:

from transformers import CodeAgent
agent = CodeAgent(tools=[model_download_tool], llm_engine=llm_engine)
agent.run(
    "Can you give me the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub?"
)

管理 Agent 的工具箱¶

如果你已经创建了一个 Agent ，再重新初始化它来添加新工具会比较麻烦。不过，使用 Transformers 库，你可以轻松地添加或替换 Agent 的工具箱中的工具。

比如，我们可以把 model_download_tool 添加到一个已经用默认工具箱初始化好的 Agent 中。

In [ ]:

from transformers import CodeAgent

agent = CodeAgent(tools=[], llm_engine=llm_engine, add_base_tools=True)
agent.toolbox.add_tool(model_download_tool)

现在我们可以利用新的工具和之前的文本到语音的工具：

In [ ]:

agent.run(
    "Can you read out loud the name of the model that has the most downloads in the 'text-to-video' task on the Hugging Face Hub and return the audio?"
)

注意！需要将工具添加到运行良好的 Agent 时要小心，因为它可能会使选择偏向于你的工具或选择其他工具，而不是此前已经定义的工具。

你可以用 agent.toolbox.update_tool() 方法来替换 Agent 工具箱中的现有工具。这个方法特别有用，因为 Agent 已经知道怎么完成那个任务，若你的新工具是现有工具的直接替代品，那么只要确保你的新工具和被替换的工具使用相同的接口；或者更新系统提示模板，确保所有包含被替换工具的提示都更新了

使用一系列工具¶

你可以通过 ToolCollection 对象并指定工具集合的参数 collection_slug 来使用工具集合。只需把这些工具集合转换成一个列表，然后在初始化 Agent 时传递进去，就可以开始使用它们了！

为了加快启动速度，所以只有在 Agent 调用时才会加载工具。

In [ ]:

from transformers import ToolCollection, ReactCodeAgent

image_tool_collection = ToolCollection(collection_slug="huggingface-tools/diffusion-tools-6630bb19a942c2306a2cdb6f")
agent = ReactCodeAgent(tools=[*image_tool_collection.tools], add_base_tools=True)

agent.run("Please draw me a picture of rivers and lakes.")

学习资源站

012Agents(代理) and tools(工具)

Agents(代理) and tools(工具)¶

什么是 Agent (代理) ?¶

Agent 的类型¶

CodeAgent¶

ReactJsonAgent¶

如何构建 Agent？¶

代码执行¶

系统提示¶

检查 Agent 运行¶

工具¶

默认工具箱¶

创建新的工具¶

管理 Agent 的工具箱¶

使用一系列工具¶