第七章 检查结果¶
在本章中,我们将重点如何检查系统生成的输出。在向用户展示输出之前,检查输出的质量、相关性和安全性对于确保提供的回应非常重要,无论是在自动化流程中还是其他场景中。我们将学习如何使用审查 API 来评估输出,并探讨如何使用额外的 Prompt 来提升模型在展示输出之前的质量评估。
一、环境配置¶
同上一章,我们首先需要配置使用 OpenAI API 的环境
import openai
# 导入第三方库
openai.api_key = "sk-..."
# 设置 API_KEY, 请替换成您自己的 API_KEY
# 以下为基于环境变量的配置方法示例,这样更加安全。仅供参考,后续将不再涉及。
# import openai
# import os
# OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
# openai.api_key = OPENAI_API_KEY
def get_completion_from_messages(messages,
model="gpt-3.5-turbo",
temperature=0,
max_tokens=500):
'''
封装一个访问 OpenAI GPT3.5 的函数
参数:
messages: 这是一个消息列表,每个消息都是一个字典,包含 role(角色)和 content(内容)。角色可以是'system'、'user' 或 'assistant’,内容是角色的消息。
model: 调用的模型,默认为 gpt-3.5-turbo(ChatGPT),有内测资格的用户可以选择 gpt-4
temperature: 这决定模型输出的随机程度,默认为0,表示输出将非常确定。增加温度会使输出更随机。
max_tokens: 这决定模型输出的最大的 token 数。
'''
response = openai.ChatCompletion.create(
model=model,
messages=messages,
temperature=temperature, # 这决定模型输出的随机程度
max_tokens=max_tokens, # 这决定模型输出的最大的 token 数
)
return response.choices[0].message["content"]
二、检查输出是否有潜在的有害内容¶
主要就是 Moderation API 的使用
final_response_to_customer = f"""
The SmartX ProPhone has a 6.1-inch display, 128GB storage, \
12MP dual camera, and 5G. The FotoSnap DSLR Camera \
has a 24.2MP sensor, 1080p video, 3-inch LCD, and \
interchangeable lenses. We have a variety of TVs, including \
the CineView 4K TV with a 55-inch display, 4K resolution, \
HDR, and smart TV features. We also have the SoundMax \
Home Theater system with 5.1 channel, 1000W output, wireless \
subwoofer, and Bluetooth. Do you have any specific questions \
about these products or any other products we offer?
"""
# Moderation 是 OpenAI 的内容审核函数,用于检测这段内容的危害含量
response = openai.Moderation.create(
input=final_response_to_customer
)
moderation_output = response["results"][0]
print(moderation_output)
{
"categories": {
"hate": false,
"hate/threatening": false,
"self-harm": false,
"sexual": false,
"sexual/minors": false,
"violence": false,
"violence/graphic": false
},
"category_scores": {
"hate": 2.6680607e-06,
"hate/threatening": 1.2194433e-08,
"self-harm": 8.294434e-07,
"sexual": 3.41087e-05,
"sexual/minors": 1.5462567e-07,
"violence": 6.3285606e-06,
"violence/graphic": 2.9102332e-06
},
"flagged": false
}
final_response_to_customer = f"""
SmartX ProPhone 有一个 6.1 英寸的显示屏,128GB 存储、\
1200 万像素的双摄像头,以及 5G。FotoSnap 单反相机\
有一个 2420 万像素的传感器,1080p 视频,3 英寸 LCD 和\
可更换的镜头。我们有各种电视,包括 CineView 4K 电视,\
55 英寸显示屏,4K 分辨率、HDR,以及智能电视功能。\
我们也有 SoundMax 家庭影院系统,具有 5.1 声道,\
1000W 输出,无线重低音扬声器和蓝牙。关于这些产品或\
我们提供的任何其他产品您是否有任何具体问题?
"""
response = openai.Moderation.create(
input=final_response_to_customer
)
moderation_output = response["results"][0]
print(moderation_output)
{
"categories": {
"hate": false,
"hate/threatening": false,
"self-harm": false,
"sexual": false,
"sexual/minors": false,
"violence": false,
"violence/graphic": false
},
"category_scores": {
"hate": 2.6680607e-06,
"hate/threatening": 1.2194433e-08,
"self-harm": 8.294434e-07,
"sexual": 3.41087e-05,
"sexual/minors": 1.5462567e-07,
"violence": 6.3285606e-06,
"violence/graphic": 2.9102332e-06
},
"flagged": false
}
正如您所见,这个输出没有被标记,并且在所有类别中都获得了非常低的分数,说明给定的回应是合理的。
总的来说,检查输出也是非常重要的。例如,如果您正在为敏感的受众创建一个聊天机器人,您可以使用更低的阈值来标记输出。一般来说,如果审查输出表明内容被标记,您可以采取适当的行动,例如回应一个备用答案或生成一个新的回应。
请注意,随着我们改进模型,它们也越来越不太可能返回任何有害的输出。
另一种检查输出的方法是询问模型本身生成的结果是否令人满意,是否符合您所定义的标准。这可以通过将生成的输出作为输入的一部分提供给模型,并要求它评估输出的质量来实现。您可以以多种方式进行这样的操作。让我们看一个例子。
三、检查输出结果是否与提供的产品信息相符合¶
# 这是一段电子产品相关的信息
system_message = f"""
You are an assistant that evaluates whether \
customer service agent responses sufficiently \
answer customer questions, and also validates that \
all the facts the assistant cites from the product \
information are correct.
The product information and user and customer \
service agent messages will be delimited by \
3 backticks, i.e. ```.
Respond with a Y or N character, with no punctuation:
Y - if the output sufficiently answers the question \
AND the response correctly uses product information
N - otherwise
Output a single letter only.
"""
#这是顾客的提问
customer_message = f"""
tell me about the smartx pro phone and \
the fotosnap camera, the dslr one. \
Also tell me about your tvs"""
product_information = """{ "name": "SmartX ProPhone", "category": "Smartphones and Accessories", "brand": "SmartX", "model_number": "SX-PP10", "warranty": "1 year", "rating": 4.6, "features": [ "6.1-inch display", "128GB storage", "12MP dual camera", "5G" ], "description": "A powerful smartphone with advanced camera features.", "price": 899.99 } { "name": "FotoSnap DSLR Camera", "category": "Cameras and Camcorders", "brand": "FotoSnap", "model_number": "FS-DSLR200", "warranty": "1 year", "rating": 4.7, "features": [ "24.2MP sensor", "1080p video", "3-inch LCD", "Interchangeable lenses" ], "description": "Capture stunning photos and videos with this versatile DSLR camera.", "price": 599.99 } { "name": "CineView 4K TV", "category": "Televisions and Home Theater Systems", "brand": "CineView", "model_number": "CV-4K55", "warranty": "2 years", "rating": 4.8, "features": [ "55-inch display", "4K resolution", "HDR", "Smart TV" ], "description": "A stunning 4K TV with vibrant colors and smart features.", "price": 599.99 } { "name": "SoundMax Home Theater", "category": "Televisions and Home Theater Systems", "brand": "SoundMax", "model_number": "SM-HT100", "warranty": "1 year", "rating": 4.4, "features": [ "5.1 channel", "1000W output", "Wireless subwoofer", "Bluetooth" ], "description": "A powerful home theater system for an immersive audio experience.", "price": 399.99 } { "name": "CineView 8K TV", "category": "Televisions and Home Theater Systems", "brand": "CineView", "model_number": "CV-8K65", "warranty": "2 years", "rating": 4.9, "features": [ "65-inch display", "8K resolution", "HDR", "Smart TV" ], "description": "Experience the future of television with this stunning 8K TV.", "price": 2999.99 } { "name": "SoundMax Soundbar", "category": "Televisions and Home Theater Systems", "brand": "SoundMax", "model_number": "SM-SB50", "warranty": "1 year", "rating": 4.3, "features": [ "2.1 channel", "300W output", "Wireless subwoofer", "Bluetooth" ], "description": "Upgrade your TV's audio with this sleek and powerful soundbar.", "price": 199.99 } { "name": "CineView OLED TV", "category": "Televisions and Home Theater Systems", "brand": "CineView", "model_number": "CV-OLED55", "warranty": "2 years", "rating": 4.7, "features": [ "55-inch display", "4K resolution", "HDR", "Smart TV" ], "description": "Experience true blacks and vibrant colors with this OLED TV.", "price": 1499.99 }"""
q_a_pair = f"""
Customer message: ```{customer_message}```
Product information: ```{product_information}```
Agent response: ```{final_response_to_customer}```
Does the response use the retrieved information correctly?
Does the response sufficiently answer the question?
Output Y or N
"""
#判断相关性
messages = [
{'role': 'system', 'content': system_message},
{'role': 'user', 'content': q_a_pair}
]
response = get_completion_from_messages(messages, max_tokens=1)
print(response)
Y
# 这是一段电子产品相关的信息
system_message = f"""
您是一个助理,用于评估客服代理的回复是否充分回答了客户问题,\
并验证助理从产品信息中引用的所有事实是否正确。
产品信息、用户和客服代理的信息将使用三个反引号(即 ```)\
进行分隔。
请以 Y 或 N 的字符形式进行回复,不要包含标点符号:\
Y - 如果输出充分回答了问题并且回复正确地使用了产品信息\
N - 其他情况。
仅输出单个字母。
"""
#这是顾客的提问
customer_message = f"""
告诉我有关 smartx pro 手机\
和 fotosnap 相机(单反相机)的信息。\
还有您电视的信息。
"""
product_information = """{ "name": "SmartX ProPhone", "category": "Smartphones and Accessories", "brand": "SmartX", "model_number": "SX-PP10", "warranty": "1 year", "rating": 4.6, "features": [ "6.1-inch display", "128GB storage", "12MP dual camera", "5G" ], "description": "A powerful smartphone with advanced camera features.", "price": 899.99 } { "name": "FotoSnap DSLR Camera", "category": "Cameras and Camcorders", "brand": "FotoSnap", "model_number": "FS-DSLR200", "warranty": "1 year", "rating": 4.7, "features": [ "24.2MP sensor", "1080p video", "3-inch LCD", "Interchangeable lenses" ], "description": "Capture stunning photos and videos with this versatile DSLR camera.", "price": 599.99 } { "name": "CineView 4K TV", "category": "Televisions and Home Theater Systems", "brand": "CineView", "model_number": "CV-4K55", "warranty": "2 years", "rating": 4.8, "features": [ "55-inch display", "4K resolution", "HDR", "Smart TV" ], "description": "A stunning 4K TV with vibrant colors and smart features.", "price": 599.99 } { "name": "SoundMax Home Theater", "category": "Televisions and Home Theater Systems", "brand": "SoundMax", "model_number": "SM-HT100", "warranty": "1 year", "rating": 4.4, "features": [ "5.1 channel", "1000W output", "Wireless subwoofer", "Bluetooth" ], "description": "A powerful home theater system for an immersive audio experience.", "price": 399.99 } { "name": "CineView 8K TV", "category": "Televisions and Home Theater Systems", "brand": "CineView", "model_number": "CV-8K65", "warranty": "2 years", "rating": 4.9, "features": [ "65-inch display", "8K resolution", "HDR", "Smart TV" ], "description": "Experience the future of television with this stunning 8K TV.", "price": 2999.99 } { "name": "SoundMax Soundbar", "category": "Televisions and Home Theater Systems", "brand": "SoundMax", "model_number": "SM-SB50", "warranty": "1 year", "rating": 4.3, "features": [ "2.1 channel", "300W output", "Wireless subwoofer", "Bluetooth" ], "description": "Upgrade your TV's audio with this sleek and powerful soundbar.", "price": 199.99 } { "name": "CineView OLED TV", "category": "Televisions and Home Theater Systems", "brand": "CineView", "model_number": "CV-OLED55", "warranty": "2 years", "rating": 4.7, "features": [ "55-inch display", "4K resolution", "HDR", "Smart TV" ], "description": "Experience true blacks and vibrant colors with this OLED TV.", "price": 1499.99 }"""
q_a_pair = f"""
顾客的信息: ```{customer_message}```
产品信息: ```{product_information}```
代理的回复: ```{final_response_to_customer}```
回复是否正确使用了检索的信息?
回复是否充分地回答了问题?
输出 Y 或 N
"""
#判断相关性
messages = [
{'role': 'system', 'content': system_message},
{'role': 'user', 'content': q_a_pair}
]
response = get_completion_from_messages(messages, max_tokens=1)
print(response)
Y
another_response = "life is like a box of chocolates"
q_a_pair = f"""
Customer message: ```{customer_message}```
Product information: ```{product_information}```
Agent response: ```{another_response}```
Does the response use the retrieved information correctly?
Does the response sufficiently answer the question?
Output Y or N
"""
messages = [
{'role': 'system', 'content': system_message},
{'role': 'user', 'content': q_a_pair}
]
response = get_completion_from_messages(messages)
print(response)
N
another_response = "生活就像一盒巧克力"
q_a_pair = f"""
顾客的信息: ```{customer_message}```
产品信息: ```{product_information}```
代理的回复: ```{another_response}```
回复是否正确使用了检索的信息?
回复是否充分地回答了问题?
输出 Y 或 N
"""
messages = [
{'role': 'system', 'content': system_message},
{'role': 'user', 'content': q_a_pair}
]
response = get_completion_from_messages(messages)
print(response)
N
因此,您可以看到,模型能够提供关于生成输出质量的反馈。您可以利用这个反馈来决定是否展示输出给用户或生成新的回应。甚至可以尝试为每个用户查询生成多个模型回应,然后选择最佳的回应展示给用户。因此,您有多种尝试的方式。
总的来说,使用审查 API 来检查输出是一个不错的做法。但是,我认为在大部分情况下这可能是不必要的,尤其是当您使用更先进的模型,例如 GPT-4 时。
事实上,我们并没有看到很多人在实际生产环境中采取这种做法。这也会增加系统的延迟和成本,因为您必须等待额外的调用,还需要额外的 tokens。如果您的应用或产品的错误率只有 0.0000001%,那么或许您可以尝试这种方法。但总的来说,我们不建议您在实际应用中采用这种方式。
在下一章中,我们将把我们在评估输入部分、处理部分和检查输出中学到的所有内容结合起来,构建一个端到端的系统。