第四章 文本概括¶
一、引言¶
当今世界上文本信息浩如烟海,我们很难拥有足够的时间去阅读所有想了解的东西。但欣喜的是,目前LLM在文本概括任务上展现了强大的水准,也已经有不少团队将概括功能实现在多种应用中。
本章节将介绍如何使用编程的方式,调用API接口来实现“文本概括”功能。
首先,我们需要引入 OpenAI 包,加载 API 密钥,定义 getCompletion 函数。
import openai
# 导入第三方库
openai.api_key = "sk-..."
# 设置 API_KEY, 请替换成您自己的 API_KEY
def get_completion(prompt, model="gpt-3.5-turbo"):
messages = [{"role": "user", "content": prompt}]
response = openai.ChatCompletion.create(
model=model,
messages=messages,
temperature=0, # 值越低则输出文本随机性越低
)
return response.choices[0].message["content"]
二、单一文本概括¶
以商品评论的总结任务为例:对于电商平台来说,网站上往往存在着海量的商品评论,这些评论反映了所有客户的想法。如果我们拥有一个工具去概括这些海量、冗长的评论,便能够快速地浏览更多评论,洞悉客户的偏好,从而指导平台与商家提供更优质的服务。
输入文本
prod_review = """
Got this panda plush toy for my daughter's birthday, \
who loves it and takes it everywhere. It's soft and \
super cute, and its face has a friendly look. It's \
a bit small for what I paid though. I think there \
might be other options that are bigger for the \
same price. It arrived a day earlier than expected, \
so I got to play with it myself before I gave it \
to her.
"""
输入文本(中文翻译)
prod_review_zh = """
这个熊猫公仔是我给女儿的生日礼物,她很喜欢,去哪都带着。
公仔很软,超级可爱,面部表情也很和善。但是相比于价钱来说,
它有点小,我感觉在别的地方用同样的价钱能买到更大的。
快递比预期提前了一天到货,所以在送给女儿之前,我自己玩了会。
"""
2.1 限制输出文本长度¶
我们尝试限制文本长度为最多30词。
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.
Summarize the review below, delimited by triple
backticks, in at most 30 words.
Review: ```{prod_review}```
"""
response = get_completion(prompt)
print(response)
Soft and cute panda plush toy loved by daughter, but a bit small for the price. Arrived early.
中文翻译版本
prompt = f"""
您的任务是从电子商务网站上生成一个产品评论的简短摘要。
请对三个反引号之间的评论文本进行概括,最多30个词汇。
评论: ```{prod_review_zh}```
"""
response = get_completion(prompt)
print(response)
可爱软熊猫公仔,女儿喜欢,面部表情和善,但价钱有点小贵,快递提前一天到货。
2.2 设置关键角度侧重¶
有时,针对不同的业务,我们对文本的侧重会有所不同。例如对于商品评论文本,物流会更关心运输时效,商家更加关心价格与商品质量,平台更关心整体服务体验。
我们可以通过增加Prompt提示,来体现对于某个特定角度的侧重。
2.2.1 侧重于快递服务¶
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
Shipping deparmtment.
Summarize the review below, delimited by triple
backticks, in at most 30 words, and focusing on any aspects \
that mention shipping and delivery of the product.
Review: ```{prod_review}```
"""
response = get_completion(prompt)
print(response)
The panda plush toy arrived a day earlier than expected, but the customer felt it was a bit small for the price paid.
中文翻译版本
prompt = f"""
您的任务是从电子商务网站上生成一个产品评论的简短摘要。
请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品运输上。
评论: ```{prod_review_zh}```
"""
response = get_completion(prompt)
print(response)
快递提前到货,熊猫公仔软可爱,但有点小,价钱不太划算。
可以看到,输出结果以“快递提前一天到货”开头,体现了对于快递效率的侧重。
2.2.2 侧重于价格与质量¶
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site to give feedback to the \
pricing deparmtment, responsible for determining the \
price of the product.
Summarize the review below, delimited by triple
backticks, in at most 30 words, and focusing on any aspects \
that are relevant to the price and perceived value.
Review: ```{prod_review}```
"""
response = get_completion(prompt)
print(response)
The panda plush toy is soft, cute, and loved by the recipient, but the price may be too high for its size compared to other options.
中文翻译版本
prompt = f"""
您的任务是从电子商务网站上生成一个产品评论的简短摘要。
请对三个反引号之间的评论文本进行概括,最多30个词汇,并且聚焦在产品价格和质量上。
评论: ```{prod_review_zh}```
"""
response = get_completion(prompt)
print(response)
可爱软熊猫公仔,面部表情友好,但价钱有点高,尺寸较小。快递提前一天到货。
可以看到,输出结果以“质量好、价格小贵、尺寸小”开头,体现了对于产品价格与质量的侧重。
2.3 关键信息提取¶
在2.2节中,虽然我们通过添加关键角度侧重的 Prompt ,使得文本摘要更侧重于某一特定方面,但是可以发现,结果中也会保留一些其他信息,如偏重价格与质量角度的概括中仍保留了“快递提前到货”的信息。如果我们只想要提取某一角度的信息,并过滤掉其他所有信息,则可以要求 LLM 进行“文本提取( Extract )”而非“概括( Summarize )”
prompt = f"""
Your task is to extract relevant information from \
a product review from an ecommerce site to give \
feedback to the Shipping department.
From the review below, delimited by triple quotes \
extract the information relevant to shipping and \
delivery. Limit to 30 words.
Review: ```{prod_review}```
"""
response = get_completion(prompt)
print(response)
"The product arrived a day earlier than expected."
中文翻译版本
prompt = f"""
您的任务是从电子商务网站上的产品评论中提取相关信息。
请从以下三个反引号之间的评论文本中提取产品运输相关的信息,最多30个词汇。
评论: ```{prod_review_zh}```
"""
response = get_completion(prompt)
print(response)
快递比预期提前了一天到货。
三、同时概括多条文本¶
在实际的工作流中,我们往往有许许多多的评论文本,以下示例将多条用户评价放进列表,并利用 for 循环,使用文本概括(Summarize)提示词,将评价概括至小于 20 词,并按顺序打印。当然,在实际生产中,对于不同规模的评论文本,除了使用 for 循环以外,还可能需要考虑整合评论、分布式等方法提升运算效率。您可以搭建主控面板,来总结大量用户评论,来方便您或他人快速浏览,还可以点击查看原评论。这样您能高效掌握顾客的所有想法。
review_1 = prod_review
# review for a standing lamp
review_2 = """
Needed a nice lamp for my bedroom, and this one \
had additional storage and not too high of a price \
point. Got it fast - arrived in 2 days. The string \
to the lamp broke during the transit and the company \
happily sent over a new one. Came within a few days \
as well. It was easy to put together. Then I had a \
missing part, so I contacted their support and they \
very quickly got me the missing piece! Seems to me \
to be a great company that cares about their customers \
and products.
"""
# review for an electric toothbrush
review_3 = """
My dental hygienist recommended an electric toothbrush, \
which is why I got this. The battery life seems to be \
pretty impressive so far. After initial charging and \
leaving the charger plugged in for the first week to \
condition the battery, I've unplugged the charger and \
been using it for twice daily brushing for the last \
3 weeks all on the same charge. But the toothbrush head \
is too small. I’ve seen baby toothbrushes bigger than \
this one. I wish the head was bigger with different \
length bristles to get between teeth better because \
this one doesn’t. Overall if you can get this one \
around the $50 mark, it's a good deal. The manufactuer's \
replacements heads are pretty expensive, but you can \
get generic ones that're more reasonably priced. This \
toothbrush makes me feel like I've been to the dentist \
every day. My teeth feel sparkly clean!
"""
# review for a blender
review_4 = """
So, they still had the 17 piece system on seasonal \
sale for around $49 in the month of November, about \
half off, but for some reason (call it price gouging) \
around the second week of December the prices all went \
up to about anywhere from between $70-$89 for the same \
system. And the 11 piece system went up around $10 or \
so in price also from the earlier sale price of $29. \
So it looks okay, but if you look at the base, the part \
where the blade locks into place doesn’t look as good \
as in previous editions from a few years ago, but I \
plan to be very gentle with it (example, I crush \
very hard items like beans, ice, rice, etc. in the \
blender first then pulverize them in the serving size \
I want in the blender then switch to the whipping \
blade for a finer flour, and use the cross cutting blade \
first when making smoothies, then use the flat blade \
if I need them finer/less pulpy). Special tip when making \
smoothies, finely cut and freeze the fruits and \
vegetables (if using spinach-lightly stew soften the \
spinach then freeze until ready for use-and if making \
sorbet, use a small to medium sized food processor) \
that you plan to use that way you can avoid adding so \
much ice if at all-when making your smoothie. \
After about a year, the motor was making a funny noise. \
I called customer service but the warranty expired \
already, so I had to buy another one. FYI: The overall \
quality has gone done in these types of products, so \
they are kind of counting on brand recognition and \
consumer loyalty to maintain sales. Got it in about \
two days.
"""
reviews = [review_1, review_2, review_3, review_4]
review_1 = prod_review_zh
# review for a standing lamp
review_2 = """
我想为我的卧室找一个漂亮的灯,这款灯还有额外的存储空间,价格也不太高。\
购买后很快就收到了,两天就送到了。但在运输过程中,灯的拉链断了,公司态度\
很好,发来了一条新的。新的拉链也在几天内就到了。这个灯非常容易装配。后来,我\
发现缺少一个部分,所以我联系了他们的客户支持,他们很快就给我寄来了缺失的部件\
!我觉得这是一家非常关心他们的客户和产品的好公司。
"""
# review for an electric toothbrush
review_3 = """
我的牙科卫生师推荐我使用电动牙刷,这就是我购买这款牙刷的原因。目前为止,我发现电池的\
续航时间颇为令人印象深刻。在初次充电并在第一周保持充电器插头插入以调节电池状态之后,我\
已经将充电器拔掉,并在过去的3周里,每天两次刷牙都使用同一次充电。然而,这款牙刷的刷头实\
在太小了。我见过的婴儿牙刷都比这个大。我希望牙刷头能做得更大一些,搭配不同长度的刷毛更好\
地清洁牙齿间缝,因为现有的无法做到这一点。总的来说,如果你能以大约50美元的价格购入这款电动\
牙刷,那它就物超所值。厂家配套的替换刷头价格相当昂贵,但你可以买到价格更为合理的通用款。\
使用这款牙刷让我感觉像每天都去看了牙医一样,我的牙齿感觉洁净如新!
"""
# review for a blender
review_4 = """
他们还在11月把17件套系统以大约$49的优惠价格销售,几乎是五折。但不明原因(轻易就可以归咎为价格欺诈)\
在到了12月第二周,同一套系统的价格一下儿飙升到了$70-$89之间。11件套系统的价格也从之前的优惠价$29上\
升了大概$10。看上去还算公道,但如果你仔细观察底部,会发现刀片锁定的部位相比几年前的版本要略逊一筹,所\
以我打算非常小心翼翼地使用(例如,我会将像豆子、冰块、大米之类的硬质食材先用搅拌机压碎,然后调到我需要\
的份量,再用打发刀片研磨成更细的粉状,制作冰沙时我首选交叉刀片,如果需要更细腻些或者少些浆糊状,我会换成\
平刀)。在制作果昔时,把将要用的水果和蔬菜切片冷冻是个小技巧(如果你打算用菠菜,要先稍微焖炖软,再冷冻,\
制作雪葩时,用一个小到中号的食品加工器就行)这样就不用或者很少加冰块到你的果昔了。大约一年后,电机开始发出\
一些可疑的声音。我联系了客服,但保修期已经过期,所以我只好另购一台。友情提示:这类产品的整体质量都在下滑,\
所以他们更多的是利用品牌知名度和消费者的忠诚度来保持销售。我在两天之后就收到了它。
"""
reviews = [review_1, review_2, review_3, review_4]
for i in range(len(reviews)):
prompt = f"""
Your task is to generate a short summary of a product \
review from an ecommerce site.
Summarize the review below, delimited by triple \
backticks in at most 20 words.
Review: ```{reviews[i]}```
"""
response = get_completion(prompt)
print(i, response, "\n")
0 Soft and cute panda plush toy loved by daughter, but a bit small for the price. Arrived early. 1 Affordable lamp with storage, fast shipping, and excellent customer service. Easy to assemble and missing parts were quickly replaced. 2 Good battery life, small toothbrush head, but effective cleaning. Good deal if bought around $50. 3 The product was on sale for $49 in November, but the price increased to $70-$89 in December. The base doesn't look as good as previous editions, but the reviewer plans to be gentle with it. A special tip for making smoothies is to freeze the fruits and vegetables beforehand. The motor made a funny noise after a year, and the warranty had expired. Overall quality has decreased.
for i in range(len(reviews)):
prompt = f"""
你的任务是从电子商务网站上的产品评论中提取相关信息。
请对三个反引号之间的评论文本进行概括,最多20个词汇。
评论文本: ```{reviews[i]}```
"""
response = get_completion(prompt)
print(i, response, "\n")
0 概括:可爱的熊猫毛绒玩具,质量好,送货快,但有点小。
1 这个评论是关于一款具有额外储存空间的床头灯,价格适中。客户对公司的服务和产品表示满意。
2 评论概括:电动牙刷电池寿命长,但刷头太小,需要更长的刷毛。价格合理,使用后牙齿感觉干净。
3 评论概括:产品价格在12月份上涨,质量不如以前,但交付速度快。