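Chapter 6: Evaluation

1. Setting the OpenAI API Key
2. Creating the LLM application
3. Manual evaluation
- 3.1 How to evaluate newly created examples
- 3.2 Chinese version
4. Evaluating examples with an LLM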

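1. Setting the OpenAI API Key

Log in to your OpenAI account to obtain an API key, then set it as an environment variable.

If you want to set it as a global environment variable, you can refer to the Zhihu article.
If you want to set it as a local/project environment variable, create a .env file in the same directory as this file and add the following content.

OPENAI_API_KEY="your_api_key"

Replace "your_api_key" with your own API key.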

In [1]:

import os
import openai
from dotenv import load_dotenv, find_dotenv

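# Load local/project environment variables.

# find_dotenv() locates the path of the .env file.
# load_dotenv() reads that .env file and loads its variables into the current environment.
# If you set a global environment variable instead, this line has no effect.
_ = load_dotenv(find_dotenv())

# Read the OPENAI_API_KEY environment variable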
openai.api_key = os.environ['OPENAI_API_KEY']
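
A missing key otherwise surfaces as a bare KeyError. The following is a minimal sketch of a friendlier lookup; the helper name `require_env` is hypothetical and not part of the notebook or of any library.

```python
import os


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or raise a clear error."""
    value = os.environ.get(name, "").strip()
    if not value:
        raise RuntimeError(
            f"{name} is not set; add it to your .env file or export it in your shell."
        )
    return value
```

With this helper, the assignment above could be written as `openai.api_key = require_env("OPENAI_API_KEY")`.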

2. Creating the LLM application

Build the application as a LangChain chain.

In [2]:

from langchain.chains.retrieval_qa.base import RetrievalQA  # retrieval QA chain: answers questions over retrieved documents
from langchain.chat_models.openai import ChatOpenAI  # OpenAI chat model
from langchain.document_loaders import CSVLoader  # document loader for data stored as CSV
from langchain.indexes import VectorstoreIndexCreator  # vector store index creator
from langchain.vectorstores import DocArrayInMemorySearch  # in-memory vector store

In [3]:

# Load the data
file = './data/OutdoorClothingCatalog_1000.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [4]:

# Inspect the data
import pandas as pd
test_data = pd.read_csv(file,header=None)
test_data

Out[4]:

	0	1	2
0	NaN	name	description
1	0.0	Women's Campside Oxfords	This ultracomfortable lace-to-toe Oxford boast...
2	1.0	Recycled Waterhog Dog Mat, Chevron Weave	Protect your floors from spills and splashing ...
3	2.0	Infant and Toddler Girls' Coastal Chill Swimsu...	She'll love the bright colors, ruffles and exc...
4	3.0	Refresh Swimwear, V-Neck Tankini Contrasts	Whether you're going for a swim or heading out...
...	...	...	...
996	995.0	Men's Classic Denim, Standard Fit	Crafted from premium denim that will last wash...
997	996.0	CozyPrint Sweater Fleece Pullover	The ultimate sweater fleece - made from superi...
998	997.0	Women's NRS Endurance Spray Paddling Pants	These comfortable and affordable splash paddli...
999	998.0	Women's Stop Flies Hoodie	This great-looking hoodie uses No Fly Zone Tec...
1000	999.0	Modern Utility Bag	This US-made crossbody bag is built with the s...

1001 rows × 3 columns

In [5]:

from langchain_community.embeddings.openai import OpenAIEmbeddings
'''
Specify the vector store class, then build the index from the loaded documents.
'''
embeddings = OpenAIEmbeddings(model='text-embedding-3-small')  # initialize the embedding model
index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_documents(data[:100])  # embed only the first 100 rows to save time and cost

/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The class `OpenAIEmbeddings` was deprecated in LangChain 0.0.9 and will be removed in 0.3.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import OpenAIEmbeddings`.
  warn_deprecated(
/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/pydantic/_migration.py:283: UserWarning: `pydantic.error_wrappers:ValidationError` has been moved to `pydantic:ValidationError`.
  warnings.warn(f'`{import_path}` has been moved to `{new_location}`.')
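
To make the retrieval step less opaque, here is a toy sketch of what an in-memory vector search does conceptually: embed each document, embed the query, and return the closest documents by cosine similarity. The bag-of-words `embed` function and the `top_k` helper are hypothetical stand-ins chosen so the sketch runs without an API key. The notebook itself uses OpenAIEmbeddings and DocArrayInMemorySearch, whose internals are not shown here.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase bag-of-words counts (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k(query: str, docs: list[str], k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:k]


docs = [
    "Cozy Comfort Pullover Set with side pockets",
    "Recycled Waterhog Dog Mat for muddy paws",
    "Stretch Down Hooded Jacket from the DownTek collection",
]
print(top_k("pullover set side pockets", docs, k=1))
```

A real embedding model captures meaning rather than exact word overlap, but the ranking step follows the same idea.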

In [6]:

# Create the retrieval QA chain by specifying the language model, chain type, retriever, and verbosity
llm = ChatOpenAI(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The class `ChatOpenAI` was deprecated in LangChain 0.0.10 and will be removed in 0.3.0. An updated version of the class exists in the langchain-openai package and should be used instead. To use it run `pip install -U langchain-openai` and import as `from langchain_openai import ChatOpenAI`.
  warn_deprecated(

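2.1 Creating evaluation data points

The first step is to decide which data points we want to evaluate the application on. We will cover a few different ways to do this.

1. Come up with good data points by hand: look through some of the data, then write example questions and answers to use for evaluation later.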

In [7]:

data[10]  # look at a few documents to get a sense of what they contain

Out[7]:

Document(page_content=": 10\nname: Cozy Comfort Pullover Set, Stripe\ndescription: Perfect for lounging, this striped knit set lives up to its name. We used ultrasoft fabric and an easy design that's as comfortable at bedtime as it is when we have to make a quick run out.\n\nSize & Fit\n- Pants are Favorite Fit: Sits lower on the waist.\n- Relaxed Fit: Our most generous fit sits farthest from the body.\n\nFabric & Care\n- In the softest blend of 63% polyester, 35% rayon and 2% spandex.\n\nAdditional Features\n- Relaxed fit top with raglan sleeves and rounded hem.\n- Pull-on pants have a wide elastic waistband and drawstring, side pockets and a modern slim leg.\n\nImported.", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 10})

In [8]:

data[11]

Out[8]:

Document(page_content=': 11\nname: Ultra-Lofty 850 Stretch Down Hooded Jacket\ndescription: This technical stretch down jacket from our DownTek collection is sure to keep you warm and comfortable with its full-stretch construction providing exceptional range of motion. With a slightly fitted style that falls at the hip and best with a midweight layer, this jacket is suitable for light activity up to 20° and moderate activity up to -30°. The soft and durable 100% polyester shell offers complete windproof protection and is insulated with warm, lofty goose down. Other features include welded baffles for a no-stitch construction and excellent stretch, an adjustable hood, an interior media port and mesh stash pocket and a hem drawcord. Machine wash and dry. Imported.', metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 11})

The first document describes a pullover set and the second a jacket. From these details we can write some example queries and answers.

2.2 Creating test case data

In [9]:

# Adjacent string literals are concatenated, so no indentation leaks into the query.
examples = [
    {
        "query": "Do the Cozy Comfort Pullover Set "
                 "have side pockets?",
        "answer": "Yes"
    },
    {
        "query": "What collection is the Ultra-Lofty "
                 "850 Stretch Down Hooded Jacket from?",
        "answer": "The DownTek collection"
    }
]

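So we can ask a simple question: does the Cozy Comfort Pullover Set have side pockets? From the document above we can see that it does, so the answer is "Yes". For the second document, we can see that the jacket comes from a particular collection, the DownTek collection, so the answer is "The DownTek collection".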
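
Hand-written examples are easy to get subtly wrong, for instance a missing key or stray runs of whitespace inside a query. A minimal sketch of a sanity check follows; the helper name `clean_examples` is hypothetical and not part of LangChain.

```python
def clean_examples(examples: list[dict]) -> list[dict]:
    """Check each example has 'query' and 'answer', and collapse stray whitespace."""
    cleaned = []
    for i, ex in enumerate(examples):
        missing = {"query", "answer"} - ex.keys()
        if missing:
            raise ValueError(f"example {i} is missing keys: {sorted(missing)}")
        # " ".join(s.split()) collapses any run of whitespace into a single space.
        cleaned.append(
            {key: " ".join(str(ex[key]).split()) for key in ("query", "answer")}
        )
    return cleaned
```

Calling `examples = clean_examples(examples)` before running the chain would catch malformed entries early.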

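2.3 Generating test cases with an LLM

In [10]:

from langchain.evaluation.qa import QAGenerateChain  # QA generation chain: takes documents and creates a question-answer pair from each

In [11]:

example_gen_chain = QAGenerateChain.from_llm(ChatOpenAI())  # build the chain by passing in the ChatOpenAI language model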

In [12]:

data[:5]

Out[12]:

[Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 0}),
Document(page_content=': 1\nname: Recycled Waterhog Dog Mat, Chevron Weave\ndescription: Protect your floors from spills and splashing with our ultradurable recycled Waterhog dog mat made right here in the USA. \n\nSpecs\nSmall - Dimensions: 18" x 28". \nMedium - Dimensions: 22.5" x 34.5".\n\nWhy We Love It\nMother nature, wet shoes and muddy paws have met their match with our Recycled Waterhog mats. Ruggedly constructed from recycled plastic materials, these ultratough mats help keep dirt and water off your floors and plastic out of landfills, trails and oceans. Now, that\'s a win-win for everyone.\n\nFabric & Care\nVacuum or hose clean.\n\nConstruction\n24 oz. polyester fabric made from 94% recycled materials.\nRubber backing.\n\nAdditional Features\nFeatures an -exclusive design.\nFeatures thick and thin fibers for scraping dirt and absorbing water.\nDries quickly and resists fading, rotting, mildew and shedding.\nUse indoors or out.\nMade in the USA.\n\nHave questions? Reach out to our customer service team with any questions you may have.', metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 1}),
Document(page_content=": 2\nname: Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece\ndescription: She'll love the bright colors, ruffles and exclusive whimsical prints of this toddler's two-piece swimsuit! Our four-way-stretch and chlorine-resistant fabric keeps its shape and resists snags. The UPF 50+ rated fabric provides the highest rated sun protection possible, blocking 98% of the sun's harmful rays. The crossover no-slip straps and fully lined bottom ensure a secure fit and maximum coverage. Machine wash and line dry for best results. Imported.", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 2}),
Document(page_content=": 3\nname: Refresh Swimwear, V-Neck Tankini Contrasts\ndescription: Whether you're going for a swim or heading out on an SUP, this watersport-ready tankini top is designed to move with you and stay comfortable. All while looking great in an eye-catching colorblock style. \n\nSize & Fit\nFitted: Sits close to the body.\n\nWhy We Love It\nNot only does this swimtop feel good to wear, its fabric is good for the earth too. In recycled nylon, with Lycra® spandex for the perfect amount of stretch. \n\nFabric & Care\nThe premium Italian-blend is breathable, quick drying and abrasion resistant. \nBody in 82% recycled nylon with 18% Lycra® spandex. \nLined in 90% recycled nylon with 10% Lycra® spandex. \nUPF 50+ rated – the highest rated sun protection possible. \nHandwash, line dry.\n\nAdditional Features\nLightweight racerback straps are easy to get on and off, and won't get in your way. \nFlattering V-neck silhouette. \nImported.\n\nSun Protection That Won't Wear Off\nOur high-performance fabric provides SPF", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 3}),
Document(page_content=": 4\nname: EcoFlex 3L Storm Pants\ndescription: Our new TEK O2 technology makes our four-season waterproof pants even more breathable. It's guaranteed to keep you dry and comfortable – whatever the activity and whatever the weather. Size & Fit: Slightly Fitted through hip and thigh. \n\nWhy We Love It: Our state-of-the-art TEK O2 technology offers the most breathability we've ever tested. Great as ski pants, they're ideal for a variety of outdoor activities year-round. Plus, they're loaded with features outdoor enthusiasts appreciate, including weather-blocking gaiters and handy side zips. Air In. Water Out. See how our air-permeable TEK O2 technology keeps you dry and comfortable. \n\nFabric & Care: 100% nylon, exclusive of trim. Machine wash and dry. \n\nAdditional Features: Three-layer shell delivers waterproof protection. Brand new TEK O2 technology provides enhanced breathability. Interior gaiters keep out rain and snow. Full side zips for easy on/off over boots. Two zippered hand pockets. Thigh pocket. Imported.\n\n – Official Supplier to the U.S. Ski Team\nTHEIR WILL", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 4})]

In [13]:

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)  # this lets us create many examples

/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/langchain/chains/llm.py:367: UserWarning: The apply_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
  warnings.warn(

In [14]:

new_examples  # inspect the generated examples

Out[14]:

[{'qa_pairs': {'query': "What is the weight of one pair of Women's Campside Oxfords?",
   'answer': "The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz."}},
 {'qa_pairs': {'query': 'What are the dimensions of the small and medium sizes of the Recycled Waterhog Dog Mat, Chevron Weave?',
   'answer': 'The small size has dimensions of 18" x 28", while the medium size has dimensions of 22.5" x 34.5".'}},
 {'qa_pairs': {'query': "What are some key features of the Infant and Toddler Girls' Coastal Chill Swimsuit, Two-Piece as described in the document?",
   'answer': 'Some key features of the swimsuit include bright colors, ruffles, exclusive whimsical prints, four-way-stretch and chlorine-resistant fabric, UPF 50+ rated fabric for sun protection, crossover no-slip straps, fully lined bottom for secure fit and maximum coverage, and the recommendation to machine wash and line dry for best results.'}},
 {'qa_pairs': {'query': 'What is the composition of the fabric used in the Refresh Swimwear V-Neck Tankini Contrasts?',
   'answer': 'The body of the tankini is made of 82% recycled nylon and 18% Lycra® spandex, while the lining is made of 90% recycled nylon and 10% Lycra® spandex.'}},
 {'qa_pairs': {'query': 'What technology is used in the EcoFlex 3L Storm Pants to make them more breathable and waterproof?',
   'answer': 'The EcoFlex 3L Storm Pants use TEK O2 technology to make them more breathable and waterproof.'}}]

In [15]:

new_examples[0]

Out[15]:

{'qa_pairs': {'query': "What is the weight of one pair of Women's Campside Oxfords?",
  'answer': "The approximate weight of one pair of Women's Campside Oxfords is 1 lb. 1 oz."}}

In [16]:

data[0]

Out[16]:

Document(page_content=": 0\nname: Women's Campside Oxfords\ndescription: This ultracomfortable lace-to-toe Oxford boasts a super-soft canvas, thick cushioning, and quality construction for a broken-in feel from the first time you put them on. \n\nSize & Fit: Order regular shoe size. For half sizes not offered, order up to next whole size. \n\nSpecs: Approx. weight: 1 lb.1 oz. per pair. \n\nConstruction: Soft canvas material for a broken-in feel and look. Comfortable EVA innersole with Cleansport NXT® antimicrobial odor control. Vintage hunt, fish and camping motif on innersole. Moderate arch contour of innersole. EVA foam midsole for cushioning and support. Chain-tread-inspired molded rubber outsole with modified chain-tread pattern. Imported. \n\nQuestions? Please contact us for any inquiries.", metadata={'source': './data/OutdoorClothingCatalog_1000.csv', 'row': 0})

2.4 Combining the examples

In [17]:

examples += new_examples
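
In the English run above, each generated entry is wrapped in a 'qa_pairs' key (see Out[14]), while the hand-written examples are flat dicts, so the combined list mixes two shapes. The sketch below normalizes both, assuming downstream code expects flat dicts with 'query' and 'answer' keys; the helper name `flatten_example` and the shortened sample data are hypothetical.

```python
def flatten_example(example: dict) -> dict:
    """Unwrap {'qa_pairs': {...}} entries; leave already-flat entries unchanged."""
    inner = example.get("qa_pairs", example)
    return {"query": inner["query"], "answer": inner["answer"]}


hand_written = [
    {"query": "Do the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"}
]
generated = [
    {"qa_pairs": {"query": "What is the weight of one pair?", "answer": "1 lb. 1 oz."}}
]

combined = [flatten_example(ex) for ex in hand_written + generated]
print(combined)
```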

In [18]:

qa.run(examples[0]["query"])

/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.run` was deprecated in langchain 0.1.0 and will be removed in 0.3.0. Use invoke instead.
  warn_deprecated(


> Entering new RetrievalQA chain...

> Finished chain.

Out[18]:

'Yes, the Cozy Comfort Pullover Set does have side pockets.'

2.5 Chinese version

Build the application as a LangChain chain.

In [19]:

from langchain.chains.retrieval_qa.base import RetrievalQA  # retrieval QA chain: answers questions over retrieved documents
from langchain.chat_models.openai import ChatOpenAI  # OpenAI chat model
from langchain.document_loaders import CSVLoader  # document loader for data stored as CSV
from langchain.indexes import VectorstoreIndexCreator  # vector store index creator
from langchain.vectorstores import DocArrayInMemorySearch  # in-memory vector store

In [20]:

# Load the Chinese data
file = './data/product_data.csv'
loader = CSVLoader(file_path=file)
data = loader.load()

In [21]:

# Inspect the data
import pandas as pd
test_data = pd.read_csv(file,header=None)
test_data

Out[21]:

	0	1
0	product_name	description
1	全自动咖啡机	规格:\n大型 - 尺寸：13.8'' x 17.3''。\n中型 - 尺寸：11.5'' ...
2	电动牙刷	规格:\n一般大小 - 高度：9.5''，宽度：1''。\n\n为什么我们热爱它:\n我们的...
3	橙味维生素C泡腾片	规格:\n每盒含有20片。\n\n为什么我们热爱它:\n我们的橙味维生素C泡腾片是快速补充维...
4	无线蓝牙耳机	规格:\n单个耳机尺寸：1.5'' x 1.3''。\n\n为什么我们热爱它:\n这款无线蓝...
5	瑜伽垫	规格:\n尺寸：24'' x 68''。\n\n为什么我们热爱它:\n我们的瑜伽垫拥有出色的...
6	防水运动手表	规格:\n表盘直径：40mm。\n\n为什么我们热爱它:\n这款防水运动手表配备了心率监测和...
7	书籍:《机器学习基础》	规格:\n页数：580页。\n\n为什么我们热爱它:\n《机器学习基础》以易懂的语言讲解了机...
8	空气净化器	规格:\n尺寸：15'' x 15'' x 20''。\n\n为什么我们热爱它:\n我们的空...
9	陶瓷保温杯	规格:\n容量：350ml。\n\n为什么我们热爱它:\n我们的陶瓷保温杯设计优雅，保温效果...
10	宠物自动喂食器	规格:\n尺寸：14'' x 9'' x 15''。\n\n为什么我们热爱它:\n我们的宠物...
11	高清电视机	规格:\n尺寸：50''。\n\n为什么我们热爱它:\n我们的高清电视机拥有出色的画质和强大...
12	旅行背包	规格:\n尺寸：18'' x 12'' x 6''。\n\n为什么我们热爱它:\n我们的旅行...
13	太阳能庭院灯	规格:\n高度：18''。\n\n为什么我们热爱它:\n我们的太阳能庭院灯无需电源，只需将其...
14	厨房刀具套装	规格:\n一套包括8把刀。\n\n为什么我们热爱它:\n我们的厨房刀具套装由专业级不锈钢制成...
15	迷你无线蓝牙音箱	规格:\n直径：3''，高度：2''。\n\n为什么我们热爱它:\n我们的迷你无线蓝牙音箱体...
16	抗菌洗手液	规格:\n容量：500ml。\n\n为什么我们热爱它:\n我们的抗菌洗手液含有天然植物精华，...
17	纯棉T恤	规格:\n尺码：S, M, L, XL, XXL。\n\n为什么我们热爱它:\n我们的纯棉T...
18	自动咖啡机	规格:\n尺寸：12'' x 8'' x 14''。\n\n为什么我们热爱它:\n我们的自动...
19	摄像头保护套	规格:\n适用于各种品牌和型号的摄像头。\n\n为什么我们热爱它:\n我们的摄像头保护套可以...
20	玻璃保护膜	规格:\n适用于各种尺寸的手机屏幕。\n\n为什么我们热爱它:\n我们的玻璃保护膜可以有效防...
21	儿童益智玩具	规格:\n适合3岁以上的儿童。\n\n为什么我们热爱它:\n我们的儿童益智玩具设计独特，色彩...
22	迷你书架	规格:\n尺寸：20'' x 8'' x 24''。\n\n为什么我们热爱它:\n我们的迷你...
23	防滑瑜伽垫	规格:\n尺寸：72'' x 24''。\n\n为什么我们热爱它:\n我们的防滑瑜伽垫采用高...
24	LED台灯	规格:\n尺寸：6'' x 6'' x 18''。\n\n为什么我们热爱它:\n我们的LED...
25	水晶酒杯	规格:\n容量：250ml。\n\n为什么我们热爱它:\n我们的水晶酒杯采用高品质水晶玻璃制...

In [22]:

'''
Specify the vector store class, then build the index from a list of document loaders.
'''
index = VectorstoreIndexCreator(
    embedding=embeddings,
    vectorstore_cls=DocArrayInMemorySearch
).from_loaders([loader])

In [23]:

# Create the retrieval QA chain by specifying the language model, chain type, retriever, and verbosity
llm = ChatOpenAI(temperature = 0.0)
qa = RetrievalQA.from_chain_type(
    llm=llm, 
    chain_type="stuff", 
    retriever=index.vectorstore.as_retriever(), 
    verbose=True,
    chain_type_kwargs = {
        "document_separator": "<<<<>>>>>"
    }
)

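Creating evaluation data points

The first step is to decide which data points we want to evaluate the application on. We will cover a few different ways to do this.

1. Come up with good data points by hand: look through some of the data, then write example questions and answers to use for evaluation later.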

In [24]:

data[10]  # look at a few documents to get a sense of what they contain

Out[24]:

Document(page_content="product_name: 高清电视机\ndescription: 规格:\n尺寸：50''。\n\n为什么我们热爱它:\n我们的高清电视机拥有出色的画质和强大的音效，带来沉浸式的观看体验。\n\n材质与护理:\n使用干布清洁。\n\n构造:\n由塑料、金属和电子元件制成。\n\n其他特性:\n支持网络连接，可以在线观看视频。\n配备遥控器。\n在韩国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 10})

In [25]:

data[11]

Out[25]:

Document(page_content="product_name: 旅行背包\ndescription: 规格:\n尺寸：18'' x 12'' x 6''。\n\n为什么我们热爱它:\n我们的旅行背包拥有多个实用的内外袋，轻松装下您的必需品，是短途旅行的理想选择。\n\n材质与护理:\n可以手洗，自然晾干。\n\n构造:\n由防水尼龙制成。\n\n其他特性:\n附带可调节背带和安全锁。\n在中国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 11})

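The first document above describes the HD television (高清电视机) and the second the travel backpack (旅行背包). From these details we can write some example queries and answers.

Creating test case data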

In [26]:

examples = [
    {
        "query": "高清电视机怎么进行护理？",
        "answer": "使用干布清洁。"
    },
    {
        "query": "旅行背包有内外袋吗？",
        "answer": "有。"
    }
]

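Generating test cases with an LLM

In [27]:

from langchain.evaluation.qa import QAGenerateChain  # QA generation chain: takes documents and creates a question-answer pair from each

Because the PROMPT used by the QAGenerateChain class is in English, we subclass QAGenerateChain and append "请使用中文输出" ("Please respond in Chinese") to the PROMPT.

Below is the source code of the QAGenerateChain class from generate_chain.py.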

In [28]:

"""LLM Chain specifically for generating examples for question answering."""
from __future__ import annotations

from typing import Any

from langchain.base_language import BaseLanguageModel
from langchain.chains.llm import LLMChain
from langchain.evaluation.qa.generate_prompt import PROMPT

class QAGenerateChain(LLMChain):
    """LLM Chain specifically for generating examples for question answering."""

    @classmethod
    def from_llm(cls, llm: BaseLanguageModel, **kwargs: Any) -> QAGenerateChain:
        """Load QA Generate Chain from LLM."""
        return cls(llm=llm, prompt=PROMPT, **kwargs)

In [29]:

PROMPT

Out[29]:

PromptTemplate(input_variables=['doc'], template='You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\n{doc}\n<End Document>')

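We can see that PROMPT is in English. Next we append "请使用中文输出" ("Please respond in Chinese") to it.

In [30]:

# Below is the source from langchain.evaluation.qa.generate_prompt; we append "请使用中文输出" to the end of the template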
# flake8: noqa
from langchain.output_parsers.regex import RegexParser
from langchain.prompts import PromptTemplate

template = """You are a teacher coming up with questions to ask on a quiz. 
Given the following document, please generate a question and answer based on that document.

Example Format:
<Begin Document>
...
<End Document>
QUESTION: question here
ANSWER: answer here

These questions should be detailed and be based explicitly on information in the document. Begin!

<Begin Document>
{doc}
<End Document>
请使用中文输出。
"""
output_parser = RegexParser(
    regex=r"QUESTION: (.*?)\nANSWER: (.*)", output_keys=["query", "answer"]
)
PROMPT = PromptTemplate(
    input_variables=["doc"], template=template, output_parser=output_parser
)

PROMPT

Out[30]:

PromptTemplate(input_variables=['doc'], output_parser=RegexParser(regex='QUESTION: (.*?)\\nANSWER: (.*)', output_keys=['query', 'answer']), template='You are a teacher coming up with questions to ask on a quiz. \nGiven the following document, please generate a question and answer based on that document.\n\nExample Format:\n<Begin Document>\n...\n<End Document>\nQUESTION: question here\nANSWER: answer here\n\nThese questions should be detailed and be based explicitly on information in the document. Begin!\n\n<Begin Document>\n{doc}\n<End Document>\n请使用中文输出。\n')
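
To see what the regular expression in the output parser captures, the same pattern can be tried with Python's standard re module. This is a sketch only: the helper name `parse_qa` is hypothetical, and how RegexParser applies the pattern internally is an assumption here.

```python
import re

# Same pattern as the one passed to RegexParser above.
PATTERN = re.compile(r"QUESTION: (.*?)\nANSWER: (.*)")


def parse_qa(text: str) -> dict:
    """Extract the question and answer from 'QUESTION: ...' / 'ANSWER: ...' text."""
    match = PATTERN.search(text)
    if match is None:
        raise ValueError(f"could not parse output: {text!r}")
    return {"query": match.group(1), "answer": match.group(2)}


sample = "QUESTION: 这款瑜伽垫的尺寸是多少？\nANSWER: 尺寸为24'' x 68''。"
print(parse_qa(sample))
```

Because `.` does not match a newline by default, this pattern keeps only the first line of a multi-line answer. If the model does not echo the literal QUESTION and ANSWER labels, the pattern does not match at all.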

In [31]:

# Subclass QAGenerateChain so that from_llm uses the modified PROMPT defined above
class MyQAGenerateChain(QAGenerateChain):
    """LLM Chain specifically for generating examples for question answering."""

    @classmethod
    def from_llm(cls, llm: BaseLanguageModel, **kwargs: Any) -> QAGenerateChain:
        """Load QA Generate Chain from LLM."""
        return cls(llm=llm, prompt=PROMPT, **kwargs)
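
The subclassing pattern above can be reduced to plain Python: a subclass overrides only the factory classmethod so that it injects a different prompt, and inherits everything else. The classes `BaseChain` and `ChineseChain` and both prompt strings are hypothetical stand-ins, not LangChain code.

```python
BASE_PROMPT = "Generate a question and answer for: {doc}"
CHINESE_PROMPT = BASE_PROMPT + "\n请使用中文输出。"


class BaseChain:
    """Hypothetical stand-in for a chain that stores a prompt."""

    def __init__(self, prompt: str):
        self.prompt = prompt

    @classmethod
    def from_defaults(cls, **kwargs):
        """Factory that wires in the default (English) prompt."""
        return cls(prompt=BASE_PROMPT, **kwargs)


class ChineseChain(BaseChain):
    """Same behavior, but the factory injects the Chinese-output prompt."""

    @classmethod
    def from_defaults(cls, **kwargs):
        return cls(prompt=CHINESE_PROMPT, **kwargs)


chain = ChineseChain.from_defaults()
print(chain.prompt.format(doc="..."))
```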

In [32]:

example_gen_chain = MyQAGenerateChain.from_llm(ChatOpenAI())  # build the chain by passing in the ChatOpenAI language model

In [33]:

data[:5]

Out[33]:

[Document(page_content="product_name: 全自动咖啡机\ndescription: 规格:\n大型 - 尺寸：13.8'' x 17.3''。\n中型 - 尺寸：11.5'' x 15.2''。\n\n为什么我们热爱它:\n这款全自动咖啡机是爱好者的理想选择。 一键操作，即可研磨豆子并沏制出您喜爱的咖啡。它的耐用性和一致性使它成为家庭和办公室的理想选择。\n\n材质与护理:\n清洁时只需轻擦。\n\n构造:\n由高品质不锈钢制成。\n\n其他特性:\n内置研磨器和滤网。\n预设多种咖啡模式。\n在中国制造。\n\n有问题？ 请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 0}),
 Document(page_content="product_name: 电动牙刷\ndescription: 规格:\n一般大小 - 高度：9.5''，宽度：1''。\n\n为什么我们热爱它:\n我们的电动牙刷采用先进的刷头设计和强大的电机，为您提供超凡的清洁力和舒适的刷牙体验。\n\n材质与护理:\n不可水洗，只需用湿布清洁。\n\n构造:\n由食品级塑料和尼龙刷毛制成。\n\n其他特性:\n具有多种清洁模式和定时功能。\nUSB充电。\n在日本制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 1}),
 Document(page_content='product_name: 橙味维生素C泡腾片\ndescription: 规格:\n每盒含有20片。\n\n为什么我们热爱它:\n我们的橙味维生素C泡腾片是快速补充维生素C的理想方式。每片含有500mg的维生素C，可以帮助提升免疫力，保护您的健康。\n\n材质与护理:\n请存放在阴凉干燥的地方，避免阳光直射。\n\n构造:\n主要成分为维生素C和柠檬酸钠。\n\n其他特性:\n含有天然橙味。\n易于携带。\n在美国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。', metadata={'source': './data/product_data.csv', 'row': 2}),
 Document(page_content="product_name: 无线蓝牙耳机\ndescription: 规格:\n单个耳机尺寸：1.5'' x 1.3''。\n\n为什么我们热爱它:\n这款无线蓝牙耳机配备了降噪技术和长达8小时的电池续航力，让您无论在哪里都可以享受无障碍的音乐体验。\n\n材质与护理:\n只需用湿布清洁。\n\n构造:\n由耐用的塑料和金属构成，配备有软质耳塞。\n\n其他特性:\n快速充电功能。\n内置麦克风，支持接听电话。\n在韩国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 3}),
 Document(page_content="product_name: 瑜伽垫\ndescription: 规格:\n尺寸：24'' x 68''。\n\n为什么我们热爱它:\n我们的瑜伽垫拥有出色的抓地力和舒适度，无论是做瑜伽还是健身，都是理想的选择。\n\n材质与护理:\n可用清水清洁，自然晾干。\n\n构造:\n由环保PVC材料制成。\n\n其他特性:\n附带便携包和绑带。\n在印度制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 4})]

In [34]:

new_examples = example_gen_chain.apply_and_parse(
    [{"doc": t} for t in data[:5]]
)  # this lets us create many examples

/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/langchain/chains/llm.py:367: UserWarning: The apply_and_parse method is deprecated, instead pass an output parser directly to LLMChain.
  warnings.warn(

In [35]:

new_examples  # inspect the generated examples

Out[35]:

[{'query': '这款全自动咖啡机的材质是什么？它的构造是什么？',
  'answer': '这款全自动咖啡机由高品质不锈钢制成，构造包括内置研磨器和滤网。'},
 {'query': '这款电动牙刷的材质是什么？它是由什么制成的？', 'answer': '这款电动牙刷由食品级塑料和尼龙刷毛制成。'},
 {'query': '这种产品的主要成分是什么？', 'answer': '主要成分为维生素C和柠檬酸钠。'},
 {'query': '这款无线蓝牙耳机具有哪些特性和优势？',
  'answer': '这款无线蓝牙耳机配备了降噪技术和长达8小时的电池续航力，让您无论在哪里都可以享受无障碍的音乐体验。它由耐用的塑料和金属构成，配备有软质耳塞。此外，它还具有快速充电功能、内置麦克风以及支持接听电话的功能。'},
 {'query': '这款瑜伽垫的尺寸是多少？', 'answer': "尺寸为24'' x 68''。"}]

In [36]:

new_examples[0]

Out[36]:

{'query': '这款全自动咖啡机的材质是什么？它的构造是什么？',
 'answer': '这款全自动咖啡机由高品质不锈钢制成，构造包括内置研磨器和滤网。'}

In [37]:

data[0]

Out[37]:

Document(page_content="product_name: 全自动咖啡机\ndescription: 规格:\n大型 - 尺寸：13.8'' x 17.3''。\n中型 - 尺寸：11.5'' x 15.2''。\n\n为什么我们热爱它:\n这款全自动咖啡机是爱好者的理想选择。 一键操作，即可研磨豆子并沏制出您喜爱的咖啡。它的耐用性和一致性使它成为家庭和办公室的理想选择。\n\n材质与护理:\n清洁时只需轻擦。\n\n构造:\n由高品质不锈钢制成。\n\n其他特性:\n内置研磨器和滤网。\n预设多种咖啡模式。\n在中国制造。\n\n有问题？ 请随时联系我们的客户服务团队，他们会解答您的所有问题。", metadata={'source': './data/product_data.csv', 'row': 0})

Combining the examples

In [38]:

examples += new_examples

In [39]:

qa.run(examples[0]["query"])


> Entering new RetrievalQA chain...

> Finished chain.

Out[39]:

'高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。'

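3. Manual evaluation

We now have these examples, but how do we evaluate what is actually happening? The first approach is to run a single example through the chain and look at the output: we pass in a query and get back an answer. But what is going on inside? What is the actual prompt going into the language model?
Which documents does it retrieve?
What are the intermediate results?
Looking only at the final answer is often not enough to understand what went wrong, or what could go wrong, in the chain.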

In [40]:

'''
LangChain's debug mode shows the intermediate steps an example goes through as it runs through the chain.
'''
import langchain
langchain.debug = True

In [41]:

qa.run(examples[0]["query"])  # rerun the same example as above; it now prints much more information

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "高清电视机怎么进行护理？"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "高清电视机怎么进行护理？",
  "context": "product_name: 高清电视机\ndescription: 规格:\n尺寸：50''。\n\n为什么我们热爱它:\n我们的高清电视机拥有出色的画质和强大的音效，带来沉浸式的观看体验。\n\n材质与护理:\n使用干布清洁。\n\n构造:\n由塑料、金属和电子元件制成。\n\n其他特性:\n支持网络连接，可以在线观看视频。\n配备遥控器。\n在韩国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 摄像头保护套\ndescription: 规格:\n适用于各种品牌和型号的摄像头。\n\n为什么我们热爱它:\n我们的摄像头保护套可以有效保护您的摄像头不受灰尘和刮擦的影响。\n\n材质与护理:\n使用湿布擦拭。\n\n构造:\n由耐磨的合成纤维制成。\n\n其他特性:\n设计简洁，易于安装。\n在中国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 空气净化器\ndescription: 规格:\n尺寸：15'' x 15'' x 20''。\n\n为什么我们热爱它:\n我们的空气净化器采用了先进的HEPA过滤技术，能有效去除空气中的微粒和异味，为您提供清新的室内环境。\n\n材质与护理:\n清洁时使用干布擦拭。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n三档风速，附带定时功能。\n在德国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 宠物自动喂食器\ndescription: 规格:\n尺寸：14'' x 9'' x 15''。\n\n为什么我们热爱它:\n我们的宠物自动喂食器可以定时定量投放食物，让您无论在家或外出都能确保宠物的饮食。\n\n材质与护理:\n可用湿布清洁。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n配备LCD屏幕，操作简单。\n可以设置多次投食。\n在美国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。"
}
[llm/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "System: Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nproduct_name: 高清电视机\ndescription: 规格:\n尺寸：50''。\n\n为什么我们热爱它:\n我们的高清电视机拥有出色的画质和强大的音效，带来沉浸式的观看体验。\n\n材质与护理:\n使用干布清洁。\n\n构造:\n由塑料、金属和电子元件制成。\n\n其他特性:\n支持网络连接，可以在线观看视频。\n配备遥控器。\n在韩国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 摄像头保护套\ndescription: 规格:\n适用于各种品牌和型号的摄像头。\n\n为什么我们热爱它:\n我们的摄像头保护套可以有效保护您的摄像头不受灰尘和刮擦的影响。\n\n材质与护理:\n使用湿布擦拭。\n\n构造:\n由耐磨的合成纤维制成。\n\n其他特性:\n设计简洁，易于安装。\n在中国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 空气净化器\ndescription: 规格:\n尺寸：15'' x 15'' x 20''。\n\n为什么我们热爱它:\n我们的空气净化器采用了先进的HEPA过滤技术，能有效去除空气中的微粒和异味，为您提供清新的室内环境。\n\n材质与护理:\n清洁时使用干布擦拭。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n三档风速，附带定时功能。\n在德国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 宠物自动喂食器\ndescription: 规格:\n尺寸：14'' x 9'' x 15''。\n\n为什么我们热爱它:\n我们的宠物自动喂食器可以定时定量投放食物，让您无论在家或外出都能确保宠物的饮食。\n\n材质与护理:\n可用湿布清洁。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n配备LCD屏幕，操作简单。\n可以设置多次投食。\n在美国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。\nHuman: 高清电视机怎么进行护理？"
  ]
}
[llm/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOpenAI] [2.23s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "高清电视机的护理方法是使用干布清洁。",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "高清电视机的护理方法是使用干布清洁。",
            "response_metadata": {
              "token_usage": {
                "completion_tokens": 19,
                "prompt_tokens": 806,
                "total_tokens": 825
              },
              "model_name": "gpt-3.5-turbo",
              "system_fingerprint": null,
              "finish_reason": "stop",
              "logprobs": null
            },
            "type": "ai",
            "id": "run-d773d96f-cd9b-454b-a0f6-72e69dd90218-0",
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "completion_tokens": 19,
      "prompt_tokens": 806,
      "total_tokens": 825
    },
    "model_name": "gpt-3.5-turbo",
    "system_fingerprint": null
  },
  "run": null
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] [2.24s] Exiting Chain run with output:
{
  "text": "高清电视机的护理方法是使用干布清洁。"
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain] [2.24s] Exiting Chain run with output:
{
  "output_text": "高清电视机的护理方法是使用干布清洁。"
}
[chain/end] [chain:RetrievalQA] [3.15s] Exiting Chain run with output:
{
  "result": "高清电视机的护理方法是使用干布清洁。"
}

Out[41]:

'高清电视机的护理方法是使用干布清洁。'

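We can see that it first goes into the RetrievalQA chain and then into a documents chain (StuffDocumentsChain). As noted above, we are using the stuff method, and the chain passes this context on to the LLMChain; the context is built from the different documents that were retrieved. So when question answering returns a wrong result, it is often not the language model itself that failed but the retrieval step. Looking closely at the exact question and context helps debug what went wrong.
We can then go down one more level and see exactly what goes into the language model, ChatOpenAI itself. Here we see the full prompt that was passed: a system message containing the prompt used by the question-answering chain, which reads "Use the following pieces of context to answer the user's question. If you don't know the answer, just say that you don't know, don't try to make up an answer." After that comes the context inserted earlier. We also see more about the actual return value: not just an answer string but also the token usage, which tells us how many tokens the call consumed.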
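
The prompt seen in the trace can be approximated in plain Python. This is a sketch of the stuff strategy as it appears in the debug output above: the retrieved documents are joined with the configured separator and placed under the system instruction. The function name `build_stuff_prompt` is hypothetical, and the real chain builds chat messages rather than one flat string.

```python
SYSTEM_TEMPLATE = (
    "Use the following pieces of context to answer the user's question. \n"
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n"
    "----------------\n"
    "{context}"
)


def build_stuff_prompt(
    docs: list[str], question: str, separator: str = "<<<<>>>>>"
) -> str:
    """Join all retrieved documents into one context and append the question."""
    context = separator.join(docs)
    return f"System: {SYSTEM_TEMPLATE.format(context=context)}\nHuman: {question}"


print(build_stuff_prompt(["doc A", "doc B"], "高清电视机怎么进行护理？"))
```

Printing the assembled prompt for a failing query makes it easy to check whether the relevant document was retrieved at all.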

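Because this is a relatively simple chain, we can now see the final response, "高清电视机的护理方法是使用干布清洁。" ("The HD television is cared for by cleaning it with a dry cloth."), being passed back up through the chain and returned to the user. We have just walked through how to inspect and debug what happens for a single input to the chain.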
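
The token usage reported in the trace can also be totaled across several calls. The sketch below assumes each record is a dict shaped like the [llm/end] output above, with an "llm_output" entry containing "token_usage"; the helper name `total_token_usage` is hypothetical.

```python
def total_token_usage(llm_outputs: list[dict]) -> dict:
    """Sum the token counts over a list of [llm/end]-style output records."""
    totals = {"completion_tokens": 0, "prompt_tokens": 0, "total_tokens": 0}
    for output in llm_outputs:
        # "llm_output" may be absent or None, so fall back to an empty dict.
        usage = (output.get("llm_output") or {}).get("token_usage", {})
        for key in totals:
            totals[key] += usage.get(key, 0)
    return totals


# Values copied from the trace above.
trace_output = {
    "llm_output": {
        "token_usage": {
            "completion_tokens": 19,
            "prompt_tokens": 806,
            "total_tokens": 825,
        },
        "model_name": "gpt-3.5-turbo",
    }
}
print(total_token_usage([trace_output]))
```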

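3.1 How to evaluate newly created examples

As with creating the examples, we can run the chain over all of them, then look at the outputs and try to work out what happened and whether each answer is correct.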

In [42]:

# We need predictions for all the examples; turn off debug mode so that not everything is printed to the screen
langchain.debug = False
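
Running the chain over every example and lining up each prediction with its reference answer could look like the sketch below. The names `collect_predictions` and `fake_answer` are hypothetical; `fake_answer` stands in for the chain so the sketch runs without an API key.

```python
from typing import Callable


def collect_predictions(
    answer_fn: Callable[[str], str], examples: list[dict]
) -> list[dict]:
    """Run answer_fn on every query and pair the result with the reference answer."""
    return [
        {"query": ex["query"], "answer": ex["answer"], "result": answer_fn(ex["query"])}
        for ex in examples
    ]


def fake_answer(query: str) -> str:
    # Hypothetical stand-in for the QA chain.
    return "Yes" if "pockets" in query else "I don't know."


rows = collect_predictions(
    fake_answer,
    [{"query": "Do the Cozy Comfort Pullover Set have side pockets?", "answer": "Yes"}],
)
for row in rows:
    print(f"Q: {row['query']}\nReference: {row['answer']}\nPredicted: {row['result']}\n")
```

With the chain from this chapter, `answer_fn` could be `qa.run`, as used in the cells above.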

3.2 Chinese version

In [43]:

'''
LangChain's debug mode shows the intermediate steps an example goes through as it runs through the chain.
'''
import langchain
langchain.debug = True

In [44]:

qa.run(examples[0]["query"])  # rerun the same example as above; it now prints much more information

[chain/start] [chain:RetrievalQA] Entering Chain run with input:
{
  "query": "高清电视机怎么进行护理？"
}
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain] Entering Chain run with input:
[inputs]
[chain/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] Entering Chain run with input:
{
  "question": "高清电视机怎么进行护理？",
  "context": "product_name: 高清电视机\ndescription: 规格:\n尺寸：50''。\n\n为什么我们热爱它:\n我们的高清电视机拥有出色的画质和强大的音效，带来沉浸式的观看体验。\n\n材质与护理:\n使用干布清洁。\n\n构造:\n由塑料、金属和电子元件制成。\n\n其他特性:\n支持网络连接，可以在线观看视频。\n配备遥控器。\n在韩国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 摄像头保护套\ndescription: 规格:\n适用于各种品牌和型号的摄像头。\n\n为什么我们热爱它:\n我们的摄像头保护套可以有效保护您的摄像头不受灰尘和刮擦的影响。\n\n材质与护理:\n使用湿布擦拭。\n\n构造:\n由耐磨的合成纤维制成。\n\n其他特性:\n设计简洁，易于安装。\n在中国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 空气净化器\ndescription: 规格:\n尺寸：15'' x 15'' x 20''。\n\n为什么我们热爱它:\n我们的空气净化器采用了先进的HEPA过滤技术，能有效去除空气中的微粒和异味，为您提供清新的室内环境。\n\n材质与护理:\n清洁时使用干布擦拭。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n三档风速，附带定时功能。\n在德国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 宠物自动喂食器\ndescription: 规格:\n尺寸：14'' x 9'' x 15''。\n\n为什么我们热爱它:\n我们的宠物自动喂食器可以定时定量投放食物，让您无论在家或外出都能确保宠物的饮食。\n\n材质与护理:\n可用湿布清洁。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n配备LCD屏幕，操作简单。\n可以设置多次投食。\n在美国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。"
}
[llm/start] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOpenAI] Entering LLM run with input:
{
  "prompts": [
    "System: Use the following pieces of context to answer the user's question. \nIf you don't know the answer, just say that you don't know, don't try to make up an answer.\n----------------\nproduct_name: 高清电视机\ndescription: 规格:\n尺寸：50''。\n\n为什么我们热爱它:\n我们的高清电视机拥有出色的画质和强大的音效，带来沉浸式的观看体验。\n\n材质与护理:\n使用干布清洁。\n\n构造:\n由塑料、金属和电子元件制成。\n\n其他特性:\n支持网络连接，可以在线观看视频。\n配备遥控器。\n在韩国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 摄像头保护套\ndescription: 规格:\n适用于各种品牌和型号的摄像头。\n\n为什么我们热爱它:\n我们的摄像头保护套可以有效保护您的摄像头不受灰尘和刮擦的影响。\n\n材质与护理:\n使用湿布擦拭。\n\n构造:\n由耐磨的合成纤维制成。\n\n其他特性:\n设计简洁，易于安装。\n在中国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 空气净化器\ndescription: 规格:\n尺寸：15'' x 15'' x 20''。\n\n为什么我们热爱它:\n我们的空气净化器采用了先进的HEPA过滤技术，能有效去除空气中的微粒和异味，为您提供清新的室内环境。\n\n材质与护理:\n清洁时使用干布擦拭。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n三档风速，附带定时功能。\n在德国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。<<<<>>>>>product_name: 宠物自动喂食器\ndescription: 规格:\n尺寸：14'' x 9'' x 15''。\n\n为什么我们热爱它:\n我们的宠物自动喂食器可以定时定量投放食物，让您无论在家或外出都能确保宠物的饮食。\n\n材质与护理:\n可用湿布清洁。\n\n构造:\n由塑料和电子元件制成。\n\n其他特性:\n配备LCD屏幕，操作简单。\n可以设置多次投食。\n在美国制造。\n\n有问题？请随时联系我们的客户服务团队，他们会解答您的所有问题。\nHuman: 高清电视机怎么进行护理？"
  ]
}
[llm/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain > llm:ChatOpenAI] [1.98s] Exiting LLM run with output:
{
  "generations": [
    [
      {
        "text": "高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。",
        "generation_info": {
          "finish_reason": "stop",
          "logprobs": null
        },
        "type": "ChatGeneration",
        "message": {
          "lc": 1,
          "type": "constructor",
          "id": [
            "langchain",
            "schema",
            "messages",
            "AIMessage"
          ],
          "kwargs": {
            "content": "高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。",
            "response_metadata": {
              "token_usage": {
                "completion_tokens": 66,
                "prompt_tokens": 806,
                "total_tokens": 872
              },
              "model_name": "gpt-3.5-turbo",
              "system_fingerprint": null,
              "finish_reason": "stop",
              "logprobs": null
            },
            "type": "ai",
            "id": "run-828d71a4-f96b-4c12-a3f3-a663ff915249-0",
            "tool_calls": [],
            "invalid_tool_calls": []
          }
        }
      }
    ]
  ],
  "llm_output": {
    "token_usage": {
      "completion_tokens": 66,
      "prompt_tokens": 806,
      "total_tokens": 872
    },
    "model_name": "gpt-3.5-turbo",
    "system_fingerprint": null
  },
  "run": null
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain > chain:LLMChain] [1.99s] Exiting Chain run with output:
{
  "text": "高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。"
}
[chain/end] [chain:RetrievalQA > chain:StuffDocumentsChain] [1.99s] Exiting Chain run with output:
{
  "output_text": "高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。"
}
[chain/end] [chain:RetrievalQA] [2.36s] Exiting Chain run with output:
{
  "result": "高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。"
}

Out[44]:

'高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂来清洁高清电视机。'

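How to evaluate the newly created examples¶

Just as when creating the examples, we can run the chain over all of them, then inspect the outputs and try to work out what happened and whether each answer is correct.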

In [45]:

# We need predictions for all the examples; turn off debug mode so that not everything is printed to the screen
langchain.debug = False

4. Evaluating the examples with an LLM¶

In [46]:

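# Create predictions for all the examples.
# Note: Chain.apply is deprecated since langchain 0.1.0; the warning in the output below suggests using batch instead.
predictions = qa.apply(examples)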


> Entering new RetrievalQA chain...

/Users/lta/anaconda3/envs/cookbook/lib/python3.10/site-packages/langchain_core/_api/deprecation.py:119: LangChainDeprecationWarning: The method `Chain.apply` was deprecated in langchain 0.1.0 and will be removed in 0.3.0. Use batch instead.
  warn_deprecated(

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.
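
The deprecation warning in the output above recommends `batch` in place of `Chain.apply`. The following is a minimal sketch of a helper that prefers `batch` when the chain offers it and falls back to `apply` otherwise. `run_all`, `FakeChain`, `examples_demo` and `predictions_demo` are hypothetical names used only for illustration; the stub stands in for the real `qa` chain so that the snippet runs without an API key.

```python
from typing import Any, Dict, List


def run_all(chain: Any, inputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Run a chain over every input, preferring `batch` over the deprecated `apply`."""
    if hasattr(chain, "batch"):
        return chain.batch(inputs)
    return chain.apply(inputs)


class FakeChain:
    """Hypothetical stand-in for the RetrievalQA chain: echoes each input plus a dummy result."""

    def batch(self, inputs: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
        return [{**item, "result": "stub answer"} for item in inputs]


# One example taken from the notebook's data; the "result" value is a stub, not model output.
examples_demo = [{"query": "高清电视机怎么进行护理？", "answer": "使用干布清洁。"}]
predictions_demo = run_all(FakeChain(), examples_demo)
print(predictions_demo[0]["result"])
```

With the real chain the call would be `run_all(qa, examples)`. Whether the returned dictionaries carry exactly the same keys as those from `apply` should be checked against the installed LangChain version.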

In [47]:

'''
Evaluate the predictions: import QAEvalChain, the question-answering evaluation chain, which is built from a language model
'''
from langchain.evaluation.qa import QAEvalChain  # QA evaluation chain

In [48]:

# Use ChatGPT as the evaluator
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [49]:

graded_outputs = eval_chain.evaluate(examples, predictions)  # call evaluate on the chain to grade the predictions

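4.1 Evaluation approach¶

Each example involves three pieces of text. The real answer was generated by a language model that had the whole document in front of it. The predicted answer comes from the QA chain, which retrieves with embeddings and a vector database, passes the retrieved chunks to the language model, and has it attempt an answer. The grade is also generated by a language model: the evaluation chain is asked to judge whether the prediction is correct or incorrect. Looping over all the examples and printing these three items gives a detailed view of each one.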
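
The grading step described above can be sketched as follows, assuming a grader that fills a prompt with the question, the real answer and the predicted answer, then asks a model for a verdict. The template text, `grade_all` and `fake_llm` are hypothetical; they are not the prompt or internals of `QAEvalChain`, and the stub model always answers CORRECT so that the snippet runs offline.

```python
from typing import Callable, Dict, List

# Hypothetical grading template; the prompt actually used by QAEvalChain may differ.
GRADE_TEMPLATE = (
    "You are a teacher grading a quiz.\n"
    "QUESTION: {query}\n"
    "STUDENT ANSWER: {result}\n"
    "TRUE ANSWER: {answer}\n"
    "Reply with CORRECT or INCORRECT."
)


def grade_all(
    examples: List[Dict[str, str]],
    predictions: List[Dict[str, str]],
    llm: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Build one grading prompt per example and collect the model's verdicts."""
    graded = []
    for example, prediction in zip(examples, predictions):
        prompt = GRADE_TEMPLATE.format(
            query=example["query"],
            answer=example["answer"],
            result=prediction["result"],
        )
        graded.append({"results": llm(prompt).strip()})
    return graded


seen_prompts: List[str] = []


def fake_llm(prompt: str) -> str:
    """Stub model call: records the prompt and always returns the same verdict."""
    seen_prompts.append(prompt)
    return " CORRECT\n"


examples_demo = [{"query": "旅行背包有内外袋吗？", "answer": "有。"}]
predictions_demo = [{"result": "是的，旅行背包有多个实用的内外袋。"}]
graded_demo = grade_all(examples_demo, predictions_demo, fake_llm)
print(graded_demo)
```

Replacing `fake_llm` with a real model call is what turns this into an actual evaluator; the rest is string formatting and light parsing.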

In [54]:

# Having passed in the examples and predictions, we get back a list of graded outputs; loop over them and print each answer
for i, eg in enumerate(examples):
    print(f"例 {i}:")
    print("问题: " + predictions[i]['query'])
    print("真实答案: " + predictions[i]['answer'])
    print("预测答案: " + predictions[i]['result'])
    print("预测成绩: " + graded_outputs[i]['results'])
    print()

例 0:
问题: 高清电视机怎么进行护理？
真实答案: 使用干布清洁。
预测答案: 高清电视机的护理方法是使用干布清洁。您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂直接清洁，以免损坏电视机。
预测成绩: CORRECT

例 1:
问题: 旅行背包有内外袋吗？
真实答案: 有。
预测答案: 是的，旅行背包有多个实用的内外袋，可以轻松装下您的必需品，是短途旅行的理想选择。
预测成绩: CORRECT

例 2:
问题: 这款全自动咖啡机的材质是什么？它的构造是什么？
真实答案: 这款全自动咖啡机由高品质不锈钢制成，构造包括内置研磨器和滤网。
预测答案: 这款全自动咖啡机的材质是高品质不锈钢制成的。它的构造包括内置研磨器和滤网。
预测成绩: CORRECT

例 3:
问题: 这款电动牙刷的材质是什么？它是由什么制成的？
真实答案: 这款电动牙刷由食品级塑料和尼龙刷毛制成。
预测答案: 这款电动牙刷的材质是食品级塑料和尼龙刷毛制成的。
预测成绩: CORRECT

例 4:
问题: 这种产品的主要成分是什么？
真实答案: 主要成分为维生素C和柠檬酸钠。
预测答案: 橙味维生素C泡腾片的主要成分是维生素C和柠檬酸钠。
预测成绩: CORRECT

例 5:
问题: 这款无线蓝牙耳机具有哪些特性和优势？
真实答案: 这款无线蓝牙耳机配备了降噪技术和长达8小时的电池续航力，让您无论在哪里都可以享受无障碍的音乐体验。它由耐用的塑料和金属构成，配备有软质耳塞。此外，它还具有快速充电功能、内置麦克风以及支持接听电话的功能。
预测答案: 这款无线蓝牙耳机具有以下特性和优势：
1. 配备降噪技术，提供无障碍的音乐体验。
2. 长达8小时的电池续航力，让您可以长时间享受音乐。
3. 快速充电功能，方便快速充电。
4. 内置麦克风，支持接听电话。
5. 由耐用的塑料和金属构成，配备软质耳塞，提供舒适的佩戴体验。
6. 适合日常使用，制造于韩国。
预测成绩: CORRECT

例 6:
问题: 这款瑜伽垫的尺寸是多少？
真实答案: 尺寸为24'' x 68''。
预测答案: 这款瑜伽垫的尺寸是 24'' x 68''。
预测成绩: CORRECT

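4.2 Analysis of results¶

Every example is graded as correct. Take the first one, example 0 in the output above. The question is how to care for the HD television. The real answer, which we created, is to clean it with a dry cloth. The predicted answer says the same and adds a suggestion to avoid wet cloths or cleaning agents. The core content agrees with the real answer, so the evaluator grades the prediction CORRECT.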
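
Reading grades one at a time does not scale much beyond a handful of examples. The sketch below tallies them, assuming each graded item is a dict with a `results` key as in the printing loop above. `summarize_grades` is a hypothetical helper and `graded_demo` is made-up sample data, not output from the chain.

```python
from collections import Counter
from typing import Dict, List, Tuple


def summarize_grades(graded: List[Dict[str, str]]) -> Tuple[Counter, float, List[int]]:
    """Count the grades, compute the share graded CORRECT, and list indices needing manual review."""
    labels = [item["results"].strip().upper() for item in graded]
    counts = Counter(labels)
    accuracy = counts["CORRECT"] / len(labels) if labels else 0.0
    needs_review = [i for i, label in enumerate(labels) if label != "CORRECT"]
    return counts, accuracy, needs_review


# Made-up grades for illustration only.
graded_demo = [
    {"results": "CORRECT"},
    {"results": "CORRECT"},
    {"results": "INCORRECT"},
]
counts, accuracy, needs_review = summarize_grades(graded_demo)
print(counts, accuracy, needs_review)
```

Anything listed in `needs_review` is a candidate for the manual inspection described earlier in the chapter, since the grader itself is a language model and can be wrong.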

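Advantages of evaluating with a model¶

The answers are arbitrary strings. No single string is the best possible answer; many variants exist, and as long as they carry the same meaning they should be graded as equivalent. Exact matching with regular expressions discards that semantic information, and many of the evaluation metrics proposed so far are not good enough. One of the most interesting and popular approaches at present is to use a language model to do the grading.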
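
To illustrate why string matching falls short, the sketch below applies exact matching and substring matching to three real/predicted answer pairs copied from the output above, all of which the evaluator graded CORRECT. The variable names are hypothetical.

```python
# (real answer, predicted answer) pairs copied from examples 0, 1 and 6 above.
pairs = [
    ("使用干布清洁。",
     "高清电视机的护理方法是使用干布清洁。您可以使用干布轻轻擦拭电视机表面，避免使用湿布或清洁剂直接清洁，以免损坏电视机。"),
    ("有。",
     "是的，旅行背包有多个实用的内外袋，可以轻松装下您的必需品，是短途旅行的理想选择。"),
    ("尺寸为24'' x 68''。",
     "这款瑜伽垫的尺寸是 24'' x 68''。"),
]

# Exact match: the prediction must be identical to the real answer.
exact_match = [real == predicted for real, predicted in pairs]

# Substring match: the real answer must appear verbatim inside the prediction.
substring_match = [real in predicted for real, predicted in pairs]

print("exact:", exact_match)
print("substring:", substring_match)
```

Neither rule accepts all three pairs, even though each prediction has the same meaning as its real answer. That gap is what a model-based grader is meant to close.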

3.3 Evaluating the examples with an LLM¶

In [55]:

predictions = qa.apply(examples)  # create predictions for all the examples


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.


> Entering new RetrievalQA chain...

> Finished chain.

In [56]:

'''
Evaluate the predictions: import QAEvalChain, the question-answering evaluation chain, which is built from a language model
'''
from langchain.evaluation.qa import QAEvalChain  # QA evaluation chain

In [57]:

# Use ChatGPT as the evaluator
llm = ChatOpenAI(temperature=0)
eval_chain = QAEvalChain.from_llm(llm)

In [58]:

graded_outputs = eval_chain.evaluate(examples, predictions)  # call evaluate on the chain to grade the predictions

Evaluation approach¶

In [60]:

# Having passed in the examples and predictions, we get back a list of graded outputs; loop over them and print each answer
for i, eg in enumerate(examples):
    print(f"Example {i}:")
    print("Question: " + predictions[i]['query'])
    print("Real Answer: " + predictions[i]['answer'])
    print("Predicted Answer: " + predictions[i]['result'])
    print("Predicted Grade: " + graded_outputs[i]['results'])
    print()

Example 0:
Question: 高清电视机怎么进行护理？
Real Answer: 使用干布清洁。
Predicted Answer: 高清电视机的护理方法是使用干布清洁。所以您可以使用干布轻轻擦拭电视机表面，避免使用湿布或化学清洁剂，以免损坏电视机。
Predicted Grade: CORRECT

Example 1:
Question: 旅行背包有内外袋吗？
Real Answer: 有。
Predicted Answer: 是的，旅行背包有多个实用的内外袋，可以轻松装下您的必需品，是短途旅行的理想选择。
Predicted Grade: CORRECT

Example 2:
Question: 这款全自动咖啡机的材质是什么？它的构造是什么？
Real Answer: 这款全自动咖啡机由高品质不锈钢制成，构造包括内置研磨器和滤网。
Predicted Answer: 这款全自动咖啡机的材质是高品质不锈钢制成的，构造包括内置研磨器和滤网。
Predicted Grade: CORRECT

Example 3:
Question: 这款电动牙刷的材质是什么？它是由什么制成的？
Real Answer: 这款电动牙刷由食品级塑料和尼龙刷毛制成。
Predicted Answer: 这款电动牙刷由食品级塑料和尼龙刷毛制成。
Predicted Grade: CORRECT

Example 4:
Question: 这种产品的主要成分是什么？
Real Answer: 主要成分为维生素C和柠檬酸钠。
Predicted Answer: 橙味维生素C泡腾片的主要成分是维生素C和柠檬酸钠。
Predicted Grade: CORRECT

Example 5:
Question: 这款无线蓝牙耳机具有哪些特性和优势？
Real Answer: 这款无线蓝牙耳机配备了降噪技术和长达8小时的电池续航力，让您无论在哪里都可以享受无障碍的音乐体验。它由耐用的塑料和金属构成，配备有软质耳塞。此外，它还具有快速充电功能、内置麦克风以及支持接听电话的功能。
Predicted Answer: 这款无线蓝牙耳机具有以下特性和优势：
1. 配备降噪技术，提供更清晰的音质和音乐体验。
2. 长达8小时的电池续航力，让您可以持续享受音乐，无需频繁充电。
3. 快速充电功能，方便快速充电。
4. 内置麦克风，支持接听电话，增加了使用的便利性。
5. 由耐用的塑料和金属构成，配备软质耳塞，提供舒适的佩戴体验。
6. 只需用湿布清洁，方便日常护理。
7. 在韩国制造，品质有保证。
Predicted Grade: CORRECT

Example 6:
Question: 这款瑜伽垫的尺寸是多少？
Real Answer: 尺寸为24'' x 68''。
Predicted Answer: 这款瑜伽垫的尺寸是 24'' x 68''。
Predicted Grade: CORRECT

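Analysis of results¶

Every example is graded as correct. Take example 1 above. The question is whether the travel backpack has inner and outer pockets. The real answer, which we created, is yes. The predicted answer is that yes, the travel backpack has several practical inner and outer pockets that easily hold your essentials. This agrees with the real answer, so the evaluator grades it CORRECT.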
