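Rerankers
In this notebook, we will show how to use RedisVL to rerank search results (documents, chunks, or records). RedisVL currently supports reranking through:
- A reranker that uses pre-trained cross-encoders, which can be loaded from the Hugging Face cross-encoder models or from other Hugging Face models that implement a cross-encoder function (for example, BAAI/bge-reranker-base).
- The Cohere /rerank API.
- The VoyageAI /rerank API.
Before running this notebook, be sure to:
- Have redisvl installed and that environment active for this notebook.
- Have a running Redis Stack instance with RediSearch > 2.4 active.
For example, you can run Redis Stack locally with Docker: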
docker run -d -p 6379:6379 -p 8001:8001 redis/redis-stack:latest
This will run Redis on port 6379 and RedisInsight at http://localhost:8001.
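Before moving on, it can help to confirm the two prerequisites. The following is a minimal sketch using only the Python standard library; `installed_version` and `is_port_open` are hypothetical helper names, not part of RedisVL, and the port check only confirms that something accepts TCP connections on 6379 (it does not verify the RediSearch version).

```python
import socket
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # 6379 is the Redis port published by the docker command above.
    print("redisvl version:", installed_version("redisvl"))
    print("Redis reachable on 6379:", is_port_open("localhost", 6379))
```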
# import necessary modules
import os
Simple Reranking
Reranking provides a relevance boost to search results generated by
traditional (lexical) or semantic search strategies.
As a simple demonstration, take the passages and user query below:
query = "What is the capital of the United States?"
docs = [
"Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.",
"The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.",
"Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.",
"Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.",
"Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."
]
The goal of reranking is to refine the ordering of an initial set of search
results. With RedisVL, those results would typically come from a search
operation such as full-text or vector search.
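As a rough illustration of where reranking sits in a pipeline, here is a minimal two-stage sketch in plain Python. The names `first_stage`, `rerank`, and `jaccard` are hypothetical, and the token-overlap scoring is only a toy stand-in for a real search engine and a real reranking model; none of this is RedisVL's API.

```python
import re
from typing import Callable, List, Set, Tuple


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def first_stage(query: str, docs: List[str], k: int) -> List[str]:
    """Cheap candidate retrieval: keep the k docs sharing the most tokens with the query."""
    q = tokenize(query)
    ranked = sorted(docs, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def jaccard(query: str, doc: str) -> float:
    """Toy relevance score: token overlap divided by token union."""
    q, d = tokenize(query), tokenize(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    limit: int,
) -> Tuple[List[str], List[float]]:
    """Score every candidate, sort by descending score, and keep the top `limit`."""
    scored = [(doc, score_fn(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    top = scored[:limit]
    return [doc for doc, _ in top], [score for _, score in top]


if __name__ == "__main__":
    sample_docs = [
        "Saipan is a capital",
        "Washington is the capital of the United States",
        "Bananas are yellow",
    ]
    sample_query = "capital of the united states"
    candidates = first_stage(sample_query, sample_docs, k=2)
    results, scores = rerank(sample_query, candidates, jaccard, limit=1)
    for result, score in zip(results, scores):
        print(score, " -- ", result)
```

The second stage sees only the candidates that survived the first, which is why a more expensive scorer is affordable there.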
Using the Cross-Encoder Reranker
To use the cross-encoder reranker we initialize an instance of HFCrossEncoderReranker
passing a suitable model (if no model is provided, the cross-encoder/ms-marco-MiniLM-L-6-v2
model is used):
from redisvl.utils.rerank import HFCrossEncoderReranker
cross_encoder_reranker = HFCrossEncoderReranker("BAAI/bge-reranker-base")
Rerank documents with HFCrossEncoderReranker
With the obtained reranker instance we can rerank and truncate the list of
documents based on relevance to the initial query.
results, scores = cross_encoder_reranker.rank(query=query, docs=docs)
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.07461125403642654 -- {'content': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.'}
0.05220315232872963 -- {'content': 'Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.'}
0.3802368640899658 -- {'content': 'Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.'}
Using the Cohere Reranker
To initialize the Cohere reranker, install the cohere library and provide a valid Cohere API key.
#!pip install cohere
import getpass
# setup the API Key
api_key = os.environ.get("COHERE_API_KEY") or getpass.getpass("Enter your Cohere API key: ")
from redisvl.utils.rerank import CohereReranker
cohere_reranker = CohereReranker(limit=3, api_config={"api_key": api_key})
Rerank documents with CohereReranker
Below we will use the CohereReranker to rerank and truncate the list of
documents above based on relevance to the initial query.
results, scores = cohere_reranker.rank(query=query, docs=docs)
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.9990564 -- Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.
0.7516481 -- Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.
0.08882029 -- The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.
Working with semi-structured documents
Often the initial result set includes other metadata and fields that could be used to steer reranking relevance. To accomplish this, we can set the rank_by argument and provide documents that contain those additional fields.
docs = [
{
"source": "wiki",
"passage": "Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274."
},
{
"source": "encyclopedia",
"passage": "The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan."
},
{
"source": "textbook",
"passage": "Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas."
},
{
"source": "textbook",
"passage": "Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America."
},
{
"source": "wiki",
"passage": "Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment."
}
]
results, scores = cohere_reranker.rank(query=query, docs=docs, rank_by=["passage", "source"])
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.9988121 -- {'source': 'textbook', 'passage': 'Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.'}
0.5974905 -- {'source': 'wiki', 'passage': 'Capital punishment (the death penalty) has existed in the United States since before the United States was a country. As of 2017, capital punishment is legal in 30 of the 50 states. The federal government (including the United States military) also uses capital punishment.'}
0.059101548 -- {'source': 'encyclopedia', 'passage': 'The Commonwealth of the Northern Mariana Islands is a group of islands in the Pacific Ocean that are a political division controlled by the United States. Its capital is Saipan.'}
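Since rank_by names fields that each document is expected to carry, a quick check before calling the reranker can catch documents with missing keys. This is a minimal sketch; `missing_rank_fields` is a hypothetical helper and not part of RedisVL, and how the reranker itself reacts to a missing field is not covered here.

```python
from typing import Any, Dict, List


def missing_rank_fields(
    docs: List[Dict[str, Any]], rank_by: List[str]
) -> Dict[int, List[str]]:
    """Map each document index to the rank_by fields it lacks.

    Only documents with at least one missing field appear in the result.
    """
    problems: Dict[int, List[str]] = {}
    for index, doc in enumerate(docs):
        missing = [field for field in rank_by if field not in doc]
        if missing:
            problems[index] = missing
    return problems


if __name__ == "__main__":
    sample_docs = [
        {"source": "wiki", "passage": "Carson City is the capital city of Nevada."},
        {"passage": "Its capital is Saipan."},  # no "source" field
    ]
    print(missing_rank_fields(sample_docs, ["passage", "source"]))
```

An empty dictionary means every document contains all of the requested fields.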
Using the VoyageAI Reranker
To initialize the VoyageAI reranker, install the voyageai library and provide a valid VoyageAI API key.
#!pip install voyageai
import getpass
# setup the API Key
api_key = os.environ.get("VOYAGE_API_KEY") or getpass.getpass("Enter your VoyageAI API key: ")
from redisvl.utils.rerank import VoyageAIReranker
# Please check the available models at https://docs.voyageai.com/docs/reranker
reranker = VoyageAIReranker(model="rerank-lite-1", limit=3, api_config={"api_key": api_key})
Rerank documents with VoyageAIReranker
Below we will use the VoyageAIReranker to rerank and truncate the list of
documents above based on relevance to the initial query.
results, scores = reranker.rank(query=query, docs=docs)
for result, score in zip(results, scores):
    print(score, " -- ", result)
0.796875 -- Washington, D.C. (also known as simply Washington or D.C., and officially as the District of Columbia) is the capital of the United States. It is a federal district. The President of the USA and many major national government offices are in the territory. This makes it the political center of the United States of America.
0.578125 -- Charlotte Amalie is the capital and largest city of the United States Virgin Islands. It has about 20,000 people. The city is on the island of Saint Thomas.
0.5625 -- Carson City is the capital city of the American state of Nevada. At the 2010 United States Census, Carson City had a population of 55,274.