LLM Cache
Semantic Cache
class SemanticCache(name='llmcache', distance_threshold=0.1, ttl=None, vectorizer=None, filterable_fields=None, redis_client=None, redis_url='redis://localhost:6379', connection_kwargs={}, overwrite=False, **kwargs)
Bases: BaseLLMCache
Semantic cache for large language models.
- Parameters:
- name (str, optional) – The name of the semantic cache search index. Defaults to "llmcache".
- distance_threshold (float, optional) – Semantic threshold for the cache. Defaults to 0.1.
- ttl (Optional[int], optional) – The time-to-live for records cached in Redis. Defaults to None.
- vectorizer (Optional[BaseVectorizer], optional) – The vectorizer for the cache. Defaults to HFTextVectorizer.
- filterable_fields (Optional[List[Dict[str, Any]]]) – An optional list of RedisVL fields that can be used to customize cache retrieval with filters.
- redis_client (Optional[Redis], optional) – A redis client connection instance. Defaults to None.
- redis_url (str, optional) – The redis url. Defaults to redis://localhost:6379.
- connection_kwargs (Dict[str, Any]) – The connection arguments for the Redis client. Defaults to empty {}.
- overwrite (bool) – Whether or not to force overwrite the schema for the semantic cache index. Defaults to False.
- Raises:
- TypeError – If an invalid vectorizer is provided.
- TypeError – If the TTL value is not an int.
- ValueError – If the threshold is not between 0 and 1.
- ValueError – If the existing schema does not match the new schema and overwrite is False.
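The distance_threshold controls how close an incoming prompt's embedding must be to a cached one to count as a hit. A minimal, self-contained sketch of that comparison using toy vectors and plain cosine distance (the vectorizer and exact distance metric are implementation details of RedisVL, so treat this as illustrative only):

```python
import math

def cosine_distance(a, b):
    # Cosine distance = 1 - cosine similarity.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy embeddings for two prompts (real embeddings come from the vectorizer).
cached = [0.9, 0.1, 0.0]
incoming = [0.88, 0.15, 0.01]

distance = cosine_distance(cached, incoming)
threshold = 0.1  # the default distance_threshold

is_cache_hit = distance <= threshold
```

Lowering the threshold makes matching stricter (fewer, more precise cache hits); raising it makes matching looser.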
async acheck(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None)
Async check the semantic cache for results similar to the specified prompt or vector.
This method searches the cache using vector similarity with either a raw text prompt (converted to a vector) or a provided vector as input. It checks for semantically similar prompts and fetches the cached LLM responses.
- Parameters:
- prompt (Optional[str], optional) – The text prompt to search for in the cache.
- vector (Optional[List[float]], optional) – The vector representation of the prompt to search for in the cache.
- num_results (int, optional) – The number of cached results to return. Defaults to 1.
- return_fields (Optional[List[str]], optional) – The fields to include in each returned result. If None, defaults to all available fields in the cached entry.
- filter_expression (Optional[FilterExpression]) – Optional filter expression that can be used to filter cache results. Defaults to None and the full cache will be searched.
- distance_threshold (Optional[float]) – The threshold for semantic vector distance.
- Returns:
A list of dicts containing the requested return fields for each similar cached response.
- Return type:
List[Dict[str, Any]]
- Raises:
- ValueError – If neither a prompt nor a vector is specified.
- ValueError – If 'vector' has incorrect dimensions.
- TypeError – If return_fields is not a list when provided.
response = await cache.acheck(
prompt="What is the capital city of France?"
)
async adrop(ids=None, keys=None)
Async expire specific entries from the cache by id or specific
Redis key.
- Parameters:
- ids (Optional [ str ]) – The document ID or IDs to remove from the cache.
- keys (Optional [ str ]) – The Redis keys to remove from the cache.
- Return type:
None
async astore(prompt, response, vector=None, metadata=None, filters=None, ttl=None)
Async stores the specified key-value pair in the cache along with metadata.
- Parameters:
- prompt (str) – The user prompt to cache.
- response (str) – The LLM response to cache.
- vector (Optional [ List [ float ] ] , optional) – The prompt vector to
cache. Defaults to None, and the prompt vector is generated on
demand.
- metadata (Optional [ Dict [ str , Any ] ] , optional) – The optional metadata to cache
alongside the prompt and response. Defaults to None.
- filters (Optional [ Dict [ str , Any ] ]) – The optional tag to assign to the cache entry.
Defaults to None.
- ttl (Optional [ int ]) – The optional TTL override to use on this individual cache
entry. Defaults to the global TTL setting.
- Returns:
The Redis key for the entries added to the semantic cache.
- Return type:
str
- Raises:
- ValueError – If neither prompt nor vector is specified.
- ValueError – If vector has incorrect dimensions.
- TypeError – If provided metadata is not a dictionary.
key = await cache.astore(
prompt="What is the capital city of France?",
response="Paris",
metadata={"city": "Paris", "country": "France"}
)
async aupdate(key, **kwargs)
Async update specific fields within an existing cache entry. If no fields
are passed, then only the document TTL is refreshed.
- Parameters:
key (str) – the key of the document to update using kwargs.
- Raises:
- ValueError – If an incorrect mapping is provided as a kwarg.
- TypeError – If metadata is provided and is not of type dict.
- Return type:
None
key = await cache.astore('this is a prompt', 'this is a response')
await cache.aupdate(
key,
metadata={"hit_count": 1, "model_name": "Llama-2-7b"}
)
check(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None)
Checks the semantic cache for results similar to the specified prompt
or vector.
This method searches the cache using vector similarity with
either a raw text prompt (converted to a vector) or a provided vector as
input. It checks for semantically similar prompts and fetches the cached
LLM responses.
- Parameters:
- prompt (Optional [ str ] , optional) – The text prompt to search for in
the cache.
- vector (Optional [ List [ float ] ] , optional) – The vector representation
of the prompt to search for in the cache.
- num_results (int , optional) – The number of cached results to return.
Defaults to 1.
- return_fields (Optional [ List [ str ] ] , optional) – The fields to include
in each returned result. If None, defaults to all available
fields in the cached entry.
- filter_expression (Optional [FilterExpression ]) – Optional filter expression
that can be used to filter cache results. Defaults to None and
the full cache will be searched.
- distance_threshold (Optional [ float ]) – The threshold for semantic
vector distance.
- Returns:
A list of dicts containing the requested return fields for each similar cached response.
- Return type:
List[Dict[str, Any]]
- Raises:
- ValueError – If neither a prompt nor a vector is specified.
- ValueError – If 'vector' has incorrect dimensions.
- TypeError – If return_fields is not a list when provided.
response = cache.check(
prompt="What is the capital city of France?"
)
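Conceptually, check() is a nearest-neighbor lookup under a distance cutoff: score every cached prompt vector against the query, keep only entries within the threshold, and return the closest ones first. A self-contained sketch of that flow against a plain Python list (the real implementation queries a Redis vector index; the names here are illustrative, not RedisVL internals):

```python
import math

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy cache: each entry pairs a prompt vector with the stored LLM response.
entries = [
    {"vector": [1.0, 0.0], "prompt": "capital of France?", "response": "Paris"},
    {"vector": [0.0, 1.0], "prompt": "capital of Japan?", "response": "Tokyo"},
]

def check(vector, num_results=1, distance_threshold=0.1):
    # Score every entry, keep those within the threshold, closest first.
    scored = [(cosine_distance(vector, e["vector"]), e) for e in entries]
    hits = sorted(
        [(s, e) for s, e in scored if s <= distance_threshold],
        key=lambda pair: pair[0],
    )
    return [dict(e, vector_distance=s) for s, e in hits[:num_results]]

results = check([0.99, 0.05])  # close to the "France" entry -> one hit
misses = check([0.7, 0.7])     # roughly equidistant from both -> no hits
```

Note that an empty result list is the normal "cache miss" outcome, not an error.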
clear()
Clear the cache of all keys while preserving the index.
- Return type:
None
delete()
Clear the semantic cache of all keys and remove the underlying search
index.
- Return type:
None
drop(ids=None, keys=None)
Manually expire specific entries from the cache by id or specific
Redis key.
- Parameters:
- ids (Optional [ str ]) – The document ID or IDs to remove from the cache.
- keys (Optional [ str ]) – The Redis keys to remove from the cache.
- Return type:
None
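drop() accepts either document ids or fully qualified Redis keys. A dict-based sketch of the two removal paths (the actual Redis key format is determined by the index name; the "llmcache:" prefix below is an assumption for illustration, not a documented contract):

```python
# Toy key space standing in for Redis. Keys are assumed to look like
# "<index-name>:<entry-id>"; "llmcache:" is illustrative only.
cache = {
    "llmcache:abc123": {"prompt": "p1", "response": "r1"},
    "llmcache:def456": {"prompt": "p2", "response": "r2"},
}

def drop(ids=None, keys=None, prefix="llmcache:"):
    # ids are expanded to full keys; keys are used as-is.
    to_remove = set(keys or [])
    to_remove.update(prefix + i for i in (ids or []))
    for key in to_remove:
        cache.pop(key, None)  # ignore keys that are already gone

drop(ids=["abc123"])            # remove by document id
drop(keys=["llmcache:def456"])  # remove by full Redis key
```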
set_threshold(distance_threshold)
Sets the semantic distance threshold for the cache.
- Parameters:
distance_threshold (float) – The semantic distance threshold for
the cache.
- Raises:
ValueError – If the threshold is not between 0 and 1.
- Return type:
None
set_ttl(ttl=None)
Set the default TTL, in seconds, for entries in the cache.
- Parameters:
ttl (Optional [ int ] , optional) – The optional time-to-live expiration
for the cache, in seconds.
- Raises:
ValueError – If the time-to-live value is not an integer.
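TTL is enforced by Redis itself: each entry's key is given an expiry, after which Redis deletes it. A stdlib-only sketch of the same semantics using timestamps (illustrative only; the real cache delegates expiry to Redis rather than tracking it in Python):

```python
import time

class TTLEntry:
    """A cache entry that remembers when it was written and its TTL."""

    def __init__(self, value, ttl=None):
        self.value = value
        self.ttl = ttl  # seconds; None means the entry never expires
        self.written_at = time.monotonic()

    def expired(self, now=None):
        if self.ttl is None:
            return False
        now = time.monotonic() if now is None else now
        return now - self.written_at >= self.ttl

entry = TTLEntry("Paris", ttl=300)
fresh = not entry.expired()                        # just written -> live
stale = entry.expired(now=entry.written_at + 301)  # 301s later -> expired
```

The per-entry ttl argument to store()/astore() overrides this default for a single entry in the same way.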
store(prompt, response, vector=None, metadata=None, filters=None, ttl=None)
Stores the specified key-value pair in the cache along with metadata.
- Parameters:
- prompt (str) – The user prompt to cache.
- response (str) – The LLM response to cache.
- vector (Optional [ List [ float ] ] , optional) – The prompt vector to
cache. Defaults to None, and the prompt vector is generated on
demand.
- metadata (Optional [ Dict [ str , Any ] ] , optional) – The optional metadata to cache
alongside the prompt and response. Defaults to None.
- filters (Optional [ Dict [ str , Any ] ]) – The optional tag to assign to the cache entry.
Defaults to None.
- ttl (Optional [ int ]) – The optional TTL override to use on this individual cache
entry. Defaults to the global TTL setting.
- Returns:
The Redis key for the entries added to the semantic cache.
- Return type:
str
- Raises:
- ValueError – If neither prompt nor vector is specified.
- ValueError – If vector has incorrect dimensions.
- TypeError – If provided metadata is not a dictionary.
key = cache.store(
prompt="What is the capital city of France?",
response="Paris",
metadata={"city": "Paris", "country": "France"}
)
update(key, **kwargs)
Update specific fields within an existing cache entry. If no fields
are passed, then only the document TTL is refreshed.
- Parameters:
key (str) – the key of the document to update using kwargs.
- Raises:
- ValueError – If an incorrect mapping is provided as a kwarg.
- TypeError – If metadata is provided and is not of type dict.
- Return type:
None
key = cache.store('this is a prompt', 'this is a response')
cache.update(key, metadata={"hit_count": 1, "model_name": "Llama-2-7b"})
property aindex: AsyncSearchIndex | None
The underlying AsyncSearchIndex for the cache.
- Returns:
The async search index.
- Return type:
AsyncSearchIndex
property distance_threshold: float
The semantic distance threshold for the cache.
- Returns:
The semantic distance threshold.
- Return type:
float
property index: SearchIndex
The underlying SearchIndex for the cache.
- Returns:
The search index.
- Return type:
SearchIndex
property ttl: int | None
The default TTL, in seconds, for entries in the cache.