LLM Cache

Semantic Cache

class SemanticCache(name='llmcache', distance_threshold=0.1, ttl=None, vectorizer=None, filterable_fields=None, redis_client=None, redis_url='redis://localhost:6379', connection_kwargs={}, overwrite=False, **kwargs)

Bases: BaseLLMCache

Semantic Cache for Large Language Models.

  • Parameters:
    • name (str , optional) – The name of the semantic cache search index. Defaults to “llmcache”.
    • distance_threshold (float , optional) – Semantic threshold for the cache. Defaults to 0.1.
    • ttl (Optional [ int ] , optional) – The time-to-live for records cached in Redis. Defaults to None.
    • vectorizer (Optional [ BaseVectorizer ] , optional) – The vectorizer for the cache. Defaults to HFTextVectorizer.
    • filterable_fields (Optional [ List [ Dict [ str , Any ] ] ]) – An optional list of RedisVL fields that can be used to customize cache retrieval with filters.
    • redis_client (Optional [ Redis ] , optional) – A redis client connection instance. Defaults to None.
    • redis_url (str , optional) – The redis url. Defaults to redis://localhost:6379.
    • connection_kwargs (Dict [ str , Any ]) – The connection arguments for the redis client. Defaults to empty {}.
    • overwrite (bool) – Whether or not to force overwrite the schema for the semantic cache index. Defaults to false.
  • Raises:
    • TypeError – If an invalid vectorizer is provided.
    • TypeError – If the TTL value is not an int.
    • ValueError – If the threshold is not between 0 and 1.
    • ValueError – If existing schema does not match new schema and overwrite is False.
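
A minimal construction sketch is shown below. The import path follows the RedisVL extensions package, and the user_id filterable field is a hypothetical example rather than a built-in:

from redisvl.extensions.llmcache import SemanticCache

cache = SemanticCache(
    name="llmcache",
    redis_url="redis://localhost:6379",
    distance_threshold=0.1,
    ttl=300,  # cached entries expire after 5 minutes
    # Hypothetical tag field enabling filtered cache lookups
    filterable_fields=[{"name": "user_id", "type": "tag"}],
)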

async acheck(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None)

Async check the semantic cache for results similar to the specified prompt or vector.

This method searches the cache using vector similarity with either a raw text prompt (converted to a vector) or a provided vector as input. It checks for semantically similar prompts and fetches the cached LLM responses.

  • Parameters:
    • prompt (Optional [ str ] , optional) – The text prompt to search for in the cache.
    • vector (Optional [ List [ float ] ] , optional) – The vector representation of the prompt to search for in the cache.
    • num_results (int , optional) – The number of cached results to return. Defaults to 1.
    • return_fields (Optional [ List [ str ] ] , optional) – The fields to include in each returned result. If None, defaults to all available fields in the cached entry.
    • filter_expression (Optional [FilterExpression ]) – Optional filter expression that can be used to filter cache results. Defaults to None and the full cache will be searched.
    • distance_threshold (Optional [ float ]) – The threshold for semantic vector distance.
  • Returns: A list of dicts containing the requested return fields for each similar cached response.
  • Return type: List[Dict[str, Any]]
  • Raises:
    • ValueError – If neither a prompt nor a vector is specified.
    • ValueError – if ‘vector’ has incorrect dimensions.
    • TypeError – If return_fields is not a list when provided.
response = await cache.acheck(
    prompt="What is the captial city of France?"
)
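
Building on the hypothetical user_id field from the construction sketch above, a filtered lookup could use RedisVL’s Tag filter; this is an illustrative sketch, not required usage:

from redisvl.query.filter import Tag

# Only consider cache entries tagged for this user
# ("user_id" is an assumed filterable field, not a built-in)
user_filter = Tag("user_id") == "abc123"

response = await cache.acheck(
    prompt="What is the capital city of France?",
    num_results=3,
    filter_expression=user_filter,
)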

async adrop(ids=None, keys=None)

Async expire specific entries from the cache by id or specific Redis key.

  • Parameters:
    • ids (Optional [ str ]) – The document ID or IDs to remove from the cache.
    • keys (Optional [ str ]) – The Redis keys to remove from the cache.
  • Return type: None
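
A minimal sketch, reusing the Redis key returned by astore:

key = await cache.astore("this is a prompt", "this is a response")
# Expire the entry just stored, addressing it by its Redis key
await cache.adrop(keys=[key])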

async astore(prompt, response, vector=None, metadata=None, filters=None, ttl=None)

Async stores the specified key-value pair in the cache along with metadata.

  • Parameters:
    • prompt (str) – The user prompt to cache.
    • response (str) – The LLM response to cache.
    • vector (Optional [ List [ float ] ] , optional) – The prompt vector to cache. Defaults to None, and the prompt vector is generated on demand.
    • metadata (Optional [ Dict [ str , Any ] ] , optional) – The optional metadata to cache alongside the prompt and response. Defaults to None.
    • filters (Optional [ Dict [ str , Any ] ]) – The optional tag to assign to the cache entry. Defaults to None.
    • ttl (Optional [ int ]) – The optional TTL override to use on this individual cache entry. Defaults to the global TTL setting.
  • Returns: The Redis key for the entries added to the semantic cache.
  • Return type: str
  • Raises:
    • ValueError – If neither prompt nor vector is specified.
    • ValueError – if vector has incorrect dimensions.
    • TypeError – If provided metadata is not a dictionary.
key = await cache.astore(
    prompt="What is the captial city of France?",
    response="Paris",
    metadata={"city": "Paris", "country": "France"}
)

async aupdate(key, **kwargs)

Async update specific fields within an existing cache entry. If no fields are passed, then only the document TTL is refreshed.

  • Parameters: key (str) – the key of the document to update using kwargs.
  • Raises:
    • ValueError – If an incorrect mapping is provided as a kwarg.
    • TypeError – If metadata is provided and not of type dict.
  • Return type: None
key = await cache.astore('this is a prompt', 'this is a response')
await cache.aupdate(
    key,
    metadata={"hit_count": 1, "model_name": "Llama-2-7b"}
)

check(prompt=None, vector=None, num_results=1, return_fields=None, filter_expression=None, distance_threshold=None)

Checks the semantic cache for results similar to the specified prompt or vector.

This method searches the cache using vector similarity with either a raw text prompt (converted to a vector) or a provided vector as input. It checks for semantically similar prompts and fetches the cached LLM responses.

  • Parameters:
    • prompt (Optional [ str ] , optional) – The text prompt to search for in the cache.
    • vector (Optional [ List [ float ] ] , optional) – The vector representation of the prompt to search for in the cache.
    • num_results (int , optional) – The number of cached results to return. Defaults to 1.
    • return_fields (Optional [ List [ str ] ] , optional) – The fields to include in each returned result. If None, defaults to all available fields in the cached entry.
    • filter_expression (Optional [FilterExpression ]) – Optional filter expression that can be used to filter cache results. Defaults to None and the full cache will be searched.
    • distance_threshold (Optional [ float ]) – The threshold for semantic vector distance.
  • Returns: A list of dicts containing the requested return fields for each similar cached response.
  • Return type: List[Dict[str, Any]]
  • Raises:
    • ValueError – If neither a prompt nor a vector is specified.
    • ValueError – if ‘vector’ has incorrect dimensions.
    • TypeError – If return_fields is not a list when provided.
response = cache.check(
    prompt="What is the captial city of France?"
)

clear()

Clear the cache of all keys while preserving the index.

  • Return type: None

delete()

Clear the semantic cache of all keys and remove the underlying search index.

  • Return type: None
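
To make the difference between clear and delete concrete, a short sketch:

cache.clear()   # removes all cached entries but keeps the search index usable
cache.delete()  # removes all entries and the underlying search index itself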

drop(ids=None, keys=None)

Manually expire specific entries from the cache by id or specific Redis key.

  • Parameters:
    • ids (Optional [ str ]) – The document ID or IDs to remove from the cache.
    • keys (Optional [ str ]) – The Redis keys to remove from the cache.
  • Return type: None
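
As with adrop above, a minimal sketch using the key returned by store:

key = cache.store("this is a prompt", "this is a response")
# Manually expire the entry via its Redis key
cache.drop(keys=[key])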

set_threshold(distance_threshold)

Sets the semantic distance threshold for the cache.

  • Parameters: distance_threshold (float) – The semantic distance threshold for the cache.
  • Raises: ValueError – If the threshold is not between 0 and 1.
  • Return type: None
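
For example, to loosen matching at runtime:

cache.set_threshold(0.25)        # accept matches up to a vector distance of 0.25
print(cache.distance_threshold)  # 0.25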

set_ttl(ttl=None)

Set the default TTL, in seconds, for entries in the cache.

  • Parameters: ttl (Optional [ int ] , optional) – The optional time-to-live expiration for the cache, in seconds.
  • Raises: ValueError – If the time-to-live value is not an integer.
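
A short sketch; note that calling set_ttl with no argument should reset the default TTL to None (no expiration), per the signature’s default:

cache.set_ttl(300)  # new entries now expire after 300 seconds
cache.set_ttl()     # resets the default TTL to None (entries persist)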

store(prompt, response, vector=None, metadata=None, filters=None, ttl=None)

Stores the specified key-value pair in the cache along with metadata.

  • Parameters:
    • prompt (str) – The user prompt to cache.
    • response (str) – The LLM response to cache.
    • vector (Optional [ List [ float ] ] , optional) – The prompt vector to cache. Defaults to None, and the prompt vector is generated on demand.
    • metadata (Optional [ Dict [ str , Any ] ] , optional) – The optional metadata to cache alongside the prompt and response. Defaults to None.
    • filters (Optional [ Dict [ str , Any ] ]) – The optional tag to assign to the cache entry. Defaults to None.
    • ttl (Optional [ int ]) – The optional TTL override to use on this individual cache entry. Defaults to the global TTL setting.
  • Returns: The Redis key for the entries added to the semantic cache.
  • Return type: str
  • Raises:
    • ValueError – If neither prompt nor vector is specified.
    • ValueError – if vector has incorrect dimensions.
    • TypeError – If provided metadata is not a dictionary.
key = cache.store(
    prompt="What is the captial city of France?",
    response="Paris",
    metadata={"city": "Paris", "country": "France"}
)

update(key, **kwargs)

Update specific fields within an existing cache entry. If no fields are passed, then only the document TTL is refreshed.

  • Parameters: key (str) – the key of the document to update using kwargs.
  • Raises:
    • ValueError – If an incorrect mapping is provided as a kwarg.
    • TypeError – If metadata is provided and not of type dict.
  • Return type: None
key = cache.store('this is a prompt', 'this is a response')
cache.update(key, metadata={"hit_count": 1, "model_name": "Llama-2-7b"})

property aindex: AsyncSearchIndex | None

The underlying AsyncSearchIndex for the cache.

property distance_threshold: float

The semantic distance threshold for the cache.

  • Returns: The semantic distance threshold.
  • Return type: float

property index: SearchIndex

The underlying SearchIndex for the cache.

property ttl: int | None

The default TTL, in seconds, for entries in the cache.
