LLM Session Memory

Large language models are inherently stateless and have no knowledge of previous interactions with a user, or even of earlier parts of the current conversation. While this may not be noticeable when asking simple questions, it becomes a hindrance when engaging in long-running conversations that rely on conversational context.

The solution to this problem is to append the previous conversation history to each subsequent call to the LLM.

This notebook will show how to use Redis to structure, store, and retrieve this conversational session memory.

from redisvl.extensions.session_manager import StandardSessionManager
chat_session = StandardSessionManager(name='student tutor')
12:24:11 redisvl.index.index INFO   Index already exists, not overwriting.

To align with common LLM APIs, Redis stores messages with role and content fields. The supported roles are "system", "user" and "llm".

You can store messages one at a time or all at once.

chat_session.add_message({"role":"system", "content":"You are a helpful geography tutor, giving simple and short answers to questions about European countries."})
chat_session.add_messages([
    {"role":"user", "content":"What is the capital of France?"},
    {"role":"llm", "content":"The capital is Paris."},
    {"role":"user", "content":"And what is the capital of Spain?"},
    {"role":"llm", "content":"The capital is Madrid."},
    {"role":"user", "content":"What is the population of Great Britain?"},
    {"role":"llm", "content":"As of 2023 the population of Great Britain is approximately 67 million people."},]
    )

At any point we can retrieve the recent history of the conversation. It will be ordered by entry time.

context = chat_session.get_recent()
for message in context:
    print(message)
{'role': 'llm', 'content': 'The capital is Paris.'}
{'role': 'user', 'content': 'And what is the capital of Spain?'}
{'role': 'llm', 'content': 'The capital is Madrid.'}
{'role': 'user', 'content': 'What is the population of Great Britain?'}
{'role': 'llm', 'content': 'As of 2023 the population of Great Britain is approximately 67 million people.'}
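
Because messages are stored with role and content fields, the retrieved context can be converted directly into the message format expected by chat-style LLM APIs. A minimal sketch, assuming an OpenAI-style API that names the assistant role "assistant" rather than "llm" (the mapping below is illustrative, not part of RedisVL):

# map RedisVL's "llm" role onto the "assistant" role used by many chat APIs
role_map = {"system": "system", "user": "user", "llm": "assistant"}

messages = [
    {"role": role_map[m["role"]], "content": m["content"]}
    for m in chat_session.get_recent()
]
# `messages` can now be passed to a chat completion endpoint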

In many LLM flows the conversation progresses in a series of prompt and response pairs. Session managers offer a convenience method, store(), to add such a pair in a single call.

prompt = "what is the size of England compared to Portugal?"
response = "England is larger in land area than Portal by about 15000 square miles."
chat_session.store(prompt, response)

context = chat_session.get_recent(top_k=6)
for message in context:
    print(message)
{'role': 'user', 'content': 'And what is the capital of Spain?'}
{'role': 'llm', 'content': 'The capital is Madrid.'}
{'role': 'user', 'content': 'What is the population of Great Britain?'}
{'role': 'llm', 'content': 'As of 2023 the population of Great Britain is approximately 67 million people.'}
{'role': 'user', 'content': 'what is the size of England compared to Portugal?'}
{'role': 'llm', 'content': 'England is larger in land area than Portugal by about 15000 square miles.'}

Managing multiple users and conversations

For applications that need to handle multiple conversations concurrently, Redis supports tagging messages to keep conversations separated.

chat_session.add_message({"role":"system", "content":"You are a helpful algebra tutor, giving simple answers to math problems."}, session_tag='student two')
chat_session.add_messages([
    {"role":"user", "content":"What is the value of x in the equation 2x + 3 = 7?"},
    {"role":"llm", "content":"The value of x is 2."},
    {"role":"user", "content":"What is the value of y in the equation 3y - 5 = 7?"},
    {"role":"llm", "content":"The value of y is 4."}],
    session_tag='student two'
    )

for math_message in chat_session.get_recent(session_tag='student two'):
    print(math_message)
{'role': 'system', 'content': 'You are a helpful algebra tutor, giving simple answers to math problems.'}
{'role': 'user', 'content': 'What is the value of x in the equation 2x + 3 = 7?'}
{'role': 'llm', 'content': 'The value of x is 2.'}
{'role': 'user', 'content': 'What is the value of y in the equation 3y - 5 = 7?'}
{'role': 'llm', 'content': 'The value of y is 4.'}
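
Messages added without an explicit session_tag remain under the manager's default tag, so the two conversations stay separate. As a quick check, retrieving without a tag should return only the geography messages (sketch; output omitted):

# the geography conversation lives under the default session tag,
# so the algebra messages tagged 'student two' are not included here
for geo_message in chat_session.get_recent():
    print(geo_message)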

Semantic conversation memory

For longer conversations our list of messages keeps growing. Since LLMs are stateless we have to continue to pass this conversation history on each subsequent call to ensure the LLM has the correct context.

A typical flow looks like this:

while True:
    prompt = input('enter your next question')
    context = chat_session.get_recent()
    response = LLM_api_call(prompt=prompt, context=context)
    chat_session.store(prompt, response)

This works, but as context keeps growing so too does our LLM token count, which increases latency and cost.

Conversation histories can be truncated, but that can lead to losing relevant information that appeared early on.

A better solution is to pass only the relevant conversational context on each subsequent call.

For this, RedisVL has the SemanticSessionManager, which uses vector similarity search to return only semantically relevant sections of the conversation.

from redisvl.extensions.session_manager import SemanticSessionManager
semantic_session = SemanticSessionManager(name='tutor')

semantic_session.add_messages(chat_session.get_recent(top_k=8))
12:24:15 redisvl.index.index INFO   Index already exists, not overwriting.
prompt = "what have I learned about the size of England?"
semantic_session.set_distance_threshold(0.35)
context = semantic_session.get_relevant(prompt)
for message in context:
    print(message)
{'role': 'user', 'content': 'what is the size of England compared to Portugal?'}
{'role': 'llm', 'content': 'England is larger in land area than Portugal by about 15000 square miles.'}

You can adjust the degree of semantic similarity required for a message to be included in your context.

Setting a distance threshold close to 0.0 will require an exact semantic match, while a distance threshold of 1.0 will include everything.

semantic_session.set_distance_threshold(0.7)

larger_context = semantic_session.get_relevant(prompt)
for message in larger_context:
    print(message)
{'role': 'user', 'content': 'what is the size of England compared to Portugal?'}
{'role': 'llm', 'content': 'England is larger in land area than Portugal by about 15000 square miles.'}
{'role': 'user', 'content': 'What is the population of Great Britain?'}
{'role': 'llm', 'content': 'As of 2023 the population of Great Britain is approximately 67 million people.'}
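
With a distance threshold chosen, the typical flow shown earlier can be adapted to pass only the relevant context on each call. A minimal sketch, reusing the hypothetical LLM_api_call from above:

while True:
    prompt = input('enter your next question')
    # fetch only the semantically relevant portion of the conversation
    context = semantic_session.get_relevant(prompt)
    response = LLM_api_call(prompt=prompt, context=context)
    semantic_session.store(prompt, response)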

Conversation control

LLMs can hallucinate on occasion. When this happens, it can be useful to prune the incorrect information from the conversation history so that it isn't passed as context on subsequent calls.

semantic_session.store(
    prompt="what is the smallest country in Europe?",
    response="Monaco is the smallest country in Europe at 0.78 square miles." # Incorrect. Vatican City is the smallest country in Europe
    )

# get the key of the incorrect message
context = semantic_session.get_recent(top_k=1, raw=True)
bad_key = context[0]['entry_id']
semantic_session.drop(bad_key)

corrected_context = semantic_session.get_recent()
for message in corrected_context:
    print(message)
{'role': 'user', 'content': 'What is the population of Great Britain?'}
{'role': 'llm', 'content': 'As of 2023 the population of Great Britain is approximately 67 million people.'}
{'role': 'user', 'content': 'what is the size of England compared to Portugal?'}
{'role': 'llm', 'content': 'England is larger in land area than Portugal by about 15000 square miles.'}
{'role': 'user', 'content': 'what is the smallest country in Europe?'}
chat_session.clear()