Best practices for Redis Query Engine index management
Introduction to managing Redis Query Engine indexes
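The Redis Query Engine (RQE) is a powerful tool for executing complex search and query operations on structured, semi-structured, and unstructured data. Indexes are the backbone of this functionality, enabling fast and efficient data retrieval. Proper management of these indexes is essential for optimal performance, scalability, and resource utilization.
This guide outlines best practices for managing RQE indexes throughout their lifecycle. It provides recommendations on:
- Planning and creating indexes to suit your query patterns.
- Using index aliasing to manage schema updates and minimize downtime.
- Monitoring and validating index population to ensure query readiness.
- Optimizing performance through query profiling and memory management.
- Maintaining and scaling indexes in both standalone and clustered Redis environments.
- Versioning, testing, and automating index management.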
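Why index management matters
Indexes directly impact query speed and resource consumption. Poorly managed indexes can lead to increased memory usage, slower query times, and challenges in maintaining data consistency. By following the strategies outlined in this guide, you can:
- Reduce operational overhead.
- Improve application performance.
- Ensure smooth transitions during schema changes.
- Scale efficiently with growing datasets.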
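Plan your indexes strategically
Planning your indexes strategically requires understanding your application's query patterns and tailoring the indexes to match.
Begin by identifying the types of searches your application performs (such as full-text search, range queries, or geospatial lookups) and the fields involved.
Categorize fields by their purpose: searchable fields (e.g., TEXT for full-text search), filterable fields (e.g., TAG), and sortable fields (e.g., NUMERIC for range queries or sorting).
Match field types to their intended use and avoid indexing fields that are rarely queried to conserve resources. Here is the list of index types: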
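- TEXT: use TEXT for free-text searches and set weights if some fields are more important.
- TAG: use TAG for categorical data (e.g., product categories) that benefits from exact matching and filtering.
- NUMERIC: use NUMERIC for numeric ranges (e.g., prices, timestamps).
- GEO: use GEO for geospatial coordinates (e.g., latitude/longitude).
- GEOSHAPE: use GEOSHAPE to represent locations as points, and also to define shapes and query the interactions between points and shapes (e.g., to find all points contained within an enclosing shape).
- VECTOR: use VECTOR for high-dimensional similarity searches.
See these pages for discussions and examples on how best to use these index types.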
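As a rough illustration of matching field types to their purpose, the sketch below assembles the SCHEMA portion of an FT.CREATE command in Python. The Field tuple, the schema_tokens helper, and the example field names (title, description, category, price) are hypothetical; only the field types and the WEIGHT and SORTABLE options come from FT.CREATE. GEOSHAPE and VECTOR are left out because they take additional arguments.

```python
from typing import List, NamedTuple, Optional


class Field(NamedTuple):
    """Hypothetical description of one indexed field."""
    name: str
    type: str
    weight: Optional[float] = None  # only meaningful for TEXT fields
    sortable: bool = False


# Field types this sketch handles; GEOSHAPE and VECTOR need extra arguments.
SIMPLE_TYPES = {"TEXT", "TAG", "NUMERIC", "GEO"}


def schema_tokens(fields: List[Field]) -> List[str]:
    """Turn field descriptions into the tokens that follow SCHEMA."""
    tokens: List[str] = []
    for f in fields:
        if f.type not in SIMPLE_TYPES:
            raise ValueError(f"unsupported field type in this sketch: {f.type}")
        if f.weight is not None and f.type != "TEXT":
            raise ValueError("WEIGHT only applies to TEXT fields")
        tokens += [f.name, f.type]
        if f.weight is not None:
            tokens += ["WEIGHT", str(f.weight)]
        if f.sortable:
            tokens.append("SORTABLE")
    return tokens


fields = [
    Field("title", "TEXT", weight=5.0),     # searchable, boosted
    Field("description", "TEXT"),           # searchable
    Field("category", "TAG"),               # filterable
    Field("price", "NUMERIC", sortable=True),  # range queries and sorting
]
print(" ".join(["SCHEMA"] + schema_tokens(fields)))
```

Fields that are rarely queried are simply left out of the list, which is how the advice against indexing them shows up in practice.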
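Next, simulate queries on a sample dataset to identify potential bottlenecks.
Use tools such as FT.PROFILE to analyze query execution and refine your schema if needed.
For example, assign weights to TEXT fields to prioritize results, or use the PREFIX option of FT.CREATE to limit indexing to specific key patterns. Note that you can list multiple prefixes in a PREFIX clause (see below).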
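After creating the index, validate its performance with real queries and monitor usage with the available tools:
- FT.EXPLAIN and FT.EXPLAINCLI allow you to see how the Redis Query Engine parses a given search query. FT.EXPLAIN returns a structured breakdown of the query execution plan, while FT.EXPLAINCLI presents a more readable, tree-like format for easier interpretation. These commands are useful for diagnosing query structure and ensuring it aligns with the intended logic.
- FT.INFO provides detailed statistics about an index, including the number of indexed documents, memory usage, and configuration settings. It helps with monitoring index growth, assessing memory consumption, and verifying index structure to detect potential inefficiencies.
- FT.PROFILE runs a query while capturing execution details, which helps reveal query performance bottlenecks. It provides insights into processing time, key accesses, and filter application, making it a key tool for fine-tuning complex queries and optimizing search efficiency.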
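When scripting against FT.INFO, it can help to turn the reply into a dictionary first. The sketch below assumes the reply arrives as a flat list of alternating names and values, as it does with a redis-py client created with decode_responses=True on the default protocol; a RESP3 connection may already return a map. The info_to_dict helper and the sample values are made up for illustration.

```python
def info_to_dict(flat_reply):
    """Pair up the alternating name/value items of a flat FT.INFO reply."""
    return dict(zip(flat_reply[0::2], flat_reply[1::2]))


# Canned reply standing in for: r.execute_command("FT.INFO", "my_index")
sample_reply = ["index_name", "my_index", "num_docs", "1200", "indexing", "0"]

info = info_to_dict(sample_reply)
print(info["index_name"], int(info["num_docs"]))
```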
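Avoid over-indexing. Indexing every field increases memory usage and can slow down updates. Only index the fields that are essential for your planned queries.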
Index creation
- Use the FT.CREATE command to define an index schema.
- Assign weights to TEXT fields to prioritize certain fields in full-text search results.
- Use the PREFIX option to restrict indexing to keys with specific patterns. Listing multiple prefixes in the PREFIX option when creating an index allows you to index multiple key patterns under a single index. This is useful in several scenarios:
  - If your Redis database stores different types of entities under distinct key prefixes (e.g., user:123, order:456), a single index can cover both by specifying multiple prefixes. For example:
      FT.CREATE my_index ON HASH PREFIX 2 "user:" "order:" SCHEMA name TEXT age NUMERIC status TAG
    This approach enables searching across multiple entity types without needing separate indexes.
  - Instead of querying multiple indexes separately, you can search across related data structures using a single query. This is particularly helpful when data structures share common fields, such as searching both customer and vendor records under a unified contacts index.
  - Maintaining multiple indexes for similar data types can be inefficient in terms of memory and query performance. By consolidating data under one index with multiple prefixes, you reduce overhead while still allowing for distinct key organization.
  - If your data model evolves and new key patterns are introduced, including the prefixes you anticipate in the PREFIX option from the start lets the index cover the new keys without requiring a full reindexing.
- Data loading strategy: load data into Redis before creating an index when working with large datasets. Use the ON HASH or ON JSON options to match the data structure.
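The multi-prefix command shown above can also be assembled in a script. This is a minimal sketch, not a definitive implementation: create_index_command and covered_by are hypothetical helper names, and covered_by only approximates prefix matching with a plain string comparison.

```python
def create_index_command(index, prefixes, schema, on="HASH"):
    """Build FT.CREATE arguments for an index covering several key prefixes."""
    if on not in ("HASH", "JSON"):
        raise ValueError("on must be 'HASH' or 'JSON'")
    if not prefixes:
        raise ValueError("at least one prefix is required in this sketch")
    # PREFIX takes a count followed by that many prefixes.
    return ["FT.CREATE", index, "ON", on,
            "PREFIX", str(len(prefixes)), *prefixes,
            "SCHEMA", *schema]


def covered_by(key, prefixes):
    """Rough check of whether a key name falls under any of the prefixes."""
    return any(key.startswith(p) for p in prefixes)


prefixes = ["user:", "order:"]
cmd = create_index_command(
    "my_index", prefixes,
    ["name", "TEXT", "age", "NUMERIC", "status", "TAG"],
)
print(" ".join(cmd))
print(covered_by("user:123", prefixes), covered_by("cart:9", prefixes))
```

With a redis-py client r, the resulting list could be sent with r.execute_command(*cmd).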
Index aliasing
Index aliases act as abstracted names for the underlying indexes, enabling applications to reference the alias instead of the actual index name. This approach simplifies schema updates and index management.
There are several use cases for index aliasing, including:
- Schema updates: when updating an index schema, create a new index and associate the same alias with it. This allows a seamless transition without requiring application-level changes.
- Version control: use aliases to manage different versions of an index. For example, assign the alias products to products_v1 initially and later to products_v2 when the schema evolves.
- Testing and rollback: assign an alias to a test index during staged deployments. If issues arise, quickly switch the alias back to the stable index.
Best practices for aliasing:
- Always create an alias for your indexes during initial setup, even if you don’t anticipate immediate schema changes.
- Use clear and descriptive alias names to avoid confusion (e.g., users_current or orders_live).
- Make sure that an alias points to only one index at a time to maintain predictable query results.
- Use aliases to provide tenant-specific access. For example, assign tenant-specific aliases like tenant1_products and tenant2_products to different indexes for isolated query performance.
Tools for managing aliases:
- Assign an alias: FT.ALIASADD my_alias my_index
- Update an alias: FT.ALIASUPDATE my_alias new_index
- Remove an alias: FT.ALIASDEL my_alias
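The alias commands above can be combined into a small deploy-and-rollback routine. The sketch below is illustrative only: RecordingClient is a hypothetical stand-in that records commands instead of sending them, so the example runs without a server, and point_alias is an assumed helper name. A real redis-py client exposes the same execute_command method.

```python
class RecordingClient:
    """Stand-in for a Redis client; records commands instead of sending them."""

    def __init__(self):
        self.sent = []

    def execute_command(self, *args):
        self.sent.append(args)
        return "OK"


def point_alias(client, alias, index, first_time=False):
    """Attach an alias to an index, or move an existing alias to a new index."""
    command = "FT.ALIASADD" if first_time else "FT.ALIASUPDATE"
    return client.execute_command(command, alias, index)


client = RecordingClient()
point_alias(client, "products", "products_v1", first_time=True)  # initial setup
point_alias(client, "products", "products_v2")                   # schema upgrade
point_alias(client, "products", "products_v1")                   # rollback
for command in client.sent:
    print(" ".join(command))
```

Because the application only ever queries the alias products, neither the upgrade nor the rollback requires an application change.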
Monitoring and troubleshooting aliases:
- Use the FT.INFO command to check which aliases are associated with an index.
- Make sure your aliases always point to valid indexes and are correctly updated during schema changes.
Monitor index population
- Use the FT.INFO command to monitor the num_docs and indexing fields, to check that all expected documents are indexed.
    FT.INFO my_new_index
- Validate data with sample queries to ensure proper indexing:
    FT.SEARCH my_new_index "*"
- Use FT.PROFILE to analyze query plans and validate performance:
    FT.PROFILE my_new_index SEARCH QUERY "your_query"
- Implement scripts to periodically verify document counts and query results. For example, in Python:

    import redis

    def check_index_readiness(index_name, expected_docs):
        # decode_responses=True makes the FT.INFO field names plain strings.
        r = redis.StrictRedis(host='localhost', port=6379, decode_responses=True)
        # Assumes the reply is a flat list of alternating names and values.
        info = r.execute_command('FT.INFO', index_name)
        num_docs = int(info[info.index('num_docs') + 1])
        return num_docs >= expected_docs

    if check_index_readiness('my_new_index', 100000):
        print("Index is fully populated!")
    else:
        print("Index is still populating...")
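A one-off check can be extended into a polling loop that also watches the indexing field. The following is a sketch under assumptions: wait_until_indexed and its parameters are hypothetical, the reply is taken to be a flat name/value list, and indexing is assumed to read 0 once background indexing has finished. The fetch_info callable is injected so the example runs against canned replies rather than a live server.

```python
import time


def wait_until_indexed(fetch_info, expected_docs,
                       timeout_s=60.0, poll_s=1.0, sleep=time.sleep):
    """Poll FT.INFO until indexing has finished and enough documents are indexed.

    fetch_info is any callable returning a flat FT.INFO reply, for example
    lambda: r.execute_command("FT.INFO", "my_new_index").
    Returns True when ready, False if the timeout expires first.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        reply = fetch_info()
        info = dict(zip(reply[0::2], reply[1::2]))
        ready = (int(info["indexing"]) == 0
                 and int(info["num_docs"]) >= expected_docs)
        if ready:
            return True
        if time.monotonic() >= deadline:
            return False
        sleep(poll_s)


# Canned replies: still indexing on the first poll, finished on the second.
canned = iter([
    ["num_docs", "40000", "indexing", "1"],
    ["num_docs", "100000", "indexing", "0"],
])
print(wait_until_indexed(lambda: next(canned), 100000, sleep=lambda s: None))
```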
Monitoring index performance
- Use the FT.PROFILE command to analyze query performance and identify bottlenecks.
- Regularly monitor memory usage with the INFO memory and FT.INFO commands to detect growth patterns and optimize resource allocation.
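One way to spot growth patterns is to compare the size-related fields of two FT.INFO snapshots. The sketch below picks up every field whose name ends in _mb rather than relying on specific names, since the exact set of fields can differ between versions. The helper names and the sample numbers are invented for illustration, and the replies are assumed to be flat name/value lists.

```python
def size_fields_mb(flat_reply):
    """Extract the FT.INFO fields that report sizes in megabytes."""
    info = dict(zip(flat_reply[0::2], flat_reply[1::2]))
    return {
        name: float(value)
        for name, value in info.items()
        if isinstance(name, str) and name.endswith("_mb")
    }


def growth_mb(before, after):
    """Per-field difference between two snapshots (after minus before)."""
    return {name: after[name] - before.get(name, 0.0) for name in after}


# Invented snapshots taken at two points in time.
monday = ["num_docs", "1000", "inverted_sz_mb", "1.5", "doc_table_size_mb", "0.5"]
friday = ["num_docs", "1500", "inverted_sz_mb", "2.0", "doc_table_size_mb", "0.75"]

print(growth_mb(size_fields_mb(monday), size_fields_mb(friday)))
```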
Index maintenance
- If schema changes are required, create a new index with the updated schema and reassign the alias once the index is ready.
- Use Redis key expiration to automatically remove outdated records and keep indexes lean.
FT.ALTER vs. aliasing
Use FT.ALTER when you need to add new fields to an existing index without rebuilding it, minimizing downtime and resource usage. However, FT.ALTER cannot remove or modify existing fields, limiting its flexibility.
Use index aliasing when making schema changes that require reindexing, such as modifying field types or removing fields. In this case, create a new index with the updated schema, populate it, and then use FT.ALIASUPDATE to seamlessly switch queries to the new index without disrupting application functionality.
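The choice between the two approaches can be expressed as a simple rule: additions alone can go through FT.ALTER, while anything that removes a field or changes a type needs a new index behind an alias. The sketch below encodes that rule. plan_schema_change is a hypothetical helper, schemas are simplified to name-to-type dictionaries, and one FT.ALTER command is emitted per added field to keep the generated syntax conservative.

```python
def plan_schema_change(index, old, new):
    """Decide how to apply a schema change.

    old and new map field names to field types. Returns a tuple of
    (action, commands) where action is "none", "alter", or "reindex".
    """
    removed = [name for name in old if name not in new]
    changed = [name for name in old if name in new and old[name] != new[name]]
    added = {name: ftype for name, ftype in new.items() if name not in old}

    if removed or changed:
        # FT.ALTER cannot remove or modify fields: build a new index and
        # switch the alias once it is populated.
        return "reindex", []
    if added:
        commands = [["FT.ALTER", index, "SCHEMA", "ADD", name, ftype]
                    for name, ftype in added.items()]
        return "alter", commands
    return "none", []


current = {"name": "TEXT", "age": "NUMERIC"}
print(plan_schema_change("my_index", current, {**current, "status": "TAG"}))
print(plan_schema_change("my_index", current, {"name": "TEXT", "age": "TAG"}))
```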
Scaling and high availability
- In a clustered Redis setup, make sure indexes are designed with key distribution in mind to prevent query inefficiencies.
- Test how indexes behave under replica promotion to ensure consistent query behavior across nodes.
Versioning and testing
- When changing schemas, create a new version of the index alongside the old one and migrate data progressively.
- Test index changes in a staging environment before deploying them to production.
Cleaning up
- Use the FT.DROPINDEX command to remove unused indexes and free up memory. Be cautious with the DD (Delete Documents) flag to avoid unintended data deletion.
- Make sure no keys remain that were previously associated with dropped indexes if the data is no longer relevant.
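In cleanup scripts, one way to stay cautious with DD is to make document deletion an explicit opt-in. The helper below is a hypothetical sketch that only builds the command arguments; the index name is a placeholder.

```python
def drop_index_command(index, delete_documents=False):
    """Build FT.DROPINDEX arguments; DD is added only when explicitly requested."""
    command = ["FT.DROPINDEX", index]
    if delete_documents:
        # DD also deletes the indexed documents, not just the index.
        command.append("DD")
    return command


print(drop_index_command("products_v1"))
print(drop_index_command("products_v1", delete_documents=True))
```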
Documentation and automation
- Document your index configurations to facilitate future maintenance.
- Use scripts or orchestration tools to automate index creation, monitoring, and cleanup.