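SCAN
SCAN cursor [MATCH pattern] [COUNT count] [TYPE type]
- Available since: 2.8.0
- Time complexity: O(1) for every call. O(N) for a complete iteration, including enough command calls for the cursor to return back to 0. N is the number of elements inside the collection.
- ACL categories: @keyspace, @read, @slow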
The SCAN command and the closely related commands SSCAN, HSCAN and ZSCAN are used to incrementally iterate over a collection of elements.
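- SCAN iterates the set of keys in the currently selected Redis database.
- SSCAN iterates elements of Set types.
- HSCAN iterates fields of Hash types and their associated values.
- ZSCAN iterates elements of Sorted Set types and their associated scores.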
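Since these commands allow for incremental iteration, returning only a small number of elements per call, they can be used in production without the downside of commands like KEYS or SMEMBERS, which may block the server for a long time (even several seconds) when called against big collections of keys or elements.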
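However, while blocking commands like SMEMBERS are able to provide all the elements that are part of a Set at a given moment, the SCAN family of commands only offers limited guarantees about the returned elements, since the collection that we incrementally iterate can change during the iteration process.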
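Note that SCAN, SSCAN, HSCAN and ZSCAN all work very similarly, so this documentation covers all four commands. However, an obvious difference is that in the case of SSCAN, HSCAN and ZSCAN the first argument is the name of the key holding the Set, Hash or Sorted Set value. The SCAN command does not need any key name argument, as it iterates keys in the current database, so the iterated object is the database itself.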
SCAN basic usage
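SCAN is a cursor-based iterator. This means that at every call of the command, the server returns an updated cursor that the user needs to use as the cursor argument in the next call.
An iteration starts when the cursor is set to 0, and terminates when the cursor returned by the server is 0. The following is an example of SCAN iteration: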
> scan 0
1) "17"
2)  1) "key:12"
    2) "key:8"
    3) "key:4"
    4) "key:14"
    5) "key:16"
    6) "key:17"
    7) "key:15"
    8) "key:10"
    9) "key:3"
   10) "key:7"
   11) "key:1"
> scan 17
1) "0"
2) 1) "key:5"
   2) "key:18"
   3) "key:0"
   4) "key:2"
   5) "key:19"
   6) "key:13"
   7) "key:6"
   8) "key:9"
   9) "key:11"
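In the example above, the first call uses 0 as the cursor to start the iteration. The second call uses the cursor returned by the previous call as the first element of the reply, that is, 17.
As you can see, the SCAN return value is an array of two values: the first value is the new cursor to use in the next call, the second value is an array of elements.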
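Since the cursor returned by the second call is 0, the server signaled to the caller that the iteration finished and the collection was completely explored. Starting an iteration with a cursor value of 0 and calling SCAN until the returned cursor is 0 again is called a full iteration.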
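The loop below is a minimal sketch of a full iteration, assuming a hypothetical `scan(cursor)` callable that returns `(next_cursor, elements)` in the same shape as the reply above. It is not a real client API; the canned replies simply replay the two calls from the example. The loop ends only when the returned cursor is "0", never because a batch was empty.

```python
from typing import Callable, Iterator, List, Tuple

# Hypothetical SCAN-like call: cursor string in, (next_cursor, elements) out.
ScanFn = Callable[[str], Tuple[str, List[str]]]


def full_iteration(scan: ScanFn) -> Iterator[str]:
    """Yield every element returned during a full iteration.

    Starts with cursor "0" and stops only when "0" comes back again;
    an empty batch with a nonzero cursor does not end the loop.
    """
    cursor = "0"
    while True:
        cursor, elements = scan(cursor)
        yield from elements
        if cursor == "0":
            break


# Canned replies reproducing the two calls shown above (illustration only).
REPLIES = {
    "0": ("17", ["key:12", "key:8", "key:4", "key:14", "key:16",
                 "key:17", "key:15", "key:10", "key:3", "key:7", "key:1"]),
    "17": ("0", ["key:5", "key:18", "key:0", "key:2", "key:19",
                 "key:13", "key:6", "key:9", "key:11"]),
}


def fake_scan(cursor: str) -> Tuple[str, List[str]]:
    return REPLIES[cursor]


keys = list(full_iteration(fake_scan))
print(len(keys))  # 20
```

With a real client, `fake_scan` would be replaced by whatever function sends SCAN with the given cursor and decodes the two-element reply.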
Return value
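SCAN, SSCAN, HSCAN and ZSCAN return a two-element multi-bulk reply, where the first element is a string representing an unsigned 64-bit number (the cursor), and the second element is a multi-bulk with an array of elements.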
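- SCAN array of elements is a list of keys.
- SSCAN array of elements is a list of Set members.
- HSCAN array of elements contains two elements, a field and a value, for every returned element of the Hash.
- ZSCAN array of elements contains two elements, a member and its associated score, for every returned element of the Sorted Set.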
Scan guarantees
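The SCAN command, and the other commands in the SCAN family, are able to provide the user with a set of guarantees associated with full iterations.
- A full iteration always retrieves all the elements that were present in the collection from the start to the end of a full iteration. This means that if a given element is inside the collection when an iteration is started, and is still there when the iteration terminates, then at some point SCAN returned it to the user.
- A full iteration never returns any element that was NOT present in the collection from the start to the end of a full iteration. So if an element was removed before the start of an iteration, and is never added back to the collection for all the time the iteration lasts, SCAN ensures that this element will never be returned.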
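However, because SCAN has very little associated state (just the cursor), it has the following drawbacks:
- A given element may be returned multiple times. It is up to the application to handle the case of duplicated elements, for example by only using the returned elements to perform operations that are safe when re-applied multiple times.
- Elements that were not constantly present in the collection during a full iteration may or may not be returned: it is undefined.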
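When re-applying an operation is not safe, one option is to deduplicate on the client. The sketch below keeps the first occurrence of each element across batches; the batches are hypothetical. Note that the `seen` set grows with the number of distinct elements, so for very large collections the idempotent-operation approach described above is usually cheaper.

```python
from typing import Iterable, List, Set


def unique_in_order(batches: Iterable[List[str]]) -> List[str]:
    """Collect elements from successive SCAN batches, keeping only the
    first occurrence of each, since a full iteration may repeat elements."""
    seen: Set[str] = set()
    result: List[str] = []
    for batch in batches:
        for element in batch:
            if element not in seen:
                seen.add(element)
                result.append(element)
    return result


# Hypothetical batches in which "user:2" is returned twice.
batches = [["user:1", "user:2"], [], ["user:2", "user:3"]]
print(unique_in_order(batches))  # ['user:1', 'user:2', 'user:3']
```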
Number of elements returned at every SCAN call
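SCAN family functions do not guarantee that the number of elements returned per call is in a given range. The commands are also allowed to return zero elements, and the client should not consider the iteration complete as long as the returned cursor is not zero.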
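However, the number of returned elements is reasonable: in practical terms, SCAN may return at most a few tens of elements per call when iterating a large collection, or may return all the elements of the collection in a single call when the iterated collection is small enough to be internally represented as an encoded data structure (this happens for small Sets, Hashes and Sorted Sets).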
However, the user can tune the order of magnitude of the number of elements returned per call using the COUNT option.
The COUNT option
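While SCAN does not guarantee the number of elements returned at every iteration, it is possible to empirically adjust the behavior of SCAN using the COUNT option. Basically, with COUNT the user specifies the amount of work that should be done at every call in order to retrieve elements from the collection. This is just a hint for the implementation, but generally speaking it is what you can expect from the implementation most of the time.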
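- The default COUNT value is 10.
- When iterating the key space, or a Set, Hash or Sorted Set that is big enough to be represented by a hash table, assuming no MATCH option is used, the server will usually return count or slightly more than count elements per call. Please check the "Why SCAN may return all the items of an aggregate data type in a single call?" section later in this document.
- When iterating Sets encoded as intsets (small sets composed of just integers), or Hashes and Sorted Sets encoded as ziplists (small hashes and sets composed of small individual values), usually all the elements are returned in the first SCAN call, regardless of the COUNT value.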
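Important: there is no need to use the same COUNT value for every iteration. The caller is free to change the count from one iteration to the next as required, as long as the cursor passed in the next call is the one obtained in the previous call to the command.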
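The calling contract can be illustrated with a toy model. In the sketch below, `toy_scan` is a hypothetical stand-in that treats the cursor as a list offset; real Redis cursors are opaque values tied to the internal hash table layout, so only the loop shape carries over. COUNT changes on every call, and the only thing threaded from one call to the next is the cursor.

```python
from typing import List, Tuple

# Toy keyspace. Real cursors are NOT list offsets; this is only a model.
DATA = [f"key:{i}" for i in range(25)]


def toy_scan(cursor: str, count: int = 10) -> Tuple[str, List[str]]:
    """Hypothetical SCAN stand-in: return up to `count` items from `cursor`."""
    start = int(cursor)
    batch = DATA[start:start + count]
    nxt = start + count
    return ("0" if nxt >= len(DATA) else str(nxt)), batch


cursor, seen, counts = "0", [], [5, 10, 20]
i = 0
while True:
    # A different COUNT per call is fine; only the cursor must carry over.
    count = counts[min(i, len(counts) - 1)]
    cursor, batch = toy_scan(cursor, count)
    seen.extend(batch)
    i += 1
    if cursor == "0":
        break

print(len(seen), i)  # 25 3
```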
The MATCH option
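It is possible to only iterate elements matching a given glob-style pattern, similarly to the behavior of the KEYS command, which takes a pattern as its only argument.
To do so, just append the MATCH <pattern> arguments at the end of the SCAN command (it works with all the SCAN family commands).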
It is important to note that the MATCH filter is applied after elements are retrieved from the collection, just before returning data to the client. This means that if the pattern matches very few elements inside the collection, SCAN will likely return no elements in most iterations.
In such a case most calls return zero elements; passing a larger COUNT, such as 1000, forces the command to do more scanning in that iteration, making it more likely to return matching elements.
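Because MATCH filters elements only after they are retrieved, the work per call is the same whether or not anything matches. This sketch applies a glob filter to hypothetical batches that were already fetched. It uses Python's `fnmatchcase` as an approximation of Redis glob matching: both support `*`, `?` and `[...]`, but they are not identical (for example, `fnmatchcase` does not treat a backslash as an escape character).

```python
from fnmatch import fnmatchcase
from typing import List


def apply_match(batch: List[str], pattern: str) -> List[str]:
    """Filter an already-retrieved batch with a glob pattern
    (an approximation of Redis matching, not an exact replica)."""
    return [element for element in batch if fnmatchcase(element, pattern)]


# Hypothetical batches: ten keys visited per call, one match in total.
batches = [
    [f"session:{i}" for i in range(10)],
    [f"session:{i}" for i in range(10, 19)] + ["config:main"],
]
results = [apply_match(b, "config:*") for b in batches]
print(results)  # [[], ['config:main']]
```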
When using Redis Cluster, the search is optimized for patterns that imply a single slot. If a pattern can only match keys of one slot, Redis only iterates over keys in that slot, rather than the whole database, when searching for keys matching the pattern.
For example, with the pattern {a}h*llo, Redis would only try to match it with the keys in slot 15495, which the hash tag {a} implies.
To use patterns with hash tags, see Hash tags in the Cluster specification for more information.
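To see where slot 15495 comes from, the sketch below computes a key's slot as CRC16 (the XMODEM variant used by Redis Cluster) modulo 16384, hashing only the contents of a non-empty {...} hash tag when one is present. It treats the pattern text as a literal key; deciding whether a glob pattern really pins a single slot also depends on where wildcards appear (a wildcard before or inside the braces could change the tag), which this sketch does not check.

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16, polynomial 0x1021, initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_hash_slot(key: str) -> int:
    """Slot for a key, honoring a non-empty {hash tag} if present."""
    raw = key.encode()
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:  # non-empty tag: hash only its contents
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


print(key_hash_slot("{a}h*llo"))  # 15495
print(key_hash_slot("{a}hello"))  # 15495
```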
The TYPE option
You can use the TYPE option to ask SCAN to only return objects that match a given type, allowing you to iterate through the database looking for keys of a specific type. The TYPE option is only available on the whole-database SCAN, not on HSCAN, ZSCAN, etc.
The type argument is the same string name that the TYPE command returns. Note a quirk where some Redis types, such as GeoHashes, HyperLogLogs, Bitmaps, and Bitfields, may internally be implemented using other Redis types, such as a string or zset, so they can't be distinguished by SCAN from other keys of that same type. For example, a key created with GEOADD is reported as a zset, exactly like a regular Sorted Set.
It is important to note that the TYPE filter is also applied after elements are retrieved from the database, so the option does not reduce the amount of work the server has to do to complete a full iteration, and for rare types you may receive no elements in many iterations.
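The sketch below models the TYPE filter as a post-retrieval check against a hypothetical table of what the TYPE command would report for each key. It shows both points made above: filtering happens after keys are fetched, and geo and HyperLogLog keys are indistinguishable from plain zset and string keys.

```python
from typing import Dict, List

# Hypothetical keyspace: key name -> string the TYPE command would report.
KEY_TYPES: Dict[str, str] = {
    "leaderboard": "zset",
    "stores:geo": "zset",      # created with GEOADD
    "visitors:hll": "string",  # created with PFADD
    "greeting": "string",
    "queue": "list",
}


def filter_by_type(batch: List[str], wanted: str) -> List[str]:
    """Mimic the TYPE filter: applied to keys that were already retrieved."""
    return [key for key in batch if KEY_TYPES[key] == wanted]


batch = list(KEY_TYPES)
print(filter_by_type(batch, "zset"))    # ['leaderboard', 'stores:geo']
print(filter_by_type(batch, "string"))  # ['visitors:hll', 'greeting']
```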
The NOVALUES option
When using HSCAN, you can use the NOVALUES option to make Redis return only the field names in the hash, without their corresponding values.
Multiple parallel iterations
It is possible for an unlimited number of clients to iterate the same collection at the same time, as the full state of the iterator is in the cursor, which is obtained and returned to the client at every call. No server-side state is kept at all.
Terminating iterations in the middle
Since there is no server-side state, and the full state is captured by the cursor, the caller is free to terminate an iteration halfway without signaling this to the server in any way. An unlimited number of iterations can be started and never terminated without any issue.
Calling SCAN with a corrupted cursor
Calling SCAN with a broken, negative, out of range, or otherwise invalid cursor will result in undefined behavior, but never in a crash. The only undefined aspect is that the SCAN implementation can no longer ensure its guarantees about the returned elements.
The only valid cursors to use are:
- The cursor value of 0 when starting an iteration.
- The cursor returned by the previous call to SCAN in order to continue the iteration.
Guarantee of termination
The SCAN algorithm is guaranteed to terminate only if the size of the iterated collection remains bounded to a given maximum size; otherwise, iterating a collection that always grows may result in SCAN never terminating a full iteration.
This is easy to see intuitively: if the collection grows, there is more and more work to do in order to visit all the possible elements, and the ability to terminate the iteration depends on the number of calls to SCAN and its COUNT option value compared with the rate at which the collection grows.
Why SCAN may return all the items of an aggregate data type in a single call?
In the COUNT option documentation, we state that sometimes this family of commands may return all the elements of a Set, Hash or Sorted Set at once in a single call, regardless of the COUNT option value. The reason why this happens is that the cursor-based iterator can be implemented, and is useful, only when the aggregate data type that we are scanning is represented as a hash table. However, Redis uses a memory optimization where small aggregate data types, until they reach a given amount of items or a given max size of single elements, are represented using a compact single-allocation packed encoding. When this is the case, SCAN has no meaningful cursor to return and must iterate the whole data structure at once, so the only sane behavior it has is to return everything in one call.
However, once the data structures are bigger and are promoted to use real hash tables, the SCAN family of commands resorts to the normal behavior. Note that since this special behavior of returning all the elements is true only for small aggregates, it has no effect on the command complexity or latency. However, the exact limits to get converted into real hash tables are user configurable, so the maximum number of elements you can see returned in a single call depends on how big an aggregate data type can be while still using the packed representation.
Also note that this behavior is specific to SSCAN, HSCAN and ZSCAN. SCAN itself never shows this behavior because the key space is always represented by hash tables.
Further reading
For more information about managing keys, please refer to the Redis Keyspace tutorial.
RESP2/RESP3 Reply
Array reply: specifically, an array with two elements.
- The first element is a Bulk string reply that represents an unsigned 64-bit number, the cursor.
- The second element is an Array reply with the names of scanned keys.
History
- Starting with Redis version 6.0.0: Added the TYPE subcommand.