HyperLogLog
HyperLogLog is a probabilistic data structure that estimates the cardinality of a set. As a probabilistic data structure, HyperLogLog trades perfect accuracy for efficient space utilization.
The Redis HyperLogLog implementation uses up to 12 KB and provides a standard error of 0.81%.
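Both figures follow from the size of the sketch. As a rough check, assume the dense layout described in the Redis implementation notes (16,384 registers of 6 bits each, a detail this page does not state) and the standard HyperLogLog error bound of about 1.04/sqrt(m) for m registers:

```python
import math

# Assumed layout (not stated on this page): 2**14 registers, 6 bits each.
REGISTERS = 2 ** 14
BITS_PER_REGISTER = 6

# Size of the register array alone, excluding the small header.
dense_bytes = REGISTERS * BITS_PER_REGISTER // 8

# Standard HyperLogLog error bound for m registers.
std_error = 1.04 / math.sqrt(REGISTERS)

print(dense_bytes)             # 12288 bytes, i.e. 12 KB
print(f"{std_error:.4%}")      # 0.8125%, quoted as 0.81%
```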
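Counting unique items usually requires an amount of memory proportional to the number of items you want to count, because you need to remember the elements you have already seen in order to avoid counting them more than once. However, there is a family of algorithms that trade memory for precision: they return an estimate with a standard error, which, for the Redis implementation of HyperLogLog, is less than 1%. The magic of this algorithm is that you no longer need an amount of memory proportional to the number of items counted, and can instead use a constant amount of memory: 12 KB in the worst case, or a lot less if your HyperLogLog (we'll just call them HLLs from now on) has seen very few elements.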
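To make the memory-for-precision trade concrete, here is a minimal sketch of the classic HyperLogLog estimator in pure Python. It is an illustration only: the class name `ToyHyperLogLog`, the SHA-1 hash, and the default precision of 14 are assumptions made for this example, and Redis uses its own hash function, encodings, and estimator. The point is that the register array never grows, however many items are added.

```python
import hashlib
import math


class ToyHyperLogLog:
    """Classic HyperLogLog estimator; an illustration, not Redis's code."""

    def __init__(self, precision=14):
        self.p = precision
        self.m = 1 << precision            # number of registers
        self.registers = [0] * self.m      # fixed size, whatever the input

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")      # 64-bit hash value
        index = h >> (64 - self.p)                 # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)      # remaining 64 - p bits
        # Rank = 1-based position of the leftmost 1-bit in `rest`.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[index]:
            self.registers[index] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)      # bias constant for m >= 128
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return round(self.m * math.log(self.m / zeros))
        return round(raw)


hll = ToyHyperLogLog()
for i in range(200_000):
    hll.add(f"user-{i % 100_000}")     # 100,000 distinct values, each seen twice

estimate = hll.count()
print(estimate)                        # an estimate close to 100,000
print(len(hll.registers))              # 16384, regardless of how many items were added
```

Duplicates never change the estimate, because adding an item that was already seen produces the same register index and rank as before.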
HLLs in Redis, while technically a different data structure, are encoded as a Redis string, so you can call GET to serialize an HLL, and SET to deserialize it back to the server.
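As a sketch of that round trip, the helper below copies an HLL by reading its string encoding and writing it back under another key. It assumes a redis-py style client object that exposes `get()` and `set()` and returns raw bytes (that is, one created without `decode_responses=True`, since the encoding is binary). The function name `copy_hll` and the key names are hypothetical.

```python
def copy_hll(client, source_key, dest_key):
    """Copy a HyperLogLog by reading and rewriting its string encoding.

    `client` is assumed to be a redis-py style object exposing get() and
    set() and returning raw bytes.
    """
    blob = client.get(source_key)      # serialized HLL, an opaque byte string
    if blob is None:
        return False                   # the source key does not exist
    client.set(dest_key, blob)         # the server can treat this value as an HLL again
    return True


# Possible usage, assuming a reachable server and the redis-py package:
#   import redis
#   copy_hll(redis.Redis(), "visits:today", "visits:backup")
```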
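Conceptually, the HLL API is like using Sets to do the same task. You would SADD every observed element into a set, and would use SCARD to check the number of elements inside the set, which are unique since SADD will not re-add an existing element.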
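While you don't really add items into an HLL, because the data structure only contains a state that does not include the actual elements, the API is the same:
- Every time you see a new element, you add it to the count with PFADD.
- When you want to retrieve the current approximation of the unique elements added with the PFADD command, you can use the PFCOUNT command.
If you need to merge two different HLLs, the PFMERGE command is available. Since HLLs provide approximate counts of unique elements, the result of the merge gives you an approximation of the number of unique elements across both source HLLs.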
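The parallel between the two APIs can be sketched side by side. The functions below assume a redis-py style client whose `sadd`, `scard`, `pfadd`, and `pfcount` methods map onto the commands of the same name. The function names and keys are hypothetical.

```python
def count_exact(client, key, items):
    """Exact distinct count with a Set: memory grows with the number of items."""
    for item in items:
        client.sadd(key, item)         # re-adding an existing member is a no-op
    return client.scard(key)


def count_approx(client, key, items):
    """Approximate distinct count with an HLL: memory stays bounded."""
    for item in items:
        client.pfadd(key, item)        # updates the sketch; the item is not stored
    return client.pfcount(key)


# Possible usage, assuming a reachable server and the redis-py package:
#   import redis
#   r = redis.Redis()
#   count_exact(r, "queries:set", ["redis", "hll", "redis"])
#   count_approx(r, "queries:hll", ["redis", "hll", "redis"])
```

The calling code has the same shape in both cases. What differs is what the server keeps: every member for the Set, only the sketch state for the HLL.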
Some examples of use cases for this data structure are counting the unique queries performed by users in a search form every day, the number of unique visitors to a web page, and other similar cases.
Redis can also perform the union of HLLs; see the full documentation for more information.
Use cases
Anonymous unique visits of a web page (SaaS, analytics tools)
This application answers these questions:
- How many unique visits has this page had on this day?
- How many unique users have played this song?
- How many unique users have viewed this video?
Note:
Storing the IP address or any other kind of personal identifier is against the law in some countries, which makes it impossible to get unique visitor statistics on your website.
One HyperLogLog is created per page (video/song) per period, and every IP/identifier is added to it on every visit.
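A minimal sketch of that scheme follows, assuming a redis-py style client with `pfadd` and `pfcount`. The key layout `visits:<page>:<day>` and all function names are hypothetical choices for this example.

```python
def visit_key(page, day):
    # Hypothetical key layout: one HyperLogLog per page per day.
    return f"visits:{page}:{day}"


def record_visit(client, page, day, visitor_id):
    """Add the visitor's identifier to the page's HLL for that day."""
    client.pfadd(visit_key(page, day), visitor_id)


def unique_visits(client, page, day):
    """Estimated number of unique visitors to the page on one day."""
    return client.pfcount(visit_key(page, day))


def unique_visits_over(client, page, days):
    """Estimated unique visitors across several days.

    PFCOUNT with several keys returns the estimated cardinality of their
    union, so a visitor seen on two days is counted once.
    """
    keys = [visit_key(page, d) for d in days]
    if not keys:
        return 0
    return client.pfcount(*keys)
```

For example, calling `record_visit(client, "home", "2024-05-01", "203.0.113.7")` on every request and `unique_visits(client, "home", "2024-05-01")` at reporting time would answer the first question above. The page name, date, and address here are made up.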
Basic commands
- PFADD adds an item to a HyperLogLog.
- PFCOUNT returns an estimate of the number of items in the set.
- PFMERGE combines two or more HyperLogLogs into one.
See the complete list of HyperLogLog commands.
Performance
Writing to (PFADD) and reading from (PFCOUNT) the HyperLogLog is done in constant time and space.
Merging HLLs is O(n), where n is the number of sketches.
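The linear cost comes from the fact that every sketch has a fixed number of registers, and in the standard HyperLogLog algorithm the union of several sketches is the per-register maximum. The sketch below models registers as plain lists of integers, which is a simplification for illustration and not Redis's internal encoding. The function name is hypothetical.

```python
def merge_registers(*sketches):
    """Merge HyperLogLog register arrays by taking the per-register maximum."""
    if not sketches:
        raise ValueError("need at least one sketch")
    size = len(sketches[0])
    if any(len(s) != size for s in sketches):
        raise ValueError("all sketches must have the same number of registers")
    merged = list(sketches[0])
    for sketch in sketches[1:]:          # one fixed-size pass per sketch: O(n)
        for i, value in enumerate(sketch):
            if value > merged[i]:
                merged[i] = value
    return merged


print(merge_registers([0, 3, 1, 0], [2, 1, 1, 4]))   # [2, 3, 1, 4]
```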
Limits
The HyperLogLog can estimate the cardinality of sets with up to 18,446,744,073,709,551,616 (2^64) members.
Learn more
- Redis new data structure: the HyperLogLog has a lot of details about the data structure and its implementation in Redis.
- Redis HyperLogLog Explained shows you how to use Redis HyperLogLog data structures to build a traffic heat map.