Redis pipelining

How to optimize round-trip times by batching Redis commands

Redis pipelining is a technique for improving performance by issuing multiple commands at once without waiting for the response to each individual command. Pipelining is supported by most Redis clients. This document describes the problem pipelining is designed to solve and how pipelining works in Redis.

Request/Response protocols and round-trip time (RTT)

Redis is a TCP server using the client-server model and what is called a Request/Response protocol.

This means that usually a request is accomplished with the following steps:

  • The client sends a query to the server, and reads from the socket, usually in a blocking way, for the server's response.
  • The server processes the command and sends the response back to the client.

So for instance a four-command sequence is something like this:

  • Client: INCR X
  • Server: 1
  • Client: INCR X
  • Server: 2
  • Client: INCR X
  • Server: 3
  • Client: INCR X
  • Server: 4
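Under the hood, each request in this exchange is framed by the RESP protocol before crossing the wire. A minimal sketch of that framing (`resp_command` is a hypothetical helper, not part of any client API; real clients build this for you):

```ruby
# Frame a command as a RESP array of bulk strings, exactly as a Redis
# client would serialize it before writing to the socket.
def resp_command(*args)
  out = +"*#{args.size}\r\n"                             # array header: arg count
  args.each { |a| out << "$#{a.bytesize}\r\n#{a}\r\n" }  # one bulk string per arg
  out
end

resp_command("INCR", "X")  # => "*2\r\n$4\r\nINCR\r\n$1\r\nX\r\n"
```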

Clients and Servers are connected via a network link. Such a link can be very fast (a loopback interface) or very slow (a connection established over the Internet with many hops between the two hosts). Whatever the network latency is, it takes time for the packets to travel from the client to the server, and back from the server to the client to carry the reply.

This time is called RTT (Round Trip Time). It's easy to see how this can affect performance when a client needs to perform many requests in a row (for instance adding many elements to the same list, or populating a database with many keys). For instance, if the RTT is 250 milliseconds (in the case of a very slow link over the Internet), even if the server is able to process 100k requests per second, we'll be able to process at most 4 requests per second.

If the interface used is a loopback interface, the RTT is much shorter, typically sub-millisecond, but even this adds up to a lot if you need to perform many writes in a row.
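The figures above are simple arithmetic: with a blocking request/response loop, the RTT alone caps throughput no matter how fast the server is. A small sketch (the helper name is illustrative):

```ruby
# Upper bound on sequential request throughput when every command waits
# out a full round trip before the next one can be sent.
def max_requests_per_second(rtt_seconds)
  1.0 / rtt_seconds
end

max_requests_per_second(0.250)   # => 4.0      (250 ms Internet link)
max_requests_per_second(0.0001)  # => 10000.0  (0.1 ms loopback)
```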

Fortunately there is a way to improve this use case.

Redis pipelining

A Request/Response server can be implemented so that it is able to process new requests even if the client hasn't already read the old responses. This way it is possible to send multiple commands to the server without waiting for the replies at all, and finally read the replies in a single step.

This is called pipelining, and it is a technique widely in use for many decades. For instance many POP3 protocol implementations already support this feature, dramatically speeding up the process of downloading new emails from the server.

Redis has supported pipelining since its early days, so whatever version you are running, you can use pipelining with Redis. This is an example using the raw netcat utility:

$ (printf "PING\r\nPING\r\nPING\r\n"; sleep 1) | nc localhost 6379
+PONG
+PONG
+PONG

This time we don't pay the cost of RTT for every call, but just once for the three commands.

To be explicit, with pipelining the order of operations of our very first example will be the following:

  • Client: INCR X
  • Client: INCR X
  • Client: INCR X
  • Client: INCR X
  • Server: 1
  • Server: 2
  • Server: 3
  • Server: 4

IMPORTANT NOTE: While the client sends commands using pipelining, the server will be forced to queue the replies, using memory. So if you need to send a lot of commands with pipelining, it is better to send them as batches each containing a reasonable number, for instance 10k commands, read the replies, and then send another 10k commands again, and so forth. The speed will be nearly the same, but the additional memory used will be at most the amount needed to queue the replies for these 10k commands.
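The batching advice above can be sketched as follows. Only the splitting is live code here; the Redis calls (assuming the redis-rb client and a reachable server) are shown as comments:

```ruby
BATCH_SIZE = 10_000

# Split a long command stream into pipeline-sized batches, so the server
# never has to queue more than BATCH_SIZE replies in memory at once.
def batches(items, size = BATCH_SIZE)
  items.each_slice(size)
end

# With a real connection, replies are read between batches, freeing the
# server-side reply queue each time:
#
#   batches(keys).each do |batch|
#     redis.pipelined { |p| batch.each { |k| p.set(k, 'value') } }
#   end

batches((1..25_000).to_a).map(&:size)  # => [10000, 10000, 5000]
```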

It's not just a matter of RTT

Pipelining is not just a way to reduce the latency cost associated with the round trip time, it actually greatly improves the number of operations you can perform per second in a given Redis server. This is because, without pipelining, serving each command is very cheap from the point of view of accessing the data structures and producing the reply, but it is very costly from the point of view of doing the socket I/O. This involves calling the read() and write() syscalls, which means going from user land to kernel land. The context switch is a huge speed penalty.

When pipelining is used, many commands are usually read with a single read() system call, and multiple replies are delivered with a single write() system call. Consequently, the number of total queries performed per second initially increases almost linearly with longer pipelines, and eventually reaches 10 times the baseline obtained without pipelining, as shown in this figure.

[Figure: pipeline size vs. IOPs]

A real world code example

In the following benchmark we'll use the Redis Ruby client, supporting pipelining, to test the speed improvement due to pipelining:

require 'rubygems'
require 'redis'

def bench(descr)
  start = Time.now
  yield
  puts "#{descr} #{Time.now - start} seconds"
end

def without_pipelining
  r = Redis.new
  10_000.times do
    r.ping
  end
end

def with_pipelining
  r = Redis.new
  r.pipelined do |rp|
    10_000.times do
      rp.ping
    end
  end
end

bench('without pipelining') do
  without_pipelining
end
bench('with pipelining') do
  with_pipelining
end

Running the above simple script yields the following figures on my Mac OS X system, running over the loopback interface, where pipelining will provide the smallest improvement as the RTT is already pretty low:

without pipelining 1.185238 seconds
with pipelining 0.250783 seconds

As you can see, using pipelining improved the transfer by almost a factor of five.

Pipelining vs Scripting

Using Redis scripting, available since Redis 2.6, a number of use cases for pipelining can be addressed more efficiently using scripts that perform a lot of the work needed at the server side. A big advantage of scripting is that it is able to both read and write data with minimal latency, making operations like read, compute, write very fast (pipelining can't help in this scenario since the client needs the reply of the read command before it can call the write command).
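As a sketch of such a read-compute-write step moved server-side in a single EVAL call (assuming the redis-rb client; the key name and the cap are illustrative):

```ruby
# A capped counter: the SET depends on the value just read, so the client
# cannot pipeline the two commands; one EVAL runs the whole step
# atomically on the server instead.
CAPPED_INCR = <<~LUA
  local v = tonumber(redis.call('GET', KEYS[1]) or '0')
  if v < tonumber(ARGV[1]) then
    redis.call('SET', KEYS[1], v + 1)
  end
  return v
LUA

# With a redis-rb connection r:
#   r.eval(CAPPED_INCR, keys: ['counter'], argv: [100])
```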

Sometimes the application may also want to send EVAL or EVALSHA commands in a pipeline. This is entirely possible and Redis explicitly supports it with the SCRIPT LOAD command (it guarantees that EVALSHA can be called without the risk of failing).
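A sketch of how SCRIPT LOAD and EVALSHA fit together in a pipeline (assuming the redis-rb client; the SHA1 computed locally matches what SCRIPT LOAD returns, since Redis identifies scripts by the SHA1 of their source):

```ruby
require 'digest'

script = "return redis.call('GET', KEYS[1])"

# SCRIPT LOAD stores the script body and returns the SHA1 of its source;
# EVALSHA then refers to the script by that 40-character digest.
sha = Digest::SHA1.hexdigest(script)

# With a redis-rb connection r: load once, then pipeline many calls
# without risking a NOSCRIPT error:
#
#   r.script(:load, script)  # returns the same sha
#   r.pipelined do |p|
#     100.times { p.evalsha(sha, keys: ['foo']) }
#   end
```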

Appendix: Why are busy loops slow even on the loopback interface?

Even with all the background covered in this page, you may still wonder why a Redis benchmark like the following (in pseudo code) is slow even when executed over the loopback interface, with the server and the client running on the same physical machine:

FOR-ONE-SECOND:
    Redis.SET("foo","bar")
END

After all, if both the Redis process and the benchmark are running in the same box, isn't it just copying messages in memory from one place to another without any actual latency or networking involved?

The reason is that processes in a system are not always running; it is the kernel scheduler that decides when a process runs. So, for instance, when the benchmark is allowed to run, it reads the reply from the Redis server (related to the last command executed) and writes a new command. The command is now in the loopback interface buffer, but in order to be read by the server, the kernel must schedule the server process (currently blocked in a system call) to run, and so forth. So in practical terms the loopback interface still involves network-like latency, because of how the kernel scheduler works.

Basically a busy loop benchmark is the silliest thing that can be done when measuring performance on a networked server. The wise thing is simply to avoid benchmarking this way.
