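Maintenance mode for cluster nodes

Prepare cluster nodes for maintenance.

Redis Enterprise Software

Use maintenance mode to prevent data loss during hardware patching or operating system maintenance on Redis Enterprise servers. When maintenance mode is on, all shards move off the node under maintenance and migrate to another available node.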

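Activate maintenance mode

When you activate maintenance mode, Redis Enterprise does the following:

  1. Checks whether shutting down the node would cause quorum loss. If so, maintenance mode does not turn on.

    Maintenance mode does not protect against quorum loss. If you activate maintenance mode for the majority of nodes in a cluster and restart them at the same time, quorum is lost, which can lead to data loss.

  2. If no maintenance mode snapshot exists, or if you use overwrite_snapshot when you activate maintenance mode, Redis Enterprise creates a new node snapshot that records the node's shard and endpoint configuration.

  3. Marks the node as a quorum node to prevent shards and endpoints from migrating to it.

    At this point, rladmin status displays the node's shards field in yellow, which indicates that shards cannot migrate to the node.

  4. Migrates shards and binds endpoints to other nodes when space is available.

By default, maintenance mode does not demote a master node. The cluster elects a new master node when the original master node restarts.

Add the demote_node option to the rladmin command to demote a master node when you activate maintenance mode.

To activate maintenance mode for a node, run the following command: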

rladmin node <node_id> maintenance_mode on overwrite_snapshot

You can start server maintenance if:

  • All shards and endpoints have moved to other nodes

  • Enough nodes are still online to maintain quorum
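
The two conditions above can be sketched as a small pre-flight check. This is a minimal illustration, not part of Redis Enterprise: the function names are hypothetical, and it assumes quorum means that more than half of all cluster nodes are online. The counts themselves would come from rladmin status.

```python
def keeps_quorum(total_nodes: int, nodes_offline: int) -> bool:
    """Return True if the nodes still online form a strict majority."""
    if total_nodes <= 0 or not 0 <= nodes_offline <= total_nodes:
        raise ValueError("node counts are out of range")
    online = total_nodes - nodes_offline
    # Assumption: quorum requires more than half of all cluster nodes.
    return online * 2 > total_nodes


def ready_for_maintenance(shards_on_node: int, endpoints_on_node: int,
                          total_nodes: int, nodes_offline: int) -> bool:
    """Check both conditions: the node is empty and quorum survives.

    nodes_offline counts every node that will be down at the same time,
    including the node you are about to maintain.
    """
    node_is_empty = shards_on_node == 0 and endpoints_on_node == 0
    return node_is_empty and keeps_quorum(total_nodes, nodes_offline)
```

For a three-node cluster, taking one node offline keeps quorum under this assumption, while taking two offline does not.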

Prevent replica shard migration

If you do not have enough resources available to move all of the shards to other nodes, you can turn maintenance mode on without migrating the replica shards.

Before you prevent replica shard migration during maintenance mode, consider the following effects:

  • Replica shards remain on the node during maintenance.

  • If the maintenance node fails, the master shards do not have replica shards to maintain data redundancy and high availability.

  • Replica shards that remain on the node can still be promoted during failover to preserve availability.

To activate maintenance mode without replica shard migration, run:

rladmin node <node_id> maintenance_mode on evict_ha_replica disabled evict_active_active_replica disabled

Demote a master node

If maintenance might affect connectivity to the master node, you can demote the master node when you activate maintenance mode. This lets the cluster elect a new master node.

To demote a master node when activating maintenance mode, run:

rladmin node <node_id> maintenance_mode on demote_node
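
If you script node maintenance, a small helper can assemble the activation command from the options described so far. The sketch below only builds the argument list and does not run anything. The function name and its parameters are hypothetical, and combining several options in one command, in this order, is an assumption to verify against your rladmin version.

```python
from typing import List


def maintenance_on_command(node_id: int,
                           overwrite_snapshot: bool = False,
                           demote_node: bool = False,
                           migrate_replicas: bool = True) -> List[str]:
    """Build the argument list for 'rladmin node <node_id> maintenance_mode on'."""
    args = ["rladmin", "node", str(node_id), "maintenance_mode", "on"]
    if not migrate_replicas:
        # Leave replica shards on the node during maintenance.
        args += ["evict_ha_replica", "disabled",
                 "evict_active_active_replica", "disabled"]
    if demote_node:
        # Demote the node if it is currently the master node.
        args.append("demote_node")
    if overwrite_snapshot:
        # Replace any existing maintenance mode snapshot.
        args.append("overwrite_snapshot")
    return args


print(" ".join(maintenance_on_command(2, demote_node=True)))
# prints: rladmin node 2 maintenance_mode on demote_node
```

The returned list can be passed to subprocess.run on a cluster node once you have confirmed the command is what you intend.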

Verify maintenance mode activation

To verify maintenance mode for a node, run rladmin status and review the node's shards field. If that value is displayed in yellow, as described in the activation steps, the node is in maintenance mode.

Avoid activating maintenance mode when it is already active. Maintenance mode activations stack. If you activate maintenance mode for a node that is already in maintenance mode, you will have to deactivate maintenance mode twice in order to restore full functionality.

Deactivate maintenance mode

When you deactivate maintenance mode, Redis Enterprise:

  1. Loads a specified node snapshot or defaults to the latest maintenance mode snapshot.

  2. Unmarks the node as a quorum node to allow shards and endpoints to migrate to the node.

  3. Restores the shards and endpoints that were in the node at the time of the snapshot.

  4. Deletes the snapshot.

To deactivate maintenance mode after server maintenance, run:

rladmin node <node_id> maintenance_mode off

By default, a snapshot is required to deactivate maintenance mode. If the snapshot cannot be restored, deactivation is cancelled and the node remains in maintenance mode. In such events, it may be necessary to reset node status.

Specify a snapshot

When you turn off maintenance mode, you can restore the node configuration from a maintenance mode snapshot or any snapshots previously created by rladmin node <node_id> snapshot create. If you do not specify a snapshot, Redis Enterprise uses the latest maintenance mode snapshot by default.

To get a list of available snapshots, run:

rladmin node <node_id> snapshot list

To specify a snapshot when you turn maintenance mode off, run:

rladmin node <node_id> maintenance_mode off snapshot_name <snapshot_name>

Note: If an error occurs when you turn on maintenance mode, the snapshot is not deleted. When you rerun the command, use the snapshot from the initial attempt since it contains the original state of the node.

Skip shard restoration

You can prevent the migrated shards and endpoints from returning to the original node after you turn off maintenance mode.

To turn maintenance mode off and skip shard restoration, run:

rladmin node <node_id> maintenance_mode off skip_shards_restore
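
The deactivation variants can be assembled the same way. In this sketch the function name is hypothetical, and treating snapshot_name and skip_shards_restore as alternatives is a conservative assumption, because the commands above show them separately.

```python
from typing import List, Optional


def maintenance_off_command(node_id: int,
                            snapshot_name: Optional[str] = None,
                            skip_shards_restore: bool = False) -> List[str]:
    """Build the argument list for 'rladmin node <node_id> maintenance_mode off'."""
    if snapshot_name is not None and skip_shards_restore:
        # Assumption: restoring from a snapshot and skipping restoration
        # are mutually exclusive choices.
        raise ValueError("choose either snapshot_name or skip_shards_restore")
    args = ["rladmin", "node", str(node_id), "maintenance_mode", "off"]
    if snapshot_name is not None:
        # Restore from a specific snapshot instead of the latest one.
        args += ["snapshot_name", snapshot_name]
    if skip_shards_restore:
        # Leave the migrated shards and endpoints where they are.
        args.append("skip_shards_restore")
    return args
```

With no optional arguments, the helper produces the plain maintenance_mode off command, which restores from the latest maintenance mode snapshot.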

Reset node status

In extreme cases, you may need to reset a node's status. Run the following commands to do so:

rladmin tune node <node_id> max_listeners 100
rladmin tune node <node_id> quorum_only disabled

Use these commands with caution. For best results, contact Support before running these commands.

Cluster status example

This example shows how the output of rladmin status changes when you turn on maintenance mode for a node.

The cluster status before turning on maintenance mode:

redislabs@rp1_node1:/opt$ rladmin status
CLUSTER NODES:
NODE:ID   ROLE     ADDRESS       EXTERNAL_ADDRESS     HOSTNAME    SHARDS
*node:1   master   172.17.0.2                         rp1_node1   2/100
node:2    slave    172.17.0.4                         rp3_node1   2/100
node:3    slave    172.17.0.3                         rp2_node1   0/100

The cluster status after turning on maintenance mode:

redislabs@rp1_node1:/opt$ rladmin node 2 maintenance_mode on
Performing maintenance_on action on node:2: 0%
created snapshot NodeSnapshot<name=maintenance_mode_2019-03-14_09-50-59,time=None,node_uid=2>

node:2 will not accept any more shards
Performing maintenance_on action on node:2: 100%
OK
redislabs@rp1_node1:/opt$ rladmin status
CLUSTER NODES:
NODE:ID   ROLE     ADDRESS       EXTERNAL_ADDRESS     HOSTNAME    SHARDS
*node:1   master   172.17.0.2                         rp1_node1   2/100
node:2    slave    172.17.0.4                         rp3_node1   0/0
node:3    slave    172.17.0.3                         rp2_node1   2/100

After turning on maintenance mode for node 2, Redis Enterprise saves a snapshot of its configuration and then moves its shards and endpoints to node 3.

Now node 2 has 0/0 shards because shards cannot migrate to it while it is in maintenance mode.
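
Because the terminal color is not available to a script, one rough way to spot such a node is to read the SHARDS column. The sketch below parses the table exactly as it appears in this example. It assumes SHARDS is the last column and that a limit of 0 means the node is not accepting shards. Real rladmin status output may contain more columns, so treat this as a heuristic.

```python
from typing import Dict, List, Tuple

SAMPLE = """\
CLUSTER NODES:
NODE:ID   ROLE     ADDRESS       EXTERNAL_ADDRESS     HOSTNAME    SHARDS
*node:1   master   172.17.0.2                         rp1_node1   2/100
node:2    slave    172.17.0.4                         rp3_node1   0/0
node:3    slave    172.17.0.3                         rp2_node1   2/100
"""


def parse_node_shards(status_text: str) -> Dict[int, Tuple[int, int]]:
    """Map node ID -> (shards in use, shard limit) from the CLUSTER NODES table."""
    result = {}
    for line in status_text.splitlines():
        fields = line.split()
        # Data rows start with "node:<id>", optionally prefixed with "*".
        if not fields or not fields[0].lstrip("*").startswith("node:"):
            continue
        # Skip log lines such as "node:2 will not accept any more shards".
        if "/" not in fields[-1]:
            continue
        node_id = int(fields[0].lstrip("*").split(":", 1)[1])
        # SHARDS is the last column, so an empty EXTERNAL_ADDRESS is harmless.
        used, limit = fields[-1].split("/")
        result[node_id] = (int(used), int(limit))
    return result


def nodes_refusing_shards(status_text: str) -> List[int]:
    """Return IDs of nodes whose shard limit is 0 (heuristic, see lead-in)."""
    shards = parse_node_shards(status_text)
    return sorted(node for node, (_, limit) in shards.items() if limit == 0)


print(nodes_refusing_shards(SAMPLE))
# prints: [2]
```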

Maintenance mode REST API

You can also turn maintenance mode on or off using REST API requests to POST /nodes/{node_uid}/actions/{action}.

Activate maintenance mode (REST API)

Use POST /nodes/{node_uid}/actions/maintenance_on to activate maintenance mode:

POST https://<hostname>:9443/v1/nodes/<node_id>/actions/maintenance_on
{
    "overwrite_snapshot": true,
    "evict_ha_replica": true,
    "evict_active_active_replica": true
}

You can set evict_ha_replica and evict_active_active_replica to false to prevent replica shard migration.

The maintenance_on request returns a JSON response body:

{
    "status":"queued",
    "task_id":"<task-id-guid>"
}
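
A client-side sketch of this request, using only the Python standard library, might look like the following. The helper name, hostname, and credentials are placeholders, and HTTP basic authentication is an assumption about how your cluster is configured. The code builds the request without sending it.

```python
import base64
import json
import urllib.request


def build_action_request(hostname: str, node_id: int, action: str,
                         payload: dict, username: str,
                         password: str) -> urllib.request.Request:
    """Build a POST request for /v1/nodes/<node_id>/actions/<action>."""
    url = f"https://{hostname}:9443/v1/nodes/{node_id}/actions/{action}"
    token = base64.b64encode(f"{username}:{password}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={
            "Content-Type": "application/json",
            # Assumption: the cluster accepts HTTP basic authentication.
            "Authorization": f"Basic {token}",
        },
    )


# Placeholder host and credentials; replace them with your own values.
request = build_action_request(
    "cluster.example.com", 2, "maintenance_on",
    {"overwrite_snapshot": True,
     "evict_ha_replica": False,
     "evict_active_active_replica": False},
    username="admin@example.com", password="placeholder",
)
```

To send the request, pass it to urllib.request.urlopen and decode the JSON response body. The cluster's TLS certificate must be trusted by the client for that call to succeed. The same helper works for maintenance_off with a different action and payload.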

Deactivate maintenance mode (REST API)

Use POST /nodes/{node_uid}/actions/maintenance_off to deactivate maintenance mode:

POST https://<hostname>:9443/v1/nodes/<node_id>/actions/maintenance_off
{ "skip_shards_restore": false }

The skip_shards_restore boolean flag allows the maintenance_off action to skip shard restoration when set to true.

The maintenance_off request returns a JSON response body:

{
    "status":"queued",
    "task_id":"<task-id-guid>"
}

Track action status

You can send a request to GET /nodes/{node_uid}/actions/{action} to track the status of the maintenance_on and maintenance_off actions.

This request returns the status of the maintenance_on action:

GET https://<hostname>:9443/v1/nodes/<node_id>/actions/maintenance_on

The response body:

{
    "status":"completed",
    "task_id":"38c7405b-26a7-4379-b84c-cab4b3db706d"
}
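
Since the POST requests return "queued" right away, a script usually needs to poll this endpoint until the action finishes. The sketch below shows one way to do that. The function name, retry count, and delay are arbitrary choices, and it treats any status other than "completed" as still pending, because "queued" and "completed" are the only values shown here. The fetch_status argument is any callable that performs the GET request and returns the decoded JSON body.

```python
import time
from typing import Callable


def wait_for_action(fetch_status: Callable[[], dict],
                    attempts: int = 30,
                    delay_seconds: float = 2.0) -> dict:
    """Poll until the action reports "completed" or the attempts run out."""
    last: dict = {}
    for attempt in range(attempts):
        last = fetch_status()
        if last.get("status") == "completed":
            return last
        # Assumption: every other status means the action is still pending.
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    raise TimeoutError(
        f"action not completed; last status: {last.get('status')!r}"
    )
```

Passing the fetch step in as a callable keeps the polling logic independent of the HTTP client and makes it easy to exercise with canned responses.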