Redis Clustering and High Availability

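A standalone Redis server is a single point of failure for both data and service, and the performance of a single instance has an upper limit. Redis's clustering technologies address these problems.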

Redis Master-Slave Replication

Master-Slave Replication Architecture

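Master-slave mode (master/slave) is similar to MySQL master-slave replication and provides remote, cross-host backup of Redis data. A common architecture for clients connecting to a master-slave setup:
The application first connects to a virtual IP provided by a highly available LB cluster, and the LB then dispatches user requests to the backend Redis servers that actually serve them.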

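Characteristics of master-slave replication
A master can have multiple slaves
A slave can have only one master
Data flows one way, from master to slave
The master is readable and writable
Slaves are read-only by default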

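Implementing Master-Slave Replication

When the master fails, a slave can be promoted to become the new master, so every Redis slave should be configured with the same connection password as the master. In addition, once a slave has been promoted to master, it relies on persistence to recover its data.

When configuring Redis replication, it is strongly recommended to enable persistence on the master. If persistence cannot be enabled, for example because of latency concerns, the master's Redis service should be configured so that it does not start automatically.
Reference case: how all data on the master and slaves can be lost

1. Node A is the master with persistence disabled, and nodes B and C replicate from node A.
2. Node A crashes and is restarted by an automatic restart mechanism. Because persistence is disabled on node A, it comes back with no data.
3. Nodes B and C then replicate from node A, but A's dataset is empty, so they delete their own copies of the data.

Disabling persistence on the master while also enabling automatic process restart is very dangerous even when Sentinel is used for high availability. The master may restart so quickly that Sentinel does not detect the restart within its configured heartbeat interval, and the data-loss sequence above still plays out. Data safety is critical at all times, so a master with persistence disabled must never be allowed to start automatically.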
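
The data-loss sequence above can be illustrated with a toy simulation. The following is a minimal Python sketch, not Redis code: the Node class and the run_scenario function are hypothetical names introduced only for illustration, and persistence is modeled as a plain copy of the dataset to a "disk" dictionary.

```python
class Node:
    """Toy in-memory stand-in for a Redis node (hypothetical, not a Redis API)."""

    def __init__(self, name, persistence):
        self.name = name
        self.persistence = persistence
        self.data = {}   # in-memory dataset
        self.disk = {}   # what would survive a restart

    def write(self, key, value):
        self.data[key] = value
        if self.persistence:
            self.disk[key] = value  # assumption: every write is persisted

    def crash_and_restart(self):
        # After a restart, only data that reached the disk is loaded back.
        self.data = dict(self.disk)

    def full_sync_from(self, master):
        # A full resync replaces the slave's dataset with the master's.
        self.data = dict(master.data)


def run_scenario(master_persistence):
    """Return the key counts on A, B and C after A crashes and auto-restarts."""
    a = Node("A", master_persistence)
    b = Node("B", persistence=True)
    c = Node("C", persistence=True)
    a.write("key1", "v1-master")
    b.full_sync_from(a)
    c.full_sync_from(a)
    a.crash_and_restart()   # the auto-restart brings A back right away
    b.full_sync_from(a)     # slaves resync from the restarted master
    c.full_sync_from(a)
    return len(a.data), len(b.data), len(c.data)


if __name__ == "__main__":
    print("persistence off:", run_scenario(False))
    print("persistence on: ", run_scenario(True))
```

With persistence disabled on A, all three nodes end up empty; with persistence enabled, the key survives on every node.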

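Configuring Master-Slave Replication with Commands

Enabling Replication

A Redis server is a master by default. To configure it as a slave, specify the master's IP, port, and connection password. Running REPLICAOF MASTER_IP PORT on the slave enables replication; earlier versions use the SLAVEOF command.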

127.0.0.1:6379> REPLICAOF MASTER_IP PORT #recommended in newer versions
127.0.0.1:6379> SLAVEOF MasterIP Port #used by older versions, being phased out
127.0.0.1:6379> CONFIG SET masterauth <masterpass>
#master
[root@ubuntu2004 ~]#
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:36fc325641d00b7bd4c98a38cd33f5af01c28234
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> set key1 v1-master
OK
127.0.0.1:6379> get key1
"v1-master"

#slave-1
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:11e2f612583f3facebd4d481b1f0f031ca48dd2f
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> set key1 v1-slave-202
OK
127.0.0.1:6379> get key1
"v1-slave-202"

#slave-2
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:af27c4c549276f57e5a574cf4d39f75a13557824
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
127.0.0.1:6379> set key1 v1-slave-203
OK
127.0.0.1:6379> get key1
"v1-slave-203"

#On slave1, set the master's IP and port; before Redis 5.0 the command was SLAVEOF
127.0.0.1:6379> REPLICAOF 10.0.0.201 6379
OK
#Set the master's password on the slave; replication cannot proceed without it
127.0.0.1:6379> CONFIG set masterauth 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_read_repl_offset:70
slave_repl_offset:70
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:70
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:56
127.0.0.1:6379> get key1
"v1-master"
127.0.0.1:6379> DBSIZE
(integer) 1100011

#On slave2, set the master's IP and port; before Redis 5.0 the command was SLAVEOF
127.0.0.1:6379> REPLICAOF 10.0.0.201 6379
OK
127.0.0.1:6379> CONFIG set masterauth 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:308
slave_repl_offset:308
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:308
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:309
repl_backlog_histlen:0
127.0.0.1:6379> get key1
"v1-master"
127.0.0.1:6379> DBSIZE
(integer) 1100011

#View the master's status
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.202,port=6379,state=online,offset=350,lag=1
slave1:ip=10.0.0.203,port=6379,state=online,offset=350,lag=1
master_failover_state:no-failover
master_replid:8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:350
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:350
127.0.0.1:6379> DBSIZE
(integer) 1100011

Removing Replication

Running REPLICAOF NO ONE on a slave cancels replication. The slave disconnects from the master and stops replicating, but the data already on the slave is not cleared.
127.0.0.1:6379> REPLICAOF no one

Verifying Synchronization

Check the master log

[root@ubuntu2004 ~]#tail -n30 /apps/redis/log/redis-6379.log 
61708:M 30 Oct 2022 14:11:08.547 * Successfully renamed the temporary AOF base file temp-rewriteaof-bg-61926.aof into appendonly.aof.3.base.rdb
61708:M 30 Oct 2022 14:11:08.549 * Removing the history file appendonly.aof.2.incr.aof in the background
61708:M 30 Oct 2022 14:11:08.549 * Removing the history file appendonly.aof.2.base.rdb in the background
61708:M 30 Oct 2022 14:11:08.552 * Background AOF rewrite finished successfully
61708:M 30 Oct 2022 14:22:39.966 * Background saving started by pid 62215
62215:C 30 Oct 2022 14:22:40.563 * DB saved on disk
62215:C 30 Oct 2022 14:22:40.564 * Fork CoW for RDB: current 0 MB, peak 0 MB, average 0 MB
61708:M 30 Oct 2022 14:22:40.625 * Background saving terminated with success
61708:M 30 Oct 2022 14:50:46.686 * DB saved on disk
61708:M 31 Oct 2022 10:00:00.452 * Replica 10.0.0.202:6379 asks for synchronization
61708:M 31 Oct 2022 10:00:00.452 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '11e2f612583f3facebd4d481b1f0f031ca48dd2f', my replication IDs are '36fc325641d00b7bd4c98a38cd33f5af01c28234' and '0000000000000000000000000000000000000000')
61708:M 31 Oct 2022 10:00:00.453 * Replication backlog created, my new replication IDs are '8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3' and '0000000000000000000000000000000000000000'
61708:M 31 Oct 2022 10:00:00.453 * Delay next BGSAVE for diskless SYNC
61708:M 31 Oct 2022 10:00:05.610 * Starting BGSAVE for SYNC with target: replicas sockets
61708:M 31 Oct 2022 10:00:05.631 * Background RDB transfer started by pid 77225
77225:C 31 Oct 2022 10:00:07.812 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
61708:M 31 Oct 2022 10:00:07.813 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
61708:M 31 Oct 2022 10:00:07.857 * Background RDB transfer terminated with success
61708:M 31 Oct 2022 10:00:07.857 * Streamed RDB transfer with replica 10.0.0.202:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
61708:M 31 Oct 2022 10:00:07.857 * Synchronization with replica 10.0.0.202:6379 succeeded
61708:M 31 Oct 2022 10:03:35.623 * Replica 10.0.0.203:6379 asks for synchronization
61708:M 31 Oct 2022 10:03:35.623 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for 'af27c4c549276f57e5a574cf4d39f75a13557824', my replication IDs are '8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3' and '0000000000000000000000000000000000000000')
61708:M 31 Oct 2022 10:03:35.623 * Delay next BGSAVE for diskless SYNC
61708:M 31 Oct 2022 10:03:40.934 * Starting BGSAVE for SYNC with target: replicas sockets
61708:M 31 Oct 2022 10:03:40.937 * Background RDB transfer started by pid 77315
77315:C 31 Oct 2022 10:03:41.836 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
61708:M 31 Oct 2022 10:03:41.836 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
61708:M 31 Oct 2022 10:03:41.843 * Background RDB transfer terminated with success
61708:M 31 Oct 2022 10:03:41.843 * Streamed RDB transfer with replica 10.0.0.203:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
61708:M 31 Oct 2022 10:03:41.843 * Synchronization with replica 10.0.0.203:6379 succeeded

Check the log on a slave node

[root@ubuntu2004 ~]#tail  /apps/redis/log/redis-6379.log 
53218:S 31 Oct 2022 10:00:05.613 * Full resync from master: 8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3:14
53218:S 31 Oct 2022 10:00:05.709 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
53218:S 31 Oct 2022 10:00:07.810 * Discarding previously cached master state.
53218:S 31 Oct 2022 10:00:07.810 * MASTER <-> REPLICA sync: Flushing old data
53218:S 31 Oct 2022 10:00:07.810 * MASTER <-> REPLICA sync: Loading DB in memory
53218:S 31 Oct 2022 10:00:07.815 * Loading RDB produced by version 7.0.5
53218:S 31 Oct 2022 10:00:07.815 * RDB age 2 seconds
53218:S 31 Oct 2022 10:00:07.815 * RDB memory usage when created 92.38 Mb
53218:S 31 Oct 2022 10:00:09.651 * Done loading RDB, keys loaded: 1100011, keys expired: 0.
53218:S 31 Oct 2022 10:00:09.651 * MASTER <-> REPLICA sync: Finished with success

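Modifying the Slave Configuration File

#Repeat the following on every slave. Settings changed with commands are lost after a restart, so they must also be written to the configuration file
[root@ubuntu2004 ~]#vim /apps/redis/etc/redis.conf 
# replicaof <masterip> <masterport>
# IP and port of the master (comments in redis.conf go on their own line)
replicaof 10.0.0.201 6379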
.....
# masterauth <master-password>
masterauth 123456
[root@ubuntu2004 ~]#systemctl restart redis.service

Checking Status on the Master and Slaves

#master
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.203,port=6379,state=online,offset=1568,lag=1
master_failover_state:no-failover
master_replid:8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:1568
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:1568

#slave1 (slave2 is the same)
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> get key1   #after a successful sync, the slave's original key value is gone and replaced by the value replicated from the master
"v1-master"
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:1
slave_read_repl_offset:14
slave_repl_offset:14
master_sync_total_bytes:-1
master_sync_read_bytes:0
master_sync_left_bytes:-1
master_sync_perc:-0.00
master_sync_last_io_seconds_ago:0
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:14
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

#Stop the Redis service on the master (systemctl stop redis); the following can then be observed on the slave
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:down   #down means the master cannot be reached
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_read_repl_offset:14
slave_repl_offset:14
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:14
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
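
When replication health is checked from a script, INFO replication output such as the above can be treated as plain key:value lines. The following is a minimal Python sketch that assumes the output has already been captured as a string; the helper names parse_info and replica_is_healthy are hypothetical, and "healthy" here only means that the link to the master is up.

```python
def parse_info(text):
    """Parse INFO output (key:value lines) into a dict; '#' lines are section headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


def replica_is_healthy(info):
    """Treat a slave as healthy when its replication link to the master is up."""
    return info.get("role") == "slave" and info.get("master_link_status") == "up"


# Sample taken from the slave output shown while the master was stopped.
SAMPLE = """# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
"""

if __name__ == "__main__":
    info = parse_info(SAMPLE)
    print(info["master_host"], replica_is_healthy(info))
```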

Master log

[root@ubuntu2004 ~]#tail -f /apps/redis/log/redis-6379.log
77825:M 31 Oct 2022 10:21:57.556 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3', my replication IDs are '63e81b5ac49766a809c0b4f68bf2ad2bdb3d2116' and '0000000000000000000000000000000000000000')
77825:M 31 Oct 2022 10:21:57.556 * Replication backlog created, my new replication IDs are '3f6e5aae3689756fd51b0e72e16774af84367e17' and '0000000000000000000000000000000000000000'
77825:M 31 Oct 2022 10:21:57.556 * Delay next BGSAVE for diskless SYNC
77825:M 31 Oct 2022 10:22:02.839 * Starting BGSAVE for SYNC with target: replicas sockets
77825:M 31 Oct 2022 10:22:02.842 * Background RDB transfer started by pid 77848
77848:C 31 Oct 2022 10:22:03.940 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
77825:M 31 Oct 2022 10:22:03.940 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
77825:M 31 Oct 2022 10:22:03.949 * Background RDB transfer terminated with success
77825:M 31 Oct 2022 10:22:03.949 * Streamed RDB transfer with replica 10.0.0.203:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
77825:M 31 Oct 2022 10:22:03.949 * Synchronization with replica 10.0.0.203:6379 succeeded
77825:M 31 Oct 2022 10:27:09.094 * Replica 10.0.0.202:6379 asks for synchronization
77825:M 31 Oct 2022 10:27:09.095 * Partial resynchronization not accepted: Replication ID mismatch (Replica asked for '8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3', my replication IDs are '3f6e5aae3689756fd51b0e72e16774af84367e17' and '0000000000000000000000000000000000000000')
77825:M 31 Oct 2022 10:27:09.095 * Delay next BGSAVE for diskless SYNC
77825:M 31 Oct 2022 10:27:14.897 * Starting BGSAVE for SYNC with target: replicas sockets
77825:M 31 Oct 2022 10:27:14.899 * Background RDB transfer started by pid 78010
78010:C 31 Oct 2022 10:27:15.852 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
77825:M 31 Oct 2022 10:27:15.852 # Diskless rdb transfer, done reading from pipe, 1 replicas still up.
77825:M 31 Oct 2022 10:27:15.907 * Background RDB transfer terminated with success
77825:M 31 Oct 2022 10:27:15.907 * Streamed RDB transfer with replica 10.0.0.202:6379 succeeded (socket). Waiting for REPLCONF ACK from slave to enable streaming
77825:M 31 Oct 2022 10:27:15.907 * Synchronization with replica 10.0.0.202:6379 succeeded

Slave log

[root@ubuntu2004 ~]#tail -f /apps/redis/log/redis-6379.log 
12424:S 31 Oct 2022 10:22:02.894 * Full resync from master: 3f6e5aae3689756fd51b0e72e16774af84367e17:14
12424:S 31 Oct 2022 10:22:02.897 * MASTER <-> REPLICA sync: receiving streamed RDB from master with EOF to disk
12424:S 31 Oct 2022 10:22:03.993 * Discarding previously cached master state.
12424:S 31 Oct 2022 10:22:03.993 * MASTER <-> REPLICA sync: Flushing old data
12424:S 31 Oct 2022 10:22:04.444 * MASTER <-> REPLICA sync: Loading DB in memory
12424:S 31 Oct 2022 10:22:04.454 * Loading RDB produced by version 7.0.5
12424:S 31 Oct 2022 10:22:04.454 * RDB age 2 seconds
12424:S 31 Oct 2022 10:22:04.454 * RDB memory usage when created 91.63 Mb
12424:S 31 Oct 2022 10:22:05.052 * Done loading RDB, keys loaded: 1100011, keys expired: 0.
12424:S 31 Oct 2022 10:22:05.052 * MASTER <-> REPLICA sync: Finished with success

Slave Read-Only State

Verify that the slave node is read-only and rejects writes

[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> set key1 v1-slave
(error) READONLY You can't write against a read only replica.

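Failure Recovery in Master-Slave Replication

Overview of the Failure Recovery Process

Slave failure and recovery
When a slave fails, point the Redis client at another slave and repair the failed slave promptly.

Master failure and recovery
When the master fails, a slave has to be promoted to become the new master.
With plain master-slave replication this promotion is manual; automatic failover is not supported.
The other slaves must then be repointed at the new master.
Switching masters changes master_replid, so the master_replid each slave previously held no longer matches the current master, which can trigger a full synchronization on all slaves.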

Implementing Cascading Replication in Redis

That is, a slave of a slave

The master and slave1 need no changes. Only slave2 and slave3 have to be repointed to use slave1 as their master.

#slave-2
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> REPLICAOF no one
OK
127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:50a58af83e58347126c925e0b9199b2ced508300
master_replid2:3f6e5aae3689756fd51b0e72e16774af84367e17
master_repl_offset:3220
second_repl_offset:3221
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:3206

127.0.0.1:6379> REPLICAOF 10.0.0.202 6379
OK

127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.202
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_read_repl_offset:3220
slave_repl_offset:3220
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:50a58af83e58347126c925e0b9199b2ced508300
master_replid2:3f6e5aae3689756fd51b0e72e16774af84367e17
master_repl_offset:3220
second_repl_offset:3221
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:3206

[root@ubuntu2004 ~]#vim /apps/redis/etc/redis.conf 
# replicaof <masterip> <masterport>
replicaof 10.0.0.202 6379
....
# masterauth <master-password>
masterauth 123456

#slave-3
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> REPLICAOF 10.0.0.202 6379
OK
127.0.0.1:6379> CONFIG set masterauth 123456
OK

127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.202
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_read_repl_offset:0
slave_repl_offset:0
master_link_down_since_seconds:-1
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:bbf48a6263d40d4476505da0eed0f8a2a3ea6304
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:0
second_repl_offset:-1
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

#Check the status of slave-1
127.0.0.1:6379> info replication
# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:up
master_last_io_seconds_ago:6
master_sync_in_progress:0
slave_read_repl_offset:3990
slave_repl_offset:3990
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:2
slave0:ip=10.0.0.203,port=6379,state=online,offset=3990,lag=0
slave1:ip=10.0.0.204,port=6379,state=online,offset=3990,lag=1
master_failover_state:no-failover
master_replid:3f6e5aae3689756fd51b0e72e16774af84367e17
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:3990
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:449
repl_backlog_histlen:3542

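Optimizing Master-Slave Replication

The Replication Process

Redis replication consists of full synchronization and incremental synchronization.
Redis replication is non-blocking on the master: the master keeps serving normal requests while synchronization is in progress.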

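Full Replication Process

The master and slave establish a connection and authenticate, then the slave sends the PSYNC command (SYNC before version 2.8) to the master
The master replies to the slave with FULLRESYNC, which carries the runID and offset
The slave saves the master's information
The master runs BGSAVE to produce an RDB file and records new writes in a buffer in the meantime
The master sends the RDB file to the slave
The master sends the writes accumulated in the buffer to the slave
The slave deletes its old local data
The slave loads the RDB
The slave applies the buffered writes from the master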

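Incremental Replication Process

After the first full synchronization has completed, a slave that needs to synchronize again only sends its current offset (similar to a MySQL binlog position) to the master. The master then sends the data written after that position (including the backlog held in the buffer) to the slave, which applies it to its in-memory dataset.

In short, the first replication is a full one, and subsequent replication is mostly incremental.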
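
Whether a reconnecting slave gets an incremental or a full replication depends on the replication ID and the offset it asks for, which is what the "Partial resynchronization not accepted: Replication ID mismatch" lines in the master logs earlier reflect. The following Python function is a simplified sketch of that decision, not the actual Redis implementation: the function name can_partial_resync is hypothetical, the dictionary keys mirror the INFO replication fields, and the real server performs additional checks.

```python
def can_partial_resync(requested_replid, requested_offset, master):
    """Return True if the master could serve an incremental resync.

    requested_offset is the next byte of the replication stream the slave
    wants; master is a dict with the INFO replication fields used below.
    """
    # 1. The slave must ask for a replication history this master knows:
    #    either the current ID, or the previous ID up to the switch offset.
    id_ok = (
        requested_replid == master["master_replid"]
        or (
            requested_replid == master["master_replid2"]
            and requested_offset <= master["second_repl_offset"]
        )
    )
    if not id_ok:
        return False

    # 2. The requested offset must still be inside the replication backlog.
    if not master["repl_backlog_active"]:
        return False
    first = master["repl_backlog_first_byte_offset"]
    last = first + master["repl_backlog_histlen"]
    return first <= requested_offset <= last


# Values taken from the master's INFO replication output shown earlier.
MASTER = {
    "master_replid": "8e0ba23a1b10d3cda4e6c8832d100a7c71de95e3",
    "master_replid2": "0000000000000000000000000000000000000000",
    "second_repl_offset": -1,
    "repl_backlog_active": 1,
    "repl_backlog_first_byte_offset": 1,
    "repl_backlog_histlen": 350,
}

if __name__ == "__main__":
    same_id = MASTER["master_replid"]
    other_id = "af27c4c549276f57e5a574cf4d39f75a13557824"
    print(can_partial_resync(same_id, 309, MASTER))    # known ID, offset in backlog
    print(can_partial_resync(other_id, 1, MASTER))     # unknown replication ID
    print(can_partial_resync(same_id, 100000, MASTER)) # offset outside the backlog
```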

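Complete Synchronization Process

The slave connects to the master and, once authenticated, sends the PSYNC command
On receiving PSYNC, the master runs BGSAVE to save all data to an RDB file and records subsequent write operations in a buffer
The master sends the RDB file to all slaves
The master sends the write operations recorded in the buffer to all slaves
After receiving the snapshot file, the slave discards all of its old data
The slave loads the received RDB into memory
The slave executes the buffered write operations received from the master
Once a slave has completed full replication, later synchronizations start by exchanging only the slave_repl_offset
From then on, the difference between the slave and the master is determined from that offset, and only the incremental data is replicated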

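#Replication buffer (ring buffer) parameters:
#The master's write buffer records the write commands issued between one synchronization and the next. Formula: repl-backlog-size = maximum tolerated slave disconnection time * bytes written per second to the master's offset. For example, if the master writes at most 64 MB per second and a disconnection of up to 60 seconds must be tolerated, set it to 64 MB * 60 s = 3840 MB (about 3.8 GB). Make this value large enough.
repl-backlog-size 1mb
#If no slave has been connected to the master for this many seconds, the backlog memory is released. A value of 0 means the memory is never released.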
repl-backlog-ttl 3600
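
The sizing formula above is simple arithmetic. The following Python sketch computes it and prints a matching configuration line; the function names are hypothetical, and the write rate and tolerated outage are inputs that have to be measured or chosen for each deployment.

```python
MB = 1024 * 1024  # redis.conf treats "1mb" as 1024*1024 bytes


def backlog_size_bytes(write_mb_per_sec, max_disconnect_sec):
    """repl-backlog-size = tolerated disconnection time * write rate."""
    return int(write_mb_per_sec * max_disconnect_sec * MB)


def as_config_line(size_bytes):
    """Render the size as a redis.conf line, rounded up to whole megabytes."""
    size_mb = -(-size_bytes // MB)  # ceiling division
    return f"repl-backlog-size {size_mb}mb"


if __name__ == "__main__":
    # Example from the text: 64 MB/s of writes, 60 s tolerated disconnection.
    print(as_config_line(backlog_size_bytes(64, 60)))
```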

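Avoiding Full Replication

The first full replication is unavoidable. Later full replications can be made cheaper by using small masters (little memory) and running them during off-peak hours.
Run ID mismatch: restarting the master changes its RUNID, which may trigger a full replication. Failover mechanisms such as Sentinel or Cluster can be used to avoid this. Restarting a slave, by contrast, does not cause a full replication.
Insufficient replication backlog: if the master produces more new data than the backlog can hold while a slave is disconnected, the slave needs a full replication after it reconnects. The fix is to increase repl-backlog-size.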

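Avoiding Replication Storms

Single-master replication storm
When the master restarts, many slaves replicate from it at once
Solution: change the replication topology

Single-machine multi-instance replication storm
When the machine goes down, a large number of full replications follow
Solution: spread masters across multiple machines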

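Replication Tuning Configuration

Before version 2.8, Redis had no incremental (partial) replication: a brief network interruption or a slave restart caused a full synchronization between master and slave. Partial replication was added in version 2.8.

Performance-Related Settings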

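repl-diskless-sync no #whether to transfer the RDB file in diskless mode. no means the RDB file is first saved to disk and then sent to the slave; yes means the RDB is not written to the local disk but streamed to the slave directly over the network. The default was no in older versions; Redis 7.0 and later default to yes

repl-diskless-sync-delay 5 #in diskless mode, the number of seconds the master waits before starting the transfer

repl-ping-slave-period 10 #interval between the PINGs exchanged by master and slave over the replication link, 10s by default

repl-timeout 60 #replication timeout in seconds; once it is exceeded the connection is considered lost, master_link_status shows down, and an error is logged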

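repl-disable-tcp-nodelay no #whether to disable TCP_NODELAY on the replication connection
#If set to yes, Redis merges multiple small TCP packets into larger ones before sending. This saves bandwidth but increases replication delay, so master and slave data may be inconsistent for a short time
#If set to no, the master sends data to the slaves immediately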

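repl-backlog-size 1mb #the master's write buffer, which records the write commands issued between one synchronization and the next. Formula: repl-backlog-size = maximum tolerated slave disconnection time * bytes written per second to the master's offset. For example, if the master writes at most 32 MB per second and a disconnection of up to 60 seconds must be tolerated, set it to at least 32 * 60 = 1920 MB. Make this value large enough; if it is too small, full replication results

repl-backlog-ttl 3600 #if no slave has been connected to the master for this many seconds, the backlog memory expires. A value of 0 means it never expires.

slave-priority 100 #priority of this slave when a new master is chosen; a smaller integer means a higher priority. When the master fails, slaves are considered for promotion in priority order. A value of 0 means this slave is never promoted to master.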

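min-replicas-to-write 1 #minimum number of available slaves the master requires; with fewer, the master refuses writes. The default is 0; 1 is recommended in production

min-slaves-max-lag 20 #maximum lag in seconds for a slave to count as available; when fewer than min-replicas-to-write slaves have a lag at or below this value, the master refuses writes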
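
The interaction of the last two settings can be sketched as a small predicate. This is a simplified Python illustration, not Redis code: the function name master_accepts_writes is hypothetical, and the lag values correspond to the lag= field of each slave line in the master's INFO replication output. The sketch also assumes that setting either option to 0 disables the check.

```python
def master_accepts_writes(replica_lags, min_replicas_to_write, max_lag):
    """Return True if the master would still accept write commands.

    replica_lags: lag in seconds of each connected slave.
    """
    # Assumption: a value of 0 for either option turns the feature off.
    if min_replicas_to_write == 0 or max_lag == 0:
        return True
    good_replicas = sum(1 for lag in replica_lags if lag <= max_lag)
    return good_replicas >= min_replicas_to_write


if __name__ == "__main__":
    print(master_accepts_writes([1, 1], min_replicas_to_write=1, max_lag=20))
    print(master_accepts_writes([], min_replicas_to_write=1, max_lag=20))
    print(master_accepts_writes([35], min_replicas_to_write=1, max_lag=20))
```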

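Common Replication Failures

Inconsistent Hardware or Software Configuration Between Master and Slaves

If maxmemory differs between master and slaves, and the master has more memory than a slave, replication may lose data
If rename-command settings differ, for example flushdb is enabled on the master but disabled on a slave, running flushdb on the master leaves the slave out of sync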

#If rename-command flushall "" is defined on the slave but not on the master, running flushall on the master produces the following replication error on the slave
10822:S 16 Oct 2020 20:03:45.291 # == CRITICAL == This replica is sending an error to its master: 'unknown command flushall, with args beginning with: ' after processing the command '<unknown>'

#If the master has rename-command flushdb "wang" and the slave does not, the slave shows the following error during replication
3181:S 21 Oct 2020 17:34:50.581 # == CRITICAL == This replica is sending an error to its master: 'unknown command wang, with args beginning with: ' after processing the command '<unknown>

Wrong Master Password

If the master password configured on a slave is wrong, authentication fails and replication cannot be established.

[root@ubuntu2004 ~]#tail -f /apps/redis/log/redis-6379.log 
75974:S 31 Oct 2022 10:26:31.241 * Connecting to MASTER 10.0.0.201:6379
75974:S 31 Oct 2022 10:26:31.241 * MASTER <-> REPLICA sync started
75974:S 31 Oct 2022 10:26:31.242 * Non blocking connect for SYNC fired the event.
75974:S 31 Oct 2022 10:26:31.244 * Master replied to PING, replication can continue...
75974:S 31 Oct 2022 10:26:31.246 # Unable to AUTH to MASTER: -WRONGPASS invalid username-password pair or user is disabled.
75974:S 31 Oct 2022 10:26:32.246 * Connecting to MASTER 10.0.0.201:6379

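Mismatched Redis Versions

Different Redis versions, especially different major versions such as Redis 3, 4, 5, and 6, may be incompatible with each other.
All nodes in a replication setup should therefore run the same version.

Remote Connections Blocked in Protected Mode

If protected mode is enabled and neither a bind address nor a password has been set, remote connections are refused.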

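Redis Sentinel

Introduction to Redis Clustering

Like MySQL replication, the master-slave architecture cannot switch the master and slave roles automatically: when the master fails, no slave is promoted to master on its own. Replication alone provides no automatic failover, and promoting a slave requires manually changing the configuration. In addition, only the single master accepts writes, so under heavy load the Redis service hits a performance bottleneck.

The following problems of master-slave replication need to be solved:
Automatic switching of the master and slave roles without affecting the business
Higher overall Redis performance to support more concurrent access

How Sentinel Works

Sentinel was introduced in Redis 2.6 and became stable in Redis 2.8. To use it in production, run Redis 2.8 or later.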

Sentinel Architecture and Failover Mechanism

Sentinel architecture
Sentinel failover

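A dedicated Sentinel process monitors the state of the master in a Redis deployment. When the master fails, Sentinel switches the master and slave roles automatically, making the system highly available.

Sentinel is a distributed system: a sentinel process runs on each of several nodes at the same time. Sentinel processes use a gossip protocol to learn whether the master is down, and an agreement protocol (voting) to decide whether to perform an automatic failover and which slave to promote to master.

Each Sentinel process periodically sends messages to the other Sentinels, the master, and the slaves to confirm that they are alive. If a node does not respond within the configured time, the Sentinel considers it offline. This is a subjective down, abbreviated SDOWN.

If enough Sentinel processes (at least the configured quorum) consider the master SDOWN, after exchanging this view with the is-master-down-by-addr command, the master is considered objectively down, abbreviated ODOWN.
A voting algorithm then selects a suitable slave from all slaves and promotes it to master, the other slaves are automatically reconfigured to point at the new master, and the failover is complete.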
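
How a "suitable" slave might be chosen can be sketched as a sort over the candidates. The Python function below is a simplified illustration under stated assumptions, not the Sentinel source: it assumes candidates are ranked by priority (lower wins, 0 never eligible), then by replication offset (higher wins), then by run ID (lexicographically smaller wins). The function name, the dictionary layout, and the sample values are hypothetical.

```python
def pick_new_master(replicas):
    """Pick the slave to promote, or None if no slave is eligible.

    Each replica is a dict with keys: name, priority, offset, run_id, online.
    """
    candidates = [r for r in replicas if r["online"] and r["priority"] > 0]
    if not candidates:
        return None
    # Lower priority first, then larger offset, then smaller run ID.
    return min(candidates, key=lambda r: (r["priority"], -r["offset"], r["run_id"]))


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    replicas = [
        {"name": "10.0.0.203:6379", "priority": 100, "offset": 350, "run_id": "b1", "online": True},
        {"name": "10.0.0.204:6379", "priority": 100, "offset": 400, "run_id": "a7", "online": True},
        {"name": "10.0.0.205:6379", "priority": 0, "offset": 999, "run_id": "c3", "online": True},
    ]
    chosen = pick_new_master(replicas)
    print(chosen["name"] if chosen else "no eligible slave")
```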

A Redis Sentinel deployment should have at least 3 Sentinel nodes, preferably an odd number.
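
The reason for preferring an odd number of Sentinels becomes clear from the vote arithmetic. The following Python sketch is an illustration under stated assumptions rather than Sentinel's implementation: it assumes ODOWN needs at least quorum Sentinels to agree, and that starting a failover additionally needs votes from a majority of all Sentinels. All function names are hypothetical.

```python
def majority(total_sentinels):
    """Smallest number of Sentinels that forms a majority."""
    return total_sentinels // 2 + 1


def is_odown(sdown_reports, quorum):
    """The master is objectively down once enough Sentinels report SDOWN."""
    return sdown_reports >= quorum


def failover_authorized(votes, total_sentinels, quorum):
    """A failover needs both a majority of all Sentinels and the quorum."""
    return votes >= max(majority(total_sentinels), quorum)


def tolerated_sentinel_failures(total_sentinels):
    """How many Sentinels can be lost while a majority is still reachable."""
    return total_sentinels - majority(total_sentinels)


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "sentinels tolerate", tolerated_sentinel_failures(n), "failure(s)")
```

Going from 3 to 4 Sentinels does not increase the number of failures that can be tolerated, which is why an even count adds cost without adding resilience.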

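Clients connect to the set of Sentinel nodes at initialization rather than to a specific Redis node; Sentinel acts only as a configuration center, not as a proxy.
The Redis nodes in a Sentinel deployment are no different from ordinary Redis nodes, and read/write splitting depends on the client program.

Sentinel is similar to MHA in MySQL: it only solves automatic failover of the master and slave roles, while the performance bottleneck of a single master remains.

Before Redis 3.0, production environments mostly used Sentinel. Redis 3.0 introduced Redis Cluster, which supports larger-scale, high-concurrency environments.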

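The Three Timed Tasks in Sentinel

Every 10 seconds, each sentinel runs INFO against the master and slaves
Discovers slave nodes
Confirms the master-slave relationships

Every 2 seconds, each sentinel exchanges information through a channel on the master (pub/sub)
The exchange takes place on the __sentinel__:hello channel
Sentinels share their view of the nodes and information about themselves

Every second, each sentinel pings the other sentinels and the Redis nodes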

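Implementing a Sentinel Architecture

The following example builds a highly available Redis architecture with one master and two slaves, based on Sentinel

Sentinel Requires Replication First

Sentinel presupposes that Redis master-slave replication is already in place
Note: masterauth must also be set in the master's configuration file, with the same value as on the slaves

Key settings in redis.conf on all master and slave nodes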

bind 0.0.0.0
masterauth "123456"
requirepass "123456"

The replication setup here reuses the environment from the experiments above

127.0.0.1:6379> info replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.204,port=6379,state=online,offset=7322,lag=0
slave1:ip=10.0.0.203,port=6379,state=online,offset=7322,lag=0
master_failover_state:no-failover
master_replid:9357f3a67266d4d8687ea4ae329eb854a16d22f5
master_replid2:3f6e5aae3689756fd51b0e72e16774af84367e17
master_repl_offset:7322
second_repl_offset:7309
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:449
repl_backlog_histlen:6874

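Editing the Sentinel Configuration

Sentinel configuration
Sentinel is in fact a special Redis server: it supports some Redis commands but not many others. By default it listens on port 26379/tcp.
Sentinel can be deployed on hosts separate from the Redis servers, but to save cost it is usually co-located with them. All Redis nodes use the same configuration file, as in the example below.

#For a source build, sentinel.conf is in the source directory; copy it to the installation directory, for example /apps/redis/etc/sentinel.conf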
[root@ubuntu2004 ~]#cp /usr/local/src/redis-7.0.5/sentinel.conf /apps/redis/etc/
[root@ubuntu2004 ~]#chown redis:redis /apps/redis/etc/sentinel.conf

[root@ubuntu2004 ~]#vim /apps/redis/etc/sentinel.conf
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 10.0.0.202 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel parallel-syncs mymaster 1
sentinel failover-timeout mymaster 180000
sentinel deny-scripts-reconfig yes
 
[root@ubuntu2004 ~]#scp /apps/redis/etc/sentinel.conf 10.0.0.202:/apps/redis/etc/sentinel.conf
[root@ubuntu2004 ~]#scp /apps/redis/etc/sentinel.conf 10.0.0.204:/apps/redis/etc/sentinel.conf
[root@ubuntu2004 ~]#redis-sentinel /apps/redis/etc/sentinel.conf 
[root@ubuntu2004 ~]#ss -ntlp
State       Recv-Q      Send-Q           Local Address:Port              Peer Address:Port      Process                                           
LISTEN      0           511                    0.0.0.0:26379                  0.0.0.0:*          users:(("redis-sentinel",pid=82201,fd=6))        
LISTEN      0           511                    0.0.0.0:6379                   0.0.0.0:*          users:(("redis-server",pid=76018,fd=6))          
LISTEN      0           4096             127.0.0.53%lo:53                     0.0.0.0:*          users:(("systemd-resolve",pid=43797,fd=13))      
LISTEN      0           128                    0.0.0.0:22                     0.0.0.0:*          users:(("sshd",pid=767,fd=3))                    
LISTEN      0           128                  127.0.0.1:6010                   0.0.0.0:*          users:(("sshd",pid=74257,fd=10))                 
LISTEN      0           511                      [::1]:6379                      [::]:*          users:(("redis-server",pid=76018,fd=7))          
LISTEN      0           128                       [::]:22                        [::]:*          users:(("sshd",pid=767,fd=4))                    
LISTEN      0           128                      [::1]:6010                      [::]:*          users:(("sshd",pid=74257,fd=9))                  


Verifying the Sentinel Service

Check the Sentinel port status

[root@ubuntu2004 redis]#ss -ntl
State            Recv-Q           Send-Q                     Local Address:Port                      Peer Address:Port          Process           
LISTEN           0                511                              0.0.0.0:26379                          0.0.0.0:*                               
LISTEN           0                511                              0.0.0.0:6379                           0.0.0.0:*                               
LISTEN           0                4096                       127.0.0.53%lo:53                             0.0.0.0:*                               
LISTEN           0                128                              0.0.0.0:22                             0.0.0.0:*                               
LISTEN           0                128                            127.0.0.1:6010                           0.0.0.0:*                               
LISTEN           0                511                                [::1]:6379                              [::]:*                               
LISTEN           0                128                                 [::]:22                                [::]:*                               
LISTEN           0                128                                [::1]:6010                              [::]:*                               

Check the Sentinel log

#master
[root@ubuntu2004 redis]#tail -f /apps/redis/data/sentinel_26379.log 
84297:X 31 Oct 2022 14:53:03.104 * Removing the pid file.
84297:X 31 Oct 2022 14:53:03.105 # Sentinel is now ready to exit, bye bye...
84429:X 31 Oct 2022 14:53:30.792 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
84429:X 31 Oct 2022 14:53:30.792 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=84429, just started
84429:X 31 Oct 2022 14:53:30.792 # Configuration loaded
84429:X 31 Oct 2022 14:53:30.794 * Increased maximum number of open files to 10032 (it was originally set to 1024).
84429:X 31 Oct 2022 14:53:30.794 * monotonic clock: POSIX clock_gettime
84429:X 31 Oct 2022 14:53:30.795 * Running mode=sentinel, port=26379.
84429:X 31 Oct 2022 14:53:30.796 # Sentinel ID is 43cf2cdcbe75e74c5e462e7bbc610f656bd10392
84429:X 31 Oct 2022 14:53:30.796 # +monitor master mymaster 10.0.0.202 6379 quorum 2

#slave
[root@ubuntu2004 ~]#tail -f /apps/redis/data/sentinel_26379.log
19455:X 31 Oct 2022 14:55:08.469 * Removing the pid file.
19455:X 31 Oct 2022 14:55:08.469 # Sentinel is now ready to exit, bye bye...
19516:X 31 Oct 2022 14:55:21.891 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
19516:X 31 Oct 2022 14:55:21.891 # Redis version=7.0.5, bits=64, commit=00000000, modified=0, pid=19516, just started
19516:X 31 Oct 2022 14:55:21.891 # Configuration loaded
19516:X 31 Oct 2022 14:55:21.892 * Increased maximum number of open files to 10032 (it was originally set to 1024).
19516:X 31 Oct 2022 14:55:21.892 * monotonic clock: POSIX clock_gettime
19516:X 31 Oct 2022 14:55:21.892 * Running mode=sentinel, port=26379.
19516:X 31 Oct 2022 14:55:21.893 # Sentinel ID is 1642a54f9942599d943a25140def05138ef6a536
19516:X 31 Oct 2022 14:55:21.893 # +monitor master mymaster 10.0.0.202 6379 quorum 2

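Current Sentinel status
In the Sentinel status, pay particular attention to the last line: the master IP, the number of slaves, and the number of sentinels must all match the actual number of servers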

[root@ubuntu2004 ~]#redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.202:6379,slaves=2,sentinels=3  #two slaves and three sentinels; if the sentinels value is wrong, check for a possible myid conflict
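
The check on the last line of INFO sentinel can be automated. The following is a minimal Python sketch that assumes the master0 line has already been captured as a string, without any trailing annotation; the function names parse_master_line and topology_matches are hypothetical.

```python
def parse_master_line(line):
    """Split a 'master0:name=...,status=...,...' line into a dict of fields."""
    _, _, fields = line.strip().partition(":")  # drop the 'master0' prefix
    result = {}
    for item in fields.split(","):
        key, _, value = item.partition("=")
        result[key] = value
    return result


def topology_matches(line, expected_slaves, expected_sentinels):
    """Check status and the slave/sentinel counts against the expected values."""
    info = parse_master_line(line)
    return (
        info.get("status") == "ok"
        and int(info.get("slaves", -1)) == expected_slaves
        and int(info.get("sentinels", -1)) == expected_sentinels
    )


LINE = "master0:name=mymaster,status=ok,address=10.0.0.202:6379,slaves=2,sentinels=3"

if __name__ == "__main__":
    print(parse_master_line(LINE)["address"])
    print(topology_matches(LINE, expected_slaves=2, expected_sentinels=3))
```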

Stopping the Master to Trigger Failover

[root@ubuntu2004 redis]#systemctl stop redis.service 
[root@ubuntu2004 redis]#systemctl status redis.service 
● redis.service - Redis persistent key-value database
     Loaded: loaded (/lib/systemd/system/redis.service; ena
....


#Check the Sentinel information on each node:
[root@ubuntu2004 ~]#redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.204:6379,slaves=2,sentinels=3

#Sentinel log output during failover:
[root@ubuntu2004 ~]#tail -f /apps/redis/data/sentinel_26379.log
47519:X 31 Oct 2022 14:59:23.050 # +failover-state-reconf-slaves master mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.106 * +slave-reconf-sent slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.285 * +slave-reconf-inprog slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.285 * +slave-reconf-done slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.344 # +failover-end master mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.344 # +switch-master mymaster 10.0.0.202 6379 10.0.0.204 6379
47519:X 31 Oct 2022 14:59:23.344 * +slave slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.204 6379
47519:X 31 Oct 2022 14:59:23.344 * +slave slave 10.0.0.202:6379 10.0.0.202 6379 @ mymaster 10.0.0.204 6379
47519:X 31 Oct 2022 14:59:23.346 * Sentinel new configuration saved on disk
47519:X 31 Oct 2022 14:59:38.435 # +sdown slave 10.0.0.202:6379 10.0.0.202 6379 @ mymaster 10.0.0.204 6379

# Verify the failover
[root@ubuntu2004 ~]#grep ^replicaof /apps/redis/etc/redis.conf 
replicaof 10.0.0.204 6379

# The sentinel monitor IP in the Sentinel configuration file is updated as well
#slave-1
[root@ubuntu2004 ~]#egrep "^[a-Z]" /apps/redis/etc/sentinel.conf 
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 10.0.0.204 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel deny-scripts-reconfig yes
latency-tracking-info-percentiles 50 99 99.9
protected-mode no
user default on nopass sanitize-payload ~* &* +@all
sentinel myid 1642a54f9942599d943a25140def05138ef6a536
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel current-epoch 1
sentinel known-replica mymaster 10.0.0.202 6379
sentinel known-replica mymaster 10.0.0.203 6379
sentinel known-sentinel mymaster 10.0.0.204 26379 57069760d51098677ad601621e03fa4e4fa6d1f3
sentinel known-sentinel mymaster 10.0.0.202 26379 43cf2cdcbe75e74c5e462e7bbc610f656bd10392

#slave-2(master)
[root@ubuntu2004 ~]#egrep "^[a-Z]" /apps/redis/etc/sentinel.conf 
bind 0.0.0.0
port 26379
daemonize yes
pidfile "redis-sentinel.pid"
logfile "sentinel_26379.log"
dir "/apps/redis/data"
sentinel monitor mymaster 10.0.0.204 6379 2
sentinel auth-pass mymaster 123456
sentinel down-after-milliseconds mymaster 15000
sentinel deny-scripts-reconfig yes
latency-tracking-info-percentiles 50 99 99.9
protected-mode no
user default on nopass sanitize-payload ~* &* +@all
sentinel myid 57069760d51098677ad601621e03fa4e4fa6d1f3
sentinel config-epoch mymaster 1
sentinel leader-epoch mymaster 1
sentinel current-epoch 1
sentinel known-replica mymaster 10.0.0.202 6379
sentinel known-replica mymaster 10.0.0.203 6379
sentinel known-sentinel mymaster 10.0.0.203 26379 1642a54f9942599d943a25140def05138ef6a536
sentinel known-sentinel mymaster 10.0.0.202 26379 43cf2cdcbe75e74c5e462e7bbc610f656bd10392

# Verify the state of each Redis node
# New master
[root@ubuntu2004 ~]#redis-cli -a 123456 
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.203,port=6379,state=online,offset=307220,lag=0
master_failover_state:no-failover
master_replid:83648426841af360229e5d28e893218e07d0bfd8
master_replid2:9357f3a67266d4d8687ea4ae329eb854a16d22f5
master_repl_offset:307355
second_repl_offset:211230
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3963
repl_backlog_histlen:303393

#slave-1
[root@ubuntu2004 ~]#redis-cli 
127.0.0.1:6379> AUTH 123456
OK
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:10.0.0.204
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:316335
slave_repl_offset:316335
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:83648426841af360229e5d28e893218e07d0bfd8
master_replid2:9357f3a67266d4d8687ea4ae329eb854a16d22f5
master_repl_offset:316335
second_repl_offset:211230
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:210127
repl_backlog_histlen:106209

Rejoining the original master to the Redis cluster

# Original master (10.0.0.202)
[root@ubuntu2004 redis]#systemctl start redis.service 
[root@ubuntu2004 redis]#systemctl status redis.service 
● redis.service - Redis persistent key-value database
     Loaded: loaded (/lib/systemd/system/redis.service; enabled; vendor preset: enabled)
     Active: active (running) since Mon 2022-10-31 15:09:30 CST; 1s ago
......
[root@ubuntu2004 ~]#grep  "^replicaof" /apps/redis/etc/redis.conf 
replicaof 10.0.0.204 6379

[root@ubuntu2004 redis]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> INFO replication
# Replication
role:slave
master_host:10.0.0.204
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_read_repl_offset:390204
slave_repl_offset:390204
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:83648426841af360229e5d28e893218e07d0bfd8
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:390204
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:376854
repl_backlog_histlen:13351

[root@ubuntu2004 redis]#redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.204:6379,slaves=2,sentinels=3


# New master (10.0.0.204)
[root@ubuntu2004 ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> INFO replication
# Replication
role:master
connected_slaves:2
slave0:ip=10.0.0.203,port=6379,state=online,offset=410864,lag=0
slave1:ip=10.0.0.202,port=6379,state=online,offset=410864,lag=0
master_failover_state:no-failover
master_replid:83648426841af360229e5d28e893218e07d0bfd8
master_replid2:9357f3a67266d4d8687ea4ae329eb854a16d22f5
master_repl_offset:410999
second_repl_offset:211230
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:3963
repl_backlog_histlen:407037

[root@ubuntu2004 ~]#tail -f /apps/redis/data/sentinel_26379.log 
47519:X 31 Oct 2022 14:59:23.285 * +slave-reconf-inprog slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.285 * +slave-reconf-done slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.344 # +failover-end master mymaster 10.0.0.202 6379
47519:X 31 Oct 2022 14:59:23.344 # +switch-master mymaster 10.0.0.202 6379 10.0.0.204 6379
47519:X 31 Oct 2022 14:59:23.344 * +slave slave 10.0.0.203:6379 10.0.0.203 6379 @ mymaster 10.0.0.204 6379
47519:X 31 Oct 2022 14:59:23.344 * +slave slave 10.0.0.202:6379 10.0.0.202 6379 @ mymaster 10.0.0.204 6379
47519:X 31 Oct 2022 14:59:23.346 * Sentinel new configuration saved on disk
47519:X 31 Oct 2022 14:59:38.435 # +sdown slave 10.0.0.202:6379 10.0.0.202 6379 @ mymaster 10.0.0.204 6379
47519:X 31 Oct 2022 15:09:31.121 # -sdown slave 10.0.0.202:6379 10.0.0.202 6379 @ mymaster 10.0.0.204 6379
47519:X 31 Oct 2022 15:12:58.991 * +reboot slave 10.0.0.202:6379 10.0.0.202 6379 @ mymaster 10.0.0.204 6379

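Sentinel operations

Manually taking the master offline

127.0.0.1:26379> sentinel failover <masterName>

# Optionally adjust replica-priority on the replicas first: among healthy replicas, Sentinel prefers the one with the lowest non-zero value, and 0 means the replica is never promoted
[root@ubuntu2004 redis]#vim etc/redis.conf 
replica-priority 50
[root@ubuntu2004 redis]#systemctl restart redis

# Or change it at runtime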
[root@ubuntu2004 redis]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> CONFIG get replica-priority
1) "replica-priority"
2) "50"
127.0.0.1:6379> CONFIG set replica-priority 10
OK
127.0.0.1:6379> CONFIG get replica-priority
1) "replica-priority"
2) "10"

[root@ubuntu2004 redis]#redis-cli -p 26379
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.204:6379,slaves=2,sentinels=3
127.0.0.1:26379> sentinel failover mymaster
OK
.... # Wait a few seconds here
127.0.0.1:26379> INFO sentinel
# Sentinel
sentinel_masters:1
sentinel_tilt:0
sentinel_tilt_since_seconds:-1
sentinel_running_scripts:0
sentinel_scripts_queue_length:0
sentinel_simulate_failure_flags:0
master0:name=mymaster,status=ok,address=10.0.0.202:6379,slaves=2,sentinels=3

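Connecting applications to Sentinel

Redis officially lists clients for many programming languages: https://redis.io/clients

How a client connects through Sentinel

1. The client is given the set of Sentinel nodes and picks one that is reachable.

2. The client asks that Sentinel for the master's address by master name, using the sentinel get-master-addr-by-name master-name API.

3. The client sends the role command to the returned address to confirm that it really is the master. This guards against the master changing during a failover.

4. The client stays connected to the Sentinel set by subscribing to the relevant Sentinel channels, so it is notified when the master changes and can reconnect to the new master automatically.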
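
Steps 1 to 3 can be sketched without a live server. The following is an illustrative simulation, not how any particular client library is implemented: resolve_master is a hypothetical helper, and ask_sentinel and ask_role are injected callables standing in for the real sentinel get-master-addr-by-name and role calls. Step 4 (subscribing for change notifications) is not modeled.

```python
from typing import Callable, Iterable, Optional, Tuple

Address = Tuple[str, int]


def resolve_master(
    sentinels: Iterable[Address],
    master_name: str,
    ask_sentinel: Callable[[Address, str], Optional[Address]],
    ask_role: Callable[[Address], Optional[str]],
) -> Address:
    """Walk the Sentinel list until one returns an address whose role is master."""
    for sentinel in sentinels:
        try:
            # Step 2: stands in for SENTINEL get-master-addr-by-name <master_name>
            candidate = ask_sentinel(sentinel, master_name)
        except ConnectionError:
            continue  # Step 1: this Sentinel is unreachable, try the next one
        if candidate is None:
            continue  # this Sentinel does not know the master name
        # Step 3: stands in for ROLE, since a failover may be in progress
        if ask_role(candidate) == "master":
            return candidate
    raise LookupError(f"no Sentinel returned a verified master for {master_name!r}")


def demo_sentinel(addr: Address, name: str) -> Optional[Address]:
    # Hypothetical stub: the first Sentinel is down, the others answer.
    if addr == ("10.0.0.202", 26379):
        raise ConnectionError("sentinel unreachable")
    return ("10.0.0.204", 6379)


demo_roles = {("10.0.0.204", 6379): "master"}
sentinel_set = [("10.0.0.202", 26379), ("10.0.0.203", 26379), ("10.0.0.204", 26379)]
print(resolve_master(sentinel_set, "mymaster", demo_sentinel, demo_roles.get))
```

Injecting the two callables keeps the control flow visible; a real client would replace them with network calls and add the subscription from step 4.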

Connecting to Sentinel from Java

Java client for Redis: https://github.com/xetorthio/jedis/blob/master/pom.xml

# jedis/pom.xml: connection settings for Redis
<properties>
<redis-hosts>localhost:6379,localhost:6380,localhost:6381,localhost:6382,localhost:6383,localhost:6384,localhost:6385</redis-hosts>
<sentinel-hosts>localhost:26379,localhost:26380,localhost:26381</sentinel-hosts>
<cluster-hosts>localhost:7379,localhost:7380,localhost:7381,localhost:7382,localhost:7383,localhost:7384,localhost:7385</cluster-hosts>
<github.global.server>github</github.global.server>
</properties>

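A Java client connects to a standalone Redis through Jedis. In code, creating Jedis objects (typically through a JedisPool connection pool) is enough; the application borrows connections from the pool to talk to Redis. For high availability, however, Redis is usually deployed with Sentinel: when the master fails, the Sentinels agree on one of the slaves to take over as master. An application that only uses a plain Jedis connection pool still cannot reach the new master after such a switch. To solve this, Jedis provides a Sentinel-aware implementation that is notified during a Redis Sentinel failover and reconnects the application to the new master.
Using Redis Sentinel from Jedis is straightforward: JedisSentinelPool takes the Sentinel addresses and the master name in addition to the usual JedisPool parameters. Under the hood it relies on Redis publish/subscribe to learn about master changes: when a failover happens, Sentinel publishes a notification and Jedis switches its connections. Each time a connection is borrowed from the pool, JedisSentinelPool checks it; if its target no longer matches the master currently reported by Sentinel, the connection is closed and a new Jedis connection is created.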

Connecting to Sentinel from Python

[root@ubuntu2004 ~]#python3 -V
Python 3.8.10
[root@ubuntu2004 ~]#apt -y install python3-redis

[root@ubuntu2004 ~]#cat sentinel_test.py 
#!/usr/bin/python3
import redis
from redis.sentinel import Sentinel
# Connect to the Sentinel servers (host names may be used instead of IP addresses)
sentinel = Sentinel([('10.0.0.202', 26379),
                     ('10.0.0.203', 26379),
                     ('10.0.0.204', 26379)],
                    socket_timeout=0.5)

redis_auth_pass = '123456'

# mymaster is the master name configured in Sentinel; it is the default name, so use the name from your own deployment
# Look up the master address
master = sentinel.discover_master('mymaster')
print(master)

# Look up the slave addresses
slave = sentinel.discover_slaves('mymaster')
print(slave)

# Get a client for the master and write through it
master = sentinel.master_for('mymaster', socket_timeout=0.5,
                             password=redis_auth_pass, db=0)
w_ret = master.set('name', 'shuhong')
# w_ret is True on success

# Get a client for a slave and read through it (slaves are chosen round-robin by default)
slave = sentinel.slave_for('mymaster', socket_timeout=0.5,
                           password=redis_auth_pass, db=0)
r_ret = slave.get('name')
print(r_ret)
# Prints: b'shuhong'





[root@ubuntu2004 ~]#./sentinel_test.py 
('10.0.0.202', 6379)
[('10.0.0.204', 6379), ('10.0.0.203', 6379)]
b'shuhong'
[root@ubuntu2004 ~]#redis-cli -a 123456
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
127.0.0.1:6379> get name
"shuhong"

Redis Cluster

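Redis Cluster overview

Sentinel only solves high availability by automating failover; it does not remove the performance bottleneck of a single Redis master. To get past the limits of a single machine and raise overall throughput, a distributed cluster is needed.

Earlier approaches to distributed Redis deployments:

Client-side partitioning: the client program itself handles write distribution, high-availability management, and failover, which makes client development complex.

Proxy service: clients connect to a proxy instead of connecting to Redis directly, and the proxy distributes reads and writes. These proxies are all third-party projects, such as Twemproxy (open-sourced by Twitter) and codis (developed by Wandoujia). Clients need no special development, so adoption is easy, but the proxy nodes themselves remain a single point of failure and a performance bottleneck.

Since version 3.0, Redis ships the decentralized Redis Cluster, which supports parallel writes to multiple master nodes and automatic failover.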

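Redis Cluster architecture

Redis Cluster needs at least 3 master nodes. The number of slave nodes is not limited, although each master normally has at least one slave.
The cluster divides the key space into 16384 hash slots and assigns them to the masters.
With three masters, the slots can be split like this:

Node M1 0-5460
Node M2 5461-10922
Node M3 10923-16383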
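
To see where these boundaries come from, here is a minimal sketch of an even split of the 16384 slots. The rounding rule is an assumption chosen because it reproduces the three ranges listed above; it is not presented as the exact allocation code of any Redis tool, and split_slots is a hypothetical helper name.

```python
import math

TOTAL_SLOTS = 16384


def split_slots(master_count: int) -> list:
    """Return one (first_slot, last_slot) range per master, covering all slots."""
    if master_count < 1:
        raise ValueError("at least one master is required")
    per_master = TOTAL_SLOTS / master_count  # fractional share per master
    ranges = []
    first = 0
    cursor = 0.0
    for index in range(master_count):
        # Round the running boundary half up (assumed rule, see lead-in).
        last = math.floor(cursor + per_master - 1 + 0.5)
        # The last master always absorbs whatever remains.
        if index == master_count - 1 or last > TOTAL_SLOTS - 1:
            last = TOTAL_SLOTS - 1
        ranges.append((first, last))
        first = last + 1
        cursor += per_master
    return ranges


for name, (first, last) in zip(("M1", "M2", "M3"), split_slots(3)):
    print(name, first, last)
```

Because 16384 is not divisible by 3, one master ends up with 5462 slots and the other two with 5461.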

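How Redis Cluster works

Data partitioning

With a single instance, all data simply lives on that one Redis server. With a cluster, the data has to be partitioned across nodes.

Data is usually partitioned either by range (sequential distribution) or by hash.

Aspect               Range partitioning   Hash partitioning
Data distribution    Skewed               Evenly scattered
Sequential access    Supported            Not supported

Range partitioning keeps data ordered but spreads it poorly, so one partition may become hot while others stay cold, and access is unbalanced.
Hash partitioning comes in several variants, such as regional hash partitioning and consistent hashing. Redis Cluster uses virtual slot partitioning.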

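Virtual slot partitioning
Redis Cluster defines slots numbered 0 to 16383. Each slot maps to a subset of the data, a hash function decides which slot a key belongs to, and each cluster node holds a share of the slots.
When a key is stored, CRC16(key) produces an integer, that integer modulo 16384 gives the slot number, and the key is written to the node that owns that slot.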
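
The slot calculation can be reproduced in a few lines. This is a minimal sketch that assumes the CRC16 variant is CRC-16/XMODEM (polynomial 0x1021, initial value 0), whose check value for "123456789" is 0x31C3. It deliberately ignores hash tags (the {...} syntax), which a real cluster also takes into account; key_slot is a hypothetical helper name.

```python
SLOT_COUNT = 16384


def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Slot number for a key: CRC16(key) modulo 16384 (hash tags not handled)."""
    return crc16_xmodem(key.encode("utf-8")) % SLOT_COUNT


print(hex(crc16_xmodem(b"123456789")))
print(key_slot("foo"))
```

On a running cluster the result can be compared with the CLUSTER KEYSLOT command.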

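Cluster communication

Finding the right slot does not always succeed on the first try. In the figure above, for example, the key belongs to slot 14396, but the client may not go straight to node3; it might ask node1 first and only then reach node3.

Communication between the cluster nodes ensures that, in a stable cluster, the owning node is reached in at most two attempts. Every node stores information about the other nodes and knows which node is responsible for which slot. If the first node contacted does not own the slot, it tells the client which node does, and the next request goes straight to that node.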
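
The two-attempt behavior can be illustrated with a small in-memory simulation. This is a sketch under stated assumptions: the node names and slot ranges follow the three-master example, the MOVED tuple only imitates the redirection reply, and Node, owner_of, build_cluster and client_set are hypothetical names rather than any client library's API.

```python
RANGES = {"node1": (0, 5460), "node2": (5461, 10922), "node3": (10923, 16383)}


def owner_of(slot: int) -> str:
    """Shared slot map: every node knows which node owns which slot."""
    for name, (first, last) in RANGES.items():
        if first <= slot <= last:
            return name
    raise ValueError(f"slot {slot} out of range")


class Node:
    def __init__(self, name: str):
        self.name = name
        self.first, self.last = RANGES[name]
        self.data = {}

    def set(self, slot: int, key: str, value: str):
        if not (self.first <= slot <= self.last):
            # Not our slot: tell the client who owns it instead of storing.
            return ("MOVED", slot, owner_of(slot))
        self.data[key] = value
        return ("OK",)


def build_cluster() -> dict:
    return {name: Node(name) for name in RANGES}


def client_set(cluster: dict, entry: str, slot: int, key: str, value: str):
    """Send to an arbitrary entry node and follow at most one redirection."""
    attempts = 1
    reply = cluster[entry].set(slot, key, value)
    if reply[0] == "MOVED":
        attempts += 1
        reply = cluster[reply[2]].set(slot, key, value)
    return reply[0], attempts


cluster = build_cluster()
print(client_set(cluster, "node1", 14396, "key", "value"))
```

Starting at node1 takes two attempts for slot 14396, while starting at node3 would succeed on the first.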

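1. Node A sends a meet message to node B. Once B replies, A and B can communicate.
2. Node A sends a meet message to node C. Once C replies, A and C can communicate as well.
3. B then learns about C from what A knows, so B and C also establish a connection.
4. This continues until all nodes are connected to each other.
In this way every node learns which slots the others are responsible for.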
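
The way knowledge spreads after the meet steps above can be modeled with plain sets. This is a toy simulation of the idea only; it does not reproduce the real gossip message format or timing, and meet and gossip_until_stable are hypothetical helper names.

```python
def meet(known: dict, a: str, b: str) -> None:
    """Handshake: after a meet, both nodes know each other."""
    known.setdefault(a, {a}).add(b)
    known.setdefault(b, {b}).add(a)


def gossip_until_stable(known: dict) -> int:
    """Each node shares its view with the peers it knows until nothing changes.

    Returns the number of rounds in which at least one node learned something.
    """
    rounds = 0
    changed = True
    while changed:
        changed = False
        for node in list(known):
            for peer in list(known[node]):
                view = known.setdefault(peer, {peer})
                before = len(view)
                view |= known[node]  # the peer merges the sender's view
                if len(view) != before:
                    changed = True
        if changed:
            rounds += 1
    return rounds


known = {}
meet(known, "A", "B")  # step 1
meet(known, "A", "C")  # step 2
gossip_until_stable(known)  # steps 3 and 4
print(sorted(known["B"]), sorted(known["C"]))
```

B never met C directly, yet after gossiping both list all three nodes.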

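Cluster scaling
The number of nodes is not fixed once the cluster is built. New nodes can join and existing nodes can leave, which is scaling the cluster out and in. Because slots are bound to nodes, scaling always involves migrating slots and their data.

Scaling out
A new node that is ready to join starts out isolated. There are two ways to add it: run a command on an existing cluster node to handshake with the new node, or use the cluster management tool.
1. On cluster_node_ip:port, run: cluster meet new_node_ip new_node_port
2. redis-trib.rb add-node new_node_ip:port cluster_node_ip:port (since Redis 5: redis-cli --cluster add-node new_node_ip:port cluster_node_ip:port)

The new node usually takes one of two roles:
Master: takes over a share of the slots and data
Slave: serves as a failover backup

Scaling in
Taking a node offline works as follows:
1. Check whether the node holds any slots. If it does, migrate them to other nodes first; if not, go to the next step.
2. Tell the other nodes to forget the node (cluster forget).
3. Shut down the service on the node.
Note that taking a master offline before its slave triggers a failover, so take the slave offline first.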
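
The ordering rules for taking nodes offline can be written down as a small planner. This is only a sketch of the sequence described above: it produces a list of human-readable steps and touches no cluster, and decommission_plan with its parameters is a hypothetical helper.

```python
def decommission_plan(master: str, slots_held: int, replicas: list) -> list:
    """Return the ordered steps for removing a master and its replicas."""
    steps = []
    # Replicas go first, so removing the master cannot trigger a failover.
    for replica in replicas:
        steps.append(f"cluster forget {replica} on the remaining nodes")
        steps.append(f"shut down {replica}")
    # A master that still owns slots must hand them over before leaving.
    if slots_held > 0:
        steps.append(f"migrate {slots_held} slots from {master} to other masters")
    steps.append(f"cluster forget {master} on the remaining nodes")
    steps.append(f"shut down {master}")
    return steps


for step in decommission_plan("10.0.0.203:6379", 5461, ["10.0.0.204:6379"]):
    print(step)
```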

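Failover
Besides planned removals, nodes also fail unexpectedly. This section is mainly about master failures, because a failed slave does not affect its master: the master simply records which slave went offline and passes that on to the other nodes. When the failed slave reconnects, it resumes its role and continues replicating from the master.

Only masters need failover. With plain master/slave replication, failover requires Redis Sentinel. Redis Cluster does not need Redis Sentinel because failover is built in.

As described earlier, nodes talk to each other by exchanging ping/pong messages, and this is also how failures are detected. As with Sentinel, the cluster distinguishes subjective offline from objective offline.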

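Subjective offline

If a node receives no pong from a peer within cluster-node-timeout, it marks that peer as subjectively offline (PFAIL). This is only that one node's view.

Objective offline

Each node keeps a failure list that records the failure reports it has received from other nodes. Once more than half of the slot-holding masters have marked a node as subjectively offline, the cluster attempts to mark it objectively offline (FAIL).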
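
The majority rule for moving from subjective to objective offline can be expressed as a one-line check. This is a minimal sketch of the rule as stated in this section; is_objectively_down and the node names are hypothetical, and report expiry is not modeled.

```python
def is_objectively_down(reporters: set, slot_masters: set) -> bool:
    """True when more than half of the slot-holding masters report the node.

    reporters: nodes that currently mark the target as subjectively offline.
    slot_masters: all masters that hold at least one slot.
    """
    # Only reports from slot-holding masters count toward the majority.
    votes = reporters & slot_masters
    return len(votes) > len(slot_masters) // 2


masters = {"M1", "M2", "M3"}
print(is_objectively_down({"M2"}, masters))
print(is_objectively_down({"M2", "M3"}, masters))
```

With three slot-holding masters, one report is not enough but two are; reports from slaves do not count.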

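Failover process
Like Sentinel, the cluster can fail over automatically. Once a master is objectively offline, the cluster prepares one of its slaves to take over.

First comes an eligibility check; only eligible slaves may stand for election:
Every slave of the failed master checks how long it has been disconnected from that master.
If the time exceeds cluster-node-timeout * cluster-slave-validity-factor (default 10), the slave is disqualified.
Next, the election order is determined by replication offset. The slave with the largest offset ranks first and starts its election earliest; slaves with smaller offsets delay theirs.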
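
The eligibility check and the offset-based ordering can be sketched as follows. The code only mirrors the two rules described above; the replica names and numbers are made up, and eligible and election_order are hypothetical helpers rather than Redis internals.

```python
def eligible(disconnected_ms: int, node_timeout_ms: int, validity_factor: int = 10) -> bool:
    """A slave is disqualified once it has been cut off from its master for too long."""
    return disconnected_ms <= node_timeout_ms * validity_factor


def election_order(replicas: dict, node_timeout_ms: int, validity_factor: int = 10) -> list:
    """Rank eligible slaves by replication offset, largest first.

    replicas maps a name to (disconnected_ms, replication_offset).
    Returns (name, rank) pairs; rank 0 starts its election first.
    """
    candidates = [
        (name, offset)
        for name, (disconnected_ms, offset) in replicas.items()
        if eligible(disconnected_ms, node_timeout_ms, validity_factor)
    ]
    candidates.sort(key=lambda item: item[1], reverse=True)
    return [(name, rank) for rank, (name, _offset) in enumerate(candidates)]


replicas = {
    "replica-a": (5_000, 300),
    "replica-b": (5_000, 500),
    "replica-c": (200_000, 900),  # disconnected too long: 200000 > 15000 * 10
}
print(election_order(replicas, node_timeout_ms=15_000))
```

Here replica-c has the largest offset but is disqualified, so replica-b ranks first.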

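Once a slave stands for election, the masters vote as they receive its request. The slave with the largest offset asks first, so it is the most likely to collect the most votes and become the new master.

After the slave wins, it replaces the failed master:
1. The slave runs slaveof no one and becomes a master.
2. The slots of the failed node are assigned to it.
3. It broadcasts a pong message to the rest of the cluster to announce that the failover is complete.
4. When the failed node restarts, it becomes a slave of new_master.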

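Redis Cluster deployment architecture

Note: nodes used to build a Redis Cluster must not contain any data
Test environment: 3 servers, each running two Redis instances on ports 6379 and 6380

Production environment: 6 servers forming three master/slave pairs

Deployment methods

Note: Redis 5.X differs significantly from earlier versions. The following covers configuration for both 5.X and 4.X.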
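Redis Cluster can be deployed in several ways
·Native commands
  Useful for understanding the Redis Cluster architecture
  Not used in production
·Official tools
  Efficient and accurate
  Suitable for production
·In-house tooling
  Can provide visual, automated deployment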

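Hands-on example: deploying Redis Cluster on Redis 5 or later

Official documentation: https://redis.io/topics/cluster-tutorial
Redis Cluster commands
Example: view the help for the --cluster option

Preparing the environment

Ansible control machine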

[root@ansible ~]#apt -y install ansible
[root@ansible ~]#mkdir /data/ansible -p
[root@ansible ~]#cp /etc/ansible/ansible.cfg /data/ansible/

[root@ansible ~]#cd /data/ansible/
[root@ansible ansible]#vim ansible.cfg
[defaults]
inventory      = /data/ansible/inventory
roles_path    = /data/ansible/roles
host_key_checking = False
remote_user = root

[privilege_escalation]
become=True
become_method=sudo
become_user=root
become_ask_pass=False

[root@ansible ansible]#vim inventory
[master]
10.0.0.201
10.0.0.202
10.0.0.203

[slave]
10.0.0.204
10.0.0.205
10.0.0.206

# Set up SSH key authentication to all hosts with a script
[root@ansible ansible]#bash ssh.sh 

[root@ansible ansible]#ansible all -m ping 
10.0.0.201 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.0.203 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.0.205 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.0.204 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.0.202 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}
10.0.0.206 | SUCCESS => {
    "ansible_facts": {
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": false,
    "ping": "pong"
}

[root@ansible ansible]#vim adhoc.sh
#!/bin/bash
# 
#********************************************************************
#Author:            shuhong
#QQ:                985347841
#Date:              2022-10-31
#FileName:          adhoc.sh
#URL:               hhhhh
#Description:       The test script
#Copyright (C):     2022 All rights reserved
#********************************************************************
ansible 10.0.0.201 -m hostname -a 'name=Redis-Master1'
ansible 10.0.0.202 -m hostname -a 'name=Redis-Master2'
ansible 10.0.0.203 -m hostname -a 'name=Redis-Master3'
ansible 10.0.0.204 -m hostname -a 'name=Redis-Slave1'
ansible 10.0.0.205 -m hostname -a 'name=Redis-Slave2'
ansible 10.0.0.206 -m hostname -a 'name=Redis-Slave3'
[root@ansible ansible]#bash adhoc.sh 
10.0.0.201 | CHANGED => {
    "ansible_facts": {
        "ansible_domain": "networksolutions.com",
        "ansible_fqdn": "underconstruction.networksolutions.com",
        "ansible_hostname": "Redis-Master1",
        "ansible_nodename": "Redis-Master1",
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "name": "Redis-Master1"
}
10.0.0.202 | CHANGED => {
    "ansible_facts": {
        "ansible_domain": "networksolutions.com",
        "ansible_fqdn": "underconstruction.networksolutions.com",
        "ansible_hostname": "Redis-Master2",
        "ansible_nodename": "Redis-Master2",
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "name": "Redis-Master2"
}
10.0.0.203 | CHANGED => {
    "ansible_facts": {
        "ansible_domain": "networksolutions.com",
        "ansible_fqdn": "underconstruction.networksolutions.com",
        "ansible_hostname": "Redis-Master3",
        "ansible_nodename": "Redis-Master3",
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "name": "Redis-Master3"
}
10.0.0.204 | CHANGED => {
    "ansible_facts": {
        "ansible_domain": "networksolutions.com",
        "ansible_fqdn": "underconstruction.networksolutions.com",
        "ansible_hostname": "Redis-Slave1",
        "ansible_nodename": "Redis-Slave1",
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "name": "Redis-Slave1"
}
10.0.0.205 | CHANGED => {
    "ansible_facts": {
        "ansible_domain": "networksolutions.com",
        "ansible_fqdn": "underconstruction.networksolutions.com",
        "ansible_hostname": "Redis-Slave2",
        "ansible_nodename": "Redis-Slave2",
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "name": "Redis-Slave2"
}
10.0.0.206 | CHANGED => {
    "ansible_facts": {
        "ansible_domain": "networksolutions.com",
        "ansible_fqdn": "underconstruction.networksolutions.com",
        "ansible_hostname": "Redis-Slave3",
        "ansible_nodename": "Redis-Slave3",
        "discovered_interpreter_python": "/usr/bin/python3"
    },
    "changed": true,
    "name": "Redis-Slave3"
}

Batch compiling and installing Redis

[root@ansible ansible]#egrep -v "^#|^$" templates/redis.conf.j2 
bind 0.0.0.0 -::1
protected-mode yes
port 6379
tcp-backlog 511
timeout 0
tcp-keepalive 300
daemonize no
pidfile {{ INSTALL_DIR }}/run/redis_6379.pid
loglevel notice
logfile {{ INSTALL_DIR }}/log/redis-6379.log
databases 16
always-show-logo no
set-proc-title yes
proc-title-template "{title} {listen-addr} {server-mode}"
stop-writes-on-bgsave-error yes
rdbcompression yes
rdbchecksum yes
dbfilename dump.rdb
rdb-del-sync-files no
dir {{ INSTALL_DIR }}/data/
masterauth {{ PASSWORD }}
replica-serve-stale-data yes
replica-read-only yes
repl-diskless-sync yes
repl-diskless-sync-delay 5
repl-diskless-sync-max-replicas 0
repl-diskless-load disabled
repl-disable-tcp-nodelay no
replica-priority 100
acllog-max-len 128
requirepass {{ PASSWORD }}
lazyfree-lazy-eviction no
lazyfree-lazy-expire no
lazyfree-lazy-server-del no
replica-lazy-flush no
lazyfree-lazy-user-del no
lazyfree-lazy-user-flush no
oom-score-adj no
oom-score-adj-values 0 200 800
disable-thp yes
appendonly no
appendfilename "appendonly.aof"
appenddirname "appendonlydir"
appendfsync everysec
no-appendfsync-on-rewrite no
auto-aof-rewrite-percentage 100
auto-aof-rewrite-min-size 64mb
aof-load-truncated yes
aof-use-rdb-preamble yes
aof-timestamp-enabled no
cluster-enabled yes
cluster-config-file nodes-6379.conf
cluster-require-full-coverage no 
 
slowlog-log-slower-than 10000
slowlog-max-len 128
latency-monitor-threshold 0
notify-keyspace-events ""
hash-max-listpack-entries 512
hash-max-listpack-value 64
list-max-listpack-size -2
list-compress-depth 0
set-max-intset-entries 512
zset-max-listpack-entries 128
zset-max-listpack-value 64
hll-sparse-max-bytes 3000
stream-node-max-bytes 4096
stream-node-max-entries 100
activerehashing yes
client-output-buffer-limit normal 0 0 0
client-output-buffer-limit replica 256mb 64mb 60
client-output-buffer-limit pubsub 32mb 8mb 60
hz 10
dynamic-hz yes
aof-rewrite-incremental-fsync yes
rdb-save-incremental-fsync yes
jemalloc-bg-thread yes
[root@ansible ansible]#egrep -v "^#|^$" templates/redis.service.j2 
[Unit]
Description=Redis persistent key-value database
After=network.target
[Service]
ExecStart={{ INSTALL_DIR }}/bin/redis-server {{ INSTALL_DIR }}/etc/redis.conf --supervised systemd
ExecStop=/bin/kill -s QUIT $MAINPID
Type=notify
User=redis
Group=redis
RuntimeDirectory=redis
RuntimeDirectoryMode=0755
LimitNOFILE=1000000
[Install]
WantedBy=multi-user.target
[root@ansible ansible]#tree .
.
├── adhoc.sh
├── ansible.cfg
├── files
│   └── redis-7.0.5.tar.gz
├── install_redis_cluster.yaml
├── inventory
├── SCANIP.log
├── ssh.sh
└── templates
    ├── redis.conf.j2
    └── redis.service.j2

2 directories, 9 files
[root@ansible ansible]#cat install_redis_cluster.yaml 
---
- name: install redis
  hosts: all
  serial: 2
  vars:
    - version: "redis-7.0.5"
    - user: "redis"
    - id: "88"
    - INSTALL_DIR: "/apps/redis"
    - CPUS: "2"
    - PASSWORD: "123456"
  tasks:
    - name: rocky prepare
      yum:
        name: "{{ item }}"
        state: present
      loop:
        - gcc 
        - make 
        - jemalloc-devel
        - systemd-devel
      when: ansible_distribution_file_variety == "RedHat"

    - name: ubuntu prepare
      apt:
        update_cache: yes
        name: "{{ item }}"
        state: present
      loop:
        - gcc
        - make
        - libjemalloc-dev 
        - libsystemd-dev
      when: ansible_distribution_file_variety == "Debian"

    - name: group
      group: 
        name: "{{ user }}"
        gid: "{{ id }}"
        state: present
    - name: create user
      user: 
        name: "{{ user }}"
        shell: /sbin/nologin
        system: yes
        group: "{{ user }}"
        uid: "{{ id }}"

    - name: install redis
      unarchive: 
        src: "{{ version }}.tar.gz"
        dest: "/usr/local/src/"
        owner: root
        group: root

    - name: shell make
      shell: "cd /usr/local/src/{{ version }} && make -j {{ CPUS }} USE_SYSTEMD=yes PREFIX={{ INSTALL_DIR }} install"
    
    - name: link
      file:
        src: "{{ INSTALL_DIR }}/bin/{{ item }}"
        path: "/usr/bin/{{ item }}"
        state: link
      loop: 
        - redis-benchmark
        - redis-check-aof
        - redis-check-rdb
        - redis-cli
        - redis-sentinel  
        - redis-server    
    - name: dir
      file:
        path: "{{ INSTALL_DIR }}/{{ item }}"
        state: directory
      loop:
        - etc
        - log
        - data
        - run 
 
    - name:  conf
      template:
        src: redis.conf.j2
        dest: "{{ INSTALL_DIR }}/etc/redis.conf"
    
    - name: chown
      shell: "chown -R redis:redis {{ INSTALL_DIR }}"
    
    - name: sysctl net.core.somaxconn
      sysctl: 
        name: net.core.somaxconn
        value: 1024
        sysctl_set: yes
    
    - name: sysctl vm.overcommit_memory
      sysctl:
        name: vm.overcommit_memory
        value: 1
        sysctl_set: yes
  
    - name: service
      template:
        src: redis.service.j2
        dest: /lib/systemd/system/redis.service
 
    - name: start 
      service:
        daemon_reload: yes
        name: redis
        state: started
        enabled: yes

[root@ansible ansible]#ansible-playbook install_redis_cluster.yaml 
....
PLAY RECAP ***************************************************************************************************************************************
10.0.0.201                 : ok=14   changed=3    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
10.0.0.202                 : ok=14   changed=3    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
10.0.0.203                 : ok=14   changed=4    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
10.0.0.204                 : ok=14   changed=4    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
10.0.0.205                 : ok=14   changed=4    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   
10.0.0.206                 : ok=14   changed=4    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0   

Verifying the Redis service state

# The cluster bus port 16379 is open; the cluster bus port is the Redis port + 10000
[root@Redis-Master1 ~]#ss -ntlp
State       Recv-Q      Send-Q           Local Address:Port              Peer Address:Port      Process                                           
LISTEN      0           511                    0.0.0.0:6379                   0.0.0.0:*          users:(("redis-server",pid=79229,fd=6))          
LISTEN      0           4096             127.0.0.53%lo:53                     0.0.0.0:*          users:(("systemd-resolve",pid=42635,fd=13))      
LISTEN      0           128                    0.0.0.0:22                     0.0.0.0:*          users:(("sshd",pid=774,fd=3))                    
LISTEN      0           511                    0.0.0.0:16379                  0.0.0.0:*          users:(("redis-server",pid=79229,fd=9))          
LISTEN      0           511                      [::1]:6379                      [::]:*          users:(("redis-server",pid=79229,fd=7))          
LISTEN      0           128                       [::]:22                        [::]:*          users:(("sshd",pid=774,fd=4))                    
LISTEN      0           511                      [::1]:16379                     [::]:*          users:(("redis-server",pid=79229,fd=10))         
.....

Creating the cluster

[root@Redis-Master1 ~]#redis-cli -a 123456 --no-auth-warning --cluster create  10.0.0.201:6379 10.0.0.202:6379 10.0.0.203:6379 10.0.0.204:6379 10.0.0.205:6379 10.0.0.206:6379 --cluster-replicas 1
>>> Performing hash slots allocation on 6 nodes...
Master[0] -> Slots 0 - 5460
Master[1] -> Slots 5461 - 10922
Master[2] -> Slots 10923 - 16383
Adding replica 10.0.0.205:6379 to 10.0.0.201:6379
Adding replica 10.0.0.206:6379 to 10.0.0.202:6379
Adding replica 10.0.0.204:6379 to 10.0.0.203:6379
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
M: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots:[5461-10922] (5462 slots) master
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
S: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   replicates 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7
Can I set the above configuration? (type 'yes' to accept): yes
>>> Nodes configuration updated
>>> Assign a different config epoch to each node
>>> Sending CLUSTER MEET messages to join the cluster
Waiting for the cluster to join
...
>>> Performing Cluster Check (using node 10.0.0.201:6379)
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots: (0 slots) slave
   replicates 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.



# The output above shows three master/slave pairs
Adding replica 10.0.0.205:6379 to 10.0.0.201:6379
Adding replica 10.0.0.206:6379 to 10.0.0.202:6379
Adding replica 10.0.0.204:6379 to 10.0.0.203:6379

Verifying the cluster

[root@Redis-Master1 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.205,port=6379,state=online,offset=126,lag=1
master_failover_state:no-failover
master_replid:7e1a477503ca53a305844e1a0c3cc5769796e0f6
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:126
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:126

[root@Redis-Master2 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.206,port=6379,state=online,offset=210,lag=0
master_failover_state:no-failover
master_replid:6fed8dc7af5d5a9523a72c8888a1e8e69876a7fe
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:210
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:210

[root@Redis-Master3 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.204,port=6379,state=online,offset=238,lag=0
master_failover_state:no-failover
master_replid:6c5491a689fe630a227a605c5d2becc7424373b7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:238
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:1
repl_backlog_histlen:238

[root@Redis-Slave1 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.203
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_read_repl_offset:280
slave_repl_offset:280
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:6c5491a689fe630a227a605c5d2becc7424373b7
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:280
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:266

[root@Redis-Slave2 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.201
master_port:6379
master_link_status:up
master_last_io_seconds_ago:7
master_sync_in_progress:0
slave_read_repl_offset:308
slave_repl_offset:308
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:7e1a477503ca53a305844e1a0c3cc5769796e0f6
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:308
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:294

[root@Redis-Slave3 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:slave
master_host:10.0.0.202
master_port:6379
master_link_status:up
master_last_io_seconds_ago:9
master_sync_in_progress:0
slave_read_repl_offset:322
slave_repl_offset:322
slave_priority:100
slave_read_only:1
replica_announced:1
connected_slaves:0
master_failover_state:no-failover
master_replid:6fed8dc7af5d5a9523a72c8888a1e8e69876a7fe
master_replid2:0000000000000000000000000000000000000000
master_repl_offset:322
second_repl_offset:-1
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:308

Example: list the slave nodes of a specific master

[root@Redis-Master1 ~]#redis-cli -a 123456 cluster nodes
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379@16379 master - 0 1667219561000 2 connected 5461-10922
4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379@16379 slave 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 0 1667219561905 2 connected
dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379@16379 master - 0 1667219562000 3 connected 10923-16383
085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379@16379 slave dc7df9c92e6089c470b3d106c4e8ef7082133233 0 1667219560899 3 connected
3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379@16379 slave f1c00557c9e0939fe429921f10ac24d5198c7b25 0 1667219562909 1 connected
f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379@16379 myself,master - 0 1667219560000 1 connected 0-5460

#The following command lists the slaves of a specific master; f1c00557c9e0939fe429921f10ac24d5198c7b25 is the node ID of master 10.0.0.201
[root@Redis-Master1 ~]#redis-cli -a 123456 cluster slaves f1c00557c9e0939fe429921f10ac24d5198c7b25
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1) "3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379@16379 slave f1c00557c9e0939fe429921f10ac24d5198c7b25 0 1667219675000 1 connected"

Verifying the cluster state

[root@Redis-Master1 ~]#redis-cli -a 123456 CLUSTER INFO
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:6
cluster_my_epoch:1
cluster_stats_messages_ping_sent:431
cluster_stats_messages_pong_sent:445
cluster_stats_messages_sent:876
cluster_stats_messages_ping_received:440
cluster_stats_messages_pong_received:431
cluster_stats_messages_meet_received:5
cluster_stats_messages_received:876
total_cluster_links_buffer_limit_exceeded:0

#Check the cluster state through any node
[root@Redis-Master1 ~]#redis-cli -a 123456 --cluster info 10.0.0.202:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.202:6379 (8abe17bd...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.201:6379 (f1c00557...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.

Viewing the master/slave mapping

[root@Redis-Master1 ~]#redis-cli -a 123456 CLUSTER NODES
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379@16379 master - 1667219808937 1667219803919 2 connected 5461-10922
4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379@16379 slave 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 0 1667219807935 2 connected
dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379@16379 master - 0 1667219807000 3 connected 10923-16383
085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379@16379 slave dc7df9c92e6089c470b3d106c4e8ef7082133233 0 1667219806931 3 connected
3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379@16379 slave f1c00557c9e0939fe429921f10ac24d5198c7b25 0 1667219805000 1 connected
f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379@16379 myself,master - 0 1667219806000 1 connected 0-5460


[root@Redis-Master1 ~]#redis-cli -a 123456 --cluster check 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.201:6379 (f1c00557...) -> 0 keys | 5461 slots | 1 slaves.
10.0.0.202:6379 (8abe17bd...) -> 0 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 0 keys | 5461 slots | 1 slaves.
[OK] 0 keys in 3 masters.
0.00 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.201:6379)
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots: (0 slots) slave
   replicates 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
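
The master/slave pairs can also be read out of the CLUSTER NODES text programmatically. The following is a minimal sketch, not part of the original walkthrough: the helper name replicas_by_master is hypothetical, and the sample text is copied from the CLUSTER NODES output above. It relies only on the field order shown there (node ID, address, flags, master ID).

```python
from collections import defaultdict

# Sample CLUSTER NODES output copied from the transcript above.
CLUSTER_NODES = """\
8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379@16379 master - 1667219808937 1667219803919 2 connected 5461-10922
4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379@16379 slave 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 0 1667219807935 2 connected
dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379@16379 master - 0 1667219807000 3 connected 10923-16383
085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379@16379 slave dc7df9c92e6089c470b3d106c4e8ef7082133233 0 1667219806931 3 connected
3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379@16379 slave f1c00557c9e0939fe429921f10ac24d5198c7b25 0 1667219805000 1 connected
f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379@16379 myself,master - 0 1667219806000 1 connected 0-5460
"""


def replicas_by_master(nodes_text):
    """Return {master_address: [slave_address, ...]} from CLUSTER NODES text."""
    addr_by_id = {}
    links = []  # (slave_address, master_id) pairs, resolved after all IDs are known
    for line in nodes_text.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue  # skip blank or malformed lines
        node_id, addr, flags, master_id = fields[:4]
        addr = addr.split("@")[0]  # drop the cluster bus port
        addr_by_id[node_id] = addr
        if "slave" in flags.split(","):
            links.append((addr, master_id))
    mapping = defaultdict(list)
    for slave_addr, master_id in links:
        mapping[addr_by_id.get(master_id, master_id)].append(slave_addr)
    return dict(mapping)


mapping = replicas_by_master(CLUSTER_NODES)
for master, slaves in sorted(mapping.items()):
    print(master, "->", ", ".join(slaves))
```

Masters that have no slave do not appear in the returned mapping, which is an easy way to spot unprotected masters after a failover.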

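Testing writes to the cluster

Writing keys to Redis Cluster

#The slot is computed from the key by a hash algorithm, so the key must be written to the node that owns that slot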
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.201 SET key1 values1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(error) MOVED 9189 10.0.0.202:6379   #The slot is not served by this node, so the write is rejected

#Writing to the node that owns the slot succeeds
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.202 SET key1 values1
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
OK
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.202 GET key1 
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
"values1"

#The corresponding slave can run KEYS *, but GET key1 returns MOVED; run GET key1 on the master instead
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.206 GET key1 
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(error) MOVED 9189 10.0.0.202:6379
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.206 KEYS "*"
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
1) "key1"
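
The MOVED reply seen above carries everything a client needs to retry: the slot number and the address of the node that owns it. A slave answers reads the same way unless the connection has first sent READONLY. The sketch below shows how such a reply could be decoded; the names Moved and parse_moved are hypothetical, and the input is the raw error text without the "(error)" prefix that redis-cli prints.

```python
from typing import NamedTuple, Optional


class Moved(NamedTuple):
    slot: int
    host: str
    port: int


def parse_moved(error_text: str) -> Optional[Moved]:
    """Parse 'MOVED <slot> <host>:<port>'; return None for any other reply."""
    parts = error_text.split()
    if len(parts) != 3 or parts[0] != "MOVED":
        return None
    # rpartition keeps working if the host part itself contains colons
    host, _, port = parts[2].rpartition(":")
    return Moved(int(parts[1]), host, int(port))


redirect = parse_moved("MOVED 9189 10.0.0.202:6379")
print(redirect)
```

A cluster-aware client does this internally and reissues the command to the returned address, which is what redis-cli does when started with -c.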

Computing the slot of a key in Redis Cluster

[root@Redis-Master1 ~]#redis-cli -h 10.0.0.201 -a 123456 --no-auth-warning cluster nodes
8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379@16379 master - 0 1667220224744 2 connected 5461-10922
4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379@16379 slave 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 0 1667220224000 2 connected
dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379@16379 master - 0 1667220223740 3 connected 10923-16383
085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379@16379 slave dc7df9c92e6089c470b3d106c4e8ef7082133233 0 1667220223000 3 connected
3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379@16379 slave f1c00557c9e0939fe429921f10ac24d5198c7b25 0 1667220221000 1 connected
f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379@16379 myself,master - 0 1667220222000 1 connected 0-5460
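
The node list above shows which slot range each master owns. The slot of a key is the CRC16 of the key (XMODEM variant) modulo 16384, so the values reported by CLUSTER KEYSLOT in this section can be reproduced offline. The following is a minimal sketch: the helper name key_slot is hypothetical, and it assumes Python's binascii.crc_hqx with an initial value of 0 as the CRC16 implementation. The hash tag rule (only the text inside the first non-empty {...} is hashed) is part of Redis Cluster but is not exercised elsewhere in this walkthrough.

```python
from binascii import crc_hqx

SLOTS = 16384  # total number of hash slots in Redis Cluster


def key_slot(key: str) -> int:
    """Hash slot of a key: CRC16 (XMODEM) modulo 16384, honouring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:  # only a non-empty tag replaces the full key
            data = data[start + 1:end]
    return crc_hqx(data, 0) % SLOTS


for key in ("hello", "name", "linux", "key1"):
    print(key, key_slot(key))
```

Keys that share a hash tag, for example {user1}:name and {user1}:age, land in the same slot, which is what multi-key commands in cluster mode require.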


#Compute the slot of the key hello
[root@Redis-Master1 ~]#redis-cli -h 10.0.0.201 -a 123456 --no-auth-warning cluster keyslot hello
(integer) 866
[root@Redis-Master1 ~]#redis-cli -h 10.0.0.201 -a 123456 --no-auth-warning set hello shuhong
OK
[root@Redis-Master1 ~]#redis-cli -h 10.0.0.201 -a 123456 --no-auth-warning cluster keyslot name
(integer) 5798
[root@Redis-Master1 ~]#redis-cli -h 10.0.0.202 -a 123456 --no-auth-warning set name shuzihan
OK
[root@Redis-Master1 ~]#redis-cli -h 10.0.0.202 -a 123456 --no-auth-warning get name 
"shuzihan"

#Use the -c option to connect in cluster mode
[root@Redis-Master1 ~]#redis-cli -c -h 10.0.0.201 -a 123456 --no-auth-warning
10.0.0.201:6379> cluster keyslot linux
(integer) 12299
10.0.0.201:6379> set linux love
-> Redirected to slot [12299] located at 10.0.0.203:6379
OK
10.0.0.203:6379> get linux 
"love"
10.0.0.203:6379> exit
[root@Redis-Master1 ~]#redis-cli -h 10.0.0.203 -a 123456 --no-auth-warning get linux
"love"

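Accessing Redis Cluster from a Python program

Project page: https://github.com/Grokzen/redis-py-cluster (since redis-py 4.1.0, cluster support is also built into redis-py itself)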

[root@Redis-Master1 ~]#apt -y install python3
[root@Redis-Master1 ~]#apt -y install python3-pip
[root@Redis-Master1 ~]#pip3 install redis-py-cluster

[root@Redis-Master1 ~]#cat redis_cluster_test.py 
#!/usr/bin/env python3
from rediscluster import RedisCluster

# Seed nodes used to discover the cluster topology
startup_nodes = [
    {"host": "10.0.0.201", "port": 6379},
    {"host": "10.0.0.202", "port": 6379},
    {"host": "10.0.0.203", "port": 6379},
    {"host": "10.0.0.204", "port": 6379},
    {"host": "10.0.0.205", "port": 6379},
    {"host": "10.0.0.206", "port": 6379},
]
# decode_responses=True returns str values instead of bytes
redis_conn = RedisCluster(startup_nodes=startup_nodes, password='123456', decode_responses=True)

# Write 10000 keys and read each one back; the client routes every key to the node that owns its slot
for i in range(0, 10000):
    redis_conn.set('key' + str(i), 'value' + str(i))
    print('key' + str(i) + ':', redis_conn.get('key' + str(i)))
                                                                                 
[root@Redis-Master1 ~]#chmod +x redis_cluster_test.py
[root@Redis-Master1 ~]#./redis_cluster_test.py
....
key9994: value9994
key9995: value9995
key9996: value9996
key9997: value9997
key9998: value9998
key9999: value9999

[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.201 dbsize
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(integer) 3332
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.202 dbsize
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(integer) 3341
[root@Redis-Master1 ~]#redis-cli -a 123456 -h 10.0.0.204 dbsize
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
(integer) 3330

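#3332 + 3341 + 3330 = 10003 keys, matching the data written: 10000 keys from the script (which overwrites the earlier key1) plus hello, name and linux
#10.0.0.204 is the slave of 10.0.0.203, so its dbsize equals that master's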
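
The spread of keys across the three masters can be simulated without touching the cluster. The sketch below hashes the same key names used so far and assigns each one to the master owning its slot. The helper names key_slot and owner are hypothetical, the slot ranges are the ones shown by --cluster check at this point, and binascii.crc_hqx is assumed as the CRC16 implementation. The per-node counts it prints can be compared with the dbsize values above.

```python
from binascii import crc_hqx
from collections import Counter

# Slot ranges owned by each master at this point in the walkthrough.
MASTERS = {
    "10.0.0.201": range(0, 5461),
    "10.0.0.202": range(5461, 10923),
    "10.0.0.203": range(10923, 16384),
}


def key_slot(key: str) -> int:
    # These keys contain no {hash tag}, so the whole key is hashed.
    return crc_hqx(key.encode(), 0) % 16384


def owner(slot: int) -> str:
    """Return the master whose range contains the slot."""
    for host, slots in MASTERS.items():
        if slot in slots:
            return host
    raise ValueError("slot out of range: %d" % slot)


# key0..key9999 from the script, plus the three keys set by hand earlier.
keys = ["key%d" % i for i in range(10000)] + ["hello", "name", "linux"]
counts = Counter(owner(key_slot(k)) for k in keys)
for host in MASTERS:
    print(host, counts[host])
print("total", sum(counts.values()))
```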

Simulating a failure and failover

#Simulate a failure of the Redis-Master2 node; the failover takes a few seconds to complete
[root@Redis-Master2 ~]#redis-cli -a 123456 shutdown
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
[root@Redis-Master2 ~]#ss -ntlp
State       Recv-Q      Send-Q            Local Address:Port             Peer Address:Port      Process                                           
LISTEN      0           4096              127.0.0.53%lo:53                    0.0.0.0:*          users:(("systemd-resolve",pid=12479,fd=13))      
LISTEN      0           128                     0.0.0.0:22                    0.0.0.0:*          users:(("sshd",pid=772,fd=3))                    
LISTEN      0           128                   127.0.0.1:6010                  0.0.0.0:*          users:(("sshd",pid=1060,fd=10))                  
LISTEN      0           128                   127.0.0.1:6011                  0.0.0.0:*          users:(("sshd",pid=46295,fd=10))                 
LISTEN      0           128                        [::]:22                       [::]:*          users:(("sshd",pid=772,fd=4))                    
LISTEN      0           128                       [::1]:6010                     [::]:*          users:(("sshd",pid=1060,fd=9))                   
LISTEN      0           128                       [::1]:6011                     [::]:*          users:(("sshd",pid=46295,fd=9))                  

[root@Redis-Master2 ~]#redis-cli -a 123456 --cluster info 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at 10.0.0.202:6379: Connection refused
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 0 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 1 slaves.
[OK] 10003 keys in 3 masters.
0.61 keys per slot on average.

[root@Redis-Master2 ~]#redis-cli -a 123456 --cluster check 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
Could not connect to Redis at 10.0.0.202:6379: Connection refused
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 0 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 1 slaves.
[OK] 10003 keys in 3 masters.
0.61 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.201:6379)
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

[root@Redis-Master2 ~]#redis-cli -a 123456 -h 10.0.0.206
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.206:6379> info replication
# Replication
role:master
connected_slaves:0
master_failover_state:no-failover
master_replid:abc64ca5ff6a5d3a984943a792427307643345f3
master_replid2:6fed8dc7af5d5a9523a72c8888a1e8e69876a7fe
master_repl_offset:138754
second_repl_offset:138755
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:138740

#After the failed node Redis-Master2 is started again, it automatically rejoins as a slave
[root@Redis-Master2 ~]#systemctl start redis.service 
[root@Redis-Master2 ~]#ss -ntlp
State       Recv-Q      Send-Q           Local Address:Port              Peer Address:Port      Process                                           
LISTEN      0           4096             127.0.0.53%lo:53                     0.0.0.0:*          users:(("systemd-resolve",pid=12479,fd=13))      
LISTEN      0           128                    0.0.0.0:22                     0.0.0.0:*          users:(("sshd",pid=772,fd=3))                    
LISTEN      0           128                  127.0.0.1:6010                   0.0.0.0:*          users:(("sshd",pid=1060,fd=10))                  
LISTEN      0           511                    0.0.0.0:16379                  0.0.0.0:*          users:(("redis-server",pid=49649,fd=9))          
LISTEN      0           128                  127.0.0.1:6011                   0.0.0.0:*          users:(("sshd",pid=46295,fd=10))                 
LISTEN      0           511                    0.0.0.0:6379                   0.0.0.0:*          users:(("redis-server",pid=49649,fd=6))          
LISTEN      0           128                       [::]:22                        [::]:*          users:(("sshd",pid=772,fd=4))                    
LISTEN      0           128                      [::1]:6010                      [::]:*          users:(("sshd",pid=1060,fd=9))                   
LISTEN      0           511                      [::1]:16379                     [::]:*          users:(("redis-server",pid=49649,fd=10))         
LISTEN      0           128                      [::1]:6011                      [::]:*          users:(("sshd",pid=46295,fd=9))                  
LISTEN      0           511                      [::1]:6379                      [::]:*          users:(("redis-server",pid=49649,fd=7))          

#The automatically generated cluster config file shows that this node (10.0.0.202) has become a slave
[root@Redis-Master2 ~]#cat /apps/redis/data/nodes-6379.conf 
f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379@16379 master - 0 1667221230046 1 connected 0-5460
8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379@16379 myself,slave 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 0 1667221230030 7 connected
3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379@16379 slave f1c00557c9e0939fe429921f10ac24d5198c7b25 0 1667221230047 1 connected
dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379@16379 master - 0 1667221230047 3 connected 10923-16383
085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379@16379 slave dc7df9c92e6089c470b3d106c4e8ef7082133233 0 1667221230051 3 connected
4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379@16379 master - 0 1667221230051 7 connected 5461-10922
vars currentEpoch 7 lastVoteEpoch 0

[root@Redis-Slave3 ~]#redis-cli -a 123456 -c INFO replication
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
# Replication
role:master
connected_slaves:1
slave0:ip=10.0.0.202,port=6379,state=online,offset=138852,lag=1
master_failover_state:no-failover
master_replid:abc64ca5ff6a5d3a984943a792427307643345f3
master_replid2:6fed8dc7af5d5a9523a72c8888a1e8e69876a7fe
master_repl_offset:138852
second_repl_offset:138755
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:15
repl_backlog_histlen:138838
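
The INFO replication output before and after the failover shows one detail worth checking: the promoted node keeps the replication ID of its former master in master_replid2, which lets slaves of the old master continue with a partial resynchronization. The following sketch compares the two snapshots of 10.0.0.206. The helper name parse_info is hypothetical, and both excerpts are copied from the transcripts in this section.

```python
def parse_info(text):
    """Turn the 'key:value' lines of INFO output into a dict, skipping '# Section' headers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        info[key] = value
    return info


# 10.0.0.206 while it was still a slave of 10.0.0.202.
BEFORE = """\
# Replication
role:slave
master_host:10.0.0.202
master_replid:6fed8dc7af5d5a9523a72c8888a1e8e69876a7fe
master_replid2:0000000000000000000000000000000000000000
"""

# 10.0.0.206 after it was promoted to master.
AFTER = """\
# Replication
role:master
connected_slaves:1
master_replid:abc64ca5ff6a5d3a984943a792427307643345f3
master_replid2:6fed8dc7af5d5a9523a72c8888a1e8e69876a7fe
second_repl_offset:138755
"""

before, after = parse_info(BEFORE), parse_info(AFTER)
promoted = before["role"] == "slave" and after["role"] == "master"
kept_history = after["master_replid2"] == before["master_replid"]
print("promoted:", promoted, "previous replid kept:", kept_history)
```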

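Redis Cluster management

Cluster scale-out

When to scale out:
Client traffic has surged and the existing Redis Cluster can no longer handle the growing number of concurrent requests. To solve this, two new servers are purchased and must be added to the running cluster dynamically, without affecting normal service.
Note: in production an odd number of master nodes, such as 3, 5 or 7, is generally recommended to prevent split-brain.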
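
The reasoning behind the odd-number recommendation is majority arithmetic: failure detection and slave promotion need agreement from a majority of masters. The sketch below (hypothetical helper names majority and tolerated_failures) shows that going from 3 to 4 masters adds capacity but does not let the cluster survive more simultaneous master failures.

```python
def majority(masters: int) -> int:
    """Smallest number of masters that forms a majority."""
    return masters // 2 + 1


def tolerated_failures(masters: int) -> int:
    """Masters that can be unreachable while a majority is still available."""
    return masters - majority(masters)


for n in range(3, 8):
    print("masters=%d majority=%d tolerated_failures=%d"
          % (n, majority(n), tolerated_failures(n)))
```

With an even count, a network split into two equal halves also leaves neither side with a majority, which is the split-brain case the note warns about.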

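Preparing the new nodes
New Redis nodes must run the same Redis version and configuration as the existing nodes. Start the two new Redis nodes; they will become one master and one slave.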

[root@ansible ansible]#ansible-playbook install_redis_cluster.yaml 

PLAY [install redis] *******************************************************************************************************************************

TASK [Gathering Facts] *****************************************************************************************************************************
ok: [10.0.0.208]
ok: [10.0.0.209]
....

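Adding the new master node to the cluster
Use the following command to add a node: give the IP and port of the new Redis node first, followed by the IP and port of any node already in the cluster.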

add-node new_host:new_port existing_host:existing_port [--cluster-slave --cluster-master-id <arg>]
#Where:
new_host:new_port #IP and port of the host being added
existing_host:existing_port #IP and port of any node already in the cluster
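
When the same step is scripted, the argument order matters more than anything else: new node first, existing node second. The following sketch builds the argument list for both forms of the command. The function name add_node_command is hypothetical, and authentication is deliberately left out; the transcripts in this section pass the password with -a.

```python
def add_node_command(new_node, existing_node, master_id=None):
    """Build the redis-cli argument list for --cluster add-node.

    new_node and existing_node are 'host:port' strings. When master_id is
    given, the new node joins as a slave of that master.
    """
    argv = ["redis-cli", "--cluster", "add-node", new_node, existing_node]
    if master_id is not None:
        argv += ["--cluster-slave", "--cluster-master-id", master_id]
    return argv


# Join as an empty master, then join a second node as a slave of it.
print(" ".join(add_node_command("10.0.0.208:6379", "10.0.0.201:6379")))
print(" ".join(add_node_command(
    "10.0.0.209:6379", "10.0.0.206:6379",
    master_id="5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561")))
```

The returned list can be handed to subprocess.run unchanged, which avoids shell quoting issues.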

Command for Redis 5 and later:

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster add-node 10.0.0.208:6379 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 10.0.0.208:6379 to cluster 10.0.0.201:6379
>>> Performing Cluster Check (using node 10.0.0.201:6379)
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Getting functions from cluster
>>> Send FUNCTION LIST to 10.0.0.208:6379 to verify there is no functions in it
>>> Send FUNCTION RESTORE to 10.0.0.208:6379
>>> Send CLUSTER MEET to node 10.0.0.208:6379 to make it join the cluster.
[OK] New node added correctly.

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster info 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 1 slaves.
10.0.0.208:6379 (5c55e3d8...) -> 0 keys | 0 slots | 0 slaves.
[OK] 10003 keys in 4 masters.
0.61 keys per slot on average.

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster check 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 1 slaves.
10.0.0.208:6379 (5c55e3d8...) -> 0 keys | 0 slots | 0 slaves.
[OK] 10003 keys in 4 masters.
0.61 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.201:6379)
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots: (0 slots) master
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

Reassigning slots to the new master

Command for Redis 5 and later:

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster reshard 10.0.0.206:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots: (0 slots) master
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 4096
What is the receiving node ID? 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: all

Ready to move 4096 slots.
  Source nodes:
    M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
       slots:[5461-10922] (5462 slots) master
       1 additional replica(s)
    M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
       slots:[0-5460] (5461 slots) master
       1 additional replica(s)
    M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
       slots:[10923-16383] (5461 slots) master
       1 additional replica(s)
  Destination node:
    M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
       slots: (0 slots) master
  Resharding plan:
    Moving slot 5461 from 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
    Moving slot 5462 from 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
    Moving slot 5463 from 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
    Moving slot 5464 from 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
    Moving slot 5465 from 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
    Moving slot 5466 from 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
.....


#Confirm that the slots were assigned successfully
[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster check 10.0.0.206:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.206:6379 (4f2b470f...) -> 2515 keys | 4096 slots | 1 slaves.
10.0.0.201:6379 (f1c00557...) -> 2511 keys | 4096 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 2501 keys | 4096 slots | 1 slaves.
10.0.0.208:6379 (5c55e3d8...) -> 2476 keys | 4096 slots | 0 slaves.
[OK] 10003 keys in 4 masters.
0.61 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
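
The figure 4096 entered at the prompt is 16384 / 4, an equal share for four masters, and the ranges the new master received ([0-1364], [5461-6826], [10923-12287]) show how that amount was split across the three sources. The sketch below reproduces the split under one assumption inferred from the transcript rather than from the redis-cli source: each source gives up slots in proportion to what it owns, with the largest source rounded up and the others rounded down. The function name reshard_plan is hypothetical.

```python
def reshard_plan(sources, slots_to_move):
    """Split slots_to_move across sources in proportion to the slots each owns.

    sources maps a node to its current slot count. The largest source is
    rounded up and the rest are rounded down (assumed rounding rule).
    """
    total = sum(sources.values())
    ordered = sorted(sources.items(), key=lambda item: item[1], reverse=True)
    plan = {}
    for index, (node, owned) in enumerate(ordered):
        share = slots_to_move * owned
        if index == 0:
            plan[node] = -(-share // total)  # integer ceiling
        else:
            plan[node] = share // total      # integer floor
    return plan


before = {"10.0.0.206": 5462, "10.0.0.201": 5461, "10.0.0.203": 5461}
plan = reshard_plan(before, 16384 // 4)
for node, moved in plan.items():
    print(node, "gives", moved, "keeps", before[node] - moved)
```

Every master ends up with 4096 slots, which matches the --cluster check output above.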

Assigning a new slave to the new master

Command for Redis 5 and later:

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster add-node 10.0.0.209:6379 10.0.0.206:6379 --cluster-slave --cluster-master-id 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Adding node 10.0.0.209:6379 to cluster 10.0.0.206:6379
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
>>> Send CLUSTER MEET to node 10.0.0.209:6379 to make it join the cluster.
Waiting for the cluster to join

>>> Configure node as replica of 10.0.0.208:6379.
[OK] New node added correctly.


[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster check 10.0.0.206:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.206:6379 (4f2b470f...) -> 2515 keys | 4096 slots | 1 slaves.
10.0.0.201:6379 (f1c00557...) -> 2511 keys | 4096 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 2501 keys | 4096 slots | 1 slaves.
10.0.0.208:6379 (5c55e3d8...) -> 2476 keys | 4096 slots | 1 slaves.
[OK] 10003 keys in 4 masters.
0.61 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: 67b0af2a210ea94bd6d414eeda03d64caef104dd 10.0.0.209:6379
   slots: (0 slots) slave
   replicates 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.


[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster info 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.201:6379 (f1c00557...) -> 2511 keys | 4096 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 2515 keys | 4096 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 2501 keys | 4096 slots | 1 slaves.
10.0.0.208:6379 (5c55e3d8...) -> 2476 keys | 4096 slots | 1 slaves.
[OK] 10003 keys in 4 masters.
0.61 keys per slot on average.

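Cluster scale-in

When to scale in:
As the business shrinks and the user count drops noticeably, the team decides together with management to take two of the eight hosts in the current Redis cluster offline for other uses. The smaller cluster still meets current performance needs.
Node removal procedure:
Scale-out adds a node to the cluster first and then assigns slots to it. Scale-in is the reverse: first migrate the slots on the node to be removed to other nodes in the cluster, and only then remove it from the cluster. If the slots on a node have not been fully migrated away, deleting the node fails with an error saying it still holds data.

Migrating the slots on the master to be removed to other masters
Note: the usual caution is that the source Redis master must hold no data, otherwise the migration reports an error and is forcibly aborted. In this walkthrough on Redis 5 and later, redis-cli --cluster reshard moves the keys in each slot together with the slot, so the non-empty source 10.0.0.208 (2476 keys) migrates normally.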

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster reshard 10.0.0.206:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[1365-5460] (4096 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: 67b0af2a210ea94bd6d414eeda03d64caef104dd 10.0.0.209:6379
   slots: (0 slots) slave
   replicates 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots:[0-1364],[5461-6826],[10923-12287] (4096 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1365 
What is the receiving node ID? f1c00557c9e0939fe429921f10ac24d5198c7b25
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
Source node #2: done
....


[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster reshard 10.0.0.206:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[6827-10922] (4096 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: 67b0af2a210ea94bd6d414eeda03d64caef104dd 10.0.0.209:6379
   slots: (0 slots) slave
   replicates 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots:[5461-6826],[10923-12287] (2731 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1366
What is the receiving node ID? 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
Source node #2: done
.....


[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster reshard 10.0.0.206:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Performing Cluster Check (using node 10.0.0.206:6379)
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[12288-16383] (4096 slots) master
   1 additional replica(s)
S: 67b0af2a210ea94bd6d414eeda03d64caef104dd 10.0.0.209:6379
   slots: (0 slots) slave
   replicates 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
M: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots:[10923-12287] (1365 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
How many slots do you want to move (from 1 to 16384)? 1365
What is the receiving node ID? dc7df9c92e6089c470b3d106c4e8ef7082133233
Please enter all the source node IDs.
  Type 'all' to use all the nodes as source nodes for the hash slots.
  Type 'done' once you entered all the source nodes IDs.
Source node #1: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
Source node #2: done
....

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster check 10.0.0.208:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 3 slaves.
[OK] 10003 keys in 3 masters.
0.61 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.208:6379)
S: 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 10.0.0.208:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 67b0af2a210ea94bd6d414eeda03d64caef104dd 10.0.0.209:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   3 additional replica(s)
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.

# Remove the servers from the cluster
[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster del-node 10.0.0.201:6379 67b0af2a210ea94bd6d414eeda03d64caef104dd
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Removing node 67b0af2a210ea94bd6d414eeda03d64caef104dd from cluster 10.0.0.201:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.
[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster del-node 10.0.0.201:6379 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
>>> Removing node 5c55e3d8616d5f8f83c20a1d5ddc9e2996fe7561 from cluster 10.0.0.201:6379
>>> Sending CLUSTER FORGET messages to the cluster...
>>> Sending CLUSTER RESET SOFT to the deleted node.

[root@Redis-Slave3 ~]#redis-cli -a 123456 --cluster check 10.0.0.201:6379
Warning: Using a password with '-a' or '-u' option on the command line interface may not be safe.
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 1 slaves.
[OK] 10003 keys in 3 masters.
0.61 keys per slot on average.
>>> Performing Cluster Check (using node 10.0.0.201:6379)
M: f1c00557c9e0939fe429921f10ac24d5198c7b25 10.0.0.201:6379
   slots:[0-5460] (5461 slots) master
   1 additional replica(s)
S: 8abe17bdc9846481cf0ee404a1c4a3ac4af579b7 10.0.0.202:6379
   slots: (0 slots) slave
   replicates 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb
M: 4f2b470fab9b51a6ba5b2a2029dc98a8fc1f40cb 10.0.0.206:6379
   slots:[5461-10922] (5462 slots) master
   1 additional replica(s)
M: dc7df9c92e6089c470b3d106c4e8ef7082133233 10.0.0.203:6379
   slots:[10923-16383] (5461 slots) master
   1 additional replica(s)
S: 085fdaee434ce0b7b0159836eccb118f978510d6 10.0.0.204:6379
   slots: (0 slots) slave
   replicates dc7df9c92e6089c470b3d106c4e8ef7082133233
S: 3bf1fc52a7ca5b165cd57a675d020a89828a5f62 10.0.0.205:6379
   slots: (0 slots) slave
   replicates f1c00557c9e0939fe429921f10ac24d5198c7b25
[OK] All nodes agree about slots configuration.
>>> Check for open slots...
>>> Check slots coverage...
[OK] All 16384 slots covered.
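
Before and after removing nodes it helps to confirm that the remaining masters still cover all 16384 slots and that no keys were lost. The sketch below parses the per-master summary lines of `--cluster check` as they appear in the transcript above. The regular expression assumes exactly that line format, which may differ between redis-cli versions, and `parse_summary` is an illustrative name.

```python
import re

# Matches lines such as:
# 10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
SUMMARY = re.compile(
    r"^(?P<addr>\S+) \((?P<id>\w+)\.\.\.\) -> (?P<keys>\d+) keys \| "
    r"(?P<slots>\d+) slots \| (?P<replicas>\d+) slaves\.$"
)


def parse_summary(text):
    """Return {address: (keys, slots, replicas)} for each master summary line."""
    result = {}
    for line in text.splitlines():
        match = SUMMARY.match(line.strip())
        if match:
            result[match["addr"]] = (
                int(match["keys"]),
                int(match["slots"]),
                int(match["replicas"]),
            )
    return result


# Summary lines copied from the final check in the transcript.
CHECK_OUTPUT = """\
10.0.0.201:6379 (f1c00557...) -> 3332 keys | 5461 slots | 1 slaves.
10.0.0.206:6379 (4f2b470f...) -> 3341 keys | 5462 slots | 1 slaves.
10.0.0.203:6379 (dc7df9c9...) -> 3330 keys | 5461 slots | 1 slaves.
[OK] 10003 keys in 3 masters.
"""

masters = parse_summary(CHECK_OUTPUT)
total_keys = sum(keys for keys, _, _ in masters.values())
total_slots = sum(slots for _, slots, _ in masters.values())
print(f"{len(masters)} masters, {total_keys} keys, {total_slots} slots")
```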

Importing Existing Redis Data into the Cluster

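Redis officially provides a tool for migrating the data of a single Redis node into a cluster, and some companies have developed their own online migration tools.
Official tool: redis-cli --cluster import
Third-party online migration tools: implemented by emulating a slave node, for example Vipshop's redis-migrate-tool and Wandoujia's redis-port
When importing applies:
The business data initially lives on a single-node host. As the workload grows, a Redis cluster is built and the old data has to be imported into the new Redis cluster.
Note: the Redis cluster must not already contain keys with the same names as the data being imported; otherwise the import fails or is interrupted, unless --cluster-replace is specified, in which case the existing keys are overwritten.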

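Preparing the environment
Because the import cannot be given an authentication password, the password must be removed from all Redis nodes before importing.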

# Note: remove the password on every node beforehand
redis-cli -h <each-node> -p 6379 -a 123456 --no-auth-warning CONFIG SET requirepass ""
redis-cli --cluster import <cluster-node-address>:6379 --cluster-from <source-address>:6379 --cluster-copy --cluster-replace

Cluster Skew

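After the nodes of a Redis cluster have been running for a while, skew may appear: one node holds more data and consumes more memory, or receives more client requests, than the others.
Possible causes of skew:
Slots are unevenly distributed across nodes
The number of keys differs greatly between slots
Big keys are present (avoid them where possible)
Memory-related settings differ between nodes
Hot data is unevenly distributed: when strong consistency is not required, a local cache and an MQ can help
Get the number of keys in a given slot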

#redis-cli cluster countkeysinslot {slot number}
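
`CLUSTER COUNTKEYSINSLOT` reports on one slot at a time and only counts keys stored on the node it is sent to, so surveying a master means issuing one command per slot it owns. As a rough sketch, the Python below expands a slot list in the form printed by `--cluster check` and generates those commands. `parse_slot_ranges` and `count_commands` are hypothetical helper names, and the handling of a single-slot entry such as `[42]` is an assumption.

```python
import re


def parse_slot_ranges(text):
    """Turn '[0-1364],[5461-6826]' into [(0, 1364), (5461, 6826)]."""
    ranges = []
    for first, last in re.findall(r"\[(\d+)(?:-(\d+))?\]", text):
        start = int(first)
        # Assumption: a lone '[42]' denotes a range containing one slot.
        end = int(last) if last else start
        ranges.append((start, end))
    return ranges


def count_commands(node, ranges):
    """Yield one countkeysinslot command per slot, addressed to the owner."""
    host, port = node.rsplit(":", 1)
    for start, end in ranges:
        for slot in range(start, end + 1):
            yield f"redis-cli -h {host} -p {port} cluster countkeysinslot {slot}"


# Slot list of 10.0.0.208 as shown in the second reshard check above.
ranges = parse_slot_ranges("[5461-6826],[10923-12287]")
owned = sum(end - start + 1 for start, end in ranges)
print(f"{owned} slots in {len(ranges)} ranges")
print(next(count_commands("10.0.0.208:6379", ranges)))
```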

Automatically rebalance the slot distribution. This affects client access, so use it with caution.

#redis-cli --cluster rebalance <cluster-node-IP:PORT>
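
Whether a rebalance is worth its impact on clients depends on how far each master is from an even share of slots. The following sketch computes that deviation as a percentage. The 2 percent default for `threshold_percent` and the deviation formula itself are assumptions for illustration, not necessarily what redis-cli uses internally, and the skewed layout in the example is made up.

```python
def slot_deviation(slots_by_master):
    """Return each master's distance from an even slot share, in percent."""
    expected = sum(slots_by_master.values()) / len(slots_by_master)
    return {
        node: round(abs(slots - expected) / expected * 100, 2)
        for node, slots in slots_by_master.items()
    }


def needs_rebalance(slots_by_master, threshold_percent=2.0):
    """True if any master deviates from the even share by more than the threshold."""
    deviations = slot_deviation(slots_by_master)
    return any(value > threshold_percent for value in deviations.values())


# Layout after the scale-in shown earlier in this section.
after_shrink = {"10.0.0.201": 5461, "10.0.0.206": 5462, "10.0.0.203": 5461}
# Hypothetical skewed layout, for contrast.
skewed = {"node-a": 8192, "node-b": 4096, "node-c": 4096}

print(slot_deviation(after_shrink), needs_rebalance(after_shrink))
print(slot_deviation(skewed), needs_rebalance(skewed))
```

If your redis-cli version offers `--cluster-simulate` for rebalance, running with it first shows the planned moves without touching the cluster.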

Limitations of Redis Cluster

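In cluster mode a slave node is read-only on a per-connection basis: by default, slave nodes in cluster mode refuse all read and write requests. When a command tries to fetch data from a slave node, the slave redirects it to the node responsible for the slot that holds the data.
Why "read-only on a per-connection basis"? Because a client can run the readonly command on a slave, after which the slave serves read requests, but only for that connection. Once the client disconnects and reconnects, its requests are redirected again.
Read/write splitting is more complex in cluster mode, because the mapping between each master's slaves and the slots has to be maintained.
Building read/write splitting in cluster mode is generally not recommended; add nodes to meet the demand instead. However, considering the bandwidth consumed by communication between nodes, the official recommendation is to keep the cluster to no more than 1000 nodes.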

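Choosing Between Standalone, Sentinel, and Cluster
In most cases client performance "drops"
Some commands cannot be used across nodes: mget, keys, scan, flush, sinter, and so on
Client maintenance is more complex: SDK and application overhead (for example, more connection pools)
Multiple databases are not supported: cluster mode has only db 0
Replication supports only one level: no tree-shaped or cascading replication
Support for key transactions and Lua is limited: the keys operated on must be on the same node, and Lua and transactions cannot span nodes
So before building a cluster, consider whether a standalone Redis really can no longer handle the business concurrency. If Redis Sentinel already provides high availability and concurrency is not saturated, building a cluster is superfluous.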
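
The single-node restriction on multi-key commands, transactions, and Lua follows from how keys map to slots: Redis Cluster takes CRC16 of the key modulo 16384, and if the key contains a non-empty `{...}` section only that part is hashed, so keys sharing a hash tag land in the same slot. A minimal sketch of the mapping, assuming Python's `binascii.crc_hqx` with an initial value of 0 matches the CRC16 variant Redis uses; `key_slot` and `same_slot` are illustrative names:

```python
import binascii

SLOT_COUNT = 16384


def key_slot(key):
    """Return the cluster hash slot for a key, honoring {hash tags}."""
    data = key.encode()
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty {...} section counts as a hash tag.
        if end != -1 and end != start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % SLOT_COUNT


def same_slot(*keys):
    """True if all keys hash to one slot and can be used in one multi-key command."""
    return len({key_slot(key) for key in keys}) == 1


print(key_slot("foo"))
print(same_slot("{user:1000}.following", "{user:1000}.followers"))
```

Keys that must be used together, for example in one MGET or one transaction, can then be named with a common tag such as `{user:1000}`, at the cost of concentrating them on one node.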