Fixing Missing Slots in a Redis Cluster


Summary of a Redis Cluster Slots Error

Today the project went down with a Redis cluster error. The error log showed that the problem was related to the Redis slots:

2022-04-04 09:54:06.734 [localhost-startStop-1] WARN  org.hibernate.id.UUIDHexGenerator - HHH000409: Using org.hibernate.id.UUIDHexGenerator which does not generate IETF RFC 4122 compliant UUID values; consider using org.hibernate.id.UUIDGenerator instead
2022-04-04 09:54:08.111 [redisson-netty-1-1] ERROR org.redisson.cluster.ClusterConnectionManager - cluster_state:fail for /192.168.205.82:6379
2022-04-04 09:54:10.392 [localhost-startStop-1] ERROR o.hibernate.cache.redis.client.RedisClientFactory - Fail to create RedisClient.
org.redisson.client.RedisConnectionException: Not all slots are covered! Only 10923 slots are avaliable
	at org.redisson.cluster.ClusterConnectionManager.<init>(ClusterConnectionManager.java:175)
	at org.redisson.config.ConfigSupport.createConnectionManager(ConfigSupport.java:240)
	at org.redisson.Redisson.<init>(Redisson.java:115)
	at org.redisson.Redisson.create(Redisson.java:152)
	at org.hibernate.cache.redis.client.RedisClientFactory.createRedisClient(RedisClientFactory.java:47)
	at org.hibernate.cache.redis.client.RedisClientFactory.createRedisClient(RedisClientFactory.java:64)
	at org.hibernate.cache.redis.client.RedisClientFactory.createRedisClient(RedisClientFactory.java:92)
	at org.hibernate.cache.redis.hibernate5.AbstractRedisRegionFactory.createRedisClient(AbstractRedisRegionFactory.java:65)
	at org.hibernate.cache.redis.hibernate5.SingletonRedisRegionFactory.start(SingletonRedisRegionFactory.java:55)
	at org.hibernate.internal.CacheImpl.<init>(CacheImpl.java:49)
	at org.hibernate.engine.spi.CacheInitiator.initiateService(CacheInitiator.java:28)
	at org.hibernate.engine.spi.CacheInitiator.initiateService(CacheInitiator.java:20)
	at org.hibernate.service.internal.SessionFactoryServiceRegistryImpl.initiateService(SessionFactoryServiceRegistryImpl.java:49)
	at org.hibernate.service.internal.AbstractServiceRegistryImpl.createService(AbstractServiceRegistryImpl.java:254)
	at org.hibernate.service.internal.AbstractServiceRegistryImpl.initializeService(AbstractServiceRegistryImpl.java:228)
	at org.hibernate.service.internal.AbstractServiceRegistryImpl.getService(AbstractServiceRegistryImpl.java:207)
	at org.hibernate.service.internal.SessionFactoryServiceRegistryImpl.getService(SessionFactoryServiceRegistryImpl.java:68)

1. Log in to the server and inspect the cluster

redis-cli -h <ip> -p <port> -a <password>
127.0.0.1:7000> set age 20
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7000> cluster info
cluster_state:fail
cluster_slots_assigned:16371
cluster_slots_ok:16371
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:8
cluster_my_epoch:1
cluster_stats_messages_sent:1007
cluster_stats_messages_received:1005

A Redis cluster always has 16384 slots, but only 16371 are assigned here, so 13 slots are missing. Next, look at the slot assignments:
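The subtraction above can also be done from the raw `cluster info` text. The following is only a minimal sketch: the helper name `parse_cluster_info` is made up for illustration, and the sample string is a shortened copy of the output shown above.

```python
# Number of hash slots in a Redis Cluster (fixed at 16384).
TOTAL_SLOTS = 16384


def parse_cluster_info(text):
    """Turn `cluster info` output into a dict of field name -> string value."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip lines that do not contain "key:value"
            info[key] = value
    return info


# Shortened copy of the `cluster info` output shown above.
sample = """cluster_state:fail
cluster_slots_assigned:16371
cluster_slots_ok:16371
cluster_known_nodes:6
cluster_size:3"""

info = parse_cluster_info(sample)
unassigned = TOTAL_SLOTS - int(info["cluster_slots_assigned"])
print(info["cluster_state"], unassigned)  # prints: fail 13
```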

127.0.0.1:7000> cluster slots
 1) 1) (integer) 0
    2) (integer) 5460
    3) 1) "192.168.205.81"
       2) (integer) 6380
       3) "5fbcbee6cff9c0e4391dbf1553e87befa4914049"
    4) 1) "192.168.205.86"
       2) (integer) 6379
       3) "25cf4ec8180d6cc7e78192cb0d2d021335501b6a"
 2) 1) (integer) 5461
    2) (integer) 6606
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 3) 1) (integer) 6608
    2) (integer) 6613
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 4) 1) (integer) 6615
    2) (integer) 7037
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 5) 1) (integer) 7039
    2) (integer) 7471
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 6) 1) (integer) 7473
    2) (integer) 8197
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 7) 1) (integer) 8199
    2) (integer) 8324
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 8) 1) (integer) 8326
    2) (integer) 8342
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
 9) 1) (integer) 8344
    2) (integer) 8492
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
10) 1) (integer) 8494
    2) (integer) 9555
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
11) 1) (integer) 9557
    2) (integer) 9901
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
12) 1) (integer) 9903
    2) (integer) 9999
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
13) 1) (integer) 10001
    2) (integer) 10338
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
14) 1) (integer) 10340
    2) (integer) 10802
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
15) 1) (integer) 10804
    2) (integer) 10922
    3) 1) "192.168.205.81"
       2) (integer) 6379
       3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
    4) 1) "192.168.205.82"
       2) (integer) 6380
       3) "30a9d2065538e59874eced14836c512b5a0d09f4"
16) 1) (integer) 10923
    2) (integer) 16383
    3) 1) "192.168.205.82"
       2) (integer) 6379
       3) "031d12ae9a6f15e7550e17916454b41fdeaa0bbe"
    4) 1) "192.168.205.86"
       2) (integer) 6380
       3) "eeaa7bbb9e71c47d4c8c0fc67b188753910713e5"

2. Analyze which slots are missing

From the output above, list the assigned slot ranges and look for the gaps between them:

0-5460
5461-6606
6608-6613
6615-7037
7039-7471
7473-8197
8199-8324
8326-8342
8344-8492
8494-9555
9557-9901
9903-9999
10001-10338
10340-10802
10804-10922
10923-16383

The missing slots are:

6607 6614 7038 7472 8198 8325 8343 8493 9556 9902 10000 10339 10803

That is exactly the 13 missing slots (16384 - 16371 = 13).
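Spotting the gaps by eye is error-prone, so here is a small sketch that computes them from the assigned ranges listed above. The function name `find_missing_slots` is only illustrative, and the ranges were typed in by hand from the `cluster slots` output.

```python
# Number of hash slots in a Redis Cluster (fixed at 16384).
TOTAL_SLOTS = 16384


def find_missing_slots(ranges, total=TOTAL_SLOTS):
    """Return the sorted slots not covered by any inclusive (start, end) range."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return [slot for slot in range(total) if slot not in covered]


# Assigned ranges copied from the `cluster slots` output above.
assigned = [
    (0, 5460), (5461, 6606), (6608, 6613), (6615, 7037),
    (7039, 7471), (7473, 8197), (8199, 8324), (8326, 8342),
    (8344, 8492), (8494, 9555), (9557, 9901), (9903, 9999),
    (10001, 10338), (10340, 10802), (10804, 10922), (10923, 16383),
]

missing = find_missing_slots(assigned)
print(len(missing))  # prints: 13
print(" ".join(str(slot) for slot in missing))
# prints: 6607 6614 7038 7472 8198 8325 8343 8493 9556 9902 10000 10339 10803
```

The space-separated line it prints has the same form as the argument list that `cluster addslots` takes.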

3. The fix

3.1 Assign one or more slots to the current node

127.0.0.1:7000> cluster addslots 6607 6614 7038 7472 8198 8325 8343 8493 9556 9902 10000 10339 10803
OK

3.2 Check the result

127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:30
cluster_my_epoch:25
cluster_stats_messages_ping_sent:2831
cluster_stats_messages_pong_sent:686
cluster_stats_messages_publish_sent:98525
cluster_stats_messages_sent:102042
cluster_stats_messages_ping_received:686
cluster_stats_messages_pong_received:690
cluster_stats_messages_publish_received:96234
cluster_stats_messages_received:97610

The missing slots are now covered. Running a check on the cluster next shows that some slots are still open (stuck in the importing state):

./redis-trib.rb check 192.168.205.81:6379
>>> Check for open slots...
[WARNING] Node 192.168.205.81:6379 has slots in importing state (194,299,626,660,710,1274,1819,1838,2109,3237,3504,3647,3909,4044,4313,4453,4478,4706,4979).
[WARNING] Node 192.168.205.81:6380 has slots in importing state (6607,6614,8712,9902).
[WARNING] Node 192.168.205.82:6379 has slots in importing state (8712).
[WARNING] The following slots are open: 194,299,626,660,710,1274,1819,1838,2109,3237,3504,3647,3909,4044,4313,4453,4478,4706,4979,6607,6614,8712,9902
>>> Check slots coverage...
[OK] All 16384 slots covered.
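Since each affected node has to be handled separately, it can help to group the open slots per node. The sketch below parses the warning lines above with a regular expression. The pattern is an assumption based only on the output shown here (the `migrating` alternative is a guess at the sibling warning), and `open_slots_by_node` is a hypothetical helper name.

```python
import re

# Assumed line format, taken from the check output above, e.g.:
# [WARNING] Node 192.168.205.81:6380 has slots in importing state (6607,6614,8712,9902).
WARNING_RE = re.compile(
    r"\[WARNING\] Node (\S+) has slots in (importing|migrating) state \(([\d,]+)\)"
)


def open_slots_by_node(report):
    """Map 'host:port' -> {state: [slot, ...]} for every matching warning line."""
    result = {}
    for match in WARNING_RE.finditer(report):
        node, state, slots = match.groups()
        result.setdefault(node, {})[state] = [int(s) for s in slots.split(",")]
    return result


# Warning lines copied from the check output above.
report = """\
[WARNING] Node 192.168.205.81:6379 has slots in importing state (194,299,626,660,710,1274,1819,1838,2109,3237,3504,3647,3909,4044,4313,4453,4478,4706,4979).
[WARNING] Node 192.168.205.81:6380 has slots in importing state (6607,6614,8712,9902).
[WARNING] Node 192.168.205.82:6379 has slots in importing state (8712).
"""

for node, states in open_slots_by_node(report).items():
    for state, slots in states.items():
        print(node, state, " ".join(str(s) for s in slots))
```

Each printed line gives one node and the slot list to handle on that node.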

3.3 Log in to Redis on each node listed above, delete the affected slots, then add them back

127.0.0.1:7000> cluster delslots 194 299 626 660 710 1274 1819 1838 2109 3237 3504 3647 3909 4044 4313 4453 4478 4706 4979
OK
127.0.0.1:7000> cluster addslots 194 299 626 660 710 1274 1819 1838 2109 3237 3504 3647 3909 4044 4313 4453 4478 4706 4979
OK

Checking the cluster again shows no more warnings. At this point the Redis cluster is fully repaired.

For background on how Redis works, see: https://www.cnblogs.com/mengchunchen/p/10059436.html


Author: yushui1995
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit yushui1995 when reposting!