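Redis Cluster slots error: a summary
Our project went down today with a Redis Cluster error. The error log showed that the problem was related to the cluster's slots. The log is as follows: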
2022-04-04 09:54:06.734 [localhost-startStop-1] WARN org.hibernate.id.UUIDHexGenerator - HHH000409: Using org.hibernate.id.UUIDHexGenerator which does not generate IETF RFC 4122 compliant UUID values; consider using org.hibernate.id.UUIDGenerator instead
2022-04-04 09:54:08.111 [redisson-netty-1-1] ERROR org.redisson.cluster.ClusterConnectionManager - cluster_state:fail for /192.168.205.82:6379
2022-04-04 09:54:10.392 [localhost-startStop-1] ERROR o.hibernate.cache.redis.client.RedisClientFactory - Fail to create RedisClient.
org.redisson.client.RedisConnectionException: Not all slots are covered! Only 10923 slots are avaliable
at org.redisson.cluster.ClusterConnectionManager.<init>(ClusterConnectionManager.java:175)
at org.redisson.config.ConfigSupport.createConnectionManager(ConfigSupport.java:240)
at org.redisson.Redisson.<init>(Redisson.java:115)
at org.redisson.Redisson.create(Redisson.java:152)
at org.hibernate.cache.redis.client.RedisClientFactory.createRedisClient(RedisClientFactory.java:47)
at org.hibernate.cache.redis.client.RedisClientFactory.createRedisClient(RedisClientFactory.java:64)
at org.hibernate.cache.redis.client.RedisClientFactory.createRedisClient(RedisClientFactory.java:92)
at org.hibernate.cache.redis.hibernate5.AbstractRedisRegionFactory.createRedisClient(AbstractRedisRegionFactory.java:65)
at org.hibernate.cache.redis.hibernate5.SingletonRedisRegionFactory.start(SingletonRedisRegionFactory.java:55)
at org.hibernate.internal.CacheImpl.<init>(CacheImpl.java:49)
at org.hibernate.engine.spi.CacheInitiator.initiateService(CacheInitiator.java:28)
at org.hibernate.engine.spi.CacheInitiator.initiateService(CacheInitiator.java:20)
at org.hibernate.service.internal.SessionFactoryServiceRegistryImpl.initiateService(SessionFactoryServiceRegistryImpl.java:49)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.createService(AbstractServiceRegistryImpl.java:254)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.initializeService(AbstractServiceRegistryImpl.java:228)
at org.hibernate.service.internal.AbstractServiceRegistryImpl.getService(AbstractServiceRegistryImpl.java:207)
at org.hibernate.service.internal.SessionFactoryServiceRegistryImpl.getService(SessionFactoryServiceRegistryImpl.java:68)
1. Log in to the server and inspect the cluster
redis-cli -h <ip> -p <port> -a <password>
127.0.0.1:7000> set age 20
(error) CLUSTERDOWN The cluster is down
127.0.0.1:7000> cluster info
cluster_state:fail
cluster_slots_assigned:16371
cluster_slots_ok:16371
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:8
cluster_my_epoch:1
cluster_stats_messages_sent:1007
cluster_stats_messages_received:1005
A Redis Cluster always has 16384 slots, but only 16371 are assigned here, so 13 slots are missing. Next, look at the slot assignments:
127.0.0.1:7000> cluster slots
1) 1) (integer) 0
2) (integer) 5460
3) 1) "192.168.205.81"
2) (integer) 6380
3) "5fbcbee6cff9c0e4391dbf1553e87befa4914049"
4) 1) "192.168.205.86"
2) (integer) 6379
3) "25cf4ec8180d6cc7e78192cb0d2d021335501b6a"
2) 1) (integer) 5461
2) (integer) 6606
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
3) 1) (integer) 6608
2) (integer) 6613
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
4) 1) (integer) 6615
2) (integer) 7037
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
5) 1) (integer) 7039
2) (integer) 7471
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
6) 1) (integer) 7473
2) (integer) 8197
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
7) 1) (integer) 8199
2) (integer) 8324
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
8) 1) (integer) 8326
2) (integer) 8342
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
9) 1) (integer) 8344
2) (integer) 8492
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
10) 1) (integer) 8494
2) (integer) 9555
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
11) 1) (integer) 9557
2) (integer) 9901
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
12) 1) (integer) 9903
2) (integer) 9999
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
13) 1) (integer) 10001
2) (integer) 10338
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
14) 1) (integer) 10340
2) (integer) 10802
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
15) 1) (integer) 10804
2) (integer) 10922
3) 1) "192.168.205.81"
2) (integer) 6379
3) "9d83c3a9be7ee1a12a38da33e260f4dbda059462"
4) 1) "192.168.205.82"
2) (integer) 6380
3) "30a9d2065538e59874eced14836c512b5a0d09f4"
16) 1) (integer) 10923
2) (integer) 16383
3) 1) "192.168.205.82"
2) (integer) 6379
3) "031d12ae9a6f15e7550e17916454b41fdeaa0bbe"
4) 1) "192.168.205.86"
2) (integer) 6380
3) "eeaa7bbb9e71c47d4c8c0fc67b188753910713e5"
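2. Work out which slots are missing
List the slot ranges that the cluster slots output above reports as assigned, then look for the gaps between them: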
0-5460
5461-6606
6608-6613
6615-7037
7039-7471
7473-8197
8199-8324
8326-8342
8344-8492
8494-9555
9557-9901
9903-9999
10001-10338
10340-10802
10804-10922
10923-16383
The missing slots are:
6607 6614 7038 7472 8198 8325 8343 8493 9556 9902 10000 10339 10803
That is exactly the 13 slots we were missing (16384 - 16371 = 13).
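The gap search above can also be done with a few lines of code instead of by eye. This is a minimal Python sketch, assuming the assigned ranges are copied by hand from the cluster slots output; the function name find_missing_slots is hypothetical and not part of any Redis tooling.

```python
TOTAL_SLOTS = 16384  # a Redis Cluster has slots 0..16383

# (start, end) pairs copied from the `cluster slots` output, both ends inclusive
covered_ranges = [
    (0, 5460), (5461, 6606), (6608, 6613), (6615, 7037),
    (7039, 7471), (7473, 8197), (8199, 8324), (8326, 8342),
    (8344, 8492), (8494, 9555), (9557, 9901), (9903, 9999),
    (10001, 10338), (10340, 10802), (10804, 10922), (10923, 16383),
]


def find_missing_slots(ranges, total=TOTAL_SLOTS):
    """Return the sorted list of slots not covered by any (start, end) range."""
    covered = set()
    for start, end in ranges:
        covered.update(range(start, end + 1))
    return [slot for slot in range(total) if slot not in covered]


missing = find_missing_slots(covered_ranges)
print(len(missing), missing)
```

For the ranges listed here it reports the same 13 slots as the manual count.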
3. The fix
3-1 Assign the missing slots to the current node
127.0.0.1:7000> cluster addslots 6607 6614 7038 7472 8198 8325 8343 8493 9556 9902 10000 10339 10803
OK
3-2 Check the result
127.0.0.1:7000> cluster info
cluster_state:ok
cluster_slots_assigned:16384
cluster_slots_ok:16384
cluster_slots_pfail:0
cluster_slots_fail:0
cluster_known_nodes:6
cluster_size:3
cluster_current_epoch:30
cluster_my_epoch:25
cluster_stats_messages_ping_sent:2831
cluster_stats_messages_pong_sent:686
cluster_stats_messages_publish_sent:98525
cluster_stats_messages_sent:102042
cluster_stats_messages_ping_received:686
cluster_stats_messages_pong_received:690
cluster_stats_messages_publish_received:96234
cluster_stats_messages_received:97610
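The before/after comparison of cluster info can be automated. The following Python sketch assumes the output has been captured as plain text (for example, redirected from redis-cli); parse_cluster_info and unassigned_slot_count are hypothetical helper names, and only a few fields from the output are used.

```python
def parse_cluster_info(text):
    """Parse `key:value` lines from `cluster info` output into a dict."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:  # skip blank or malformed lines
            info[key] = value
    return info


def unassigned_slot_count(info, total=16384):
    """Number of slots that no node currently claims."""
    return total - int(info["cluster_slots_assigned"])


# Excerpts of the two `cluster info` outputs shown in this post
before = parse_cluster_info(
    "cluster_state:fail\ncluster_slots_assigned:16371\ncluster_slots_ok:16371"
)
after = parse_cluster_info(
    "cluster_state:ok\ncluster_slots_assigned:16384\ncluster_slots_ok:16384"
)

print(before["cluster_state"], unassigned_slot_count(before))  # fail 13
print(after["cluster_state"], unassigned_slot_count(after))    # ok 0
```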
The missing slots are now assigned. Running a check on the cluster next showed that some slots were still in an open state:
./redis-trib.rb check 192.168.205.81:6379
>>> Check for open slots...
[WARNING] Node 192.168.205.81:6379 has slots in importing state (194,299,626,660,710,1274,1819,1838,2109,3237,3504,3647,3909,4044,4313,4453,4478,4706,4979).
[WARNING] Node 192.168.205.81:6380 has slots in importing state (6607,6614,8712,9902).
[WARNING] Node 192.168.205.82:6379 has slots in importing state (8712).
[WARNING] The following slots are open: 194,299,626,660,710,1274,1819,1838,2109,3237,3504,3647,3909,4044,4313,4453,4478,4706,4979,6607,6614,8712,9902
>>> Check slots coverage...
[OK] All 16384 slots covered.
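With this many open slots, it helps to group them by node before logging in to each one. The sketch below is Python and assumes the warning lines keep exactly the format shown in the check output above; the regular expression and the name open_slots_by_node are illustrative only, not part of redis-trib.rb.

```python
import re

# Matches lines such as:
# [WARNING] Node 192.168.205.81:6380 has slots in importing state (6607,6614).
OPEN_SLOT_RE = re.compile(
    r"\[WARNING\] Node (\S+) has slots in (importing|migrating) state \(([\d,\s]+)\)"
)


def open_slots_by_node(check_output):
    """Map (node, state) to the list of open slot numbers reported for it."""
    result = {}
    for match in OPEN_SLOT_RE.finditer(check_output):
        node, state, slots = match.groups()
        result[(node, state)] = [int(s) for s in slots.split(",")]
    return result


# Two of the warning lines from the check output above
sample = """\
[WARNING] Node 192.168.205.81:6380 has slots in importing state (6607,6614,8712,9902).
[WARNING] Node 192.168.205.82:6379 has slots in importing state (8712).
"""

for (node, state), slots in open_slots_by_node(sample).items():
    print(node, state, slots)
```

Each entry then tells you which node to connect to and which slot numbers to handle there.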
3-3 Log in to Redis on each of the nodes listed above, delete the corresponding slots, then add them back (shown here for the slots reported on 192.168.205.81:6379)
127.0.0.1:7000> cluster delslots 194 299 626 660 710 1274 1819 1838 2109 3237 3504 3647 3909 4044 4313 4453 4478 4706 4979
OK
127.0.0.1:7000> cluster addslots 194 299 626 660 710 1274 1819 1838 2109 3237 3504 3647 3909 4044 4313 4453 4478 4706 4979
OK
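Checking the cluster again after this produced no more warnings. At this point the Redis cluster is fully repaired.
For background on how Redis works, see: https://www.cnblogs.com/mengchunchen/p/10059436.html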