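Troubleshooting a Production Redis Crash: Snapshot (RDB) Persistence
1. What happened
In the morning I saw the problem in monitoring. I later learned that developers had been syncing a large batch of data, which is what brought the Redis nodes down. Production Redis is deployed on 3 servers, with 6 nodes arranged as 3 masters and 3 replicas.
2. Investigation
I connected to the production server and checked the processes. The process for the affected port was gone, so I restarted that Redis node directly: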
./redis-server /home/dpan/soft/redis-5.0.10/cluster-conf/6380/redis.conf &
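The process check described above can be sketched as a small helper. This is a minimal sketch, not the exact command used during the incident: missing_redis_ports is a hypothetical name, and it assumes redis-server shows up in the process listing with a title like "redis-server *:6380 [cluster]". If your build shows the config path instead, the pattern needs adjusting.

```shell
# Print each expected port that has no redis-server process.
# Reads a process listing on stdin (for example from `ps -eo args`).
# Assumption: the process title looks like "redis-server <bind>:<port> ...".
missing_redis_ports() {
  listing=$(cat)
  for port in "$@"; do
    if ! printf '%s\n' "$listing" | grep -Eq "redis-server .*:${port}( |\$)"; then
      echo "$port"
    fi
  done
}

# Usage on a live host (prints the ports whose node is not running):
#   ps -eo args | missing_redis_ports 6380 6381
```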
Then I found that both Redis nodes had gone down.
I checked the run log next and found that startup stalled at this point:
12361:C 23 Sep 2022 10:27:27.604 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
12361:C 23 Sep 2022 10:27:27.604 # Redis version=5.0.10, bits=64, commit=00000000, modified=0, pid=12361, just started
12361:C 23 Sep 2022 10:27:27.604 # Configuration loaded
12361:M 23 Sep 2022 10:27:27.605 * Increased maximum number of open files to 10032 (it was originally set to 1024).
12361:M 23 Sep 2022 10:27:27.606 * Node configuration loaded, I'm 9bb67931114d794bc2cbd2d30eb0d8c4fcd8ca84
12361:M 23 Sep 2022 10:27:27.607 * Running mode=cluster, port=6381.
12361:M 23 Sep 2022 10:27:27.607 # Server initialized
12361:M 23 Sep 2022 10:27:27.608 * Reading RDB preamble from AOF file...
12361:M 23 Sep 2022 10:28:12.122 * Reading the remaining AOF tail...
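Further checking showed that starting a single node took more than 10 GB of memory, so the two nodes could not both start and only one could run at a time. That is clearly abnormal. I first assumed that adjusting Redis's memory settings would fix it, but that made no difference.
Redis has two persistence mechanisms: snapshot persistence and AOF persistence.
Snapshot persistence is also called RDB persistence. It works by taking a snapshot: the in-memory data at a point in time is written to an .rdb file, and that file is loaded when the Redis service restarts (when AOF is enabled, Redis loads the AOF file at startup instead).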
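To put a number on the memory observation above, the used_memory field of INFO memory can be compared with what the host has free. The helpers below are a rough sketch: used_memory_mib and fits_in_memory are hypothetical names, and the available-memory figure has to come from your own host (for example from free -m).

```shell
# Print used_memory in MiB from `INFO memory` text on stdin.
# INFO output uses CRLF line endings, so strip the carriage returns first.
used_memory_mib() {
  tr -d '\r' | awk -F: '$1 == "used_memory" { printf "%d\n", $2 / 1048576 }'
}

# Succeed if node_count nodes of node_mib each fit into avail_mib.
# args: node_mib node_count avail_mib
fits_in_memory() {
  [ $(( $1 * $2 )) -le "$3" ]
}

# Usage on a live host:
#   redis-cli -p 6381 info memory | used_memory_mib
```

If one node already needs about 10 GB, fits_in_memory makes it easy to see whether two nodes can share the same server at all.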
3. Resolution
Snapshot persistence is enabled by default in Redis. The related options are in the redis.conf configuration file:
################################ SNAPSHOTTING ################################
#
# Save the DB on disk:
#
# save <seconds> <changes>
#
# Will save the DB if both the given number of seconds and the given
# number of write operations against the DB occurred.
#
# In the example below the behaviour will be to save:
# after 900 sec (15 min) if at least 1 key changed
# after 300 sec (5 min) if at least 10 keys changed
# after 60 sec if at least 10000 keys changed
#
# Note: you can disable saving completely by commenting out all "save" lines.
#
# It is also possible to remove all the previously configured save
# points by adding a save directive with a single empty string argument
# like in the following example:
#
# save ""
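# Take a snapshot if at least 1 key changed within 900 seconds
save 900 1
# Take a snapshot if at least 10 keys changed within 300 seconds
save 300 10
# Take a snapshot if at least 10000 keys changed within 60 seconds
save 60 10000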
# By default Redis will stop accepting writes if RDB snapshots are enabled
# (at least one save point) and the latest background save failed.
# This will make the user aware (in a hard way) that data is not persisting
# on disk properly, otherwise chances are that no one will notice and some
# disaster will happen.
#
# If the background saving process will start working again Redis will
# automatically allow writes again.
#
# However if you have setup your proper monitoring of the Redis server
# and persistence, you may want to disable this feature so that Redis will
# continue to work as usual even if there are problems with disk,
# permissions, and so forth.
# Whether to stop accepting writes after a snapshot fails
stop-writes-on-bgsave-error yes
# Compress string objects using LZF when dump .rdb databases?
# For default that's set to 'yes' as it's almost always a win.
# If you want to save some CPU in the saving child set it to 'no' but
# the dataset will likely be bigger if you have compressible values or keys.
# Whether to compress the snapshot file
rdbcompression yes
# Since version 5 of RDB a CRC64 checksum is placed at the end of the file.
# This makes the format more resistant to corruption but there is a performance
# hit to pay (around 10%) when saving and loading RDB files, so you can disable it
# for maximum performances.
#
# RDB files created with checksum disabled have a checksum of zero that will
# tell the loading code to skip the check.
# Whether to verify the data with a checksum
rdbchecksum yes
# The filename where to dump the DB
# File name of the snapshot file
dbfilename dump.rdb
# The working directory.
#
# The DB will be written inside this directory, with the filename specified
# above using the 'dbfilename' configuration directive.
#
# The Append Only File will also be created inside this directory.
#
# Note that you must specify a directory here, not a file name.
# Directory where the snapshot file is stored
dir ./
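By default the snapshot file is named dump.rdb and is written to the directory set by dir. With dir ./ that is the directory redis-server was started from, which is often, but not necessarily, the directory that holds the conf file.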
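The save points in the listing above boil down to a simple rule: a snapshot is due once any one "seconds changes" pair is satisfied. Here is a simplified model of that rule, assuming nothing beyond what the config comments say. should_snapshot is a hypothetical helper, and the exact boundary handling inside Redis may differ.

```shell
# Succeed if any "<seconds> <changes>" save point is satisfied.
# args: elapsed_seconds changed_keys, followed by pairs such as 900 1 300 10 60 10000
should_snapshot() {
  elapsed=$1
  changes=$2
  shift 2
  while [ $# -ge 2 ]; do
    # Both conditions of one save point must hold at the same time.
    if [ "$elapsed" -ge "$1" ] && [ "$changes" -ge "$2" ]; then
      return 0
    fi
    shift 2
  done
  return 1
}

# Usage: 120 seconds since the last save, 5 keys changed, default save points
#   should_snapshot 120 5 900 1 300 10 60 10000 && echo "snapshot due"
```

A bulk data sync changes far more than 10000 keys per minute, so under these defaults it keeps the 60-second save point firing for as long as the sync runs.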
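Pros and cons
Pros
An RDB file is a compact single file that captures the Redis data at a point in time, which makes it well suited to backups. You can archive RDB files at set times and later restore the data to any of those versions with little effort.
RDB is well suited to disaster recovery: a single file is easy to transfer to a remote server.
RDB performs well. When data needs to be persisted, the main process forks a child process and hands the persistence work to it, so the main process does no related I/O itself.
Compared with AOF, RDB starts up faster when the dataset is large.
Cons
RDB can easily lose data. If a snapshot is saved every 5 minutes and Redis stops working for some reason, everything written between the last snapshot and the failure is lost.
RDB uses fork() to create the child process that persists the data. With a large dataset the fork can take a while and stall Redis for a few milliseconds. With a very large dataset and a slow CPU, the stall can even reach 1 second.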
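Disabling snapshot persistence in the configuration
#1. Comment out all save directives in the redis.conf configuration file
#2. Add this directive after the last save line
save ""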
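The two config steps above can be scripted so they are applied identically to every node's conf file. This is a sketch under stated assumptions: disable_rdb_snapshots is a hypothetical helper name, and it assumes the save directives start at the beginning of a line, optionally indented.

```shell
# Comment out every active "save" directive and append: save ""
# Keeps a backup copy next to the file as <conf>.bak.
# arg: path to a redis.conf file
disable_rdb_snapshots() {
  conf=$1
  cp "$conf" "$conf.bak" || return 1
  # Lines already commented out start with "#", so they are left alone.
  sed -E 's/^([[:space:]]*save[[:space:]].*)$/# \1/' "$conf.bak" > "$conf" || return 1
  printf 'save ""\n' >> "$conf"
}

# Usage:
#   disable_rdb_snapshots /home/dpan/soft/redis-5.0.10/cluster-conf/6380/redis.conf
```

On a node that is still running, CONFIG SET save "" (for example via redis-cli -p 6380 config set save "") should apply the same setting without a restart. The conf file still needs the change so it survives the next restart.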
Final fix: disable snapshot persistence, delete dump.rdb, and then restart the Redis service on all nodes:
./redis-server /home/dpan/soft/redis-5.0.10/cluster-conf/6380/redis.conf &
./redis-server /home/dpan/soft/redis-5.0.10/cluster-conf/6381/redis.conf &
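After the restart it is worth confirming that the cluster sees every node as healthy again. This check was not part of the original steps. failed_cluster_nodes is a hypothetical helper that assumes the standard CLUSTER NODES layout, where the second field is the node address and the third is the comma-separated flag list.

```shell
# Print "address flags" for every node flagged fail or fail? in
# `CLUSTER NODES` output read from stdin.
failed_cluster_nodes() {
  awk '$3 ~ /(^|,)fail[?]?(,|$)/ { print $2, $3 }'
}

# Usage on a live host (no output means no node is marked as failed):
#   redis-cli -p 6380 cluster nodes | failed_cluster_nodes
```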