Tag Archives: replication

MySQL Slave monitor in Bash

I have several slave setups for different sites, that I use for several purposes, like: taking packups from the slave, or using it for reporting/read only operations for masters that have high write query rates.

But, in the last week I found that a couple of the slaves have been stop for some time without me noticing that (one of them was not working for about a month), so that mean that I don’t have valid packups for the last month! So I thought that I must automate monitoring the slave, so I could receive notification if the slave stopped working for a reason or another.

Why Slave Stop working?
MySQL replication works to synchronize master data to slave database in real time asynchronously, so the master data is copied as it is to the slave through executing the same write queries (stored in the master BinLog) to the slave, so any error that occur in one of these queries may cause the slave to halt.

For example: one time a client of mine tried to submit very large article in his news portal, so in order to get this working I increased the value of max_allowed_packet directive without applying the same configuration to the slave, and that caused the slave SQL_THREAD to stop working.

How to check slave is working or not?
The basic command to check the status of your slave is

SHOW SLAVE STATUS\G

this command will give you summery information about your slave, the most useful pieces of information here is:

  1. Slave_SQL_Running : indicates whether the thread responsible for writing SQL queries is running normally (any error on one of the queries that it executes will stop the thread and it will set value for Last_SQL_Error and Last_SQL_Errno)
  2. Slave_IO_Running: indicates whether the thread responsible for receiving commands sent from master and write them to relay log in the slave is working.
  3. Seconds_Behind_Master: how much the slave is delayed from the master (usually it should minimum, if this value is continuously increasing, this indicates a problem even if both SQL_THREAD and IO_THREAD are running)

So, I searched on the internet to see if there are some scripts that perform that check, and I found a couple of useful links but it didn’t contain all the functionality that I needed, so I wrote my own bash script that monitor the slave and send an alert once there is problem in replicating data from the master.
The script below do the following:

  1. extracts the information about slave using SHOW SLAVE STATUS command in mysql
  2. checks the problems about slave (using the three piece of information discussed above), and write them to a log file in the tmp directory
  3. if there is a problem, send the contents of the log file to the DBA in order to for him to fix.
#!/bin/bash
server_name="Slave #1 - Test DB" #change this in order to indicate which slave you are monitoring now
admin_email='zaid@wewebit.com' #email of the database administrator to recieve notification

# change mysql credentials in the following commands if you running monitor using a user other than root
sql_thread_running=$(mysql -e "show slave status\G" | awk -F":" '/Slave_SQL_Running/ { print $2 }' | tr -d ' ')
io_thread_running=$(mysql -e "show slave status\G" | awk -F":" '/Slave_IO_Running/ { print $2 }' | tr -d ' ')
seconds_late=$(mysql -e "show slave status\G" | awk -F":" '/Seconds_Behind_Master/ { print $2 }' | tr -d ' ')
seconds_late=$(($seconds_late+0))

if [ "$sql_thread_running" = "No" ] || [ "$io_thread_running" = "No" ] || [ $seconds_late -gt 3600 ]; then

log_file="/tmp/log_slave_status_$(date +%m-%d-%Y-%H:%M)"
echo "Slave status report on $(date +%m-%d-%Y-%H:%M)" >> $log_file
echo "Error in slave on $server_name" >> $log_file
if [ "$sql_thread_running" = "No" ]; then
echo "SQL Thread not running" >> $log_file
fi

if [ "$io_thread_running" = "No" ]; then
echo "IO thread not running" >> $log_file
fi

if [ $seconds_late -gt 3600 ]; then #formattting how the latency of the slave behind master should be displayed
display_late="$seconds_late seconds"
if [ $seconds_late -gt 60 ]; then
display_late="$display_late = $(($seconds_late/60)) minutes"
fi

if [ $seconds_late -gt 3600 ]; then
display_late="$display_late = $(($seconds_late/3600)) hours"
fi

echo "slave is behind master by $display_late" >> $log_file
fi

#echo send alerts to the admin
mail -s "Slave status on $server_name" $admin_email < $log_file
echo "Slave not running, alerts sent to the admins..."
else
echo "slave is running normally, no problem detected :)"
fi

you can download the bash script from here

 

Put bash in crontab
The final step is to put this simple script in crontab in order to check the status periodically,
just open crontab file using

crontab -e

And I added it to run each half an hour, as follows:

# some environments require that you set the shell path inside crontab in order to run properly
SHELL=/bin/bash
PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
*/30 * * * * /usr/local/sbin/monitor_slave.sh

 

Now, whenever there is problem in one of my slave, a message will be delivered to my email giving me a summery information about the problem, so I am not worried anymore about my slave(s) status.