I am sorry for the unscheduled downtime this morning, Friday 25th of May, around 8 am UTC. A scheduled kernel upgrade of the servers did not go as expected. The kernel upgrade went correctly on the slave, including the reboot and resync, but the master failed to come back up. For data safety reasons, we performed a backup of the slave before promoting it to master and switching the application to use the new master. This backup took longer than expected and is what caused the long downtime.
What is next?
I will keep you informed. Sorry for the inconvenience; sometimes issues happen, and this one took me by surprise.
Update: After reviewing the logs, the most likely culprit is the combination of a VM restart and a hardware node restart, including a KVM upgrade.
May 25, 2012
© Céondo Ltd, 2007-2013. All rights reserved.