Imagine this situation:
You are doing the sensible thing, testing to ensure that your DR plan works. But things go wrong. You restore to the production database, not the DR test. You restore to a point in time that was 3 months prior. Panic sets in, as you realise that all of the backups that you have on hand are all, stuck some three months ago.
Effectively you now have three problems that need to be solved in the following order:
- Restore the database to its most recent state.
- Make the backups functional again.
- Update the DR process.
This guide below serves to aid if in the future, I or someone else needs to attempt to restore a legacy MySQL server by using the binary log. In this instance, we were lucky from two perspectives:
- We had a functional OS level backup from 4:00 AM on the day of the DR test (that we could spin up and do a MySQL Dump from).
- We had a Binary Log, thanks to MySQL replication.
Process to restore from Binary Log
Pre-requisites:
mysqlbinlogversion that matches your production MySQL version. (In my case, it was 5.7.44)- Copies of the binary logs. In my Docker setup, they were located at
/var/log/mysql/mysql-bin.* - A recent known “good backup” with the timestamp in UTC.
- The timeframe known to be good (i.e., before the problematic transaction).
Steps:
- Copy the binary logs onto your windows machine.
- Extract the binary:
mysql-5.7.44-winx64.zip\mysql-5.7.44-winx64\bin\mysqlbinlog.exe - Place in the same location as the binary logs.
- Determine which binary log is likely to have the time span of transactions that need to be recovered.
- Navigate to the directory that the mysqlbinlogs.exe is in, and execute the command:
.\mysqlbinlog.exe --verbose --force-read --start-datetime="" --stop-datetime="" mysql-bin.000001 > missing_transactions_utf16.sql
Complete example:.\mysqlbinlog.exe --verbose --force-read --start-datetime="2024-12-06 04:00:00" --stop-datetime="2024-12-06 16:00:00" mysql-bin.000001 > missing_transactions_utf16.sql - Note that this produces a file with Windows Line endings (LF), and in UTF-16 LE BOM. Both need to be converted in a Linux machine.
- Copy the .sql file to the target Linux machine.
- In the Linux machine, execute the following commands:
Line ending fix:sed 's/\r$//' missing_transactions_utf16.sql > fixed_missing_transactions_utf16.sql - Now we need to correct the file encoding to what MySQL can understand.
Encoding fix:iconv -f UTF-16LE -t UTF-8 fixed_missing_transactions_utf16.sql -o missing_transactions_utf8.sql - We are now ready to import the known good backup and apply the missing transactions.
- Using the regular steps to restore from the known good backup.
- Restore the transactions with:
mysql --binary-mode=1 --default-character-set=utf8 --max-allowed-packet=64M -u <user> -p < missing_transactions_utf8.sql
Learnings
From this incident we have made a few changes to the BCDR Policy Implementation, and the overall process for DR testing.
- Ensure that Binary Logs, or Write Ahead Logs are enabled and collected as part of a regular backup process.
- When performing a DR test, deploy to separate infrastructure.
- Keep an archive of all the tooling that is required for any legacy systems. This may mean capturing the installer for multiple platforms, and any other regular tools that are needed.
- Ensure that automated processes are in place to check that the replication is working as expected.
- Ensure that automated processes are in place to check that the backups are occurring as expected.
