The error in MySQL InnoDB Cluster (Group Replication) when not-yet-applied transactions have been lost from the master's binary log

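In a Single Primary MySQL InnoDB Cluster (Group Replication), I detached one of the servers acting as a slave (secondary) and then tested what happens when it tries to replicate again after the binary logs holding the transactions committed since the detachment are gone (cannot be read).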

The error during Group Replication

Below is the error that occurred. Naturally, Group Replication could not be established.

:
2018-05-11T15:58:34.908319+09:00 0 [ERROR] Plugin group_replication reported: 'Group contains 2 members which is greater than group_replication_auto_increment_increment value of 1. This can lead to an higher rate of transactional aborts.'
2018-05-11T15:58:34.908768+09:00 1215 [Note] Plugin group_replication reported: 'This server is working as secondary member with primary member address mysql01:3306.'
2018-05-11T15:58:34.909234+09:00 0 [Note] Plugin group_replication reported: 'Group membership changed to mysql02:3306, mysql01:3306 on view 15260218991937581:2.'
2018-05-11T15:58:34.909316+09:00 1229 [Note] Plugin group_replication reported: 'Establishing group recovery connection with a possible donor. Attempt 1/10'
2018-05-11T15:58:34.922418+09:00 1229 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='mysql01', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''.
2018-05-11T15:58:34.934405+09:00 1229 [Note] Plugin group_replication reported: 'Establishing connection to a group replication recovery donor ********-****-****-****-************ at mysql01 port: 3306.'
2018-05-11T15:58:34.934960+09:00 1231 [Warning] Storing MySQL user name or password information in the master info repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START SLAVE; see the 'START SLAVE Syntax' in the MySQL Manual for more information.
2018-05-11T15:58:34.936904+09:00 1232 [Note] Slave SQL thread for channel 'group_replication_recovery' initialized, starting replication in log 'FIRST' at position 0, relay log './mysql02-relay-bin-group_replication_recovery.000001' position: 4
2018-05-11T15:58:34.966882+09:00 1231 [Note] Slave I/O thread for channel 'group_replication_recovery': connected to master 'mysql_innodb_cluster_r**********@mysql01:3306',replication started in log 'FIRST' at position 4
2018-05-11T15:58:34.974048+09:00 1231 [ERROR] Error reading packet from server for channel 'group_replication_recovery': The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires. (server_errno=1236)
2018-05-11T15:58:34.974079+09:00 1231 [ERROR] Slave I/O for channel 'group_replication_recovery': Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.', Error_code: 1236
2018-05-11T15:58:34.974085+09:00 1231 [Note] Slave I/O thread exiting for channel 'group_replication_recovery', read up to log 'FIRST', position 4
2018-05-11T15:58:34.974115+09:00 1229 [Note] Plugin group_replication reported: 'Terminating existing group replication donor connection and purging the corresponding logs.'
2018-05-11T15:58:34.974159+09:00 1232 [Note] Error reading relay log event for channel 'group_replication_recovery': slave SQL thread was killed
2018-05-11T15:58:35.012419+09:00 1229 [Note] 'CHANGE MASTER TO FOR CHANNEL 'group_replication_recovery' executed'. Previous state master_host='mysql01', master_port= 3306, master_log_file='', master_log_pos= 4, master_bind=''. New state master_host='', master_port= 0, master_log_file='', master_log_pos= 4, master_bind=''.
2018-05-11T15:58:35.024756+09:00 1229 [Note] Plugin group_replication reported: 'Retrying group recovery connection with another donor. Attempt 2/10'
:

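Group Replication in MySQL 5.7 replicates using GTIDs. With GTID-based replication you do not have to specify a binary log position in CHANGE MASTER TO: from the GTID information the slave sends, the master works out automatically which position to start sending the binary log from. The GTIDs of executed transactions are stored persistently in the gtid_executed variable (the mysql.gtid_executed table), but the transactions themselves are recorded only in the binary log. So if the binary logs containing the transactions the slave still needs cannot be read, setting up replication fails with an error. Incidentally, the GTIDs of transactions that have already been removed from the binary log are recorded in the gtid_purged variable.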
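The condition behind error 1236 can be sketched as a set comparison: if the donor's gtid_purged contains any GTID that is not in the joiner's gtid_executed, auto-positioning cannot work. The following is only a minimal illustration in Python, not MySQL's implementation; the function names and the UUID are hypothetical, and it assumes GTID sets in the usual `uuid:1-5:7` text form. On a real server the same check can be made with the built-in SQL functions GTID_SUBSET() and GTID_SUBTRACT().

```python
def parse_gtid_set(text):
    """Parse 'uuid:1-5:7,uuid2:3' into {uuid: set of transaction numbers}."""
    result = {}
    for part in text.replace("\n", "").split(","):
        part = part.strip()
        if not part:
            continue
        uuid, *intervals = part.split(":")
        numbers = result.setdefault(uuid.lower(), set())
        for interval in intervals:
            start, _, end = interval.partition("-")
            # A single number such as '7' is an interval of length one.
            numbers.update(range(int(start), int(end or start) + 1))
    return result


def gtid_subtract(left, right):
    """Return the GTIDs in `left` that are not in `right`."""
    left_set, right_set = parse_gtid_set(left), parse_gtid_set(right)
    diff = {}
    for uuid, numbers in left_set.items():
        remaining = numbers - right_set.get(uuid, set())
        if remaining:
            diff[uuid] = remaining
    return diff


def can_auto_position(donor_gtid_purged, joiner_gtid_executed):
    """True if everything the donor has purged is already on the joiner."""
    return not gtid_subtract(donor_gtid_purged, joiner_gtid_executed)


# Hypothetical values: the donor purged transactions 1-100,
# but the joiner was detached after applying only 1-60.
GROUP_UUID = "aaaaaaaa-aaaa-aaaa-aaaa-aaaaaaaaaaaa"
print(can_auto_position(f"{GROUP_UUID}:1-100", f"{GROUP_UUID}:1-60"))
```

In this example transactions 61-100 exist only in binary logs that are already gone, which is the situation the log above reports.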

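At this point the only option seemed to be to take a full dump from another server with mysqldump or similar, run RESET MASTER on the detached server, and then import the dump, including the tables of the mysql database. I tried that, and the server recovered. Copying the master's data directory by some means should also work, but in that case auto.cnf, which contains the server UUID, has to be deleted. Incidentally, the official documentation describes a method that uses Enterprise Backup.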

MySQL :: MySQL 5.7 Reference Manual :: 17.4.4 Using MySQL Enterprise Backup with Group Replication
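The dump-and-import route described above can be sketched as three commands. The Python snippet below only builds and prints them; it does not run anything. The option set and the dump path are assumptions of mine, not the exact commands used in this test, and credentials are omitted. mysql01 and mysql02 are the donor and the joiner from the log. Depending on the state of the joiner, further steps (for example stopping Group Replication or turning off read-only mode) may be needed and are not shown.

```python
import shlex


def build_recovery_steps(donor_host, joiner_host, dump_path):
    """Return the recovery commands as argument lists, in the order to run them."""
    dump = [
        "mysqldump",
        f"--host={donor_host}",
        "--all-databases",        # includes the mysql system database
        "--single-transaction",   # consistent snapshot for InnoDB tables
        "--routines",
        "--events",
        "--set-gtid-purged=ON",   # writes SET @@GLOBAL.GTID_PURGED into the dump
        f"--result-file={dump_path}",
    ]
    # GTID_PURGED can only be set while the joiner's gtid_executed is empty,
    # so the joiner's GTID history is cleared first.
    reset = ["mysql", f"--host={joiner_host}", "--execute=RESET MASTER"]
    load = ["mysql", f"--host={joiner_host}", f"--execute=source {dump_path}"]
    return [dump, reset, load]


# Hypothetical dump path; host names are taken from the log excerpt.
for step in build_recovery_steps("mysql01", "mysql02", "/tmp/full_dump.sql"):
    print(shlex.join(step))
```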

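Conclusion

I wrote about the error that occurs in MySQL InnoDB Cluster (Group Replication) when transactions that have not yet been applied have been lost from the master's binary log. Don't put off recovery.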
