Galera Cluster & MySQL signal 6 loop

Few weeks ago I encountered one of the most bizzare issues I ever got to debug. On my testing Galera Cluster, one of the nodes failed and refused to join back the cluster. What I initially observed was the MySQL process looping with the error message that was caught in /var/log/mysql/error.log:

[...]
2023-02-03T11:58:39.629737Z 0 [ERROR] [MY-010147] [Server] Too many arguments (first extra is 'file').
2023-02-03T11:58:39.630568Z 0 [ERROR] [MY-010119] [Server] Aborting
terminate called after throwing an instance of 'wsrep::runtime_error'
  what():  State wait was interrupted
11:58:41 UTC - mysqld got signal 6 ;
Most likely, you have hit a bug, but this error can also be caused by malfunctioning hardware.
Thread pointer: 0x7fc730004900
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 7fc74d949b58 thread_stack 0x100000
/usr/sbin/mysqld(my_print_stacktrace(unsigned char const*, unsigned long)+0x41) [0x5616ca5a3cd1]
/usr/sbin/mysqld(handle_fatal_signal+0x313) [0x5616c93f2e43]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x143c0) [0x7fc85cf2a3c0]
/lib/x86_64-linux-gnu/libc.so.6(gsignal+0xcb) [0x7fc85c60203b]
/lib/x86_64-linux-gnu/libc.so.6(abort+0x12b) [0x7fc85c5e1859]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0x9e911) [0x7fc85c9b9911]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa38c) [0x7fc85c9c538c]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa3f7) [0x7fc85c9c53f7]
/lib/x86_64-linux-gnu/libstdc++.so.6(+0xaa6a9) [0x7fc85c9c56a9]
/usr/sbin/mysqld(+0xf0501e) [0x5616c913f01e]
/usr/sbin/mysqld(wsrep::server_state::sst_received(wsrep::client_service&, int)+0xf55) [0x5616cad18645]
/usr/sbin/mysqld(+0x11e2ab6) [0x5616c941cab6]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x8609) [0x7fc85cf1e609]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x43) [0x7fc85c6de163]

Trying to get some variables.
Some pointers may be invalid and cause the dump to abort.
Query (0): Connection ID (thread ID): 3
Status: NOT_KILLED

The manual page at [http://dev.mysql.com/doc/mysql/en/crashing.html](http://dev.mysql.com/doc/mysql/en/crashing.html) contains
information that should help you find out what is causing the crash.
[...]

It was looping in that state and failing to properly get going, start SST, join the cluster etc. systemd is set to restart on-failure, and as it was failing, it was being restarted constantly.

What I eventually observed was that when mysqld got started manually (i.e. outside of systemd control), it wasn't failing. So I went back to the message above and this part in particular started bothering me: [ERROR] [MY-010147] [Server] Too many arguments (first extra is 'file'). Usually you get this when you pass too many (duh!) or wrong arguments to mysqld — but we are not passing anything, are we? systemd is handling this for us, right? Right?!

Yes and no. I had a quick look onto a healthy instance to check status of mysql service:

mysql.service - MySQL Wsrep Server
     Loaded: loaded (/lib/systemd/system/mysql.service; disabled; vendor preset: enabled)
    Drop-In: /etc/systemd/system/mysql.service.d
             └─limits.conf
     Active: active (running) since Fri 2023-02-03 16:09:22 GMT; 2 days ago
       Docs: man:mysqld(8)
             http://dev.mysql.com/doc/refman/en/using-systemd.html
   Main PID: 131915 (mysqld)
      Tasks: 57 (limit: 4677)
     Memory: 2.5G
     CGroup: /system.slice/mysql.service
             └─131915 /usr/sbin/mysqld --wsrep_start_position=3c10debe-3db1-11ed-9b39-1e0592f0bc56:878002

OK, so there is, in fact, --wsrep-start_position being passed. What are we passing then on the unhealthy node?

[...]
─101854 /usr/sbin/mysqld --wsrep_start_position=Binary file /var/log/mysql/error.log matches

You... WHAT??? Well that's not right! How is this passed then? Let's have a look into the systemd service file in /lib/systemd/system/mysql.service:

[...]
[Service]
User=mysql
Group=mysql
Type=exec
PermissionsStartOnly=true
ExecStartPre=/usr/share/mysql-8.0/mysql-systemd-start pre
ExecStart=/usr/sbin/mysqld $MYSQLD_OPTS $MYSQLD_RECOVER_START
TimeoutSec=0
LimitNOFILE = 10000
Restart=on-failure
RestartPreventExitStatus=1
[...]

There's a $MYSQLD_OPTS and $MYSQLD_RECOVER_START coming from... Somewhere. Initially I assumed it must be from /etc/default/mysql as this is were such things used to reside. But it wasn't the case — here they are passed from the ExecStartPre, so a simple bash script that runs before the main mysqld process is being started. What's inside, you ask?

. /usr/share/mysql-8.0/mysql-helpers

sanity () {
# Make sure database and required directories exist
	verify_ready $1
	verify_database $1

	/lib/apparmor/profile-load usr.sbin.mysqld

	if [ ! -r /etc/mysql/my.cnf ]; then
		echo "MySQL configuration not found at /etc/mysql/my.cnf. Please install one."
		exit 1
	fi
	# wsrep: Read and export recovery position
	get_recover_pos $1
}

case $1 in
	"pre")  sanity $2 ;;
	# wsrep: Post clears recovery position from systemctl env
	"post") clear_recover_pos $2;;
esac

Oh yes, get_recover_pos $1 is exactly what we are after, aren't we? So onto another script, /usr/share/mysql-8.0/mysql-helpers:

[...]
# Wsrep recovery
get_recover_pos () {
    local instance=$1
    wsrep_start_position_opt=""
    err_log="$(get_mysql_option ${INSTANCE} log-error "/var/log/mysql/error.log")"
    data_dir="$(get_mysql_option mysqld datadir "/var/lib/mysql")"
    mysqld ${instance:+--defaults-group-suffix=@$instance} \
                     --datadir="${data_dir}" --user=mysql --log-error-verbosity=3 --wsrep_recover 2>> "${err_log}"

    local rp="$(grep '\[WSREP\] Recovered position:' "${err_log}" | tail -n 1)"
    if [ -z "$rp" ]; then
         local skipped="$(grep WSREP "${err_log}" | grep 'skipping position recovery')"
         if [ -z "$skipped" ]; then
             systemd-cat -t mysql -p err echo "WSREP: Failed to recover position:"
             exit 1
         else
             systemd-cat -t mysql -p info echo "WSREP: Position recovery skipped"
         fi
    else
         local start_pos="$(echo $rp | sed 's/.*\[WSREP\]\ Recovered\ position://' \
              | sed 's/^[ \t]*//')"
         systemd-cat -t mysql -p info echo "WSREP: Recovered position $start_pos"
         wsrep_start_position_opt="--wsrep_start_position=$start_pos"
    fi

    if [ -n "$start_pos" ]; then
        systemctl set-environment MYSQLD_RECOVER_START${instance:+_$instance}="$wsrep_start_position_opt"
    fi
}
[...]

There's a lot going in there, but all of the if statements need the $rp variable and this one is fetched like so: grep '\[WSREP\] Recovered position:' "${err_log}" | tail -n 1. Knowing what we know, ${err_log} has got to be /var/log/mysql/error.log, correct? Correct:

grep '\[WSREP\] Recovered position:' "/var/log/mysql/error.log" | tail -n 1
Binary file /var/log/mysql/error.log matches

Looks terribly familiar, hey?

Cool, so now we know what and how, but WHY was still killing me 😄 I grabbed the error.log file for further analysing and I noticed two things:

With BSD grep (default /usr/bin/grep in macOS for example) there was no such problem — file was treated as text one and everything was fine. (I did reproduce exactly the same behaviour with GNU grep though).
If I passed grep -a to enforce treating this file as text it also worked as expected.

While searching for reasons, I finally stumbled upon this blog post: A surprising reason grep may think a file is a binary file. And yeah, sure enough, there's a hole in the file:

[...]
openat(AT_FDCWD, "weird-corrupted-error-log-file-that-thinks-its-a-binary.log", O_RDONLY|O_NOCTTY) = 3
newfstatat(3, "", {st_mode=S_IFREG|0640, st_size=34342621, ...}, AT_EMPTY_PATH) = 0
read(3, "2022-03-25T17:39:01.846182Z 0 [S"..., 98304) = 98304
lseek(3, 98304, SEEK_HOLE)              = 34342621
[...]

Don't believe me? I actually have a picture of it!

Side note: you can find null pointers (i.e. holes) in (n)vim by typing / to search and then pressing ctrl + shift + 2 to find them. This works also in less and bunch of other coreutils too. You're welcome.

So what can cause such a hole in the file? If you ever looked into syslog after kernel panicked the above picture must look familiar — so yeah, kernel panic is surely one way. Depending on circumstances you may get it also with OOM killing ungracefully some process. I did have OOM killing mysqld process — that's what actually started the whole incident:

[...]
Feb  3 11:48:20 test-cluster-01 kernel: [23620731.370944] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/mysql.service,task=mysqld,pid=533068,uid=113
Feb  3 11:48:20 test-cluster-01 kernel: [23620731.371044] Out of memory: Killed process 533068 (mysqld) total-vm:9486148kB, anon-rss:3381832kB, file-rss:0kB, shmem-rss:0kB, UID:113 pgtables:8624kB oom_score_adj:0
Feb  3 11:48:20 test-cluster-01 kernel: [23620731.562123] oom_reaper: reaped process 533068 (mysqld), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[...]

It's a perfectly reasonable assumption to just finish here — but it wasn't the root cause of having null pointers in the file. If you'll look closely at the timestamps, hole in the file doesn't match the OOM event. That's not all, I also checked /var/log/syslog from that day to confirm my suspicion:

[...]
Feb  3 11:49:12 test-cluster-01 mysql-systemd-start[2827770]: Skipping profile in /etc/apparmor.d/disable: usr.sbin.mysqld
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@
^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@Feb  3 11:49:40 test-cluster-01 systemd-modules-load[366]: Inserted module 'msr'
Feb  3 11:49:40 test-cluster-01 systemd-sysctl[380]: Not setting net/ipv4/conf/all/promote_secondaries (explicit setting exists).
Feb  3 11:49:40 test-cluster-01 systemd-sysctl[380]: Not setting net/ipv4/conf/default/promote_secondaries (explicit setting exists).
[...]

At this point there were no doubts: I was the cause of this (you did not expect that, did you?!) 😬

You see, when test-cluster-01 started to have issues (i.e. went down due to OOM) I didn't have enough brain cycles to deal with it, being busy with other stuff — I should have just leave it there, but while I was already in my Cloud provider's panel doing other things I decided to just power cycle that machine. That power cycle turned out to be not a graceful reboot (like I thought), but a power cycle indeed, so it went down "for real" and I ended up with null pointers in my log files. And the rest... is history 😋

What caught me by surprise was the actual usage of grep to go through the MySQL's error.log in order to find WSREP recovered position — I still think it's weird. That said, I have it on my list to open PR against codership/mysql-wsrep to grep with -a flag passed, so that it never tries to be clever and just defaults to text when parsing.

UPDATE: issue and PR has been created.

Galera Cluster & MySQL signal 6 loop

Author

Read next

Aptly for own Debian repository

Terrible Shell Scripting: Contexts

Pocket ID Debian package

Dying UNIX News