
Crash 95% of way through 18TB drive rebuild on 6.12.0



As per the title, my server died around the time it was due to complete a disk rebuild. Luckily I already had the syslog server enabled. There are a large number of lines like this in the log around the time it crashed:

 

Jun 17 17:26:00 Tower php-fpm[6964]: [WARNING] [pool www] child 18572 exited on signal 9 (SIGKILL) after 13.448421 seconds from start

 

Below are the syslog lines from before and after the crash. I forced a shutdown at about 18:05 after seeing that the server was unresponsive and stuck on unRAID boot. Any ideas?

 

Jun 17 16:29:44 Tower emhttpd: spinning down /dev/sdf
Jun 17 16:30:34 Tower emhttpd: spinning down /dev/sdr
Jun 17 16:30:34 Tower emhttpd: spinning down /dev/sdq
Jun 17 16:31:08 Tower emhttpd: spinning down /dev/sds
Jun 17 16:31:08 Tower emhttpd: spinning down /dev/sdn
Jun 17 16:31:24 Tower emhttpd: spinning down /dev/sdd
Jun 17 16:31:37 Tower emhttpd: spinning down /dev/sdk
Jun 17 16:31:39 Tower emhttpd: spinning down /dev/sdh
Jun 17 16:31:40 Tower emhttpd: spinning down /dev/sdl
Jun 17 16:31:46 Tower emhttpd: spinning down /dev/sdj
Jun 17 16:31:52 Tower emhttpd: spinning down /dev/sdo
Jun 17 16:31:53 Tower emhttpd: spinning down /dev/sdp
Jun 17 16:32:06 Tower kernel: mdcmd (58): set md_write_method 0
Jun 17 16:32:06 Tower kernel: 
Jun 17 16:32:22 Tower emhttpd: spinning down /dev/sdi
Jun 17 16:43:03 Tower emhttpd: spinning down /dev/sde
Jun 17 16:56:09 Tower emhttpd: read SMART /dev/sdn
Jun 17 16:56:14 Tower emhttpd: read SMART /dev/sdj
Jun 17 16:56:23 Tower emhttpd: read SMART /dev/sdd
Jun 17 16:56:44 Tower emhttpd: read SMART /dev/sdq
Jun 17 16:56:52 Tower emhttpd: read SMART /dev/sdl
Jun 17 16:57:05 Tower emhttpd: read SMART /dev/sds
Jun 17 17:01:15 Tower emhttpd: read SMART /dev/sde
Jun 17 17:21:22 Tower emhttpd: read SMART /dev/sdo
Jun 17 17:21:33 Tower emhttpd: read SMART /dev/sdf
Jun 17 17:21:55 Tower emhttpd: read SMART /dev/sdk
Jun 17 17:21:55 Tower emhttpd: read SMART /dev/sdh
Jun 17 17:21:55 Tower emhttpd: read SMART /dev/sdr
Jun 17 17:21:55 Tower emhttpd: read SMART /dev/sdi
Jun 17 17:22:09 Tower kernel: mdcmd (59): set md_write_method 1
Jun 17 17:22:09 Tower kernel: 
Jun 17 17:26:00 Tower php-fpm[6964]: [WARNING] [pool www] child 18572 exited on signal 9 (SIGKILL) after 13.448421 seconds from start
Jun 17 17:26:02 Tower php-fpm[6964]: [WARNING] [pool www] child 18573 exited on signal 9 (SIGKILL) after 15.246233 seconds from start
Jun 17 17:26:05 Tower php-fpm[6964]: [WARNING] [pool www] child 18581 exited on signal 9 (SIGKILL) after 17.519854 seconds from start
Jun 17 17:26:16 Tower php-fpm[6964]: [WARNING] [pool www] child 19423 exited on signal 9 (SIGKILL) after 13.088010 seconds from start
Jun 17 17:26:27 Tower php-fpm[6964]: [WARNING] [pool www] child 19447 exited on signal 9 (SIGKILL) after 17.427857 seconds from start
Jun 17 17:26:44 Tower php-fpm[6964]: [WARNING] [pool www] child 19483 exited on signal 9 (SIGKILL) after 31.906754 seconds from start
Jun 17 17:26:57 Tower php-fpm[6964]: [WARNING] [pool www] child 19574 exited on signal 9 (SIGKILL) after 37.215285 seconds from start
Jun 17 17:27:17 Tower php-fpm[6964]: [WARNING] [pool www] child 19609 exited on signal 9 (SIGKILL) after 40.251549 seconds from start
Jun 17 17:27:30 Tower php-fpm[6964]: [WARNING] [pool www] child 19636 exited on signal 9 (SIGKILL) after 38.833944 seconds from start
Jun 17 17:27:42 Tower php-fpm[6964]: [WARNING] [pool www] child 19705 exited on signal 9 (SIGKILL) after 33.462388 seconds from start
Jun 17 17:28:07 Tower php-fpm[6964]: [WARNING] [pool www] child 19813 exited on signal 9 (SIGKILL) after 34.387948 seconds from start
Jun 17 17:28:16 Tower php-fpm[6964]: [WARNING] [pool www] child 20349 exited on signal 9 (SIGKILL) after 39.641391 seconds from start
Jun 17 17:28:19 Tower php-fpm[6964]: [WARNING] [pool www] child 20372 exited on signal 9 (SIGKILL) after 35.862451 seconds from start
Jun 17 17:28:37 Tower php-fpm[6964]: [WARNING] [pool www] child 20547 exited on signal 9 (SIGKILL) after 15.529951 seconds from start
Jun 17 17:28:44 Tower php-fpm[6964]: [WARNING] [pool www] child 20553 exited on signal 9 (SIGKILL) after 20.183873 seconds from start
Jun 17 17:29:23 Tower php-fpm[6964]: [WARNING] [pool www] child 21145 exited on signal 9 (SIGKILL) after 32.638050 seconds from start
Jun 17 17:29:33 Tower php-fpm[6964]: [WARNING] [pool www] child 21154 exited on signal 9 (SIGKILL) after 41.826269 seconds from start
Jun 17 17:30:17 Tower php-fpm[6964]: [WARNING] [pool www] child 21239 exited on signal 9 (SIGKILL) after 44.186443 seconds from start
Jun 17 17:30:40 Tower php-fpm[6964]: [WARNING] [pool www] child 21298 exited on signal 9 (SIGKILL) after 55.225397 seconds from start
Jun 17 17:31:09 Tower php-fpm[6964]: [WARNING] [pool www] child 21849 exited on signal 9 (SIGKILL) after 40.263303 seconds from start
Jun 17 17:31:35 Tower php-fpm[6964]: [WARNING] [pool www] child 21906 exited on signal 9 (SIGKILL) after 48.283361 seconds from start
Jun 17 17:35:30 Tower nginx: 2023/06/17 17:28:20 [error] 7092#7092: *1041948 recv() failed (104: Connection reset by peer) while reading response header from upstream, client: 10.0.0.67, server: 10-0-0-9.xxx.myunraid.net, request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/2.0", subrequest: "/auth-request.php", upstream: "fastcgi://unix:/var/run/php5-fpm.sock:", host: "10-0-0-9.xxx.myunraid.net", referrer: "https://10-0-0-9.xxx.myunraid.net/Main"
Jun 17 17:35:38 Tower nginx: 2023/06/17 17:28:20 [error] 7092#7092: *1041948 auth request unexpected status: 502 while sending to client, client: 10.0.0.67, server: 10-0-0-9.xxx.myunraid.net, request: "POST /plugins/unassigned.devices/UnassignedDevices.php HTTP/2.0", host: "10-0-0-9.xxx.myunraid.net", referrer: "https://10-0-0-9.xxx.myunraid.net/Main"
Jun 17 17:42:37 Tower php-fpm[6964]: [WARNING] [pool www] child 20482 exited on signal 9 (SIGKILL) after 862.776645 seconds from start
Jun 17 17:43:35 Tower php-fpm[6964]: [WARNING] [pool www] child 23524 exited on signal 9 (SIGKILL) after 628.501667 seconds from start
Jun 17 17:44:37 Tower php-fpm[6964]: [WARNING] [pool www] child 29755 exited on signal 9 (SIGKILL) after 48.124581 seconds from start
Jun 17 17:45:35 Tower php-fpm[6964]: [WARNING] [pool www] child 30442 exited on signal 9 (SIGKILL) after 47.418422 seconds from start
Jun 17 17:46:36 Tower php-fpm[6964]: [WARNING] [pool www] child 30697 exited on signal 9 (SIGKILL) after 45.732086 seconds from start
Jun 17 17:47:58 Tower php-fpm[6964]: [WARNING] [pool www] child 31337 exited on signal 9 (SIGKILL) after 48.213143 seconds from start
Jun 17 17:48:47 Tower php-fpm[6964]: [WARNING] [pool www] child 31570 exited on signal 9 (SIGKILL) after 43.416692 seconds from start
Jun 17 18:05:12 Tower kernel: mdcmd (36): set md_write_method 1
Jun 17 18:05:12 Tower kernel: 
Jun 17 18:05:12 Tower cache_dirs: Arguments=
Jun 17 18:05:12 Tower cache_dirs: Max Scan Secs=10, Min Scan Secs=1
Jun 17 18:05:12 Tower cache_dirs: Scan Type=adaptive
Jun 17 18:05:12 Tower cache_dirs: Min Scan Depth=4
Jun 17 18:05:12 Tower cache_dirs: Max Scan Depth=none
Jun 17 18:05:12 Tower cache_dirs: Use Command='find -noleaf'
Jun 17 18:05:12 Tower cache_dirs: ---------- Caching Directories ---------------
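In case it helps anyone triaging a similar log, here is a minimal sketch for tallying the php-fpm SIGKILL warnings per minute and finding the longest-lived killed child. It assumes the syslog has been saved as plain text in the format shown above. The function name `summarize_sigkills` and the regex are illustrative only, not part of any Unraid tooling, and the sketch says nothing about what is sending the SIGKILL.

```python
import re
from collections import Counter

# Matches php-fpm warnings in the syslog format pasted above, e.g.
# "Jun 17 17:26:00 Tower php-fpm[6964]: [WARNING] [pool www] child 18572
#  exited on signal 9 (SIGKILL) after 13.448421 seconds from start"
SIGKILL_RE = re.compile(
    r"^(?P<minute>\w{3}\s+\d+\s+\d{2}:\d{2}):\d{2}\s+\S+\s+php-fpm\[\d+\]:.*"
    r"child (?P<child>\d+) exited on signal 9 \(SIGKILL\) "
    r"after (?P<secs>[\d.]+) seconds"
)


def summarize_sigkills(lines):
    """Return (Counter of kills per minute, longest child lifetime in seconds)."""
    per_minute = Counter()
    longest = 0.0
    for line in lines:
        m = SIGKILL_RE.match(line)
        if m is None:
            continue  # not a php-fpm SIGKILL warning
        per_minute[m.group("minute")] += 1
        longest = max(longest, float(m.group("secs")))
    return per_minute, longest


# Three lines copied from the log above; the last one should be ignored.
sample = [
    "Jun 17 17:26:00 Tower php-fpm[6964]: [WARNING] [pool www] child 18572 "
    "exited on signal 9 (SIGKILL) after 13.448421 seconds from start",
    "Jun 17 17:26:02 Tower php-fpm[6964]: [WARNING] [pool www] child 18573 "
    "exited on signal 9 (SIGKILL) after 15.246233 seconds from start",
    "Jun 17 17:22:09 Tower kernel: mdcmd (59): set md_write_method 1",
]

per_minute, longest = summarize_sigkills(sample)
for minute, count in sorted(per_minute.items()):
    print(minute, count)
print("longest-lived killed child (s):", longest)
```

To run it against a real file, pass an open file handle instead of `sample`, for example `summarize_sigkills(open("syslog.txt"))`, where `syslog.txt` is a placeholder path.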

 

tower-diagnostics-20230617-1814.zip


The rebuild finished successfully in the end. However, this morning, after I kicked off the mover to get the system back to a good state, it went offline again (blank screen, not reachable on the network). I've rolled back to 6.11.5, as I suspect this instability is down to 6.12.0; the server had been rock solid for months before the upgrade.

Edited by topherino
