
system forced shutdown after 90 second wait


Solved by trurl


I've noticed for the past few months that if I reboot my server, the graceful shutdown fails and the system has to force a shutdown, which results in a parity check on boot. I've attached the diagnostics that were collected as part of the forced shutdown. I'm hoping that someone can give me some clue what process is causing the shutdown to hang and how to correct it.

 

brunnhilde-diagnostics-20220105-1809.zip

  • Solution

sdf in those diagnostics was the first disk in a pool named torrent, formatted btrfs. sdg was the other disk in the pool, but it was listed in the SMART folder as sdp, so it must have disconnected.

 

And the errors in syslog are, in fact, for sdg:

Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 872, rd 13057, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 873, rd 13057, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 873, rd 13058, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 874, rd 13058, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 874, rd 13059, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 875, rd 13059, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 875, rd 13060, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 876, rd 13060, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 876, rd 13061, flush 0, corrupt 0, gen 0
Jan  5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 877, rd 13061, flush 0, corrupt 0, gen 0
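 

If you want to pull those per-device error counters out of the syslog programmatically, here's a minimal sketch. It only assumes the standard /var/log/syslog path and the message format shown above; adjust the path to wherever your diagnostics put the log:

import re
from collections import defaultdict

# Matches the counter lines above, e.g.
# "BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 877, rd 13061, flush 0, corrupt 0, gen 0"
PATTERN = re.compile(
    r"BTRFS error \(device \S+\): bdev (?P<dev>/dev/\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

latest = defaultdict(dict)
with open("/var/log/syslog") as log:   # path is an assumption
    for line in log:
        m = PATTERN.search(line)
        if m:
            # keep the most recent counter values seen for each member device
            latest[m.group("dev")] = {
                k: int(v) for k, v in m.groupdict().items() if k != "dev"
            }

for dev, errs in sorted(latest.items()):
    print(dev, errs)

Running that against the lines above would show only /dev/sdg1 accumulating write and read errors, which matches the disconnected disk.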

 


There is a known issue (which has probably existed for quite a while) where the OS is too "extreme" in calling shutdowns unclean. Currently, if any process has to be killed in order to shut down, then it counts as an unclean shutdown.

 

How it's supposed to work (and hopefully will be fixed in the next rev) is that a shutdown should only be flagged as "unclean" if the drives can't be unmounted cleanly even after killing processes where necessary.

 

At the end of the day, this means that most so-called unclean shutdowns (where a power failure isn't involved) aren't actually unclean. (90% of the time when this happens to me, I cancel the parity check after a couple of minutes, since I know the monthly correcting check will catch any issues.)
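 

To make the distinction concrete, here is a toy sketch of the two decision rules. This is just the logic described above, not Unraid's actual shutdown code, and the function and parameter names are made up:

def unclean_current(processes_killed: bool, unmounted_ok: bool) -> bool:
    # Behaviour described above: having to kill any process at all
    # is enough to flag the shutdown as unclean.
    return processes_killed or not unmounted_ok

def unclean_intended(processes_killed: bool, unmounted_ok: bool) -> bool:
    # Intended behaviour: only a failure to unmount the drives cleanly,
    # even after killing processes where necessary, counts as unclean.
    return not unmounted_ok

Under the intended rule, a shutdown where a stubborn container had to be killed but the drives still unmounted cleanly would not trigger a parity check.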


Yeah, I thought about cancelling the check but it’s really not hurting anything so I went ahead and let it run.

 

Maybe my question should really be: rather than a reboot, is there a better way to fix the cache pool when I've accidentally disconnected one of the disks (they're all eSATA in separate enclosures)? Maybe just an array stop/start or something.

