wgstarks Posted January 5, 2022
For the past few months, whenever I reboot my server the graceful shutdown fails and the system has to force a shutdown, which results in a parity check on boot. I've attached the diagnostics that were collected as part of the "force" shutdown. I'm hoping someone can give me a clue as to which process is causing the shutdown to hang and how to correct it.
brunnhilde-diagnostics-20220105-1809.zip
itimpi Posted January 5, 2022
Your syslog shows what appear to be btrfs-level problems with device sdf, which I think is a cache drive, so it cannot be unmounted successfully.
wgstarks Posted January 6, 2022 (Author)
sdf is my cache, but it's formatted xfs. The only thing I have formatted btrfs is a two-disk cache pool (sdg and sdh). Of course, the designations probably changed on the reboot, right?
trurl Posted January 6, 2022 (Solution)
sdf in those diagnostics was the first disk in a pool named torrent, formatted btrfs. sdg was the other disk in the pool, but it was listed in the smart folder as sdp, so it must have disconnected. And the errors in syslog are, in fact, for sdg:
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 872, rd 13057, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 873, rd 13057, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 873, rd 13058, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 874, rd 13058, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 874, rd 13059, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 875, rd 13059, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 875, rd 13060, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 876, rd 13060, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 876, rd 13061, flush 0, corrupt 0, gen 0
Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 877, rd 13061, flush 0, corrupt 0, gen 0
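For anyone wanting to spot the failing member of a pool without reading the syslog by eye, here is a minimal Python sketch that tallies the per-device counters from lines like the ones quoted above. It assumes the kernel messages have exactly the format shown in this thread; the function name `summarize_btrfs_errors` and the regular expression are my own illustration, not part of Unraid or btrfs tooling. Since the counters in these messages only grow, the sketch keeps the highest value seen for each device.

```python
import re
from typing import Dict, Iterable

# Pattern for kernel lines of the form quoted in this thread (assumed format):
#   BTRFS error (device sdf1): bdev /dev/sdg1 errs: wr 872, rd 13057, flush 0, corrupt 0, gen 0
BTRFS_ERR = re.compile(
    r"BTRFS error \(device (?P<fs>\S+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)
COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def summarize_btrfs_errors(lines: Iterable[str]) -> Dict[str, Dict[str, int]]:
    """Return the highest counter values seen per block device (hypothetical helper)."""
    summary: Dict[str, Dict[str, int]] = {}
    for line in lines:
        match = BTRFS_ERR.search(line)
        if match is None:
            continue  # not a btrfs device-error line
        per_dev = summary.setdefault(match["bdev"], dict.fromkeys(COUNTERS, 0))
        for name in COUNTERS:
            per_dev[name] = max(per_dev[name], int(match[name]))
    return summary


if __name__ == "__main__":
    # First and last lines from the syslog excerpt in this thread.
    sample = [
        "Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): "
        "bdev /dev/sdg1 errs: wr 872, rd 13057, flush 0, corrupt 0, gen 0",
        "Jan 5 18:09:36 Brunnhilde kernel: BTRFS error (device sdf1): "
        "bdev /dev/sdg1 errs: wr 877, rd 13061, flush 0, corrupt 0, gen 0",
    ]
    for bdev, counts in summarize_btrfs_errors(sample).items():
        print(bdev, counts)
```

The point is that the device named after "bdev" (here /dev/sdg1) is the one accumulating errors, even though the message is tagged with the pool's first device (sdf1).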
wgstarks Posted January 6, 2022 (Author)
Yeah. It's an eSATA enclosure, and I accidentally knocked its power cord loose. When I re-powered it, it showed as the cache pool and UD. That was why I was rebooting.
Squid Posted January 6, 2022
There is a known issue (probably existing for quite a while) where the OS is too "extreme" in calling things unclean shutdowns. Currently, if any process has to be killed in order to shut down, an unclean shutdown is recorded. How it's supposed to work (and hopefully will be fixed in the next rev) is that a shutdown should count as "unclean" only if the drives can't be unmounted cleanly even after killing a process if necessary.
At the end of the day, this means that most so-called unclean shutdowns (where a power failure isn't involved) aren't actually unclean. (90% of the time when this happens to me, I cancel the parity check after a couple of minutes, as I know the monthly correcting check will catch any issues.)
wgstarks Posted January 6, 2022 (Author)
Yeah, I thought about cancelling the check, but it's really not hurting anything, so I went ahead and let it run. Maybe my question should really be: rather than a reboot, is there a better way to fix the cache pool when I've accidentally disconnected one of the disks (they're all eSATA in separate enclosures)? Maybe just an array stop/start or something.
Squid Posted January 6, 2022
That's better suited to asking the resident BTRFS god, @JorgeB, directly.
JorgeB Posted January 7, 2022
You can stop the array, and if/when all pool devices are back online, appear on the Main page, and are already assigned (after a page refresh if needed; you can't just reassign them manually), you can just restart it.
wgstarks Posted January 7, 2022 (Author)
7 hours ago, JorgeB said: You can stop the array, and if/when all pool devices are back online, appear on the Main page, and are already assigned (after a page refresh if needed; you can't just reassign them manually), you can just restart it.
Thanks
JorgeB Posted January 7, 2022
1 hour ago, wgstarks said: Thanks
Forgot to mention: there shouldn't be, but if there's an "all data on this device will be deleted" warning shown for any of the pool devices, don't start the array. In that case, reboot first, and the warning should then be gone.