nlash Posted August 25, 2022 (edited)
The last two times I've rebooted, the server has started a parity check and reported cache pool errors on start-up. The first time was after an OS upgrade months ago; the most recent time, today, was after a CPU pin reassignment. I have a script that I grabbed from here that runs hourly BTRFS checks and reports any errors it finds. Nothing has been reported since the last time this happened, when I rebuilt my pool. I also have monthly BTRFS scrubs scheduled. Diagnostics attached. Any help as to why this is happening would be appreciated.
unraidserver-diagnostics-20220825-1704.zip
Edited August 25, 2022 by nlash
trurl Posted August 25, 2022
Those look fine. Are you currently having the problem?
nlash Posted August 26, 2022 (Author)
Yes, cache pool errors on one device.
nlash Posted August 26, 2022 (Author)
I just moved everything off the cache to the array, removed the offending disk, formatted it, and added it back to the cache pool. The cache is still empty. I rebooted and got the errors again. Bad drive?
JorgeB Posted August 26, 2022
Those errors mean that device sdb dropped offline at some point in the past, before this last boot. To reset the fs errors, see here.
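Editor's note: the linked post is not reproduced in this thread. As a hedged illustration of what "fs errors" refers to, BTRFS keeps per-device error counters that persist across reboots until they are reset, which is why an old dropout can keep being reported. The sketch below parses text in the format printed by `btrfs device stats <mountpoint>` and lists the counters that are above zero. The function name `nonzero_counters` and the sample values are hypothetical and are not taken from the attached diagnostics.

```python
import re

# Matches one line of `btrfs device stats` output, for example:
# "[/dev/sdb1].write_io_errs    1520"
STAT_LINE = re.compile(
    r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)


def nonzero_counters(stats_output: str) -> dict[str, dict[str, int]]:
    """Return {device: {counter: value}} for every counter above zero."""
    found: dict[str, dict[str, int]] = {}
    for line in stats_output.splitlines():
        match = STAT_LINE.match(line.strip())
        if match is None:
            continue  # skip blank lines or anything not in the expected format
        value = int(match["value"])
        if value > 0:
            found.setdefault(match["dev"], {})[match["counter"]] = value
    return found


# Illustrative sample only; the numbers are made up for this example.
SAMPLE = """\
[/dev/sdb1].write_io_errs    1520
[/dev/sdb1].read_io_errs     0
[/dev/sdb1].flush_io_errs    3
[/dev/sdb1].corruption_errs  0
[/dev/sdb1].generation_errs  0
[/dev/sdc1].write_io_errs    0
[/dev/sdc1].read_io_errs     0
[/dev/sdc1].flush_io_errs    0
[/dev/sdc1].corruption_errs  0
[/dev/sdc1].generation_errs  0
"""

if __name__ == "__main__":
    for device, counters in nonzero_counters(SAMPLE).items():
        print(device, counters)
```

On a live system the text would come from running `btrfs device stats` against the pool's mount point (assumed here to be `/mnt/cache`), and `btrfs device stats -z` on the same mount point prints the counters and resets them to zero.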
nlash Posted August 26, 2022 (Author)
4 hours ago, JorgeB said: Those errors mean that device sdb dropped offline at some point in the past, before this last boot. To reset the fs errors, see here.
Ah, resetting the errors was the step I was missing. Thank you. Would sdb dropping offline be due to cables or something else?
JorgeB Posted August 26, 2022
30 minutes ago, nlash said: Would sdb dropping offline be due to cables or something else?
Most often it's a cable issue. If it happens again, diags taken before rebooting might give a better idea.
nlash Posted August 26, 2022 (Author)
1 hour ago, JorgeB said: Most often it's a cable issue. If it happens again, diags taken before rebooting might give a better idea.
Got it, thank you for the help. I'll swap the cable and see how it shakes out.
nlash Posted September 25, 2022 (Author)
On 8/26/2022 at 7:44 AM, JorgeB said: Most often it's a cable issue. If it happens again, diags taken before rebooting might give a better idea.
I'm getting cache pool errors again after replacing the cable. Pre-reboot diags attached. Any advice is much appreciated.
unraidserver-diagnostics-20220925-0750.zip
JorgeB Posted September 26, 2022
Sep 25 06:53:04 UnRAIDServer kernel: ata4: hard resetting link
Sep 25 06:53:09 UnRAIDServer kernel: ata4: found unknown device (class 0)
Sep 25 06:53:09 UnRAIDServer kernel: ata4: softreset failed (device not ready)
Sep 25 06:53:09 UnRAIDServer kernel: ata4: reset failed, giving up
Sep 25 06:53:09 UnRAIDServer kernel: ata4.00: disable device
Sep 25 06:53:09 UnRAIDServer kernel: ata4: EH complete
The device dropped offline. This is usually a power/connection problem, so check or replace the cables, both power and SATA.
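Editor's note: a minimal sketch of how the same pattern could be spotted in a saved syslog, assuming the messages keep the shape shown in the excerpt above. It groups kernel `ataN` messages by port and returns only the ports where the kernel logged "disable device". The function name `disabled_ports` is hypothetical, and the rule of keying on "disable device" is an assumption based on this one excerpt, not a complete catalogue of libata failure messages.

```python
import re

# Matches libata kernel messages such as
# "Sep 25 06:53:09 UnRAIDServer kernel: ata4.00: disable device"
# and captures the port ("ata4") and the message text.
ATA_LINE = re.compile(r"kernel: (?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)$")


def disabled_ports(log_text: str) -> dict[str, list[str]]:
    """Return {port: [messages]} for ports whose device the kernel disabled."""
    events: dict[str, list[str]] = {}
    for line in log_text.splitlines():
        match = ATA_LINE.search(line.rstrip())
        if match:
            events.setdefault(match["port"], []).append(match["msg"])
    # Keep only ports where the error handler gave up and disabled the device.
    return {port: msgs for port, msgs in events.items() if "disable device" in msgs}


# The six lines quoted in the post above.
LOG = """\
Sep 25 06:53:04 UnRAIDServer kernel: ata4: hard resetting link
Sep 25 06:53:09 UnRAIDServer kernel: ata4: found unknown device (class 0)
Sep 25 06:53:09 UnRAIDServer kernel: ata4: softreset failed (device not ready)
Sep 25 06:53:09 UnRAIDServer kernel: ata4: reset failed, giving up
Sep 25 06:53:09 UnRAIDServer kernel: ata4.00: disable device
Sep 25 06:53:09 UnRAIDServer kernel: ata4: EH complete
"""

if __name__ == "__main__":
    for port, messages in disabled_ports(LOG).items():
        print(port, "->", messages)
```

Running this against a syslog captured before rebooting, as suggested earlier in the thread, shows which port the drive was on when it dropped, which helps decide which cable or power connector to check.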
nlash Posted September 26, 2022 (Author)
Sorry, last questions (hopefully). What's the best sequence of steps to protect the data on the cache from corruption when this happens? I've moved my cache-only shares to the array and I have appdata backups happening nightly. I'm not familiar enough with how the cache pool system works to know what NOT to do in this scenario. Can I simply power down, replace the cable, and scrub?
JorgeB Posted September 26, 2022 (Solution)
Usually you just need to bring the device back online and run a scrub, assuming there are no "nocow" shares. More info here.
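Editor's note: a hedged sketch of the two checks implied by this answer, with the linked post not reproduced here. Files with the No_COW attribute carry no BTRFS checksums, so a scrub cannot verify or repair them, which is the likely reason for the "nocow" caveat. The code below builds a foreground scrub command and picks out paths flagged `C` (No_COW) from `lsattr -d` output. The mount point `/mnt/cache`, the share names, and both function names are assumptions made for illustration.

```python
import string

# Characters that may appear in the attribute field printed by lsattr.
ATTR_CHARS = set("-" + string.ascii_letters)


def scrub_command(mountpoint: str = "/mnt/cache") -> list[str]:
    """Build a `btrfs scrub start` command for the given mount point.

    -B keeps the scrub in the foreground, so the call returns only once
    the scrub has finished.
    """
    return ["btrfs", "scrub", "start", "-B", mountpoint]


def nocow_paths(lsattr_output: str) -> list[str]:
    """Return the paths whose lsattr attribute field contains 'C' (No_COW)."""
    paths = []
    for line in lsattr_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # blank lines or directory headers
        attrs, path = parts
        if set(attrs) <= ATTR_CHARS and "C" in attrs:
            paths.append(path)
    return paths


# Illustrative output of `lsattr -d /mnt/cache/*`; the share names are made up.
SAMPLE = """\
---------------------- /mnt/cache/appdata
---------------C------ /mnt/cache/domains
---------------------- /mnt/cache/system
"""

if __name__ == "__main__":
    print("scrub with:", " ".join(scrub_command()))
    print("nocow paths:", nocow_paths(SAMPLE))
```

If `nocow_paths` returns anything, data under those paths would need to be checked or restored another way, for example from the nightly appdata backups mentioned above, since the scrub has no checksums to compare against.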