lfas Posted July 27, 2022
I was having issues booting my Windows VM, so I did a clean reboot of Unraid. When it came back online, my parity disk was disabled. I stopped the array, removed the disk, started and then stopped the array, and re-added the disk; it is now doing a parity rebuild/check. But I am concerned, and I don't understand how to interpret the logs.
In the past, disk 4 would randomly disable itself (and as far as I could tell, all the SMART tests checked out). I eventually determined that disk 4 was being disabled because of a bad cable, which I removed from the system. Disk 4 has been fine since then, but now that the parity disk has been disabled, I am worried, since I didn't touch it.
What should I do to make sure my system is running correctly? The attached diagnostics are from right after I rebooted and noticed the drive was disabled.
icaria-diagnostics-20220727-1046.zip
trurl Posted July 27, 2022
5 minutes ago, lfas said: from right after I rebooted
And so they can't tell us why it became disabled. If this is a recurring issue, you might want to set up a syslog server.
Looks like there are problems reading cache2. Maybe related? Controller or power?
JorgeB Posted July 27, 2022
The disk was already disabled at boot time; if it happens again, grab the diags before rebooting.
This indicates that the cache SATA device dropped offline some time ago:
Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, corrupt 820, gen 0
Run a scrub and then see here to reset the stats and for better pool monitoring.
There's also data corruption detected on the other pool. Ryzen with overclocked RAM, like you have, is known to corrupt data in some cases; see here.
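For anyone who wants to spot this kind of line in a saved syslog, here is a minimal Python sketch. It assumes the counters always appear in the exact layout of the line quoted above; the function names `parse_btrfs_errs` and `devices_with_errors` are made up for illustration and are not part of Unraid or btrfs-progs. The wr, rd and flush counters are I/O errors, corrupt counts checksum failures, and gen counts generation mismatches.

```python
import re

# Matches the per-device BTRFS error counters in the layout of the
# syslog line quoted in this thread (assumed to be stable).
BTRFS_ERRS = re.compile(
    r"BTRFS info \(device (?P<fs>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

COUNTERS = ("wr", "rd", "flush", "corrupt", "gen")


def parse_btrfs_errs(line):
    """Return the counters from one syslog line, or None if it does not match."""
    match = BTRFS_ERRS.search(line)
    if match is None:
        return None
    result = {"fs": match.group("fs"), "bdev": match.group("bdev")}
    for name in COUNTERS:
        result[name] = int(match.group(name))
    return result


def devices_with_errors(lines):
    """Return parsed entries for every line where any counter is non-zero."""
    found = []
    for line in lines:
        entry = parse_btrfs_errs(line)
        if entry is not None and any(entry[name] > 0 for name in COUNTERS):
            found.append(entry)
    return found


if __name__ == "__main__":
    sample = (
        "Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): "
        "bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, "
        "corrupt 820, gen 0"
    )
    for entry in devices_with_errors([sample]):
        print(entry)
```

To scan a whole file, pass an open file object as `lines`, since `devices_with_errors` only iterates over it line by line. Note that these counters are cumulative until they are reset, so a non-zero value may describe an old problem rather than a current one.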
lfas Posted July 27, 2022 (Author)
3 minutes ago, trurl said: And so they can't tell us why it became disabled. If this is a recurring issue, you might want to set up a syslog server. Looks like there are problems reading cache2. Maybe related? Controller or power?
Thank you for the reply. I will set up the syslog server as you recommended. The only way I know to test whether it is the controller or the power is to use a different motherboard and a different PSU. Is there an easier way to test?
lfas Posted July 27, 2022 (Author)
6 minutes ago, JorgeB said: The disk was already disabled at boot time; if it happens again, grab the diags before rebooting. This indicates that the cache SATA device dropped offline some time ago:
Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, corrupt 820, gen 0
Run a scrub and then see here to reset the stats and for better pool monitoring. There's also data corruption detected on the other pool. Ryzen with overclocked RAM, like you have, is known to corrupt data in some cases; see here.
Thank you, I will do both of those things. For the scrub, should I enable "repair corrupted blocks"?
JorgeB Posted July 27, 2022
2 minutes ago, lfas said: For the scrub, should I enable "repair corrupted blocks"?
Yes, and check that there are no uncorrectable errors at the end.