July 27, 2022

I was having issues booting my Windows VM, so I did a clean reboot of Unraid. When it came back online, my parity disk was disabled. I stopped the array, removed the disk, started and then stopped the array, and re-added the disk. It is now doing a parity rebuild/check.

But I am concerned, and I don't understand how to interpret the logs. In the past, disk 4 would randomly disable itself (and as far as I could tell, all the SMART tests checked out). I eventually determined that disk 4 was being disabled because of a bad cable, which I removed from the system. Disk 4 has been fine since then, but now that the parity disk has been disabled I am worried, since I didn't touch it.

What should I do to make sure my system is running correctly? The diagnostics I attached are from right after I rebooted and noticed the drive was disabled.

icaria-diagnostics-20220727-1046.zip
July 27, 2022 (Community Expert)

5 minutes ago, lfas said: "from right after I rebooted"

And so they can't tell us why it became disabled. If this is a recurring issue, you might want to set up the syslog server.

Looks like there are problems reading cache2. Maybe related? Controller or power?
July 27, 2022 (Community Expert)

The disk was already disabled at boot time. If it happens again, grab the diags before rebooting.

This indicates that the cache SATA device dropped offline some time ago:

Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, corrupt 820, gen 0

Run a scrub and then see here to reset the stats and for better pool monitoring.

There's also data corruption detected on the other pool. Ryzen with overclocked RAM, like you have, is known to corrupt data in some cases, see here.
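For anyone who wants to pull the counters out of a log line like the one above, here is a minimal Python sketch. It assumes the line keeps exactly the format shown in this post; the function name `parse_btrfs_errs` is made up for illustration and is not part of btrfs or Unraid.

```python
import re

# Matches the per-device error counters in the format quoted in this thread.
# Assumption: the field order (wr, rd, flush, corrupt, gen) is as shown above.
STATS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return the device name and integer counters, or None if no match."""
    match = STATS_RE.search(line)
    if match is None:
        return None
    stats = {"dev": match.group("dev")}
    for key in ("wr", "rd", "flush", "corrupt", "gen"):
        stats[key] = int(match.group(key))
    return stats


line = (
    "Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): "
    "bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, corrupt 820, gen 0"
)
stats = parse_btrfs_errs(line)
# Any non-zero counter means the device logged errors at some point.
nonzero = {k: v for k, v in stats.items() if k != "dev" and v > 0}
print(stats["dev"], nonzero)
```

Run against the line from the diagnostics, this reports non-zero write, read, flush, and corruption counters for /dev/sdd1. These counters are cumulative until they are reset, so they show that errors happened at some point, not when.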
July 27, 2022 (Author)

3 minutes ago, trurl said: "And so can't tell us why it became disabled. If this is a recurring issue you might want to set up the syslog server. Looks like problems with reading cache2. Maybe related? Controller or power?"

Thank you for the reply. I will set up the syslog server as you recommended. The only way I know to test whether it is the controller or the power is to swap in a different motherboard and a different PSU. Is there an easier way to test?
July 27, 2022 (Author)

6 minutes ago, JorgeB said: "Disk was already disabled at boot time, if it happens again grab the diags before rebooting. This indicates that the cache SATA device dropped offline some time ago:

Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, corrupt 820, gen 0

Run a scrub and then see here to reset the stats and for better pool monitoring. There's also data corruption detected on the other pool, Ryzen with overclock RAM like you have is known to in some cases corrupt data, see here."

Thank you, I will do both of those things. For the scrub, should I enable "repair corrupted blocks"?
July 27, 2022 (Community Expert)

2 minutes ago, lfas said: "For the scrub, should I enable "repair corrupted blocks"?"

Yes, and check that there are no uncorrectable errors at the end.