Disks getting disabled, this time the parity disk



I was having issues booting my Windows VM, so I did a clean reboot of Unraid. When it came back online, my parity disk was disabled. I stopped the array, removed the disk, started and then stopped the array, and re-added the disk; it is now doing a parity rebuild/check.

 

But I am concerned and don't understand how to interpret the logs. In the past, disk 4 would randomly disable itself (and as far as I could tell, all the SMART tests checked out). I eventually determined disk 4 was being disabled by a bad cable, which I removed from the system. Disk 4 has been fine since then, but now that the parity disk has been disabled, I am worried, since I didn't touch it.

 

What should I do to make sure my system is running correctly?

 

The diagnostics I attached are from right after I rebooted and noticed the drive was disabled.

icaria-diagnostics-20220727-1046.zip


Disk was already disabled at boot time; if it happens again, grab the diags before rebooting.
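If the GUI is unreachable, you can also grab them from a terminal; a minimal sketch, assuming a stock Unraid install (the exact filename will vary):

diagnostics
# writes <servername>-diagnostics-<date>-<time>.zip to /boot/logs on the flash drive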

 

This indicates that the cache SATA device dropped offline some time ago:

 

Jul 27 11:41:21 Icaria kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdd1 errs: wr 231821, rd 56307, flush 14833, corrupt 820, gen 0
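Those counters are cumulative, so you can re-check them at any time; a sketch, assuming the pool is mounted at /mnt/cache:

btrfs dev stats /mnt/cache
# prints write_io_errs / read_io_errs / flush_io_errs / corruption_errs /
# generation_errs per device, matching the wr/rd/flush/corrupt/gen fields
# in the kernel log line above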

 

Run a scrub and then see here to reset the stats and for better pool monitoring.
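From the command line, the equivalent would be roughly this (mount point assumed to be /mnt/cache; the GUI scrub button does the same thing):

btrfs scrub start /mnt/cache     # runs in the background
btrfs scrub status /mnt/cache    # check progress and error totals
btrfs dev stats -z /mnt/cache    # -z zeroes the counters after printing them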

 

There's also data corruption detected on the other pool; Ryzen with overclocked RAM like you have is known in some cases to corrupt data, see here.

 

 

3 minutes ago, trurl said:

And so it can't tell us why it became disabled. If this is a recurring issue, you might want to set up a syslog server

 

Looks like problems with reading cache2. Maybe related? Controller or power?

Thank you for the reply. I will set up the syslog server as you recommended.
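From what I've read, the Unraid side is under Settings > Syslog Server; if I forward to another Linux box, the receiver needs rsyslog listening. A minimal sketch for /etc/rsyslog.conf on the receiving machine (UDP port 514 and the log path are assumptions; the port has to match the Unraid setting):

# accept syslog messages over UDP
module(load="imudp")
input(type="imudp" port="514")
# write one file per sending host (hypothetical path)
$template RemoteLogs,"/var/log/remote/%HOSTNAME%.log"
*.* ?RemoteLogs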

The only way I know how to test whether it is the controller or the power is to use a different motherboard and a different PSU. Is there an easier way to test?
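In the meantime I'll watch the kernel log for the device dropping again; a rough sketch of what I'd grep for (the pattern is just a guess at the usual link-reset messages):

grep -iE 'ata[0-9]+.*(link|reset|error)' /var/log/syslog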

6 minutes ago, JorgeB said:

Disk was already disabled at boot time; if it happens again, grab the diags before rebooting. […]

Thank you, I will do both of those things. For the scrub, should I enable "repair corrupted blocks"?
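From the btrfs docs, scrub is read-only with -r and otherwise rewrites bad blocks from a good copy where the profile has redundancy; I'm assuming the GUI checkbox just toggles that flag:

btrfs scrub start -r /mnt/cache   # read-only: report errors, fix nothing
btrfs scrub start /mnt/cache      # repair from the good copy where one exists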

