Corrupted Docker Image -> Parity Drive Errors?

michaelhthomas · May 13

Hi all,

I'm a bit baffled and could use some assistance with filesystem errors. This is now the second time that, following a reboot, my UNRAID server reports 1 error on the Parity drive and then removes it from the array. I have run (short) SMART tests on all drives and found nothing out of the ordinary. Strangely, the only errors I see in the syslog appear to be related to a corrupted docker image?

There's a good chance I'm missing something here, so if you wouldn't mind taking a look, that would be great. Currently, I've stopped the array, and I am waiting to restart it until I've gotten this figured out.

gringotts-diagnostics-20240513-0825.zip

trurl · May 13

17 minutes ago, michaelhthomas said:

the only errors I see in the syslog appear to be related to a corrupted docker image?

You must have overlooked all these:

May 13 08:21:13 Gringotts kernel: md: disk1 read error, sector=2312
May 13 08:21:13 Gringotts kernel: md: disk2 read error, sector=2312
May 13 08:21:13 Gringotts kernel: md: disk3 read error, sector=2312

And lots more before and after.

Since multiple disks are involved, probably a power or controller issue.

Any power splitters?
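trurl's reasoning can be shown concretely. When several disks log read errors in the same second, often at the same sector, a shared component such as power or the controller is a more likely cause than each disk failing on its own. Below is a minimal Python sketch that assumes the syslog lines look like the ones quoted above. The function name `disks_failing_together` is hypothetical and not part of Unraid.

```python
import re
from collections import defaultdict

# Matches md driver lines shaped like the ones quoted in this thread, e.g.
#   May 13 08:21:13 Gringotts kernel: md: disk1 read error, sector=2312
MD_READ_ERROR = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel: md: "
    r"(?P<disk>\S+) read error, sector=(?P<sector>\d+)"
)


def disks_failing_together(lines):
    """Group md read errors by timestamp and keep only the timestamps at
    which more than one disk reported an error."""
    by_time = defaultdict(set)
    for line in lines:
        m = MD_READ_ERROR.match(line)
        if m:
            by_time[m.group("ts")].add(m.group("disk"))
    return {ts: sorted(disks) for ts, disks in by_time.items() if len(disks) > 1}


log = [
    "May 13 08:21:13 Gringotts kernel: md: disk1 read error, sector=2312",
    "May 13 08:21:13 Gringotts kernel: md: disk2 read error, sector=2312",
    "May 13 08:21:13 Gringotts kernel: md: disk3 read error, sector=2312",
]
print(disks_failing_together(log))
# {'May 13 08:21:13': ['disk1', 'disk2', 'disk3']}
```

A single disk erroring alone would not show up here at all. This is deliberate, because the point is to separate "one bad drive" from "everything dropped at once".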

michaelhthomas · May 13

I did catch those, but what I found strange is that those are all from after the parity drive was pulled from the array (as far as I can tell). Once the parity drive is removed, it looks like every single drive has read errors for every read, based on what was shown in the web UI.

I was wondering if it was some motherboard issue. What's weird is that, last time this occurred, I was able to rebuild the parity drive and operate without errors for about a month. There are no power splitters or anything unusual.

JorgeB · May 13

The XML is missing from the diags, but it looks like the SATA controller (and a USB controller) are bound to the OPNsense VM, so Unraid lost its connection to all the disks when the VM tried to start.
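You can confirm what JorgeB spotted by checking which driver each PCI device is using. On a Linux KVM host, a device handed to a VM for passthrough is normally bound to the `vfio-pci` driver, so a storage controller bound there is not available to the host. The sketch below assumes the standard Linux sysfs layout (`/sys/bus/pci/devices/<address>/driver` and `class`). The function name is hypothetical, and the function only reports state; it changes nothing.

```python
import os

# PCI base class 0x01 = mass storage controller (0x0106xx = SATA).
MASS_STORAGE_CLASS_PREFIX = "0x01"


def storage_devices_on_vfio(sysfs_root="/sys/bus/pci/devices"):
    """Return PCI addresses of mass-storage controllers currently bound to
    the vfio-pci driver (i.e. reserved for VM passthrough)."""
    flagged = []
    for addr in sorted(os.listdir(sysfs_root)):
        dev = os.path.join(sysfs_root, addr)
        driver_link = os.path.join(dev, "driver")
        if not os.path.islink(driver_link):
            continue  # no driver bound to this device
        # The "driver" symlink points at .../drivers/<name>; keep <name>.
        driver = os.path.basename(os.readlink(driver_link))
        with open(os.path.join(dev, "class")) as f:
            pci_class = f.read().strip()  # e.g. "0x010601"
        if driver == "vfio-pci" and pci_class.startswith(MASS_STORAGE_CLASS_PREFIX):
            flagged.append(addr)
    return flagged
```

On the server itself, calling `storage_devices_on_vfio()` with no argument scans the real sysfs tree. Any address it returns is a disk controller that the host's own storage driver is not attached to.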

michaelhthomas · May 13

I bet that's it! The OPNsense VM was being passed a PCIe device that is no longer attached (a network card), so the device IDs must have shifted around, and the SATA controller ended up being passed through instead.

itimpi · May 13

The moment you remove (or add) hardware, you should assume that the hardware IDs for passed-through devices have likely changed and need redoing.

I think they can also sometimes change after a major OS update, due to changes at the kernel level.
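Following itimpi's advice, one simple safeguard is to record the vendor:device ID at each passed-through PCI address while the setup is known to be good. You can then re-check those IDs after adding or removing hardware or after an OS update. Below is a minimal sketch that assumes the standard Linux sysfs `vendor` and `device` files. The saved mapping and the function name are hypothetical. The IDs you store would come from your own system, for example from `lspci -nn`.

```python
import os


def changed_passthrough_devices(expected, sysfs_root="/sys/bus/pci/devices"):
    """Compare a saved {pci_address: "vendor:device"} map against sysfs.

    Returns {address: (expected_id, current_id_or_None)} for every address
    whose hardware no longer matches what was recorded."""
    changed = {}
    for addr, want in expected.items():
        dev = os.path.join(sysfs_root, addr)
        if not os.path.isdir(dev):
            changed[addr] = (want, None)  # nothing lives at this address now
            continue
        ids = []
        for name in ("vendor", "device"):
            with open(os.path.join(dev, name)) as f:
                # sysfs stores e.g. "0x8086"; normalise to 4 lowercase hex digits
                ids.append(f"{int(f.read().strip(), 16):04x}")
        have = ":".join(ids)
        if have != want.lower():
            changed[addr] = (want, have)
    return changed
```

An address that now reports a different ID, or no device at all, matches what michaelhthomas describes: the network card that used to sit at the passed-through address was gone, and the SATA controller took its place.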
