Docker crashing and parity disks in error state


Hi there. A few days ago, the docker service crashed (docker image and appdata are on cache) and I got a warning that both of my parity disks had entered an error state; on the Main page they show up as "Parity device is disabled". I was unable to restart docker from Settings. I rebooted the server and my dockers were back. I then stopped the array, unassigned the parity disks, started the array, stopped it again, reassigned the parity disks, and started the array once more, which began a parity rebuild.

 

The same thing happened today: the docker service crashed and the parity disks entered an error state during the rebuild. I've attached diagnostics from the most recent crash. I'm not sure what's at fault here, so any help is appreciated!

 

Specs:

Unraid 6.8.3

ASRock Rack X470D4U

Ryzen 7 3700X

4 x Samsung 8GB DDR4 2666MHz CL17 ECC UDIMM (M391A1K43BB2-CTD)

Dell PERC H310 LSI 9211-8i IT Mode

2 x Samsung 860 Evo 500GB (cache)

2 x WD Red 12TB (parity)

3 x WD White 12TB

4 x WD Red 10TB

mxcr-unraid-diagnostics-20200309-0955.zip

Edited by zashmx

I think you must have some sort of controller problem, or possibly power.

 

Your diagnostics don't even give consistent information about whether your disks are mounted or not. df shows only cache, vars shows all mounted. And shares says none exist, as does df. Disabled parities of course should have no effect on your data whether on disks in the array or in the cache pool.

 

Nothing obvious in syslog beyond possible corruption of libvirt and docker images. And some problems writing parity and reading cache.

 

Reboot and post another diagnostic. Those don't make any sense.


There was a problem with the onboard SATA controller:

 

Mar  9 01:41:23 MXCR-Unraid kernel: ahci 0000:03:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000e394c000 flags=0x0000]

 

I've seen this multiple times on this forum with Ryzen boards; it might only happen with IOMMU enabled.
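If it helps to check whether the same event shows up elsewhere in a syslog, here is a minimal Python sketch that picks out lines of the form quoted above. The regular expression is modelled only on that single log line, so treat the pattern as an assumption rather than a complete description of what the kernel can emit; the names `PAGE_FAULT_RE`, `find_page_faults` and `SAMPLE` are made up for this example.

```python
import re

# Pattern modelled on the one log line quoted in this thread (an assumption,
# not an exhaustive description of the kernel's message format).
PAGE_FAULT_RE = re.compile(
    r"kernel: (?P<driver>\S+) "
    r"(?P<pci>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]): "
    r"Event logged \[IO_PAGE_FAULT domain=(?P<domain>0x[0-9a-f]+) "
    r"address=(?P<address>0x[0-9a-f]+) flags=(?P<flags>0x[0-9a-f]+)\]"
)


def find_page_faults(lines):
    """Return one dict (driver, pci, domain, address, flags) per matching line."""
    faults = []
    for line in lines:
        match = PAGE_FAULT_RE.search(line)
        if match:
            faults.append(match.groupdict())
    return faults


# The line from the diagnostics, used here as sample input.
SAMPLE = (
    "Mar  9 01:41:23 MXCR-Unraid kernel: ahci 0000:03:00.1: Event logged "
    "[IO_PAGE_FAULT domain=0x0000 address=0x00000000e394c000 flags=0x0000]"
)

if __name__ == "__main__":
    for fault in find_page_faults([SAMPLE]):
        # Show which driver and PCI device logged the fault.
        print(fault["driver"], fault["pci"], fault["address"])
```

To scan a real log, pass an open file instead of the sample list, for example `find_page_faults(open(path))` with `path` pointing at the syslog inside the extracted diagnostics zip. Grouping the results by the `pci` field would show whether every fault comes from the same controller.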

3 hours ago, trurl said:

I think you must have some sort of controller problem, or possibly power.

 

Your diagnostics don't even give consistent information about whether your disks are mounted or not. df shows only cache, vars shows all mounted. And shares says none exist, as does df. Disabled parities of course should have no effect on your data whether on disks in the array or in the cache pool.

 

Nothing obvious in syslog beyond possible corruption of libvirt and docker images. And some problems writing parity and reading cache.

 

Reboot and post another diagnostic. Those don't make any sense.

Thanks for responding. Attached are the diagnostics after a reboot. Should I delete docker.img and libvirt.img and recreate them just in case?

 

2 hours ago, johnnie.black said:

There was a problem with the onboard SATA controller:

 


Mar  9 01:41:23 MXCR-Unraid kernel: ahci 0000:03:00.1: Event logged [IO_PAGE_FAULT domain=0x0000 address=0x00000000e394c000 flags=0x0000]

 

I've seen this multiple times on this forum with Ryzen boards; it might only happen with IOMMU enabled.

Thanks for the suggestion. I've disabled IOMMU in the BIOS to rule it out. I've started a parity rebuild now and will monitor to see if it occurs again.

mxcr-unraid-diagnostics-20200309-1437.zip
