
Cache disk randomly going missing/offline


Go to solution Solved by JorgeB,


Hi guys,

I've recently been tinkering with my Unraid server. I added a PCIe card with a 4-port NIC to use with a pfSense VM. I was also using powertop autotune.

 

Everything was going fine, but then I started getting stability issues, specifically with my cache drive.

 

I had one Kingston KC2500 NVMe drive serving as my cache drive and holding my Docker containers and VMs. I would randomly get errors where my containers and VM crashed and the cache drive became inaccessible. If I rebooted, the cache drive would still be missing.

 

I originally thought it could be the PCIe card, so I removed it. I eventually also added a SATA SSD to the cache, so the pool is now RAID 1. I also added "append initrd=/bzroot nvme_core.default_ps_max_latency_us=0" to the syslinux configuration on my flash drive. With no VM running and powertop autotune on, the cache pool still drops randomly, on both the NVMe and the SATA SSD.

 

It seems to be okay with powertop autotune off, but I'm not sure that was the problem.

 

The log seems to suggest read errors on my NVMe. Is this a hardware issue? Could the PCIe NIC be affecting my NVMe?

 

Any advice is appreciated, thanks!

tower-syslog-20221201-0435.zip tower-diagnostics-20221129-1406.zip

Link to comment
21 hours ago, JorgeB said:

Diags are after rebooting but if the device is dropping also add

pcie_aspm=off

to syslinux to see if it helps.

Is that for after the initrd=/bzroot as well?

 

Sorry, here are the files. Another drop just occurred. Rebooting from the webUI didn't bring the NVMe or the SATA SSD back; both showed up as "missing disks".

When I shut down safely from the webUI, left the PSU switch on, and then turned the server back on manually with the power button, everything worked fine again.

tower-syslog-20221201-2245.zip tower-diagnostics-20221202-1146.zip

Link to comment
  • Solution
1 hour ago, Apex_Budi said:

Is that for after the initrd=/bzroot as well?

Yep.
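To illustrate where the extra parameter goes, here is a minimal Python sketch that appends `pcie_aspm=off` to the existing append line from the first post. The helper name `add_kernel_params` is hypothetical and used only for illustration; in practice the line is simply edited by hand in the syslinux configuration on the flash drive.

```python
def add_kernel_params(append_line: str, params: list[str]) -> str:
    """Return the append line with each parameter added once, after the existing tokens.

    Hypothetical helper for illustration only; it is not part of Unraid.
    """
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a line starting with 'append'")
    # Compare on the parameter name (text before '=') so nothing is added twice.
    existing_keys = {t.split("=", 1)[0] for t in tokens[1:]}
    for param in params:
        key = param.split("=", 1)[0]
        if key not in existing_keys:
            tokens.append(param)
            existing_keys.add(key)
    return " ".join(tokens)


# The append line quoted in the first post of this thread.
current = "append initrd=/bzroot nvme_core.default_ps_max_latency_us=0"
updated = add_kernel_params(current, ["pcie_aspm=off"])
print(updated)
```

The result keeps `initrd=/bzroot` and the NVMe latency setting in place and puts `pcie_aspm=off` at the end of the same line. The new setting only takes effect after a reboot.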

 

Dec  1 22:53:03 Tower kernel: nvme nvme0: Abort status: 0x371
Dec  1 22:54:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:54:04 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Dec  1 22:55:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:55:04 Tower kernel: nvme0n1: detected capacity change from 488397168 to 0

The device is dropping offline. See if the above helps; if it doesn't, look for a BIOS update or try a different NVMe device if one is available.
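For anyone checking their own syslog for the same symptom, the following is a rough Python sketch that picks out the kernel messages quoted above. The patterns are taken only from those five log lines, and the names `DROP_PATTERNS` and `find_nvme_drops` are made up for this example; other kernels or drives may word their messages differently.

```python
import re

# Patterns based solely on the log excerpt in this thread (an assumption,
# not an exhaustive list of NVMe failure messages).
DROP_PATTERNS = [
    re.compile(r"nvme (nvme\d+): Device not ready; aborting reset"),
    re.compile(r"nvme (nvme\d+): Removing after probe failure"),
    re.compile(r"(nvme\d+)n\d+: detected capacity change from \d+ to 0\b"),
]


def find_nvme_drops(lines):
    """Return (controller, line) pairs for lines that suggest an NVMe device dropped."""
    hits = []
    for line in lines:
        for pattern in DROP_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group(1), line.strip()))
                break  # one hit per line is enough
    return hits


sample = [
    "Dec  1 22:53:03 Tower kernel: nvme nvme0: Abort status: 0x371",
    "Dec  1 22:54:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1",
    "Dec  1 22:54:04 Tower kernel: nvme nvme0: Removing after probe failure status: -19",
    "Dec  1 22:55:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1",
    "Dec  1 22:55:04 Tower kernel: nvme0n1: detected capacity change from 488397168 to 0",
]

drops = find_nvme_drops(sample)
for controller, text in drops:
    print(controller, "|", text)
```

To run it against a real log, pass the lines of an extracted syslog file instead of `sample`, for example `find_nvme_drops(open(path))`, where `path` is wherever the syslog from the diagnostics zip was unpacked.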

Link to comment
7 minutes ago, JorgeB said:

Yep.

 

Dec  1 22:53:03 Tower kernel: nvme nvme0: Abort status: 0x371
Dec  1 22:54:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:54:04 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Dec  1 22:55:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:55:04 Tower kernel: nvme0n1: detected capacity change from 488397168 to 0

The device is dropping offline. See if the above helps; if it doesn't, look for a BIOS update or try a different NVMe device if one is available.

Thanks! I'll give that a go! I know the BIOS is old because the parts are second-hand.

 

Is it weird that the SATA SSD in the same cache pool also drops offline/goes missing when the NVME is the one that has problems?

Link to comment
39 minutes ago, Apex_Budi said:

Is it weird that the SATA SSD in the same cache pool also drops offline/goes missing when the NVME is the one that has problems?

It looks like the SATA SSD is also dropping offline, but I can't see that in the diags, probably because of all the other errors.

 

39 minutes ago, Apex_Budi said:

Would powertop affect these at all? Especially on powersaving modes?

It can. You should try disabling it for now.

Link to comment
