Cache disk randomly going missing/offline

Hi guys,

I've recently been tinkering with my Unraid server. I added a PCIe card with a 4-port NIC to use with a pfSense VM. I was also using powertop autotune.

 

Everything was going well, but then I started getting stability issues, specifically with my cache drive.

 

I had one Kingston KC2500 NVMe drive serving as my cache drive and hosting my dockers and VMs. I would randomly get errors where my dockers and VM would crash and the cache drive was inaccessible. If I rebooted, the cache drive would still be missing.

 

I originally thought it could be the PCIe card, so I removed it. I also eventually added a SATA SSD to the cache, so it is now a RAID 1 pool, and added "append initrd=/bzroot nvme_core.default_ps_max_latency_us=0" to the syslinux config on my flash drive. With no VM running and powertop autotune on, I still get the cache pool randomly dropping, on both the NVMe and the SATA SSD.

 

It seems to be okay with powertop autotune off, but I'm not sure that was the problem.

 

The log seems to suggest read errors on my NVMe. Is this a hardware issue? Could the PCIe NIC be affecting my NVMe?

 

Any advice is appreciated, thanks!

tower-syslog-20221201-0435.zip tower-diagnostics-20221129-1406.zip

Solved by JorgeB

The diags were taken after rebooting, so they don't show the problem, but if the device is dropping, also add

pcie_aspm=off

to syslinux to see if it helps.
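
To make that edit concrete, here is a minimal Python sketch that adds pcie_aspm=off to the end of the append line quoted in the first post, after initrd=/bzroot and the existing NVMe parameter. The helper name add_kernel_params is hypothetical, and the sketch only builds the new line as a string; it does not touch syslinux.cfg, which you would still edit yourself.

```python
def add_kernel_params(append_line, params):
    """Return the append line with any missing kernel parameters added at the end."""
    # Keep whatever indentation the original config line had.
    indent = append_line[: len(append_line) - len(append_line.lstrip())]
    tokens = append_line.split()
    # Compare on the parameter name only, so an existing pcie_aspm=... is left alone.
    present = {tok.split("=", 1)[0] for tok in tokens[1:]}
    for param in params:
        name = param.split("=", 1)[0]
        if name not in present:
            tokens.append(param)
            present.add(name)
    return indent + " ".join(tokens)


# The append line from the first post in this thread.
line = "append initrd=/bzroot nvme_core.default_ps_max_latency_us=0"
print(add_kernel_params(line, ["pcie_aspm=off"]))
# prints: append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
```

After a reboot, the contents of /proc/cmdline show whether the kernel actually booted with the new parameter.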

  • Author
21 hours ago, JorgeB said:

The diags were taken after rebooting, so they don't show the problem, but if the device is dropping, also add

pcie_aspm=off

to syslinux to see if it helps.

Is that for after the initrd=/bzroot as well?

 

Sorry, here are the files. Another drop just occurred. Rebooting from the webUI didn't bring the NVMe or the SATA SSD back; both showed up as "missing disks".

When I shut down cleanly from the webUI, left the PSU switch on, and then powered the server back on with the power button, everything worked fine again.

tower-syslog-20221201-2245.zip tower-diagnostics-20221202-1146.zip

  • Solution
1 hour ago, Apex_Budi said:

Is that for after the initrd=/bzroot as well?

Yep.

 

Dec  1 22:53:03 Tower kernel: nvme nvme0: Abort status: 0x371
Dec  1 22:54:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:54:04 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Dec  1 22:55:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:55:04 Tower kernel: nvme0n1: detected capacity change from 488397168 to 0

The device is dropping offline. See if the above helps; if it doesn't, look for a BIOS update or try a different NVMe device if available.
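
For anyone who wants to check their own syslog for the same signature, here is a rough Python sketch. The patterns are copied from the kernel messages quoted above; other kernel versions may word these messages differently, and the function name find_nvme_drops is just an illustrative choice.

```python
import re

# Messages from the log excerpt above that indicate the controller went away.
DROP_PATTERNS = [
    re.compile(r"nvme (nvme\d+): Device not ready; aborting reset"),
    re.compile(r"nvme (nvme\d+): Removing after probe failure"),
    re.compile(r"(nvme\d+)n\d+: detected capacity change from \d+ to 0\b"),
]


def find_nvme_drops(lines):
    """Return (controller, line) pairs for lines that match a drop signature."""
    hits = []
    for line in lines:
        for pattern in DROP_PATTERNS:
            match = pattern.search(line)
            if match:
                hits.append((match.group(1), line.strip()))
                break
    return hits


# The five lines quoted in this post.
SAMPLE = """\
Dec  1 22:53:03 Tower kernel: nvme nvme0: Abort status: 0x371
Dec  1 22:54:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:54:04 Tower kernel: nvme nvme0: Removing after probe failure status: -19
Dec  1 22:55:04 Tower kernel: nvme nvme0: Device not ready; aborting reset, CSTS=0x1
Dec  1 22:55:04 Tower kernel: nvme0n1: detected capacity change from 488397168 to 0
""".splitlines()

for controller, line in find_nvme_drops(SAMPLE):
    print(controller, "|", line)
```

On the sample, four of the five lines match, all for nvme0; the "Abort status" line is not treated as a drop on its own.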

  • Author
Thanks! I'll give that a go! I know the BIOS is old because the parts are second-hand.

 

Is it weird that the SATA SSD in the same cache pool also drops offline/goes missing when the NVMe is the one that has problems?

  • Author

Would powertop affect these at all? Especially on powersaving modes?

39 minutes ago, Apex_Budi said:

Is it weird that the SATA SSD in the same cache pool also drops offline/goes missing when the NVMe is the one that has problems?

It looks like the SATA SSD is also dropping offline, but I can't see that in the diags, probably because of all the other errors.

 

39 minutes ago, Apex_Budi said:

Would powertop affect these at all? Especially on powersaving modes?

It can, you should try disabling it for now.

  • Author

I had another missing disk even with pcie_aspm=off.

But it has been stable for almost 24h now with powertop auto-tune disabled.

I suspect it could also be the CEC2019 setting that I enabled in the BIOS (my mobo is a Gigabyte H310M S2H) around the same time. It automatically turned on ASPM for everything.
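
One way to see which ASPM policy the kernel is currently using is to read the standard Linux sysfs file /sys/module/pcie_aspm/parameters/policy, where the active policy is shown in square brackets. Below is a minimal Python sketch of that check; the function names are hypothetical, and it assumes the file is present on the running kernel (it returns None otherwise).

```python
import re
from pathlib import Path

# Standard sysfs location of the kernel's PCIe ASPM policy setting.
ASPM_POLICY_PATH = Path("/sys/module/pcie_aspm/parameters/policy")


def active_aspm_policy(text):
    """Return the bracketed (active) policy name from the sysfs text, or None."""
    match = re.search(r"\[(\w+)\]", text)
    return match.group(1) if match else None


def read_aspm_policy(path=ASPM_POLICY_PATH):
    """Read the policy file; return None if it is missing or unreadable."""
    try:
        return active_aspm_policy(path.read_text())
    except OSError:
        return None


if __name__ == "__main__":
    print("Active ASPM policy:", read_aspm_policy())
```

This only reports the kernel-wide policy. Per-device ASPM state shows up on the LnkCtl lines of lspci -vv output, which is worth comparing before and after changing the BIOS setting.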
