Closed - Unraid Docker Crashing - Possible NvMe Failure



Good day, 

About 4 days ago Docker crashed. All containers reside in /mnt/cache; the cache drive is ADATA_SX8200PNP_2J3420080645 (nvme0n1).

When I checked the Main tab, the drive had disappeared. Everything else was fine.
I restarted the server a couple of times with no luck. I presumed the drive was faulty, so I removed it and installed another NVMe drive in the same slot, and the new drive showed up, which seemed to confirm my suspicion. Nevertheless, I reinstalled the presumed-failed drive in its old location, restarted, and it appeared again.

 

The server ran fine for the last 4 days. This morning Docker was dead again. I pulled the attached log and restarted Unraid, and everything is working again.

Apparently the drive disconnected at 1:48am. 

 

Should I assume the drive has had it? It's 2 years old.

 

Thank you 

syslog.txt

Edited by juan11perez

The NVMe device dropped offline. This can sometimes help:

 

Some NVMe devices have issues with power states on Linux. Try this: on the main GUI page click on flash, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (on the top right), and add the following to your default boot option, after "append" and before "initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0

Reboot and see if it makes a difference. If it doesn't, look for a BIOS update and/or try a different brand/model of device.
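
As a rough illustration of where the parameter goes, here is what the default boot entry might look like after the edit. This assumes the stock Unraid entry (label "Unraid OS", kernel /bzimage); your label names and any parameters already on the append line may differ, so treat it as a sketch rather than something to paste over your own config:

```
label Unraid OS
  menu default
  kernel /bzimage
  append nvme_core.default_ps_max_latency_us=0 initrd=/bzroot
```

Everything stays on the single append line, with the new parameter separated from "initrd=/bzroot" by a space.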


@JorgeB

Thank you for the advice. I'll implement it and monitor.

The drive has worked fine for 2 years and only started acting up after I added a 3rd GPU in the 3rd PCIe slot. I'm not sure whether that has any bearing on it, but it's the only change.

I have another identical drive in the second M.2 slot and it's fine, but then again it's only 4 months old.

Once again thank you.

  • 3 weeks later...

@JorgeB

A short note to update, and again thank you for your advice.

 

Adding "nvme_core.default_ps_max_latency_us=0" to syslinux did not resolve the issue. The drive kept disconnecting. 
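
For anyone else trying this, it may be worth confirming after the reboot that the parameter actually reached the kernel before ruling it out. A minimal sketch, assuming a standard Linux /proc/cmdline; the function name has_nvme_ps_override is made up for illustration, and the optional file argument exists only so the check can be tried against a sample file:

```shell
#!/bin/sh
# Succeeds if the kernel command line contains the NVMe power-state override.
# Reads /proc/cmdline by default; pass another file to check a sample instead.
# (has_nvme_ps_override is an illustrative name, not an Unraid tool.)
has_nvme_ps_override() {
    cmdline_file="${1:-/proc/cmdline}"
    # Match the parameter as a whole word so e.g. "=0123" does not count.
    grep -Eq '(^|[[:space:]])nvme_core\.default_ps_max_latency_us=0([[:space:]]|$)' "$cmdline_file"
}

if has_nvme_ps_override; then
    echo "override present on the running kernel command line"
else
    echo "override NOT found on the running kernel command line"
fi
```

If the override is reported as missing, the edit most likely went into a boot entry other than the default one.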

 

I replaced it with a Samsung 970plus and that stopped the problem. The server has now been running for over 10 days with no issues.

 

I'm not sure whether this means the Adata drive failed, but if it did, it's quite disappointing: it's only 2 years old and the internet is full of praise for this product.

 

  • juan11perez changed the title to Closed - Unraid Docker Crashing - Possible NvMe Failure
