NVMe cache losing connection, will only reappear after reseating on motherboard


Solved by JorgeB.


This has been an ongoing issue that has gotten worse with unRAID 6.11.x.

Every couple of days or weeks, my NVMe drive loses its connection to unRAID. The Docker service stops and VMs stop, but shares and the GUI still work.

I have to reseat the drive before unRAID recognizes it after a restart. Because of this, the array never fully stops, so on restart it runs a parity check, which takes about a day with 14TB parity drives.

Any help is appreciated. Diagnostics attached.

tesseract-diagnostics-20230106-1721.zip

Solution (JorgeB):

This can sometimes help: on the Main GUI page, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
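For anyone who would rather script this edit than use the GUI, here is a minimal Python sketch of the string manipulation involved. `add_kernel_params` is a hypothetical helper, not part of unRAID: it takes a single syslinux `append` line and adds each parameter once, overwriting the value if the same key is already present. It deliberately does not read or write the config file, so backing up the flash drive and saving the result are left to you.

```python
# Sketch only: add_kernel_params is a hypothetical helper, not an unRAID API.
# It edits one syslinux "append" line as a string; no file I/O is done here.

PARAMS = ["nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off"]


def add_kernel_params(append_line: str, params: list[str]) -> str:
    """Return append_line with each key=value param present exactly once."""
    # Keep the line's original indentation (syslinux entries are often indented).
    indent = append_line[: len(append_line) - len(append_line.lstrip())]
    tokens = append_line.split()
    if not tokens or tokens[0] != "append":
        raise ValueError("expected a syslinux line starting with 'append'")
    for param in params:
        key = param.split("=", 1)[0]
        for i in range(1, len(tokens)):
            if tokens[i].split("=", 1)[0] == key:
                tokens[i] = param  # key already present: overwrite its value
                break
        else:
            tokens.append(param)  # key not present: add it at the end
    return indent + " ".join(tokens)


if __name__ == "__main__":
    print(add_kernel_params("append initrd=/bzroot", PARAMS))
    # prints: append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
```

Overwriting an existing value instead of appending a second copy keeps the line unambiguous if one of these parameters was already set to something else, and running the helper twice leaves the line unchanged.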


Reboot and see if it makes a difference.
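After the reboot, one way to confirm the parameters actually reached the kernel is to check the running command line in /proc/cmdline. Below is a minimal Python sketch, assuming a Linux host; `missing_params` is a hypothetical helper name, and the expected values are simply the ones suggested in this thread.

```python
from pathlib import Path

# Parameters suggested in this thread, as key -> expected value.
WANTED = {"nvme_core.default_ps_max_latency_us": "0", "pcie_aspm": "off"}


def missing_params(cmdline: str, wanted: dict[str, str]) -> list[str]:
    """Return the key=value entries from wanted that cmdline does not contain."""
    present = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        # In this sketch, a later occurrence of the same key replaces an earlier one.
        present[key] = value if sep else None
    return [f"{k}={v}" for k, v in wanted.items() if present.get(k) != v]


if __name__ == "__main__":
    cmdline_file = Path("/proc/cmdline")  # Linux-only
    if cmdline_file.exists():
        missing = missing_params(cmdline_file.read_text(), WANTED)
        print("all parameters active" if not missing else f"missing: {missing}")
```

An empty result means both parameters are on the running command line; it does not by itself show whether they fix the dropouts.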

 

10 hours ago, Alchemist Zim said:

I have to reseat the drive before it is recognized by unRAID after a restart. 

Most likely, just power cycling the server will bring it back; a reboot alone usually won't.


It happened overnight. Diagnostics posted from before and after the reboot.

JorgeB said:

This can sometimes help, on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot":

I will try this if it fails again.

JorgeB said:

Most likely just power cycling the server will bring it back, just a reboot usually won't.

I think I tried this previously and it didn't work, but I will try again.

Pre-reboot: tesseract-diagnostics-20230108-0516.zip
Post-reboot: tesseract-diagnostics-20230108-0530.zip


It just happened again. Adding nvme_core.default to the boot options.

Jan 11 19:27:54 TESSERACT  emhttpd: shcmd (3450934): umount /mnt/cache-nvme
Jan 11 19:27:54 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:27:54 TESSERACT  emhttpd: shcmd (3450934): exit status: 32
Jan 11 19:27:54 TESSERACT  emhttpd: Retry unmounting disk share(s)...
Jan 11 19:27:59 TESSERACT  emhttpd: Unmounting disks...
Jan 11 19:27:59 TESSERACT  emhttpd: shcmd (3450935): umount /mnt/cache-nvme
Jan 11 19:27:59 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:27:59 TESSERACT  emhttpd: shcmd (3450935): exit status: 32
Jan 11 19:27:59 TESSERACT  emhttpd: Retry unmounting disk share(s)...
Jan 11 19:28:04 TESSERACT kernel: btrfs_dev_stat_print_on_error: 25 callbacks suppressed
Jan 11 19:28:04 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187186, flush 0, corrupt 0, gen 0
Jan 11 19:28:04 TESSERACT kernel: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan 11 19:28:04 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187187, flush 0, corrupt 0, gen 0
Jan 11 19:28:04 TESSERACT kernel: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:04 TESSERACT kernel: Buffer I/O error on dev loop2, logical block 0, async page read
Jan 11 19:28:04 TESSERACT  emhttpd: Unmounting disks...
Jan 11 19:28:04 TESSERACT  emhttpd: shcmd (3450937): umount /mnt/cache-nvme
Jan 11 19:28:04 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:28:04 TESSERACT  emhttpd: shcmd (3450937): exit status: 32
Jan 11 19:28:04 TESSERACT  emhttpd: Retry unmounting disk share(s)...
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187188, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187189, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: Buffer I/O error on dev loop3, logical block 0, async page read
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187190, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: Buffer I/O error on dev loop2, logical block 0, async page read
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187191, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: Buffer I/O error on dev loop3, logical block 0, async page read
Jan 11 19:28:09 TESSERACT  emhttpd: Unmounting disks...
Jan 11 19:28:09 TESSERACT  emhttpd: shcmd (3450938): umount /mnt/cache-nvme
Jan 11 19:28:09 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:28:09 TESSERACT  emhttpd: shcmd (3450938): exit status: 32
Jan 11 19:28:09 TESSERACT  emhttpd: Retry unmounting disk share(s)...
Jan 11 19:28:14 TESSERACT  emhttpd: Unmounting disks...
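Each BTRFS line in this log reports cumulative per-device error counters (wr, rd, flush, corrupt, gen). Between 19:28:04 and 19:28:06 the read counter climbs from 187186 to 187191 while the write counter stays at 63. A minimal Python sketch for pulling those counters out of a syslog excerpt, assuming the exact message format shown here (other kernel versions may word it differently); `parse_btrfs_errs` is a hypothetical helper name:

```python
import re

# Matches the counters in BTRFS error lines such as:
# "bdev /dev/nvme0n1p1 errs: wr 63, rd 187186, flush 0, corrupt 0, gen 0"
STATS_RE = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(lines):
    """Yield (device, counters) for every line carrying BTRFS error counters."""
    for line in lines:
        m = STATS_RE.search(line)
        if m:
            counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
            yield m.group("dev"), counters


if __name__ == "__main__":
    sample = [
        "Jan 11 19:28:04 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): "
        "bdev /dev/nvme0n1p1 errs: wr 63, rd 187186, flush 0, corrupt 0, gen 0",
        "Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): "
        "bdev /dev/nvme0n1p1 errs: wr 63, rd 187191, flush 0, corrupt 0, gen 0",
    ]
    parsed = list(parse_btrfs_errs(sample))
    first, last = parsed[0][1], parsed[-1][1]
    print("read errors added:", last["rd"] - first["rd"])
    # prints: read errors added: 5
```

Lines without counters (the loop2/loop3 I/O errors and the emhttpd unmount retries) are skipped, so the sketch can be fed a whole syslog.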

 

full power cycle by turning off power at the PSU did restore the nvme without reseating...thanks JorgeB

tesseract-diagnostics-20230111-1918.zip
