Alchemist Zim Posted January 6, 2023

This has been an ongoing issue that has gotten worse with 6.11.x. Every couple of days/weeks my NVMe drive loses its connection with unRAID: the Docker service stops, VMs stop, but shares and the GUI still work. I have to reseat the drive before it is recognized by unRAID again after a restart. Because of this the array never fully stops, so on restart it runs a parity check, which takes about a day with 14TB parity drives.

Any help is appreciated. Diagnostics attached: tesseract-diagnostics-20230106-1721.zip
trurl Posted January 6, 2023

Have you tried updating the NVMe firmware?
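If nvme-cli happens to be available on the console (an assumption; it is not guaranteed on a stock install, though smartmontools does ship with unRAID), the currently installed firmware revision can be checked before hunting for an update on the vendor's site:

nvme list                            # model, serial, and firmware revision of each NVMe device
nvme id-ctrl /dev/nvme0 | grep -iw fr   # just the "fr" (firmware revision) field
smartctl -i /dev/nvme0               # alternative: also reports the firmware version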
Alchemist Zim Posted January 6, 2023 (Author)

trurl said: "Have you tried updating the NVMe firmware?"

Didn't even know that was a thing until just now. Will try that and get back to you. If it happens again, I'll try to grab diagnostics before restarting.
JorgeB Posted January 7, 2023 (Solution)

This can sometimes help: on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference.

Alchemist Zim said: "I have to reseat the drive before it is recognized by unRAID after a restart."

Most likely just power cycling the server will bring it back; just a reboot usually won't.
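For context, those two parameters disable NVMe autonomous power state transitions (APST) and PCIe Active State Power Management, both common culprits when NVMe drives drop off the bus under Linux. The edited stanza ends up looking roughly like this in /boot/syslinux/syslinux.cfg (a sketch; menu entries and label text vary by install, and only the append line matters here):

label Unraid OS
  menu default
  kernel /bzimage
  append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off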
Alchemist Zim Posted January 8, 2023 (Author)

Happened again overnight; diagnostics posted pre- and post-reboot.

JorgeB said: "This can sometimes help, on the main GUI page click on the flash drive, scroll down to 'Syslinux Configuration', make sure it's set to 'menu view' (top right) and add this to your default boot option, after 'append initrd=/bzroot'."

Will try this if it fails again.

JorgeB said: "Most likely just power cycling the server will bring it back, just a reboot usually won't."

I think I tried this previously and it didn't work. Will try again.

pre-reboot: tesseract-diagnostics-20230108-0516.zip
post-reboot: tesseract-diagnostics-20230108-0530.zip
Alchemist Zim Posted January 12, 2023 (Author)

Just happened again. Adding the nvme_core.default_ps_max_latency_us=0 pcie_aspm=off parameters to my boot options now. Syslog from the failure:

Jan 11 19:27:54 TESSERACT emhttpd: shcmd (3450934): umount /mnt/cache-nvme
Jan 11 19:27:54 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:27:54 TESSERACT emhttpd: shcmd (3450934): exit status: 32
Jan 11 19:27:54 TESSERACT emhttpd: Retry unmounting disk share(s)...
Jan 11 19:27:59 TESSERACT emhttpd: Unmounting disks...
Jan 11 19:27:59 TESSERACT emhttpd: shcmd (3450935): umount /mnt/cache-nvme
Jan 11 19:27:59 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:27:59 TESSERACT emhttpd: shcmd (3450935): exit status: 32
Jan 11 19:27:59 TESSERACT emhttpd: Retry unmounting disk share(s)...
Jan 11 19:28:04 TESSERACT kernel: btrfs_dev_stat_print_on_error: 25 callbacks suppressed
Jan 11 19:28:04 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187186, flush 0, corrupt 0, gen 0
Jan 11 19:28:04 TESSERACT kernel: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan 11 19:28:04 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187187, flush 0, corrupt 0, gen 0
Jan 11 19:28:04 TESSERACT kernel: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:04 TESSERACT kernel: Buffer I/O error on dev loop2, logical block 0, async page read
Jan 11 19:28:04 TESSERACT emhttpd: Unmounting disks...
Jan 11 19:28:04 TESSERACT emhttpd: shcmd (3450937): umount /mnt/cache-nvme
Jan 11 19:28:04 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:28:04 TESSERACT emhttpd: shcmd (3450937): exit status: 32
Jan 11 19:28:04 TESSERACT emhttpd: Retry unmounting disk share(s)...
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187188, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop3, sector 0 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187189, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: Buffer I/O error on dev loop3, logical block 0, async page read
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187190, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop2, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: Buffer I/O error on dev loop2, logical block 0, async page read
Jan 11 19:28:06 TESSERACT kernel: BTRFS error (device nvme0n1p1: state EA): bdev /dev/nvme0n1p1 errs: wr 63, rd 187191, flush 0, corrupt 0, gen 0
Jan 11 19:28:06 TESSERACT kernel: I/O error, dev loop3, sector 0 op 0x0:(READ) flags 0x0 phys_seg 1 prio class 0
Jan 11 19:28:06 TESSERACT kernel: Buffer I/O error on dev loop3, logical block 0, async page read
Jan 11 19:28:09 TESSERACT emhttpd: Unmounting disks...
Jan 11 19:28:09 TESSERACT emhttpd: shcmd (3450938): umount /mnt/cache-nvme
Jan 11 19:28:09 TESSERACT root: umount: /mnt/cache-nvme: target is busy.
Jan 11 19:28:09 TESSERACT emhttpd: shcmd (3450938): exit status: 32
Jan 11 19:28:09 TESSERACT emhttpd: Retry unmounting disk share(s)...
Jan 11 19:28:14 TESSERACT emhttpd: Unmounting disks...
A full power cycle by turning off the power at the PSU did restore the NVMe without reseating it. Thanks, JorgeB.

tesseract-diagnostics-20230111-1918.zip
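One side note on the log above: those BTRFS per-device error counters (wr, rd, flush, corrupt, gen) are cumulative and persist across mounts, so once the drive is stable it can be worth reading and resetting them with standard btrfs-progs; a sketch, assuming the pool is still mounted at /mnt/cache-nvme as in the log:

btrfs device stats /mnt/cache-nvme      # print the error counters for each device in the pool
btrfs device stats -z /mnt/cache-nvme   # print the counters, then reset them to zero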
Alchemist Zim Posted January 20, 2023 (Author)

Adding nvme_core.default_ps_max_latency_us=0 pcie_aspm=off seems to have fixed it; been up for a week with no issues. Thanks.
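As a quick sanity check after this kind of change, the active kernel command line and the resulting module parameter can be verified from the console; a sketch, assuming the controller is nvme0:

cat /proc/cmdline                                                # should include both added parameters
cat /sys/module/nvme_core/parameters/default_ps_max_latency_us   # should now read 0
lspci -vv | grep -i aspm                                         # LnkCtl lines should report "ASPM Disabled"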