SSD cache drive disappearing until reboot


Go to solution Solved by JorgeB,

This started happening a few weeks ago. I thought it might have been UrBackup or a lack of TRIM, since apparently I didn't re-enable it once the plugin moved into the OS. It happens on both the latest Unraid 6 release and the Unraid 7 beta. Essentially, my SSD randomly drops from the OS, causing all my shares and Docker containers to lock up. Sometimes I can stop the array and reboot from the GUI; other times I have to SSH in and run shutdown -r. A reboot brings the drive back. SMART doesn't seem to show any errors. Any suggestions?

 

Jun 30 11:10:04 Alpha rc.docker: Containers started.
Jun 30 11:10:04 Alpha rc.docker: bazarr: started successfully!
Jun 30 11:10:44 Alpha kernel: iommu ivhd2: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=0000:00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00]
Jun 30 11:11:00 Alpha flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Jun 30 11:11:07 Alpha kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jun 30 11:11:07 Alpha kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jun 30 11:11:07 Alpha kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 731666232, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 731666232 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 84296928, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 84296928 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 98138344, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 98138344 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 48090744, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 48090744 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 788687176, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 788687176 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 499727632, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 683782920, 32 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 75812848, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 731666488, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 688794488, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 683782920 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 499727632 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 75812848 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 688794488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 731666488 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme 0000:41:00.0: enabling device (0000 -> 0002)
Jun 30 11:11:07 Alpha kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Jun 30 11:12:10 Alpha kernel: nvme nvme0: I/O tag 14 (400e) QID 0 timeout, disable controller
Jun 30 11:12:10 Alpha kernel: nvme nvme0: failed to set APST feature (-4)
Jun 30 11:12:10 Alpha kernel: nvme nvme0: Disabling device after reset failure: -4
Jun 30 11:12:10 Alpha rsyslogd: file '/mnt/user/Syslog/syslog-10.0.2.0.log'[10] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Input/output error [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 23453552 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 488618439 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: XFS (nvme0n1p1): log I/O error -5
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 14882216 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 23453560 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 633858960 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 561999096, offset 9269248, sector 633858960
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 23454846, offset 24576, sector 23453552
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 21467663, offset 0, sector 21464784
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 23454846, offset 28672, sector 23453560
Jun 30 11:12:10 Alpha kernel: XFS (nvme0n1p1): Filesystem has been shut down due to log error (0x2).
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 21451625, offset 413696, sector 14882216
Jun 30 11:12:10 Alpha kernel: XFS (nvme0n1p1): Please unmount the filesystem and rectify the problem(s).
Jun 30 11:12:10 Alpha rsyslogd: file '/mnt/user/Syslog/syslog-10.0.2.0.log': open error: Input/output error [v8.2102.0 try https://www.rsyslog.com/e/2433 ]
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 21451625, offset 417792, sector 14883416
Jun 30 11:12:10 Alpha kernel: loop: Write error at byte offset 2464059392, length 4096.
Jun 30 11:12:10 Alpha kernel: I/O error, dev loop2, sector 564032 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev loop2, sector 4812616 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
Jun 30 11:12:10 Alpha kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 30 11:12:10 Alpha kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
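
For anyone trying to work out how often this happens on their own box, here is a rough sketch that pulls the "controller is down" events out of a syslog. The function name and the regex are my own, written against the line format in the excerpt above; other syslog layouts may need a different pattern.

```python
import re

# Matches the kernel line logged when the NVMe controller stops responding,
# e.g. "Jun 30 11:11:07 Alpha kernel: nvme nvme0: controller is down; ...".
# The pattern is an assumption based on the log format shown in this thread.
CONTROLLER_DOWN = re.compile(
    r"(?P<ts>[A-Z][a-z]{2}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+"
    r"kernel: nvme (?P<ctrl>nvme\d+): controller is down"
)


def find_controller_drops(lines):
    """Return (timestamp, controller) for every 'controller is down' line."""
    drops = []
    for line in lines:
        match = CONTROLLER_DOWN.search(line)
        if match:
            drops.append((match.group("ts"), match.group("ctrl")))
    return drops


if __name__ == "__main__":
    # /var/log/syslog is the usual location; adjust the path if yours differs.
    with open("/var/log/syslog", errors="replace") as handle:
        for timestamp, controller in find_controller_drops(handle):
            print(timestamp, controller)
```

If the list comes back with entries spaced hours or days apart, that fits the "random drop until reboot" pattern described here.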

 

Edited by chiefo
Cache drive not pool, single cache ssd
  • chiefo changed the title to SSD cache drive disappearing until reboot
  • Solution

Try this to see if it helps:

 

On the Main page of the GUI, click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.
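
These are the same two parameters the kernel itself suggests in the log above; as far as I know, the first turns off NVMe power-state transitions (APST) and the second turns off PCIe ASPM. A minimal sketch for checking the change, assuming the helper names below (they are made up for illustration and are not part of Unraid): one function builds the edited append line, the other reports which parameters are missing from a kernel command line such as the contents of /proc/cmdline.

```python
# The two parameters from the post above.
REQUIRED_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def add_boot_params(append_line, params=REQUIRED_PARAMS):
    """Return the append line with any missing parameters added at the end."""
    tokens = append_line.split()
    for param in params:
        if param not in tokens:
            tokens.append(param)
    return " ".join(tokens)


def missing_params(cmdline, params=REQUIRED_PARAMS):
    """Return the parameters that do not appear in a kernel command line."""
    present = cmdline.split()
    return [param for param in params if param not in present]


if __name__ == "__main__":
    print(add_boot_params("append initrd=/bzroot"))
    # After the reboot, an empty list means both parameters were picked up.
    with open("/proc/cmdline") as handle:
        print(missing_params(handle.read()))
```

Running add_boot_params on a line that already has both parameters leaves it unchanged, so it is safe to apply more than once.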

On 7/1/2024 at 3:32 AM, JorgeB said:

Try this to see if it helps:

 

on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot"

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off


Reboot and see if it makes a difference.

So far so good. I kind of assumed that was a generic message, since this system has been stable for years with the same hardware. Did something change recently in the Unraid kernel that would affect how this works?

On 7/2/2024 at 9:58 AM, JorgeB said:

Just a kernel change can make a difference sometimes, for better or worse, or it may not have had enough time to error again.

Whelp. It's been a week and it seems to be solved. Before, it wasn't making it more than a day or so. Thanks!
