chiefo Posted June 30 (edited)
This started happening a few weeks ago. I thought it might have been UrBackup or a lack of TRIM, since apparently I didn't re-enable it once the plugin moved into the OS. It happens on the latest Unraid 6 release and on the Unraid 7 beta. Essentially, my SSD randomly drops from the OS, causing all my shares and Docker containers to lock up. Sometimes I can stop the array and reboot from the GUI; other times I have to SSH in and run shutdown -r. A reboot brings the drive back. SMART doesn't seem to show any errors. Any suggestions?
Jun 30 11:10:04 Alpha rc.docker: Containers started.
Jun 30 11:10:04 Alpha rc.docker: bazarr: started successfully!
Jun 30 11:10:44 Alpha kernel: iommu ivhd2: AMD-Vi: Event logged [INVALID_DEVICE_REQUEST device=0000:00:00.0 pasid=0x00000 address=0xfffffffdf8000000 flags=0x0a00]
Jun 30 11:11:00 Alpha flash_backup: adding task: /usr/local/emhttp/plugins/dynamix.my.servers/scripts/UpdateFlashBackup update
Jun 30 11:11:07 Alpha kernel: nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10
Jun 30 11:11:07 Alpha kernel: nvme nvme0: Does your device have a faulty power saving mode enabled?
Jun 30 11:11:07 Alpha kernel: nvme nvme0: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 731666232, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 731666232 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 84296928, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 84296928 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 98138344, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 98138344 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 48090744, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 48090744 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 788687176, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 788687176 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 499727632, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 683782920, 32 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 75812848, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 731666488, 256 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: nvme0n1: I/O Cmd(0x2) @ LBA 688794488, 8 blocks, I/O Error (sct 0x3 / sc 0x71)
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 683782920 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 499727632 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 75812848 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 688794488 op 0x0:(READ) flags 0x80700 phys_seg 1 prio class 0
Jun 30 11:11:07 Alpha kernel: I/O error, dev nvme0n1, sector 731666488 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 0
Jun 30 11:11:07 Alpha kernel: nvme 0000:41:00.0: enabling device (0000 -> 0002)
Jun 30 11:11:07 Alpha kernel: nvme nvme0: Shutdown timeout set to 8 seconds
Jun 30 11:12:10 Alpha kernel: nvme nvme0: I/O tag 14 (400e) QID 0 timeout, disable controller
Jun 30 11:12:10 Alpha kernel: nvme nvme0: failed to set APST feature (-4)
Jun 30 11:12:10 Alpha kernel: nvme nvme0: Disabling device after reset failure: -4
Jun 30 11:12:10 Alpha rsyslogd: file '/mnt/user/Syslog/syslog-10.0.2.0.log'[10] write error - see https://www.rsyslog.com/solving-rsyslog-write-errors/ for help OS error: Input/output error [v8.2102.0 try https://www.rsyslog.com/e/2027 ]
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 23453552 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 488618439 op 0x1:(WRITE) flags 0x9800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: XFS (nvme0n1p1): log I/O error -5
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 14882216 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 23453560 op 0x1:(WRITE) flags 0x800 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev nvme0n1, sector 633858960 op 0x1:(WRITE) flags 0x100000 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 561999096, offset 9269248, sector 633858960
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 23454846, offset 24576, sector 23453552
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 21467663, offset 0, sector 21464784
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 23454846, offset 28672, sector 23453560
Jun 30 11:12:10 Alpha kernel: XFS (nvme0n1p1): Filesystem has been shut down due to log error (0x2).
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 21451625, offset 413696, sector 14882216
Jun 30 11:12:10 Alpha kernel: XFS (nvme0n1p1): Please unmount the filesystem and rectify the problem(s).
Jun 30 11:12:10 Alpha rsyslogd: file '/mnt/user/Syslog/syslog-10.0.2.0.log': open error: Input/output error [v8.2102.0 try https://www.rsyslog.com/e/2433 ]
Jun 30 11:12:10 Alpha kernel: nvme0n1p1: writeback error on inode 21451625, offset 417792, sector 14883416
Jun 30 11:12:10 Alpha kernel: loop: Write error at byte offset 2464059392, length 4096.
Jun 30 11:12:10 Alpha kernel: I/O error, dev loop2, sector 564032 op 0x0:(READ) flags 0x1000 phys_seg 1 prio class 0
Jun 30 11:12:10 Alpha kernel: I/O error, dev loop2, sector 4812616 op 0x1:(WRITE) flags 0x0 phys_seg 3 prio class 0
Jun 30 11:12:10 Alpha kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 1, flush 0, corrupt 0, gen 0
Jun 30 11:12:10 Alpha kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 1, rd 1, flush 0, corrupt 0, gen 0
Edited June 30 by chiefo: cache drive, not a pool; single cache SSD.
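For anyone trying to confirm they are hitting the same dropout, here is a minimal Python sketch that scans syslog lines for the two kernel messages that bracket the failure in the log above ("controller is down; will reset" and "Disabling device after reset failure"). The function name find_nvme_dropouts and the regular expressions are illustrative assumptions built from the messages quoted in this post, not part of any Unraid tool.

```python
import re

# Patterns derived from the kernel messages quoted in the post above.
RESET_RE = re.compile(
    r"nvme (nvme\d+): controller is down; will reset: CSTS=(0x[0-9a-fA-F]+)"
)
DISABLE_RE = re.compile(r"nvme (nvme\d+): Disabling device after reset failure")


def find_nvme_dropouts(lines):
    """Return (controller, event, csts) tuples for each dropout-related line.

    event is "reset" when the kernel reports the controller down, and
    "disabled" when the reset failed and the device was removed.
    csts is the reported CSTS register value for "reset" events, else None.
    """
    events = []
    for line in lines:
        match = RESET_RE.search(line)
        if match:
            events.append((match.group(1), "reset", match.group(2)))
            continue
        match = DISABLE_RE.search(line)
        if match:
            events.append((match.group(1), "disabled", None))
    return events
```

To use it, pass an open log file, for example `find_nvme_dropouts(open("/var/log/syslog"))`; that path is an assumption about where the syslog lives on your system. A "reset" event followed by "disabled" for the same controller matches the sequence shown in the log above.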
JorgeB Posted July 1 (Solution)
Try this to see if it helps: on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot":
nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
e.g.:
append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off
Reboot and see if it makes a difference.
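The edit described above can be sketched in Python, mainly as a way to check the result rather than as a replacement for the GUI steps. The helper names missing_params and patch_append_line are hypothetical; the only details taken from this thread are the two kernel parameters and the "append initrd=/bzroot" line.

```python
# The two kernel parameters suggested in the post above.
REQUIRED_PARAMS = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def missing_params(cmdline):
    """Return the required parameters that are absent from a kernel command line."""
    present = cmdline.split()
    return [param for param in REQUIRED_PARAMS if param not in present]


def patch_append_line(line):
    """Add any missing parameters to a syslinux 'append' line.

    Lines that are not 'append' lines, or that already carry both
    parameters, are returned unchanged. The line is expected without a
    trailing newline.
    """
    if not line.strip().startswith("append "):
        return line
    extra = missing_params(line)
    if not extra:
        return line
    return line.rstrip() + " " + " ".join(extra)
```

After rebooting, `missing_params(open("/proc/cmdline").read())` should return an empty list if the running kernel actually received both options; /proc/cmdline is the standard Linux location for the boot command line.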
chiefo Posted July 2 (Author)
On 7/1/2024 at 3:32 AM, JorgeB said: Try this to see if it helps: on the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right), and add this to your default boot option, after "append initrd=/bzroot": nvme_core.default_ps_max_latency_us=0 pcie_aspm=off e.g.: append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off Reboot and see if it makes a difference.
So far so good. I kind of assumed that was a generic message, since this system has been stable for years with the same hardware. Did something change recently in the Unraid kernel that may have changed how this works?
JorgeB Posted July 2
Just a kernel change can make a difference sometimes, for better or worse, or it may not have had enough time to error again.
chiefo Posted July 10 (Author)
On 7/2/2024 at 9:58 AM, JorgeB said: Just a kernel change can make a difference sometimes, for better or worse, or it may not have had enough time to error again.
Welp. It's been a week and it seems to be solved. Before, it wasn't making it more than a day or so. Thanks!
chiefo Posted July 17 (Author)
Well, it looks like it happened again this morning, although it lasted a lot longer this time.