hellpower Posted February 21

Hi all,

I upgraded my Unraid server 3 months ago to the following:

ASRock Z690 Extreme, purchase date: 26 Nov 2023
13th Gen Intel® Core™ i3-13100F, purchase date: 26 Nov 2023
64 GiB DDR4, purchase date: 26 Nov 2023
GeForce GTX 1050 Ti
Samsung_SSD_980_1TB (old SSD but working fine), purchase date: 4 years ago
Samsung 990 PRO 4TB (not working, missing), purchase date: 07 Jan 2024
5x 4 TB HDD, 3 of which are WD Red and the others are Seagate Constellation ES.3 (refurbished)

Hey, so my setup was all good initially, but after about 1 or 2 weeks I started getting this annoying filesystem read-only error. And guess what? After a reboot, Unraid says my SSD is missing. The weird thing is, when I jiggle it around a bit (yes, I'm talking about reseating it), it magically works again for another couple of weeks. But hey, I'm on vacation right now and can't do that. So, to avoid the headache, I just removed the SSD for now. It's not like it's doing anything super important anyway.

The first problem listed in the syslog is:

Feb 21 08:00:02 Tower kernel: btrfs_dev_stat_inc_and_print: 66 callbacks suppressed
Feb 21 08:00:02 Tower kernel: BTRFS error (device nvme1n1p1: state EA): bdev /dev/nvme1n1p1 errs: wr 2, rd 585789, flush 0, corrupt 0, gen 0

Yeah, that's the culprit, spamming my logs over and over again. Annoying as heck!
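For anyone who would rather track those counters than read them out of the log spam, here is a minimal Python sketch that pulls the error counts out of a BTRFS line like the one quoted above. The function name and the regular expression are illustrative assumptions written against that single sample line; they are not part of Unraid or btrfs-progs, and other kernel versions may word the message differently.

```python
import re

# Assumed pattern, derived from the one sample line in the post:
# "bdev /dev/nvme1n1p1 errs: wr 2, rd 585789, flush 0, corrupt 0, gen 0"
BTRFS_ERRS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return (device, counters) for a BTRFS error line, or None if it does not match."""
    m = BTRFS_ERRS.search(line)
    if m is None:
        return None
    # Everything except the device name is an integer counter.
    counters = {k: int(v) for k, v in m.groupdict().items() if k != "dev"}
    return m.group("dev"), counters


sample = (
    "Feb 21 08:00:02 Tower kernel: BTRFS error (device nvme1n1p1: state EA): "
    "bdev /dev/nvme1n1p1 errs: wr 2, rd 585789, flush 0, corrupt 0, gen 0"
)
print(parse_btrfs_errs(sample))
```

In the sample line the read counter (rd) is far larger than the write counter (wr), which would fit a device that stopped answering reads rather than one that is silently corrupting data (corrupt is 0).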
JorgeB Posted February 21

Please post the diagnostics.
hellpower Posted February 21

tower-diagnostics-20240221-1338.zip
JorgeB Posted February 21

I don't see the device dropping in those diags. If it happens again, save new ones. In the meantime you can try this:

On the main GUI page click on the flash drive, scroll down to "Syslinux Configuration", make sure it's set to "menu view" (top right) and add this to your default boot option, after "append initrd=/bzroot":

nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

e.g.:

append initrd=/bzroot nvme_core.default_ps_max_latency_us=0 pcie_aspm=off

Reboot and see if it makes a difference.
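After the reboot it can be worth confirming that the kernel actually booted with both parameters, since an edit to the wrong boot entry goes unnoticed otherwise. The sketch below is one way to check, assuming a Python interpreter is available on the machine; the function name missing_params is hypothetical. It reads /proc/cmdline, which on Linux holds the command line the running kernel was started with.

```python
from pathlib import Path

# The two parameters suggested in the post above.
WANTED = ("nvme_core.default_ps_max_latency_us=0", "pcie_aspm=off")


def missing_params(cmdline, wanted=WANTED):
    """Return the wanted parameters that are absent from a kernel command line string."""
    present = set(cmdline.split())
    return [p for p in wanted if p not in present]


if __name__ == "__main__":
    # /proc/cmdline only exists on Linux; skip quietly elsewhere.
    path = Path("/proc/cmdline")
    if path.exists():
        missing = missing_params(path.read_text())
        if missing:
            print("not active:", " ".join(missing))
        else:
            print("both parameters are active")
```

The check compares whole whitespace-separated tokens, so a parameter with a different value (for example pcie_aspm=force) is reported as not active rather than treated as a match.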
hellpower Posted February 21

syslog-20240220-184239.txt

This is the log from yesterday:

Feb 20 02:12:55 Tower kernel: nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
Feb 20 02:12:55 Tower kernel: nvme nvme1: Does your device have a faulty power saving mode enabled?
Feb 20 02:12:55 Tower kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off" and report a bug
Feb 20 02:12:55 Tower kernel: nvme1n1: I/O Cmd(0x2) @ LBA 709300912, 32 blocks, I/O Error (sct 0x3 / sc 0x71)
Feb 20 02:12:55 Tower kernel: I/O error, dev nvme1n1, sector 709300912 op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 2

I have used powertop to lower the power consumption a bit, but I can't see it in there now. Also, the reboot did nothing; it did not come back, at least.
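To see how often the controller has dropped, and whether it keeps dropping after the boot parameters are in place, the "controller is down" lines can be picked out of a saved syslog. The following is a rough Python sketch; the function name and the regular expression are assumptions modeled on the log lines quoted above, so the pattern may need adjusting for other kernel versions or syslog formats.

```python
import re

# Assumed pattern, modeled on the "controller is down" line quoted in the post.
CONTROLLER_DOWN = re.compile(
    r"^(?P<when>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ kernel: nvme (?P<ctrl>nvme\d+): "
    r"controller is down; will reset: CSTS=(?P<csts>0x[0-9a-fA-F]+)"
)


def controller_resets(lines):
    """Return (timestamp, controller, CSTS value) for each controller-down line."""
    events = []
    for line in lines:
        m = CONTROLLER_DOWN.match(line)
        if m:
            events.append((m["when"], m["ctrl"], m["csts"]))
    return events


# Three of the lines from the post; only the first is a controller-down event.
sample_lines = [
    "Feb 20 02:12:55 Tower kernel: nvme nvme1: controller is down; will reset: "
    "CSTS=0xffffffff, PCI_STATUS=0xffff",
    "Feb 20 02:12:55 Tower kernel: nvme nvme1: Does your device have a faulty "
    "power saving mode enabled?",
    "Feb 20 02:12:55 Tower kernel: I/O error, dev nvme1n1, sector 709300912 "
    "op 0x0:(READ) flags 0x80700 phys_seg 3 prio class 2",
]
print(controller_resets(sample_lines))
```

To run it against a real file, pass the open file object instead of sample_lines, for example controller_resets(open("syslog-20240220-184239.txt")). A CSTS value of all ones, as in this log, usually means the register reads are no longer reaching the device at all.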
Michael_P Posted February 21

Did you try what @JorgeB suggested? The log is telling you to try it, too:

Feb 20 02:12:55 Tower kernel: nvme nvme1: Try "nvme_core.default_ps_max_latency_us=0 pcie_aspm=off"
hellpower Posted February 21

Yup, already changed it and did a reboot.
JorgeB Posted February 21

See if it helps. Sometimes it does, especially when the log suggests it at the moment the device went down.