mason1171 Posted April 5, 2021
Unbeknownst to me, over the course of two hours my server logged over 7000 btrfs errors. I frankly have very little idea of what happened. I tried to stop the array as soon as I realized what was happening. After a minute, Unraid threw "Retry unmounting user share(s)". I then booted into safe mode. I don't know where to go from here. Please help me out; I would greatly appreciate it. My very long syslog is attached: syslog-20210404-200343.txt
trurl Posted April 5, 2021
Go to Tools - Diagnostics and attach the complete Diagnostics ZIP file to your NEXT post in this thread.
mason1171 (Author) Posted April 5, 2021 (edited)
Diagnostics attached. These diagnostics do not include the 7000 errors, because I rebooted. M1171-nas-diagnostics-20210404-2307.zip
JorgeB Posted April 5, 2021
Problem was caused by one of the cache devices dropping offline:
Apr 4 18:50:50 M1171-NAS kernel: ata1: softreset failed (1st FIS failed)
Apr 4 18:50:50 M1171-NAS kernel: ata1: limiting SATA link speed to 3.0 Gbps
Apr 4 18:50:50 M1171-NAS kernel: ata1: hard resetting link
Apr 4 18:50:55 M1171-NAS kernel: ata1: softreset failed (1st FIS failed)
Apr 4 18:50:55 M1171-NAS kernel: ata1: reset failed, giving up
Apr 4 18:50:55 M1171-NAS kernel: ata1.00: disabled
There's a known issue with the onboard SATA controller in some Ryzen boards; look for a BIOS update, or use an add-on controller.
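For anyone hunting for the same symptom in their own syslog, here is a minimal Python sketch that lists the ATA ports the kernel gave up on. The function name `dropped_ata_ports` and the two regular expressions are illustrative assumptions based only on the log lines quoted above; other kernel versions may word these messages differently.

```python
import re

# Patterns modelled on the kernel lines quoted in this thread (assumption:
# your kernel words these messages the same way).
DISABLED = re.compile(r"kernel: (ata\d+)\.\d+: disabled\s*$")
GAVE_UP = re.compile(r"kernel: (ata\d+): reset failed, giving up\s*$")


def dropped_ata_ports(lines):
    """Return a sorted list of ATA port names that were disabled or abandoned."""
    ports = set()
    for line in lines:
        for pattern in (DISABLED, GAVE_UP):
            match = pattern.search(line)
            if match:
                ports.add(match.group(1))
    return sorted(ports)


sample = [
    "Apr 4 18:50:50 M1171-NAS kernel: ata1: softreset failed (1st FIS failed)",
    "Apr 4 18:50:50 M1171-NAS kernel: ata1: limiting SATA link speed to 3.0 Gbps",
    "Apr 4 18:50:50 M1171-NAS kernel: ata1: hard resetting link",
    "Apr 4 18:50:55 M1171-NAS kernel: ata1: softreset failed (1st FIS failed)",
    "Apr 4 18:50:55 M1171-NAS kernel: ata1: reset failed, giving up",
    "Apr 4 18:50:55 M1171-NAS kernel: ata1.00: disabled",
]

print(dropped_ata_ports(sample))
```

To run it against a saved syslog, pass the open file object instead of `sample`, since iterating a text file yields one line at a time.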
mason1171 (Author) Posted April 5, 2021
Ah hah, this is a relatively new board. My old board's BIOS was up to date; this one may not be. Thank you very much. I'll look for an update.
mason1171 (Author) Posted April 5, 2021
After updating the BIOS and rebooting into Unraid, I checked the syslog and was met with additional errors. I've stopped the array again. Do you know what my NVMe is doing? More in the diagnostics.
Apr 5 16:17:38 Jared-NAS emhttpd: shcmd (47): mount -t btrfs -o noatime,space_cache=v2,discard=async -U f3f35778-a797-4761-b345-d45e72821985 /mnt/cache
Apr 5 16:17:38 Jared-NAS kernel: BTRFS info (device nvme0n1p1): turning on async discard
Apr 5 16:17:38 Jared-NAS kernel: BTRFS info (device nvme0n1p1): using free space tree
Apr 5 16:17:38 Jared-NAS kernel: BTRFS info (device nvme0n1p1): has skinny extents
Apr 5 16:17:38 M1171-NAS kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 7546413809664 wanted 4193807 found 4193798
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413809664 (dev /dev/sdb1 sector 136423488)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413813760 (dev /dev/sdb1 sector 136423496)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413817856 (dev /dev/sdb1 sector 136423504)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413821952 (dev /dev/sdb1 sector 136423512)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 7546413826048 wanted 4193807 found 4193798
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413826048 (dev /dev/sdb1 sector 136423520)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413830144 (dev /dev/sdb1 sector 136423528)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413834240 (dev /dev/sdb1 sector 136423536)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413838336 (dev /dev/sdb1 sector 136423544)
Apr 5 16:17:38 M1171-NAS kernel: BTRFS error (device nvme0n1p1): parent transid verify failed on 7546413858816 wanted 4193807 found 4193798
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): read error corrected: ino 0 off 7546413858816 (dev /dev/sdb1 sector 136423584)
M1171-nas-diagnostics-20210405-1626.zip
mason1171 (Author) Posted April 5, 2021
Should I also append this line to my syslinux.cfg?
nvme_core.default_ps_max_latency_us=0
JorgeB Posted April 6, 2021
The errors you're seeing are a result of the other pool device dropping earlier; you need to run a scrub. More info here.
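As a rough illustration of why a dropped pool member produces these messages: each "parent transid verify failed" line reports the generation the filesystem expected ("wanted") and the older one it actually read ("found"). The Python sketch below pulls those numbers out of the log; the function name `transid_gaps` is hypothetical, and the regular expression assumes the message format quoted earlier in this thread.

```python
import re

# Modelled on the "parent transid verify failed" lines quoted in this thread.
TRANSID = re.compile(
    r"parent transid verify failed on (?P<logical>\d+) "
    r"wanted (?P<wanted>\d+) found (?P<found>\d+)"
)


def transid_gaps(lines):
    """Return (logical address, wanted, found, wanted - found) per mismatch."""
    results = []
    for line in lines:
        match = TRANSID.search(line)
        if match:
            wanted = int(match.group("wanted"))
            found = int(match.group("found"))
            results.append((int(match.group("logical")), wanted, found, wanted - found))
    return results


sample = [
    "Apr 5 16:17:38 M1171-NAS kernel: BTRFS error (device nvme0n1p1): "
    "parent transid verify failed on 7546413809664 wanted 4193807 found 4193798",
    "Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): "
    "read error corrected: ino 0 off 7546413809664 (dev /dev/sdb1 sector 136423488)",
]

for logical, wanted, found, gap in transid_gaps(sample):
    print(f"block {logical}: device copy is {gap} generations behind ({found} < {wanted})")
```

A "found" value lower than "wanted" is consistent with one device having missed writes while it was offline, which is the situation a scrub is meant to repair from the good copy.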
mason1171 (Author) Posted April 6, 2021
Thanks. Is there anything else I need to know?
JorgeB Posted April 6, 2021
All the info for that is in the link above; feel free to ask if you have any doubts.
mason1171 (Author) Posted April 6, 2021
Much appreciated. I'm truly grateful for your help.
mason1171 (Author) Posted April 6, 2021
After scrubbing the cache, it looks like all errors were corrected. I'll check whether the array needs scrubbing as well.
JorgeB Posted April 6, 2021
15 minutes ago, mason1171 said: After scrubbing the cache, it looks like all errors were corrected
Yep. The array should be fine; at last mount all disks were clean. btrfs shows any accumulated errors at mount time (unless you clear them), e.g., this was from the pool:
Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): bdev /dev/sdb1 errs: wr 283405, rd 112295, flush 0, corrupt 0, gen 0
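To check for accumulated errors like these without reading the whole syslog, the mount-time counter line can be parsed with a few lines of Python. This is only a sketch: `parse_bdev_errs` is a hypothetical helper, and the pattern assumes the exact "bdev ... errs:" wording shown in the line above.

```python
import re

# Modelled on the "bdev ... errs:" line quoted in this thread.
STATS = re.compile(
    r"bdev (?P<dev>\S+) errs: wr (?P<wr>\d+), rd (?P<rd>\d+), "
    r"flush (?P<flush>\d+), corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_bdev_errs(line):
    """Return (device, counters dict) for a btrfs error summary line, else None."""
    match = STATS.search(line)
    if match is None:
        return None
    counters = {
        name: int(value)
        for name, value in match.groupdict().items()
        if name != "dev"
    }
    return match.group("dev"), counters


line = (
    "Apr 5 16:17:38 M1171-NAS kernel: BTRFS info (device nvme0n1p1): "
    "bdev /dev/sdb1 errs: wr 283405, rd 112295, flush 0, corrupt 0, gen 0"
)

device, counters = parse_bdev_errs(line)
nonzero = {name: count for name, count in counters.items() if count}
print(device, nonzero)
```

Any device with non-zero counters is worth a closer look. The counters are cumulative, so they keep their old values until they are reset (for example with `btrfs device stats -z` on the mount point).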