jmbrnt Posted July 5, 2020

Hi,

I woke up this morning to no DNS (I run unbound on my Unraid server). I checked and most of my containers had died. I tried to stop and start them, no go. After a quick Google I removed and regenerated docker.img, which seems to have given me some grace (all containers were deleted from the GUI, but I can re-add them easily and the data persists).

However, it seems the problem is one of my cache drives. It's not showing errors in the GUI, but btrfs commands show errors galore. It's a Dell R720, so I don't think it's a SAS cable, as many of the other drives would be affected too. Possibly the SSD has just carked it; it's nearly 4 years old.

I have attached diagnostics from before a reboot. I also see a lot of these:

Quote:
found 43627390
Jul 5 11:58:15 unraid-2 kernel: repair_io_failure: 7 callbacks suppressed
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799880704 (dev /dev/sdi1 sector 1377280)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799884800 (dev /dev/sdi1 sector 1377288)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799888896 (dev /dev/sdi1 sector 1377296)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799892992 (dev /dev/sdi1

The cache is a pool of 2 drives. Would I be best to remove /dev/sdi and replace it?

Thanks

Attachment: unraid-2-diagnostics-20200705-0845.zip
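As a rough way to see which physical device the corrected reads point at, the quoted kernel messages can be tallied per device. This is only a sketch: the regular expression is an assumption based on the sample lines above, and `tally_corrected_reads` is a hypothetical helper, not part of btrfs or Unraid.

```python
import re
from collections import Counter

# Assumed message shape, taken only from the log lines quoted in the post:
# "BTRFS info (device X): read error corrected: ino N off N (dev /dev/Y ..."
PATTERN = re.compile(
    r"BTRFS info \(device (\S+)\): read error corrected: "
    r"ino \d+ off \d+ \(dev (\S+)"
)


def tally_corrected_reads(log_text):
    """Count 'read error corrected' messages per underlying device."""
    counts = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            # group(2) is the device the bad read came from
            counts[match.group(2)] += 1
    return counts


sample = """\
Jul 5 11:58:15 unraid-2 kernel: repair_io_failure: 7 callbacks suppressed
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799880704 (dev /dev/sdi1 sector 1377280)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799884800 (dev /dev/sdi1 sector 1377288)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799888896 (dev /dev/sdi1 sector 1377296)
"""

counts = tally_corrected_reads(sample)
for device, count in counts.items():
    print(device, count)
```

In the sample, every corrected read comes from /dev/sdi1, even though the message is logged against the pool's first device (sdj1).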
JorgeB Posted July 5, 2020

The syslog doesn't show the start of the problem, but it looks like one of the cache devices dropped offline. Run a correcting scrub and check that all errors were corrected. More info here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/?do=findComment&comment=700582

If the scrub can't fix all errors, or there are more issues, post new diags taken after a reboot.
jmbrnt (Author) Posted July 5, 2020

Thanks @johnnie.black

Quote:
root@unraid-2:~# btrfs dev stats /mnt/cache
[/dev/sdj1].write_io_errs 0
[/dev/sdj1].read_io_errs 0
[/dev/sdj1].flush_io_errs 0
[/dev/sdj1].corruption_errs 0
[/dev/sdj1].generation_errs 0
[/dev/sdi1].write_io_errs 180353467
[/dev/sdi1].read_io_errs 16704619
[/dev/sdi1].flush_io_errs 4373774
[/dev/sdi1].corruption_errs 0
[/dev/sdi1].generation_errs 0

I am now running a scrub, thanks for the tip.

Quote:
UUID: b3f8ccf1-2aa0-4bc6-b798-f79b24fcd561
Scrub started: Sun Jul 5 13:42:06 2020
Status: finished
Duration: 0:12:56
Total to scrub: 302.93GiB
Rate: 399.75MiB/s
Error summary: verify=1068 csum=812422
Corrected: 813490
Uncorrectable: 0
Unverified: 0

Scrub looks OK: all 813490 errors were corrected and none were uncorrectable.

Edited July 5, 2020 by jmbrnt
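For anyone wanting to check output like this without eyeballing it, here is a minimal sketch that parses the `btrfs dev stats` text from the post and lists the devices with non-zero counters. It also checks that the scrub's verify and csum errors add up to the corrected total. The helper names (`parse_dev_stats`, `devices_with_errors`) are hypothetical, and the line format is assumed from the output pasted above.

```python
import re

# Assumed line shape, based on the pasted output: "[/dev/sdX1].counter_name 123"
STAT_LINE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter: value}} from 'btrfs dev stats' text output."""
    stats = {}
    for line in text.splitlines():
        m = STAT_LINE.match(line.strip())
        if m:
            stats.setdefault(m["dev"], {})[m["counter"]] = int(m["value"])
    return stats


def devices_with_errors(stats):
    """List devices that have at least one non-zero error counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


sample = """\
[/dev/sdj1].write_io_errs 0
[/dev/sdj1].read_io_errs 0
[/dev/sdj1].flush_io_errs 0
[/dev/sdj1].corruption_errs 0
[/dev/sdj1].generation_errs 0
[/dev/sdi1].write_io_errs 180353467
[/dev/sdi1].read_io_errs 16704619
[/dev/sdi1].flush_io_errs 4373774
[/dev/sdi1].corruption_errs 0
[/dev/sdi1].generation_errs 0
"""

stats = parse_dev_stats(sample)
print(devices_with_errors(stats))

# Scrub summary from the post: verify + csum errors should equal the corrected total.
verify_errs, csum_errs, corrected = 1068, 812422, 813490
print(verify_errs + csum_errs == corrected)
```

Note that these counters are cumulative, so they will keep showing the old errors after a successful scrub until they are reset (`btrfs dev stats -z /mnt/cache`). Resetting them makes it easier to spot any new errors.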