Cache read-only and other general disasters


Hi

 

I woke up this morning to no DNS (I run unbound on my Unraid server). I checked and most of my containers had died. I tried stopping and starting them, but no go.

 

After a quick Google I removed and regenerated docker.img, which seems to have bought me some grace (all containers were deleted from the GUI, but I can re-add them easily and the data persists). However, the real problem seems to be one of my cache drives. It's not showing errors in the GUI, but btrfs commands show errors galore.

 

It's a Dell R720, so I don't think it's a SAS cable, since a bad cable would affect many of the other drives too. Possibly the SSD has just carked it; it's nearly 4 years old.

 

I have attached diagnostics from prior to a reboot.

 

I also see a lot of these:

 

Quote

found 43627390
Jul 5 11:58:15 unraid-2 kernel: repair_io_failure: 7 callbacks suppressed
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799880704 (dev /dev/sdi1 sector 1377280)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799884800 (dev /dev/sdi1 sector 1377288)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799888896 (dev /dev/sdi1 sector 1377296)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799892992 (dev /dev/sdi1
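
To see which pool member the corrected reads are coming from, the syslog lines above can be tallied per device. This is only a rough sketch: the function name and the regular expression are my own, and the pattern assumes the message layout shown in the quote. The truncated last line is skipped because it has no sector field.

```python
import re
from collections import Counter

# Log lines copied from the quote above; the last one is truncated in the post.
LOG = """\
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799880704 (dev /dev/sdi1 sector 1377280)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799884800 (dev /dev/sdi1 sector 1377288)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799888896 (dev /dev/sdi1 sector 1377296)
Jul 5 11:58:15 unraid-2 kernel: BTRFS info (device sdj1): read error corrected: ino 0 off 1799892992 (dev /dev/sdi1
"""

# Assumed layout: "... read error corrected: ... (dev <path> sector <n>)"
PATTERN = re.compile(
    r"BTRFS info \(device (\S+)\): read error corrected: .*?\(dev (\S+) sector (\d+)\)"
)


def corrected_reads_by_device(log_text):
    """Count complete 'read error corrected' messages per underlying device."""
    return Counter(match.group(2) for match in PATTERN.finditer(log_text))


if __name__ == "__main__":
    for device, count in corrected_reads_by_device(LOG).items():
        print(device, count)
```

In this excerpt every complete message points at /dev/sdi1, which is consistent with one pool member being at fault rather than the whole pool.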

 

The cache is a pool of 2 drives - would I be best to remove /dev/sdi and replace it?

 

Thanks

unraid-2-diagnostics-20200705-0845.zip

Thanks @johnnie.black

 

Quote

root@unraid-2:~# btrfs dev stats /mnt/cache
[/dev/sdj1].write_io_errs    0
[/dev/sdj1].read_io_errs     0
[/dev/sdj1].flush_io_errs    0
[/dev/sdj1].corruption_errs  0
[/dev/sdj1].generation_errs  0
[/dev/sdi1].write_io_errs    180353467
[/dev/sdi1].read_io_errs     16704619
[/dev/sdi1].flush_io_errs    4373774
[/dev/sdi1].corruption_errs  0
[/dev/sdi1].generation_errs  0
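
For anyone wanting to check this automatically, here is a minimal sketch that parses output in the format shown above and lists devices with any non-zero counter. The helper names are hypothetical, and the sample text is pasted from this post rather than read from a live system.

```python
import re

# Sample copied from the post; on a live system this text would come from
# running `btrfs dev stats /mnt/cache`.
SAMPLE = """\
[/dev/sdj1].write_io_errs    0
[/dev/sdj1].read_io_errs     0
[/dev/sdj1].flush_io_errs    0
[/dev/sdj1].corruption_errs  0
[/dev/sdj1].generation_errs  0
[/dev/sdi1].write_io_errs    180353467
[/dev/sdi1].read_io_errs     16704619
[/dev/sdi1].flush_io_errs    4373774
[/dev/sdi1].corruption_errs  0
[/dev/sdi1].generation_errs  0
"""

# One line looks like: [<device>].<counter>   <value>
LINE_RE = re.compile(r"^\[(?P<dev>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def parse_dev_stats(text):
    """Return {device: {counter: value}} for every line that matches the format."""
    stats = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:
            stats.setdefault(match["dev"], {})[match["counter"]] = int(match["value"])
    return stats


def failing_devices(stats):
    """Return the sorted list of devices with at least one non-zero counter."""
    return sorted(dev for dev, counters in stats.items() if any(counters.values()))


if __name__ == "__main__":
    for dev in failing_devices(parse_dev_stats(SAMPLE)):
        print(dev)
```

With the numbers from this post, only /dev/sdi1 is flagged; /dev/sdj1 is clean on all five counters.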

 

I am now running a scrub, thanks for the tip.

 

 

Quote

UUID: b3f8ccf1-2aa0-4bc6-b798-f79b24fcd561
Scrub started: Sun Jul 5 13:42:06 2020
Status: finished
Duration: 0:12:56
Total to scrub: 302.93GiB
Rate: 399.75MiB/s
Error summary: verify=1068 csum=812422
Corrected: 813490
Uncorrectable: 0
Unverified: 0

 

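Scrub looks OK: all 813490 errors it found (verify=1068 plus csum=812422) were corrected, and none were uncorrectable.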
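
The arithmetic behind that reading can be checked with a small sketch. It assumes the summary layout quoted above (an "Error summary:" field with `name=count` pairs, followed by "Corrected:" and "Uncorrectable:"), and the function name is made up for illustration.

```python
import re

# Error section copied from the scrub status quoted above.
SUMMARY = (
    "Error summary: verify=1068 csum=812422 "
    "Corrected: 813490 Uncorrectable: 0 Unverified: 0"
)


def scrub_counts(text):
    """Return (detected, corrected, uncorrectable) from a scrub status text.

    'detected' is the sum of the name=count pairs in the error summary.
    Raises ValueError if the text does not have the assumed layout.
    """
    match = re.search(
        r"Error summary:\s*(.*?)\s*Corrected:\s*(\d+)\s*Uncorrectable:\s*(\d+)",
        text,
        re.S,  # let the fields sit on separate lines as well as on one line
    )
    if match is None:
        raise ValueError("unexpected scrub status format")
    detected = sum(int(n) for n in re.findall(r"=(\d+)", match.group(1)))
    return detected, int(match.group(2)), int(match.group(3))


if __name__ == "__main__":
    detected, corrected, uncorrectable = scrub_counts(SUMMARY)
    print(detected, corrected, uncorrectable)
    print("clean" if detected == corrected and uncorrectable == 0 else "needs attention")
```

A scrub that repairs everything only says the data is consistent again now; it says nothing about whether the device that produced the errors can be trusted going forward.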

Edited by jmbrnt