Cache drive IO errors, BTRFS errors and broken Docker containers (SOLVED)



I have an SSD as a cache drive connected to an ASM1061 SATA controller, which is connected to the motherboard's PCIe 2.0 x1 slot through a PCIe riser. I get errors mostly at night, usually after 3-4 days, and they also break some Docker containers (a simple Docker restart does not work; I need to reboot the NAS, re-create docker.img, etc.).

 

I previously asked for advice on Reddit and tried to solve it, but apparently that did not fix it. For example, I get errors like these repeating continuously until I restart my NAS:

Jun 12 04:39:55 BokunoNAS kernel: blk_update_request: I/O error, dev sdc, sector 7410400 op 0x0:(READ) flags 0x0 phys_seg 4 prio class 0
Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 39, rd 51253, flush 0, corrupt 0, gen 0
Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 39, rd 51254, flush 0, corrupt 0, gen 0
Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 39, rd 51255, flush 0, corrupt 0, gen 0
Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 39, rd 51256, flush 0, corrupt 0, gen 0
Jun 12 04:39:55 BokunoNAS kernel: BTRFS warning (device sdc1): direct IO failed ino 29048 rw 0,0 sector 0x7112e8 len 0 err no 10
Jun 12 04:39:55 BokunoNAS kernel: BTRFS warning (device sdc1): direct IO failed ino 29048 rw 0,0 sector 0x7112f0 len 0 err no 10
Jun 12 04:39:55 BokunoNAS kernel: BTRFS warning (device sdc1): direct IO failed ino 29048 rw 0,0 sector 0x7112f8 len 0 err no 10
Jun 12 04:39:55 BokunoNAS kernel: BTRFS warning (device sdc1): direct IO failed ino 29048 rw 0,0 sector 0x711300 len 0 err no 10
Jun 12 04:39:55 BokunoNAS kernel: blk_update_request: I/O error, dev loop2, sector 755936 op 0x0:(READ) flags 0x1000 phys_seg 4 prio class 0
Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 15817, flush 0, corrupt 0, gen 0
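
For anyone trying to follow how fast these counters climb, here is a minimal Python sketch that pulls the per-device error counters out of syslog lines shaped like the ones above. The regular expression is written only against the lines quoted in this post, so treat the pattern as an assumption about the message format; `latest_counters` and `SAMPLE_LINES` are names made up for this example.

```python
import re

# Pattern based solely on the log lines quoted in this post (an assumption,
# not an official description of the BTRFS message format).
COUNTER_RE = re.compile(
    r"BTRFS error \(device (?P<dev>[^)]+)\): bdev \S+ errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)

def latest_counters(lines):
    """Return {device: {counter_name: value}} from the last matching line per device."""
    result = {}
    for line in lines:
        match = COUNTER_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            # Later lines overwrite earlier ones, so the newest counters win.
            result[dev] = {name: int(value) for name, value in fields.items()}
    return result

# Three of the lines from the excerpt above, copied verbatim.
SAMPLE_LINES = [
    "Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 39, rd 51253, flush 0, corrupt 0, gen 0",
    "Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device sdc1): bdev /dev/sdc1 errs: wr 39, rd 51256, flush 0, corrupt 0, gen 0",
    "Jun 12 04:39:55 BokunoNAS kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 9, rd 15817, flush 0, corrupt 0, gen 0",
]

for device, counters in latest_counters(SAMPLE_LINES).items():
    print(device, counters)
```

In these lines only the write and read counters are non-zero while corrupt and gen stay at 0, which suggests failed I/O to the device rather than checksum mismatches. The same counters should also be readable with `btrfs device stats` on the mounted pool.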

I have also attached my diagnostics. I would really appreciate any help, since this bug makes me hesitant to fully use my newly built NAS.

bokunonas-diagnostics-20210612-0424.zip

UPD. "Fix Common Problems" plugin gives:

Quote

 

Unable to write to cache    Drive mounted read-only or completely full. Begin Investigation Here: 

Unable to write to Docker Image     Docker Image either full or corrupted. Investigate Here: 

 

As far as I remember, I did not have this problem when I used the motherboard's SATA ports, so I am guessing that something is wrong with either the SATA controller or even the PCIe riser(?)... I could ditch the SATA controller for a while, but in the end I would like to have at least 5 SATA ports (the motherboard only has 4).

Edited by Volkerball
UPD, SOLVED

I checked the connections and even zip-tied the PCIe riser connections. I have not tried a different SATA cable, though (is that a common problem?). I cannot check immediately whether changing the cable helps, since this problem usually occurs 3-4 days after a reboot. It is kind of bizarre that everything works fine for several days and then errors suddenly occur without an obvious (at least to me) trigger.

 

UPD. Turned the NAS off, swapped the SATA cable, turned it on, and started the array. I have no idea whether that solved the problem or not; attaching my diagnostics file anyway.

bokunonas-diagnostics-20210612-0528.zip

Edited by Volkerball
UPD and attached updated diagnostics

Cache device dropped offline:

 

Jun 12 07:00:22 BokunoNAS kernel: ata3: hard resetting link
Jun 12 07:00:57 BokunoNAS kernel: ata3: softreset failed (1st FIS failed)
Jun 12 07:00:57 BokunoNAS kernel: ata3: limiting SATA link speed to 3.0 Gbps
Jun 12 07:00:57 BokunoNAS kernel: ata3: hard resetting link
Jun 12 07:01:02 BokunoNAS kernel: ata3: softreset failed (1st FIS failed)
Jun 12 07:01:02 BokunoNAS kernel: ata3: reset failed, giving up
Jun 12 07:01:02 BokunoNAS kernel: ata3.00: disabled
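
To check whether the same port keeps dropping in later diagnostics, a small Python sketch like the following can tally these link events per ATA port. It matches only the message wordings visible in the excerpt above, so the patterns are an assumption rather than a complete list of libata messages; `summarize_ata_events` and `SAMPLE_LINES` are hypothetical names for this example.

```python
import re
from collections import defaultdict

# Captures the port name (e.g. "ata3") and the message text. A device suffix
# such as ".00" is folded into the port so "ata3.00" counts under "ata3".
ATA_RE = re.compile(r"kernel: (?P<port>ata\d+)(?:\.\d+)?: (?P<msg>.*)$")

def summarize_ata_events(lines):
    """Tally link reset events per ATA port from syslog lines."""
    summary = defaultdict(lambda: {
        "hard_resets": 0,
        "softreset_failures": 0,
        "speed_limited": False,
        "gave_up": False,
        "disabled": False,
    })
    for line in lines:
        match = ATA_RE.search(line.rstrip())
        if not match:
            continue
        entry = summary[match.group("port")]
        msg = match.group("msg")
        if msg.startswith("hard resetting link"):
            entry["hard_resets"] += 1
        elif msg.startswith("softreset failed"):
            entry["softreset_failures"] += 1
        elif msg.startswith("limiting SATA link speed"):
            entry["speed_limited"] = True
        elif "giving up" in msg:
            entry["gave_up"] = True
        elif msg == "disabled":
            entry["disabled"] = True
    return dict(summary)

# The lines from the excerpt above, copied verbatim.
SAMPLE_LINES = [
    "Jun 12 07:00:22 BokunoNAS kernel: ata3: hard resetting link",
    "Jun 12 07:00:57 BokunoNAS kernel: ata3: softreset failed (1st FIS failed)",
    "Jun 12 07:00:57 BokunoNAS kernel: ata3: limiting SATA link speed to 3.0 Gbps",
    "Jun 12 07:00:57 BokunoNAS kernel: ata3: hard resetting link",
    "Jun 12 07:01:02 BokunoNAS kernel: ata3: softreset failed (1st FIS failed)",
    "Jun 12 07:01:02 BokunoNAS kernel: ata3: reset failed, giving up",
    "Jun 12 07:01:02 BokunoNAS kernel: ata3.00: disabled",
]

for port, events in summarize_ata_events(SAMPLE_LINES).items():
    print(port, events)
```

A port that ends up with `disabled` set has been dropped by the kernel, and the drive behind it will not respond again until the link is re-established, which matches needing a reboot here.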

 

If this happened with the new cable, try a different port, and make sure you also replace or swap the power cable. If that fails, try a new device, if one is available.


OK, I probably owe an update. I frankensteined my case (an NSC-400 copy, so it cannot normally accommodate PCIe cards) to plug the PCIe-SATA card in directly and take the PCIe riser out of the equation (you can never be sure about Chinese engineering). That did not help, so I ordered another ASMedia 1061 card with a different PCB design (SATA ports perpendicular to the motherboard, not parallel as on the previous one) just to be sure. It has been working fine for almost 48 hours, so I am 90% sure that it solved the problem. Thank you all for the suggestions.

