
Cache drive errors giving "read-only" errors and "Write error" in syslog (BTRFS)


tmchow


I started getting errors on my cache drive when I tried copying some files around. I'm not sure when it started, but it has probably been going on for at least a few days, because the cache drive has accumulated 161GB of files even though the mover is scheduled to run every hour.

 

My cache drive is sdb.

 

Here is a sample of sets of errors from my syslog:

 

May 23 00:03:27 Tower kernel: BTRFS warning (device sdb1): csum failed ino 838502 off 1468112896 csum 4102523892 expected csum 682593604 

 

May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322  
May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322  

 

May 24 18:09:52 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:09:52 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:09:52 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 75, rd 0, flush 0, corrupt 0, gen 0
May 24 18:09:52 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:09:52 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:09:52 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 76, rd 0, flush 0, corrupt 0, gen 0
May 24 18:09:57 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.  
May 24 18:09:57 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440  
May 24 18:09:57 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 77, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:09:57 Tower kernel: loop: Write error at byte offset 3954819072, length 4096.  
May 24 18:09:57 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7724256  
May 24 18:09:57 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 78, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:09:58 Tower logger: find: cannot delete `./media-tv/tv-kids/Kate and Mim/Season 1/Kate.a
ring.Race.720p.iP.WEBRip.AAC2.0.H.264-SynHD.mkv': Read-only file system

 

and

 

May 24 18:28:32 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.  
May 24 18:28:32 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304  
May 24 18:28:32 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 204, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:28:37 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.  
May 24 18:28:37 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440  
May 24 18:28:37 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 205, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:28:37 Tower kernel: loop: Write error at byte offset 3954819072, length 4096.  
May 24 18:28:37 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7724256  
May 24 18:28:37 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 206, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:07 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.  
May 24 18:29:07 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280  
May 24 18:29:07 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 207, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:07 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.  
May 24 18:29:07 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304  
May 24 18:29:07 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 208, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:12 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.  
May 24 18:29:12 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440  
May 24 18:29:12 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 209, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:12 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.  
May 24 18:29:12 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280  
May 24 18:29:12 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 210, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:42 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.  
May 24 18:29:42 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280  
May 24 18:29:42 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 211, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:42 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.  
May 24 18:29:42 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304  
May 24 18:29:42 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 212, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:47 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.  
May 24 18:29:47 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440  
May 24 18:29:47 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 213, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:29:47 Tower kernel: loop: Write error at byte offset 3954819072, length 4096.  
May 24 18:29:47 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7724256  
May 24 18:29:47 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 214, rd 0, flush 0, corrupt 0, gen 0  
May 24 18:30:17 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:30:17 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:30:17 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 215, rd 0, flush 0, corrupt 0, gen 0
May 24 18:30:17 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:30:17 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:30:17 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 216, rd 0, flush 0, corrupt 0, gen 0
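
To see how fast these counters are climbing, here is a minimal Python sketch that pulls the per-device error counters out of syslog lines like the ones above. The regex is modeled only on the lines in this post, and the function name is made up for illustration, so treat it as a starting point rather than a tested tool.

```python
import re

# Matches kernel lines such as:
#   BTRFS: bdev /dev/loop0 errs: wr 216, rd 0, flush 0, corrupt 0, gen 0
ERRS_RE = re.compile(
    r"BTRFS: bdev (?P<dev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def latest_error_counters(lines):
    """Return {device: counters}, keeping the last matching line per device."""
    latest = {}
    for line in lines:
        match = ERRS_RE.search(line)
        if match:
            fields = match.groupdict()
            dev = fields.pop("dev")
            latest[dev] = {name: int(value) for name, value in fields.items()}
    return latest


# Two lines copied from the syslog excerpt above.
sample = [
    "May 24 18:09:52 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 75, rd 0, flush 0, corrupt 0, gen 0",
    "May 24 18:30:17 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 216, rd 0, flush 0, corrupt 0, gen 0",
]
counters = latest_error_counters(sample)
print(counters)
```

In the excerpt only the wr counter moves (75 to 216 in about twenty minutes); rd, flush, corrupt and gen all stay at 0. The same counters should also be readable directly with `btrfs device stats` on the mounted filesystem.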

 

Elsewhere in the log, the mover does seem to have moved files successfully. For example:

 

May 23 08:00:01 Tower logger: ./media-tv/tv/Fear the Walking Dead/Season 2/Fear.the.Walking.Dead.S02E07.Shiva.WEBDL-1080p.mkv
May 23 08:00:01 Tower logger: .d..t...... ./
May 23 08:00:01 Tower logger: .d..t...... media-tv/
May 23 08:00:01 Tower logger: .d..t...... media-tv/tv/
May 23 08:00:01 Tower logger: .d..t...... media-tv/tv/Fear the Walking Dead/
May 23 08:00:01 Tower logger: >f+++++++++ media-tv/tv/Fear the Walking Dead/Season 2/Fear.the.Walking.Dead.S02E07.Shiva.WEBDL-1080p.mkv
May 23 08:00:01 Tower logger: 19649 ?        S      0:00 /bin/bash /usr/local/sbin/mover
May 23 08:00:01 Tower logger: mover already running
May 23 08:00:01 Tower logger: 19649 ?        S      0:00 /bin/bash /usr/local/sbin/mover
May 23 08:00:01 Tower logger: mover already running
May 23 08:00:51 Tower logger: ./media-tv/tv/Fear the Walking Dead/Season 2
May 23 08:00:51 Tower logger: .d..t...... media-tv/tv/Fear the Walking Dead/Season 2/
May 23 08:00:51 Tower logger: ./media-tv/tv/Fear the Walking Dead
May 23 08:00:51 Tower logger: .d..t...... media-tv/tv/Fear the Walking Dead/
May 23 08:00:51 Tower logger: ./media-tv/tv
May 23 08:00:51 Tower logger: .d..t...... media-tv/tv/
May 23 08:00:51 Tower logger: ./media-tv/
May 23 08:00:51 Tower logger: .d..t...... media-tv/
May 23 08:00:51 Tower logger: mover finished

 

Update: This got a bit better after I did a reboot. I found an old thread in the archives that suggested this. The files that couldn't be moved/copied were suddenly able to be copied. However, I'm still periodically getting errors like this in the syslog:

 

May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:32:48 Tower logger: rsync: read errors mapping "/mnt/cache/media-movies/movies/some-movie/movie.mkv": Input/output error (5)
May 24 21:32:48 Tower logger: ERROR: media-movies/movies/some-movie/movie.mkv failed verification -- update retained.
May 24 21:32:48 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

 

It looks like there is corruption on the cache drive for some reason, but only for some files?
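
One way to check the "only some files" impression is to count the checksum warnings per inode. The sketch below is a minimal Python example, assuming the syslog lines look exactly like the `csum failed` lines quoted in this post; the function name is hypothetical.

```python
import re
from collections import Counter

# Matches kernel lines such as:
#   BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum ...
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed "
    r"ino (?P<ino>\d+) off (?P<off>\d+)"
)


def csum_failures_by_inode(lines):
    """Count csum failures per (device, inode) pair."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("ino")))] += 1
    return counts


# Lines copied from the syslog excerpts in this post.
sample = [
    "May 23 00:03:27 Tower kernel: BTRFS warning (device sdb1): csum failed ino 838502 off 1468112896 csum 4102523892 expected csum 682593604",
    "May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322",
    "May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322",
    "May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485",
]
failures = csum_failures_by_inode(sample)
for (dev, ino), count in sorted(failures.items()):
    print(dev, ino, count)
```

Each inode number should correspond to one file. As far as I know, `btrfs inspect-internal inode-resolve <ino> /mnt/cache` can translate an inode number back into a path, which would show exactly which files are affected.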

 

Questions:

1. Is this a common problem?

2. How do I resolve it?

3. Will I lose all the files on the cache drive?

Link to comment

You have some corruption in the file system, for which the Check Disk File systems wiki page was written.  Unfortunately, the repair tools for BTRFS and XFS aren't considered very good, so I can't guarantee that the btrfs scrub command will completely repair your cache drive.  The BTRFS section of that page has additional instructions for redoing the drive, if needed.

 

Is this common?  Well, it's certainly happening more often than we'd like, and a number of users have switched to XFS instead.  BTRFS *should* work though.

 

Will you lose files?  Hopefully not, but I do recommend copying off everything that you can.

Link to comment

 

 

...so I can't guarantee that the btrfs scrub command will completely repair your cache drive.

 

Scrub can only fix errors on a mirrored cache pool; I believe that's not on the wiki. On a single device it will identify errors but can't fix them.
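
Whether scrub has a second copy to repair from depends on the block group profile, which `btrfs filesystem df` reports per type. The following is a minimal Python sketch, assuming output in the usual `Data, single: total=..., used=...` shape; the sample text is invented for illustration (it is not from this system), and the function name and the set of profiles treated as redundant are my own assumptions.

```python
import re

# Profiles that keep a second copy or parity (assumed list, for illustration).
REDUNDANT = {"DUP", "RAID1", "RAID1C3", "RAID1C4", "RAID10", "RAID5", "RAID6"}

# Matches lines such as "Data, single: total=..., used=..."
PROFILE_RE = re.compile(
    r"^(?P<kind>Data|Metadata|System), (?P<profile>\w+):", re.MULTILINE
)


def repairable_by_scrub(df_output):
    """Map each block group type to True if its profile is redundant."""
    result = {}
    for match in PROFILE_RE.finditer(df_output):
        result[match.group("kind")] = match.group("profile").upper() in REDUNDANT
    return result


# Hypothetical output for a single-device filesystem.
sample_df = """\
Data, single: total=200.00GiB, used=161.00GiB
System, DUP: total=32.00MiB, used=16.00KiB
Metadata, DUP: total=1.00GiB, used=512.00MiB
GlobalReserve, single: total=64.00MiB, used=0.00B
"""
profiles = repairable_by_scrub(sample_df)
print(profiles)
```

With a `single` data profile, as in the made-up sample, scrub can report a bad data checksum but has nothing to rebuild the block from, which matches the point above.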

 

I did not know that!  Which means it's even more useless than I thought!

 

Edit: I can't find any corroboration for that.  Can you point me to something definitive?

Link to comment

 

 


https://lime-technology.com/forum/index.php?topic=44400.msg424619#msg424619

 

It's at the end of that post.

Link to comment

 

 

...so I can't guarantee that the btrfs scrub command will completely repair your cache drive.

 

Scrub can only fix errors on a mirrored cache pool; I believe that's not on the wiki. On a single device it will identify errors but can't fix them.

 

That's because scrub isn't actually the same thing as check disk. There is a BTRFS check disk command, "btrfsck".

 

But before that, you should try btrfs restore.
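
To make the suggested order concrete, here is a small Python sketch that only builds and prints the two commands and does not run anything. `/dev/sdb1` is the cache device from the first post; the rescue directory is a hypothetical path, and the function name is made up. `btrfs check` is, as far as I know, the newer spelling of btrfsck, and both tools expect the filesystem to be unmounted.

```python
import shlex


def recovery_commands(device, rescue_dir):
    """Return the commands in the suggested order: copy files out, then check."""
    return [
        # 1. Copy whatever is readable off the unmounted device.
        ["btrfs", "restore", device, rescue_dir],
        # 2. Inspect the filesystem without writing to it.
        ["btrfs", "check", "--readonly", device],
    ]


# "/mnt/disk1/rescue" is a placeholder destination, not a path from this thread.
commands = recovery_commands("/dev/sdb1", "/mnt/disk1/rescue")
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to double-check the device name before running anything by hand.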

Link to comment

This thread is really eye opening.

 

Sounds like for a single cache drive I shouldn't be using BTRFS and should be using XFS instead?

 

Or, even better, it sounds like I should be using a pooled cache drive?

 

 
