tmchow Posted May 25, 2016

I started getting errors on my cache drive when I tried copying some files around. I'm not sure when it started, but it must have been going on for at least a few days, because the cache drive has accumulated 161GB of files and I have the mover scheduled to run every hour. My cache drive is sdb. Here is a sample of the errors from my syslog:

May 23 00:03:27 Tower kernel: BTRFS warning (device sdb1): csum failed ino 838502 off 1468112896 csum 4102523892 expected csum 682593604
May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322
May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322
May 24 18:09:52 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:09:52 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:09:52 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 75, rd 0, flush 0, corrupt 0, gen 0
May 24 18:09:52 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:09:52 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:09:52 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 76, rd 0, flush 0, corrupt 0, gen 0
May 24 18:09:57 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.
May 24 18:09:57 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440
May 24 18:09:57 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 77, rd 0, flush 0, corrupt 0, gen 0
May 24 18:09:57 Tower kernel: loop: Write error at byte offset 3954819072, length 4096.
May 24 18:09:57 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7724256
May 24 18:09:57 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 78, rd 0, flush 0, corrupt 0, gen 0
May 24 18:09:58 Tower logger: find: cannot delete `./media-tv/tv-kids/Kate and Mim/Season 1/Kate.a ring.Race.720p.iP.WEBRip.AAC2.0.H.264-SynHD.mkv': Read-only file system

and

May 24 18:28:32 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:28:32 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:28:32 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 204, rd 0, flush 0, corrupt 0, gen 0
May 24 18:28:37 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.
May 24 18:28:37 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440
May 24 18:28:37 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 205, rd 0, flush 0, corrupt 0, gen 0
May 24 18:28:37 Tower kernel: loop: Write error at byte offset 3954819072, length 4096.
May 24 18:28:37 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7724256
May 24 18:28:37 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 206, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:07 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:29:07 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:29:07 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 207, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:07 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:29:07 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:29:07 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 208, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:12 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.
May 24 18:29:12 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440
May 24 18:29:12 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 209, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:12 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:29:12 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:29:12 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 210, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:42 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:29:42 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:29:42 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 211, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:42 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:29:42 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:29:42 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 212, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:47 Tower kernel: loop: Write error at byte offset 3754209280, length 4096.
May 24 18:29:47 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7332440
May 24 18:29:47 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 213, rd 0, flush 0, corrupt 0, gen 0
May 24 18:29:47 Tower kernel: loop: Write error at byte offset 3954819072, length 4096.
May 24 18:29:47 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7724256
May 24 18:29:47 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 214, rd 0, flush 0, corrupt 0, gen 0
May 24 18:30:17 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.
May 24 18:30:17 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721280
May 24 18:30:17 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 215, rd 0, flush 0, corrupt 0, gen 0
May 24 18:30:17 Tower kernel: loop: Write error at byte offset 3953307648, length 4096.
May 24 18:30:17 Tower kernel: blk_update_request: I/O error, dev loop0, sector 7721304
May 24 18:30:17 Tower kernel: BTRFS: bdev /dev/loop0 errs: wr 216, rd 0, flush 0, corrupt 0, gen 0

In other places, it seems the mover was able to move files successfully. For example:

May 23 08:00:01 Tower logger: ./media-tv/tv/Fear the Walking Dead/Season 2/Fear.the.Walking.Dead.S02E07.Shiva.WEBDL-1080p.mkv
May 23 08:00:01 Tower logger: .d..t...... ./
May 23 08:00:01 Tower logger: .d..t...... media-tv/
May 23 08:00:01 Tower logger: .d..t...... media-tv/tv/
May 23 08:00:01 Tower logger: .d..t...... media-tv/tv/Fear the Walking Dead/
May 23 08:00:01 Tower logger: >f+++++++++ media-tv/tv/Fear the Walking Dead/Season 2/Fear.the.Walking.Dead.S02E07.Shiva.WEBDL-1080p.mkv
May 23 08:00:01 Tower logger: 19649 ? S 0:00 /bin/bash /usr/local/sbin/mover
May 23 08:00:01 Tower logger: mover already running
May 23 08:00:01 Tower logger: 19649 ? S 0:00 /bin/bash /usr/local/sbin/mover
May 23 08:00:01 Tower logger: mover already running
May 23 08:00:51 Tower logger: ./media-tv/tv/Fear the Walking Dead/Season 2
May 23 08:00:51 Tower logger: .d..t...... media-tv/tv/Fear the Walking Dead/Season 2/
May 23 08:00:51 Tower logger: ./media-tv/tv/Fear the Walking Dead
May 23 08:00:51 Tower logger: .d..t...... media-tv/tv/Fear the Walking Dead/
May 23 08:00:51 Tower logger: ./media-tv/tv
May 23 08:00:51 Tower logger: .d..t...... media-tv/tv/
May 23 08:00:51 Tower logger: ./media-tv/
May 23 08:00:51 Tower logger: .d..t...... media-tv/
May 23 08:00:51 Tower logger: mover finished

Update: This got a bit better after a reboot, which an old thread in the archives suggested. The files that couldn't be moved or copied could suddenly be copied.
However, I'm still periodically getting errors like this in the syslog:

May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:31:50 Tower kernel: BTRFS warning (device sdb1): csum failed ino 857048 off 4763451392 csum 2445600135 expected csum 2319853485
May 24 21:32:48 Tower logger: rsync: read errors mapping "/mnt/cache/media-movies/movies/some-movie/movie.mkv": Input/output error (5)
May 24 21:32:48 Tower logger: ERROR: media-movies/movies/some-movie/movie.mkv failed verification -- update retained.
May 24 21:32:48 Tower logger: rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1165) [sender=3.1.0]

It looks like corruption on the cache drive for some reason, but only for some files?

Questions:
1. Is this a common problem?
2. How do I resolve it?
3. Will I lose all the files on the cache drive?
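To see how many distinct files the checksum warnings point at, the syslog lines quoted above can be grouped by inode. The following is only a minimal Python sketch: the function name `csum_failures` and the regular expression are assumptions derived from the log format shown in this post, and other kernel versions may word the message differently.

```python
import re
from collections import Counter

# Pattern inferred from the "csum failed" lines quoted in this thread (an assumption).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<dev>[^)]+)\): csum failed ino (?P<ino>\d+) off (?P<off>\d+)"
)


def csum_failures(lines):
    """Count checksum failures per (device, inode) pair found in syslog lines."""
    counts = Counter()
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            counts[(match.group("dev"), int(match.group("ino")))] += 1
    return counts


# Sample lines copied from the syslog excerpt in the post above.
sample = [
    "May 23 00:03:27 Tower kernel: BTRFS warning (device sdb1): csum failed ino 838502 off 1468112896 csum 4102523892 expected csum 682593604",
    "May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322",
    "May 23 00:04:43 Tower kernel: BTRFS warning (device sdb1): csum failed ino 832301 off 600309760 csum 2262965905 expected csum 902917322",
    "May 24 18:09:52 Tower kernel: loop: Write error at byte offset 3953295360, length 4096.",
]

for (dev, ino), n in sorted(csum_failures(sample).items()):
    print(f"{dev} inode {ino}: {n} failure(s)")
```

Each inode number can then be turned into a path with `btrfs inspect-internal inode-resolve <ino> /mnt/cache`, which shows which files are worth re-downloading or restoring from a backup.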
RobJ Posted May 25, 2016

You have some corruption in the file system, which is what the Check Disk File systems wiki page was written for. Unfortunately, the repair tools for BTRFS and XFS aren't considered very good, so I can't guarantee that the btrfs scrub command will completely repair your cache drive. The BTRFS section of that page has additional instructions for redoing the drive, if needed.

Is this common? Well, it's certainly happening more often than we'd like, and a number of users have switched to XFS instead. BTRFS *should* work though.

Will you lose files? Hopefully not, but I do recommend copying off everything that you can.
JorgeB Posted May 25, 2016

RobJ wrote: "...so I can't guarantee that the btrfs scrub command will completely repair your cache drive."

Scrub can only fix errors on a mirrored cache pool. I believe that's not on the wiki. On a single device it will identify errors, but it can't fix them.
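Since a scrub on a single device can only report errors, a read-only scrub is one way to count them without attempting any repair. The sketch below only builds the command lines and offers a small runner. The helper names are hypothetical, `/mnt/cache` is taken from the rsync path in the first post, and it assumes `btrfs-progs` is installed and the script runs as root.

```python
import subprocess


def scrub_command(mount_point, read_only=True):
    """Build a foreground `btrfs scrub start` command for a mounted filesystem."""
    cmd = ["btrfs", "scrub", "start", "-B"]  # -B: stay in the foreground, print stats at the end
    if read_only:
        cmd.append("-r")  # -r: read-only mode, report errors without trying to repair them
    cmd.append(mount_point)
    return cmd


def device_stats_command(mount_point):
    """Build the command that prints the per-device error counters."""
    return ["btrfs", "device", "stats", mount_point]


def run(cmd):
    """Run a command and return (exit code, combined output) without raising on failure."""
    done = subprocess.run(cmd, capture_output=True, text=True, check=False)
    return done.returncode, done.stdout + done.stderr
```

For example, `run(scrub_command("/mnt/cache"))` followed by `run(device_stats_command("/mnt/cache"))` would show the scrub summary and then the write, read, flush, corruption and generation counters, the same kind of counters the kernel prints for loop0 in the `errs:` syslog lines in the first post.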
tmchow Posted May 25, 2016

So could I copy all files off the cache to the array (a non-cache share), stop the cache drive, reformat it, add the drive back as cache, and copy the files back?
RobJ Posted May 25, 2016

RobJ wrote: "...so I can't guarantee that the btrfs scrub command will completely repair your cache drive."

JorgeB wrote: "Scrub can only fix errors on a mirrored cache pool, I believe that's not on the wiki, using a single device it will identify errors but it can't fix them."

I did not know that! Which means it's even more useless than I thought!

Edit: I can't find any corroboration for that. Can you point me to something definitive?
RobJ Posted May 25, 2016

tmchow wrote: "So could I copy all files off cache to array (non cache share), stop cache drive, reformat, add drive back as cache and copy files back?"

Yes, see Redoing a drive formatted with BTRFS
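When copying everything off a drive with checksum errors, it helps if one unreadable file does not abort the whole copy. The following is a minimal Python sketch of that idea, not the procedure from the wiki: `salvage_copy` is a hypothetical helper that copies what it can and records each file that fails, for example with the Input/output error rsync reported earlier in the thread.

```python
import contextlib
import os
import shutil


def salvage_copy(src_root, dst_root):
    """Copy every readable file under src_root to dst_root.

    Returns a list of (source path, error message) for files that could not be copied.
    """
    failed = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            try:
                shutil.copy2(src, dst)  # copies data plus timestamps and permissions
            except OSError as exc:
                failed.append((src, str(exc)))
                # Remove a partially written destination file, if one was left behind.
                with contextlib.suppress(OSError):
                    os.remove(dst)
    return failed
```

A call such as `salvage_copy("/mnt/cache", "/mnt/disk1/cache-backup")` would do the copy-off step. The destination path is only a hypothetical example of an array disk, and the returned list shows which files need to be restored from another source before the drive is reformatted.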
JorgeB Posted May 25, 2016

JorgeB wrote: "Scrub can only fix errors on a mirrored cache pool, I believe that's not on the wiki, using a single device it will identify errors but it can't fix them."

RobJ wrote: "I did not know that! Which means it's even more useless than I thought! Edit: I can't find any corroboration for that. Can you point me to something definitive?"

https://lime-technology.com/forum/index.php?topic=44400.msg424619#msg424619

See the end of the post.
RobJ Posted May 25, 2016

JorgeB wrote: https://lime-technology.com/forum/index.php?topic=44400.msg424619#msg424619

No idea how I missed that discussion! Thank you. Rather disheartening though. I've responded there concerning this. I don't know how I can recommend BTRFS for single disk use! I'll certainly need to update the wiki.
gundamguy Posted May 25, 2016

JorgeB wrote: "Scrub can only fix errors on a mirrored cache pool, I believe that's not on the wiki, using a single device it will identify errors but it can't fix them."

That's because scrub isn't actually the same thing as check disk. There is a BTRFS check disk command, "btrfsck". But before that you should try btrfs restore.
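The order suggested here, restore first and check second, can be written down as a small plan. This is only a sketch under stated assumptions: `recovery_plan` is a hypothetical helper, `/dev/sdb1` is the cache device named in the syslog, the restore directory is a made-up example, and both commands are normally run against an unmounted device.

```python
def recovery_plan(device, restore_dir):
    """Return the commands to try, in order: salvage files first, then a read-only check."""
    return [
        # -D: dry run, only list what would be restored; -v: verbose output
        ["btrfs", "restore", "-D", "-v", device, restore_dir],
        # --readonly: inspect the filesystem without writing anything to it
        ["btrfs", "check", "--readonly", device],
    ]


# Hypothetical example values; the restore directory is not from the thread.
for step in recovery_plan("/dev/sdb1", "/mnt/disk1/restore"):
    print(" ".join(step))
```

Dropping `-D` from the first command performs the actual restore into the target directory. Repair options of `btrfs check` are deliberately left out of this sketch because they modify the filesystem.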
tmchow Posted May 26, 2016

This thread is really eye-opening. It sounds like for a single cache drive I shouldn't be using BTRFS and should use XFS instead? Or, even better, it sounds like I should be using a cache pool?