sparkus Posted May 18, 2017
So I haven't been checking the system log, and I just did, and I found that this error has been happening for about a month or so. I read about some of the file system repair tools, but I'm not sure how to figure out which device is md1. I'm gathering I should do a BTRFS scrub on this, but any other advice on how to possibly solve it would be awesome sauce.
Apr 17 12:14:41 Tower kernel: BTRFS critical (device md1): corrupt leaf, slot offset bad: block=746107731968, root=1, slot=6
tower-diagnostics-20170518-1404.zip
JorgeB Posted May 18, 2017
md1 is disk1. Make sure you have backups of anything important on disk1; btrfs fsck is getting better all the time, but it can still sometimes make things worse. Also, before running it, run:
btrfs dev stats /dev/md1
Post the output if it doesn't come up all zeros.
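For reference, `btrfs dev stats` prints one counter per line (name, then value), so a small filter can flag any non-zero counter. The helper below is a hypothetical sketch, not part of btrfs itself:

```shell
#!/bin/sh
# Hypothetical helper: reads `btrfs dev stats` output on stdin,
# prints any non-zero counters, and exits non-zero if it found one.
stats_clean() {
  awk '$2 != 0 { print "non-zero counter: " $0; bad = 1 } END { exit bad }'
}

# On the live system you would run something like:
#   btrfs dev stats /dev/md1 | stats_clean && echo "all zeros"
```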
SSD Posted May 18, 2017
@johnnie.black - Is this you?? Looks like 2 leaves are corrupt on this BTRFS tree!
sparkus Posted May 18, 2017
[/dev/md1].write_io_errs 0
[/dev/md1].read_io_errs 0
[/dev/md1].flush_io_errs 0
[/dev/md1].corruption_errs 0
[/dev/md1].generation_errs 0
Running scrub now.
JorgeB Posted May 18, 2017
Scrubbing won't help here; you need btrfs check. You can use the GUI (it's below scrub). Don't forget the backups, though.
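For anyone following along on the command line: a read-only check reports an "Errors found in ..." line when something is wrong, and a summary line ending in "err is 0". A hypothetical helper (plain grep, nothing btrfs-specific) that decides whether a saved check log is clean might look like:

```shell
#!/bin/sh
# Hypothetical helper: succeeds only if a `btrfs check` log reports
# "err is 0" and contains no "Errors found" line.
check_clean() {
  log=$(cat)   # buffer stdin so it can be searched twice
  printf '%s\n' "$log" | grep -q 'err is 0' &&
    ! printf '%s\n' "$log" | grep -q 'Errors found'
}

# Typical use on the server (array started in maintenance mode):
#   btrfs check --readonly /dev/md1 | tee check.log
#   check_clean < check.log && echo "filesystem looks clean"
```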
sparkus Posted May 18, 2017
I have a parity drive; are you insinuating I need a better backup than that?
JorgeB Posted May 18, 2017
Since parity is not a backup, definitely yes.
JonathanM Posted May 18, 2017
6 minutes ago, sparkus said: I have a parity drive, are you insinuating I need a better backup than that?
Parity is in NO WAY a backup. All operations on the drive, including those that corrupted the filesystem, are faithfully reproduced in real time in parity. If you pull the physical drive and rebuild from parity, it will put back exactly the same corruption. Parity is there to allow you to replicate completely what is on a missing drive, corrupt or not.
sparkus Posted May 18, 2017
Thanks for the info. What's the best way to back up an individual drive then?
SSD Posted May 19, 2017
3 hours ago, sparkus said: Thanks for the info. What's the best way to backup an individual drive then?
Depends where you want to back it up to. It can be as easy as copying the data over the network to a different machine. You could also install a drive in the unRAID server, mount it outside the array, and copy the data to it (that would be the fastest).
If you only want to back up unique works, like pictures, home movies, etc., you can back up to an online service. Gridrunner (a.k.a. SpaceInvader One) had what looked like a pretty good video on how to configure that directly from unRAID. I haven't tried it myself, but it is on my list. You could back up a whole drive online, but uploading could take months, depending on your upload speed. At some point it is just not practical to use online storage for backups, even if they are "unlimited".
Recovery of non-unique works should be possible from commercial sources (re-rip, re-download, re-record, etc.), so backups of those may not be as critical, but recovery could take a very long time should you experience data loss.
It is a complex formula of risk, cost, ability to recover, time to restore, and criticality of data that drives the "what to back up" decision each user has to evaluate for himself. Although unRAID is not a backup, the parity protection features of unRAID do affect the risk of data loss, and may weigh on your decision. For example, you might say that it would cost $1000 in drives to back up your array. If the risk of data loss is 10%, and recovery would take you a month and cost you $5000, you might decide it is worth the $1000 to avoid the risk. But with unRAID, if you put the risk of data loss at only 2%, you might decide it is not worth $1000; that is, you'd accept the 2% risk of having to spend 5x more.
But for anything that is unrecoverable and has a high value to you, by all means back that up, preferably to multiple destinations including an offsite option.
sparkus Posted May 19, 2017
Thank you all for answering my questions. If I just have a btrfs error on one drive, can I back up just that one drive's worth of data for the recovery? Or should I have a total backup of all the information? An offsite total backup at my upload speed would take a couple of years, but a single drive I could probably afford to back up.
JorgeB Posted May 19, 2017
For this case you'd only need to back up disk1, just in case something goes wrong. Most likely btrfs check can repair the filesystem, and even if it can't and the disk becomes unmountable, there are other tools to try to recover the data. I just wanted to make sure you understand there are risks before proceeding, and that you should always have backups of any irreplaceable data.
sparkus Posted May 27, 2017
So I mounted a backup device (this time formatted xfs) and backed up the drive in question. I started the array in maintenance mode, then ran check --readonly, which produced this result:

Checking filesystem on /dev/md
checking extents
incorrect offsets 15996 43
bad block 746107731968
incorrect offsets 15996 43
Errors found in extent allocation tree or chunk allocation

Then I ran check --repair, which produced the following:

enabling repair mode
Checking filesystem on /dev/md1
UUID: ebc5a243-841f-47d6-890a-08aed9f8fd23
checking extents
incorrect offsets 15996 43
Shifting item nr 7 by 15953 bytes in block 745197944832
Fixed 0 roots.
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 2019719933952 bytes used err is 0
total csum bytes: 1969407612
total tree bytes: 2099068928
total fs tree bytes: 17039360
total extent tree bytes: 18350080
btree space waste bytes: 71212365
file data blocks allocated: 2017638678528 referenced 2017602125824

I don't know what most of this means, so I ran check --readonly again, which produced this result:

Checking filesystem on /dev/md1
UUID: ebc5a243-841f-47d6-890a-08aed9f8fd23
checking extents
checking free space cache
cache and super generation don't match, space cache will be invalidated
checking fs roots
checking csums
checking root refs
found 2019273338880 bytes used err is 0
total csum bytes: 1969407612
total tree bytes: 2098462720
total fs tree bytes: 17039360
total extent tree bytes: 18350080
btree space waste bytes: 71171756
file data blocks allocated: 2017192771584 referenced 2017156218880

I took this as a good sign, and am running a parity check now. Thanks for all the help. I'm just documenting this in case someone else stumbles upon it later.
JorgeB Posted May 27, 2017
Seems fixed. It's a good idea to run a scrub to check that all checksums are OK, and to run check --readonly one more time in a week or so to confirm there are no more issues.
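For the scrub step, something like the sketch below. The parsing helper is hypothetical and assumes the "with N errors" summary line that `btrfs scrub status` printed in btrfs-progs of that era:

```shell
#!/bin/sh
# Hypothetical helper: succeeds if a `btrfs scrub status` report on
# stdin shows zero errors ("... with 0 errors" summary format).
scrub_ok() {
  grep -q 'with 0 errors'
}

# On the server:
#   btrfs scrub start -B /dev/md1    # -B: stay in foreground until done
#   btrfs scrub status /dev/md1 | scrub_ok && echo "scrub clean"
```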