ryoko227 Posted September 27, 2018

This and multiple variants of this error have been spamming my system log since updating to 6.6.0:

BTRFS warning (device sdb1): csum failed root 5 ino 274 off 62905892864 csum 0x2d8eafc5 expected csum 0xea417956 mirror 1

The device in question is a single SSD mounted outside the array via the go file at startup. Running stats on the device gives:

[/dev/sdb1].write_io_errs 0
[/dev/sdb1].read_io_errs 0
[/dev/sdb1].flush_io_errs 0
[/dev/sdb1].corruption_errs 0
[/dev/sdb1].generation_errs 0

Scrubbing the device also reports 0 errors:

scrub started at Thu Sep 27 14:24:13 2018 and finished after 00:05:14
total bytes scrubbed: 153.88GiB with 0 errors

I have copied all of the data off and reformatted the device. (This did have the added benefit of the UD plug-in now showing the FS, temp, and capacity.) However, the errors continue. I've read quite a few posts related to this, but still haven't found a solution on my own. I admittedly don't know enough about the btrfs file system to understand which file(s)/metadata are causing the error. Any help would be greatly appreciated. Thank you in advance o/

yes-mediaserver-diagnostics-20180927-1436.zip
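For future readers: the output above comes from the standard btrfs-progs tooling. A sketch of the commands that produce it (the device node and mount point here are illustrative, adjust them to your own system):

```shell
# Per-device I/O and corruption counters (the [/dev/sdb1].* lines above)
btrfs device stats /dev/sdb1

# Run a full scrub in the foreground (-B) and print the summary when done
btrfs scrub start -B /mnt/ssd

# The "csum failed" warnings land in the kernel log; ino/off identify
# the affected inode and the byte offset within it
dmesg | grep 'csum failed'
```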
JorgeB Posted September 27, 2018

Those are checksum errors, and if you keep getting them after reformatting there's likely a hardware problem, like bad RAM.
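To illustrate the point: a data checksum is recorded at write time and recompared at read time, so any bit that changes in between (bad RAM, a failing controller, etc.) is caught. A toy sketch of that idea, using sha256sum on an ordinary file purely for illustration (btrfs actually checksums each data block with crc32c, and the file paths here are made up):

```shell
# Write the data and record a checksum, as btrfs does per block at write time
printf 'important data' > /tmp/block.dat
sum_at_write=$(sha256sum /tmp/block.dat | cut -d' ' -f1)

# Simulate a single flipped bit (e.g. from bad RAM) in the stored copy
printf 'imporTant data' > /tmp/block.dat

# On read, recompute and compare; a mismatch is a checksum error
sum_at_read=$(sha256sum /tmp/block.dat | cut -d' ' -f1)
if [ "$sum_at_write" != "$sum_at_read" ]; then
    echo "csum failed"   # analogous to the kernel warning in the OP
fi
```

Note the checksum only detects the corruption; without a second good copy there is nothing to repair from, which is why a single-device filesystem can only report the error.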
ryoko227 Posted September 28, 2018 (edited)

Should I try running memtest or something to troubleshoot it further? I'm thinking that might help with sorting out which piece of hardware is causing it. I'm also going to check whether there is a new BIOS for this specific motherboard. Thank you as always for your help johnnie!

EDIT - Updated the BIOS, as it specifically listed increased memory compatibility. Also took the time to clean the dust out and reseat everything. Still popping the error. I'll run memtest on it later when people aren't using it and see if that identifies the issue.

Edited September 28, 2018 by ryoko227
JorgeB Posted September 28, 2018

You should run memtest. Note that existing errors can't be fixed; you'll need to reformat, or replace all the corrupt files.
ryoko227 Posted October 23, 2018 (edited)

Just a follow-up to this, since I thought it was weird, and also so that when I forget what I did I can check back later, www

So I never got around to running the memtest, partially out of laziness, mostly out of not wanting to stay after hours. I noticed yesterday that the mount point for the drive no longer seemed to have any files or folders in it. So I tried copying the backups over just to see, and it said the disk was full... I decided I would try reformatting the drive (again, just to see), pulled the mount commands from the go file, and rebooted. The drive showed up totally fine in Unassigned Devices, could be mounted with no issues, and all the files and folders were still there. So I set it to automount, changed my VM settings to point to this new mount point, and profit.

EDIT - Even with multiple reboots nothing has changed... weird. Everything seems to be running fine and I have had 0 checksum errors since. Keep in mind, I did the 6.6.3 update right before all of this. So TBH, I don't know what "fixed" it, or even what was ultimately wrong with it. I know I should still run the memtest to verify.

Edited October 23, 2018 by ryoko227
Marshalleq Posted February 25, 2020

Hey, did you ever find out anything further? I've just started getting these too. Personally I suspect it's a typical BTRFS issue - I've heard more than one person say BTRFS often corrupts. But hey, what do I know. It's only BTRFS complaining though, not XFS and not ZFS.
JorgeB Posted February 25, 2020

7 hours ago, Marshalleq said:
I've heard more than one person say BTRFS often corrupts.

I won't argue that btrfs doesn't have its bugs, but I've been using it for a long time, as well as following development on the mailing list, and I've never heard of any data-checksum-related bug; that feature is pretty much bulletproof. That is, a checksum error means the data no longer matches the checksum stored at write time. In Unraid this happens most often with raid-based pools when one of the members drops offline and then comes back online: the old data on that member will be stale and fail checksums, and a scrub will bring it up to date. But if this is happening on a single-device filesystem, then you can be pretty sure data corruption occurred, or there's a hardware problem, like bad RAM.

7 hours ago, Marshalleq said:
It's only BTRFS complaining though, not XFS

Well, XFS would never complain, since it doesn't checksum data; it will happily feed you corrupted data. I also use ZFS on a couple of servers; it's no doubt more stable than btrfs, but it's not perfect, and not as flexible.
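The stale-mirror case described above can be sketched with a toy model: two file "copies" stand in for the raid1 mirror members, and sha256sum again stands in for btrfs's per-block crc32c (everything here is illustrative, not how btrfs stores things on disk):

```shell
# Two mirror copies of the same block, plus a checksum recorded at write time
printf 'payload' > /tmp/copy_a
printf 'payload' > /tmp/copy_b
good_sum=$(sha256sum /tmp/copy_a | cut -d' ' -f1)

# Copy A goes stale, e.g. its device dropped offline and missed a write
printf 'payl0ad' > /tmp/copy_a

# Scrub: verify each copy against the stored checksum and rewrite any
# failing copy from a mirror member that still passes
if [ "$(sha256sum /tmp/copy_a | cut -d' ' -f1)" != "$good_sum" ]; then
    cp /tmp/copy_b /tmp/copy_a
fi
```

This is why a scrub can heal a mirror but not a single device: the repair needs a second copy that still matches the checksum.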
Marshalleq Posted February 25, 2020

4 hours ago, johnnie.black said:
I won't argue that btrfs doesn't have its bugs, if this is happening on a single device filesystem then you can be pretty sure data corruption occurred, or there's a hardware problem, like bad RAM.

It's a BTRFS mirror. I wouldn't have chosen BTRFS if I'd had any other choice. Over the few years I've been using it I've had 3, maybe 4 issues, all different: I think one other where I needed to format and start again, and two related to the mirror not being created as advertised - that's probably an Unraid bug more than a btrfs bug though.

4 hours ago, johnnie.black said:
Well, XFS would never complain, since it doesn't checksum data, it will happily feed you corrupted data.

The context was provided to indicate it wasn't likely a whole-system issue impacting all disks, as one other person on here seemed to think when I dug back into the archives. I'll just invoke the mover and reformat it. I'm half tempted to get rid of the mirror altogether, as I think a single disk will have fewer issues than BTRFS will. But provided it doesn't actually corrupt my data, I'll keep the mirror. Though the jury's out on that one.
JorgeB Posted February 25, 2020

57 minutes ago, Marshalleq said:
It's a BTRFS mirror.

Then, as mentioned, the most likely reason for checksum errors would be one of the devices having dropped offline for some time and then rejoined the pool; if you post the diagnostics we can confirm whether that was the case.
Marshalleq Posted February 25, 2020

Thanks, but it has impacted both the BTRFS mirror and the btrfs docker image. I assume it was caused when I had to force-restart the box the other day due to an Nvidia lockup. I don't really expect otherwise, but btrfs does seem to be pickier about such things, and I would have thought a scrub or repair would have sorted it. But no.
JorgeB Posted February 25, 2020

1 hour ago, Marshalleq said:
on the mirror and the btrfs on the docker image.

Just FYI, the docker image share on Unraid is NOCOW by default, so checksums are disabled there and any corruption can't be fixed with a scrub.
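For anyone wanting to check this on their own system: NOCOW is the `C` file attribute, visible with lsattr. Note that `chattr +C` only takes effect on newly created or empty files, so it can't be toggled on an existing docker image in place; it is usually set on the containing directory so new files inherit it. The paths below are illustrative, not necessarily where your docker image lives:

```shell
# Show attributes; a 'C' among the flags means NOCOW (no CoW, no data checksums)
lsattr /mnt/cache/system/docker/docker.img

# The attribute is typically set on the containing directory, so that
# files created inside it inherit NOCOW
lsattr -d /mnt/cache/system/docker
chattr +C /mnt/cache/system/docker
```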
Marshalleq Posted February 26, 2020 (edited)

Thanks, the saga continues. Fixed all this up, but now getting "transport endpoint is not connected" on /mnt/user. Also lots of segfaults when docker tries to run, and the shares have all disappeared. Fun times. I officially dislike BTRFS - even though I recognise it may not be entirely the fault of BTRFS, I have to blame something until I know better!

Edited February 26, 2020 by Marshalleq
JorgeB Posted March 2, 2020

On 2/26/2020 at 7:37 PM, Marshalleq said:
I officially dislike BTRFS - even though I recognise it may not be entirely the fault of BTRFS - I have to blame something until I know better!

Just for any future reader: as suspected, this wasn't a btrfs problem: https://forums.unraid.net/topic/41333-zfs-plugin-for-unraid/?do=findComment&comment=828132

On 2/29/2020 at 7:21 PM, Marshalleq said:
Edit 2: Memtest confirms I have faulty, or possibly misconfigured, memory. There goes my morning....
Marshalleq Posted March 2, 2020

Yep, your gut was right, it was a faulty memory stick. Through this exercise I have learnt the following:

I don't know how long the memory was faulty - my assumption is many months; the fault was at about 79GB, so it may not always have been used. Also that:

- All file systems were impacted.
- The BTRFS file system had to be formatted to be recovered, because it wouldn't rebalance.
- The BTRFS docker image had to be deleted and recreated (or an older version restored from backup), because it wouldn't repair.
- ZFS pointed me directly at the corrupted file (on my single-disk ZFS volume) so I could restore it, which was nice.
- The ZFS mirror healed with a simple scrub.
- XFS of course just ran an fsck-type thing, so something could still be lingering, but that data is not very important; that's why it's on XFS.

I'd like something more robust, but short of memory errors like this and cold reboots, it's probably pretty safe for what it is. I do believe BTRFS will also point me at the corrupted file; maybe it did, my memory on that is struggling.

It has been a good exercise. It's possible someone with more knowledge of BTRFS could have fixed it, though a corrupted image file didn't sound very fixable to me, so I elected to start again because I don't trust it based on past experience. I could be completely wrong, but that's where I landed.