O_M_R Posted December 9, 2019 (edited)

Hi all,

I'm at a complete loss on this one. It centers around errors like this:

Dec 9 03:44:30 NAS kernel: BTRFS warning (device sdk1): csum failed root 5 ino 453231 off 22727274496 csum 0x70dd045b expected csum 0xe0efe733 mirror 1

I'm hoping someone can help me out, since I've exhausted everything I can think of.

I'm on older hardware: an Asus Sabertooth X58 with an i7 950 and 24GB of RAM, running the latest stable release of unRAID. I used to have an old 120GB SSD as my cache, all on BTRFS, and I got zero errors.

I then got a 1TB Samsung EVO 850 and replaced my cache drive with it. The swap went fine, but afterwards I started noticing these errors in the log. All of the affected files were being downloaded by NZBGet, and I only noticed because I couldn't understand why data that should have been moved off the cache drive was still sitting there. To be clear, I've *never* seen this error for my various docker containers etc., only for files created by NZBGet (so far).

Here is what I've tried:

Second SSD: I bought a 1TB Samsung EVO 860 and put the pair in a BTRFS cache pool, thinking the first SSD might be faulty. I continued to get errors, in the exact same spot on every file on both disks. Weird.

RAM: I tested my RAM, since it's older Corsair and not ECC. I let memtest run for 2 passes (around 13 hours) before I had to get things up and running again. No errors.

Controller: My motherboard has two controllers: 2 SATA3 ports on a Marvell controller, which I've read can be problematic, and some older Intel SATA2 ports. I switched to the SATA2 ports, and the error persisted.

Single drive: Finally, I disconnected the EVO 850, ran just the EVO 860, reformatted it and restored my data. Still more errors.

I believe it's happening mostly on files that NZBGet has repaired, as if the checksum metadata isn't being updated after the repair, but that is a random guess. I'm trying to figure out whether there's some obscure setting I need to change in NZBGet that would make everything right.

I've tried everything I can think of short of building a new server, which is on the radar but not right now. I'd really like to keep both drives together as a cache, since I appreciate the redundancy, but I'm getting close to throwing in the towel on this one.

EDIT: Diagnostics added: nas-diagnostics-20191209-2151.zip

Edited December 9, 2019 by O_M_R (added diagnostics)
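The observation that the errors land "in the exact same spot" on both disks can be checked mechanically. Below is a minimal sketch, assuming the syslog lines follow the format of the single warning quoted in the post; the function names `parse_csum_failures` and `group_by_location` are hypothetical helpers, not part of any btrfs or unRAID tool.

```python
import re
from collections import defaultdict

# Pattern modelled on the one warning line quoted in the post (an assumption:
# other kernel versions may word the message differently).
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<device>\w+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<found>0x[0-9a-f]+) expected csum (?P<expected>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)

def parse_csum_failures(lines):
    """Return one dict per checksum-failure line; other lines are skipped."""
    failures = []
    for line in lines:
        match = CSUM_RE.search(line)
        if match:
            record = match.groupdict()
            for key in ("root", "ino", "off", "mirror"):
                record[key] = int(record[key])
            failures.append(record)
    return failures

def group_by_location(failures):
    """Map (inode, offset) to the set of (device, mirror) pairs reporting it."""
    groups = defaultdict(set)
    for failure in failures:
        groups[(failure["ino"], failure["off"])].add(
            (failure["device"], failure["mirror"])
        )
    return dict(groups)

# The warning quoted in the post, used here as sample input.
SAMPLE = (
    "Dec 9 03:44:30 NAS kernel: BTRFS warning (device sdk1): csum failed "
    "root 5 ino 453231 off 22727274496 csum 0x70dd045b "
    "expected csum 0xe0efe733 mirror 1"
)

if __name__ == "__main__":
    for location, sources in group_by_location(parse_csum_failures([SAMPLE])).items():
        print(location, sorted(sources))
```

If the same (inode, offset) pair shows up for more than one device or mirror, the bad data was probably written identically to both copies, which points away from a single faulty SSD. The inode number can then be mapped back to a file name with something like `find /mnt/cache -inum 453231`, where `/mnt/cache` is an assumed mount point.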
Squid Posted December 9, 2019

You should post your diagnostics, so that people in the know will have all the relevant info.
O_M_R (Author) Posted December 9, 2019

Done! Hopefully I've posted the right stuff. I also wanted to say thanks. I've been reading the forum a lot, as there's lots of great info here for troubleshooting.
JorgeB Posted December 10, 2019

Checksum errors mean data is getting corrupted. The most common cause is bad RAM, but it could also be the controller, board, CPU, etc. It's unlikely to be the SSDs themselves.
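To illustrate why a checksum mismatch implies the data changed somewhere between being checksummed and being read back: btrfs uses CRC32C as its default data checksum, and even a single flipped bit changes the result. The following is a minimal pure-Python sketch of CRC32C for illustration only. It is not btrfs's implementation, and the 4096-byte block size and the sample data are assumptions.

```python
def crc32c(data: bytes) -> int:
    """Bitwise CRC32C (Castagnoli), reflected polynomial 0x82F63B78."""
    crc = 0xFFFFFFFF
    for byte in data:
        crc ^= byte
        for _ in range(8):
            # Shift right one bit; apply the polynomial if the low bit was set.
            crc = (crc >> 1) ^ (0x82F63B78 if crc & 1 else 0)
    return crc ^ 0xFFFFFFFF

# A made-up 4096-byte block standing in for file data (assumed block size).
block = bytes(range(256)) * 16
expected = crc32c(block)

# Simulate corruption in transit (RAM, controller, cable): flip one bit.
corrupted = bytearray(block)
corrupted[100] ^= 0x01
found = crc32c(bytes(corrupted))

if __name__ == "__main__":
    print(f"csum {found:#010x} expected csum {expected:#010x}")
    print("mismatch" if found != expected else "match")
```

The checksum only says that the bytes read back differ from the bytes that were checksummed. It cannot say which component altered them, which is why RAM, controller, board and CPU all remain candidates.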
O_M_R (Author) Posted December 10, 2019

It's just strange that with an older 120GB OCZ I never had a single problem. There are really only three things I can think of at this point, aside from building a new server:

1. The firmware on the Samsungs (or the drive controller) hates old hardware and is causing the issue through some form of incompatibility.
2. There actually isn't any corruption at all, but the Samsung drive is doing something behind the scenes that makes btrfs angry (which may be due to the older hardware).
3. NZBGet is allocating space in a manner that btrfs doesn't like.

I find it odd that the only things getting corrupted are NZBGet downloads. I know the 850 firmware is up to date, but I'm not sure about the 860, since I just ordered it and put it in. The only thing I've got left to try is updating the firmware. I've even tried changing SATA cables!

To be honest, more than anything it just stinks to try to improve the server a bit and run into a headache like this (to be clear, I'm not blaming unRAID at all!). Usenet downloads are one thing, but I have some more important things on there that I'd prefer not to have corrupted.