February 20, 2020
I keep getting btrfs corruption on the NVMe cache disk after about 3 days of uptime. dmesg shows lots of btrfs errors. I also get 30-50 errors on parity checks (one 4 TB WD Red with xfs for the array and one for parity). Memtest has been running for 18 hours with no errors. Any ideas? The first time the cache disk crashed I removed it as cache and rsynced it to another SSD I had lying around (a handful of unimportant files wouldn't copy). I reformatted the NVMe as btrfs, rsynced the stuff back over, and added it back as cache.
Ryzen 9 3900X, 64 GB RAM, 1 TB NVMe, 2x 4 TB WD Red
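For anyone triaging similar symptoms, a rough way to summarize the btrfs messages in a dmesg dump is sketched below. It is an illustration, not an Unraid tool: the function name `summarize_btrfs_log` and the sample log lines are made up for the example, and the only assumption about the log format is that kernel btrfs messages contain `BTRFS` followed by a severity word such as `error` or `warning`.

```python
import re
from collections import Counter

# Kernel btrfs messages look like "BTRFS error (device nvme0n1p1): ...".
# The match is deliberately loose, since exact wording varies by kernel version.
BTRFS_LINE = re.compile(r"BTRFS (critical|error|warning|info)\b", re.IGNORECASE)


def summarize_btrfs_log(log_text):
    """Count btrfs kernel messages per severity in a dmesg dump."""
    counts = Counter()
    for line in log_text.splitlines():
        match = BTRFS_LINE.search(line)
        if match:
            counts[match.group(1).lower()] += 1
    return counts


# Hypothetical sample lines, invented for this example (not from the thread).
SAMPLE_LOG = """\
[  100.0] BTRFS info (device nvme0n1p1): disk space caching is enabled
[  200.0] BTRFS warning (device nvme0n1p1): csum failed root 5 ino 257 off 0 mirror 1
[  300.0] BTRFS error (device nvme0n1p1): bdev /dev/nvme0n1p1 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
[  400.0] EXT4-fs (sda1): mounted filesystem
"""

if __name__ == "__main__":
    print(dict(summarize_btrfs_log(SAMPLE_LOG)))
```

To run it against a live system, the sample text could be replaced with the output of `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`. A growing count of `error` or `warning` lines after a fresh format points toward hardware rather than a one-off filesystem problem.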
February 20, 2020
I had this with my cache, which was 2 x 512 GB Toshiba SSDs, on my 3900X when I first set it up and was testing. I had my RAM running at 3600 MHz, and the system was not stable. What do you have your RAM running at? Try backing it off to 2667 and see if things settle down. Also turn off PBO in the BIOS and see if that helps. I was able to get my system stable at 3200 MHz RAM speed, but 3600 just never worked: system lockups, and my cache kept getting corrupted.
Also, update your BIOS if it is not already on the latest version. Pull your diags and post them up so we can take a look.
February 20, 2020 - Author
I couldn't get my 3600 RAM to run at all at 3600 and had to drop to 3200. I will drop it more, but shouldn't I be seeing errors from memtest? 21 hours now, no errors. I will upload some configs when I am done memtesting; going to let it go a little longer. I will try to find PBO too before booting back up. Thanks for the reply. This is so frustrating. (btw the mobo is an ASRock X570 Taichi)
February 20, 2020 - Community Expert
3 hours ago, uek2wooF said: "64 gb ram"
4 DIMMs @ 3200 is an overclock, see here for more info: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-543490
February 21, 2020 - Author
I didn't realize that. I have dropped the RAM speed to 2666 and since then I've gotten a clean parity check for the first time, and I even copied 40 GB to the array first to make sure there was some activity. No cache errors yet. I am keeping an eye on it. Thanks!
February 21, 2020
If you don't need 64 GB of RAM you could run at a higher RAM speed with just 2 DIMMs, but at the end of the day the difference in performance is really not that great, especially for a server.
February 21, 2020 - Author
Getting btrfs errors trying to remove a docker container. Could it be that btrfs just sucks? Is ext4 ok for the cache drive?
February 21, 2020
Just now, uek2wooF said: "Is ext4 ok for the cache drive?"
Not an option. Go to Tools > Diagnostics and attach the zip file to your next post if you want assistance.
February 21, 2020 - Author
This looks like a corrupt docker image maybe. Should I just rm docker.img? How do I create a new one?
February 21, 2020 - Author
Created a new docker image and reinstalled some containers. Forgot my settings for the private docker net I had set up so now everything is broken. Good times.
February 22, 2020 - Author
Everything seems to be fixed for now. If I have more problems I will try xfs on cache next I guess.
February 22, 2020 - Community Expert
That could still be a result of the previous issues. btrfs gets corrupted quickly with bad RAM, and if it keeps getting corrupted without an apparent reason, that can serve as a good warning that there are still hardware issues.
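One way to use btrfs as that kind of early warning is to watch the per-device error counters printed by `btrfs device stats`. The sketch below parses that output and reports any counter above zero. The function name `nonzero_counters` and the sample output are hypothetical, and the mount point `/mnt/cache` mentioned afterwards is an assumption about where the cache pool is mounted.

```python
import re

# Each line of `btrfs device stats` looks like "[/dev/nvme0n1p1].corruption_errs  0".
STAT_LINE = re.compile(r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$")


def nonzero_counters(stats_text):
    """Return {(device, counter): value} for every counter above zero."""
    problems = {}
    for line in stats_text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match.group("value")) > 0:
            key = (match.group("device"), match.group("counter"))
            problems[key] = int(match.group("value"))
    return problems


# Hypothetical sample output, invented for this example (not from the thread).
SAMPLE_STATS = """\
[/dev/nvme0n1p1].write_io_errs    0
[/dev/nvme0n1p1].read_io_errs     0
[/dev/nvme0n1p1].flush_io_errs    0
[/dev/nvme0n1p1].corruption_errs  12
[/dev/nvme0n1p1].generation_errs  0
"""

if __name__ == "__main__":
    for (device, counter), value in nonzero_counters(SAMPLE_STATS).items():
        print(f"{device}: {counter} = {value}")
```

On a live system the text could come from `subprocess.run(["btrfs", "device", "stats", "/mnt/cache"], capture_output=True, text=True).stdout`. The counters persist until reset with `btrfs device stats -z`, so resetting them after a hardware change makes any new errors easy to spot.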