HemiStormtrooper Posted October 26, 2021 I recently started noticing my cache filling up and not going back down in used space. I have the mover set to run every day at 3am, yet it seems like some data is staying on my cache pool. I enabled logging on the mover, and when I ran it I saw this pop up:
Oct 25 23:25:19 Installation03 kernel: BTRFS warning (device sdb1): csum failed root 5 ino 21216272 off 4594712576 csum 0xad6138ce expected csum 0xb6089ce0 mirror 2
I'm assuming data got corrupted when I was copying it across the network, but I'm not sure if my cache SSD is going bad (it's less than 8 months old) or if it's memory related. Any help would be greatly appreciated.
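For anyone reading along who wants to pull the useful fields out of a warning like the one above, here is a minimal Python sketch. It assumes the line follows exactly the "csum failed" layout quoted in the post; the function name `parse_csum_failure` is made up for illustration, and other kernel versions may word the message differently.

```python
import re

# Matches the "csum failed" warning layout quoted in the post above.
# Assumption: other kernels may format this message differently.
CSUM_RE = re.compile(
    r"BTRFS warning \(device (?P<device>[^)]+)\): csum failed "
    r"root (?P<root>\d+) ino (?P<ino>\d+) off (?P<off>\d+) "
    r"csum (?P<csum>0x[0-9a-f]+) expected csum (?P<expected>0x[0-9a-f]+) "
    r"mirror (?P<mirror>\d+)"
)


def parse_csum_failure(line):
    """Return the fields of a btrfs 'csum failed' line, or None if it does not match."""
    match = CSUM_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Convert the purely numeric fields; the checksums stay as hex strings.
    for key in ("root", "ino", "off", "mirror"):
        fields[key] = int(fields[key])
    return fields


if __name__ == "__main__":
    sample = (
        "Oct 25 23:25:19 Installation03 kernel: BTRFS warning (device sdb1): "
        "csum failed root 5 ino 21216272 off 4594712576 "
        "csum 0xad6138ce expected csum 0xb6089ce0 mirror 2"
    )
    print(parse_csum_failure(sample))
```

The `ino` value is the inode of the affected file. It can usually be mapped back to a path with `btrfs inspect-internal inode-resolve <ino> <mountpoint>`, where the mount point is wherever the cache pool is mounted (the thread does not state it).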
JorgeB Posted October 26, 2021 Please post the diagnostics.
HemiStormtrooper Posted October 26, 2021 Sorry. Was posting from mobile last night and my phone hates downloading zip files. I have attached the diags below. installation03-diagnostics-20211026-0708.zip
JorgeB Posted October 26, 2021 Btrfs is detecting data corruption; you should start by running memtest.
HemiStormtrooper Posted October 26, 2021 I'm trying to launch memtest from my unRAID USB, but when I select memtest, it says OK, then reboots and goes back to the unRAID bootloader.
JorgeB Posted October 26, 2021 Memtest only works with CSM/legacy boot; it won't work with UEFI boot.
HemiStormtrooper Posted October 26, 2021 Never mind. Had to download a UEFI version and boot that. How long should I let it run for? I'm doing the default 4 passes right now.
JorgeB Posted October 26, 2021 Ideally 24 hours, but if there's a findable problem it will usually take a couple of hours at most.
HemiStormtrooper Posted October 26, 2021 So, I'm highly confident that it's going to be memory related. Pass 1 of memtest already had 29 errors before test 9 was complete. Unfortunately, my build doesn't support ECC memory, and I'm not really in a position to buy an all-new mobo/CPU/memory. Does anyone have any recommendations for DDR4-2133 memory for my build? In the meantime, I think I'm going to test each stick individually and see whether it's just one stick that's bad.
itimpi Posted October 26, 2021 HemiStormtrooper said: "I'm highly confident that it's going to be memory related. Pass 1 on memtest already has 29 errors before test 9 was complete."
Definitely the case, since anything other than 0 errors is too many.
HemiStormtrooper Posted October 26, 2021 So, since these corrupt files are on my cache, how do I go about removing them once memtest is finished running?
JorgeB Posted October 26, 2021 Run a scrub; any corrupt files will be listed in the syslog, and you can delete those. Note, though, that with bad RAM some files might have been corrupted in memory before they were written, in which case the checksum will match the corrupted data, so there could still be undetected corruption. The same goes for any array xfs data written while the RAM was failing.
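As a rough illustration of the "corrupt files will be listed in the syslog" step, the sketch below collects file paths from btrfs error lines. It rests on an assumption not confirmed in this thread: that the scrub messages end with the affected file as `(path: ...)`. The sample lines and the name `corrupt_paths` are hypothetical, so check the pattern against your own syslog before relying on it.

```python
import re

# Assumption: btrfs scrub error lines end with "(path: <file>)".
# The exact wording varies between kernel versions; verify locally.
PATH_RE = re.compile(r"BTRFS .*\(path: (?P<path>.+)\)\s*$")


def corrupt_paths(lines):
    """Collect unique file paths named in btrfs error lines, in first-seen order."""
    seen = []
    for line in lines:
        match = PATH_RE.search(line)
        if match and match.group("path") not in seen:
            seen.append(match.group("path"))
    return seen


if __name__ == "__main__":
    # Hypothetical sample lines, written only to exercise the pattern.
    sample = [
        "Oct 26 20:00:01 Installation03 kernel: BTRFS warning (device sdb1): "
        "checksum error at logical 1 on dev /dev/sdb1, physical 1, root 5, "
        "inode 257, offset 0, length 4096, links 1 (path: share/example.bin)",
        "Oct 26 20:00:02 Installation03 kernel: BTRFS info (device sdb1): "
        "scrub: started on devid 1",
    ]
    for path in corrupt_paths(sample):
        print(path)
```

To use it on real data, pass it the lines of the syslog after the scrub has finished. The paths it returns are relative to the pool's mount point, and per the caveat above, a clean result does not prove that nothing was corrupted while the RAM was failing.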