eswarasai Posted July 26, 2021
Hey, I had to force a shutdown yesterday because stopping the array was stuck at the "Syncing Filesystems..." message. Before doing so, I double-checked with `lsof /mnt` that no files were being accessed. After turning the system back on, I started receiving cache corrupt and checksum invalid errors. I ran a scrub twice to check for errors; none are reported, yet I'm still receiving the corruption error messages. I'm attaching the diagnostics below. Any help is greatly appreciated, and I'm happy to provide any further information needed. Thanks!
Attachment: titan-diagnostics-20210726-2311.zip
JorgeB Posted July 26, 2021
That's a strange one. If there's corruption, a scrub should find it. I would suggest backing up, re-formatting the pool, and restoring the data.
eswarasai Posted July 27, 2021
23 hours ago, JorgeB said: That's a strange one, if there's corruption it should be found by scrubbing, I would suggest backing up, re-formatting pool and restoring the data.
I just went through the steps above, and it looks like my Cache 0 is still reporting corruption. I'm guessing I need to replace my cache drive and there's nothing else that can be done? The scrub still doesn't report any errors either. Please find the latest diagnostics below. Appreciate your help, thanks!
Attachment: titan-diagnostics-20210727-2331.zip
JorgeB Posted July 28, 2021
What would make more sense is that this issue has nothing to do with the previous shutdown and is instead a hardware problem, like bad RAM. I just noticed your RAM is overclocked, which is known to corrupt data on some Ryzen servers. Set it to the max supported speed and try again. If there are still errors after that, run memtest.
eswarasai Posted July 28, 2021
10 minutes ago, JorgeB said: What would make more sense is that this issue has nothing to do with the previous shutdown, but it's a hardware problem, like bad RAM, and just noticed your RAM is overclocked, that is known to corrupt data with some Ryzen servers, set it to max supported speed and try again, if there are still errors after that run memtest.
Yeah, I turned off the XMP profile on the motherboard this morning, so the RAM is now running at 2133 MT/s, and I've been monitoring the cache for corruption errors. I have received a couple of errors so far. I will try to run memtest. I couldn't run it this morning because the system kept restarting when I selected the memtest option in UEFI mode, and after switching to CSM it couldn't even boot from the USB device.
itimpi Posted July 28, 2021
41 minutes ago, eswarasai said: couldn't do one this morning as system kept restarting when selecting memtest option on UEFI mode
The memtest supplied with Unraid will not work when booting in UEFI mode. You can get a version that does from memtest86.com.
eswarasai Posted July 30, 2021
Thanks @JorgeB and @itimpi, I appreciate the help from both of you in this thread. It looks like one of my 4 RAM sticks was bad, so I've removed it and am running the system on the remaining 3. Memtest reported no errors with this setup. Hopefully the data corruption errors will not occur anymore. It would have been very helpful if I had come across the user script that monitors the cache pool for errors when I was getting started with Unraid. It isn't really advertised or well known unless you go looking for posts about the specific error in the forum.
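The monitoring script mentioned above is not shown in this thread. As a rough sketch of what such a check could look like, the Python below parses the per-device error counters printed by `btrfs device stats` and returns the non-zero ones. It assumes the pool is btrfs and mounted at /mnt/cache, which the thread does not confirm. The function names `parse_device_stats` and `check_pool` are made up for illustration and are not part of Unraid.

```python
import re
import subprocess

# Matches counter lines such as "[/dev/sdb1].corruption_errs   3"
STAT_LINE = re.compile(
    r"^\[(?P<device>[^\]]+)\]\.(?P<counter>\w+)\s+(?P<value>\d+)$"
)


def parse_device_stats(text):
    """Return {(device, counter): value} for every non-zero counter."""
    nonzero = {}
    for line in text.splitlines():
        match = STAT_LINE.match(line.strip())
        if match and int(match["value"]) > 0:
            nonzero[(match["device"], match["counter"])] = int(match["value"])
    return nonzero


def check_pool(mount_point="/mnt/cache"):
    """Run `btrfs device stats` on the pool and return its non-zero counters.

    The default mount point is an assumption; adjust it to the actual pool.
    """
    result = subprocess.run(
        ["btrfs", "device", "stats", mount_point],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_device_stats(result.stdout)
```

Calling `check_pool()` from a scheduled job and sending a notification whenever the returned dictionary is non-empty would surface corruption early. Note that these counters persist across reboots until they are reset with `btrfs device stats -z`, so old errors keep showing up until cleared.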