Sentie Posted February 17, 2020

So I have something seriously strange going on. Let me start by saying that I had another issue at around the same time, connected to motherboard settings, which can be found here; I'm not sure whether the two are connected. The reboot that led me to find that problem was the one that finished the install of 6.8.2. Side note: while troubleshooting that problem, I rebuilt my flash drive with a clean install of Unraid and copied my config file over onto it.

Once I got the server booting again I was having trouble with kernel panics, so I uninstalled a bunch of plugins and turned off all my dockers. That helped for a boot or two but was definitely not stable. I found a suggestion to rebuild my flash drive and copy over my config file; I tried that but had no luck, though I think I got it up a time or two in safe mode. Finally I reverted to a backup of my flash drive that is on 6.8.0, and the problem was fixed. I upgraded to 6.8.2 and got kernel panics again. So I reverted, uninstalled plugins, and upgraded... still no luck.

So now I'm back on 6.8.0 and I'm not sure what to do about that, but the more pressing problem at this point is that I have a lot of parity errors. I'm still really new to Unraid, so I'm not sure what my next step should be. I am sure there is some kind of corruption somewhere in either my OS backup or one of my dockers, because when I was trying to stop the dockers, three of my processor cores pegged at 100% and got stuck there, the system became unresponsive when trying to load the dashboard or the Docker tab, and the CPU started heating up beyond its normal operating temperature (not overheating, just hotter than it normally runs on Unraid). I saw parity errors in that run and had to perform a dirty shutdown to get any kind of system responsiveness again.

Now the parity check is running again. It is not even as far along as it was before the lockup, and it has already found as many errors as it did in the first pass. Attached is my diagnostics report. I hope someone can give me some advice on what to do next. Sorry for the novel; I wasn't sure what would be relevant and what wouldn't.

skynet-diagnostics-20200217-0326.zip
JorgeB Posted February 17, 2020

Run a couple of non-correcting parity checks. If they find a different number of errors each time, you most likely have a hardware problem. Also see here in case you're using overclocked RAM, which is known to cause instability and even data corruption on some Ryzen systems.
Sentie Posted February 17, 2020

Okay, I will do that. My RAM isn't overclocked. I also don't know if it is relevant, but it seems that all or most of the parity errors were in the first 10% or so of the drive: when I went to bed the check was around 10% done with just shy of 3000 errors, and now, just after waking up, it is at 3500 errors and 86% done. Does it correct parity errors by default? If so, then it is probably going to be fixed (hopefully). Could the repeated failures to boot cause this kind of problem? I will definitely run a few more non-correcting checks, though, and let you know what comes up.

Edited February 17, 2020 by Sentie: not quite awake yet and didn't actually respond to the suggestion.
JorgeB Posted February 18, 2020

If there were any unclean shutdowns, a few sync errors are normal after that.
Sentie Posted February 18, 2020

So the correcting run that was going when I last posted has finished, and I'm running a non-correcting check now. It looks like it has a comparable number of errors to what I saw before the correcting run. I did see a post about the possibility that the first check misread the data, so this one might be finding errors that were introduced by the correcting run. I will run another non-correcting check after this one and see whether they report the same results. If they are the same, I will run another correcting check and start over. Will let you know how it turns out.
JorgeB Posted February 18, 2020

After a correcting check you should always get 0 errors. I suggest running memtest.
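The advice given in the thread up to this point amounts to a small decision rule. As a rough sketch only, it could be written down in Python as follows; the function name, parameter names, and verdict strings are hypothetical and are not part of Unraid or any of its tools.

```python
def interpret_parity_checks(error_counts, follows_correcting_check=False):
    """Return a rough verdict for a series of non-correcting parity checks.

    error_counts: sync-error totals, one per non-correcting run, in order.
    follows_correcting_check: True if a correcting check completed before
        these runs and there has been no unclean shutdown since.
    """
    if not error_counts:
        raise ValueError("need at least one parity check result")
    if all(count == 0 for count in error_counts):
        return "clean"
    if follows_correcting_check:
        # After a correcting check, the next check should report 0 errors.
        return "suspect hardware: run memtest"
    if len(set(error_counts)) > 1:
        # Totals that change from run to run point at hardware.
        return "suspect hardware: run memtest"
    # Same non-zero total every time: possibly left over from an unclean shutdown.
    return "consistent errors: run a correcting check, then check again"


# Made-up numbers, for illustration only.
print(interpret_parity_checks([120, 95]))
print(interpret_parity_checks([40], follows_correcting_check=True))
```

The sketch only encodes the rules of thumb quoted above; it does not read anything from the server, so the error totals have to be copied in by hand.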
Sentie Posted February 19, 2020

Memtest has been running through the night and has found errors. I'm not exactly sure how to read the results or how long I should leave it running.
JorgeB Posted February 19, 2020

You can stop it after a single error; it confirms that RAM is the problem.
JonathanM Posted February 19, 2020

Sentie said: "Memtest has been running through the night and has found errors"

From the looks of it, I'm pretty sure you are overclocking the RAM controller, but I could be wrong. Make sure you are obeying the maximum speeds for the amount and type of RAM on that CPU. Any errors at all are unacceptable; you need to change either the hardware or the settings until you get zero errors over a 24-hour run of memtest.
Sentie Posted February 19, 2020

All settings on my system are stock, with 4 RAM chips from the QVL. I will pull half the chips and rerun the tests with different combinations until the error is gone.

Edited February 19, 2020 by Sentie
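Testing half the sticks at a time in different combinations can be planned up front. A minimal sketch in Python, assuming four sticks with the hypothetical labels A to D (the labels and the function name are made up for illustration):

```python
from itertools import combinations


def memtest_plan(sticks, group_size=2):
    """List every group of `group_size` sticks to test together."""
    return list(combinations(sticks, group_size))


# Hypothetical labels for the four installed sticks.
for pair in memtest_plan(["A", "B", "C", "D"]):
    print("run memtest with sticks:", ", ".join(pair))
```

With four sticks this gives six pairs. The sketch only enumerates which sticks go together; it does not cover which motherboard slots they are placed in.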
JonathanM Posted February 19, 2020

Sentie said: "All settings on my system are stock."

Be sure to disable XMP.
Sentie Posted February 19, 2020

I can't find anything about XMP in my BIOS; I don't think my server board supports it (ASRockRack X470D4U2-2T). That isn't something that would ship enabled by default, though, is it? I did find that there is a BIOS update available, so I guess that is actually my next step.

I got an email from someone responding to this thread stating that my RAM is overclocked, but I don't see the comment here. The motherboard wouldn't do this by default, would it? I did finally find the DRAM Timing config tab in my BIOS, but it has a scary "you might break your hardware if you mess with these settings, do you wish to proceed" button. Since I have no idea what I'm doing when it comes to system clocks, I really don't want to go in there if I can avoid it. Thanks for your help so far, everyone.
JorgeB Posted February 19, 2020

Sentie said: "I can't find anything about xmp in my bios i don't think my server board supports it (ASRockRack X470D4U2-2T)."

That's a server board, so it's unlikely to support overclocking, but it should tell you what frequency the RAM is running at.

Sentie said: "I got an email from someone responding to this thread stating my ram is overclocked"

Yep, that was my bad. It was the CPU clock I saw, not the RAM, so I immediately deleted the reply.
Sentie Posted February 20, 2020

Got it. I will dig into the RAM speed sometime in the next week or so. All of my data is really well backed up, and I'm back to the world of work now, so my tinker time is much more limited, but I will keep working on those RAM tests and on trying to find the RAM speed. I'll let you guys know what I find.
Sentie Posted March 5, 2020

I know it has been a while. It took quite a bit of testing to narrow down the problem. My system is stable as long as I only have 2 sticks of RAM installed; all the chips are fine as long as I don't have more than 2 in the system. I have tried two different CPUs and get the same behavior with both. There was a BIOS update released, so I updated and tried again, and nothing changed. I still have not been able to find the RAM speed, so I have reached out to the motherboard manufacturer for assistance.

For the moment the server is running okay with 16 GB of RAM. Hopefully I will be able to get the upgrade taken care of at some point. I also switched out the stock AMD cooler, so now I can get to RAM slot B2 without removing the CPU, which should make testing much quicker. Let me know if you have any other ideas. Thanks for the help so far.