louij2 Posted January 5, 2020 (edited February 25, 2020 by louij2)

Hi,

My UnRaid server keeps crashing at random. I thought the memory was at fault, so I put another 16GB in (24GB total), but I still have issues. My log file was filling up to 100%, so I followed the below; however, I still get the same problems. I have checked Docker, and it shows no more than 1.2MB of Docker log size. I only have one VM running, and it hosts an MC server.

Thanks

tower-diagnostics-20200105-2106.zip
JorgeB Posted January 6, 2020

The diags were taken after rebooting. Try enabling the syslog server/mirror feature to see if it catches anything. Also make sure you're using the Ryzen workarounds: Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "Typical Current Idle" (or similar), or disable C-states completely. More info here: https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/
louij2 Posted January 6, 2020

3 hours ago, johnnie.black said: The diags were taken after rebooting. Try enabling the syslog server/mirror feature to see if it catches anything. Also make sure you're using the Ryzen workarounds: Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "Typical Current Idle" (or similar), or disable C-states completely. More info here: https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/

Thanks for that, it's been driving me nuts. I just updated the BIOS and am going to check the C-state settings. It's weird how it only happened when I started running my VM or when I upgraded from the A8-9600 to the Ryzen 5 1600.
louij2 Posted January 6, 2020 (edited January 6, 2020 by louij2)

5 hours ago, johnnie.black said: The diags were taken after rebooting. Try enabling the syslog server/mirror feature to see if it catches anything. Also make sure you're using the Ryzen workarounds: Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "Typical Current Idle" (or similar), or disable C-states completely. More info here: https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/

Updating the BIOS and disabling C-states seemed to fix it for me; however, I now have a problem with my VM. I was in the middle of doing something when UnRaid crashed. I can still boot the VM, but when I go to execute the .jar file as I normally do, it no longer runs and doesn't output anything. I have a feeling that my vdisk is now corrupted and that I need to repair it somehow?
JorgeB Posted January 6, 2020

6 minutes ago, louij2 said: I have a feeling that my vdisk is now corrupted and that I need to repair it somehow?

Try running a filesystem check inside the VM itself. Failing that, restore the vdisk from a backup if one is available.
louij2 Posted January 7, 2020

10 hours ago, johnnie.black said: Try running a filesystem check inside the VM itself. Failing that, restore the vdisk from a backup if one is available.

The VM is down. I'm going to have to make it again; the filesystem check didn't report anything.
louij2 Posted February 25, 2020 (edited February 25, 2020 by louij2)

My system keeps crashing again and I'm not sure why. The server was all good: I was on MC with my mates, we were doing lots of redstone stuff, and then UnRaid crashed the same way it used to crash before I disabled C-states in the BIOS. It also recovered by itself, and I managed to pull diagnostics before the reboot.

tower-diagnostics-20200224-1913.zip
Chess Posted February 25, 2020 (edited February 25, 2020 by Chess, reason: url)

26 minutes ago, louij2 said: My system keeps crashing again and I'm not sure why. The server was all good: I was on MC with my mates, we were doing lots of redstone stuff, and then UnRaid crashed the same way it used to crash before I disabled C-states in the BIOS. It also recovered by itself, and I managed to pull diagnostics before the reboot. tower-diagnostics-20200224-1913.zip

I don't see anything in the logs, as it looks like these are from after the reboot. Reading through the chain here, this started happening after you added more RAM to your system. What speed are you running your RAM at? Ryzen is very picky about RAM speed. Bring your RAM speed down to the supported speed for the number and type of DIMMs you have in your server. Scroll down to a post from @johnnie.black on Ryzen RAM speed to see what your RAM should run at. What Ryzen CPU are you using?
louij2 Posted February 25, 2020

1 hour ago, Chess said: I don't see anything in the logs, as it looks like these are from after the reboot. Reading through the chain here, this started happening after you added more RAM to your system. What speed are you running your RAM at? Ryzen is very picky about RAM speed. Bring your RAM speed down to the supported speed for the number and type of DIMMs you have in your server. Scroll down to a post from @johnnie.black on Ryzen RAM speed to see what your RAM should run at. What Ryzen CPU are you using?

I have a Ryzen 1600. I had a weird memory config with a 2400MHz 8GB kit and a 3200MHz 16GB kit. I've taken the 8GB kit out, which may not have been getting enough airflow, because the system does power off more when it's hotter.
Dissones4U Posted February 25, 2020

@louij2

On 1/6/2020 at 2:58 AM, johnnie.black said: The diags were taken after rebooting. Try enabling the syslog server/mirror feature to see if it catches anything.

Within that syslog tutorial, Frank1940 said: One very neat feature is that each entry is appended to this file every time a new line is added to the syslog. This should mean that if you have a reboot of the server after a week of collecting the syslog, you will have everything from before the reboot and after the reboot in one file!

I prefer option three, but do it how you like, as long as you create a situation where the syslog persists after a reboot.
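A minimal sketch of how a persisted syslog could be read after a crash. It assumes the mirrored file is plain text and that every boot starts with a kernel line containing "Linux version"; that marker, the function name `tail_before_each_boot`, and the sample lines are illustrative assumptions, not something confirmed in this thread. The idea is to pull out the last few lines written before each reboot, since that is where a lockup would leave its traces, if it leaves any.

```python
from typing import Iterable, List

# Assumption: the kernel logs a line containing this text early in every boot.
BOOT_MARKER = "Linux version"


def tail_before_each_boot(lines: Iterable[str], keep: int = 20) -> List[List[str]]:
    """Return the last `keep` lines logged before each boot marker.

    If the file itself starts with a boot marker, nothing precedes that
    boot, so no tail is emitted for it.
    """
    tails: List[List[str]] = []
    session: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        # A boot marker after some logged lines means the previous session ended here.
        if BOOT_MARKER in line and session:
            tails.append(session[-keep:])
            session = []
        session.append(line)
    return tails


if __name__ == "__main__":
    # Hypothetical log content, for illustration only.
    sample = [
        "Jan  5 20:00:00 Tower kernel: Linux version (first boot)",
        "Jan  5 20:10:00 Tower emhttpd: array started",
        "Jan  5 21:00:00 Tower kernel: last line before the crash",
        "Jan  5 21:06:00 Tower kernel: Linux version (after reboot)",
        "Jan  5 21:07:00 Tower emhttpd: array started",
    ]
    for tail in tail_before_each_boot(sample, keep=2):
        print("\n".join(tail))
        print("--- reboot ---")
```

To use it on a real file, pass an open file handle instead of the sample list; the path depends on where the syslog mirror was configured to write.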
Chess Posted February 25, 2020

8 minutes ago, louij2 said: I have a Ryzen 1600. I had a weird memory config with a 2400MHz 8GB kit and a 3200MHz 16GB kit. I've taken the 8GB kit out, which may not have been getting enough airflow, because the system does power off more when it's hotter.

It should not matter if you have all 4 DIMMs in; however, you are then limited to a RAM speed of 2133 or 1866, depending on whether any of the DIMMs are dual rank, which your 16GB sticks might be. Set your RAM speed to 1866 in the BIOS and see if your crashes go away.
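The rule above can be written down as a small lookup, sketched here under stated assumptions. The two 4-DIMM values (2133 and 1866) come from the post above; the two 2-DIMM values are an assumption taken from AMD's published first-generation Ryzen memory support figures and are not stated in this thread. The function name `max_supported_speed` is hypothetical.

```python
# Keys: (number of DIMMs installed, whether any DIMM is dual rank).
# Values: highest officially supported DDR4 speed in MT/s for 1st-gen Ryzen.
SUPPORTED_SPEED = {
    (2, False): 2667,  # assumption: not stated in this thread
    (2, True): 2400,   # assumption: not stated in this thread
    (4, False): 2133,  # from the post above
    (4, True): 1866,   # from the post above
}


def max_supported_speed(dimm_count: int, any_dual_rank: bool) -> int:
    """Look up the supported RAM speed for a DIMM population."""
    try:
        return SUPPORTED_SPEED[(dimm_count, any_dual_rank)]
    except KeyError:
        raise ValueError(f"no entry for {dimm_count} DIMMs") from None


if __name__ == "__main__":
    # Four DIMMs with at least one dual-rank stick: the most conservative case.
    print(max_supported_speed(4, True))
```

Anything above the returned value is effectively a memory overclock on this platform, so it is the first thing to rule out when chasing random crashes.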
louij2 Posted February 25, 2020 (edited February 25, 2020 by louij2)

10 minutes ago, Chess said: It should not matter if you have all 4 DIMMs in; however, you are then limited to a RAM speed of 2133 or 1866, depending on whether any of the DIMMs are dual rank, which your 16GB sticks might be. Set your RAM speed to 1866 in the BIOS and see if your crashes go away.

Well, I took the 8GB kit out and put the 16GB one back in, but it's still crashing. I'm thinking it may still be a heat thing. I've set up the syslog server and am getting output.
Chess Posted February 25, 2020

Just now, louij2 said: Well, I took the 8GB kit out and put the 16GB one back in, but it's still crashing. I'm thinking it may still be a heat thing.

I suppose that's possible, but unlikely. I really feel you should rule out the RAM. RAM speeds above those quoted count as overclocking on Ryzen, and 1st-gen Ryzen is very picky about RAM speeds. The forum here is littered with users having crashes on Ryzen related to RAM speed, my own build included. Set the RAM to 1866 in the BIOS with either of the two DIMMs and see if the crashes go away.
louij2 Posted February 25, 2020

1 hour ago, Chess said: I suppose that's possible, but unlikely. I really feel you should rule out the RAM. RAM speeds above those quoted count as overclocking on Ryzen, and 1st-gen Ryzen is very picky about RAM speeds. The forum here is littered with users having crashes on Ryzen related to RAM speed, my own build included. Set the RAM to 1866 in the BIOS with either of the two DIMMs and see if the crashes go away.

Okay, so I am transferring some files from a VM over FTP and it is running fine. The issue seems to come up when the server is really putting some work in.
Chess Posted February 25, 2020

10 minutes ago, louij2 said: Okay, so I am transferring some files from a VM over FTP and it is running fine. The issue seems to come up when the server is really putting some work in.

I wonder if FTP does not use the RAM cache. Still, it would be good to get an idea of the CPU temps and take a look. Alas, if you are on 6.8.2, Ryzen temps are not working; I assume that is the same with 1st-gen Ryzen. You could set up a temporary Win10 bare-metal (BM) install, pull temps with the system under full load, and see if it still crashes. Or you could consider downgrading to 6.8 RC-7, which has Linux kernel 5.x with a number of fixes for Ryzen.

When you added the extra DIMMs, did you knock anything? Maybe consider re-seating your CPU cooler, just to make sure that it is sitting on the CPU correctly and that it's good and tight. Also check that its fan is spinning. I doubt that it isn't, but it's a possibility.
louij2 Posted February 25, 2020 (edited February 25, 2020 by louij2)

15 minutes ago, Chess said: I wonder if FTP does not use the RAM cache. Still, it would be good to get an idea of the CPU temps and take a look. Alas, if you are on 6.8.2, Ryzen temps are not working; I assume that is the same with 1st-gen Ryzen. You could set up a temporary Win10 bare-metal (BM) install, pull temps with the system under full load, and see if it still crashes. Or you could consider downgrading to 6.8 RC-7, which has Linux kernel 5.x with a number of fixes for Ryzen. When you added the extra DIMMs, did you knock anything? Maybe consider re-seating your CPU cooler, just to make sure that it is sitting on the CPU correctly and that it's good and tight. Also check that its fan is spinning. I doubt that it isn't, but it's a possibility.

Hi, I don't think it's CPU temps though? UnRaid was showing them at around 63°C when it crashed. I set up the temp sensor, and it picks up the CPU die, package, and mobo temps. I'm just glad that I didn't UnRaid my 3900X system like you did; I was seriously considering that, lol! Going to try MemTest86 too.
Chess Posted February 25, 2020

32 minutes ago, louij2 said: Hi, I don't think it's CPU temps though? UnRaid was showing them at around 63°C when it crashed. I set up the temp sensor, and it picks up the CPU die, package, and mobo temps. I'm just glad that I didn't UnRaid my 3900X system like you did; I was seriously considering that, lol! Going to try MemTest86 too.

Haha... For me, I wanted to get all of my systems (NAS, gaming, VMs) down to one, so a 3900X seemed like the best compromise. I wanted a Threadripper, but the cost of jumping into that platform was just not going to get past the manager at home. Run MemTest and report back. We'll get you stable with enough work, and then you'll swap the 3900X over to the UnRaid box.
louij2 Posted February 26, 2020 (edited February 26, 2020 by louij2)

Hi,

My Fix Common Problems (FCP) plugin has said that I have Machine Check Events and told me to get mcelog, so I have. Could this be related? I haven't had a chance to run memory tests yet. Thanks!

I also realised I was being stupid overclocking to 3.8GHz on a 350W Gold PSU, so I have turned that off for now, and I no longer get the MCE errors in FCP.

tower-diagnostics-20200226-1125.zip
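A minimal sketch of one way to check a saved syslog for machine check reports before and after a change such as removing the overclock. It assumes the kernel writes lines containing "Machine check" or "[Hardware Error]" when it logs such events; the exact wording can differ between kernel versions, so treat the pattern as an assumption. The function name `find_mce_lines` and the sample lines are hypothetical.

```python
import re
from typing import Iterable, List

# Assumption: machine check reports contain one of these phrases.
MCE_PATTERN = re.compile(r"machine check|\[Hardware Error\]", re.IGNORECASE)


def find_mce_lines(lines: Iterable[str]) -> List[str]:
    """Return every log line that looks like a machine check report."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]


if __name__ == "__main__":
    # Hypothetical log content, for illustration only.
    sample = [
        "Feb 26 11:00:01 Tower kernel: mce: [Hardware Error]: Machine check events logged",
        "Feb 26 11:00:02 Tower kernel: usb 1-1: new high-speed USB device",
    ]
    hits = find_mce_lines(sample)
    print(f"{len(hits)} machine check line(s) found")
    for hit in hits:
        print(hit)
```

Comparing the count from a log captured while overclocked with one captured at stock settings gives a rough indication of whether the overclock was triggering the events.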