
Help with crashes Edit: Crashing again 24/02/2020



Hi

 

My UnRaid server keeps crashing randomly. I thought it was my memory, so I put another 16GB in (24GB total), but I still have issues. My log file was getting to 100%, so I followed the below, however I still get the same problems. I have checked Docker and it shows no more than 1.2MB for the Docker log size. I only have 1 VM running, and it is running an MC server.

thanks

tower-diagnostics-20200105-2106.zip

Edited by louij2

The diags are from after rebooting. Try enabling the syslog server/mirror feature to see if it catches anything. Also make sure you're using the Ryzen workarounds: Ryzen on Linux can lock up due to issues with C-states. Make sure the BIOS is up to date, then look for "Power Supply Idle Control" (or similar) and set it to "Typical Current Idle" (or similar), or completely disable C-states.

 

More info here:
https://forums.unraid.net/bug-reports/prereleases/670-rc1-system-hard-lock-r354/


Thanks for that, it's been driving me nuts. I just updated the BIOS and am going to check the C-state stuff. It's weird how it only happened when I started running my VM, or when I upgraded from the A8-9600 to the Ryzen 5 1600.


Updating the BIOS and disabling C-states seemed to fix it for me, however now I have a problem with my VM. I was in the process of doing something when UnRaid crashed. I am able to boot up the VM, but when I go to execute the .jar file as I normally do, it no longer runs and it doesn't output anything, so I have a feeling that my vdisk is now corrupted and I need to repair it somehow?

Edited by louij2
  • 1 month later...

My system keeps crashing again and I'm not sure why. The server was all good, I was on MC with my mates and we were doing lots of redstone stuff, and then UnRaid crashed the same way it was crashing before I disabled C-states in the BIOS. It also recovered itself and I managed to pull diagnostics before a reboot.

tower-diagnostics-20200224-1913.zip

Edited by louij2

I don't see anything in the logs as it looks like these are from after the reboot.

 

Reading through the chain here, this started happening after you added more RAM to your system. What speed are you running your RAM at? Ryzen is very picky about RAM speed. Bring your RAM speed down to the supported speed for the number and type of DIMMs you have in your server. Scroll down to a post from @johnnie.black on Ryzen RAM speed to see what your RAM should run at.

 

 

What Ryzen CPU are you using?

Edited by Chess

I have a Ryzen 1600. I had a weird memory config with a 2400MHz 8GB kit and a 3200MHz 16GB kit. I've taken the 8GB kit out, which may not have been getting enough airflow, because the system does power off more when it's hotter.


@louij2

On 1/6/2020 at 2:58 AM, johnnie.black said:

Diags are after rebooting, try enabling the syslog server/mirror feature to see if it catches anything

 

Within that syslog tutorial, Frank1940 said:

 

Quote

One very neat feature is that each entry are appended onto this file every time a new line is added to the syslog.  This should mean if you have a reboot of the server after a week of collecting the syslog, you will have everything from before the reboot and after the reboot in one file!  

 

I prefer option three but do it how you like as long as you create a situation where the syslog persists after reboot.
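For anyone following along, here is a rough Python sketch of how a syslog that persists across reboots could be checked for what was logged just before each boot. It assumes the kernel's "Linux version" banner shows up once per boot in the mirrored file; the function name and the `context` parameter are made up for illustration and are not part of any UnRaid tool.

```python
from typing import List

# Assumption: the kernel prints a "Linux version ..." banner once at every boot,
# so each occurrence in a persistent syslog marks the start of a new boot.
BOOT_MARKER = "Linux version"


def lines_before_each_boot(log_lines: List[str], context: int = 5) -> List[List[str]]:
    """For every boot marker that has earlier lines in the file, return up to
    `context` lines logged immediately before it. Those are the last things
    the server wrote before it went down."""
    tails = []
    for i, line in enumerate(log_lines):
        if BOOT_MARKER in line and i > 0:
            tails.append(log_lines[max(0, i - context):i])
    return tails
```

To use it, read the mirrored syslog from wherever you told the syslog server to store it, pass in `text.splitlines()`, and look at the last returned group to see what led up to the most recent crash.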


It should not matter if you have all 4 DIMMs in, however you are then limited to a RAM speed of 2133 or 1866, depending on whether any of the DIMMs are dual rank, which your 16 GB sticks might be. Set your RAM speed to 1866 in the BIOS and see if your crashes go away.
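As a quick way to sanity-check a BIOS setting against that rule, here is a small Python sketch. Only the two 4-DIMM figures come from this thread. The 2-DIMM figures are an assumption based on AMD's published 1st Gen Ryzen memory support table as I remember it, so verify them against the official spec. The function and table names are hypothetical.

```python
# Supported DDR4 speed for 1st Gen Ryzen, keyed by (DIMM slots in use, any dual-rank DIMM).
SUPPORTED_SPEED = {
    (4, False): 2133,  # from the thread: 4 DIMMs, all single rank
    (4, True): 1866,   # from the thread: 4 DIMMs, at least one dual rank
    (2, False): 2667,  # assumption, not stated in the thread
    (2, True): 2400,   # assumption, not stated in the thread
}


def max_supported_speed(dimm_count: int, any_dual_rank: bool) -> int:
    """Return the supported RAM speed for the given DIMM population."""
    if not 1 <= dimm_count <= 4:
        raise ValueError("expected between 1 and 4 DIMMs")
    # Treat 1-2 DIMMs as the 2-DIMM row and 3-4 DIMMs as the 4-DIMM row.
    slots = 4 if dimm_count > 2 else 2
    return SUPPORTED_SPEED[(slots, any_dual_rank)]


def is_overclocked(configured_speed: int, dimm_count: int, any_dual_rank: bool) -> bool:
    """True if the configured speed is above the supported speed."""
    return configured_speed > max_supported_speed(dimm_count, any_dual_rank)
```

For example, `is_overclocked(3200, 4, True)` is true, which is the situation a 3200 kit running at its rated speed next to a second kit would be in.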


Well, I took the 8GB kit out and put the 16GB one back in, but it's still crashing. I'm thinking it may still be a heat thing. I've set up the syslog server and am getting output.

Edited by louij2

I suppose that's possible, but unlikely. I really feel you should rule out the RAM. Running RAM faster than the speeds quoted above is overclocking the RAM on Ryzen. 1st Gen Ryzen is very picky about RAM speeds. The forum here is littered with users having crashes on Ryzen related to RAM speed. Even my own build.

 

Set the RAM to 1866 in the BIOS with either of the two DIMMs and see if the crashes go away.


Okay, so I am transferring some files from a VM with FTP and it is running fine. The issue seems to come when the server is really putting some work in.


I wonder if FTP does not use the RAM cache. Still, it would be good to get an idea of the CPU temps and take a look. Alas, if you are on 6.8.2, Ryzen temps are not working. I assume that is the same with 1st Gen Ryzen. You could set up a temporary Win10 bare-metal install and pull temps with the system under full load and see if it still crashes. Or you could consider downgrading to 6.8 RC-7. That has Linux kernel 5.x with a number of fixes for Ryzen.

 

When you added the extra DIMMs, did you knock anything? Maybe consider re-seating your CPU cooler just to make sure that it is sitting on the CPU correctly and it's good and tight. Also, check to see if the fan is spinning on it. I doubt that it's not, but it's a possibility.


Hi, I don't think it's CPU temps though? It was showing them at around 63°C in UnRaid when it crashed. I set up the temp sensor and it picks up the CPU die and package temps and the mobo temp. I'm just glad that I didn't UnRaid my 3900X system like you, I was seriously considering that lol! Going to try MemTest86 too.

Edited by louij2

Haha... For me, I wanted to get all of my systems down to 1 (NAS, gaming, VMs), so a 3900X seemed like the best compromise. I wanted a Threadripper, but the cost to jump into that platform was just not going to get past the manager at home.

 

Run MemTest and report back. We'll get you stable with enough work, then you'll swap the 3900X over to the UnRaid box ;)


Hi

 

My FixCommonProblems has said I have Machine Check Events and to install mcelog, so I have. Could this be related? I haven't had a chance to run memory tests yet. Thanks! I also realised I was being stupid overclocking to 3.8GHz on a 350W Gold PSU, so I have turned that off for now, and I no longer get the MCE errors in FCP.

tower-diagnostics-20200226-1125.zip

Edited by louij2
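If it helps to check whether Machine Check Events line up with the crashes, here is a rough Python sketch that pulls the MCE-related lines out of a saved syslog. The keywords in the pattern are an assumption about how such lines are usually worded and are not an official list, and the function name is made up for illustration.

```python
import re
from typing import Iterable, List

# Assumption: MCE-related syslog lines mention "machine check", the "mce" tag,
# or "hardware error". Adjust the pattern to match what your own log shows.
MCE_PATTERN = re.compile(r"machine check|\bmce\b|hardware error", re.IGNORECASE)


def find_mce_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that looks like it reports a machine check event."""
    return [line.rstrip("\n") for line in lines if MCE_PATTERN.search(line)]
```

Running this over the syslog from before and after turning the overclock off should show whether the MCE lines stop appearing, and the timestamps on the matching lines can be compared against the times of the crashes.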