System reboots while doing parity check



I would run Memtest (a boot option) for 24 hours...

Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 1: bf80000000000124
Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 40fd03e00 MISC 86
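If you want to check a whole syslog for entries like these rather than scrolling through it, something along these lines should work. It is only a rough sketch: the function name `find_mce_events` and the regular expressions are my own guesses based on the three lines above, not part of any Unraid or kernel tool, and other kernels may format these messages differently.

```python
import re

# Syslog lines that the kernel tags as machine-check hardware errors.
# Pattern is an assumption derived from the excerpt quoted in this thread.
MCE_LINE = re.compile(r"kernel: mce: \[Hardware Error\]: (?P<msg>.*)$")

# The line that names the CPU, the bank, and the raw status value (hex).
BANK_LINE = re.compile(
    r"CPU (?P<cpu>\d+): Machine Check.*?Bank (?P<bank>\d+): (?P<status>[0-9a-fA-F]+)"
)


def find_mce_events(lines):
    """Return one dict per 'CPU n: Machine Check ... Bank m: status' line."""
    events = []
    for line in lines:
        tagged = MCE_LINE.search(line)
        if not tagged:
            continue  # not a hardware-error line
        bank = BANK_LINE.search(tagged.group("msg"))
        if bank:
            events.append(
                {
                    "cpu": int(bank.group("cpu")),
                    "bank": int(bank.group("bank")),
                    "status": int(bank.group("status"), 16),
                }
            )
    return events


# The three lines quoted above, used as sample input.
sample = [
    "Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: Machine check events logged",
    "Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 1: bf80000000000124",
    "Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 40fd03e00 MISC 86",
]

events = find_mce_events(sample)
for event in events:
    print(f"CPU {event['cpu']} bank {event['bank']} status {event['status']:#018x}")
```

To run it against a real log, pass the open file instead of `sample`, for example `find_mce_events(open("syslog.txt"))`, where the file name is just a placeholder.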

 

Link to comment
On 8/1/2020 at 7:53 AM, Frank1940 said:

I would run Memtest (a boot option) for 24 hours...


Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: CPU 2: Machine Check: 0 Bank 1: bf80000000000124
Aug  1 01:24:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 40fd03e00 MISC 86

 

Zero errors during the Memtest run, which completed 10 passes.

Link to comment

The reboot occurs at line 22 in the syslog, at a timestamp of 22:06:21.

 

I am not an expert at reading syslogs, but I don't see anything in the first 21 lines that is not typical of normal operation.

 

I must ask: is it possible that you have a pet or child who might be pushing the reset button? During a parity check there is often a nice flashing LED that tends to attract and demand attention from the curious.

 

Is this a new hardware build or is it a recycled computer?  You might provide a few details as to the background of this server. 

Link to comment

While I do have pets, none were in my room when it rebooted, and there are no children in my house.

 

This is a recycled computer: it was my old gaming computer from about 5 years ago. It has an i5-4690K, 2x8 GB of RAM, and a Corsair CX450 PSU.

Before I started using Unraid, it ran as a Windows/Ubuntu computer for a couple of weeks with no problems.

 

I'm going to try to snap a picture when it crashes, because the most recent reboot showed that text does appear on screen at the moment of the crash.

Link to comment

Next thing to try: boot it in Safe Mode and see if it still reboots. Also, go back to the stock BIOS settings to undo any overclocking. (Overclocking is a no-no for servers!)

 

Also look at the inside of the case. Make sure it is clean. Get the dust out of the heat sinks and fans. Make sure that the airflow passes over the drives. Basically, the fans at the back of the case should blow out. Double-check that the power supply and motherboard power plugs are all securely plugged in. (By the way, power supplies have caused this problem in the past...) Most rebooting problems are hardware related.

Link to comment

I managed to capture the moment when it reboots, and this is what it prints:

mce: [Hardware Error]: CPU 2: Machine Check Exception 5 Bank 1: bf80000000000124
mce: [Hardware Error]: RIP !INEXACT! 10:<ffffffff81334b4e> {percpu_counter_add_batch+0x4e/0x52}
mce: [Hardware Error]: TSC 3963ac8a7429 ADDR 40b9a9340 MISC 86
mce: [Hardware Error]: PROCESSOR 0:306c3 TIME 1596582445 SOCKET 0 APIC 4 microcode 27
mce: [Hardware Error]: Run the above through 'mcelog --ascii'
mce: [Hardware Error]: Machine check: Processor context corrupt
Kernel panic - not syncing: Fatal machine check
Kernel Offset: disabled
Rebooting in 30 seconds..

I don't currently have a spare power supply on hand. I could use my main PC's PSU once I find a good sale on a replacement for it, however. It would suck if it is the PSU, as this one is only about 4 months old.
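For anyone who wants to read the status value in that panic by hand instead of running it through `mcelog --ascii`, here is a minimal sketch. It assumes the architectural IA32_MCi_STATUS bit layout as I understand it from Intel's Software Developer's Manual (bits 63 down to 57 are VAL, OVER, UC, EN, MISCV, ADDRV, PCC, and the low 16 bits are the MCA error code). The function name `decode_mci_status` is made up for illustration, and the sketch does not try to interpret the error code itself. It also converts the `TIME` field, assuming it is a Unix timestamp.

```python
from datetime import datetime, timezone

# Bit positions of the architectural flags in IA32_MCi_STATUS
# (assumed from the Intel SDM layout; check the manual for your CPU).
STATUS_FLAGS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # uncorrected error
    60: "EN",     # error reporting was enabled
    59: "MISCV",  # MISC register is valid
    58: "ADDRV",  # ADDR register is valid
    57: "PCC",    # processor context corrupt
}


def decode_mci_status(status):
    """Split a raw 64-bit status value into flags and error-code fields."""
    flags = {name: bool((status >> bit) & 1) for bit, name in STATUS_FLAGS.items()}
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,               # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Values copied from the panic output above.
decoded = decode_mci_status(0xBF80000000000124)
when = datetime.fromtimestamp(1596582445, tz=timezone.utc)

set_flags = [name for name, is_set in decoded["flags"].items() if is_set]
print("flags set:", ", ".join(set_flags))
print(f"MCA error code: {decoded['mca_error_code']:#06x}")
print("logged at (UTC):", when.isoformat())
```

The PCC flag being set lines up with the "Processor context corrupt" line in the panic, and UC set means the error was not corrected, which is why the kernel treats it as fatal rather than just logging it.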

Link to comment
