Your server has detected hardware errors


Rubene

I bought a new machine to run unRAID on. After booting, I got this message: 'Your server has detected hardware errors'. The array seems to run fine, although it is still a small array without a parity disk. (I explored unRAID on old hardware first; since it's a good fit to replace my Synology, I bought new hardware and am going to purchase a license.)

This is in the syslog:

Jan  9 21:21:03 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xb)
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 43040000086
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca

To be sure it's not the RAM, I ran memtest. After 4 hours it completed and did not find any issues.

+---------------------------------------------+-----------------+--------+
|                    Test                     | # Tests Passed  | Errors |
+---------------------------------------------+-----------------+--------+
| Test 0 [Address test, walking ones, 1 CPU]  | 4/4 (100%)      |      0 |
| Test 1 [Address test, own address, 1 CPU]   | 4/4 (100%)      |      0 |
| Test 2 [Address test, own address]          | 4/4 (100%)      |      0 |
| Test 3 [Moving inversions, ones & zeroes]   | 4/4 (100%)      |      0 |
| Test 4 [Moving inversions, 8-bit pattern]   | 4/4 (100%)      |      0 |
| Test 5 [Moving inversions, random pattern]  | 4/4 (100%)      |      0 |
| Test 6 [Block move, 64-byte blocks]         | 4/4 (100%)      |      0 |
| Test 7 [Moving inversions, 32-bit pattern]  | 4/4 (100%)      |      0 |
| Test 8 [Random number sequence]             | 4/4 (100%)      |      0 |
| Test 9 [Modulo 20, ones & zeros]            | 4/4 (100%)      |      0 |
| Test 10 [Bit fade test, 2 patterns, 1 CPU]  | 4/4 (100%)      |      0 |
| Test 13 [Hammer test]                       | 4/4 (100%)      |      0 |
+---------------------------------------------+-----------------+--------+

If I understand correctly, this can happen very early in the boot process, during initialization. It doesn't seem like something to worry about, but I would like to be sure.
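For reference, the hexadecimal value after `Bank N:` is the raw IA32_MCi_STATUS register. This short Python sketch decodes its architectural flag bits (63 down to 57) and pulls out the MCA error code (bits 15:0) and the model-specific error code (bits 31:16), following the bit layout in Intel's Software Developer's Manual. It is an illustration only: it does not interpret the error codes themselves, and the bank numbering and the remaining bits are CPU-model-specific. The names `FLAG_BITS` and `decode_mci_status` are my own, not part of any tool.

```python
# Decode the architectural flag bits of an Intel IA32_MCi_STATUS value,
# i.e. the hex number printed after "Bank N:" in the kernel mce lines.
FLAG_BITS = {
    63: "VAL",    # register contents are valid
    62: "OVER",   # an earlier error was overwritten
    61: "UC",     # error was not corrected
    60: "EN",     # reporting of this error type was enabled
    59: "MISCV",  # MCi_MISC holds additional information
    58: "ADDRV",  # MCi_ADDR holds the error address
    57: "PCC",    # processor context may be corrupt
}


def decode_mci_status(status: int) -> dict:
    """Return set flag names plus the two 16-bit error code fields."""
    flags = [name for bit, name in sorted(FLAG_BITS.items(), reverse=True)
             if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,                # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,   # bits 31:16
    }


if __name__ == "__main__":
    result = decode_mci_status(0xAE00000000801136)
    print(result["flags"])
    print(hex(result["mca_error_code"]), hex(result["model_specific_code"]))
```

For `ae00000000801136` this reports VAL, UC, MISCV, ADDRV and PCC set with EN clear, an MCA error code of 0x1136 and a model-specific code of 0x80. Whether EN being clear means the record is left over from firmware initialization cannot be settled from the status word alone.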

Thanks!

tower-diagnostics-20200109-2056.zip

Yesterday it happened again:

Jan 10 21:08:29 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xb)
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 47040000086 
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086 
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca

Any idea what this could be, since it has now happened twice?
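Both excerpts show the same banks (8 and 9), the same status word and the same ADDR values, and the `TIME` field is wall-clock seconds since the Unix epoch. A minimal Python sketch that groups records by (bank, status, address) and converts `TIME` to UTC makes that repetition easy to check across many boots. It assumes the three-line record format shown in the excerpts above; the regular expressions and the `group_mce_events` name are illustrative assumptions, not part of unRAID.

```python
import re
from collections import defaultdict
from datetime import datetime, timezone

# Patterns for the three kernel lines that make up one mce record
# (format assumed from the syslog excerpts in this thread).
BANK_RE = re.compile(r"Bank (\d+): ([0-9a-f]+)")
ADDR_RE = re.compile(r"ADDR ([0-9a-f]+)")
TIME_RE = re.compile(r"TIME (\d+)")


def group_mce_events(lines):
    """Map (bank, status, addr) -> list of UTC datetimes, one per record."""
    seen = defaultdict(list)
    bank = status = addr = None
    for line in lines:
        if m := BANK_RE.search(line):
            bank, status, addr = int(m.group(1)), m.group(2), None
        elif m := ADDR_RE.search(line):
            addr = m.group(1)
        elif (m := TIME_RE.search(line)) and bank is not None:
            # TIME is wall-clock seconds since the Unix epoch.
            when = datetime.fromtimestamp(int(m.group(1)), tz=timezone.utc)
            seen[(bank, status, addr)].append(when)
            bank = status = addr = None  # record complete
    return dict(seen)


SAMPLE = """\
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 43040000086
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan  9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 47040000086
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca
""".splitlines()

if __name__ == "__main__":
    for key, times in group_mce_events(SAMPLE).items():
        print(key, [t.isoformat() for t in times])
```

With the two excerpts from this thread, both keys (bank 8 at ADDR 8b445140, bank 9 at ADDR 8b445100) collect one record per boot, timestamped 2020-01-09 20:20:46 UTC and 2020-01-10 20:08:13 UTC, a few seconds before the local (UTC+1) syslog timestamps.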

The hardware errors actually happen on every boot except one. I did around 20 to 30 reboots, I think.

I checked the mcelog (/dev/mcelog), but it's empty. I also ran a second memtest: no errors. Apart from the notifications, the machine and array seem to be running fine.

I also had some issues with my new flash drive during some boots; it looked like it couldn't read some files. These were solved when I moved the flash drive from the USB 2.0 port it was in to a USB 3.0 port. Could that make sense?

Could anyone please have a look at these hardware errors? The hardware is brand new and I want to avoid any problems with it.
