Rubene Posted January 10, 2020

I bought a new machine to run unRAID on. After booting I got this message: 'Your server has detected hardware errors'. The array seems to run fine, although it is still a small array without a parity disk. (I explored unRAID on old hardware first. Since it's a good fit to replace my Synology, I bought new hardware and am going to purchase a license.)

This is in the syslog:

Jan 9 21:21:03 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xb)
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 43040000086
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca

To be sure it's not the RAM, I ran memtest. After 4 hours it completed and did not find any issues.
+---------------------------------------------+-----------------+--------+
| Test                                        | # Tests Passed  | Errors |
+---------------------------------------------+-----------------+--------+
| Test 0 [Address test, walking ones, 1 CPU]  | 4/4 (100%)      | 0      |
| Test 1 [Address test, own address, 1 CPU]   | 4/4 (100%)      | 0      |
| Test 2 [Address test, own address]          | 4/4 (100%)      | 0      |
| Test 3 [Moving inversions, ones & zeroes]   | 4/4 (100%)      | 0      |
| Test 4 [Moving inversions, 8-bit pattern]   | 4/4 (100%)      | 0      |
| Test 5 [Moving inversions, random pattern]  | 4/4 (100%)      | 0      |
| Test 6 [Block move, 64-byte blocks]         | 4/4 (100%)      | 0      |
| Test 7 [Moving inversions, 32-bit pattern]  | 4/4 (100%)      | 0      |
| Test 8 [Random number sequence]             | 4/4 (100%)      | 0      |
| Test 9 [Modulo 20, ones & zeros]            | 4/4 (100%)      | 0      |
| Test 10 [Bit fade test, 2 patterns, 1 CPU]  | 4/4 (100%)      | 0      |
| Test 13 [Hammer test]                       | 4/4 (100%)      | 0      |
+---------------------------------------------+-----------------+--------+

If I understood correctly, this might happen very early in the boot process, during initialization. It doesn't seem like something to worry about, but I would like to be sure. Thanks!

tower-diagnostics-20200109-2056.zip
Rubene Posted January 11, 2020

Yesterday it happened again:

Jan 10 21:08:29 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xb)
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 47040000086
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca

Any idea what this could be, since it has happened twice now?
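For anyone wanting to read the status word in those log lines, here is a minimal Python sketch that splits a value such as ae00000000801136 into the architectural flag bits of the IA32_MCi_STATUS register, following the bit layout documented in the Intel SDM. The function name decode_mci_status is made up for illustration, and the sketch deliberately leaves the low 16-bit error code undecoded rather than guess at its meaning.

```python
# Architectural flag bits in the upper part of IA32_MCi_STATUS
# (bit positions as documented in the Intel SDM).
FLAG_BITS = {
    "VAL": 63,    # the bank holds a valid error record
    "OVER": 62,   # an error occurred while an earlier record was still in the bank
    "UC": 61,     # the error was not corrected by hardware
    "EN": 60,     # reporting of this error was enabled in the bank's control register
    "MISCV": 59,  # the MISC register holds valid data
    "ADDRV": 58,  # the ADDR register holds valid data
    "PCC": 57,    # processor context may have been corrupted
}


def decode_mci_status(status: int) -> dict:
    """Split a 64-bit MCi_STATUS value into flags and raw code fields.

    Hypothetical helper for illustration; it does not interpret the codes.
    """
    fields = {name: bool((status >> bit) & 1) for name, bit in FLAG_BITS.items()}
    fields["mca_error_code"] = status & 0xFFFF              # bits 15:0
    fields["model_specific_code"] = (status >> 16) & 0xFFFF  # bits 31:16
    return fields


if __name__ == "__main__":
    # Status value copied from the "Bank 8" / "Bank 9" syslog lines.
    decoded = decode_mci_status(0xAE00000000801136)
    for name, value in decoded.items():
        shown = hex(value) if isinstance(value, int) and not isinstance(value, bool) else value
        print(f"{name:>20}: {shown}")
```

For the value logged here, VAL, UC, MISCV, ADDRV and PCC come out set, while OVER and EN are clear. What the 0x1136 error code means for this particular CPU is left to the vendor documentation.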
Rubene Posted January 14, 2020

The hardware errors actually happen on every boot. Out of around 20 to 30 reboots, I guess, there was only one without them. I checked the mcelog (/dev/mcelog) but it's empty. I also did a second memtest: no errors. Apart from the notifications, the machine and array seem to be running fine.

I also had some issues with my new flash drive during some boots. It looked like it couldn't read some files. These were solved when I moved the flash drive from the USB 2.0 port it was in to a USB 3.0 port. Could that make sense?

Could anyone please have a look at these hardware errors? The hardware is brand new and I want to avoid any problems with it.
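With /dev/mcelog empty, the boot-time records can still be pulled out of the syslog text itself to compare one boot against another. The following is a rough Python sketch under a few assumptions: the helper name parse_mce_records is invented for illustration, the regular expressions are fitted only to the line format pasted in this thread, and the TIME field is treated as a Unix timestamp in seconds.

```python
import re
from datetime import datetime, timezone

# Patterns fitted to the kernel lines quoted in this thread (an assumption;
# other kernel versions may format these lines differently).
BANK_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-fA-F]+)")
TIME_RE = re.compile(r"\bTIME (\d+)")


def parse_mce_records(lines):
    """Collect one dict per 'Bank N' line, attaching the TIME that follows it."""
    records = []
    current = None
    for line in lines:
        if "mce: [Hardware Error]" not in line:
            continue
        bank = BANK_RE.search(line)
        if bank:
            current = {
                "cpu": int(bank.group(1)),
                "bank": int(bank.group(2)),
                "status": int(bank.group(3), 16),
                "time": None,
            }
            records.append(current)
            continue
        stamp = TIME_RE.search(line)
        if stamp and current is not None:
            # Interpreted as seconds since the Unix epoch, shown in UTC.
            current["time"] = datetime.fromtimestamp(int(stamp.group(1)), tz=timezone.utc)
    return records


if __name__ == "__main__":
    sample = [
        "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged",
        "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136",
        "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 43040000086",
        "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca",
    ]
    for record in parse_mce_records(sample):
        print(record["bank"], hex(record["status"]), record["time"])
```

To run it against a real log, pass the lines of the saved syslog file instead of the sample list; the location of that file on a given server is not stated in this thread, so it is left out here.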
Squid Posted January 14, 2020

The mce happens during initialization of the CPU. It happens on certain hardware combinations due to flux in the vortex / the house being built using pyramids instead of A-frames / gremlins (i.e. no particular reason for it, and nothing to worry about at all).
Rubene Posted January 15, 2020

Thanks for your reply! Glad to hear that it is nothing to worry about. Is this something unraid specific?
itimpi Posted January 15, 2020

5 minutes ago, Rubene said: Is this something unraid specific?

I think it will occur on any Linux-based system using the same kernel level as the Unraid release.