January 10, 2020
I bought a new machine to run unRAID on. After booting I got this message: 'Your server has detected hardware errors'. The array seems to run fine, although it is still a small array without a parity disk. (I explored unRAID on old hardware first. Since it's a good fit to replace my Synology, I bought new hardware and am going to purchase a license.)

This is in the syslog:

Jan 9 21:21:03 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xb)
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 43040000086
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca

To be sure it's not the RAM, I ran memtest. After 4 hours it completed and did not find any issues.
+---------------------------------------------+-----------------+--------+
| Test                                        | # Tests Passed  | Errors |
+---------------------------------------------+-----------------+--------+
| Test 0 [Address test, walking ones, 1 CPU]  | 4/4 (100%)      | 0      |
| Test 1 [Address test, own address, 1 CPU]   | 4/4 (100%)      | 0      |
| Test 2 [Address test, own address]          | 4/4 (100%)      | 0      |
| Test 3 [Moving inversions, ones & zeroes]   | 4/4 (100%)      | 0      |
| Test 4 [Moving inversions, 8-bit pattern]   | 4/4 (100%)      | 0      |
| Test 5 [Moving inversions, random pattern]  | 4/4 (100%)      | 0      |
| Test 6 [Block move, 64-byte blocks]         | 4/4 (100%)      | 0      |
| Test 7 [Moving inversions, 32-bit pattern]  | 4/4 (100%)      | 0      |
| Test 8 [Random number sequence]             | 4/4 (100%)      | 0      |
| Test 9 [Modulo 20, ones & zeros]            | 4/4 (100%)      | 0      |
| Test 10 [Bit fade test, 2 patterns, 1 CPU]  | 4/4 (100%)      | 0      |
| Test 13 [Hammer test]                       | 4/4 (100%)      | 0      |
+---------------------------------------------+-----------------+--------+

If I understood correctly, this might happen very early in the boot process during initialization. It doesn't seem like something to worry about, but I would like to be sure. Thanks!

tower-diagnostics-20200109-2056.zip
January 11, 2020
Author

Yesterday it happened again:

Jan 10 21:08:29 Tower kernel: smpboot: CPU0: Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz (family: 0x6, model: 0x9e, stepping: 0xb)
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 47040000086
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: Machine check events logged
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 9: ae00000000801136
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445100 MISC 43040000086
Jan 10 21:08:29 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578686893 SOCKET 0 APIC 0 microcode ca

Any idea what this could be, since it has happened twice now?
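To compare boots, the mce lines quoted above can be reduced to bank, status and timestamp. What follows is a minimal Python sketch, not an Unraid or kernel tool: the name `parse_mce_lines` and the record layout are made up for illustration, and it assumes the kernel prints the lines in exactly the format shown in this thread.

```python
import re
from datetime import datetime, timezone

# Patterns follow the log format quoted in this thread (assumption: other
# kernel versions may word these lines differently).
BANK_RE = re.compile(r"CPU (\d+): Machine Check: \d+ Bank (\d+): ([0-9a-f]+)")
TIME_RE = re.compile(r"TIME (\d+)")


def parse_mce_lines(lines):
    """Return one dict per 'Machine Check' line, with TIME attached if it follows."""
    records = []
    for line in lines:
        bank = BANK_RE.search(line)
        if bank:
            records.append({
                "cpu": int(bank.group(1)),
                "bank": int(bank.group(2)),
                "status": int(bank.group(3), 16),  # hex status word
                "time": None,
            })
            continue
        stamp = TIME_RE.search(line)
        if stamp and records and records[-1]["time"] is None:
            # TIME is seconds since the Unix epoch; convert it to UTC.
            records[-1]["time"] = datetime.fromtimestamp(
                int(stamp.group(1)), tz=timezone.utc
            )
    return records


# Lines copied from the first post in this thread.
sample = [
    "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: Machine check events logged",
    "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: CPU 0: Machine Check: 0 Bank 8: ae00000000801136",
    "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: TSC 0 ADDR 8b445140 MISC 43040000086",
    "Jan 9 21:21:03 Tower kernel: mce: [Hardware Error]: PROCESSOR 0:906eb TIME 1578601246 SOCKET 0 APIC 0 microcode ca",
]

for rec in parse_mce_lines(sample):
    print(rec["bank"], hex(rec["status"]), rec["time"].isoformat())
```

Feeding it the lines from both boots makes it easy to see that the bank numbers and status words are identical across boots, which fits an event that recurs at the same point of every startup.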
January 14, 2020
Author

The hardware errors actually happen on every boot, with one exception out of roughly 20 to 30 reboots. I checked the mcelog (/dev/mcelog), but it's empty. I also ran a second memtest: no errors. Apart from the notifications, the machine and array seem to be running fine.

I also had some issues with my new flash drive during some boots. It looked like it couldn't read some files. These were solved when I moved the flash drive from the USB 2.0 port it was in to a USB 3.0 port. Could that make sense?

Could anyone please have a look at these hardware errors? The hardware is brand new, and I want to avoid any problems with it.
January 14, 2020

The mce happens during initialization of the CPU. It happens on certain hardware combinations due to flux in the vortex / the house is built using pyramids instead of A-Frames / Gremlins (i.e. no particular reason for it, and nothing to worry about at all).
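For anyone who wants to look inside the status word from the logs, here is a rough Python sketch that splits it into the architectural fields of IA32_MCi_STATUS. The bit positions are taken from the machine-check chapter of the Intel SDM; the function name `decode_mci_status` and the returned dictionary are hypothetical, and the model-specific parts of the value are left undecoded.

```python
# Architectural flag bits of an IA32_MCi_STATUS value
# (bit positions per the Intel SDM machine-check chapter).
STATUS_FLAGS = {
    "VAL": 63,    # register contains valid information
    "OVER": 62,   # an earlier error was overwritten
    "UC": 61,     # error was not corrected by hardware
    "EN": 60,     # reporting for this error was enabled in the bank's CTL register
    "MISCV": 59,  # MISC register holds additional information
    "ADDRV": 58,  # ADDR register holds an address
    "PCC": 57,    # processor context may be corrupt
}


def decode_mci_status(status):
    """Split a 64-bit status word into set flags and the two 16-bit code fields."""
    flags = [name for name, bit in STATUS_FLAGS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "mca_error_code": status & 0xFFFF,              # bits 15:0
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
    }


# Status word reported for Bank 8 and Bank 9 in this thread.
decoded = decode_mci_status(0xAE00000000801136)
print(decoded["flags"])
print(hex(decoded["mca_error_code"]), hex(decoded["model_specific_code"]))
```

For the value in this thread, VAL, UC, MISCV, ADDRV and PCC are set and EN is clear. Whether the clear EN bit is the reason these events only show up during boot is a guess, not something confirmed here.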
January 15, 2020
Author

Thanks for your reply! Glad to hear that it is nothing to worry about. Is this something unraid specific?
January 15, 2020
Community Expert

5 minutes ago, Rubene said: "Is this something unraid specific?"

I think it will occur on any Linux-based system using the same kernel level as is being used by the Unraid release.