May 4, 2017
Hello,
In "Fix Common Problems" I've got this notice:
Machine Check Events detected on your server. Your server has detected hardware errors. You should install mcelog via the NerdPack plugin, post your diagnostics and ask for assistance on the unRaid forums. The output of mcelog (if installed) has been logged.
So I've run it:
mcelog: Family 6 Model 92 CPU: only decoding architectural errors
----
I have an ASRock J4205-ITX mainboard. What's next? Thanks
May 4, 2017 Post your diagnostics. It may be that mcelog does not support that processor, but I need more info from the syslog to verify.
May 4, 2017
Quote
May 4 21:35:03 unRAIDTower mcelog: Running trigger `unknown-error-trigger'
May 4 21:35:03 unRAIDTower mcelog: CPU 0 on socket 0 received unknown error
May 4 21:35:03 unRAIDTower mcelog: Location: CPU 0 on socket 0
Quote
May 4 21:35:04 unRAIDTower root: Uncorrected error
A hardware failure did trigger the machine check event, but mcelog doesn't have it classified.
Quote
The unknown-error-trigger runs on any errors not otherwise categorized.
If the MCE doesn't recur (reset your server to clear out the existing log), then I'd chalk it up to the stars just not being aligned properly. But if it does recur, then it's going to wind up being one of the following:
- Power supply
- CPU
- Motherboard (in particular the voltage regulation on the board)
Unfortunately, you can only diagnose what is actually causing it if the problem keeps recurring. And of course we don't know whether anything was actually affected by the uncorrected error.
Edited May 4, 2017 by Squid
May 8, 2017 Author It's recurring even after a reboot... Hmm, I'm using a PicoPSU power supply; I might test with another "regular" power supply to see if it makes a difference. unraidtower-diagnostics-20170508-1902.zip
May 11, 2017 Author So I changed to another power supply and the same error is occurring. But it seems like it's an unRAID issue... I found an issue here: https://github.com/andikleen/mcelog/issues/35 So is this an old-code problem, or what?
May 12, 2017 Not sure if mcelog is running as a daemon with this plugin. From console/telnet/SSH you can try entering mcelog --client > mcelog.txt and post the txt file if it contains anything, preferably after unRAID has been running for several hours/days. Beyond that I can only suggest rebooting and running memtest from the boot menu at least overnight to see if you get memory errors. If there are none and the server is not locking up, then I'd say ignore it.
May 15, 2017 Author
This is what I've got:
root@unRAIDTower:/mnt/user/Appz# mcelog
mcelog: Family 6 Model 92 CPU: only decoding architectural errors
root@unRAIDTower:/mnt/user/Appz# mcelog --client > mcelog.txt
mcelog: client connect: No such file or directory
mcelog: client command write: Transport endpoint is not connected
mcelog: client read: Invalid argument
mcelog: client connect: No such file or directory
mcelog: client command write: Transport endpoint is not connected
mcelog: client read: Invalid argument
unraidtower-diagnostics-20170516-0121.zip
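The "client connect: No such file or directory" message most likely means that mcelog --client could not find the daemon's Unix socket, which is what you would expect if mcelog is not running as a daemon. A minimal Python sketch for checking this is below. The socket path /var/run/mcelog-client is an assumption (it is the default in stock mcelog configurations as far as I know, but the plugin build may differ), and the function name is made up for illustration.

```python
import os
import stat

# ASSUMPTION: default client socket path of the mcelog daemon.
# The unRAID plugin build may use a different path, or none at all.
DEFAULT_SOCKET = "/var/run/mcelog-client"


def daemon_socket_present(path=DEFAULT_SOCKET):
    """Return True if `path` exists and is a Unix domain socket."""
    try:
        return stat.S_ISSOCK(os.stat(path).st_mode)
    except OSError:
        # Missing file, permission problem, etc.: treat as "no socket".
        return False


if __name__ == "__main__":
    if daemon_socket_present():
        print("Socket found: a daemon appears to be listening.")
    else:
        print("No socket: mcelog is probably not running in daemon mode.")
```

If no socket is found, mcelog --client has nothing to connect to, and the empty mcelog.txt says nothing about whether errors occurred.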
May 19, 2017 Author Ran the test for the whole night, about 8 hours, with no issues. The RAM is new, and I also tested it before I built the server. But now the server was in some failed state; I'm not sure if you can see the error in the diagnostics, so I made a screenshot as well. I didn't have these issues before. Strange. I changed the USB flash drive before I bought the product, but as far as I understand it is only loaded once during boot and that's it... unraidtower-diagnostics-20170519-1405.zip
May 20, 2017 Noticed the clocksource doing a dance between tsc and hpet. Perhaps Apollo Lake support in the 4.9 kernel is not quite ready? Someone with more kernel experience will need to chime in or get the attention of Limetech.
May 20, 2017 Author No clue. I hope someone from Limetech can have a look, or I might try to mail them if there's no reply within a few days. Thanks anyway!
May 20, 2017 tsc was the final clocksource, and the system had switched to it from hpet. I had a loop in my search and didn't catch it, so I assumed it had switched from tsc back to hpet, but it didn't for the duration of the log that was posted. If you keep having issues, post another diagnostic before rebooting, if possible.
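To avoid misreading the order of clocksource switches when searching a syslog by hand, a small Python sketch like the one below can list them in sequence. It assumes the kernel logs each switch as "clocksource: Switched to clocksource <name>"; the sample lines are hypothetical and are not taken from the posted diagnostics.

```python
import re

# ASSUMPTION: the kernel reports each switch in this form.
SWITCH_RE = re.compile(r"clocksource: Switched to clocksource (\S+)")


def clocksource_history(lines):
    """Return clocksource names in the order the log reports switching to them."""
    history = []
    for line in lines:
        match = SWITCH_RE.search(line)
        if match:
            history.append(match.group(1))
    return history


if __name__ == "__main__":
    # Hypothetical sample lines for illustration only.
    sample = [
        "May 19 13:00:01 unRAIDTower kernel: clocksource: Switched to clocksource hpet",
        "May 19 13:00:01 unRAIDTower kernel: usbcore: registered new device driver usb",
        "May 19 13:00:02 unRAIDTower kernel: clocksource: Switched to clocksource tsc",
    ]
    history = clocksource_history(sample)
    print(" -> ".join(history))
    print("final clocksource:", history[-1] if history else "unknown")
```

On a real system you would pass in the lines of the syslog file from the diagnostics zip instead of the sample list. A single hpet to tsc switch during boot is normal; repeated switching back and forth would be the thing to look into.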
May 21, 2017 Author The new update, 6.3.4, came out, and even more errors occurred. unraidtower-syslog-20170521-1002.zip
May 24, 2017 Author
May 24 04:40:08 unRAIDTower root: Fix Common Problems: Error: Machine Check Events detected on your server
May 24 04:40:08 unRAIDTower mcelog: Running trigger `unknown-error-trigger'
May 24 04:40:08 unRAIDTower mcelog: CPU 0 on socket 0 received unknown error
May 24 04:40:08 unRAIDTower mcelog: Location: CPU 0 on socket 0
May 24 04:40:08 unRAIDTower root: mcelog: Family 6 Model 92 CPU: only decoding architectural errors
May 24 04:40:08 unRAIDTower root: mcelog: Family 6 Model 92 CPU: only decoding architectural errors
May 24 04:40:08 unRAIDTower root: Hardware event. This is not a software error.
May 24 04:40:08 unRAIDTower root: MCE 0
May 24 04:40:08 unRAIDTower root: CPU 0 BANK 4
May 24 04:40:08 unRAIDTower root: ADDR fef13b80
May 24 04:40:08 unRAIDTower root: TIME 1495568891 Tue May 23 21:48:11 2017
May 24 04:40:08 unRAIDTower root: MCG status:
May 24 04:40:08 unRAIDTower root: MCi status:
May 24 04:40:08 unRAIDTower root: Uncorrected error
May 24 04:40:08 unRAIDTower root: MCi_ADDR register valid
May 24 04:40:08 unRAIDTower root: Processor context corrupt
May 24 04:40:08 unRAIDTower root: MCA: Internal unclassified error: 408
May 24 04:40:08 unRAIDTower root: STATUS a600000000020408 MCGSTATUS 0
May 24 04:40:08 unRAIDTower root: MCGCAP c07 APICID 0 SOCKETID 0
May 24 04:40:08 unRAIDTower root: CPUID Vendor Intel Family 6 Model 92
May 24 04:40:08 unRAIDTower root: <27>May 24 04:40:08 mcelog: CPU 0 on socket 0 received unknown error
May 24 04:40:08 unRAIDTower root: <27>May 24 04:40:08 mcelog: Location: CPU 0 on socket 0
It's starting to drive me nuts... :(
unraidtower-diagnostics-20170524-2314.zip
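For anyone wondering where "Uncorrected error", "MCi_ADDR register valid" and "Processor context corrupt" come from: they are flag bits in the STATUS value. Below is a rough Python sketch that decodes only the architectural fields of that value, using the IA32_MCi_STATUS bit layout from the Intel SDM. The function and dictionary names are made up for illustration, and the model-specific bits are not interpreted, which matches mcelog's "only decoding architectural errors" note for this CPU.

```python
# Architectural flag bits of IA32_MCi_STATUS (per the Intel SDM).
FLAG_BITS = {
    63: "VAL (status register valid)",
    62: "OVER (error overflow)",
    61: "UC (uncorrected error)",
    60: "EN (error reporting enabled)",
    59: "MISCV (MCi_MISC register valid)",
    58: "ADDRV (MCi_ADDR register valid)",
    57: "PCC (processor context corrupt)",
}


def decode_mci_status(status):
    """Split a 64-bit MCi_STATUS value into its architectural fields."""
    flags = [name for bit, name in FLAG_BITS.items() if (status >> bit) & 1]
    return {
        "flags": flags,
        "model_specific_code": (status >> 16) & 0xFFFF,  # bits 31:16
        "mca_error_code": status & 0xFFFF,               # bits 15:0
    }


if __name__ == "__main__":
    # STATUS value from the syslog excerpt above.
    decoded = decode_mci_status(0xA600000000020408)
    for flag in decoded["flags"]:
        print(flag)
    print(f"MCA error code:      {decoded['mca_error_code']:#06x}")
    print(f"Model-specific code: {decoded['model_specific_code']:#06x}")
```

For a600000000020408 the set flags are VAL, UC, ADDRV and PCC, and the low 16 bits are 0x0408, the "Internal unclassified error: 408" that mcelog printed. The decode says nothing about the root cause; it only shows that the log text is a direct reading of the register.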
June 2, 2017 Author So I got a reply from support suggesting it might be a HW error, so I went to the shop and got a new mainboard of the same model: same errors. So it's definitely not HW but an OS issue; probably the Apollo Lake chipset is not implemented right, or something. The new MB is on BIOS 1.20, the "old" one on BIOS 1.30, and both give the same errors. 1. I want to try a different USB drive, but I'm not sure yet how to do that with the license... 2. Also roll back to 6.3 or so.
June 2, 2017 If you are not experiencing lockups or other problems, just uninstall the plugin so you don't get the mcelog errors, and ignore it. Apollo Lake is still a work in progress in the Linux kernel, and 4.10+ should provide better support based on what I am reading. You are not the only Apollo Lake user having similar issues, and it has to do with Linux, not unRAID.
June 2, 2017 Author This is why I like Windows... you don't need to wait 6-12 months until new HW is supported. OK, but at least we know the HW is FINE and it's a Linux problem. --- I do have some reboots, not regular, but about once every 2-3 days, no clue why. Will check the mce plugin, thx.
June 2, 2017 Just one more update to supplement my latest e-mail: we are approaching the release of 6.4-rc1, which will be on the 4.11 kernel. When we do, please try it and let us know if the errors persist.
June 10, 2017 Any luck with this mobo and the new 6.4-rc2 version of unRAID? I wanted to purchase this board, but wanted to make sure it is supported by unRAID... Any info would be appreciated! Thanks! Edited June 10, 2017 by airbillion
June 11, 2017
Quote
I'm on vacation... so I can try next week, Thursday or so ;)
Awesome... I look forward to your results! Thanks!