D'n'S137 Posted May 23, 2020 (edited) Hello all, My situation: a fresh system, installed three months ago with the new Unraid v6. It worked fine for weeks. I used Docker (lancache, folding@home, JDownloader [queued downloads over a slow internet connection]) as well as Windows and Linux VMs, and I had some SMB and AFP shares. Problem: my system crashes randomly after some time, even with no Docker containers and no VMs started. When it crashes it has to be shut down manually: I can't log in to the webUI and the console does not respond, but I still get my hourly array status report and can ping the system's IP address. I have a USB network card installed, so I will run a test without it. Update: no change! There are no logs in the logs folder. Greetings mps-diagnostics-20200524-1440.zip Edited May 24, 2020 by D'n'S137 added diagnostics ZIP
Frank1940 Posted May 24, 2020 I would suggest that you set up the Syslog Server, as it may log some information about what is causing your problem. You can find instructions on how to do this here: https://forums.unraid.net/topic/46802-faq-for-unraid-v6/page/2/?tab=comments#comment-781601
D'n'S137 Posted May 24, 2020 Author At the last boot I got no errors, but after two hours the system crashed again and I had to reset it (hard shutdown). Today I booted the system and got some errors in the log, see below:

May 24 17:36:58 MPS kernel: Warning: PCIe ACS overrides enabled; This may allow non-IOMMU protected peer-to-peer DMA
May 24 17:36:58 MPS kernel: tsc: Fast TSC calibration failed
May 24 17:36:58 MPS kernel: ACPI: Early table checksum verification disabled
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: Machine check events logged
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 5: bea0000000000108
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff8168b28a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1590334600 SOCKET 0 APIC 8 microcode 8001138
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: Machine check events logged
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000000000108
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: TSC 0 ADDR 1ffff8168b28a MISC d012000100000000 SYND 4d000000 IPID 500b000000000
May 24 17:36:58 MPS kernel: mce: [Hardware Error]: PROCESSOR 2:800f11 TIME 1590334600 SOCKET 0 APIC 1 microcode 8001138
May 24 17:36:58 MPS kernel: floppy0: no floppy controllers found
May 24 17:36:58 MPS kernel: ccp 0000:07:00.2: psp initialization failed
May 24 17:36:58 MPS kernel: ACPI Warning: SystemIO range 0x0000000000000B00-0x0000000000000B08 conflicts with OpRegion 0x0000000000000B00-0x0000000000000B0F (\GSA1.SMBI) (20180810/utaddress-204)
May 24 17:36:58 MPS kernel: ata5.00: failed to enable Sense Data Reporting, Emask 0x1
May 24 17:36:58 MPS kernel: ata5.00: failed to enable Sense Data Reporting, Emask 0x1
May 24 17:36:58 MPS kernel: ata6.00: failed to enable Sense Data Reporting, Emask 0x1
May 24 17:36:58 MPS kernel: ata6.00: failed to enable Sense Data Reporting, Emask 0x1
May 24 17:37:11 MPS rpc.statd[1972]: Failed to read /var/lib/nfs/state: Success

Do I have a bad CPU? Are there any reliable hardware tests? I bought the CPU *used* to save a buck; maybe this was a scam or a mistake.
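For readers trying to interpret the MCE lines above: the 16-digit hex value after "Bank 5:" is that bank's status register, which is a bit field. Below is a minimal Python sketch, assuming the flag positions named in the Linux kernel header arch/x86/include/asm/mce.h (TCC and SYNDV are the AMD-specific names there); the helpers decode_status and scan_log are hypothetical names for illustration. It only reports the generic flags and the low 16-bit error code. It cannot tell you which unit of the CPU failed or whether the chip is actually defective.

```python
import re

# Generic MCA status flags as named in the Linux header
# arch/x86/include/asm/mce.h (TCC and SYNDV are the AMD-specific names).
MCI_STATUS_FLAGS = {
    63: "VAL",    # the bank holds a valid error record
    62: "OVER",   # an earlier error was lost (overflow)
    61: "UC",     # the error was not corrected
    60: "EN",     # reporting was enabled for this error
    59: "MISCV",  # the MISC register content is valid
    58: "ADDRV",  # the ADDR register content is valid
    57: "PCC",    # processor context may be corrupt
    55: "TCC",    # task context corrupt (AMD)
    53: "SYNDV",  # the SYND register content is valid (AMD)
}

# Matches kernel lines such as
# "CPU 4: Machine Check: 0 Bank 5: bea0000000000108"
MCE_LINE = re.compile(
    r"CPU (\d+): Machine Check(?: Exception)?: [0-9a-fA-F]+ Bank (\d+): ([0-9a-fA-F]{16})"
)


def decode_status(status: int) -> list:
    """Return the names of the flag bits that are set, highest bit first."""
    return [name for bit, name in sorted(MCI_STATUS_FLAGS.items(), reverse=True)
            if (status >> bit) & 1]


def scan_log(lines):
    """Yield (cpu, bank, status, flags) for every machine check line found."""
    for line in lines:
        m = MCE_LINE.search(line)
        if m:
            status = int(m.group(3), 16)
            yield int(m.group(1)), int(m.group(2)), status, decode_status(status)


if __name__ == "__main__":
    sample = [
        "May 24 17:36:58 MPS kernel: mce: [Hardware Error]: CPU 4: Machine Check: 0 Bank 5: bea0000000000108",
        "May 24 17:36:58 MPS kernel: mce: [Hardware Error]: CPU 8: Machine Check: 0 Bank 5: bea0000000000108",
    ]
    for cpu, bank, status, flags in scan_log(sample):
        print(f"CPU {cpu} bank {bank}: {status:#018x} {' '.join(flags)} "
              f"(error code {status & 0xFFFF:#06x})")
```

Run against the two Bank 5 lines from the log, both CPU 4 and CPU 8 decode to VAL UC EN MISCV ADDRV PCC TCC SYNDV with error code 0x0108, i.e. the same uncorrected error recorded on two CPUs. The set MISCV and SYNDV bits also match the MISC and SYND values printed on the following lines.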
FreeMan Posted May 24, 2020 I ran into MCEs with my old server. The recommendations from here were that I contact the CPU & motherboard vendors to see if there was anything they could do/help with. I ended up having to replace the CPU. Not saying that's guaranteed to be your only course of action here, but bracing you for the worst. Wait for someone more knowledgeable to chime in, but you may want to at least start checking with your vendors.
D'n'S137 Posted May 25, 2020 Author Thanks for your experience report. It led me to search for AMD-related reports of my problem. Other people have had similar issues with the same CPU, and it seems to be a problem with the C-States and the SMT feature, so I disabled C-States in my BIOS. Now I am testing whether this solves my problem. I will also reach out to AMD support, because they seem to know about the issue with these Ryzen 7 1700(X) CPUs. Sent from my Pixel 3a XL using Tapatalk
Frank1940 Posted May 25, 2020 Google "unraid.net amd ryzen problems" and read the posts from Unraid.net first. You can read the ones on Reddit if you want more information.
D'n'S137 Posted May 26, 2020 Author Okay, thanks for your advice. I disabled the C-States and SMT features, and since then I have had no problems at all. But I am wondering why the server ran fine for days before and I only get these errors now, because C-States were enabled the whole time.
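To confirm from the console that the BIOS changes actually took effect, one can read what the Linux kernel reports. Below is a minimal Python sketch, assuming a Python 3 interpreter is installed on the server and that the kernel exposes the standard sysfs files /sys/devices/system/cpu/smt/active and /sys/devices/system/cpu/cpuN/cpuidle/stateM/{name,disable}; smt_active and idle_states are hypothetical helper names, not part of Unraid. The "disable" flag read here is the kernel's own per-state switch, not the BIOS option, and the idle states Linux lists do not necessarily map one-to-one onto the BIOS "C-States" setting.

```python
from pathlib import Path
from typing import List, Optional, Tuple

SYS_CPU = Path("/sys/devices/system/cpu")


def smt_active(sys_cpu: Path = SYS_CPU) -> Optional[bool]:
    """True/False from smt/active, or None if this kernel does not expose the file."""
    f = sys_cpu / "smt" / "active"
    if not f.is_file():
        return None
    return f.read_text().strip() == "1"


def idle_states(cpu: int = 0, sys_cpu: Path = SYS_CPU) -> List[Tuple[str, bool]]:
    """(name, disabled-in-Linux) for each cpuidle state the kernel offers on one CPU."""
    base = sys_cpu / f"cpu{cpu}" / "cpuidle"
    if not base.is_dir():
        return []  # no cpuidle driver loaded, or the CPU does not exist
    # Sort numerically so state10 comes after state9.
    states = sorted(base.glob("state*"), key=lambda p: int(p.name[len("state"):]))
    return [((d / "name").read_text().strip(),
             (d / "disable").read_text().strip() == "1") for d in states]


if __name__ == "__main__":
    print("SMT active:", smt_active())
    for name, disabled in idle_states(0):
        print(f"cpu0 idle state {name}: {'disabled' if disabled else 'enabled'} in Linux")
```

With SMT turned off in the BIOS, smt_active() would be expected to return False; if it returns None, the kernel simply does not provide that file.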