Barryrod Posted February 6, 2021

A couple of weeks ago I was having issues with the server locking up and needing a hard reboot to get back into service. Someone steered me toward the BIOS settings: I set Power Supply Idle Control to Typical and disabled Global C-state Control. It seemed to be OK for about 2 weeks, then it started freezing again.

Unraid ver. 6.8.3
ASRock B450M Pro4
Ryzen 3 3200G
G.Skill F4-3200C16D-16GFX (2x8G kit)

Here is a link to my original post.

I wanted to rule out memory issues and ran memtest for 3 passes with no errors. I had left logging on and found that this was logged when it crashed this morning. Is anyone able to decipher it?

Feb 6 10:05:08 Tower kernel: BUG: Bad page state in process php pfn:2cbae6
Feb 6 10:05:08 Tower kernel: page:ffffea000b2eb980 count:0 mapcount:-8192 mapping:0000000000002000 index:0x1 compound_mapcount: -30587
Feb 6 10:05:08 Tower kernel: flags: 0x2ffff000000a000(private_2|head)
Feb 6 10:05:08 Tower kernel: raw: 02ffff000000a000 dead000000000100 dead000000000200 0000000000002000
Feb 6 10:05:08 Tower kernel: raw: 0000000000000001 0000000000000000 00000000ffffdfff 0000000000004000
Feb 6 10:05:08 Tower kernel: page dumped because: page still charged to cgroup
Feb 6 10:05:08 Tower kernel: page->mem_cgroup:0000000000004000
Feb 6 10:05:08 Tower kernel: bad because of flags: 0xa000(private_2|head)
Feb 6 10:05:08 Tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid fam15h_power bonding edac_mce_amd kvm_amd ccp kvm k10temp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd wmi_bmof ahci r8169 video libahci i2c_piix4 pcc_cpufreq glue_helper i2c_core wmi realtek backlight button acpi_cpufreq
Feb 6 10:05:08 Tower kernel: CPU: 0 PID: 21268 Comm: php Not tainted 4.19.107-Unraid #1
Feb 6 10:05:08 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P4.90 12/17/2020
Feb 6 10:05:08 Tower kernel: Call Trace:
Feb 6 10:05:08 Tower kernel: dump_stack+0x67/0x83
Feb 6 10:05:08 Tower kernel: bad_page+0xec/0x106
Feb 6 10:05:08 Tower kernel: get_page_from_freelist+0x9f4/0xd0b
Feb 6 10:05:08 Tower kernel: __alloc_pages_nodemask+0x150/0xae1
Feb 6 10:05:08 Tower kernel: ? flush_tlb_func_common.constprop.0+0x99/0xc2
Feb 6 10:05:08 Tower kernel: ? cpumask_next+0x15/0x16
Feb 6 10:05:08 Tower kernel: ? cpumask_any_but+0x14/0x23
Feb 6 10:05:08 Tower kernel: ? __vma_adjust+0x44f/0x58c
Feb 6 10:05:08 Tower kernel: alloc_pages_vma+0x13c/0x163
Feb 6 10:05:08 Tower kernel: __handle_mm_fault+0xa79/0x11b7
Feb 6 10:05:08 Tower kernel: handle_mm_fault+0x189/0x1e3
Feb 6 10:05:08 Tower kernel: __do_page_fault+0x267/0x3ff
Feb 6 10:05:08 Tower kernel: ? page_fault+0x8/0x30
Feb 6 10:05:08 Tower kernel: page_fault+0x1e/0x30
Feb 6 10:05:08 Tower kernel: RIP: 0033:0x1491b6c8e616
Feb 6 10:05:08 Tower kernel: Code: e0 c5 fe 6f 51 c0 c5 fe 6f 59 a0 48 81 e9 80 00 00 00 48 81 ea 80 00 00 00 c4 c1 7d 7f 01 c4 c1 7d 7f 49 e0 c4 c1 7d 7f 51 c0 <c4> c1 7d 7f 59 a0 49 81 e9 80 00 00 00 48 81 fa 80 00 00 00 77 b8
Feb 6 10:05:08 Tower kernel: RSP: 002b:00007ffe3ffa2348 EFLAGS: 00010202
Feb 6 10:05:08 Tower kernel: RAX: 0000000000ee5b70 RBX: 00000000ffffffff RCX: 0000000000e9b720
Feb 6 10:05:08 Tower kernel: RDX: 0000000000005470 RSI: 0000000000e962d0 RDI: 0000000000ee5b70
Feb 6 10:05:08 Tower kernel: RBP: 0000000000e86090 R08: 0000000000000010 R09: 0000000000eeb040
Feb 6 10:05:08 Tower kernel: R10: 0000000000ef3000 R11: 0000000000eedb50 R12: 0000000000e942d0
Feb 6 10:05:08 Tower kernel: R13: 0000000000e962d0 R14: 0000000000e962d0 R15: 0000000000000008
Feb 6 10:05:08 Tower kernel: Disabling lock debugging due to kernel taint

Edited February 7, 2021 by Barryrod
Barryrod Posted February 9, 2021

Attaching my diagnostics file and newest log. It crashed again this afternoon. Every time it crashes, the log seems to show something different. I checked, and the memory seems to be running at 2133 even though the modules are rated for 3200. I read that clocking them down to 2400 was a good idea, but mine are running at 2133 by default, I guess. Starting to regret using Unraid.

Attachments: tower-diagnostics-20210209-1655.zip, syslog (12)

Edited February 9, 2021 by Barryrod
JorgeB Posted February 10, 2021

Diags are from just after rebooting, so there's not much to see. Assuming the "power supply idle control" is correctly set, you can try this and then post that log.
Barryrod Posted February 10, 2021

5 hours ago, JorgeB said:
Diags are from just after rebooting, so there's not much to see. Assuming the "power supply idle control" is correctly set, you can try this and then post that log.

That is how I got the syslog. I have been mirroring the log onto the flash drive for a while now and had not turned it off yet because of the crashing issues. What I do is start at the bottom and search for root@Develop with the direction set to up to find the beginning of the boot cycle, then look just before that to see what happened. I just do not understand what I am seeing.
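The search described in this post (start at the bottom, look upward for the boot marker, then read what was logged just before it) can be sketched in Python. This is only a minimal sketch and not something posted in the thread: the function name `lines_before_last_boot`, the `context` parameter, and the sample lines are hypothetical. The marker is whatever string marks the start of a boot in your own syslog; the post uses root@Develop.

```python
from typing import List


def lines_before_last_boot(lines: List[str], marker: str, context: int = 20) -> List[str]:
    """Return up to `context` lines logged just before the last boot marker.

    `marker` is an assumption: any string that appears at the start of a
    boot cycle in the mirrored syslog (the post searches for root@Develop).
    """
    # Walk from the bottom of the log upward, like an editor search
    # with the direction set to "up".
    for index in range(len(lines) - 1, -1, -1):
        if marker in lines[index]:
            start = max(0, index - context)
            # Everything in this slice was logged before the reboot.
            return lines[start:index]
    return []  # marker not found, so there is nothing to report


if __name__ == "__main__":
    # Hypothetical sample lines; a real run would read the mirrored syslog
    # file instead, e.g. open(path, errors="replace").read().splitlines().
    sample = [
        "Feb 6 10:05:08 Tower kernel: BUG: Bad page state in process php pfn:2cbae6",
        "Feb 6 10:05:08 Tower kernel: Disabling lock debugging due to kernel taint",
        "Feb 6 10:20:00 Tower kernel: (boot line containing root@Develop)",
        "Feb 6 10:20:01 Tower kernel: (later boot messages)",
    ]
    for line in lines_before_last_boot(sample, "root@Develop", context=5):
        print(line)
```

If the slice is empty or shows only routine messages, that matches the situation in this thread where nothing useful was logged before the crash.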
JorgeB Posted February 10, 2021

I missed that, but unfortunately there's nothing logged before the crash, which points to a hardware issue. Another thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning on the other services one by one.
Barryrod Posted February 10, 2021

25 minutes ago, JorgeB said:
I missed that, but unfortunately there's nothing logged before the crash, which points to a hardware issue. Another thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning on the other services one by one.

Each time it crashes, it is a crapshoot whether anything is logged. The log text in my initial post above, from Feb 6 10:05:08, is the closest I have come to seeing why it is crashing. I will try booting into safe mode and go from there.

Edited February 10, 2021 by Barryrod
Barryrod Posted February 14, 2021

On 2/10/2021 at 8:45 AM, JorgeB said:
I missed that, but unfortunately there's nothing logged before the crash, which points to a hardware issue. Another thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem. If it doesn't, start turning on the other services one by one.

It still crashed after a day or so in safe mode. Do you think I may be better off removing the Ryzen 3 3200G APU and putting in a newer Ryzen 5 3600?
JorgeB Posted February 15, 2021

13 hours ago, Barryrod said:
Do you think I may be better off removing the Ryzen 3 3200G APU and putting in a newer Ryzen 5 3600?

Difficult to say if it will help or not.
Matt Elias Posted June 29, 2022

On 2/6/2021 at 3:11 PM, Barryrod said:
A couple of weeks ago I was having issues with the server locking up and needing a hard reboot to get back into service. Someone steered me toward the BIOS settings: I set Power Supply Idle Control to Typical and disabled Global C-state Control. It seemed to be OK for about 2 weeks, then it started freezing again. Unraid ver. 6.8.3, ASRock B450M Pro4, Ryzen 3 3200G, G.Skill F4-3200C16D-16GFX (2x8G kit)

I think I'm having this same issue with my B450M Pro4, R5 1400, Unraid 6.10.3 trial. Where is the "Power Supply Idle Control" setting in the BIOS? I can't find it.