
Server keeps freezing up

A couple of weeks ago the server kept locking up and needed a hard reboot to get back into service. Someone pointed me toward two BIOS settings: I set Power Supply Idle Control to Typical and disabled Global C-state Control. It seemed to be OK for about two weeks, then started freezing again.

Unraid 6.8.3

ASRock B450M Pro4

Ryzen 3 3200G

G.Skill F4-3200C16D-16GFX (2x8GB kit)

Here is a link to my original post. 

I wanted to rule out memory issues, so I ran memtest for 3 passes with no errors.

I had left logging on and found this in the log when it crashed this morning. Is anyone able to decipher it?

Feb  6 10:05:08 Tower kernel: BUG: Bad page state in process php  pfn:2cbae6
Feb  6 10:05:08 Tower kernel: page:ffffea000b2eb980 count:0 mapcount:-8192 mapping:0000000000002000 index:0x1 compound_mapcount: -30587
Feb  6 10:05:08 Tower kernel: flags: 0x2ffff000000a000(private_2|head)
Feb  6 10:05:08 Tower kernel: raw: 02ffff000000a000 dead000000000100 dead000000000200 0000000000002000
Feb  6 10:05:08 Tower kernel: raw: 0000000000000001 0000000000000000 00000000ffffdfff 0000000000004000
Feb  6 10:05:08 Tower kernel: page dumped because: page still charged to cgroup
Feb  6 10:05:08 Tower kernel: page->mem_cgroup:0000000000004000
Feb  6 10:05:08 Tower kernel: bad because of flags: 0xa000(private_2|head)
Feb  6 10:05:08 Tower kernel: Modules linked in: xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs md_mod nct6775 hwmon_vid fam15h_power bonding edac_mce_amd kvm_amd ccp kvm k10temp crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd wmi_bmof ahci r8169 video libahci i2c_piix4 pcc_cpufreq glue_helper i2c_core wmi realtek backlight button acpi_cpufreq
Feb  6 10:05:08 Tower kernel: CPU: 0 PID: 21268 Comm: php Not tainted 4.19.107-Unraid #1
Feb  6 10:05:08 Tower kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./B450M Pro4, BIOS P4.90 12/17/2020
Feb  6 10:05:08 Tower kernel: Call Trace:
Feb  6 10:05:08 Tower kernel: dump_stack+0x67/0x83
Feb  6 10:05:08 Tower kernel: bad_page+0xec/0x106
Feb  6 10:05:08 Tower kernel: get_page_from_freelist+0x9f4/0xd0b
Feb  6 10:05:08 Tower kernel: __alloc_pages_nodemask+0x150/0xae1
Feb  6 10:05:08 Tower kernel: ? flush_tlb_func_common.constprop.0+0x99/0xc2
Feb  6 10:05:08 Tower kernel: ? cpumask_next+0x15/0x16
Feb  6 10:05:08 Tower kernel: ? cpumask_any_but+0x14/0x23
Feb  6 10:05:08 Tower kernel: ? __vma_adjust+0x44f/0x58c
Feb  6 10:05:08 Tower kernel: alloc_pages_vma+0x13c/0x163
Feb  6 10:05:08 Tower kernel: __handle_mm_fault+0xa79/0x11b7
Feb  6 10:05:08 Tower kernel: handle_mm_fault+0x189/0x1e3
Feb  6 10:05:08 Tower kernel: __do_page_fault+0x267/0x3ff
Feb  6 10:05:08 Tower kernel: ? page_fault+0x8/0x30
Feb  6 10:05:08 Tower kernel: page_fault+0x1e/0x30
Feb  6 10:05:08 Tower kernel: RIP: 0033:0x1491b6c8e616
Feb  6 10:05:08 Tower kernel: Code: e0 c5 fe 6f 51 c0 c5 fe 6f 59 a0 48 81 e9 80 00 00 00 48 81 ea 80 00 00 00 c4 c1 7d 7f 01 c4 c1 7d 7f 49 e0 c4 c1 7d 7f 51 c0 <c4> c1 7d 7f 59 a0 49 81 e9 80 00 00 00 48 81 fa 80 00 00 00 77 b8
Feb  6 10:05:08 Tower kernel: RSP: 002b:00007ffe3ffa2348 EFLAGS: 00010202
Feb  6 10:05:08 Tower kernel: RAX: 0000000000ee5b70 RBX: 00000000ffffffff RCX: 0000000000e9b720
Feb  6 10:05:08 Tower kernel: RDX: 0000000000005470 RSI: 0000000000e962d0 RDI: 0000000000ee5b70
Feb  6 10:05:08 Tower kernel: RBP: 0000000000e86090 R08: 0000000000000010 R09: 0000000000eeb040
Feb  6 10:05:08 Tower kernel: R10: 0000000000ef3000 R11: 0000000000eedb50 R12: 0000000000e942d0
Feb  6 10:05:08 Tower kernel: R13: 0000000000e962d0 R14: 0000000000e962d0 R15: 0000000000000008
Feb  6 10:05:08 Tower kernel: Disabling lock debugging due to kernel taint

Edited by Barryrod

Attaching my diagnostics file and newest log. It crashed again this afternoon, and every time it crashes the log seems to show something different. I also checked and the memory is running at 2133 even though these are 3200 modules. I read that clocking them down to 2400 was a good idea, but mine already run at 2133, presumably because XMP isn't enabled.

Starting to regret using Unraid.

tower-diagnostics-20210209-1655.zip syslog (12)

5 hours ago, JorgeB said:

The diagnostics were taken just after rebooting, so there's not much to see. Assuming Power Supply Idle Control is correctly set, you can try this and then post that log.

That is how I got the syslog. I had been mirroring the log to the flash drive for a while and hadn't turned it off yet because of the crashing. What I do is start at the bottom of the file and search upward for "root@Develop" to find the beginning of the latest boot, then look at the lines just before it to see what happened before the crash. I just don't understand what I'm seeing.
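
The manual search described above (find the last "root@Develop" line, then read what was logged just before it) can be automated. This is a minimal sketch, not an Unraid tool: it assumes "root@Develop" appears in the kernel's boot banner on each boot (the poster's own search string), and the function name lines_before_last_boot and the 50-line context window are illustrative choices.

```python
import sys
from pathlib import Path

# Assumption: this string appears once per boot, in the kernel banner line.
BOOT_MARKER = "root@Develop"


def lines_before_last_boot(lines, marker=BOOT_MARKER, context=50):
    """Return up to `context` lines logged just before the most recent boot.

    Scans from the bottom up, like searching upward in a text editor.
    Returns an empty list if the marker is missing or is on the first line.
    """
    for i in range(len(lines) - 1, -1, -1):
        if marker in lines[i]:
            return lines[max(0, i - context):i]
    return []


def main(path):
    # errors="replace" keeps a partially corrupted log from aborting the read
    text = Path(path).read_text(errors="replace")
    for line in lines_before_last_boot(text.splitlines()):
        print(line)


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

Usage would be something like `python last_crash.py syslog`, with the mirrored syslog copied off the flash drive. If the output is empty or ends with ordinary messages, nothing was logged before the freeze, which matches what the thread describes.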

I missed that, but unfortunately there's nothing logged before the crash, which points to a hardware issue. Another thing you can try is to boot the server in safe mode with all Docker containers and VMs disabled and let it run as a basic NAS for a few days. If it still crashes, it's likely a hardware problem; if it doesn't, start turning the other services back on one by one.

Replying to JorgeB:

Each time it crashes it's a crapshoot whether anything gets logged. The log text in my first post above, from Feb 6 10:05:08, is the closest I have come to seeing why it is crashing. I will try booting into safe mode and go from there.

Following up on JorgeB's safe-mode suggestion from 2/10/2021 at 8:45 AM:

It still crashed after a day or so. Do you think I'd be better off removing the Ryzen 3 3200G APU and putting in a newer Ryzen 5 3600?

1 year later...

Replying to Barryrod's original post from 2/6/2021 at 3:11 PM:

I think I'm having the same issue with my B450M Pro4, R5 1400, and an Unraid 6.10.3 trial. Where is the "Power Supply Idle Control" setting in the BIOS? I can't find it.