drawde Posted December 22, 2016

i have a small script that alerts me when my CPU load is higher than normal. it alerted me to my system load at 39. after logging in to take a look, it definitely was showing it spike up to around 39 and staying there. output from top wasn't showing much going on so i don't think i had any rogue apps or anything. dockers appear to be running.. not sure about other parts of the GUI, but after trying to stop the array it's basically stuck.

saw the following error in my syslog:

Dec 22 02:08:03 Tower kernel: general protection fault: 0000 [#1] PREEMPT SMP
Dec 22 02:08:03 Tower kernel: Modules linked in: md_mod xt_nat veth ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_conntrack_ipv4 nf_nat_ipv4 iptable_filter ip_tables nf_nat kvm_amd kvm r8169 ahci mii libahci mvsas libsas scsi_transport_sas sata_sil24 wmi k10temp asus_atk0110 sata_sil i2c_piix4 pata_atiixp i2c_core acpi_cpufreq [last unloaded: md_mod]
Dec 22 02:08:03 Tower kernel: CPU: 0 PID: 556 Comm: kswapd0 Not tainted 4.4.30-unRAID #2
Dec 22 02:08:03 Tower kernel: Hardware name: System manufacturer System Product Name/M4A88T-M, BIOS 2403 12/23/2010
Dec 22 02:08:03 Tower kernel: task: ffff88040da8e0c0 ti: ffff8800ca998000 task.ti: ffff8800ca998000
Dec 22 02:08:03 Tower kernel: RIP: 0010:[<ffffffff8111dfaf>] [<ffffffff8111dfaf>] __destroy_inode+0xcc/0x11b
Dec 22 02:08:03 Tower kernel: RSP: 0018:ffff8800ca99bbf0 EFLAGS: 00010206
Dec 22 02:08:03 Tower kernel: RAX: 0000ffffffffffff RBX: ffff88001405d8f0 RCX: 0000000000000000
Dec 22 02:08:03 Tower kernel: RDX: 0000000000000001 RSI: ffff88001405d970 RDI: 0001000000000000
Dec 22 02:08:03 Tower kernel: RBP: ffff8800ca99bbf8 R08: ffff88041ffcc2a0 R09: 0000000000000003
Dec 22 02:08:03 Tower kernel: R10: ffffea0002e165c0 R11: 0000000000000000 R12: ffff88001405d970
Dec 22 02:08:03 Tower kernel: R13: ffffffff81667280 R14: ffff8800ca99bd20 R15: 000000000000002b
Dec 22 02:08:03 Tower kernel: FS: 00002b3df7176700(0000) GS:ffff88041fc00000(0000) knlGS:0000000000000000
Dec 22 02:08:03 Tower kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 22 02:08:03 Tower kernel: CR2: 00002b5bccef5000 CR3: 000000025de7f000 CR4: 00000000000006f0
Dec 22 02:08:03 Tower kernel: Stack:
Dec 22 02:08:03 Tower kernel: ffff88001405d8f0 ffff8800ca99bc10 ffffffff8111e5aa ffff88001405d8f0
Dec 22 02:08:03 Tower kernel: ffff8800ca99bc38 ffffffff8111e735 ffff8800ca99bc68 ffff8800a46ddcb0
Dec 22 02:08:03 Tower kernel: 0000000000000000 ffff8800ca99bc50 ffffffff8111e76d ffff8800ca99bc68
Dec 22 02:08:03 Tower kernel: Call Trace:
Dec 22 02:08:03 Tower kernel: [<ffffffff8111e5aa>] destroy_inode+0x1f/0x4d
Dec 22 02:08:03 Tower kernel: [<ffffffff8111e735>] evict+0x15d/0x164
Dec 22 02:08:03 Tower kernel: [<ffffffff8111e76d>] dispose_list+0x31/0x3b
Dec 22 02:08:03 Tower kernel: [<ffffffff8111f8bd>] prune_icache_sb+0x45/0x50
Dec 22 02:08:03 Tower kernel: [<ffffffff8110cf3c>] super_cache_scan+0x12a/0x174
Dec 22 02:08:03 Tower kernel: [<ffffffff810c6d52>] shrink_slab.part.6+0x190/0x20b
Dec 22 02:08:03 Tower kernel: [<ffffffff810c91d8>] shrink_zone+0x17c/0x265
Dec 22 02:08:03 Tower kernel: [<ffffffff810c9f5f>] kswapd+0x5bc/0x75d
Dec 22 02:08:03 Tower kernel: [<ffffffff81062f01>] ? finish_task_switch+0xee/0x1b5
Dec 22 02:08:03 Tower kernel: [<ffffffff810c99a3>] ? mem_cgroup_shrink_node_zone+0xae/0xae
Dec 22 02:08:03 Tower kernel: [<ffffffff8105fb24>] kthread+0xcd/0xd5
Dec 22 02:08:03 Tower kernel: [<ffffffff8105fa57>] ? kthread_worker_fn+0x137/0x137
Dec 22 02:08:03 Tower kernel: [<ffffffff81629f7f>] ret_from_fork+0x3f/0x70
Dec 22 02:08:03 Tower kernel: [<ffffffff8105fa57>] ? kthread_worker_fn+0x137/0x137
Dec 22 02:08:03 Tower kernel: Code: 48 c7 c7 9f 09 79 81 e8 c3 c5 f2 ff 48 8b 43 28 f0 48 ff 88 f0 04 00 00 48 8b 7b 10 48 8d 47 ff 48 83 f8 fd 77 0a 48 85 ff 74 05 <f0> ff 0f 74 38 48 8b 7b 18 48 8d 47 ff 48 83 f8 fd 77 0a 48 85
Dec 22 02:08:03 Tower kernel: RIP [<ffffffff8111dfaf>] __destroy_inode+0xcc/0x11b
Dec 22 02:08:03 Tower kernel: RSP <ffff8800ca99bbf0>
Dec 22 02:08:03 Tower kernel: ---[ end trace 224c26f716313710 ]---

top - 12:00:58 up 5 days, 11:12, 4 users, load average: 41.07, 40.68, 37.77
Tasks: 970 total, 1 running, 442 sleeping, 5 stopped, 522 zombie
%Cpu(s): 3.1 us, 1.2 sy, 0.0 ni, 94.9 id, 0.8 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16178504 total, 414596 free, 1951004 used, 13812904 buff/cache
KiB Swap: 0 total, 0 free, 0 used. 13126372 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
17454 nobody 20 0 1484124 621224 23768 S 6.9 3.8 38:04.69 mono
12839 root 20 0 25676 3880 2344 R 1.6 0.0 0:00.10 top
3723 sshd 20 0 323820 22988 13892 S 0.7 0.1 0:09.24 apache2
14883 root 20 0 1759292 40980 21516 S 0.7 0.3 8:20.57 docker
17097 nobody 20 0 163116 50112 3796 S 0.7 0.3 10:19.11 python
17155 nobody 20 0 1185268 54312 8752 S 0.7 0.3 11:49.91 kodi.bin
18700 sshd 20 0 323808 23028 13820 S 0.7 0.1 0:09.33 apache2
1586 root 20 0 9680 2512 2060 S 0.3 0.0 35:03.41 cpuload
8097 root 20 0 0 0 0 S 0.3 0.0 1:34.62 kworker/2:0
18530 nobody 20 0 848256 188260 3472 S 0.3 1.2 4:41.06 mysqld
18666 nobody 20 0 184664 79180 2828 S 0.3 0.5 11:18.32 python
18697 sshd 20 0 324644 22820 13276 S 0.3 0.1 0:09.22 apache2
20012 nobody 35 15 1774328 68288 11236 S 0.3 0.4 3:01.34 Plex Script Hos
20080 nobody 20 0 250804 52960 17440 S 0.3 0.3 3:32.61 Plex DLNA Serve
31704 root 20 0 0 0 0 S 0.3 0.0 0:02.01 kworker/u12:1
1 root 20 0 4372 1556 1456 S 0.0 0.0 0:14.09 init
2 root 20 0 0 0 0 S 0.0 0.0 0:00.24 kthreadd
3 root 20 0 0 0 0 S 0.0 0.0 0:23.78 ksoftirqd/0
7 root 20 0 0 0 0 S 0.0 0.0 22:51.31 rcu_preempt
8 root 20 0 0 0 0 S 0.0 0.0 0:00.01 rcu_sched
9 root 20 0 0 0 0 S 0.0 0.0 0:00.04 rcu_bh
10 root rt 0 0 0 0 S 0.0 0.0 0:18.79 migration/0
11 root rt 0 0 0 0 S 0.0 0.0 0:18.55 migration/1
12 root 20 0 0 0 0 S 0.0 0.0 0:24.82 ksoftirqd/1
15 root rt 0 0 0 0 S 0.0 0.0 0:18.61 migration/2
16 root 20 0 0 0 0 S 0.0 0.0 0:57.49 ksoftirqd/2
19 root rt 0 0 0 0 S 0.0 0.0 0:17.74 migration/3
20 root 20 0 0 0 0 S 0.0 0.0 16:59.53 ksoftirqd/3
22 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 kworker/3:0H
23 root 20 0 0 0 0 S 0.0 0.0 0:00.01 kdevtmpfs
24 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 netns
27 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 perf
273 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 writeback
275 root 25 5 0 0 0 S 0.0 0.0 0:00.00 ksmd
276 root 39 19 0 0 0 S 0.0 0.0 1:06.93 khugepaged
277 root 0 -20 0 0 0 S 0.0 0.0 0:00.00 crypto

i'm also monitoring temps via snmp and for a cpu load that high, my temps are perfect, actually maybe even on the lower side than normally idle. tried to gracefully shut down and it's not doing anything. i'm remote but i'm waiting for someone to get there to forcefully reboot.

Attachment: MobaXterm_tower_20161222_114225.zip
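For readers who want something like the load alert drawde describes, here is a minimal sketch in Python. It is not drawde's script, which was not posted. The function names, the `cpu_count=4` value (a guess based on migration/0 through migration/3 in the top output) and the `factor` threshold are all assumptions. The sample line follows the /proc/loadavg layout but is built by hand from the numbers top reported. Note that on Linux the load average also counts tasks stuck in uninterruptible sleep, which may be why the load sits around 40 while the CPUs are 94.9% idle.

```python
# Hypothetical load-average check; names, cpu_count and factor are assumptions.
from typing import Tuple


def parse_loadavg(text: str) -> Tuple[float, float, float]:
    """Parse the 1-, 5- and 15-minute averages from a /proc/loadavg-style line."""
    one, five, fifteen = text.split()[:3]
    return float(one), float(five), float(fifteen)


def load_is_high(text: str, cpu_count: int, factor: float = 2.0) -> bool:
    """Return True when the 1-minute load exceeds factor * cpu_count."""
    one_minute, _, _ = parse_loadavg(text)
    return one_minute > factor * cpu_count


# Illustrative line assembled from the top output in the post above,
# not a real capture of /proc/loadavg from that server.
sample = "41.07 40.68 37.77 1/970 12839"
print(load_is_high(sample, cpu_count=4))
```

On a live system the same functions could be fed the contents of /proc/loadavg, with the alert action (mail, push notification and so on) left to whatever the server already uses.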
RobJ Posted December 22, 2016

That's a GPF (general protection fault), something I haven't seen in a while. Unfortunately, there isn't a good clue evident as to the cause. Check for a motherboard BIOS update, yours is from 2010.

I have seen another user with issues, also involving mono. Using PhAzE's plugins? I believe he has compiled a newer version of mono recently, but I don't think he believes there's anything wrong with mono.

You also have "PCIe ACS overrides enabled", which may allow you to do things you couldn't otherwise, but also carries some risk. I don't know enough to say whether it could allow a GPF. And it could be just a random event, a memory fault, an overheating event, or an unknown and unknowable cosmic event.
drawde (Author) Posted December 22, 2016

RobJ wrote: "That's a GPF, something I haven't seen in awhile. Unfortunately, there isn't a good clue evident as to the cause. Check for a motherboard BIOS update, yours is from 2010. I have seen another user with issues, also involving mono. Using PhAzE's plugins? I believe he has compiled a newer version of mono recently, but I don't think he believes there's anything wrong with mono."

is there something in the syslog error that leads you to believe it's mono? or is it because of the top output? i think sonarr uses mono but i'm not sure if that's what caused the issue, i think it just happened to be doing something when i copied that top.

for now i'll keep an eye on it. i don't think it was an overheat event as snmp was still reporting and my temps were OK. if it happens again i'll run memtest. i did recently add 2 new drives (within the last week). i precleared both drives 3x without much issue, so hopefully that's not it.
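Since the plan is to keep an eye on it, one way to spot a repeat is to scan the syslog for the same fault signature. The sketch below is only an illustration: the helper name and the marker list are assumptions, and the two sample lines are copied from the syslog excerpt in the first post.

```python
# Hypothetical helper; the function name and marker list are assumptions.
from typing import Iterable, List

FAULT_MARKERS = ("general protection fault",)


def find_kernel_faults(lines: Iterable[str]) -> List[str]:
    """Return kernel log lines that contain one of the fault markers."""
    hits = []
    for line in lines:
        if "kernel:" in line and any(marker in line for marker in FAULT_MARKERS):
            hits.append(line.rstrip("\n"))
    return hits


# Two lines taken from the syslog excerpt posted earlier in the thread.
sample_log = [
    "Dec 22 02:08:03 Tower kernel: general protection fault: 0000 [#1] PREEMPT SMP",
    "Dec 22 02:08:03 Tower kernel: CPU: 0 PID: 556 Comm: kswapd0 Not tainted 4.4.30-unRAID #2",
]
for hit in find_kernel_faults(sample_log):
    print(hit)
```

Because the function accepts any iterable of lines, an open file handle on the server's syslog would work as input too. The exact log path is not given in the thread, so it is left out here.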