tallguydirk
Posted July 23, 2019

unRAID 6.7.2
Plugins shown in diagnostics
Hardware listed in my signature

My system locked up today while I wasn't actively interacting with it. I see OOM errors in the syslog but not sure how to diagnose what was causing them. I've been having issues lately with the system locking up randomly. Yesterday I noticed call traces in the syslog but was unable to figure out what they were caused by - see example below:

Jul 22 13:02:44 TowerMediaServ kernel: ------------[ cut here ]------------
Jul 22 13:02:44 TowerMediaServ kernel: NETDEV WATCHDOG: eth0 (r8169): transmit queue 0 timed out
Jul 22 13:02:44 TowerMediaServ kernel: WARNING: CPU: 5 PID: 0 at net/sched/sch_generic.c:461 dev_watchdog+0x15f/0x1b7
Jul 22 13:02:44 TowerMediaServ kernel: Modules linked in: xt_nat veth xt_CHECKSUM ipt_MASQUERADE ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle iptable_nat nf_nat_ipv4 nf_nat ip6table_filter ip6_tables iptable_filter ip_tables vhost_net tun vhost tap arc4 ecb md4 sha512_ssse3 sha512_generic cmac cifs ccm xfs md_mod nct6775 hwmon_vid bonding x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper hid_logitech_hidpp intel_cstate intel_uncore intel_rapl_perf ahci libahci pcc_cpufreq ie31200_edac i2c_i801 hid_logitech_dj video button r8169 ftdi_sio i2c_core usbserial cdc_acm realtek 3w_9xxx backlight
Jul 22 13:02:44 TowerMediaServ kernel: CPU: 5 PID: 0 Comm: swapper/5 Not tainted 4.19.56-Unraid #1
Jul 22 13:02:44 TowerMediaServ kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z77 Pro3, BIOS P2.10 07/12/2013
Jul 22 13:02:44 TowerMediaServ kernel: RIP: 0010:dev_watchdog+0x15f/0x1b7
Jul 22 13:02:44 TowerMediaServ kernel: Code: 0b 06 97 00 00 75 36 4c 89 ef c6 05 ff 05 97 00 01 e8 8f b3 fd ff 89 e9 4c 89 ee 48 c7 c7 3e df d8 81 48 89 c2 e8 48 cd b1 ff <0f> 0b eb 0f ff c5 48 81 c2 40 01 00 00 39 cd 75 98 eb 13 48 8b 83
Jul 22 13:02:44 TowerMediaServ kernel: RSP: 0018:ffff88841f543ea0 EFLAGS: 00010286
Jul 22 13:02:44 TowerMediaServ kernel: RAX: 0000000000000000 RBX: ffff88841c9b4438 RCX: 0000000000000007
Jul 22 13:02:44 TowerMediaServ kernel: RDX: 000000000000096f RSI: 0000000000000002 RDI: ffff88841f5564f0
Jul 22 13:02:44 TowerMediaServ kernel: RBP: 0000000000000000 R08: 0000000000000003 R09: 0000000000020300
Jul 22 13:02:44 TowerMediaServ kernel: R10: 000000000000096e R11: 0000000000013510 R12: ffff88841c9b441c
Jul 22 13:02:44 TowerMediaServ kernel: R13: ffff88841c9b4000 R14: ffff888418adc080 R15: 0000000000000005
Jul 22 13:02:44 TowerMediaServ kernel: FS: 0000000000000000(0000) GS:ffff88841f540000(0000) knlGS:0000000000000000
Jul 22 13:02:44 TowerMediaServ kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Jul 22 13:02:44 TowerMediaServ kernel: CR2: 000000000572fcb0 CR3: 0000000001e0a002 CR4: 00000000001626e0
Jul 22 13:02:44 TowerMediaServ kernel: Call Trace:
Jul 22 13:02:44 TowerMediaServ kernel: <IRQ>
Jul 22 13:02:44 TowerMediaServ kernel: call_timer_fn+0x18/0x7b
Jul 22 13:02:44 TowerMediaServ kernel: ? qdisc_reset+0xc0/0xc0
Jul 22 13:02:44 TowerMediaServ kernel: expire_timers+0x7f/0x8e
Jul 22 13:02:44 TowerMediaServ kernel: run_timer_softirq+0x72/0x120
Jul 22 13:02:44 TowerMediaServ kernel: ? hrtimer_init+0x2/0x2
Jul 22 13:02:44 TowerMediaServ kernel: ? hrtimer_wakeup+0x19/0x1c
Jul 22 13:02:44 TowerMediaServ kernel: ? __hrtimer_run_queues+0xbd/0x105
Jul 22 13:02:44 TowerMediaServ kernel: ? recalibrate_cpu_khz+0x1/0x1
Jul 22 13:02:44 TowerMediaServ kernel: ? ktime_get+0x3a/0x8d
Jul 22 13:02:44 TowerMediaServ kernel: __do_softirq+0xce/0x1e2
Jul 22 13:02:44 TowerMediaServ kernel: irq_exit+0x5e/0x9d
Jul 22 13:02:44 TowerMediaServ kernel: smp_apic_timer_interrupt+0x7e/0x91
Jul 22 13:02:44 TowerMediaServ kernel: apic_timer_interrupt+0xf/0x20
Jul 22 13:02:44 TowerMediaServ kernel: </IRQ>
Jul 22 13:02:44 TowerMediaServ kernel: RIP: 0010:cpuidle_enter_state+0xe8/0x141
Jul 22 13:02:44 TowerMediaServ kernel: Code: ff 45 84 ff 74 1d 9c 58 0f 1f 44 00 00 0f ba e0 09 73 09 0f 0b fa 66 0f 1f 44 00 00 31 ff e8 ae 0c be ff fb 66 0f 1f 44 00 00 <48> 2b 1c 24 b8 ff ff ff 7f 48 b9 ff ff ff ff f3 01 00 00 48 39 cb
Jul 22 13:02:44 TowerMediaServ kernel: RSP: 0018:ffffc90001937ea0 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff13
Jul 22 13:02:44 TowerMediaServ kernel: RAX: ffff88841f560b00 RBX: 0000025600146501 RCX: 000000000000001f
Jul 22 13:02:44 TowerMediaServ kernel: RDX: 0000025600146501 RSI: 0000000025a594f5 RDI: 0000000000000000
Jul 22 13:02:44 TowerMediaServ kernel: RBP: ffff88841f56b500 R08: 0000000000000002 R09: 00000000000203c0
Jul 22 13:02:44 TowerMediaServ kernel: R10: 0000000000287868 R11: 00000817aa7850e9 R12: 0000000000000004
Jul 22 13:02:44 TowerMediaServ kernel: R13: 0000000000000004 R14: ffffffff81e5a018 R15: 0000000000000000
Jul 22 13:02:44 TowerMediaServ kernel: do_idle+0x192/0x20e
Jul 22 13:02:44 TowerMediaServ kernel: cpu_startup_entry+0x6a/0x6c
Jul 22 13:02:44 TowerMediaServ kernel: start_secondary+0x197/0x1b2
Jul 22 13:02:44 TowerMediaServ kernel: secondary_startup_64+0xa4/0xb0
Jul 22 13:02:44 TowerMediaServ kernel: ---[ end trace 26a17b115aa8021d ]---

Last time I touched it was last night and I was using PlexMediaServer docker, a windows 7 VM, and a Windows 8 VM w/ GPU passthrough running Blue Iris. I had also recently completed several file transfers using krusader and invoked the mover but this may not be relevant. I was having high CPU utilization and couldn't figure out what was causing it.
I think I may have shut down my Windows 8 VM to make sure Plex could run smoothly, as I had a user watching something. That was the last time I interacted with it until I came home today and noticed my home automation system (which runs on the Win7 VM) wasn't working. Then I noticed nothing on my unRAID system was working - no VMs, dockers, webUI, or console. After realizing it was fully locked up, I pressed the power switch once hoping to initiate a graceful shutdown, which luckily appeared to work. After rebooting I browsed to my flash drive over the network and uploaded diagnostics here. I appreciate any assistance!

towermediaserv-diagnostics-20190723-1724.zip
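For the OOM errors mentioned in this post, one possible first step is to pull every OOM-related line out of the syslog so the timestamps and process names can be read in one place. The following is only a minimal sketch: the marker strings are assumptions based on the wording 4.x kernels typically use in OOM reports, and the function name `find_oom_lines` and the file path passed on the command line are hypothetical.

```python
import re
import sys

# Assumed marker phrases from typical 4.x kernel OOM reports;
# adjust them if your syslog words these messages differently.
OOM_MARKERS = re.compile(
    r"invoked oom-killer|Out of memory|Killed process", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return (line_number, text) pairs for lines that look OOM-related."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if OOM_MARKERS.search(line)
    ]


# Only runs when a syslog path is given, e.g. a file extracted from the
# diagnostics zip (the path is whatever you pass in; none is assumed here).
if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], errors="replace") as handle:
        for number, text in find_oom_lines(handle):
            print(f"{number}: {text}")
```

The lines it prints usually name the process that triggered the OOM killer and the one that was killed, which narrows down whether a docker container, a VM, or a plugin was consuming the memory.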
Squid
Posted July 26, 2019

You're being bombed with this:

Jul 23 03:40:44 TowerMediaServ kernel: 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0xB7.
Jul 23 03:40:44 TowerMediaServ kernel: 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x37.
Jul 23 03:42:45 TowerMediaServ kernel: 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.

Off the top of my head, you should set the SMART controller type to 3ware for each disk attached to your 3ware controller (click on each attached disk in the Main tab).

It *may* also help to uninstall the preclear and statistics sender plugins when you're not actively using them.
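To check how heavily a message like the one above is flooding the log, the syslog lines can be grouped by message text and counted. This is a rough sketch, not a tool from the thread: the names `normalize` and `top_messages` are made up for illustration, and it assumes the "Mon DD HH:MM:SS host" prefix seen in the excerpts. Hex values are masked so that lines differing only in the opcode fall into the same group.

```python
import re
from collections import Counter

# Matches the "Jul 23 03:40:44 TowerMediaServ " style prefix seen above.
PREFIX = re.compile(r"^[A-Z][a-z]{2}\s+\d{1,2} \d{2}:\d{2}:\d{2} \S+ ")
HEX = re.compile(r"0x[0-9A-Fa-f]+")

# The three lines quoted in the post above.
SAMPLE = [
    "Jul 23 03:40:44 TowerMediaServ kernel: 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0xB7.",
    "Jul 23 03:40:44 TowerMediaServ kernel: 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x37.",
    "Jul 23 03:42:45 TowerMediaServ kernel: 3w-9xxx: scsi1: ERROR: (0x03:0x0101): Invalid command opcode:opcode=0x85.",
]


def normalize(line):
    """Drop the timestamp/host prefix and mask hex values."""
    body = PREFIX.sub("", line.strip())
    return HEX.sub("0x?", body)


def top_messages(lines, n=5):
    """Return the n most frequent normalized messages with their counts."""
    counts = Counter(normalize(line) for line in lines if line.strip())
    return counts.most_common(n)


if __name__ == "__main__":
    for message, count in top_messages(SAMPLE):
        print(count, message)
```

Run against a full syslog instead of `SAMPLE`, the top entries show which messages dominate the log and roughly how often they repeat.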
tallguydirk (Author)
Posted July 26, 2019

Thanks for the suggestion. They are all already set to 3ware. I've had those errors for literally 2+ years and as far as I can tell they are benign. Something about the controller not passing a certain command when it's trying to pull temperature data or something. I can't rule them out as being related to my problem, but given how long they've been present while the machine was otherwise operating without issue, I think something else is going on here.

I'll try uninstalling the preclear and statistics sender plugins and report back if that improves anything.
tallguydirk (Author)
Posted July 30, 2019

The call traces seem to be related to the Win8 VM I have running Blue Iris. I've left it shut down for the past 4 days and the server has been running everything else without any issues. Not really sure how to go about figuring out what's wrong with the Win8 VM; nothing has really changed configuration-wise, except maybe some Windows updates...