wesman Posted February 25, 2020
I continue to have random system crashes. I was having issues reading from disks (replacing the SATA cables fixed this), but I am still getting random crashes. Any help would be appreciated.
Attached: valyria-diagnostics-20200224-1914.zip
JorgeB Posted February 25, 2020
You can try this to see if it catches anything. Booting in safe mode with all dockers/VMs off is also worth trying.
wesman Posted March 7, 2020
Thanks @johnnie.black. I set up the syslog capture and I am still seeing instability in the system, but I have no idea what I am looking at. It started to act a little crazy and wouldn't load the Docker page, so I tried to reboot it, but it is hung. Attached is the syslog.
Attached: syslog-192.168.29.30.log
wesman Posted March 7, 2020
Something keeps hanging in my system, resulting in me having to do a hard reset. I have tried everything I can find to shut down the array from the CLI when it hangs, but nothing works. Every two days I have to hard-reset my system and run a parity check. Any help is appreciated!
Attached: interrupts.1 syslog valyria-diagnostics-20200307-1450.zip
Squid Posted March 7, 2020
Are you overclocking? Have you run a memtest yet?
trurl Posted March 7, 2020
Why did you start a new thread? I have merged your threads.
wesman Posted March 7, 2020
8 minutes ago, Squid said: Are you overclocking? Have you run a memtest yet?
@Squid I am not overclocking, but I don't know what memtest does or how to run it.
wesman Posted March 7, 2020
6 minutes ago, trurl said: Why did you start a new thread? I have merged your threads.
Thanks
Squid Posted March 7, 2020
It tests your memory. You run it via the boot menu.
JorgeB Posted March 9, 2020
You also have an overheating CPU, check cooling:
Mar 5 08:11:15 Valyria kernel: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU12: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU14: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU13: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU15: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU12: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar 5 08:11:15 Valyria kernel: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
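For anyone who wants to check their own syslog for the same warnings, here is a rough Python sketch that tallies the throttle messages per CPU. The function name `summarize_throttling` and the `SAMPLE` list are made up for illustration; the sample lines are copied from this thread, and the regex only assumes the message wording shown above.

```python
import re

# Matches the kernel thermal messages quoted in this thread, for example:
# "... kernel: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)"
THROTTLE_RE = re.compile(
    r"kernel: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<events>\d+)\)"
)


def summarize_throttling(lines):
    """Return {(cpu, scope): highest 'total events' value seen for that CPU}."""
    summary = {}
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match is None:
            continue
        key = (int(match.group("cpu")), match.group("scope"))
        events = int(match.group("events"))
        summary[key] = max(summary.get(key, 0), events)
    return summary


# Sample lines copied from the posts in this thread (hypothetical input list).
SAMPLE = [
    "Mar 5 08:11:15 Valyria kernel: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)",
    "Mar 5 08:11:15 Valyria kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)",
    "Mar 12 08:25:50 Valyria kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 51)",
]

if __name__ == "__main__":
    for (cpu, scope), events in sorted(summarize_throttling(SAMPLE).items()):
        print(f"CPU{cpu} {scope}: up to {events} throttle events")
```

To run it against a real log, pass an open file instead of `SAMPLE`, for example `summarize_throttling(open("syslog"))`. A rising "total events" count between two captures would suggest the cooling problem is still there.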
wesman Posted March 10, 2020
On 3/7/2020 at 4:54 PM, Squid said: It tests your memory. You run it via the boot menu
@Squid I am hoping I am not doing something wrong, but when I select Memtest, it just reboots the server and comes back to the menu, or boots to Unraid if I let it. Is it supposed to do something? Like this
itimpi Posted March 10, 2020
The Memtest supplied with Unraid will only work if booting in Legacy mode. If you want a version that works when booting in UEFI mode, you need to download it yourself from memtest86.com
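If you are not sure which mode the server booted in, one common check on Linux is whether the kernel exposes the `/sys/firmware/efi` directory, which is only present after a UEFI boot. A minimal Python sketch follows, assuming Python 3 is available on the machine (it may not be on a stock Unraid install); the function name `boot_mode` is just for illustration.

```python
from pathlib import Path


def boot_mode(efi_dir="/sys/firmware/efi"):
    """Return 'UEFI' if the EFI directory exists, otherwise 'Legacy'.

    The efi_dir parameter exists only so the check can be tried against
    other paths; on a real system leave it at the default.
    """
    return "UEFI" if Path(efi_dir).is_dir() else "Legacy"


if __name__ == "__main__":
    print(f"Booted in {boot_mode()} mode")
```

If this reports UEFI, the bundled Memtest entry would be expected to bounce back to the boot menu as described above.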
wesman Posted March 10, 2020
9 hours ago, itimpi said: Memtest supplied with Unraid will only work if booting in Legacy mode. If you want a version that works when booting in UEFI mode you need to download it yourself from memtest86.com
Ah, thanks for the tip, I'll see what I can do. If I download it and install it, will it replace the memtest that is in the menu so that the menu launches the correct program?
trurl Posted March 10, 2020
54 minutes ago, wesman said: If I download it and install it, will it replace the memtest that is in the menu to launch the correct program?
You don't install it. You put it on another flash drive and boot it up.
wesman Posted March 12, 2020
@trurl Thanks, I figured it out, eventually. Not the best at this sort of thing.
@itimpi @Squid Got it installed on a USB drive and ran the complete suite of tests for 48 hours. Zero memory issues.
@johnnie.black After the memtest I ran with the case cover off to see if airflow was the issue. With the case open there are no more CPU warnings, but it is still crashing.
SYSLOG
- Mar 11 07:07:XX - about this time it was updating plugins
- Looks like I rebooted shortly after that, maybe... then put in a wrong password
- Looks like it started loading disks at Mar 11 07:10:06
- Something happened with the NVIDIA card at 11 10:38:35 Valyria kernel: Modules linked in: nvidia_uvm(O)
- Something is happening here, a call stack? I have no idea
Mar 11 10:38:35 Valyria kernel: Workqueue: events macvlan_process_broadcast [macvlan]
- This is odd, as I know I was sleeping by this time (is this just sending email? I didn't get an email)
Mar 11 11:24:14 Valyria login[31887]: ROOT LOGIN on '/dev/pts/0'
Mar 11 11:51:47 Valyria sSMTP[6500]: Creating SSL connection to host
Mar 11 11:51:47 Valyria sSMTP[6500]: SSL connection using TLS_AES_256_GCM_SHA384
Mar 11 11:51:53 Valyria sSMTP[6500]: Sent mail for [email protected] (221 2.0.0 closing connection o12sm6023436pjs.6 - gsmtp) uid=0 username=root
- Then it appears that the GPU crashes; it loops on this message for a while
Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0x56:515)
Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0
** At Mar 11 20:50:00 everything appears to enter a disabled state.
Valyria kernel: vethd02192f: renamed from eth0
- Network appears to go down, dockers become disabled
- ?? It installs:
Mar 11 20:53:58 Valyria root: # Nerd Tools unRAID Plugin
- ?? Super weird that it is trying to install all the plugins at 20:54:16
-- ?? Appears to be trying to stop and start the array
Mar 11 20:59:19 Valyria kernel: mdcmd (63): start STOPPED
-- A lot of this happens as my Mac tries to remount its volumes
Mar 11 21:48:16 Valyria webGUI: Successful login user root from 192.168.29.105
Mar 11 21:49:18 Valyria kernel: docker0: port 1(veth524682b) entered blocking state
Mar 11 21:49:18 Valyria kernel: docker0: port 1(veth524682b) entered disabled state
Mar 11 21:49:18 Valyria kernel: device veth524682b entered promiscuous mode
Mar 11 21:49:18 Valyria kernel: IPv6: ADDRCONF(NETDEV_UP): veth524682b: link is not ready
-- Later in the log it looks like another NVIDIA crash
- ** Followed by a bunch of stack traces, which appear to loop until I reset it.
Log attached below in another comment.
[10de:2184] 01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)
Using the @linuxserver.io Unraid 6.8.2
Edited March 12, 2020 by wesman
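Reading a long syslog by eye is hard, so a first pass that just counts lines per problem category can help show where to look. Below is a rough Python sketch of that idea. The category names and the `triage` function are invented for illustration, and the patterns are simply strings that appear in the logs quoted in this thread, not an exhaustive list of what can go wrong.

```python
import re
from collections import Counter

# Hypothetical categories; each pattern is a string seen in this thread's logs.
PATTERNS = {
    "gpu": re.compile(r"NVRM:"),
    "btrfs": re.compile(r"BTRFS", re.IGNORECASE),
    "thermal": re.compile(r"temperature above threshold"),
    "kernel_trace": re.compile(r"Call Trace:|cut here|WARNING: CPU"),
}


def triage(lines):
    """Count matching lines per category and remember the first match of each."""
    counts = Counter()
    first_seen = {}
    for line in lines:
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
                first_seen.setdefault(name, line.strip())
    return counts, first_seen


# Sample lines copied from the post above (hypothetical input list).
SAMPLE = [
    "Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0x56:515)",
    "Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0",
    "Mar 11 20:59:19 Valyria kernel: mdcmd (63): start STOPPED",
    "Mar 11 21:48:16 Valyria webGUI: Successful login user root from 192.168.29.105",
]

if __name__ == "__main__":
    counts, first_seen = triage(SAMPLE)
    for name in PATTERNS:
        print(f"{name:13} {counts[name]:4}  first: {first_seen.get(name, '-')}")
```

The first-seen line per category gives a timestamp to jump to in the full log. The order in which categories first appear can hint at which problem started first, though it does not prove one caused the other.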
trurl Posted March 12, 2020
3 minutes ago, wesman said: Log attached - couldnt upload it for some reason but here is a link
You should post the complete diagnostics instead of a link to an external site. The diagnostics give more information, and external sites can be more trouble even if they are safe.
wesman Posted March 12, 2020
Yep! Uploaded the log too. It worked this time; I zipped it, perhaps it liked that better.
Attached: valyria-diagnostics-20200312-0907.zip syslog-192.168.29.30.log.zip
wesman Posted March 13, 2020
Anyone have any ideas?
JorgeB Posted March 13, 2020
There's a btrfs filesystem crashing, most likely the docker image; you should recreate it. CPU is still overheating:
Mar 12 08:25:50 Valyria kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU14: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 51)
wesman Posted March 14, 2020
Thanks @johnnie.black. I see this loop again and again. It does start with WARNING: CPU: 7 PID: 916 at fs/btrfs/extent_io.c:435 insert_state+0x30/0xe1, but I thought it was referencing nvidia_uvm, so I was thinking it might be related to Plex and the video card.
Mar 14 11:17:15 Valyria kernel: ------------[ cut here ]------------
Mar 14 11:17:15 Valyria kernel: BTRFS: end < start 4095 18446612749489541176
Mar 14 11:17:15 Valyria kernel: WARNING: CPU: 7 PID: 916 at fs/btrfs/extent_io.c:435 insert_state+0x30/0xe1
Mar 14 11:17:15 Valyria kernel: Modules linked in: nvidia_uvm(O) veth macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs dm_crypt dm_mod dax nfsd lockd grace sunrpc md_mod nct6775 hwmon_vid atlantic e1000e igb(O) nvidia_drm(PO) nvidia_modeset(PO) x86_pkg_temp_thermal nvidia(PO) intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate wmi_bmof intel_wmi_thunderbolt mxm_wmi drm_kms_helper mpt3sas drm btusb btrtl btbcm btintel raid_class rsnvme(PO) bluetooth scsi_transport_sas nvme nvme_core wmi intel_uncore agpgart i2c_i801 ahci intel_rapl_perf
Mar 14 11:17:15 Valyria kernel: syscopyarea sr_mod video sysfillrect i2c_core pcc_cpufreq ecdh_generic sysimgblt libahci backlight fb_sys_fops cdrom button acpi_pad [last unloaded: atlantic]
Mar 14 11:17:15 Valyria kernel: CPU: 7 PID: 916 Comm: kswapd0 Tainted: P W O 4.19.98-Unraid #1
Mar 14 11:17:15 Valyria kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Taichi Ultimate, BIOS P4.20 07/23/2019
Mar 14 11:17:15 Valyria kernel: RIP: 0010:insert_state+0x30/0xe1
Mar 14 11:17:15 Valyria kernel: Code: 89 c7 41 56 4d 89 ce 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 89 cb 73 11 48 89 ce 48 c7 c7 54 88 d5 81 e8 26 aa e1 ff <0f> 0b 48 8b 4c 24 40 4c 89 ef 4c 89 65 00 48 89 ee 48 8b 54 24 38
Mar 14 11:17:15 Valyria kernel: RSP: 0018:ffffc90006703a90 EFLAGS: 00010286
Mar 14 11:17:15 Valyria kernel: RAX: 0000000000000000 RBX: 0000000000000fff RCX: 0000000000000007
Mar 14 11:17:15 Valyria kernel: RDX: 000000072af20962 RSI: 0000000000000002 RDI: ffff88903ddd64f0
Mar 14 11:17:15 Valyria kernel: RBP: ffff888037fd24b0 R08: 0000000000000003 R09: 0000000000003000
Mar 14 11:17:15 Valyria kernel: R10: 0000000000000000 R11: 000000000000004c R12: ffff888fb2841038
Mar 14 11:17:15 Valyria kernel: R13: ffff88815308f738 R14: ffffc90006703b18 R15: ffffc90006703b10
Mar 14 11:17:15 Valyria kernel: FS: 0000000000000000(0000) GS:ffff88903ddc0000(0000) knlGS:0000000000000000
Mar 14 11:17:15 Valyria kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 14 11:17:15 Valyria kernel: CR2: 000015162a65d000 CR3: 0000000001e0a001 CR4: 00000000003606e0
Mar 14 11:17:15 Valyria kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 14 11:17:15 Valyria kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 14 11:17:15 Valyria kernel: Call Trace:
Mar 14 11:17:15 Valyria kernel: __set_extent_bit+0x132/0x430
Mar 14 11:17:15 Valyria kernel: lock_extent_bits+0x54/0x1d8
Mar 14 11:17:15 Valyria kernel: btrfs_evict_inode+0x141/0x420
Mar 14 11:17:15 Valyria kernel: evict+0xb8/0x16e
Mar 14 11:17:15 Valyria kernel: dispose_list+0x30/0x39
Mar 14 11:17:15 Valyria kernel: prune_icache_sb+0x56/0x74
Mar 14 11:17:15 Valyria kernel: super_cache_scan+0x11a/0x16d
Mar 14 11:17:15 Valyria kernel: do_shrink_slab+0x128/0x194
Mar 14 11:17:15 Valyria kernel: shrink_slab+0x20c/0x276
Mar 14 11:17:15 Valyria kernel: shrink_node+0x108/0x3cb
Mar 14 11:17:15 Valyria kernel: kswapd+0x451/0x58a
Mar 14 11:17:15 Valyria kernel: ? __switch_to_asm+0x41/0x70
Mar 14 11:17:15 Valyria kernel: ? __switch_to_asm+0x41/0x70
Mar 14 11:17:15 Valyria kernel: ? mem_cgroup_shrink_node+0xa4/0xa4
Mar 14 11:17:15 Valyria kernel: kthread+0x10c/0x114
Mar 14 11:17:15 Valyria kernel: ? kthread_park+0x89/0x89
Mar 14 11:17:15 Valyria kernel: ret_from_fork+0x1f/0x40
Mar 14 11:17:15 Valyria kernel: ---[ end trace ec01f8e5369e34db ]---
Mar 14 11:17:15 Valyria kernel: ------------[ cut here ]------------
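One detail in the trace that is easy to miss: the two numbers in the "BTRFS: end < start" line are printed in decimal, while the registers are printed in hex. Converting them makes the trace easier to read. A small Python sketch follows; the helper name `looks_like_kernel_pointer` is invented, and treating the first number as `end` and the second as `start` is an assumption based on the order of the words in the message.

```python
def looks_like_kernel_pointer(value):
    """True if the top 16 bits of a 64-bit value are all set.

    On x86-64 that is the shape of an upper-half (kernel) virtual address.
    This is only a heuristic, not proof that the value is a pointer.
    """
    return (value >> 48) == 0xFFFF


# Values copied from "BTRFS: end < start 4095 18446612749489541176".
end = 4095
start = 18446612749489541176

if __name__ == "__main__":
    print(hex(end))    # same value as RBX in the register dump
    print(hex(start))  # same value as R12 in the register dump
    print(looks_like_kernel_pointer(start))
```

In hex the second number is 0xffff888fb2841038, which is exactly the R12 value in the dump and has the same ffff888... shape as the other pointer-like registers. That may mean btrfs was handed something that is not a real file offset. It does not say what caused it, but the warning is raised in fs/btrfs/extent_io.c, and nvidia_uvm only appears in the list of loaded modules.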