Continued Crashes and Parity Rebuilds


wesman

  • 2 weeks later...

You also have an overheating CPU; check the cooling:

 

Mar  5 08:11:15 Valyria kernel: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU12: Core temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU1: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU5: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU14: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU13: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU7: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU15: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU9: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU10: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU2: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU12: Package temperature above threshold, cpu clock throttled (total events = 1)
Mar  5 08:11:15 Valyria kernel: CPU4: Package temperature above threshold, cpu clock throttled (total events = 1)
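One way to see which cores throttle, and how often, is to tally these kernel messages. The following is only a minimal Python sketch, not an Unraid tool: the function name `summarize_throttling` is made up for illustration, the two sample lines are copied from the log above, and the regular expression assumes the message format shown there.

```python
import re
from collections import defaultdict

# Assumed message format, taken from the syslog lines quoted in this thread.
THROTTLE_RE = re.compile(
    r"kernel: CPU(?P<cpu>\d+): (?P<scope>Core|Package) temperature above threshold, "
    r"cpu clock throttled \(total events = (?P<count>\d+)\)"
)

def summarize_throttling(lines):
    """Return {(cpu, scope): highest 'total events' value seen} (hypothetical helper)."""
    summary = defaultdict(int)
    for line in lines:
        match = THROTTLE_RE.search(line)
        if match:
            key = (int(match["cpu"]), match["scope"])
            summary[key] = max(summary[key], int(match["count"]))
    return dict(summary)

sample = [
    "Mar  5 08:11:15 Valyria kernel: CPU4: Core temperature above threshold, cpu clock throttled (total events = 1)",
    "Mar  5 08:11:15 Valyria kernel: CPU0: Package temperature above threshold, cpu clock throttled (total events = 1)",
]

for (cpu, scope), count in sorted(summarize_throttling(sample).items()):
    print(f"CPU{cpu} {scope}: {count} event(s)")
```

On a live server the same function could be given the lines of the syslog file (assumed here to be /var/log/syslog). A rising "total events" count between two runs would suggest the cooling problem is still present.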

 

On 3/7/2020 at 4:54 PM, Squid said:

It tests your memory.

 

You run it via the boot menu

@Squid I am hoping I am not doing something wrong, but when I select Memtest, it just reboots the server and comes back to the menu, or boots to Unraid if I let it. Is it supposed to do something?

Like this
 

 

9 hours ago, itimpi said:

The Memtest supplied with Unraid will only work if booting in Legacy mode. If you want a version that works when booting in UEFI mode, you need to download it yourself from memtest86.com

Ah, thanks for the tip, I'll see what I can do. 

If I download it and install it, will it replace the Memtest entry in the boot menu so that the correct program launches?
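To confirm which mode a running Linux system was booted in, one common check is whether the kernel exposes the /sys/firmware/efi directory, which is normally present only after a UEFI boot. A small Python sketch of that check follows; the function name `boot_mode` is hypothetical, and whether Python is available on a given Unraid install is an assumption.

```python
import os

def boot_mode(efi_dir="/sys/firmware/efi"):
    """Return 'UEFI' if the EFI firmware directory exists, else 'Legacy'.

    The default path is the standard Linux sysfs location; the parameter
    exists only so the check can be tried against other paths.
    """
    return "UEFI" if os.path.isdir(efi_dir) else "Legacy"

print("Booted in", boot_mode(), "mode")
```

If this reports UEFI, that would be consistent with the bundled Memtest returning straight to the boot menu, as described above.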


@trurl Thanks, I figured it out, eventually. :)   I'm not the best at this sort of thing.


@itimpi  @Squid Got it installed on a USB drive and ran the complete suite of tests for 48 hours. Zero memory issues.

 

@johnnie.black After the memtest I ran with the case cover off to see whether airflow was the issue. With the case open there are no more CPU warnings, but it is still crashing.

SYSLOG

- Mar 11 07:07:XX - about this time it was updating plugins

- Looks like I rebooted shortly after that, maybe, then entered a wrong password

- Looks like it started loading disks at Mar 11 07:10:06

- Something happened with the NVIDIA card at Mar 11 10:38:35
         Valyria kernel: Modules linked in: nvidia_uvm(O)

- Something is happening here (a call stack?). I have no idea what.
          Mar 11 10:38:35 Valyria kernel: Workqueue: events macvlan_process_broadcast [macvlan]
- This is odd, as I know I was asleep by this time (is this just sending email? I didn't get an email)
          Mar 11 11:24:14 Valyria login[31887]: ROOT LOGIN  on '/dev/pts/0'
          Mar 11 11:51:47 Valyria sSMTP[6500]: Creating SSL connection to host
          Mar 11 11:51:47 Valyria sSMTP[6500]: SSL connection using TLS_AES_256_GCM_SHA384
          Mar 11 11:51:53 Valyria sSMTP[6500]: Sent mail for [email protected] (221 2.0.0 closing connection o12sm6023436pjs.6 - gsmtp) uid=0 username=root

- Then it appears that the GPU crashes; it loops on this message for a while:

         Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0x56:515)
         Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: rm_init_adapter failed, device minor number 0

** At Mar 11 20:50:00 everything appears to enter a disabled state.

         Valyria kernel: vethd02192f: renamed from eth0

- Network appears to go down, and Docker becomes disabled

- ?? It installs a plugin: Mar 11 20:53:58 Valyria root: # Nerd Tools unRAID Plugin 

- ?? Super weird that it's trying to install all the plugins at 20:54:16

-- ?? Appears to be trying to stop and start the array: Mar 11 20:59:19 Valyria kernel: mdcmd (63): start STOPPED

-- A lot of this happens as my Mac tries to remount its volumes

         Mar 11 21:48:16 Valyria webGUI: Successful login user root from 192.168.29.105
         Mar 11 21:49:18 Valyria kernel: docker0: port 1(veth524682b) entered blocking state
         Mar 11 21:49:18 Valyria kernel: docker0: port 1(veth524682b) entered disabled state
         Mar 11 21:49:18 Valyria kernel: device veth524682b entered promiscuous mode
         Mar 11 21:49:18 Valyria kernel: IPv6: ADDRCONF(NETDEV_UP): veth524682b: link is not ready

-- Later in the log there appears to be another NVIDIA crash

** followed by a bunch of stack traces, which appear to loop until I reset it. 
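A timeline like the one above can also be pulled out of the syslog mechanically by tagging lines that match a few known messages. The sketch below is a rough Python illustration only: the marker names and patterns are hypothetical, picked from messages quoted in this thread, and the sample lines are copied from it.

```python
import re

# Hypothetical marker patterns, chosen from messages quoted in this thread.
MARKERS = {
    "gpu": re.compile(r"NVRM:"),
    "btrfs": re.compile(r"BTRFS", re.IGNORECASE),
    "thermal": re.compile(r"temperature above threshold"),
    "array": re.compile(r"mdcmd"),
}

def timeline(lines):
    """Return (timestamp, tags, message) for each line matching a marker."""
    events = []
    for line in lines:
        tags = [name for name, pattern in MARKERS.items() if pattern.search(line)]
        if tags:
            # Classic syslog timestamps ("Mar 11 20:50:00") are 15 characters wide.
            events.append((line[:15], tags, line[15:].strip()))
    return events

sample = [
    "Mar 11 20:50:00 Valyria kernel: NVRM: GPU 0000:01:00.0: RmInitAdapter failed! (0x23:0x56:515)",
    "Mar 11 20:59:19 Valyria kernel: mdcmd (63): start STOPPED",
    "Mar 11 21:48:16 Valyria webGUI: Successful login user root from 192.168.29.105",
]

for stamp, tags, message in timeline(sample):
    print(stamp, ",".join(tags), message)
```

Only the first two sample lines match a marker, so the webGUI login is dropped. Which messages count as interesting is a judgment call, and the patterns would need adjusting for other logs.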

 

 

Log attached below in another comment

[10de:2184] 01:00.0 VGA compatible controller: NVIDIA Corporation TU116 [GeForce GTX 1660] (rev a1)


Using the @linuxserver.io Unraid 6.8.2

 

Edited by wesman

There's a btrfs filesystem crashing, most likely the docker image; you should recreate it.

 

CPU is still overheating:

 

Mar 12 08:25:50 Valyria kernel: CPU6: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU14: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU8: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU3: Package temperature above threshold, cpu clock throttled (total events = 51)
Mar 12 08:25:50 Valyria kernel: CPU11: Package temperature above threshold, cpu clock throttled (total events = 51)

 

 


Thanks @johnnie.black. I see this loop again and again, and it does start with

WARNING: CPU: 7 PID: 916 at fs/btrfs/extent_io.c:435 insert_state+0x30/0xe1

 

but I thought it was referencing nvidia_uvm, so I was thinking it might be related to Plex and the video card.

 

Mar 14 11:17:15 Valyria kernel: ------------[ cut here ]------------
Mar 14 11:17:15 Valyria kernel: BTRFS: end < start 4095 18446612749489541176
Mar 14 11:17:15 Valyria kernel: WARNING: CPU: 7 PID: 916 at fs/btrfs/extent_io.c:435 insert_state+0x30/0xe1
Mar 14 11:17:15 Valyria kernel: Modules linked in: nvidia_uvm(O) veth macvlan xt_CHECKSUM ipt_REJECT ip6table_mangle ip6table_nat nf_nat_ipv6 iptable_mangle ip6table_filter ip6_tables vhost_net tun vhost tap xt_nat ipt_MASQUERADE iptable_filter iptable_nat nf_nat_ipv4 nf_nat ip_tables xfs dm_crypt dm_mod dax nfsd lockd grace sunrpc md_mod nct6775 hwmon_vid atlantic e1000e igb(O) nvidia_drm(PO) nvidia_modeset(PO) x86_pkg_temp_thermal nvidia(PO) intel_powerclamp coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd cryptd glue_helper intel_cstate wmi_bmof intel_wmi_thunderbolt mxm_wmi drm_kms_helper mpt3sas drm btusb btrtl btbcm btintel raid_class rsnvme(PO) bluetooth scsi_transport_sas nvme nvme_core wmi intel_uncore agpgart i2c_i801 ahci intel_rapl_perf
Mar 14 11:17:15 Valyria kernel: syscopyarea sr_mod video sysfillrect i2c_core pcc_cpufreq ecdh_generic sysimgblt libahci backlight fb_sys_fops cdrom button acpi_pad [last unloaded: atlantic]
Mar 14 11:17:15 Valyria kernel: CPU: 7 PID: 916 Comm: kswapd0 Tainted: P        W  O      4.19.98-Unraid #1
Mar 14 11:17:15 Valyria kernel: Hardware name: To Be Filled By O.E.M. To Be Filled By O.E.M./Z390 Taichi Ultimate, BIOS P4.20 07/23/2019
Mar 14 11:17:15 Valyria kernel: RIP: 0010:insert_state+0x30/0xe1
Mar 14 11:17:15 Valyria kernel: Code: 89 c7 41 56 4d 89 ce 41 55 49 89 fd 41 54 49 89 d4 55 48 89 f5 53 48 89 cb 73 11 48 89 ce 48 c7 c7 54 88 d5 81 e8 26 aa e1 ff <0f> 0b 48 8b 4c 24 40 4c 89 ef 4c 89 65 00 48 89 ee 48 8b 54 24 38
Mar 14 11:17:15 Valyria kernel: RSP: 0018:ffffc90006703a90 EFLAGS: 00010286
Mar 14 11:17:15 Valyria kernel: RAX: 0000000000000000 RBX: 0000000000000fff RCX: 0000000000000007
Mar 14 11:17:15 Valyria kernel: RDX: 000000072af20962 RSI: 0000000000000002 RDI: ffff88903ddd64f0
Mar 14 11:17:15 Valyria kernel: RBP: ffff888037fd24b0 R08: 0000000000000003 R09: 0000000000003000
Mar 14 11:17:15 Valyria kernel: R10: 0000000000000000 R11: 000000000000004c R12: ffff888fb2841038
Mar 14 11:17:15 Valyria kernel: R13: ffff88815308f738 R14: ffffc90006703b18 R15: ffffc90006703b10
Mar 14 11:17:15 Valyria kernel: FS:  0000000000000000(0000) GS:ffff88903ddc0000(0000) knlGS:0000000000000000
Mar 14 11:17:15 Valyria kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Mar 14 11:17:15 Valyria kernel: CR2: 000015162a65d000 CR3: 0000000001e0a001 CR4: 00000000003606e0
Mar 14 11:17:15 Valyria kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
Mar 14 11:17:15 Valyria kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
Mar 14 11:17:15 Valyria kernel: Call Trace:
Mar 14 11:17:15 Valyria kernel: __set_extent_bit+0x132/0x430
Mar 14 11:17:15 Valyria kernel: lock_extent_bits+0x54/0x1d8
Mar 14 11:17:15 Valyria kernel: btrfs_evict_inode+0x141/0x420
Mar 14 11:17:15 Valyria kernel: evict+0xb8/0x16e
Mar 14 11:17:15 Valyria kernel: dispose_list+0x30/0x39
Mar 14 11:17:15 Valyria kernel: prune_icache_sb+0x56/0x74
Mar 14 11:17:15 Valyria kernel: super_cache_scan+0x11a/0x16d
Mar 14 11:17:15 Valyria kernel: do_shrink_slab+0x128/0x194
Mar 14 11:17:15 Valyria kernel: shrink_slab+0x20c/0x276
Mar 14 11:17:15 Valyria kernel: shrink_node+0x108/0x3cb
Mar 14 11:17:15 Valyria kernel: kswapd+0x451/0x58a
Mar 14 11:17:15 Valyria kernel: ? __switch_to_asm+0x41/0x70
Mar 14 11:17:15 Valyria kernel: ? __switch_to_asm+0x41/0x70
Mar 14 11:17:15 Valyria kernel: ? mem_cgroup_shrink_node+0xa4/0xa4
Mar 14 11:17:15 Valyria kernel: kthread+0x10c/0x114
Mar 14 11:17:15 Valyria kernel: ? kthread_park+0x89/0x89
Mar 14 11:17:15 Valyria kernel: ret_from_fork+0x1f/0x40
Mar 14 11:17:15 Valyria kernel: ---[ end trace ec01f8e5369e34db ]---
Mar 14 11:17:15 Valyria kernel: ------------[ cut here ]------------
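The two numbers in the "BTRFS: end < start 4095 18446612749489541176" line are easier to read in hexadecimal. The short Python sketch below converts them and compares them with the register dump in the same trace; the helper name `describe` is made up, and the threshold used is the start of the upper half of the x86-64 address space, where kernel addresses live.

```python
# Values copied from the trace above.
END = 4095
START = 18446612749489541176
RBX = 0x0000000000000fff  # from "RBX: 0000000000000fff"
R12 = 0xffff888fb2841038  # from "R12: ffff888fb2841038"

# Upper-half canonical addresses on x86-64 start here.
KERNEL_HALF = 0xffff800000000000

def describe(value):
    """Return the value as 64-bit hex and whether it falls in the kernel half."""
    return {
        "hex": f"{value:#018x}",
        "looks_like_kernel_pointer": value >= KERNEL_HALF,
    }

print("end  ", describe(END))
print("start", describe(START))
print("end == RBX:", END == RBX, "| start == R12:", START == R12)
```

So the "start" value has the shape of a kernel memory address rather than a plausible file offset, which fits the picture of a damaged btrfs filesystem. On its own it does not say what damaged it.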
