
Various issues after updating from 6.9 to 6.11


Go to solution Solved by BzzLghtyr,


I finally got tired of the “new version available” notification on my screen and updated. I believe I was on 6.9 before doing so. 
 

I’m not absolutely certain that the update is the cause of my problems but it’s currently the leading suspect. 
 

My first issue was that my Docker containers stopped responding and, IIRC, would not let me stop them or connect to them. I followed a trail of suggestions and eventually changed the network from “macvlan” to “ipvlan”.

 

I then experienced other issues, such as the array refusing to stop fully and the server ignoring the powerdown command. Now, after having to hard shut down a few times, the parity check begins at a decent speed and then slows to around 300 KB/s. I attempted to pause and to stop the parity check and it simply wouldn’t respond.
 

Currently the server is at home waiting for me to force a restart. I tried to restart after further upgrading to 6.11.1 and it didn’t make it all the way through (the reboot counter in the webUI got up to about 2000 before I gave up).
 

I’ve attached my diagnostics. Please let me know if they are revealing too much information as I’m doing this remotely on mobile currently. 
 

I feel like I was at about 80 days uptime with no issues or hiccups before trying to update. 

dwigt-diagnostics-20221011-0811.zip

Link to comment

I will follow this thread, as I have had the exact same problem for a couple of days now. The Docker containers themselves work fine, but I can't load the Docker page. The server does not respond to shutting down, stopping the array, or pausing the parity check; even pressing the power button does nothing except for the "beep". When I close a notification it pops back up after a few seconds. I had to force a shutdown, and the problem came back. Great server name btw.

Link to comment
16 hours ago, JorgeB said:

You can assign it to a pool and start docker, but you'd need to correct all the paths.

I'd like to do this. I've run into pretty regular failures now outside of safe mode, but there's no real use being in safe mode to try anything if my docker location can't mount.

 

How can I move my appdata or the whole drive to being something in a cache pool rather than on an unassigned device?

 

I'm losing my mind at whatever's causing this.

Link to comment
2 hours ago, JorgeB said:

You just create a new pool and assign the device there, then need to adjust all the mappings, e.g. /mnt/disks/ud_name/appdata becomes /mnt/pool_name/appdata
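To illustrate the mapping change described in the quote above, here is a minimal Python sketch that rewrites host paths from the unassigned-device mount to the pool mount. The names `ud_name` and `pool_name` are the placeholders from the quote, `remap_path` is a hypothetical helper, and the example paths are made up. It only transforms strings; it does not read or edit any Unraid or Docker configuration, which still has to be updated by hand in each container's settings.

```python
from pathlib import PurePosixPath


def remap_path(path: str, old_root: str, new_root: str) -> str:
    """Return `path` rewritten under `new_root` if it lives under `old_root`.

    Paths outside `old_root` are returned unchanged. Matching is done per
    path component, so /mnt/disks/ud_name2 is not confused with
    /mnt/disks/ud_name.
    """
    try:
        relative = PurePosixPath(path).relative_to(PurePosixPath(old_root))
    except ValueError:
        return path  # not under the old mount point, leave it alone
    return str(PurePosixPath(new_root) / relative)


# Hypothetical host paths as they might appear in container mappings.
example_mappings = [
    "/mnt/disks/ud_name/appdata",
    "/mnt/disks/ud_name/appdata/plex",
    "/mnt/user/media",
]

for host_path in example_mappings:
    new_path = remap_path(host_path, "/mnt/disks/ud_name", "/mnt/pool_name")
    print(f"{host_path} -> {new_path}")
```

Only the first two example paths change; anything that was never on the unassigned device is left as it was.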

And this won't cause the array to ask to format the drive? I got worried about following through because I watched a spaceinvaderone video on cache pools in which it asked him to format the drive after moving it from unassigned to cache, but he may have been using a fresh drive or something.

Link to comment
12 hours ago, BzzLghtyr said:

And this won't cause the array to ask to format the drive?

If the drive in question is formatted BTRFS, it should just work regardless. If it's XFS, make sure the pool you assign it to has only one slot. If more than one drive slot is defined, XFS isn't an option, and it will ask to format the drive as BTRFS.
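The rule in this reply can be restated as a small decision function. This is only a sketch of the reasoning as written in the post, not Unraid's actual logic; `pool_will_ask_to_format` is a hypothetical name, and file systems other than BTRFS and XFS are deliberately rejected because the post does not cover them.

```python
def pool_will_ask_to_format(filesystem: str, pool_slots: int) -> bool:
    """Restate the rule from the post above.

    - BTRFS: should be accepted regardless of slot count.
    - XFS: only accepted when the pool has exactly one slot.
    Anything else is outside what the post describes.
    """
    if pool_slots < 1:
        raise ValueError("a pool needs at least one slot")
    fs = filesystem.strip().lower()
    if fs == "btrfs":
        return False
    if fs == "xfs":
        return pool_slots > 1
    raise ValueError(f"file system not covered by the post: {filesystem!r}")


for fs, slots in [("btrfs", 2), ("xfs", 1), ("xfs", 2)]:
    verdict = "would ask to format" if pool_will_ask_to_format(fs, slots) else "should be accepted as-is"
    print(f"{fs} drive, pool with {slots} slot(s): {verdict}")
```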

Link to comment

A quick update while I troubleshoot and try things out. 
 

Safe mode didn’t help identify anything due to my setup, so I proceeded in normal Unraid OS. I removed the Docker Folders plugin, since it was a relatively new addition to the server, and turned off folder caching in the Dynamix Cache Directories plugin, simply because something in my Google searches mentioned it had helped someone else. I don’t know that either of these steps mattered, as I hit the same issue of the server being powered on but non-responsive a few hours later in the day.


I ran memtest for about 2 hours to check on any obvious errors. I understand this is not exactly a thorough check at this duration but I just wanted to peek real quick. No errors in that time. 


I proceeded to boot normally again and this time I stopped most of my non-essential docker containers. I left Plex going and allowed it to proceed as normal, using hardware transcoding via nvidia gpu for multiple users in that time. I also had redbot up the whole time which is just a discord bot. I also left NextCloud, Swag and various DBs up. 
 

I just now completed my parity check (first one in a long time, 3 errors found), which ran at an expected speed between 100 and 200 MB/s, and the server has been up for almost 2 days without any issues. That’s not long but it’s longer than the few hours I was getting previously.
 

Something that may be pertinent is that the WireGuard VPN settings I have are still the same as before I upgraded to 6.11 and in previous days/reboots I have gone into the settings for it and activated it (it changes to inactive when I reboot) as well as utilizing it. For my current uptime, I have left it inactive. 

 

I’ll be starting up my dockers and giving them some time while also testing whether a higher CPU, disk or GPU load changes things (tdarr is usually running and almost always converting things but has been shut off, as has qbittorrent). If I don’t notice issues there, I may shut the containers back down, make the VPN active again, and see if that triggers any issues.
 

still troubleshooting but feeling hopeful

Link to comment
18 hours ago, JonathanM said:

If the drive in question is formatted BTRFS, it should just work regardless. If it's XFS, make sure the pool you assign it to has only one slot. If more than one drive slot is defined, XFS isn't an option, and it will ask to format the drive as BTRFS.

I will also add that I have not taken the time to make this change so my setup still has a majority of docker data in an unassigned device. 

Link to comment
On 10/14/2022 at 7:20 PM, BzzLghtyr said:

I will also add that I have not taken the time to make this change so my setup still has a majority of docker data in an unassigned device. 

So far so good running only the Unassigned Devices and Unassigned Devices Plus plugins. Maybe you could try monitoring with only the Unassigned Devices plugin. If you still have problems, you can assume it's this plugin.

Link to comment

My server finally crashed again a few nights ago. I had to power cycle it manually.
 

The only thing I can think of that I did that could have caused some sort of issue was using Owncast to stream my HD Homerun out to a web browser so I could watch local sports while I was away. I’ll refrain from this for a few days and see if we crash again. This uses GPU transcoding so a plugin is still needed. I also had Plex live and ready to transcode for anyone who may have used it.

 

I moved my unassigned appdata to a cache pool and deleted the unassigned devices plugin(s). Today I also upgraded my nvidia driver.

 

too soon to report any success but I’ll report any failures. 

Link to comment
6 hours ago, JorgeB said:

You are having the same issue described here, unfortunately not yet clear what causes it but you can read that thread for some ideas, and this way everyone affected can discuss it in the same place.

Thanks, indeed the same issues. For now I have downgraded back to v6.10.

Link to comment
  • 3 weeks later...

I had another crash again this morning.

 

Server fully halted. Unresponsive but still powered on. No display output.

 

The only entries in the log near that time occurred here:

Nov 13 09:27:06 Dwigt kernel: BUG: unable to handle page fault for address: ffffffff81fbb972
Nov 13 09:27:06 Dwigt kernel: #PF: supervisor write access in kernel mode
Nov 13 09:27:06 Dwigt kernel: #PF: error_code(0x0003) - permissions violation
Nov 13 09:27:06 Dwigt kernel: PGD 400e067 P4D 400e067 PUD 400f063 PMD 132eff063 PTE 8000000003fbb061
Nov 13 09:27:06 Dwigt kernel: Oops: 0003 [#1] PREEMPT SMP PTI
Nov 13 09:27:06 Dwigt kernel: CPU: 2 PID: 15 Comm: rcu_preempt Tainted: P           O      5.19.9-Unraid #1
Nov 13 09:27:06 Dwigt kernel: Hardware name: Gigabyte Technology Co., Ltd. Z87X-UD4H/Z87X-UD4H-CF, BIOS F9 03/18/2014
Nov 13 09:27:06 Dwigt kernel: RIP: 0010:rcu_gp_kthread+0x3d/0x14d
Nov 13 09:27:06 Dwigt kernel: Code: 48 89 44 24 18 31 c0 65 48 8b 1c 25 c0 bb 01 00 48 8b 15 d6 38 0c 01 48 8b 35 ff 9b fe 00 48 8b 3d 58 9d fe 00 e8 48 ab ff ff <66> c7 05 1c 9c fe 00 01 00 66 8b 05 13 9c fe 00 a8 01 75 44 48 8d
Nov 13 09:27:06 Dwigt kernel: RSP: 0018:ffffc9000009fef0 EFLAGS: 00010286
Nov 13 09:27:06 Dwigt kernel: RAX: 0000000080000000 RBX: ffff8881001f5e80 RCX: 0000000000000000
Nov 13 09:27:06 Dwigt kernel: RDX: ffffffff81ebaeca RSI: 0000000004198534 RDI: ffffffff820bbb08
Nov 13 09:27:06 Dwigt kernel: RBP: ffff888100141a80 R08: 0000000000000000 R09: ffff88882f32c070
Nov 13 09:27:06 Dwigt kernel: R10: 0000000000000000 R11: 0000000000000019 R12: ffffc9000002fdb8
Nov 13 09:27:06 Dwigt kernel: R13: ffffffff810d1d10 R14: 0000000000000000 R15: ffff8881001f5e80
Nov 13 09:27:06 Dwigt kernel: FS:  0000000000000000(0000) GS:ffff88882f300000(0000) knlGS:0000000000000000
Nov 13 09:27:06 Dwigt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 13 09:27:06 Dwigt kernel: CR2: ffffffff81fbb972 CR3: 000000000400a001 CR4: 00000000001726e0
Nov 13 09:27:06 Dwigt kernel: Call Trace:
Nov 13 09:27:06 Dwigt kernel: <TASK>
Nov 13 09:27:06 Dwigt kernel: kthread+0xe7/0xef
Nov 13 09:27:06 Dwigt kernel: ? kthread_complete_and_exit+0x1b/0x1b
Nov 13 09:27:06 Dwigt kernel: ret_from_fork+0x22/0x30
Nov 13 09:27:06 Dwigt kernel: </TASK>
Nov 13 09:27:06 Dwigt kernel: Modules linked in: tcp_diag udp_diag inet_diag af_packet nvidia_uvm(PO) xt_nat veth ipvlan nf_conntrack_netlink nfnetlink xfrm_user xfrm_algo xt_addrtype br_netfilter xt_CHECKSUM xt_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp ip6table_mangle ip6table_nat iptable_mangle vhost_net tun vhost vhost_iotlb tap xfs md_mod it87 hwmon_vid efivarfs iptable_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 poly1305_x86_64 ip6_udp_tunnel udp_tunnel libchacha ip6table_filter ip6_tables iptable_filter ip_tables x_tables bridge stp llc bonding tls ipv6 nvidia_drm(PO) nvidia_modeset(PO) x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel i915 kvm nvidia(PO) iosf_mbi drm_buddy i2c_algo_bit crct10dif_pclmul ttm crc32_pclmul crc32c_intel ghash_clmulni_intel drm_display_helper aesni_intel crypto_simd mxm_wmi cryptd rapl drm_kms_helper intel_cstate intel_uncore drm i2c_i801 i2c_smbus ahci
Nov 13 09:27:06 Dwigt kernel: libahci e1000e intel_gtt agpgart input_leds led_class cp210x i2c_core usbserial syscopyarea sysfillrect sysimgblt fb_sys_fops thermal fan button video wmi backlight unix
Nov 13 09:27:06 Dwigt kernel: CR2: ffffffff81fbb972
Nov 13 09:27:06 Dwigt kernel: ---[ end trace 0000000000000000 ]---
Nov 13 09:27:06 Dwigt kernel: RIP: 0010:rcu_gp_kthread+0x3d/0x14d
Nov 13 09:27:06 Dwigt kernel: Code: 48 89 44 24 18 31 c0 65 48 8b 1c 25 c0 bb 01 00 48 8b 15 d6 38 0c 01 48 8b 35 ff 9b fe 00 48 8b 3d 58 9d fe 00 e8 48 ab ff ff <66> c7 05 1c 9c fe 00 01 00 66 8b 05 13 9c fe 00 a8 01 75 44 48 8d
Nov 13 09:27:06 Dwigt kernel: RSP: 0018:ffffc9000009fef0 EFLAGS: 00010286
Nov 13 09:27:06 Dwigt kernel: RAX: 0000000080000000 RBX: ffff8881001f5e80 RCX: 0000000000000000
Nov 13 09:27:06 Dwigt kernel: RDX: ffffffff81ebaeca RSI: 0000000004198534 RDI: ffffffff820bbb08
Nov 13 09:27:06 Dwigt kernel: RBP: ffff888100141a80 R08: 0000000000000000 R09: ffff88882f32c070
Nov 13 09:27:06 Dwigt kernel: R10: 0000000000000000 R11: 0000000000000019 R12: ffffc9000002fdb8
Nov 13 09:27:06 Dwigt kernel: R13: ffffffff810d1d10 R14: 0000000000000000 R15: ffff8881001f5e80
Nov 13 09:27:06 Dwigt kernel: FS:  0000000000000000(0000) GS:ffff88882f300000(0000) knlGS:0000000000000000
Nov 13 09:27:06 Dwigt kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
Nov 13 09:27:06 Dwigt kernel: CR2: ffffffff81fbb972 CR3: 0000000468c62006 CR4: 00000000001726e0
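For anyone comparing crashes across reboots, a short Python sketch like the one below can pull the key fields out of an oops like the one above. The sample text is a few lines copied from the log (the module line is cut short), and the regular expressions and the `summarize_oops` and `flagged_modules` helpers are my own assumptions about this log layout, not a standard tool. In the module list, the `(PO)` suffix marks modules flagged as proprietary (`P`) and out-of-tree (`O`), which matches the `Tainted: P O` flags on the `CPU:` line.

```python
import re

# A few lines copied from the log above; the module line is truncated.
SAMPLE = """\
Nov 13 09:27:06 Dwigt kernel: BUG: unable to handle page fault for address: ffffffff81fbb972
Nov 13 09:27:06 Dwigt kernel: #PF: supervisor write access in kernel mode
Nov 13 09:27:06 Dwigt kernel: CPU: 2 PID: 15 Comm: rcu_preempt Tainted: P           O      5.19.9-Unraid #1
Nov 13 09:27:06 Dwigt kernel: RIP: 0010:rcu_gp_kthread+0x3d/0x14d
Nov 13 09:27:06 Dwigt kernel: Modules linked in: tcp_diag udp_diag inet_diag af_packet nvidia_uvm(PO) xt_nat
"""

# Assumed patterns for this particular log format.
PATTERNS = {
    "fault_address": re.compile(r"unable to handle page fault for address: ([0-9a-f]+)"),
    "access": re.compile(r"#PF: (\w+ \w+) access"),
    "task": re.compile(r"Comm: (\S+)"),
    "kernel": re.compile(r"Tainted: .*?(\d+\.\d+\.\d+\S*)"),
    "rip": re.compile(r"RIP: [0-9a-f]{4}:(\S+)"),
}


def summarize_oops(log_text: str) -> dict:
    """Collect the first match of each pattern from the log text."""
    summary = {}
    for line in log_text.splitlines():
        for key, pattern in PATTERNS.items():
            if key in summary:
                continue
            match = pattern.search(line)
            if match:
                summary[key] = match.group(1)
    return summary


def flagged_modules(log_text: str) -> list:
    """List modules carrying the (PO) flag, in order of first appearance."""
    seen = []
    for name in re.findall(r"(\w+)\(PO\)", log_text):
        if name not in seen:
            seen.append(name)
    return seen


if __name__ == "__main__":
    for key, value in summarize_oops(SAMPLE).items():
        print(f"{key}: {value}")
    print("flagged modules:", ", ".join(flagged_modules(SAMPLE)))
```

Running the same summary over the syslog from each crash makes it easier to see whether the faulting task and `RIP` are the same every time or move around, which is useful detail to include alongside the diagnostics.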

 

I have also included my diagnostics.

 

This has been the biggest pain and I truly appreciate any and all help.

 

dwigt-diagnostics-20221113-1024.zip

Link to comment
