bucketphobia Posted October 2, 2023 (edited) Hi, I've been having an issue with Unraid freezing up and have been trying to trace it, but I can't find the cause. This last time, I was watching htop and noticed very high CPU usage across the board, then checked syslog and found a bunch of php-fpm errors. Screenshots and diagnostics attached. Would appreciate any thoughts / help, thanks! Unraid version 6.12.4 omar-diagnostics-20231002-2055.zip Edited October 14, 2023 by bucketphobia
bucketphobia Posted October 2, 2023 Author As an update, I also tried it with all dockers and VMs off, and the issue still persists.
bucketphobia Posted October 3, 2023 Author bump - would appreciate any thoughts
JorgeB Posted October 4, 2023 Looks docker related. If you disable the docker service and reboot, do you still see the issue?
amuzhaqi Posted October 4, 2023 I just had the same issue; for me it was caused by one of the docker containers.
bucketphobia Posted October 5, 2023 Author On 10/4/2023 at 10:57 AM, JorgeB said: "Looks docker related, if you disable the docker service and reboot do you still see the issue?" Disabling the docker service does indeed seem to resolve the issue, but I can't trace it to any particular container. If I stop all containers, the issue still persists. I did two things that seem to have helped a bit (the server died about 3 days in this time, versus 2-5 hours before), though I'm not entirely sure how related they are: I implemented the macvlan fix and switched to binhex's libtorrentv1 qbit image. 18 hours ago, amuzhaqi said: "I just had the same issue, for me it was caused by one of the docker containers." Interesting. Can you please share more details? It would be great to see if I can isolate it to the same issue as yours!
JorgeB Posted October 5, 2023 You can also try recreating the docker image, in case there's some issue with it.
bucketphobia Posted October 5, 2023 Author 17 minutes ago, JorgeB said: "You can also try recreating the docker image, in case there's some issue with it." Let me give that a shot. I currently use a docker folder instead of an image. Is there a reason to switch over to a docker image?
JorgeB Posted October 5, 2023 Should be the same; you recreate the folder instead.
bucketphobia Posted October 5, 2023 Author (edited) 9 hours ago, JorgeB said: "Should be the same, you recreate the folder instead." Just happened again. I noticed these lines in syslog shortly before the server froze and needed a hard reset. Any ideas if they are related?
Oct 6 00:31:39 Omar usbhid-ups[4724]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 6 00:31:39 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:31:40 Omar kernel: usb 3-3: USB disconnect, device number 4
Oct 6 00:31:40 Omar usb_manager: Info: rc.usb_manager usb_remove American_Power_Conversion_Back-UPS_BX750MI_FW:8_T©____-302202G /dev/bus/usb/003/004 003 004
Oct 6 00:31:40 Omar usb_manager: Info: rc.usb_manager Device Match 003/004 vm: 003 004
Oct 6 00:31:40 Omar usb_manager: Info: rc.usb_manager Removed 003/004 vm: nostate 003 004
Oct 6 00:31:40 Omar kernel: usb 3-3: new full-speed USB device number 5 using xhci_hcd
Oct 6 00:31:40 Omar kernel: hid-generic 0003:051D:0002.0003: hiddev96,hidraw0: USB HID v1.10 Device [American Power Conversion Back-UPS BX750MI FW:219A3541-302202G ] on usb-0000:00:14.0-3/input0
Oct 6 00:31:40 Omar usb_manager: Info: rc.usb_manager usb_add American_Power_Conversion_Back-UPS_BX750MI_FW:219A3541-302202G /dev/bus/usb/003/005 003 005
Oct 6 00:31:41 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:31:41 Omar usb_manager: Info: rc.usb_manager Autoconnect No Mapping found American_Power_Conversion_Back-UPS_BX750MI_FW:219A3541-302202G /dev/bus/usb/003/005 003 005 port 3-3
Oct 6 00:31:41 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
...
Oct 6 00:37:11 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:13 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:15 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:16 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:17 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:19 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:21 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:21 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:23 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:25 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:26 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:27 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:29 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:31 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:31 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:33 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:35 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:36 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:37 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:39 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:41 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:41 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:43 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:45 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:46 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:47 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:49 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:51 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:51 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:53 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:55 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:56 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct 6 00:37:57 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 6 00:37:59 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
I also installed netdata to see if that would help, and there's a large spike in iowait right before things go down. In an effort to narrow down the issue, I've also uninstalled the unassigned devices plugin family and switched to ipvlan. Still no dice.
Edited October 5, 2023 by bucketphobia
bucketphobia Posted October 5, 2023 Author (edited) Another thing I noticed: if the dashboard is open in a tab, system load and iowait go up, but they seem to drop back to normal-ish after it is closed. Scratch that, it went back to very high CPU usage after a short while. (Initial Graph) (10-ish mins later) Edited October 5, 2023 by bucketphobia
XiMA4 Posted October 5, 2023 I have a similar problem, and not for the first time. The server freezes after midnight, with CPU usage at 100%. I can monitor the work of the unRAID virtual machine. After about an hour it lets up. ver. 6.12.4 / Docker Image btrfs / ipvlan
Vr2Io Posted October 6, 2023 (edited) 5 hours ago, bucketphobia said: "Another thing I noticed - is if the dashboard is open on a tab, system load and iowait goes up, but seems to go down to normal-ish after it is closed. Scratch that, went back to very high CPU usage a short while." Since iowait surges, it may indicate a storage-related issue. 19 hours ago, bucketphobia said: "Disabling docker service does indeed seem to resolve the issue, but I cannot seem to trace it to any docker containers. If I stop all containers, the issue still persists." If all containers are stopped and just starting the docker service still causes the problem, that is odd. I suggest relocating the docker storage to another storage device as a test. Edited October 6, 2023 by Vr2Io
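The iowait observation can be checked from a terminal without netdata. What follows is only a minimal sketch, assuming the usual Linux /proc/stat layout in which the fifth counter on the aggregate cpu line is iowait. The function names iowait_pct and sample_iowait are made up for this example and are not part of Unraid or netdata.

```shell
#!/bin/bash
# Sketch: estimate the iowait share of CPU time between two /proc/stat samples.
# iowait_pct and sample_iowait are hypothetical helper names.

# Takes two aggregate "cpu ..." lines and prints iowait as a whole percentage.
iowait_pct() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    # Fields: $1="cpu", $2=user, $3=nice, $4=system, $5=idle,
    #         $6=iowait, $7=irq, $8=softirq, $9=steal
    NR == 1 { for (i = 2; i <= 9; i++) first[i] = $i }
    NR == 2 {
      total = 0
      for (i = 2; i <= 9; i++) total += $i - first[i]
      waited = $6 - first[6]
      if (total <= 0) print 0
      else printf "%d\n", 100 * waited / total
    }'
}

# Samples the live counters a few seconds apart (Linux only).
# The body runs in a subshell so its variables do not leak.
sample_iowait() (
  first=$(grep '^cpu ' /proc/stat)
  sleep "${1:-5}"
  second=$(grep '^cpu ' /proc/stat)
  iowait_pct "$first" "$second"
)
```

Running `sample_iowait 10` would print the iowait percentage over a ten-second window. A value that stays high while the containers are idle would be consistent with the storage theory above, though it does not say which device is responsible.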
bucketphobia Posted October 7, 2023 Author On 10/5/2023 at 1:53 PM, JorgeB said: "You can also try recreating the docker image, in case there's some issue with it." Tried that, and also tried a docker image instead of folders. The issue still persists.
bucketphobia Posted October 8, 2023 Author To rule out any issue with docker containers, I ran the server with all containers stopped, and the issue still persisted with a server crash. I guess it's an issue with the docker service itself. Any ideas on potential fixes? Netdata: After the spike in iowait, the server goes down and is unresponsive until manually hard restarted. Stats across the board also drop (i.e., disk io, network, RAM, etc.). Syslog right before this latest crash:
Oct 8 15:01:50 Omar usbhid-ups[5576]: nut_libusb_get_report: Input/Output Error
Oct 8 15:01:52 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:01:52 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:01:52 Omar upsd[5580]: Data for UPS [ups] is stale - check driver
Oct 8 15:01:54 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:01:56 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct 8 15:01:56 Omar upsmon[5584]: Communications with UPS [email protected] lost
Oct 8 15:01:58 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:01:58 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:01:59 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:00 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:01 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:01 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct 8 15:02:02 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:03 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:04 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:05 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:06 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct 8 15:02:06 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:07 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:08 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:09 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:10 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:11 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:11 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct 8 15:02:12 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:13 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:15 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:15 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:16 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct 8 15:02:17 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:17 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:19 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:19 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:21 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:21 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:21 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct 8 15:02:23 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 8 15:02:23 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 8 15:02:25 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
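Because the UPS driver messages repeat every second or two in these excerpts, other entries are easy to miss. A small filter can hide them while reading the log. This is a sketch only: non_ups_lines is a hypothetical helper name, the default path /var/log/syslog is an assumption, and the three process names are simply the ones visible in the excerpts posted in this thread.

```shell
#!/bin/bash
# Sketch: print syslog lines that do NOT come from the UPS daemons,
# so less frequent messages are easier to spot.
# non_ups_lines is a hypothetical helper; the default path is an assumption.
non_ups_lines() {
  logfile="${1:-/var/log/syslog}"
  # Process names taken from the posted excerpts: usbhid-ups, upsmon, upsd
  grep -v -E ' (usbhid-ups|upsmon|upsd)\[[0-9]+\]: ' "$logfile"
}
```

For example, `non_ups_lines /var/log/syslog | tail -n 50` would show the last fifty non-UPS entries. Filtering the UPS lines out only makes the rest readable; it does not establish whether the UPS disconnects are a cause or a side effect of the freeze.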
JorgeB Posted October 9, 2023 Check the memory stats: does it get close to full memory use before crashing?
bucketphobia Posted October 9, 2023 Author 4 hours ago, JorgeB said: "Check the memory stats, does it get close to full memory use before crashing?" It does not; RAM usage seems to be pretty stable.
bucketphobia Posted October 10, 2023 Author Happened again, but with some new entries in syslog right before the crash:
Oct 10 06:06:22 Omar winbindd[15096]: [2023/10/10 06:06:22.578378, 0] ../../source3/winbindd/winbindd_samr.c:71(open_internal_samr_conn)
Oct 10 06:06:22 Omar winbindd[15096]: open_internal_samr_conn: Could not connect to samr pipe: NT_STATUS_IO_TIMEOUT
...
Oct 10 06:30:15 Omar usbhid-ups[5571]: nut_libusb_get_report: Input/Output Error
Oct 10 06:30:17 Omar usbhid-ups[5571]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 10 06:30:17 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:17 Omar upsd[5575]: Data for UPS [ups] is stale - check driver
Oct 10 06:30:17 Omar upsmon[5579]: Poll UPS [[email protected]] failed - Data stale
Oct 10 06:30:17 Omar upsmon[5579]: Communications with UPS [email protected] lost
Oct 10 06:30:19 Omar usbhid-ups[5571]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 10 06:30:19 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:21 Omar usbhid-ups[5571]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 10 06:30:21 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:22 Omar upsmon[5579]: Poll UPS [[email protected]] failed - Data stale
... (some more of the UPS errors) ...
Oct 10 06:30:43 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e0d6c4: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x10732670: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
JorgeB Posted October 10, 2023 1 hour ago, bucketphobia said:
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
These usually mean a flash drive problem.
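To see how often these errors occur and whether the same blocks keep failing, the syslog can be summarised with grep. This is a rough sketch rather than an official diagnostic: squashfs_summary is a hypothetical helper name and the default log path is an assumption.

```shell
#!/bin/bash
# Sketch: summarise SQUASHFS read errors found in a syslog file.
# squashfs_summary is a hypothetical helper; the default path is an assumption.
squashfs_summary() (
  logfile="${1:-/var/log/syslog}"
  # Every kernel line that mentions a SQUASHFS error
  errors=$(grep -c 'SQUASHFS error' "$logfile")
  # Distinct block addresses that failed to read
  blocks=$(grep -o 'Failed to read block 0x[0-9a-f]*' "$logfile" | sort -u | wc -l)
  echo "squashfs_errors=$(( errors )) distinct_blocks=$(( blocks ))"
)
```

A handful of block addresses repeating many times, as in the excerpt above, means the same regions are being re-read and failing each time. The count alone cannot distinguish a failing flash drive from another cause of read errors.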
bucketphobia Posted October 10, 2023 Author 8 minutes ago, JorgeB said: "These usually mean a flash drive problem." Thanks. I'll replace the flash drive tonight and get back to you.
bucketphobia Posted October 10, 2023 Author 11 hours ago, JorgeB said: "These usually mean a flash drive problem." Replaced the flash drive, but unfortunately the same symptoms persist. The server was slow for a few hours, then died a few minutes ago. Any other ideas of things to try? I'm thinking of rolling back to 6.11 to see if that resolves it.
bucketphobia Posted October 11, 2023 Author I've been doing some additional research and found a few threads that I believe may describe the same issue I've been experiencing. I updated the BIOS as suggested there and enabled exclusive shares. Will see how it affects the problem.
JorgeB Posted October 11, 2023 You can also create the small script below with the User Scripts plugin and schedule it to run hourly. It will output the memory stats to the syslog; then see if there's anything abnormal regarding memory usage in the persistent syslog.
#!/bin/bash
# Send a snapshot of memory usage (stdout and stderr) to the syslog
free -h |& logger &
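If the full `free -h` table is too noisy to compare hour by hour, a one-line variant is possible. The sketch below is not JorgeB's script but an illustrative alternative: mem_line and log_mem are hypothetical names, the "memstats" tag is an arbitrary choice, and it assumes a Linux /proc/meminfo with MemTotal and MemAvailable fields.

```shell
#!/bin/bash
# Sketch: log one compact memory line instead of the full `free -h` table.
# mem_line and log_mem are hypothetical helpers; "memstats" is an arbitrary tag.

# Prints total and available memory from a meminfo-style file (values in kB).
mem_line() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       END {printf "mem_total_kb=%d mem_available_kb=%d\n", t, a}' \
      "${1:-/proc/meminfo}"
}

# Sends the line to syslog, or prints it if logger is unavailable.
log_mem() (
  line=$(mem_line "$@")
  logger -t memstats "$line" 2>/dev/null || echo "$line"
)
```

Calling `log_mem` as the last line of a scheduled script would add one tagged entry per run, which can then be pulled out of the persistent syslog with `grep memstats`.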
bucketphobia Posted October 14, 2023 Author So as an update, I did three things after my last post, and it appears one of them has worked: I have ~84 hours of uptime so far with no sign of the issue recurring. Fingers crossed one of these is the long-term solution:
1) Updated the BIOS to the latest version (in my case, I have an i7-4790K and was running quite an old BIOS compared to the latest available).
2) Enabled the "Exclusive Shares" option introduced in Unraid 6.12 (I had previously done this manually for certain heavy dockers like Plex, but enabling the option seems to have helped).
3) Disabled the KASM container (I'm honestly not sure how much this could affect the outcome, but this time around I have had it stopped. I will try KASM again in the future and see if the issue recurs).
Thanks for all the help Jorge, appreciated.
bucketphobia Posted November 1, 2023 Author For anyone who may stumble on this post in the future, I have confirmed that the fix for this specific error is to enable the "Exclusive Shares" option in Unraid 6.12 and to verify that it shows "Yes" on your appdata share. I have brought back the dockers I had disabled/changed, and everything runs as expected. If that option is turned off, the same iowait issue comes back almost immediately.
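The share setting can also be cross-checked from a terminal. The sketch below rests on an assumption drawn from the 6.12 release notes, namely that an exclusive share appears under /mnt/user as a symlink to the pool directory, so please verify that against your own system. share_mode is a hypothetical helper name, and /mnt/user/appdata is the usual path rather than something confirmed in this thread.

```shell
#!/bin/bash
# Sketch: report whether a share path is a symlink.
# Assumption: exclusive shares appear under /mnt/user as symlinks to the pool.
# share_mode is a hypothetical helper name.
share_mode() {
  if [ -L "$1" ]; then
    echo "exclusive -> $(readlink "$1")"
  elif [ -d "$1" ]; then
    echo "not exclusive"
  else
    echo "missing"
  fi
}
```

Under that assumption, `share_mode /mnt/user/appdata` printing "not exclusive" would suggest the share is still served through the regular user share layer, so the settings page is worth a second look.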