Very high CPU usage, Unraid freezing



Hi,

 

I've been having an issue with Unraid freezing up. I've been trying to trace the cause but can't seem to find it.

 

This last time, I was watching htop and noticed very high CPU usage across the board, and the syslog showed a bunch of php-fpm errors.

 

Screenshots and diagnostics attached.

Would appreciate any thoughts or help, thanks!

 

Unraid version 6.12.4
omar-diagnostics-20231002-2055.zip
[Screenshots attached]

 

On 10/4/2023 at 10:57 AM, JorgeB said:

Looks Docker-related. If you disable the Docker service and reboot, do you still see the issue?

 

Disabling the Docker service does seem to resolve the issue, but I can't trace it to any particular container: if I stop all containers, the issue still persists.

 

I made two changes that seem to have helped a bit (the server lasted about 3 days before dying this time, versus 2-5 hours before), though I'm not sure how related they are: I implemented the macvlan fix and switched to binhex's libtorrentv1 qBittorrent image.

 

18 hours ago, amuzhaqi said:

I just had the same issue; for me it was caused by one of the Docker containers.

 

Interesting. Can you share more details? It would be great to see whether I can isolate it to the same cause as yours!

9 hours ago, JorgeB said:

Should be the same; you recreate the folder instead.

 

It just happened again. I noticed these lines in the syslog shortly before the server froze and had to be hard reset. Any idea whether they're related?

 

Oct  6 00:31:39 Omar usbhid-ups[4724]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  6 00:31:39 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:31:40 Omar kernel: usb 3-3: USB disconnect, device number 4
Oct  6 00:31:40 Omar usb_manager: Info: rc.usb_manager usb_remove  American_Power_Conversion_Back-UPS_BX750MI_FW:8_T©____-302202G /dev/bus/usb/003/004 003 004 
Oct  6 00:31:40 Omar usb_manager: Info: rc.usb_manager Device Match 003/004 vm:   003 004 
Oct  6 00:31:40 Omar usb_manager: Info: rc.usb_manager Removed 003/004 vm:  nostate 003 004
Oct  6 00:31:40 Omar kernel: usb 3-3: new full-speed USB device number 5 using xhci_hcd
Oct  6 00:31:40 Omar kernel: hid-generic 0003:051D:0002.0003: hiddev96,hidraw0: USB HID v1.10 Device [American Power Conversion Back-UPS BX750MI  FW:219A3541-302202G ] on usb-0000:00:14.0-3/input0
Oct  6 00:31:40 Omar usb_manager: Info: rc.usb_manager usb_add American_Power_Conversion_Back-UPS_BX750MI_FW:219A3541-302202G /dev/bus/usb/003/005 003 005
Oct  6 00:31:41 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:31:41 Omar usb_manager: Info: rc.usb_manager Autoconnect No Mapping found American_Power_Conversion_Back-UPS_BX750MI_FW:219A3541-302202G /dev/bus/usb/003/005 003 005 port 3-3
Oct  6 00:31:41 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything...
Oct  6 00:37:11 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:13 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:15 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:16 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:17 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:19 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:21 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:21 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:23 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:25 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:26 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:27 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:29 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:31 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:31 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:33 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:35 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:36 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:37 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:39 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:41 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:41 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:43 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:45 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:46 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:47 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:49 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:51 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:51 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:53 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:55 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:56 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale
Oct  6 00:37:57 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  6 00:37:59 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything
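Logs like the one above are dominated by a few repeating UPS-driver messages, which makes it easy to miss anything else. A per-process tally helps separate that noise from other entries. This is a minimal Python sketch, assuming the standard `Mon DD HH:MM:SS host process[pid]: message` syslog layout seen in these excerpts; `tally_by_process` is a hypothetical helper, not part of Unraid or NUT.

```python
import re
from collections import Counter

# Assumed syslog layout, matching the excerpts in this thread:
# "Oct  6 00:37:11 Omar usbhid-ups[4724]: libusb1: Could not open ..."
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2}\s+\S+\s+"  # timestamp and hostname
    r"(?P<proc>[^\[:\s]+)(?:\[\d+\])?:\s*"          # process name, optional [pid]
    r"(?P<msg>.*)$"                                  # rest of the message
)

def tally_by_process(lines):
    """Count syslog lines per process name (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m:  # lines that don't fit the assumed layout are skipped
            counts[m.group("proc")] += 1
    return counts

if __name__ == "__main__":
    sample = [
        "Oct  6 00:31:40 Omar kernel: usb 3-3: USB disconnect, device number 4",
        "Oct  6 00:37:11 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything",
        "Oct  6 00:37:13 Omar usbhid-ups[4724]: libusb1: Could not open any HID devices: insufficient permissions on everything",
        "Oct  6 00:37:16 Omar upsmon[4732]: Poll UPS [[email protected]] failed - Data stale",
    ]
    for proc, n in tally_by_process(sample).most_common():
        print(proc, n)
```

Feeding it the full syslog instead of the sample would show at a glance how much of the log is usbhid-ups/upsmon output versus kernel or other messages.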



I also installed Netdata to help diagnose this, and it shows a large spike in iowait right before the server goes down.
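For a quick spot check of iowait without Netdata: on Linux, the aggregate `cpu` line of `/proc/stat` holds cumulative CPU time per state (user, nice, system, idle, iowait, irq, softirq, steal, then the guest fields), so the iowait share over an interval is the change in the iowait counter divided by the change in the total. The following is a minimal Python sketch under that assumption; the function names are hypothetical and the code is not taken from Netdata.

```python
import os
import time

def read_cpu_times(stat_text):
    """Return the counters from the aggregate 'cpu' line of /proc/stat as ints."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Order: user nice system idle iowait irq softirq steal [guest guest_nice]
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")

def iowait_percent(before, after):
    """Percentage of CPU time spent in iowait between two samples."""
    # guest and guest_nice are already included in user and nice,
    # so only the first 8 counters are summed to avoid double counting.
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return 100.0 * deltas[4] / total if total > 0 else 0.0

def sample_iowait(interval=1.0, path="/proc/stat"):
    """Take two samples `interval` seconds apart and return the iowait percentage."""
    with open(path) as f:
        before = read_cpu_times(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = read_cpu_times(f.read())
    return iowait_percent(before, after)

if __name__ == "__main__" and os.path.exists("/proc/stat"):
    print(f"iowait over the last second: {sample_iowait():.1f}%")
```

Summing only the first eight counters is a deliberate choice: including the guest fields would count that time twice and understate iowait.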

 

[Screenshot: Netdata graph of the iowait spike]

 

To narrow down the issue further, I've also uninstalled the Unassigned Devices plugin family and switched to ipvlan. Still no dice.


Another thing I noticed: if the dashboard is open in a tab, system load and iowait go up, and they seem to return to roughly normal after the tab is closed. Scratch that: CPU usage went back to very high a short while later.

 

(Initial Graph)

[Screenshot]

 

 

(10-ish mins later)

[Screenshot]

5 hours ago, bucketphobia said:

Another thing I noticed: if the dashboard is open in a tab, system load and iowait go up, and they seem to return to roughly normal after the tab is closed. Scratch that: CPU usage went back to very high a short while later.

Since iowait is surging, it may indicate a storage-related issue.

 

19 hours ago, bucketphobia said:

Disabling the Docker service does seem to resolve the issue, but I can't trace it to any particular container: if I stop all containers, the issue still persists.

If the problem still occurs with all containers stopped and only the Docker service running, that is odd. I'd suggest relocating the Docker storage to another storage device to test.


To rule out the Docker containers, I ran the server with all containers stopped, and the issue still occurred, ending in a server crash.

 

I guess it's an issue with the Docker service itself. Any ideas on potential fixes?

Netdata:

 

After the iowait spike, the server goes down and stays unresponsive until it is manually hard-reset. Stats across the board also drop (e.g., disk I/O, network, RAM).
[Screenshot: Netdata graphs]

Syslog right before this latest crash:

Oct  8 15:01:50 Omar usbhid-ups[5576]: nut_libusb_get_report: Input/Output Error
Oct  8 15:01:52 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:01:52 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:01:52 Omar upsd[5580]: Data for UPS [ups] is stale - check driver
Oct  8 15:01:54 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:01:56 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct  8 15:01:56 Omar upsmon[5584]: Communications with UPS [email protected] lost
Oct  8 15:01:58 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:01:58 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:01:59 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:00 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:01 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:01 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct  8 15:02:02 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:03 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:04 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:05 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:06 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct  8 15:02:06 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:07 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:08 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:09 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:10 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:11 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:11 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct  8 15:02:12 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:13 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:15 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:15 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:16 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct  8 15:02:17 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:17 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:19 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:19 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:21 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:21 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:21 Omar upsmon[5584]: Poll UPS [[email protected]] failed - Data stale
Oct  8 15:02:23 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct  8 15:02:23 Omar usbhid-ups[5576]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct  8 15:02:25 Omar usbhid-ups[5576]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround

 


It happened again, but this time there were some new entries in the syslog right before the crash:

 

Oct 10 06:06:22 Omar winbindd[15096]: [2023/10/10 06:06:22.578378,  0] ../../source3/winbindd/winbindd_samr.c:71(open_internal_samr_conn)
Oct 10 06:06:22 Omar winbindd[15096]:   open_internal_samr_conn: Could not connect to samr pipe: NT_STATUS_IO_TIMEOUT
...

Oct 10 06:30:15 Omar usbhid-ups[5571]: nut_libusb_get_report: Input/Output Error
Oct 10 06:30:17 Omar usbhid-ups[5571]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 10 06:30:17 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:17 Omar upsd[5575]: Data for UPS [ups] is stale - check driver
Oct 10 06:30:17 Omar upsmon[5579]: Poll UPS [[email protected]] failed - Data stale
Oct 10 06:30:17 Omar upsmon[5579]: Communications with UPS [email protected] lost
Oct 10 06:30:19 Omar usbhid-ups[5571]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 10 06:30:19 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:21 Omar usbhid-ups[5571]: device->Product is NULL so it is not possible to determine whether to activate max_report_size workaround
Oct 10 06:30:21 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:22 Omar upsmon[5579]: Poll UPS [[email protected]] failed - Data stale
... (some more of the UPS errors) ...
Oct 10 06:30:43 Omar usbhid-ups[5571]: libusb1: Could not open any HID devices: insufficient permissions on everything
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e0d6c4: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x10732670: -5
Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5

 

1 hour ago, bucketphobia said:
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x10722d14: -5
Oct 10 06:30:44 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt
Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5

These usually mean a flash drive problem.
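To check whether these SQUASHFS read errors keep recurring (for example, after swapping the flash drive), it can help to count how often each block address fails and which error code the kernel reports; `-5` corresponds to `EIO`, a generic I/O error. This is a minimal Python sketch, assuming the message format quoted above; `squashfs_failed_blocks` is a hypothetical helper.

```python
import errno
import re
from collections import Counter

# Assumed kernel message format, as quoted in this thread:
# "... kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5"
BLOCK_RE = re.compile(r"SQUASHFS error: Failed to read block (0x[0-9a-fA-F]+): (-?\d+)")

def squashfs_failed_blocks(lines):
    """Count failed SQUASHFS block reads per block address and per errno name."""
    blocks = Counter()
    errors = Counter()
    for line in lines:
        m = BLOCK_RE.search(line)
        if not m:
            continue  # e.g. the "xz decompression failed" lines carry no block address
        blocks[m.group(1).lower()] += 1
        code = abs(int(m.group(2)))
        # Map the numeric code to its symbolic name, e.g. 5 -> "EIO"
        errors[errno.errorcode.get(code, str(code))] += 1
    return blocks, errors

if __name__ == "__main__":
    sample = [
        "Oct 10 06:30:43 Omar kernel: SQUASHFS error: xz decompression failed, data probably corrupt",
        "Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e0d6c4: -5",
        "Oct 10 06:30:43 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5",
        "Oct 10 06:30:44 Omar kernel: SQUASHFS error: Failed to read block 0x2e4d66c: -5",
    ]
    blocks, errors = squashfs_failed_blocks(sample)
    for block, n in blocks.most_common():
        print(f"{block}: {n} failed read(s)")
    print(dict(errors))
```

If the same errors still show up after the flash drive has been replaced, that would suggest looking beyond the stick itself.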

11 hours ago, JorgeB said:

These usually mean a flash drive problem.

 

I replaced the flash drive, but unfortunately the same symptoms persist.

 

The server was slow for a few hours and then died a few minutes ago.

 

Any other ideas on things to try? I'm thinking of rolling back to 6.11 to see if that resolves it.

 

 


As an update, I made three changes since my last post, and it appears one of them has worked: I have ~84 hours of uptime so far with no sign of the issue recurring. Fingers crossed one of these is the long-term solution:

 

1) Updated the BIOS to the latest version (in my case, I have an i7-4790K and was running a BIOS that was quite old compared to the latest available).
2) Enabled the "Exclusive Shares" option introduced in Unraid 6.12.4 (I had previously done this manually for certain heavy containers like Plex, but enabling the option seems to have helped.)
3) Disabled the KASM container (honestly, I'm not sure how much this affects the outcome, but I've had it stopped this time around. I'll try using KASM again in the future and see whether the issue comes back.)

Thanks for all the help, Jorge. Much appreciated.

  • 3 weeks later...

For anyone who stumbles on this post in the future: I have confirmed that the fix for this specific issue is to enable the "Exclusive Shares" option in Unraid 6.12 and verify that it shows "Yes" for your appdata share.

 

I have brought back the containers I had disabled or changed, and everything runs as expected. If that option is turned off, the same iowait issue comes back almost immediately.
