December 27, 2022 I've been experiencing intermittent freezes where the UI and all container apps become unresponsive. I am still able to connect via SSH and IPMI, and rebooting via powerdown -r restores functionality for a while. I haven't been able to correlate the freezes with any particular container or service action. However, just prior to the most recent freeze this morning, I noticed high RAM usage on the dashboard (98%). top only accounted for ~55%, so I don't think the system was actually using all of the allocated RAM, but I thought I'd mention it. I've been running a Kiwi syslog server on another PC since the last hang, and the syslog from today is attached. MootowerSyslogCatchAll-2022-12-27.txt
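On the 98% dashboard figure versus ~55% in top: one possible explanation, which is an assumption and not something confirmed in this thread, is that the two numbers count memory differently, for example whether reclaimable cache is treated as "used". A minimal Python sketch that computes both views from `/proc/meminfo`-style text follows. The function names and the sample values are made up for illustration; only the `MemTotal`, `MemFree`, and `MemAvailable` field names come from the Linux `/proc/meminfo` format.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of {field: value in kB}."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Field: value" pairs
        parts = rest.split()
        if parts:
            values[key.strip()] = int(parts[0])
    return values


def usage_percentages(info):
    """Return (percent not free, percent not available).

    "Not free" counts cache and buffers as used; "not available" excludes
    memory the kernel estimates it could reclaim.
    """
    total = info["MemTotal"]
    not_free = 100.0 * (total - info["MemFree"]) / total
    not_available = 100.0 * (total - info["MemAvailable"]) / total
    return round(not_free, 1), round(not_available, 1)


# Illustrative values only, NOT taken from the attached diagnostics.
SAMPLE = """MemTotal:       16000000 kB
MemFree:          320000 kB
MemAvailable:    7200000 kB
"""
```

On the server itself, the text could be read with `open("/proc/meminfo").read()` and passed to `parse_meminfo`. A large gap between the two percentages would suggest cache rather than application memory, though that alone would not explain a freeze.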
December 27, 2022 Nothing relevant in the syslog. See if you can get the diagnostics next time; you can also try booting in safe mode to see if it helps.
December 27, 2022 Author Is there a way to get diagnostics via the command line? Are they reset upon reboot, or are they still available on the flash drive somewhere? Edited December 27, 2022 by VelcroBP (clarifying question)
December 29, 2022 Author Just had another OS freeze. The shares are still accessible, but none of the web UIs are. I was able to generate diagnostics via IPMI and have attached them. mootower-diagnostics-20221228-1931.zip
December 29, 2022 Author And here is the version created just before the powerdown reboot: mootower-diagnostics-20221228-1940.zip
December 29, 2022 Nothing relevant logged that I can see. Try booting in safe mode to rule out plugin issues.
December 30, 2022 Author I will try safe mode. But since I don't have a definitive way of forcing the issue, or any correlated events to test, how will I rule out plugin issues? Do I let it run, and if there's no freeze for some number of days, assume it's a plugin?
December 30, 2022 Basically yes. If it doesn't crash, start enabling services and plugins one by one.
December 30, 2022 Author Is this done by deleting all plugins (per the quote below from a different post), then re-installing them one by one and booting in normal mode? Can I leave the config folders in /plugins/ for the re-install? Quote: "Delete/rename all *.plg files in /boot/config/plugins, then re-enable one or a few at a time."
December 30, 2022 Author Quote (VelcroBP): "then re-install one by one" Or can I just restore a copy of the .plg file into the /plugins/ folder? I have backed up the folder to another PC.
January 2, 2023 Author So far it has been stable after a couple of days running in safe mode. I've renamed the extension of the .plg files. Clarification on the plugin testing: is it enough to install one at a time during the current safe boot session, or do I need to reboot into normal mode after restoring each .plg extension?
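The rename-and-restore approach discussed above can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `.disabled` suffix and the function names are hypothetical, chosen only for illustration, and the directory is passed in by the caller (the thread mentions /boot/config/plugins). It leaves the per-plugin config folders untouched, and it does not settle whether a reboot is needed after each restore.

```python
from pathlib import Path

# Hypothetical suffix; any extension other than .plg would serve the same purpose.
DISABLED_SUFFIX = ".disabled"


def disable_all(plugin_dir):
    """Rename every *.plg file in plugin_dir to *.plg.disabled.

    Returns the new file names in sorted order. Folders are not matched
    unless their own name ends in .plg.
    """
    renamed = []
    for plg in sorted(Path(plugin_dir).glob("*.plg")):
        target = plg.with_name(plg.name + DISABLED_SUFFIX)
        plg.rename(target)
        renamed.append(target.name)
    return renamed


def enable(plugin_dir, plugin_name):
    """Restore one plugin by renaming <name>.plg.disabled back to <name>.plg."""
    disabled = Path(plugin_dir) / (plugin_name + ".plg" + DISABLED_SUFFIX)
    restored = Path(plugin_dir) / (plugin_name + ".plg")
    disabled.rename(restored)
    return restored.name
```

Calling `disable_all(...)` once and then `enable(..., "<name>")` for one plugin every few days mirrors the "one or a few at a time" advice while keeping a record of what was re-enabled and when.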
January 14, 2023 Author So I disabled all plugins and have been running in normal mode since 12/30. Every couple of days I re-enabled a few plugins, with the final batch on 1/12. Until today there were no issues or freezes at all, so I thought all was well and that the issue had been a plugin that was fixed by reinstalling. Today I was testing the Roku Jellyfin app (having recently set up a container), and upon playing a file and returning to the menu, unRaid locked up again. It was the same as before: all shares are accessible, as is the console via SSH, but the UI and apps are not responding. I can log into the Main and container WebUIs, but nothing loads. I'm attaching a diagnostics zip from during the hang (generated via console command), as well as the auto-generated post-boot one. The freeze occurred at ~12:10. The syslog doesn't seem to report anything relevant, but I can attach it if needed. mootower-diagnostics- DURING - 20230114-1211.zip mootower-diagnostics- POST REBOOT - 20230114-1217.zip
January 15, 2023 Author I was not able to force a hang by replicating the actions with Jellyfin. It might have just coincidentally frozen while I was testing Jellyfin. Something to do with the client stream?
January 16, 2023 Author It just hung again. This time, a user was initiating a Plex remote stream that was transcoding from 1080 to SD (no idea why she would have her iOS Plex app set to SD). Attached are the diagnostics from during the freeze (~15:25) and from right after rebooting and starting the array. Any help parsing these for a potential cause to troubleshoot next would be greatly appreciated. I will continue disabling one plugin at a time unless other suggestions come in. Though the fact that the two most recent incidents involved Jellyfin playback or Plex transcoding/playback makes me inclined to think it's something related to video? I'm at a loss, really. mootower-diagnostics-20230116-1527 -- DURING FREEZE.zip mootower-diagnostics-20230116-1537 -- AFTER REBOOT and ARRAY START.zip
January 17, 2023 Author OK, thanks for looking. I will keep going with plugins, then I'll try disabling iGPU transcoding. Just grasping at straws, really. If I can't find a root cause, I hope to have funds in the next few months to rebuild my server and replace/upgrade everything but the data drives. It just needs to hang in there until then.
January 17, 2023 Author I had a thought last night; I don't know if it's relevant. I have been getting CRC errors on one of my data drives. I've kept an eye on it, and recently they started happening more frequently. I've swapped the cable and the mobo port, and the errors continued. Yesterday I moved the drive into a different bay in the Norco 3x5, and if the count still grows then I'm assuming it's a bad drive. With that said about the drive, is it possible that an error communicating with a data drive, such as during playback or a transcode, could cause the OS/UI to freeze? On several of my reboots from hangs, the system failed to POST due to a SMART error with that drive. I press F1 to retry and it boots normally. Just grasping at straws, really. I hope to have funds in the next few months to rebuild my server and replace/upgrade everything but the data drives. It just needs to hang in there until then lol.
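To check whether the CRC count still grows after the bay swap, the raw value of SMART attribute 199 (`UDMA_CRC_Error_Count`) can be compared between two readings. The Python sketch below assumes the text comes from `smartctl -A` in its usual ten-column attribute table, with the raw value in the last column. The function names and the sample lines are illustrative and are not taken from the attached diagnostics.

```python
def crc_error_count(smartctl_output):
    """Return the raw UDMA_CRC_Error_Count from `smartctl -A` text, or None.

    Assumes the standard attribute table layout, where the attribute name is
    the second column and the raw value starts at the tenth column.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count":
            return int(fields[9])
    return None


def crc_growth(before_text, after_text):
    """Return how much the CRC count grew between two readings, or None."""
    before = crc_error_count(before_text)
    after = crc_error_count(after_text)
    if before is None or after is None:
        return None
    return after - before


# Illustrative readings only; the numbers are invented for the example.
BEFORE = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       41"
AFTER = "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       47"
```

The raw count does not reset when the cable, port, or bay is changed, so only an increase after the swap is meaningful. A count that stays flat would point away from the drive itself.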
January 20, 2023 Author New theory: the issue occurs when Plex is transcoding AND a Nextcloud sync operation is running? That was the system state at the time of today's hang, anyway. So far Plex has been running during all the hangs I've been present for, and the only non-Plex one was with Jellyfin. Also new with today's freeze: the UI returned many errors about devices not being available for unmounting during the powerdown, resulting in an unclean shutdown. mootower-diagnostics-20230120-1458 - HANG TIME.zip mootower-diagnostics-20230120-1501 - POST BOOT.zip