glenner

Members · 38 posts
  1. Thanks Delarius. That's a brilliant solution... I'll go with the symbolic link. Thanks, -Glenner.
  2. Hi, I've noticed that my Logitech Media Server docker creates a log file that just balloons out of control. At one point it got to 80GB+ on my cache and caused my whole system to go unstable (see https://lime-technology.com/forums/topic/73395-all-my-dockers-are-missing-please-help). Right now it has grown to 3GB in two weeks and is still growing. See server.log below:

     root@unraid:/mnt/user/appdata/LogitechMediaServer/logs# ls -al --block-size=M
     total 3337M
     drwxrwxrwx 1 nobody users    1M Aug 14 22:41 ./
     drwxrwxrwx 1 nobody users    1M Jul 17  2017 ../
     -rw-rw-rw- 1 nobody users    0M Jul 17  2017 perfmon.log
     -rw-rw-rw- 1 nobody users    1M Jul 19  2017 scanner.log
     -rw-rw-rw- 1 nobody users 3337M Aug 31 18:04 server.log
     -rw-rw-rw- 1 nobody users    1M Jul 18  2017 spotifyfamily1d.log

     It is full of these errors, thousands of them coming out continuously:

     [18-08-16 00:43:56.6065] Slim::Utils::Misc::msg (1252) Warning: [00:43:56.6063] EV: error in callback (ignoring): Can't call method "display" on an undefined value at /usr/share/perl5/Slim/Display/Lib/TextVFD.pm line 157.
     [18-08-16 00:43:56.7562] Slim::Utils::Misc::msg (1252) Warning: [00:43:56.7561] EV: error in callback (ignoring): Can't call method "display" on an undefined value at /usr/share/perl5/Slim/Display/Lib/TextVFD.pm line 157.

     Has anyone seen this error (Google did not find much)? Does anyone know how I can limit the size of the log, either just for this docker or for all dockers? Thanks, -Glen.
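     One possible workaround for the question above, sketched only as an assumption and not the fix actually suggested in the thread: a small cron or User Scripts job that truncates server.log in place once it passes a size cap. The path is taken from the listing above; the 500MB threshold is made up, and lowering the verbosity in LMS's own logging settings would be the cleaner fix if it is enough.

         #!/bin/bash
         # Hypothetical size cap for the LMS server.log (path assumed from the post above).
         LOG=/mnt/user/appdata/LogitechMediaServer/logs/server.log
         MAX_BYTES=$((500 * 1024 * 1024))   # 500MB cap, adjust to taste

         if [ -f "$LOG" ] && [ "$(stat -c%s "$LOG")" -gt "$MAX_BYTES" ]; then
             # Truncate in place so the running container keeps writing to the same inode.
             : > "$LOG"
         fi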
  3. So last week, after my system seemed to become stable again once I deleted my runaway LMS log file, I ran the balance and then upgraded everything over the weekend. I updated my BIOS, unRAID 6.5.3, dockers, and plugins, and recreated my appdata backups. I also found some new dockers I needed and set those up too... :-) btrfs stats are still clean. I did a quick sanity check after each upgrade step to ensure the system was still stable. My system is up to date now. I did have to restore my Plex DB, as that did get corrupted in the initial outage. Fortunately, Plex keeps dated backups under appdata, so that's an easy fix. I don't see my instability issues (missing dockers, missing VMs, errors) returning... at least anytime soon.

     I've had unRAID for a year now, set up on a new custom pro build I bought last year. It's been solid and much better than the Windows box I used to run all my HTPC stuff on. The only issues I've seen over the last year that resulted in any kind of "outage" happened when some cache file got huge and out of control. I saw it a while ago with a 100GB+ SageTV recording that brought down my whole server (https://forums.sagetv.com/forums/showthread.php?t=64895), and more recently with a huge Logitech Media Server log file that also effectively brought down my server. As best as I can tell, interactions between the btrfs cache, mover settings, environment settings, and huge files can lead to issues. Once a cache file gets huge and there is "no space" left on the btrfs cache, all while a docker is actively attempting to write 20GB/hr to the cache, all bets are off.

     I'd like to find a way to get some kind of alert if I have a huge file brewing, or excess disk usage on my cache. That might have averted all the problems I've had so far. I should never see, say, a 20GB+ file on the system (some SageTV recordings, like a 3-hour sports program, could hit 20GB before being moved off to the array, but that's the biggest file I ever want to see on the cache). Not sure if there is a plugin for that (maybe "Fix Common Problems" could scan for it), but I'll see if I can find something, or set up some kind of automated file size scan in cron (a sketch of such a scan follows below).

     In any event, I'm on 6.5.3 and I think super stable again... Thanks for your help. I really appreciate it.
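     A minimal sketch of the kind of scheduled scan mentioned above: find anything on the cache over a size threshold and raise a notification. The 20GB threshold is taken from the post; the notifier path is an assumption about the stock unRAID notifier, and the script falls back to the syslog if it isn't present.

         #!/bin/bash
         # Hypothetical daily scan for oversized files on the cache pool.
         THRESHOLD=20G                                   # assumed limit from the post above
         NOTIFY=/usr/local/emhttp/webGui/scripts/notify  # stock unRAID notifier (assumed path)

         find /mnt/cache -type f -size +"$THRESHOLD" 2>/dev/null | while read -r f; do
             msg="Oversized cache file: $f ($(du -h "$f" | cut -f1))"
             if [ -x "$NOTIFY" ]; then
                 "$NOTIFY" -i warning -s "Cache file size alert" -d "$msg"
             else
                 logger "$msg"
             fi
         done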
  4. Thanks for the help on this upgrade trurl. I've confirmed everything is up and running smoothly on 6.5.3. Have all dockers and plugins updated and appdata backed up again. Cheers.
  5. Thanks a bunch! You're super helpful, and clearly good luck, as I'm up on 6.5.3 with no issues so far... Dockers and VMs are up, and I'm trying to take a tour of what's new. I'm going to reinstall my plugins now, assuming they all still apply in 6.5.3. I wonder if some plugins that were useful in 6.3.5 may no longer apply or be needed a year later in 6.5.3, or maybe some of them have been absorbed into the base OS:

       • Community Apps
       • unassigned devices
       • tips and tweaks (see screenshots for config)
       • CA backup/restore
       • dynamix system stats
       • dynamix system information
       • dynamix system temperature
       • dynamix system buttons
       • dynamix ssd trim

     I needed tips and tweaks in 6.3.5 to fix some cache parameters or I would get OOM errors (see screencap). That was covered earlier in this thread. I'm going to assume I still need these disk cache settings in 6.5.3 (a sketch of them follows below). I also was running dynamix SSD trim daily; I'll assume I need to re-enable that too.
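     As far as I can tell, the "disk cache settings" referred to above correspond to two kernel sysctls that Tips and Tweaks exposes; a rough sketch of the manual equivalents is below. The exact percentages are assumptions (the real values are in the screenshots), the idea being to keep the dirty page cache small so sustained cache writes don't exhaust RAM.

         # Hypothetical equivalent of the Tips and Tweaks disk-cache settings (values assumed).
         sysctl -w vm.dirty_background_ratio=1   # start background writeback early
         sysctl -w vm.dirty_ratio=2              # cap dirty pages at ~2% of RAM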
  6. I actually cannot find dynamix.plg on my /boot flash... Here are some commands I just ran. Can't find that file anywhere:

     root@unraid:/boot/config/plugins# ls -al
     total 112
     drwxrwxrwx 14 root root  4096 Aug 17 17:20 ./
     drwxrwxrwx  6 root root  4096 Aug 17 15:16 ../
     drwxrwxrwx  3 root root  4096 Mar 15 18:55 NerdPack/
     -rwxrwxrwx  1 root root  8138 Mar 15 18:55 NerdPack.plg*
     drwxrwxrwx  2 root root  4096 Aug 15 10:10 ca.cleanup.appdata/
     -rwxrwxrwx  1 root root  2679 Aug 15 10:10 ca.cleanup.appdata.plg*
     drwxrwxrwx  3 root root  4096 Mar 15 18:51 ca.update.applications/
     -rwxrwxrwx  1 root root  4352 Mar 15 18:51 ca.update.applications.plg*
     drwxrwxrwx  5 root root  4096 Jun 30  2017 dockerMan/
     drwxrwxrwx  3 root root  4096 Aug 17 15:37 dynamix/
     drwxrwxrwx  2 root root  4096 Apr 24 23:02 dynamix.apcupsd/
     drwxrwxrwx  2 root root  4096 Jul  4  2017 dynamix.vm.manager/
     drwxrwxrwx  3 root root  4096 Aug 17 14:50 fix.common.problems/
     -rwxrwxrwx  1 root root 12268 Aug 15 10:11 fix.common.problems.plg*
     drwxrwxrwx  2 root root  4096 Aug 15 10:11 preclear.disk/
     -rwxrwxrwx  1 root root 14622 Aug 15 10:11 preclear.disk.plg*
     drwxrwxrwx  2 root root  4096 Jun 30  2017 statistics.sender/
     drwxrwxrwx  2 root root  4096 Aug 17 15:25 unassigned.devices/
     drwxrwxrwx  3 root root  4096 Aug 15 10:13 user.scripts/
     -rwxrwxrwx  1 root root  5854 Aug 15 10:13 user.scripts.plg*

     root@unraid:/boot/config/plugins# find . -name dynamix.plg
     root@unraid:/boot/config/plugins# find . -name dynamix.cfg
     ./dynamix/dynamix.cfg

     root@unraid:/boot/config/plugins# ls -al dynamix
     total 44
     drwxrwxrwx  3 root root 4096 Aug 17 15:37 ./
     drwxrwxrwx 14 root root 4096 Aug 17 17:20 ../
     -rwxrwxrwx  1 root root  145 Feb 19 16:49 docker-update.cron*
     -rwxrwxrwx  1 root root  866 Aug 17 15:37 dynamix.cfg*
     -rwxrwxrwx  1 root root  116 Feb 19 16:49 monitor.cron*
     -rwxrwxrwx  1 root root   30 Aug 17 13:24 monitor.ini*
     -rwxrwxrwx  1 root root   73 Aug 17 14:40 mover.cron*
     -rwxrwxrwx  1 root root   88 Aug  6  2017 parity-check.cron*
     -rwxrwxrwx  1 root root  138 Feb 19 16:49 plugin-check.cron*
     -rwxrwxrwx  1 root root  120 Feb 19 16:49 status-check.cron*
     drwxrwxrwx  2 root root 4096 Jun 30  2017 users/
  7. When I did the "plugin update check" just now, it clearly shows me my 8 .plg files, and they can be mapped to my 8 plugins in the UI. "dynamix.plg" is the "Dynamix webGUI" built-in plugin. I don't remember installing this myself, so I'm guessing it must be part of 6.3.5. You think that's safe to manually delete from the flash config folder? And so the steps would be:

       1. Delete dynamix.plg from the flash.
       2. Reboot.
       3. Update the unRAID Server OS plugin to 6.5.3.
       4. Reboot.
  8. Ok... so I've removed a whole bunch of plugins I had on my 6.3.5 system: unassigned devices, all dynamix plugins, community apps, CA backup/restore, tips and tweaks, etc. But I still have one error from the update assistant:

     Checking for plugin updates
     Issue Found: dynamix.plg (dynamix) is not up to date. It is recommended to update all your plugins.
     Checking for plugin compatibility
     Issue Found: dynamix.plg (dynamix) is not known to Community Applications. Compatibility for this plugin CANNOT be determined and it may cause you issues.

     I don't have any dynamix plugins left, except for the dynamix webgui plugin, but that is "built-in" and so I can't remove it... All I have left are the 8 plugins shown in the screencap. Any thoughts on how I can clear this last error?
  9. Thanks trurl! I'm uninstalling any plugins that are bothersome... And so, just to be clear, I just stop the array and update the unRAID OS plugin to 6.5.3? I don't need to enter "maintenance mode"? I was trying to find some explicit instructions... it looks like you just stop the array. This is my first unRAID OS upgrade, so I'm hoping this goes smoothly :-). Thanks.
  10. Hi. I'm trying to move to the latest 6.5.3 unRAID OS today. Before I do the upgrade, I've been trying to clean up some things: I've upgraded my BIOS and updated my dockers and plugins... There are some plugins I cannot update because they require a newer version of unRAID. I ran the update assistant and got a few "issues"; see below. Do I have to worry about these plugin "issues found", or am I good to go with this upgrade? Thanks for any guidance!

      Disclaimer: This script is NOT definitive. There may be other issues with your server that will affect compatibility.
      Current unRaid Version: 6.3.5   Upgrade unRaid Version: 6.5.3
      Checking cache drive partitioning
      OK: Cache drive partition starts on sector 64
      Checking for plugin updates
      Issue Found: community.applications.plg (community.applications) is not up to date. It is recommended to update all your plugins.
      Issue Found: dynamix.plg (dynamix) is not up to date. It is recommended to update all your plugins.
      Issue Found: dynamix.system.stats.plg (dynamix.system.stats) is not up to date. It is recommended to update all your plugins.
      Issue Found: tips.and.tweaks.plg (tips.and.tweaks) is not up to date. It is recommended to update all your plugins.
      Issue Found: unassigned.devices.plg (unassigned.devices) is not up to date. It is recommended to update all your plugins.
      Checking for plugin compatibility
      Issue Found: ca.backup.plg (unassigned.devices) is deprecated for ALL unRaid versions. This does not necessarily mean you will have any issues with the plugin, but there are no guarantees. It is recommended to uninstall the plugin
      Issue Found: dynamix.plg (unassigned.devices) is not known to Community Applications. Compatibility for this plugin CANNOT be determined and it may cause you issues.
      Checking for extra parameters on emhttp
      OK: emhttp command in /boot/config/go contains no extra parameters
      Checking for zenstates on Ryzen CPU
      OK: Ryzen CPU not detected
      Checking for disabled disks
      OK: No disks are disabled
      Checking installed RAM
      OK: You have 4+ Gig of memory
      Checking flash drive
      OK: Flash drive is read/write
      Checking for valid NETBIOS name
      OK: NETBIOS server name is compliant.
      Checking for ancient version of dynamix.plg
      OK: Dynamix plugin not found
      Checking for VM MediaDir / DomainDir set to be /mnt
      OK: VM domain directory and ISO directory not set to be /mnt
      Checking for mover logging enabled
      Mover logging is enabled. While this isn't an issue, it is now recommended to disable this setting on all versions of unRaid. You can do this in Settings - Schedule - Mover Schedule.
      Issues have been found with your server that may hinder the OS upgrade. You should rectify those problems before upgrading
  11. Status update: It looks like deleting the runaway Logitech log file, and thereby freeing up a huge amount of space on my cache, has fixed my server. Here is what it looks like on my end:

      As I said in my last post, I stopped all the dockers last night and started making /mnt/cache backups. While using mc, I noticed it was taking a long time to copy over this file: /mnt/cache/appdata/LogitechMediaServer/logs/server.log. I checked the file and found it to be a year's worth of obscenely verbose logging by LMS; the file was 85GB. I trashed the file and redid the cache backup. My cache usage has since dropped to 62GB used out of 250GB. I have not run the btrfs balance at this point.

      I rebooted and started the array. All of my dockers were immediately back up and running. My Windows 10 VM is also back up and running. I have zero errors when running:

      btrfs dev stats /mnt/cache

      I did not rebuild my cache, docker.img, or VM. I only deleted the one 85GB log file and rebooted... I'm not convinced my docker image or VM are corrupted; they don't appear to be, as best as I can tell. I checked the syslog and app logs and don't see anything amiss. No errors that I can see... I've since updated a few plugins and dockers, and stopped and started dockers from the UI. It all works. I've posted my latest diags.

      I'm pretty sure all of the issues I have had, including the write errors, are a result of the one out-of-control log file, and the way the btrfs cache seems to operate in this particular situation where it thinks there is no space left for some reason (even though I should have still had 90GB free even with the massive log file present). I'll continue monitoring it over the next while to see if anything changes (a sketch of an automated check is below), but it looks pretty clear to me that this is what has happened in this case.

      unraid-diagnostics-20180815-2124.zip
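      One way to automate that monitoring, sketched under assumptions: a daily check that raises an alert as soon as any btrfs device error counter on the cache pool goes non-zero. The notifier path is assumed to be the stock unRAID script; the check falls back to the syslog otherwise.

          #!/bin/bash
          # Hypothetical daily btrfs error-counter check for the cache pool.
          NOTIFY=/usr/local/emhttp/webGui/scripts/notify   # stock unRAID notifier (assumed path)
          errors=$(btrfs dev stats /mnt/cache | awk '$2 != 0')

          if [ -n "$errors" ]; then
              if [ -x "$NOTIFY" ]; then
                  "$NOTIFY" -i alert -s "btrfs cache errors detected" -d "$errors"
              else
                  logger "btrfs cache errors detected: $errors"
              fi
          fi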
  12. Thanks Johnnie. I have not run the rebalance just yet... I will run the balance after I make some backups of my cache. I'll try:

      btrfs balance start -dusage=75 /mnt/cache

      Right now, I've been taking screencaps of as much of my docker and system config as possible, in case I need to more fully rebuild my whole system for some reason. I'm also trying to use mc, rsync, and CA backup/restore to create a backup of /mnt/cache (an rsync sketch follows below). You can never have too many backups at times like this. I also want to see what CrashPlan may have backed up for me if I can get that docker up. I do have a CrashPlan backup on my array, but can't tell what's on it. Note to self: a CrashPlan backup on the array is not very useful if the CrashPlan docker is offline and unusable.

      The first thing I noticed is that Logitech Media Server has a log file that is out of control. A year's worth of logging has produced an 85GB file, more than half of my stated used cache. I've trashed the log. I'll need to figure out how to limit that log going forward. Damn! Ideally this kind of runaway file should just not be allowed, or maybe an alert could be triggered? Will need to look at that... But now I'm wondering if this runaway log could have triggered most if not all of the issues I've been having, including the write errors?

      root@unraid:/mnt/cache/appdata/LogitechMediaServer/logs# ls --block-size=M -al
      total 84545M
      drwxrwxrwx 1 nobody users     1M Jul 19  2017 ./
      drwxrwxrwx 1 nobody users     1M Jul 17  2017 ../
      -rw-rw-rw- 1 nobody users     0M Jul 17  2017 perfmon.log
      -rw-rw-rw- 1 nobody users     1M Jul 19  2017 scanner.log
      -rw-rw-rw- 1 nobody users 84545M Aug 13 01:05 server.log
      -rw-rw-rw- 1 nobody users     1M Jul 18  2017 spotifyfamily1d.log
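      A minimal sketch of the rsync variant of that cache backup, under the assumption of a backup share on disk1 and with dockers and VMs stopped first so files aren't changing mid-copy:

          # Hypothetical one-shot backup of the cache pool to the array (paths assumed).
          # -a preserves permissions/ownership; --delete keeps repeat runs in sync.
          rsync -a --delete --info=progress2 /mnt/cache/ /mnt/disk1/backups/cache/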
  13. Thanks Johnnie. This is what I've been seeing in the 24 hours after resetting the cache disk error stats:

      1. I don't have any new errors since:

         root@unraid:/mnt# btrfs dev stats /mnt/cache
         [/dev/nvme0n1p1].write_io_errs    0
         [/dev/nvme0n1p1].read_io_errs     0
         [/dev/nvme0n1p1].flush_io_errs    0
         [/dev/nvme0n1p1].corruption_errs  0
         [/dev/nvme0n1p1].generation_errs  0
         [/dev/nvme1n1p1].write_io_errs    0
         [/dev/nvme1n1p1].read_io_errs     0
         [/dev/nvme1n1p1].flush_io_errs    0
         [/dev/nvme1n1p1].corruption_errs  0
         [/dev/nvme1n1p1].generation_errs  0

      2. I have 9 dockers configured, and usually they would all be up. Right now I'm only running 5 dockers: sagetv, logitech media server, deluge, sickrage, and openvpn. Crashplan, duckdns, handbrake, and plex are shut down.

      3. My SageTV docker recorded a bunch of shows last night (cache writes), and I was able to watch another show simultaneously (cache and array reads depending on what I'm watching). Last night's recordings would result in 20GB+ being written to the cache. No issues in these recordings.

      4. My SageTV front-end UI did slow down and become erratic while I was watching a show last night, at about Aug 13 23:49:03.

      5. See the syslog. I start getting errors like those below. Between 20:04 (last mover run) and 23:49, SageTV is busy recording, let's say, ~20GB+ of prime-time shows.

         Aug 13 20:04:16 unraid root: mover finished
         Aug 13 23:49:03 unraid shfs/user: err: shfs_write: write: (28) No space left on device
         Aug 13 23:49:03 unraid shfs/user: err: shfs_write: write: (28) No space left on device
         Aug 13 23:52:15 unraid kernel: loop: Write error at byte offset 3852955648, length 4096.
         Aug 13 23:52:15 unraid kernel: blk_update_request: I/O error, dev loop1, sector 7525296
         Aug 13 23:52:15 unraid kernel: BTRFS error (device loop1): bdev /dev/loop1 errs: wr 433, rd 0, flush 0, corrupt 0, gen 0
         Aug 13 23:52:15 unraid shfs/user: err: shfs_write: write: (28) No space left on device
         Aug 13 23:52:15 unraid shfs/user: err: shfs_write: write: (28) No space left on device
         Aug 13 23:54:31 unraid shfs/user: err: shfs_write: write: (28) No space left on device
         Aug 13 23:54:31 unraid shfs/user: err: shfs_write: write: (28) No space left on device
         Aug 13 23:57:06 unraid shfs/user: err: shfs_write: write: (28) No space left on device

      6. These errors stop once I run the mover, which finishes at:

         Aug 14 00:24:29 unraid root: mover finished

         The mover has moved the recently recorded shows from the cache to the array, thereby clearing ~20GB from the cache.

      7. I have since changed the mover to run hourly in order to keep the cache as lean as possible (see the sketch below), and have not had these out-of-space errors so far today.

      8. So the BTRFS errors and the "no space left on device" errors depend on whether I run the mover or not.

      9. My system should have lots of free space on the cache, so it's not clear to me why it thinks it runs out of space unless the mover moves 10-20GB off the cache on an hourly basis. The Main tab reports 156/250 GB used, with 94GB free. I should not be out of space?

      10. Even with all these "BTRFS error (device loop1)" entries in the log, the error stats are still 0.

      So wondering what you think... I realize I still need to rebuild my cache to fix my system, but when would we expect my SSD write errors to recur? Wouldn't writing 20GB to the SSDs last night cause the error counts to increment? Is it time to replace the SSDs, or wait a bit longer and try to just restore my cache on my current hardware?

      syslog.txt
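      For reference, a sketch of what the hourly mover schedule mentioned in point 7 amounts to. Normally this is set through Settings > Scheduler rather than edited by hand, and the mover script location is an assumption about this unRAID version.

          # Hypothetical hourly entry, the equivalent of /boot/config/plugins/dynamix/mover.cron
          0 * * * * /usr/local/sbin/mover &> /dev/null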
  14. For #1, I'm getting set to do that and create an appdata backup. I'm just avoiding shutting down my dockers while my wife and kids are watching TV... WAF is an issue for me; my server is mission critical. :-) But could I also just use midnight commander (mc) to make a full copy of my /mnt/cache folder to a backup folder on the array? Will that work too? Do you have to shut down dockers before doing a backup? I did have crashplan installed at one point and it was backing up appdata, but I'd rather not use crashplan if I can avoid it. Either #1 or mc sounds much easier to me.
  15. Thanks johnnie.black. Ran it just now, here is the result. It does seem like a lot of errors. I'm not sure when these stats were last reset... On a working system you will only ever see 0's here? Or can some kind of docker software issue also errors? ie. Plex transcoding issues, SageTV tuner hardware loses signal and results in corrupted TV mpg recording (this happens sometimes). I'm just trying to make sure... Are you suggesting I need to pull these SSD cards and put in new ones? I'll post updated status tomorrow... Right now, I still I mostly only have SageTV up and running and hitting the cache at up to 10-18GB/hr or so at times... So lots of IO goes to this cache on a daily basis. I think I need to shutdown my dockers to make a cache and appdata backup... I'm trying to figure out how to recover my system and rebuild my cache.... root@unraid:/mnt/cache# btrfs dev stats -z /mnt/cache [/dev/nvme0n1p1].write_io_errs 14402 [/dev/nvme0n1p1].read_io_errs 1 [/dev/nvme0n1p1].flush_io_errs 0 [/dev/nvme0n1p1].corruption_errs 0 [/dev/nvme0n1p1].generation_errs 0 [/dev/nvme1n1p1].write_io_errs 200298 [/dev/nvme1n1p1].read_io_errs 0 [/dev/nvme1n1p1].flush_io_errs 0 [/dev/nvme1n1p1].corruption_errs 0 [/dev/nvme1n1p1].generation_errs 0