MarkRMonaco

Members
  • Posts

    85

Everything posted by MarkRMonaco

  1. @Linus, glad to hear it worked out for you. I got lucky on my end and my system stabilized on 6.9.1.
  2. Update, I'm just shy of 28hrs of uptime... I'm leaning towards this issue being resolved.
  3. Not that I'm trying to jinx myself, but the system has been online for about 13hrs now. We'll see if it remains online while I'm at work...
  4. @Linus, let me know if you had any luck after downgrading. I think one of the reasons my downgrade attempt was unsuccessful is that my cache drive required an XFS repair (as I mentioned in my previous post). I did wind up going back to 6.9.1 since I was not seeing any difference in stability on 6.9.0-rc2. Unfortunately, I ran into another lock-up this morning after that and the XFS repair. As in your other post, I was unable to find anything useful in the syslog from before or after I brought the system back online. Therefore, I wound up starting a separate topic so I could post logs, etc.
  5. Thanks @Hoopster. Those traces were from the crash/lock-up that occurred overnight. I brought the system back online around 7am (CST) this morning and ran an XFS repair on my cache drive (once I saw the "rcu_sched self-detected stall on CPU" error in the logs, and checked the forum). After that, I rebooted the system at least one more time (maybe two) before I went to work. Unfortunately, the syslog did not have any further mentions of "traces" or "self-detected stalls" before or after the most recent lock-up this morning (10:28:56 am CST).
  6. Additional things that I've checked:
     • BIOS Version - Was one version behind. Just brought it current (after the most recent lock-up).
     • Global C-States (BIOS) - Verified it was disabled
     • Current Control (BIOS) - Verified it was set to "Typical Current Idle"
     • XMP Profiles (BIOS) - Verified it was disabled
     • Downcore Control (BIOS) - Verified it was disabled
     • Docker - "Host access to custom networks" was already disabled/off.
  7. Over the past few days, my server has been going into an unresponsive state at random times. My only recourse has been to force the system down via the power button. I originally captured an "rcu_sched self-detected stall on CPU" error early this morning (before it locked up). Once I brought the system back online, I ran an XFS repair on my cache drive (after reading this post on the forum), and have not seen any further instances of the error.

     Mar 30 02:54:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU

     However, a few hours after rebooting and fixing the cache drive, the system went unresponsive again (around 10:28:56 am CST) this morning while I was at work. Once I got back home, I brought the system back online around 5:41:45 pm CST (17:41:45) after forcing a shutdown with the power button. Unfortunately, the syslog did not have anything useful this time (no mentions of "traces" or "self-detected stall"):

     Mar 30 10:00:01 WadeWilson crond[2094]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
     Mar 30 10:28:50 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38
     Mar 30 10:28:56 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38
     Mar 30 17:41:45 WadeWilson kernel: mdcmd (36): set md_write_method 1
     Mar 30 17:41:45 WadeWilson kernel:
     Mar 30 17:41:45 WadeWilson root: Delaying execution of fix common problems scan for 10 minutes
     Mar 30 17:41:45 WadeWilson unassigned.devices: Mounting 'Auto Mount' Devices...

     A few additional notes:
     • I was concerned that my system was experiencing a bug reported in the forum, since the majority of my docker containers are using a static IP on br0. However, I have not found/seen any "kernel panics" in the syslog either.
     • I did try to downgrade to 6.8 yesterday (3/29). However, I was unable to start the array because of my cache drive (I assume because the drive needed the XFS repair and I didn't know it).
     • I also wound up trying 6.9.0-rc2 again yesterday (3/29), because I did not recall having these stability issues while on it. However, that did not make any improvement. Therefore, I went back to 6.9.1.
     • The stability issues were not present when I originally upgraded to 6.9 and 6.9.1 stable when each was released.
     • The only other item worth mentioning is that my server lost power about a week ago due to a local power outage in my neighborhood. The system currently does not have a UPS connected to it.

     wadewilson-diagnostics-20210330-1747.zip
     syslog-172.28.3.249.log
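For anyone trying to pin down when a lock-up like this started, here is a minimal Python sketch (not something from the original post) that finds the longest silent stretch in a syslog and lists any stall messages. The function names `parse_ts`, `largest_gap`, and `stall_lines` are hypothetical, the sample lines are copied from the excerpts above, and the year is an assumption because classic syslog timestamps do not record it.

```python
from datetime import datetime


def parse_ts(line, year=2021):
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a classic syslog line.

    The year is not stored in the log, so it is supplied as an assumption.
    Lines that do not start with a timestamp will raise ValueError.
    """
    return datetime.strptime(f"{year} {line[:15]}", "%Y %b %d %H:%M:%S")


def largest_gap(lines, year=2021):
    """Return (gap, line_before, line_after) for the longest silence in the log."""
    stamped = [(parse_ts(l, year), l) for l in lines if l.strip()]
    best = None
    for (t0, before), (t1, after) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, before, after)
    return best


def stall_lines(lines):
    """Return every line that mentions an RCU self-detected stall."""
    return [l for l in lines if "self-detected stall" in l]


# Sample lines taken from the log excerpts quoted in the post.
SAMPLE = [
    "Mar 30 02:54:36 WadeWilson kernel: rcu: INFO: rcu_sched self-detected stall on CPU",
    "Mar 30 10:00:01 WadeWilson crond[2094]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null",
    "Mar 30 10:28:50 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38",
    "Mar 30 10:28:56 WadeWilson dhcpcd[1991]: br0: Router Advertisement from fe80::aa5e:45ff:feee:1a38",
    "Mar 30 17:41:45 WadeWilson kernel: mdcmd (36): set md_write_method 1",
]

if __name__ == "__main__":
    gap, before, after = largest_gap(SAMPLE)
    print("Longest silence:", gap)
    print("  last line before:", before)
    print("  first line after:", after)
    for line in stall_lines(SAMPLE):
        print("Stall:", line)
```

On a real log the last line before the longest gap is usually the place to start reading backwards, although a quiet but healthy system can also produce long gaps.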
  8. Spoke too soon with mine... Went down overnight. Logs showed a self-reported CPU stall. From what I've seen on the forum, it's pointing to cache drive corruption. So, I ran xfs_repair and rebooted. Back to monitoring...
  9. @Linus, I was running into stability issues as well on my server with 6.9.1-stable, and the frequency of freezing/locking up increased over the past few days (I would get only a few hours or so of stability after bringing the system back online). For me, I may have been experiencing a bug (that others reported) regarding using Docker containers on br0 (with a static IP). However, I was never able to capture the telltale "kernel panic" error in the logs (due to my syslog server config previously not working). I attempted downgrading back to 6.8 (since using VLANs was not an option for me). Unfortunately, it did not recognize my cache drive as being formatted as XFS. Therefore, I used the Unraid USB Creator tool, and downloaded 6.9.0-rc2 (which worked for me in the past). Afterward, I restored my config backup, booted up my server, and have been stable so far (I'll have to see if it remains online overnight)...
  10. I was finally able to get my syslog working thanks to this topic:
  11. I was able to get my Forge server to work for the time being with the docker image from "Veriwind's Repository". However, I would eventually like to get it to work under Binhex's.
  12. Thanks @SpaceInvaderOne for the tutorial info. My question pertains to working around the container wanting to use the latest server jar version, so that it uses a specific older one instead. I checked the docker tags available and they do not go back far enough (in terms of versions). That's why I was asking if there is a variable (etc.) that I can use to point to a specific jar file in the appdata folder (named differently than the one that gets auto-updated).
  13. Has anyone figured out how to specify an alternate server jar file? I'm trying to run a v1.12.2 instance for Forge. I originally had it running under Binhex's MineOS container, but found that the WebGUI for it would stop working whenever the Forge server was running.
  14. Hi, I have an issue that I've been trying to address where the kernel reports that the clock is unsynchronized in the syslog. I have already rebuilt my flash drive with a fresh copy of Unraid and restored my config files a few weeks ago (for a different issue). To combat this, I have a user script that runs on a cron schedule (currently set to 2hrs) to force an NTP resync, but I would like to get rid of the error altogether. Also, I have already replaced the CMOS battery on the motherboard last month. When checking the system time (under Date & Time) on the dashboard, the clock does appear to be accurate (at least for the hour & minute).

      Error:
      Feb 11 13:29:54 WadeWilson ntpd[2070]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized

      Any ideas or should I simply ignore it? Logs are attached.

      wadewilson-diagnostics-20210211-1334.zip
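As a side note on reading that error, here is a rough Python sketch (not from the original post) that decodes the hex value in the ntpd message. It assumes the number is the kernel clock status word and that the bit names match those in the Linux `<linux/timex.h>` header; both are worth checking against your own system. The function name `decode_time_error` is hypothetical.

```python
import re

# Low status bits as named in the Linux kernel's <linux/timex.h>.
# Assumption: ntpd prints this status word in its TIME_ERROR message.
STA_FLAGS = {
    0x0001: "STA_PLL",       # PLL updates enabled
    0x0002: "STA_PPSFREQ",   # PPS frequency discipline enabled
    0x0004: "STA_PPSTIME",   # PPS time discipline enabled
    0x0008: "STA_FLL",       # FLL mode selected
    0x0010: "STA_INS",       # insert leap second
    0x0020: "STA_DEL",       # delete leap second
    0x0040: "STA_UNSYNC",    # clock unsynchronized
    0x0080: "STA_FREQHOLD",  # hold frequency
}


def decode_time_error(line):
    """Return the names of the status bits set in a TIME_ERROR log line.

    Returns None when the line contains no TIME_ERROR value.
    """
    match = re.search(r"TIME_ERROR: (0x[0-9a-fA-F]+)", line)
    if match is None:
        return None
    status = int(match.group(1), 16)
    return [name for bit, name in sorted(STA_FLAGS.items()) if status & bit]


# The line quoted in the post.
LINE = ("Feb 11 13:29:54 WadeWilson ntpd[2070]: "
        "kernel reports TIME_ERROR: 0x41: Clock Unsynchronized")

if __name__ == "__main__":
    print(decode_time_error(LINE))
```

Under those assumptions, 0x41 is simply the "unsynchronized" bit plus the PLL bit, which matches the text ntpd prints after the number.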
  15. I'm about to hit 20hrs since my last reboot. So, I guess we can call this one solved.
  16. Since XMP was the last thing that I turned off and the system has been fine since, I can at least rule out the RAM being bad. Just odd that this happened now (unless it was building over time).
  17. @Altheran, if you have not done it already, I would suggest moving Plex to its own dedicated SSD via unassigned devices. I have mine running on a 2nd NVMe SSD formatted as XFS. From there, I have my Plex docker container's appdata mapped to the unassigned drive's mount point. It's helped to take the load off Unraid's SSD (not only in performance but storage used) and has not been affected by my recent stability issues. You can use Krusader to move the Plex appdata when its container is not running.
  18. I let the system run for a little over two hours and went ahead with reformatting my cache drive to XFS.
  19. @jonathanm, that would make a lot more sense. I have two slots configured (as I was going to add a 2nd drive a few months ago, but never did). I'll worry about reformatting once I get my system stabilized.
  20. Another update: I had a few more issues pop up last night. First, the WebGUI was complaining that the license file was missing/corrupted. So, I redownloaded a fresh copy of my key file, and placed it on the USB drive. I then moved the drive to a different port on the computer (because I was also seeing a mention of "reset SuperSpeed Gen 1 USB device" in the syslog). Once the system was back up, it ran for several hours without issues (before I went to bed).

      When I woke up this morning, I found that the system was unresponsive (hard-locked). I verified in the BIOS that Global C-States was already off, and typical current was already enabled. Therefore, I turned off spread spectrum (since XMP was enabled at that time). Once it was back up, I added "rcu_nocbs=0-15" to the syslinux config and rebooted (at the time I didn't realize that I had mistyped it and had "cu_nocbs=0-15" in the config).

      From there, I went out to the store and came back a few hours later to another hard-lock. This time, I went back into the BIOS and turned off XMP. Once the system was back up, I corrected the "rcu_nocbs=0-15" entry and rebooted. From there, I opened PuTTY on my other computer and began a tail on the syslog.

      Note, I already have the syslog server enabled on the Unraid system with it looping back to itself, but have never been able to get anything to write to the share. As for the USB drive itself, I have Unraid configured to use UEFI mode.
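A typo like "cu_nocbs" is easy to miss by eye, so here is a small Python sketch (not from the original post) that checks an `append` line for the kernel parameters you expect. The function names and the sample `append` lines are hypothetical; paste in the line from your own syslinux config.

```python
def kernel_params(append_line):
    """Split a syslinux 'append' line into a {name: value-or-None} dict."""
    tokens = append_line.split()
    if tokens and tokens[0].lower() == "append":
        tokens = tokens[1:]
    params = {}
    for token in tokens:
        name, sep, value = token.partition("=")
        params[name] = value if sep else None
    return params


def missing_params(append_line, expected):
    """Return the expected parameter names that are absent from the line."""
    found = kernel_params(append_line)
    return [name for name in expected if name not in found]


# Hypothetical append lines: one with the typo described above, one corrected.
MISTYPED = "append cu_nocbs=0-15 initrd=/bzroot"
CORRECTED = "append rcu_nocbs=0-15 initrd=/bzroot"

if __name__ == "__main__":
    print(missing_params(MISTYPED, ["rcu_nocbs"]))   # the typo leaves it missing
    print(missing_params(CORRECTED, ["rcu_nocbs"]))  # nothing missing
```

The kernel generally ignores parameters it does not recognize, which is why the mistyped entry would not have produced an obvious error at boot.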
  21. Thanks. I'll have to look into that. When I reassigned the cache drive, it automatically formatted as btrfs.
  22. Well, this morning I logged in and found that the WebGUI was reporting that it couldn't access flash. So, I powered the server down, reformatted the thumb drive with a fresh copy of 6.9-rc2, and restored my config backup. Now, I need to start the parity check all over again... joy.
  23. I agree @Squid. I'm assuming there is a chance that the appdata may have had some corruption in it when it was backed up before the reformat. At the moment, everything seems to be ok and any btrfs errors were corrected (according to the logs). Since it is doing a parity check after the forced reboot, I'm going to let it sit for the time being. If I see anything else pop up in the system log or if any other abnormal activity occurs, I'll reply here (hopefully, w/ logs).
  24. Just another update. I ran into some issues while restoring my docker containers from my saved templates. It would occasionally cause the docker service to stop. In most cases, I was able to stop the array and restart it, which would get docker running again. However, at some point, stopping the array would get hung up at the cache drive. Thankfully, I was able to unmount it via the terminal with "umount -l /mnt/cache". At that point, I rebooted the server and immediately ran btrfs scrub again. Errors/corruption were detected, but corrected.

      Scrub device /dev/nvme0n1p1 (id 1) done
      Scrub started: Thu Jan 28 21:51:42 2021
      Status: finished
      Duration: 0:00:54
      Total to scrub: 235.02GiB
      Rate: 3.15GiB/s
      Error summary: no errors found
      WARNING: errors detected during scrubbing, corrected

      Unfortunately, I forgot to pull logs before I rebooted... I'll continue to keep an eye on it and will report back (w/ logs) if anything changes.
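Since the scrub output above says "no errors found" and also warns that errors were corrected, here is a minimal Python sketch (not from the original post) that pulls both pieces out of the text so the mismatch is easy to spot. It only relies on the wording quoted in this post; the function name `parse_scrub` is hypothetical, and other btrfs versions may word things differently.

```python
def parse_scrub(output):
    """Split scrub status text into 'Key: value' fields and WARNING messages."""
    fields, warnings = {}, []
    for raw in output.splitlines():
        line = raw.strip()
        if line.startswith("WARNING:"):
            warnings.append(line[len("WARNING:"):].strip())
        elif ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip()] = value.strip()
    return fields, warnings


# Output as quoted in the post.
SCRUB_OUTPUT = """\
Scrub device /dev/nvme0n1p1 (id 1) done
Scrub started: Thu Jan 28 21:51:42 2021
Status: finished
Duration: 0:00:54
Total to scrub: 235.02GiB
Rate: 3.15GiB/s
Error summary: no errors found
WARNING: errors detected during scrubbing, corrected
"""

if __name__ == "__main__":
    fields, warnings = parse_scrub(SCRUB_OUTPUT)
    print("Summary:", fields.get("Error summary"))
    for message in warnings:
        print("Warning:", message)
    if warnings and fields.get("Error summary") == "no errors found":
        print("Summary is clean but warnings were logged; worth another scrub.")
```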