MarkRMonaco

Posts posted by MarkRMonaco

  1. @Altheran, if you have not done so already, I would suggest moving Plex to its own dedicated SSD via Unassigned Devices. I have mine running on a second NVMe SSD formatted as XFS. From there, I have my Plex Docker container's appdata mapped to the unassigned drive's mount point. It has helped take load off Unraid's SSD (in both performance and storage used) and has not been affected by my recent stability issues.

     

    You can use Krusader to move the Plex appdata when its container is not running.
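If the appdata is copied first rather than moved, it may be worth comparing the two trees before deleting the original. A minimal sketch, assuming a plain file-by-file comparison of relative paths and sizes is good enough (it does not check contents, ownership, or symlinks), and with both example paths being hypothetical:

```python
import os


def tree_listing(root):
    """Map each regular file's path (relative to root) to its size in bytes."""
    listing = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            if os.path.islink(full):
                continue  # symlinks are skipped in this sketch
            listing[os.path.relpath(full, root)] = os.path.getsize(full)
    return listing


def compare_trees(src, dst):
    """Return (missing_in_dst, size_mismatches) as sorted lists of relative paths."""
    src_files = tree_listing(src)
    dst_files = tree_listing(dst)
    missing = sorted(p for p in src_files if p not in dst_files)
    mismatched = sorted(
        p for p in src_files if p in dst_files and src_files[p] != dst_files[p]
    )
    return missing, mismatched


# Hypothetical usage; substitute your own appdata and mount-point paths:
#   missing, mismatched = compare_trees("/mnt/user/appdata/plex",
#                                       "/mnt/disks/plex_ssd/plex")
```

Two empty lists only mean that every file exists on the destination with the same size. Run it while the container is stopped so the files are not changing underneath it.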

  2. 27 minutes ago, jonathanm said:

    You must change the desired format type to XFS if that's what you want to use. That option will not be available if you have more than 1 slot available for cache disks.

    @jonathanm, that would make a lot more sense. I have two slots configured (as I was going to add a 2nd drive a few months ago, but never did). I'll worry about reformatting once I get my system stabilized.

  3. Another update: I had a few more issues pop up last night. First, the WebGUI was complaining that the license file was missing/corrupted. So, I redownloaded a fresh copy of my key file and placed it on the USB drive. I then moved the drive to a different port on the computer (because I was also seeing a mention of "reset SuperSpeed Gen 1 USB device" in the syslog). Once the system was back up, it ran for several hours without issues (before I went to bed).

     

    When I woke up this morning, I found the system unresponsive (hard-locked). I verified in the BIOS that Global C-States was already off and that typical current was already enabled. Therefore, I turned off spread spectrum (since XMP was enabled at that time). Once it was back up, I added "rcu_nocbs=0-15" to the syslinux config and rebooted (at the time I didn't realize that I had mistyped it and had "cu_nocbs=0-15" in the config).

     

    From there, I went out to the store and came back a few hours later to another hard-lock. This time, I went back into the BIOS and turned off XMP. Once the system was back up, I corrected the "rcu_nocbs=0-15" entry and rebooted. From there, I opened PuTTY on my other computer and began a tail on the syslog. Note: I already have the syslog server enabled on the Unraid system with it looping back to itself, but have never been able to get anything to write to the share.

     

    As for the USB drive itself, I have Unraid configured to use UEFI mode.
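A mistyped kernel parameter like the "cu_nocbs" one above does not stop the system from booting, so it is easy to miss. A minimal sketch of a check for near-miss names on the "append" lines of a syslinux config, assuming you supply the set of parameter names you expect (the EXPECTED_PARAMS set and the function names here are hypothetical, not part of Unraid):

```python
import difflib

# Parameter names this sketch expects to see; extend the set for your own config.
EXPECTED_PARAMS = {"rcu_nocbs", "initrd"}


def append_params(cfg_text):
    """Return the parameter names found on every 'append' line of the config text."""
    names = []
    for line in cfg_text.splitlines():
        parts = line.split()
        if parts and parts[0].lower() == "append":
            for token in parts[1:]:
                # "rcu_nocbs=0-15" -> "rcu_nocbs"; bare flags are kept as-is
                names.append(token.split("=", 1)[0])
    return names


def likely_typos(cfg_text, expected=EXPECTED_PARAMS):
    """Return (found, suggestion) pairs for names that almost match an expected one."""
    results = []
    for name in append_params(cfg_text):
        if name in expected:
            continue
        close = difflib.get_close_matches(name, sorted(expected), n=1, cutoff=0.8)
        if close:
            results.append((name, close[0]))
    return results
```

Feeding it the contents of the syslinux config with the mistyped entry should flag "cu_nocbs" and suggest "rcu_nocbs". Names that are not close to anything in the expected set are ignored rather than reported.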

  4. I agree @Squid. I'm assuming there is a chance that the appdata may have had some corruption in it when it was backed up before the reformat. At the moment, everything seems to be OK and any btrfs errors were corrected (according to the logs). Since it is doing a parity check after the forced reboot, I'm going to let it sit for the time being. If I see anything else pop up in the system log or if any other abnormal activity occurs, I'll reply back here (hopefully, w/ logs).

  5. Just another update. Ran into some issues while restoring my docker containers from my saved templates. It would occasionally cause the docker service to stop. In most cases, I was able to stop the array and restart it, which would get docker running again. However, at some point, stopping the array would get hung up at the cache drive. Thankfully, I was able to stop it via terminal with "umount -l /mnt/cache", at which point I rebooted the server and immediately ran btrfs scrub again. Errors/corruption were detected, but corrected.

     

    Scrub device /dev/nvme0n1p1 (id 1) done
    Scrub started:    Thu Jan 28 21:51:42 2021
    Status:           finished
    Duration:         0:00:54
    Total to scrub:   235.02GiB
    Rate:             3.15GiB/s
    Error summary:    no errors found
    WARNING: errors detected during scrubbing, corrected

     

    Unfortunately, I forgot to pull logs before I rebooted...

     

    I'll continue to keep an eye on it and will report back (w/ logs) if anything changes.

  6. Just an update on this. Reading other information on the forums and the wiki, I decided to reformat the drive. As a precaution, I first erased the drive, unassigned it, formatted to XFS, reassigned and let it automatically reformat it back to BTRFS. When rebuilding the docker image file, I also opted for the XFS option.

     

    Running appdata backup/restore as we speak...

  7. Ran into a kernel panic error within the last 24 hours. Recently (within the past hour or so), my Unraid server went unresponsive. At the time, the only activity was a single user on Plex (Docker) transcoding. I had to force the system to shut down (via the power button) and turn it back on again. As a precaution, since transcoding was on that cache drive, I moved it to RAM for the time being.

     

    Looking at the log, I found one instance of BTRFS complaining about corruption:

    Jan 28 20:47:26 WadeWilson kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0

     

    Since my cache drive is the only BTRFS formatted device in the system, I ran scrub on it:

    Scrub device /dev/nvme0n1p1 (id 1) done
    Scrub started:    Thu Jan 28 20:57:14 2021
    Status:           finished
    Duration:         0:00:25
    Total to scrub:   133.02GiB
    Rate:             2.77GiB/s
    Error summary:    csum=1
      Corrected:      0
      Uncorrectable:  1
      Unverified:     0
    ERROR: there are uncorrectable errors

     

    Anyone have any advice on what I should do next? Logs are attached.

    wadewilson-diagnostics-20210128-2102.zip
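For anyone who wants to check scrub results from a script rather than by eye, here is a minimal sketch that parses a summary shaped like the output pasted above. It assumes the layout stays as shown in this post ("Error summary:" followed by the Corrected/Uncorrectable/Unverified lines); the function names are hypothetical.

```python
import re


def parse_scrub_summary(text):
    """Extract the error counters from scrub status text like the one in this post."""
    summary = {"errors": {}, "corrected": None, "uncorrectable": None, "unverified": None}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("Error summary:"):
            rest = line[len("Error summary:"):].strip()
            # "csum=1" yields {"csum": 1}; "no errors found" yields nothing
            for key, value in re.findall(r"(\w+)=(\d+)", rest):
                summary["errors"][key] = int(value)
        else:
            match = re.match(r"(Corrected|Uncorrectable|Unverified):\s+(\d+)$", line)
            if match:
                summary[match.group(1).lower()] = int(match.group(2))
    return summary


def needs_attention(summary):
    """True when the scrub reported errors it could not repair."""
    return bool(summary["uncorrectable"])
```

With the output from this post, the parsed summary has one csum error, zero corrected, and one uncorrectable, so needs_attention() returns True. On a single-device pool there is no second copy of the data for scrub to repair from, which is consistent with the error being reported as uncorrectable.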

  8. Looking at my system log (and the fact that my local share is never populated with files), I suspect that rsyslog is never initializing all the way. I've tried various configuration changes, including trying to send (loop) the files back to the server using the remote address option (per a tip I received in one of my prior forum posts). However, it never seems to work correctly. Is there a way to reinstall/rebuild it?

     

    System Log:

    Oct 1 10:50:09 WadeWilson rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="21673" x-info="https://www.rsyslog.com"] start
    Oct 1 10:50:35 WadeWilson ool www[21084]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
    Oct 1 10:50:38 WadeWilson rsyslogd: Could not find template 1 'remote' - action disabled [v8.2002.0 try https://www.rsyslog.com/e/3003 ]
    Oct 1 10:50:38 WadeWilson rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 121: errors occured in file '/etc/rsyslog.conf' around line 121 [v8.2002.0 try https://www.rsyslog.com/e/2207 ]
    Oct 1 10:50:38 WadeWilson rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="21912" x-info="https://www.rsyslog.com"] start

    My "syslog" Share:

    image.png.2da7dbae1064bc8398f29b9da37a692b.png

     

    Current Settings:

    image.thumb.png.caa42f4b6f03deeff20c7bd160d8576e.png

     

    This is the specific line that the error is complaining about in /etc/rsyslog.conf:

    image.png.d51de4dcc603c98b93d2b07a02d4e2c8.png

     

    wadewilson-diagnostics-20201001-1106.zip
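The "Could not find template 1 'remote'" message suggests that an action refers to a template named "remote" that is not defined anywhere in the active config. A minimal sketch of a check for that situation, assuming the file uses rsyslog's legacy syntax ("$template name,..." definitions and "?name" dynamic-file actions); since the real line 121 is only visible in the screenshot, the function names and any sample input are hypothetical:

```python
import re


def defined_templates(conf_text):
    """Names of templates defined on uncommented lines (legacy or template() form)."""
    names = set()
    for raw in conf_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue
        legacy = re.match(r"\$template\s+([\w.-]+)\s*,", line)
        if legacy:
            names.add(legacy.group(1))
        modern = re.match(r"template\s*\(.*?name\s*=\s*\"([^\"]+)\"", line)
        if modern:
            names.add(modern.group(1))
    return names


def referenced_templates(conf_text):
    """Map each template used in a '?name' action to the line numbers that use it."""
    refs = {}
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#") or line.startswith("$"):
            continue
        match = re.search(r"(?:^|\s)-?\?([\w.-]+)", line)
        if match:
            refs.setdefault(match.group(1), []).append(number)
    return refs


def missing_templates(conf_text):
    """Templates that are referenced by an action but never defined."""
    defined = defined_templates(conf_text)
    return {
        name: lines
        for name, lines in referenced_templates(conf_text).items()
        if name not in defined
    }
```

Running missing_templates() on the text of /etc/rsyslog.conf would return something like {"remote": [121]} if the action on that line is active while its template definition is commented out or absent. That would match the error, but it is only a guess until the actual file contents are compared.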

  9. Everything appears to be fixed on my end and no further CRC errors have shown up since the cables were replaced. For the time being, I am keeping the RAM at the factory XMP speed without any additional overclocks applied.

     

    As for the syslog server, it is still not functioning correctly. Therefore, I may look into either a Docker container that can serve the same purpose (making the Unraid server itself just a client that sends logs) or one of my Raspberry Pis.

  10. OK, I just received a CRC error on a completely different drive. Therefore, I'm pretty confident that the cable itself is bad (or going bad). Since that particular drive was my only parity drive, I'm going to pause the rebuild just to be safe. I'm going to shut down the server and have already placed a store pickup order for replacement cables (going to replace all 4 since the original cables were a sleeved bundle) from my local Fry's. Just waiting on the "order ready" confirmation...

  11. Thanks @kevschu.

     

    An update on my end...

    About half a day later, the drive went back into "disabled" status due to errors. Therefore, I went into the BIOS and brought the RAM clock back down to the base/stock XMP setting (3000MHz) w/ no additional overclock. From there, I shut the system back down and swapped the SATA cable ordering (they're physically tagged) across the four 3.5" drives (1 through 4, top to bottom; versus 4 through 1). All of the power connectors were checked as well to ensure that they were fully seated.

     

    Now, I'm back to square one with the parity rebuild/sync since the drive had to be removed and re-added to the pool...

     

    In the meantime, let me know if anyone is interested in a new set of logs pulled from the system.

  12. On 9/14/2020 at 11:41 AM, MarkRMonaco said:

    Just an update. - I fixed the "system" share issue and made sure that it only resides within the cache pool. The other files that were on one of my drives were outdated. Therefore, I deleted them through Krusader.

     

    I also did the following:

    • Enabled the local syslog server and have it mirroring between the cache pool and the "flash" share.
    • Reverted back to stock (and rebooted) from the linuxserver.io Nvidia (Unraid Nvidia plugin) image since my card wasn't supported.
    • Turned off (disabled) any ErP or C-State settings in the BIOS (which were previously enabled).

    Just an update: my drive is almost 100% rebuilt and I have not run into issues with it being unresponsive (yet). So, it looks like one of these steps (above) solved the issue.

     

    image.png.3fc48cb94acc591e88df8fd4f59c9e8a.png

     

    I am, however, still experiencing issues actually getting syslog working at all. The Unraid share has yet to be populated with anything, and I am still running into that single error message whenever the service is started or restarted.

     

    image.png.e40b0b3ba196a7a50040cdfce8629e75.png

  13. 1 hour ago, trurl said:

    I think you have to fill in the Remote syslog server to tell Unraid to send them to itself.

     

    This is mentioned as the 3rd option in that FAQ I linked.

    Thanks. I missed that part.

     

    With the "flash mirroring" option turned off and the syslog server set to "both" for protocols, I'm still getting one error message returned when the service is started/restarted:

    Starting rsyslogd daemon: 
    /usr/sbin/rsyslogd -i /var/run/rsyslogd.pid
    rsyslogd:  Could not find template 1 'remote' - action disabled [v8.2002.0 try https://www.rsyslog.com/e/3003 ]
    rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 121: errors occured in file '/etc/rsyslog.conf' around line 121 [v8.2002.0 try https://www.rsyslog.com/e/2207 ]

     

    Current Config:

     

    image.png.3e1b98bff1d6cbf2eaef1bbd4bfb4f22.png

  14. 57 minutes ago, JorgeB said:

    Like already mentioned you should respect the max officially supported RAM speeds by AMD depending on the config, at least while you're troubleshooting to rule that out, several cases in the forum of instability and even data corruption with Ryzen and overclocked RAM.

    Fair enough. That will be my next step (going back down to the base XMP setting) if the system goes unresponsive again.

  15. Just an update. - I fixed the "system" share issue and made sure that it only resides within the cache pool. The other files that were on one of my drives were outdated. Therefore, I deleted them through Krusader.

     

    I also did the following:

    • Enabled the local syslog server and have it mirroring between the cache pool and the "flash" share.
    • Reverted back to stock (and rebooted) from the linuxserver.io Nvidia (Unraid Nvidia plugin) image since my card wasn't supported.
    • Turned off (disabled) any ErP or C-State settings in the BIOS (which were previously enabled).

  16. 2 hours ago, ChatNoir said:

    Maybe a slower memory speed solves the issue, maybe not.

    In any case, finding the cause is a process of elimination. Testing this is super simple, requires no particular hardware and would eliminate this possibility from the equation.

     

    Particularly since you are running only one stick in single channel. This is not a config that is usually covered by most tests.

    I'm not running in single-channel mode. There are two DIMMs installed. 16GB (2x8GB) means that the installed kit consists of two 8GB modules, which is pretty standard notation for RAM specs.
