MarkRMonaco

January 30, 2021

@Altheran, if you have not done it already, I would suggest moving Plex to its own dedicated SSD via Unassigned Devices. I have mine running on a 2nd NVMe SSD formatted as XFS. From there, I have my Plex Docker container's appdata mapped to the unassigned drive's mount point. It's helped to take the load off Unraid's SSD (not only in performance but also in storage used) and has not been affected by my recent stability issues.

You can use Krusader to move the Plex appdata when its container is not running.

January 30, 2021

I let the system run for a little over two hours and went ahead with reformatting my cache drive to XFS.

January 30, 2021

27 minutes ago, jonathanm said:

You must change the desired format type to XFS if that's what you want to use. That option will not be available if you have more than 1 slot available for cache disks.

@jonathanm, that would make a lot more sense. I have two slots configured (as I was going to add a 2nd drive a few months ago, but never did). I'll worry about reformatting once I get my system stabilized.

January 30, 2021

Another update: I had a few more issues pop up last night. First, the WebGUI was complaining that the license file was missing/corrupted. So, I redownloaded a fresh copy of my key file and placed it on the USB drive. I then moved the drive to a different port on the computer (because I was also seeing a mention of "reset SuperSpeed Gen 1 USB device" in the syslog). Once the system was back up, it ran for several hours without issues (before I went to bed).

When I woke up this morning, I found that the system was unresponsive (hard-locked). I verified in the BIOS that Global C-States was already off and typical current was already enabled. Therefore, I turned off spread spectrum (since XMP was enabled at that time). Once it was back up, I added "rcu_nocbs=0-15" to the syslinux config and rebooted (at the time, I didn't realize that I had mistyped it and had "cu_nocbs=0-15" in the config).

From there, I went out to the store and came back a few hours later to another hard-lock. This time, I went back into the BIOS and turned off XMP. Once the system was back up, I corrected the "rcu_nocbs=0-15" entry and rebooted. From there, I opened PuTTY on my other computer and began a tail on the syslog. Note, I already have the syslog server enabled on the Unraid system with it looping back to itself, but have never been able to get anything to write to the share.
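For anyone else who fat-fingers a boot parameter the way I did: a minimal sketch, assuming the kernel command line is available as a plain string (on a running Linux system it can be read from /proc/cmdline). The function names here are just illustrative, not part of Unraid.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {name: value}; bare flags map to ''."""
    params = {}
    for token in cmdline.split():
        name, _, value = token.partition("=")
        params[name] = value
    return params


def check_param(cmdline, name, expected):
    """Return True only if `name` is present with exactly the expected value."""
    return parse_cmdline(cmdline).get(name) == expected


# Hypothetical command lines for illustration; the real one will differ.
good = "BOOT_IMAGE=/bzimage initrd=/bzroot rcu_nocbs=0-15"
typo = "BOOT_IMAGE=/bzimage initrd=/bzroot cu_nocbs=0-15"

print(check_param(good, "rcu_nocbs", "0-15"))
print(check_param(typo, "rcu_nocbs", "0-15"))
```

Feeding it the contents of /proc/cmdline after a reboot would show whether the parameter actually made it to the kernel, since a misspelled name simply never matches.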

As for the USB drive itself, I have Unraid configured to use UEFI mode.

January 29, 2021

Thanks. I'll have to look into that. When I reassigned the cache drive, it automatically formatted as btrfs.

January 29, 2021

Well, this morning I logged in and found that the WebGUI was reporting that it couldn't access flash. So, I powered the server down, reformatted the thumb drive with a fresh copy of 6.9-rc2, and restored my config backup. Now, I need to start the parity check all over again... joy.

January 29, 2021

I agree, @Squid. I'm assuming there is a chance that the appdata may have had some corruption in it when it was backed up before the reformat. At the moment, everything seems to be OK and any btrfs errors were corrected (according to the logs). Since it is doing a parity check after the forced reboot, I'm going to let it sit for the time being. If I see anything else pop up in the system log or if any other abnormal activity occurs, I'll reply back here (hopefully, w/ logs).

January 29, 2021

Just another update. I ran into some issues while restoring my Docker containers from my saved templates. It would occasionally cause the Docker service to stop. In most cases, I was able to stop the array and restart it, which would get Docker running again. However, at some point, stopping the array would get hung up at the cache drive. Thankfully, I was able to stop it via terminal with "umount -l /mnt/cache", at which point I rebooted the server and immediately ran btrfs scrub again. Errors/corruption were detected, but corrected.

Scrub device /dev/nvme0n1p1 (id 1) done
Scrub started:    Thu Jan 28 21:51:42 2021
Status:           finished
Duration:         0:00:54
Total to scrub:   235.02GiB
Rate:             3.15GiB/s
Error summary:    no errors found
WARNING: errors detected during scrubbing, corrected

Unfortunately, I forgot to pull logs before I rebooted...

I'll continue to keep an eye on it and will report back (w/ logs) if anything changes.

January 29, 2021

Just an update on this. Reading other information on the forums and the wiki, I decided to reformat the drive. As a precaution, I first erased the drive, unassigned it, formatted to XFS, reassigned and let it automatically reformat it back to BTRFS. When rebuilding the docker image file, I also opted for the XFS option.

Running appdata backup/restore as we speak...

January 29, 2021

Ran into a kernel panic error within the last 24 hours. Recently (within the past hour or so), my Unraid server went unresponsive. At the time, the only activity was a single user on Plex (Docker) transcoding. I had to force the system to shut down (via the power button) and turn it back on again. As a precaution, since transcoding was on the cache drive, I moved it to RAM for the time being.

Looking at the log, I found one instance of BTRFS complaining about corruption:

Jan 28 20:47:26 WadeWilson kernel: BTRFS error (device loop2): bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0
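Worth noting that the message names /dev/loop2, which is a loop device (a file mounted as a block device) rather than the NVMe partition itself. As a rough sketch, assuming the kernel keeps this exact wording, the counters in a line like the one above could be pulled out with a regular expression. The helper name is made up for illustration.

```python
import re

# Matches the per-device error counters that btrfs prints to the kernel log.
ERRS_RE = re.compile(
    r"BTRFS error \(device (?P<device>[^)]+)\): bdev (?P<bdev>\S+) errs: "
    r"wr (?P<wr>\d+), rd (?P<rd>\d+), flush (?P<flush>\d+), "
    r"corrupt (?P<corrupt>\d+), gen (?P<gen>\d+)"
)


def parse_btrfs_errs(line):
    """Return the device names and integer counters, or None if no match."""
    match = ERRS_RE.search(line)
    if match is None:
        return None
    counters = {k: int(match.group(k)) for k in ("wr", "rd", "flush", "corrupt", "gen")}
    return {"device": match.group("device"), "bdev": match.group("bdev"), **counters}


line = (
    "Jan 28 20:47:26 WadeWilson kernel: BTRFS error (device loop2): "
    "bdev /dev/loop2 errs: wr 0, rd 0, flush 0, corrupt 1, gen 0"
)
print(parse_btrfs_errs(line))
```

Any non-zero counter is the thing to look at; here only "corrupt" is set.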

Since my cache drive is the only BTRFS formatted device in the system, I ran scrub on it:

Scrub device /dev/nvme0n1p1 (id 1) done
Scrub started:    Thu Jan 28 20:57:14 2021
Status:           finished
Duration:         0:00:25
Total to scrub:   133.02GiB
Rate:             2.77GiB/s
Error summary:    csum=1
  Corrected:      0
  Uncorrectable:  1
  Unverified:     0
ERROR: there are uncorrectable errors
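If you want to check scrub results from a script rather than by eye, here is a minimal sketch that reads the text output shown above. It assumes the "Key: value" layout stays as printed here; the function names are hypothetical and this is not an official btrfs interface.

```python
SAMPLE = """\
Scrub device /dev/nvme0n1p1 (id 1) done
Scrub started:    Thu Jan 28 20:57:14 2021
Status:           finished
Duration:         0:00:25
Total to scrub:   133.02GiB
Rate:             2.77GiB/s
Error summary:    csum=1
  Corrected:      0
  Uncorrectable:  1
  Unverified:     0
ERROR: there are uncorrectable errors
"""


def parse_scrub_summary(text):
    """Collect the status and error counters from scrub's text output."""
    result = {"status": None, "corrected": 0, "uncorrectable": 0, "unverified": 0}
    for raw in text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # line without a "key: value" shape
        key, value = key.strip().lower(), value.strip()
        if key == "status":
            result["status"] = value
        elif key in ("corrected", "uncorrectable", "unverified") and value.isdigit():
            result[key] = int(value)
    return result


def needs_attention(summary):
    """Uncorrectable errors mean the scrub could not repair the data."""
    return summary["uncorrectable"] > 0


summary = parse_scrub_summary(SAMPLE)
print(summary, needs_attention(summary))
```

On a single-device pool like mine there is no second copy of the data to repair from, which would be consistent with the error being reported as uncorrectable.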

Anyone have any advice on what I should do next? Logs are attached.

wadewilson-diagnostics-20210128-2102.zip

October 1, 2020

I don't have a 2nd Unraid server (yet). So, I'm just trying to save the files locally. It was a recommendation I received in a prior post, just to make sure that I am not losing anything in the event the system crashes (which was happening to me before I realized I had a bad SATA3 cable).

October 1, 2020

Looking at my system log (and the fact that my local share is never populated with files), I suspect that rsyslog is never initializing all the way. I've tried various configuration changes, including trying to send (loop) the files back to the server using the remote address option (per a tip I received in one of my prior forum posts). However, it never seems to work correctly... Is there a way to reinstall/rebuild it?

System Log:

Oct 1 10:50:09 WadeWilson rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="21673" x-info="https://www.rsyslog.com"] start
Oct 1 10:50:35 WadeWilson ool www[21084]: /usr/local/emhttp/plugins/dynamix/scripts/rsyslog_config
Oct 1 10:50:38 WadeWilson rsyslogd: Could not find template 1 'remote' - action disabled [v8.2002.0 try https://www.rsyslog.com/e/3003 ]
Oct 1 10:50:38 WadeWilson rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 121: errors occured in file '/etc/rsyslog.conf' around line 121 [v8.2002.0 try https://www.rsyslog.com/e/2207 ]
Oct 1 10:50:38 WadeWilson rsyslogd: [origin software="rsyslogd" swVersion="8.2002.0" x-pid="21912" x-info="https://www.rsyslog.com"] start
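To save hunting through the config by hand, a small sketch along these lines could pull the file path and line number out of the parse error and print the neighbouring config lines. It assumes the error wording matches what rsyslog printed above; both helper names are made up.

```python
import re

# Matches the "error during parsing file ..., on or before line N" message.
PARSE_ERR_RE = re.compile(
    r"error during parsing file (?P<path>[^,\s]+), on or before line (?P<line>\d+)"
)


def locate_config_error(log_line):
    """Return (path, line_number) from an rsyslog parse error, or None."""
    match = PARSE_ERR_RE.search(log_line)
    if match is None:
        return None
    return match.group("path"), int(match.group("line"))


def context_lines(config_text, line_no, radius=2):
    """Return (number, text) pairs around a 1-based line number."""
    lines = config_text.splitlines()
    start = max(line_no - 1 - radius, 0)
    end = min(line_no + radius, len(lines))
    return [(i + 1, lines[i]) for i in range(start, end)]


log_line = (
    "Oct 1 10:50:38 WadeWilson rsyslogd: error during parsing file "
    "/etc/rsyslog.conf, on or before line 121: errors occured in file "
    "'/etc/rsyslog.conf' around line 121"
)
print(locate_config_error(log_line))
```

Since the message says "on or before", the lines just above the reported number are worth reading too, which is why the helper returns a small window rather than one line.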

My "syslog" Share:

image.png.2da7dbae1064bc8398f29b9da37a692b.png

Current Settings:

This is the specific line that the error is complaining about in /etc/rsyslog.conf:

image.png.d51de4dcc603c98b93d2b07a02d4e2c8.png

wadewilson-diagnostics-20201001-1106.zip

September 18, 2020

Everything appears to be fixed on my end and no further CRC errors have shown up since the cables were replaced. - For the time being, I keep the RAM at the factory XMP speed without any additional overclocks applied.

As for the Syslog server, it is still not functioning correctly. Therefore, I may look into either a Docker container that can serve the same purpose (making the Unraid server itself just a client that sends logs) or one of my Raspberry Pis.

September 17, 2020

Just another update... I picked up the replacement SATA-III cables and installed them. Of course, that means that I now need to wait for the drive to rebuild again since it was postponed earlier today.

If anyone has any further insight into the Syslog issue, I would love to get that up and running. Thanks in advance.

September 16, 2020

Ok, I just received a CRC error on a completely different drive. Therefore, I'm pretty confident that the cable itself is bad (or going bad). Since that particular drive is my only parity drive, ~~I'm going to pause the rebuild just to be safe.~~ I'm going to shut down the server. I have already placed a store pickup order for replacement cables from my local Fry's (going to replace all 4, since the original cables were a sleeved bundle). Just waiting on the "order ready" confirmation...

September 16, 2020

48 minutes ago, JorgeB said:

This is still above the max officially supported RAM speed for your CPU.

I get that and will drop it down further, if necessary. - However, I just changed several things at once (again), and need to see if any of it has made an impact.

September 16, 2020

Thanks @kevschu.

An update on my end...

About half a day later, the drive went back into "disabled" status due to errors. - Therefore, I went into the BIOS and brought the RAM clock back down to the base/stock XMP setting (3000MHz) w/ no additional overclock. From there, I shut the system back down and swapped the SATA cable ordering (they're physically tagged) across the four 3.5" drives (1 through 4, top to bottom; versus 4 through 1). All of the power connectors were checked as well to ensure that they were fully seated.

Now, I'm back to square one with the parity rebuild/sync since the drive had to be removed and re-added to the pool...

In the meantime, let me know if anyone is interested in a new set of logs pulled from the system.

September 15, 2020

On 9/14/2020 at 11:41 AM, MarkRMonaco said:

Just an update. - I fixed the "system" share issue and made sure that it only resides within the cache pool. The other files that were on one of my drives were outdated. Therefore, I deleted them through Krusader.

I also did the following:

Enabled the local syslog server and have it mirroring between the cache pool and the "flash" share.

Reverted back to stock (and rebooted) from the linuxserver.io Nvidia (Unraid Nvidia plugin) image since my card wasn't supported.

Turned off (disabled) any ErP or C-State settings in the BIOS (which were previously enabled).

Just an update: my drive is almost 100% rebuilt and I have not run into issues with it being unresponsive (yet). So, it looks like one of the steps above solved the issue.

image.png.3fc48cb94acc591e88df8fd4f59c9e8a.png

I am, however, still experiencing issues actually getting syslog working at all. The Unraid share has yet to be populated with anything, and I am still running into that single error message whenever the service is started or restarted.

image.png.e40b0b3ba196a7a50040cdfce8629e75.png

September 15, 2020

I also checked my "syslog" share, and it looks like it is not populating with any files either...

September 14, 2020

1 hour ago, trurl said:

I think you have to fill in the Remote syslog server to tell Unraid to send them to itself.

This is mentioned as the 3rd option in that FAQ I linked.

Thanks. I missed that part.

With the "flash mirroring" option turned off and the syslog server set to "both" for protocols, I'm still getting one error message returned when the service started/restarted:

Starting rsyslogd daemon: 
/usr/sbin/rsyslogd -i /var/run/rsyslogd.pid
rsyslogd:  Could not find template 1 'remote' - action disabled [v8.2002.0 try https://www.rsyslog.com/e/3003 ]
rsyslogd: error during parsing file /etc/rsyslog.conf, on or before line 121: errors occured in file '/etc/rsyslog.conf' around line 121 [v8.2002.0 try https://www.rsyslog.com/e/2207 ]
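One way to test whether the server side is listening at all, independent of the local rsyslog config, is to fire a hand-built test message at it over UDP from another machine. This is only a sketch: the tag and text are placeholders, the message uses the classic "<PRI>tag: text" syslog layout, and 514 is the standard syslog UDP port (adjust if yours was changed).

```python
import socket


def build_message(tag, text, facility=1, severity=5):
    """Build a classic syslog datagram; PRI = facility * 8 + severity."""
    pri = facility * 8 + severity  # defaults give user.notice, i.e. 13
    return f"<{pri}>{tag}: {text}".encode("utf-8")


def send_udp(host, port, payload):
    """Send one datagram; UDP gives no confirmation that it was received."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


payload = build_message("probe", "hello")
print(payload)
# Example call (hypothetical address, replace with the server's IP):
# send_udp("192.168.1.10", 514, payload)
```

If the test line never shows up in the share, the problem is on the receiving side; if it does, the sending configuration is the place to look.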

Current Config:

image.png.3e1b98bff1d6cbf2eaef1bbd4bfb4f22.png

September 14, 2020

1 hour ago, trurl said:

Post a screenshot of your Syslog Server settings.

I also put a screenshot of the specific lines that were called out from the /etc/rsyslog.conf file in my previous reply.

September 14, 2020

Now, regarding the syslog configuration, is this something I should be concerned about (and do I need to take any action)?

Lines 66 & 67:

image.png.52804e414cce14c2441a5d45dbb37063.png

Line 123:

image.png.0845e5661e4e03793c9e812ca850fc3e.png

September 14, 2020

57 minutes ago, JorgeB said:

Like already mentioned you should respect the max officially supported RAM speeds by AMD depending on the config, at least while you're troubleshooting to rule that out, several cases in the forum of instability and even data corruption with Ryzen and overclocked RAM.

Fair enough. - That will be my next step (going back down to the base XMP setting) if the system goes unresponsive again.

September 14, 2020

Just an update. - I fixed the "system" share issue and made sure that it only resides within the cache pool. The other files that were on one of my drives were outdated. Therefore, I deleted them through Krusader.

I also did the following:

Enabled the local syslog server and have it mirroring between the cache pool and the "flash" share.
Reverted back to stock (and rebooted) from the linuxserver.io Nvidia (Unraid Nvidia plugin) image since my card wasn't supported.
Turned off (disabled) any ErP or C-State settings in the BIOS (which were previously enabled).

September 14, 2020

2 hours ago, ChatNoir said:

Maybe a slower memory speed solves the issue, maybe not.

In any case, finding the cause is a process of elimination. Testing this is super simple, requires no particuliar hardware and would eliminate this possibility from the equation.

Particularly since you are running only one stick in single channel. This is not a config that is not usually covered by most tests.

I'm not running in single-channel mode. There are two DIMMs installed. - 16GB (2x8GB) means that the installed kit consists of two 8GB modules, which is pretty standard notation for RAM specs.


Posts posted by MarkRMonaco

[SOLVED] BTRFS Corrupt Error on Cache Drive [6.9.0-rc2]


[SOLVED] Built-in Syslog Server Not Working


[SOLVED] Server Going Unresponsive Multiple Times This Week
