arough

Friday at 07:21 AM

I think I got it fixed.

Here's what I did in the end:

(disable every autostart function. Array, docker service and VMs)
copy all important files from `appdata` and everything in `system` to a backup folder on the array via rsync
- Curiously I only got one error while syncing. If any copied files are still broken, that's the price I have to pay for not having a recent backup ¯\_(ツ)_/¯
- I included rsync's `-a` option so the docker containers wouldn't end up with wrong permissions afterwards. Not that I had ever gone through this before...
shrink the cache pool to one drive by unassigning the second drive (because I was at the drive limit)
remove the unused SSD
install the new SSD and create a new pool (`cachetwo`) with it
copy the data from step 1 to the new cache pool
swap every relevant share from `cache` to `cachetwo`
- It's probably best to disable the mover before this step, but I knew it wasn't due to run
- Fix Common Problems will tell you that there is still data for the shares on the old pool. While true, that is irrelevant in this case
unassign the remaining drive from the original cache (this way I could remount it if I did something wrong)
delete the docker.img and rebuild it via `docker -> add container` and selecting
(add a new second pool drive when it arrives today)
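
For anyone following along, the two copy steps in the list above can be sketched roughly like this. The paths are assumptions (Unraid normally mounts pools under `/mnt/<poolname>`, and `/mnt/disk1/cache_backup` is a made-up backup folder), and `backup_dir` is a hypothetical helper, not the exact commands used:

```shell
# backup_dir SRC DST
# Mirror the contents of SRC into DST with rsync.
# -a (archive) keeps permissions, ownership, timestamps and symlinks.
backup_dir() {
    mkdir -p "$2"
    # The trailing slash on the source copies its contents, not the folder itself.
    rsync -a "$1/" "$2/"
}

# Hypothetical usage, adjust pool and disk names:
# backup_dir /mnt/cache/appdata /mnt/disk1/cache_backup/appdata
# backup_dir /mnt/cache/system  /mnt/disk1/cache_backup/system
# ...and later, onto the new pool:
# backup_dir /mnt/disk1/cache_backup/appdata /mnt/cachetwo/appdata
```

rsync keeps going after a file it cannot read and reports a non-zero exit code at the end, which would match seeing only a single error during the sync.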

This way I also swapped out the pool drives for peace of mind.
So the only hardware left from before is the array drives, the PSU and the RAM (which now runs at 3200MHz).

For now everything seems to be stable and so far no CRC errors while downloading and unpacking .... stuff

Did I do anything that could bite me in the ass later?

Thank you very much for helping! I would only have stumbled across the Ryzen RAM issue after nights of furious googling and ripping my hair out. If ever.

May 2

The memtest ran for 16 hours without issues at 3200 MHz.

I noticed that the backup I tried to make from the read-only ZFS pool to the second flash drive wasn't successful.
Only a few folders are on the flash drive.
Curiously, while copying, the flash drive seemed to fill up and showed ~80GB used (even though the cache pool only holds ~55GB of data).

Looking at it now, only 3.9GB are used and merely three folders from appdata were copied.

That would explain the plethora of errors in the syslog.

Is there any way to get all the data from the pool?

As far as I can tell, when I configure appdata to move cache -> array, it stops at the same file in the ESPHOME folder as when I copy the files manually.
I guess this would be one of the broken files mentioned below.
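
One possible approach, sketched under the assumption that reads of damaged files simply fail with an I/O error while the rest of the pool stays readable: copy file by file and log whatever fails, instead of stopping at the first error. `salvage_copy` and the example paths are hypothetical:

```shell
# salvage_copy SRC DST LOG
# Copy SRC to DST one file at a time; paths that fail to copy are written to LOG.
salvage_copy() {
    src="$1"; dst="$2"; log="$3"
    mkdir -p "$dst"
    : > "$log"
    # Recreate the directory tree first.
    ( cd "$src" && find . -type d ) | while IFS= read -r d; do
        mkdir -p "$dst/$d"
    done
    # Then copy everything that is not a directory.
    # cp -a keeps symlinks, permissions and timestamps.
    ( cd "$src" && find . ! -type d ) | while IFS= read -r f; do
        cp -a "$src/$f" "$dst/$f" 2>/dev/null || printf '%s\n' "$f" >> "$log"
    done
}

# Hypothetical usage:
# salvage_copy /mnt/cache/appdata /mnt/disk1/cache_backup/appdata /mnt/disk1/failed_files.log
```

Directories are recreated with default permissions here, and file names containing newlines would trip up the loop. The log then lists the files that need to be restored or recreated some other way.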

The GUI shows there are 12 corrupted files on the pool, but running `zpool status -v` only shows this

[Screenshot: output of `zpool status -v`]

May 1

I turned the RAM speed down to 3200MHz.
Hopefully that is enough, as I can't find any official memory specs for my 5600G, but the page for the 4600G says 3200MHz.

Running memtest now and will let it run through the night.

If all is well, are the next steps I wrote above the correct way to go about fixing it?

May 1

As far as I can tell, I have errors on my 2-disk ZFS RAID1 cache pool, which lead to a corrupt docker.img, crashing docker containers and an unresponsive system or outright crashes.
When everything seems to run "fine", I get CRC errors on nearly everything I download and unrar, and eventually the server seems to self-destruct and I need to hard reboot.
Also, my Home Assistant regularly gets a broken DB and starts a new one. I guess that's because it too gets corrupted by whatever is going on.

I attached a few screenshots with errors I'm getting.
The first few show the "crashed" state, where I can still access and navigate the webGUI, but nothing I do in there has any effect.

The fourth is from trying to back up the docker.img.

I had bought new hardware and saw this as a sign to finally install the new mainboard and CPU. I also installed new SATA cables out of caution.
I also ran memtest for ~4h and nothing came up.
Nothing changed, but I didn't get my hopes up anyway, as the cache and/or docker.img are in a broken/corrupt state.

I found what to do individually for a corrupt docker.img or pool, but am unsure as to how to proceed.

I mounted the pool in read-only mode and backed up all I could (docker.img wouldn't copy, but it will be recreated anyway).
I've got one new SSD on hand and will be ordering a second one.

So far I've manually copied /appdata and docker.img from the pool to a USB drive.
While copying them (via `cp`) I got a bunch of errors saying it can't copy symlinks (last screenshot). I could force it via `-L`, but then the links would be replaced by full copies of their targets, and I would have lost the linking when moving stuff back. I guess I have to tackle that problem when recreating the docker.img and just not copy any linked files, right?
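
If the errors come from the USB drive's filesystem not supporting symlinks (FAT32 and exFAT don't; which filesystem the drive uses is an assumption here), packing the folder into a tar archive would keep the links intact without needing `-L`. A minimal sketch with hypothetical helper names and paths:

```shell
# pack_tree SRC ARCHIVE
# Store the contents of SRC, including symlinks and permissions, in a tar file.
pack_tree() {
    tar -cpf "$2" -C "$1" .
}

# unpack_tree ARCHIVE DST
# Restore the archive on a filesystem that does support symlinks.
unpack_tree() {
    mkdir -p "$2"
    tar -xpf "$1" -C "$2"
}

# Hypothetical usage:
# pack_tree   /mnt/cache/appdata /mnt/disks/usb/appdata.tar
# unpack_tree /mnt/disks/usb/appdata.tar /mnt/cachetwo/appdata
```

The archive is a single regular file, so the target drive never has to store a symlink itself; the links only reappear when the archive is unpacked on the new pool.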

What steps do I take to get to a running server with a fresh docker.img and a new pool with only the new SSD (the second one will be added once it arrives)?

Do I just

remove both SSDs
install the single new SSD
create a new pool
manually move the files back
delete the docker.img and recreate it and the Apps
????
profit

Keep in mind I have the 6 disk license and am already at the cap.

Thanks in advance

tower-diagnostics-20240501-1653.zip

September 13, 2022

Thanks @trurl.
Will delete the shares.

I forked CA and hacked together an old version that still runs on 6.8.3 to potentially fix the Minecraft docker.
In the end CA wasn't even needed to fix it.

I got Minecraft running again (it was an issue with an old version of `runc` that is already updated on UnRAID 6.9.2+).

The Minecraft server is running great again. CPU load peaks at 90% when someone is joining or exploring new areas, but when playing in already generated chunks UnRAID hovers around 20%.

It looks to me as if there is a direct link between my problems and the upgrade.
Is there anything I should test while on 6.8.3 to gather data and compare it to 6.10.3?

Here is another diagnostics with docker running and everything as fast as it was before.

tower-diagnostics-20220913-1630.zip

Are there maybe some other settings that were introduced in 6.9 or 6.10 that could mess with my setup?
Should I maybe upgrade to 6.9 first to see if the slowdowns happen there too?

Is there any way to check if the Spectre and Meltdown mitigations were correctly disabled on 6.10.3 after I edited the `syslinux.cfg`?

I would of course check this after upgrading to 6.10.3 again.
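
One way to check this is to read the kernel's own report in sysfs. A minimal sketch, assuming the kernel exposes `/sys/devices/system/cpu/vulnerabilities` (Linux 4.15 and later do); the function name `show_mitigations` is made up for illustration:

```shell
# show_mitigations [DIR]
# Print one "name: status" line per CPU vulnerability the kernel knows about.
# DIR defaults to the sysfs location; it is a parameter only to make testing easy.
show_mitigations() {
    dir="${1:-/sys/devices/system/cpu/vulnerabilities}"
    for f in "$dir"/*; do
        [ -f "$f" ] || continue
        printf '%s: %s\n' "$(basename "$f")" "$(cat "$f")"
    done
}

# Usage on the server:
# show_mitigations
# cat /proc/cmdline   # shows the boot parameters the running kernel actually received
```

With mitigations disabled, the entries typically read `Vulnerable` rather than `Mitigation: ...` (or `Not affected` if the CPU never had the issue). `/proc/cmdline` shows whether the change to the `append` line in `syslinux.cfg` reached the kernel at all.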

I wanted to upgrade to a Ryzen 3600 in the coming weeks and thought it would be good to upgrade UnRAID before that.
Maybe that wasn't the best idea

September 13, 2022

As far as I can tell this is only relevant for `prefer/only` right?

The other ones already had folders on array disks and just used them.

As there were no docker containers running and nothing was moved in the `running without cache` state, only empty folders were created and the complete `system` share was copied somehow.

How should I proceed from here?
Delete the empty folders manually via the console and delete the `system` folder on disk4?

[Attached screenshot]

September 13, 2022

I don't know what you mean.

When I started without the cache drives UnRAID created every share again?

After assigning the cache and starting the array every share looks fine to me.

All cache settings are there and all the files seem to be there.

What do you suppose I should do?

September 13, 2022

God damn it...
Forget every iperf log I posted before.
For some reason it is way slower from my Mac than from my PC.
The two top ones are from my Windows PC.

But UnRAID is nevertheless faster with 6.8.3.

The duplicati backups take 1-2 minutes again.
The Minecraft server for some reason will not start, so I can't test it right now.
And because CA is missing (the newest version isn't compatible with 6.8 anymore and I can't find a way to install an old one), I can't install another Minecraft app/docker.

Is there any way to do this?

Plex seems to behave better.
It is now transcoding for ~30 secs at 100% to fill the buffer and then only spiking every few seconds for like a second to fill the buffer again.

While moving the same file (on my Mac again to keep it comparable) htop now looks like this.

[htop output not preserved]

Way lower load than with 6.10.3.

Moving files via FileBot is now back to ~450MB/s again, as opposed to ~140MB/s with 6.10.3.
I use FileBot on Windows, but it is transferring directly on the cache drives, as source and target are both on them.

Should I update to 6.10.3 again and see how iperf performs with my PC so the data I present to you is actually useful? 😅

EDIT: Just assigning the cache drives again worked btw.

September 13, 2022

Oh, I think I see what happened.
My cache drives aren't assigned.
If I try to assign them they are marked as `new device`.
I didn't start the array with them assigned yet, because I fear UnRAID will try to erase them, as they are detected as a new device.
How to proceed from here?

September 13, 2022

Well...

I restored to 6.8.3 via the GUI and now I don't have an `Apps` tab and my Docker tab says `No Docker containers installed`.

I had to install iperf via Nerd Pack again as it was not installed after the downgrade.

It doesn't look so different from 6.10.3

But there is something fishy going on now with the docker stuff.

Here is the diagnostics.

tower-diagnostics-20220913-1330.zip

September 13, 2022

Can I just use the backup of my Flash Drive to go back or is there another way?

September 13, 2022

Sadly that didn't help.

iperf was even slower for a bit, but I don't think that's because of the disabled mitigations.
At least I couldn't explain how it should have an effect on it.

While copying a 2.1GB file this is what htop looks like:

[htop output not preserved]

And the UnRAID GUI:

[Attached screenshot of the UnRAID GUI]

As you can see there is some activity after the file transfer is complete.
The progress bar for the transfer is at 100% and my MacBook isn't sending data anymore.
Only once it's at 100% is the cache written to at its normal speed of ~500MB/s.

Same behaviour with my Windows PC.
It feels like there is some kind of slower cache in between. It definitely isn't the RAM as it would be faster and the RAM usage isn't going up significantly while transferring.

And of course I'm not using WiFi for all of this.

September 12, 2022

Hey,

I updated my server from 6.8 to 6.10 and now pretty much everything runs with high CPU load and/or slower than before the update.

iperf is somewhat slow.

Before the update I was able to move files from a local machine to the server at the full 1Gbit/s and now I'm only getting 70-80MB/s.
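
To put the two numbers in the same unit (1 byte = 8 bits, ignoring protocol overhead), a quick shell sketch; the function names are made up for illustration:

```shell
# Convert between MB/s and Mbit/s using integer arithmetic.
mbyte_to_mbit() { echo $(( $1 * 8 )); }
mbit_to_mbyte() { echo $(( $1 / 8 )); }

mbyte_to_mbit 70     # prints 560
mbyte_to_mbit 80     # prints 640
mbit_to_mbyte 1000   # prints 125
```

So 70-80MB/s corresponds to roughly 560-640Mbit/s, against a theoretical ceiling of 125MB/s on a gigabit link.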

In the GUI it looks like there isn't even any activity on the cache drives until the transfer is nearly done, and then it goes up to the ~500MB/s of my MX500s for the last few seconds.

When moving files on the server the speed normally went up to ~400MB/s and now hovers around 140MB/s.

One measurable slowdown is with my duplicati backup. It normally took 1-2 minutes when nothing had changed and only the checks were running. It now takes 5-7 minutes.

A minecraft server I set up a few days ago ramps up my CPU to ~90% with one player connected.
Before the update four players were able to play without that much load on the CPU.

Plex also uses nearly 100% when transcoding for one user and ~50% when direct playing.

Those are the things I noticed so far, I think.

I don't understand what is happening here. The network isn't up to snuff and the CPU is overloading constantly.
The drives seem to be fine, when checking the speed via the DiskSpeed docker.
(One of my array drives is collecting `Reallocated sector count` errors, but I'm already in contact with Seagate for advanced RMA. And as this seems to be a problem when using the cache drives, I don't see how that could be related)

UnRAID forgot the settings for the SMART notifications after the update so I was getting the "normal" `current pending sector count` errors every MX500 gets. I didn't find any other settings UnRAID "lost" after the update so far.

I attached the diag archive.

Can someone make any sense of all this?

Thanks in advance.

tower-diagnostics-20220912-2100.zip

March 27, 2020

Can someone help me with my F@H docker only using one of my CPU cores?

As you can see in the image F@H somehow sees that I have 4 cores but doesn't use them.

I already tried CPU pinning, but to no avail.

Maybe some setting inside the docker that I'm not seeing?

Thanks in advance

January 26, 2020

Hey guys,

loving this nextcloud docker.

Have it working with MariaDB and LetsEncrypt Reverse Proxy.

But I have two problems with it.

When I log out of e.g. the admin account to log in with my user account, it doesn't redirect me to the login page.

I press logout and nothing happens. If I press F5 it redirects me, but not automatically.

But this is only a minor problem and nothing too serious.

Next, I am apparently in an update loop.

I update, check for updates and there is another one ready.

The docker says there is an update but only this happens:

[Attached screenshot]

Sometimes it does a real update and downloads some files, one of which is ~150MB in size.

But I am still on 17.0.2.

Is there no update for v18 in this docker yet?

Thanks in advance guys

Posts posted by arough

ZFS pool, docker BTRFS and container errors. How to proceed?

UnRAID slow after update from 6.8 to 6.10

[Support] Linuxserver.io - Folding@home

[Support] Linuxserver.io - Nextcloud