DizRD

July 1, 2023

My unraid server lost power two times consecutively the other night... and I've have some performance issues since then.. Slow access, etc. I noticed the system was cycling between trying to run parity check and run mover over and over.

I disabled mover, ran parity check, took the array offiline then ran btrfs --readonly on one of the cache drives.. (/dev/sde) it reported multiple btrfs errors, so I had to run --repait as well. I smart checked the drive and it seems fine, but I'm seeing alot of IO errors in the syslog it looks like.

Mover runs now and doesn't pause, but it doesn't seem to ever complete and I stop seeing it in the logs after a point in time. I'm not sure if the files that were on that cache drive seem to have issues, I can't tell if it's just file level corruption for those files or if the drive is failing, so I thought I'd get some other eyes on it for opinions, below is some of the log weirdness, That's not filtered on my part.:

Jul 1 10:32:57 deathstar move: file: //..g/...
Jul 1 10:32:57 deathstar move: move_object: //..g/... File exists
Jul 1 10:32:58 deathstar move: skip: /mnt/cache/postgres/global/1262
Jul 1 10:32:59 deathstar move: skip: /mnt/cache/postgres/global/6100
Jul 1 10:33:00 deathstar move: file: //..t/...
### [PREVIOUS LINE REPEATED 4 TIMES] ###
Jul 1 10:33:03 deathstar emhttpd: read SMART /dev/sdu
Jul 1 10:33:03 deathstar move: file: //..d/...
Jul 1 10:33:04 deathstar move: file: //..0/...
Jul 1 10:33:05 deathstar move: file: //..t/...
Jul 1 10:33:05 deathstar move: file: //..k/...
Jul 1 10:33:06 deathstar move: file: //..f/...
Jul 1 10:33:06 deathstar move: skip: /mnt/unprotectedcache/ebooks/.config/openbox/autostart
Jul 1 10:33:06 deathstar root: Specified filename //..e/... does not exist.
Jul 1 10:33:06 deathstar move: file: //..e/...
Jul 1 10:33:06 deathstar move: move_object: //..e/... No such file or directory

deathstar-diagnostics-20230701-1500.zip

December 22, 2022

Just to confirm before I try it... So how would I make sure mover doesn't try to move the contents on the unencrypted directory to various disks? Set the share to cache only?

December 21, 2022

So I have an gocryptfs folder I'd like to mount on Unraid..

Is it a bad idea/no-no to mount the gocryptfs "drive" /mnt/user/somedir?

If it's a bad idea, where should I mount it? /mnt/somedir feels weird now after getting used to /mnt/user/somedir

December 19, 2022

Something weird happened last night.. My unraid server just kind of choked? I noticed some apps started stalling. the major ones running were plex and tdarr. I've recently added filerun to the mix as well and was browsing files at the time it choked with filerun. I tried to stop the apps, but the UI wouldn't let me. I looked at the logs and saw a bunch of: sshfs cache share full messages. I tried to run a diagnostic collection to submit here but it stalled in the middle of generation and then the ui became totally unresponsive..

I switched to ssh and was able to see processes, but most file level activities would fail. I noticed there were thousands of plex healthcheck scripts there were stalled, so I tried to kill them. I noticed too there were alot of libreoffice processes open, I assume from filerun trying to generate metadata about files which I tried to kill. I also stopped the docker service and mover.. at the end of the day, I couldn't restore functionality so I had to reboot from the command line.

Any clues on what happened?

deathstar-diagnostics-20221219-1034.zip

December 10, 2022

So yea, I had to figure out what Bonienl was talking about.. Maybe there is a better way, but I had to get a Smart switch that supports vlans, and create a VLAN, attach a network adapter to the VLAN port on the switch and then I attached the relevant docker network to the vlan.. It works.. but it's a pain to setup. Networking in Kubernetes would be easier, but I know that's not officially supported. Ultimately with my time in unraid while I love it as a storage device/internal app server, I wouldn't trust the isolation provided by docker and vlans at the moment for public internet facing apps.. But that's just me as a security person. I'm probably just going to setup a fedora server with kubernetes for any public internet facing apps.

October 11, 2022

Any other suggestions?

October 10, 2022

Thanks for the response!

Yea it's a 24 bay case, but not all the drive bays are full in front so they don't have even airflow to keep the cool under parity load it seems. I will look into what I can do on that.

As for the Docker image file, I'm using Directory layout instead of the default docker image encapsulation.

October 10, 2022

See attached

deathstar-diagnostics-20221009-1909.zip

October 9, 2022

Hey, My Unraid server lost power, there was some corruption on my XFS drives, and then I was advised to update to the latest unraid version. I'm using a directory for docker instead of the default docker image. I tried docker image prune -a but it doesn't seem to make a difference. Somewhere along the way a number of app/containers were removed. When I try to reinstall them, I get the following errors:

docker run
-d
--name='Postgres12.5'
--net='netlan'
-e TZ="America/Chicago"
-e HOST_OS="Unraid"
-e HOST_HOSTNAME="deathstar"
-e HOST_CONTAINERNAME="Postgres12.5"
-e 'POSTGRES_PASSWORD'='***'
-e 'POSTGRES_USER'='dsmreadme'
-e 'POSTGRES_DB'='dsmrdb'
-l net.unraid.docker.managed=dockerman
-l net.unraid.docker.icon='https://raw.githubusercontent.com/Flight777/unraid_justworks_templates/main/images/postgres/Postgresql_elephant.png'
-p '5432:5432/tcp'
-v '/mnt/user/postgres':'/var/lib/postgresql/data':'rw' 'postgres:12.5-alpine'
a4a539f6a3aa5a3919852d623b360bb5365fa5ffcbb83cd9f6eecc9925a09759
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "docker-entrypoint.sh": executable file not found in $PATH: unknown.

The command failed.

---trying a bulk reinstall results in this:

Starting binhex-krusader
binhex-krusader failed to start. You should install it by itself to fix the errors
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/bin/tini": stat /usr/bin/tini: no such file or directory: unknown
Error: failed to start containers: binhex-krusader

Starting cyberchef
Starting dupeGuru
dupeGuru failed to start. You should install it by itself to fix the errors
Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/init": stat /init: no such file or directory: unknown
Error: failed to start containers: dupeGuru

Starting joplin
joplin failed to start. You should install it by itself to fix the errors
Error response from daemon: unable to find user joplin: no matching entries in passwd file
Error: failed to start containers: joplin

October 1, 2022

Hey, My Unraid server lost power, there was some corruption on my XFS drives, and then I was advised to update to the latest unraid version. Somewhere along the way the postgres app/container was removed. When I try to reinstall it, I get the following error:

docker run
-d
--name='Postgres12.5'
--net='netlan'
-e TZ="America/Chicago"
-e HOST_OS="Unraid"
-e HOST_HOSTNAME="deathstar"
-e HOST_CONTAINERNAME="Postgres12.5"
-e 'POSTGRES_PASSWORD'='***'
-e 'POSTGRES_USER'='dsmreadme'
-e 'POSTGRES_DB'='dsmrdb'
-l net.unraid.docker.managed=dockerman
-l net.unraid.docker.icon='https://raw.githubusercontent.com/Flight777/unraid_justworks_templates/main/images/postgres/Postgresql_elephant.png'
-p '5432:5432/tcp'
-v '/mnt/user/postgres':'/var/lib/postgresql/data':'rw' 'postgres:12.5-alpine'
a4a539f6a3aa5a3919852d623b360bb5365fa5ffcbb83cd9f6eecc9925a09759
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "docker-entrypoint.sh": executable file not found in $PATH: unknown.

The command failed.

September 26, 2022

Updating to 6.11.0 seems to have fixed whatever permissions issue I had with tdarr

September 24, 2022

this one is for tdarr node/worker

/usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='tdarr_node' --net='host' -e TZ="America/Chicago" -e HOST_OS="Unraid" -e HOST_HOSTNAME="deathstar" -e HOST_CONTAINERNAME="tdarr_node" -e 'serverIP'='192.168.144.79' -e 'serverPort'='8266' -e 'nodeIP'='0.0.0.0' -e 'nodeID'='RTX 2060' -e 'TCP_PORT_8267'='8267' -e 'PUID'='99' -e 'PGID'='100' -e 'NVIDIA_VISIBLE_DEVICES'='GPU-c864a4ca-17df-980b-3ff9-c88561641fc3' -e 'NVIDIA_DRIVER_CAPABILITIES'='all' -e 'dummyvar'='dummyvar' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:8265]' -l net.unraid.docker.icon='https://raw.githubusercontent.com/selfhosters/unRAID-CA-templates/master/templates/img/tdarr.png' -v '/mnt/user/appdata/tdarr/configs':'/app/configs':'rw' -v '/mnt/user/appdata/tdarr/logs':'/app/logs':'rw' -v '/mnt/user/video/':'/mnt/media':'rw' -v '/mnt/user/temptrans/':'/temp':'rw' -v '/mnt/user/tfy/':'/mnt/tfy':'rw' --runtime=nvidia 'haveagitgat/tdarr_node'

79de41f641f64083a713cce18d92060f51416da9ee53ec6354bd1246027c9fe1

September 24, 2022

/usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='tdarr' --net='host' -e TZ="America/Chicago" -e HOST_OS="Unraid" -e HOST_HOSTNAME="deathstar" -e HOST_CONTAINERNAME="tdarr" -e 'serverIP'='192.168.144.79' -e 'TCP_PORT_8266'='8266' -e 'TCP_PORT_8265'='8265' -e 'internalNode'='false' -e 'nodeIP'='0.0.0.0' -e 'nodeID'='MyInternalNode' -e 'TCP_PORT_8264'='8264' -e 'PUID'='99' -e 'PGID'='100' -e 'dummyvar'='dummyvar' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:8265]' -l net.unraid.docker.icon='https://raw.githubusercontent.com/selfhosters/unRAID-CA-templates/master/templates/img/tdarr.png' -v '/mnt/user/appdata/tdarr/server':'/app/server':'rw' -v '/mnt/user/appdata/tdarr/configs':'/app/configs':'rw' -v '/mnt/user/appdata/tdarr/logs':'/app/logs':'rw' -v '/mnt/user/video/':'/mnt/media':'rw' -v '/mnt/user/temptrans/':'/temp':'rw' -v '/mnt/user/tfy/':'/mnt/tfy':'rw' 'haveagitgat/tdarr'

a24b6a9decf72870b88c3ee3f2f9b81986991e7c934934818f68cee13f9fdb45

September 23, 2022

I recently updated to 6.10.3, and since then my tdarr installation seems to have some sort of permissions issue that didn't exist in 6.9.2.

I have an SSD I use for transcoding, it's mounted as a share

Error in Tdarr is:

"Cache file /temp/Vid1.mkv (1492975718 bytes) does not match size of new cache file /mnt/media/Vid1.mkv (0 bytes)"

deathstar-diagnostics-20220923-0037.zip

September 23, 2022

Marked as solved, I haven't seen the btrfs error after the scrub and a reboot. I will open a separate ticket for the permissions problems.

September 20, 2022

Weird, cause my syslog is still showing recent BTRFS errors:

Quote

Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5732, gen 0

Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5733, gen 0

Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5734, gen 0

Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5735, gen 0

Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5736, gen 0

Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1

Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5737, gen 0

September 19, 2022

Hmmm, I ran scrub, tracked down the 3 files it mentioned, and removed them. I've rerun scrub since then and found no errors, but I'm still seeing BTRFS errors on the cache drive. Do I need to restart?

deathstar-diagnostics-20220919-0619.zip

September 18, 2022

Thanks! Running Scrub now.

Now that I've updated to 10.3, should I open a separate thread for the permission issue it creates?

September 18, 2022

Updated to 10.3, still seeing the btrfs errors. What should I check out now?

deathstar-diagnostics-20220918-0427.zip

September 18, 2022

I will try to update the version again and report back. I guess if it causes permission issues again with the update, I will try to figure out how to fix it with the Dynamix Permissions plugin..

For my piece of mind, What does this error mean in the log:

Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1 Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2986, gen 0 Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1 Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2987, gen 0 Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1 Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2988, gen 0 Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1 Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2989, gen 0

September 16, 2022

Absolutely, Backups have been made, but I'm more curious about if the system is fault tolerant enough to operate if the cache drive dies or if the system halts, since I don't know how long it would take me to get a replacement cache drive.

September 15, 2022

Thanks, I'd heard of Dynamix but was kind of worried about manually changing permissions to get it to work in 10.3 in case that messed with any default permission configurations needed for upgrade paths in the future.

Some of my system shares are told to prefer cache.. What happens if cache really fails? Will the system no longer be operational until I replace the cache drive or disable cache?

September 14, 2022

Interesting! I will try the update. I had updated to 10.3 before, but ran into surprise permission issues on shares and rolled back to my previous Unraid version and the permission issues went away. I guess I will have to see if they are fixed in the newest version

September 14, 2022

Here's the Diagnostic after the reboot and things seem operational. Btw I haven't moved any cables or anything, everything hardware wise has been solid until now.

deathstar-diagnostics-20220913-2334.zip

September 14, 2022

I went to update the plex app, and it seemed to be stuck there forever. Eventually I reloaded the unraid docker page and it gave me an error on that page, something to the effect of plex.ico was read-only.. Then everything went bonkers. I started seeing errors in the systemlog about my cache drive not being accessible. I exported a diagnostic log at that time<attached>:

I went ahead and restarted the server to see maybe the docker container update and put something in a stuck state.. When I restarted, 4 of my pool drives said they were unmountable. I searched and found a thread about booting into maintenance mode and running xfs_repair on the drives. It fixed some things and then I rebooted.

Everything seems fine now, but i'm worried.

I ran a smart test on the cache drive, and it said it had errors, but I've never had good luck with my smart reports in unraid<attached>:

Anyone want to chime in on health insights or other suggestions?

Some of my system shares are told to prefer cache.. What happens if cache really fails? Will the system no longer be operational until I replace the cache drive or disable cache?

deathstar-diagnostics-20220913-2024.zip deathstar-smart-20220913-2221.zip

DizRD

Posts

Joined

Last visited

Content Type

Profiles

Forums

Downloads

Store

Gallery

Bug Reports

Documentation

Landing

Posts posted by DizRD

Weird Unraid issues.

Where is the best place to place an gocryptfs mount?

Where is the best place to place an gocryptfs mount?

Unraid became mostly unresponsive

Docker network security/isolation with iptables?

Docker containers broken?

Docker containers broken?

Docker containers broken?

Docker containers broken?

[Support] Flight777 - "Just Works" Container Repository

Unraid Tdarr Permission issue

Unraid Tdarr Permission issue

Unraid Tdarr Permission issue

Unraid Tdarr Permission issue

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?

Is my server about to crash?