DizRD

Everything posted by DizRD

  1. My Unraid server lost power two times consecutively the other night, and I've had some performance issues since then: slow access, etc. I noticed the system was cycling between trying to run a parity check and run mover, over and over. I disabled mover, ran the parity check, took the array offline, then ran btrfs check --readonly on one of the cache drives (/dev/sde). It reported multiple btrfs errors, so I had to run --repair as well. I SMART-checked the drive and it seems fine, but it looks like I'm seeing a lot of IO errors in the syslog. Mover runs now and doesn't pause, but it doesn't seem to ever complete, and I stop seeing it in the logs after a point in time. Some of the files that were on that cache drive seem to have issues, but I can't tell if it's just file-level corruption of those files or if the drive itself is failing, so I thought I'd get some other eyes on it for opinions. Below is some of the log weirdness (that's not filtered on my part):
     Jul 1 10:32:57 deathstar move: file: //..g/...
     Jul 1 10:32:57 deathstar move: move_object: //..g/... File exists
     Jul 1 10:32:58 deathstar move: skip: /mnt/cache/postgres/global/1262
     Jul 1 10:32:59 deathstar move: skip: /mnt/cache/postgres/global/6100
     Jul 1 10:33:00 deathstar move: file: //..t/...
     ### [PREVIOUS LINE REPEATED 4 TIMES] ###
     Jul 1 10:33:03 deathstar emhttpd: read SMART /dev/sdu
     Jul 1 10:33:03 deathstar move: file: //..d/...
     Jul 1 10:33:04 deathstar move: file: //..0/...
     Jul 1 10:33:05 deathstar move: file: //..t/...
     Jul 1 10:33:05 deathstar move: file: //..k/...
     Jul 1 10:33:06 deathstar move: file: //..f/...
     Jul 1 10:33:06 deathstar move: skip: /mnt/unprotectedcache/ebooks/.config/openbox/autostart
     Jul 1 10:33:06 deathstar root: Specified filename //..e/... does not exist.
     Jul 1 10:33:06 deathstar move: file: //..e/...
     Jul 1 10:33:06 deathstar move: move_object: //..e/... No such file or directory
     deathstar-diagnostics-20230701-1500.zip
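     For reference, roughly the checks involved, as a sketch (the /dev/sde1 partition and the /mnt/cache mount point are examples; adjust to the actual pool device):
     # Read-only filesystem check of the cache device (array stopped, filesystem unmounted)
     btrfs check --readonly /dev/sde1
     # With the pool mounted, a scrub verifies checksums and logs any corrupt files to syslog
     btrfs scrub start -B /mnt/cache
     btrfs scrub status /mnt/cache
     # Cumulative per-device error counters (wr / rd / flush / corrupt / gen)
     btrfs device stats /mnt/cache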
  2. Just to confirm before I try it... So how would I make sure mover doesn't try to move the contents of the unencrypted directory to various disks? Set the share to cache-only?
  3. So I have a gocryptfs folder I'd like to mount on Unraid.. Is it a bad idea/no-no to mount the gocryptfs "drive" at /mnt/user/somedir? If it's a bad idea, where should I mount it? /mnt/somedir feels weird now after getting used to /mnt/user/somedir
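     A minimal sketch of what I'm considering, assuming the safer spot is a mount point outside the shfs-managed /mnt/user tree (all paths here are just examples):
     # Cipher directory lives on a normal user share; the decrypted view is mounted elsewhere
     mkdir -p /mnt/addons/secure
     gocryptfs /mnt/user/cryptdir /mnt/addons/secure
     # ...work with the decrypted files...
     fusermount -u /mnt/addons/secure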
  4. Something weird happened last night.. My Unraid server just kind of choked? I noticed some apps started stalling. The major ones running were Plex and tdarr. I've recently added FileRun to the mix as well and was browsing files with FileRun at the time it choked. I tried to stop the apps, but the UI wouldn't let me. I looked at the logs and saw a bunch of "sshfs cache share full" messages. I tried to run a diagnostic collection to submit here, but it stalled in the middle of generation and then the UI became totally unresponsive.. I switched to ssh and was able to see processes, but most file-level activities would fail. I noticed there were thousands of Plex healthcheck scripts that were stalled, so I tried to kill them. I noticed too there were a lot of LibreOffice processes open, I assume from FileRun trying to generate metadata about files, which I tried to kill. I also stopped the docker service and mover.. At the end of the day, I couldn't restore functionality, so I had to reboot from the command line. Any clues on what happened? deathstar-diagnostics-20221219-1034.zip
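     In case it helps, a sketch of the checks I'd want to run next time before rebooting (nothing Unraid-specific assumed here):
     # Is the cache pool actually out of space?
     df -h /mnt/cache
     # Processes stuck in uninterruptible I/O wait (state D) cannot be killed and
     # usually point at a hung filesystem or device rather than at the apps themselves
     ps -eo pid,stat,wchan:32,cmd | awk '$2 ~ /D/'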
  5. So yeah, I had to figure out what Bonienl was talking about.. Maybe there is a better way, but I had to get a smart switch that supports VLANs, create a VLAN, attach a network adapter to the VLAN port on the switch, and then attach the relevant docker network to the VLAN.. It works, but it's a pain to set up. Networking in Kubernetes would be easier, but I know that's not officially supported. Ultimately, from my time with Unraid, while I love it as a storage device/internal app server, I wouldn't trust the isolation provided by docker and VLANs at the moment for public internet-facing apps.. But that's just me as a security person. I'm probably just going to set up a Fedora server with Kubernetes for any public internet-facing apps.
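     For anyone curious, the rough shape of the setup, shown here as plain CLI commands just to illustrate (the interface name, VLAN ID, and subnet are examples; in practice this is normally driven from the Unraid network/docker settings):
     # 802.1Q sub-interface on the NIC that's cabled to the VLAN-aware switch port
     ip link add link eth1 name eth1.40 type vlan id 40
     ip link set eth1.40 up
     # Docker network bound to that VLAN so containers get addresses on it
     docker network create -d macvlan \
       --subnet=192.168.40.0/24 --gateway=192.168.40.1 \
       -o parent=eth1.40 vlan40
     docker run --rm --network vlan40 alpine ip addr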
  6. Thanks for the response! Yeah, it's a 24-bay case, but not all the drive bays in front are full, so they don't get even airflow to keep them cool under parity load, it seems. I will look into what I can do on that. As for the docker image file, I'm using the directory layout instead of the default docker image encapsulation.
  7. See attached deathstar-diagnostics-20221009-1909.zip
  8. Hey, my Unraid server lost power, there was some corruption on my XFS drives, and then I was advised to update to the latest Unraid version. I'm using a directory for docker instead of the default docker image. I tried docker image prune -a but it doesn't seem to make a difference. Somewhere along the way a number of apps/containers were removed. When I try to reinstall them, I get the following errors:
     docker run -d --name='Postgres12.5' --net='netlan' -e TZ="America/Chicago" -e HOST_OS="Unraid" -e HOST_HOSTNAME="deathstar" -e HOST_CONTAINERNAME="Postgres12.5" -e 'POSTGRES_PASSWORD'='***' -e 'POSTGRES_USER'='dsmreadme' -e 'POSTGRES_DB'='dsmrdb' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.icon='https://raw.githubusercontent.com/Flight777/unraid_justworks_templates/main/images/postgres/Postgresql_elephant.png' -p '5432:5432/tcp' -v '/mnt/user/postgres':'/var/lib/postgresql/data':'rw' 'postgres:12.5-alpine'
     a4a539f6a3aa5a3919852d623b360bb5365fa5ffcbb83cd9f6eecc9925a09759
     docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "docker-entrypoint.sh": executable file not found in $PATH: unknown.
     The command failed.
     Trying a bulk reinstall results in this:
     Starting binhex-krusader
     binhex-krusader failed to start. You should install it by itself to fix the errors
     Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/bin/tini": stat /usr/bin/tini: no such file or directory: unknown
     Error: failed to start containers: binhex-krusader
     Starting cyberchef
     Starting dupeGuru
     dupeGuru failed to start. You should install it by itself to fix the errors
     Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/init": stat /init: no such file or directory: unknown
     Error: failed to start containers: dupeGuru
     Starting joplin
     joplin failed to start. You should install it by itself to fix the errors
     Error response from daemon: unable to find user joplin: no matching entries in passwd file
     Error: failed to start containers: joplin
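     If the XFS corruption damaged the stored image layers, then docker image prune -a wouldn't help, since it only removes unused images, not broken-but-referenced ones. A sketch of what I might try for one container to force a clean re-pull (names and tag taken from the command above):
     docker stop Postgres12.5 2>/dev/null; docker rm Postgres12.5
     docker rmi postgres:12.5-alpine
     docker pull postgres:12.5-alpine
     # then reinstall the container from its template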
  9. Hey, my Unraid server lost power, there was some corruption on my XFS drives, and then I was advised to update to the latest Unraid version. Somewhere along the way the postgres app/container was removed. When I try to reinstall it, I get the following error:
     docker run -d --name='Postgres12.5' --net='netlan' -e TZ="America/Chicago" -e HOST_OS="Unraid" -e HOST_HOSTNAME="deathstar" -e HOST_CONTAINERNAME="Postgres12.5" -e 'POSTGRES_PASSWORD'='***' -e 'POSTGRES_USER'='dsmreadme' -e 'POSTGRES_DB'='dsmrdb' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.icon='https://raw.githubusercontent.com/Flight777/unraid_justworks_templates/main/images/postgres/Postgresql_elephant.png' -p '5432:5432/tcp' -v '/mnt/user/postgres':'/var/lib/postgresql/data':'rw' 'postgres:12.5-alpine'
     a4a539f6a3aa5a3919852d623b360bb5365fa5ffcbb83cd9f6eecc9925a09759
     docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "docker-entrypoint.sh": executable file not found in $PATH: unknown.
     The command failed.
  10. Updating to 6.11.0 seems to have fixed whatever permissions issue I had with tdarr
  11. This one is for the tdarr node/worker:
     /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='tdarr_node' --net='host' -e TZ="America/Chicago" -e HOST_OS="Unraid" -e HOST_HOSTNAME="deathstar" -e HOST_CONTAINERNAME="tdarr_node" -e 'serverIP'='192.168.144.79' -e 'serverPort'='8266' -e 'nodeIP'='0.0.0.0' -e 'nodeID'='RTX 2060' -e 'TCP_PORT_8267'='8267' -e 'PUID'='99' -e 'PGID'='100' -e 'NVIDIA_VISIBLE_DEVICES'='GPU-c864a4ca-17df-980b-3ff9-c88561641fc3' -e 'NVIDIA_DRIVER_CAPABILITIES'='all' -e 'dummyvar'='dummyvar' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:8265]' -l net.unraid.docker.icon='https://raw.githubusercontent.com/selfhosters/unRAID-CA-templates/master/templates/img/tdarr.png' -v '/mnt/user/appdata/tdarr/configs':'/app/configs':'rw' -v '/mnt/user/appdata/tdarr/logs':'/app/logs':'rw' -v '/mnt/user/video/':'/mnt/media':'rw' -v '/mnt/user/temptrans/':'/temp':'rw' -v '/mnt/user/tfy/':'/mnt/tfy':'rw' --runtime=nvidia 'haveagitgat/tdarr_node' 79de41f641f64083a713cce18d92060f51416da9ee53ec6354bd1246027c9fe1
  12. /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run -d --name='tdarr' --net='host' -e TZ="America/Chicago" -e HOST_OS="Unraid" -e HOST_HOSTNAME="deathstar" -e HOST_CONTAINERNAME="tdarr" -e 'serverIP'='192.168.144.79' -e 'TCP_PORT_8266'='8266' -e 'TCP_PORT_8265'='8265' -e 'internalNode'='false' -e 'nodeIP'='0.0.0.0' -e 'nodeID'='MyInternalNode' -e 'TCP_PORT_8264'='8264' -e 'PUID'='99' -e 'PGID'='100' -e 'dummyvar'='dummyvar' -l net.unraid.docker.managed=dockerman -l net.unraid.docker.webui='http://[IP]:[PORT:8265]' -l net.unraid.docker.icon='https://raw.githubusercontent.com/selfhosters/unRAID-CA-templates/master/templates/img/tdarr.png' -v '/mnt/user/appdata/tdarr/server':'/app/server':'rw' -v '/mnt/user/appdata/tdarr/configs':'/app/configs':'rw' -v '/mnt/user/appdata/tdarr/logs':'/app/logs':'rw' -v '/mnt/user/video/':'/mnt/media':'rw' -v '/mnt/user/temptrans/':'/temp':'rw' -v '/mnt/user/tfy/':'/mnt/tfy':'rw' 'haveagitgat/tdarr' a24b6a9decf72870b88c3ee3f2f9b81986991e7c934934818f68cee13f9fdb45
  13. I recently updated to 6.10.3, and since then my tdarr installation seems to have some sort of permissions issue that didn't exist in 6.9.2. I have an SSD I use for transcoding; it's mounted as a share. The error in Tdarr is: "Cache file /temp/Vid1.mkv (1492975718 bytes) does not match size of new cache file /mnt/media/Vid1.mkv (0 bytes)" deathstar-diagnostics-20220923-0037.zip
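     A sketch of the kind of ownership check that might narrow it down, assuming it really is a permissions problem (the container runs as PUID=99/PGID=100 per the run commands above; the paths are the host sides of the tdarr mappings):
     # Numeric owner/group on the transcode cache and the media share
     ls -ldn /mnt/user/temptrans /mnt/user/video
     # Can UID 99 (nobody on Unraid) actually create a file where tdarr writes its output?
     su -s /bin/sh nobody -c 'touch /mnt/user/video/.permtest && rm /mnt/user/video/.permtest'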
  14. Marked as solved, I haven't seen the btrfs error after the scrub and a reboot. I will open a separate ticket for the permissions problems.
  15. Weird, because my syslog is still showing recent BTRFS errors:
  16. Hmmm, I ran scrub, tracked down the 3 files it mentioned, and removed them. I've rerun scrub since then and found no errors, but I'm still seeing BTRFS errors on the cache drive. Do I need to restart? deathstar-diagnostics-20220919-0619.zip
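     One thing I'm not sure about: whether the errors I keep seeing are fresh csum failures (something still reading a damaged file) or just the cumulative per-device counters, which stay non-zero until reset. A sketch, assuming /mnt/cache is the pool's mount point:
     btrfs device stats /mnt/cache        # show the cumulative wr/rd/flush/corrupt/gen counters
     btrfs device stats -z /mnt/cache     # reset them once a scrub comes back clean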
  17. Thanks! Running Scrub now. Now that I've updated to 10.3, should I open a separate thread for the permission issue it creates?
  18. Updated to 10.3, still seeing the btrfs errors. What should I check out now? deathstar-diagnostics-20220918-0427.zip
  19. I will try to update the version again and report back. I guess if it causes permission issues again with the update, I will try to figure out how to fix it with the Dynamix Permissions plugin.. For my peace of mind, what does this error mean in the log:
     Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
     Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2986, gen 0
     Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
     Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2987, gen 0
     Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
     Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2988, gen 0
     Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
     Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2989, gen 0
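     In case it's useful to anyone else reading the same kind of log, the inode in those csum lines can be mapped back to a file on the mounted pool (a sketch; 1838869 comes from the log above, and /mnt/cache is an assumption about where the dm-15 pool is mounted):
     btrfs inspect-internal inode-resolve 1838869 /mnt/cache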
  20. Absolutely, backups have been made, but I'm more curious about whether the system is fault-tolerant enough to keep operating if the cache drive dies or the system halts, since I don't know how long it would take me to get a replacement cache drive.
  21. Thanks, I'd heard of Dynamix but was kind of worried about manually changing permissions to get it to work in 10.3 in case that messed with any default permission configurations needed for upgrade paths in the future. Some of my system shares are told to prefer cache.. What happens if cache really fails? Will the system no longer be operational until I replace the cache drive or disable cache?
  22. Interesting! I will try the update. I had updated to 10.3 before, but ran into surprise permission issues on shares and rolled back to my previous Unraid version and the permission issues went away. I guess I will have to see if they are fixed in the newest version
  23. Here's the Diagnostic after the reboot and things seem operational. Btw I haven't moved any cables or anything, everything hardware wise has been solid until now. deathstar-diagnostics-20220913-2334.zip
  24. I went to update the Plex app, and it seemed to be stuck there forever. Eventually I reloaded the Unraid docker page and it gave me an error on that page, something to the effect of plex.ico being read-only.. Then everything went bonkers. I started seeing errors in the system log about my cache drive not being accessible. I exported a diagnostic log at that time<attached>: I went ahead and restarted the server, thinking maybe the docker container update had put something in a stuck state.. When I restarted, 4 of my pool drives said they were unmountable. I searched and found a thread about booting into maintenance mode and running xfs_repair on the drives. It fixed some things and then I rebooted. Everything seems fine now, but I'm worried. I ran a SMART test on the cache drive, and it said it had errors, but I've never had good luck with my SMART reports in Unraid<attached>: Anyone want to chime in on health insights or other suggestions? Some of my system shares are set to prefer cache.. What happens if the cache really fails? Will the system no longer be operational until I replace the cache drive or disable cache? deathstar-diagnostics-20220913-2024.zip deathstar-smart-20220913-2221.zip
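     For anyone finding this later, the repair itself was along these lines (a sketch; the disk numbering is an example, and the array has to be started in maintenance mode so the /dev/mdX devices exist but aren't mounted):
     xfs_repair -n /dev/md1    # dry run: report what would be changed without writing anything
     xfs_repair /dev/md1       # actual repair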