
DizRD


Posts posted by DizRD

  1. My Unraid server lost power twice in a row the other night, and I've had performance issues since then: slow access and so on. I noticed the system was cycling over and over between trying to run a parity check and trying to run mover.

     

    I disabled mover, ran a parity check, took the array offline, and then ran btrfs check --readonly on one of the cache drives (/dev/sde). It reported multiple btrfs errors, so I had to run btrfs check --repair as well. I ran a SMART check on the drive and it seems fine, but it looks like I'm seeing a lot of IO errors in the syslog.
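     
    For reference, the sequence was roughly this (a sketch from memory; /dev/sde1 is my assumption for the pool partition, and the array was stopped first):

    # Read-only pass to see what is wrong, without touching anything:
    btrfs check --readonly /dev/sde1
    # It reported errors, so I followed with a repair pass.
    # --repair rewrites metadata, so I only ran it after the read-only pass:
    btrfs check --repair /dev/sde1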

    Mover runs now and doesn't pause, but it never seems to complete, and after a point I stop seeing it in the logs. The files that were on that cache drive seem to have issues, but I can't tell whether it's just file-level corruption for those files or the drive is failing, so I thought I'd get some other eyes on it for opinions. Below is some of the log weirdness (not filtered on my part):

     

    Jul  1 10:32:57 deathstar  move: file: //..g/...
    Jul  1 10:32:57 deathstar  move: move_object: //..g/... File exists
    Jul  1 10:32:58 deathstar  move: skip: /mnt/cache/postgres/global/1262
    Jul  1 10:32:59 deathstar  move: skip: /mnt/cache/postgres/global/6100
    Jul  1 10:33:00 deathstar  move: file: //..t/...
    ### [PREVIOUS LINE REPEATED 4 TIMES] ###
    Jul  1 10:33:03 deathstar  emhttpd: read SMART /dev/sdu
    Jul  1 10:33:03 deathstar  move: file: //..d/...
    Jul  1 10:33:04 deathstar  move: file: //..0/...
    Jul  1 10:33:05 deathstar  move: file: //..t/...
    Jul  1 10:33:05 deathstar  move: file: //..k/...
    Jul  1 10:33:06 deathstar  move: file: //..f/...
    Jul  1 10:33:06 deathstar  move: skip: /mnt/unprotectedcache/ebooks/.config/openbox/autostart
    Jul  1 10:33:06 deathstar root: Specified filename //..e/... does not exist.
    Jul  1 10:33:06 deathstar  move: file: //..e/...
    Jul  1 10:33:06 deathstar  move: move_object: //..e/... No such file or directory

    deathstar-diagnostics-20230701-1500.zip

  2. Something weird happened last night: my Unraid server just kind of choked. I noticed some apps started stalling; the major ones running were Plex and tdarr. I've recently added FileRun to the mix as well and was browsing files with it at the time the server choked. I tried to stop the apps, but the UI wouldn't let me. I looked at the logs and saw a bunch of sshfs "cache share full" messages. I tried to run a diagnostic collection to submit here, but it stalled in the middle of generation and then the UI became totally unresponsive.

     

    I switched to SSH and was able to see processes, but most file-level activities would fail. I noticed there were thousands of stalled Plex healthcheck scripts, so I tried to kill them. There were also a lot of LibreOffice processes open, I assume from FileRun trying to generate metadata about files, and I tried to kill those too. I also stopped the docker service and mover. At the end of the day I couldn't restore functionality, so I had to reboot from the command line. Roughly what I tried over SSH is sketched below.
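     
    (A sketch from memory; the process name patterns are my guesses at what matched, not exact commands.)

    ps aux | grep -c healthcheck   # count the stalled Plex healthcheck scripts
    pkill -9 -f healthcheck        # kill the stalled healthcheck scripts
    pkill -9 -f soffice            # kill the stray LibreOffice workers
    /etc/rc.d/rc.docker stop       # stop the docker service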

     

    Any clues on what happened?

    deathstar-diagnostics-20221219-1034.zip

  3. So yeah, I had to figure out what Bonienl was talking about. Maybe there is a better way, but I had to get a smart switch that supports VLANs, create a VLAN, attach a network adapter to the VLAN port on the switch, and then attach the relevant docker network to the VLAN (sketched below). It works, but it's a pain to set up. Networking in Kubernetes would be easier, but I know that's not officially supported. Ultimately, from my time with Unraid, while I love it as a storage device/internal app server, I wouldn't trust the isolation provided by docker and VLANs at the moment for public internet-facing apps. But that's just me as a security person. I'm probably just going to set up a Fedora server with Kubernetes for any public internet-facing apps.
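     
    The docker side looked something like this (a sketch; the interface, subnet, and network names are placeholders rather than my exact values):

    # macvlan network pinned to the VLAN sub-interface on the second NIC:
    docker network create -d macvlan \
      --subnet=192.168.50.0/24 \
      --gateway=192.168.50.1 \
      -o parent=eth1.50 \
      vlan50
    # Containers attached to that network get an address on the VLAN:
    docker run -d --net=vlan50 --name=public-app some/image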

  4. Thanks for the response!

     

    Yeah, it's a 24-bay case, but not all the drive bays in front are full, so the drives don't get even airflow to keep them cool under parity load, it seems. I will look into what I can do about that.

     

    As for the docker image file, I'm using the directory layout instead of the default docker image encapsulation.

  5. Hey, my Unraid server lost power, there was some corruption on my XFS drives, and then I was advised to update to the latest Unraid version. I'm using a directory for docker instead of the default docker image. I tried docker image prune -a, but it doesn't seem to make a difference. Somewhere along the way a number of apps/containers were removed. When I try to reinstall them, I get the following errors (an image sanity check is sketched after them):

     

    docker run
      -d
      --name='Postgres12.5'
      --net='netlan'
      -e TZ="America/Chicago"
      -e HOST_OS="Unraid"
      -e HOST_HOSTNAME="deathstar"
      -e HOST_CONTAINERNAME="Postgres12.5"
      -e 'POSTGRES_PASSWORD'='***'
      -e 'POSTGRES_USER'='dsmreadme'
      -e 'POSTGRES_DB'='dsmrdb'
      -l net.unraid.docker.managed=dockerman
      -l net.unraid.docker.icon='https://raw.githubusercontent.com/Flight777/unraid_justworks_templates/main/images/postgres/Postgresql_elephant.png'
      -p '5432:5432/tcp'
      -v '/mnt/user/postgres':'/var/lib/postgresql/data':'rw' 'postgres:12.5-alpine'
    a4a539f6a3aa5a3919852d623b360bb5365fa5ffcbb83cd9f6eecc9925a09759
    docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "docker-entrypoint.sh": executable file not found in $PATH: unknown.

    The command failed.

     

    Trying a bulk reinstall results in this:

     

    Starting binhex-krusader
    binhex-krusader failed to start. You should install it by itself to fix the errors
    Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/usr/bin/tini": stat /usr/bin/tini: no such file or directory: unknown
    Error: failed to start containers: binhex-krusader

    Starting cyberchef
    Starting dupeGuru
    dupeGuru failed to start. You should install it by itself to fix the errors
    Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "/init": stat /init: no such file or directory: unknown
    Error: failed to start containers: dupeGuru

    Starting joplin
    joplin failed to start. You should install it by itself to fix the errors
    Error response from daemon: unable to find user joplin: no matching entries in passwd file
    Error: failed to start containers: joplin
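     
    Since every error points at a file missing inside an image (docker-entrypoint.sh, /usr/bin/tini, /init), one sanity check is to pull an image fresh and list the file it claims is missing (a sketch; the entrypoint path is my assumption based on the official postgres image):

    docker pull postgres:12.5-alpine
    # Run ls inside the image instead of its entrypoint, to see whether the file is really there:
    docker run --rm --entrypoint ls postgres:12.5-alpine -l /usr/local/bin/docker-entrypoint.sh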
     

  6. Hey, my Unraid server lost power, there was some corruption on my XFS drives, and then I was advised to update to the latest Unraid version. Somewhere along the way the postgres app/container was removed. When I try to reinstall it, I get the following error:

     

    docker run
      -d
      --name='Postgres12.5'
      --net='netlan'
      -e TZ="America/Chicago"
      -e HOST_OS="Unraid"
      -e HOST_HOSTNAME="deathstar"
      -e HOST_CONTAINERNAME="Postgres12.5"
      -e 'POSTGRES_PASSWORD'='***'
      -e 'POSTGRES_USER'='dsmreadme'
      -e 'POSTGRES_DB'='dsmrdb'
      -l net.unraid.docker.managed=dockerman
      -l net.unraid.docker.icon='https://raw.githubusercontent.com/Flight777/unraid_justworks_templates/main/images/postgres/Postgresql_elephant.png'
      -p '5432:5432/tcp'
      -v '/mnt/user/postgres':'/var/lib/postgresql/data':'rw' 'postgres:12.5-alpine'
    a4a539f6a3aa5a3919852d623b360bb5365fa5ffcbb83cd9f6eecc9925a09759
    docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "docker-entrypoint.sh": executable file not found in $PATH: unknown.

    The command failed.

  7. This one is for the tdarr node/worker:

     

    /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run
      -d
      --name='tdarr_node'
      --net='host'
      -e TZ="America/Chicago"
      -e HOST_OS="Unraid"
      -e HOST_HOSTNAME="deathstar"
      -e HOST_CONTAINERNAME="tdarr_node"
      -e 'serverIP'='192.168.144.79'
      -e 'serverPort'='8266'
      -e 'nodeIP'='0.0.0.0'
      -e 'nodeID'='RTX 2060'
      -e 'TCP_PORT_8267'='8267'
      -e 'PUID'='99'
      -e 'PGID'='100'
      -e 'NVIDIA_VISIBLE_DEVICES'='GPU-c864a4ca-17df-980b-3ff9-c88561641fc3'
      -e 'NVIDIA_DRIVER_CAPABILITIES'='all'
      -e 'dummyvar'='dummyvar'
      -l net.unraid.docker.managed=dockerman
      -l net.unraid.docker.webui='http://[IP]:[PORT:8265]'
      -l net.unraid.docker.icon='https://raw.githubusercontent.com/selfhosters/unRAID-CA-templates/master/templates/img/tdarr.png'
      -v '/mnt/user/appdata/tdarr/configs':'/app/configs':'rw'
      -v '/mnt/user/appdata/tdarr/logs':'/app/logs':'rw'
      -v '/mnt/user/video/':'/mnt/media':'rw'
      -v '/mnt/user/temptrans/':'/temp':'rw'
      -v '/mnt/user/tfy/':'/mnt/tfy':'rw'
      --runtime=nvidia 'haveagitgat/tdarr_node'

    79de41f641f64083a713cce18d92060f51416da9ee53ec6354bd1246027c9fe1

  8. /usr/local/emhttp/plugins/dynamix.docker.manager/scripts/docker run
      -d
      --name='tdarr'
      --net='host'
      -e TZ="America/Chicago"
      -e HOST_OS="Unraid"
      -e HOST_HOSTNAME="deathstar"
      -e HOST_CONTAINERNAME="tdarr"
      -e 'serverIP'='192.168.144.79'
      -e 'TCP_PORT_8266'='8266'
      -e 'TCP_PORT_8265'='8265'
      -e 'internalNode'='false'
      -e 'nodeIP'='0.0.0.0'
      -e 'nodeID'='MyInternalNode'
      -e 'TCP_PORT_8264'='8264'
      -e 'PUID'='99'
      -e 'PGID'='100'
      -e 'dummyvar'='dummyvar'
      -l net.unraid.docker.managed=dockerman
      -l net.unraid.docker.webui='http://[IP]:[PORT:8265]'
      -l net.unraid.docker.icon='https://raw.githubusercontent.com/selfhosters/unRAID-CA-templates/master/templates/img/tdarr.png'
      -v '/mnt/user/appdata/tdarr/server':'/app/server':'rw'
      -v '/mnt/user/appdata/tdarr/configs':'/app/configs':'rw'
      -v '/mnt/user/appdata/tdarr/logs':'/app/logs':'rw'
      -v '/mnt/user/video/':'/mnt/media':'rw'
      -v '/mnt/user/temptrans/':'/temp':'rw'
      -v '/mnt/user/tfy/':'/mnt/tfy':'rw'
      'haveagitgat/tdarr'

    a24b6a9decf72870b88c3ee3f2f9b81986991e7c934934818f68cee13f9fdb45

  9. Weird, because my syslog is still showing recent BTRFS errors (how I'm checking the counters is sketched after the quote):

    Quote

    Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5732, gen 0
    Sep 19 22:40:42 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 19 22:40:42 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5733, gen 0
    Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5734, gen 0
    Sep 19 22:40:58 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864475 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 19 22:40:58 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5735, gen 0
    Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5736, gen 0
    Sep 19 22:41:38 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 20864461 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 19 22:41:38 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdw1 errs: wr 0, rd 0, flush 0, corrupt 5737, gen 0
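     
    (The "corrupt N" counters are the per-device btrfs error stats, which I'm checking like this; /mnt/cache is my assumption for the pool mount point.)

    btrfs device stats /mnt/cache     # cumulative per-device error counters (the 'corrupt N' numbers)
    btrfs scrub start -B /mnt/cache   # re-verify all checksums; -B waits and prints a summary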


  10. I will try to update the version again and report back. I guess if the update causes permission issues again, I will try to figure out how to fix them with the Dynamix Permissions plugin.

     

    For my peace of mind, what does this error mean in the log:

    Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2986, gen 0
    Sep 17 03:48:27 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 17 03:48:27 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2987, gen 0
    Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2988, gen 0
    Sep 17 03:48:29 deathstar kernel: BTRFS warning (device dm-15): csum failed root 5 ino 1838869 off 291581952 csum 0x069f7410 expected csum 0xf7d976f9 mirror 1
    Sep 17 03:48:29 deathstar kernel: BTRFS error (device dm-15): bdev /dev/mapper/sdv1 errs: wr 0, rd 0, flush 0, corrupt 2989, gen 0

  11. Thanks, I'd heard of Dynamix, but I was kind of worried that manually changing permissions to get it to work in 10.3 might mess with default permission configurations needed for future upgrade paths.

     

    Some of my system shares are set to prefer cache. What happens if the cache really fails? Will the system no longer be operational until I replace the cache drive or disable the cache?

     

  12. Interesting! I will try the update. I had updated to 10.3 before, but I ran into surprise permission issues on shares, rolled back to my previous Unraid version, and the permission issues went away. I guess I will have to see whether they are fixed in the newest version.

  13. I went to update the Plex app, and it seemed to be stuck forever. Eventually I reloaded the Unraid docker page, and it gave me an error on that page, something to the effect of plex.ico being read-only. Then everything went bonkers. I started seeing errors in the syslog about my cache drive not being accessible. I exported a diagnostic log at that time (attached):

     

    I went ahead and restarted the server in case the docker container update had put something in a stuck state. When I restarted, 4 of my pool drives said they were unmountable. I searched and found a thread about booting into maintenance mode and running xfs_repair on the drives (roughly what I ran is sketched below). It fixed some things, and then I rebooted.
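     
    (A sketch from memory; the mdX numbers are placeholders for whichever drives showed as unmountable. In maintenance mode the array devices are exposed as /dev/mdX and nothing is mounted.)

    xfs_repair -v /dev/md1   # repeat for each unmountable drive's md device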

     

    Everything seems fine now, but I'm worried.

    I ran a SMART test on the cache drive, and it said it had errors, but I've never had good luck with my SMART reports in Unraid (attached):

     

    Anyone want to chime in on health insights or other suggestions?

     

    Some of my system shares are set to prefer cache. What happens if the cache really fails? Will the system no longer be operational until I replace the cache drive or disable the cache?


    deathstar-diagnostics-20220913-2024.zip deathstar-smart-20220913-2221.zip
