lovingHDTV

Posts posted by lovingHDTV

  1. I have two containers that have shown "Not Available" for a couple of weeks now.  Both are available when I go to the docker page.

     

    One is the official ESPHome: https://hub.docker.com/r/esphome/esphome

    The other is my own here: ghcr.io/dpeart/heatmaster:main

     

    How do I figure out why? I know that ESPHome has been updated and I need the update. Not as concerned about mine as I've not changed it.

     

    thanks

    david

  2. I saw this in the syslog I attached.  This is the syslog from before the hang/reboot.


     

    Dec 19 02:42:34 tower  emhttpd: Unregistered - flash device error (ENOFLASH7)
    Dec 19 02:42:34 tower  emhttpd: error: device_read_smart, 8372: Cannot allocate memory (12): device_spinup: stream did not open: sdj
    Dec 19 02:42:34 tower  emhttpd: error: device_read_smart, 8372: Cannot allocate memory (12): device_spinup: stream did not open: sdk
    Dec 19 02:42:34 tower  emhttpd: error: device_read_smart, 8372: Cannot allocate memory (12): device_spinup: stream did not open: sdh

     

  3. This AM the server was unresponsive.  I could ping it, but could not log in.  Even when trying to log in via the console, it would accept the user name, then just return to the login prompt.  I had a bunch of messages on the console that said something about monitor_nchan having issues with local resources.

     

    I had to hard reboot it, ugh, and it is back up.  Attached are the diagnostics.  It looks like there are some drive errors, seemingly for all the drives.  Any help appreciated.

     

    I had the syslog server going so here is the syslog. 

     

    thanks

    tower-diagnostics-20221219-0623.zip syslog-192.168.1.107.log

  4. I recreated the docker.img and it reports no errors.  However, the container that was marked as unhealthy is still marked unhealthy.  Here are the new diagnostics.  I did move the docker.img from cache_hdd to cache_nvme.  Wow, the performance gain is crazy.

     

    Is the change to ipvlan a simple change in the docker settings or do I have to do something additional?  I manually set all my container IPs.

     

    thanks

    david

    tower-diagnostics-20221213-0721.zip

  5. I am still getting BTRFS issues.  Last night I noticed the new cache pool had issues.  I had also seen some CRC issues earlier.  In the past when I saw CRC errors they were caused by power issues.  I also noticed that several dockers had stopped, as the cache_hdd pool has all the dockers and it was set read only by the BTRFS errors.  The array stopped, but I couldn't shut down because one of the dockers, even though docker had shut down, was still running.  So I had to do a hard shutdown.

     

    I moved the two newly added cache_hdd drives to SATA ports on the motherboard and off the SAS port.  I also put them on their own power connection.

     

    This AM, I see that there are BTRFS errors on loop2, which is now read only, and dockers are screwed up.  The errors are not on the cache_hdd devices, just loop2.  I also see that two dockers are marked as unhealthy.  I did notice that one was marked unhealthy yesterday.  I'm beginning to think that docker.img is screwed up?  

     

    Here are the diags. tower-diagnostics-20221213-0616.zip

  6. I didn't try that.  I did notice that on 6.11.1 it has the same MAC address, which isn't the original one.  I have my router set up to static DHCP the IP.  I think this has been an issue for a while, because I had to statically apply the IP within UnRaid a while ago to get a consistent IP.

  7. Just upgraded from 6.11.1 and lost connectivity to the internet.  I can access UnRaid, but on a terminal I cannot ping outside the local network.  I got my first notification of this when Fix Common Problems reported that GitHub was unreachable.

     

    Not sure if this matters, but the MAC address changed, and I didn't move the network cable.

     

    Reverting back to 6.11.1 fixed the networking issue.

     

    tower-diagnostics-20221210-1950.zip

  8. When I run docker build, it creates temporary images, then reports that it deletes them.

     

    However, when I look at docker in the UI I see the images, and have to remove them manually.  When I do, I get one of three different results:

    1. it is removed

    2. I get a message saying it cannot be removed because it is used by another image, then it is removed anyway

    3. I get a message that the image doesn't exist.

     

    I know that when I build dockers using the UI, none of this happens.  What do I need to do to make my manual docker builds clean up after themselves?
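
    A possible cleanup step, sketched under the assumption that the leftover entries are dangling (untagged) images from earlier builds: `docker images --filter dangling=true` lists such images and `docker image prune` removes them. The helper names `run` and `cleanup_dangling_images` and the `DRY_RUN` variable are made up for this illustration and are not part of docker or UnRaid.

```shell
# Hypothetical helper: with DRY_RUN=1 the command is printed instead of executed.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# List dangling (untagged) images, then remove them without a confirmation prompt.
cleanup_dangling_images() {
    run docker images --filter dangling=true
    run docker image prune --force
}
```

    With `DRY_RUN=1` set, calling `cleanup_dangling_images` only prints the two commands so they can be reviewed first; without it, the commands run for real. `docker image prune` leaves tagged images and images still used by a container alone.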

     

    thanks

  9. I was able to get things back up and running.

     

    There were a few files I couldn't copy off the corrupted drive, fortunately they were debug logs.

     

    I then replaced the drive with two 4TB drives and created a cache pool, managed to get all my dockers set up using the new cache pool, and everything is up and running.

     

    The good news is that I was meaning to fix all this, as my setup is based on UnRaid prior to cache pools.  So now everything should work better.

     

    thanks

  10. Because of a file system failure, I'm reconfiguring my cache pools.  I had some dockers running on a drive mounted outside the array, and not in a share that exists when I reconfigured.  I did the setup before cache pools were a thing.  The downside of running UnRaid since it was first released is old setup issues.

     

    I needed to start docker, then change the settings for each affected docker to point to the new area, where I had copied the config data.  However, when I started docker it tried to autostart all the dockers, per their settings when I stopped the docker service.

     

    This caused all kinds of problems as each docker then created a path to its location, which of course didn't exist, and then the docker wouldn't work because it wasn't configured.

     

    I would have liked to start the docker service with an override set to not autostart any docker.  Is this available?

     

    Instead, I had to stop all the dockers, remove the directories they created upon starting, re-copy the data from my backup area to the appdata folder and restart the dockers.  It took a few hours to complete, but I did finally get there.

     

    Had I been able to disable autostart I could have avoided all this mess.

     

    thanks

  11. It is a bit better, the free space errors are gone:

     

    root@tower:~# btrfs check /dev/sdh1
    Opening filesystem to check...
    Checking filesystem on /dev/sdh1
    UUID: 10cf35ee-3e74-4215-a481-d7012316918c
    [1/7] checking root items
    [2/7] checking extents
    [3/7] checking free space cache
    [4/7] checking fs roots
    root 5 inode 77786 errors 200, dir isize wrong
    root 5 inode 3812802 errors 1, no inode item
            unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812803 errors 1, no inode item
            unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812804 errors 1, no inode item
            unresolved ref dir 77786 index 705847 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812806 errors 1, no inode item
            unresolved ref dir 77786 index 705849 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812807 errors 1, no inode item
            unresolved ref dir 77786 index 705851 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812808 errors 1, no inode item
            unresolved ref dir 77786 index 705853 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812809 errors 1, no inode item
            unresolved ref dir 77786 index 705855 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812810 errors 1, no inode item
            unresolved ref dir 77786 index 705857 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812811 errors 1, no inode item
            unresolved ref dir 77786 index 705859 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    ERROR: errors found in fs roots
    found 1629164007424 bytes used, error(s) found
    total csum bytes: 1560507564
    total tree bytes: 3357343744
    total fs tree bytes: 875036672
    total extent tree bytes: 483115008
    btree space waste bytes: 676869574
    file data blocks allocated: 8606183337984
     referenced 1604909236224

     

  12. As the web page hung and I could only see base files (nothing under /mnt or /boot), I rebooted.

     

    I then ran btrfs check on the drive that reported issues:

    root@tower:~# btrfs check /dev/sdh1  
    Opening filesystem to check...
    Checking filesystem on /dev/sdh1
    UUID: 10cf35ee-3e74-4215-a481-d7012316918c
    [1/7] checking root items
    [2/7] checking extents
    [3/7] checking free space cache
    block group 677561499648 has wrong amount of free space, free space cache has 475136 block group has 491520
    failed to load free space cache for block group 677561499648
    block group 853655158784 has wrong amount of free space, free space cache has 2248704 block group has 2625536
    failed to load free space cache for block group 853655158784
    ...
    block group 2235560886272 has wrong amount of free space, free space cache has 696348672 block group has 753799168
    failed to load free space cache for block group 2235560886272
    block group 2236634628096 has wrong amount of free space, free space cache has 794693632 block group has 826208256
    failed to load free space cache for block group 2236634628096
    block group 2238782111744 has wrong amount of free space, free space cache has 798904320 block group has 841105408
    failed to load free space cache for block group 2238782111744
    block group 2239855853568 has wrong amount of free space, free space cache has 782303232 block group has 834203648
    failed to load free space cache for block group 2239855853568
    block group 2240929595392 has wrong amount of free space, free space cache has 793395200 block group has 864063488
    failed to load free space cache for block group 2240929595392
    block group 2242003337216 has wrong amount of free space, free space cache has 832372736 block group has 900947968
    failed to load free space cache for block group 2242003337216
    [4/7] checking fs roots
    root 5 inode 77786 errors 200, dir isize wrong
    root 5 inode 3812802 errors 1, no inode item
            unresolved ref dir 77786 index 705843 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812803 errors 1, no inode item
            unresolved ref dir 77786 index 705845 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812804 errors 1, no inode item
            unresolved ref dir 77786 index 705847 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812806 errors 1, no inode item
            unresolved ref dir 77786 index 705849 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812807 errors 1, no inode item
            unresolved ref dir 77786 index 705851 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812808 errors 1, no inode item
            unresolved ref dir 77786 index 705853 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812809 errors 1, no inode item
            unresolved ref dir 77786 index 705855 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812810 errors 1, no inode item
            unresolved ref dir 77786 index 705857 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    root 5 inode 3812811 errors 1, no inode item
            unresolved ref dir 77786 index 705859 namelen 15 name metrics.interim filetype 1 errors 5, no dir item, no inode ref
    ERROR: errors found in fs roots
    found 1629647908864 bytes used, error(s) found
    total csum bytes: 1560507564
    total tree bytes: 3358048256
    total fs tree bytes: 875036672
    total extent tree bytes: 483098624
    btree space waste bytes: 676815308
    file data blocks allocated: 8606666534912
     referenced 1605392433152

     

    This is a drive that is unassigned.  It contains IO heavy containers, docker.img, and local backups.

     

    Should I try to repair this?

     

    tower-diagnostics-20221208-0647.zip

  13. So I've been doing a ton of:

     

    docker build 
    docker run
    docker stop
    docker rm

     

    today, as I try to get my docker container running.  I've noticed that after docker build there are several dockers shown by UnRaid that I did not create.

     

    So I've been deleting them as I go.  Kind of annoying, but after you get enough of them you get a warning that docker.img is 71% full.

     

    So I was removing several of the abandoned containers when things stopped working.

     

    On my console I see BTRFS critical (corrupt leaf) errors for sdh1 and loop2.

     

    There is now nothing in /mnt/*

     

    The web page is no longer working, but I do have a command prompt.

     

    Suggestions on how to proceed?

     

    thanks

     

  14. Ok I figured out that I need to at least do:

    docker run --net br0.10 --name heatmaster --ip='192.168.10.246' -p="5000" -d hello-world

     

    This does create it so that I can see in the Docker page the correct IP address:port.  It looks just like the others.

     

    However, when I try to go to the port I get a permission denied message.

     

    If I get a console, I can access at 127.0.0.1:5000 just fine.

     

    Here is my container:

    FROM ubuntu
    
    RUN apt-get update
    RUN apt-get -y install python3
    RUN apt-get -y install python3-pip
    RUN apt-get -y install curl
    RUN pip install flask
    RUN pip install playwright
    
    ADD app.py /
    ADD heatmaster.py /
    WORKDIR /
    
    USER nobody
    
    EXPOSE 5000
    
    CMD ["python3","app.py"]

     

    Do I have to do something on the CMD line to say to run on port 5000?
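
    One thing worth checking, assuming app.py (not shown here) uses Flask's built-in development server: by default that server listens only on 127.0.0.1, which would match the app being reachable from a console inside the container but not from the network. A minimal sketch of a launcher that binds to all interfaces follows. The function name `start_app`, the `DRY_RUN` switch, and the guess that app.py defines a Flask object named `app` are all assumptions, and the `--app` option needs Flask 2.2 or newer.

```shell
# Hypothetical launcher for the container; assumes app.py defines a Flask object named `app`.
start_app() {
    host="${1:-0.0.0.0}"   # 0.0.0.0 listens on every interface, not only loopback
    port="${2:-5000}"
    # Replace the positional parameters with the command to run.
    set -- python3 -m flask --app app run --host="$host" --port="$port"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"          # print the command for review
    else
        exec "$@"          # replace the shell with the Flask process
    fi
}
```

    If app.py calls `app.run()` itself, passing `host="0.0.0.0"` and `port=5000` to that call should have the same effect without changing the CMD line.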

     

     

  15. This is my first attempt at creating a docker container and running it on UnRaid.

     

    I create my container using docker build.  I then try to run it with:

     

    docker run --name heatmaster --ip='192.168.20.246' -d heatmaster

     

    It runs, but I don't get assigned that ip address. I can get a console into it as well, but the IP address is a 127... address, not the one I assigned.

     

    How do I go about assigning an IP address to the container?  I got my command line by looking at an UnRaid container when it started up.

     

    thanks

    david

  16. I've been getting seemingly random crashes after upgrading to 6.11.  I'm currently at 6.11.1 and will update after my parity check completes from this AM's crash.

     

    Is there a way to save the previous syslog so I can keep the crash information for the next time?

     

    I've attached my syslogs after the reboot.  I did see an out of date plg that I removed, and fixed my SMTP settings. 

     

    Ideas?

    tower-diagnostics-20221128-0810.zip