vagrantprodigy


Posts posted by vagrantprodigy

  1. 50 minutes ago, itimpi said:


    No way to know for certain then, but typically the lost+found folder is where folders/files end up if their directory entry cannot be found. With only one file there it does not sound as if you had serious corruption. You can use the Linux ‘file’ command on the one in lost+found to help identify its type and thus what program to check it with.

     

    Do you have backups you can use to check against what is on unRaid?

    Only of Appdata, Docs, stuff like that. It's an 8TB drive, so I don't have a backup of all of it.

     

    I ran du against the file and it shows a size of 0, and the file command reports it as empty, so I'm assuming everything is there. Thank you for all of your help.
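    For the record, the check looked roughly like this (a sketch; the /mnt/disk1 mount point and the inode-numbered file name are illustrative, based on how xfs_repair names recovered items):

    du -h /mnt/disk1/lost+found/126543024     # reports the size of the recovered item (0 in my case)
    file /mnt/disk1/lost+found/126543024      # identifies the file type ("empty" for a zero-byte file)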

  2. 13 minutes ago, itimpi said:


    Not for certain if you do not have a list of what was there before. However, if no lost+found folder was created on the drive then it is highly likely that nothing was lost.

    I do have a lost+found folder, with 1 item in it. I don't have an exact list of what was on the disk previously.

  3. xfs_repair -L /dev/md1
    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ALERT: The filesystem has valuable metadata changes in a log which is being
    destroyed because the -L option was used.
            - scan filesystem freespace and inode maps...
    agi unlinked bucket 48 is 126543024 in ag 0 (inode=126543024)
    invalid start block 1481005391 in record 351 of cnt btree block 2/28863905
    agf_freeblks 13906192, counted 13906190 in ag 2
    sb_icount 73728, counted 49024
    sb_ifree 8475, counted 13377
    sb_fdblocks 252972481, counted 355543473
            - found root inode chunk
    Phase 3 - for each AG...
            - scan and clear agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
            - check for inodes claiming duplicate blocks...
            - agno = 1
            - agno = 2
            - agno = 0
            - agno = 3
            - agno = 6
            - agno = 7
            - agno = 5
            - agno = 4
    Phase 5 - rebuild AG headers and trees...
            - reset superblock...
    Phase 6 - check inode connectivity...
            - resetting contents of realtime bitmap and summary inodes
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    disconnected inode 126543024, moving to lost+found
    Phase 7 - verify and correct link counts...
    Maximum metadata LSN (78:3556288) is ahead of log (1:2).
    Format log to cycle 81.
    done

  4. 3 minutes ago, itimpi said:

    NO! If you issue a format, it will erase the contents of the disk and update parity to reflect this.

     

    You should run the xfs_repair command without -n and with -L to repair the file system.

    Thank you. I will do that now.
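    For reference, the repair command being described is just the earlier check without -n and with -L added (a sketch, assuming the array is started in maintenance mode and disk 1 maps to /dev/md1 as in the output above):

    xfs_repair -L /dev/md1    # -L zeroes the corrupt metadata log before repairing, which can lose the most recent metadata changes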

  5. My shares became unmounted a few minutes ago, and after a reboot, I'm getting errors that disk 1 is unmountable. The console is showing

    XFS (md1): Internal error i != 1 at line 2111 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x3b7/0x602 [xfs]

    There are a few more lines, and it ends with XFS (md1): Failed to recover intents

     

    In the meantime, all of my containers have vanished. I suspect my docker image was on that drive, and for whatever reason it isn't being read from parity?

     

    Edit: Actually, it looks like all of the data on disk 1 just isn't being read from parity. Any ideas on why that data would just be missing?

     

    Edit 2: I tried xfs_repair -n /dev/md1, and got:

     

    Phase 1 - find and verify superblock...
    Phase 2 - using internal log
            - zero log...
    ALERT: The filesystem has valuable metadata changes in a log which is being
    ignored because the -n option was used.  Expect spurious inconsistencies
    which may be resolved by first mounting the filesystem to replay the log.
            - scan filesystem freespace and inode maps...
    agi unlinked bucket 48 is 126543024 in ag 0 (inode=126543024)
    invalid start block 1481005391 in record 351 of cnt btree block 2/28863905
    agf_freeblks 13906192, counted 13906190 in ag 2
    sb_icount 73728, counted 49024
    sb_ifree 8475, counted 13377
    sb_fdblocks 252972481, counted 355543473
            - found root inode chunk
    Phase 3 - for each AG...
            - scan (but don't clear) agi unlinked lists...
            - process known inodes and perform inode discovery...
            - agno = 0
            - agno = 1
            - agno = 2
            - agno = 3
            - agno = 4
            - agno = 5
            - agno = 6
            - agno = 7
            - process newly discovered inodes...
    Phase 4 - check for duplicate blocks...
            - setting up duplicate extent list...
    free space (2,155157649-155157650) only seen by one free space btree
            - check for inodes claiming duplicate blocks...
            - agno = 3
            - agno = 2
            - agno = 1
            - agno = 5
            - agno = 4
            - agno = 6
            - agno = 0
            - agno = 7
    No modify flag set, skipping phase 5
    Phase 6 - check inode connectivity...
            - traversing filesystem ...
            - traversal finished ...
            - moving disconnected inodes to lost+found ...
    disconnected inode 126543024, would move to lost+found
    Phase 7 - verify link counts...
    would have reset inode 126543024 nlinks from 0 to 1
    No modify flag set, skipping filesystem flush and exiting.

     

    Then I tried JorgeB's advice from the linked post, and when I mounted the disk using mount -vt xfs -o noatime,nodiratime /dev/md1 /x, I got: mount: /x: mount(2) system call failed: Structure needs cleaning.

     

    I suppose at this point I either need to do -L, or reformat the disk and rebuild from parity. Any suggestions on which is more likely to minimize data loss?

     

    The SMART data seems to indicate the drive is fine. Should I reformat the disk and rebuild from parity?
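    To recap the options I was weighing, roughly from least to most destructive (a sketch of the commands already shown above, not official guidance):

    xfs_repair -n /dev/md1                             # read-only check, changes nothing
    mount -vt xfs -o noatime,nodiratime /dev/md1 /x    # try to mount so the log can replay; failed here with "Structure needs cleaning"
    xfs_repair -L /dev/md1                             # last resort: zero the log and repair, possibly losing recent metadata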

    tower-diagnostics-20210423-1250.zip

  6. I saw online that some people were having this due to macvlan problems, which popped up around 6.8, were fixed, and have now come back, and the log does seem to indicate that may be the problem. Unfortunately the containers I have using this (pihole, unifi, and a few others) broke when I took away their static IPs, so that is not an option. Do we know if the devs are working on a fix for this?
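    For context, this is roughly the kind of setup those containers depend on, expressed as plain docker commands (a sketch only; the subnet, gateway, parent bridge, and addresses are made-up examples rather than my actual config, and on unRAID this is normally configured through the container template rather than the CLI):

    docker network create -d macvlan \
      --subnet=192.168.1.0/24 --gateway=192.168.1.1 \
      -o parent=br0 br0-static
    docker run -d --name pihole --network br0-static --ip 192.168.1.50 pihole/pihole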

    • Like 1
  7. I began having kernel panics shortly after going to 6.9.0. After going to 6.9.1 I was able to keep the server up for about 2 days, but now I'm getting the panics again (2 in 3 days now). I was able to catch it while it was crashing this time and had the syslog on screen, and I've attached the portions of it that I still had up. I did notice 3 threads at 100% utilization right before it crashed, but I was unable to get the terminal to respond and bring up htop to find out what was using that CPU.
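    For anyone trying to catch one of these live, one way to keep the syslog on screen and also keep a copy that survives the crash (a sketch; the path on the flash drive is just an illustrative target):

    tail -f /var/log/syslog | tee /boot/logs/syslog-capture.txt    # follow the live log and write a copy to the flash drive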

    syslog318.txt

  8. 22 minutes ago, vagrantprodigy said:

    I disabled docker, and renamed the old docker.img. It just crashed again. Most of these containers have been in place since early 2018, with the exact config they had prior to me renaming the docker image.

    I booted into GUI mode this time. When it froze, even the direct GUI was frozen for about 5 minutes. After that the local GUI was available, but the GUI was not available across the network. I could ping in/out of the box, though even that was intermittent.

  9. 2 minutes ago, Squid said:

    It's because of the inability to reach the internet. One change in 6.8 was how docker icons are handled, and it necessitates that they be redownloaded, and it's failing. Until it does manage that, you will see that issue (unless it's a real issue, in which case I can supply a couple of commands to fix it).

     

    On the issue of not reaching the internet, because of all the bonding etc I'm not particularly able to help.

    Good to know. So I really just have one issue to fix, which is why the network setup works in 6.6.7 but not in 6.8.3. Hopefully someone is able to assist; I'd hate to have to roll back to 6.6.7 for the 7th time.

  10. I upgraded from 6.6.7 to 6.8.3 today, and spent most of the day reinstalling containers, fixing plugins, etc. to squash all of the bugs/incompatibilities. I have two remaining showstoppers. One is that the server can't reach the internet post-upgrade. It appears to me that the default static route is on the wrong bridge, and therefore the traffic can't exit to the internet. The bridge it is trying to use is a local storage network (it connects to my ESXi host). I have tried to delete this route, but the delete button is not working.
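    To illustrate the routing problem, this is the kind of console check I mean (a rough sketch; the gateway addresses and bridge names below are made up rather than my actual values, and route changes made this way would not survive a reboot):

    ip route show                                  # see which bridge currently holds the default route
    ip route del default via 10.10.10.1 dev br1    # remove the default route pointing at the storage bridge
    ip route add default via 192.168.1.1 dev br0   # re-add it via the LAN bridge and the real gateway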

     

    My other issue, possibly related, is that my containers are painfully slow, my docker page takes several minutes to load, and all of the icons for my containers are missing. I just see the ? icon instead of the icon that should appear for each container. The appdata for these is on NVME storage, and neither issue existed this morning on 6.6.7.

     

     

    unraid683error.JPG

    tower-diagnostics-20200312-1611.zip

  11. 11 minutes ago, trurl said:

    Just an update to the docker or an update to the application wouldn't have fixed the problem, since the docker or the application wasn't the cause of the problem. You probably changed something else that caused it to stop. I still think your docker image usage is too large at 38G, and a docker allocation of 200G is ridiculous.

    Exactly my thoughts.

     

    Are you actually using all those NerdPack modules you install?

    I believe what stopped the plex overruns was an actual unRAID update, but it was quite a while ago, so I could be mistaken.

     

    I previously used the NerdPack modules, but I'm not at the moment. I'm going to uninstall the entire NerdPack plugin and reboot later today.

  12. 3 minutes ago, itimpi said:

    Looking at those diagnostics, the error messages start appearing after the NerdPack plugin starts loading modules into RAM. Do you get the same issues if you boot in Safe Mode (which stops plugins from running), or even just stop NerdPack from loading anything? Using plugins always runs the risk of them loading code modules that are incompatible with the release of Unraid you are running, thus destabilising the system.

    I'll give that a shot. Thanks.