vagrantprodigy

Everything posted by vagrantprodigy

  1. It has to be the br0 ones, as I've turned the others off completely. I really don't want to mess around with VLANs in my home network and complicate things further. I'll probably spin up a VM in ESXi for docker for now, and if this isn't fixed in the next few months, I may just end up migrating to a new platform. 6.7 broke things for me, as did 6.8 and 6.8.3, so I came from 6.6.7. I promised myself prior to 6.9.0 that if this was another failed upgrade, I'd look into alternatives to unRAID, which really sucks, as I have two unRAID Pro licenses and have been using unRAID for several years.
  2. The kernel panics are getting more frequent for me, despite disabling all containers I don't absolutely need. All host-network containers are disabled at this point; I have two on br0 that are still running. I had my third kernel panic in 12 hours last night. Devs, is there a fix coming for this soon? If not, despite my two licenses, I need to start looking at other platforms, because this is causing a huge problem for me.
  3. Only of Appdata, Docs, stuff like that. It's an 8TB drive, so I don't have a backup of all of it. I ran du against the file and it shows a size of 0, and the file command reports it as empty, so I'm assuming everything is still there (a quick way to double-check the recovered item is sketched after this list). Thank you for all of your help.
  4. I do have a lost+found folder, with 1 item in it. I don't have an exact list of what was on the disk previously.
  5. The disk has remounted, and I was able to get my old docker image to work again. Is there any way to tell what, if anything, was lost?
  6. Output of xfs_repair -L /dev/md1:
     Phase 1 - find and verify superblock...
     Phase 2 - using internal log
         - zero log...
     ALERT: The filesystem has valuable metadata changes in a log which is being destroyed because the -L option was used.
         - scan filesystem freespace and inode maps...
     agi unlinked bucket 48 is 126543024 in ag 0 (inode=126543024)
     invalid start block 1481005391 in record 351 of cnt btree block 2/28863905
     agf_freeblks 13906192, counted 13906190 in ag 2
     sb_icount 73728, counted 49024
     sb_ifree 8475, counted 13377
     sb_fdblocks 252972481, counted 355543473
         - found root inode chunk
     Phase 3 - for each AG...
         - scan and clear agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - agno = 4
         - agno = 5
         - agno = 6
         - agno = 7
         - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
         - check for inodes claiming duplicate blocks...
         - agno = 1
         - agno = 2
         - agno = 0
         - agno = 3
         - agno = 6
         - agno = 7
         - agno = 5
         - agno = 4
     Phase 5 - rebuild AG headers and trees...
         - reset superblock...
     Phase 6 - check inode connectivity...
         - resetting contents of realtime bitmap and summary inodes
         - traversing filesystem ...
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
     disconnected inode 126543024, moving to lost+found
     Phase 7 - verify and correct link counts...
     Maximum metadata LSN (78:3556288) is ahead of log (1:2).
     Format log to cycle 81.
     done
  7. My shares became unmounted a few minutes ago, and after a reboot I'm getting errors that disk 1 is unmountable. The console is showing:
     XFS (md1): Internal error i != 1 at line 2111 of file fs/xfs/libxfs/xfs_alloc.c. Caller xfs_free_ag_extent+0x3b7/0x602 [xfs]
     There are a few more lines, and it ends with:
     XFS (md1): Failed to recover intents
     In the meantime, all of my containers have vanished. I suspect my docker image was on that drive, and for whatever reason it isn't being read from parity?
     Edit: Actually, it looks like all of the data on disk1 just isn't being read from parity. Any ideas on why that data would just be missing?
     Edit 2: I tried xfs_repair -n /dev/md1, and got:
     Phase 1 - find and verify superblock...
     Phase 2 - using internal log
         - zero log...
     ALERT: The filesystem has valuable metadata changes in a log which is being ignored because the -n option was used. Expect spurious inconsistencies which may be resolved by first mounting the filesystem to replay the log.
         - scan filesystem freespace and inode maps...
     agi unlinked bucket 48 is 126543024 in ag 0 (inode=126543024)
     invalid start block 1481005391 in record 351 of cnt btree block 2/28863905
     agf_freeblks 13906192, counted 13906190 in ag 2
     sb_icount 73728, counted 49024
     sb_ifree 8475, counted 13377
     sb_fdblocks 252972481, counted 355543473
         - found root inode chunk
     Phase 3 - for each AG...
         - scan (but don't clear) agi unlinked lists...
         - process known inodes and perform inode discovery...
         - agno = 0
         - agno = 1
         - agno = 2
         - agno = 3
         - agno = 4
         - agno = 5
         - agno = 6
         - agno = 7
         - process newly discovered inodes...
     Phase 4 - check for duplicate blocks...
         - setting up duplicate extent list...
     free space (2,155157649-155157650) only seen by one free space btree
         - check for inodes claiming duplicate blocks...
         - agno = 3
         - agno = 2
         - agno = 1
         - agno = 5
         - agno = 4
         - agno = 6
         - agno = 0
         - agno = 7
     No modify flag set, skipping phase 5
     Phase 6 - check inode connectivity...
         - traversing filesystem ...
         - traversal finished ...
         - moving disconnected inodes to lost+found ...
     disconnected inode 126543024, would move to lost+found
     Phase 7 - verify link counts...
     would have reset inode 126543024 nlinks from 0 to 1
     No modify flag set, skipping filesystem flush and exiting.
     Then I tried JorgeB's advice, and when I mounted the disk using mount -vt xfs -o noatime,nodiratime /dev/md1 /x, I got:
     mount: /x: mount(2) system call failed: Structure needs cleaning.
     I suppose at this point I either need to run xfs_repair with -L, or reformat the disk and rebuild from parity. Any suggestions on which is more likely to minimize data loss? The SMART data seems to indicate the drive is fine. Should I just reformat the disk and rebuild from parity? (The repair sequence discussed here is sketched after this list.) tower-diagnostics-20210423-1250.zip
  8. I had the crashes on 6.9.0 and 6.9.1, and updated to 6.9.2 shortly after it was released. Mine had a kernel panic overnight, so I can confirm this issue is NOT resolved. My original support thread with logs is here:
  9. I saw online that some people were having this due to macvlan problems, which popped up around 6.8 and were fixed, but are now back, and the log does seem to indicate that may be the problem. Unfortunately the containers I have using this (pihole, unifi, and a few others) broke when I took away their static IP, so that is not an option. Do we know if the devs are working on a fix for this?
  10. I began having kernel panics shortly after going to 6.9.0. After going to 6.9.1 I was able to keep the server up for about 2 days, but now I'm getting the panics again (2 in 3 days now). I was able to catch it while it was crashing this time and had the syslog on screen, and I've attached the portions of it that I still had up. I did notice 3 threads at 100% utilization right before it crashed, but was unable to get the terminal to respond and bring up htop to find out what was using that CPU. syslog318.txt
  11. For some reason disabling my 10Gb connection fixed this. Very odd, but I suppose I'll have to live with gigabit for the moment.
  12. I just updated from 6.6.7 to 6.9, and my docker tab times out, and the plugins page does load after a minute or so, but the page isn't complete. I'd really rather not roll back to 6.6.7 for the fourth time, so if anyone can help me, I'd appreciate it. Diagnostics are attached. tower-diagnostics-20210303-1301.zip
  13. Any update on ZFS pools? I'd love to have a much more viable option than BTRFS for compression.
  14. I booted into GUI mode this time. When it froze, even the direct GUI was frozen for about 5 minutes. After that the local GUI was available, but the GUI was not available across the network. I could ping in/out of the box, though even that was intermittent.
  15. I disabled docker, and renamed the old docker.img. It just crashed again. Most of these containers have been in place since early 2018, with the exact config they had prior to me renaming the docker image.
  16. Since upgrading to 6.8.3 I am running a syslog server. How do I ensure this doesn't write to flash moving forward?
  17. My server has crashed twice this morning. I haven't made any changes since a few days ago when I upgraded from 6.6.7 to 6.8.3, and did a few things to clear errors. I've attached the syslog dump and diagnostics from the flash drive. syslog-20200318-104344.txt tower-diagnostics-20200318-1044.zip
  18. I was able to fix this recently. The fix was to rename the syslog file on the flash drive (in the logs folder) to syslog.old. This generated a new file and stopped the flood of error messages. The original syslog file had grown to 4GB, which I would assume is a hard limit for it. (The rename is sketched as commands after this list.)
  19. I seem to have fixed it. Adding a metric of 2 (versus the default of 1) to the storage network gateway made br0 take precedence, and I can now access external resources (see the route sketch after this list).
  20. No, I have a separate host. unRAID is on bare metal, as is ESXi.
  21. Good to know. So I really just have one issue to fix, which is why the network setup works in 6.6.7 and not in 6.8.3. Hopefully someone is able to assist; I'd hate to have to roll back to 6.6.7 for the 7th time.
  22. I upgraded from 6.6.7 to 6.8.3 today, and spent most of the day reinstalling containers, fixing plugins, etc. to squash all of the bugs/incompatibilities. I have two remaining showstoppers. One is that the server can't reach the internet post-upgrade. It appears to me that the default static route is for the wrong bridge, and therefore the traffic can't exit to the internet. The bridge it is trying to use is a local storage network (it connects to my ESXi host). I have tried to delete this route, but the delete button is not working. My other issue, possibly related, is that my containers are painfully slow, my docker page takes several minutes to load, and all of the icons for my containers are missing. I just see the ? icon instead of the icon that should appear for each container. The appdata for these is on NVMe storage, and neither issue existed this morning on 6.6.7. tower-diagnostics-20200312-1611.zip
  23. That file shows a syslog server at IP address 192.168.0.90. That is my PRTG server. I believe I did have syslog routed there at one point. I'll stand a syslog sensor up again to see if that fixes it. If not, I'll rename the file and reboot.
  24. I don't have this on my server. My page map is attached. unraid_page_map.txt
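
A quick way to double-check the single lost+found item mentioned in post 3; this is only a sketch, and the path is hypothetical (on unRAID the disk is usually mounted under /mnt/disk1, and xfs_repair names recovered files after their inode number):

    # Hypothetical path: the one item xfs_repair moved to lost+found
    du -sh /mnt/disk1/lost+found/126543024    # reports 0, matching the post
    file /mnt/disk1/lost+found/126543024      # reports: empty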
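
A minimal sketch of the XFS recovery sequence discussed in post 7, assuming the array is started in maintenance mode and disk 1 is /dev/md1; note that -L discards the metadata log, can lose the most recent changes, and should be a last resort:

    # Dry run first: report problems without changing anything
    xfs_repair -n /dev/md1
    # Try mounting so the journal gets replayed (this alone is often enough)
    mkdir -p /x && mount -vt xfs -o noatime,nodiratime /dev/md1 /x
    # If the mount fails with "Structure needs cleaning", repair for real;
    # add -L only if xfs_repair refuses to run because of the dirty log
    xfs_repair /dev/md1
    # Afterwards, check lost+found on the remounted disk for recovered inodes
    ls -l /mnt/disk1/lost+found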
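
The syslog fix from post 18, sketched as commands; this assumes the flash drive is mounted at /boot and the log lives in a logs folder there, which may differ on other setups:

    # Rename the oversized syslog so a fresh file is created on the next write
    mv /boot/logs/syslog /boot/logs/syslog.old
    ls -lh /boot/logs    # the old file had grown to roughly 4GB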
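
A rough illustration of the route-metric change from post 19; the interface names and gateway addresses here are made up, and on unRAID the metric is normally set per gateway on the Network Settings page rather than from the shell:

    ip route show default
    # default via 192.168.1.1 dev br0 metric 1    <- internet-facing bridge, should win
    # default via 10.10.10.1 dev br1 metric 2     <- storage network, demoted by the higher metric
    # One-off equivalent from the shell (hypothetical addresses):
    ip route replace default via 10.10.10.1 dev br1 metric 2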