vagrantprodigy

April 23, 2021

50 minutes ago, itimpi said:

no way to know for certain then, but typically the list+found folder are where folders/files end up if their directory entry cannot be found. With only one file there it does not sound as if you had serious corruption. You can use the Linux ‘file’ command on the one in lost+found to help identify its type and thus what program to check it with.

do you have backups you can use to check against with what is on unRaid?

Only of Appdata, Docs, stuff like that. It's an 8TB drive, so I don't have a backup of all of it.

I ran du against the file, and it shows a size of 0, and the file command returns the value empty, so I'm assuming everything is there. Thank you for all of your help.

April 23, 2021

13 minutes ago, itimpi said:

Not for certain if you do not have a list of what was there before. However, if no lost+found folder was created on the drive then it is highly likely that nothing was lost.

I do have a lost+found folder, with 1 item in it. I don't have an exact list of what was on the disk previously.

April 23, 2021

The disk has remounted, and I was able to get my old docker image to work again. Is there any way to tell what, if anything, was lost?

April 23, 2021

xfs_repair -L /dev/md1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
destroyed because the -L option was used.
- scan filesystem freespace and inode maps...
agi unlinked bucket 48 is 126543024 in ag 0 (inode=126543024)
invalid start block 1481005391 in record 351 of cnt btree block 2/28863905
agf_freeblks 13906192, counted 13906190 in ag 2
sb_icount 73728, counted 49024
sb_ifree 8475, counted 13377
sb_fdblocks 252972481, counted 355543473
- found root inode chunk
Phase 3 - for each AG...
- scan and clear agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 1
- agno = 2
- agno = 0
- agno = 3
- agno = 6
- agno = 7
- agno = 5
- agno = 4
Phase 5 - rebuild AG headers and trees...
- reset superblock...
Phase 6 - check inode connectivity...
- resetting contents of realtime bitmap and summary inodes
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 126543024, moving to lost+found
Phase 7 - verify and correct link counts...
Maximum metadata LSN (78:3556288) is ahead of log (1:2).
Format log to cycle 81.
done

April 23, 2021

3 minutes ago, itimpi said:

NO! If you issue a format that will erase the contents of the disk and update parity to reflect this.

You should run the xfs_repair command without -n and with -L to repair the file system.

Thank you. I will do that now.

April 23, 2021

My shares became unmounted a few minutes ago, and after a reboot, I'm getting errors that disk 1 is unmountable. The console is showing

XFS (md1): Internal error i !: 1 at line 2111 of file fs/xfs/libxfs/xfs_)alloc.c. Caller xfs_free_ag_extent+0x3b7/0x602 [xfs]

there are a few more lines, and it ends XFS (md1): Failed to recover intents

In the meantime, all of my containers have vanished, I suspect my docker image was on that drive, and for whatever reason it isn't reading from parity?

Edit: Actually, looks like all of the data on disk1 just isn't being read from parity. Any ideas on why that data would just be missing?

Edit 2: I tried xfs_repair -n /dev/md1, and got:

Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
- scan filesystem freespace and inode maps...
agi unlinked bucket 48 is 126543024 in ag 0 (inode=126543024)
invalid start block 1481005391 in record 351 of cnt btree block 2/28863905
agf_freeblks 13906192, counted 13906190 in ag 2
sb_icount 73728, counted 49024
sb_ifree 8475, counted 13377
sb_fdblocks 252972481, counted 355543473
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
free space (2,155157649-155157650) only seen by one free space btree
- check for inodes claiming duplicate blocks...
- agno = 3
- agno = 2
- agno = 1
- agno = 5
- agno = 4
- agno = 6
- agno = 0
- agno = 7
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
disconnected inode 126543024, would move to lost+found
Phase 7 - verify link counts...
would have reset inode 126543024 nlinks from 0 to 1
No modify flag set, skipping filesystem flush and exiting.

Then I tried JorgeB's advice from

and when I mounted the disk using mount -vt xfs -o noatime,nodiratime /dev/md1 /x, I got mount: /x: mount(2) system call failed: Structure needs cleaning.

I suppose at this point I either need to do -L, or reformat the disk and rebuild from parity. Any suggestions on which are more likely to minimize data loss?

The SMART data seems to indicate the drive is fine. Should I reformat the disk and rebuild from parity?

tower-diagnostics-20210423-1250.zip

March 19, 2021

I saw online some people were having this due to macvlan problems, which popped up in 6.8 or so, and were fixed, but are now back, and the log does seem to indicate that may be the problem. Unfortunately the containers I have using this broke when I took away their static ip (pihole, unifi, and a few others), so that is not an option. Do we know if the devs are working on a fix for this?

March 18, 2021

I began having kernel panics shortly after going to 6.9.0. After going to 6.9.1 I was able to keep the server up for about 2 days, but now I'm getting the panics again (2 in 3 days now). I was able to catch it while it was crashing this time and had the syslog on screen, and I've attached the portions of that I still had up. I did notice 3 threads at 100% utilization right before it crashed, but was unable to get the terminal to respond and bring up htop to find out what was using that cpu.

syslog318.txt

March 3, 2021

For some reason disabling my 10GB connection fixed this. Very odd, but I suppose I'll have to live with gigabit for the moment.

March 3, 2021

I just updated from 6.6.7 to 6.9, and my docker tab times out, and the plugins page does load after a minute or so, but the page isn't complete. I'd really like not to roll back to 6.6.7 for the fourth time, so if anyone can help me, I'd appreciate it. Diagnostics are attached.

tower-diagnostics-20210303-1301.zip

March 18, 2020

22 minutes ago, vagrantprodigy said:

I disabled docker, and renamed the old docker.img. It just crashed again. Most of these containers have been in place since early 2018, with the exact config they had prior to me renaming the docker image.

I booted into GUI mode this time. When it froze, even the direct GUI was frozen for about 5 minutes. After that the local GUI was available, but the GUI was not available across the network. I could ping in/out of the box, though even that was intermittent.

March 18, 2020

I disabled docker, and renamed the old docker.img. It just crashed again. Most of these containers have been in place since early 2018, with the exact config they had prior to me renaming the docker image.

March 18, 2020

Since upgrading to 6.8.3 I am running a syslog server. How do I ensure this doesn't write to flash moving forward?

March 18, 2020

My server has crashed twice this morning. I haven't made any changes since a few days ago when I upgraded from 6.6.7 to 6.8.3, and did a few things to clear errors. I've attached the syslog dump and diagnostics from the flash drive.

syslog-20200318-104344.txt tower-diagnostics-20200318-1044.zip

March 14, 2020

I was able to fix this recently. The fix was to rename the syslog file on the flash drive (in the logs folder) to syslog.old. This generated a new file, and stopped the mass error messages. The original syslog file was up to 4GB in size, which I would assume is a hard limit for it.

March 14, 2020

I seem to have fixed it. Adding a metric of 2 (versus the default of 1) to the storage network gateway made br0 take precedence, and I can now access external resources.

March 12, 2020

no, I have a separate host. unRAID is on bare metal, as is ESXi.

March 12, 2020

2 minutes ago, Squid said:

Its because of the inability to reach the internet. One change in 6.8 was how docker icons are handled, and it necessitates that it redownload them, and its failing. Until it does manage that, you will see that issue (unless it's a real issue and I can supply a couple commands to fix it)

On the issue of not reaching the internet, because of all the bonding etc I'm not particularly able to help.

Good to know. So I really just have one issue to fix, which is why the network setup works in 6.6.7, and not in 6.8.3. Hopefully someone is able to assist, I'd have to have to roll back to 6.6.7 for the 7th time.

March 12, 2020

I upgraded from 6.6.7 to 6.8.3 today, and spent most of the day reinstalling containers, fixing plugins, etc to squash all of the bugs/incompatibilities. I have two remaining showstoppers. One is that the server can't reach the internet post upgrade. It appears to me that the default static route is for the wrong bridge, and therefore the traffic can't exit to the internet. The bridge it is trying to use is a local storage network (connects to my ESXi host). I have tried to delete this, but the delete button is not working.

My other issue, possibly related, is that my containers are painfully slow, my docker page takes several minutes to load, and all of the icons for my containers are missing. I just see the ? icon instead of the icon that should appear for each container. The appdata for these is on NVME storage, and neither issue existed this morning on 6.6.7.

tower-diagnostics-20200312-1611.zip

January 14, 2020

That file shows a syslog server at 192.168.0.90 ip address. That is my PRTG server. I believe I did have syslog routed there at one point. I'll stand a syslog sensor up again to see if that fixes it. If not, I'll rename the file and reboot.

January 14, 2020

1 minute ago, itimpi said:

Settings->Syslog Server

I don't have this on my server. My page map is attached.

unraid_page_map.txt

January 14, 2020

1 hour ago, itimpi said:

There are network errors around not being able to contact the syslog server you have configured to be used at address 192.168.0.50. Do you have such a server up and running a syslog server>?

I don't. Do you know offhand where the setting to disable this is?

January 14, 2020

I am still getting an error in safe mode, after NerdPack was uninstalled. I've attached the safe mode diagnostic.

tower-diagnostics-20200114-1045.zip

January 14, 2020

11 minutes ago, trurl said:

Just an update to the docker or an update to the application wouldn't have fixed the problem since the docker or the application wasn't the cause of the problem. Probably you changed something else that caused it to stop. I still think your docker image usage is too large at 38G, and docker allocation of 200G is ridiculous.

Exactly my thoughts.

Are you actually using all those NerdPack modules you install?

I believe what stopped the plex overruns was an actual unRAID update, but it was quite a while ago, so I could be mistaken.

I previously used the NerdPack modules, but I'm not at the moment. I'm going to uninstall the entire NerdPack plugin and reboot later today.

January 14, 2020

3 minutes ago, itimpi said:

Looking at those diagnostics the error messages start appearing after the NerdPack plugin starts loading modules into RAM. Do you get the same issues if you boot in Safe Mode (which stops plugins running) or even just stopping NerdPack from loading anything. Using plugins always runs the risk of them loading code modules that are incompatible with the release of Unraid that you are running and thus de-stabilising the system.

I'll give that a shot. Thanks.

vagrantprodigy

Posts

Joined

Last visited

Content Type

Profiles

Forums

Downloads

Store

Gallery

Bug Reports

Documentation

Landing

Posts posted by vagrantprodigy

Disk 1 unmountable

Disk 1 unmountable

Disk 1 unmountable

Disk 1 unmountable

Disk 1 unmountable

Disk 1 unmountable

Kernel Panics occurring only on 6.9.0 and 6.9.1

Kernel Panics occurring only on 6.9.0 and 6.9.1

Docker tab won't load in 6.9, also can't update plugins

Docker tab won't load in 6.9, also can't update plugins

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3

Unable to reach internet after upgrading to 6.8.3