[6.8.3] "user" directory colored red


MvL


I had a strange issue and I would like to know what could be causing it.

 

My "user" directory vanished from my /mnt directory. The "user0" directory was colored red. I discovered it when my Docker containers were failing. I don't have a log file because I forgot to generate one before I rebooted.

Edited by MvL
Link to comment

It happened again and now I have downloaded the diagnostic file!

 

I have to correct my description: the user directory is not gone, but it is colored red. The containers also stop working, and I have to reboot the server to get everything working again. Does this have something to do with the overlay file system?

 

I think the problems started with a NAS that was mounted via Unassigned Devices. I powered down the NAS before I unmounted the share in Unassigned Devices, and I think something goes wrong at that point.

tower-diagnostics-20200402-2007.zip

Link to comment

This is what made /mnt/user go away:

Apr  2 20:01:48 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

But I have no idea what caused it or what it means exactly. You could try running for a day or so without any dockers/VMs; if there are no issues, start enabling them one by one.
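
To check whether the same assertion shows up again after a later failure, the syslog can be searched for it. This is only a minimal sketch: the default path /var/log/syslog is an assumption, and the function names are made up for illustration. A syslog file extracted from a diagnostics zip can be passed as the argument instead.

```shell
# Count shfs/FUSE assertion lines in a syslog file.
# The default path is an assumption; pass another file as the first argument.
count_shfs_asserts() {
    logfile="${1:-/var/log/syslog}"
    # -c prints the number of matching lines, -F matches a fixed string.
    grep -cF 'shfs: ../lib/fuse.c' -- "$logfile"
}

# Print the matching lines, including their timestamps.
show_shfs_asserts() {
    logfile="${1:-/var/log/syslog}"
    grep -F 'unlink_node: Assertion' -- "$logfile"
}

# Example: show_shfs_asserts /var/log/syslog
```

The timestamps of the matches can then be compared with whatever was running at that moment (containers, scripts, file operations on a share).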

Link to comment

Hi Johnnie, thanks for having a look.

 

I have the impression that it happens when I mount my NAS with Unassigned Devices and forget to unmount it before I switch it off. I'm not at all certain that this is the problem. To start, I won't mount any remote shares; if it happens again, I will move on to the containers and VMs.

Link to comment

Okay, it happened again.

 

Quote

Apr 3 11:34:07 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

 

I was messing with two containers when it happened: linuxserver/mariadb and linuxserver/nextcloud. To be clear, nothing was mounted via Unassigned Devices. Yesterday when it happened I was also messing with containers, linuxserver/mariadb and linuxserver/piwigo. So if it is a container problem, then it must be linuxserver/mariadb, or is there something wrong with Docker in combination with SHFS? What uses SHFS? Unassigned Devices? If it is the linuxserver/mariadb container, shouldn't there be more reports?

 

Update:

 

No, of course SHFS is also part of unRAID itself.

 

df -h
df: /mnt/user: Transport endpoint is not connected
Filesystem      Size  Used Avail Use% Mounted on
rootfs           63G  640M   63G   1% /
tmpfs            32M  372K   32M   2% /run
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /dev/shm
cgroup_root     8.0M     0  8.0M   0% /sys/fs/cgroup
tmpfs           128M  392K  128M   1% /var/log
/dev/sda1        29G  438M   29G   2% /boot
/dev/loop0      9.2M  9.2M     0 100% /lib/modules
/dev/loop1      7.3M  7.3M     0 100% /lib/firmware
tmpfs           1.0M     0  1.0M   0% /mnt/disks
/dev/md1        9.1T  9.1T   66M 100% /mnt/disk1
/dev/md2        9.1T  9.1T  7.3G 100% /mnt/disk2
/dev/md3        9.1T  867G  8.3T  10% /mnt/disk3
/dev/md5        9.1T  9.0T  135G  99% /mnt/disk5
/dev/md6        9.1T  9.0T  182G  99% /mnt/disk6
/dev/md7        9.1T  7.6T  1.6T  83% /mnt/disk7
/dev/md8        9.1T  7.3T  1.9T  80% /mnt/disk8
/dev/md9        9.1T  7.4T  1.7T  82% /mnt/disk9
/dev/md10       9.1T  5.5T  3.7T  60% /mnt/disk10
/dev/md13       9.1T  3.3T  5.9T  36% /mnt/disk13
/dev/sdb1       448G  336G  111G  76% /mnt/cache
shfs             91T   68T   24T  75% /mnt/user0   *********************************
/dev/sdi1       1.9T  156G  1.7T   9% /mnt/disks/downloads
/dev/sdc1       1.9T  1.4T  520G  73% /mnt/disks/WDC_WD20EARS-00MVWB0_WD-WMAZA2219709
/dev/loop2       64G  3.2G   59G   6% /var/lib/docker
/dev/loop3      1.0G   17M  905M   2% /etc/libvirt

 

I think "/mnt/user" is missing? I will compare this output once I have rebooted the server.
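
A quicker check than reading the whole df output is to stat the mount points directly. A dead FUSE mount usually fails with the same "Transport endpoint is not connected" error that df printed above. This is a rough sketch; the function names are made up for illustration.

```shell
# Return 0 if the path can be stat'ed, non-zero otherwise.
# On a crashed FUSE mount, stat is expected to fail.
path_alive() {
    stat -- "$1" >/dev/null 2>&1
}

# Print one status line for each path given as an argument.
report_mounts() {
    for p in "$@"; do
        if path_alive "$p"; then
            echo "$p: ok"
        else
            echo "$p: NOT accessible"
        fi
    done
}

# Example: report_mounts /mnt/user /mnt/user0
```

Note that this only tells whether the path is reachable at that moment, not why it stopped responding.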

Edited by MvL
Link to comment

Indeed it is missing!

 

root@Tower:~# df -h
Filesystem      Size  Used Avail Use% Mounted on
rootfs           63G  627M   63G   1% /
tmpfs            32M  372K   32M   2% /run
devtmpfs         63G     0   63G   0% /dev
tmpfs            63G     0   63G   0% /dev/shm
cgroup_root     8.0M     0  8.0M   0% /sys/fs/cgroup
tmpfs           128M  308K  128M   1% /var/log
/dev/sda1        29G  438M   29G   2% /boot
/dev/loop0      9.2M  9.2M     0 100% /lib/modules
/dev/loop1      7.3M  7.3M     0 100% /lib/firmware
tmpfs           1.0M     0  1.0M   0% /mnt/disks
/dev/md1        9.1T  9.1T   66M 100% /mnt/disk1
/dev/md2        9.1T  9.1T  7.3G 100% /mnt/disk2
/dev/md3        9.1T  867G  8.3T  10% /mnt/disk3
/dev/md5        9.1T  9.0T  135G  99% /mnt/disk5
/dev/md6        9.1T  9.0T  182G  99% /mnt/disk6
/dev/md7        9.1T  7.6T  1.6T  83% /mnt/disk7
/dev/md8        9.1T  7.3T  1.9T  80% /mnt/disk8
/dev/md9        9.1T  7.4T  1.7T  82% /mnt/disk9
/dev/md10       9.1T  5.5T  3.7T  60% /mnt/disk10
/dev/md13       9.1T  3.3T  5.9T  36% /mnt/disk13
/dev/sdb1       448G  336G  111G  76% /mnt/cache
shfs             91T   68T   24T  75% /mnt/user0
shfs             92T   68T   24T  75% /mnt/user
/dev/sdi1       1.9T  156G  1.7T   9% /mnt/disks/downloads
/dev/sdc1       1.9T  1.4T  520G  73% /mnt/disks/WDC_WD20EARS-00MVWB0_WD-WMAZA2219709
/dev/loop2       64G  3.2G   59G   6% /var/lib/docker

 

MariaDB is the only container I also used yesterday when I was messing around. I'm going to try another database. Let's see what happens! (I don't want to report something if I'm not completely sure.)

Edited by MvL
Link to comment

The latest picture is from after a reboot, so everything is normal again...

7 minutes ago, itimpi said:

What is missing?  /mnt/user is showing in the screenshot!

If you look at the first picture, it is missing there. I also have this error in the logs:

 

Quote

Apr 3 11:34:07 Tower shfs: shfs: ../lib/fuse.c:1451: unlink_node: Assertion `node->nlookup > 1' failed.

 

 

Link to comment
  • 2 weeks later...
  • 1 month later...
  • 3 months later...
  • 1 year later...

Ran into this problem for the first time today. I was renaming and deleting some empty files on the cache drive when I lost access to all shares. I had a script that was adding and deleting files in the same directory. I was working through a Samba share, but the script was running on Unraid itself, so not through Samba. I'm using 6.8.3.

 

Weirdly (and fortunately) the VMs were still running fine, and rebooting appears to have restored everything.

Link to comment
  • 1 month later...

This issue has started popping up for me, and I hadn't seen it before. What I changed recently is that I added a script that renames a file, actually a couple of times, and then runs shred on it. I'm not sure if it's the shred or the rename. The file remains in the same directory; I'm just changing its extension. I mention this because I read another post where someone mentioned that renames in the same directory were related to this issue.

 

I am using 6.8.3.

 

I'm going to experiment with removing the mv (rename really) command.
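
For anyone who wants to test the rename theory in isolation, a loop like the one below exercises only the create, rename and delete pattern. It is a sketch, not a confirmed reproducer: nothing in this thread shows that it reliably triggers the failure, and if it does, the shares will drop until a reboot. The function name and the example share path are made up.

```shell
# Repeatedly create a file, rename it within the same directory, then delete it.
#   $1: directory to exercise (use a throwaway test directory)
#   $2: number of iterations (defaults to 100)
rename_delete_loop() {
    dir="$1"
    count="${2:-100}"
    i=0
    while [ "$i" -lt "$count" ]; do
        : > "$dir/test_$i.tmp"                       # create an empty file
        mv -- "$dir/test_$i.tmp" "$dir/test_$i.bak"  # rename in place, extension only
        rm -- "$dir/test_$i.bak"                     # unlink the renamed file
        i=$((i + 1))
    done
}

# Example (hypothetical share name):
# rename_delete_loop /mnt/user/scratch 1000
```

Running the same loop against the matching disk path (for example under /mnt/cache) would show whether only the user share path is affected.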

Edited by scorcho99
Link to comment
  • 2 weeks later...
  • 3 weeks later...

Well, this happened to me again today. I thought it was the shred command that did it, but shred wasn't involved in today's case. I believe the trigger is again a script that uses "mv" to rename a file and then deletes it. I'll try removing that part of the action and see if it happens again. The shred command probably performs a similar action when unlinking a file name, so they might be the same type of cause.

 

Attached are diagnostics in the degraded state. VMs seem to stay running if they were up and I actually still have access to an unassigned device share.

 

Not 100% sure this is the only case, but the files are being changed on a share on my cache drive, which is an SSD and uses btrfs.

 

I only have one docker, a minecraft server, and I haven't actually run it in months.

 

I have an NFS share, not sure that is relevant but I only enabled that in the past couple months and it sort of matches the timeline of when this started occurring. But the files are being edited through samba shares, not NFS.

tower-diagnostics-20220112-0934.zip

Link to comment

I'd rather not disable NFS since it's solving a problem I had with samba shares. I already disabled hardlinks awhile back.

 

Does it make sense that NFS is involved if nothing is interacting with any NFS shares at the time of failure? The share is mounted in a single VM that was not running. And actually, looking back, it's not even clear whether I'd enabled NFS when I first had this problem.

 

I don't use Tdarr.

Link to comment
1 hour ago, scorcho99 said:

I believe the trigger is again a script that uses "mv" to rename a file and then deletes it.

This is probably the problem. You can likely get around it if you run the script on a disk share instead of a user share, or on /mnt/user0 instead of /mnt/user if the files are on the array.
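
As a rough illustration of the disk share suggestion, a script can rewrite a user share path to the matching disk path before touching the file. This sketch assumes the file really lives on the disk that is named (cache by default); the function name and the share name in the example are made up.

```shell
# Rewrite /mnt/user/<share>/... to /mnt/<disk>/<share>/...
#   $1: path under /mnt/user
#   $2: disk name such as cache or disk3 (defaults to cache; an assumption)
to_disk_path() {
    path="$1"
    disk="${2:-cache}"
    case "$path" in
        /mnt/user/*) printf '/mnt/%s/%s\n' "$disk" "${path#/mnt/user/}" ;;
        *)           printf '%s\n' "$path" ;;  # not a user share path, leave unchanged
    esac
}

# Example (hypothetical share name):
# mv -- "$(to_disk_path /mnt/user/scans/a.tmp)" "$(to_disk_path /mnt/user/scans/a.bak)"
```

If the share spans several disks, the script would first have to find out which disk holds the file, which this sketch does not do.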

Link to comment
1 hour ago, JorgeB said:

This is probably the problem. You can likely get around it if you run the script on a disk share instead of a user share, or on /mnt/user0 instead of /mnt/user if the files are on the array.

 

I reworked the script to instead copy and delete the original since that was a workable solution for this case.
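
The copy-and-delete approach could look roughly like this. It is a sketch with a made-up function name, not the actual script from this post, and nothing here confirms that it avoids the shfs assertion in every case.

```shell
# Replace an in-place rename with a copy followed by deleting the original.
#   $1: source file
#   $2: destination file
copy_then_delete() {
    src="$1"
    dst="$2"
    # -p preserves mode, ownership and timestamps where possible.
    # The original is only removed if the copy succeeded.
    cp -p -- "$src" "$dst" && rm -- "$src"
}

# Example (hypothetical file names):
# copy_then_delete /mnt/user/scans/a.tmp /mnt/user/scans/a.bak
```

Unlike mv within one filesystem, this writes the data a second time, so it is only practical for small files.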

 

I think I figured out why shred was so effective at triggering it all of a sudden. I was using the "-u" option. If you look at the help for shred, by default that renames the file before deleting it (or rather, it says it obfuscates the file name before unlinking). I bet if I drop that option it will be OK.
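
GNU shred also lets you keep the removal but skip the file name obfuscation by passing a HOW value to --remove. The sketch below assumes a GNU coreutils version whose shred accepts --remove=unlink; whether that is enough to avoid the shfs crash has not been tested in this thread. The function name is made up.

```shell
# Overwrite a file and then remove it with a plain unlink,
# without first renaming (obfuscating) the file name.
# Assumes GNU shred with support for --remove=HOW.
shred_no_rename() {
    shred --remove=unlink -- "$1"
}

# Example (hypothetical file name):
# shred_no_rename /mnt/user/scans/secret.bak
```

Plain "-u" is documented as using the wipesync mode, which is the one that renames the file step by step before unlinking it.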

 

Good ideas on the disk shares, not sure I'll go that route but I think it could be made to work.

 

Edited by scorcho99
Link to comment
