Entire folder directories become unresponsive

thatja · May 17

Managed to get this off the flash drive in /boot/logs

syslog

thatja · May 17

Crashed 10 minutes after a reboot this time.

syslog

thatja · May 17

plexified-diagnostics-20240517-1052.zip

/mnt inaccessible again

Edited May 17 by thatja

JorgeB · May 17

I still don't see anything relevant logged. Did you try without mergerfs?

thatja · May 17

Just now, JorgeB said:

I still don't see anything relevant logged. Did you try without mergerfs?

I have been using the system all morning without mergerfs and it was working fine. Ten minutes after I mounted mergerfs I got a crash. I strongly suspect either a failing USB drive or mergerfs to be the cause, which is worrying as my setup depends on mergerfs, and I'm not sure what has changed, because it has been solid for 6 months.

Rysz · May 17

Something is taking down your system at night:

May 17 01:00:35 Plexified emhttpd: unclean shutdown detected

Sorry, but you really need to start listening to us and re-trace your steps on what you have changed/updated recently... especially regarding your plugins. There are a ton of additional (and a few of them of quite invasive nature) plugins installed on your server... any of which could cause this issue. The fact that it isn't even able to generate a non-empty diagnostics package further underlines the fact that there's something seriously wrong with your server at the moment.

Again, you need to start listening to the advice given, try disabling your plugins one-by-one and see if and when your server starts working again. You needing some plugins for your daily business doesn't change the fact that it's impossible to diagnose the problem without disabling some plugins at least temporarily. You also, as already pointed out by @JorgeB, need to set up the syslog server to see what is happening before the crashing and not just afterwards. So far we've only seen the logs after the system reboots, not from before, which would likely show the problem.

JorgeB · May 17

1 minute ago, thatja said:

I am highly suspecting either failing USB

A failing USB drive usually leaves traces in the syslog, so I don't think that is the problem, but you can try a different one.

thatja · May 17

11 minutes ago, Rysz said:
Something is taking down your system at night:
May 17 01:00:35 Plexified emhttpd: unclean shutdown detected
Sorry, but you really need to start listening to us and re-trace your steps on what you have changed/updated recently... especially regarding your plugins. There are a ton of additional (and a few of them of quite invasive nature) plugins installed on your server... any of which could cause this issue. The fact that it isn't even able to generate a non-empty diagnostics package further underlines the fact that there's something seriously wrong with your server at the moment.

Again, you need to start listening to the advice given, try disabling your plugins one-by-one and see if and when your server starts working again. You needing some plugins for your daily business doesn't change the fact that it's impossible to diagnose the problem without disabling some plugins at least temporarily. You also, as already pointed out by @JorgeB, need to set up the syslog server to see what is happening before the crashing and not just afterwards. So far we've only seen the logs after the system reboots, not from before, which would likely show the problem.

This is where I am stuck. The unclean shutdown was because I could not get into SSH or the UI; that was the first crash at 1 AM.

Secondly, the last two syslogs provided above are from before I restarted the server (after /mnt became inaccessible). They came from /boot/logs, as I did enable the syslog server.

I have had the server running with nothing at all, no Docker containers except Plex and no mergerfs, and it was fine. As soon as I mounted my mounts, I got another crash.

Could you please elaborate on this?

Quote

(and a few of them of quite invasive nature)

Edited May 17 by thatja

Rysz · May 17

4 minutes ago, thatja said:

This is where I am stuck. The unclean shutdown was because I could not get into SSH or the UI; that was the first crash at 1 AM.

Secondly, the last two syslogs provided above are from before I restarted the server (after /mnt became inaccessible). They came from /boot/logs, as I did enable the syslog server.

I have had the server running with nothing at all, no Docker containers except Plex and no mergerfs, and it was fine. As soon as I mounted my mounts, I got another crash.

OK and where are the logs from what happened before 01am?

Because the server seems to have crashed and rebooted at 01am, we need to know what happened before.

There's no indication in the logs that mergerFS isn't operating as it should.

The opposite actually, it doing garbage collection until the very end of your logs shows it's still running. 🤔

thatja · May 17

3 minutes ago, Rysz said:

OK and where are the logs from what happened before 01am?

Because the server seems to have crashed and rebooted at 01am, we need to know what happened before.

There's no indication in the logs that mergerFS isn't operating as it should.

The opposite actually, it doing garbage collection until the very end of your logs shows it's still running. 🤔

The unclean shutdown was because power was pulled from the server, this wasn't a crash related to UNRAID but a power outage on my end, sorry for the confusion regarding that.

The crashes today, caused by UNRAID or something else, occurred at around 10:30 AM and 10:50 AM. The syslogs above cover the events before and after those crashes.

Also, nothing at all has changed between when things were working well and the first ever crash relating to this. All I've done is update plugins and Docker containers when updates were available. I had 6 months without issue until the first crash, which happened when I created this thread.

Edited May 17 by thatja

Rysz · May 17

10 minutes ago, thatja said:

The unclean shutdown was because power was pulled from the server, this wasn't a crash related to UNRAID but a power outage on my end, sorry for the confusion regarding that.

The crashes today, caused by UNRAID or something else, occurred at around 10:30 AM and 10:50 AM. The syslogs above cover the events before and after those crashes.

Well, there's nothing in the logs to indicate a failure of any kind around those times, related to mergerFS or not. But the fact that it fails to even generate a diagnostics package makes me think that the rootfs ramdisk (at /) is either full (with some plugin writing to it non-stop and filling it up), not accessible or otherwise broken somehow. It isn't even able to write the syslog or any other files into the diagnostics package, which leads me back to my earlier belief that it has something to do with the RAM. How much RAM do you have on your server? How did you shut down your server after it crashed? There's nothing in the logs anymore after your last SSH login to the crashed server.

Edited May 17 by Rysz

thatja · May 17

7 minutes ago, Rysz said:

Well, there's nothing in the logs to indicate a failure of any kind around those times, related to mergerFS or not. But the fact that it fails to even generate a diagnostics package makes me think that the rootfs ramdisk (at /) is either full (with some plugin writing to it non-stop and filling it up), not accessible or otherwise broken somehow. It isn't even able to write the syslog or any other files into the diagnostics package, which leads me back to my earlier belief that it has something to do with the RAM. How much RAM do you have on your server? How did you shut down your server after it crashed? There's nothing in the logs anymore after your last SSH login to the crashed server.

How would I find out whether the rootfs ramdisk is full, or whether a plugin is writing to it?

I have 96GB of RAM in the server. I restarted the system via reboot over SSH, using an app called Termius on my phone. Only the web UI SSH isn't responsive.

Rysz · May 17

15 minutes ago, thatja said:

How would I find out whether the rootfs ramdisk is full, or whether a plugin is writing to it?

I have 96GB of RAM in the server. I restarted the system via reboot over SSH, using an app called Termius on my phone. Only the web UI SSH isn't responsive.

OK that's very interesting because if you restarted via reboot command it should show more in the syslogs. It should show it shutting down services, the array etc... but there's nothing after your last SSH login, which again makes me think that the ramdisk is full or otherwise unwritable at that point.

The next time it gets stuck, don't instantly reboot, but SSH into it first and run the following commands:

df -h

and

cat /etc/mtab

and

ls -la /mnt

Please post the output of those commands here then, before rebooting your server.
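A note on running these on a stuck server: a command that touches a hung mount can block the SSH session indefinitely. A minimal sketch of a wrapper that gives each command a time limit follows. It assumes GNU coreutils' timeout is available on the server, and run_limited is a hypothetical helper name, not something Unraid ships. A process blocked inside a dead FUSE mount may ignore even SIGKILL, so this is not guaranteed to return.

```shell
#!/bin/bash
# Sketch only: run_limited is a hypothetical helper, not part of Unraid.
# Usage: run_limited SECONDS COMMAND [ARGS...]
run_limited() {
    local limit="$1"
    shift
    echo "### $*"
    # Send SIGKILL once the limit expires (assumes GNU coreutils timeout).
    timeout -s KILL "$limit" "$@"
    local rc=$?
    # timeout reports 124 on a normal timeout and 137 (128+9) after SIGKILL.
    if [ "$rc" -eq 124 ] || [ "$rc" -eq 137 ]; then
        echo "(no answer within ${limit}s)"
    fi
    return 0
}

# Example calls, one per command from the post above:
#   run_limited 10 df -h
#   run_limited 10 cat /etc/mtab
#   run_limited 10 ls -la /mnt
```

A command that reports no answer is itself useful information, since it shows which call touches the stalled filesystem.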

Feel free to enable mergerFS again and wait for it to get stuck again, just so we can be sure. 🙂

Also... where did you put the mergerFS mount commands, how are you running them?

Edited May 17 by Rysz

thatja · May 17

1 minute ago, Rysz said:
OK that's very interesting because if you restarted via reboot command it should show more in the syslogs. It should show it shutting down services, the array etc... but there's nothing after your last SSH login, which again makes me think that the ramdisk is full.

The next time it gets stuck, don't instantly reboot, but SSH into it first and run the following command:
df -h
Please post the output of that command here then, before rebooting your server.

Feel free to enable mergerFS again and wait for it to get stuck again, just so we can be sure. 🙂

Also - where did you put the mergerFS mount commands, how are you running them?

Okay, I will do that.

As for mergerfs, when I boot up my server, I have a bash script that I created that mounts my rclone, mergerfs and autoscan. I run this file around a minute after I start my array.

Here's the script

#!/bin/bash

# Start a screen session named "files"
screen -dmS files

# Send the rclone mount command to the "files" screen session created above
# (the session name must match the one passed to "screen -dmS")
screen -S files -X stuff $'rclone mount --config=/mnt/nvme/plexified/mounts/rclone/rclone.conf --allow-other --no-traverse --vfs-cache-mode full --cache-dir /mnt/nvmedl/plexified/mounts/googlecache/ --vfs-cache-max-size 250G --dir-cache-time 96h --vfs-fast-fingerprint --vfs-refresh --drive-impersonate [email protected] googledecrypted: /mnt/nvmedl/plexified/mounts/google/\n'

# Wait for the command to start
sleep 2

# Execute the mergerfs commands
mergerfs -o defaults,allow_other,use_ino,fsname=mergerFS /mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0000/:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0001:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0002:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0003:/mnt/nvmedl/plexified/mounts/google/MoviesSrc/0004/ /mnt/nvmedl/plexified/mounts/moviesrc/Movies/

sleep 2 # Wait for 2 seconds before running the next mergerfs command

mergerfs -o defaults,allow_other,use_ino,category.create=ff,fsname=mergerFS /mnt/user/plexdata/:/mnt/nvmedl/plexified/mounts/moviesrc=NC:/mnt/nvmedl/plexified/mounts/google/Data=NC /mnt/nvmedl/plexified/mounts/secret/

# Wait for 30 seconds before starting autoscan
sleep 30

# Start a screen session named "autoscan", change to the correct directory, and then run the autoscan command
screen -dmS autoscan
screen -S autoscan -X stuff $'cd /mnt/nvme/plexified/services/autoscan\n'
screen -S autoscan -X stuff $'./autoscan_v1.4.0_linux_amd64\n'

Then I start my docker containers.
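The script above relies on a fixed "sleep 2" between starting rclone and mounting mergerfs on top of it. Nothing in the logs posted so far shows that this timing causes the hangs, but a hedged alternative is to poll until the rclone mount point is actually mounted. The sketch below assumes the mountpoint utility from util-linux is installed, and wait_for_mount is a hypothetical helper name.

```shell
#!/bin/bash
# Sketch only: wait_for_mount is a hypothetical helper.
# Usage: wait_for_mount DIRECTORY [MAX_TRIES]
# Polls once per second until DIRECTORY is a mount point.
wait_for_mount() {
    local dir="$1"
    local tries="${2:-30}"
    local i
    for ((i = 0; i < tries; i++)); do
        # mountpoint -q exits 0 only if the directory is a mount point
        if mountpoint -q "$dir"; then
            return 0
        fi
        sleep 1
    done
    echo "gave up waiting for $dir" >&2
    return 1
}

# Example, in place of the first "sleep 2" in the script above:
#   wait_for_mount /mnt/nvmedl/plexified/mounts/google 60 || exit 1
```

Exiting when the wait fails keeps mergerfs from being mounted over an empty directory if rclone never comes up.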

Rysz · May 17

OK, I updated my previous post with two more commands to run when it gets stuck. That should hopefully narrow down the problem.

AgentXXL · May 17

@thatja I've been using the mergerfs plugin for a few months now and have seen no issues similar to yours. Looking through the syslogs you've managed to capture, there is nothing I can see that indicates a mergerfs problem. I suspect a RAM issue. I would suggest shutting down and running a RAM test using Memtest86. At least for 24 - 36 hrs since your crashes appear to happen in that time frame.

Also just to confirm, you do have syslog server (Settings --> Syslog Server) set to archive the syslog to a share/folder? Your syslogs don't seem to be retaining anything prior to the reboots/crashes, so they're a little less useful.

Edited May 17 by AgentXXL

Rysz · May 19

Any news?

thatja · May 31

It has just happened again after almost 12 days of uptime.

thatja · May 31

On 5/17/2024 at 11:48 AM, Rysz said:
OK that's very interesting because if you restarted via reboot command it should show more in the syslogs. It should show it shutting down services, the array etc... but there's nothing after your last SSH login, which again makes me think that the ramdisk is full or otherwise unwritable at that point.

The next time it gets stuck, don't instantly reboot, but SSH into it first and run the following commands:
df -h
and
cat /etc/mtab
and
ls -la /mnt
Please post the output of those commands here then, before rebooting your server.

Feel free to enable mergerFS again and wait for it to get stuck again, just so we can be sure. 🙂

Also... where did you put the mergerFS mount commands, how are you running them?

Okay, so I've just tried running the first one

df -h

And my SSH window is hanging at the moment. It has been for around 3 minutes now.

thatja · May 31

root@Plexified:~# cat /etc/mtab
rootfs / rootfs rw,size=49452720k,nr_inodes=12363180,inode64 0 0
proc /proc proc rw,relatime 0 0
sysfs /sys sysfs rw,relatime 0 0
tmpfs /run tmpfs rw,nosuid,nodev,noexec,relatime,size=32768k,mode=755,inode64 0 0
/dev/sda1 /boot vfat rw,noatime,nodiratime,fmask=0177,dmask=0077,codepage=437,iocharset=iso8859-1,shortname=mixed,flush,errors=remount-ro 0 0
/dev/loop0 /lib squashfs ro,relatime,errors=continue 0 0
overlay /lib overlay rw,relatime,lowerdir=/lib,upperdir=/var/local/overlay/lib,workdir=/var/local/overlay-work/lib 0 0
/dev/loop1 /usr squashfs ro,relatime,errors=continue 0 0
overlay /usr overlay rw,relatime,lowerdir=/usr,upperdir=/var/local/overlay/usr,workdir=/var/local/overlay-work/usr 0 0
devtmpfs /dev devtmpfs rw,relatime,size=8192k,nr_inodes=12363180,mode=755,inode64 0 0
devpts /dev/pts devpts rw,relatime,gid=5,mode=620,ptmxmode=000 0 0
tmpfs /dev/shm tmpfs rw,relatime,inode64 0 0
fusectl /sys/fs/fuse/connections fusectl rw,relatime 0 0
hugetlbfs /hugetlbfs hugetlbfs rw,relatime,pagesize=2M 0 0
cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,nodev,noexec,relatime,nsdelegate,memory_recursiveprot 0 0
tmpfs /var/log tmpfs rw,relatime,size=131072k,mode=755,inode64 0 0
rootfs /mnt rootfs rw,size=49452720k,nr_inodes=12363180,inode64 0 0
tmpfs /mnt/disks tmpfs rw,relatime,size=1024k,inode64 0 0
tmpfs /mnt/remotes tmpfs rw,relatime,size=1024k,inode64 0 0
tmpfs /mnt/addons tmpfs rw,relatime,size=1024k,inode64 0 0
tmpfs /mnt/rootshare tmpfs rw,relatime,size=1024k,inode64 0 0
/dev/md1p1 /mnt/disk1 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/md2p1 /mnt/disk2 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/md3p1 /mnt/disk3 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/md4p1 /mnt/disk4 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/md5p1 /mnt/disk5 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/md6p1 /mnt/disk6 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/md7p1 /mnt/disk7 xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme0n1p1 /mnt/nvme xfs rw,noatime,nouuid,attr2,inode64,logbufs=8,logbsize=32k,noquota 0 0
/dev/nvme1n1p1 /mnt/nvmedl btrfs rw,noatime,ssd,discard=async,space_cache=v2,subvolid=5,subvol=/ 0 0
shfs /mnt/user0 fuse.shfs rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
shfs /mnt/user fuse.shfs rw,nosuid,nodev,noatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
/dev/loop2 /var/lib/docker btrfs rw,noatime,ssd,space_cache=v2,subvolid=5,subvol=/ 0 0
/dev/loop2 /var/lib/docker/btrfs btrfs rw,noatime,ssd,space_cache=v2,subvolid=5,subvol=/ 0 0
nsfs /run/docker/netns/fdcd16e64b60 nsfs rw 0 0
nsfs /run/docker/netns/default nsfs rw 0 0
/dev/loop3 /etc/libvirt btrfs rw,noatime,ssd,space_cache=v2,subvolid=5,subvol=/ 0 0
googledecrypted: /mnt/nvmedl/plexified/mounts/google fuse.rclone rw,nosuid,nodev,relatime,user_id=0,group_id=0,allow_other 0 0
mergerFS /mnt/nvmedl/plexified/mounts/moviesrc/Movies fuse.mergerfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
mergerFS /mnt/nvmedl/plexified/mounts/secret fuse.mergerfs rw,nosuid,nodev,relatime,user_id=0,group_id=0,default_permissions,allow_other 0 0
nsfs /run/docker/netns/93f74b6ab06f nsfs rw 0 0
nsfs /run/docker/netns/5b147fc09a9a nsfs rw 0 0
nsfs /run/docker/netns/21aa0e5b657b nsfs rw 0 0
nsfs /run/docker/netns/efbd08065b39 nsfs rw 0 0
nsfs /run/docker/netns/d628991141dd nsfs rw 0 0
nsfs /run/docker/netns/021970ea8a50 nsfs rw 0 0
nsfs /run/docker/netns/cf6e4881fffc nsfs rw 0 0
nsfs /run/docker/netns/ef2fa253537f nsfs rw 0 0
nsfs /run/docker/netns/3a977d309e1d nsfs rw 0 0
nsfs /run/docker/netns/2382e5f02b25 nsfs rw 0 0
nsfs /run/docker/netns/a3dd8518c453 nsfs rw 0 0
nsfs /run/docker/netns/29bd7cce0d5e nsfs rw 0 0
tmpfs /run/user/0 tmpfs rw,nosuid,nodev,relatime,size=9893912k,nr_inodes=2473478,mode=700,inode64 0 0
nsfs /run/docker/netns/b5ada1918a90 nsfs rw 0 0
nsfs /run/docker/netns/3ca37854a52b nsfs rw 0 0
nsfs /run/docker/netns/db6327bbe313 nsfs rw 0 0

That's what I get when I run cat /etc/mtab

The other two just hang without an output.

image.png.859dbf926ce92e9f385caa7341948715.png

Edited May 31 by thatja

thatja · May 31

Getting diagnostics through the UI also freezes.

Trying to get them via "diagnostics" inside SSH also just hangs.

Rysz · May 31

Just now, thatja said:

Getting diagnostics through the UI also freezes.

Trying to get them via "diagnostics" inside SSH also just hangs.

Can you try: df -h /
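Restricting df to / means only the root filesystem is queried, so it can answer even while another mount is stalled. To find out which mount is the stalled one, each mount point can be probed in turn with a time limit. This is a rough sketch, assuming GNU coreutils (timeout, stat) and a readable /proc/mounts. probe_mounts is a hypothetical helper name, and a process stuck on a dead FUSE mount may not react even to SIGKILL.

```shell
#!/bin/bash
# Sketch only: probe_mounts is a hypothetical helper.
# Usage: probe_mounts [MOUNT_TABLE] [SECONDS_PER_MOUNT]
# Prints "ok <mountpoint>" or "FAIL <mountpoint>" for each entry.
probe_mounts() {
    local table="${1:-/proc/mounts}"
    local limit="${2:-5}"
    local dev mnt rest
    while read -r dev mnt rest; do
        # The mount table escapes spaces as \040; decode such sequences.
        mnt=$(printf '%b' "$mnt")
        # stat -f asks the filesystem for its status, as df does.
        if timeout -s KILL "$limit" stat -f -- "$mnt" >/dev/null 2>&1 </dev/null; then
            echo "ok $mnt"
        else
            # Either the call failed or it did not answer in time.
            echo "FAIL $mnt"
        fi
    done < "$table"
}

# Example:
#   probe_mounts /proc/mounts 5
```

Any mount reported as FAIL is a candidate for the one that makes a plain df -h and ls -la /mnt hang.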

thatja · May 31

Just now, Rysz said:

Can you try: df -h /

image.png.b31fa23088e1060a07fbef6566d09d37.png

thatja · May 31

/mnt is completely inaccessible as well.

Rysz · May 31

Just now, thatja said:

Ok, that rules out the theory of a full rootfs ramdisk. Did you test your RAM sticks with memtest in the meantime?
