Entire folder directories become unresponsive

thatja · May 13

Between 4 and 18 hours after server start, my entire folder structure becomes unresponsive. I cannot start or stop Docker services, and I get the following error:

image.png.e6c124602e255f09120a308859ff7c7d.png

My Plex server becomes unresponsive. Other apps still load (the Sonarr UI, for example), but they can't import, delete, or access any files because the storage system is locked up. The only thing that fixes this is a hard reboot of the server, which is not practical to do multiple times per day. It can run anywhere from 4 to 18 hours before getting stuck in this state again.

I have attached diagnostics to the post, but I'm not sure if it's a full set, as I'm struggling to gain access due to the lockups.

Can anyone advise on what to do to troubleshoot this?

plexified-diagnostics-20240513-0855.zip

thatja · May 13

When I run the diagnostics download, it gets stuck on:

sed -ri 's/^(share(Comment|ReadList|WriteList)=")[^"]+/\1.../' '/plexified-diagnostics-20240513-0907/shares/a-----a.cfg' 2>/dev/null

thatja · May 13

/mnt is inaccessible via FTP or SSH.

thatja · May 13

syslog

JorgeB · May 13

Syslog in the diags is empty, try

cp /var/log/syslog /boot/syslog.txt

then attach it here.

thatja · May 13

Here you go.

syslog.TXT

JorgeB · May 13

Not seeing anything logged, is the server having the issue now?

thatja · May 13

The server had issues between 2:50AM and 5:20AM, and those are the logs that I grabbed when I woke up. I've since rebooted the machine and all is working again.

However, this has been the pattern for the past 5 days: it'll stay up for 16-18 hours before the same problem occurs, and it's done it every day for the past 4 days.

The only real difference has been updating the NVIDIA GPU driver; it had been stable for 39 days before upgrading the GPU driver. But could a GPU driver really cause the issue that's occurring?

Edited May 13 by thatja

JorgeB · May 13

I missed the syslog you posted before my first post, the one in the diags was empty, I'm still not seeing any errors, but I do see this around the time you mention:

May 13 02:48:58 Plexified mergerfs[20489]: running basic garbage collection
May 13 02:48:58 Plexified mergerfs[20489]: threadpool (fuse.read): spawning 32 threads w/ max queue depth 32
May 13 02:48:58 Plexified mergerfs[20489]: read-thread-count=32; process-thread-count=-1; process-thread-queue-depth=-1; pin-threads=false;
May 13 02:49:00 Plexified mergerfs[20580]: running basic garbage collection
May 13 02:49:00 Plexified mergerfs[20580]: threadpool (fuse.read): spawning 32 threads w/ max queue depth 32
May 13 02:49:00 Plexified mergerfs[20580]: read-thread-count=32; process-thread-count=-1; process-thread-queue-depth=-1; pin-threads=false;

Do you still have the issue if you don't use mergerfs?

thatja · May 13

7 minutes ago, JorgeB said:

I missed the syslog you posted before my first post, the one in the diags was empty, I'm still not seeing any errors, but I do see this around the time you mention:

May 13 02:48:58 Plexified mergerfs[20489]: running basic garbage collection
May 13 02:48:58 Plexified mergerfs[20489]: threadpool (fuse.read): spawning 32 threads w/ max queue depth 32
May 13 02:48:58 Plexified mergerfs[20489]: read-thread-count=32; process-thread-count=-1; process-thread-queue-depth=-1; pin-threads=false;
May 13 02:49:00 Plexified mergerfs[20580]: running basic garbage collection
May 13 02:49:00 Plexified mergerfs[20580]: threadpool (fuse.read): spawning 32 threads w/ max queue depth 32
May 13 02:49:00 Plexified mergerfs[20580]: read-thread-count=32; process-thread-count=-1; process-thread-queue-depth=-1; pin-threads=false;

Do you still have the issue if you don't use mergerfs?

Well, I'm not sure what that means per se, but I've used mergerfs since I first started using UNRAID around December 2023, and mergerfs is crucial to my setup: I am merging my Google Drive with my UNRAID array, where new files are stored, so without mergerfs my system doesn't really work.

Mergerfs has been updated quite a lot over the past couple of months, it could be a bad update I guess, but I'm not sure if there is a way to downgrade mergerfs?

thatja · May 13

Worth noting: I got the system back up at 9:50AM this morning, it's now 15:56, and I haven't had a crash. This is with mergerfs running too. Not sure if that rules mergerfs out, or not.

JorgeB · May 13

25 minutes ago, thatja said:

Not sure if that rules mergerfs out, or not.

Not really, but it would be the first thing I would test, that is, running without it, if you can for a few hours just for testing.

Rysz · May 15

Can you please post the mergerFS scripts where you are setting up your mergerFS mounts?

I see no actual errors regarding mergerFS, but let's see your scripts just to be sure. 🙂

mergerFS garbage collection is normal and occurs every 15 minutes by default (according to manual).

Also... the log posted starts with a system reboot at 02:45am - did you do this reboot?

... or did the system crash and reboot itself? Since you say trouble started at 02:50am.

... 02:50am would be after that 02:45am reboot, so was it a crash or user-triggered reboot?

Also... just to provide a timeline here - since you say the troubles started around 5 days ago:

The mergerFS backend (the actual binary) was last updated on 26/03/2024.

The mergerFS frontend (calling your mergerFS mount scripts) was last updated on 26/04/2024.

Those frontend changes have been minor, only introducing a timeout so that array start cannot get stuck.

So both these updates would have been way outside of the 5 days where you experienced trouble...

But please do post your mergerFS mount scripts nevertheless, you never know! 🙂

@JorgeB: Seems more like a general system problem (perhaps RAM-related?) to me.

It's also weird that diagnostics did not include a syslog, perhaps some problems writing to the rootfs (RAM-)disk?
Also the user said /mnt itself was inaccessible, that directory should always exist regardless of any mounts being there.

Edited May 15 by Rysz
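
One way to narrow down which mount is hanging is to probe every mount point with a short timeout. The following is a minimal sketch and was not posted in the thread: the helper names probe_path and probe_mounts are made up for illustration, and it assumes the coreutils timeout and stat commands are available. FAIL covers both "did not answer in time" and "does not exist", so it only points at where to look next.

```shell
#!/bin/bash
# Sketch: probe each mount point with a time limit, so that a single hung
# FUSE mount (rclone, mergerfs) shows up as FAIL instead of freezing the shell.
# probe_path and probe_mounts are hypothetical helper names.

# probe_path PATH [SECONDS]
# Prints "ok PATH" if stat returns within the limit, "FAIL PATH" otherwise.
probe_path() {
    local path="$1" limit="${2:-5}"
    # -k 1: follow up with SIGKILL one second after the initial SIGTERM.
    # A process stuck in uninterruptible sleep can still outlive this.
    if timeout -k 1 "$limit" stat "$path" >/dev/null 2>&1; then
        echo "ok $path"
    else
        echo "FAIL $path"
    fi
}

# probe_mounts [TABLE]
# Reads a mount table in /proc/mounts format (default: /proc/mounts) and
# probes the second field of every line. Mount points containing spaces are
# escaped as \040 in that file and will be reported as FAIL by this sketch.
probe_mounts() {
    local table="${1:-/proc/mounts}" dev mnt rest
    while read -r dev mnt rest; do
        probe_path "$mnt"
    done < "$table"
}

probe_mounts
```

If only the rclone or mergerfs mount points come back as FAIL while /mnt/user and the cache pool answer, that would point at the FUSE layer rather than at the system as a whole.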

JorgeB · May 15

39 minutes ago, Rysz said:

Seems more like a general system problem

It may well be, I just wanted the user to test without mergerfs to rule that out, since there's nothing else relevant logged that I can see that would explain folders going away.

Rysz · May 15

Just now, JorgeB said:

It may well be, I just wanted the user to test without mergerfs to rule that out, since there's nothing else relevant logged that I can see that would explain folders going away.

Yes, that's definitely a good idea, was already thinking a step further there. 😄

thatja · May 15

Hi. I didn't reboot the server at 2:45AM.

As for my mergerfs, it's a pretty simple command that has been working since I started using UNRAID in December.

mergerfs -o defaults,allow_other,use_ino,fsname=mergerFS /mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0000/:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0001:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0002:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0003:/mnt/nvmedl/plexified/mounts/google/MoviesSrc/0004/ /mnt/nvmedl/plexified/mounts/moviesrc/Movies/

AND

mergerfs -o defaults,allow_other,use_ino,category.create=ff,fsname=mergerFS /mnt/user/plexdata/:/mnt/nvmedl/plexified/mounts/moviesrc=NC:/mnt/nvmedl/plexified/mounts/google/Data=NC /mnt/nvmedl/plexified/mounts/secret/

Worth noting, that /mnt/user/plexdata is my array, the rest are all rclone mounts merged to make /mnt/nvmedl/plexified/mounts/secret/

nvmedl is the name of my cache drive and it is an nvme as the name suggests.

Previously, over the past 5 days, it had been happening after around 18 hours. However, yesterday it managed 1 day and 7 hours of uptime before the same thing happened around 5 hours ago. Again, the only fix was to hard reboot the server. I did try unmounting the mergerfs folders, but that didn't help.

Edited May 15 by thatja
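
Since the second mergerfs command layers on top of rclone mounts, one optional precaution is to check that those mounts are actually present before mergerfs is started. This is only a sketch, not something from the thread and not a diagnosis of the hang: missing_mounts is a hypothetical helper, mountpoint is the util-linux tool, and the path passed to it is assumed to be the rclone mount root based on the commands above.

```shell
#!/bin/bash
# Sketch: refuse to start mergerfs while a branch it depends on is not mounted.
# missing_mounts is a hypothetical helper name.

# missing_mounts PATH...
# Prints one line per path that is not a mount point.
# Returns 0 if every path is mounted, 1 otherwise.
missing_mounts() {
    local path missing=0
    for path in "$@"; do
        if ! mountpoint -q "$path"; then
            echo "not mounted: $path"
            missing=1
        fi
    done
    return "$missing"
}

# Assumption: this directory is where the rclone remote is mounted.
# Adjust it to whatever path the rclone mount command actually uses.
if missing_mounts "/mnt/nvmedl/plexified/mounts/google"; then
    echo "all branches mounted; mergerfs can be started"
else
    echo "skipping mergerfs until the mounts above are back" >&2
fi
```

The existing mergerfs commands would go in the first branch of the if statement, unchanged.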

Rysz · May 15

7 minutes ago, thatja said:
Hi. I didn't reboot the server at 2:45AM.

As for my mergerfs, it's a pretty simple command that has been working since I started using UNRAID in December.
mergerfs -o defaults,allow_other,use_ino,fsname=mergerFS /mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0000/:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0001:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0002:/mnt/nvmedl/plexified/mounts/google/Data/MoviesSrc/0003:/mnt/nvmedl/plexified/mounts/google/MoviesSrc/0004/ /mnt/nvmedl/plexified/mounts/moviesrc/Movies/
AND
mergerfs -o defaults,allow_other,use_ino,category.create=ff,fsname=mergerFS /mnt/user/plexdata/:/mnt/nvmedl/plexified/mounts/moviesrc=NC:/mnt/nvmedl/plexified/mounts/google/Data=NC /mnt/nvmedl/plexified/mounts/secret/
Worth noting, that /mnt/user/plexdata is my array, the rest are all rclone mounts merged to make /mnt/nvmedl/plexified/mounts/secret/

nvmedl is the name of my cache drive and it is an nvme as the name suggests.

Looks good to me - and you're running this through array_start.sh or array_start_complete.sh, I'm guessing?

Something definitely shut down your server before 02:45am, because the log starts with a server boot at 02:45am.

Did you notice any parity checks or anything that would indicate an unclean shutdown has happened?

Honestly if you changed nothing on the mergerFS scripts and they worked since December...

I'd start looking at the GPU driver or a general RAM issue; might be worth running an extended memtest.

... to see if your RAM experiences any troubles after (x) hours of testing ...

But @JorgeB is definitely more experienced at general support than me, so take this with a grain of salt.

I don't think mergerFS is causing this, but as suggested I would try disabling it first and see if the problems still happen.

thatja · May 15

3 hours ago, Rysz said:

Looks good to me - and you're running this through array_start.sh or array_start_complete.sh, I'm guessing?

Something definitely shut down your server before 02:45am, because the log starts with a server boot at 02:45am.

Did you notice any parity checks or anything that would indicate an unclean shutdown has happened?

Honestly if you changed nothing on the mergerFS scripts and they worked since December...

I'd start looking at the GPU driver or a general RAM issue; might be worth running an extended memtest.

... to see if your RAM experiences any troubles after (x) hours of testing ...

But @JorgeB is definitely more experienced at general support than me, so take this with a grain of salt.

I don't think mergerFS is causing this, but as suggested I would try disabling it first and see if the problems still happen.

Funny you should mention GPU driver, it did update a couple hours before this first outage occurred. I will try going back one driver!

thatja · May 16

And it's down again, I've just woken up and my server is down in the same state as before. Not sure what to try next.

JorgeB · May 16

Post a new syslog in case there's something there now.

Rysz · May 16

17 minutes ago, thatja said:

And it's down again, I've just woken up and my server is down in the same state as before. Not sure what to try next.

Best post the diagnostics package now, hopefully there'll be a log this time. Was mergerFS disabled now?

thatja · May 16

4 hours ago, Rysz said:

Best post the diagnostics package now, hopefully there'll be a log this time. Was mergerFS disabled now?

I was unable to get into unraid at all, even to get diagnostics.

I can't really start my plex server without mergerfs so I'm not sure what to do.

Rysz · May 16

26 minutes ago, thatja said:

I was unable to get into unraid at all, even to get diagnostics.

I can't really start my plex server without mergerfs so I'm not sure what to do.

Even after a restart diagnostics can be useful, so please do post them if you're able to access the server now.

JorgeB · May 16

1 hour ago, thatja said:

I was unable to get into unraid at all,

You can enable the syslog server and post that after it happens again.
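
Separately from the syslog server setting, a stopgap is to repeat the earlier cp /var/log/syslog /boot/syslog.txt step on a schedule, so that a recent copy is already on the flash drive when the next lockup forces a hard reboot. The sketch below is an illustration under that assumption; snapshot_syslog is a hypothetical name and the defaults are simply the two paths from that cp command.

```shell
#!/bin/bash
# Sketch: keep a copy of the current syslog on the flash drive so that
# something survives a hard reboot. snapshot_syslog is a hypothetical name.

# snapshot_syslog [SRC] [DEST]
# Defaults match the cp command suggested earlier in the thread.
snapshot_syslog() {
    local src="${1:-/var/log/syslog}" dest="${2:-/boot/syslog.txt}"
    if [ ! -r "$src" ]; then
        echo "cannot read $src" >&2
        return 1
    fi
    # Copy to a temporary name first, then rename, so a lockup in the middle
    # of the copy does not leave a truncated file under the final name.
    cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
}

# Example call, to be run from a scheduled job:
#   snapshot_syslog /var/log/syslog /boot/syslog.txt
```

Frequent writes add wear to the flash drive, so this is best treated as a temporary measure while chasing the problem.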

thatja · May 17

19 hours ago, JorgeB said:

You can enable the syslog server and post that after it happens again.

Okay, so the server restarted again. I can't access the UI but can SSH in, and when running "diagnostics" it just hangs at:

root@Plexified:/mnt# diagnostics
Starting diagnostics collection...

Could this be a failing flash drive, or bad files on the flash drive?

/mnt is once again inaccessible.
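
When a command such as diagnostics hangs like this while SSH still works, it can help to list the processes that are stuck in uninterruptible sleep (state D), since those are usually the ones waiting on a filesystem that has stopped answering. The sketch below is an illustration, not part of the thread: list_blocked is a hypothetical helper, and it assumes the standard Linux /proc/PID/status layout with tab-separated Name, State and Pid lines.

```shell
#!/bin/bash
# Sketch: list processes in uninterruptible sleep (state D) by reading
# /proc directly, so it does not depend on any particular ps version.
# list_blocked is a hypothetical helper name.

# list_blocked [PROC_ROOT]
# Prints "PID NAME STATE" for every process whose state starts with D.
list_blocked() {
    local proc_root="${1:-/proc}" status
    for status in "$proc_root"/[0-9]*/status; do
        [ -r "$status" ] || continue
        # Fields in the status file are separated by a tab character.
        # Processes may exit while we iterate, so read errors are ignored.
        awk -F'\t' '
            $1 == "Name:"  { name = $2 }
            $1 == "Pid:"   { pid = $2 }
            $1 == "State:" { state = $2 }
            END { if (state ~ /^D/) print pid, name, state }
        ' "$status" 2>/dev/null || true
    done
}

list_blocked
```

A sed, rclone or mergerfs process appearing in this list during a lockup would be worth including in the next post, together with the syslog.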
