unRAID unresponsive - shfs 100% cpu

P_K:

Well, there is really no mention of ANYTHING to do with this from LT, and apparently an all-XFS system does this too, so I'm not convinced it's an RFS problem.

 

It's always been best to install Windows on a clean partition, so that hardly qualifies as a comparison to having to wipe out many drives' worth of data...


I seem to have a similar problem. Came home yesterday to a completely unresponsive server; had to reset, as nothing else worked. Parity check came back clean. Just a while ago it was unresponsive again: load of 875/875/875, shfs process at 100%. No plugins, no Docker. Don't have logs, as the wife was complaining because Kodi couldn't play files, so I reset again. :(

 

I did the latest update on Thursday, I believe.


Have you considered, either as an aid to troubleshooting or as a temporary workaround, disabling user shares?


I've had this problem since 6.3.0-rc5. It's easy for me to replicate: if I start writing new files to the cache array (which is 2x 250 GB EVOs, BTRFS RAID 1), the server load keeps growing until it hits 50-60, then the VMs start dying, then the Docker apps. If I stop writing to the cache array, the load goes back down to normal (< 1).

I noticed the problem when using the FileBot Docker container to rename (move) files from cache array -> cache array. The first 3-4 files are blazing fast, and then the load just keeps growing. Should I provide some logs while I'm doing the renaming to investigate this further?

Edited by thomast_88

11 hours ago, thomast_88 said:

I've had this problem since 6.3.0-rc5. It's easy for me to replicate: if I start writing new files to the cache array (which is 2x 250 GB EVOs, BTRFS RAID 1), the server load keeps growing until it hits 50-60, then the VMs start dying, then the Docker apps. If I stop writing to the cache array, the load goes back down to normal (< 1).

I noticed the problem when using the FileBot Docker container to rename (move) files from cache array -> cache array. The first 3-4 files are blazing fast, and then the load just keeps growing. Should I provide some logs while I'm doing the renaming to investigate this further?

 

It's not the same problem, but it could be related. This problem causes a CPU core to be pegged at 100% continuously, and the web GUI along with everything else stops responding. It also never recovers.


```
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14102 root      20   0 1259740  48048    876 S 200.0  0.3 605:19.65 shfs
```

 

- I do have cache_dirs, but not enabled for user shares

- Mostly reiser drives, but a few XFS

- Too frozen to get diagnostics or anything :/
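Since the box is usually too far gone to grab diagnostics by the time anyone notices, one option is a small watchdog that logs shfs's CPU usage somewhere that survives a hard reset. A minimal sketch only: the `/boot/logs` path, the 60-second interval, and the `proc_cpu` helper are my own assumptions, not an unRAID feature:

```shell
#!/bin/bash
# Sketch: periodically record shfs CPU usage to the flash drive, since
# /boot survives a hard reset while /var/log (in RAM) does not.

# Extract the %CPU column for a named process from a `top -bn1` output line.
# top's batch columns: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
proc_cpu() {
    local name=$1 line=$2
    echo "$line" | awk -v n="$name" '$12 == n {print $9}'
}

# Demo with the line captured earlier in this thread:
sample='14102 root      20   0 1259740  48048    876 S 200.0  0.3 605:19.65 shfs'
proc_cpu shfs "$sample"    # prints 200.0

# Real loop (commented out -- it would run forever on the server):
# while sleep 60; do
#     top -bn1 | awk '$12 == "shfs"' >> /boot/logs/shfs-cpu.log
# done
```

Even one log line per minute would show whether shfs ramps up gradually or pegs instantly before a freeze.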

 


Has anyone found a fix for machines in this state?

 

One of my unRAID machines is having the same issue.

 

Quote:

Has anyone found a fix for machines in this state?

One of my unRAID machines is having the same issue.


Any reiserfs disks?
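For anyone unsure how to answer that, a quick way to check is to read each array disk's filesystem type from the mount table. A sketch assuming unRAID's usual /mnt/diskN mount points (`list_disk_fs` is a hypothetical helper name; the /proc/mounts parsing itself is generic Linux):

```shell
#!/bin/bash
# Sketch: print "mountpoint fstype" for every array disk mount.

# Filter mount-table lines (device mountpoint fstype ...) down to /mnt/diskN.
list_disk_fs() {
    awk '$2 ~ /^\/mnt\/disk[0-9]+$/ {print $2, $3}'
}

# Demo with sample mount entries; real usage: list_disk_fs < /proc/mounts
list_disk_fs <<'EOF'
/dev/md1 /mnt/disk1 reiserfs rw 0 0
/dev/md2 /mnt/disk2 xfs rw 0 0
/dev/sdb1 /mnt/cache btrfs rw 0 0
EOF
# prints:
# /mnt/disk1 reiserfs
# /mnt/disk2 xfs
```

The cache line is deliberately excluded by the regex; on a real system you'd also want to check it, since a Reiser cache disk would see the most writes.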


Was just about to post a separate topic when I found this. Very similar sounding: server locks up every two days or so, Dockers down, can't access the web GUI. Can log in with PuTTY, but can't seem to issue any commands. Have had to pull the power; obviously not good, but nothing else works.

Can I roll back to an older version of unRAID? Any issues doing that?

EDIT: Out of interest, I have two machines; one seems unaffected, but the one I'm referring to has had this issue for a couple of weeks.

Edited by grither


I had this EXACT issue and failed to find the root cause after 30 hours of troubleshooting.

My problem pointed to a software issue, since it happened to me on completely disparate hardware using the same disks.

I built a new machine and mounted the RFS disks under SnapRAID for a stable mount. It wasn't a big deal, since I needed to replace my ancient backup server anyway.

The problem vanished after I migrated to new disks (including cache) and formatted them as XFS.

I emailed Tom with a suggestion that RFS be removed from support completely, but he's convinced that's too extreme.

 

On 3/23/2017 at 11:11 AM, ixnu said:

 

I emailed Tom with a suggestion that RFS be removed from support completely, but he's convinced that's too extreme.

 

 

It doesn't seem to be only RFS systems that have this issue, which points to something else causing it...

 

On 2/16/2017 at 9:29 PM, the_larizzo said:

This is happening to me also, but all my filesystems are XFS. My server will run for about 2 days, then the load goes through the roof and I can no longer run any commands or shut it down. I have to hold the power button to restart.

 


Edited by lionelhutz


I don't lurk much, but the first course of action always seems to be to move off RFS. It's obviously a difficult problem, but it seems to be far more common on RFS, n'est-ce pas?

8 minutes ago, ixnu said:

I don't lurk much, but the first course of action always seems to be to move off RFS. It's obviously a difficult problem, but it seems to be far more common on RFS, n'est-ce pas?

 

Agreed, that seems to be the most common cause. Also, if the user has at least one XFS disk or a non-Reiser cache disk, it's easy to confirm whether that's the cause of the problem by limiting all writes to non-Reiser disks and testing for a few days or weeks.

 

56 minutes ago, the_larizzo said:

I ended up having to fall back to 6.2.4. The server would lock up every 2 days, XFS only.

Did this help? Please let us know. Also, did you have to rebuild all your Dockers, or did they survive the rollback?

 

On 3/22/2017 at 3:15 AM, johnnie.black said:

 


Any reiserfs disks?

 

Yes, of course. I've been running this machine for years now. Maybe 5.

 

It just happened again, and after a day of it not "coming back" I had to kill the machine.

2 hours ago, berizzle said:

Yes, of course. I've been running this machine for years now. Maybe 5.

 

It just happened again, and after a day of it not "coming back" I had to kill the machine.

 

ReiserFS disks seem to be the #1 cause of this issue. Convert one of your disks to XFS, limit all writes to that disk for a few days/weeks by changing your share(s)' included disks, and see if the crashing stops; if it does, convert the remaining disks.

 

PS: IMO you should convert even if this isn't the source of the problem; there have been multiple issues with Reiser lately, and it has terrible performance in certain situations.

 

1 minute ago, johnnie.black said:

 

ReiserFS disks seem to be the #1 cause of this issue. Convert one of your disks to XFS, limit all writes to that disk for a few days/weeks by changing your share(s)' included disks, and see if the crashing stops; if it does, convert the remaining disks.

 

PS: IMO you should convert even if this isn't the source of the problem; there have been multiple issues with Reiser lately, and it has terrible performance in certain situations.

 

I have 23 drives: 21 are ReiserFS (42 TB) and 2 are XFS (6 TB).

There's 9 TB free across all the drives.

Is there a process that makes sense for converting these disks?

7 minutes ago, berizzle said:

I have 23 drives: 21 are ReiserFS (42 TB) and 2 are XFS (6 TB).

There's 9 TB free across all the drives.

Is there a process that makes sense for converting these disks?

 

Since you already have 2 XFS disks, you can test before doing the conversion and confirm whether it will really help: limit all your writes to those disks by going to your share(s) and setting the included disks to only those 2 (all share data on the other disks will still be accessible, but all new writes will go to the XFS disks), then test for a few days/weeks.
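One way to sanity-check that new writes really are landing on the XFS disks is to snapshot used space per disk before and after copying a few files. A rough sketch; the `snapshot`/`grew` helper names and /tmp file paths are illustrative, not part of unRAID:

```shell
#!/bin/bash
# Sketch: snapshot used space for all array disks, then diff two snapshots
# to see which disks received the new writes.

# Real usage: snapshot > /tmp/before.txt (requires GNU df for --output).
snapshot() {
    df --output=target,used /mnt/disk* 2>/dev/null | tail -n +2
}

# Print disks whose used space grew between two "target used" snapshots.
grew() {
    awk 'NR==FNR {before[$1]=$2; next}
         $2 > before[$1] {print $1, $2 - before[$1]}' "$1" "$2"
}

# Demo with two hypothetical snapshots:
printf '/mnt/disk1 100\n/mnt/disk22 500\n' > /tmp/before.txt
printf '/mnt/disk1 100\n/mnt/disk22 900\n' > /tmp/after.txt
grew /tmp/before.txt /tmp/after.txt    # prints: /mnt/disk22 400
```

If a ReiserFS disk still grows after the share change, some path (a disk share, a Docker mapping) is bypassing the included-disks setting.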

 

To convert, see this thread:

 

 


I've been experiencing this since 6.3.2 as well, all XFS disks. Seeing more and more of these threads pop up. I tried downgrading to 6.3.0 and made it past the 2-day mark that 6.3.2 would die at, but it died during the night at 4 days. When I return from work, I'll be downgrading to 6.2.4.

 

While I agree that XFS > reiser, I feel there is something more at play here.


I'm curious how many people have converted all their drives to XFS and eliminated the problem. To me, there are far more systems with RFS drives out there, which could explain why more systems with RFS drives have the issue.


I had this problem on a mixed ReiserFS/XFS system; I converted all drives to XFS and have had no shfs lock-ups since then (a couple of weeks now).

Of course, it could have been some weird file/directory structure inconsistency that went away when the files/directories were newly created during the copy.


I finished converting 14 disks from Reiser to XFS about 5 days ago, and things have seemed stable since. I realize this isn't that interesting a data point because it's not a long time, but it's the longest uptime I've had since upgrading to 6.3.x. And, agreed, I did everything via rsync, so it's entirely possible that the act of recreating everything in a new directory structure cleaned something up. Still, ReiserFS seems to be a quasi-supported relic at this point, so moving off it seems advisable. While the rsync method works well, it's a bit tedious; it would be nice to have a more automated method for those of us who have been upgrading our systems over the years.

