unRAID unresponsive - shfs 100% cpu


P_K

Recommended Posts

I seem to have a similar problem. Came home yesterday to a completely unresponsive server. Had to reset as nothing else worked. Parity check came back clean. Just a while ago it was unresponsive again. Load of 875/875/875, shfs process at 100%. No plugins, no docker. I don't have logs as the wife was complaining because Kodi couldn't play files, so I reset again. :(

 

I did the latest update on Thursday, I believe.

Link to comment

I have had this problem since 6.3.0-rc5. It's easy for me to replicate: if I start writing new files to the cache array (which is 2x 250 GB EVOs in BTRFS RAID 1), the server load keeps growing until it hits 50-60, then eventually VMs start dying, then the Docker apps. If I stop writing to the cache array, the load goes back down to normal (< 1).

I noticed the problem when using the FileBot Docker container to rename (move) files from cache array -> cache array. The first 3-4 files are blazing fast, and then the load just keeps growing. Should I provide some logs while I'm doing the renaming, to further investigate this?
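For what it's worth, here's a rough sketch of how I could reproduce and watch it from the console (the /mnt/cache/... paths below are just placeholders for wherever the files are being moved):

```
# Rough sketch (paths are placeholders): kick off a sustained copy within the
# cache pool, then watch the load average and shfs CPU climb from a second shell.
rsync -av /mnt/cache/test-src/ /mnt/cache/test-dst/ &

# In another shell, refresh every 5 seconds:
watch -n 5 'uptime; top -bn1 | grep shfs'
```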

Edited by thomast_88
Link to comment
11 hours ago, thomast_88 said:

I have had this problem since 6.3.0-rc5. It's easy for me to replicate: if I start writing new files to the cache array (which is 2x 250 GB EVOs in BTRFS RAID 1), the server load keeps growing until it hits 50-60, then eventually VMs start dying, then the Docker apps. If I stop writing to the cache array, the load goes back down to normal (< 1).

I noticed the problem when using the FileBot Docker container to rename (move) files from cache array -> cache array. The first 3-4 files are blazing fast, and then the load just keeps growing. Should I provide some logs while I'm doing the renaming, to further investigate this?

 

It's not the same problem, but it could be related. This problem causes a CPU core to be pegged at 100% continuously, and the web GUI, along with other things, stops responding. It also never recovers.

Link to comment
  • 3 weeks later...

```
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
14102 root      20   0 1259740  48048    876 S 200.0  0.3 605:19.65 shfs
```

 

- I do have cache_dirs, but not enabled for user shares

- Mostly reiser drives, but a few XFS

- Too frozen to get diagnostics or anything :/ (one way to capture something before the next lockup is sketched below)
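Rough sketch, not an official unRAID tool: a loop that records the load average and shfs CPU usage to the flash drive once a minute, so at least something survives a hard reset (paths assume the stock layout where /boot is the flash drive):

```
#!/bin/bash
# Rough sketch: periodically record load and shfs CPU usage to the flash
# drive so the data survives a hard reset. Run it from a console session.
LOG=/boot/logs/shfs-watch.log
mkdir -p /boot/logs
while true; do
    {
        date
        uptime
        top -bn1 | grep -E '^top|shfs'
    } >> "$LOG"
    # also keep a recent copy of the syslog in case the box locks up hard
    cp /var/log/syslog /boot/logs/syslog-latest.txt 2>/dev/null
    sleep 60
done
```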

 

Link to comment
  • 2 weeks later...

I was just about to post a separate topic when I found this. Very similar sounding: the server locks up every two days or so. Dockers go down and I can't access the webGUI. I can log in with PuTTY, but can't seem to issue any commands. I've had to pull power, which is obviously not good, but nothing else works.

 

Can I roll back to an older version of unRAID? Any issues doing that?

 

EDIT: Out of interest, I have two machines. One seems unaffected; however, the one I'm referring to has had this issue for a couple of weeks.

Edited by grither
  • Upvote 1
Link to comment

I had this EXACT issue and failed to find the root cause after 30 hours of troubleshooting. 

 

 

My problem pointed to a software issue since it happened to me on completely disparate hardware using the same disks.

 

I built a new machine and mounted the RFS disks on snapraid for a stable mount. It wasn't a big deal since I needed to replace my ancient backup server.

 

The problem vanished after I migrated to new disks (including cache) and formatted as XFS.

 

I emailed Tom with the suggestion that RFS be completely removed from support, but he is convinced that this is too extreme.

 

  • Upvote 1
Link to comment
On 3/23/2017 at 11:11 AM, ixnu said:

 

I emailed Tom with the suggestion that RFS be completely removed from support, but he is convinced that this is too extreme.

 

 

It doesn't seem to be only RFS systems that have this issue, which points to something else causing it...

 

On 2/16/2017 at 9:29 PM, the_larizzo said:

This is happening to me also, but all my filesystems are XFS. My server will run for about 2 days, then the load goes through the roof and I can no longer run any commands or shut it down. I have to hold the power button to restart.

 

 

 

 

Edited by lionelhutz
  • Upvote 1
Link to comment
8 minutes ago, ixnu said:

I do not lurk much, but the first course of action always seems to be to move off RFS. It's obviously a difficult problem, but it seems to be far more common on RFS, doesn't it?

 

Agreed, that seems to be the most common cause. Also, if the user has at least one XFS disk or a non-Reiser cache disk, it's easy to confirm that this is the cause of the problem by limiting all writes to non-Reiser disks and testing for a few days or weeks.

 

Link to comment
2 hours ago, berizzle said:

Yes, of course. I've been running this machine for years now. Maybe 5.

 

It just happened again, and after a day of it not "coming back" I need to kill the machine.

 

ReiserFS disks seem to be the #1 reason for this issue. Convert one of your disks to XFS, limit all writes to that disk for a few days/weeks by changing your share(s)' included disks, and see if the crashing stops; if it does, convert the remaining disks.

 

PS: IMO you should convert even if this isn't the source of the problem; there have been multiple issues with Reiser lately, and it has terrible performance in certain situations.

 

Link to comment
1 minute ago, johnnie.black said:

 

ReiserFS disks seem to be the #1 reason for this issue. Convert one of your disks to XFS, limit all writes to that disk for a few days/weeks by changing your share(s)' included disks, and see if the crashing stops; if it does, convert the remaining disks.

 

PS: IMO you should convert even if this isn't the source of the problem; there have been multiple issues with Reiser lately, and it has terrible performance in certain situations.

 

I have 23 drives: 21 are ReiserFS (42TB) and 2 are XFS (6TB).

There is 9TB free across all the drives.

Is there a process that makes sense for converting these disks?

Link to comment
7 minutes ago, berizzle said:

I have 23 drives: 21 are ReiserFS (42TB) and 2 are XFS (6TB).

There is 9TB free across all the drives.

Is there a process that makes sense for converting these disks?

 

Since you already have 2 XFS disks, you can test before doing the conversion and confirm whether that will really help by limiting all your writes to those disks: go to your share(s) and set the included disks to only those 2 (all share data on the other disks will still be accessible, but all new writes will go to the XFS disks), then test for a few days/weeks.
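A quick way to double-check which mount is which filesystem before changing the share settings (rough sketch; your disk numbers will differ):

```
# Rough sketch: list each mounted array/cache disk and its filesystem type,
# so it's clear which disks are ReiserFS and which are XFS.
grep -E '/mnt/(disk[0-9]+|cache) ' /proc/mounts | awk '{print $2, $3}'

# Illustrative output only:
# /mnt/disk1 reiserfs
# /mnt/disk2 xfs
```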

 

To convert, see this thread:

 

 

Link to comment

I've been experiencing this since 6.3.2 as well, with all XFS disks. Seeing more and more of these threads pop up. I've tried downgrading to 6.3.0 and made it past the 2-day mark that 6.3.2 would die at, but it only made it to 4 days before dying during the night as well. When I return from work, I'll be downgrading to 6.2.4.

 

While I agree that XFS > reiser, I feel there is something more at play here.

Link to comment

I have had this problem in a mixed ReiserFS/XFS system; I converted all drives to XFS and have had no shfs lock-ups since then (a couple of weeks now).

Of course, it could have been some weird file/directory structure inconsistency that went away when the files/directories were newly created during copying.

Link to comment

I finished converting 14 disks from Reiser to XFS about 5 days ago, and things have seemed stable since. I realize this isn't that interesting a data point because it's not a long time, but it's the longest uptime I've had since upgrading to 6.3.x. And, agreed, I did everything via rsync, so it's entirely possible that the act of recreating everything in a new directory structure cleaned up something. Still, it seems that ReiserFS is a quasi-supported relic at this point, so moving off of it seems advisable. While the rsync method works well, it's a bit tedious... it would be nice to have a more automated method for those of us who have been upgrading our systems over the years.
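For anyone else facing the conversion, here is a rough sketch of the kind of per-disk rsync copy typically used (disk numbers are placeholders; the target disk should already be formatted XFS and writes to the source stopped first):

```
# Rough sketch (disk numbers are placeholders): copy one ReiserFS data disk
# onto an empty disk that has already been formatted XFS, preserving
# permissions, times and hard links. Stop all writes to the source first.
rsync -avPH --stats /mnt/disk5/ /mnt/disk6/

# After verifying the copy, the source disk can be reformatted as XFS and
# used as the target for the next disk in the chain.
```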

Link to comment
