unRAID unresponsive - shfs 100% cpu


P_K


  • 2 months later...

I've been having the "100% CPU on shfs, everything unresponsive" problem about once a week, and having to hard reboot. It's been happening on all of the 6.3.x releases (including the current 6.3.5); it's the only thing I don't love about unRAID.

 

I have almost all ReiserFS drives, guess it's time to start the slow migration off...

Screen Shot 2017-06-29 at 10.33.30 AM.png

11 minutes ago, JustinAiken said:

I have almost all reiserfs drives, guess it's time to start the slow migration off...

That's exactly what happened to me. How much free space do you have altogether on all your data drives? Post a screenshot of your Main page of the GUI if you want a gameplan to start the process.

 

Also, do you keep strict control over which files / shares reside on which slot number?


> Post a screenshot of your Main page of the GUI if you want a gameplan to start the process.

 

Attached - You can see the last drive is an empty 8TB - that was my precleared hot spare, I just added it in, formatted as XFS, and started moving data from my smallest drive over.  I figure I'll move a few 1.5TB/2TB drives onto there, then xfs them up.  I also have another 8TB spare (precleared, but no room to plug in - was going to keep for next time a drive died).

 

> Also, do you keep strict control over which files / shares reside on which slot number?

 

Nah... occasionally I use `diskmv` to load balance them so related folders are grouped, but I don't have any per-drive shares.

drives.png

50 minutes ago, JustinAiken said:

I just added it in, formatted as XFS, and started moving data from my smallest drive over.

It would be faster to go largest to smallest; that way you wouldn't be dealing with multiple sources and destinations to get the transfer done. ReiserFS is extremely slow at deletions, so moving is not desirable at all.

 

I would copy, not move, the entire contents of the 8TB ReiserFS disk to your new empty XFS disk, then when the copy is complete and verified, format the ReiserFS disk. It's much quicker to format than it is to delete files.

 

Also, I personally found having BTRFS disks in use while I still had the ReiserFS hangs was suicidal. I ended up corrupting a BTRFS disk likely BECAUSE of the way Reiser was acting. I'd personally avoid accessing the BTRFS disk until the conversion is done, or go ahead and convert that to XFS for now.

 

I would only use BTRFS on a perfectly stable server. It's too brittle for my liking.

 

With 20TB of free space, you could easily have 16TB of copying in progress at once; it shouldn't take long to get the whole thing done.

 

Just don't write to any ReiserFS disks, including deletions, while you are copying.


Oh, another thing: if this was my server, I'd keep the smallest drives for last for another reason. After the last of the small ReiserFS drives' contents were copied, I'd simply remove them completely. By my count, you can keep your extra 8TB that isn't in the case right now as a spare (make it a hot spare if you want) and completely REMOVE all the 2TB-and-under drives. I'll bet if you do that your parity check times would drop by HOURS, and your heat and power consumption would drop by a bunch as well.

 

Now is the ideal time!


> It would be faster to do it largest to smallest

 

Ah crap - already started with the smallest... Ah well, after I'm done cleaning off the 1.5TB drive, I'll take it out of the array and use that physical slot to bring in the other blank 8TB, so I can start moving the big ReiserFS over to it.

 

> I would copy, not move, the entire contents of the 8TB ReiserFS disk to your new empty XFS disk

 

How do you do that in such a way that parity isn't invalidated? Doesn't unRAID get confused when it has the same file on two different disks?

 

> then when the copy is complete and verified, format the ReiserFS disk.

 

How to verify?

 

> and completely REMOVE all the 2TB and under drives.

 

Planning on removing the 1.5TB and at least some of the 2TB's... 

 

> I would only use BTRFS on a perfectly stable server. It's too brittle for my liking.

 

I'll get to that one before the rest of the Reisers - I'm not attached to BTRFS; I just randomly picked it back before the strong community preference for XFS had formed :P

1 hour ago, JustinAiken said:

after I'm done cleaning off the 1.5TB drive, I'll take it out of the array and use that physical slot to bring in the other blank 8TB so I can start moving the big ReiserFS over to it.

Removing a disk will require rebuilding parity.

 

Another approach would be to reformat the 1.5TB to XFS after you get the files off it, then rebuild the newly formatted 1.5TB XFS drive onto the 8TB. Since your sig indicates you have dual parity this would probably be a better idea than removing a disk then rebuilding parity.

 

1 hour ago, JustinAiken said:

How do you do that, in such a way that parity isn't invalidated? Doesn't unraid get confused when it has the same file on two different disks?

Not sure why you think parity would care. unRAID parity is realtime, and anytime a write occurs to a data disk, parity is updated. And the other thing that people often don't think of: all changes to a data disk are write operations! Deleting writes the filesystem metadata that marks the file deleted; formatting writes an empty filesystem. Both update parity because both are writes.

 

The thing that might get confusing is which file gets read or written when multiple files with the same path exist in a user share because they are on different disks. But since you are not using user shares at all when copying disk-to-disk, it doesn't matter, and since the files are identical it won't matter if you happen to read from the user share. Might be a good idea to avoid writing as much as possible while doing the conversion. It will make things faster that way too if you don't have other things reading or writing data.

1 hour ago, JustinAiken said:

> Removing a disk will require rebuilding parity.

 

After all the files finished moving off, I was going to try this "shrink without rebuild": https://wiki.lime-technology.com/Shrink_array#The_.22Clear_Drive_Then_Remove_Drive.22_Method

 

Will that not work?

Should work but I haven't tried it. Might be faster than the reformat/rebuild method I mentioned, though a little more complicated.

 

That method in the wiki involves clearing the drive to be removed while it is still part of the array. Since it is only a 1.5TB drive it shouldn't take as long as rebuilding the 8TB.

 

But, if the 8TB drive isn't clear you would have to wait for it to clear after adding it. If you just use the 8TB to replace the XFS formatted 1.5TB then it won't matter if it is clear or not.

 

Is the 8TB drive clear?

2 hours ago, trurl said:

Another approach would be to reformat the 1.5TB to XFS after you get the files off it, then rebuild the newly formatted 1.5TB XFS drive onto the 8TB.

This is exactly what I would recommend in your case.

 

Leave all the drive slots filled until you are done, and then remove everything that's leaving all at once so you only have to rebuild parity 1 time.

 

4 hours ago, JustinAiken said:

How to verify?

If you use the rsync command to copy, like it mentions in the XFS conversion thread, there is a command line option to run that will go through all the copied files and verify there are no more differences between the drives. At that point, you can stop the array, change the format on the source disk from ReiserFS to XFS, and when you start the array it will prompt you to format the drive.

  • 3 weeks later...
  • 8 months later...

Seeing similar issues here when copying files from my Windows PC to my unRAID machine.

 

4x3TB storage + 1x3TB parity, all xfs
1x256GB cache, btrfs

8GB RAM
unRAID 6.4.0

 

Transfer speeds go down to 0-2MB/s and explorer.exe freezes.

Then I need to restart the array or the server before the transfer speeds will go back to 112MB/s (stable).
Happens around 1-4 times a day.
 

Screen Shot 2018-04-05 at 22.46.56.png

Capture2.PNG


I'm running 6.5.0 now and experiencing the same issues.
I've also disabled VMs and all but 3 Dockers (teamspeak3, plex and plexpy were still running). I've also removed CA Auto Update, since it's known to cause problems for some users.

 

I've copied 6 files today (around 5GB total). Explorer.exe in Windows again keeps hanging because it gets no response from the unRAID server. Transfer speeds drop to 0 bytes/second for 10-120 seconds and then spike to 5-25MB/s (instead of the 112MB/s I used to see).

 

While this problem is happening, the unRAID GUI keeps working, as does connecting via ssh.

So I was able to run the diagnostics command over ssh as well.

 

See the attached files for the CPU+memory usage of shfs (CPU time is 16:08.75) from the diagnostics zip (top.txt), and the transfer speed I'm getting while copying a file.
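Since ssh stays responsive, the shfs CPU figure in top.txt can also be checked live with `ps` instead of pulling a full diagnostics zip. `shfs` only exists on an unRAID box, so the runnable sketch below points `ps` at the current shell instead; on the server you'd select the process by name:

```shell
# On the unRAID server:  ps -C shfs -o pid,pcpu,time,comm
# (-C selects by command name; pcpu is %CPU, time is accumulated CPU time)
# Runnable demo against the current shell, since shfs isn't present elsewhere:
OUT=$(ps -o pid,pcpu,time,comm -p $$)
echo "$OUT"
```

A steadily climbing TIME column on shfs while transfers are stalled is the same symptom top.txt captures.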

Screen Shot 2018-04-06 at 19.47.07.png

Capture6.PNG

