(Bypassed, not 'solved') Frequent stalls requiring hard power down



This problem could be caused by the way the massive list of files for deletion is generated or compiled. Remember that unRAID runs completely in memory and has no swap file on disk to fall back on. Your system could simply be running out of memory to hold the list of files. That would cause problems, as the OS will then start dumping 'stale' processes to free up more memory.


Hey Frank, appreciate the reply - I had considered this, but memory usage is unaffected, and the rm can run for many hours (happily deleting files the whole time - not just enumerating the list of files to delete) before it crashes! The files are also structured 5 - 8 folders deep, so each folder only holds a couple of thousand files or fewer.
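For anyone wanting to check the "memory usage is unaffected" observation on their own box, here is a rough sketch that reads the kernel's memory counters so they can be logged while the rm runs. It assumes a Linux system exposing /proc/meminfo; the function names are made up for illustration and this is not the method actually used in this thread.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integers.

    Most values are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Key: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[key.strip()] = int(fields[0])
    return values


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only; path is an assumption)."""
    with open(path) as fh:
        return parse_meminfo(fh.read())
```

Calling read_meminfo() every few seconds and appending the "MemAvailable" value to a log on a disk outside the array would show whether memory really stays flat up to the hang.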


My guess is that some resource is being exhausted within either XFS or one of the supporting technologies interfacing with either the RAID array or the Docker/VMs (although I'm pretty certain I've had a crash with a regular Unraid-only rm - no virtual mount / 9p / etc). I had hoped to find the culprit by examining the XFS stats during a crash and seeing an exhaustion of inodes or something, but nothing looked obviously wrong there. Honestly, I'm at a loss as to what's causing this - my next step, though, is to try to make this reproducible (and consistent) and remove all the extra factors such as software (ZM), using a VM, a docker, etc. Just try to boil it down to the minimum steps that cause the problem...

You're right that the level of file access doesn't seem to be abnormal - although the level of deletions may be. The hang is definitely total - even if the system is left for most of the day to recover, it never does. The hang also appears to be purely IO (from what I can see), as the system is still trying to work (e.g., I can run simple SSH commands until one of them tries to do any IO, and then that SSH session hangs too).
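As a starting point for the "minimum steps" reproduction mentioned above, something along these lines could build a nested tree of small files directly on the array and then time a plain delete, with no VM, docker or ZM involved. This is only a sketch: the depth, fanout and file counts are placeholders rather than the real layout, and build_tree / timed_delete are hypothetical helper names.

```python
import os
import shutil
import time


def build_tree(root, depth=3, fanout=2, files_per_dir=5):
    """Create a nested directory tree of 1-byte files; return how many files were made."""
    created = 0
    os.makedirs(root, exist_ok=True)
    for i in range(files_per_dir):
        with open(os.path.join(root, f"f{i:05d}.dat"), "wb") as fh:
            fh.write(b"x")
        created += 1
    if depth > 0:
        for d in range(fanout):
            sub = os.path.join(root, f"d{d}")
            created += build_tree(sub, depth - 1, fanout, files_per_dir)
    return created


def timed_delete(root):
    """Remove the whole tree in one go and return the elapsed seconds."""
    start = time.monotonic()
    shutil.rmtree(root)
    return time.monotonic() - start
```

Scaling depth and files_per_dir up towards the real structure (5 - 8 levels, a couple of thousand files per folder) would show whether deletion volume alone is enough to trigger the stall.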


The discussion linked below looks very similar in nature to what I've seen (except their problem only lasts 10 - 15 minutes - although it could just be a matter of magnitude; perhaps mine would free up after 10 - 20 hours or more?! I've never left it that long so far)...

 

https://www.spinics.net/lists/linux-xfs/msg06058.html

 

Towards the end, the discussion turns to the mass deletion of file structures, similar to what I've seen. Their solution was, effectively, to slow down deletions by reducing how parallel they were...
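A rough sketch of what "slowing down deletions" could look like in practice: a single-threaded delete that walks the tree bottom-up and pauses after every batch of unlinks. This is not the fix from the linked thread (they reduced parallelism in their own tooling); the batch size and pause length are guesses that would need tuning, and throttled_rmtree is a made-up name.

```python
import os
import time


def throttled_rmtree(root, batch_size=500, pause_s=0.5, sleep=time.sleep):
    """Delete `root` bottom-up, sleeping for `pause_s` after every `batch_size` files.

    Returns the number of files removed. `sleep` is injectable for testing.
    """
    removed = 0
    for dirpath, dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            os.unlink(os.path.join(dirpath, name))
            removed += 1
            if removed % batch_size == 0:
                sleep(pause_s)  # give the filesystem time to catch up
        for name in dirnames:
            path = os.path.join(dirpath, name)
            # Real subdirectories were already removed on an earlier (deeper)
            # iteration; only symlinks to directories can still be here.
            if os.path.islink(path):
                os.unlink(path)
        os.rmdir(dirpath)
    return removed
```

Because there is only one worker and a forced pause, the delete rate is capped at roughly batch_size files per pause_s seconds plus the time the unlinks themselves take.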
