Unraid goes completely unresponsive - no ping or display



Hello! I've been running Unraid for a while now and am generally loving it.

 

I have a problem where, every few days, the whole machine stops responding. I have it plugged into an external monitor and keyboard, and when it enters this state not even Num Lock will toggle on the keyboard. I started writing the system log to the flash drive in hopes of getting more information (the logs shown while it was running looked fine). This is the last part of the log, which doesn't seem interesting at all:

Jun  5 01:54:47 Tower shfs: share cache full
Jun  5 01:55:19 Tower shfs: share cache full
Jun  5 02:47:07 Tower shfs: share cache full
Jun  5 02:48:37 Tower shfs: share cache full
Jun  5 03:00:01 Tower Plugin Auto Update: Checking for available plugin updates
Jun  5 03:00:05 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
Jun  5 03:40:13 Tower crond[2476]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null

Judging by the log file itself, it was last written to about an hour after the final log entry.
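For anyone wanting to check the same thing on their own box, here is a minimal Python sketch that pulls the last timestamp out of a syslog excerpt and counts repeated messages. The sample lines are the ones posted above; the year (2019) is an assumption taken from the diagnostics file name, since syslog timestamps don't carry one, and the function names are just illustrative.

```python
from collections import Counter
from datetime import datetime

# Log lines copied from the post above.
SAMPLE = """\
Jun  5 01:54:47 Tower shfs: share cache full
Jun  5 01:55:19 Tower shfs: share cache full
Jun  5 02:47:07 Tower shfs: share cache full
Jun  5 02:48:37 Tower shfs: share cache full
Jun  5 03:00:01 Tower Plugin Auto Update: Checking for available plugin updates
Jun  5 03:00:05 Tower Plugin Auto Update: Community Applications Plugin Auto Update finished
Jun  5 03:40:13 Tower crond[2476]: exit status 1 from user root /usr/local/sbin/mover &> /dev/null
"""


def parse_line(line, year):
    """Split one syslog line into (timestamp, host, message)."""
    month, day, clock, host, message = line.split(None, 4)
    # The year is supplied by the caller because syslog omits it (assumption).
    stamp = datetime.strptime(f"{year} {month} {day} {clock}", "%Y %b %d %H:%M:%S")
    return stamp, host, message


def summarize(text, year):
    """Return the newest timestamp and a per-message counter."""
    entries = [parse_line(line, year) for line in text.splitlines() if line.strip()]
    last_stamp = max(stamp for stamp, _, _ in entries)
    counts = Counter(message for _, _, message in entries)
    return last_stamp, counts


last_stamp, counts = summarize(SAMPLE, 2019)
print("last entry:", last_stamp)
print("'share cache full' entries:", counts["shfs: share cache full"])
```

Subtracting `last_stamp` from the log file's modification time (for example via `os.path.getmtime`) would give the window between the final entry and the last write, which is the roughly one-hour gap mentioned above.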

When it comes back online it takes about 12 hours to verify the array if there is no usage, and 24-36 hours while the server is in use. That seems normal, but I'd like to avoid it altogether.

I have the 'performance' mode turned on.

General hardware: AMD 2920X, 32GB RAM, 8x8TB HDD, 2x512GB NVMe cache.

My downloads, appdata, domains, system and isos all live on cache only.

 

Any help would be appreciated!

tower-diagnostics-20190605-1333.zip

34 minutes ago, ijuarez said:

is your cache drive full?

The cache drive isn't full; it has about 400GB free. From what I read about the 'error' in the log, it's more about writes being sent directly to the disk for one reason or another, instead of landing on the cache and then being migrated. I've since turned off caching for the two shares that are stored on disk. It's now a clean break: shares are either on cache or on disk.

2 minutes ago, johnnie.black said:

Not sure if Threadripper has the same Ryzen issues with Linux, but it won't hurt to try the known workarounds; most are discussed here.

Thanks! I'll look into this! It's hard to tell since I can't 'trigger' the failure, but it can't hurt!

5 hours ago, elbweb said:

...

2x512GB nvme cache.

My downloads, appdata, domains, system and isos all live on cache only.

It could be the btrfs false 'disk full' error. It used to happen to me a lot. I remember running a monthly balance to prevent it (it can't be run too often because it wears out the SSD).
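A rough sketch of what such a monthly balance could look like when scripted in Python. The `btrfs balance start -dusage=N <mount>` form is the standard btrfs-progs syntax; the `/mnt/cache` mount point and the 75% usage filter are assumptions for illustration, not settings from this thread, and the helper names are made up.

```python
import subprocess


def balance_command(mount_point="/mnt/cache", usage_percent=75):
    """Build a btrfs balance command that only rewrites data chunks
    filled to at most usage_percent, which limits SSD writes."""
    if not 0 <= usage_percent <= 100:
        raise ValueError("usage_percent must be between 0 and 100")
    return ["btrfs", "balance", "start", f"-dusage={usage_percent}", mount_point]


def run_balance(mount_point="/mnt/cache", usage_percent=75):
    """Run the balance; needs root and a mounted btrfs pool (assumed path)."""
    return subprocess.run(balance_command(mount_point, usage_percent), check=True)


print(" ".join(balance_command()))
```

Calling `run_balance()` from a monthly scheduled job would match the cadence described above; a lower usage filter rewrites fewer chunks and so causes less wear.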

  • 2 weeks later...

 

On 6/5/2019 at 8:12 AM, johnnie.black said:

Not sure if Threadripper has the same Ryzen issues with Linux, but it won't hurt to try the known workarounds; most are discussed here.

So I went through and disabled one of the states in my BIOS and haven't had the problem again yet. No idea if that's the solution, but so far so good. Thanks @johnnie.black
