System randomly stops responding


cleight

Hello Community,

I have been dealing with an issue for months where my Unraid server just stops responding. The server is still pingable and the webGUI mostly loads the main page, but the array and power controls are missing. When this happens I also lose SSH access to the server. I'm not 100% sure what is going on. I tried upgrading to the latest beta build, which helped, but things are still a mess. The issue appears every time I put any kind of load on the disks (heavy file transfers, video encoding, moving files from one disk to another). I'm trying to pull diagnostics off the server now, but here is the latest syslog I was able to acquire.

unraid-syslog-20191024-1513.zip

1 hour ago, johnnie.black said:

Cache filesystem is corrupt, same for disk5, also various other BUG and call traces logged, I would start with memtest.

@Johnnie.Black I have run memtest for over 24 hours and all tests came back normal. I also know that the cache filesystem is corrupted, I believe from the multiple hard power-offs of the system over the past few months. Currently I have nothing stored on the cache pool, so it isn't a huge issue to blow it away and re-create it. Disk5 showing corruption is more of a concern, as I haven't gotten any drive errors or alerts in the GUI. Do you have any other suggestions as to what might be causing the issue?
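
For anyone following along, here is a minimal sketch of how one might count the kinds of entries mentioned above (BUG lines, call traces, filesystem errors) in a syslog. The marker patterns and the function name `summarize_syslog` are assumptions for illustration only; the exact wording in a given syslog may differ, so adjust the patterns to what the log actually contains.

```python
import re
import sys
from collections import Counter

# Hypothetical marker patterns, not taken from the attached syslog.
MARKERS = {
    "kernel_bug": re.compile(r"\bBUG:"),
    "call_trace": re.compile(r"Call Trace:"),
    "fs_error": re.compile(r"\b(BTRFS|XFS|REISERFS)\b.*\b(error|corrupt)", re.IGNORECASE),
}


def summarize_syslog(lines):
    """Count how many lines match each marker pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in MARKERS.items():
            if pattern.search(line):
                counts[name] += 1
    return dict(counts)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python summarize_syslog.py /path/to/syslog.txt
    with open(sys.argv[1], errors="replace") as handle:
        for name, count in sorted(summarize_syslog(handle).items()):
            print(f"{name}: {count}")
```

A rising count between two syslogs taken under load would at least show whether the traces line up with the heavy disk I/O, though it says nothing about the root cause.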

Just now, johnnie.black said:

The constant Bug on and call traces are a concern, possibly hardware related, I assume you're already using the Ryzen workarounds, or it's normal for the server to crash after a few hours.

It isn't normal for the server to crash after a few hours. Sometimes it will stay up and running for 14-21 days before I see a crash. As I said in my original post, it appears to happen when heavy disk I/O occurs. I will review the Ryzen workarounds thread to see if I missed something; I know I set the C-state option through the go file ages ago.
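
As a rough sketch of how one could double-check that the go file still carries a C-state workaround: the snippet below assumes the workaround is the commonly cited `zenstates --c6-disable` line and that the go file lives at `/boot/config/go`. Neither detail is confirmed in this thread, and the helper name `has_c6_workaround` is made up for illustration.

```python
def has_c6_workaround(go_text):
    """Return True if an uncommented line calls zenstates with --c6-disable.

    Assumption: the workaround is a `zenstates --c6-disable` line; the thread
    itself only says the C state was set through the go file.
    """
    for raw in go_text.splitlines():
        line = raw.strip()
        # Skip blank lines and comments (this also skips the shebang line).
        if not line or line.startswith("#"):
            continue
        if "zenstates" in line and "--c6-disable" in line:
            return True
    return False


if __name__ == "__main__":
    # Assumed go file location; change the path if yours differs.
    try:
        with open("/boot/config/go") as handle:
            print("C6 workaround present:", has_c6_workaround(handle.read()))
    except FileNotFoundError:
        print("go file not found at the assumed path")
```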

@johnnie.black I finally got the server back online after having to force it down. Here are the latest diagnostics in case you want to look a bit deeper into the issue. Fix Common Problems is now reporting Disk5 as read-only, which aligns with what you saw in the last syslog. I am currently emptying Disk4 so I can replace it with a larger drive and convert it to XFS; I will then move all the data off Disk5 onto the new disk.

unraid-diagnostics-20191024-1846.zip

8 hours ago, johnnie.black said:

You also need to fix the corruption in the cache pool, best way is to re-format.

Johnnie, thank you for your assistance. I was able to repair the filesystem on Disk5, and I also formatted and re-configured the cache pool. So far everything appears to be more stable than it was. I will let you know if anything else comes up.
