(SOLVED) Unraid server locks up after about 5 minutes.



This is a recent problem. I came home from a work trip and the server was working nicely at 80 days of uptime. I noticed one of the drives was disabled and emulated, with a UDMA CRC error count of 12.

According to other threads that is normally caused by a SATA cable. I shut the server down and replaced the drive's SATA cable. After a drive rebuild, no additional UDMA CRC errors have shown up, but now the server locks up and I have to hard restart it. The server does not respond to pings when locked up.
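
For anyone wanting to watch this counter themselves, here is a minimal sketch that pulls the raw value of SMART attribute 199 (the UDMA CRC error count) out of `smartctl -A` output. The function names and the `/dev/sdX` device path are placeholders I made up for illustration, and it assumes the usual ten-column attribute table with the raw value as a plain integer. The count is cumulative and does not reset, so what matters is whether it keeps increasing after the cable swap.

```python
import subprocess


def udma_crc_raw_value(smartctl_output):
    """Return the raw value of SMART attribute 199, or None if not found.

    Matches on the attribute ID rather than the name, since the name can
    differ between drive vendors.
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def read_udma_crc(device):
    """Run `smartctl -A` on a device (needs root) and extract attribute 199."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return udma_crc_raw_value(result.stdout)


if __name__ == "__main__":
    # "/dev/sdX" is a placeholder; substitute the real device node.
    print(read_udma_crc("/dev/sdX"))
```

Running it before and after a rebuild or parity check and comparing the two numbers should show whether new CRC errors are still being logged.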

 

The only action I have taken since the 80-day uptime was to change the SATA cable and port of that hard drive.

I have tried upgrading Unraid to unraid-6.7.0-rc7 and so far it does not look like it has fixed my issue.

I can start the server with the array off and it is stable. As soon as I start the array, it locks up within a few minutes.

Diagnostics attached.

Help would be awesome. Thanks.

 

Edit: Tried starting the array with only Netdata running as a docker. The Netdata GUI only shows some data. It does not show the drives' read and write information: it lists the drives, but the data is all 0, even during a parity check. So Netdata is reporting 0 MB/s read and write while the Main page says 104 MB/s. With only Netdata running it is lasting longer and has not locked up yet (15 mins).

 

Just noticed that only 6% (0.5gb of 32gb) of RAM is being used. What would cause this? Previously this would sit around 12-18gb normally. Running only 1 or 2 docker applications is stable. Currently running Netdata and Plex server with 2.2gb RAM usage.
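
To cross-check the dashboard figure, here is a rough sketch that works out the used percentage from `/proc/meminfo`, using the `MemTotal` and `MemAvailable` fields (both reported in kB). The function name is my own, and this is only one way of defining "used". Linux fills otherwise idle RAM with page cache, so tools that count cache as used will show a higher number than this does.

```python
def used_percent(meminfo_text):
    """Percentage of RAM in use, computed as (MemTotal - MemAvailable) / MemTotal."""
    values = {}
    for line in meminfo_text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            # First field is the number; a trailing "kB" unit is ignored.
            values[key.strip()] = int(fields[0])
    total = values["MemTotal"]
    available = values["MemAvailable"]
    return 100.0 * (total - available) / total


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        print(f"{used_percent(f.read()):.1f}% used")
```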

 

tower-diagnostics-20190411-2329.zip

Edited by Scotison
Link to comment
2 hours ago, johnnie.black said:

Try starting in safe mode. It is also a good idea to run memtest, since there are a couple of crashes in that short log.

Stable with nothing running.

In the process of slowly starting services until it locks up again. Currently at 4 hrs uptime with Plex and Netdata running. Funnily enough, as soon as Plex went up I had 3 streams active. So far so good.

Hopefully I will find the issue.

Link to comment

Ran the parity check to completion; it passed but reported 10 errors. No crashes while running the parity check with Plex and Netdata. Uptime was about 24 hrs. Started up Deluge, Radarr and Sonarr and it locked up shortly after. Restarted with no dockers running and it locked up almost immediately, which doesn't make sense as it has been OK with no dockers running before. When it locked up I noticed that the CPU had 4 or 6 threads pinned at 100% with the rest near idle.

Tried again with no dockers and it locked up again, this time with 4 CPU threads pinned at 100%. In the lead-up to the lockup the threads picked up load one by one: the first climbed until it hit 100%, then a second would start taking load, and so on until the lockup at 4 threads.
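
A rough sketch of how the per-thread load could be recorded in the lead-up to a lockup: take two snapshots of `/proc/stat` and compute the busy percentage of each CPU thread from the difference. The function names and the 5-second interval are my own choices, and "busy" here is assumed to mean everything except idle and iowait time.

```python
import time


def parse_proc_stat(text):
    """Map each per-thread line (cpu0, cpu1, ...) to (total_ticks, idle_ticks)."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip the aggregate "cpu" line and non-CPU lines such as "intr".
        if parts and parts[0].startswith("cpu") and parts[0] != "cpu":
            # user nice system idle iowait irq softirq steal
            values = [int(v) for v in parts[1:9]]
            idle = values[3] + (values[4] if len(values) > 4 else 0)
            counters[parts[0]] = (sum(values), idle)
    return counters


def busy_percent(before, after):
    """Busy percentage per CPU thread between two parsed snapshots."""
    result = {}
    for name, (total1, idle1) in after.items():
        if name not in before:
            continue
        total0, idle0 = before[name]
        elapsed = total1 - total0
        if elapsed <= 0:
            result[name] = 0.0
        else:
            result[name] = 100.0 * (elapsed - (idle1 - idle0)) / elapsed
    return result


if __name__ == "__main__":
    with open("/proc/stat") as f:
        previous = parse_proc_stat(f.read())
    while True:
        time.sleep(5)
        with open("/proc/stat") as f:
            current = parse_proc_stat(f.read())
        print(time.strftime("%H:%M:%S"), busy_percent(previous, current), flush=True)
        previous = current
```

Redirecting the output to a file on persistent storage means the last few samples before a hard reset are still there afterwards.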

Currently running memtest.

Edited by Scotison
Link to comment

Any parity check that does not report 0 errors is a potential cause for concern. If you ran it without the option to correct errors set then you should probably run it again with the option to correct errors. Assuming that completes OK then you should run it yet again and this time expect 0 errors.

Link to comment

I ran memtest overnight for 14 hrs. It did 3 passes with no errors, but it was locked up when I tried to do anything using the keyboard.

I was using a wireless keyboard. Does that matter?

Attached is the pic of memtest while locked up. The second pic is from booting Unraid; it locked up at that point. It seems to be getting worse.

 

I will run parity check again but at this point I am running out of ideas and my knowledge is not the best in this field.

 

IMG_20190414_082611.thumb.jpg.87bb8512226b5951b86ff8a3f6580958.jpg

IMG_20190414_084719.thumb.jpg.4cb625cddfb361d07169edef2a9a44e6.jpg

 

Edited by Scotison
Link to comment

At the risk of speaking too early, I think I may have fixed the lockups.

 

Early days, but it is not showing any lockup behavior any more.

 

I think the issue may have been USB devices. I have a UPS and an external hard drive that are connected to the server via the front panel USB ports.

Removing both of them seems to stop the lockups.

Does anyone have an explanation of why that might be?

 

Link to comment

Half fixed. The server is now somewhat stable and able to run with minimal dockers for longer than a day.

It still randomly locks up and requires a hard reset.

The hard drive SATA cable I changed was originally plugged into the motherboard. I moved it to a port on a PCI SATA expansion card. Could that cause issues? I will be changing it back to investigate.

Link to comment

The server has been running for 6 days now without a crash. All services are now running again.

The only thing I have changed is the SATA cable that started all this.

The cause seems to be my UPS data link and an external hard drive being connected during startup. When I remove both of these USB connections and allow the server to power up with no services running for about a day, I can get it to behave. Over the next day the services can be started and the USB connections plugged back in with no issues.

 

I hesitate to call this solved, but I do not think anyone is going to add anything to this discussion.

Link to comment
