Issue with unresponsive server


Talos

Hi guys..

 

I have a server which has been running 24/7 for a few years now without issue until this week. I am experiencing an issue whereby all my /mnt/Disk1, /mnt/Disk2 etc. Samba shares stop responding. I can access the GUI via the browser and run commands, but if I try to access any of my drives from Windows Explorer or from any of the media players on any of the PCs on my network, it just times out. It first happened on Monday this week and has now just happened again.

If I try to stop the array via the button on the Main tab of the GUI, it gives me the message below across the bottom of the screen, and then the GUI also freezes up after a few minutes.

"Stopping Docker...Stopping libvirt...Stop AVAHI...Stop SMB...Spinning up all drives...Sync filesystems..."

 

After this I was forced to hit the reset button and then go through the parity check process due to the unclean shutdown. It passed the parity check on powerup after a few hours with zero errors detected.

 

Today I came home from work and had the same issue. All shared drives were unresponsive, but I could access the GUI and download a diagnostics zip file, which I've attached.

Don't have the faintest idea where to start with this one. Could it be an issue with Samba dying, or am I looking at a dying drive or something else altogether?
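
For anyone wanting to narrow this down from a client PC, here is a minimal Python sketch that just checks whether the server still accepts TCP connections on the standard SMB port (445). The function name and the example address in the comment are hypothetical placeholders, not anything taken from my setup. A successful connection only shows that the port is open; Samba could still be hung waiting on a disk, so treat it as a rough first check.

```python
import socket


def smb_port_open(host, port=445, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection raises OSError on refusal, timeout or DNS failure
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example call (placeholder address, replace with your server's IP or name):
# print(smb_port_open("192.168.1.10"))
```

If this returns False while the web GUI still loads, that would point more towards the SMB service than the network as a whole.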

 

Server details are as follows:

Unraid 6.1.9

ASRock - B75 Pro3-M

Intel® Celeron® CPU G550 @ 2.60GHz

8 GB RAM

M1015 HBA

9x 3 TB Toshiba ACA drives (8 on the M1015 and parity on the motherboard)

 

Any help would be greatly appreciated, thanks guys.

Cheers.

theburrow-diagnostics-20170217-1625.zip


The good news is that your disks are fine. Two of them have recorded a temperature of 46 Celsius, which is a little on the warm side; it's something to bear in mind rather than a likely cause of your current problem. You have a very straightforward setup without any dockers or VMs. You don't have a cache or any user shares configured, just disk shares. You're using an older version of unRAID (I'm guessing 6.1.9, perhaps?) but that's OK because you don't have any of the issues that some people are having with the later versions. In fact, you have a very clean syslog and everything seems to be working just as it should, despite the earlier unclean shutdown.

You say your server has been working well for a few years so I'm going to suggest you open it up and clear out any dust and fluff, especially around the processor heatsink. Pop out and reseat the memory DIMMs. Check to make sure you haven't disturbed any of the cables to the disks and put the cover back on. Then, with a monitor and keyboard connected, power on and choose the Memtest option from the boot menu. Let it run for 24 hours or so and see if it reveals anything.

 

If the memory turns out to be good then restart and boot into unRAID but start the array in Maintenance mode and run a file system check on each disk. Report back with any questions or findings.
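
As a rough sketch of the per-disk check, the following Python snippet only prints the read-only check commands; it does not run them. It assumes the data disks are ReiserFS, that there are 8 of them, and that with the array started in Maintenance mode they appear as /dev/md1, /dev/md2 and so on. Verify all of that for your own system before running anything. The function name is just for illustration.

```python
def build_fsck_commands(data_disk_count):
    """Return one read-only reiserfsck command (as an argument list) per data disk."""
    commands = []
    for disk_number in range(1, data_disk_count + 1):
        # Assumed device naming for array disks in Maintenance mode
        device = f"/dev/md{disk_number}"
        # --check only reports problems; it does not attempt any repair
        commands.append(["reiserfsck", "--check", device])
    return commands


if __name__ == "__main__":
    # Assumed disk count: 8 data disks
    for command in build_fsck_commands(8):
        print(" ".join(command))
```

Running the printed commands one at a time from the console makes it easier to read each disk's report before moving on to the next.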

 

At this stage I'd suppress any urge to update to a newer version of unRAID because you know your system works well with your current version and there's no point in introducing another variable.


Lol.. thought I put the 6.1.9 in the original post but it says 2.1.9 instead.

 

The high temp is probably because we've had a heat wave here over the past few weeks, hitting 47°C on several days, so the air con has been struggling.

I'll break the machine down today and make sure it's all clean inside and then do the file system check. Thanks John.

 

I've been running unRAID for about 8 or so years now, so I started long before all the docker and VM stuff :) I love unRAID just because it works perfectly as a media server for all my machines, although I am now starting to consider building a VM-capable server so that I can run both my storage and my everyday desktop on there and do away with a separate machine.


Well well, I was looking so closely for things going wrong that I completely missed any reference to a version number in your post and at the point where the array starts, as trurl pointed out. My guess was from the kernel version! So perhaps the emoticon 8) is wrong and they ought to be nerdy specs, not cool shades!

I did notice the Sydney timezone in your diagnostics so yes, you'll be experiencing some pretty warm temperatures at the moment. Phew!

 

I use dockers myself, but I don't really do VMs. Instead I have a collection of boxes, each set up to do a particular job!

 


Well.. I broke down the server yesterday... pulled the HSF off the CPU and gave it a vacuum as it was a tad clogged.. put it all back together and ran memtest for 12 hours - multiple passes with zero failures..

 

Started the array back up in maintenance mode and ran reiserfsck on all 8 of the data disks and all reported zero issues.. disk 5 had 1 safe link but apart from that nothing stood out.. It also completed a parity check overnight without parity errors..

Hopefully it was just the CPU overheating from the dust and all will be OK now.. I'll let it run normally now and see if the issue presents itself again.
