[SOLVED] Server took a crap on itself


Posted

Hi, I've been having some issues that I think are power related, so I decided to swap some SATA to 4 x SATA adapters for Molex to 4 x SATA adapters. On boot I noticed my disks hadn't spun up as normal, and once Unraid loaded there were no disks showing. I thought maybe they just hadn't been detected correctly through the HBA, so I shut down and made sure the connections were all secure on the disks. I booted back up with all my disks present and started the array. Almost instantly 2 or 3 disks dropped out and caused errors, so I stopped the rebuild, shut down, and tried again after unassigning and reassigning the disks. Failures again. I did the same once more, and now I have two disks showing as 'New Devices' under my Array Devices tab which I'm unable to assign to any disk slots in Unraid.

 

Have I completely messed my array or can this be saved? :(

unraid01.png

unraid02.png

Posted

You can do a new config, but if it was in the middle of a rebuild that disk might have some corruption, depending mainly on how long it continued with read errors on more disks. The other option would be to force a rebuild of that disk. Did you save the diags before rebooting when the errors occurred?

 

 

Posted

Thank you johnnie.black, I've just run the New Config option and made sure all drives are assigned to the correct Unraid disks. Should I select the 'Parity is already valid.' option, as I know the parity is valid, or should I let Unraid do its thing?

 

Unfortunately I didn't save the diagnostics beforehand, as I assumed it was just a possible cable issue, so I shut down, checked the cables and booted back up.

unraid03.png

Posted

Ok so I just started the array again without checking the 'Parity is valid' option and I'm getting errors straight away again. The rebuild speed is also showing as 4.6 GB/sec, which I assume is majorly wrong?

 

Waiting on the server to 'collect diagnostic information', which is taking a little while. Hopefully it can help though... should I stop the rebuild or let it run?

Posted

Thanks again, got some new power cables on order and will try again tomorrow. Hopefully it's not my SAS cables or PSU otherwise I'll have to wait a month or so :( :(

Posted (edited)

Hi all, so I've replaced my power cables and SAS cables but still having issues.

 

During a read-check it's still throwing out errors, and I've also got disk5 showing as unmountable. What should my next steps be? Should I replace the unmountable disk5 with a new disk and rebuild the array, or should I replace disk3, which is giving errors, with a new disk and rebuild the array?

 

Should I replace my HBA and/or power supply?

 

Also I noticed when I start the system up sometimes 2-3 disks don't get detected, could this be a power issue? - this happened with old and new SAS cables and if I remember correctly they're sometimes from different HBA ports.

 

Please help, going nuts here :(

unraid06.png

tower-diagnostics-20200526-1504.zip

Edited by jsN
Additional info
Posted

Don't mess with the disks/filesystem for now, not until the problem is fixed.

 

7 minutes ago, jsN said:

Should I replace my HBA and/or power supply?

That would be my next step, start with the one easier/cheaper to replace/swap, then run another read check.

 

Posted
3 minutes ago, Bandit_King said:

Use sata to 4x sata.

That's not a good idea. One SATA plug should be split in two at most: a SATA plug is only rated for 4.5A, and a single disk can use 2.5A during startup.
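
To put rough numbers on that, here is a minimal sketch of the arithmetic. It uses only the two figures quoted above (4.5A plug rating, 2.5A per disk at startup) and assumes every disk on the splitter spins up at the same moment, with no staggered spin-up. The function and constant names are made up for illustration.

```python
# Rough load check for a SATA power splitter.
# Figures are the ones quoted in the post; names are illustrative only.
SATA_PLUG_RATING_A = 4.5   # quoted rating of a single SATA power plug
STARTUP_CURRENT_A = 2.5    # quoted per-disk draw during startup


def splitter_startup_load_a(disks: int, per_disk_a: float = STARTUP_CURRENT_A) -> float:
    """Total current through the source plug if all disks start together (assumption)."""
    return disks * per_disk_a


load = splitter_startup_load_a(4)  # a SATA to 4 x SATA adapter
print(f"4 disks: {load:.1f} A against a {SATA_PLUG_RATING_A:.1f} A plug "
      f"({load / SATA_PLUG_RATING_A:.1f}x the rating)")
```

Four disks come to 10A, more than double the plug rating. By the same arithmetic two disks come to 5A, slightly over 4.5A, so the two-way limit is best read as a practical rule of thumb rather than something this calculation proves.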

Posted
9 hours ago, jsN said:

These are the ones I've purchased to replace the older ones https://www.amazon.co.uk/gp/product/B00B5VX2SA/ref=ppx_yo_dt_b_asin_title_o03_s00?ie=UTF8&psc=1

 

Should I be wary of these?

Those should probably be fine. They have well defined IDC connections for the wires, the type that typically catch fire are the molded ones where all you see of the individual connections is a big black blob of plastic, with who knows how much separation between terminals.

 

4 Pin molex are a much better design, each wire can be examined for issues and individually tweaked for solid connection if necessary. The only reason the traditional 4 pin molex gives trouble is if the terminal is loose fitting or a poor grade of metal.

Posted

Thanks jonathanm, had me worried a little there :)

 

Needing some advice for new power supply just in case my current one doesn't have enough juice when all disks are at full load and causing my errors. My current power supply is Corsair VS550 with specs attached in image.

 

Server Specs:

Intel DH61BE Motherboard

Intel i5-3470T

12GB DDR3

Dell H200 HBA

8 x HP MB3000FCWDH (3TB 6G SAS 7.2K 3.5in DP MDL SC HDD) - Future upgrade to 15x

2 x 120mm Fans

3 x 80mm Fans (inc CPU cooler)

 

Should the current spec power supply be enough for my needs or should I be looking for something with more amps on the 5/12v?

IMG_20200527_102326.jpg

IMG_20200527_110044.jpg

Posted
10 hours ago, jonathanm said:

Those should probably be fine. They have well defined IDC connections for the wires, the type that typically catch fire are the molded ones where all you see of the individual connections is a big black blob of plastic, with who knows how much separation between terminals.

 

Here is a YouTube video on the basic problem with SATA power connectors and how to tell a good connector from a bad one.  The Molex design is far superior to the SATA one.  In fact, the SATA connector design is a poster child on how-not-to-design a connector system! A little searching on YouTube will find many more videos on this same subject.

 

 https://www.youtube.com/watch?v=TataDaUNEFc

 

1 hour ago, jsN said:

Should the current spec power supply be enough for my needs or should I be looking for something with more amps on the 5/12v?

OK, the max inrush current for an HD that I have ever seen quoted is 3A. At 12V that is 36W per drive, so the eight drives you have would require 288W of power on spin-up. Your PS is rated at 550W TOTAL. That means you can't exceed that rating for as long as a millisecond or bad things will start to happen. (Look again at that spec power picture you posted and observe that the sum of the individual maximums exceeds the total rating. A classic example of specmanship!)
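
The spin-up figure can be reproduced with a quick back-of-envelope sketch. It assumes the 3A inrush is drawn from the +12V rail (which is what makes 288W come out for eight drives) and that all drives spin up at once. The 15-drive case is the future upgrade mentioned in the server specs. Names here are illustrative, not from any tool.

```python
# Back-of-envelope spin-up power estimate using figures from this thread.
RAIL_VOLTAGE_V = 12.0     # assumption: inrush current is drawn from the +12V rail
INRUSH_CURRENT_A = 3.0    # worst-case per-drive inrush quoted above
PSU_TOTAL_W = 550.0       # total rating of the Corsair VS550


def spinup_power_w(drives: int,
                   inrush_a: float = INRUSH_CURRENT_A,
                   rail_v: float = RAIL_VOLTAGE_V) -> float:
    """Power needed if every drive spins up at the same moment (assumption)."""
    return drives * inrush_a * rail_v


for drives in (8, 15):  # current drive count and the planned upgrade
    watts = spinup_power_w(drives)
    print(f"{drives} drives: {watts:.0f} W on spin-up, "
          f"{watts / PSU_TOTAL_W:.0%} of the PSU's total rating")
```

This only covers the drives. The CPU, motherboard, HBA and fans draw on top of it, and the +12V rail has its own limit below the 550W total, so the real headroom is smaller than these percentages suggest.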

 

Don't rule out a power supply problem--- particularly when a lot of unexplained things are going on.  Personally, I would be looking at the PS, the PS +12V cabling and the SATA data cabling.  Loose SATA data cables can be a real headache in many servers.  Using Locking SATA cables can have this problem:

 

https://support-en.wd.com/app/answers/detail/a_id/15954

Posted
15 hours ago, jonathanm said:

The only reason the traditional 4 pin molex gives trouble is if the terminal is loose fitting or a poor grade of metal.

These seem pretty solid on the connectors. I can't say the same for the PSU end, but once both are connected they seem very secure.

 

4 hours ago, Frank1940 said:

Don't rule out a power supply problem--- particularly when a lot of unexplained things are going on.  Personally, I would be looking at the PS, the PS +12V cabling and the SATA data cabling.  Loose SATA data cables can be a real headache in many servers.  Using Locking SATA cables can have this problem:

 

https://support-en.wd.com/app/answers/detail/a_id/15954

Yeah, this is why the PSU is my next step, as much as I didn't want it to be. The current PSU is second hand and I've run it almost nonstop for just over a year, and who knows how many hours it had on it previously.

 

Loose SATA/SAS cables I can rule out as I've tried 3 different brands. The first ones had the SATA power directly on the SFF8482 connector to the disks, and I thought the weight of my SATA power cables could be causing them to sag and not make a good connection, which I hoped was my problem. I'm currently using StarTech SFF8087 to 4x SFF8482 cables, which seem very secure and a lot better quality, but still no fix :(

Posted

Need further help

 

I've replaced the PSU and things look a lot better now from what I can tell, but I believe I may have lost disk1. Currently disk5 is classed as an 'Unmountable Disk', and I've swapped disk1, which wasn't being detected, for a new disk which is now being detected, so I guess my old disk1 has died.

 

I guess I may lose some data as I only have a single parity disk. What next steps do I need to take to recover my server with minimal data loss?

Screenshot_2020-05-29-20-01-58-059_com.android.chrome.jpg
