unRAID eating HDDs like cookies



Hi unRAIDers.

 

I am having major issues with unRAID, which seems to think it is funny to report disks as "Failed" when they work fine in other machines.

 

It all started about 8 weeks ago, when a 4TB unassigned drive used for my Steam cache went missing: it just disappeared from the unRAID UI.

I rebooted and it came back, but now the 4TB disk was reported as only 14GB in size. I removed it from unRAID and put it in my gaming machine, where it reported fine and Crystal Disk was happy, so I ran it through Disk Sentinel Pro and it passed. It is still running in my gaming machine and working fine.

 

Then a few days later it reported that 2 disks had failed. In a panic, I got 3 8TB WD Golds as replacements, put them in, took the others out, and it was OK. The 2 "dead" drives both passed all my tests in my other machine but kept failing preclear in unRAID. With the 8TB IronWolf Pro it would not even detect the device name. Just a total mess.

 

Now today I log in and guess what? 2 more FU**** disks have failed. I mean, really?

 

Currently I am just leaving it as is, as there must be an underlying issue here that I cannot find.

 

The only other things to note: my UPS also keeps reporting 0 minutes and "Battery low" in unRAID when it is fully charged, for some reason. Not sure if that is a problem?

I have connected the hard drives directly to the motherboard SATA controllers, with the same result.

Swapped out the SATA cables.

Removed all of my Molex-to-SATA splitters and got proper SATA extenders by Cable Matters.

 

Diagnostics attached and specs below

 

Motherboard: X399 AORUS PRO

CPU: 1950X

RAM: 4x 8GB Corsair Hyper x 3000MHz

PSU: Corsair 650W

Storage cards: 2x Dell H310

mufasa-diagnostics-20200404-1842.zip

Link to post

Does your motherboard have a separate SATA controller you could migrate your connections to? This looks a lot like a failing or badly behaving SATA controller...

 

You could also confirm whether it is a controller issue by booting a Linux live disk and checking the state of the drives. (I suggest Pop!_OS 18.04, as it runs great and has all the tools you need pre-installed.) DON'T INSTALL Linux, just choose "try without installing". This way you can browse the contents of your drives without modifying them.
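For anyone following along, the live-disk check can be scripted. This is a minimal sketch, assuming smartmontools is present on the live system (an assumption; the post only says the tools are pre-installed) and that the drives show up as ATA devices, for which `smartctl -H` prints an overall-health line. The function names are hypothetical.

```python
import re
import subprocess
import sys

# Line printed by `smartctl -H` for ATA drives.
HEALTH_RE = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)


def parse_health(report: str):
    """Return the overall-health verdict from smartctl text, or None if absent."""
    match = HEALTH_RE.search(report)
    return match.group(1) if match else None


def read_health(device: str):
    """Run `smartctl -H` on a device (needs root and smartmontools installed)."""
    result = subprocess.run(
        ["smartctl", "-H", device],
        capture_output=True,
        text=True,
        check=False,  # smartctl uses non-zero exit codes to signal drive problems
    )
    return parse_health(result.stdout)


if __name__ == "__main__":
    # Usage: sudo python3 check_health.py /dev/sda /dev/sdb
    for dev in sys.argv[1:]:
        print(dev, read_health(dev))
```

A drive that reports healthy here but keeps dropping out under unRAID points away from the disk itself and towards the controller, cabling or power.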
 

Link to post

Replied to your reddit post. I would look hard at the PSU. Sometimes it seems like a wild idea, but I’ve lost hair because of it.

I had many drives red-ball, only to work fine in an HDD enclosure. New PSU and no more errors.

Edited by rmeaux
Link to post

Thanks guys. I have a Corsair G650M in my unRAID box and an old Cooler Master Silent Pro Modular 2 1000W from 2012 running in my gaming rig. I am going to swap these around tomorrow and see how it goes.

I am also going to bypass the UPS for now, as it keeps reporting 0 minutes of run time even though the battery is fully charged (no replace-battery warning). It is an APC Back-UPS Pro 1500.
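If the UPS is monitored through apcupsd (an assumption; the thread does not say which daemon is in use), its `apcaccess` tool prints `KEY : value` status lines that include `BCHARGE` and `TIMELEFT`. Here is a minimal sketch for flagging the mismatch described above. The sample text is made up for illustration and is not taken from the attached diagnostics.

```python
def parse_apcaccess(text: str) -> dict:
    """Turn `KEY : value` status lines into a dict of stripped strings."""
    status = {}
    for line in text.splitlines():
        # Split on the first colon only; values such as dates contain colons.
        key, sep, value = line.partition(":")
        if sep:
            status[key.strip()] = value.strip()
    return status


def leading_number(value: str) -> float:
    """Extract the number from a value such as '100.0 Percent'."""
    return float(value.split()[0])


def runtime_looks_wrong(status: dict, min_charge: float = 90.0) -> bool:
    """True when the battery reports (nearly) full but the runtime estimate is zero."""
    charge = leading_number(status["BCHARGE"])
    minutes = leading_number(status["TIMELEFT"])
    return charge >= min_charge and minutes <= 0.0


# Hypothetical sample, shaped like apcaccess output.
SAMPLE = """\
STATUS   : ONLINE
BCHARGE  : 100.0 Percent
TIMELEFT : 0.0 Minutes
"""

if __name__ == "__main__":
    print(runtime_looks_wrong(parse_apcaccess(SAMPLE)))
```

This only flags the inconsistency; it does not say whether the battery, the calibration or the reporting is at fault.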

Link to post

OK, so I have swapped the PSUs between my unRAID machine and my gaming machine. I do not have a Corsair 650W as per the original post; it is a Cooler Master 650W, like the 1000W Cooler Master in my gaming rig (photo of the 2 PSUs below).

I put the 1000W in the unRAID machine and also removed my SATA extenders, as the 1000W PSU has 16 SATA connectors. I was also able to remove my Molex splitter, since the 1000W PSU's Molex cable can reach now that the PSU is in the top chamber of the case; the old one was in the bottom (Antec 1900).

I turned the unRAID box back on and let it run preclear on the 2 "dead" drives. I have also plugged all drives into my Dell H310s now, so none of them are on the motherboard at present. Photos below.

But as it is a lovely 20°C in isolation today, we decided to have a BBQ (last photo), so the gaming PC is still in bits; I will put it together tonight or tomorrow.

For now the unRAID machine is showing 0 errors, so I will wait and see how it goes over the next week, as I need to finish the preclear, add the drives back into the array, and do a parity sync.

 

Fingers crossed.

20200405_111258.jpg

20200405_114731.jpg

20200405_125041.jpg

20200405_114431.jpg

20200405_135946.jpg

Link to post
9 minutes ago, Squid said:

If you're not using those other 2 x16 slots, you should move one of the HBAs to them.  They get hot under full load.

Thanks @Squid, I was thinking of doing that. I had a P2000 and a 10Gb NIC in there but took both out as they were not being utilized; I was just going to get a 1060 for transcoding if I can get this issue sorted.

 

Just an observation: I have ordered 4 new 120mm fans for the machine (3x 120mm front and 1 rear), as the drives were constantly getting hot (over 50°C).

Today is the hottest day in the UK so far this year, and all the drives are 10°C cooler than they were with the old PSU.

Am I just imagining things here, or could the other PSU have been starving the fans of power so they could not spin properly, or even worse, causing the drives to overheat?

image.thumb.png.2b089d875063784f7edfd4b878c55d1b.png

Link to post
4 minutes ago, JPDom1 said:

 

Today is the hottest day in the UK so far this year, and all the drives are 10°C cooler than they were with the old PSU.

Am I just imagining things here, or could the other PSU have been starving the fans of power so they could not spin properly, or even worse, causing the drives to overheat?

 

 

I suspect it's more likely that the old splitters and cabling were restricting some of the airflow, but moving the power supply to the new position may help if it was generating a lot of heat. And a 650W supply might do that if it's not particularly efficient.

 

Unrelated to the power supply, but I also experienced a lot of similar issues when I first moved to unRAID last year. I had numerous drives failing in unRAID but testing fine afterwards, which eventually led me to the SAS/SATA backplanes in my old Norco RPC-4220 case. Once I removed the backplanes and direct-cabled each drive, my failures went away.

 

I see that you have replaced the SATA cables and now have less (or no) need for any power splitters, so that's a good start. Out of curiosity, do you know which controller your motherboard SATA ports use? If it is a Marvell, some of them are known to have issues with BSD and Linux. Moving to the Dell HBAs is another good troubleshooting step, but just ensure they're running in IT mode (likely they are if you've got unRAID up and running). Hopefully running with the drives connected to the HBAs resolves your issue.

 

 

Link to post
19 hours ago, AgentXXL said:

 

I suspect it's more likely that the old splitters and cabling were restricting some of the airflow

 

It has been shown time and time again that cabling has no significant impact on temperatures (unless you build a tightly sealed wall of wires!)

(don't mean to offend here :) )

 

An underpowered PSU, or an overloaded PSU rail, can result in overheating in a few possible ways:

1- The PSU is pushed past the efficient part of its curve and puts out more heat

2- The wire itself will heat up if overloaded (that is less likely, because something would shut down before you get there!)

3- If the PSU provides less than the required 5V or 12V due to overloading, components could still work, but will pull more current as the voltage drops (and it is a vicious circle!). More current = more heat.
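Points 1 and 3 can be put into rough numbers. This is a minimal sketch; the load, efficiency and voltage figures are made-up illustrations, not measurements from this server, and the "more current at lower voltage" behaviour assumes a constant-power load (for example, electronics behind a switching regulator).

```python
def waste_heat_watts(dc_load_w: float, efficiency: float) -> float:
    """Heat dissipated inside the PSU for a given DC load and efficiency (0-1]."""
    if not 0.0 < efficiency <= 1.0:
        raise ValueError("efficiency must be in (0, 1]")
    wall_power_w = dc_load_w / efficiency
    return wall_power_w - dc_load_w


def current_amps(power_w: float, volts: float) -> float:
    """Current drawn by a constant-power load at a given rail voltage (I = P / V)."""
    if volts <= 0.0:
        raise ValueError("volts must be positive")
    return power_w / volts


if __name__ == "__main__":
    # Hypothetical 400 W DC load at two efficiency levels.
    for eff in (0.80, 0.90):
        print(f"efficiency {eff:.0%}: {waste_heat_watts(400.0, eff):.1f} W of heat in the PSU")

    # Hypothetical 10 W constant-power load on a sagging 12 V rail.
    for volts in (12.0, 11.4):
        print(f"{volts} V rail: {current_amps(10.0, volts):.3f} A")
```

The same load costs noticeably more heat at the lower efficiency, and a sagging rail pushes current up rather than down for this kind of load.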

 

People sometimes find it stupid to get a high quality, high power PSU, but it makes a ton of sense in the end... The important thing is to look at the efficiency curve of said PSU... If your usage is in the sweet spot of efficiency, the expense will pay for itself over time, especially on a 24/7 setup! You can look at the attached image for a general sense of what efficiency ratings mean for computer PSUs.
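To see how efficiency can pay for itself on a 24/7 box, here is a rough sketch. The 300 W load, the two efficiency figures and the electricity price are all hypothetical placeholders; plug in your own.

```python
HOURS_PER_YEAR = 24 * 365


def annual_wall_kwh(dc_load_w: float, efficiency: float) -> float:
    """Energy drawn from the wall in a year for a constant DC load."""
    return dc_load_w / efficiency * HOURS_PER_YEAR / 1000.0


def annual_saving(dc_load_w: float, eff_low: float, eff_high: float,
                  price_per_kwh: float) -> float:
    """Yearly cost difference between a less and a more efficient PSU."""
    delta_kwh = annual_wall_kwh(dc_load_w, eff_low) - annual_wall_kwh(dc_load_w, eff_high)
    return delta_kwh * price_per_kwh


if __name__ == "__main__":
    # All inputs are placeholders: 300 W load, 80% vs 90% efficient, 0.15 per kWh.
    saving = annual_saving(300.0, 0.80, 0.90, 0.15)
    print(f"Estimated saving per year: {saving:.2f}")
```

Real PSUs do not sit at one efficiency figure, so read the value off the curve at your typical load rather than using the headline rating.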

By the way, I stopped using Cooler Master branded PSUs because they are, to me, inferior products... I had many issues with this brand. Their cooling solutions are excellent though. For good PSUs, I prefer Seasonic, and I've had a good (single!) experience with EVGA too.

 

_id1463390132_343178_2.png

Link to post

By the way, I am amazed that computers still use those cheap Molex connectors after all these years... You pay a lot of money for a high-quality PSU only to have it end in one of the cheapest connectors there is! These connectors have poor contacts for their size... But hey, they are the industry standard and will probably not be replaced in the near future! Maybe in 40 years! :D

Link to post

Both my 6 & 8TB "dead" drives have now precleared, and I am about to put them back into the array for a parity sync.

image.png.3a779febc3a7f8e3e3cb6610968edc15.png

 

Also purchased 2x 40mm 12V 2-pin fans to put on the H310 cards, but as usual they are coming from China from someone pretending to be in the UK, so I will have to wait 4-6 weeks for them to arrive 😡😡.

Thank you for the comments @Normand_Nadon. I am baffled by the temperature reduction; it amazes me that a SATA splitter and a Molex splitter could cause this much heat. The 650W PSU is in my gaming PC now, and although it is only running a 1080 Hybrid, a Ryzen 5 2600, 3 drives and so on, it is performing fine.

I will let the parity sync complete and get the coolers on the H310s. I hope this means the issue is resolved, but only time will tell...

 

Link to post
On 4/6/2020 at 6:59 AM, Normand_Nadon said:

It has been shown time and time again that cabling has no significant impact on temperatures (unless you build a tightly sealed wall of wires!)

(don't mean to offend here :) )

 

No offense taken. In my 40+ years working with computers I've encountered airflow issues leading to temp increases a few times. They were corrected by re-routing and shortening or lengthening cables as required. While it is more likely that the power supply was producing the extra heat due to load and efficiency, I never underestimate the possibility that airflow plays a part in improper heating/cooling.

 

Link to post
5 hours ago, Benson said:

The disk temp would be same in summer time.

 

Not necessarily. If the room temperature of the space where your computer/servers reside is high, it makes it harder for the system to keep internal components cool. In cases with a proper push-pull airflow scenario, pulling the warmer room air into the case provides reduced cooling.

 

I've even installed air-conditioning in my computer/server room for this reason.

Link to post
6 hours ago, AgentXXL said:

 

Not necessarily. If the room temperature of the space where your computer/servers reside is high, it makes it harder for the system to keep internal components cool. In cases with a proper push-pull airflow scenario, pulling the warmer room air into the case provides reduced cooling.

 

I've even installed air-conditioning in my computer/server room for this reason.

The rule of thumb in heat exchange is that you need around 12 degrees Celsius of differential to start getting appreciable heat transfer with air. It varies with humidity levels and some other things, but that is a good starting point.
One thing to remember in cooling (and in physics): cold is not a physical thing! Cold is a relation, an interpretation...
Heat is a thing, and you can have more or less of it. You do not displace cold... you move heat from one place to another (you dissipate the energy into a medium containing less of it)... Knowing this changes your perspective when designing a cooling solution, in my opinion! :)
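That "moving heat" view can be sketched with a simple steady-state model: component temperature is the intake air temperature plus the dissipated power times a thermal resistance. The resistance and power values below are hypothetical placeholders, not measured figures for any drive or case.

```python
def steady_state_temp_c(ambient_c: float, power_w: float,
                        r_th_c_per_w: float) -> float:
    """Steady-state component temperature: T = T_ambient + P * R_th."""
    return ambient_c + power_w * r_th_c_per_w


if __name__ == "__main__":
    POWER_W = 8.0  # hypothetical drive power draw
    R_TH = 2.5     # hypothetical drive-to-air thermal resistance in degC per watt
    for ambient in (9.0, 22.0, 32.0):
        temp = steady_state_temp_c(ambient, POWER_W, R_TH)
        print(f"intake {ambient:.0f} C -> drive about {temp:.0f} C")
```

In this model, with airflow unchanged, every extra degree of intake air shows up one-for-one at the drive. Better airflow lowers the thermal resistance; a cooler-running PSU or room lowers the intake temperature.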

Link to post

Parity check complete and all disks appear happy now.

It is 22°C in the UK today and my server is running great.

Last week with the old PSU, when it was 9°C outside, I had 3-5 disks at 50-55°C or higher.

 

image.thumb.png.144856ee10156cde930101845eae9480.png

 

I retract my comment on the Chinese eBay delivery. My 40mm fans arrived today, but now I need to get screws to fix them to the heatsink. Might just cable-tie them or something. Suggestions welcome. I got these from evilbay: https://www.ebay.co.uk/itm/Small-PC-fan-cooling-heat-sink-computer-case-40mm-12V-2-pin/172498532899?ssPageName=STRK%3AMEBIDX%3AIT&_trksid=p2057872.m2749.l2649

 

Link to post
  • 2 weeks later...

unRAID strikes again!

 

Had great uptime and performance for 10 days.

Today I needed to switch the server off as a precaution while electrical work was being done in the house, as the UPS is still broken.

Within 2 minutes of powering on, a "failed" drive again!

image.thumb.png.572af5af36f8e63b38865cd2d27779f4.png

 

Diag attached.

 

The array did not want to stop, so I had to reboot from the GUI.

The disk reported 2056 errors, as per the screenshot.

It is now out of the array doing a preclear. I know the disk is fine; unRAID is just a fanny!

I was hoping that switching to a 1000W PSU and removing all splitters would solve my issue.

 

Had enough of this!

mufasa-diagnostics-20200420-1352.zip

Link to post
