JPDom1 Posted April 4, 2020

Hi unRAIDers.

I am having major issues with unRAID thinking it is funny to report disks as "Failed" when they work fine in other machines.

It all started about 8 weeks ago, when a 4TB unassigned drive used for my Steam cache went missing: it just disappeared from the unRAID UI. I rebooted and it came back, but now unRAID was reporting that the 4TB disk was only 14GB in size. I removed it from unRAID and put it in my gaming machine, where it reported fine and Crystal Disk was happy, so I ran it through Disk Sentinel Pro and it passed. It is still running in my gaming machine and working fine.

Then a few days later unRAID reported that 2 disks had failed. In a panic I got 3 8TB WD Golds as replacements, put them in, took the others out, and it was OK. The 2 "dead" drives both passed all my tests in my other machine but kept failing preclear in unRAID. With the 8TB Ironwolf Pro it would not even detect the device name. Just a total mess.

Now today I log in and guess what????? 2 more FU**** disks have failed. I mean, really? I am leaving it as is for now, as there must be an underlying issue here that I cannot find.

The only other thing to note is that my UPS keeps reporting 0 minutes and "Battery low" in unRAID when it is fully charged. Not sure if that is a problem?

What I have tried so far: I put the hard drives directly on the motherboard SATA controllers (same result), swapped out the SATA cables, and removed all of my Molex to SATA splitters and got proper SATA extenders by Cable Matters.

Diagnostics attached and specs below.

Motherboard: X399 AORUS PRO
CPU: 1950X
RAM: 4x 8GB Corsair Hyper x 3000MHz
PSU: Corsair 650 Watt
Storage cards: 2x Dell H310

mufasa-diagnostics-20200404-1842.zip
Normand_Nadon Posted April 4, 2020

Does your motherboard have a separate SATA controller you could migrate your connection to? This looks a lot like a failing or badly behaving SATA controller.

You could also confirm whether it is a controller issue by booting a Linux live disk and checking the state of the drives. (I suggest Pop!_OS 18.04, as it runs great and has all the tools you need pre-installed.) DON'T INSTALL Linux, just choose "try without installing". This way you can browse the contents of your drives without modifying them.
Normand_Nadon Posted April 4, 2020

Oops. For some reason I did not see the end of your post on my cell phone and only saw it now. Forget the SATA bit, but please try the Linux diagnostic to see if it detects the drives correctly on your hardware.
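One way to run that check from a live session is to compare the size the kernel reports for each disk with the capacity printed on its label, which would catch a 4TB disk showing up as 14GB. The sketch below is only an illustration and was not posted in the thread: it assumes `lsblk` from util-linux is available on the live system, and the function names, the 5% tolerance, the device names and the sample data are all hypothetical.

```python
import json
import subprocess


def parse_lsblk(json_text):
    """Return {device_name: size_in_bytes} from `lsblk --json --bytes` output."""
    devices = json.loads(json_text)["blockdevices"]
    # Older util-linux releases print sizes as strings, newer ones as numbers;
    # int() accepts both.
    return {dev["name"]: int(dev["size"]) for dev in devices}


def find_size_mismatches(sizes, expected_tb, tolerance=0.05):
    """List (device, reason) pairs for disks that are missing or mis-sized.

    expected_tb maps a device name to its label capacity in decimal terabytes.
    The 5% default tolerance is an arbitrary assumption.
    """
    problems = []
    for name, label_tb in expected_tb.items():
        expected = label_tb * 1000**4
        actual = sizes.get(name)
        if actual is None:
            problems.append((name, "not detected"))
        elif abs(actual - expected) / expected > tolerance:
            reason = f"reports {actual / 1000**3:.0f} GB, expected about {label_tb} TB"
            problems.append((name, reason))
    return problems


def read_sizes():
    """Ask lsblk for whole-disk sizes in bytes (no partitions)."""
    result = subprocess.run(
        ["lsblk", "--json", "--bytes", "--nodeps", "--output", "NAME,SIZE"],
        capture_output=True, text=True, check=True,
    )
    return parse_lsblk(result.stdout)


# Made-up sample in the shape lsblk emits, so the logic can be tried offline.
SAMPLE = (
    '{"blockdevices": ['
    '{"name": "sda", "size": "14000000000"}, '
    '{"name": "sdb", "size": 8001563222016}]}'
)
```

On the real hardware you would call `find_size_mismatches(read_sizes(), {...})` with your own device names and label capacities. A disk that is mis-sized or missing here as well as in unRAID points at the controller, cabling or power rather than at unRAID itself.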
Conson Droppa Posted April 4, 2020 (edited)

Replied to the Reddit post. I would look hard at the PSU. Sometimes it seems like a wild idea, but I've lost hair because of it. I had many drives red-ball only to work fine in an HDD enclosure. New PSU and no more errors.

Edited April 4, 2020 by rmeaux
JPDom1 Posted April 4, 2020 (Author)

Thanks guys. I have a Corsair G650M in my unRAID box and an old Cooler Master Silent Pro Modular 2 1000W from 2012 running in my gaming rig. I am going to swap these around tomorrow and see how it goes.

I am also going to bypass the UPS for now, as it keeps reporting 0 minutes run time even though the battery is fully charged (no replace battery warning). It is an APC Back-UPS Pro 1500.
JPDom1 Posted April 5, 2020 (Author)

OK, so I have swapped the PSUs between my unRAID machine and my gaming machine. I do not have a Corsair 650 Watt as per the original post: it is a Cooler Master 650 Watt, the same make as the 1000 Watt Cooler Master in my gaming rig (photo below of the 2 PSUs).

I put the 1000W in the unRAID machine and also removed my SATA extenders, as I have 16 SATA connectors on my 1000W PSU. I was also able to remove my Molex splitter, as the 1000W PSU's Molex cable can reach now that I have put the PSU in the top chamber of the case (Antec 1900), where the other one was in the bottom.

I turned the unRAID box back on and let it do the preclear on the 2 "dead" drives. I have also plugged all drives into my Dell H310s now, so none of them are on the motherboard at present. Photos below.

But as it is a lovely 20°C in isolation today we decided to have a BBQ (last photo), so the gaming PC is still in bits and I will put it together tonight or tomorrow.

For now the unRAID machine is showing 0 errors, so I will wait and see how it goes over the next week, as I need to finish the preclear, then add the drives back into the array and do a parity sync. Fingers crossed.
Squid Posted April 5, 2020

If you're not using those other 2 x16 slots, you should move one of the HBAs to them. They get hot under full load.
JPDom1 Posted April 5, 2020 (Author)

9 minutes ago, Squid said: "If you're not using those other 2 x16 slots, you should move one of the HBAs to them. They get hot under full load."

Thanks @Squid, I was thinking of doing that. I had a P2000 in there and a 10Gb NIC, but took both out as they were not being utilized, and I was just going to get a 1060 for transcoding if I can get this issue sorted.

Just an observation: I have ordered 4 new 120mm fans for the machine (3x 120mm front and 1 rear), as the drives were constantly getting hot (over 50°C). Today is the hottest day in the UK so far this year and all the drives are 10°C cooler than they have been with the old PSU. Am I just imagining things here, or could the other PSU have been starving the fans of power to spin properly, or even worse, causing the drives to overheat?
AgentXXL Posted April 5, 2020

4 minutes ago, JPDom1 said: "Today is the hottest day in the UK so far this year and all the drives are 10c cooler than they have been with the old PSU. Am I just imaging things here or could the other PSU have been starving the fans of power to spin properly or even worse causing the drives to overheat?"

I suspect it's more likely that the old splitters and cabling were restricting some of the airflow, but moving the power supply to the new position may help if it was generating a lot of heat. And a 650W supply might do that if it's not particularly efficient.

Unrelated to the power supply, but I also experienced a lot of similar issues when I first moved to unRAID last year. I had numerous drives failing in unRAID but testing fine afterwards, which eventually led me to the SAS/SATA backplanes in my old Norco RPC-4220 case. Once I removed the backplanes and direct-cabled to each drive, my failures went away. I see that you have replaced the SATA cables and now have less (or no) need for any power splitters, so that's a good start.

Out of curiosity, do you know which controller your motherboard SATA ports use? If it is Marvell, some of them are known to have issues with BSD and Linux. Moving to the Dell HBAs is another good troubleshooting method, but just ensure they're running in IT mode (likely they are if you've got unRAID up and running). Hopefully running with the drives connected to the HBAs resolves your issue.
Normand_Nadon Posted April 6, 2020

19 hours ago, AgentXXL said: "I suspect it's more likely that the old splitters and cabling were restricting some of the airflow"

It has been proven, time and time again, that this has no significant impact on temperatures (unless you make a tightly sealed wall of wires!). (I don't mean to offend here.)

An underpowered PSU, or an overloaded PSU rail, will result in overheating in a few possible ways:

1- The PSU will go beyond its efficiency curve and put out more heat.
2- The wire itself will heat up if overloaded (that is less likely, because something would shut down before you get there!).
3- If the PSU provides less than the required 5V or 12V due to overloading, components could still work, but will pull more current as the voltage drops (and it is a vicious circle!). More current = more heat.

People sometimes find it stupid to get a high quality, high power PSU, but it makes a ton of sense in the end. The important thing is to look at the efficiency curve of said PSU. If your usage is in the sweet spot of efficiency, the expense will pay for itself over time, especially on a 24/7 setup! You can look at the attached image for a general sense of what efficiency ratings mean for computer PSUs.

By the way, I stopped using Cooler Master branded PSUs because they are, to me, inferior products. I had many issues with this brand. Their cooling solutions are excellent though. For good PSUs, I prefer Seasonic, and I've had a good (single!) experience with EVGA too.
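The arithmetic behind those three points can be sketched in a few lines of Python. This is an illustration only, not a model of either PSU in the thread: the function names are hypothetical, every number in the usage note is made up, and point 3 is modelled by assuming the load draws constant power (as a device behind its own voltage regulator roughly does), which will not hold for every component.

```python
def waste_heat_watts(dc_load_w, efficiency):
    """Heat dissipated inside the PSU while delivering dc_load_w to the system.

    efficiency is the fraction of wall power that reaches the DC side (0 to 1).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    wall_power_w = dc_load_w / efficiency
    return wall_power_w - dc_load_w


def wire_heat_watts(current_a, resistance_ohm):
    """Resistive (I squared R) heating in a cable or connector."""
    return current_a ** 2 * resistance_ohm


def constant_power_current_amps(load_w, rail_voltage):
    """Current drawn by a load that needs a fixed power (an assumption)."""
    return load_w / rail_voltage
```

For example, a hypothetical 400 W DC load turns about 100 W into heat inside the PSU at 80% efficiency but only about 44 W at 90%. A 60 W constant-power load draws 5 A from a healthy 12 V rail and about 5.26 A if the rail sags to 11.4 V, and because wire heating scales with the square of the current, that extra current is felt in the cabling and connectors too.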
Normand_Nadon Posted April 6, 2020

By the way, I am amazed that computers still use those cheap Molex adapters after all these years. You pay a lot of money for a high quality PSU only to have it end in one of the cheapest connectors there is! These connectors have poor contacts for their size. But hey, they are the industry standard and will probably not be replaced in the near future! Maybe in 40 years!
JPDom1 Posted April 7, 2020 (Author)

Both my 6TB and 8TB "dead" drives have now precleared and I am about to put them back into the array for a parity sync.

I also purchased 2x 40mm 12V 2-pin fans to put on the H310 cards, but as usual they come from China from someone pretending to be in the UK, so I will have to wait 4-6 weeks for them to arrive 😡😡.

Thank you for the comments @Normand_Nadon. I am baffled by the temperature reduction; it amazes me that a SATA splitter and a Molex splitter can cause this much heat. The 650 Watt PSU is in my gaming PC now, and although it is only running a 1080 Hybrid, a Ryzen 5 2600, 3 drives and so on, it is performing fine.

I will let the parity sync complete and get the coolers on the H310s. I hope this means the issue is resolved, but only time will tell really.
Vr2Io Posted April 7, 2020

On 4/6/2020 at 12:19 AM, JPDom1 said: "Today is the hottest day in the UK so far this year and all the drives are 10c cooler"

The disk temp would be the same in summer time.
AgentXXL Posted April 7, 2020

On 4/6/2020 at 6:59 AM, Normand_Nadon said: "This was proven, times and times again that it has no significant impact on temperatures (unless you would make a tight-sealed wall of wires!) (don't mean to offend here )"

No offense taken. In my 40+ years working with computers I've encountered airflow issues leading to temperature increases a few times. They were corrected by re-routing and shortening or lengthening cables as required. While it is more likely that the power supply was producing the extra heat due to load and efficiency, I never underestimate the possibility that airflow plays a part in improper heating/cooling.
AgentXXL Posted April 7, 2020

5 hours ago, Benson said: "The disk temp would be same in summer time."

Not necessarily. If the room temperature of the space where your computer/servers reside is high, it makes it harder for the system to keep internal components cool. In cases with a proper push-pull airflow scenario, pulling the warmer room air into the case provides reduced cooling. I've even installed air-conditioning in my computer/server room for this reason.
Normand_Nadon Posted April 8, 2020

6 hours ago, AgentXXL said: "Not necessarily. If the room temperature of the space where your computer/servers reside is high, it makes it harder for the system to keep internal components cool. In cases with a proper push-pull airflow scenario, pulling the warmer room air into the case provides reduced cooling. I've even installed air-conditioning in my computer/server room for this reason."

The "rule of thumb" in heat exchange is that you need around 12 degrees Celsius of differential to start getting appreciable heat transfer with air. It varies with humidity levels and some other things, but that is a good starting point.

One thing to remember in cooling (and in physics): cold is not a physical thing! Cold is a relation, an interpretation. Heat is a thing, and you can have more or less of it. You do not displace cold; you move heat from one place to another (you dissipate the energy into a medium containing less of it). Knowing this changes your perspective when designing a cooling solution, in my opinion!
JPDom1 Posted April 8, 2020 (Author)

Parity check complete and all disks appear happy now. It is 22°C in the UK today and my server is running great. Last week with the old PSU, when it was 9°C outside, I had 3-5 disks at 50-55°C or more.

I retract my comment on the Chinese eBay delivery. My 40mm fans arrived today, but now I need to get screws for them to go into the heatsink. Might just cable tie them or something. Suggestions welcome.

I got these from evilbay: https://www.ebay.co.uk/itm/Small-PC-fan-cooling-heat-sink-computer-case-40mm-12V-2-pin/172498532899?ssPageName=STRK%3AMEBIDX%3AIT&_trksid=p2057872.m2749.l2649
JPDom1 Posted April 20, 2020 (Author)

unRAID strikes again! I had great uptime and performance for 10 days. Today I needed to switch the server off as a precaution while electrical work was being done in the house, as the UPS is still broken. Within 2 minutes of powering back on, a "failed" drive again! Diag attached.

The array did not want to stop, so I had to do a reboot from the GUI. The disk reported 2056 errors as per the screenshot, and it is now out of the array doing a preclear. I know the disk is fine; unRAID is just a fanny! I was hoping that switching to a 1000W PSU and removing all splitters would solve my issue. Had enough of this!

mufasa-diagnostics-20200420-1352.zip
Normand_Nadon Posted April 20, 2020

Just a thought... Maybe your parity disk is going bad too? It would report data errors when checking parity, I guess... No?