Mogo posted June 5, 2020:
I've got a 24-bay Norco 4224 case. I tried attaching a small fan to the LSI card, but it still overheats, which causes the system to keep rebooting. Unfortunately, the system is not stable for more than an hour or two. Any suggestions on how to fix this? Are there any other cards that run cooler? I only have the following slots remaining on the motherboard:
- PCIe 3.0 x1
- PCIe 3.0 x4
- PCIe 3.0 x8 (if I remove the LSI)
Twelve bays are currently populated.
JorgeB posted June 5, 2020:
10 minutes ago, Mogo said: "I tried attaching a small fan to the LSI card however it still overheats."
Are you sure that's the problem? I have some LSIs without direct cooling, just general case airflow, and they run without any issues. In any case, an overheating HBA should never make your server reboot.
Mogo posted June 5, 2020:
In the BMC panel, these events always show up when there's a reboot:
Unknown BIOS POST Progress Error - Asserted
Unknown Microcontroller / Coprocessor Transition to Running - Asserted
If I turn on the system with the case cover on, it continually reboots, either before Unraid starts or shortly after it loads. I previously had a SUPERMICRO AOC-SAS2LP-MV8 installed, and there were no issues with it. I can't go back and test with that card because the array is now at 12 drives, and that card only handles 8.
JorgeB posted June 5, 2020:
Maybe it's a problem with the LSI itself. I find it odd that it would overheat so fast.
Michael_P posted June 5, 2020:
Are you sure it's a legitimate card?
Mogo posted June 5, 2020:
21 minutes ago, Michael_P said: "Are you sure it's a legitimate card?"
When I bought it a while back, it was from Newegg, shipped and sold by them, so I assume it's authentic.
Mogo posted June 5, 2020:
Would Unraid detect the drives connected to two different SAS cards, such as a SAS 9207-8i and a SUPERMICRO AOC-SAS2LP-MV8?
Vr2Io posted June 6, 2020 (edited June 6, 2020 by Benson):
6 hours ago, Mogo said: "If I turn on the system and have the case cover on, It will continually reboot either before starting unraid or shortly after unraid is loaded."
It would help to get some case air temperature readings for troubleshooting. Does the case have a middle fan (next to the disk backplane)? If you have a fan
3 hours ago, Mogo said: "Would unraid detect the drives connected to two different sas cards? Such as a SAS 9207-8i and a SUPERMICRO AOC-SAS2LP-MV8?"
Yes.
Mogo posted June 6, 2020 (edited June 6, 2020):
14 hours ago, Benson said: "Better got some case air temperature figure for troubleshooting, does case have middle FAN? (next to disk backplane) If you have FAN"
Yes, the case came with a fan wall: there are 3x 120 mm fans on the side adjacent to the hard drives. I even tried reversing the fans toward the other side, the motherboard section, and still got the same result. I replaced the stock fans with Noctua fans a few years ago.
Vr2Io posted June 6, 2020 (edited June 6, 2020 by Benson):
I had a similar problem several months ago. The system didn't reboot, but one of my LSI HBAs died when I fully populated all the disks. The problem was that most of the intake air on the disk side was blocked, so hot air couldn't get out effectively, and the case temperature (not a Norco 4224) was around 50 °C. (I hadn't installed a rear fan or a fan for the HBA, and I had also swapped the high-speed stock fans for silent ones.) After some experimenting, I found a solution: increase the cool air intake (I reopened the air gap between the backplane and the fan wall, which I had sealed before) and reverse the fans. As a result, the disks got hotter, but the temperature of the other parts dropped a lot. So I would suggest a test: don't reverse the fans, keep the small fan on the HBA, and don't fully cover the case; cover it only up to the fan wall and see the difference. (I notice you haven't populated all the disks, which also makes me wonder why you are getting overheating at all.)
Mogo posted June 6, 2020:
Thanks, I will give that a try. I see what you are trying to do: create a semi wind tunnel and then let the hot air escape. Honestly, I don't expect it to work in my situation (everything I try seems to fail), but I'll try anything at this point.
Michael_P posted June 6, 2020:
FWIW, I'm using the Norco 4224 as well, also with all Noctua fans, with 2 LSI HBAs, a 10Gb network card, an expander, 19 spinners, and 2 SSDs, and mine doesn't overheat. And my cable management is atrocious, lol. It's not even in a cool area; the ambient temperature is in the low to mid 80s °F.
Mogo posted June 7, 2020 (edited June 7, 2020):
25 minutes ago, Michael_P said: "FWIW- I am using the Norco 4224 as well, with all Noctua fans too, with 2 LSI HBAs, a 10Gb network card and an expander and 19 spinners and 2 SSDs and mine doesn't overheat."
What model LSI are you using, the 93xx or the 92xx? In my research since having this problem, some forums recommended going with the 92xx, since they apparently run a great deal cooler than the 93xx.
Michael_P posted June 7, 2020:
9207-8i. Active cooling should be more than enough, though. If it were just heat, I'd think slapping fans on the heatsink would have solved it, no problem. You might think about replacing the thermal compound.
Mogo posted June 7, 2020:
Well, in the meantime, since I could not get the server to run, I ordered what I believe is an LSI 9207-8i. When I get the card, I will try running it alongside the Supermicro card I have and see if everything runs smoothly. If it does, I can play around with the 9305 card and do what you suggested, or perhaps ask LSI for a replacement if that's possible.
Mogo posted June 8, 2020:
So I tested the LSI 9207-8i and had the same issue. I did some further tests; if anyone can think of something, let me know. Running memtest from the Unraid boot screen lets me reproduce the issue after a while. I tried the scenarios below. Each one that passed completed one memtest pass, which generally took 30 minutes to 2.5 hours (the longer runs were when I forgot it was still running). In all scenarios, the CPU temperature reported by memtest was between 28 °C and 41 °C. Memtest was run in multi-threaded mode. During test 2, memory errors actually showed up, so I removed the second stick of RAM; no memory errors were seen in any subsequent test. The case cover was on for all tests.
1) 0 LSI cards installed, 0 backplanes connected: passed
2) 2x LSI 9207-8i installed, 3 backplanes connected: failed
3) LSI 9305-24i installed, 0 backplanes connected: passed
4) LSI 9305-24i installed, 1 backplane connected: passed
5) LSI 9305-24i installed, 2 backplanes connected: failed
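The five memtest scenarios can be tabulated to make the pattern easier to see. This is a minimal Python sketch that only encodes the results as reported in the post; the `Run` record and its field names are hypothetical, chosen for illustration. It checks whether every passing run had fewer backplanes connected than every failing run. With only five runs, this shows a correlation, not a diagnosis.

```python
from dataclasses import dataclass


@dataclass
class Run:
    hba: str          # HBA(s) installed for this scenario
    backplanes: int   # number of backplanes connected
    passed: bool      # True if one full memtest pass completed without a reboot


# Results as reported in the post (field names are illustrative only).
# Note: scenario 2 also showed memory errors before the second RAM stick was removed.
RUNS = [
    Run("none", 0, True),
    Run("2x LSI 9207-8i", 3, False),
    Run("LSI 9305-24i", 0, True),
    Run("LSI 9305-24i", 1, True),
    Run("LSI 9305-24i", 2, False),
]


def backplane_threshold(runs):
    """Return (max backplanes among passing runs, min backplanes among failing runs)."""
    max_pass = max(r.backplanes for r in runs if r.passed)
    min_fail = min(r.backplanes for r in runs if not r.passed)
    return max_pass, min_fail


if __name__ == "__main__":
    max_pass, min_fail = backplane_threshold(RUNS)
    print(f"passing runs had at most {max_pass} backplane(s) connected")
    print(f"failing runs had at least {min_fail} backplane(s) connected")
    # True when pass/fail separates cleanly on backplane count alone
    print("clean split:", max_pass < min_fail)
```

In this data, every failure happens with two or more backplanes connected, regardless of which HBA is installed. That is consistent with the problem scaling with load rather than with one particular card, but five runs cannot pin down the cause.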
Michael_P posted June 9, 2020:
Just pulling at threads here, but do you have any bent contacts on the CPU socket?
Mogo posted June 9, 2020:
3 hours ago, Michael_P said: "Just pulling at threads here, but do you have any bent contacts on the CPU socket?"
I just checked, and I don't see anything bent or broken. I decided to replace the Noctua CPU cooler with the stock Intel cooler. Temperatures actually went up to 48 °C in memtest (I guess all those CPU cooler reviews are true, lol). Alas, the system rebooted after almost completing the first pass. Honestly, I'm out of ideas on what to do now. The only thing I haven't done is replace the PSU. Can I ask what PSU you are using?
Michael_P posted June 9, 2020:
11 minutes ago, Mogo said: "The only thing I haven't done is replace the psu. Can I ask what psu you are using?"
A flaky PSU could be the culprit; it's hard to say for sure unless you test or replace it. I have a Thermaltake Toughpower Grand RGB 850W. Here's a picture of my internals: the stock cooler and poor airflow keep the CPU pretty warm, but it's stable.
Mogo posted June 9, 2020:
Wow, I thought my cable management was bad. You make mine look professional, lol; you weren't kidding about yours. Before I go and order a new PSU, how do I test the current one, and what should I be looking for?
Michael_P posted June 9, 2020:
It looks professional enough with the lid on :D You'd probably need some sort of load tester, but that's outside my knowledge. I usually just throw parts at stuff until it works.
Mogo posted June 13, 2020:
Just wanted to provide an update. I replaced the PSU, and so far the system has been running for almost 14 hours doing a parity check with the cover on the case. This is amazing. Fingers crossed, since it hasn't been this stable in a while. There are 3,810 errors reported, but I believe that might be related to all the unclean shutdowns and server reboots over the last few weeks. Regardless of the outcome, I would like to thank everyone for their help and support.
JorgeB posted June 14, 2020:
Thanks for the update.
14 hours ago, Mogo said: "There are 3810 errors reported, but I believe that might be related to all the unclean shutdowns / server reboots over the last few weeks."
Most likely.