SAS 9305-24i Overheating


I've got a 24-bay Norco 4224 case.  I tried attaching a small fan to the LSI card, but it still overheats, which causes the system to keep rebooting.  The system is not stable for more than an hour or two.  Any suggestions on how to fix this?  Are there any other cards that run cooler?  I only have the following slots remaining on the motherboard:

 

PCIe 3.0 x1
PCIe 3.0 x4
PCIe 3.0 x8 (if I remove the LSI)

12 bays are currently populated.

Link to comment
10 minutes ago, Mogo said:

I tried attaching a small fan to the LSI card however it still overheats.

And are you sure that's the problem? I run some LSI cards without direct cooling, just general case airflow, and have no issues. An overheating HBA should never make your server reboot anyway.

Link to comment

In the BMC panel this always shows up when there's a reboot:

 

Unknown    BIOS POST Progress    Error- - Asserted
Unknown    Microcontroller / Coprocessor    Transition to Running - Asserted

 

If I turn on the system with the case cover on, it will continually reboot either before starting Unraid or shortly after Unraid is loaded.  I have a SUPERMICRO AOC-SAS2LP-MV8, and when that was installed there were no issues.  I can't go back and test with that card, because the array is now at 12 drives and that card only handles 8.

Link to comment
6 hours ago, Mogo said:

If I turn on the system and have the case cover on, It will continually reboot either before starting unraid or shortly after unraid is loaded.

It would help to get some case air temperature readings for troubleshooting. Does the case have a middle fan wall (next to the disk backplane)?

 

If you have a fan:

[Two attached images]

 

 

3 hours ago, Mogo said:

Would unraid detect the drives connected to two different sas cards?  Such as a SAS 9207-8i and a SUPERMICRO AOC-SAS2LP-MV8?

Yes.

Edited by Benson
Link to comment
14 hours ago, Benson said:

Better got some case air temperature figure for troubleshooting , does case have middle FAN ? ( next to disk backplane )

 


Yes, the case came with a fan wall.  There are 3x 120mm fans on the side adjacent to the hard drives.  I even tried reversing the fans to the other side of the motherboard section and still had the same result.  I replaced the stock fans with Noctua fans a few years ago.

Edited by Mogo
Link to comment

I had a similar problem several months ago. The system did not reboot, but one of my LSI HBAs died once I fully populated all the disk bays. The problem was that most of the intake air from the disk side was blocked, so hot air could not get out effectively, and the case temperature (not a Norco 4224) was around 50°C. (I had not installed a rear fan, there was no fan on the HBA, and the high-speed stock fans had been swapped for silent ones.)

 

After several experiments, I found a solution: increase the cool air intake (I had previously sealed the air holes between the backplane and the fan wall, and have now opened them again) and reverse the fans. As a result the disks run hotter, but the temperature of the other parts dropped a lot.

 

So I suggest you run a test: don't reverse the fans, keep the small fan on the HBA, and don't fully cover the case. Cover it only up to the fan wall and see the difference. (I notice you have not populated all the bays, which also makes me wonder why you would be overheating.)

Edited by Benson
Link to comment

Thanks, I will give that a try.  I see what you are trying to do by creating a semi wind tunnel and then letting the hot air escape.  Honestly I don't expect it to work in my situation (everything I try seems to fail), however, I'll try anything at this point.

Link to comment

FWIW- I am using the Norco 4224 as well, with all Noctua fans too, with 2 LSI HBAs, a 10Gb network card and an expander and 19 spinners and 2 SSDs and mine doesn't overheat. And my cable management is atrocious, lol. It's not even in a cool area, the ambient temp is low to mid 80's F.

Link to comment
25 minutes ago, Michael_P said:

FWIW- I am using the Norco 4224 as well, with all Noctua fans too, with 2 LSI HBAs, a 10Gb network card and an expander and 19 spinners and 2 SSDs and mine doesn't overheat. And my cable management is atrocious, lol. It's not even in a cool area, the ambient temp is low to mid 80's F.

What model LSI are you using?  The 93xx or the 92xx?  In my research since having this problem, some forums recommended going with the 92xx, since those cards apparently run a great deal cooler than the 93xx.

Edited by Mogo
Link to comment

Well, in the meantime, since I could not get the server to run, I ordered what I believe is an LSI 9207-8i.  When I get the card I will try running it alongside the Supermicro card I have and see if everything runs smoothly.  If it does, then I can play around with the 9305 card and do what you suggest, or perhaps ask LSI for a replacement if that's possible.

Link to comment

So I tested the LSI 9207-8i and had the same issue.  I did some further tests.  If anyone can think of something let me know.

 

Running memtest from the Unraid boot screen lets me reproduce the issue after a while.  I ran the scenarios below.  Every scenario marked as passed completed 1 memtest pass, which generally ran for 30 minutes to 2.5 hours (if I forgot and it was still running).  In all of the scenarios the CPU temperature reported by memtest was between 28°C and 41°C.  Memtest was run in multi-threaded mode.  During test 2, memory errors actually showed up, so I removed the 2nd stick of RAM; no other memory errors were seen in any subsequent tests.  The case cover was on for all tests.

 

1) 0 LSI cards installed & 0 backplanes connected - passed

2) 2x LSI 9207-8i installed & 3 backplanes connected - failed

3) LSI 9305-24i installed & 0 backplanes connected - passed

4) LSI 9305-24i installed & 1 backplane connected - passed

5) LSI 9305-24i installed & 2 backplanes connected - failed
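
The five scenarios above can be tabulated to see which variable tracks the failures. Below is a minimal Python sketch; the `Scenario` class and the helper names are made up for illustration, and the data comes only from the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    cards: str       # HBA(s) installed, as listed in the post
    backplanes: int  # number of backplanes connected
    passed: bool     # True if one memtest pass completed without a reboot


# The five test runs exactly as reported above.
SCENARIOS = [
    Scenario("none", 0, True),
    Scenario("2x LSI 9207-8i", 3, False),
    Scenario("LSI 9305-24i", 0, True),
    Scenario("LSI 9305-24i", 1, True),
    Scenario("LSI 9305-24i", 2, False),
]


def results_by_backplanes(scenarios):
    """Group pass/fail outcomes by the number of connected backplanes."""
    results = {}
    for s in scenarios:
        results.setdefault(s.backplanes, []).append(s.passed)
    return dict(sorted(results.items()))


def first_failing_count(scenarios):
    """Smallest backplane count at which any run failed, or None."""
    failed = [s.backplanes for s in scenarios if not s.passed]
    return min(failed) if failed else None


if __name__ == "__main__":
    for count, outcomes in results_by_backplanes(SCENARIOS).items():
        print(count, ["pass" if ok else "fail" for ok in outcomes])
    print("first failing backplane count:", first_failing_count(SCENARIOS))
```

In this data every run with two or more backplanes failed, whichever HBA was installed, and every run with zero or one backplane passed. That hints at something that scales with the number of attached backplanes and drives (power draw would be one candidate) rather than at one specific card. Five runs is a small sample, so treat it as a hint, not proof.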

 

Link to comment
3 hours ago, Michael_P said:

Just pulling at threads here, but do you have any bent contacts on the CPU socket?

I just checked, and I don't see anything bent or broken.  I decided to replace the Noctua CPU cooler that was there with the stock Intel CPU cooler.  Temperatures actually went up to 48°C in memtest (I guess all those CPU cooler reviews are true lol).  Alas, the system rebooted after almost completing the first pass.  Honestly, I'm out of ideas on what to do now.  The only thing I haven't done is replace the PSU.

Can I ask what PSU you are using?

Link to comment

 

11 minutes ago, Mogo said:

I just checked, I don't see anything bent or broken .  I decided to replace the Noctua cpu cooler that was there with the stock intel cpu cooler. Temperatures actually went up to 48°C on memtest ( I guess all those cpu cooler reviews are true lol ).  Alas the system rebooted after almost completing the first pass.  Honestly, I'm out of ideas on what to do now.  The only thing I haven't done is replace the psu.  

 

Can I ask what psu you are using?

 

A flaky PSU could be the culprit. It's hard to say for sure unless you test or replace it.

 

I have a Thermaltake Toughpower Grand RGB 850W. Here's a picture of my internals. The stock cooler and poor airflow keep the CPU pretty warm, but it's stable.

 

[Attached image: case internals]

 

 

Link to comment

Wow, I thought my cable management was bad.  You make mine look professional lol, you weren't kidding about yours.  Before I go and order a new PSU, how do I test the current one, and what should I be looking for?
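
One basic check is to measure the main rails with a multimeter and compare them against the ±5% tolerance that the ATX specification gives for the +3.3V, +5V and +12V outputs. Below is a minimal Python sketch of that comparison. The function names and the sample readings are hypothetical, not measurements from this thread.

```python
# Nominal ATX rail voltages; the positive rails are specified at +/-5%.
ATX_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}
TOLERANCE = 0.05


def rail_limits(nominal, tolerance=TOLERANCE):
    """Return the (low, high) acceptable voltage for a nominal rail value."""
    return nominal * (1 - tolerance), nominal * (1 + tolerance)


def out_of_spec(readings):
    """Return the rails whose measured voltage falls outside tolerance."""
    bad = []
    for rail, measured in readings.items():
        low, high = rail_limits(ATX_RAILS[rail])
        if not (low <= measured <= high):
            bad.append(rail)
    return bad


if __name__ == "__main__":
    # Hypothetical multimeter readings, for illustration only.
    sample = {"+3.3V": 3.31, "+5V": 5.02, "+12V": 11.2}
    print(out_of_spec(sample))
```

A reading taken at idle can look fine even when the supply sags under load, so a clean result here does not clear the PSU. Swapping in a known-good unit is still the more conclusive test.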

Link to comment

Just wanted to provide an update.  I replaced the PSU, and so far the system has been running for almost 14 hours doing a parity check with the cover on the case.  This is amazing.  Fingers crossed, since it hasn't been this stable in a while.  There are 3810 errors reported, but I believe that might be related to all the unclean shutdowns / server reboots over the last few weeks.  Regardless of the outcome, I would like to thank everyone for their help and support.

Link to comment
