My Server is having a Stroke. :(


Hello everyone. Hope you're doing well. 
I've been experiencing some issues with my server, and I can't really seem to figure out what's going on :( 

My server consists of:

  • Supermicro X9DRi-LN4F+ 
  • 2 x Xeon E5-2690 v2
  • 128GB RAM
  • 8 Drives on a Dell H200
  • 3 Drives on the onboard SAS
  • 3 NVMes on an Asus Hyper M.2 card
  • 1660 Ti GPU 
  • A Dual Coral TPU on a PCIe adapter. 


A month ago, my server showed me read errors on all the drives that were connected to the H200. 
I restarted and it worked for a couple of days. 

Then it happened again. And so on and so forth.

 

Then I saw some errors on the onboard SAS as well, so I thought that was causing the issue and moved the drives from the onboard SAS to the onboard SATA. After that, everything worked for about 20 days.

Today, the server, again, experienced read errors on all drives of the H200 Controller. 
So I bought a new one, flashed it to IT mode, and installed it. Same errors. Right when I boot the server, it just shows read errors, and I have to shut it down. 

I tried a second H200. 
I tried different PCIe slots. 
I don't know what else to try, my mind is about to explode. I've been heavily relying on this server for my work, and it just cut my legs off. 
I can't pinpoint the problem, so I don't know what to change/fix/buy.

This is what I see 15 seconds after I start the array:

[screenshot]

 

and then I have to shut it down. :( 

Any ideas?

  • 3 weeks later...
  • Author

Sorry, I was on a business trip and had no internet to post diagnostics. I was also so busy I couldn't mess with the server itself.
So I THINK I've figured out the issue, but I'm keeping an eye on it, just in case.
I'm only replying in case someone runs into a similar problem in the future.

Turns out, the Dell H200 controller was overheating and causing all sorts of problems. 
I just strapped on a fan with a couple of zip ties, and everything works flawlessly.

Which is pretty weird, though, since this is exactly the way the server has been running for quite a while now, and the room is air conditioned 24/7 to 22 degrees.
I don't know what's gotten into it now. 

That being said, I will remove the heatsink, replace the paste if there is any, which I guess there must be (or a pad), and mount the fan properly using a 3D-printed bracket.
But for now, just cooling the card did the trick.

Anyway, just for reference for future users!

5 hours ago, bonamin said:

replace the paste if there is any, which I guess there must be

The cards normally have thermal epoxy, which is not easy to remove; see here.

  • 2 weeks later...
  • Author
On 5/19/2025 at 12:08 AM, Wody said:

The cards normally have thermal epoxy, which is not easy to remove; see here.

Mine just have thermal paste, like a CPU.

So, new update. The server had been running fine until about 2 days ago.
When the fire nation attacked. :D

Same problem. All disks on the HBA card get read errors.
I really don't know what to do. This is really, REALLY frustrating. :(

I will wait for the next error, and I will post the diagnostics. Maybe someone can help.

  • 2 weeks later...
  • Author

Well, I managed to get the diag.

If anyone can take a look at this, I'm more than happy to listen to your thoughts.

I've now removed the card altogether and connected all 9 drives to the motherboard: 6 via SATA and 3 via the SAS port.

Since the problems began, I've completely removed the parity drive, as I had to constantly rebuild it, which would probably fry my drives altogether.

Anyway, other than 1 specific drive with all my family photos, the rest of the drives are media only, so I don't really care even if one fails right now.
I just wanna fix this. This is getting seriously annoying, and I don't know where to move from here.

theark-diagnostics-20250604-1403.zip

Jun 4 11:20:19 TheArk kernel: mpt2sas_cm0: SAS host is non-operational !!!!

Problems with the LSI controller, is this onboard?
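A quick way to see how often the controller dropped out is to scan the syslog from the diagnostics zip for the same message quoted above. Below is a minimal Python sketch, assuming the syslog uses the classic `Mon D HH:MM:SS host kernel: ...` layout shown in that line. The helper name `find_hba_failures` is hypothetical, and the syslog's location inside the zip (for example `logs/syslog.txt`) is an assumption worth verifying.

```python
import re

# Matches lines such as:
# "Jun 4 11:20:19 TheArk kernel: mpt2sas_cm0: SAS host is non-operational !!!!"
# (assumes the classic syslog timestamp layout; mpt2sas and mpt3sas prefixes both match)
HBA_FAILURE = re.compile(
    r"^(?P<month>\w{3})\s+(?P<day>\d{1,2})\s+(?P<time>\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"(?P<ioc>mpt[23]sas_cm\d+):\s+SAS host is non-operational"
)


def find_hba_failures(lines):
    """Return (timestamp, controller) for every 'non-operational' event in the log lines."""
    events = []
    for line in lines:
        m = HBA_FAILURE.search(line)
        if m:
            events.append((f"{m['month']} {m['day']} {m['time']}", m["ioc"]))
    return events
```

Usage would be something like `find_hba_failures(open("logs/syslog.txt").read().splitlines())`. Comparing the returned timestamps against uptime or room/case temperature could show whether the dropouts follow heat or load.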

  • Author
3 hours ago, JorgeB said:

Jun 4 11:20:19 TheArk kernel: mpt2sas_cm0: SAS host is non-operational !!!!

Problems with the LSI controller, is this onboard?

No, it's not onboard.

I'm using a Dell PERC H200 card. (IT Mode)
The problem is, I bought a second identical card, and both cards gave me errors.

I then cooled the cards with a fan and also replaced the thermal compound.
After this, I didn't have errors for a month. Until I had errors again. :D

This is what confuses me. I don't know if the cards are fried, if the cooling is insufficient, if a drive is misbehaving and causing the card to crash or something, or if the motherboard or one of the CPUs is problematic.
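One way to separate "a drive is misbehaving" from "the card or slot is failing" is to count kernel I/O errors per device. Errors hitting every disk on the HBA at the same moment point at the shared path (controller, slot, cooling, power), while one disk dominating points at that disk. A minimal sketch, assuming the kernel messages contain the common fragment `I/O error, dev sdX, sector` (the surrounding text varies between kernel versions); `io_errors_per_device` is a hypothetical helper.

```python
import re
from collections import Counter

# Only the stable fragment is matched, e.g.
# "... blk_update_request: I/O error, dev sdb, sector 1234 op 0x0:(READ) ..."
IO_ERROR = re.compile(r"I/O error, dev (?P<dev>\w+), sector")


def io_errors_per_device(lines):
    """Count block-layer I/O errors per device name (sdb, sdc, nvme0n1, ...)."""
    counts = Counter()
    for line in lines:
        m = IO_ERROR.search(line)
        if m:
            counts[m["dev"]] += 1
    return counts
```

Feeding it the syslog lines and calling `.most_common()` on the result gives a quick ranking; a roughly even spread across all HBA disks would fit the controller theory better than a single bad drive.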

By this point, I'm even considering the PSU. I don't know what else to try.

Would you suggest I buy a 3rd card?

  • Author

I did try different slots, including slots on both CPUs, since half of the slots are wired to CPU1 and the others to CPU2.
It didn't change anything.
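To confirm which CPU a given slot actually hangs off, Linux exposes the NUMA node of each PCI device in sysfs at `/sys/bus/pci/devices/<address>/numa_node`; on a dual-socket board, nodes 0 and 1 usually correspond to the two CPUs, and `-1` means the firmware did not report one. A minimal sketch; the helper name `pci_numa_node` and the example address `0000:03:00.0` are hypothetical, and the HBA's real address can be looked up with `lspci`.

```python
from pathlib import Path


def pci_numa_node(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the NUMA node a PCI device is attached to, or None if unknown.

    Reads <sysfs_root>/<pci_addr>/numa_node; a missing file, unreadable value,
    or -1 (not reported by firmware) all map to None.
    """
    path = Path(sysfs_root) / pci_addr / "numa_node"
    try:
        node = int(path.read_text().strip())
    except (OSError, ValueError):
        return None
    return node if node >= 0 else None
```

For example, `pci_numa_node("0000:03:00.0")` would return `0` or `1` for a card in a CPU1 or CPU2 slot, which makes it easy to confirm that a slot swap really moved the card to the other socket.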

In that case I would try a different controller; it can still be an LSI, e.g., a 9300-8i.
