Disk Read Errors on multiple disks. Need help diagnosing. LSI 9211 reporting: FAULT_STATE(0X2622)


v 6.8.3

After moving my unRAID server to a different motherboard, I've encountered read errors that might be the fault of my LSI expansion card.  This is the second time this has resulted in a disk getting disabled.  I originally thought it had something to do with my UEFI boot setting, since the first occurrence happened right at startup.  See topic below.  This time, the server had been running for a couple of days before it randomly happened.

 

Does anyone have an idea of what the issue is specifically?  Is it the expansion card?  Or could the problem still be the HDD that got disabled?  Before migrating to the new mobo, I had zero issues for a year and a half, so I'd be surprised if the card simply started dying out of nowhere.  One notable difference is that I now have 4 HDDs connected to the card instead of two.

 

What are some things I can try before buying a new card?  Is it possible the card is overheating? (Unlikely, given when the errors popped up.)

Is there any chance a mobo BIOS update will do anything?

 

 

 

nas-ng-diagnostics-20210420-1855.zip 

Edited by Marc_G2

  • Author

Before shutting the system down or anything, I started the array in maintenance mode and started a read check.  So far it hasn't given any errors.  So is it likely that the problem is that one HDD?  Disk 1 was the drive that got disabled on both occasions.  But if it's just that disk, does it make any sense for unRAID to report errors on the other disks?  Also, the SMART stats for Disk 1 didn't indicate any issues either.

Edited by Marc_G2

The problem seems to be with the ST4000VN000-2AH166 (disk 3): it doesn't respond to the HBA in time, so the HBA resets again and again, and this affects all disks connected to the HBA.

 

You should disconnect the SATA link at the disk side, one disk at a time (HBA disks only, with the array stopped), then keep watching the web log until the HBA stops resetting. This could narrow down the cause a bit.

Edited by Vr2Io

  • Author
2 minutes ago, Vr2Io said:

The problem seems to be with the ST4000VN000-2AH166 (disk 3): it doesn't respond to the HBA in time, so the HBA resets again and again, and this affects all disks connected to the HBA.

You should disconnect the SATA link at the disk, one disk at a time, then keep watching the log until the HBA stops resetting. This could narrow down the cause a bit.

That would show up as an error in the system log, right?  The problem there is that my disks are getting disabled, which requires a full rebuild afterward.  I swapped the SATA cables and I'm doing a rebuild right now.  I haven't seen any errors yet.

Just now, Marc_G2 said:

That would show up as an error in the system log right? 

Yes

 

1 minute ago, Marc_G2 said:

The problem there is my disks are getting disabled which requires a full rebuild afterward.

The problem is that the HBA resets non-stop because a device is not responding.

 

Amendment to my previous reply:

You should disconnect the SATA link at the disk side, one disk at a time (HBA disks only, with the array stopped), then keep watching the web log until the HBA stops resetting. This could narrow down the cause a bit.

  • Author
11 minutes ago, Vr2Io said:

Amendment to my previous reply:

You should disconnect the SATA link at the disk side, one disk at a time (HBA disks only, with the array stopped), then keep watching the web log until the HBA stops resetting. This could narrow down the cause a bit.

The problem is there are no errors most of the time.  So if the array isn't active, it seems especially unlikely for the error to occur.

 

Which line in the system log showed you that the issue started with disk 3?

8 minutes ago, Marc_G2 said:

The problem is there are no errors most of the time.

As I said, the HBA resets non-stop ... did you see that in the web log viewer?

 

8 minutes ago, Marc_G2 said:

Which line in the system log showed you that the issue started with disk 3?

It always responds late or not at all. Below is an example where device 5:0:0:0 is missing:

 

Apr 20 18:33:16 NAS-NG kernel: sd 5:0:1:0: Power-on or device reset occurred
Apr 20 18:33:16 NAS-NG kernel: sd 5:0:2:0: Power-on or device reset occurred
Apr 20 18:33:16 NAS-NG kernel: sd 5:0:3:0: Power-on or device reset occurred

 

Apr 20 18:33:25 NAS-NG kernel: sd 5:0:1:0: Power-on or device reset occurred
Apr 20 18:33:25 NAS-NG kernel: sd 5:0:2:0: Power-on or device reset occurred
Apr 20 18:33:25 NAS-NG kernel: sd 5:0:3:0: Power-on or device reset occurred

 

Once the HBA stops resetting, track down the real cause by swapping cables, disks, and ports. The HBA, the HBA ports, the cables, or the disks could be the cause, so you need to troubleshoot carefully.
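For anyone who wants to check a saved syslog for this pattern without eyeballing it, here is a minimal Python sketch. It groups the "Power-on or device reset occurred" lines by timestamp and reports which expected device did not appear in each group. The function names and the `EXPECTED` list are made up for illustration; the list assumes four disks on SCSI host 5 (5:0:0:0 through 5:0:3:0), which you would adjust to your own setup.

```python
import re
from collections import defaultdict

# Matches kernel lines such as:
#   Apr 20 18:33:16 NAS-NG kernel: sd 5:0:1:0: Power-on or device reset occurred
RESET_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+\S+\s+kernel:\s+"
    r"sd (?P<dev>\d+:\d+:\d+:\d+): Power-on or device reset occurred"
)

def resets_by_timestamp(lines):
    """Group the SCSI addresses that logged a reset by syslog timestamp."""
    groups = defaultdict(set)
    for line in lines:
        m = RESET_RE.match(line)
        if m:
            groups[m.group("ts")].add(m.group("dev"))
    return dict(groups)

def missing_devices(groups, expected):
    """For each reset burst, list the expected devices that did not log a reset."""
    result = {}
    for ts, devs in groups.items():
        absent = sorted(set(expected) - devs)
        if absent:
            result[ts] = absent
    return result

# Assumption: four disks on host 5; change this to match your HBA.
EXPECTED = ["5:0:0:0", "5:0:1:0", "5:0:2:0", "5:0:3:0"]

SAMPLE = """\
Apr 20 18:33:16 NAS-NG kernel: sd 5:0:1:0: Power-on or device reset occurred
Apr 20 18:33:16 NAS-NG kernel: sd 5:0:2:0: Power-on or device reset occurred
Apr 20 18:33:16 NAS-NG kernel: sd 5:0:3:0: Power-on or device reset occurred
Apr 20 18:33:25 NAS-NG kernel: sd 5:0:1:0: Power-on or device reset occurred
Apr 20 18:33:25 NAS-NG kernel: sd 5:0:2:0: Power-on or device reset occurred
Apr 20 18:33:25 NAS-NG kernel: sd 5:0:3:0: Power-on or device reset occurred
""".splitlines()

if __name__ == "__main__":
    for ts, absent in missing_devices(resets_by_timestamp(SAMPLE), EXPECTED).items():
        print(ts, "missing:", ", ".join(absent))
```

A device that is repeatedly absent from the reset bursts is only a hint, not proof; the cable, port, or HBA could still be at fault, so the swap tests are still needed.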

Edited by Vr2Io

  • Author
5 minutes ago, Vr2Io said:

As I said, the HBA resets non-stop ... did you see that in the web log viewer?

The system has been running for a couple of hours.  These are the only errors the system log is showing right now.

 

Capture.thumb.PNG.4ab6cfcd65cd2684dca4f5f136c19944.PNG

Then you can keep monitoring until the problem happens again.

  • Author

After looking over the system logs, I'm now thinking the LSI card (or less likely, the motherboard) is the problem.  I don't think it's any of the disks. 

 

But if anyone else has additional theories or things to try, please share

  • Community Expert

Yes, looks like an issue with the HBA:

 

Apr 20 18:34:05 NAS-NG kernel: mpt2sas_cm0: fault_state(0x2622)!
Apr 20 18:34:05 NAS-NG kernel: mpt2sas_cm0: sending diag reset !!
Apr 20 18:34:06 NAS-NG kernel: mpt2sas_cm0: diag reset: SUCCESS

 

Have you tried not sleeping the server? Not all hardware supports sleep/wake-up correctly.
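If you want to search older syslogs for the same controller fault, a small Python sketch like the one below could help. It only extracts the controller name and the hex code from `fault_state(...)` lines like the ones quoted above; it does not interpret the code. The function name is hypothetical, and the pattern assumes the driver prefix looks like `mpt2sas_cm0` or `mpt3sas_cm0`.

```python
import re

# Assumption: the driver logs the fault as "<controller>: fault_state(0x....)!"
FAULT_RE = re.compile(
    r"(?P<ctrl>mpt\dsas_cm\d+): fault_state\((?P<code>0x[0-9a-fA-F]+)\)"
)

def find_hba_faults(lines):
    """Return (controller, fault_code) pairs for every fault_state line."""
    faults = []
    for line in lines:
        m = FAULT_RE.search(line)
        if m:
            faults.append((m.group("ctrl"), int(m.group("code"), 16)))
    return faults

SAMPLE = [
    "Apr 20 18:34:05 NAS-NG kernel: mpt2sas_cm0: fault_state(0x2622)!",
    "Apr 20 18:34:05 NAS-NG kernel: mpt2sas_cm0: sending diag reset !!",
    "Apr 20 18:34:06 NAS-NG kernel: mpt2sas_cm0: diag reset: SUCCESS",
]

if __name__ == "__main__":
    for ctrl, code in find_hba_faults(SAMPLE):
        print(f"{ctrl} reported fault_state {code:#06x}")
```

Comparing the timestamps of any hits against sleep/wake events in the same log is one way to test the sleep theory.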

  • Marc_G2 changed the title to Disk Read Errors on multiple disks. Need help diagnosing. LSI 9211 reporting: FAULT_STATE(0X2622)
  • Author
Just now, JorgeB said:

Have you tried not sleeping the server? Not all hardware supports sleep/wake-up correctly.

That's something that crossed my mind.  But I'm pretty sure that the first time this issue happened was shortly after a boot-up, before the server had ever gone to sleep.  Later today I'm going to see if I can trigger the error by putting it to sleep and waking it up again, and by just starting and stopping the array.

 

Is there a way to configure unRAID to better handle this error?  Could I make unRAID immediately stop the array once it starts seeing this particular fault?  The way unRAID disables one of my disks after trying repeated resets is a major headache.

  • Author
10 minutes ago, Marc_G2 said:

But I'm pretty sure that the first time this issue happened was shortly after a boot-up, before the server had ever gone to sleep.

Actually the April 17th log seems to show it did go to sleep for some reason (normally it'd never go to sleep on Saturday).  So I'll focus on that area

Edited by Marc_G2

  • Author

I put the system to sleep and woke it again with the array stopped, and then tried starting it in maintenance mode.  I'm not getting the fault so far.  My guess is the BIOS update fixed the issue.  I'd appreciate it if someone knowledgeable could look at this log after wake-up to see if there's anything concerning.

 

The one concerning thing I saw was a warning.  

ata4: COMRESET failed (errno=-16)

 

nas-ng-syslog-20210421-2123.zip

  • Community Expert
9 hours ago, Marc_G2 said:

ata4: COMRESET failed (errno=-16)

I've seen those before after waking up, probably normal.

  • Author

So these errors are more concerning.  Why am I getting drive errors right after they all spin down?   The server is in maintenance mode at the moment, so does that have something to do with it?

 

image.png.97cd2b8b324e889c35648c2075fd2f71.png

  • Community Expert
6 minutes ago, Marc_G2 said:

The server is in maintenance mode at the moment, so does that have something to do with it?  

No. Those errors after spin-down are not that uncommon and are not a real problem; sometimes changing the disk's APM level or disabling APM helps.
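One common way to query or change a drive's APM level on Linux is `hdparm -B`, where values 1 to 254 set a level and 255 disables APM if the drive supports it. The Python sketch below only builds the command line so you can inspect it before running anything. The helper names and the `/dev/sdX` path are placeholders, and whether a given drive behind the HBA accepts the setting is an assumption you would need to verify.

```python
import subprocess

def apm_command(device, level=None):
    """Build an hdparm command to query (level=None) or set the APM level.

    Levels 1-254 set a power management level; 255 asks hdparm to disable APM.
    """
    if level is None:
        return ["hdparm", "-B", device]
    if not 1 <= level <= 255:
        raise ValueError("APM level must be between 1 and 255")
    return ["hdparm", "-B", str(level), device]

def run(cmd):
    """Run the command; needs root privileges and hdparm installed."""
    return subprocess.run(cmd, capture_output=True, text=True, check=False)

if __name__ == "__main__":
    # /dev/sdX is a placeholder; substitute the real device before running.
    print(" ".join(apm_command("/dev/sdX")))        # query the current level
    print(" ".join(apm_command("/dev/sdX", 255)))   # disable APM
```

The setting may not survive a power cycle on every drive, so it might need to be reapplied at boot.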

Archived

This topic is now archived and is closed to further replies.
