Failed disk w/ read errors


rawfuls

disk4 disabled overnight, log shows tons of errors, UnRAID shows a billion writes with tons of errors.

Not totally thinking it's failed, maybe a loose connection; but recently had a similar situation and lost a good chunk of data.

 

Would like some guidance before I move forward, attaching diagnostics below.

I believe the plan is to use unBalance to move all data from disk4 and let unBalance throw it onto the rest of the array.

Once unBalance is finished, New Config > Retain all config > All > leave disk4 blank, start up the array, let it rebuild.

Once rebuilt, stop the array, downsize it to 10 drives (now 11), then move disk5 to disk4, disk6 to disk5, etc., and I should be good?
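Before emptying disk4 onto the rest of the array, it may be worth a rough check that the other disks have room for it. The snippet below is only a minimal sketch: it assumes the array disks are mounted at /mnt/disk1, /mnt/disk2, and so on, and the helper names (`fits_on_other_disks`, `survey`) and the optional per-disk `reserve` are made up for illustration, not part of Unraid or unBalance. Total free space is a necessary condition, not a sufficient one, since a single file cannot be split across disks.

```python
import shutil
from pathlib import Path
from typing import Dict, Iterable, Tuple


def fits_on_other_disks(used_on_source: int,
                        free_by_disk: Dict[str, int],
                        reserve: int = 0) -> bool:
    """Return True if the source disk's used bytes fit into the combined
    free space of the other disks, keeping `reserve` bytes free on each."""
    usable = sum(max(free - reserve, 0) for free in free_by_disk.values())
    return used_on_source <= usable


def survey(source: Path, others: Iterable[Path]) -> Tuple[int, Dict[str, int]]:
    """Read used bytes on the source and free bytes on every other disk."""
    used = shutil.disk_usage(source).used
    free = {str(d): shutil.disk_usage(d).free for d in others}
    return used, free


if __name__ == "__main__":
    # Assumed mount points; adjust to match the actual array layout.
    source = Path("/mnt/disk4")
    if source.is_dir():
        others = [p for p in sorted(Path("/mnt").glob("disk[0-9]*"))
                  if p.is_dir() and p != source]
        used, free = survey(source, others)
        print("used on source (bytes):", used)
        print("free on others (bytes):", sum(free.values()))
        print("fits:", fits_on_other_disks(used, free))
    else:
        print("source disk not mounted; nothing to check")
```

Passing a non-zero `reserve` (for example a few GiB) gives a more conservative answer, since completely filling the remaining disks is rarely desirable.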

36a1ba680c412d72606c3067e4e66b28.png

diagnostics-20210215-1121.zip


Sigh, restarted and now getting an error at start...

 

MPT BIOS Fault 11b encountered at adapter PCI...

Firmware Fault Code: 4101h.

 

This happened to my last LSI 9201-8i card too; I wonder if the motherboard is eating these or if the cards are actually dying.

Tried a different PCI slot, with the same result.

unnamed.jpg

7 hours ago, Kevek79 said:

Just because I am curious.

Do you have active cooling on the LSI board?

The chips on those can get very toasty when not cooled extensively.

 

Nope, so now we know it really is the card failing then :)

Saw heatsinks on the cards and figured it would be alright. 

Looks like I'll be in the market for another card and will need to figure out an active cooling solution.


Not sure that is the issue, but you could try adding cooling to see if this improves things.

 

23 minutes ago, rawfuls said:

Nope, so now we know it really is the card failing then :)

Possible that the card crashes because of overheating. Does not mean that it is dead.

 

25 minutes ago, rawfuls said:

Saw heatsinks on the cards and figured it would be alright. 

Those cards are built for server racks with plenty of airflow through the add-on cards.

If you are using a regular case, it is possible that there is not enough airflow for just a heatsink.

 

Maybe the card is dead, but it might be a quick test to plug a fan into the MB and point it at the heatsink to see if it behaves better.

16 minutes ago, ChatNoir said:

Not sure that is the issue, but you could try adding cooling to see if this improves things.

 

Possible that the card crashes because of overheating. Does not mean that it is dead.

 

Those cards are built for server racks with plenty of airflow through the add-on cards.

If you are using a regular case, it is possible that there is not enough airflow for just a heatsink.

 

Maybe the card is dead, but it might be a quick test to plug a fan into the MB and point it at the heatsink to see if it behaves better.

I let the card cool off and it still wouldn't get past the MPT BIOS error message.

Unless it's isolated as a motherboard PCI error, it does seem that I could have taken both of these cards out with heat (my previous card was in the same spot, no active cooling).

 

Case is a Rosewill L4500. It's on the side-most PCI slot, so no real cooling going on. 

I'm seeing quite a few people bolt up 40mm fans to the heatsinks, so that may be in order.


New card came in and still having the same issue...

Appears it's the motherboard/BIOS, as after testing on my desktop, both the old card and new card work... go figure.

 

I've put the new card in, and connected only 4x hard drives (one SAS -> 4 SATA), and it seems to boot without any issues.

Once I use both SAS ports on the 9201, it no longer boots and shows the above error message.

 

Seems like with the total of 8x drives, the 9201-8i card is indicating a failure... what gives?

 

EDIT: okay.. some more info.

(1) LSI 9201-8i w/ 2 SAS (8x hard drives via splitters) -> no boot

(1) LSI 9201-8i w/ 1 SAS (4x hard drives via splitters) -> successful boot

(2) LSI 9201-8i w/ 1 SAS each (4x hard drives via splitters) -> only recognizes (1) LSI 9201-8i card (4x drives)

 

Testing in my own personal desktop:

(1) LSI 9201-8i w/ 2 SAS (8x hard drives via splitters) -> successful boot

 

Seems to me like the motherboard isn't able to read all 8 drives anymore. Any reason to think it's just a BIOS setting that needs changing?


Success! Kind of.

The drive failed and seems to be (shorting?) out the controller it's connected to. If connected to the LSI card, the whole adapter fails out.

If connected to the board, the whole board controller fails out.

 

At this point, the drive has been disconnected and the parity is emulating the contents.

Is this where I now go to New Config > Preserve All?

Do I downsize by one drive or start the array with the same # of drives, but leave disk4 empty?

15 minutes ago, JorgeB said:

New Config won't preserve any data on the emulated drive. If you don't want to rebuild that drive onto a different disk, first move all the data from the emulated disk to the others, then do the New Config (you'll need to re-sync parity).

So unBalance from disk4 to all other disks, then New Config, Preserve All, and reduce the total # of disks by 1?
