(Solved) Failing Disk - Next Steps



Hello Guys,

From prior experience I learned not to act too quickly and to ask for advice here first.

I had a parity check running which was showing quite a lot of sync errors (30k+), and after it completed about 70% a disk seems to have failed, so I grabbed the diagnostics and attached them here. Can/should I just replace the failing disk, or is there something else I should try first?

Best Regards

knowlage-diagnostics-20190109-1058.zip

Link to comment

Several issues:

 

-are the sync errors expected? They are unrelated to the failing disk, any recent unclean shutdown?

-disk3 needs a new SATA cable

-disk1 appears to be failing, despite a healthy SMART report, run an extended SMART test.

-you're using a Marvell controller with a port multiplier, that should be replaced ASAP by an LSI

-disk9 dropped offline, most likely because of the Marvell/port multiplier controller, but since there's no SMART report I'll need new diags after rebooting.
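For the extended SMART test on disk1, the usual route outside the GUI is `smartctl` from smartmontools. Below is a minimal sketch that only builds the command lines; the helper name `extended_smart_test_commands` and the device path `/dev/sdX` are placeholders, not something taken from the diagnostics, so substitute the real device for disk1.

```python
import shlex


def extended_smart_test_commands(device):
    """Build smartctl command lines for an extended self-test.

    `device` is a placeholder such as /dev/sdX; look up the real
    device name for the disk before running anything.
    """
    return [
        ["smartctl", "-t", "long", device],      # start the extended (long) self-test
        ["smartctl", "-l", "selftest", device],  # show the self-test log afterwards
        ["smartctl", "-a", device],              # full SMART report for the diagnostics
    ]


def as_shell_lines(device):
    """Render the commands as copy-pasteable shell lines."""
    return [shlex.join(cmd) for cmd in extended_smart_test_commands(device)]
```

The first command returns immediately and the test runs on the drive itself, which on a large disk can take many hours. The second command is the one to re-run until the log shows a result.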

Link to comment
1 hour ago, johnnie.black said:

-are the sync errors expected? They are unrelated to the failing disk, any recent unclean shutdown?

Yes. I do have issues with an AMD GPU. It seems to cause a deadlock of the whole server. I couldn't find any solution or advice, so I had several unclean shutdowns.

 

1 hour ago, johnnie.black said:

-disk3 needs a new SATA cable

Will do.

 

1 hour ago, johnnie.black said:

-disk1 appears to be failing, despite a healthy SMART report, run an extended SMART test.

Running it. I'll post as soon as it's done... it seems to take a while.

 

1 hour ago, johnnie.black said:

-you're using a Marvell controller with a port multiplier, that should be replaced ASAP by an LSI

I wasn't aware of any issues and had been running this controller for about 3 years (in a different setup). Do I need to stick to LSI, or are JMicron or Broadcom acceptable as well?

 

1 hour ago, johnnie.black said:

-disk9 dropped offline, most likely because of the Marvell/port multiplier controller, but since there's no SMART report I'll need new diags after rebooting.

The disk is gone after the reboot. I attached the diags taken after rebooting.

knowlage-diagnostics-20190109-1555.zip

Link to comment

Disk9 looks fine, so this is likely controller related. When there's an error on one disk on a port multiplier, it can time out and cause issues on the other disks there, and it appears to me that's what happened.

 

4 hours ago, Jaster said:

Do I need to stick to LSI, or are JMicron or Broadcom acceptable as well?

JMicron is so-so, but not the best choice. LSI was bought by Avago, and Avago then bought Broadcom and kept the Broadcom name, so LSI and Broadcom are the same thing here and that's the best option for Unraid. Marvell controllers are not recommended even by themselves, and Marvell with a port multiplier is just asking for trouble.

Link to comment

There's nothing in those logs about controller issues, so it's likely not that. It could be related to this:

Dec 19 13:23:46 Knowlage kernel: resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Dec 19 13:23:46 Knowlage kernel: caller pci_map_rom+0x68/0xaf mapping multiple BARs
Dec 19 13:23:46 Knowlage kernel: resource sanity check: requesting [mem 0x000c0000-0x000dffff], which spans more than PCI Bus 0000:00 [mem 0x000c4000-0x000c7fff window]
Dec 19 13:23:46 Knowlage kernel: caller pci_map_rom+0x68/0xaf mapping multiple BARs
Dec 19 13:23:46 Knowlage kernel: vfio-pci 0000:65:00.0: vgaarb: changed VGA decodes: olddecodes=io+mem,decodes=io+mem:owns=io+mem

But VMs are not my forte and I don't really know what the above means. It could be harmless.

Link to comment

I moved disk 9 from the controller to a mainboard port and it returned.

I did the same for disk 3 and also changed the SATA cable, but it still seems to have some CRC errors.
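A note on reading those CRC errors: as far as I know, SMART attribute 199 (UDMA_CRC_Error_Count) is a lifetime counter that does not reset after a cable swap, so the useful check is whether the raw value keeps climbing between two readings. Here is a rough sketch that compares two saved `smartctl -A` outputs. The function names and the sample rows are made up for illustration (the numbers are not from the attached diagnostics), and it assumes the standard ATA attribute table layout with the raw value in the tenth column.

```python
def crc_error_count(smartctl_attributes_output):
    """Return the raw value of SMART attribute 199, or None if it is not listed."""
    for line in smartctl_attributes_output.splitlines():
        fields = line.split()
        # Attribute rows start with the numeric ID; RAW_VALUE is the 10th column.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


def crc_still_increasing(before_output, after_output):
    """True if the CRC counter grew between readings, None if either is missing."""
    before = crc_error_count(before_output)
    after = crc_error_count(after_output)
    if before is None or after is None:
        return None
    return after > before


# Illustrative rows only; the values are invented for this example.
SAMPLE_BEFORE = (
    "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE\n"
    "199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       12\n"
)
SAMPLE_AFTER = SAMPLE_BEFORE.replace("       12\n", "       12\n")  # unchanged reading
```

If the count stays the same after the cable change and a full parity check, the old errors are just history. If it keeps going up, the cable, port, or backplane is still suspect.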

 

Just to be safe, I started a parity check. Once it's complete (~30h) I'll try to fetch the disk 1 SMART data one more time.

knowlage-diagnostics-20190110-1432.zip

 

I also ordered a 9211-8i controller, as I can already see the difference from not using my current one, just with the mainboard ports. Thanks for that one too!
