Unable to start array, server hangs


badboyz


I have been trying to figure this out for the last 2 days and I'm at a loss on what to do. I had a drive drop out of the array and had previous issues, so I moved Unraid from running as a VM in ESXi to bare metal on a Dell R710. All the hardware is the same, except that it is no longer passed through from ESXi. When I try to start the array, it hangs while mounting the drives. The diagnostics are attached. Please help.

nas-diagnostics-20190207-2048.zip

4 hours ago, johnnie.black said:

Strange issue, read errors on several disks. The only thing that jumps out is that the LSI HBA where they are connected is using old firmware; update to the latest, 20.00.07.00, like the other one, and see if there's any difference.

How do you get the LSI firmware version from the diagnostics? That would save me from having to reboot my system (where I currently have some long-running jobs in progress) to check what version my LSI controller is using.

4 minutes ago, itimpi said:

How do you get the LSI firmware version from the diagnostics? That would save me from having to reboot my system (where I currently have some long-running jobs in progress) to check what version my LSI controller is using.

From the syslog:

Feb  7 20:35:30 nas kernel: mpt2sas_cm0: LSISAS2008: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)
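
For anyone who wants to pull this out of a syslog without scrolling through it, here is a minimal Python sketch. It assumes the banner line always has the shape shown above; the function name `find_hba_firmware` and the regex are illustrative, not part of Unraid or the driver.

```python
import re

# Pattern modeled on the mpt2sas banner line quoted in this thread.
# Assumption: mpt3sas controllers log the same layout.
FW_LINE = re.compile(
    r"(?P<driver>mpt[23]sas_cm\d+): (?P<chip>\w+): "
    r"FWVersion\((?P<fw>[\d.]+)\), "
    r"ChipRevision\((?P<rev>0x[0-9a-fA-F]+)\), "
    r"BiosVersion\((?P<bios>[\d.]+)\)"
)


def find_hba_firmware(lines):
    """Return one dict per HBA banner line found in the given lines."""
    found = []
    for line in lines:
        match = FW_LINE.search(line)
        if match:
            found.append(match.groupdict())
    return found


# The line quoted above, used as sample input.
sample = [
    "Feb  7 20:35:30 nas kernel: mpt2sas_cm0: LSISAS2008: "
    "FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)",
]

for hba in find_hba_firmware(sample):
    print(hba["driver"], hba["chip"], hba["fw"])
```

To run it against a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.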

 

2 minutes ago, johnnie.black said:

From the syslog:

Feb  7 20:35:30 nas kernel: mpt2sas_cm0: LSISAS2008: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)

 

Thanks - that helped me find it. Not sure why I failed to spot it myself, but I did. My system is saying:

Feb  7 10:19:34 DJW-UNRAID kernel: mpt2sas_cm0: LSISAS2116: FWVersion(19.00.00.00), ChipRevision(0x02), BiosVersion(07.37.00.00)

That sounds like I should be doing a firmware upgrade?
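
A quick way to answer that kind of question is to compare the dotted version strings numerically. This is only a sketch: `parse_fw` and `is_older` are hypothetical helper names, and the "latest" value is simply the 20.00.07.00 quoted earlier in the thread. The thread does not confirm that the same release applies to the LSISAS2116.

```python
# Latest firmware version as quoted earlier in this thread (assumption:
# it is used here only as a comparison target, not verified per chip).
LATEST = "20.00.07.00"


def parse_fw(version):
    """Turn '19.00.00.00' into (19, 0, 0, 0) so it compares numerically."""
    return tuple(int(part) for part in version.split("."))


def is_older(installed, latest=LATEST):
    """True if the installed firmware version is lower than `latest`."""
    return parse_fw(installed) < parse_fw(latest)


print(is_older("19.00.00.00"))  # True
print(is_older("17.00.01.00"))  # True
print(is_older("20.00.07.00"))  # False
```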

5 hours ago, johnnie.black said:

Strange issue, read errors on several disks. The only thing that jumps out is that the LSI HBA where they are connected is using old firmware; update to the latest, 20.00.07.00, like the other one, and see if there's any difference.

I had a new HBA with 20.00.07.00 that I swapped in, but I was still getting the same issue. Going crazy, I switched the server back to booting ESXi and running Unraid as a VM, and the array starts now. Any idea why it might work in ESXi with the controller passed through, but hang when I run it bare metal on the same physical server?

Happy it's working again for now, but I'd really like to move to running it directly on the server (Dell R710).


There are also read errors, this time on disk18 only. It has a very high number of CRC errors, so likely a bad cable, probably the same one that was causing the errors on bare metal, since there are more disks with a very high number of CRC errors. It could be a mini SAS cable, which can affect several disks, or a backplane/expander issue.


On bare metal there were errors on disks 9, 11, 17 and 19, all of them also with a very high number of CRC errors. Looking at the diags, though, most of your disks have some, although most have just a few, so I still doubt it's just a cable. I guess you don't have notifications enabled, or you'd have been bombarded with warnings.
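
For checking the CRC counts per disk, something like the following Python sketch could be used on saved `smartctl -A` output. It assumes a SATA disk that reports attribute 199 (UDMA_CRC_Error_Count) in the usual smartmontools attribute table, with the raw value in the tenth column. The function name `crc_error_count` is hypothetical, and the sample values below are made up for illustration, not taken from the diagnostics in this thread.

```python
def crc_error_count(smart_text):
    """Return the raw value of SMART attribute 199, or None if absent."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw_value.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


# Illustrative sample only; the numbers are invented.
sample = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   071   071   000    Old_age   Always       -       21345
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1532
"""

print(crc_error_count(sample))
```

The raw CRC count only ever goes up, so what matters is whether it keeps increasing after a cable or backplane change, not the absolute number.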

