badboyz Posted February 8, 2019

I have been trying to figure this out for the last two days and I'm just at a loss on what to do. A drive dropped out of the array, and since I'd had previous issues, I moved Unraid from running as a VM under ESXi to bare metal on a Dell R710. All the hardware is the same, other than no longer being passed through from ESXi. When I try to start the array, it hangs while mounting the drives. Diagnostics are attached. Please help.

nas-diagnostics-20190207-2048.zip
JorgeB Posted February 8, 2019

Strange issue: read errors on several disks. The only thing that jumps out is that the LSI HBA they are connected to is running old firmware. Update it to the latest, 20.00.07.00, like the other one, and see if there's any difference.
itimpi Posted February 8, 2019

4 hours ago, johnnie.black said: "Strange issue, read errors on several disks [...] update to latest 20.00.07.00 like the other one and see if there's any difference."

How do you get the LSI firmware version from the diagnostics? That would save me having to reboot my system (where I currently have some long-running jobs in progress) to check which version my LSI controller is using.
JorgeB Posted February 8, 2019

4 minutes ago, itimpi said: "How do you get the LSI firmware version from the diagnostics?"

From the syslog:

Feb 7 20:35:30 nas kernel: mpt2sas_cm0: LSISAS2008: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)
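For anyone searching later: the firmware line can be pulled out of the syslog with a quick grep/sed. This is a minimal sketch; the embedded log line is the one quoted above, and the syslog path in the comment is an assumption about where it sits inside a diagnostics zip.

```shell
# Sample kernel line as it appears in the diagnostics syslog (from the post above).
line='Feb  7 20:35:30 nas kernel: mpt2sas_cm0: LSISAS2008: FWVersion(17.00.01.00), ChipRevision(0x03), BiosVersion(07.33.00.00)'

# On a live system you could instead run something like (path is an assumption):
#   grep -i 'FWVersion' logs/syslog.txt
fw=$(printf '%s\n' "$line" | sed -n 's/.*FWVersion(\([0-9.]*\)).*/\1/p')
echo "$fw"    # prints 17.00.01.00
```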
itimpi Posted February 8, 2019

2 minutes ago, johnnie.black said: "From the syslog: [...]"

Thanks, that helped me find it. Not sure why I failed to spot it myself, but I did. My system reports:

Feb 7 10:19:34 DJW-UNRAID kernel: mpt2sas_cm0: LSISAS2116: FWVersion(19.00.00.00), ChipRevision(0x02), BiosVersion(07.37.00.00)

Does that mean I should do a firmware upgrade?
JorgeB Posted February 8, 2019

AFAIK there are no known issues with P19, but it won't hurt to update to the latest.
badboyz Posted February 8, 2019 (Author)

5 hours ago, johnnie.black said: "[...] update to latest 20.00.07.00 like the other one and see if there's any difference."

I had a new HBA with 20.00.07.00 that I swapped in, but I was still getting the same issue. Going crazy, I switched the server back to booting ESXi and running Unraid as a VM, and the array starts now. Any idea why it might work in ESXi with the controller passed through, but hang on the same physical server when I try running it bare metal? I'm happy it's working again for now, but I'd really like to move to running it directly on the server (Dell R710).
JorgeB Posted February 8, 2019

21 minutes ago, badboyz said: "Any idea why it might work in ESXi with the controller passed through, but hang on bare metal on the same physical server?"

Not really, very strange issue. Can you post the diags from the VM? I doubt it will help, but just in case.
badboyz Posted February 8, 2019 (Author)

Here are the diags from the VM.

nas-diagnostics-20190208-0855.zip
JorgeB Posted February 8, 2019

There are also read errors, this time on disk18 only. It has a very high number of CRC errors, so it's likely a bad cable, probably the same one that was causing the errors on bare metal. Since several other disks also show very high CRC error counts, it could be a mini-SAS cable, which can affect multiple disks at once, or a backplane/expander issue.
JorgeB Posted February 8, 2019

On bare metal there were errors on disks 9, 11, 17 and 19, all of them also with very high CRC error counts. Looking at the diags, most of your disks have some, though usually just a few, so I still doubt it's just a cable. I'm guessing you don't have notifications enabled, or you'd have been bombarded with warnings.
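A quick way to check the CRC counts JorgeB is referring to is SMART attribute 199 (UDMA_CRC_Error_Count). Below is a sketch that parses a sample `smartctl -A` line; the raw value 1472 is made up for the example, and `/dev/sdX` is a placeholder for a real device.

```shell
# Illustrative smartctl -A output line for SMART attribute 199 (UDMA CRC errors);
# the raw value 1472 here is invented for the example.
sample='199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       1472'

# On a live system: smartctl -A /dev/sdX   (device name is a placeholder)
crc=$(printf '%s\n' "$sample" | awk '$1 == 199 {print $NF}')
echo "$crc"    # prints 1472
```

Note that attribute 199 counts errors on the link between the drive and the controller, which is why a high value points at cables, backplanes, or expanders rather than the disk itself.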
badboyz Posted February 9, 2019 (Author)

Thanks for the assistance, Johnnie. I think it's the old NetApp 24-bay shelf causing the CRC errors. I'm planning to move back to my 12-bay Lenovo SA120.