Drive Identification "Wrong" on some disks after 6.34 to 6.4 upgrade


Help!

 

I was previously running v6.34 with 2 Parity Disks (8TB), 2 Cache Disks (256GB SSD), and 13 Data disks (a mix of 3 & 4 TB SAS). Months back, I added 4 new disks to expand the array. After that, whenever I rebooted, those disks would show as "Missing" / "No Device". I would then select the appropriate disk for each device and start the array. PITA, but I hadn't been able to spend the time to troubleshoot. This morning I verified my nightly CA_Backup and upgraded to 6.4 through the Plugin. The upgrade completed successfully, and I went into Settings\Disk Settings and disabled Auto Start prior to rebooting the array (since I assumed I would need to re-assign the "Problem" devices).

 

Sure enough, after the reboot the same 4 disks were marked as "Missing" / "No Device". I selected the appropriate drives, at which point they changed from "Missing" to "Wrong". This is unlike 6.34, which recognized the identification and allowed me to start the array.

 

The names do appear different in Identification. For example:

H7240AS60SUN4.0T_001402E60HRX_PBH60HRX - 4 TB 

when it's looking for: 

H7240AS60SUN4.0T_001402E60HRX - 4 TB

 

I believe the LSI controller is contributing to the name difference; however, in 6.34 I was able to manually assign the disks (through the GUI) and start the array.
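To make the mismatch concrete, here is a minimal Python sketch that compares the two identification strings quoted above. It is only an illustration: `classify_id` is a hypothetical helper, and nothing here is meant to show how Unraid itself compares identifications.

```python
def classify_id(expected: str, reported: str) -> str:
    """Describe how a reported identification differs from the stored one.

    Hypothetical helper for illustration only; not Unraid's own logic.
    """
    if reported == expected:
        return "match"
    # The reported name is the stored name plus an extra "_"-separated field.
    if reported.startswith(expected + "_"):
        return "extra suffix: " + reported[len(expected) + 1:]
    return "different device"


# Identification strings copied from the post above.
expected = "H7240AS60SUN4.0T_001402E60HRX"
reported = "H7240AS60SUN4.0T_001402E60HRX_PBH60HRX"

print(classify_id(expected, reported))  # extra suffix: PBH60HRX
```

In this example the reported name is the stored name with an extra trailing field, which fits the idea that the same physical disk is simply being reported under a longer identification.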

 

So my big question is: should I use Tools\New Config to reassign the disk-to-device mappings (preserving the Parity and Cache slots), or is there a better way to resolve this (I can always revert to 6.34)?

 

I've included my last syslog and some images of the Array Devices screen from 6.34 and now 6.4.

 

Thanks all, any help would be greatly appreciated!

hatchnas-syslog-20180117-1313.zip

UNRAID_v6_4-DriveAssignments.PNG

UNRAID_v6_34-Missing disk.PNG


I'm having a strange problem that sounds like it MIGHT be similar. Unfortunately, I made the bad decision to make 3 different changes at almost the same time, which makes troubleshooting more "exciting".

 

So, in addition to upgrading to the latest version, in the last 2 days I have also changed out my controller card. I was using an onboard controller and switched to an AOC-SASLP-MV8 controller. I also added a cache drive.


That being said, the system came up just fine, and I was using it for a day or so. This morning I got up to find that the server was inaccessible. I went to the console and found continuously scrolling write errors for drive 1. All I could do was cold boot.

 

Upon cold boot, it came up showing drive 1 was bad, but it let me mount the raid, showing drive 1 was being simulated (or whatever the terminology was on the screen, sorry).

 

Since the raid was up, I copied a few important files off, just in case, and then ran a quick SMART test on the "bad" disk, and everything looked fine. I assume this failure was not real, though I am fine with rebuilding if necessary. Before I could even think about the next step, however, it came up and said I had 3 bad disks in the system. At that moment, it seemed like the raid was still up, but just to be safe, I shut the machine down and stopped going further.


At this point, I haven't turned it back on. My plan was to investigate whether software problems have been detected in the latest version, and/or back out to the old version, plus remove the cache drive, since nothing has been written to it and it just confuses the situation for now. Also, I started looking for firmware updates for the SAS card, which I have found, but I'm confused because the docs say my machine should have been prompting me to hit CTRL-M to go into a setup mode, and I do not get that prompt.

 

Short of other guidance / recommendations, I am going to try booting a DOS image to see if I can run any diags/firmware for the HBA, and/or switch back to the motherboard SATA connectors and see what happens. I'd be curious if anyone has any recommendations.

 

Thanks!

Steve


I've pulled the Diagnostics to submit with a defect report. 

 

Any recommendations around my "New Config" question?  

 

What are the ramifications of running "New Config" while preserving the Parity and Cache slot assignments, then reassigning the disks to the same slots they were in previously (with WWDN set to automatic)?

 

Thoughts?


So I'm a little confused. 

 

If the identification on the Data Slots is where I'm having an issue, why should I choose "Retain all"?

 

Won't that just keep the same Device/Identification naming problem that I'm currently seeing on those 4 data slots? I'm definitely not a guru on what "New Config" actually rewrites (or frankly on anything); however, reading the Utility's notes seems to imply that I should retain those slot groups that are "Known Good" (thus the Parity and Cache).

 

My apologies for questioning your advice; I just want to make sure I understand the process/utility.

11 minutes ago, Nhatch411 said:

So I'm a little confused. 

 

If the identification on the Data Slots is where I'm having an issue, why should I choose "Retain all"?

 

Won't that just keep the same Device/Identification naming problem that I'm currently seeing on those 4 data slots? I'm definitely not a guru on what "New Config" actually rewrites (or frankly on anything); however, reading the Utility's notes seems to imply that I should retain those slot groups that are "Known Good" (thus the Parity and Cache).

 

My apologies for questioning your advice; I just want to make sure I understand the process/utility.

You can start with the 'Retain all' option and then change only the slots that need correcting. That is much easier (and less error prone) than trying to enter all slots from scratch.


Running the New Config Utility (with Retain All) corrected the naming issue and all drives are now online with a running array! 

 

Once back in the Main view, I selected Parity is Valid and started the Array; everything seems to be back to normal.

 

Later tonight I will reboot the array to see if the original issue is resolved. I'll update this thread after verification.

 

Thanks all (especially bonienl)!


Quick and final update. 

 

I rebooted the array, all drives were recognized and the array auto started without issue! 

 

The original problem with the 4 newest HDDs listing as "No Device" in their disk slots after reboot has been resolved, as well as the "Wrong" device error that appeared after the upgrade.

 

Thanks all!

  • 4 months later...

I am having a similar issue where I installed an additional raid card in my server and moved one of the disks from the old card to the new one.

Unfortunately, I'm getting the "wrong disk" error.
Is there anything else I could do, or is the New Config Utility the way to go?

If so, what are the steps? Start the array first? Mount the disk? Don't mount the disk?

 

Thanks.

3 hours ago, RevelRob said:

I am having a similar issue where I installed an additional raid card in my server and moved one of the disks from the old card to the new one.

Unfortunately, I'm getting the "wrong disk" error.

Either the original or the new card is incorrectly identifying the disks.

 

Best to start your own topic and attach the diagnostics zip file to your post with a full description of your hardware and the issue.
