Nhatch411 Posted January 17, 2018
Help! I was previously running v6.34, with 2 Parity Disks (8TB), 2 Cache Disks (256GB SSD), and 13 Data disks (Mix of 3 & 4 TB SAS). Months back, I added 4 new disks to expand the array. After that, whenever I rebooted, those disks would show as "Missing" / "No Device". I would then select the appropriate disk for each device and start the array. PITA, but I hadn't been able to spend the time to troubleshoot.
Jump to this morning: I verified my nightly CA_Backup and upgraded to 6.4 through the Plugin. The upgrade completed successfully. I went into Settings\Disk Settings and disabled Auto Start prior to rebooting the array (since I assumed I would need to re-assign the "Problem" devices). Sure enough, after reboot the same 4 disks are marked as "Missing" / "No Device". I selected the appropriate drives, at which point they changed from "Missing" to "Wrong". This is unlike 6.34, which recognized the identification and allowed me to start the array.
The names do appear different in Identification. For example, it shows:
H7240AS60SUN4.0T_001402E60HRX_PBH60HRX - 4 TB
when it's looking for:
H7240AS60SUN4.0T_001402E60HRX - 4 TB
I believe the LSI controller is contributing to the name difference; however, in 6.34 I was able to manually assign (through the GUI) and start the array.
So my big question is: should I use Tools\New Config to reassign the disk-to-device mappings (preserving the Parity and Cache slots), or is there a better way to resolve this (I can always revert back to 6.34)? I've included my last syslog, and some images of the Array Devices screen from 6.34 and now 6.4. Thanks all, any help would be greatly appreciated!
Attachment: hatchnas-syslog-20180117-1313.zip
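Editor's sketch: the two identification strings quoted above differ only by one extra trailing field. The short Python snippet below makes that difference explicit. It is purely illustrative: the function name `extra_suffix` is made up for this example, and this is not how Unraid itself matches disks to slots. What the extra field represents is not stated in the thread, so the code makes no assumption about it.

```python
from typing import Optional


def extra_suffix(expected: str, reported: str) -> Optional[str]:
    """Return the extra trailing field if `reported` is `expected` plus "_<field>".

    Returns None when the two identifiers are equal or unrelated.
    (Hypothetical helper for illustration only.)
    """
    prefix = expected + "_"
    if reported.startswith(prefix):
        return reported[len(prefix):]
    return None


# Identifiers copied from the post above.
expected = "H7240AS60SUN4.0T_001402E60HRX"            # what the slot is looking for
reported = "H7240AS60SUN4.0T_001402E60HRX_PBH60HRX"   # what the disk now reports

print(extra_suffix(expected, reported))
```

Because the reported name is the expected name plus one additional field, a strict string comparison treats it as a different device, which is consistent with the "Wrong" status described in the post.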
bonienl Posted January 17, 2018
Change the setting "Display world-wide-name in device ID" to Automatic. See Settings -> Display Settings.
Nhatch411 Posted January 17, 2018
Doing it now...
Nhatch411 Posted January 17, 2018
Still lists as "Wrong".
172pilot Posted January 17, 2018
I'm having a strange problem that sounds like it MIGHT be similar. Unfortunately, I made the bad decision to make 3 different changes almost at the same time, which makes troubleshooting more "exciting". So, in addition to upgrading to the latest version, in the last 2 days I have also changed out my controller card. I was using an onboard controller, and switched to an AOC-SASLP-MV8 controller. I also added a cache drive.
That being said, the system came up just fine, and I was using it for a day or so. This morning I got up to see that the server was inaccessible. I went to the console to find continuous scrolling of write errors to drive 1. All I could do was cold boot. Upon cold boot, it came up showing drive 1 was bad, but it let me mount the raid, showing drive 1 was being simulated (or whatever the terminology was on the screen, sorry). Since the raid was up, I copied a few important files off, just in case, and then ran a quick SMART test on the "bad" disk, and everything looked fine. I assume this failure was not real, and I am fine with rebuilding if necessary, but before I could even think about the next step, it came up and said I had 3 bad disks in the system. At that moment, it seemed like the raid was still up, but just to be safe, I shut the machine down and stopped going further.
At this point, I haven't turned it back on. My plan was to investigate whether software problems have been detected in the latest version, and/or back out to the old version, plus remove the cache drive, since nothing has been written to it and it just confuses the situation for now. Also, I started looking for firmware updates for the SAS card, which I have found, but I'm confused because the docs say my machine should have been prompting me to hit CTRL-M to go into a setup mode, and I do not get that prompt.
Short of other guidance / recommendations, I am going to try to boot into a DOS image and see if I can run any diags / firmware for the HBA, and/or switch back to the motherboard SATA connectors and see what happens, but I'd be curious if anyone has any recommendations? Thanks! Steve
bonienl Posted January 17, 2018
9 minutes ago, Nhatch411 said: Still lists as "Wrong".
Are these disks connected to a RAID/SCSI controller? If yes, you may try the Dynamix SCSI Devices plugin.
Nhatch411 Posted January 17, 2018
Here is the controller:
Serial Attached SCSI controller: LSI Logic / Symbios Logic SAS2008 PCI-Express Fusion-MPT SAS-2 [Falcon] (rev 03)
I'm hesitant to add the plugin...
bonienl Posted January 17, 2018
Setting WWN to automatic should have worked with this controller. You may want to file a defect report and refer to this thread (6.4 has some changes with regard to SAS support).
Nhatch411 Posted January 17, 2018
I've pulled the Diagnostics to submit with a defect report. Any recommendations around my "New Config" question? What are the ramifications of running "New Config" while preserving the Parity and Cache slot assignments, then reassigning the disks to the same slots they were in previously (with WWN set to automatic)? Thoughts?
bonienl Posted January 17, 2018
Yes, you can do a New Config with retain ALL. Then select "parity is already valid" when starting the array. No ill effects.
Nhatch411 Posted January 17, 2018
So I'm a little confused. If the identification on the Data Slots is where I'm having an issue, why should I choose "Retain all"? Won't that just keep the same Device/Identification naming problem that I'm currently seeing on those 4 data slots? I'm definitely not a guru on what "New Config" actually re-writes (frankly, on anything); however, reading the Utility's notes seems to imply that I should retain those slot groups that are "Known Good" (thus the Parity and Cache). My apologies for questioning your advice, I just want to make sure I understand the process/utility.
bonienl Posted January 17, 2018
With "retain all" the list of devices is simply prepopulated for you. You can make changes afterwards, but most assignments will already be correct, saving you work.
itimpi Posted January 17, 2018
11 minutes ago, Nhatch411 said: So I'm a little confused. If the identification on the Data Slots is where I'm having an issue, why should I choose "Retain all"? Won't that just keep the same Device/Identification naming problem that I'm currently seeing on those 4 data slots? I'm definitely not a guru on what "New Config" actually re-writes (frankly, on anything); however, reading the Utility's notes seems to imply that I should retain those slot groups that are "Known Good" (thus the Parity and Cache). My apologies for questioning your advice, I just want to make sure I understand the process/utility.
You can start with the "retain all" option, and then change only the slots that need correcting. Much easier (and less error prone) than trying to enter all slots from scratch.
Nhatch411 Posted January 17, 2018
Will do. Thanks for the clarification!
Nhatch411 Posted January 17, 2018
Running the New Config Utility (with Retain All) corrected the naming issue, and all drives are now online with a running array! Once back in the Main view, I selected "Parity is Valid" and started the array; everything seems to be back to normal. Later tonight I will reboot the array to see if the original issue is resolved. I'll update this thread after verification. Thanks all (especially bonienl)!
Nhatch411 Posted January 22, 2018
Quick and final update. I rebooted the array, all drives were recognized, and the array auto-started without issue! The original problem with the 4 newest HDDs listing as "No Device" in their disk slots after reboot has been resolved, as well as the "Wrong" device error that appeared after the upgrade. Thanks all!
RevelRob Posted May 24, 2018
I am having a similar issue: I installed an additional RAID card in my server and moved one of the disks from the old card to the new one. Unfortunately, I'm getting the "wrong disk" error. Is there anything else I could do, or is the New Config Utility the way? If so, what are the steps? Start the array first? Mount the disk? Don't mount the disk? Thanks.
JonathanM Posted May 25, 2018
3 hours ago, RevelRob said: I am having a similar issue where I installed an additional raid card in my server and moved one of the disks from the old card to the new one. Unfortunately, I'm getting the "wrong disk" error.
Either the original or the new card is incorrectly identifying the disks. Best to start your own topic and attach the diagnostics zip file to your post with a full description of your hardware and the issue.