I broke my array, and now rethinking Adaptec 71605



I am in the process of assembling an Unraid server to move over from (and expand beyond) a Synology 12-bay NAS.

 

I initially connected 5x 8TB drives (2x parity, 3x data) as the main array and 2x 1TB SSDs as cache, all on my motherboard SATA ports. I have been running this happily for a few weeks.

 

Today I finally got an Adaptec ASR-71605 to connect the remaining 16 SATA bays. I was told the card was in HBA mode (and I know it's mode-switchable), but I was negligent enough not to check.

 

I made the mistake of rearranging the drives to move the data drives to the HBA; I should have validated the card first. The card was in Simple Volume mode, and it seems it corrupted my drives on first boot. When I didn't see any drives at boot, I realized what had happened and switched the card to HBA mode in its BIOS.

 

The drives were detected after that, but they are no longer mountable. I tried running xfs_repair in safe mode; it replaced the primary superblock with the secondary one and did a few other things, and it doesn't complain now, but the drives are still unmountable: "Unsupported partition layout".
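For anyone reading along, the command-line equivalent of what I tried is roughly this (the disk number is just an example and the device names should be double checked on your own system; running against the md device is what keeps parity in sync):

# Dry run first: report what xfs_repair would do without writing anything
xfs_repair -n /dev/md1

# Actual repair, with the array started in maintenance mode
xfs_repair -v /dev/md1

# Only if it complains about a dirty log: -L zeroes the log (can lose recent metadata)
xfs_repair -vL /dev/md1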

 

Diagnostics attached. I am not sure if I can recover from this, but I still want to try. At this point I could rebuild everything from scratch, but I see a risk of the Adaptec card resetting itself out of HBA mode some day, and I would not want to deal with this again. Has anyone seen these cards spontaneously revert to another mode, or are they usually stable once set?
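If there is a reliable way to verify the mode from inside the OS I would happily script a check for it. I believe Adaptec's arcconf utility can report it, something along these lines (untested sketch, and controller number 1 is an assumption):

# List adapter info with arcconf and look for the reported controller mode
arcconf GETCONFIG 1 AD | grep -i mode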

 

 

godaam-diagnostics-20221201-1949.zip

Link to comment

IIRC there have been similar issues with this controller before, for example after a firmware update. If parity is valid you might be able to rebuild one disk at a time, since Unraid will recreate the partition, but if there's more damage than just the partition it might not work. You can test by unassigning one of the data disks and starting the array, then seeing if the emulated disk mounts.

Link to comment
37 minutes ago, JorgeB said:

IIRC there have been similar issues with this controller before, for example after a firmware update,

Good point about the firmware update; I can see that as an easy way to fall into a similar trap if settings get reset on update.

 

37 minutes ago, JorgeB said:

if parity is valid you might be able to rebuild one disk at a time since Unraid will recreate the partition, but if there's more damage than just the partition it might not work, you can test by unassigning one of the data disks and starting the array, then see if the emulated disk mounts.

I am not sure I am following what to do here, and I don't want to go in the wrong direction. I am convinced my parity is valid: those drives haven't been moved off the motherboard ports and haven't seen any writes except for whatever my xfs_repair attempts did. I don't understand how Unraid can repair partitions from parity information alone when all my data drives are affected at once. I was expecting to lose at most 2 drives for that to work, but it seems I've lost 3. Any pointers?

Link to comment

Thanks for the pointers. I did not see anything like an emulated disk, but I think I might be doing what you suggested 

 

I unassigned one of the 3 data drives, and it started showing up under unassigned drives. I started the array. The drive in the unassigned section was not mountable and only had a format button; I didn't touch that.

 

I then stopped the array, put the drive back into its array slot and started it again. So a rebuild is now happening, with the other 4 drives being read and the one I moved around being written. I think I understand your point after reading the other thread you linked: the partition table isn't protected by parity, so on rebuild Unraid creates the partition fresh and then reconstructs the contents from parity. So if the data itself was OK, this might work.
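One way to sanity-check that part, I suppose, is to compare the partition table of the rebuilt disk against one of the disks that never left the motherboard (sdb and sdc below are placeholders for whatever the Main page shows):

# Partition layout of a known-good array disk vs. the rebuilt one (placeholder device names)
fdisk -l /dev/sdb
fdisk -l /dev/sdc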

 

So how do I know that it worked after the first disk? Will it no longer say unmountable, and will I be able to see partial data in shares from whatever was on it? If it shows up empty or still unmountable, I guess it didn't work.
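I suppose once the rebuild finishes I can also just check from the console whether the disk mounted and how much ended up on it, something like the following (the disk number is a placeholder):

# If the rebuild worked, the disk should be mounted under /mnt/diskN with its old data visible
df -h /mnt/disk3
ls /mnt/disk3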

 

Curiously, while the rebuild is happening, the drive being rebuilt has gone from "Unmountable: unsupported partition layout" to "Unmountable: wrong or no filesystem". This is expected?

 

If it does work, since I have 2 parity drives, would I be able to rebuild the other 2 data drives in a single rebuild rather than one by one? One rebuild is about 12 hours for me.

Link to comment
1 hour ago, Ashish Pandey said:

Curiously, while the rebuild is happening, the drive being rebuilt has gone from "Unmountable: unsupported partition layout" to "Unmountable: wrong or no filesystem". This is expected?

Depends on the damage, post new diags to see if a valid filesystem is being detected.

Link to comment

Some success, but pointless 😂

 

I ran the repair; it did a lot of things and then failed with

fatal error -- File system metadata writeout failed, err=117.  Re-run xfs_repair.

I re-ran it and it seemed to repair successfully on the 2nd go.

 

Then I restarted the array, and the disk was fine and mounted

 

BUT it only came back with about 60GB of data. On the healthy array each of the disks was about 60% full (about 4.8TB each). Most of what remained was in lost+found, and it doesn't account for any reasonable chunk of what was there.
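For anyone curious, sizing that up only takes something like the following (the disk number is a placeholder):

# How much the repair salvaged overall, and how much of it is just orphaned files
du -sh /mnt/disk1
du -sh /mnt/disk1/lost+found
find /mnt/disk1/lost+found -type f | wc -l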

 

At this point, it might be easier for me to rebuild the array from scratch and copy the data over again than to keep fighting this battle. An LSI replacement card is on the way. Lesson learned.

Link to comment
