dripsnek


  1. From the release notes and the kernel bug report it looks like a symptom of the aacraid driver regression. If you're on 6.12.5 or .6 I'd roll back to 6.12.4 as soon as you can. I'd also independently verify the integrity of any files written to disks on the controller whilst 6.12.5/.6 was running, just in case (a rough checksum sketch follows after these posts). I can't find any information on 32136 other than that it exists (see the above bug report), and it is not available on the Microsemi website. I'm also curious as to why this version is circulating and how it differs from 32118.
  2. Having been stung by this same issue recently, I've come to the conclusion that you should never "submit" any changes in the "Controller Configuration" menu with disks attached on these cards. In my case, with an ASR-71605E running firmware 1.0.100.32118 and configured through the maxView HII module in the UEFI, pressing "submit" with any change on that page causes the controller to immediately zero the first 64KiB of every disk attached to it, regardless of state. Subsequent testing on my card confirmed this behavior is repeatable. Disconnecting all disks before changing the configuration appears to be the only 'safe' option for my combination of card and motherboard.

     Fortunately I managed to recover from this. The BTRFS partitions were unaffected, as they start far enough into the disk that simply regenerating the partition table brought them back online (see the partition table sketch below). The XFS drives were more complicated, as metadata was lost within that first 64KiB; however, the AG0 root directory inode was intact, and based on the offsets of the other AG superblocks, found with hexdump and dd, I was confident this was repairable. I was a little spooked reading the xfs_repair man page, as the BUGS section states "The no‐modify mode (-n option) is not completely accurate.", so I avoided running xfs_repair until I knew for sure what was going to happen.

     I prototyped a replica scenario using a loopback image to see what would happen if an XFS partition had everything in AG0 up to the first inode wiped out (see the loopback sketch below). After breaking my image, my gut told me that xfs_repair was probably designed with disk corruption in mind rather than disk erasure, so I wrote a 'fake' AG0 superblock, based on one of the other AGs and carrying the correct root directory inode offset, into the correct location in the image. xfs_repair then brought the test filesystem back online: it correctly identified that AG0 was present but corrupted, and the information needed to find the other AGs and the root inode was enough to repair it. The same procedure carried through to my victim disk images.

     Ultimately only ~200 files ended up in lost+found; based on prior knowledge of the array and backups elsewhere, they were re-identified despite only being known by their inode numbers. I consider myself extremely lucky in this scenario. Even though the filesystems were repaired, I no longer consider them trustworthy, and the process of reformatting and repopulating the disks is ongoing. It goes without saying that the parity drive was useless in this exercise, both because of the multiple simultaneous drive failures and because it was subject to the same 64KiB erasure. Regardless of whether this is a fault or a feature, I've learned a lot from it and hope my experience helps other users of these cards.
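
For point 1, a minimal sketch of how files written whilst 6.12.5/.6 was running could be checked against an independent known-good copy. The share names below are placeholders; substitute whatever actually lives on the controller.

     # Checksum every file in the suspect tree (example path on the Adaptec-attached disks)
     ( cd /mnt/user/suspect_share && find . -type f -print0 | xargs -0 sha256sum ) > /tmp/suspect.sha256

     # Re-check the same relative paths against an independent copy that was never
     # written through the affected kernel; --quiet prints only mismatches and misses.
     ( cd /mnt/backup/suspect_share && sha256sum -c --quiet /tmp/suspect.sha256 )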
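
On the partition table point in item 2: a rough sketch of bringing back a single-partition layout after the first 64KiB has been zeroed. The GPT assumption, start sector (2048) and type code are illustrative only; confirm the real layout from another disk in the array or your own records before writing anything.

     # The backup GPT at the end of the disk may have survived the 64KiB wipe;
     # gdisk will offer to rebuild the main header from it, so check that first.
     gdisk /dev/sdX        # inspect what it detects, then 'w' to write if it looks right

     # Otherwise, recreate the table from known geometry.
     # ASSUMED layout: one data partition from sector 2048 to the end of the disk.
     sgdisk --new=1:2048:0 --typecode=1:8300 /dev/sdX

     partprobe /dev/sdX                 # re-read the partition table
     blkid /dev/sdX1                    # the btrfs signature should reappear here
     btrfs check --readonly /dev/sdX1   # read-only sanity check before mounting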
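
And a sketch along the lines of the loopback experiment in item 2: build a throwaway XFS image, reproduce the 64KiB erasure, then copy a backup superblock over AG0 and let xfs_repair rebuild the rest. The image size, mount point and the assumption that AG1 starts at 512 MiB are specific to this example; on a real disk, read the AG geometry from a surviving superblock first (xfs_db -c 'sb 1' -c 'p' against the device, or hexdump for the "XFSB" magic as below).

     # Build and populate a throwaway XFS image
     truncate -s 2G /tmp/xfstest.img
     mkfs.xfs -f /tmp/xfstest.img            # note the agcount/agsize it reports
     mkdir -p /mnt/xfstest
     mount -o loop /tmp/xfstest.img /mnt/xfstest
     cp -a /etc /mnt/xfstest/                # some sample data to recover
     umount /mnt/xfstest

     # Reproduce the damage: zero the first 64KiB (primary superblock plus the AGF/AGI/AGFL of AG0)
     dd if=/dev/zero of=/tmp/xfstest.img bs=4096 count=16 conv=notrunc

     # Find the backup superblocks by their "XFSB" magic (slow but fine on a small image)
     hexdump -C /tmp/xfstest.img | grep XFSB

     # Copy the AG1 superblock (here assumed to sit at 512 MiB, i.e. 512-byte sector 1048576)
     # over the zeroed primary, then let xfs_repair rebuild the rest of AG0 from it
     dd if=/tmp/xfstest.img of=/tmp/xfstest.img bs=512 count=1 skip=1048576 conv=notrunc
     xfs_repair /tmp/xfstest.img
     mount -o loop /tmp/xfstest.img /mnt/xfstest   # confirm it mounts and the sample data is back

xfs_repair can also search for secondary superblocks on its own, but seeding AG0 with a known-good copy is the approach that worked here.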