Deler7 Posted March 15

Seems I'm having a bad weekend. I decided to overhaul my tower build server into a rackmount Supermicro 36-slot SAS chassis so I can add more disks. On the first startup, all my array drives got the status "Unmountable: unsupported partition layout". All drives were XFS before this happened. I have no clue why: I didn't change the mainboard or controller, it was a 1-to-1 transfer of all hardware into the new case, plus some added disks (not formatted yet). The only difference is that before I used three mini-SAS to 4x SATA cables to connect my hard drives, and now I'm using a single SAS cable to the chassis expander. That shouldn't make much of a difference? The disks have no SMART errors and are recognised by Unraid.

In maintenance mode, I clicked a random disk to perform an XFS repair with the default -n option. This is the output:

Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!
attempting to find secondary superblock...
.found candidate secondary superblock...
verified secondary superblock...
would write modified primary superblock
Primary superblock would have been modified.
Cannot proceed further in no_modify mode.
Exiting now.

Any clues what happened, and more importantly, can this be fixed without losing the data? I've read something about running xfs_repair -V on each disk, but before doing that I wanted to consult this forum first.

In my diag file you will also see 12 Seagate disks; those are new and unrelated to the array. I already tried booting without those new disks, it made no difference.

parodius-diagnostics-20240315-2035.zip
JonathanM Posted March 15

Maybe the expander remapped the partitions? @JorgeB may have a better option, but I seem to remember the solution was to unassign one drive at a time: start the array, stop the array, reassign the drive, and let it rebuild. Several hours per drive. Whatever you do, don't remove more than one drive at a time. To test my solution, unassign one drive, start the array, and see if the emulated drive mounts properly.
Deler7 Posted March 15

3 hours ago, JonathanM said: Maybe the expander remapped the partitions? ...

Yes, I also think it has something to do with the expander, because that's the only component that really changed. I have swapped disks between direct-to-controller and expanders in the past and never had such an issue, although that was on HW RAID. To my knowledge, expanders are just SAS switches; they don't do anything with the data on the drives. Unless this was a problem that started before the hardware changes and only came up on the next boot, coincidentally after I moved to the new chassis.

I tried to unassign one disk, but it won't allow me to start the array due to the missing disk. Or do I need to remove the drive physically?
JonathanM Posted March 16

Is there not a checkbox to allow you to start? Post a full screenshot of the main page.
Deler7 Posted March 16

OK. When the array is stopped, I mark one disk as "no device". This way I can't start the array, as it's missing a disk.

Note: disk7 is unassigned, but that was to replace a defective drive a few weeks (and reboots) ago.
Note 2: the Seagate 4TB drives are the new, unrelated disks.

When I start the array with all the correct disks assigned, it wants me to format the WDC drives. Of course I'm not doing that, as they contain data I would love to get back.
JonathanM Posted March 16

3 minutes ago, Deler7 said: This way I can't start the array, as it's missing a disk.

56 minutes ago, JonathanM said: Is there not a checkbox to allow you to start?
Deler7 Posted March 16

Ah, when I checked the box, START remained grey. I closed all browser tabs, cleared the cache, and logged back into the server; now it turns orange, so I can proceed. I removed the disk, checked the box, and started the array. Then I stopped the array, reattached the disk I had just unassigned, and started the array again. I assume the waiting game starts now?
JonathanM Posted March 16

29 minutes ago, Deler7 said: I assume the waiting game starts now?

Yes and no. My theory didn't work; if it had, the disk slot being rebuilt would already be mounted. Let the rebuild complete, and wait for @JorgeB.
JorgeB Posted March 16

This has happened before with those Adaptec RAID controllers; they can apparently sometimes overwrite the MBR of the disks. But it's not good news that the emulated disk didn't mount. If you haven't rebooted since doing that, post the diagnostics.
Deler7 Posted March 16

After a night of spinning, the rebuild is complete.

13 minutes ago, JorgeB said: ... if you didn't reboot since doing that, post the diagnostics.

I haven't rebooted since my previous steps. My new diagnostics below 👍

parodius-diagnostics-20240316-1120.zip
JorgeB Posted March 16

No valid filesystem is being detected on the rebuilt disk1. Post the output of:

blkid

and also:

gdisk /dev/sdi

The latter is for disk2, to check the current partition layout.
Deler7 Posted March 16

Alright.

root@PARODIUS:~# blkid
/dev/sda1: LABEL_FATBOOT="UNRAID" LABEL="UNRAID" UUID="2736-60C3" BLOCK_SIZE="512" TYPE="vfat"
/dev/loop1: TYPE="squashfs"
/dev/mapper/nvme3n1p1: LABEL="nvme-two" UUID="4128737249177850753" UUID_SUB="5585851513646370683" BLOCK_SIZE="4096" TYPE="zfs_member"
/dev/nvme0n1p1: LABEL="cache" UUID="17440156396158875726" UUID_SUB="14585744415475683753" BLOCK_SIZE="4096" TYPE="zfs_member"
/dev/nvme3n1p1: UUID="ad1f4b74-88bc-409b-8586-e81baf646027" TYPE="crypto_LUKS"
/dev/nvme2n1p1: UUID="b8d4d0c9-ec99-4505-8cce-c9a19a817ba1" TYPE="crypto_LUKS"
/dev/loop2: UUID="139601bd-7ef3-471e-9dc5-5e5e4f78d045" BLOCK_SIZE="512" TYPE="xfs"
/dev/loop0: TYPE="squashfs"
/dev/mapper/nvme2n1p1: LABEL="nvme-one" UUID="18166102060409839496" UUID_SUB="9952805448310778193" BLOCK_SIZE="4096" TYPE="zfs_member"
/dev/nvme1n1p1: LABEL="cache" UUID="17440156396158875726" UUID_SUB="926196455089919428" BLOCK_SIZE="4096" TYPE="zfs_member"
/dev/sdt1: PARTUUID="30cecb74-aef3-47de-99a5-1b6d6033178a"

And from DISK2:

root@PARODIUS:~# gdisk /dev/sdy
GPT fdisk (gdisk) version 1.0.9.1

Caution: invalid main GPT header, but valid backup; regenerating main header from backup!

Warning: Invalid CRC on main header data; loaded backup partition table.
Warning! Main and backup partition tables differ! Use the 'c' and 'e' options
on the recovery & transformation menu to examine the two tables.

Warning! Main partition table CRC mismatch! Loaded backup partition table
instead of main partition table!

Warning! One or more CRCs don't match. You should repair the disk!
Main header: ERROR
Backup header: OK
Main partition table: ERROR
Backup partition table: OK

Partition table scan:
  MBR: not present
  BSD: not present
  APM: not present
  GPT: damaged

Found invalid MBR and corrupt GPT. What do you want to do? (Using the
GPT MAY permit recovery of GPT data.)
 1 - Use current GPT
 2 - Create blank GPT

Your answer: ^C

I did the same for DISK3 through DISK12; all have identical results to the above. In case it's of any help, this is the output for the rebuilt DISK1:

root@PARODIUS:~# gdisk /dev/sdt
GPT fdisk (gdisk) version 1.0.9.1

Partition table scan:
  MBR: protective
  BSD: not present
  APM: not present
  GPT: present

Found valid GPT with protective MBR; using GPT.

Command (? for help): ^C
JorgeB Posted March 17 (Solution)

The gdisk output confirms the partitions got clobbered, likely by the RAID controller. You can try to fix one with gdisk to see if it works afterwards; to play it safer, I would recommend cloning the disk first with dd and doing the repair on the clone.
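As a rough sketch of the "clone before repair" advice above: on real hardware this is a single dd from the damaged disk to a spare of at least the same size. The device names /dev/sdX and /dev/sdY below are placeholders, not taken from this thread; verify both with lsblk before running anything, because swapping source and target destroys the original. The runnable demo uses image files so the commands can be tried without touching any disk.

```shell
#!/bin/sh
# On real hardware the clone step would be (placeholders, verify with lsblk!):
#   dd if=/dev/sdX of=/dev/sdY bs=1M conv=sync,noerror status=progress
# where /dev/sdX is the damaged disk and /dev/sdY the spare target.
# Demonstrated here on image files so it can be run safely.

SRC=src.img    # stands in for /dev/sdX
DST=clone.img  # stands in for /dev/sdY

# fabricate a small "disk" to clone
dd if=/dev/urandom of="$SRC" bs=1M count=4 2>/dev/null

# the clone itself: conv=noerror keeps going past bad sectors,
# conv=sync pads short reads so offsets on the copy stay aligned
dd if="$SRC" of="$DST" bs=1M conv=sync,noerror 2>/dev/null

# verify the copy is byte-identical (only expected when there were no read errors)
cmp "$SRC" "$DST" && echo "clone verified"
```

Note that conv=sync pads partial reads up to the block size, so on a disk with bad sectors the clone will not be byte-identical to the source; the cmp check is only meaningful for an error-free copy.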
Deler7 Posted March 18

A small update from my side. I made a dd copy of one disk onto one of my newly installed HDDs before potentially screwing things up (more). Not sure how to use the gdisk tool, so I followed a guide and ran gdisk on the drive that had just been backed up. I went for the "r" option (recovery and transformation options), then chose "b" (use backup GPT header, rebuilding main), and when finished, write and exit. Then I rebooted the server, but still no luck: same error message.

Then, in maintenance mode in the Unraid UI, I ran xfs_repair on the same disk without any arguments. I rebooted the server again, did a regular array start, and there it was: this particular disk with all its files accessible again! There were a few (really only a few) files in the lost+found folder, but that's OK for me.

So I guess I'm on the right track? At the moment I'm dd'ing every disk to my newly installed hard drives to at least have a backup. When that is finished, should I follow the same procedure on each disk?
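For future readers, the sequence described above boils down to the following sketch. The device names are placeholders (not taken from the diagnostics in this thread), and it should only be attempted with a dd clone in hand:

```shell
# Hedged sketch of the repair sequence; /dev/sdX is a placeholder for one
# damaged array disk. Make (or keep) a dd clone before touching the original.

# 1) rebuild the main GPT header from the intact backup copy:
gdisk /dev/sdX
#   r  -> recovery and transformation options
#   b  -> use backup GPT header (rebuilding main)
#   w  -> write table to disk and exit, confirm with y

# 2) after a reboot, repair the filesystem. In Unraid this is done from
#    maintenance mode against the md device for that slot, e.g. for disk2:
xfs_repair /dev/md2p1
# (on older Unraid releases the device is /dev/md2, without the p1 suffix)
```

Repairing via the md device rather than the raw sdX device is what keeps parity in sync while the filesystem is being fixed.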
JorgeB Posted March 19

9 hours ago, Deler7 said: When that is finished, should I follow the same procedure on each disk?

If it worked for one, it should work for the other ones, except maybe disk1, since that one was rebuilt.
Deler7 Posted March 19

1 hour ago, JorgeB said: If it worked for one, it should work for the other ones, except maybe disk1, since that one was rebuilt.

It worked, even for DISK1 👍 Thank you all very much for the support!
JorgeB Posted March 19

If you can, I would recommend replacing that controller, since this is a common issue with them.
Deler7 Posted March 20

20 hours ago, JorgeB said: If you can, I would recommend replacing that controller, since this is a common issue with them.

Oh yes, I instantly ordered a cheap LSI 9211-8i controller and installed it in the server yesterday. As I expected complications, I kept my backups, but the change from Adaptec to LSI went flawlessly. It was basically plug & play: all drives were recognised and the array started without errors. To make sure everything is OK, I started a parity check. Perfect! All my issues are resolved now.

However, here is perhaps useful information for others reading this thread in the future with the same issue: I have found the source of my original problem, and I can now reproduce it. It was not the swap from SAS-to-SATA cables to the SAS expander that caused this; it was a change of controller mode. The Adaptec 7 series has four different controller modes:

- Auto
- RAID, hide RAW: acts as a pure HW RAID controller; unassigned disks are not exposed to the OS.
- RAID, expose RAW (default): same as above, but unassigned disks not part of a HW RAID volume are exposed to the OS. This is the factory-default setting on new cards.
- HBA: no RAID volumes; all disks are exposed RAW to the OS, comparable to IT mode. This is the mode you should run with Unraid.

Here is the thing: although you might expect "RAID, expose RAW" to behave the same as "HBA" when there are no RAID volumes, IT DOESN'T. Switching between those two modes somehow messes up your partition tables. I did some tests on a discardable array: create the array while the controller is in HBA mode, then switch the controller to "RAID, expose RAW", and you get exactly the issue from my first post (drives unmountable). Even when switching back to HBA mode, the damage is already done and the disk partitions need to be fixed.
Well, how could this happen to me, since I wasn't touching the controller settings when migrating to the new enclosure? My mainboard's UEFI BIOS. 🤬

As some are aware, when installing 7-series (and perhaps newer) Adaptec cards on mainboards with UEFI, you no longer get the legacy CTRL+A menu option at boot; instead, you enter the controller BIOS from within your mainboard's UEFI setup, under the add-on devices sub-menu. For a reason unknown to me, the mainboard also remembers which slot the controller is plugged into and keeps its settings per slot. Meaning: when you change the PCIe slot of the controller, my Gigabyte Z590 mainboard resets the controller to its factory default, "RAID, expose RAW". And indeed, while migrating to the enclosure, I did a one-time boot with the controller in a different PCIe slot, because I wanted to test a GPU in the main slot. I didn't check the array at that time; I did a shutdown and moved the controller back to its original slot, but the damage was already done. Hence my issue above.

This seems to happen only on my UEFI-based Gigabyte Z590 mainboard; my older dual-Xeon legacy-BIOS mainboard does not behave like this. On the legacy board I can move the controller to any PCIe slot I like and it does not change its mode. But when I install the controller in my Gigabyte Z590, it stores the settings per PCIe slot.

My lesson learned: do NOT change the PCIe slot of Adaptec cards on UEFI-based mainboards without verifying the settings before first boot. Or buy an LSI HBA in IT mode and never look back.
JorgeB Posted March 20

Thanks for posting the above; it may help other users in the future. I'll keep a link to this thread.