Disks no longer mountable after stopping array

January 30, 2024

I just migrated to Unraid this past week. I had an existing NAS running OMV that was connected to a 12-bay SAN, with drives formatted ext4 and pooled together with mergerfs (using SnapRAID for parity). I added a few temporary drives to the new Unraid machine to transfer data over. The new drives were properly formatted with XFS, and the transfer succeeded. I then switched my SAN over to the new Unraid machine, had Unraid format all the new drives, and then used unbalance to scatter files from one of the drives I wanted to use as parity. That also succeeded.

Where it went wrong was after spinning down my array to re-enable the cache and to move that one disk to parity, and I got a bunch of errors that disks were wrong, even though the disks looked right. I used the New Config tool, reassigned the drives, and now all but 2 of my drives are listed as "Unmountable: Unsupported or no file system". This is strange, considering I had Unraid properly format the drives after they were originally added to the array. If I unassign a disk and try to mount it using Unassigned Devices, the mount button is disabled.

I've posted the diagnostics. I'm hoping to recover the data, and to understand what happened.

Edit: From the logs, it looks like there are bad superblocks somehow. Is there any way to recover from this?

nasa-diagnostics-20240130-1230.zip

Edited January 30, 2024 by drmalcolm

January 30, 2024

Author

So, a quick update: I've read a number of articles and the Unraid documentation, and started some repair attempts using xfs_repair on the various disks that seem to have failed. Those are still running, and have been for a couple of hours now. While I wait (hopefully) for recovery, is there any way to see why this happened in the first place? Nothing lost power (I am on an Eaton UPS and there were no power fluctuations), there wasn't even a reboot, and I was using the disks just fine right before I stopped the array to add a parity disk. The failed disks are also scattered across devices: one is attached directly to the motherboard's built-in SATA backplane, and others are in 2 different MD1200 SANs connected through an LSI SAS card in IT mode. Digging into the logs myself, I could see the various operations I performed, as well as the unmounts from stopping the array, but I couldn't find anything out of the ordinary that could cause filesystem corruption.

January 30, 2024

Community Expert

Sorry I didn't see this earlier.

9 minutes ago, drmalcolm said:

some repair attempts using xfs_repair on the various disks that seem to have failed

If you didn't run them from the webUI, then you may have gotten the command line wrong. Easy to do.

Let's start from scratch, now that you have already posted diagnostics. If I need to, I can refer back to those, but I would like to start fresh with something I can make sense of more easily and quickly.

Reboot, start the array in normal (not maintenance) mode, and post new diagnostics.

January 30, 2024

Author

Sorry, my wording was a bit unclear: I ran xfs_repair on the failed disks (the repairs hadn't failed; they were still running). I was running

xfs_repair -v /dev/mdXp1

where X corresponds to the disk number I am checking. I ran this in the terminal just because I am usually more comfortable with CLI work.
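For anyone following along, a minimal sketch of running the same check across several disks. It assumes the array is started in maintenance mode (so the filesystems are not mounted) and that the /dev/mdXp1 naming above applies. It adds xfs_repair's -n (no-modify) flag so a first pass only reports problems; the disk numbers are whatever you pass in, and nothing actually runs unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: check several Unraid array disks with xfs_repair in no-modify mode (-n).
# Assumptions: array started in maintenance mode; 6.12-style device names /dev/mdXp1.
# Dry run by default: set RUN=1 to actually invoke xfs_repair.
set -euo pipefail

# Map a disk number to its md partition device, e.g. 3 -> /dev/md3p1
md_device() {
  local disk="$1"
  [[ "$disk" =~ ^[0-9]+$ ]] || { echo "invalid disk number: $disk" >&2; return 1; }
  printf '/dev/md%sp1\n' "$disk"
}

check_disks() {
  local disk dev
  for disk in "$@"; do
    dev="$(md_device "$disk")" || return 1
    if [[ "${RUN:-0}" == "1" ]]; then
      xfs_repair -n -v "$dev"   # -n: report only, never write to the disk
    else
      echo "would run: xfs_repair -n -v $dev"
    fi
  done
}

# Only act on arguments when executed directly, e.g. ./check.sh 1 3 5
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  check_disks "$@"
fi
```

Once the no-modify output looks sane, the same command without -n is what actually attempts the repair.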

I rebooted, started the array, and saved the diagnostics. You will no doubt see the invalid superblock magic number errors for several of the disks in the syslog. While I do hope I can recover, I also want to make sure this doesn't happen again (be it user, hardware, or software error).

nasa-diagnostics-20240130-1746.zip

January 31, 2024

Community Expert

4 hours ago, drmalcolm said:

I am usually more comfortable with CLI work.

If you aren't familiar with Unraid and how it works "under the hood", some things may not behave the way you expect from other Linux installations, so be careful with the CLI. Don't mix user shares and disks/pools when moving or copying, for example.
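To make that warning concrete, here is a small guard. It is a sketch, not anything Unraid ships: it classifies a path as a user share (/mnt/user, /mnt/user0), a disk or pool (/mnt/diskN or a pool such as /mnt/cache), or other, and refuses a source/destination pair that mixes the first two. Pool names vary per system, so the POOLS list is an assumption you would adjust.

```shell
#!/usr/bin/env bash
# Sketch: refuse copy/move pairs that mix a user share path with a disk/pool path.
# Assumption: pool names are listed in POOLS ("cache" is only an example default).
POOLS="${POOLS:-cache}"

# Prints "user", "disk" (array disk or pool), or "other"
classify_path() {
  local p="$1" pool
  local re_user='^/mnt/user0?(/|$)'
  local re_disk='^/mnt/disk[0-9]+(/|$)'
  if [[ "$p" =~ $re_user ]]; then echo user; return; fi
  if [[ "$p" =~ $re_disk ]]; then echo disk; return; fi
  for pool in $POOLS; do   # word splitting intended: space-separated pool names
    if [[ "$p" == "/mnt/$pool" || "$p" == /mnt/"$pool"/* ]]; then
      echo disk; return
    fi
  done
  echo other
}

# Returns 0 if the pair is OK, 1 if it mixes a user share with a disk/pool
check_pair() {
  local a b
  a="$(classify_path "$1")"
  b="$(classify_path "$2")"
  if [[ ( "$a" == user && "$b" == disk ) || ( "$a" == disk && "$b" == user ) ]]; then
    echo "refusing: '$1' ($a) and '$2' ($b) mix a user share with a disk/pool" >&2
    return 1
  fi
  return 0
}
```

Usage: `check_pair /mnt/disk1/Media /mnt/user/Media && cp ...` stops before the copy ever starts.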

Your xfs_repair commands were correct. I would like to see their output, though.

I do have a hypothesis about what might have caused the corruption and the failed repairs: judging by the disks that do mount, they are too full.

January 31, 2024

Author

I appreciate the warning! I am learning, and while it's quite different from a "normal" Linux NAS setup, it's quite a clever system.

As for the xfs_repair output, not a single run has finished yet. They've been running for several hours so far, and not even my small 4TB drive has finished scanning, so I suspect they'll be going for quite some time. However, the longer they run, the less hopeful I am that a superblock can be recovered. Output so far has been:

Phase 1 - find and verify superblock...
bad primary superblock - bad magic number !!!

attempting to find secondary superblock...
...found candidate secondary superblock...
unable to verify superblock, continuing...
...........

Could disks that are too full cause this kind of corruption? I'm interested to hear more about this. I used the unbalance plugin to scatter the data from 1 drive to the rest of the drives (I made sure all other drives were selected), but it only really transferred to 3 other drives, and completely filled 2 of them (the two that are actually fine). The unbalance plugin has a default of 1GB minimum free space, which I suspect doesn't play nicely with Unraid.

I'd also like to hear your thoughts on what should be done with my two full drives. Whether or not the other drives are repaired, they will eventually be added back to the array, either working or freshly wiped as XFS. When that happens, what would be the best way to balance the data so this doesn't happen again? I am assuming that once I'm settled, Unraid's default allocation method, High-water, should be enough to prevent this on its own, correct?

January 31, 2024

Community Expert

15 hours ago, drmalcolm said:

Where it went wrong was after spinning down my array to re-enable the cache and to move that one disk to parity, and I got a bunch of errors that disks were wrong, even though the disks looked right.

The issue was likely a known bug. It can happen if the disks previously had an Unraid-compatible partition. Did you by any chance save the diags while the disks were showing as wrong, to confirm?

January 31, 2024

Community Expert

btrfs seems more prone than xfs to corruption when it fills up, but any filesystem may need some extra working space if a repair is required.

January 31, 2024

Author

5 hours ago, JorgeB said:

The issue was likely a known bug. It can happen if the disks previously had an Unraid-compatible partition. Did you by any chance save the diags while the disks were showing as wrong, to confirm?

The original set of diagnostics should have that information. I started the process around Jan 30 12:00:00. In the syslog I can see the series of unmounts from bringing the array down. At 12:00:45 the syslog shows entries about wrong disks being found, and around 12:14 my attempts to fix things with New Config begin, followed by the mount failures. I hope that helps.

As for the repairs, they have all failed with "Sorry, could not find valid secondary superblock. Exiting now." Is there any further action I can take on these drives, or are they effectively dead? If they are, I'll start formatting and restoring data, but I want to know what I can do to prevent this in the future. I have two disks that are full; what would be the best way to move some of that data off so there's more breathing room on them?

January 31, 2024

Community Expert
Solution

24 minutes ago, drmalcolm said:

The original set of diagnostics should have that information.

The issue is more visible in vars.txt than in the syslog, and that file is no longer showing "disk wrong". In any case, it's almost certainly that issue, and even if it's not, this won't make things worse. Type on the CLI:

sgdisk -o -a 8 -n 1:1M:0 /dev/sdX

This is only for disks of 3TB or larger, but I don't see any smaller array disks. Replace X with the correct identifier for each affected array disk, then reboot. The disks will show as wrong again; do another New Config and they should now mount after the array starts.
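For reference, a hedged breakdown of that command wrapped in a dry-run script. The flag meanings follow the sgdisk man page; taking "3TB" as 3,000,000,000,000 bytes is an assumption. The script only prints the command unless RUN=1 is set, in which case it checks the size and prints the existing table with sgdisk -p before rewriting it.

```shell
#!/usr/bin/env bash
# Sketch of the fix above, dry run by default (set RUN=1 to write).
#   -o          clear existing partition data (GPT header, entries, protective MBR)
#   -a 8        align partitions to multiples of 8 sectors
#   -n 1:1M:0   new partition 1, starting 1 MiB in, ending at the default
#               (0 = end of the largest free block, i.e. the rest of the disk)
# Assumption: "3TB or larger" is read as >= 3,000,000,000,000 bytes.
set -euo pipefail

MIN_BYTES=3000000000000

is_large_enough() {   # $1 = disk size in bytes
  (( $1 >= MIN_BYTES ))
}

fix_cmd() {
  printf 'sgdisk -o -a 8 -n 1:1M:0 %s\n' "$1"
}

fix_disk() {
  local dev="$1" size
  if [[ "${RUN:-0}" != "1" ]]; then
    echo "would run: $(fix_cmd "$dev")"
    return 0
  fi
  size="$(blockdev --getsize64 "$dev")"
  if ! is_large_enough "$size"; then
    echo "skipping $dev: $size bytes is below the 3TB threshold" >&2
    return 1
  fi
  sgdisk -p "$dev"                  # show the current table before changing it
  sgdisk -o -a 8 -n 1:1M:0 "$dev"
}

# e.g. ./fix.sh /dev/sdX /dev/sdY   (placeholders: use your affected array disks)
if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  for dev in "$@"; do fix_disk "$dev"; done
fi
```

The reboot and second New Config described above are still needed afterwards.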

January 31, 2024

Author

1 hour ago, JorgeB said:
The issue is more visible in vars.txt than in the syslog, and that file is no longer showing "disk wrong". In any case, it's almost certainly that issue, and even if it's not, this won't make things worse. Type on the CLI:
sgdisk -o -a 8 -n 1:1M:0 /dev/sdX
This is only for disks of 3TB or larger, but I don't see any smaller array disks. Replace X with the correct identifier for each affected array disk, then reboot. The disks will show as wrong again; do another New Config and they should now mount after the array starts.

This worked! Thank you so much! The disks are up and it looks like the data is intact. My question now: assuming these disks are "fixed", how should I proceed? I have 3 disks with only 1GB left, and I'd like your recommendations on how best to balance that out. Assuming that the lack of free space likely contributed to the corruption, I definitely want to handle this before I stop the array next. Last time I used the unbalance plugin for this (mostly to scatter all data from one disk to the others), but I wanted to see if there is a better Unraid-native way. I've attached diagnostics just in case.

nasa-diagnostics-20240131-1248.zip

January 31, 2024

Community Expert

11 minutes ago, drmalcolm said:

Assuming that the lack of free space likely contributed to the corruption

I would recommend leaving a little more free space than that, or a filesystem repair may fail. But the issue you had was completely unrelated: it's an Unraid bug, and it will be fixed in v6.12.7.
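A minimal sketch for keeping an eye on free space per array disk, assuming GNU df (for --output) and disks mounted at /mnt/diskN. The 50 GiB threshold is an arbitrary example; the thread doesn't give a specific number.

```shell
#!/usr/bin/env bash
# Sketch: warn about array disks with little free space left.
# Assumptions: GNU coreutils df; disks mounted at /mnt/diskN;
# THRESHOLD_GIB=50 is an arbitrary example, not an Unraid recommendation.
THRESHOLD_GIB="${THRESHOLD_GIB:-50}"

# Reads "mountpoint avail_bytes" lines on stdin, prints those below the threshold
low_space() {
  local limit=$(( THRESHOLD_GIB * 1024 * 1024 * 1024 ))
  local target avail
  while read -r target avail; do
    [[ -z "$target" ]] && continue
    if (( avail < limit )); then
      printf '%s has only %d GiB free\n' "$target" $(( avail / 1024 / 1024 / 1024 ))
    fi
  done
}

if [[ "${BASH_SOURCE[0]}" == "$0" ]]; then
  shopt -s nullglob
  disks=(/mnt/disk[0-9]*)
  if (( ${#disks[@]} > 0 )); then
    # -B1 reports sizes in bytes; tail drops df's header line
    df -B1 --output=target,avail "${disks[@]}" | tail -n +2 | low_space
  else
    echo "no /mnt/diskN mounts found" >&2
  fi
fi
```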

January 31, 2024

Author

5 minutes ago, JorgeB said:

I would recommend leaving a little more free space than that, or a filesystem repair may fail. But the issue you had was completely unrelated: it's an Unraid bug, and it will be fixed in v6.12.7.

Gotcha, thanks! Is it safe to stop the array to add a disk for parity? I'd like to avoid triggering the bug again if possible.

January 31, 2024

Community Expert

8 minutes ago, drmalcolm said:

Gotcha, thanks! Is it safe to stop the array to add a disk for parity?

Yes.

8 minutes ago, drmalcolm said:

I'd like to avoid triggering the bug again if possible.

To avoid the bug when adding disks, make sure any existing partition is wiped first; you can do that with the UD (Unassigned Devices) plugin. This is only needed for used disks, since new disks that were never used won't have a partition.
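Before adding a used disk, lsblk gives a quick way to see whether it still carries partitions. This sketch only reports; the wiping itself is left to UD as described above, and /dev/sdX is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: report whether a used disk still has partitions before adding it.
# Assumptions: util-linux lsblk is available; /dev/sdX is a placeholder name.

# Counts "part" lines in lsblk TYPE output read from stdin (prints 0 if none)
count_partitions() {
  grep -cx 'part' || true
}

check_disk() {
  local dev="$1" n
  # -n: no header, -r: raw output, -o TYPE: one TYPE per line (disk, part, ...)
  n="$(lsblk -nr -o TYPE "$dev" | count_partitions)"
  if (( n > 0 )); then
    echo "$dev has $n partition(s); wipe it (e.g. with UD) before adding it"
  else
    echo "$dev has no partitions"
  fi
}
```

Usage: `check_disk /dev/sdX`.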

January 31, 2024

Community Expert

The Dynamix File Manager plugin lets you work directly with folders and files on the server. It's not as automated as unbalance, though.
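If you prefer the CLI, a rough manual alternative is a disk-to-disk rsync that keeps both sides on /mnt/diskN paths, in line with the earlier warning about not mixing user shares and disks. This is a sketch, not a description of what unbalance or the File Manager do internally; the folder names are hypothetical, and it is a dry run unless RUN=1 is set.

```shell
#!/usr/bin/env bash
# Sketch: move one folder between array disks with rsync, disk paths only.
# Assumptions: folder names are hypothetical; dry run unless RUN=1.
# --remove-source-files deletes source files after transfer but leaves
# the (now empty) source directories behind.

# True only for a folder on an array disk, e.g. /mnt/disk2/Media/Movies
is_disk_path() {
  local re='^/mnt/disk[0-9]+/.+'
  [[ "$1" =~ $re ]]
}

move_cmd() {
  local src="$1" dst="$2"
  if ! is_disk_path "$src" || ! is_disk_path "$dst"; then
    echo "both paths must be under /mnt/diskN/" >&2
    return 1
  fi
  # Trailing slashes: copy the *contents* of src into dst
  printf 'rsync -avX --remove-source-files %q/ %q/\n' "${src%/}" "${dst%/}"
}

move_folder() {
  local cmd
  cmd="$(move_cmd "$1" "$2")" || return 1
  if [[ "${RUN:-0}" == "1" ]]; then
    rsync -avX --remove-source-files "${1%/}/" "${2%/}/"
  else
    echo "would run: $cmd"
  fi
}
```

Usage: `move_folder /mnt/disk1/Media/Movies /mnt/disk4/Media/Movies` prints the command; rerun with RUN=1 to perform the move.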

Solved by JorgeB
