"Unmountable: No file system" after rebuilding array



So I finally took the plunge and replaced my PERC H700 card with an H200 flashed to IT mode. I had an existing XFS array on the PERC with 12TB of data. When I rebooted (fingers crossed), the array wouldn't start and the drives all gave the error "Unmountable: No file system".

I tried xfs_repair -v on the drives, but no luck (errors were corrected but the disks still won't mount). I assumed this was to be expected after changing cards, so I decided to start a new array with some new drives I had. I also figured that by copying all my data to a new array I'd at least end up with a non-fragmented array, since my old one was probably 90% fragmented. Anyway, four days later I finally finished copying everything and setting up all my Dockers. Then I decided to add in two drives for parity. I precleared the new disks without errors and rebooted, only to get the exact same error as before: "Unmountable: No file system"! Argh. Side note: is there any advantage to setting up parity while building the array as opposed to after? Would it have slowed down the copy process?
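For reference, the kind of disk-to-disk copy I mean looked roughly like this; the old disk mounted via Unassigned Devices at /mnt/disks/old_disk1 and the new array disk at /mnt/disk1 are example paths, not my actual ones:

# Copy one old disk's contents onto a new array disk, preserving attributes
# -a archive, -v verbose, -P progress/partial files, -H hardlinks, -X extended attributes
rsync -avPHX /mnt/disks/old_disk1/ /mnt/disk1/

# Re-run the same command afterwards; a clean second pass shows nothing was missed
rsync -avPHX /mnt/disks/old_disk1/ /mnt/disk1/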

 

So, again, I ran xfs_repair -v on the array drives; it fixed some errors, and a second run reported none. Rebooted, and same problem: "Unmountable: No file system".

 

What are my recommended next steps? Do I proceed with xfs_repair -L even though xfs_repair is no longer reporting errors? I don't want to have to start from scratch again and copy 12TB of data, which is hard because it's all sitting on shares spread across drives from my former array. But as a last resort I can do that. I'm hoping to salvage the existing data.
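To show where I am before touching -L, here is the kind of read-only check I can run, with /dev/sdb1 as an example device (or the md device, if repairing with the array started in maintenance mode):

# Dry run: report what xfs_repair would change without touching the disk
xfs_repair -n /dev/sdb1

# If this comes back clean, the on-disk metadata itself looks consistent,
# so -L may not be the answer to the mount failure.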

 

Attached are my diagnostics; thanks in advance to anyone offering guidance.

 

EDIT: After running xfs_repair on all disks and powering down overnight, the array came back online on the next boot, but with the data gone.

 

tower-diagnostics-20190808-1546.zip


Very strange issue; all the disks were complaining of:

log has mismatched uuid - can't recover

From a little googling this came up:

 

Quote

Which implies that the superblock was written to disk with the wrong UUID in it. And the only way that can happen is if the superblock for the wrong filesystem is written to the block device. Hence my question of whether you swapped the paths while the filesystems were mounted - the superblock is only read during mount time, and the UUID is never modified, so the only way an incorrect UUID can be written to the filesystem is if the block device changes underneath the mounted filesystem.

No idea how this could happen with Unraid; I've never seen that error before, and it's very suspicious that it's on all three disks.
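If you want to compare what each disk thinks its UUID is, something like this should work with the array stopped (device names are examples, adjust to yours):

# Print the UUID recorded in each XFS superblock (run with the array stopped)
for dev in /dev/sdb1 /dev/sdc1 /dev/sdd1; do
    echo -n "$dev: "
    xfs_admin -u "$dev"
done

# Compare with what blkid reports for the same partitions
blkid -s UUID /dev/sdb1 /dev/sdc1 /dev/sdd1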


You've hit the same problem twice. Were the disks that are now attached to the H200 ever plugged into the H700 in a way that could have caused some overwriting?

 

If not, I would suspect some firmware bug that allows the H700 to make changes to disks attached to the H200. For example, with two HBAs in the system, a single BIOS interface could access both of them. Did any configuration change on the H700 before the problem occurred?

 

In general, you can add a parity disk at any time and it won't write anything to the data disks; I have done this many times and never had a problem.


Benson: the H200 was installed after the H700 was removed; they were never connected at the same time.

 

johnnie: This is a good point. I honestly can't remember if I formatted the disks on the H700 before moving them to the H200, though I'm almost certain it was after, because I wanted to ensure the correct serial numbers were showing for the new drives in Unraid.

 

So what do I do now? Do I attempt xfs_repair -L? Will this potentially repair the filesystem?

 

UPDATE: Powered down for 24 hours, and when powering back up this morning the array was back online... but all the drives were empty. Looks like I've lost the data and will have to recover from backup. I'm still worried this might happen again. Are there any indicators I should look for in the logs? Any way to get a warning about the UUID issue in the future?
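In the meantime I can at least grep the syslog for that message; a rough check, assuming the log is at /var/log/syslog:

# Check whether the UUID complaint has appeared since the last boot
grep -i "mismatched uuid" /var/log/syslog

# Or watch for it live while the array is starting
tail -f /var/log/syslog | grep -i --line-buffered "mismatched uuid"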

 

Here is my new xfs_repair message:

 

root@Tower:~# xfs_repair -v /dev/sdb1
Phase 1 - find and verify superblock...
        - block cache size set to 1513776 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 662654 tail block 662650
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

Should I proceed with -L in an attempt to recover the data?
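For reference, the order the error message itself suggests seems to be roughly this, with /mnt/test as an example mount point and -L only as the last resort if the mount fails:

# 1. Try mounting so the XFS log gets replayed
mkdir -p /mnt/test
mount /dev/sdb1 /mnt/test

# 2. If the mount succeeds, unmount cleanly and re-run the repair
umount /mnt/test
xfs_repair -v /dev/sdb1

# 3. Only if the mount fails: zero the log, accepting possible metadata loss
xfs_repair -L /dev/sdb1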

1 hour ago, TommyJohn said:

I'm still worried this might happen again. Are there any indicators I should look for in the logs?

Difficult to say; I've never encountered this issue before and can't really see how it could happen with normal use. You should at least grab the diagnostics before any reboot, and if the issue repeats, post them; maybe with those we can see what's happening.
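If it helps, I believe the diagnostics can also be captured from a console session before rebooting, roughly like this (output location may vary by Unraid version):

# Collect logs and configuration into a zip on the flash drive
diagnostics

# The zip should land somewhere like /boot/logs/
ls /boot/logs/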

