[6.9.2] Disk Unmountable: not mounted


redbear


Hi, 

 

My server has 8 data disks and 1 parity disk. A couple of weeks ago I started upgrading the disks from 8TB to 16TB. The parity disk, disk 1, disk 2 and disk 3 went fine. Just after the rebuild on disk 3 finished, disk 8 became unmountable. Since I was upgrading anyway, I pulled disk 8, installed a new disk and kicked off the rebuild. It finished, but the new 16TB disk 8 was also unmountable. I ran xfs_repair with no luck, then ran xfs_repair -L. The disk became mountable, but I lost 6TB of data. I paused adding new drives while I tried to figure out which data had been lost and how to recover it; a significant amount ended up in the lost+found directory.

 

All was stable for a couple of days, but now disk 3 has become unmountable. I've run both short and extended SMART self-tests and they come back clean. Running xfs_repair without parameters returns:

 

Quote

ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.

 

I do have the old 8TB disk 3, so if need be I can reformat and restore from the old disk. 

Since this error has occurred on two different disk types, on two different cables and on two different controllers, I'm concerned that something needs correcting, or this behavior may continue with other disks.

 

Any advice appreciated; diagnostics attached.

 

Many thanks,

 

Redbear

pretzel-diagnostics-20210903-2016.zip

00:17.0 RAID bus controller [0104]: Intel Corporation SATA Controller [RAID mode] [8086:2822]
    Subsystem: Gigabyte Technology Co., Ltd SATA Controller [RAID mode] [1458:b005]
02:00.0 RAID bus controller [0104]: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller [1b4b:9485] (rev c3)
    Subsystem: Marvell Technology Group Ltd. 88SE9485 SAS/SATA 6Gb/s controller [1b4b:9480]

RAID mode is not recommended, and Marvell controllers are not recommended.

 

Which controller are you having problems with?


Thanks for your time and response. 

The first unmountable disk issue, with disk 8, occurred on the Marvell controller.

The second/current unmountable disk issue with disk 3 is occurring on the Intel controller. 

I can reboot the machine and put the controllers into AHCI mode instead of RAID mode. If I can stabilize the machine, I can move all of the data to four 16TB data drives while I look into replacing the Marvell-based controller. For what it's worth, the machine has been rock solid for three years.

4 hours ago, redbear said:

I can reboot the machine and place the controllers into AHCI mode instead of RAID mode

RAID mode is fine with Intel fakeRAID; it still uses the AHCI driver.

 

4 hours ago, redbear said:

For what it's worth the machine has been rock solid for three years.

I would run memtest; repeated filesystem corruption without an apparent reason could be the result of bad RAM.


OK, I've removed the Marvell card (swapped it for an LSI/Broadcom board) and changed the Intel controller to AHCI mode (@JorgeB, just to be safe).

The array is back online, but disk 3 is still unmountable. I no longer have the option to run xfs_repair, since the disk's file system type now shows as "auto".

At this point I'm willing to move forward and replace or rebuild the disk. Based on the diags (new set attached), should I rebuild it from parity, replace it and rebuild, or do something else?

pretzel-diagnostics-20210907-2241.zip


A rebuild will never fix an unmountable state, as the rebuild simply makes the physical drive match the emulated one.
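To illustrate why, here is a minimal Python sketch that assumes single parity is a byte-wise XOR across the data disks. This is a simplified model of single parity, not Unraid's actual code, and the disk contents are made up. Reconstructing a disk from parity returns exactly the bytes the emulated disk holds, so file system metadata that was already corrupt comes back corrupt.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (simplified single-parity model)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "disks"; disk3 already holds corrupted metadata,
# and that corruption was written through to parity like any other write.
disks = {
    "disk1": b"\x10\x20\x30\x40",
    "disk2": b"\x01\x02\x03\x04",
    "disk3": b"BAD!",
}
parity = xor_blocks(list(disks.values()))


def rebuild(missing: str) -> bytes:
    """Reconstruct one missing disk from parity plus all surviving disks."""
    survivors = [data for name, data in disks.items() if name != missing]
    return xor_blocks(survivors + [parity])


# The rebuild faithfully reproduces the corrupted contents.
print(rebuild("disk3") == disks["disk3"])
```

The rebuild has no notion of files or metadata; it only solves for missing bytes, which is why a file system repair, not a rebuild, is the tool for an unmountable disk.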

 

If the format for the drive is set to ‘auto’ and you know what it was before, then you can set it explicitly, which will cause the repair option to be offered again. With it set to ‘auto’, the system cannot offer a repair if the file system type is not recognised, as it does not know which tool to use.


Thanks for the tip to set the format explicitly. I did that, and I was able to run xfs_repair -L.

The disk mounts now, but it's down about 6TB of data, an oddly similar amount to the first disk that became unmountable.

 

There were eight 8TB data drives in the machine originally. My goal is to end up with five 16TB data drives. I have upgraded the parity drive and 4 of the data drives. Currently parity is valid.
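A quick sanity check of the target layout, as a hedged Python sketch. The drive sizes come from the post (eight 8TB data drives now, a goal of five 16TB data drives, a 16TB parity drive); the rule checked is the standard Unraid requirement that parity be at least as large as the largest data drive. Used space is not stated in the thread, so only raw capacities are compared.

```python
# Sizes in TB, taken from the post. Used space is not given, so this
# compares raw capacity only.
PARITY_TB = 16
original_data_tb = [8] * 8   # eight 8TB data drives
target_data_tb = [16] * 5    # goal: five 16TB data drives


def layout_ok(parity_tb: int, data_tb: list[int]) -> bool:
    """Single parity must be at least as large as the largest data drive."""
    return parity_tb >= max(data_tb)


print(sum(original_data_tb), sum(target_data_tb))  # 64 80
print(layout_ok(PARITY_TB, target_data_tb))        # True
```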

My current thought is:

1. Shut down Docker and the mover.

2. Use Unbalance to copy data from disk4 up to disk1.

3. Remove disk4.

4. Use its port to attach my old disk3.

5. Use Unassigned Devices and MC to copy the lost data back to the new disk3.

6. Use Unbalance to copy data from my remaining 8TB drives to the new 16TB drives.

7. Remove the remaining 8TB drives.

8. Add the fifth 16TB disk.

9. Rebuild parity.

 

Thoughts?

 


Still seems wrong. With a missing disk and single parity you will no longer have parity protection. Then when you replace the missing disk you will have to rebuild it.
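Under the same simplified XOR model of single parity (an assumption for illustration, with made-up one-byte "disks"), the sketch below shows why a missing disk leaves the array unprotected. Once disk4 is emulated, a second failure leaves two unknowns against a single parity equation, and every value for one disk pairs with some value for the other.

```python
from functools import reduce
from itertools import product


def xor_all(values: list[int]) -> int:
    """XOR of all values (simplified single-parity model)."""
    return reduce(lambda a, b: a ^ b, values, 0)


# Hypothetical one-byte "disks" protected by a single XOR parity byte.
disks = [0x10, 0x01, 0x42, 0x7F]
parity = xor_all(disks)

# disk4 (index 3) is removed and emulated, then disk3 (index 2) also fails.
known = xor_all(disks[:2])
candidates = [
    (a, b)
    for a, b in product(range(256), repeat=2)
    if known ^ a ^ b == parity
]

# Many pairs satisfy parity, so the original contents cannot be recovered.
print(len(candidates))
```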

 

It's also unclear how the new disks figure into this. Are they going to replace other disks that already have data on them? There's no need to move or copy any data to replace a disk with a larger disk; just replace/rebuild.

 

I will be away for a few hours. See if you can explain what you want in more detail and we will see if we can come up with a plan.

 


Maybe I was just getting confused because you wanted to move data around for some reason.

 

Reviewing the first post, you obviously know how to upsize disks by replace/rebuild.

 

I guess you could go with your plan of removing disk4, use its port to access old disk3 unassigned, then when done rebuild disk4 to a larger disk.

 

Not sure how moving things around with Unbalance fits into all this, though. Are you planning to remove some of these smaller disks and not replace them? Is that why you need to rebuild parity at the end?
