Files missing and unraid reporting a drive as "Unmountable: not mounted"



Hello, last night I went to watch a movie on my server and Plex was showing I had 6 movies (I have way more than six). After looking into it I noticed the entire array was offline. I rebooted the server (yes, I should have grabbed a diagnostics snapshot first, but I didn't). When the server came back up, the array was online again, and a Plex rescan repopulated all my movies.

 

This morning I woke up to many missing files and the Unraid UI showing a drive as "Unmountable: not mounted", and Unraid wants me to format the disk. I logged into the server and dmesg shows a bunch of errors like this:

 

[54431.657222] ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
[54433.274993] ata5.00: configured for UDMA/33
[54433.275011] ata5: EH complete
[54433.465251] ata5.00: exception Emask 0x10 SAct 0x1fc00 SErr 0x190002 action 0xe frozen
[54433.465253] ata5.00: irq_stat 0x80400000, PHY RDY changed
[54433.465254] ata5: SError: { RecovComm PHYRdyChg 10B8B Dispar }
[54433.465256] ata5.00: failed command: READ FPDMA QUEUED
[54433.465258] ata5.00: cmd 60/40:50:90:2b:d9/05:00:09:00:00/40 tag 10 ncq dma 688128 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465258] ata5.00: status: { DRDY }
[54433.465259] ata5.00: failed command: READ FPDMA QUEUED
[54433.465261] ata5.00: cmd 60/10:58:d0:30:d9/02:00:09:00:00/40 tag 11 ncq dma 270336 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465262] ata5.00: status: { DRDY }
[54433.465262] ata5.00: failed command: READ FPDMA QUEUED
[54433.465264] ata5.00: cmd 60/40:60:e0:32:d9/05:00:09:00:00/40 tag 12 ncq dma 688128 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465264] ata5.00: status: { DRDY }
[54433.465265] ata5.00: failed command: READ FPDMA QUEUED
[54433.465267] ata5.00: cmd 60/b0:68:20:38:d9/03:00:09:00:00/40 tag 13 ncq dma 483328 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465267] ata5.00: status: { DRDY }
[54433.465268] ata5.00: failed command: READ FPDMA QUEUED
[54433.465270] ata5.00: cmd 60/40:70:d0:3b:d9/05:00:09:00:00/40 tag 14 ncq dma 688128 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465270] ata5.00: status: { DRDY }
[54433.465271] ata5.00: failed command: READ FPDMA QUEUED
[54433.465273] ata5.00: cmd 60/40:78:10:41:d9/05:00:09:00:00/40 tag 15 ncq dma 688128 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465273] ata5.00: status: { DRDY }
[54433.465274] ata5.00: failed command: READ FPDMA QUEUED
[54433.465275] ata5.00: cmd 60/40:80:50:46:d9/05:00:09:00:00/40 tag 16 ncq dma 688128 in
                        res 40/00:00:d0:30:d9/00:00:09:00:00/40 Emask 0x10 (ATA bus error)
[54433.465276] ata5.00: status: { DRDY }
[54433.465278] ata5: hard resetting link

 

This time I did take a diagnostic snapshot (attached). I rebooted the server and it came up in the same state - 1 drive is "unmountable" and the data on it is missing. Furthermore, Unraid is running a parity check (which I cancelled).

 

What I can't figure out is:

1) Why isn't unraid emulating the missing drive?

2) Why did unraid restart the array if a drive is missing?

3) Is the data on the missing "unmountable" drive gone if the parity check started? My fear is that it started rewriting parity to align with the missing drive.

 

How screwed am I?

storage-diagnostics-20211031-1033.zip

Link to comment
29 minutes ago, timekiller said:

1) Why isn't unraid emulating the missing drive?

 

Because a write to it never actually failed, and the drive is still there, even though there is corruption on it. The corruption would appear to be caused by a piss-poor cable connection to the drive. Reseat and/or replace it.

29 minutes ago, timekiller said:

2) Why did unraid restart the array if a drive is missing?

 

Because it's not missing or red-balled

29 minutes ago, timekiller said:

3) Is the data on the missing "unmountable" drive gone if the parity check started? My fear is that it started rewriting parity to align with the missing drive.

 

Nope.

 

Run a check filesystem against disk 1.

Link to comment
  1. The drive isn't missing, it is just unmountable. Parity only emulates missing or disabled drives.
  2. Even if you did have a missing drive, the array can be started as long as there are not more missing or disabled drives than you have parity.
  3. Likely parity already agrees with the unmountable disk.

Fix your connection problem, then you will have to repair the filesystem on disk1.

 

https://wiki.unraid.net/Manual/Storage_Management#Drive_shows_as_unmountable
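
For reference, a minimal command-line sketch of that repair, assuming it is disk 1 (the wiki link above walks through the GUI way; start the array in Maintenance mode first):

xfs_repair -n /dev/md1     # dry run, only reports what it would fix
xfs_repair /dev/md1        # actual repair; on an encrypted array the device is /dev/mapper/md1 instead

Running it against the md device (rather than the raw sdX device) keeps parity in sync with whatever the repair changes.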

 

Why have you given 100G to docker.img? 20G is often more than enough.
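
If you want to see how much of it is actually in use, something like this from the console gives a rough idea (assuming the Docker service is running):

docker system df     # space used by images, containers and volumes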

 

 

Link to comment

Update:

I powered down the server and removed the unmountable drive. After rebooting, unraid didn't start the array and I had to start it manually with the drive missing, as expected. However, the missing data is still missing. I pulled the unmountable drive and attached it to my desktop (Linux). I opened the drive with cryptsetup and had to run xfs_repair before I could mount it, but the missing data is still there, so that's good. But now unraid is in a state where it knows a drive is missing, but unraid is not emulating the missing data. I can rsync the entire drive back, but that will take a long while. I don't think there is any way around this though, since unraid believes the missing drive was just waiting to be formatted.
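
Roughly what that looked like on the desktop (device name is just an example, the disk showed up as a different sdX for me):

mkdir -p /mnt/recovery
cryptsetup open /dev/sdX1 recovery        # unlock the LUKS container, prompts for the array passphrase
xfs_repair /dev/mapper/recovery           # it refused to mount until this had been run
mount /dev/mapper/recovery /mnt/recovery  # all the "missing" data is there

If I do end up copying it back, it would be something like rsync -avP /mnt/recovery/ over the network to the array, which is the part that will take forever.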

 

Unraid shows the missing drive, but is still labelling it as "Unmountable" and there is no directory for it under /mnt/.

 

I feel like the only option now is to reinstall the drive and use the Tools -> New Config option to construct a new array with all the drives in place.

 

Anyone see another option here?

Link to comment
1 minute ago, timekiller said:

Anyone see another option here?

Yes.

Run the xfs repair on Unraid, which is what you should have done in the first place.

3 minutes ago, timekiller said:

unraid is not emulating the missing data.

Yes, it is. The filesystem is corrupt and needs to be repaired.

 

Parity emulates the entire drive, no matter what filesystem or corruption is there. Parity has no concept of files or filesystems, just the raw bits.

 

5 minutes ago, timekiller said:

Unraid shows the missing drive, but is still labelling it as "Unmountable" and there is no directory for it under /mnt/.

That's correct, it's emulating the filesystem exactly as it was on the drive, in need of repair.

 

Trurl & Squid already tried to tell you all this, so I don't know if my attempt at saying the same thing in different words is going to get through or not.

 

Now that you have removed the drive, the emulated filesystem doesn't match the physical drive, so you need to do the filesystem repair on the emulated drive, and when the emulated drive is mountable, then you can rebuild the physical drive to match the emulated one in Unraid.

 

I'll try restating all this again more succinctly.

 

Unmountable means the filesystem is corrupt or missing, and requires the filesystem (XFS or whichever is in use) to be repaired.

 

Disabled means a write to the physical drive failed, which means the parity-emulated drive is no longer in sync with the physical drive. That requires rebuilding either parity or the drive to get the two back in sync. Normally when a drive is disabled you need to figure out why the write failed; sometimes it's a connection, sometimes the drive is actually dying. In your case you disabled the drive on purpose by removing it and then starting the array with it missing.

 

Unmountable and Disabled are two distinct problems with different solutions. In your first post you described an unmountable disk which needed a file system repair. The drive was not disabled until you disabled it.

 

The file system corruption was likely the result of a bad connection or bad cable, which will need to be addressed or the resynchronization of the emulated drive to the physical drive will probably fail.

Link to comment
23 minutes ago, timekiller said:

I pulled the unmountable drive and attached it to my desktop (Linux). I opened the drive with cryptsetup and had to run xfs_repair before I could mount it, but the missing data is still there, so that's good. But now unraid is in a state where it knows a drive is missing, but unraid is not emulating the missing data. I can rsync the entire drive back, but that will take a long while. I don't think there is any way around this though, since unraid believes the missing drive was just waiting to be formatted.

By pulling the drive and running it outside of Unraid you just invalidated parity.

 

With the drive still "missing" from unraid, run the check filesystem against disk #1. Then reinstall the drive and rebuild the contents onto itself.

Link to comment

Another update:

I restarted the array in maintenance mode so I could repair the now-missing, emulated drive. I ran xfs_repair on /dev/mapper/md1 and it did its thing. What I did not expect is that xfs_repair moved every single file/directory on the drive to lost+found. This is especially confusing because running the same command on the real drive did not do this.

 

Since the original drive is fine, I'm just going to do a new config and let parity get rebuilt.

 

I realize I did not handle this the "right" way from the beginning, but I can't help but wonder: if I had run the repair against the real drive in the first place, would I now be forced to manually go through 10TB worth of lost+found files, moving and renaming them?

Link to comment
20 minutes ago, timekiller said:

Another update:

I restarted the array in maintenance mode so I could repair the now-missing, emulated drive. I ran xfs_repair on /dev/mapper/md1 and it did its thing. What I did not expect is that xfs_repair moved every single file/directory on the drive to lost+found. This is especially confusing because running the same command on the real drive did not do this.

 

Since the original drive is fine, I'm just going to do a new config and let parity get rebuilt.

 

I realize I did not handle this the "right" way from the beginning, but I can't help but wonder: if I had run the repair against the real drive in the first place, would I now be forced to manually go through 10TB worth of lost+found files, moving and renaming them?

The normal recommendation is to start by running the repair against the emulated drive. If that results in a lot of files in lost+found (because the emulated drive has bad corruption), then try repairing the physical drive (as the level of corruption can be different) as a backup solution. In many cases repairing the emulated drive goes through error-free.

Link to comment

Probably these are the root cause of all your trouble, including why the emulated drive repair didn't go well:

04:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
06:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]

And likely to give problems in the future.

Link to comment
48 minutes ago, trurl said:

Probably these are the root cause of all your trouble, including why the emulated drive repair didn't go well:

04:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
05:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
06:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]
07:00.0 SATA controller [0106]: Marvell Technology Group Ltd. 88SE9215 PCIe 2.0 x1 4-port SATA 6 Gb/s Controller [1b4b:9215]

And likely to give problems in the future.

I'm open to recommendations for a replacement. I've asked for suggestions more than once, but haven't received a straight answer. I have 21 drives, so I need 16-port cards.

Link to comment
