Drives became unreadable during parity check - now 2 are unmountable after reboot




Hello,

 

I woke up this morning and found that some of my Dockers weren't working. However, I was busy with work, so I couldn't get to it until the evening. When I looked at my Unraid server, I saw that my 3 most recent drives were listed as missing from the array (the ones in the first screenshot).

 

I have a 12-drive array with 2 parity drives. I have a Silverstone SST-CS308B case with a StarTech 3-drive hot-swap bay and a PCIe SATA expansion card (not sure which one, but I know I bought it from the known-working list over 5 years ago).

 

The 3 drives in question are in that 3-drive hot-swap bay. I disabled the array (and downloaded the diagnostics) and tried to get the drives detected again, but to no avail. I rebooted the server, and now 1 of the drives is usable, but 2 are listed as detected yet unmountable.

 

They've been in the system since December or so. They don't have too much data, but I want to see if there is a way to make them mountable at this point.

 

 

Thanks in advance.
 

 

 

llnnas1337-diagnostics-20240313-1909.zip


I would really recommend avoiding SATA port multipliers:

 

Mar 13 04:02:57 LLNNAS1337 kernel: ata4.15: Port Multiplier detaching
Mar 13 04:02:57 LLNNAS1337 kernel: ahci 0000:03:00.0: FBS is disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.01: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.02: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.03: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.04: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.00: disabled

 

It detached and dropped all connected disks; reboot and post new diags after array start.
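
If you want to confirm this in your own logs, a quick search of the syslog for port multiplier events will show the detach and the ATA links that were disabled as a result (this assumes the standard Unraid syslog location; adjust the path if yours differs):

# Show port multiplier detach events and the links disabled as a consequence
grep -iE "port multiplier|ata[0-9]+\.[0-9]+: disabled" /var/log/syslog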

3 hours ago, JorgeB said:

I would really recommend avoiding SATA port multipliers:

 

Mar 13 04:02:57 LLNNAS1337 kernel: ata4.15: Port Multiplier detaching
Mar 13 04:02:57 LLNNAS1337 kernel: ahci 0000:03:00.0: FBS is disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.01: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.02: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.03: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.04: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.00: disabled

 

It detached and dropped all connected disks; reboot and post new diags after array start.

I assume that's talking about my PCIe SATA expansion card?

4 minutes ago, brambo23 said:

About the 2 drives: is there any hope of restoring them into the array? Or do I just have to take the loss and reformat them?

Since the drives are currently marked as 'disabled', Unraid has stopped using them and should be emulating them. You can see if the process for handling unmountable drives, described in the online documentation accessible via the Manual link at the bottom of the Unraid GUI, works for the emulated drives.

 

If not, there is a good chance that all (or at least most) of the contents can be recovered from the physical drives.
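
For reference, that documented procedure essentially comes down to starting the array in Maintenance mode and running a read-only xfs_repair check against the emulated disk (the mdX device, not the physical sdX device). The disk number below is only an example, so match it to the slot of the affected disk, and note that recent Unraid releases name the device /dev/mdXp1 instead of /dev/mdX:

# With the array started in Maintenance mode (disk 2 used purely as an example)
xfs_repair -n /dev/md2    # -n = check only, nothing on the disk is modified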

8 hours ago, itimpi said:

Since the drives are currently marked as 'disabled', Unraid has stopped using them and should be emulating them. You can see if the process for handling unmountable drives, described in the online documentation accessible via the Manual link at the bottom of the Unraid GUI, works for the emulated drives.

 

If not, there is a good chance that all (or at least most) of the contents can be recovered from the physical drives.

So out of curiosity,

 

since there are two drives that apparently need a reformat: if I just reformatted those, wouldn't parity restore the data to those drives?

Just now, brambo23 said:

since there are two drives that apparently need a reformat: if I just reformatted those, wouldn't parity restore the data to those drives?

NO.   

 

The drives are not 'needing' a format; they need their corrupt file system to be repaired. If you attempt to format the drives while they are disabled, it will simply format the 'emulated' drives to contain an empty file system and update parity to reflect that, so in effect you wipe all your data. A format operation is NEVER part of a data recovery action unless you WANT to remove the data on the drives you format.

 

 

1 minute ago, itimpi said:

NO.   

 

The drives are not 'needing' a format; they need their corrupt file system to be repaired. If you attempt to format the drives while they are disabled, it will simply format the 'emulated' drives to contain an empty file system and update parity to reflect that, so in effect you wipe all your data. A format operation is NEVER part of a data recovery action unless you WANT to remove the data on the drives you format.

 

 

Understood. So if I replaced those drives with NEW drives, it would repopulate the data, correct?

9 minutes ago, brambo23 said:

So if I replaced those drives with NEW drives, it would repopulate the data, correct?

It would make the NEW drive match the emulated drive. That means if the emulated drive has a corrupt file system, then the rebuilt drive will also have a corrupt file system. That is why I said you should be trying to repair the file system on the emulated drive(s), and it is recommended that you do this before attempting a rebuild.

 

The disabled drives may actually be fine - have you tried unassigning them and then seeing if they mount in Unassigned Devices?
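
If you would rather test from the command line than go through the Unassigned Devices plugin, a read-only mount of the physical disk is a safe way to peek at it without touching parity; sdX1 and the mount point here are placeholders for the actual device and any empty directory:

mkdir -p /mnt/test
mount -o ro,norecovery /dev/sdX1 /mnt/test   # norecovery skips XFS log replay, so nothing gets written
ls /mnt/test                                 # check whether your data is visible
umount /mnt/test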

1 hour ago, itimpi said:

The disabled drives may actually be fine - have you tried unassigning them and then seeing if they mount in Unassigned Devices?


I haven't tried anything yet. I've been busy the last few days and haven't really had time to address it. I'm hoping to read up when I can and then have a plan of attack, so all of this is helpful.

26 minutes ago, JonathanM said:

Did you install the Unassigned Devices plugin?

No, I did not.

 

I'm currently running Unraid version 6.9.2, and the plugin version I just downloaded is for 6.11.0 and up.

 

I'm going to see if I can find a version compatible with 6.9.2

I highly doubt I can upgrade Unraid in this state.

On 3/15/2024 at 6:18 AM, JonathanM said:

Pretty sure you can; I can't think of any changes that would affect your ability to recover.

So I upgraded and installed it.

 

On 3/14/2024 at 8:51 PM, trurl said:

Check filesystem on each of those disks, using the webUI. Capture the output and post it.

For the drive at sdc (252MG):


FS: xfs

Executing file system check: /sbin/xfs_repair -n '/dev/sdc1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
- scan filesystem freespace and inode maps...
sb_fdblocks 3210815618, counted 3234651569
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 3
- agno = 8
- agno = 10
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 11
- agno = 9
- agno = 2
- agno = 12
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

File system corruption detected!

 

For the drive at sdb (RD0B):


FS: xfs

Executing file system check: /sbin/xfs_repair -n '/dev/sdb1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 4
- agno = 5
- agno = 9
- agno = 11
- agno = 12
- agno = 7
- agno = 8
- agno = 3
- agno = 6
- agno = 2
- agno = 10
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

No file system corruption detected!
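
For what it's worth, the -n in both of those runs means "check only", which is why sdc still reports corruption: nothing has actually been repaired yet. An actual repair pass, run from the same webUI check dialog with -n removed or from the console against whatever device the disk currently appears as, would look roughly like this:

xfs_repair /dev/sdc1      # repair pass: xfs_repair is now allowed to fix what it finds
xfs_repair -L /dev/sdc1   # only if xfs_repair refuses to run and explicitly asks for -L; zeroing the log can lose the last in-flight metadata updates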

 


So after this test, I mounted and unmounted the drives (unassigned), and it was able to read the amount of data on each drive. I ran the file system check on both drives, and now both of them say no file system corruption detected.

 

Also, I did buy and install a new card based on the recommendation of @trurl.

Should I be OK to mount these drives back in the array and try to start the array again?

2 minutes ago, trurl said:

Do it again without -n. If it asks for it, use -L.

I ran it again (I didn't change any flags), and now it says no file system corruption detected.

 


FS: xfs

Executing file system check: /sbin/xfs_repair -n '/dev/sdc1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 4
- agno = 7
- agno = 8
- agno = 10
- agno = 5
- agno = 2
- agno = 9
- agno = 12
- agno = 6
- agno = 3
- agno = 11
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

No file system corruption detected!

 

3 minutes ago, trurl said:

 

Can you actually see your data on each of the disks?

 

If you reassign them it will want to rebuild. You will have to New Config them back into the array and rebuild parity.

 

https://docs.unraid.net/unraid-os/manual/storage-management/#reset-the-array-configuration

I can see data on the disks.

 

Do I have to reassign them? Can I just add them back to the disk slots they were originally assigned to?
