Drives became unreadable during parity check - now 2 are unmountable after reboot




Hello,

 

I woke up this morning and found that some of my Dockers weren't working. However, I was busy with work, so I couldn't get to it until the evening. When I looked at my Unraid server, I saw that my 3 most recent drives were listed as missing from the array (the ones in the first screenshot).

 

I have a 12-drive array with 2 parity drives. I have a Silverstone SST-CS308B case with a StarTech 3-drive hot-swap bay and a PCIe SATA expansion card (not sure which one, but I know I bought it from the known-working list over 5 years ago).

 

The 3 drives in question are in that 3-drive hot-swap bay. I disabled the array (and downloaded the diagnostics) and tried to get the drives detected again, but to no avail. I rebooted the server, and now 1 of the drives is usable, but 2 are listed as detected yet unmountable.

 

They've been in the system since December or so. They don't have too much data, but I want to see if there is a way to make them mountable at this point.

 

 

Thanks in advance.
 

 

 

llnnas1337-diagnostics-20240313-1909.zip


I would really recommend avoiding SATA port multipliers:

 

Mar 13 04:02:57 LLNNAS1337 kernel: ata4.15: Port Multiplier detaching
Mar 13 04:02:57 LLNNAS1337 kernel: ahci 0000:03:00.0: FBS is disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.01: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.02: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.03: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.04: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.00: disabled

 

It detached and dropped all connected disks; reboot and post new diags after array start.
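
If you want to confirm this in your own logs, a quick search of the syslog for port multiplier events will show the detach and the ATA links that were disabled as a result (this assumes the standard Unraid syslog location; adjust the path if yours differs):

# Show port multiplier detach events and the links disabled as a consequence
grep -iE "port multiplier|ata[0-9]+\.[0-9]+: disabled" /var/log/syslog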

3 hours ago, JorgeB said:

I would really recommend avoiding SATA port multipliers:

 

Mar 13 04:02:57 LLNNAS1337 kernel: ata4.15: Port Multiplier detaching
Mar 13 04:02:57 LLNNAS1337 kernel: ahci 0000:03:00.0: FBS is disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.01: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.02: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.03: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.04: disabled
Mar 13 04:02:57 LLNNAS1337 kernel: ata4.00: disabled

 

It detached and dropped all connected disks; reboot and post new diags after array start.

I assume that's talking about my PCIe SATA expansion card?

4 minutes ago, brambo23 said:

About the 2 drives: is there any hope of restoring them into the array? Or do I just have to take the loss and reformat them?

Since the drives are currently marked as 'disabled', Unraid has stopped using them and should be emulating them. You can see if the process for handling unmountable drives, described in the online documentation accessible via the Manual link at the bottom of the Unraid GUI, works for the emulated drives.

 

If not, there is a good chance that all (or at least most) of the contents can be recovered from the physical drives.
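
For reference, that documented procedure essentially comes down to starting the array in Maintenance mode and running a read-only xfs_repair check against the emulated disk (the mdX device, not the physical sdX device). The disk number below is only an example, so match it to the slot of the affected disk, and note that recent Unraid releases name the device /dev/mdXp1 instead of /dev/mdX:

# With the array started in Maintenance mode (disk 2 used purely as an example)
xfs_repair -n /dev/md2    # -n = check only, nothing on the disk is modified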

8 hours ago, itimpi said:

Since the drives are currently marked as 'disabled', Unraid has stopped using them and should be emulating them. You can see if the process for handling unmountable drives, described in the online documentation accessible via the Manual link at the bottom of the Unraid GUI, works for the emulated drives.

 

If not, there is a good chance that all (or at least most) of the contents can be recovered from the physical drives.

So out of curiosity,

 

since there are two drives that apparently need a reformat: if I just reformatted those, wouldn't parity restore the data to those drives?

Just now, brambo23 said:

since there are two drives that apparently need a reformat: if I just reformatted those, wouldn't parity restore the data to those drives?

NO.   

 

The drives are not 'needing' a format; they need their corrupt file system to be repaired. If you attempt to format the drives while they are disabled, it will simply format the 'emulated' drives to contain an empty file system and update parity to reflect that, so in effect you wipe all your data. A format operation is NEVER part of a data recovery action unless you WANT to remove the data on the drives you format.

 

 

1 minute ago, itimpi said:

NO.   

 

The drives are not 'needing' a format; they need their corrupt file system to be repaired. If you attempt to format the drives while they are disabled, it will simply format the 'emulated' drives to contain an empty file system and update parity to reflect that, so in effect you wipe all your data. A format operation is NEVER part of a data recovery action unless you WANT to remove the data on the drives you format.

 

 

Understood. So if I replaced those drives with NEW drives, it would repopulate the data, correct?

9 minutes ago, brambo23 said:

So if I replaced those drives with NEW drives, it would repopulate the data, correct?

It would make the NEW drive match the emulated drive. That means if the emulated drive has a corrupt file system, then the rebuilt drive will also have a corrupt file system. That is why I said you should be trying to repair the file system on the emulated drive(s), and it is recommended that you do this before attempting a rebuild.

 

The disabled drives may actually be fine - have you tried unassigning them and then seeing if they mount in Unassigned Devices?
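
If you would rather test from the command line than go through the Unassigned Devices plugin, a read-only mount of the physical disk is a safe way to peek at it without touching parity; sdX1 and the mount point here are placeholders for the actual device and any empty directory:

mkdir -p /mnt/test
mount -o ro,norecovery /dev/sdX1 /mnt/test   # norecovery skips XFS log replay, so nothing gets written
ls /mnt/test                                 # check whether your data is visible
umount /mnt/test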

1 hour ago, itimpi said:

The disabled drives may actually be fine - have you tried unassigning them and then seeing if they mount in Unassigned Devices?


I haven't tried anything yet. I've been busy the last few days and haven't really had time to address it. I'm hoping to read up when I can and then have a plan of attack, so all of this is helpful.

26 minutes ago, JonathanM said:

Did you install the Unassigned Devices plugin?

No, I did not.

 

I'm currently running Unraid version 6.9.2, and the plugin version I just downloaded is for 6.11.0 and up.

 

I'm going to see if I can find a version compatible with 6.9.2

I highly doubt I can upgrade Unraid in this state.

On 3/15/2024 at 6:18 AM, JonathanM said:

Pretty sure you can; I can't think of any changes that would affect your ability to recover.

So I upgraded and installed it.

 

On 3/14/2024 at 8:51 PM, trurl said:

Check filesystem on each of those disks, using the webUI. Capture the output and post it.

For the drive at sdc (252MG):


FS: xfs

Executing file system check: /sbin/xfs_repair -n '/dev/sdc1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
ALERT: The filesystem has valuable metadata changes in a log which is being
ignored because the -n option was used. Expect spurious inconsistencies
which may be resolved by first mounting the filesystem to replay the log.
- scan filesystem freespace and inode maps...
sb_fdblocks 3210815618, counted 3234651569
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 3
- agno = 8
- agno = 10
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 11
- agno = 9
- agno = 2
- agno = 12
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

File system corruption detected!

 

For the drive at sdb (RD0B):


FS: xfs

Executing file system check: /sbin/xfs_repair -n '/dev/sdb1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 4
- agno = 5
- agno = 9
- agno = 11
- agno = 12
- agno = 7
- agno = 8
- agno = 3
- agno = 6
- agno = 2
- agno = 10
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

No file system corruption detected!
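
For what it's worth, the -n in both of those runs means "check only", which is why sdc still reports corruption: nothing has actually been repaired yet. An actual repair pass, run from the same webUI check dialog with -n removed or from the console against whatever device the disk currently appears as, would look roughly like this:

xfs_repair /dev/sdc1      # repair pass: xfs_repair is now allowed to fix what it finds
xfs_repair -L /dev/sdc1   # only if xfs_repair refuses to run and explicitly asks for -L; zeroing the log can lose the last in-flight metadata updates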

 


So after this test, I mounted and unmounted the drives (unassigned), and it was able to read the amount of data on each drive. I ran the file system check on both drives, and now both of them say no file system corruption detected.

 

Also, I did buy and install a new card based on the recommendation of @trurl.

Should I be OK to mount these drives back in the array and try to start the array again?

2 minutes ago, trurl said:

Do it again without -n. If it asks for it, use -L.

I ran it again (I didn't change any flags), and now it says no file system corruption detected.

 


FS: xfs

Executing file system check: /sbin/xfs_repair -n '/dev/sdc1' 2>&1
Phase 1 - find and verify superblock...
Phase 2 - using internal log
- zero log...
- scan filesystem freespace and inode maps...
- found root inode chunk
Phase 3 - for each AG...
- scan (but don't clear) agi unlinked lists...
- process known inodes and perform inode discovery...
- agno = 0
- agno = 1
- agno = 2
- agno = 3
- agno = 4
- agno = 5
- agno = 6
- agno = 7
- agno = 8
- agno = 9
- agno = 10
- agno = 11
- agno = 12
- process newly discovered inodes...
Phase 4 - check for duplicate blocks...
- setting up duplicate extent list...
- check for inodes claiming duplicate blocks...
- agno = 0
- agno = 1
- agno = 4
- agno = 7
- agno = 8
- agno = 10
- agno = 5
- agno = 2
- agno = 9
- agno = 12
- agno = 6
- agno = 3
- agno = 11
No modify flag set, skipping phase 5
Phase 6 - check inode connectivity...
- traversing filesystem ...
- traversal finished ...
- moving disconnected inodes to lost+found ...
Phase 7 - verify link counts...
No modify flag set, skipping filesystem flush and exiting.

No file system corruption detected!

 

3 minutes ago, trurl said:

 

Can you actually see your data on each of the disks?

 

If you reassign them it will want to rebuild. You will have to New Config them back into the array and rebuild parity.

 

https://docs.unraid.net/unraid-os/manual/storage-management/#reset-the-array-configuration

I can see data on the disks.

 

Do I have to reassign them? Can I just add them back to the disk slots they were originally assigned to?
