Starting array with failed disk after lost disk assignments

October 23, 2018

Good Morning,

I had a disk fail, which I replaced. However, sometime before the parity sync/rebuild finished there was a power outage. When I booted the Unraid server back up, the disk assignments were lost. Through my research it sounds like it's just critical that I get the parity disk correct; the other disks can be put in any slot without negative consequence. So I determined which disk was the parity, put it in the parity slot, and put my other disks in the disk # slots.

What I am unsure of is whether I should now start the array as normal, start it with "parity is already valid" selected, or do something else entirely. What's throwing me off is that all of the disks are recognized as a "New Device" right now (blue square). I want it to rebuild the data on the failed disk, and trust the data on the others and the parity. How do I go about this without destroying everything?

Thanks!

October 23, 2018

Community Expert

25 minutes ago, daemian said:

Through my research it sounds like it's just critical that I get the parity disk correct

Yes, in a normal situation. During a rebuild it's not so simple. Start by posting the release you're using, as well as whether you're on single or dual parity, or post the diagnostics.

October 23, 2018

Author

Sure, version 6.5.3 single parity config. Diagnostics attached.

Thanks

dt-ur01-diagnostics-20181023-0850.zip

October 23, 2018

Community Expert

A couple of questions:

Do you know what disk you were rebuilding, not the old disk#, the actual disk serial or current disk#?

Is parity the 6TB Hitachi or one of the currently assigned data disks?

October 23, 2018

Author

Quote

Do you know what disk you were rebuilding, not the old disk#, the actual disk serial or current disk#?

I am pretty certain it is WCC4N0334109. I say that because I put all of the drives in as data drives and started the array (with no parity). The other 3 looked fine, but that one showed "Unmountable: No file system". I presume that would be because the power failure occurred before the parity sync finished.

Quote

Is parity the 6TB Hitachi or one of the currently assigned data disks?

The 6TB drive is the parity.

Edited October 23, 2018 by daemian

October 23, 2018

Community Expert

1 minute ago, daemian said:

I am pretty certain it is WCC4N0334109. I say that because I put all of the drives in as data drives and started the array (with no parity). The other 3 looked fine, but that one showed "Unmountable: No file system". I presume that would be because the power failure occurred before the parity sync finished.

If parity is the 6TB then that's likely it, though it would have been best if the data disks were mounted read-only, but this should still work:

-Tools -> New Config -> Retain current configuration: All -> Apply
-Assign any missing disk(s) like parity
-Important - After checking the assignments leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (I'll assume the disk to rebuild is still disk1; if not, adjust the command):

mdcmd set invalidslot 1 29

-Back in the GUI, without refreshing the page, start the array. Do not check the "parity is already valid" box. Disk1 will start rebuilding. The disk should mount immediately, but if it's unmountable don't format it: wait for the rebuild to finish and then run a filesystem check.
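
As a small guard against typos in the console step above, the command line can be assembled and printed for inspection before it is typed. This is only a sketch: `invalidslot_cmd` is a hypothetical helper name, and the trailing `29` is copied verbatim from the post rather than derived here.

```shell
#!/bin/sh
# Hypothetical helper: print the mdcmd line for a given data-disk slot
# so it can be checked before being typed or pasted into the console.
# The trailing "29" is taken as-is from the post above.
invalidslot_cmd() {
    slot=$1
    case "$slot" in
        ''|*[!0-9]*)
            echo "slot must be a number, e.g. 1 for disk1" >&2
            return 1
            ;;
    esac
    printf 'mdcmd set invalidslot %s 29\n' "$slot"
}

# Prints the command for disk1; nothing is executed against the array.
invalidslot_cmd 1
```

The helper only prints text, so it is safe to run anywhere; the actual `mdcmd` call still has to be entered on the server as described in the steps.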

October 23, 2018

Author

So I just want to double check, this is what the screen looks like now:

image.png.c7587283b14da9b87d600bb41cee54ce.png

I have issued this command at the CLI

I have not refreshed or left the page. Now I am going to start the array, without the "Parity is already valid" selected.

Is that all correct?

Thank you for your help!

October 23, 2018

Community Expert

14 minutes ago, daemian said:

Is that all correct?

Yes

October 23, 2018

Author

Sorry to be a pest. When I click start it's warning me "Parity disk(s) contents will be overwritten". You're sure, right?

October 23, 2018

Community Expert

Yes, it's normal. The GUI doesn't take the invalidslot command into account. As long as you typed the command correctly and didn't refresh the GUI, Unraid won't touch parity and will start rebuilding disk1 instead.

October 24, 2018

Author

OK - so the rebuild is completed. Now in the GUI disk 1 shows as "Unmountable: No file system"

October 24, 2018

Community Expert

59 minutes ago, daemian said:

OK - so the rebuild is completed. Now in the GUI disk 1 shows as "Unmountable: No file system"

A rebuild does not fix an “unmountable” problem as it works at the physical sector level, not the file system level. You normally need to run the file system repair tools to fix the unmountable state.
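
A toy sketch of what "sector level" means here, assuming plain XOR single parity and made-up one-byte "disks": the rebuild reproduces whatever bytes the parity arithmetic yields and has no notion of whether those bytes form a valid filesystem.

```shell
#!/bin/sh
# Toy model: one byte per disk, single parity computed as XOR.
# All values are illustrative, not taken from the thread.
d1=165; d2=60; d3=15
parity=$(( d1 ^ d2 ^ d3 ))

# Rebuild disk1 from parity and the surviving disks. Only bit arithmetic
# is involved; nothing here inspects or repairs filesystem structures.
rebuilt=$(( parity ^ d2 ^ d3 ))
echo "original=$d1 rebuilt=$rebuilt"
```

If the bytes that parity describes are already a damaged filesystem, the rebuild faithfully restores that damage, which is why a separate filesystem repair is needed afterwards.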

October 24, 2018

Community Expert

1 hour ago, daemian said:

OK - so the rebuild is completed. Now in the GUI disk 1 shows as "Unmountable: No file system"

Possibly the result of starting the disks read-write without parity before, or worse, parity is not in sync. Either way, try a filesystem check:

https://wiki.unraid.net/Check_Disk_Filesystems#Drives_formatted_with_XFS

or

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

October 24, 2018

Community Expert

P.S. I didn't notice at first since I didn't check the complete syslog but you also have problems with your cache pool, there are read and write errors on both devices, but mainly cache1:

Oct 23 08:04:35 dt-ur01 kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 166, rd 1, flush 0, corrupt 0, gen 0
Oct 23 08:04:35 dt-ur01 kernel: BTRFS info (device sdi1): bdev /dev/sdh1 errs: wr 863327568, rd 506341990, flush 65261822, corrupt 0, gen 0

These are hardware errors, and with SSDs they are usually the result of bad cables. After replacing the cables, run a scrub and check that all errors were corrected, though if you're using any NOCOW shares there might be some undetected corruption there.
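
For comparing the counters before and after a cable swap, the syslog lines above can be reduced to just the device and its error counts. This is a minimal sketch: `btrfs_errs` is a hypothetical helper name and the one-line output format is an arbitrary choice for this example.

```shell
#!/bin/sh
# Hypothetical helper: read kernel log lines on stdin and, for each
# "BTRFS info ... errs:" line, print: device wr rd flush corrupt gen
btrfs_errs() {
    sed -n 's|.*bdev \([^ ]*\) errs: wr \([0-9]*\), rd \([0-9]*\), flush \([0-9]*\), corrupt \([0-9]*\), gen \([0-9]*\).*|\1 \2 \3 \4 \5 \6|p'
}

# One of the lines quoted in the post above.
line='Oct 23 08:04:35 dt-ur01 kernel: BTRFS info (device sdi1): bdev /dev/sdi1 errs: wr 166, rd 1, flush 0, corrupt 0, gen 0'
printf '%s\n' "$line" | btrfs_errs
```

On a running system the same counters can also be read with `btrfs dev stats` pointed at the pool's mount point (for example `/mnt/cache`, which is an assumption about this server's layout).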

October 25, 2018

Author

Thanks for pointing out the cache drive - I will check that out when I can.

For the original issue, when I try to run xfs_repair I get the following error:

root@dt-ur01:~# xfs_repair -v /dev/md1
Phase 1 - find and verify superblock...
        - block cache size set to 2290880 entries
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)
ERROR: The log head and/or tail cannot be discovered. Attempt to mount the
filesystem to replay the log or use the -L option to destroy the log and
attempt a repair.

Do I try it with the -L option? It sounds like that may result in [more] data loss, but perhaps I don't really have any other option?

Thank you again for all of your time and assistance.

October 25, 2018

Community Expert

9 minutes ago, daemian said:

Do I try it with the -L option?

Yes, usually there's no data loss.

October 25, 2018

Author

Well, -L didn't get me any further:

root@dt-ur01:~# xfs_repair -Lv /dev/md1
Phase 1 - find and verify superblock...
        - block cache size set to 2290880 entries
Phase 2 - using internal log
        - zero log...
Log inconsistent (didn't find previous header)
failed to find log head
zero_log: cannot find log head/tail (xlog_find_tail=5)

October 25, 2018

Community Expert

This means the rebuilt disk has more serious corruption: either parity wasn't valid before, or this is the result of mounting the disks read-write before rebuilding. Like I mentioned, the disks should have been mounted read-only, since mounting read-write always causes some filesystem housekeeping writes that won't be reflected in the existing parity, since it wasn't assigned. With btrfs you'll usually never survive this, reiserfs usually survives without issues, and xfs should survive most times but other times might not.
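
The effect of writes that parity never saw can be sketched with a toy model (one byte per disk, plain XOR single parity, made-up values). It is only an illustration of the reasoning, not a model of what happened on this particular array.

```shell
#!/bin/sh
# Toy model: one byte per disk, single XOR parity. Values are illustrative.
d1=165; d2=60; d3=15
parity=$(( d1 ^ d2 ^ d3 ))

# disk2 is mounted read-write while parity is unassigned: its contents
# change (here a single bit flips) but parity is not updated to match.
d2_after=$(( d2 ^ 1 ))

# Rebuilding disk1 from the now-stale parity yields the wrong byte.
rebuilt=$(( parity ^ d2_after ^ d3 ))
echo "expected=$d1 rebuilt=$rebuilt"
```

Every bit changed on a surviving disk without a matching parity update flips the corresponding bit on the rebuilt disk, so even small housekeeping writes can land in filesystem metadata of the disk being rebuilt.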

October 26, 2018

Community Expert

One thing I forgot to mention: I've seen the error above as a result of hardware issues before. Looking at your diags I see you're using the onboard Intel controller, and that's good, but it's set to IDE mode. Change it to AHCI in the BIOS and try xfs_repair again.

October 26, 2018

Author

Thanks johnnie. I believe I got the controller running in AHCI mode now, but xfs_repair still fails the same way. How could I confirm that it is now running in AHCI?

October 26, 2018

Community Expert

Post current diags and I can check.
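
Another way to check from the console, offered as a rough sketch rather than the method used in this thread, is to look at which kernel driver is bound to the storage controller in `lspci -k` output. `storage_drivers` is a hypothetical helper name; the assumption is that `ahci` indicates AHCI mode, while `ata_piix` is the driver typically seen when an Intel controller is left in IDE mode.

```shell
#!/bin/sh
# Hypothetical helper: read `lspci -k` output on stdin and print the PCI
# address and bound kernel driver of each SATA/IDE storage controller.
storage_drivers() {
    awk '
        # Device header lines start at column 0 with the PCI address.
        /^[0-9a-f]/ { ctrl = ($0 ~ /SATA controller|IDE interface/) ? $1 : "" }
        # Indented detail lines belong to the most recent device header.
        ctrl != "" && /Kernel driver in use:/ { print ctrl, $NF }
    '
}

# On the server itself this would be used as:
#   lspci -k | storage_drivers
```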

October 26, 2018

Author

dt-ur01-diagnostics-20181026-1222.zip

October 26, 2018

Community Expert

It's correct now. A couple more things you can try: upgrade to v6.6.2, since it has a newer xfs_repair release, and if that still fails connect that disk to another PC. It would lose sync with parity but it might be worth a try.

October 26, 2018

Author

Thanks Johnnie.

I upgraded to 6.6.2 and tried xfs_repair again. Still no luck. Putting this disk in another machine is not really an option for me with this one (I am remote to the site, and there is not much in the way of resources there).

I think I may need to bite the bullet and just format the drive, conceding that the data from that drive is lost. It's probably not really that big of a deal. Obviously not ideal, but I don't think I have much other choice. Would I just format that drive and then run a parity check to be sure everything is OK?

October 27, 2018

Community Expert

Just formatting is enough. Parity will be updated, and after that the regular scheduled checks suffice.
