Randommuch2010

Undoing New Config


Hey everyone,

 

I'm quite inexperienced with Unraid, and seemingly have myself in a bit of a jam.

 

I've had two drives marked as faulty, with contents emulated. One of those drives has stopped working entirely, and I've swapped the failed 2TB drive for a new 3TB. When trying to start the array to begin a rebuild, I kept getting a "Too many wrong disks" error.

 

I've attempted to create a new config, but I'm unsure about data loss and how to recover the files on the drives. Any idea how to undo the new config so I can work on repairing the old setup?

 

TIA


We need more details on the current config, or better yet post the diagnostics.

Posted (edited)

You should include the system diagnostics zip file (obtained via Tools->Diagnostics) so we can see what state your system is in. A screenshot of the Main tab would also be a good idea.

 

Doing a New Config was NOT what you should have done, as by default it leads to data loss unless you know exactly what you are doing. It is definitely not the normal way to recover after a disk failure. Having said that, once we know exactly what state your system is in, we can give the best advice on how to avoid (or at least minimise) any data loss. You should not attempt anything yourself at this stage without further guidance, or you may make things worse.

 

Do you have backups of the data that is at risk? 

Edited by itimpi

Posted (edited)

I've got a copy of the diagnostics from three days ago, but I'm not sure if I can somehow use that to recover from my mistakes.

It's been a long day and I had a brain fart :/ 

 

I've not done anything more since hitting "New Config", as I'm too concerned to risk any more errors on my part. Regarding the backup, there is no backup of the at-risk data, but it is only Movies and TV shows, so it's not the end of the world if it gets lost.

 

I've attached the diagnostics as requested.

tower-diagnostics-20190515-1746.zip

Edited by Randommuch2010
Extra info

1 minute ago, Randommuch2010 said:

I've got a copy of the diagnostics from three days ago, so I'm not sure if I can somehow use that to recover from my mistakes.

It's been a long day and I had a brain fart :/ 

 

I've attached the diagnostics as requested.

tower-diagnostics-20190515-1746.zip

Is there any reason you cannot get the current diagnostics so we can see the current state? Also a screenshot of the Main tab. We need to know exactly what the current state is, so old diagnostics are of minimal value.


The above diagnostics are from today; I also have a copy of the diagnostics from the 12th of May.

1 minute ago, itimpi said:

Is there any reason you cannot get the current Diagnostics so we can see the current state?   Also a screen shot of the Main tab.   We need to know exactly what the current state is so old diagnostics are of minimal value.

[Screenshot of the Main tab attached]


Sorry - I misread your post; I should have looked more closely at the diagnostics file name and realised they were current.

 

Some extra questions:

  • Which is the drive you are trying to replace? Looking at your screenshot, it looks like it could be disk1 or disk4.
  • You said that you had problems with another drive? If so, which one? Was this at the same time as the one you are trying to replace?
  • Are you sure the drive to be replaced has really failed? A disk being disabled by Unraid just means that a write to it failed, and that can be caused by external factors, not necessarily the drive itself.
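On that last point, a quick hedged way to check whether a disabled drive is actually healthy is to query its SMART data from the console (smartctl is normally available on Unraid; `sdX` is a placeholder for the real device name, which you can find on the Main tab):

```shell
# Overall SMART health verdict for the drive:
smartctl -H /dev/sdX

# Key failure indicators: reallocated, pending, and uncorrectable sectors.
smartctl -A /dev/sdX | grep -Ei 'reallocated|pending|uncorrect'
```

Non-zero reallocated or pending sector counts suggest the drive itself is failing; a clean SMART report with a disabled disk points more towards cabling, power, or controller issues.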

Just now, itimpi said:

Sorry - I misread your post; I should have looked more closely at the diagnostics file name and realised they were current.

 

Some extra questions:

  • Which is the drive you are trying to replace? Looking at your screenshot, it looks like it could be disk1 or disk4.
  • You said that you had problems with another drive? If so, which one? Was this at the same time as the one you are trying to replace?
  • Are you sure the drive to be replaced has really failed? A disk being disabled by Unraid just means that a write to it failed, and that can be caused by external factors, not necessarily the drive itself.

 

The drive I'm trying to replace is an older Seagate 2TB (ST2000DM001-9YN164_S2F01HQB), and I'm replacing it with the new 3TB Western Digital in the disk4 slot (WDC_WD30EFRX-68EUZN0_WD-WCC4N5DRV984). The problem drive was the Seagate 2TB; it simply dropped from the array and hasn't recovered. It's not found by the SATA controller on the motherboard, and I've tried swapping the power and SATA cables for a known-good pair with no joy.

Posted (edited)

If the old drive cannot be seen at the BIOS or controller level then it probably really has failed :(

 

I think the following will work, but you might want to wait for @johnnie.black to confirm, as he tends to be the best authority on such things and may well have an alternative suggestion:

  • Tick the "Parity is valid" checkbox on the Main tab.
  • Start the array to get all the current disks recognised. We need to do this because the New Config made Unraid forget all current assignments. The replacement disk will probably come up as unmountable. Because we ticked the "Parity is valid" checkbox, Unraid should not be doing anything to the disks at this point other than recording their serial numbers.
  • Stop the array and unassign the replacement disk from disk4.
  • Start the array, which should now say it is emulating the missing disk. This step causes Unraid to 'forget' the replacement disk's serial number; you need this to happen so a later step works correctly. Ideally, at this point disk4 should show as mounted OK (albeit emulated by the combination of the other disks plus parity) and you can look at its contents to check they are what you expect. This is what will eventually be rebuilt to the replacement disk, so if it does not look correct (or comes up unmountable) at this stage, stop and ask for help.
  • Stop the array and re-assign the replacement disk to disk4.
  • Start the array. This time it should say that it is going to rebuild the contents of disk4. Let this run to completion and you should be good to go with all your data intact.
  • It would be a good idea to take a backup of your flash drive at this point by clicking on the flash drive on the Main tab and selecting the option to make a Flash Backup. This is best practice any time you make a configuration change, just in case the flash drive ever fails and you need to switch to a new one.
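If you prefer the console, a minimal sketch of that last flash-backup step (assuming the flash drive is mounted at /boot, as is standard on Unraid; the destination path is only an example and should ideally be somewhere off the server):

```shell
# Archive the entire Unraid flash drive into a dated tarball.
# /boot is the standard mount point for the Unraid USB stick;
# the target directory below is a hypothetical example.
tar -czf /mnt/user/backups/flash-$(date +%Y%m%d).tar.gz -C /boot .
```
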

If at any stage something unexpected happens stop and ask for help to avoid making things worse.

 

Do you have backups of any critical data on your Unraid server?    You should always have these as data loss can occur for a wide variety of reasons not all of which Unraid can protect you from.

Edited by itimpi

1 hour ago, Randommuch2010 said:

I've had two drives marked as faulty, contents emulated

This part concerns me, as you only have single parity, but assuming parity is valid and only one drive is missing, this would be the best bet for recovering that data:

 

-Leave the new config as is

-Important - leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):

mdcmd set invalidslot 4 29

-Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box (the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalidslot command, and parity won't be overwritten as long as the procedure was done correctly). disk4 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format it; wait for the rebuild to finish and then run a filesystem check.
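For reference, `mdcmd` prints nothing on success. A hedged way to confirm the command was accepted (assuming your Unraid release echoes mdcmd calls into the system log, as recent versions do):

```shell
# Look for the invalidslot command in the syslog; a matching line
# indicates the md driver received it.
grep -i 'invalidslot' /var/log/syslog
```
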

 

42 minutes ago, johnnie.black said:

This part concerns me, as you only have single parity, but assuming parity is valid and only one drive is missing, this would be the best bet of recovering that data:

 

-Leave the new config as is

-Important - leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):


mdcmd set invalidslot 4 29

-Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box (the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalidslot command, and parity won't be overwritten as long as the procedure was done correctly). disk4 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format it; wait for the rebuild to finish and then run a filesystem check.

 

I've entered that into the CLI and it hasn't spat out any sort of confirmation, so I'm unsure if that's normal or not. It's also still saying "Parity disk(s) content will be overwritten" when I try to start the array. Do I proceed anyway, or is there something I'm missing?

1 minute ago, Randommuch2010 said:

I've entered that into the CLI, and it hasn't spat out any sort of confirmation,

That's normal

 

1 minute ago, Randommuch2010 said:

It's also still saying "Parity disk(s) content will be overwritten"

 

2 minutes ago, Randommuch2010 said:

(GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done)

 

1 minute ago, johnnie.black said:

That's normal

 

 

 

Including the parity overwrite warning? If so, presumably I'm good to hit "Proceed".

5 minutes ago, johnnie.black said:

(GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done)

 


Whoops, sorry about that! Just hit proceed and it's cracking along with the parity sync/rebuild as of about two minutes ago. Shares are all immediately reachable and the data doesn't seem to have disappeared. I'll let it continue running throughout the night and check back in tomorrow.

 

I can't thank you enough for your help!

On 5/15/2019 at 7:02 PM, johnnie.black said:

This part concerns me, as you only have single parity, but assuming parity is valid and only one drive is missing, this would be the best bet of recovering that data:

 

-Leave the new config as is

-Important - leave the browser on that page, the "Main" page.

-Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters):


mdcmd set invalidslot 4 29

-Back in the GUI, without refreshing the page, just start the array. Do not check the "parity is already valid" box (the GUI will still show that data on the parity disk(s) will be overwritten; this is normal, as it doesn't account for the invalidslot command, and parity won't be overwritten as long as the procedure was done correctly). disk4 will start rebuilding. The disk should mount immediately, but if it's unmountable, don't format it; wait for the rebuild to finish and then run a filesystem check.

 

The rebuild has been marked as complete, but disk4 is still showing as "No filesystem", and as a result I can't run a filesystem check. The only option it's giving me currently is to format it. Any ideas?


Disk3 is failing and there were read errors during disk4's rebuild, so there will be data corruption on the rebuilt disk4. You can still try to see if xfs_repair can fix the filesystem; start the array in maintenance mode and type:

 

xfs_repair -v /dev/md4

 


Phase 1 - find and verify superblock...
        - block cache size set to 326760 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 271366 tail block 271362
ERROR: The filesystem has valuable metadata changes in a log which needs to
be replayed.  Mount the filesystem to replay the log, and unmount it before
re-running xfs_repair.  If you are unable to mount the filesystem, then use
the -L option to destroy the log and attempt a repair.
Note that destroying the log may cause corruption -- please attempt a mount
of the filesystem before doing this.

 

Got this back when I attempted to run that command.
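The message above is xfs_repair asking for the journal to be replayed by mounting the filesystem before the repair is re-run. A minimal sketch of that sequence (the mount point /tmp/disk4 is only an example):

```shell
# Mount the rebuilt disk so XFS replays its journal, then unmount
# and re-run the repair.
mkdir -p /tmp/disk4
mount /dev/md4 /tmp/disk4
umount /tmp/disk4
xfs_repair -v /dev/md4

# Only if the mount itself fails, fall back to zeroing the log,
# which may lose the pending metadata changes:
# xfs_repair -vL /dev/md4
```
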


Last phase appears to have failed:

 

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
entry ".." in directory inode 100 points to non-existent inode 538430905
bad hash table for directory inode 100 (no data entry): rebuilding
rebuilding directory inode 100
bad hash table for directory inode 140 (no data entry): rebuilding
rebuilding directory inode 140
entry ".." in directory inode 141 points to non-existent inode 4050559755
bad hash table for directory inode 141 (no data entry): rebuilding
rebuilding directory inode 141
entry ".." in directory inode 16935984 points to non-existent inode 552014935
bad hash table for directory inode 16935984 (no data entry): rebuilding
rebuilding directory inode 16935984
bad hash table for directory inode 67099682 (no data entry): rebuilding
rebuilding directory inode 67099682
xfs_repair: phase6.c:1376: longform_dir2_rebuild: Assertion `done' failed.
Aborted

2 minutes ago, Randommuch2010 said:

Last phase appears to have failed;

Yes, it's an xfs_repair bug; you need to update to v6.7 and run it again.


Last few lines of the xfs_repair output:

 

"resetting inode 817376329 nlinks from 5 to 3
resetting inode 817376333 nlinks from 4 to 3
Maximum metadata LSN (7:785632) is ahead of log (1:2).
Format log to cycle 10.
cache_purge: shake on cache 0x5231e0 left 5 nodes!?
cache_purge: shake on cache 0x5231e0 left 5 nodes!?
cache_zero_check: refcount is 1, not zero (node=0x15319c092410)
cache_zero_check: refcount is 1, not zero (node=0x153184091410)
cache_zero_check: refcount is 1, not zero (node=0x153184089060)
cache_zero_check: refcount is 1, not zero (node=0x15318408ee10)
cache_zero_check: refcount is 1, not zero (node=0x15319c00b210)
done"

 

Unsure if it usually runs through in only a couple of minutes; disk4 is still showing as unmountable.


Start the array normally and post new diags grabbed after that.

