Randommuch2010 Posted May 15, 2019

Hey everyone, I'm quite inexperienced with Unraid and seem to have got myself in a bit of a jam. I've had two drives marked as faulty, with their contents emulated. One of those drives has stopped working entirely, so I've swapped the failed 2TB drive for a new 3TB. When trying to start the array to begin a rebuild, I kept getting a "Too many wrong disks" error. I've attempted to create a new config, but I'm unsure about data loss and how to recover the files on the drives. Any idea how to undo the new config so I can work on repairing the old setup? TIA
JorgeB Posted May 15, 2019

We need more details on the current config, or better yet, post the diagnostics.
itimpi Posted May 15, 2019 Share Posted May 15, 2019 (edited) You should include the system diagnostics zip file (obtained via Tools->Diagnostics) so we can see what state your system is in. A screen shot of the Main tab would also be a good idea. Doing New Config was NOT what you should have done as by default that leads to data loss unless you know exactly what you are doing. It is definitely not the normal way to recover after a disk failure. Having said that once we know exactly what state your system is in we can give the best advice on how to avoid (or at least minimise) any data loss. You should not attempt to do anything yourself at this stage without further guidance or you may make things worse. Do you have backups of the data that is at risk? Edited May 15, 2019 by itimpi Quote Link to comment
Randommuch2010 Posted May 15, 2019 (Author) (edited)

I've got a copy of the diagnostics from three days ago, so I'm not sure if I can somehow use that to recover from my mistakes. It's been a long day and I had a brain fart. I've not done anything more since hitting "New Config", as I'm too concerned to risk any more errors on my part. Regarding the backup, there is no backup of the at-risk data, but it is only movies and TV shows, so it's not the end of the world if it gets lost. I've attached the diagnostics as requested.

tower-diagnostics-20190515-1746.zip

Edited May 15, 2019 by Randommuch2010 (extra info)
itimpi Posted May 15, 2019 Share Posted May 15, 2019 1 minute ago, Randommuch2010 said: I've got a copy of the diagnostics from three days ago, so I'm not sure if I can somehow use that to recover from my mistakes. It's been a long day and I had a brain fart I've attached the diagnostics as requested. tower-diagnostics-20190515-1746.zip 72.56 kB · 0 downloads Is there any reason you cannot get the current Diagnostics so we can see the current state? Also a screen shot of the Main tab. We need to know exactly what the current state is so old diagnostics are of minimal value. Quote Link to comment
Randommuch2010 Posted May 15, 2019 (Author)

1 minute ago, itimpi said: Is there any reason you cannot get the current Diagnostics so we can see the current state?

The above diagnostics are from today; I also have a copy of the diagnostics from the 12th of May.
itimpi Posted May 15, 2019 Share Posted May 15, 2019 Sorry - I misread you post and should have looked more closely at the diagnostics file name and realised they had to be current. Some extra questions: Which is the drive you are trying to replace? Looking at your screen shot it looks like it could be disk1 or disk4. You said that you had problems with another drive? If so which one? Was this at the same time as the one you are trying to replace? Are you sure the drive to be replaced has really failed? A disk being disabled by Unraid just means that a write to it failed, and these can be caused by external factors and no necessarily the drive itself. Quote Link to comment
Randommuch2010 Posted May 15, 2019 (Author)

Just now, itimpi said: Which is the drive you are trying to replace?

The drive I'm trying to replace is an older Seagate 2TB (ST2000DM001-9YN164_S2F01HQB), and I'm replacing it with the new 3TB Western Digital in the disk4 slot (WDC_WD30EFRX-68EUZN0_WD-WCC4N5DRV984). The problem drive was the Seagate 2TB; it simply dropped from the array and hasn't recovered. It's not found by the SATA controller on the motherboard, and I've tried swapping the power and SATA cables to a known-good pair with no joy.
itimpi Posted May 15, 2019 Share Posted May 15, 2019 (edited) If the old drive cannot be seen at the BIOS or controller level then it probably really has failed I think the following will work, but you might want to wait for @johnnie.black to confirm as he tends to be the best authority on such things, and may well have an alternative suggestion.: Tick the Parity is valid checkbox on the main tab. Start the array to get all the current disks recognised. We need to do this because you did the New Config which made Unraid forget all current assignments. The replacement disk will probably come up as unmountable. Because we ticked the Parity is valid checkbox Unraid should not be doing anything to the disks at this point other than recording their serial numbers. Stop the array and unassign the replacement disk from disk4 Start the array which should now say it is emulating the missing disk. This step causes Unraid to 'forget' the replacement disk's serial number. You need this to happen so a later step works correctly. Ideally at this point disk4 should shows as mounted OK (albeit it is being emulated by the combination of the other disks plus parity) and you can look at its contents to check they are what you expect. This is what will eventually be rebuilt to the replacement disk so if it does not look correct (or comes up unmountable) at this stage stop and ask for help Stop the array and re-assign the replacement disk to disk4 Start the array. This time it should say that is going to rebuild the contents of disk4. Let this run to completion and you should be good to go with all your data intact. It would be a good idea to take a backup of your flash drive at this point by clicking on the flash drive on the Main tab and selecting the option to make a Flash Backup. This is best practise to do any time you make a configuration change just in case the flash drive ever fails and you need to switch to a new one. 
If at any stage something unexpected happens stop and ask for help to avoid making things worse. Do you have backups of any critical data on your Unraid server? You should always have these as data loss can occur for a wide variety of reasons not all of which Unraid can protect you from. Edited May 15, 2019 by itimpi Quote Link to comment
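The flash-backup step above can also be done by hand from the console. A hedged sketch — the GUI "Flash Backup" option is the supported route, and the destination share below is an assumption, not something from this thread:

```shell
# Assumptions: a user share exists at /mnt/user/backups, and the flash
# drive is mounted at /boot (standard on Unraid systems).
DEST="/mnt/user/backups/flash-$(date +%Y%m%d).zip"
echo "$DEST"
# On the server itself you would then run:
#   cd /boot && zip -r "$DEST" .
```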
JorgeB Posted May 15, 2019 Share Posted May 15, 2019 1 hour ago, Randommuch2010 said: I've had two drives marked as faulty, contents emulated This part concerns me, as you only have single parity, but assuming parity is valid and only one drive is missing, this would be the best bet of recovering that data: -Leave the new config as is -Important - leave the browser on that page, the "Main" page. -Open an SSH session/use the console and type (don't copy/paste directly from the forum, as sometimes it can insert extra characters): mdcmd set invalidslot 4 29 -Back on the GUI and without refreshing the page, just start the array, do not check the "parity is already valid" box (GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done), disk4 will start rebuilding, disk should mount immediately but if it's unmountable don't format, wait for the rebuild to finish and then run a filesystem check Quote Link to comment
Randommuch2010 Posted May 15, 2019 (Author)

42 minutes ago, johnnie.black said: ...this would be the best bet for recovering that data: ... mdcmd set invalidslot 4 29 ...

I've entered that into the CLI, and it hasn't spat out any sort of confirmation for me; I'm unsure if that's normal or not. It's also still saying "Parity disk(s) content will be overwritten" when I try to start the array. Do I proceed anyway, or is there something I'm missing?
JorgeB Posted May 15, 2019 Share Posted May 15, 2019 1 minute ago, Randommuch2010 said: I've entered that into the CLI, and it hasn't spat out any sort of conformation for me, That's normal 1 minute ago, Randommuch2010 said: It's also still saying "Parity disk(s) content will be overwritten" 2 minutes ago, Randommuch2010 said: (GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done) Quote Link to comment
Randommuch2010 Posted May 15, 2019 (Author)

1 minute ago, johnnie.black said: That's normal

Including the parity overwrite warning? If so, presumably I'm good to hit "Proceed".
JorgeB Posted May 15, 2019 Share Posted May 15, 2019 5 minutes ago, johnnie.black said: (GUI will still show that data on parity disk(s) will be overwritten, this is normal as it doesn't account for the invalid slot command, but they won't be as long as the procedure was correctly done) Quote Link to comment
Randommuch2010 Posted May 15, 2019 (Author)

Whoops, sorry about that! Just hit Proceed, and it's cracking along with the parity sync/rebuild as of about two minutes ago. Shares are all immediately reachable and the data doesn't seem to have disappeared. I'll let it continue running throughout the night and check back in tomorrow. I can't thank you enough for your help!
Randommuch2010 Posted May 17, 2019 (Author)

On 5/15/2019 at 7:02 PM, johnnie.black said: ...disk4 will start rebuilding, disk should mount immediately but if it's unmountable don't format, wait for the rebuild to finish and then run a filesystem check

The rebuild has been marked as complete, but disk4 is still showing as "No filesystem" and, as a result, I can't run a filesystem check. The only option it's currently giving me is to format it. Any ideas?
Randommuch2010 Posted May 17, 2019 (Author)

Attached new diags.

tower-diagnostics-20190517-1832.zip
JorgeB Posted May 17, 2019 Share Posted May 17, 2019 Disk3 is failing and there were read errors during disk4's rebuild, so there will be data corruption on the rebuilt disk4, you can still try to see if xfs_repair can fix the filesystem, start the array in maintenance mode and type: xfs_repair -v /dev/md4 Quote Link to comment
Randommuch2010 Posted May 17, 2019 (Author)

I got this back when I attempted to run that command:

Phase 1 - find and verify superblock...
        - block cache size set to 326760 entries
Phase 2 - using internal log
        - zero log...
zero_log: head block 271366 tail block 271362
ERROR: The filesystem has valuable metadata changes in a log which needs to be replayed. Mount the filesystem to replay the log, and unmount it before re-running xfs_repair. If you are unable to mount the filesystem, then use the -L option to destroy the log and attempt a repair. Note that destroying the log may cause corruption -- please attempt a mount of the filesystem before doing this.
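For anyone following along: with the array in maintenance mode the emulated disk cannot be mounted to replay the log, so the remaining option the error message describes is -L. A hedged sketch of the usual sequence (note that -L destroys the log, so recent metadata changes may be lost):

```shell
# Assumptions: disk4 is the emulated device /dev/md4 and the array is
# started in maintenance mode.
DEV="/dev/md4"
DRY_RUN="xfs_repair -n $DEV"     # -n: report problems, change nothing
REPAIR="xfs_repair -L -v $DEV"   # -L: zero the log; -v: verbose
echo "$DRY_RUN"
echo "$REPAIR"
# Run the dry run first on the Unraid console, review its output, then
# run the repair.
```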
Randommuch2010 Posted May 17, 2019 (Author)

The last phase appears to have failed:

Phase 6 - check inode connectivity...
        - resetting contents of realtime bitmap and summary inodes
        - traversing filesystem ...
entry ".." in directory inode 100 points to non-existent inode 538430905
bad hash table for directory inode 100 (no data entry): rebuilding
rebuilding directory inode 100
bad hash table for directory inode 140 (no data entry): rebuilding
rebuilding directory inode 140
entry ".." in directory inode 141 points to non-existent inode 4050559755
bad hash table for directory inode 141 (no data entry): rebuilding
rebuilding directory inode 141
entry ".." in directory inode 16935984 points to non-existent inode 552014935
bad hash table for directory inode 16935984 (no data entry): rebuilding
rebuilding directory inode 16935984
bad hash table for directory inode 67099682 (no data entry): rebuilding
rebuilding directory inode 67099682
xfs_repair: phase6.c:1376: longform_dir2_rebuild: Assertion `done' failed.
Aborted
JorgeB Posted May 17, 2019 Share Posted May 17, 2019 2 minutes ago, Randommuch2010 said: Last phase appears to have failed; Yes, it's a xfs_repair bug, you need to update to v6.7 and run it again. Quote Link to comment
Randommuch2010 Posted May 17, 2019 (Author)

Last few lines of the xfs_repair:

resetting inode 817376329 nlinks from 5 to 3
resetting inode 817376333 nlinks from 4 to 3
Maximum metadata LSN (7:785632) is ahead of log (1:2).
Format log to cycle 10.
cache_purge: shake on cache 0x5231e0 left 5 nodes!?
cache_purge: shake on cache 0x5231e0 left 5 nodes!?
cache_zero_check: refcount is 1, not zero (node=0x15319c092410)
cache_zero_check: refcount is 1, not zero (node=0x153184091410)
cache_zero_check: refcount is 1, not zero (node=0x153184089060)
cache_zero_check: refcount is 1, not zero (node=0x15318408ee10)
cache_zero_check: refcount is 1, not zero (node=0x15319c00b210)
done

Unsure if it usually runs through in only a couple of minutes; disk4 is still showing as unmountable.
JorgeB Posted May 17, 2019 Share Posted May 17, 2019 Start the array normally and post new diags grabbed after that. Quote Link to comment