November 11, 2019 Hello All, I am in urgent need of some advice. I have been a long-time Unraid user with 0 issues, but the past few days have been really bad. I will give some background for context; it may or may not be relevant to the current situation.

A few weeks ago I wanted to expand my array. I had 2x6TB (Parity and disk 1) and 4x3TB drives (Disk 2-5), giving me a total of 18TB. I bought 2 more 6TB drives that I wanted to put in new slots. I put them in and started preclearing them. During the preclear one of the new 6TB drives came up with SMART errors (sector reallocation). I shut the server down, took that drive out and did an RMA. However, when the machine rebooted Disk 5 was in a red state but with no SMART errors. I did a quick Google and tried to unassign it and reassign it to force a rebuild. While it was rebuilding it kept erroring. I assumed it was a faulty drive, took it out and replaced it with the new 6TB drive. The rebuild completed fine and everything was OK, giving me a new array size of 21TB, until yesterday.

Yesterday I tried to restart one of my docker containers. The Docker tab was missing and there was an error, so I rebooted. When it tried to start it couldn't find the USB key to boot from. I shut it down, put the key in my laptop and it appeared fine. I had a spare USB drive, so I copied the old key to the new one and it started fine. After doing the licence dance the array started fine and it started doing a parity check. After a few hours I checked back on it and it had quite a few thousand errors. I let it complete overnight and it had ~22000 corrections.

I noticed that one of my dockers wasn't working correctly, so I rebooted again, and now it says disk 5 is "Unmountable: no file system" and wants me to format it. What's even more worrying is that it's not emulating disk 5 and the total array size has dropped back to 15TB, as if the new disk wasn't even part of the array.

My question is: what do I do? Obviously I don't want to lose any data, but I'm not sure if that's possible. Do I format? If I do, will it rebuild disk 5 or will it just wipe it? Please help.

Edited November 11, 2019 by Hullscotty1986
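The three array sizes quoted in the post above (18TB, 21TB, 15TB) can be reproduced with a few lines of arithmetic. This is only a sketch: it assumes a single parity drive that does not count towards usable capacity, and the disk names in the dictionaries are illustrative labels rather than anything read from the server.

```python
# Sizes in TB, taken from the first post. The parity drive is left out
# because it does not add usable capacity (assumption: single parity).
def usable_tb(data_disks):
    """Sum the sizes of the data disks that the array is presenting."""
    return sum(data_disks.values())

# Original layout: disk 1 is 6TB, disks 2-5 are 3TB each.
original = {"disk1": 6, "disk2": 3, "disk3": 3, "disk4": 3, "disk5": 3}

# After disk 5 was replaced with a new 6TB drive and rebuilt.
after_swap = {**original, "disk5": 6}

# Current state: disk 5 is neither mounted nor emulated.
without_disk5 = {name: size for name, size in after_swap.items() if name != "disk5"}

print(usable_tb(original), usable_tb(after_swap), usable_tb(without_disk5))
```

The last figure is what makes the situation look worrying: a drop to 15TB means all 6TB of disk 5 is missing from the total, not just the extra 3TB gained from the upgrade.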
November 11, 2019 Author Sorry, forgot to say. Running Unraid Version: 6.8.0-rc5
Motherboard: ASUSTeK Computer INC. M5A78L-M PLUS/USB3, Version Rev X.0x
BIOS: American Megatrends Inc., Version 0502, dated Fri 18 Nov 2016 12:00:00 AM UTC
CPU: AMD FX™-8350 Eight-Core @ 4000 MHz
Memory: 16 GiB DDR3
Diagnostic logs also attached: media-diagnostics-20191111-1052.zip
November 11, 2019 Community Expert DO NOT FORMAT. On mobile now so haven't looked at the diagnostics. Bad connections are much more common than bad disks, and since you have been mucking about in the case, very likely here. I will have a look later if nobody else does.
November 11, 2019 Author Thanks for your reply. I have not formatted yet. The array is offline at the moment. I have switched the SATA cable out and it's still the same. Let me know what you find in the logs. Thanks for your help. Scott
November 11, 2019 Community Expert 5 hours ago, Hullscotty1986 said: "I let it complete overnight and it had ~22000 corrections." The diags are from after rebooting, so we can't see what happened there, but for now check the filesystem on disk5. https://wiki.unraid.net/Check_Disk_Filesystems
November 11, 2019 Author I have run the check with the array in maintenance mode and it found a load of errors (attached). Do I now run the check without the -n and just with -v to force it to repair? Thanks, Scott. disk5errors.txt
November 11, 2019 Community Expert 7 minutes ago, Hullscotty1986 said: "Do i now run the check without the -n and just as -v to force it to repair?" Yes, and if it asks, use -L.
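The sequence discussed in the last few posts (check with -n, repair with -v, add -L only if asked) can be summarised as a small helper that builds the command lines without running them. This is a sketch under assumptions: the device path `/dev/md5` for disk 5 with the array started in maintenance mode is an assumption, not something stated in the thread, and the function name and mode labels are made up for illustration. The flags themselves are standard `xfs_repair` options: `-n` reports problems without modifying anything, `-v` is verbose, and `-L` zeroes the log, which can discard recent metadata changes.

```python
import shlex

def xfs_repair_cmd(disk_number, mode):
    """Build (but do not run) an xfs_repair command line for an array disk.

    The /dev/mdN device path is an ASSUMPTION for illustration.
    """
    device = f"/dev/md{disk_number}"
    flags = {
        "check": ["-n"],         # read-only check, changes nothing
        "repair": ["-v"],        # actual repair, verbose output
        "force": ["-v", "-L"],   # repair and zero the log; only if xfs_repair asks
    }
    if mode not in flags:
        raise ValueError(f"unknown mode: {mode!r}")
    return ["xfs_repair", *flags[mode], device]

# Print the three commands in the order they were tried in the thread.
for mode in ("check", "repair", "force"):
    print(shlex.join(xfs_repair_cmd(5, mode)))
```

Keeping the `-L` form as a separate, explicit step mirrors the advice above: it is a last resort rather than a default.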
November 11, 2019 Author Used -vL and got:
fatal error -- couldn't map inode 4298301629, err = 117
resetting inode 6442942751 nlinks from 42 to 1
resetting inode 6442944898 nlinks from 1 to 2
resetting inode 6442944901 nlinks from 1 to 2
resetting inode 6442944903 nlinks from 1 to 2
Full output attached: disk5errors vL.txt
November 11, 2019 Community Expert Likely whatever happened before that caused the errors corrupted the filesystem in a way that's currently unfixable, though xfs_repair should always repair the filesystem with more or less data loss. You could try asking for help on the xfs mailing list.
November 11, 2019 Am I correct in thinking that even pulling the corrupted disk and replacing it would not be a solution, as the corruption may be reflected in parity?
November 11, 2019 Community Expert Very unlikely to help, but since we don't know what caused the errors I can't say for sure. You can try it even without replacing the disk, just by unassigning it and checking the emulated disk.
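To illustrate why corruption can end up "reflected in parity", here is a toy model. It assumes single XOR parity across the data disks and uses made-up four-byte "disks"; it is not a reconstruction of what actually happened on this server, which cannot be known without the earlier diagnostics.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Made-up contents for two healthy disks and for disk 5 (good and corrupted).
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
disk5_good = b"\xde\xad\xbe\xef"
disk5_bad = b"\xde\xad\x00\x00"

# While parity is valid, a missing disk 5 can be emulated correctly.
parity = xor_blocks([disk1, disk2, disk5_good])
emulated_before = xor_blocks([parity, disk1, disk2])

# A correcting parity check rewrites parity to match whatever is on the disks,
# so if disk 5 was already corrupted, parity now encodes the corrupted data.
parity_after_check = xor_blocks([disk1, disk2, disk5_bad])
emulated_after = xor_blocks([parity_after_check, disk1, disk2])

print(emulated_before == disk5_good)  # emulation matches the good data
print(emulated_after == disk5_bad)    # emulation now matches the corrupted data
```

In this model the emulated disk is only as good as the parity it is computed from, which is why checking the emulated disk is worth a try but unlikely to give a different result.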
November 11, 2019 Author I shut it down and backed up my flash drive. At the same time I pulled the power from disk 5, rebooted, and started the array. It now says that disk 5 is missing but still "Unmountable: no file system". Disk 5 is not emulated. I'm thinking at this point of restoring a previous backup of my Unraid config, from when I switched USB keys.
November 11, 2019 Community Expert 1 minute ago, Hullscotty1986 said: "I thinking at this time to restore a previous backup of my unraid config, from when i switched USB keys." Can't imagine this would make any difference.
November 11, 2019 Author You were correct, it still says unmountable. Is this because the parity is now corrupted?
November 11, 2019 Community Expert Try running xfs_repair on the emulated disk, but it will most likely have a similar result to the actual disk.
November 11, 2019 Author Yes, you were right, same issue. So, I have my old 3TB drive but I don't have a config with that drive in the pool. I think the issue was the power cable to the drive, not the drive itself. So I can think of 1 of 2 things to do. 1. Modify the config so it has the old drive ID, and rebuild the parity. 2. Format the new drive so disk 5 is clean, then copy everything from the 3TB unassigned disk to the new 6TB disk. I would prefer option 1 but I don't know if it's possible?
November 11, 2019 I'm definitely guessing as I haven't had to use ddrescue yet (thank goodness), but if xfs_repair doesn't work and there is no emulation of D5, would his recovery process be to physically remove D5 from the array and rebuild parity (assuming full data loss)? Could he then run ddrescue on D5 from Unassigned Devices and add the newly recovered D5 (missing some if not most of the original data) back to the array and rebuild parity again? @Hullscotty1986 I wrote the above as you posted... so is your 3TB disk a backup of D5 or will you need to recover data from D5?
November 11, 2019 Author @Dissones4U It is a backup minus about 7 days, which is better than full data loss. I have mounted it as an unassigned disk and the data is still good.
November 11, 2019 I suspect having that backup will make your life easier... I can't speak to modifying the config file, but another option may be to remove D5 (shrink the array) and then add the 3TB back in. Honestly that sounds like a PITA, so hopefully they green-light your config modification as it sounds way faster.
November 11, 2019 Author I don't think it will be possible tbh. I am going to start the format now so I can do the copy overnight.
November 11, 2019 Author Yeah, either way is going to yield the same result. I have started the copy, done 60GB of 3TB! I have thrown away the power cable I was using on disk 5 too. The issues didn't start until I changed the power cable. I don't know for sure if that is the issue, but the fact that my old disk 5 is still readable after it kept going red makes me think it is. After the copy is done I will do a full parity check and hopefully everything will be back to normal, minus a few files that I lost. Thanks all for your help.
November 12, 2019 Author Hi All, So the data copied overnight. I did a dip test on disk 5 and the files seem to open OK. I kicked off a parity check and it's up to 30GB but already has 2798 errors. Would this be expected with the situation I'm in? I would have thought that as it copied the files it would have updated the parity, so it should have been correct. Attached are the latest diagnostic logs (no reboots this time). Does anything look dodgy with any of the disks? Thanks, Scott. media-diagnostics-20191112-1025.zip
November 12, 2019 Community Expert 10 minutes ago, Hullscotty1986 said: "Would this be expected with the situation i'm in?" We can't say because we don't have the diags from the errors before, and don't know what happened. Just let it fix all sync errors.
November 12, 2019 Author Yeah, I will do, thanks. At about 50GB it stopped producing sync errors. It's up to 160GB now with 0 errors in the last 100GB. After this check I will run another one to make sure it comes up with 0 errors. Is there anything in the logs saying the unassigned drive is bad? Is there any reason I shouldn't put it back into the array after this?