Hullscotty1986 Posted November 11, 2019 (edited)

Hello All,

I am in urgent need of some advice. I have been a long-time Unraid user with 0 issues, but the past few days have been really bad. I will give some background for context; it may or may not be relevant to the current situation.

A few weeks ago I wanted to expand my array. I had 2x6TB (parity and disk 1) and 4x3TB drives (disks 2-5), giving me a total of 18TB. I bought 2 more 6TB drives that I wanted to put in new slots. I put them in and started preclearing them. During the preclear one of the new 6TB drives came up with SMART errors (sector reallocation). I shut the machine down, took that drive out and did an RMA. However, when the machine rebooted, disk 5 was in a red state but had no SMART errors. I did a quick Google and tried to unassign and reassign it to force a rebuild. While it was rebuilding it kept erroring. I assumed it was a faulty drive, took it out and replaced it with the new 6TB drive. The rebuild completed fine and everything was OK, giving me a new array size of 21TB, until yesterday.

Yesterday I tried to restart one of my Docker containers. The Docker tab was missing and there was an error, so I rebooted. When it tried to start it couldn't find the USB key to boot from. I shut it down, put the key in my laptop, and it appeared fine. I had a spare USB drive so I copied the old key to the new one and it started fine. After doing the licence dance the array started fine and it started doing a parity check. After a few hours I checked back on it and it had quite a few thousand errors. I let it complete overnight and it had ~22000 corrections.

I noticed that one of my Docker containers wasn't working correctly so I rebooted again, and now it says disk 5 is "Unmountable: no file system" and wants me to format it. What's even more worrying is that it's not emulating disk 5 and the total array size has dropped back to 15TB, as if the new disk wasn't even part of the array.

My question is: what do I do? Obviously I don't want to lose any data, but I'm not sure if that's possible. Do I format? If I do, will it rebuild disk 5 or will it just wipe it? Please help.

Edited November 11, 2019 by Hullscotty1986
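The array sizes quoted in the post above (18TB, 21TB, 15TB) follow from the fact that a single-parity array only counts its data disks toward usable space. A minimal Python sketch of that arithmetic, assuming one parity disk and the disk sizes given in the post; the function name `usable_capacity_tb` is hypothetical and not part of Unraid:

```python
def usable_capacity_tb(parity_tb, data_disks_tb):
    """Usable space of a single-parity array: the sum of the data disks only."""
    # Assumption for this sketch: parity must be at least as large
    # as the largest data disk.
    if data_disks_tb and parity_tb < max(data_disks_tb):
        raise ValueError("parity disk is smaller than the largest data disk")
    return sum(data_disks_tb)


# Disk sizes in TB, taken from the post: disk 1 is 6TB, disks 2-5 are 3TB.
original = usable_capacity_tb(6, [6, 3, 3, 3, 3])
# Disk 5 replaced by a 6TB drive.
after_upgrade = usable_capacity_tb(6, [6, 3, 3, 3, 6])
# Disk 5 not counted at all (neither mounted nor emulated).
without_disk5 = usable_capacity_tb(6, [6, 3, 3, 3])

print(original, after_upgrade, without_disk5)
```

The drop to 15TB matches the case where disk 5 contributes nothing, which is consistent with the report that the disk was not being emulated.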
Hullscotty1986 (Author) Posted November 11, 2019

Sorry, forgot to say. Running Unraid version 6.8.0-rc5.
Motherboard: ASUSTeK Computer INC. M5A78L-M PLUS/USB3, Version Rev X.0x
BIOS: American Megatrends Inc., Version 0502, dated Fri 18 Nov 2016 12:00:00 AM UTC
CPU: AMD FX™-8350 Eight-Core @ 4000 MHz
Memory: 16 GiB DDR3

Diagnostic logs also attached: media-diagnostics-20191111-1052.zip
trurl Posted November 11, 2019

DO NOT FORMAT. On mobile now so I haven't looked at the diagnostics. Bad connections are much more common than bad disks, and since you have been mucking about in the case, a bad connection is very likely. I will have a look later if nobody else does.
Hullscotty1986 (Author) Posted November 11, 2019

Thanks for your reply. I have not formatted yet. The array is offline at the moment. I have swapped the SATA cable out and it's still the same. Let me know what you find in the logs. Thanks for your help.

Scott
JorgeB Posted November 11, 2019

5 hours ago, Hullscotty1986 said: "I let it complete overnight and it had ~22000 corrections."

The diags are from after rebooting, so we can't see what happened there, but for now check the filesystem on disk5: https://wiki.unraid.net/Check_Disk_Filesystems
Hullscotty1986 (Author) Posted November 11, 2019

I have run the check with the array in maintenance mode and it found a load of errors (attached). Do I now run the check without the -n and just with -v to force it to repair?

Thanks
Scott

Attached: disk5errors.txt
JorgeB Posted November 11, 2019

7 minutes ago, Hullscotty1986 said: "Do i now run the check without the -n and just as -v to force it to repair?"

Yes, and if it asks, use -L.
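The sequence discussed in the last two posts (a read-only check with -n, then a repair with -v, then -L only if xfs_repair asks for it) can be sketched as a small command builder. This is a minimal sketch, not an official procedure: the helper name `xfs_repair_command` is hypothetical, and the device path `/dev/md5` is an assumption about how disk 5 appears in maintenance mode on this Unraid version. The flags -n, -v and -L are standard xfs_repair options; -L zeroes the metadata log and can discard recent changes, which is why it is a last resort.

```python
def xfs_repair_command(device, stage):
    """Build the xfs_repair argument list for one stage of the workflow."""
    flags = {
        "check": ["-n"],           # no-modify mode: report problems only
        "repair": ["-v"],          # verbose repair
        "zero_log": ["-v", "-L"],  # force log zeroing, only if xfs_repair asks
    }
    if stage not in flags:
        raise ValueError(f"unknown stage: {stage}")
    return ["xfs_repair", *flags[stage], device]


# Assumption: disk 5 of the array is exposed as /dev/md5 in maintenance mode.
DEVICE = "/dev/md5"

for stage in ("check", "repair", "zero_log"):
    # Only print the commands; running them is left to the operator.
    print(" ".join(xfs_repair_command(DEVICE, stage)))
```

The sketch only prints the commands so that each stage can be reviewed before anything touches the disk.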
Hullscotty1986 (Author) Posted November 11, 2019

Used -vL and got:

fatal error -- couldn't map inode 4298301629, err = 117
resetting inode 6442942751 nlinks from 42 to 1
resetting inode 6442944898 nlinks from 1 to 2
resetting inode 6442944901 nlinks from 1 to 2
resetting inode 6442944903 nlinks from 1 to 2

Full output attached: disk5errors vL.txt
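The `err = 117` in the output above is an operating-system error number. On Linux, errno 117 is EUCLEAN ("Structure needs cleaning"), the code XFS uses to report on-disk corruption, so the fatal error is xfs_repair hitting metadata it could not read rather than a hardware I/O failure code. A small sketch for looking up an errno value with Python's standard library; the helper name `describe_errno` is hypothetical, and the result depends on the platform the script runs on:

```python
import errno
import os


def describe_errno(code):
    """Return the symbolic name and message for an errno value on this platform."""
    # errno.errorcode maps numbers to names for the current platform only.
    name = errno.errorcode.get(code, "unknown")
    return f"{name}: {os.strerror(code)}"


# Run on a Linux machine to decode the value from the xfs_repair output.
print(describe_errno(117))
```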
JorgeB Posted November 11, 2019

Likely whatever happened earlier to cause the errors corrupted the filesystem in a way that's currently unfixable, though xfs_repair should always be able to repair the filesystem, with more or less data loss. You could try asking for help on the xfs mailing list.
Dissones4U Posted November 11, 2019

Am I correct in thinking that even pulling the corrupted disk and replacing it would not be a solution, as the corruption may be reflected in parity?
JorgeB Posted November 11, 2019

Very unlikely to help, but since we don't know what caused the errors I can't say for sure. You can try it even without replacing the disk, just by unassigning it and checking the emulated disk.
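The question of whether corruption can be "reflected in parity" comes down to how a missing disk is emulated: with single parity, each byte of the emulated disk is the XOR of the parity byte and the corresponding bytes on every other data disk. The toy Python sketch below illustrates that idea with three two-byte "disks"; the data values and the helper name `xor_blocks` are made up for illustration, and it models generic XOR parity rather than Unraid's actual implementation.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three toy data disks (made-up contents) and their parity.
data = [b"\x11\x22", b"\x33\x44", b"\x55\x66"]
parity = xor_blocks(data)

# Emulating the last disk from parity plus the surviving disks.
emulated = xor_blocks([parity, data[0], data[1]])
print(emulated == data[2])  # parity is consistent, so the disk is recovered

# If parity was "corrected" while a disk returned bad data, parity no longer
# matches the original contents, and the emulated disk is wrong as well.
bad_parity = bytes([parity[0] ^ 0xFF]) + parity[1:]
bad_emulated = xor_blocks([bad_parity, data[0], data[1]])
print(bad_emulated == data[2])
```

This would be consistent with the emulated disk 5 showing the same filesystem damage as the physical one, although the thread never establishes what actually caused the ~22000 corrections.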
Hullscotty1986 (Author) Posted November 11, 2019

I shut it down and backed up my flash drive. At the same time I pulled the power from disk 5. I rebooted and started the array. It now says that disk 5 is missing, but it is still shown as "Unmountable: no file system". Disk 5 is not emulated. I am thinking at this point of restoring a previous backup of my Unraid config, from when I switched USB keys.
trurl Posted November 11, 2019

1 minute ago, Hullscotty1986 said: "I thinking at this time to restore a previous backup of my unraid config, from when i switched USB keys."

Can't imagine this would make any difference.
Hullscotty1986 (Author) Posted November 11, 2019

You were correct, it still says unmountable. Is this because the parity is now corrupted?
JorgeB Posted November 11, 2019

Try running xfs_repair on the emulated disk, but it will most likely have a similar result to the actual disk.
Hullscotty1986 (Author) Posted November 11, 2019

Yes, you were right, same issue. So, I have my old 3TB drive but I don't have a config with that drive in the pool. I think the issue was the power cable to the drive, not the drive itself. I can think of 2 things to do:

1. Modify the config so it has the old drive ID, and rebuild the parity.
2. Format the new drive so disk 5 is clean, then copy everything from the 3TB unassigned disk to the new 6TB disk.

I would prefer option 1 but I don't know if it's possible?
Dissones4U Posted November 11, 2019

I'm definitely guessing as I haven't had to use ddrescue yet (thank goodness), but if xfs_repair doesn't work and there is no emulation of D5, would his recovery process be to physically remove D5 from the array and rebuild parity (assuming full data loss)? Could he then run ddrescue on D5 from Unassigned Devices and add the newly recovered D5 (missing some if not most of the original data) back to the array and rebuild parity again?

@Hullscotty1986 I wrote the above as you posted... so is your 3TB disk a backup of D5 or will you need to recover data from D5?
Hullscotty1986 (Author) Posted November 11, 2019

@Dissones4U It is a backup minus about 7 days, which is better than full data loss. I have mounted it as an unassigned disk and the data is still good.
Dissones4U Posted November 11, 2019

I suspect having that backup will make your life easier... I can't speak to modifying the config file, but another option may be to remove D5 (shrink the array) and then add the 3TB back in. Honestly that sounds like a PITA, so hopefully they green-light your config modification as it sounds way faster.
Hullscotty1986 (Author) Posted November 11, 2019

I don't think it will be possible, tbh. I am going to start the format now so I can do the copy overnight.
Dissones4U Posted November 11, 2019

Dude, you should let them confirm unless you're 100% positive...
Hullscotty1986 (Author) Posted November 11, 2019

Yeah, either way is going to yield the same result. I have started the copy, 60GB of 3TB done! I have thrown away the power cable I was using on disk 5 too. The issues didn't start until I changed the power cable. I don't know for sure if that is the cause, but the fact that my old disk 5 is still readable after it kept going red makes me think it is. After the copy is done I will do a full parity check and hopefully everything will be back to normal, minus a few files that I lost. Thanks all for your help.
Hullscotty1986 (Author) Posted November 12, 2019

Hi All,

The data copied overnight. I did a dip test on disk 5 and the files seem to open OK. I kicked off a parity check and it's up to 30GB but already has 2798 errors. Would this be expected in the situation I'm in? I would have thought that as it copied the files it would have updated the parity, so it should have been correct. Attached are the latest diagnostic logs (no reboots this time). Does anything look dodgy with any of the disks?

Thanks
Scott

Attached: media-diagnostics-20191112-1025.zip
JorgeB Posted November 12, 2019

10 minutes ago, Hullscotty1986 said: "Would this be expected with the situation i'm in?"

We can't say because we don't have the diags from the earlier errors and don't know what happened. Just let it fix all sync errors.
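One possible reason writes alone would not make parity correct again: if the array updates parity by read-modify-write, each write only applies the difference between the old and new data to the existing parity, so a stripe whose parity was already wrong stays wrong by the same amount. The sketch below shows that on single bytes. It assumes read-modify-write updates, which is not confirmed anywhere in this thread, and all values and the function name `rmw_parity` are made up for illustration.

```python
def rmw_parity(old_parity, old_data, new_data):
    """Read-modify-write update: apply only the data delta to the parity."""
    return old_parity ^ old_data ^ new_data


other_disk = 0x3C  # byte on another data disk in the same stripe
old_data = 0x0F    # byte on disk 5 before the copy
new_data = 0xA5    # byte written to disk 5 by the copy

good_parity = other_disk ^ old_data
stale_parity = good_parity ^ 0x80  # hypothetical pre-existing sync error

correct_after_write = other_disk ^ new_data

# Consistent parity stays consistent after the write.
print(rmw_parity(good_parity, old_data, new_data) == correct_after_write)
# Inconsistent parity carries its error through the write.
print(rmw_parity(stale_parity, old_data, new_data) == correct_after_write)
```

Under that assumption, sync errors left over from the earlier problems would still show up in a later parity check, which is when they get corrected.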
Hullscotty1986 (Author) Posted November 12, 2019

Yeah, I will do, thanks. At about 50GB it stopped producing sync errors. It's up to 160GB now with 0 errors in the last 100GB. After this check I will run another one to make sure it comes up with 0 errors. Is there anything in the logs saying the unassigned drive is bad? Is there any reason I shouldn't put it back into the array after this?