
[SOLVED] How to recover from a faulty disk and an unmountable disk at the same time?



hello community!

 

i have read similar topics from other users, but i would like to hear from other experienced members and get the advice from the experts for this situation of mine.

it might be the same or different from the others, but i think even if i can't be helped, others could learn from my mistakes.

 

last monday i received an alert saying my disk 8 was in error state.

when i got back from work i checked the server and noticed the disk was making a bit of a funny sound, sounded like it needed some oiling or something.

at that point it was the only disk showing the red dot and error state.

yesterday i got another drive to replace it so i shutdown the server and pulled out the faulty drive and plugged in the replacement.

 

 

and that's how my story begins...

 

when my array started, another disk (disk 5) suddenly showed "unmountable: unsupported partition layout", but the drive was green. the array started with 2 faulty drives. oh! the horrors....

i had the worst feeling in my bones... array started with 2 missing/bad drives, in my simple mind it means the parity is no longer valid and i won't be able to rebuild the data for disk 8.

anyway while thinking about what to do about disk 5, i started preparing my replacement for disk 8. I'm using unassigned devices and unassigned devices plus. I was actually looking for the preclear function but it seems to have gone missing. there were some partitions on the disk, so i deleted them. but still no preclear. so i thought never mind that, just this once i will have to replace a drive without preclearing first. i really need to start my server otherwise nothing in my entire house will work. not even internet because pfsense is running as a VM in unraid (having second thoughts about that now).

 

so at this point i figured i will try getting disk 5 back to normal state. I don't think it is actually faulty. it was fine just earlier. probably a loose connection. if i can get disk 5 back then unraid can rebuild disk 8.

so i shutdown unraid, pulled and replugged disk 5 and started up again, same result.

i shutdown again, pulled and replugged into another drive bay. basically i swapped disk 5 and disk 8 so i'm pretty sure there shouldn't be any issue with the connection. still same result. the array starts as if nothing is wrong, but disk 5 still shows "unmountable: unsupported partition layout" and disk 8 is not installed.

 

i went to maintenance mode, tried running the check from GUI, but as far as i could tell the results didn't show anything to repair. i ran the check with -n, -nV, -V and left the options blank. as far as i know it didn't have anything to repair, so it actually didn't do anything.

 

then i decided that even though disk 5 was unmountable unraid thinks that the array is ok, so maybe the parity and data are still there. just for some reason unraid can't mount the disk share. if i can rebuild disk 8 first, then later i should be able to rebuild disk 5 from parity.

i assigned the replacement to disk 8 and see how it goes.

array started and data rebuild was running. it did appear to be reading from the unmountable disk 5 while rebuilding.

however even from the start of the rebuild, disk 8 was "unmountable: no file system".

there were lots of writes to the disk since data was rebuilding, so i didn't think i should format the disk halfway.

i gathered from other users' postings that i should wait for the rebuild to complete then repair the disk.

 

this morning the rebuild for disk 8 is done but still appearing as "unmountable: no file system".

i might have seriously messed up, i don't know. at this point i still have lots of missing files and folders.

i will try repairing the file system after work later today.

hopefully data in disk 8 can be recovered.

 

after that should i try to rebuild disk 5? i thought i would just unassign the drive and let unraid emulate disk 5. then i would reassign it back and let unraid rebuild it.

 

is it possible that rebuild will produce the exact same bit-for-bit disk with the same unmountable problem?

should i even try?

 

i really hope disk 5 can be recovered. Other than running the check on it, i don't think i did anything that would over-write or modify the data that was on it.

from what i've read here, this "unmountable: unsupported partition layout" for other users was caused by faulty sata controller or other hardware issues. no one actually had to repair the disk. not very comforting.

 

okay i think i need to stop here. i ramble on and on when i'm anxious.

 

i would be very grateful for any advice or thoughts.

 

thanks.

IMG_1779 (2).JPG

Edited by limawaken

Any time you get an ‘unmountable’ disk showing that previously mounted OK (including one being emulated) then the way to recover from this is to run a file system check/repair.   A rebuild will never fix this state as it will simply rebuild back to the same ‘unmountable’ state.
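
To illustrate why, here is a minimal sketch, assuming single parity behaves like a plain byte-wise XOR across the data disks. The tiny "disks" and the `xor_blocks` helper are made up for illustration and are not Unraid code.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized byte strings (hypothetical helper)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical 4-byte "disks"; pretend disk_b holds damaged file system metadata.
disk_a = bytes([0x11, 0x22, 0x33, 0x44])
disk_b = bytes([0xDE, 0xAD, 0xBE, 0xEF])
disk_c = bytes([0x01, 0x02, 0x03, 0x04])

# Parity is computed over whatever bytes are on the disks, damaged or not.
parity = xor_blocks([disk_a, disk_b, disk_c])

# "Rebuild" disk_b from parity plus the other disks.
rebuilt_b = xor_blocks([parity, disk_a, disk_c])

print(rebuilt_b == disk_b)  # True: the damaged bytes come back bit for bit
```

Parity has no notion of files or file systems in this model, so whatever bytes made the file system unmountable come back unchanged, and a file system repair is still needed afterwards.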

1 hour ago, itimpi said:

Any time you get an ‘unmountable’ disk showing that previously mounted OK (including one being emulated) then the way to recover from this is to run a file system check/repair.   A rebuild will never fix this state as it will simply rebuild back to the same ‘unmountable’ state.

ah i figured as much... thanks for the confirmation.

which means i have fewer options for recovery. maybe i will have to mount it outside the array and copy the contents out? but if unraid couldn't mount it what are the chances that unassigned devices will be able to?

4 hours ago, johnnie.black said:

Disk5 is showing as "invalid partition", so not necessarily a filesystem problem, please post the diagnostics: Tools -> Diagnostics

here's the diagnostics. i didn't know what to look for so i uploaded the entire zip file.

 

i ran check on disk 8 and it is mounted and shares can be accessed now. i haven't gone through all the folders but i'd say most of the disk was recovered. there was a bunch of stuff in the lost and found folder, sadly most of it's not identifiable...

 

did i mess up? should i have tried fixing the disk 5 invalid partition problem first?

will i be able to try rebuilding disk 8 again if we are able to fix the disk 5 invalid partition issue?

silometalico-diagnostics-20200605-1934.zip

Edited by limawaken

If parity is valid, disk5 can be rebuilt, since rebuilding recreates the partition. You can test this by unassigning disk5 and starting the array, then checking that the emulated disk mounts and the contents look correct. If all is OK you can rebuild on top, but if you know or suspect that parity isn't valid this won't work.

1 hour ago, johnnie.black said:

If parity is valid, disk5 can be rebuilt, since rebuilding recreates the partition. You can test this by unassigning disk5 and starting the array, then checking that the emulated disk mounts and the contents look correct. If all is OK you can rebuild on top, but if you know or suspect that parity isn't valid this won't work.

what about what @itimpi said earlier, that rebuild can't fix unmountable errors?

 

parity seems to be valid, unraid rebuilt disk 8 without any complaints. array operation started and unraid page shows "parity valid".

what are the tell tale signs of invalid parity?

besides the stuff in lost+found I can't tell if there is anything else wrong with the rebuilt disk 8. there is stuff which i think is missing, but it could probably be on disk 5.

 

if parity wasn't valid will unraid rebuild the disk without giving any warnings?

because doing that will definitely wipe everything from the disk, shatter all hope of a successful recovery and crush my soul, right?

 

Edited by limawaken
37 minutes ago, limawaken said:

that rebuild can't fix unmountable errors?

It can't fix "unmountable: no filesystem", but it can fix "unmountable: invalid partition". Of course, after fixing the partition there can also be filesystem corruption.

 

Parity might show as valid but still be invalid, or partially out of sync, if something done earlier caused that, and a disk can still rebuild successfully but have some corruption. But if you did nothing that would make you suspect it's not valid, it should be.


i guess parity partially out of sync is a possibility... parity is constantly being written, right? after unraid rebuilt and repaired disk 8 i would assume parity would have changed?

 

but if the disk can still be rebuilt and most of the data saved, i'm going to give it a try.


johnnie.black you're awesome. i'm eternally grateful for all your advice.

 

unraid was able to emulate disk 5 contents. i assigned the disk back to the array and unraid has started rebuilding it.

 

about the missing or corrupted data in disk 8, would that have been due to disk 5 being in unmountable state so parity couldn't be read from it?

was there a way i could have fixed disk 5 first?

37 minutes ago, limawaken said:

would that have been due to disk 5 being in unmountable state so parity couldn't be read from it?

No, Unraid can still use it for parity calculation despite it being unmountable; only a disabled disk (red x) can't be used for parity calculation.

 

39 minutes ago, limawaken said:

was there a way i could have fixed disk 5 first?

Not before rebuilding disk 8, since you only have one parity. Also, that type of error, "invalid partition", shouldn't happen out of the blue; it suggests something changed the MBR of the disk, and that should not happen during normal use, though you're not the first this has happened to.
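
The "only one parity" point can be sketched as follows, assuming parity is a simple byte-wise XOR of the data disks. The two-byte disk contents and the `xor_bytes` helper are hypothetical, chosen only to make the arithmetic visible.

```python
def xor_bytes(*blocks):
    """Byte-wise XOR of equally sized byte strings (hypothetical helper)."""
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)


# Made-up contents for three data disks.
disk_4 = bytes([0x10, 0x20])
disk_5 = bytes([0xAA, 0xBB])
disk_8 = bytes([0x0F, 0xF0])
parity = xor_bytes(disk_4, disk_5, disk_8)

# One disk missing: parity plus the remaining disks pins it down exactly.
recovered_8 = xor_bytes(parity, disk_4, disk_5)

# Two disks missing: parity only tells us the XOR of the two, not each one.
combined = xor_bytes(parity, disk_4)      # equals disk_5 XOR disk_8
guess_5 = bytes([0x00, 0x00])             # an arbitrary wrong guess
guess_8 = xor_bytes(combined, guess_5)    # the matching (also wrong) partner
wrong_pair_fits = xor_bytes(disk_4, guess_5, guess_8) == parity

print(recovered_8 == disk_8, wrong_pair_fits)  # True True
```

With one unknown the answer is unique; with two unknowns a wrong pair satisfies the parity just as well as the real one, which is why the disks had to be handled one at a time.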

 


I started getting udma crc error warnings from disk 4. a lot. Last warning said there were 5622.

that can’t be good right? Does it mean disk 5 will have a lot of corrupted files after the rebuild? it isn’t even halfway through and already 5622 errors.

why is this happening suddenly? It’s like the universe hates me.

I'm quite sure I didn’t have any errors when disk 8 was being rebuilt.


udma crc errors now at 34577 and Unraid main page now shows 512 errors. 

Rebuild seems unusually slow. 

I’m kinda lost. Should I shut down and check the connections now or should I wait until the disk 5 rebuild is finished?

after checking the cables and all, should I rebuild disk 5 again? Would the errors cause data corruption?
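
For anyone wanting to track this counter outside the GUI, here is a rough sketch that pulls the raw UDMA CRC count out of a SMART attribute table. It assumes the usual `smartctl -A` column layout with attribute ID 199; the sample text is hand-written for illustration (only the 34577 figure comes from this post) and `crc_error_count` is a made-up helper name.

```python
# Hand-written sample in the usual SMART attribute table layout (illustrative only).
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  9 Power_On_Hours          0x0032   080   080   000    Old_age   Always       -       14500
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       34577
"""


def crc_error_count(smart_text):
    """Return the raw value of attribute 199, or None if it is not listed."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have at least 10 columns; the raw value is the tenth.
        if len(fields) >= 10 and fields[0] == "199":
            return int(fields[9])
    return None


print(crc_error_count(SAMPLE))  # 34577
```

On a real system the text would come from something like `smartctl -A /dev/sdX` (device name left as a placeholder). A raw value that keeps climbing between readings is usually taken as a hint to check cables and connections first.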

87A47439-6158-4C33-9313-5C2B3CAE431C.jpeg


yes, i shutdown and basically just pulled the drive out and put it back. powered up and unraid started rebuilding again. only 2 hours left and no errors so far.

so strange because disk 4 wasn't even in the same drive cage as disk 5 or disk 8, yet somehow developed a faulty sata connection.

(i'm using supermicro cages, never had any issues)

 

many thanks for your invaluable guidance, johnnie.black.


parity and data rebuilding completed without any errors, so i'm very happy about that.

it seems that everything in disk 5 was fully recovered, which seems almost miraculous considering how it was rebuilt right after replacing another failed drive that had to be rebuilt then repaired from unmountable errors.

 

my takeaway from this experience is that data on unraid is, for lack of a better way to put it, very recoverable.

i think that in a similar situation for other NAS solutions the entire array would have been invalid and it would have most likely meant total loss of data.

 

 

my next project will be setting up another unraid server for backing up my unraid server. the new hp microserver plus looks nice.

 

i'm tagging this topic as solved.

  • JorgeB changed the title to [SOLVED] How to recover from a faulty disk and an unmountable disk at the same time?
