
drive went 'unmountable' after rebuild - needs formatting now! (6.8.3)


erniej


Ok... a bit of history:

Last week, I shut down and replaced a working 1.5TB drive with an 8TB drive (the 8TB worked fine; it came from my Synology NAS).
The Unraid server threw up errors saying the 1.5TB was missing and was being simulated with parity. I confirmed it was replaced by this 8TB drive, and the data rebuild and parity rebuild commenced.

 

All seemed fine. I didn't really check much after that - the parity and data rebuild said it would take 4.5 days or so, and I carried on using the server as normal. I'm pretty certain it all completed and was back up and running as expected (my email reports say FAIL - data rebuild/parity rebuild in progress - ending on Sept 3rd, and PASS starting Sept 4th).

 

All seemed fine... Fast forward to today... I was copying a few smaller files over to the server and all of a sudden I got 'no access' (from Windows). I logged into the server and saw that all my shares were gone. I also noticed that my Disk 1 now says "unmountable: no file system" - and at the bottom, I'm prompted to format. I did a bit of Googling on the problem of missing shares, then rebooted the server. The shares all returned, but the drive still says unmountable and wants me to format it.

 

My question is - what happened, and what do I do now?
I expected the parity and data rebuild to just bring this drive back online with the extended space, operating normally - and I'm pretty sure it did... Or is this the normal process - any data on that initial 1.5TB drive was rebuilt onto a different drive instead of the replacement, and now I need to format this new drive? BUT during the rebuild process, the server was showing the extra space. I've attached a few screenshots showing the progression: with the 1.5TB drive I had 70.5TB of space, during the rebuild 77TB (with the 8TB drive replacing the 1.5TB), and now today 69TB.

Hopefully someone can shed some light on what has happened and my next course of action here... I have no idea what might have ended up on the 8TB drive that replaced the 1.5TB drive over the last couple of days (if anything).


Also new is:
Event: Docker high image disk utilization
Subject: Warning [FRACTAL] - Docker image disk utilization of 77%
Description: Docker utilization of image file /mnt/user/system/docker/docker.img
Importance: warning

No idea if the above is related or unrelated. 
Thanks for any assistance (or perhaps just clarification on a drive rebuild/replacement process). 

ScreenClip.png

ScreenClip.png

ScreenClip.png

fractal-diagnostics-20200906-1412.zip


Replacing the disk will rebuild the disk. It doesn't rebuild parity. Whatever was on the original disk should be what is on the replaced disk, but that depends on parity plus all the other disks emulating the original disk for the rebuild.

 

The usual thing to do with unmountable is to repair the filesystem, but if you still have the original disk, getting the files from there might be the simpler approach.

 

Do you still have the original disk with all its data?
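As background for anyone following the thread: single parity is a bytewise XOR across all data disks, which is why a rebuild needs parity plus every other disk to be correct. A minimal sketch of the idea (illustrative only, not Unraid's actual code):

# Illustrative sketch of single-parity reconstruction, not Unraid's code:
# parity is the bytewise XOR of all data disks, so any one missing disk
# can be recomputed from parity plus all surviving disks.
from functools import reduce

def xor_blocks(blocks):
    # XOR equal-length byte blocks column by column.
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

disk1 = b"\x01\x02\x03"
disk2 = b"\x10\x20\x30"
disk3 = b"\xaa\xbb\xcc"
parity = xor_blocks([disk1, disk2, disk3])

# "Rebuild" a missing disk2 from parity plus the other disks:
rebuilt = xor_blocks([parity, disk1, disk3])
assert rebuilt == disk2

Note that the math says nothing about what the bytes mean: if the emulated contents are a corrupt filesystem, the rebuild faithfully reproduces that corruption.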

29 minutes ago, trurl said:

Replacing the disk will rebuild the disk. It doesn't rebuild parity. Whatever was on the original disk should be what is on the replaced disk, but that depends on parity plus all the other disks emulating the original disk for the rebuild.

 

The usual thing to do with unmountable is to repair the filesystem, but if you still have the original disk, getting the files from there might be the simpler approach.

 

Do you still have the original disk with all its data?

No, the original disk went into my second Unraid build and was wiped... there would have been minimal (if any) data on it, which is why I'd already installed and wiped it elsewhere. The 'used' amount was showing as about 25GB, so very negligible.

 

Since this disk did rebuild onto the 8TB drive, what would have triggered it to now show as unmountable and needing a format? Any ideas? I'd hate to see any other disk start 'dropping' like this - it's essentially a critical failure with no way to recover the data.

Anyway, do you think I'm good to just format it and forget about any data that might have been rebuilt onto it? I'm fine with that - my concern is just that this type of failure could occur again, but with significant data next time.

 

 

12 minutes ago, trurl said:

The easiest thing would be to format, but it might be instructive to try to repair the filesystem to see if you can get the files back. See this wiki:

 

https://wiki.unraid.net/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui

I've formatted it and it's back in the array and usable... I did lose some data, but it was temporary stuff and its loss is no issue - just a minor hiccup in my flow of things for today.

I guess my main concern as a relatively new Unraid user is that this seems like a catastrophic failure: the drive went unusable (for whatever reason), but Unraid didn't pick up on it and fall back to parity simulation and a rebuild. Perhaps it was pure coincidence that it happened right after pulling this drive from another system, putting it into my Unraid array, and letting the rebuild occur - but going back through my logs, that initial data/parity rebuild finished, so the array should have been solid as far as redundancy goes.

Oh well... back to normal now, and I'll monitor that drive closely for a while. I'm just a bit paranoid about touching that system - I wanted to bump all the drives to 10TB or bigger over the next couple of weeks... I might take it slower now!
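For reference, the wiki procedure trurl linked above boils down to running the filesystem repair tool against the array device with the array started in maintenance mode, in no-modify mode first. A hedged sketch, assuming Disk 1 is XFS and maps to /dev/md1 (both assumptions - check your own slot and filesystem):

# Hedged sketch of the wiki's check/repair flow. Assumes an XFS disk
# mapped to /dev/md1 (verify for your slot) and the array started in
# maintenance mode. Not a substitute for the wiki instructions.
import subprocess

def xfs_check(device: str, repair: bool = False) -> int:
    # Without repair=True, the -n flag keeps xfs_repair read-only.
    cmd = ["xfs_repair"] + ([] if repair else ["-n"]) + [device]
    return subprocess.run(cmd).returncode

rc = xfs_check("/dev/md1")  # dry run: report problems, change nothing
# Review the output before considering: xfs_check("/dev/md1", repair=True)

Repairing via the /dev/mdX device rather than the raw sdX device keeps parity updated while the repair writes.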


Are you sure the original drive was mountable before you replaced it?

4 hours ago, trurl said:

Whatever was on the original disk should be what is on the replaced disk, but that depends on parity plus all the other disks emulating the original disk for the rebuild.

If the original disk was unmountable, then if parity and all other disks were in sync, the rebuild would have resulted in the same unmountable contents.

 

Or, if parity and all other disks were not in sync, then rebuild could become corrupt and unmountable.

 

If we still had the original and it was mountable, then we would know something else was the cause of the corruption.

 

Do you do regular parity checks? Do they always result in exactly zero sync errors (the only acceptable result for accurate rebuilds)?

 

Were there any errors shown in the Errors column during rebuild? Do any of your disks display SMART warnings on the Dashboard page?

 

When rebuilding a disk, what you should see is a lot of writes to the rebuilding disk, and a lot of reads of all other array disks including parity. And zero in the Errors column for all disks.
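To make the sync-error point concrete: a rebuild from out-of-sync parity is silently corrupt - nothing at rebuild time flags it. A toy illustration (made-up bytes, not real disk contents):

# Toy illustration: rebuilding from out-of-sync parity silently
# reproduces the error in the rebuilt data.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

disk1, disk2 = b"\x01\x02", b"\x10\x20"
good_parity = xor_blocks([disk1, disk2])
stale_parity = bytes([good_parity[0] ^ 0xFF]) + good_parity[1:]  # one sync error

rebuilt = xor_blocks([stale_parity, disk1])  # attempt to rebuild disk2
assert rebuilt != disk2  # corrupt, and nothing flags it during the rebuild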

5 minutes ago, trurl said:

Are you sure the original drive was mountable before you replaced it?

If the original disk was unmountable, then if parity and all other disks were in sync, the rebuild would have resulted in the same unmountable contents.

 

Or, if parity and all other disks were not in sync, then rebuild could become corrupt and unmountable.

 

If we still had the original and it was mountable, then we would know something else was the cause of the corruption.

 

Do you do regular parity checks? Do they always result in exactly zero sync errors (the only acceptable result for accurate rebuilds)?

 

Were there any errors shown in the Errors column during rebuild? Do any of your disks display SMART warnings on the Dashboard page?

 

When rebuilding a disk, what you should see is a lot of writes to the rebuilding disk, and a lot of reads of all other array disks including parity. And zero in the Errors column for all disks.


I'm fairly certain all was good yesterday.
Here's the report I received by email this morning, prior to seeing Disk 1 as unmountable.

 

Event: Unraid Status
Subject: Notice [FRACTAL] - array health report [PASS]
Description: Array has 14 disks (including parity & cache)
Importance: normal

Parity - ST10000DM0004-1ZC101_ZA2BV175 (sdn) - active 36 C [OK]
Disk 1 - ST8000DM004-2CX188_ZCT0N83Q (sdf) - active 36 C [OK]
Disk 2 - ST1000LM024_HN-M101MBB_S2TPJ9CC605626 (sdo) - standby [OK]
Disk 3 - ST10000DM0004-1ZC101_ZA2C36GR (sdj) - active 34 C [OK]
Disk 4 - ST6000DM003-2CY186_ZF200ZQG (sdi) - active 36 C [OK]
Disk 5 - WDC_WD30EZRX-00MMMB0_WD-WCAWZ1175087 (sdl) - standby [OK]
Disk 6 - ST1000LM024_HN-M101MBB_S2YFJ9BD400082 (sdm) - standby [OK]
Disk 7 - ST8000DM004-2CX188_ZCT0R4X4 (sdh) - active 35 C [OK]
Disk 8 - ST10000DM0004-1ZC101_ZA2B9BQH (sde) - standby [OK]
Disk 9 - ST10000DM0004-1ZC101_ZA2BBQV6 (sdb) - standby [OK]
Disk 10 - ST10000DM0004-1ZC101_ZA2C640S (sdg) - standby [OK]
Disk 11 - ST10000DM0004-1ZC101_ZA2C6CA5 (sdk) - standby [OK]
Cache - Samsung_SSD_860_QVO_1TB_S59HNG0MB23020Y (sdc) - active 34 C [OK]
Cache 2 - Seagate_BarraCuda_SSD_ZA1000CM10002_7M101GBJ (sdd) - active 42 C [OK]

Parity is valid
Last checked on Thu 03 Sep 2020 03:14:47 PM MDT (3 days ago), finding 0 errors.
Duration: 4 days, 19 hours, 54 minutes, 14 seconds. Average speed: 24.0 MB/s

This is from September 3, when the data rebuild from my Disk 1 replacement finished:

Event: Unraid Parity sync / Data rebuild
Subject: Notice [FRACTAL] - Parity sync / Data rebuild finished (0 errors)
Description: Duration: 4 days, 19 hours, 54 minutes, 14 seconds. Average speed: 24.0 MB/s
Importance: normal

This is from August 29 - I pulled the original 1.5TB Disk 1 and inserted this 8TB disk in its place, and the rebuild started:

Event: Unraid Disk 1 error
Subject: Warning [FRACTAL] - Disk 1, drive not ready, content being reconstructed
Description: ST8000DM004-2CX188_ZCT0N83Q (sdf)
Importance: warning

I definitely remember all the reads occurring - and the writes to the new Disk 1 and the parity drive - and was quite certain everything was done and working correctly prior to this morning.

Isn't it odd that all my shares disappeared at the start of this problem? Is that a separate problem that occurred?

39 minutes ago, erniej said:

Isn't it odd that all my shares disappeared at the start of this problem? Is that a separate problem that occurred?

User shares sometimes break if an array or cache disk is unmountable. It is just a symptom of one of the disks included in the user shares being unmountable.

  • 4 weeks later...

@trurl Just as an FYI, all was working since this last message (I ended up formatting the drive and adding it back to the array). However, today the shares were 'dead' and various Docker apps were dead too. I rebooted the server, and this same drive was back to "unmountable: no file system". Parity doesn't recognize a failure here. And of course, a lot of files have been added since the 6th.

 

Is there a log somewhere where I can see what was on the drive, so I can recover from backups since parity isn't doing its job? I'm just going to pull this drive and take it out of service rather than risk this type of thing happening again. Alternatively, if I pull this drive now - do you think Unraid would show it as a failed drive and then emulate the previous contents?

The thing is, the way it shows right now, it's as if that drive never existed in my array.

Any advice or thoughts on this? There must be some way to get a list of what was on this drive - assuming that Unraid tracks this info to manage its file system.

Thanks!

Edited by erniej
3 hours ago, erniej said:

Any advice or thoughts on this?

You are using a Marvell SATA controller for the majority of your drives. Marvell controllers are not recommended, as they can drop drives and cause problems. Your best bet would be to find an LSI controller.

3 hours ago, erniej said:

I'm just going to pull this drive and take it out of service rather than risk this type of thing happening again. Alternatively, if I pull this drive now - do you think Unraid would show it as a failed drive and then emulate the previous contents?

If it's not currently disabled, then the emulated drive will have the same contents as the physical drive. Parity is always in sync with the drives. You may need to read up on how parity works: it's not a backup, just a way to recover from one (or two, with dual parity) missing/failed drives. Parity cannot recover from data loss resulting from writes to the disk (i.e. formatting, deleting files, ransomware, etc.). You would need a full backup solution in that case.

 

Edit:  It also looks like you're using a port multiplier on that Marvell controller.  That is also not recommended.

Edited by civic95man
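To illustrate the point that parity is not a backup (an editorial sketch with toy bytes, not Unraid code): every write, including a format, updates parity at the same time, so the 'emulated' disk afterwards shows the formatted contents - there is no older state to roll back to.

# Sketch: parity follows every write via read-modify-write, so after a
# format the emulated disk reflects the format too. Toy bytes only.
from functools import reduce

def xor_blocks(blocks):
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

disk1, disk2 = b"\x0a\x0b", b"\x1c\x1d"
parity = xor_blocks([disk1, disk2])

new_disk2 = b"\x00\x00"                          # "format" disk2
parity = xor_blocks([parity, disk2, new_disk2])  # P ^= old ^ new
disk2 = new_disk2

# Emulating disk2 from parity now yields only the formatted contents:
assert xor_blocks([parity, disk1]) == new_disk2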
29 minutes ago, civic95man said:

You are using a Marvell SATA controller for the majority of your drives. Marvell controllers are not recommended, as they can drop drives and cause problems. Your best bet would be to find an LSI controller.

If it's not currently disabled, then the emulated drive will have the same contents as the physical drive. Parity is always in sync with the drives. You may need to read up on how parity works: it's not a backup, just a way to recover from one (or two, with dual parity) missing/failed drives. Parity cannot recover from data loss resulting from writes to the disk (i.e. formatting, deleting files, ransomware, etc.). You would need a full backup solution in that case.

 

Edit:  It also looks like you're using a port multiplier on that Marvell controller.  That is also not recommended.

I'll replace those cards with LSI cards over the next week or so... good to know! Is there any 'study material' out there on LSI vs Marvell controllers, so I know the difference and the hows/whys? Also, for the port multiplier - again, I'm not sure what to do there - I just plugged it in and things 'worked' for me. I guess I'd better get it figured out to eliminate future problems, though.

Second - this 'failed' drive isn't showing as failed, and therefore parity isn't emulating anything from it. It's as if it completely 'disappeared' as an operating drive with files on it, and it now shows as a new drive ready for formatting instead. Any reference to those files from outside this server shows them as missing - hence I had hoped Unraid would have a log of what was on that drive, since I can't get parity to emulate the drive while I put in a new one. I can restore from backups, but since I don't know what was on that specific drive, I'm stuck doing a full comparative restore of 23TB of data. Not impossible, of course, but it would be easier if I could just pull the specific missing files and put them back.

7 hours ago, itimpi said:

Have you tried the steps documented here in the online documentation for a file system check/repair on the 'unmountable' drive?

Yes - the system did a check, and although the drive still says unmountable, Unraid says "Array Started•Parity-Sync / Data-Rebuild". I'm not sure what's really going on - the data is still missing, even though the icon on my Disk 1 says it's being emulated.

I still fear the data is lost and I'll have to bring it back from a backup source. The frustration is that there was only a small amount of data on this drive, and it would have been nice to just restore that data if I could find out what had been on it versus what was split across the other drives.

And the ultimate frustration: why doesn't Unraid see this as a failed drive and properly emulate it? At this point, I'm ready to turn off parity protection - twice now this drive has failed like this, and parity has done nothing to keep things running until I can swap a drive and rebuild.

18 hours ago, erniej said:

There must be some way to get a list of what was on this drive - assuming that Unraid tracks this info to manage its file system.

It doesn't. The user shares are just the aggregated top-level folders on the cache and array disks. It doesn't need to keep track of the files, since the files are already tracked by the filesystem of each disk.

 

Unraid disables a disk when a write to it fails. Then parity and all other disks emulate the disk until it is rebuilt. Filesystem corruption is a separate issue and requires a different fix. Replacing the controller, as mentioned, is probably the solution.
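As an illustration of how the shares are derived (an editorial sketch using Unraid's standard mount points): the share list is just the union of top-level directories across the mounted disks, which is why an unmountable disk makes shares vanish rather than show up in some tracking log.

# Sketch of the user-share concept: shares are the union of top-level
# folders across /mnt/cache and /mnt/disk* (Unraid's standard mounts).
# An unmountable disk simply drops out of this union.
from pathlib import Path

def user_shares() -> list[str]:
    mounts = [Path("/mnt/cache"), *sorted(Path("/mnt").glob("disk[0-9]*"))]
    shares: set[str] = set()
    for mount in mounts:
        if mount.is_dir():
            shares.update(p.name for p in mount.iterdir() if p.is_dir())
    return sorted(shares)

print(user_shares())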

1 hour ago, erniej said:

And the ultimate frustration: why doesn't Unraid see this as a failed drive and properly emulate it? At this point, I'm ready to turn off parity protection - twice now this drive has failed like this, and parity has done nothing to keep things running until I can swap a drive and rebuild.

Parity can only emulate a disk at the disk level. If the disk itself has corruption (file system corruption), whether or not it was due to a write error, then chances are the emulated disk will have that corruption too. It's not the end of the world, though: the file system just needs to be repaired, and many times that can be accomplished with little or no loss of data.

1 hour ago, erniej said:

Yes - the system did a check, and although the drive still says unmountable, Unraid says "Array Started•Parity-Sync / Data-Rebuild". I'm not sure what's really going on - the data is still missing, even though the icon on my Disk 1 says it's being emulated.

What check are you referring to? A repair of the file system? It sounds like Unraid is writing that 'emulated' disk back to the physical disk, which will not in itself fix the unmountable issue - that requires a repair of the file system.

15 hours ago, erniej said:

I'll replace those cards with LSI cards over the next week or so... good to know! Is there any 'study material' out there on LSI vs Marvell controllers, so I know the difference and the hows/whys? Also, for the port multiplier - again, I'm not sure what to do there - I just plugged it in and things 'worked' for me.

Your data rebuild may fail if there are continued errors with that Marvell card and port multiplier. It's best to replace those ASAP rather than waiting. As for Marvell: their Linux drivers seem flaky at best, which is the reason those controllers are not recommended for Unraid. Pretty much any LSI card will work; just make sure it is flashed to IT mode and has the latest firmware - you don't want to use Unraid with a RAID card.

 

EDIT: Also, avoid 'new' LSI cards from China, as they are most often counterfeit. It's best to get used cards pulled from servers.

 

Edited by civic95man
