Formatted disk after rebuild



Well, I am sure I am not the first and I am sure I will not be the last...

Here is my situation.

As I was rebuilding my Unraid server (I had errors on disks), I was replacing the drives with clean ones (mechanical drives with SSDs). My config currently requires me to set each drive up as a RAID 0 in the RAID controller before Unraid picks it up in the OS.

I am running an HP ProLiant DL380 G7 with 2 x Intel Xeon X5660 2.8 GHz and 32 GB of RAM.

Single parity, 4 TB

6 additional 1 TB SSDs

As I was rebuilding, I noticed one of my disks had an inordinate number of errors on it. I was in the middle of moving from HDDs to SSDs (replacing them one at a time).

Typically the process was simple: remove an HDD, set the new SSD as RAID 0 in the controller, boot into the OS, and let parity do its job.

The drive with the errors on it had just been purchased, was already an SSD, and should not have been failing at all. I discovered a giant dust bunny that I assumed was inhibiting the connection, but even the RAID controller viewed the drive as failed. As a result, I reset the virtual drive on the controller and booted into the OS.

Okay, no problem. The thing rebuilt in the typical time. When I rebooted once more, I noticed the file system was unmounted and the drive still wanted to be formatted. So I formatted it.

I imagine now that all the data that was rebuilt is lost. Is this correct? Is there any hope of recovery from parity or otherwise?

Forgive my ignorance. I am new to Unraid, and I am used to data being striped across drives in some manner with a rebuild option available. From what I see, Unraid handles data COMPLETELY differently.

Help me, Obi-Wan Kenobi, you are my only hope.

Link to comment

Formatting is never part of a rebuild process (and Unraid warns you about that too). It means the same thing as on any other OS, namely wipe the drive with a fresh filesystem and get rid of all the files.

Since parity is updated in real time, any subsequent attempt at recovering via another rebuild will simply recover a blank file system.

There are, however, various Windows utilities that will allow you to (hopefully) recover most of the files if you pull the disk. Google is your friend.

And, striping or not, Unraid is identical to other NAS solutions in how it handles things via real-time parity protection.


Link to comment

I totally knew that too, but what was messed up about it was that this happened after the rebuild process. The disk showed as unmounted and still wanted to be formatted. (I had rebooted after the rebuild as well.)

That's the only reason I even considered it, thinking Unraid would rebuild onto the new file system. No such luck, though.

I guess I'll see how bad the damage is after I rebuild this last disk. My shares indicate "unprotected". The data wouldn't have been mirrored on another disk or anything, right?

Thanks for catering to my inexperience again, Squid!!

Link to comment

Hard to say why the disk was corrupted without diagnostics from when it happened. But the solution at that point, before the format, was to run the file system checks on it.

"Unprotected" means that either some or all of the files are currently on the cache drive (if one is installed), or a data disk is missing or red-balled.


Link to comment

For future reference, is there any documentation on how to run file system checks when it is one disk giving trouble? I have another disk (Disk 3) which is reporting errors. None of these disks should be having issues. They are all relatively new; however, Disk 3 is continually reporting read errors. Not many (about 70k), but enough for me to go OCD on it.

Additionally, I do not have a cache drive. The missing data disk would be the one that I formatted. So, since the shares are still there, is there nothing to be done with them? For example, I have wiped one disk. Is everything lost? I am mainly asking how to recover from this, assuming that only one disk has been formatted. I can still see the same folder structure on all the other disks, which leads me to believe there is still data on them. How do I restore the shares and recover any existing data? When I browse to the server by UNC path, the storage share is empty, along with several others...

Link to comment
31 minutes ago, RogueWolf33 said:

Is there any documentation indicating how to do file checks

https://wiki.unraid.net/Check_Disk_Filesystems (this can be done right from the GUI)

32 minutes ago, RogueWolf33 said:

however disk 3 is continually reporting read errors.

By and large, the cause of most disk "problems" is a crappy connection to the drive or motherboard (SATA is a very poor connector, and it is very easy to jar it slightly).

33 minutes ago, RogueWolf33 said:

(about 70k)

You consider 70k not many??!? Maybe every 2 years I get one (1), if that. With 70k I'd be losing it...

35 minutes ago, RogueWolf33 said:

The data disk missing would be the one that I formatted.

Can't be. Formatted or not, it's a member of the array. Post a screenshot of your Main tab and your Shares tab.

Link to comment

Until disk 2 finishes rebuilding, the array is unprotected, because if another drive dies then it cannot be rebuilt (that would be 2 simultaneous failures).

But the massive read errors are a possible explanation for what happened to #2.

Any time there is a read error on any data disk, Unraid will calculate what is supposed to be in that particular sector (based upon parity and the other drives) and then write that back to the drive. If the write succeeds, the read error count is increased. If the write fails, the drive gets dropped from the array and its contents are then emulated.
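
A minimal Python sketch of the behaviour described above, assuming single parity is a plain XOR across the data disks. The `Disk` class, the `handle_read_error` function, and the tiny two-byte "sectors" are hypothetical names made up for illustration; this is not Unraid's actual code.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


class Disk:
    """Hypothetical in-memory disk, for illustration only."""

    def __init__(self, sectors, writable=True):
        self.sectors = list(sectors)
        self.writable = writable   # False simulates a failing write
        self.read_errors = 0
        self.disabled = False      # True means dropped from the array


def handle_read_error(bad, index, others, parity):
    """Recompute sector `index` of `bad` from parity and the other disks."""
    rebuilt = xor_blocks(
        [parity.sectors[index]] + [d.sectors[index] for d in others]
    )
    if bad.writable:
        bad.sectors[index] = rebuilt   # write-back succeeded
        bad.read_errors += 1           # so the error is only counted
    else:
        bad.disabled = True            # write-back failed: disk is dropped
    return rebuilt                     # either way the caller gets the data


disk1 = Disk([b"\x11\x22"])
disk2 = Disk([b"\xa0\xb0"])
parity = Disk([xor_blocks([disk1.sectors[0], disk2.sectors[0]])])

original = disk2.sectors[0]
disk2.sectors[0] = None                # pretend this sector became unreadable
recovered = handle_read_error(disk2, 0, [disk1], parity)
```

In this toy model a disk that accepts the write-back stays in the array with its error counter bumped, while a disk that refuses the write is marked disabled and its data keeps being served from the calculation.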

 

I'm not 100% sure if this also happens during a rebuild ( @johnnie.black would know), because during that process the data being written may or may not be valid.

You've got filesystem corruption on disk 5 (this is why you're not seeing the shares).

I always thought that the read error counter was reset with every reboot (could be wrong, as they happen so infrequently). But complicating it is that there are no SMART reports for the drives to tell their health. Looks like you've got an HP controller. Maybe click on each of the disks in turn and set the SMART controller type to HP.

Edited by Squid
Link to comment

For clarity's sake, #2 is simply an HDD-to-SSD upgrade. He's rebuilding for that reason only.

However, Disk #1 took a bite out of the big taco in the sky, and that was the start of all of the heartache. (Which is why it shows so much free space. Some would-be technician must have formatted the thing like a noob.)

Disk #1 had the dust bunnies.

Disk #3 started the whole debacle with errors initially and prompted my SSD upgrades. Checking for HDDs in the system (they aren't exactly in order; whose job was that anyway?), coupled with the inability to identify the actual disks (because of the GUIDs handed out, presumably by the RAID controller, when it creates a RAID 0 virtual disk to pass through to Unraid), may or may not have been the whole cause of the dust bunnies getting restless and climbing into SATA ports.

I had hoped Disk 3 was an HDD, and it still might be. Once everything rebuilds and chills out, I can physically check all my disks carefully, with some canned air, to be sure no rogue HDDs are hidden.

The read error counter does reset on reboot. That is true. I hoped the Disk 3 ones wouldn't come back... alas, they have.

Don't know if any of this helped, but it was fun banter if nothing else :)

Link to comment
18 minutes ago, Squid said:

Until disk 2 finishes rebuilding, the array is unprotected because if another drive dies then it cannot be rebuilt (that would be 2 simultaneous failures)

You've got filesystem corruption on disk 5 (this is why you're not seeing the shares)

I can update when the rebuild finishes in about 30 minutes. 

 

So it sounds like I can repair disk 5 and restore 5 out of 6 of my array?

Link to comment

@johnnie.black is the best person to continue this convo. The 70k scares the crap out of me, and there's nothing obvious in the syslog about it, so I don't want to give any advice that will make an already bad situation even worse.

2 minutes ago, RogueWolf33 said:

So it sounds like I can repair disk 5 and restore 5 out of 6 of my array?

Yup. Just follow the directions in the prior link. Basically: stop the array, restart it in maintenance mode, click on disk 5, and run the filesystem check.
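
For anyone who prefers the command line, the check described above can be sketched roughly as follows. This assumes the disk is formatted XFS (so the tool is `xfs_repair`) and that, with the array started in maintenance mode, array disk N is exposed as `/dev/mdN`. Both are assumptions, not something stated in this thread, and the helper name `build_xfs_check` is made up. The snippet only builds and prints the command; it does not run anything.

```python
import shlex


def build_xfs_check(disk_number, repair=False, verbose=False):
    """Build an xfs_repair command line for one array disk.

    Assumptions (not confirmed in this thread): the disk uses XFS and
    is reachable as /dev/md<N> while the array is in maintenance mode.
    """
    if disk_number < 1:
        raise ValueError("array disk numbers start at 1")
    cmd = ["xfs_repair"]
    if not repair:
        cmd.append("-n")   # -n: check only, make no changes
    if verbose:
        cmd.append("-v")   # -v: verbose output
    cmd.append(f"/dev/md{disk_number}")
    return cmd


# Read-only check of disk 5 first, then the repair pass if needed.
check_cmd = build_xfs_check(5)
repair_cmd = build_xfs_check(5, repair=True, verbose=True)
print(shlex.join(check_cmd))
print(shlex.join(repair_cmd))
```

Running the read-only pass first is the cautious order: it reports problems without touching the disk, and the repair pass is only needed if something is found. The GUI route from the wiki link does the same job.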

Link to comment

Right on, Squid.

Thanks, bud. I really appreciate the feedback and the assistance already given. I will keep an eye out for @johnnie.black to continue this convo.

Disk #1 went WAYYY higher, which is why I assumed 70k was on the lower end. Ultimately I like 0's too. I'll set those SMART controller types to HP and run another diag when the rebuild completes. Then I will shut down, dust, check for HDDs, bring it back up in maintenance mode, and run disk checks on all the drives systematically to ensure the best chance of success.

Unless I hear otherwise from Johnnie or something.

Link to comment

Update:

 

Started in maintenance mode and checked all disks. Only disk 5 was found with corruption. Repaired disk 5 per the instructions using the -v switch. (Or operator, I am not sure of the correct terminology.)

Ran a check after the repair using -n

 

Everything is clean.

 

Once I started the array again in normal mode, my shares seem to have data again. Probably not all of it, due to disk 1 being formatted, but it's a huge jump for Linux newbie nerds across the world.

Post repair diagnostics attached.

tower-diagnostics-20190922-0256.zip

Link to comment

Final Update:

 

I powered down the server and systematically removed my SSDs. No HDDs are left in my system (with the exception of my parity drive). I have completely battled the jungle of dust bunnies. They have been exterminated. May they rest in peace. I carefully reseated all SATA drives back in their original homes.

Prior to doing this, Disk 3 displayed cycling errors when I attempted to view "Disk Log Information" on the "Main" tab. Now all drives' file systems and Disk Log Info display green. Data is available again and I can start to rebuild.

Thank you so much @Squid for always keeping the light on and lending a helping hand to us up-and-comers.

I am sure I will be talking to you again in like 6 months or so when I inadvertently jack something else up!

 

Until then, Happy Raiding!

 

-RW33

Link to comment
6 hours ago, Squid said:

Any time there is a read error on any data disk, Unraid will calculate what is supposed to be in that particular sector (based upon parity and the other drives) and then write that back to the drive. If the write succeeds, the read error count is increased. If the write fails, the drive gets dropped from the array and its contents are then emulated.

I'm not 100% sure if this also happens during a rebuild ( @johnnie.black would know), because during that process the data being written may or may not be valid.

The same would happen with dual parity, but during a rebuild with single parity Unraid has no way of calculating missing data if there are errors on another disk, so some data on disk2 will be corrupt, unless by luck those sectors don't have data.
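
To make that concrete, here is a small Python sketch, assuming single parity is a plain XOR across the data disks. The disk contents are invented, and a read error is modelled simply as one wrong byte coming back from disk 3; how an unreadable sector is actually handled during a rebuild is not spelled out in this thread.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


# Hypothetical contents of three data disks (four bytes each).
disk1 = b"\x11\x22\x33\x44"
disk2 = b"\xa0\xb0\xc0\xd0"   # the disk being replaced and rebuilt
disk3 = b"\x0f\x0e\x0d\x0c"
parity = xor_blocks([disk1, disk2, disk3])

# Clean rebuild: every surviving disk reads back correctly.
clean_rebuild = xor_blocks([parity, disk1, disk3])

# Rebuild while disk 3 returns bad data for its second byte.
disk3_with_error = b"\x0f\xff\x0d\x0c"
bad_rebuild = xor_blocks([parity, disk1, disk3_with_error])

# Positions on the rebuilt disk 2 that no longer match the original.
damaged = [i for i, (a, b) in enumerate(zip(disk2, bad_rebuild)) if a != b]
print(damaged)
```

With only one parity there is nothing left to cross-check against, so the bad byte on disk 3 lands as a bad byte at the same position on the rebuilt disk 2. Whether that matters depends on whether a file actually lives there.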

Link to comment
7 hours ago, itimpi said:

@RogueWolf33 I assume you have realised that SSDs are not officially supported? This is because TRIM is not supported on array drives, so performance is likely to degrade over time.

I had not realized that. I thought I had picked up that they are not supported as parity drives, and figured I had been lucky to keep my parity drive mechanical. It seems that having a cache option where everyone is bound to use them might spur development in that area. Not to mention they are HUGELY prominent in the general PC-building community as prices continue to drop. Either way, so far my Unraid (without my own intervention gumming up the works) has been operating great. But thank you for making me aware!

Link to comment

It's a hardware thing with SSDs, not a software issue.

Quote

There are two issues with SSD in a parity-protected array. First is dealing properly with TRIM. It all depends on what an SSD reports when a block is read that has been previously TRIM'ed. Some devices return previous contents, some return indeterminate data, some return all-zeros. We can only use devices that return all-zeros. The second issue is that in Unraid all writes to data devices will also require writes to parity device(s), hence those devices will sustain quite a bit more wear. On the other hand, creating a new array or adding an SSD to an existing array can be quite fast since we can make use of full-device TRIM that sets all data blocks to zeros.
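
A rough Python sketch of why the read-back value matters, assuming single parity is a plain XOR and that parity is updated as if a TRIM'ed block now holds zeros. That second point is a simplification made for this example, and all names and byte values are invented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


ZEROS = bytes(2)

# One block on each of two hypothetical SSD data disks.
ssd_a_before_trim = b"\x12\x34"
ssd_b = b"\xab\xcd"

# Assumption: after the block on ssd_a is TRIM'ed, parity is
# recalculated as though that block now contains all zeros.
parity_after_trim = xor_blocks([ZEROS, ssd_b])


def parity_still_valid(read_back):
    """True if parity matches what ssd_a really returns for the block."""
    return xor_blocks([read_back, ssd_b]) == parity_after_trim


returns_zeros = parity_still_valid(ZEROS)
returns_old_data = parity_still_valid(ssd_a_before_trim)
returns_garbage = parity_still_valid(b"\x99\x01")
```

Only the device that reads back zeros keeps parity consistent in this model. With the other two behaviours, a later rebuild of `ssd_b` from parity would be calculated from data that no longer matches what parity recorded.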

 

Link to comment

@johnnie.black Hmm. It almost sounds like, from a rebuild standpoint, dual parity would be a little more robust for data restoration, in terms of both redundancy and efficiency. Unraid doesn't only reference the parity drives, though, right? I can clearly see read operations happening in tandem on all drives. Or am I misinterpreting this information?
