Roll Back from a new config



Good, it's a happy resolution! This is the first situation I have seen in which dual parity was needed to recover (there may have been others I missed).

 

A couple of comments.

1. We tend to think of dual parity as protecting us from two drive failures, but this situation involved zero drive failures. I would classify this incident in the "user error" or "user misunderstanding" category. The more prone a user is to these types of incidents, the greater the need for dual parity. In other words, dual parity is much more likely to protect you from you than to protect you from two real disk failures happening at the same time. Newer users tend to be more prone to making mistakes as they learn the system, so I highly recommend dual parity for new users especially.

 

2. unRaid recovery is a wonderful thing when it works, BUT it depends on all of the disks being in perfect synchronization with each other. While Raid5 or Raid6 would enforce this with strict controls, unRaid does not. Features like "trust parity" are inherently risky, because if even one bit on one disk is off, a recovery (if one is needed) will be off. So it might seem reasonable to ask in this situation: was there a possibility that a disk or parity got updated inappropriately? I think not in this case, but unless flame has been creating md5 or similar checksums, he would not know for sure. I have such checksums and a means to keep them updated for just such a check. It wouldn't help me fix the problem, but it would tell me, from the sea of media files, which file or files are corrupted.
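For anyone who wants to try the checksum idea, here is a minimal sketch in Python. It is not the tool I use and not an unRaid feature; the function names (`file_md5`, `build_manifest`, `find_corrupted`) are made up for illustration. It only detects damage and cannot repair anything.

```python
import hashlib
from pathlib import Path


def file_md5(path, chunk_size=1024 * 1024):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's path (relative to root) to its MD5 digest."""
    root = Path(root)
    return {
        str(p.relative_to(root)): file_md5(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_corrupted(root, manifest):
    """Return the relative paths that are missing or no longer match the manifest."""
    root = Path(root)
    bad = []
    for rel_path, expected in manifest.items():
        target = root / rel_path
        if not target.is_file() or file_md5(target) != expected:
            bad.append(rel_path)
    return bad
```

The idea would be to run `build_manifest` on a disk's mount point while the data is known to be good, store the result somewhere off that disk, and run `find_corrupted` against the rebuilt disk afterwards. The manifest has to be refreshed whenever files are legitimately changed, otherwise those files show up as false positives.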

 

Again, great job by Johnnie walking through a complicated recovery! And by flamegrilled, who followed the process and appears to have saved his goose from being cooked, with only mild grill marks.  :)


Features like "trust parity" are inherently risky, because if even one bit on one disk is off, a recovery (if one is needed) will be off. So it might seem reasonable to ask in this situation: was there a possibility that a disk or parity got updated inappropriately? I think not in this case, but unless flame has been creating md5 or similar checksums, he would not know for sure. I have such checksums and a means to keep them updated for just such a check. It wouldn't help me fix the problem, but it would tell me, from the sea of media files, which file or files are corrupted.

 

I tested this procedure multiple times and it always worked, including checking checksums on the rebuilt disk(s), and all were OK. I agree that if the user doesn't have checksums there's no way to be sure, but if both parity disks were 100% in sync, all data should be perfect.

 

I do have some reservations about doing this if the rebuilt disk is larger than the original: the rebuilt disk will have file system corruption due to the different size. In my tests with XFS it was easily fixed, and ReiserFS should also be fixable.


If you start an array that was shut down improperly, the journaled data will be written to the disks when they are mounted. If the disks have been in the array continuously, that is typically no problem. But if you take a drive whose state you don't really know, stick it in as a rebuild disk, do the trust parity procedure, and start the array, unRaid will attempt to mount that disk. If that disk is detected as unformatted, no problem: it will generate no writes and parity stays perfect. But if the disk is detected as formatted, unRaid will mount it, a process that results in a few disk writes and therefore parity writes. These writes are to the housekeeping section of the disk. Would those few writes pollute parity? Probably not. But what if that disk had been subjected to a dirty shutdown outside the array? Now when the disk is mounted, the journaled writes will get applied at scattered locations on the disk, and parity will get updated inappropriately. Will that pollute parity? Yes. And an ensuing rebuild will be subtly inaccurate. This is why I didn't like putting those two unknown disks in the array and starting it. If they had been detected as formatted, it could have been destructive.
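To make the pollution argument concrete, here is a toy model in Python. It assumes a single XOR parity disk and a read-modify-write parity update (parity is XORed with old data and new data on every write). That is a simplification: the second parity disk in a dual parity array uses a different calculation, and the byte strings and the `xor_blocks` helper are invented for the example.

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Original array: three data disks and one XOR parity block.
disk1 = b"AAAAAAAA"
disk2 = b"BBBBBBBB"  # the disk we will later want to rebuild
disk3 = b"CCCCCCCC"
parity = xor_blocks(disk1, disk2, disk3)

# An unknown disk is assigned in disk2's slot and parity is trusted as-is.
unknown_before = b"XXXXXXXX"

# Clean case: nothing writes to the unknown disk, so rebuilding slot 2 from
# parity and the other disks returns the original contents.
rebuilt_clean = xor_blocks(parity, disk1, disk3)

# Dirty case: mounting replays a journal and changes two bytes on the unknown disk.
unknown_after = b"XXjjXXXX"

# Each write XORs (old data, new data) into parity. That is only correct if
# parity already matched the disk in that slot. Here it matched the original
# disk2 instead, so the delta lands on top of disk2's data.
polluted_parity = xor_blocks(parity, unknown_before, unknown_after)
rebuilt_polluted = xor_blocks(polluted_parity, disk1, disk3)

# Offsets where the rebuild no longer matches the original disk2.
damaged_offsets = [
    i for i, (a, b) in enumerate(zip(disk2, rebuilt_polluted)) if a != b
]
```

In this model `rebuilt_clean` equals `disk2`, while `rebuilt_polluted` differs from it at exactly the offsets the journal replay touched. Nothing in the array flags this as an error, which is why only an independent check such as file checksums would reveal it.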


Yes, you could start the server in maintenance mode, then physically remove the two disks and restart in regular mode. The two drives will be dropped and you can check that the emulated disks look good. Then plug in the drives again and add them to the array. They'll be detected as new, so there is no chance they'll affect parity.

 

I'm trying to remember, though, when starting an array for the first time, is maintenance mode even an option?


Yes, you can start in maintenance mode right after a new config. I was also thinking that there's no need to physically remove the disks; you could do it like this:

 

-new config, assign all disks

-trust parity, start the array in maintenance mode

-stop array, unassign disk(s) to rebuild

-start array, check emulated disk(s) mount and look OK

-stop array, reassign disk(s)

-start array to begin rebuild

