Have I potentially lost or corrupted any data during my disk upgrade?


Ok, so I just want to get some advice on whether I need to tear down my setup and restart, or not.

 

I had an array that was getting low on space, so I purchased a 3TB disk to replace a 1TB currently included in the array. I ran 3 cycles of pre-clear with no issues and ran a parity check to make sure everything was in order.

 

Once everything was completed, I stopped the array and took out the hotswap tray with the 1TB and replaced it with the pre-cleared 3TB. I refreshed the Array Devices on the web interface, replaced the 1TB with the 3TB in the dropdown and proceeded to start the array, complete with the data rebuild that it asked for. The time estimate was 7 hours, which was due to complete before my mover script triggers - so I (potentially) stupidly left it on.

 

When I came home today and checked on the rebuild, the disk was showing as unmountable in the GUI, with 7 million writes to the disk. The cache drive is empty, but I can see last night's files on the shares. I stopped the array but the disk still showed as unmountable, and restarting the array kicked off a new data rebuild.

 

I still have the 1TB disk with all the data on it, so I don't think I'm in a horrific position. But I can't help feeling this rebuild could be corrupted, because parity was written to overnight while the new disk was showing as unmountable, and I have now kicked off a new data rebuild from that latest parity state.

 

Can anyone offer any thoughts on this? Or is it going to be a case of spending my weekend going back to the old drive, creating a new config, rebuilding parity and then reswapping the disk to see if it builds correctly this time?

 

I'm happy to provide diagnostics info if anyone thinks it would help - but the only thing I think that would show is a potential reason for the rebuild not working, and I'm rebuilding again now anyway. If it fails after this run then I'll definitely be looking into it. I just don't want to risk subtly corrupting something that I may later need to rely on if/when a disk fails.

 

Thanks

Link to comment

Which disk was showing as unmountable? Do you mean the new parity disk or one of your data disks?

 

Remember that the parity disk is the least important of all the disks in your server. If all the others are good then your data is very likely to be safe, though it is currently unprotected and will remain so until your parity rebuild has completed. There's no danger in letting the mover run while parity is rebuilding, provided the rest of your disks are sound. It will just slow the rebuild down a little.

 

When you say that you stopped the array and took out the hot swap tray, did you power down the server between those two operations? The fact that you say that you refreshed the web interface makes me think you replaced the parity disk with the system powered. You shouldn't do that. unRAID doesn't support the hot swapping of array disks.

 

Since you have written to the array (indirectly, by the use of the mover) the contents of your old parity disk are of no use to you. But your data is fine anyway - you've just messed up your parity.

 

My suggestion is to stop the array, power down, wait a few minutes, power back up and restart the array. Then either let the parity rebuild (because you interrupted it again) or run a parity check with the Write corrections box checked and let it complete. Then you'll be where you want to be.

 

EDIT: I wrongly assumed the OP was replacing his parity disk.

 

Link to comment

Hi John,

 

Thanks for the reply. I think the bit you've missed is that the swap was of a data drive, not the parity drive. This is a data rebuild that is occurring, not a parity rebuild. The data is being written to the replacement drive using the current parity disk state, which is why I think there may be an issue.

 

I don't know which of your questions are still relevant in light of that, but:

 

- The new data disk is showing as unmountable.

- I did not power down the system while swapping the drives, no. However, I did stop the array before the swap and start it again afterwards.

- The new disk had already been detected by the OS as sda, during the pre-clear. This appeared in the disk selection after a refresh.

Link to comment

Yes, you're right. I don't know why I assumed you were talking about replacing the parity disk, but I did. I'll strike out my previous post to minimise the confusion.

 

EDIT: But you did need to power down while replacing the disk. What I said about unRAID not supporting the hot swapping of array disks still holds.

 

EDIT: OK, we're discussing semantics here. Of course, there are two meanings to the words "replacing a disk". What I mean is that you must power down to replace the disk physically. If you already had it connected to another port, then you didn't need to power down to replace it logically.

Link to comment

I'm just trying to make sure we're both clear what the other side is talking about, so you can reassure/belittle me appropriately and I can take the right steps to ensure the data on all my drives is actually correct, and that parity is also correct at the end of all of this.

 

So I had the disk connected through a USB3 external docking station, and I use X-Case 5-in-3 hotswap caddies in my unRAID server.

 

After the pre-clear was completed, I powered off the docking station and removed the disk from the loader. I then stopped the array and ejected the tray containing the 1TB data disk, swapped the disk and reinserted the tray. Then I refreshed the array devices, assigned the drive and restarted the array (triggering the data rebuild).

 

So, for next time: I absolutely have to power down after the "stop the array" stage.

 

In the current predicament, my question is: do I need to stop this data rebuild, change the disks back, rebuild parity and then go through the upgrade process again? Or does it not matter, going forward, that the rebuild is currently running off a parity disk that has been modified since I swapped the disk?

Link to comment

Hey, I'm not here to belittle anyone. I completely messed up my original reply, after all. We're all fallible.  :)

 

Your description of moving the new drive physically from a USB docking station to a 5-in-3 bay definitely falls under the "need to power down" category.

 

What I'd do is keep the original drive safe because you can always put it in your USB box and mount it using the Unassigned Devices plugin and read the files from it. However, it is no longer a valid member of the array because since removing it the array has been written to by the mover.

 

I'd let the rebuild finish and see what unRAID makes of the rebuilt disk. Trouble is, I expect it will be unmountable again. If that's the case it will surely be due to the fact that you hot-swapped it. The mover ran while that disk was being rebuilt, but I believe the disk was still being emulated at that point, so your data should still be valid across the remaining disks.

 

So what you need to do is something like this: stop the array and power down. Remove the disk. Power up and start the array, which forces unRAID to notice that the disk is missing, so make sure that it's happy and emulating the missing disk. Then stop the array and power down. Replace the disk. Power up and reassign the missing disk. Again, that forces unRAID to take notice of what you've done. Start the array and let it rebuild onto the new disk.

 

Hopefully, that will salvage the situation. Hopefully, if I'm wrong someone else will comment.

 

Link to comment

Diagnostics could show what caused the unmountable disk, but if the emulated disk (or the rebuilding disk, if you're trying again) shows as unmountable, it will still be like that after the rebuild finishes. Since you have the old disk, and once you've made sure all cache data was moved to another disk and is accessible, I believe the best way to proceed is to do a new config with your old disk, let parity sync and then upgrade the disk again. This assumes all disks are healthy.

Link to comment

Was the emulated disk showing as unmountable, or just the rebuilt disk? I had attributed the unmountable state of the rebuilt disk to the fact that it had been hot-swapped, since you'd verified that everything was well before executing the swap. Certainly, Johnnie's suggestion will get your system working again, but you might well lose some of what was in your cache if the mover moved it to the emulated disk. If the emulated disk was also unmountable, though, then the cache contents must have been moved elsewhere and will be safe.

 

 

Link to comment

I never saw an emulated disk (a consequence of going straight from one disk to the other, I guess) so I don't know whether that was unmountable. But the mover ran after the rebuild would have finished - which is why I think the parity drive may be out of sync and may cause an issue as this replacement is rebuilt. I guess it depends how this emulation works and whether it'll remain in place until after the disk successfully mounts. All files that were moved off the cache yesterday are playable - so they haven't been written to a null drive or anything.

 

I've disabled the mover script now, and I'm waiting for the remaining 6 hours of the rebuild to finish to see how it goes. If it doesn't work, then I'll try John_M's method of triggering a data rebuild first, and move on to johnnie.black's full parity rebuild from the old drive after that.

 

Thanks both

 

EDIT: I've just spun up the old drive and checked its contents. The files on there are definitely not available on the current array shares - so I think it looks like emulation isn't currently in place? Will see what happens at the end of this rebuild and see if anything emulates after I power down and remove this drive, but I don't have high hopes atm.

Link to comment

Let me explain this better:

 

1- First, you should post your diagnostics. They could show what caused this in the first place.

2- With everything working as it should, the mover running during a rebuild would not cause an unmountable disk or parity to be out of sync. It would slow down the rebuild, nothing more.

3- If the disk currently being rebuilt is unmountable, it will still be like that after the rebuild finishes.

4- If the disk in 3 is unmountable, cancel the rebuild, stop the array, unassign the unmountable disk (select “no device”) and start the array. Now you’ll see the emulated disk. If it also shows as unmountable, every rebuild will be like that.

5- If the disk in 4 is unmountable, do a new config with the old disk, let the parity sync complete and upgrade again. *

*- Alternatively, let the rebuild in 3 finish, format the disk, and copy the data from the old disk (if you’re sure all data moved from cache yesterday is on other disks in the array).
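
To illustrate point 2, here is a toy model of why a write made while a disk is being emulated should still come out right after a rebuild. It assumes simple single-parity XOR across the data disks; the function names (`xor_blocks`, `emulate`) and the 4-byte "disks" are made up for the example, and it is only a sketch of the idea, not how unRAID itself is implemented.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def emulate(disks, parity, missing):
    """Reconstruct the contents of disk `missing` from parity plus the other disks."""
    survivors = [d for i, d in enumerate(disks) if i != missing]
    return xor_blocks(survivors + [parity])

# Toy array (assumption): three data disks of 4 bytes each, plus one parity disk.
disks = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(disks)

# Disk 1 is pulled: its contents can be read back from parity and the others.
emulated_before = emulate(disks, parity, missing=1)

# A write aimed at the missing disk (for example by the mover) cannot reach
# a physical disk, so in this model it is recorded by updating parity instead.
new_content = b"\xde\xad\xbe\xef"
survivors = [disks[0], disks[2]]
parity = xor_blocks(survivors + [new_content])

# A rebuild started after that write reproduces the new content.
rebuilt = emulate(disks, parity, missing=1)
print(emulated_before == disks[1], rebuilt == new_content)
```

In this model the rebuild always reflects the latest parity state, which is the desired behaviour. The flip side is that if the emulated contents are already damaged (an unmountable filesystem, for instance), a rebuild will faithfully reproduce that damage, which matches points 3 and 4.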

 

Link to comment

Fortunately, I was just in the process of posting as the rebuild finished overnight (and, as you say, the drive is still showing as unmountable).

 

1. Attached is the diagnostics file taken after the second rebuild has completed.

 

4. I have now powered down, ejected the drive tray and restarted - after starting the array I have "Disk 4 - Not installed - Unmountable" showing on the array page and the data from the missing disk is not available in the array. So the emulated disk is unmountable too.

 

5. I'll start the parity rebuild using the old disk today and try again. Regarding a new config; before I start, is there anything I need to keep in mind? i.e. assigning physical disks to the same slot in unRAID etc?

 

After the parity rebuild is complete, then I:

1. Power down

2. Eject old disk

3. Power on

4. Start array with the disk missing

5. Check emulated disk and ensure data is available on the array

6. Stop array

7. Power down

8. Insert new disk

9. Power on

10. Start array and perform data rebuild

11. Pray disk is mountable

 

Correct? Is there anything else I need to do, or anything I need to do differently?

 

 

EDIT: Hmm, interesting. I disabled the mover script yesterday during the rebuild to ensure there wouldn't be an issue - but last night's files are on the array and nothing on the cache disk. The web interface shows the mover is disabled and you can see it in the attached syslog being commented out at Feb 26 19:41:33.

 

EDIT 2: Have created a new config and assigned the data disks as they were before. Started the array, checked all files from the replaced disk were present and that files moved onto the array since I started were also present. Stopped the array and assigned parity. Now started a parity sync.

titanium-diagnostics-20160227-0902.zip

Link to comment

The disks themselves look OK, but several show UDMA_CRC errors. These usually mean a bad SATA cable or enclosure, but they can also be old errors. Keep an eye on them, and if the counts keep increasing, replace the cable/enclosure.

 

Device Model:     SAMSUNG HD502IJ
Serial Number:    S13TJDWQ641003
199 UDMA_CRC_Error_Count    0x003e   100   100   000    Old_age   Always       -       4

Device Model:     ST3000DM001-9YN166
Serial Number:    Z1F15WEV
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       481

Device Model:     ST3000DM001-9YN166
Serial Number:    Z1F16ABQ
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       66

Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY6677195
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       104

Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY6684004
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       40

Device Model:     WDC WD20EARS-00S8B1
Serial Number:    WD-WCAVY6684849
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       24

Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WCC4N4VA3YK6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2

Device Model:     WDC WD30EFRX-68EUZN0
Serial Number:    WD-WMC4N0E65C4R
199 UDMA_CRC_Error_Count    0x0032   200   197   000    Old_age   Always       -       11
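
If you want to keep an eye on those counts without comparing them by hand, something like the following could work. It is a minimal sketch that assumes the text has the same layout as the excerpt above (a `Serial Number:` line followed by the attribute 199 line, with the raw count in the last column). The helper names `crc_error_counts` and `increased` are hypothetical, and the "later" snapshot is invented purely to show the comparison.

```python
import re

def crc_error_counts(report):
    """Return {serial: raw UDMA_CRC_Error_Count} parsed from SMART report text."""
    counts = {}
    serial = None
    for line in report.splitlines():
        m = re.match(r"Serial Number:\s+(\S+)", line)
        if m:
            serial = m.group(1)
            continue
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[1] == "UDMA_CRC_Error_Count" and serial:
            counts[serial] = int(fields[-1])
    return counts

def increased(before, after):
    """Return {serial: growth} for drives whose count went up between snapshots."""
    return {s: after[s] - before[s]
            for s in after if s in before and after[s] > before[s]}

# Two of the drives from the listing above.
SAMPLE = """\
Serial Number:    Z1F15WEV
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always       -       481
Serial Number:    WD-WCC4N4VA3YK6
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       2
"""

today = crc_error_counts(SAMPLE)
# Hypothetical later snapshot: pretend one drive picked up 9 more errors.
later = dict(today, Z1F15WEV=490)
print(increased(today, later))
```

A count that stays flat is probably an old error; one that keeps growing points at the cable or enclosure for that drive.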

Link to comment

You can skip several steps here:

 

1. Power down

2. Eject old disk

3. Insert new disk

4. Power on

5. Assign new disk

6. Start array and perform data rebuild (all data should be immediately available)

 

Link to comment

I followed this guide exactly, and the new data disk is currently rebuilding, but is showing as accessible on the array page and the data from the disk is currently accessible. :)

 

Thanks so much both of you! Is there a way of marking this as solved or anything?

Link to comment
