[solved] swap-disable won't let me start the array

July 13, 2015

My system is as follows:

disk 1 - 2TB

disk 2 - 2TB

disk 3 - 2TB

disk 4 - 2TB

Parity - 2TB

Disk 2 has failed. I purchased 2 x 4TB drives. The idea is to a) replace the faulty drive, and b) consolidate disk 1 (which is showing signs of failure) on to the replaced Disk 2 (Disk 2a?).

I know I need to use swap-disable, so I have done the following:

1. Powered down.

2. Removed the failed disk 2 and replaced with a 4TB disk

3. Powered back up

4. Assigned the old parity to the Disk 2 slot

5. Assigned the new 4TB disk to the Parity slot.

However, UnRAID says "invalid configuration". If I try to assign the 4TB to the failed Disk 2 slot, it (correctly) informs me that it's too large. Any ideas?

Syslog is attached.

fs-syslog-20150713-1829.zip

July 13, 2015

Was disk2 red balled or listed as missing with the array started at any time? For the swap disable to work, the array must have been started in a degraded state at least once, IIRC.

Try assigning all the old drives in their original slots except for disk2, and start the array. It should start after you check the box that says something about running at risk or something similar, and emulate the contents of disk2. If that does not happen, swap disable couldn't work anyway.

Stop the array, assign the old parity as disk2, and the new 4TB as parity, and see if a message about copying parity to the new drive shows up.
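
The "too large" message and the need for the swap both follow from one constraint: no data disk may be larger than the parity disk. A minimal Python sketch of that check, using the disk sizes from the first post; the function name and slot labels are made up for illustration and are not part of unRAID:

```python
TB = 10**12  # decimal terabyte, as printed on drive labels


def check_assignment(parity_bytes, data_bytes):
    """Return a list of problems; an empty list means the layout is acceptable.

    Assumed rule: every data disk must be no larger than the parity disk.
    """
    problems = []
    for slot, size in sorted(data_bytes.items()):
        if size > parity_bytes:
            problems.append(
                f"{slot} ({size // TB}TB) is larger than parity ({parity_bytes // TB}TB)"
            )
    return problems


# Direct replacement: the new 4TB goes into the disk2 slot, the 2TB parity stays.
direct = check_assignment(
    2 * TB, {"disk1": 2 * TB, "disk2": 4 * TB, "disk3": 2 * TB, "disk4": 2 * TB}
)

# Swap-disable: the old 2TB parity becomes disk2, the new 4TB becomes parity.
swapped = check_assignment(
    4 * TB, {"disk1": 2 * TB, "disk2": 2 * TB, "disk3": 2 * TB, "disk4": 2 * TB}
)

print(direct)   # one problem reported for disk2
print(swapped)  # no problems
```

This only models the size rule. It says nothing about the separate requirement above that the array has already been started with disk2 missing or disabled.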

July 13, 2015

Author

Thanks for the pointer. No, it wasn't red, but it is showing a relatively large number of reallocated sectors and is often blocked (read-only); reiserfsck shows tree errors which can't be fixed. The drive is around 5 years old, so I was kind of expecting it to fail soon anyway. I was just holding out and hoping that 6TB drives would drop in price quickly enough, but apparently it's not to be. Disk 1 is about the same age and has a smaller number of reallocated sectors, hence wanting to retire that too.

I did the following:

1. power down

2. remove disk 2

3. Replace disk 2 with new 4TB drive

4. Power up

5. check that Parity and Disks 1, 3, 4 are correctly recognised

6. Check the box to start the array in a degraded (unprotected) state

7. start the array

8. Check the box to stop the array

9. stop the array

10. Assign Parity to Disk 2

11. Assign the new 4TB drive to Parity

12. Check the "copy" box

13. Start the array.

It's now "copying...". I'll report back when it's done, but it looks like this has allowed me to initiate a "swap-disable".

July 13, 2015

Thanks for the pointer. No, it wasn't red, but it is showing a relatively large number of reallocated sectors and is often blocked (read-only); reiserfsck shows tree errors which can't be fixed.

That is worrying. The newly rebuilt drive will have the exact same filesystem errors, because parity just rebuilds the entire drive exactly as is, parity doesn't have any concept of files or format. When was the last time you did a parity check that resulted in zero errors?
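
To illustrate why the rebuilt drive inherits the damage, here is a minimal Python sketch of single-disk XOR parity. The byte strings are invented stand-ins for raw disk contents, and `xor_blocks` is a hypothetical helper, not anything from unRAID itself:

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Made-up raw contents; all three strings are the same length.
disk1 = b"movie-data-1"
disk2 = b"CORRUPT-TREE"  # stands in for a disk whose filesystem is already damaged
disk3 = b"photo-data-3"

# Parity is computed over raw bytes, with no knowledge of files or formats.
parity = xor_blocks([disk1, disk2, disk3])

# Rebuilding disk2 from parity plus the surviving disks returns the same bytes,
# damage included.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])

print(rebuilt_disk2 == disk2)
```

The rebuild is a byte-for-byte reconstruction, so any filesystem repair still has to happen on the rebuilt disk afterwards.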

Do you have current backups?

You may need to recover what you can from disk2 by copying to another drive and format it to get a healthy filesystem on it. Since you posted in the V6 area, I assume you are running 6.01, which has the ability to use xfs if you reformat the drive, which will erase it. All signs currently point to xfs being the best choice for new drives at the moment.

July 13, 2015

Author

The last successful parity check was something like 3-6 months ago. It's simply not come up, as there haven't been any ungraceful shutdowns since then. I only happened to notice something was going wrong when I tried to delete some stuff and couldn't, and then noted that the drive was mounted read-only.

I suspect the tree errors can't be fixed because the drive is failing too fast, but I'll give it a go after the copy completes. It's managing about 1% every 10 minutes, so I'll be able to start verifying the data on it some time tomorrow. It's mostly large (3-4GB) single-file movies, so it's easier to replace them than to spend too much time recovering them, and anything important like photos is on CrashPlan too, but it's still annoying. A clean failure would have been better.
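
As a rough check on the "some time tomorrow" estimate, the observed rate can be extrapolated linearly. This is only a sketch: it assumes the copy speed stays constant, which it may not, and the function name is made up:

```python
def estimate_total_hours(percent_done, minutes_elapsed):
    """Linearly extrapolate the total run time from progress so far."""
    if percent_done <= 0:
        raise ValueError("need some progress before extrapolating")
    return minutes_elapsed * (100.0 / percent_done) / 60.0


# About 1% every 10 minutes, as observed during the parity copy.
total = estimate_total_hours(percent_done=1, minutes_elapsed=10)
print(f"{total:.1f} hours")  # 16.7 hours
```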

I've been reading up on XFS vs. btrfs. It's clear that most commentators value the longevity (= "chance to fix things") of XFS, and I think I agree. I'll pick that up when this phase is done and I'm ready to add another 4TB drive to replace Disk 1 and consolidate another drive onto that.

July 13, 2015

The last successful parity check was something like 3-6 months ago. It's simply not come up, as there haven't been any ungraceful shutdowns since then. I only happened to notice something was going wrong when I tried to delete some stuff and couldn't, and then noted that the drive was mounted read-only.

One of the benefits of scheduling regular parity checks is to weed out bad drives earlier rather than later. In my early days with unraid (7 years ago) I thought I would be smart and reuse some crappy marginal drives in my array because, hey, if one fails, I can just rebuild it, that's what RAID is for right? Turns out that was a spectacularly poor decision, because when one of my high usage drives failed, one of my marginal drives decided to torpedo my recovery by passing corrupt data during the rebuild.

Unraid is great in that it allows you to keep most drives spun down, and only spin up the one you need. Unraid sucks at knowing when a seldom used drive is failing. V6 is better, it at least tries to warn you about critical smart stats. Regular parity checks help to ensure that those seldom used drives are still healthy enough to be used in a rebuild attempt.

I run a non-correcting check every month, and keep a close eye on smart stats. Marginal drives are out at the first opportunity.
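
Keeping an eye on SMART stats can be as simple as flagging any drive whose reallocated or pending sector counts are above zero. A minimal Python sketch follows; the counts are invented for illustration (the thread gives no actual numbers), the zero threshold is an assumption rather than a recommendation, and only the attribute names match what smartctl reports:

```python
def marginal_drives(smart_by_drive,
                    watched=("Reallocated_Sector_Ct", "Current_Pending_Sector"),
                    limit=0):
    """Return {drive: {attribute: raw_value}} for watched attributes above limit."""
    flagged = {}
    for drive, attrs in smart_by_drive.items():
        bad = {name: attrs[name] for name in watched if attrs.get(name, 0) > limit}
        if bad:
            flagged[drive] = bad
    return flagged


# Hypothetical raw values, not taken from the thread or from any real drive.
smart = {
    "disk1": {"Reallocated_Sector_Ct": 8, "Current_Pending_Sector": 0},
    "disk2": {"Reallocated_Sector_Ct": 120, "Current_Pending_Sector": 3},
    "disk3": {"Reallocated_Sector_Ct": 0, "Current_Pending_Sector": 0},
}

flagged = marginal_drives(smart)
for drive, attrs in flagged.items():
    print(drive, attrs)
```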

July 13, 2015

Author

One of the benefits of scheduling regular parity checks is to weed out bad drives earlier rather than later. ... Regular parity checks help to ensure that those seldom used drives are still healthy enough to be used in a rebuild attempt. ...I run a non-correcting check every month, and keep a close eye on smart stats. Marginal drives are out at the first opportunity.

That's a good tip, thanks. I've set that up myself too now.

July 14, 2015

Author

It appears to have completed the first part in about 14 hours. I now see:

"Stopped. Ugrading [sic] disk/swapping parity."

"Start will expand the file system of the data disk (if possible); then bring the array on-line and start Data-Rebuild."

I checked the box "Yes I want to do this" and pressed start. It's estimating a further 18-24 hours before the array is fully rebuilt.

July 15, 2015

Author

The rebuild completed. I then ran reiserfsck --rebuild-tree and it successfully rebuilt the tree. About 200 files were orphaned, but most of them were on a share that I'd been trying to delete anyway, and the tree corruption has not recurred (supporting my theory that it was due to drive failure). So, all good. Thanks!

July 16, 2015

Run a parity check to confirm the rebuild.

July 19, 2015

Author

I didn't get a chance to start a parity check, as it started remounting the drive as read-only. reiserfsck showed more tree corruption.

My thoughts:

1. check the connections/cables - unlikely to be the cause as the drive is in a different slot in the case (it's a caddy system), and each slot has its own cable.

2. fix the drive using reiserfsck, recovering any missing or corrupt critical files from crashplan

3. move data off to another drive and reformat the faulty one as xfs
