
Parity check running at <10MB/s with 228 million errors...


-Daedalus


So I'm new to unRAID, running the following non-final setup:

 

1TB cache drive

8TB, 8TB, and 4TB data drives

 

They're just a cobbled-together bunch of random drives for use before I move my existing data across. As such, there's only dummy data on them at the moment, so I don't care what happens to them.

 

I got the drives set up and parity built/checked all fine. I pulled a drive and reinstalled it, and the rebuild went fine as well.

 

I wanted to experiment with a cache pool - as that's what I plan to run when I move my main data over - so I planned to move the 4TB drive out of the main array and add it to the cache drive. So I did the following:

 

I'm a little hazy on exactly what I did, but I ended up creating a new config and assigning the drives as appropriate. I'm sure the correct drives were assigned to the main array (I took note of the parity drive's serial number, and the others are easily identifiable by their sizes). So now I have:

 

1TB + 4TB cache pool

8TB and 8TB data drives

 

A rebuild started, however, and it's going extremely slowly. It's currently at 8.9MB/s, with over 9 days estimated until completion; the previous array rebuild took just under 20 hours. It's also reporting just over 228 million errors.

 

 

Anyone have any idea what's going on here? I'm probably going to end up wiping it and starting over (as I'm not waiting a week for it to finish with data I don't care about), but I'd like to know what happened so I don't get into this situation in the future.

 

Thanks all! Syslog attached.

server-diagnostics-20160612-1902.zip

Link to comment

Did you tell it to trust parity when you did the new config? If so, that is your problem.

 

It seems like you are doing a correcting parity check instead of an initial parity sync. Since you removed the 4TB drive from the array, you needed to do a parity sync to rebuild parity, not a parity check, because parity was no longer valid for the changed drive configuration.

Link to comment

I did tell it to trust parity.

 

My understanding of the 'high-water' fill pattern was that the 8TB drives would fill up first, and since only about 50GB of data was on the array, the 4TB drive should have been empty - and it was; I verified this. Why, then, would the parity not be the same? If the drive was full of zeros, surely it wouldn't affect the parity drive?
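To illustrate what I'm getting at, here's a toy sketch (obviously not unRAID's actual code, just single-parity XOR over made-up sectors):

from functools import reduce

def xor_parity(*sectors):
    # Single parity is a bytewise XOR across the same sector of every data drive.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

d1 = bytes([0x12, 0x34, 0x56, 0x78])   # pretend sector from one 8TB drive
d2 = bytes([0xAB, 0xCD, 0xEF, 0x01])   # pretend sector from the other 8TB drive
clear = bytes(4)                       # a truly clear (all-zero) 4TB drive

# XOR with zeros changes nothing, so removing a genuinely clear drive would leave parity valid.
print(xor_parity(d1, d2) == xor_parity(d1, d2, clear))   # True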

 

If I was only doing a parity sync... Wouldn't that have just spat the same errors at me and not corrected them? Maybe I'm missing something obvious here.

 

Final question: if it is recalculating parity, wouldn't it run at close to the parity drive's write speed? Why so slow? I'd expect some CPU overhead, but not that much.

 

Thanks for the quick response!

Link to comment

Stop the parity check, set another new config, and this time do a parity sync (rebuild) by not telling it to trust parity.

 

An empty filesystem is not the same thing as a clear drive. Even if there are no files on a formatted drive, the filesystem itself puts data on the drive. When you format a drive (in any operating system you have ever used), you are actually writing an empty filesystem to it. A drive with a filesystem is not all zeros (clear), so when you remove it, parity must be rebuilt.
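If you want to see that for yourself, a quick sketch like this (nothing official; the path is just an argument, so point it at a disk image or a device you don't mind reading) reports the first non-zero byte it finds. A freshly formatted but empty disk will hit one almost immediately:

import sys

def first_nonzero_offset(path, limit=64 * 1024 * 1024, chunk=1024 * 1024):
    # Scan the start of the device/image and return the offset of the first
    # non-zero byte, or None if the first `limit` bytes really are all zeros.
    with open(path, "rb") as f:
        offset = 0
        while offset < limit:
            block = f.read(chunk)
            if not block:
                return None            # end of file; everything read was zero
            for i, value in enumerate(block):
                if value != 0:
                    return offset + i
            offset += len(block)
    return None

print(first_nonzero_offset(sys.argv[1]))   # e.g. python3 check_clear.py /path/to/image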

 

A parity sync is faster than a parity check because a sync does not read and compare parity; it just writes it.
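Roughly speaking, the per-stripe difference looks like this (a toy in-memory model, not the real md driver):

from functools import reduce

def xor_of(sectors):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

data = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8])]   # two tiny "data drives"
parity = bytearray(4)                               # a stale "parity drive"

def parity_sync():
    # Read the data drives, write parity. Parity itself is never read or compared.
    parity[:] = xor_of(data)

def correcting_parity_check():
    # Read the data drives AND the parity drive, compare, and fix any mismatch.
    expected = xor_of(data)
    errors = 0
    if bytes(parity) != expected:
        errors += 1
        parity[:] = expected
    return errors

print(correcting_parity_check())   # 1: the stale parity is flagged as a sync error and corrected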

Link to comment

Cool. Figured, just wanted to be sure.

 

I figured (re: capacity), which is why I was a little surprised when my cache pool size read as 2.5TB.

The displayed value is incorrect when drive sizes don't match. It's been that way since cache pools were introduced in the v6 betas. It gets more complicated when there are more than two drives in the pool, so that's probably why they haven't gotten around to fixing it yet.
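For what it's worth, assuming the pool is on the default btrfs RAID1 profile, the 2.5TB you saw looks like raw capacity divided by two rather than real usable space, which for a 1TB + 4TB pair is limited by the smaller device. Back of the envelope (sizes in TB; the RAID1 profile is my assumption):

drives = [1, 4]                    # 1TB + 4TB cache pool

naive_display = sum(drives) / 2    # (1 + 4) / 2 = 2.5  -> the figure the GUI reports
usable_raid1 = min(drives)         # RAID1 needs a copy of every chunk on both devices,
                                   # so the 1TB drive is the ceiling
print(naive_display, usable_raid1) # 2.5 1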
Link to comment

Not sure whether to make a new thread for this (different issue, but I don't want to clog up the first page of posts too much).

 

Got the parity rebuilt fine. Once the sync was done I decided to test out the cache pool. I had only one drive in there, so I:

 

Stopped array

Assigned second drive to cache

Started array

 

All showed up fine as protected, including the cache-only shares, as expected. For testing, I pulled the SATA cable from one of the drives - the first one in the pool, and the original cache drive - and everything kept playing and running perfectly!

 

Except no errors are showing up anywhere, both cache drives are still being shown as present, and cache-only shares (which should be unprotected now) are showing as protected.

 

Edit: The logs do show I/O errors, but the front end says everything is fine.

 

Have I missed something?
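Side note, in case it's useful to anyone: btrfs keeps its own per-device error counters, which I'd expect to climb after a pulled cable even if the GUI stays quiet. A rough sketch (assumes the pool is mounted at /mnt/cache and the standard btrfs CLI is available):

import subprocess

# Print btrfs's per-device error counters for the cache pool.
# Non-zero read_io_errs / write_io_errs should point at the dropped device.
result = subprocess.run(
    ["btrfs", "device", "stats", "/mnt/cache"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)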

Link to comment

Alright, so the behaviour I saw - everything continuing as normal - is expected, then.

 

What's the protocol for replacing a drive if unRAID still detects it as working? If I stop/start the array, I assume it'll then show as missing, and I can add a new drive in the normal manner?

 

Side-question: Is this still the case with the 6.2 betas?

Link to comment

If you stop and start the array, the pool will rebalance to a single disk. You can then add another disk to the pool, and the balance will be done automatically after the array starts. If you're adding a disk that was previously used in a pool, it's better to clear/format it first.
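If you want to confirm what the pool ended up as after the restart, something like this (a rough sketch, assuming the pool is mounted at /mnt/cache) prints the btrfs allocation profiles; "single" instead of "RAID1" means the data is no longer redundant:

import subprocess

# Show the btrfs allocation profiles (Data/Metadata/System) for the cache pool.
result = subprocess.run(
    ["btrfs", "filesystem", "df", "/mnt/cache"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)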

 

6.2-beta is still the same.

Link to comment

Archived

This topic is now archived and is closed to further replies.
