
Parity check running at <10MB/s with 228 million errors...


-Daedalus


So I'm new to unRAID, running the following non-final setup:

 

1TB cache drive

8TB, 8TB, and 4TB data drives

 

They're just a cobbled-together bunch of random drives for use before I move my existing data across. As such, there's only dummy data on them at the moment, so I don't care what happens to them.

 

I got the drives set up and parity built/checked all fine. I pulled a drive and reinstalled it, and the rebuild went fine as well.

 

I wanted to experiment with a cache pool - as that's what I plan to run when I move my main data over - so I planned to move the 4TB drive out of the main array and add it to the cache drive. So I did the following:

 

I'm a little hazy on exactly what I did, but I ended up creating a new config and assigning the drives as appropriate. I'm sure the correct drives were assigned to the main array (I took note of the parity drive's serial number, and the others are easily identifiable by their sizes). So now I have:

 

1TB + 4TB cache pool

8TB and 8TB data drives

 

A rebuild started, however, and it's going extremely slowly. It's currently at 8.9MB/s, with over 9 days estimated until completion; the previous array rebuild took just under 20 hours. It's also reporting just over 228 million errors.

 

 

Anyone have any idea what's going on here? I'm probably going to end up wiping it and starting over (as I'm not waiting a week for it to finish with data I don't care about), but I'd like to know what happened so I don't get into this situation in the future.

 

Thanks all! Syslog attached.

server-diagnostics-20160612-1902.zip

Link to comment

Did you tell it to trust parity when you did the new config? If so, that is your problem.

 

It seems like you are doing a correcting parity check instead of an initial parity sync. Since you removed the 4TB drive from the array, you needed to do a parity sync to rebuild parity, not a parity check, because parity was no longer valid for the changed drive configuration.

Link to comment

I did tell it to trust parity.

 

My understanding of the 'high-water' fill pattern was that the 8TB drives would fill up first, and since only about 50GB of data was on the array, the 4TB drive should have been empty - and it was; I verified this. Why, then, would the parity not be the same? If the drive was full of zeros, surely it wouldn't affect the parity drive?
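To illustrate what I'm getting at, here's a toy sketch (obviously not unRAID's actual code, just single-parity XOR over made-up sectors):

from functools import reduce

def xor_parity(*sectors):
    # Single parity is a bytewise XOR across the same sector of every data drive.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

d1 = bytes([0x12, 0x34, 0x56, 0x78])   # pretend sector from one 8TB drive
d2 = bytes([0xAB, 0xCD, 0xEF, 0x01])   # pretend sector from the other 8TB drive
clear = bytes(4)                       # a truly clear (all-zero) 4TB drive

# XOR with zeros changes nothing, so removing a genuinely clear drive would leave parity valid.
print(xor_parity(d1, d2) == xor_parity(d1, d2, clear))   # True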

 

If I was only doing a parity sync... Wouldn't that have just spat the same errors at me and not corrected them? Maybe I'm missing something obvious here.

 

Final question: if it is recalculating parity, wouldn't it run at close to the parity drive's write speed? Why so slow? I'd expect some CPU overhead, but not that much.

 

Thanks for the quick response!

Link to comment

Stop the parity check, set another new config, and this time do a parity sync (rebuild) by not telling it to trust parity.

 

An empty filesystem is not the same thing as a clear drive. Even if there are no files on a formatted drive, the filesystem itself puts data on the drive. When you format a drive (in any operating system you have ever used), you are actually writing an empty filesystem to it. A drive with a filesystem is not all zeros (clear), so when you remove it, parity must be rebuilt.
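If you want to see that for yourself, a quick sketch like this (nothing official; the path is just an argument, so point it at a disk image or a device you don't mind reading) reports the first non-zero byte it finds. A freshly formatted but empty disk will hit one almost immediately:

import sys

def first_nonzero_offset(path, limit=64 * 1024 * 1024, chunk=1024 * 1024):
    # Scan the start of the device/image and return the offset of the first
    # non-zero byte, or None if the first `limit` bytes really are all zeros.
    with open(path, "rb") as f:
        offset = 0
        while offset < limit:
            block = f.read(chunk)
            if not block:
                return None            # end of file; everything read was zero
            for i, value in enumerate(block):
                if value != 0:
                    return offset + i
            offset += len(block)
    return None

print(first_nonzero_offset(sys.argv[1]))   # e.g. python3 check_clear.py /path/to/image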

 

A parity sync is faster than a parity check because a sync does not read and compare parity; it just writes it.
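Roughly speaking, the per-stripe difference looks like this (a toy in-memory model, not the real md driver):

from functools import reduce

def xor_of(sectors):
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*sectors))

data = [bytes([1, 2, 3, 4]), bytes([5, 6, 7, 8])]   # two tiny "data drives"
parity = bytearray(4)                               # a stale "parity drive"

def parity_sync():
    # Read the data drives, write parity. Parity itself is never read or compared.
    parity[:] = xor_of(data)

def correcting_parity_check():
    # Read the data drives AND the parity drive, compare, and fix any mismatch.
    expected = xor_of(data)
    errors = 0
    if bytes(parity) != expected:
        errors += 1
        parity[:] = expected
    return errors

print(correcting_parity_check())   # 1: the stale parity is flagged as a sync error and corrected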

Link to comment

Cool. Figured, just wanted to be sure.

 

I figured (re: capacity), which is why I was a little surprised when my cache pool size read as 2.5TB.

The displayed value is incorrect when drive sizes don't match. It's been that way since cache pools were introduced in the v6 betas. It gets more complicated when there are more than two drives in the pool, so that's probably why they haven't gotten around to fixing it yet.
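For what it's worth, assuming the pool is on the default btrfs RAID1 profile, the 2.5TB you saw looks like raw capacity divided by two rather than real usable space, which for a 1TB + 4TB pair is limited by the smaller device. Back of the envelope (sizes in TB; the RAID1 profile is my assumption):

drives = [1, 4]                    # 1TB + 4TB cache pool

naive_display = sum(drives) / 2    # (1 + 4) / 2 = 2.5  -> the figure the GUI reports
usable_raid1 = min(drives)         # RAID1 needs a copy of every chunk on both devices,
                                   # so the 1TB drive is the ceiling
print(naive_display, usable_raid1) # 2.5 1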
Link to comment

Not sure whether to make a new thread for this (different issue, but I don't want to clog up the first page of posts too much).

 

Got the parity rebuilt fine. Once the sync was done I decided to test out the cache pool. I had only one drive in there, so I:

 

Stopped array

Assigned second drive to cache

Started array

 

All showed up fine as protected, including the cache-only shares, as expected. For testing, I pulled the SATA cable from one of the drives - the first one in the pool, and the original cache drive - and everything kept playing and running perfectly!

 

Except no errors are showing up anywhere, both cache drives are still being shown as present, and cache-only shares (which should be unprotected now) are showing as protected.

 

Edit: The logs do show I/O errors, but the front end says everything is fine.

 

Have I missed something?
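Side note, in case it's useful to anyone: btrfs keeps its own per-device error counters, which I'd expect to climb after a pulled cable even if the GUI stays quiet. A rough sketch (assumes the pool is mounted at /mnt/cache and the standard btrfs CLI is available):

import subprocess

# Print btrfs's per-device error counters for the cache pool.
# Non-zero read_io_errs / write_io_errs should point at the dropped device.
result = subprocess.run(
    ["btrfs", "device", "stats", "/mnt/cache"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)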

Link to comment

Alright, so the behaviour I saw - everything continuing as normal - is expected, then.

 

What's the protocol for replacing a drive if unRAID still detects it as working? If I stop/start the array, I assume it'll then show as missing, and I can add a new drive in the normal manner?

 

Side-question: Is this still the case with the 6.2 betas?

Link to comment

If you stop and start the array, the pool will rebalance to a single disk. You can then add another disk to the pool, and the balance will be done automatically after the array starts. If you're adding a disk that was previously used in a pool, it's better to clear/format it first.
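If you want to confirm what the pool ended up as after the restart, something like this (a rough sketch, assuming the pool is mounted at /mnt/cache) prints the btrfs allocation profiles; "single" instead of "RAID1" means the data is no longer redundant:

import subprocess

# Show the btrfs allocation profiles (Data/Metadata/System) for the cache pool.
result = subprocess.run(
    ["btrfs", "filesystem", "df", "/mnt/cache"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)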

 

6.2-beta is still the same.

Link to comment

Archived

This topic is now archived and is closed to further replies.
