[Solved] Parity-Check seems to have stalled

GreggP · October 13, 2012

I had a power outage and I don't have a UPS to protect my unRaid server. After restarting, it automatically started the parity check. Now it seems like the parity check has stalled. In the command area it shows:

Parity-Check in progress.

Total size: 1,953,514,552 KB

Current position: 1,448,913,212 KB (74.1%)

Estimated speed: 8 KB/sec

Estimated finish: 285440.8 minutes

Sync errors: 101

At this rate it will take over 28 weeks to finish the parity check. Prior to the parity check, this was an unRaid system that was very stable. It's been running for a couple years without any problems. I haven't had to run a parity check in a long time and I've never had any sync errors in the past. The only changes have been the occasional replacement of drives for more storage capacity. When making these upgrades, I've followed the information in the wikis, etc. So it has been a very long time since I've had to deal with my unRaid server so I may need a lot of guidance. I am running unRaid version 4.7.

I assume I'll need to just stop the parity check and do the basic diagnostics. When I refresh the screen (using the 'Refresh' button on the console) after waiting several minutes, the screen updates and shows some activity. The parity check is still working, but very slowly. Here are the results about 15 minutes later:

Started

Parity-Check in progress.

Total size: 1,953,514,552 KB

Current position: 1,448,923,384 KB (74.1%)

Estimated speed: 8 KB/sec

Estimated finish: 225428.6 minutes

Sync errors: 106

As you can see, the sync errors are increasing. Or should I just be patient and let the parity check continue until it finishes? Maybe the slowdown is just temporary and it will go back to a normal speed after the parity check moves past a section of one of the disks...?
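For reference, the two status readings above can be checked with a little arithmetic. The sketch below is a minimal Python example, not anything unRAID itself runs: it converts the reported "Estimated finish" minutes into weeks and works out the rate actually observed between the two readings. It assumes the readings were taken exactly 15 minutes apart (the post only says "about 15 minutes") and that the second position is in KB like the first. The function names are made up for illustration.

```python
MINUTES_PER_WEEK = 60 * 24 * 7  # 10,080 minutes in a week


def minutes_to_weeks(minutes):
    """Convert an 'Estimated finish' value in minutes to weeks."""
    return minutes / MINUTES_PER_WEEK


def observed_rate_kb_per_sec(start_kb, end_kb, elapsed_minutes):
    """Average throughput between two 'Current position' readings."""
    return (end_kb - start_kb) / (elapsed_minutes * 60)


# Values copied from the two status readings in the post.
first = {"position_kb": 1_448_913_212, "eta_min": 285440.8, "sync_errors": 101}
second = {"position_kb": 1_448_923_384, "eta_min": 225428.6, "sync_errors": 106}

# Assumption: the post says "about 15 minutes later", so this is approximate.
elapsed_min = 15

weeks_first = minutes_to_weeks(first["eta_min"])
weeks_second = minutes_to_weeks(second["eta_min"])
rate = observed_rate_kb_per_sec(
    first["position_kb"], second["position_kb"], elapsed_min
)
new_errors = second["sync_errors"] - first["sync_errors"]

print(f"First estimate:  {weeks_first:.1f} weeks")
print(f"Second estimate: {weeks_second:.1f} weeks")
print(f"Observed rate:   {rate:.1f} KB/sec over {elapsed_min} minutes")
print(f"New sync errors: {new_errors}")
```

The first estimate works out to a little over 28 weeks, which matches the figure quoted earlier in the post. The observed rate is only a rough number because the elapsed time is approximate.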

I've started unMENU and it looks like there are all sorts of read errors on my disk2. unMENU also shows that these sync errors are getting corrected. I tried attaching my syslog to this post, but it is now too large (4203KB) and exceeds the maximum size. Is there another way to share this vital info?

Using the service WeTransfer, you can download my syslog from here: http://wtrns.fr/JwjpVXV-3Eej4Gw.

Thanks for any help in advance.

Joe L. · October 13, 2012

Please post a smart report for disk2. (You can easily get it from unMENU's disk management page.)

The errors are un-correctable media errors. The affected sectors are being re-constructed on the fly and re-written to disk2, so it is taking its time. Let the parity check finish, but you can get a smart report now to see what it has found thus far.

Odds are you'll need to replace disk2, but first let's see the smart report.

Joe L.

GreggP · October 13, 2012

Here's the smart report for my disk2 (/dev/sdc)

smart.txt

GreggP · October 13, 2012

OK, the Parity-Check finally completed. Once it got past the errors, it probably ran at normal speeds.

My resulting syslog can be downloaded here (again, too big to attach to this post):

http://wtrns.fr/91atqBst4p4bEpq

The smart report after completing the parity check is attached. What next... Do you think I should run either the short or long SMART tests?

Joe, as always, your help is very appreciated.

Thank you very much!

smart2.txt

Joe L. · October 13, 2012

You have several hundred sectors either re-allocated or pending re-allocation. I'd replace the disk.

Neither a short nor a long test will help. All they do is read the sectors on the disk, and they stop at the first error. The parity check is just as good a test. What is troubling is that there are sectors still pending re-allocation, which indicates they could not be re-allocated.

Basically, RMA the disk.

Joe L.
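To illustrate the kind of check described above, here is a minimal Python sketch that pulls the re-allocated and pending sector counts out of a SMART attribute table. The sample table is made up for illustration and is NOT taken from the attached smart.txt; it only follows the usual column layout of a smartctl attribute listing. The function names and the "any pending sector is a warning sign" rule are assumptions for this example, not a vendor threshold.

```python
# Hypothetical sample data: the raw values below are invented, not from smart.txt.
SAMPLE_SMART_ATTRIBUTES = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   100   100   005    Pre-fail  Always       -       120
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always       -       48
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline      -       0
"""

WATCHED = ("Reallocated_Sector_Ct", "Current_Pending_Sector")


def read_raw_values(report):
    """Return {attribute_name: raw_value} for the watched attributes."""
    values = {}
    for line in report.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns;
        # the raw value is the tenth column.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            values[fields[1]] = int(fields[9])
    return values


def has_pending_sectors(values):
    """Assumed rule of thumb: any sector still pending is a warning sign."""
    return values.get("Current_Pending_Sector", 0) > 0


values = read_raw_values(SAMPLE_SMART_ATTRIBUTES)
for name, raw in values.items():
    print(f"{name}: {raw}")
if has_pending_sectors(values):
    print("Sectors are still pending re-allocation; consider replacing the disk.")
```

On a real server the table would come from a report such as the output of `smartctl -A /dev/sdc` rather than from a hard-coded string.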

GreggP · October 13, 2012

I'm starting the RMA process. Does anyone have experience navigating HGST's warranty/return process?

They have a 3 step process - troubleshoot, verify warranty eligibility, create RMA.

The first step is to run a troubleshooting utility, but this will only work if the drive is internal to the computer that has a web browser with their support page active. Is there another way to prove the drive is defective and in need of repair or replacement?

The second step is to verify the warranty eligibility by submitting the serial number. This drive model doesn't appear on their current list. It was purchased from Newegg about a year ago (August 26, 2011) and Newegg shows it had a 3 year limited warranty. I can get the serial number from the smart.txt report. However, when I submit this number (ML0220F310KJXD) in their 'Check Warranty' util, it comes back with a message saying "Invalid serial".

[10/16/12 update: called HGST customer service and they told me to only use the last 8 digits of my serial number and that worked.]

Again any advice is greatly appreciated.

Is it possible to use my unRAID server while I wait for a replacement drive? Or, am I risking data loss with this drive still in the array?

GreggP · October 16, 2012

I've purchased a new 2TB Western Digital WD20EARX drive and started the RMA process on this defective Hitachi 2TB GST Deskstar 5K3000 drive. I'm going to return the Hitachi drive as soon as the replacement arrives. Then when I get the replacement for this defective Hitachi RMA'd drive, I'll use it to replace an older 1TB drive and increase the size of my array. The new WD drive is scheduled to arrive this Friday.

Am I risking any data loss if I continue using my unRAID server with this defective Hitachi drive in it? I'm not planning to add any files to the defective drive until it can be replaced, but it would be nice to read from it (watch movies). I guess I'm a little concerned that another drive will fail before Friday, but unRAID doesn't really show the Hitachi as a "failed" drive. The status is still green.

Also, just out of curiosity I decided to run another parity check. I'm still only 30% through (with no errors), but I'm a little worried that I might be causing more harm here and wonder if I should stop the parity check.

If I let the parity check finish and it doesn't report any sync errors, does this mean my drive is okay and the sectors pending re-allocation were re-allocated?

GreggP · October 21, 2012

The replacement WD20EARX drive arrived last Friday. So I precleared it and then swapped out my defective Hitachi 5K3000 drive and started the array. unRAID just completed reconstructing the contents of the failed disk onto the new disk. Hopefully, everything is back to normal.

On the main page of the console, under the command area, it shows parity is valid and that it was last checked on 10/21/2012 at 3:46:33 PM, finding 156 errors.

Is there any cause for alarm with these 156 errors?

Should I run another parity-check to see if there are still errors?

The reason I ask is, I haven't had any parity-check errors for a long time on this server, so I don't know if this is pretty common when you replace a defective drive. Actually, I think this is the first time I've replaced a defective drive. All the previous replacements were for replacing a smaller drive with a larger drive.

I've attached my syslog, if that helps.

syslog-2012-10-21.txt

GreggP · October 22, 2012

anyone?

dgaschk · October 23, 2012

Run a check. There should be no errors.

GreggP · October 23, 2012

Parity-check completed. Zero errors.

Thanks for responding.
