Extremely Slow Parity Check

ElJimador · April 30, 2018

Hello. I finally finished a long-running project to optimize/compress all the media on my shared Plex server (JBOX in my sig below) and downsize it from all-6TB to 4TB parity and data drives. After the final steps over the weekend to run a new config / parity rebuild with the new parity drive and then clear and format missing data drives 1 and 5, I wanted to run one final parity check with the new full array to make certain everything is fine (even though the main page was already saying parity was valid as soon as I added those last 2 drives to the array). That parity check is now 13.5 hours in and not even 5% complete yet, with an estimated completion of 5-50 days out. The attached log shows no errors during the time the check has been running, but there's some weird loop of an abort task constantly completing over and over again that looks like it's associated with the SATA controller. Perhaps a breakout cable wasn't securely reconnected to the card or something?

Need to know if it's safe at this point to cancel the parity check and anything else advised for next steps. Thanks!

jbox-syslog-20180430-1006.zip

trurl · April 30, 2018

Without looking at the syslog, this is most likely a connection issue. It is always safe to cancel a parity check, but of course you should complete one eventually to make sure parity is OK.

Instead of posting syslog you should always go to Tools - Diagnostics and post the complete diagnostics zip, which contains syslog, SMART for all disks, and many other useful things.

JorgeB · April 30, 2018

The first thing you want to do is update the LSI firmware: FWVersion(20.00.00.00) was very buggy, so update to 20.00.07. If the problem is still there after that, please post the complete diags.
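For anyone wanting to check a saved syslog for the firmware version mentioned above, here is a minimal sketch. It only assumes the `FWVersion(...)` form quoted in this reply; the rest of the sample log line is hypothetical, and treating 20.00.07 as 20.00.07.00 for comparison is also an assumption.

```python
import re

# Matches the "FWVersion(20.00.00.00)" form quoted in the reply above.
FW_PATTERN = re.compile(r"FWVersion\((\d+(?:\.\d+)*)\)")

# Version recommended in the reply above, padded to four fields (assumption).
RECOMMENDED = (20, 0, 7, 0)


def parse_version(text):
    """Turn a dotted version such as '20.00.07' into a tuple, padded to four fields."""
    parts = [int(p) for p in text.split(".")]
    parts += [0] * (4 - len(parts))
    return tuple(parts)


def find_fw_versions(syslog_text):
    """Return every firmware version string found in the syslog text, in order."""
    return FW_PATTERN.findall(syslog_text)


def needs_update(version_text, minimum=RECOMMENDED):
    """True when the reported firmware is older than the recommended version."""
    return parse_version(version_text) < minimum


if __name__ == "__main__":
    # Hypothetical log line; only the FWVersion(...) part comes from the thread.
    sample = "kernel: controller0: FWVersion(20.00.00.00)"
    for version in find_fw_versions(sample):
        status = "update recommended" if needs_update(version) else "ok"
        print(version, status)
```

To use it on a real log, read the syslog file into a string and pass it to `find_fw_versions`.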

ElJimador · April 30, 2018

Thanks trurl and Jonnie for your replies. I'll search around for instructions on updating the LSI firmware and I'll keep in mind to post the diagnostics next time if it happens again after that. One last question though: after I stopped the parity check and rebooted I checked the dashboard before starting the array and found UDMA CRC error counts for every drive attached to the LSI controller + the parity and cache drives which are connected onboard (though only a count of 1 on the parity drive vs. 10k - 400k for the drives connected to the controller). Is there anything I should glean from that in determining whether the root of the problem really was the firmware vs. a physical connection issue? Just glancing at the logs all of the task aborts I noticed were on the drives connected to one particular breakout cable which is why I thought that cable might have come a little loose from the card (and indeed the error counts on those drives were much higher at around ~400k for those vs. 10-30k for the drives connected via the other cable). I wasn't expecting to see errors on every drive though so just wondering if there's anything I should read into that before attempting the firmware update or anything else.

JorgeB · April 30, 2018

UDMA CRC errors are usually caused by a bad cable, but it can also be the controller or backplane if one exists.
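Since the UDMA CRC count is a lifetime counter that does not reset, the useful signal is whether it keeps climbing after a cable is reseated or replaced. A rough sketch of reading it per drive follows, assuming smartmontools is installed and that the drive reports it as SMART attribute 199 in the usual `smartctl -A` table layout; the function names and the sample row are illustrative only.

```python
import subprocess


def crc_error_count(smart_text):
    """Return the raw value of SMART attribute 199 from `smartctl -A` text, or None."""
    for line in smart_text.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns; the raw value is the last one.
        if len(fields) >= 10 and fields[0] == "199":
            try:
                return int(fields[9])
            except ValueError:
                return None
    return None


def read_crc_errors(device):
    """Run `smartctl -A` on a device such as /dev/sdb (usually needs root)."""
    result = subprocess.run(
        ["smartctl", "-A", device], capture_output=True, text=True
    )
    return crc_error_count(result.stdout)


def new_errors(before, after):
    """CRC errors logged between two readings of the same drive."""
    return after - before


if __name__ == "__main__":
    # Illustrative attribute row, not taken from the poster's drives.
    sample = (
        "ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE"
        "      UPDATED  WHEN_FAILED RAW_VALUE\n"
        "199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age"
        "   Always       -       400000\n"
    )
    print(crc_error_count(sample))
```

Note the count for each drive, reseat or swap the cable, run some disk activity, then read again. If `new_errors` stays at zero for the drives on that cable, the cable connection was the likely cause.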
