Drive Errors on new 4TB Parity drive



I ran this Seagate NAS 4TB drive through 5 preclear cycles, all of which were successful. I removed the old parity drive and installed the new 4TB drive as the new parity drive. I'm at 37% of the parity rebuild on the new drive and have seen a few of the drive errors below. There were no errors from the old parity drive in the same slot, so this is new to me. The old parity drive is no longer in the server, so power requirements should be nearly identical, and I have a Cx500 PSU, which should provide plenty of power. What do you guys think? I upgraded to 5.0 last week and ran several parity checks, all with 0 errors, before adding the 4TB drive as the new parity drive. Should I stop the rebuild?

 

Feb  2 04:15:35 unRAID kernel: ata1.00: exception Emask 0x50 SAct 0x0 SErr 0x400800 action 0x6 frozen (Errors)
Feb  2 04:15:35 unRAID kernel: ata1.00: irq_stat 0x08000000, interface fatal error (Errors)
Feb  2 04:15:35 unRAID kernel: ata1: SError: { HostInt Handshk } (Errors)
Feb  2 04:15:35 unRAID kernel: ata1.00: failed command: WRITE DMA EXT (Minor Issues)
Feb  2 04:15:35 unRAID kernel: ata1.00: cmd 35/00:00:c8:4c:13/00:04:0f:00:00/e0 tag 0 dma 524288 out (Drive related)
Feb  2 04:15:35 unRAID kernel:          res 50/00:00:c7:4c:13/00:00:0f:00:00/e0 Emask 0x50 (ATA bus error) (Errors)
Feb  2 04:15:35 unRAID kernel: ata1.00: status: { DRDY } (Drive related)
Feb  2 04:15:35 unRAID kernel: ata1: hard resetting link (Minor Issues)
Feb  2 04:15:35 unRAID kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)
Feb  2 04:15:35 unRAID kernel: ata1.00: configured for UDMA/133 (Drive related)
Feb  2 04:15:35 unRAID kernel: ata1: EH complete (Drive related)
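For anyone reading along, here is a rough Python sketch for decoding the two densest fields in that log: the SErr bitmask and the cmd taskfile string. The bit names and the field order are what I recall from the Linux libata error handler, so treat them as assumptions and check them against your kernel source. The function names decode_serr and decode_taskfile are made up for this example.

```python
# SError bit positions and short names as printed by the Linux libata
# error handler (assumed from memory; verify against your kernel).
SERR_BITS = {
    0: "RecovData", 1: "RecovComm", 8: "UnrecovData", 9: "Persist",
    10: "Proto", 11: "HostInt", 16: "PHYRdyChg", 17: "PHYInt",
    18: "CommWake", 19: "10B8B", 20: "Dispar", 21: "BadCRC",
    22: "Handshk", 23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS",
    26: "DevExch",
}


def decode_serr(value):
    """Return the names of the SError bits set in value, lowest bit first."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


def decode_taskfile(field):
    """Decode a libata 'cmd'/'res' string into command, LBA and sector count.

    Assumed layout (hex bytes):
    command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
    """
    command, low, high, _device = field.split("/")
    _feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in low.split(":"))
    _hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in high.split(":")
    )
    # The "hob" (high order byte) fields carry the upper half of the 48-bit LBA
    # and the upper byte of the 16-bit sector count.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    sectors = hob_nsect << 8 | nsect
    return {"command": int(command, 16), "lba": lba, "sectors": sectors}


if __name__ == "__main__":
    print(decode_serr(0x400800))
    print(decode_taskfile("35/00:00:c8:4c:13/00:04:0f:00:00/e0"))
```

With the values from the log, 0x400800 decodes to HostInt and Handshk, the same two flags the kernel prints on the SError line, and the cmd string decodes to command 0x35 with a count of 1024 sectors. 1024 sectors of 512 bytes is 524288 bytes, which agrees with the "dma 524288 out" on the same line. None of this says which component is at fault; it only confirms that the failure is reported at the link level.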

Link to comment

Could be a loose or bad SATA cable.

 

Normally that's what I'd think, but the old parity drive has been in that slot for over a year with no errors. It's a 5-in-3 dock, so maybe the ports aren't seated in the backplane? What do I do so I can power down and try to reseat the drive? Cancel the parity sync? Or should I just let it finish the parity sync and then try to fix the interface problem?

Link to comment

IMO, you're experiencing errors during a parity sync to a new drive, on a system where you've not experienced any before.  Why trust what's going on?

 

I hope you ran a parity check before changing anything (and that no writes occurred on the array while it was checking).

 

Assuming you've got a backup of your flash drive from before exchanging the disk, the old disk is still intact (unchanged), and you haven't written anything to the array (all those things are important AFAIK), I'd recommend the following:

 

Cancel the parity sync.

Stop the array.

Shut down.

Reseat all connections related to your disks.

Start up.

Start the array/begin parity sync again.

 

I'm certain that someone else will come up with better instructions, but those would be the steps I'd take.

 

The backups/untouched array and parity disk should allow you to go back to your previously stable system.

Link to comment

So the sync completed before I saw what you posted. Last night I powered down and reseated several times and unplugged/plugged the SATA cable a few times. Ran nocorrect parity check overnight and it just completed with only 1 sync error and NO ATA interface errors. I'd say that's a success, but I'll run another nocorrect parity check to be sure before I add the old parity drive to the array.
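If you want to confirm "no ATA interface errors" without scrolling through the whole syslog, a small Python sketch like this counts exception lines per port. It assumes the lines look like the ones posted at the top of this thread; the function name count_ata_exceptions is made up for the example.

```python
import re
from collections import Counter

# Matches kernel lines such as "kernel: ata1.00: exception Emask 0x50 ..."
# and captures the port name ("ata1") without the device suffix.
ATA_EXCEPTION = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: exception Emask")


def count_ata_exceptions(lines):
    """Count libata exception reports per ATA port in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = ATA_EXCEPTION.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

To use it, pass an open file object for the syslog (on my understanding that is /var/log/syslog on unRAID, but check your own system). An empty result after a full parity check is what you are hoping for.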

Link to comment

I could be wrong, but I think at this point you need to run a correcting parity check (since an error was found), unless you don't think the error is related to the parity drive, of course.  I would read this first though:

 

http://lime-technology.com/forum/index.php?topic=31020.msg280013#msg280013

 

The first parity checks resulted in no errors, the second check resulted in 16 sync errors and the third parity check resulted in 12 sync errors. The sync errors were all on different bits between checks, which leads me to believe there's nothing wrong with the data disks and that the problem is with the new parity drive, memory, PSU, motherboard SATA controllers, etc. I'm running memtest86 right now, and if there aren't any errors after 36-48 hours I'll then run Prime95.
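The "different positions each time" reasoning can be checked mechanically. Here is a minimal Python sketch, assuming you have already collected the position of each sync error from every check into a list; the positions below are invented purely for illustration, and the function name repeated_positions is hypothetical.

```python
def repeated_positions(runs):
    """Return the set of error positions that were reported in every run."""
    sets = [set(run) for run in runs]
    if not sets:
        return set()
    return set.intersection(*sets)


if __name__ == "__main__":
    # Hypothetical positions from two no-correct parity checks.
    second_check = [1024, 88210, 4500012]
    third_check = [2048, 90114, 7700320]
    print(repeated_positions([second_check, third_check]))
```

An empty result means no position failed twice, which fits a transient fault (cabling, memory, PSU, controller) better than bad data sitting on a disk. A non-empty result would point at the same sectors failing repeatedly and would deserve a closer look at the drives themselves.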

Link to comment

I get some of those "frozen" errors in my log too. But it's only while copying from one drive to another using Midnight Commander logged in as root, and while trying to access the config menu via http. I figure the system is busy accessing 2 data drives + parity, then when I access the menu it just freezes for about 15 seconds, then continues and writes the "frozen" error in the log and resets one of the drive's "link". But for me it's happening on ata7 even though I'm not using the drive on ata7 in the copying. Maybe it's a sort of "system busy" error message. Oh well, probably has no relation to your issue.

Link to comment
