TWO drives pre-clear fine but suffer write errors on rebuild (SOLVED)


OK, after running unraid for years I've found something to stump me.

 

I had a drive fail recently with write errors (drive 12/sdq) - giving the red cross in the drive list. I removed the drive from the array and ran an extended SMART test on it - which finished without error. I next ran a full pre-clear cycle (full read, full zero, full read) which also ran without error.

 

I added the drive back into the array thinking the drive controller had maybe remapped some bad sectors and I was OK again (for now), only to have the rebuild fail on the drive with more write errors.

 

Decided this drive really was bad so went to the store and purchased an identical WD Red 3TB drive. Conducted a full pre-clear cycle which it passed with flying colours. Added drive back into the array and it failed during rebuild with write errors. This replacement drive was added to a different controller (on-board vs LSI 9211-8i) with different SATA and power cables and mounted in a different chassis location.

 

So now I'm completely stumped. Two drives, both of which pre-clear just fine, both fail during rebuild in around the same place (~5%) with write errors (different sectors listed). The only commonality is that they are both in drive position 12 and seem to fail very early in the rebuild process.

 

Anyone got any idea where I should start troubleshooting this? No other issues with my array and unraid implementation; it's been very stable for the last couple of years and normally has uptimes measured in many months at a time.

 

Diagnostics attached.

preston-diagnostics-20170527-1427.zip

Edited by akawoz
marking solved

Interesting - I've changed only one thing (with anything remotely to do with cabling, power, devices) in the last 6 months: I plugged a NiMH battery charger into the same outlet that the server is plugged into. Did this about a week ago.

 

Will try a rebuild again with that removed. Feels a bit like voodoo, but it is a cheap one sourced from Aliexpress.

 

UPDATE: OK, that wasn't the problem. I start getting write errors immediately when I start the array. Will power down and try to reseat everything tomorrow and report back.

Edited by akawoz

Haven't done the reseat process yet - but I'm wondering why the whole rest of the array is running just fine, except when I rebuild drive 12. Lots of reads and writes going on to the other 11 drives just fine. Remember the second drive I tried was connected using completely different cabling, to a different HBA (motherboard based).


Logs don't show what happened with the other disk, but with this one there was trouble from the start:

 

May 27 13:04:27 Preston kernel: ata1: softreset failed (1st FIS failed)
May 27 13:04:27 Preston kernel: ata1: SATA link down (SStatus 0 SControl 300)
May 27 13:04:27 Preston kernel: ata1: exception Emask 0x10 SAct 0x0 SErr 0x40d0002 action 0xe frozen
May 27 13:04:27 Preston kernel: ata1: irq_stat 0x00400040, connection status changed
May 27 13:04:27 Preston kernel: ata1: SError: { RecovComm PHYRdyChg CommWake 10B8B DevExch }
May 27 13:04:27 Preston kernel: ata1: hard resetting link

The drive failed to identify several times, and was finally detected only after the link speed was limited to 3.0 Gbps (SATA2):

May 27 13:05:12 Preston kernel: ata1: softreset failed (1st FIS failed)
May 27 13:05:12 Preston kernel: ata1: limiting SATA link speed to 3.0 Gbps
May 27 13:05:12 Preston kernel: ata1: hard resetting link
May 27 13:05:12 Preston kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)
May 27 13:05:12 Preston kernel: ata1.00: ATA-9: WDC WD30EFRX-68EUZN0,      WD-WCC4N2ZUES3F, 82.00A82, max UDMA/133
May 27 13:05:12 Preston kernel: ata1.00: 5860533168 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
May 27 13:05:12 Preston kernel: ata1.00: configured for UDMA/133
May 27 13:05:12 Preston kernel: ata1: EH complete

But errors started again immediately when trying to rebuild. It's clearly a hardware issue; assuming the disk is fine, that leaves the cables or the controller/port.
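
If anyone wants to check their own syslog for the same pattern, here is a rough Python sketch that counts link-level events per ATA port. The regular expressions are built only from the message texts quoted above; the event labels and the function name `summarize_ata_events` are made up for illustration, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Event labels are hypothetical; the matched text comes from the log lines quoted above.
PATTERNS = {
    "softreset_failed": re.compile(r"softreset failed"),
    "link_down": re.compile(r"SATA link down"),
    "hard_reset": re.compile(r"hard resetting link"),
    "speed_limited": re.compile(r"limiting SATA link speed to [\d.]+ Gbps"),
    "link_up": re.compile(r"SATA link up [\d.]+ Gbps"),
}

# Matches "kernel: ata1:" as well as "kernel: ata1.00:" and captures the port name.
PORT_RE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")


def summarize_ata_events(lines):
    """Return {port: Counter(event -> count)} for the given syslog lines."""
    summary = {}
    for line in lines:
        port_match = PORT_RE.search(line)
        if not port_match:
            continue  # not an ATA kernel message
        counts = summary.setdefault(port_match.group(1), Counter())
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return summary


# A few of the lines quoted above, used as sample input.
SAMPLE = [
    "May 27 13:04:27 Preston kernel: ata1: softreset failed (1st FIS failed)",
    "May 27 13:04:27 Preston kernel: ata1: SATA link down (SStatus 0 SControl 300)",
    "May 27 13:04:27 Preston kernel: ata1: hard resetting link",
    "May 27 13:05:12 Preston kernel: ata1: softreset failed (1st FIS failed)",
    "May 27 13:05:12 Preston kernel: ata1: limiting SATA link speed to 3.0 Gbps",
    "May 27 13:05:12 Preston kernel: ata1: hard resetting link",
    "May 27 13:05:12 Preston kernel: ata1: SATA link up 3.0 Gbps (SStatus 123 SControl 320)",
]

if __name__ == "__main__":
    for port, counts in summarize_ata_events(SAMPLE).items():
        print(port, dict(counts))
```

A port that shows repeated soft reset failures, hard resets, or a speed downgrade is the one to look at first for cabling, backplane, or controller problems. To run it on a real log, pass the lines of the syslog file instead of `SAMPLE`.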

