Write Cache on two drives causes lots of "hard resetting link" errors


As per the title, I have two WD Ultrastar 12TB drives (HGST HUH721212ALE604). When write cache is enabled (which it is by default), I constantly get hard link resets and can hear the drives cycling.

 

Jan  3 10:29:58 NAS01 kernel: ata2.00: exception Emask 0x10 SAct 0x0 SErr 0x4090000 action 0xe frozen
Jan  3 10:29:58 NAS01 kernel: ata2.00: irq_stat 0x00400040, connection status changed
Jan  3 10:29:58 NAS01 kernel: ata2: SError: { PHYRdyChg 10B8B DevExch }
Jan  3 10:29:58 NAS01 kernel: ata2.00: failed command: FLUSH CACHE EXT
Jan  3 10:29:58 NAS01 kernel: ata2.00: cmd ea/00:00:00:00:00/00:00:00:00:00/a0 tag 21
Jan  3 10:29:58 NAS01 kernel:         res 40/00:00:40:9e:eb/00:00:02:03:00/40 Emask 0x10 (ATA bus error)
Jan  3 10:29:58 NAS01 kernel: ata2.00: status: { DRDY }
Jan  3 10:29:58 NAS01 kernel: ata2: hard resetting link
Jan  3 10:30:08 NAS01 kernel: ata2: softreset failed (device not ready)
Jan  3 10:30:08 NAS01 kernel: ata2: hard resetting link
Jan  3 10:30:14 NAS01 kernel: ata2: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  3 10:30:14 NAS01 kernel: ata2.00: configured for UDMA/33
Jan  3 10:30:14 NAS01 kernel: ata2.00: retrying FLUSH 0xea Emask 0x10

 

They're in an array with two WD Red 4TB drives, which are fine and don't have this issue. It has happened ever since the 12TB drives were introduced to the HP Microserver Gen 8 that I use them in. If I disable the write cache on the two drives, all errors disappear. When I change slots the issue follows the drives, so I don't believe this is cable related. Perhaps it is an incompatibility between the drives and the Microserver controller, or possibly a PSU issue?

 

For now, although it's crude, I'm running a user script each hour to disable the write cache on the two drives, identified by serial number, i.e.:

 

hdparm -W 0 /dev/disk/by-id/ata-HGST_HUH721212ALE604_5PGT4X7D
hdparm -W 0 /dev/disk/by-id/ata-HGST_HUH721212ALE604_5PGTEZTD

 

That seems to be doing the trick to avoid errors. The hourly run is needed because the write cache does seem to turn back on randomly. Any thoughts would be appreciated. An extended SMART test always comes back fine, and neither drive has any SMART errors at all.
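
A slightly less crude variant of the hourly script could check the current state first and log whenever the cache is found switched back on, which might help line up the re-enabling with the link resets in the syslog. This is only a sketch: the `write_cache_state` helper and the `wcache` log tag are made-up names, and it assumes `hdparm -W <device>` prints its usual ` write-caching =  1 (on)` line.

```shell
#!/bin/sh
# Sketch: disable the drive write cache only when hdparm reports it as on,
# and log each time it had to be switched off again.

# Hypothetical helper: reads `hdparm -W <dev>` output on stdin and prints
# "on", "off", or "unknown". Assumes a line like " write-caching =  1 (on)".
write_cache_state() {
    awk '/write-caching/ { if ($3 == 1) s = "on"; else if ($3 == 0) s = "off" }
         END { print (s == "" ? "unknown" : s) }'
}

for dev in \
    /dev/disk/by-id/ata-HGST_HUH721212ALE604_5PGT4X7D \
    /dev/disk/by-id/ata-HGST_HUH721212ALE604_5PGTEZTD
do
    [ -e "$dev" ] || continue    # skip a drive that is not present

    state=$(hdparm -W "$dev" 2>/dev/null | write_cache_state)
    if [ "$state" != "off" ]; then
        # "wcache" is an arbitrary syslog tag chosen for this sketch
        logger -t wcache "write cache is $state on $dev, disabling"
        hdparm -W 0 "$dev"
    fi
done
```

Searching the syslog for the `wcache` tag afterwards would show how often, and roughly when, the drives went back to their default setting.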
