[SOLVED] Bad drive or bad hardware?

May 8, 2016

I am new to unRAID, and have built an array (v6) with a bunch of mostly new drives. One of them seems to be having problems (a 4TB WD Red, serial ending in VAX).

Background: I was in a hurry to try unRAID out, so I only did one pass of pre-clear on each drive, and this ran with no problems for all drives. Everything was good until I tried to run a Time Machine backup. I set up a Time Machine AFP share using only one drive (advice I'd seen on the forums), and this was on the 4TB WD Red. During the backup I got a lot of disk errors (I think they were read errors), and the backup failed. I left it at that (I don't have much spare time to spend on this, and the TM backup wasn't urgent). A couple of days ago I swapped the SATA cables with another drive to determine whether it was a bad cable or port (I don't have spare cables, so swapping seemed like a logical approach). That day I got a write error message for the drive and it was kicked from the array. I sat down to work out what the problem was with the drive, to determine whether to RMA it.

I ran a short and a long SMART test, and both came back good. No bad or reallocated sectors, and most things look OK to my uneducated eye. So, based on other forum posts, I decided to try to rebuild the drive (I figured it would either work, or prove the drive was bad). I went through the process of de-assigning and re-assigning the drive, and started the rebuild. It ran for a few hours, then hit a whole load of write errors. This is where I am now. I downloaded the diagnostics and took a look through them.

I want to RMA the drive, but the thing that's bugging me is that the SMART reports only list read errors and don't mention any write errors. Could it be that the drive is good and the write errors are from a bad cable or SATA port? I just don't know how to tell. I don't want to send the drive back to Amazon and have them decide it's OK.

Can anyone offer advice on what to do to prove or disprove this is a bad drive? Are there any signs I'm missing in the SMART reports?

tower-diagnostics-20160507-0135.zip


May 8, 2016

Community Expert

Well, something is going on with disk1. (I admit that I am no syslog guru but have just enough knowledge to make me dangerous...) I assume that is the HD that ends in "VAX". There is definitely something going on with the "VAX" disk in the SMART report. That disk has had 153 errors in its short life so far. It could be bad, but let's look at a few other things first.
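As a rough illustration of how that error count can be read out of the SMART report, here is a minimal Python sketch. The header lines in `SAMPLE` are copied from the error log posted later in this thread; the function name `summarize_error_log` is hypothetical, and the sketch assumes the header wording that smartctl prints for ATA drives.

```python
import re

# Header line that starts each entry in smartctl's ATA error log.
HEADER = re.compile(r"Error (\d+) occurred at disk power-on lifetime: (\d+) hours")


def summarize_error_log(log_text):
    """Summarize the error entries found in smartctl error-log text."""
    entries = [(int(num), int(hours)) for num, hours in HEADER.findall(log_text)]
    if not entries:
        return None
    hours = [h for _, h in entries]
    return {
        "entries_shown": len(entries),
        # The highest entry number is the running total of logged errors.
        "highest_error_number": max(n for n, _ in entries),
        "first_hour": min(hours),
        "last_hour": max(hours),
    }


SAMPLE = """\
Error 153 occurred at disk power-on lifetime: 287 hours (11 days + 23 hours)
Error 152 occurred at disk power-on lifetime: 274 hours (11 days + 10 hours)
Error 151 occurred at disk power-on lifetime: 273 hours (11 days + 9 hours)
Error 150 occurred at disk power-on lifetime: 273 hours (11 days + 9 hours)
Error 149 occurred at disk power-on lifetime: 272 hours (11 days + 8 hours)
"""

print(summarize_error_log(SAMPLE))
```

As far as I know, the report only shows the most recent entries, which is why five entries appear against a total of 153. The five shown here all fall within 15 power-on hours of each other.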

List your hardware. Be specific and include MB, Amount of RAM, PSU and any cards you have plugged in.

Have you switched the SATA cable to another SATA port on your motherboard? (I know you said you 'swapped' the cable but was it only one end or both ends?)

Have you double checked that all of the SATA cables and SATA power connectors are on solidly?

Are you using any of the locking-type SATA cables? (They are incompatible with most recent WD drives!)

Have you set the MB SATA ports to AHCI mode rather than legacy mode? (This change has to be made in the BIOS.) I don't think this is the cause of this problem, but you will get better performance in AHCI mode.


May 8, 2016

Community Expert

Although SMART attributes look perfect, there are some warnings that the disk is not very healthy:

Error 153 occurred at disk power-on lifetime: 287 hours (11 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 18 a0 00 00 e0  Error: UNC 24 sectors at LBA = 0x000000a0 = 160

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 18 a0 00 00 e0 08      00:02:03.367  READ DMA
  ef 10 02 00 00 00 a0 08      00:02:03.365  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      00:02:03.364  IDENTIFY DEVICE

Error 152 occurred at disk power-on lifetime: 274 hours (11 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 50 41 e0  Error: UNC 8 sectors at LBA = 0x004150c0 = 4280512

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 50 41 e0 08      00:02:04.542  READ DMA

Error 151 occurred at disk power-on lifetime: 273 hours (11 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 20 a0 00 00 e0  Error: UNC 32 sectors at LBA = 0x000000a0 = 160

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 20 a0 00 00 e0 08      00:07:22.566  READ DMA

Error 150 occurred at disk power-on lifetime: 273 hours (11 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 f8 4f 44 e0  Error: UNC 8 sectors at LBA = 0x00444ff8 = 4476920

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 f8 4f 44 e0 08      00:05:24.959  READ DMA
  c8 00 18 88 3a 0b e0 08      00:05:24.958  READ DMA
  c8 00 78 c0 97 00 e0 08      00:05:20.831  READ DMA
  c8 00 f0 c8 96 00 e0 08      00:05:20.830  READ DMA

Error 149 occurred at disk power-on lifetime: 272 hours (11 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 58 59 41 e0  Error: UNC 8 sectors at LBA = 0x00415958 = 4282712

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 58 59 41 e0 08      00:04:25.781  READ DMA
  ca 00 08 50 5a 41 e0 08      00:04:25.781  WRITE DMA
  ef 10 02 00 00 00 a0 08      00:04:25.781  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      00:04:25.780  IDENTIFY DEVICE

These suggest that there were some bad sector issues. It looks like the internal firmware dealt with them, since there are no pending sectors and the disk passed the extended SMART test (there were more errors after that test, so it may fail a new one). Still, I would not trust this disk.
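To make the register lines in that log easier to read, here is a minimal Python sketch that decodes one `ER ST SC SN CL CH DH` line. It assumes 28-bit LBA commands such as READ DMA (`c8`), where the low nibble of DH holds the top four LBA bits, and it uses the error-bit labels as smartctl prints them. The function name `decode_error_registers` is hypothetical.

```python
# ATA error register bits for DMA commands (labels as printed by smartctl).
ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error: data corrupted on the link
    0x40: "UNC",   # uncorrectable data: the drive could not read the sector
    0x10: "IDNF",  # requested sector ID not found
    0x04: "ABRT",  # command aborted
}


def decode_error_registers(line):
    """Decode an 'ER ST SC SN CL CH DH' line, assuming 28-bit LBA addressing."""
    # Only the first seven tokens are register bytes; any trailing text is ignored.
    er, _st, sc, sn, cl, ch, dh = (int(tok, 16) for tok in line.split()[:7])
    flags = [name for bit, name in ERROR_BITS.items() if er & bit]
    # LBA bits 24-27 come from the low nibble of DH, then CH, CL and SN.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {"flags": flags, "sectors": sc, "lba": lba}


print(decode_error_registers("40 51 18 a0 00 00 e0"))
print(decode_error_registers("40 51 08 c0 50 41 e0"))
```

Every entry in the log above decodes to UNC, meaning the drive itself reported that it could not read the data. A cabling or port problem would more typically show up as ICRC errors, although that is a rule of thumb rather than proof either way.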


May 10, 2016

Author

Thanks for your prompt reply, Frank1940. I've added my setup details to my profile. I've also got an IO Crest 4 Port SATA III PCI-e 2.0 card and an Nvidia Quadro 4000.

I originally used the 5 motherboard SATA ports for the drives (the IO Crest card was for expansion). When I started getting the read errors I wanted to isolate the problem, but I didn't have any spare cables so I shut everything down, pulled the SATA cable from the back of the bad drive and from the back of a behaving drive and swapped them around. My thinking was that if it was the cable or the motherboard port then the bad drive would start behaving, and the other drive would start producing errors. If it was a bad drive then it would carry on erroring, which it did.

I'm not using locking SATA cables. I also went through all the drives and made sure the SATA cables and the power cables were seated well.

The motherboard is set to use AHCI mode.

Thanks for the feedback johnnie.black. I agree these don't look good. Is it at all possible they could be caused by bad mobo/ports/cable?

Annoyingly I'm out of the Amazon return period, so I think I'm going to have to go back to WD for the RMA. Any advice before I do?


May 10, 2016

Community Expert

Annoyingly I'm out of the Amazon return period, so I think I'm going to have to go back to WD for the RMA. Any advice before I do?

I don't think you have anything to worry about. Whenever I have RMA'd a drive under warranty, they have always shipped the replacement within 24 hours of receiving it, so I am sure that they don't check the drives until later. However, I have the feeling that they maintain a customer database, so that if you send in a lot of HDs just before the end of the warranty period, you will be flagged for 'special treatment'. But, in general, they assume that most people are honest. By now, you realize that it is a hassle to send a drive in, and you are out of business until you get a replacement, so it is not something many people would do on a whim. Plus, the manufacturers want good PR for easy, quick warranty service. It is just good for business!


May 10, 2016

Author

That makes me feel much better, thanks Frank!
