May 8, 2016

I am new to unRAID and have built an array (v6) with a bunch of mostly new drives. One of them seems to be having problems (a 4TB WD Red, serial ending in VAX).

Background: I was in a hurry to try unRAID out, so I only did one pass of pre-clear on each drive, and this ran with no problems on all drives.

Everything was good until I tried to run a Time Machine backup. I set up a Time Machine AFP share using only one drive (advice I'd seen on the forums), and this was on the 4TB WD Red. During the backup I got a lot of disk errors (I think they were read errors), and the backup failed. I left it at that (I don't have much spare time to spend on this, and the TM backup wasn't urgent).

A couple of days ago I swapped the SATA cables with another drive to determine whether it was a bad cable or port (I don't have spare cables, so swapping seemed like a logical approach). That day I got a write error message for the drive and it was kicked from the array.

I sat down to work out what the problem was with the drive, to determine whether to RMA it. I ran a short and a long SMART test, and each time they came back good. No bad or reallocated sectors, and most things look OK to my uneducated eye.

So, based on other forum posts, I decided to try to rebuild the drive (I figured it would either work or prove the drive was bad). I went through the process of de-assigning and re-assigning the drive, and started the rebuild. It ran for a few hours, then hit a whole load of write errors.

This is where I am now. I downloaded the diagnostics and took a look through them. I want to RMA the drive, but the thing that's bugging me is that the SMART reports only list read errors and don't mention any write errors. Could it be that the drive is good and the write errors are from a bad cable or SATA port? I just don't know how to tell. I don't want to send the drive back to Amazon and have them deem it OK.
Can anyone offer advice on what to do to prove or disprove this is a bad drive? Are there any signs I'm missing in the SMART reports?

tower-diagnostics-20160507-0135.zip
May 8, 2016 Community Expert

Well, something is going on with disk1. (I admit that I am no syslog guru, but I have just enough knowledge to make me dangerous...) I assume that is the HD that ends in "VAX". There is definitely something going on with the "VAX" disk in the SMART report: that disk has logged 153 errors in its short life so far. It could be bad, but let's look at a few other things first.

List your hardware. Be specific and include the MB, amount of RAM, PSU, and any cards you have plugged in.

Have you switched the SATA cable to another SATA port on your motherboard? (I know you said you 'swapped' the cable, but was it only one end or both ends?)

Have you double-checked that all of the SATA cables and SATA power connectors are seated solidly?

Are you using any of the locking-type SATA cables? (They are incompatible with most recent WD drives!)

Have you set the MB SATA ports to AHCI mode rather than legacy mode? (This change has to be made in the BIOS.) I don't think this is the cause of this problem, but you will get better performance in AHCI mode.
May 8, 2016 Community Expert

Although the SMART attributes look perfect, there are some warnings that the disk is not very healthy:

Error 153 occurred at disk power-on lifetime: 287 hours (11 days + 23 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 18 a0 00 00 e0  Error: UNC 24 sectors at LBA = 0x000000a0 = 160

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 18 a0 00 00 e0 08      00:02:03.367  READ DMA
  ef 10 02 00 00 00 a0 08      00:02:03.365  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      00:02:03.364  IDENTIFY DEVICE

Error 152 occurred at disk power-on lifetime: 274 hours (11 days + 10 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 c0 50 41 e0  Error: UNC 8 sectors at LBA = 0x004150c0 = 4280512

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 c0 50 41 e0 08      00:02:04.542  READ DMA

Error 151 occurred at disk power-on lifetime: 273 hours (11 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 20 a0 00 00 e0  Error: UNC 32 sectors at LBA = 0x000000a0 = 160

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 20 a0 00 00 e0 08      00:07:22.566  READ DMA

Error 150 occurred at disk power-on lifetime: 273 hours (11 days + 9 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 f8 4f 44 e0  Error: UNC 8 sectors at LBA = 0x00444ff8 = 4476920

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 f8 4f 44 e0 08      00:05:24.959  READ DMA
  c8 00 18 88 3a 0b e0 08      00:05:24.958  READ DMA
  c8 00 78 c0 97 00 e0 08      00:05:20.831  READ DMA
  c8 00 f0 c8 96 00 e0 08      00:05:20.830  READ DMA

Error 149 occurred at disk power-on lifetime: 272 hours (11 days + 8 hours)
  When the command that caused the error occurred, the device was active or idle.

  After command completion occurred, registers were:
  ER ST SC SN CL CH DH
  -- -- -- -- -- -- --
  40 51 08 58 59 41 e0  Error: UNC 8 sectors at LBA = 0x00415958 = 4282712

  Commands leading to the command that caused the error were:
  CR FR SC SN CL CH DH DC   Powered_Up_Time  Command/Feature_Name
  -- -- -- -- -- -- -- --  ----------------  --------------------
  c8 00 08 58 59 41 e0 08      00:04:25.781  READ DMA
  ca 00 08 50 5a 41 e0 08      00:04:25.781  WRITE DMA
  ef 10 02 00 00 00 a0 08      00:04:25.781  SET FEATURES [Enable SATA feature]
  ec 00 00 00 00 00 a0 08      00:04:25.780  IDENTIFY DEVICE

These suggest that there were some bad sector issues. It looks like the internal firmware dealt with them, as there are no pending sectors and the disk passed the extended SMART test (there were some more errors after that test, so it may fail a new one). Still, I would not trust this disk.
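For anyone who wants to read these log entries themselves, here is a minimal Python sketch that decodes one "registers were" row. It assumes the usual ATA register layout for 28-bit DMA read/write commands (error bit 0x40 = UNC, 0x80 = ICRC; LBA spread across SN, CL, CH and the low nibble of DH). The names `decode_registers`, `ERROR_BITS` and `STATUS_BITS` are made up for illustration and are not part of smartctl.

```python
# Sketch only: decode the "ER ST SC SN CL CH DH" row of a smartctl ATA error
# log entry. Bit meanings are assumed for 28-bit DMA read/write commands.

ERROR_BITS = {
    0x80: "ICRC",  # interface CRC error (data corrupted on the link)
    0x40: "UNC",   # uncorrectable data (the drive could not read the sector)
    0x10: "IDNF",  # requested address not found
    0x04: "ABRT",  # command aborted
}

STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",   # seek complete; obsolete in newer ATA revisions
    0x08: "DRQ",
    0x01: "ERR",
}


def decode_registers(row):
    """Decode a row of seven hex bytes: ER ST SC SN CL CH DH."""
    er, st, sc, sn, cl, ch, dh = (int(b, 16) for b in row.split())
    # 28-bit LBA: SN is bits 0-7, CL bits 8-15, CH bits 16-23,
    # and the low nibble of DH is bits 24-27.
    lba = ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn
    return {
        "error": [name for bit, name in ERROR_BITS.items() if er & bit],
        "status": [name for bit, name in STATUS_BITS.items() if st & bit],
        "sectors": sc,  # count register as reported in the log
        "lba": lba,
    }


# The five register rows from the log entries above, keyed by error number.
rows = {
    153: "40 51 18 a0 00 00 e0",
    152: "40 51 08 c0 50 41 e0",
    151: "40 51 20 a0 00 00 e0",
    150: "40 51 08 f8 4f 44 e0",
    149: "40 51 08 58 59 41 e0",
}

for number, row in rows.items():
    info = decode_registers(row)
    print(number, info["error"], info["sectors"], "sectors at LBA", info["lba"])
```

All five rows decode to UNC, which the drive reports when it cannot read data back from the media, and two of them (153 and 151) hit the same LBA of 160. A cabling or port problem would more typically show up as ICRC errors and a rising UDMA_CRC_Error_Count attribute, so these entries point at the disk rather than the link, although that is a rule of thumb and not proof.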
May 10, 2016 Author

Thanks for your prompt reply, Frank1940. I've added my setup details to my profile. I've also got an IO Crest 4 Port SATA III PCI-e 2.0 card and an Nvidia Quadro 4000.

I originally used the 5 motherboard SATA ports for the drives (the IO Crest card was for expansion). When I started getting the read errors I wanted to isolate the problem, but I didn't have any spare cables, so I shut everything down, pulled the SATA cables from the back of the bad drive and from the back of a behaving drive, and swapped them around. My thinking was that if it was the cable or the motherboard port, then the bad drive would start behaving and the other drive would start producing errors. If it was a bad drive, then it would carry on erroring, which it did.

I'm not using locking SATA cables. I did also go through all the drives and make sure the SATA cables and the power cables were all seated well. The motherboard is set to use AHCI mode.

Thanks for the feedback, johnnie.black. I agree these don't look good. Is it at all possible they could be caused by a bad mobo, port, or cable?

Annoyingly, I'm out of the Amazon return period, so I think I'm going to have to go back to WD for the RMA. Any advice before I do?
May 10, 2016 Community Expert

"Annoyingly I'm out of the Amazon return period, so I think I'm going to have to go back to WD for the RMA. Any advice before I do?"

I don't think you have anything to worry about. Whenever I have RMA'd a drive under warranty, they have always shipped the replacement drive within 24 hours of receiving it, so I am sure that they never check them until later. However, I have the feeling that they maintain a customer database, so that if you send in a lot of HDs just before the end of the warranty period, you will be flagged for 'special treatment'. But, in general, they assume that most people are honest.

By now, you do realize that it is a hassle to send a drive in, and you are out of business until you get a replacement. So it is not something many people would do on a whim. Plus, the manufacturers want good PR for the ease and speed of their warranty service. It is just good for business!