January 25, 2019

Greetings,

I wonder if anyone has seen anything like this; I would welcome suggestions on how to proceed. I can upload diagnostics, but as the problem is very intermittent they likely won't contain logs related to it.

The issue: starting 4 months ago, and 3 times since then, I get the error "Array has 1 disk with read errors" and the disk becomes disabled. It's always the same disk, which is an identical model to another in the array (I have two 4TB IronWolf disks, produced and purchased about 4 months apart). The disk in question is just under a year old.

Each time this has occurred I have tried to run smartctl from the command line. The disk doesn't respond, so smartctl just prints its startup message and exits. It works fine on the other 'identical' disk. I then reboot, which leads to the server getting stuck at 'Detecting hard drives' during initial startup. Interestingly, the drive cage also shows a red diagnostic light for the affected slot.

The first two times, I did this:

1. Turn off the server, remove the drive, and switch both the cable and the backplane slot the drive seats into, so that the cable, the SATA port and the backplane slot are all ones that had been working fine with other drives.
2. Turn on the server. The drive is recognized fine.
3. Run extended SMART tests. No errors found.

So I then just reassign the disk and it happily rebuilds from parity. The SMART report on the drive doesn't show any errors. A month or so passes and then the same thing occurs: typically 2 read errors and the drive is disabled.

This recurred two days ago, so this time I pulled the drive and ran SeaTools on Win 10 twice, doing the generic long test, which in theory reads the entire disk to check for errors. I can't get an error. I'd like to RMA the drive, but it seems that if I cannot prove an error the RMA probably won't be successful.
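For what it's worth, one way to look for evidence would be to scan a saved copy of the syslog for ATA link or command errors around the time the disk drops out. Below is a minimal Python sketch, not a tested diagnostic tool. It assumes the kernel writes libata-style lines that start with a port prefix such as "ata3.00:" or "ata3:"; the function name find_ata_errors, the keyword list and the script name are illustrative choices, not anything standard.

```python
import re
import sys

# Assumption: libata-style kernel log lines carry a prefix like "ata3.00: " or "ata3: ".
ATA_LINE = re.compile(r"\bata(\d+)(?:\.\d+)?: (.*)")

# Illustrative (not exhaustive) phrases that usually accompany link or command trouble.
KEYWORDS = (
    "exception",
    "failed command",
    "hard resetting link",
    "link is slow to respond",
    "SRST failed",
)


def find_ata_errors(log_lines, port=None):
    """Return (port, message) pairs for lines that look like ATA errors.

    If `port` is given, only messages for that ataN port are returned.
    """
    hits = []
    for line in log_lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        found_port = int(match.group(1))
        message = match.group(2).strip()
        if port is not None and found_port != port:
            continue
        if any(keyword in message for keyword in KEYWORDS):
            hits.append((found_port, message))
    return hits


if __name__ == "__main__":
    # Reads log text from standard input and prints the suspicious lines.
    for found_port, message in find_ata_errors(sys.stdin):
        print(f"ata{found_port}: {message}")
```

Saved as, say, ata_errors.py (a hypothetical name), it could be run on any machine with Python 3 as `python3 ata_errors.py < syslog.txt`, where syslog.txt is a copy of the log taken before rebooting. An empty result would not prove the drive is healthy, only that none of the listed phrases were logged.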
At this time the only thing I can think of is: could this be a spindown timing/reporting issue? I can envision a situation where the disk is commanded to spin up but doesn't respond in a timely manner. That would be strange, though, since I'd expect it to be a firmware bug, and both of the 'identical' drives have the same firmware and appear identical in hdparm and SeaTools disk info.

I can also report that temperatures have not been unusual at any time since this issue began. In the summer the disks might have seen some heat in the 50 °C+ range, but around the failures I haven't seen a disk above 30 °C. Also, the disk that isn't failing always sits above the one that is, and thus experiences a little more heat than the failing one.

Initially I suspected subtle vibration was doing something to the cable/backplane, but three times on the same disk, in different slots with different cables? Seems doubtful.

Any ideas or suggestions are welcome.

Yrs,
Del