Old disk "redballed" while pre-clearing new disk



8 hours ago, maddog808 said:

 

Thanks for the suggestion Frank. Would you be able to expand on how I should go about having a good look at it? It's a Corsair CX500 if that helps.

 

Bad choice of words on my part.  I meant the specs to begin with.  So I looked them up.  It does have a single 12V rail with a 38 ampere rating on that rail which is adequate for the ~8 drives you have in the server. 
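As a rough sanity check on that rating, the arithmetic can be sketched as below. The 12 V, 38 A and 8-drive figures come from the post; the 2 A spin-up draw per drive is only an assumption for illustration (check the drive's datasheet), and the sketch ignores whatever the CPU and motherboard also pull from the 12 V rail.

```python
# Rough 12 V rail budget for the server described above.
RAIL_VOLTS = 12.0
RAIL_AMPS = 38.0             # single-rail rating quoted in the post
DRIVE_COUNT = 8              # "~8 drives" from the post
SPINUP_AMPS_PER_DRIVE = 2.0  # ASSUMPTION: illustrative spin-up draw, not from a datasheet


def rail_capacity_watts(volts: float, amps: float) -> float:
    """Power the rail can deliver at its rated current."""
    return volts * amps


def spinup_draw_amps(drive_count: int, amps_per_drive: float) -> float:
    """Worst case: every drive spins up at the same moment."""
    return drive_count * amps_per_drive


def headroom_amps(rail_amps: float, drive_count: int, amps_per_drive: float) -> float:
    """Current left on the rail after the drives' spin-up draw."""
    return rail_amps - spinup_draw_amps(drive_count, amps_per_drive)


if __name__ == "__main__":
    print("Rail capacity (W):", rail_capacity_watts(RAIL_VOLTS, RAIL_AMPS))
    print("Drive spin-up draw (A):", spinup_draw_amps(DRIVE_COUNT, SPINUP_AMPS_PER_DRIVE))
    print("Headroom (A):", headroom_amps(RAIL_AMPS, DRIVE_COUNT, SPINUP_AMPS_PER_DRIVE))
```

Under those assumptions the drives account for well under half of the rail's rated current, which is consistent with calling the supply adequate on paper. It says nothing about whether this particular unit is healthy.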

 

But I will tell you that I had a Corsair CX450 fail in less than a month last year.  A sample size of one is not really enough evidence to say that Corsair makes bad power supplies, but power supplies do go bad, and when they do, they can sometimes trigger multiple failures that make it difficult to figure out what is going on.  The best way to test is to replace it and see what happens.  It is a fairly cheap component, so it is not too painful to do. (Plus, there is often one in a "junk" box just waiting for such an opportunity.) 

 

7 hours ago, maddog808 said:

Is there any reason at this point to not RMA this 3 week old drive?

 

If you haven't been sending a lot of disks for warranty service, you won't get any static on an RMA request.  In fact, I suspect that they don't even look at the disk when it arrives, except to verify the serial number before they ship out the replacement.  The reason for this theory is that any time I have sent in a disk, the replacement has shipped out the same day. 

Edited by Frank1940
11 hours ago, jonathanm said:

Yes, if you can return it for a new replacement. Much better to get a brand new drive than an RMA refurb.

 

Good point. I was planning on returning to Newegg for a brand new one. So am I correct that the disk is bad, based on the errors in the report?

 

On 1/4/2018 at 7:22 PM, pwm said:

The syslog doesn't tell that there is anything wrong with the drive - only that there is a communications failure with the drive.

 

How have you eliminated the possibility of a bad cable or a bad disk controller port?

 

That's a great point, pwm. So I just tried to troubleshoot, using the following steps:

 

  1. Swapped the cable. After that I was at least able to get through the SMART test, so I tried to run a preclear on the drive without the pre- or post-read. The zeroing was so slow, at about 150 KB/s, that the estimated time to finish was around 989 days.
  2. Swapped ports on the PCIe SATA card, still using the new cable. Again, I was able to get through the SMART test, so I tried preclear again. Same result: super slow zeroing speed. Then I tried just adding the drive to the array and letting Unraid clear the disk. Same result: very slow speed, with an estimated 800 days to finish.
  3. Tested the read speeds of all disks in the server using "hdparm -tT /dev/sdx". All speeds were about what I expected, but the weird thing is that this drive, sdh, showed a decent speed for a WD Red at 158 MB/sec.
  4. Tried preclear again, this time with the pre- and post-read. The pre-read verification failed instantly this time.
  5. Tried "hdparm -tT /dev/sdh" one more time out of curiosity. Here's the output of that test:
     "/dev/sdh:
     read() hit EOF - device too small
     Timing buffered disk reads: read() hit EOF - device too small"
  6. I've also ordered another 4 port PCIe SATA card from Amazon. It will be here today, and I'll test it with a few drives (including this one I'm having trouble with).
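
To put the zeroing speeds from steps 1 to 3 in perspective, here is a minimal sketch of the time-to-finish arithmetic. The drive's capacity is not stated in this thread, so the 4 TB figure below is purely hypothetical, and "150 KB/s" is read as 150,000 bytes per second.

```python
SECONDS_PER_DAY = 86_400


def seconds_to_zero(capacity_bytes: int, bytes_per_second: float) -> float:
    """Time needed to write every byte of the disk once at a constant speed."""
    if bytes_per_second <= 0:
        raise ValueError("speed must be positive")
    return capacity_bytes / bytes_per_second


def as_days(seconds: float) -> float:
    """Convert seconds to days."""
    return seconds / SECONDS_PER_DAY


CAPACITY_BYTES = 4 * 10**12  # HYPOTHETICAL 4 TB drive; the real size is not given in the thread
SLOW_SPEED = 150 * 10**3     # ~150 KB/s, the zeroing speed reported in step 1
READ_SPEED = 158 * 10**6     # 158 MB/s, the hdparm read speed reported in step 3

if __name__ == "__main__":
    slow = seconds_to_zero(CAPACITY_BYTES, SLOW_SPEED)
    fast = seconds_to_zero(CAPACITY_BYTES, READ_SPEED)
    print(f"At the slow zeroing speed: {as_days(slow):.1f} days")
    print(f"At the hdparm read speed:  {fast / 3600:.1f} hours")
```

Whatever the real capacity is, the two speeds differ by a factor of roughly a thousand, which is why a pass that should take hours gets estimated in hundreds of days.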

 

So now I'm really stumped. I realize the disk looks fine in diagnostics and the SMART reports, with no reallocated sectors, no current pending sectors, no offline uncorrectable, etc. But based on everything I tried above, isn't it time to return this disk to Newegg, and get a brand new one?

 

I've also attached a fresh SMART report for the disk, which failed. And another diag in case anyone wants to check it out for me.

 

Thanks again to everyone who has helped with my ongoing server issues in this thread.  :D

unraid-diagnostics-20180106-0911.zip

unraid-smart-20180106-0908.zip

Edited by maddog808

Several ATA errors until it finally got disabled:

 

Quote

Jan  6 08:51:21 Unraid kernel: ata7: hard resetting link
Jan  6 08:51:22 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  6 08:51:27 Unraid kernel: ata7.00: qc timeout (cmd 0xec)
Jan  6 08:51:27 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan  6 08:51:27 Unraid kernel: ata7.00: revalidation failed (errno=-5)
Jan  6 08:51:27 Unraid kernel: ata7: hard resetting link
Jan  6 08:51:27 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  6 08:51:37 Unraid kernel: ata7.00: qc timeout (cmd 0xec)
Jan  6 08:51:37 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan  6 08:51:37 Unraid kernel: ata7.00: revalidation failed (errno=-5)
Jan  6 08:51:37 Unraid kernel: ata7: hard resetting link
Jan  6 08:51:38 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan  6 08:52:08 Unraid kernel: ata7.00: qc timeout (cmd 0xec)
Jan  6 08:52:08 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan  6 08:52:08 Unraid kernel: ata7.00: revalidation failed (errno=-5)
Jan  6 08:52:08 Unraid kernel: ata7.00: disabled
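
For anyone who wants to tally these events rather than read the log by eye, here is a minimal sketch that counts the kinds of ATA messages quoted above per port. The list of message fragments is taken only from this excerpt and is not a complete catalogue of kernel ATA errors; the function name is made up for illustration.

```python
import re
from collections import Counter

# Matches e.g. "kernel: ata7: ..." or "kernel: ata7.00: ..." and captures port and message.
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)$")

# Message fragments seen in the excerpt above (ASSUMPTION: not an exhaustive list).
EVENT_HINTS = (
    "hard resetting link",
    "qc timeout",
    "failed to IDENTIFY",
    "revalidation failed",
    "disabled",
)


def summarize_ata_events(lines):
    """Count (port, event) pairs for syslog lines that contain a known fragment."""
    counts = Counter()
    for line in lines:
        match = ATA_LINE.search(line)
        if not match:
            continue
        port, message = match.groups()
        for hint in EVENT_HINTS:
            if hint in message:
                counts[(port, hint)] += 1
                break
    return counts


if __name__ == "__main__":
    sample = [
        "Jan  6 08:51:21 Unraid kernel: ata7: hard resetting link",
        "Jan  6 08:51:22 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)",
        "Jan  6 08:51:27 Unraid kernel: ata7.00: qc timeout (cmd 0xec)",
        "Jan  6 08:52:08 Unraid kernel: ata7.00: disabled",
    ]
    for (port, event), n in sorted(summarize_ata_events(sample).items()):
        print(port, event, n)
```

Run against the full syslog from the diagnostics zip, a tally like this makes it easier to see whether the errors stay with one port (pointing at the cable or controller) or follow the drive when it is moved.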

 

It's connected to a Marvell controller; these are known to be flaky sometimes. I suggest you try again using one of the onboard ports.

 

P.S. There are also some ATA errors on the parity disk, possibly from a bad cable.
