Old disk "redballed" while pre-clearing new disk

maddog808 · January 4, 2018

FWIW, I just tried to add the disk to the array. Unraid started to clear the disk, but the process was immediately cancelled with read and write errors. Attached is a syslog showing the errors.

Is there any reason at this point to not RMA this 3 week old drive?

unraid-syslog-20180103-2204.zip

Frank1940 · January 4, 2018

8 hours ago, maddog808 said:

Thanks for the suggestion Frank. Would you be able to expand on how I should go about having a good look at it? It's a Corsair CX500 if that helps.

Bad choice of words on my part. I meant the specs to begin with, so I looked them up. It has a single 12V rail rated at 38 amperes, which is adequate for the ~8 drives you have in the server.
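For anyone who wants to redo that rail arithmetic for their own server, here is a rough sketch. The 38 A rating and the drive count come from this thread; the 2 A per-drive spin-up figure is only a ballpark assumption for 3.5" drives, so check the label or datasheet of your actual disks.

```python
# Rough 12 V budget check for a single-rail power supply.
RAIL_AMPS = 38.0               # 12 V rail rating quoted in this thread for the CX500
DRIVE_COUNT = 8                # approximate drive count in the server
SPINUP_AMPS_PER_DRIVE = 2.0    # ASSUMPTION: ballpark 12 V spin-up draw per 3.5" drive


def rail_headroom(rail_amps, drives, amps_per_drive):
    """Return (amps drawn by all drives at spin-up, amps left on the rail)."""
    drawn = drives * amps_per_drive
    return drawn, rail_amps - drawn


drawn, left = rail_headroom(RAIL_AMPS, DRIVE_COUNT, SPINUP_AMPS_PER_DRIVE)
print(f"Drives at spin-up: {drawn:.0f} A, left for CPU/board/cards: {left:.0f} A")
```

The headroom figure still has to cover the CPU, motherboard, and add-in cards, so treat it as a sanity check rather than a verdict on the power supply.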

But I will tell you that I had a Corsair CX450 fail in less than a month last year. A sample size of one is not really enough evidence to say that Corsair makes bad power supplies, but power supplies do go bad, and when they do they can sometimes trigger multiple failures that make it difficult to figure out what is going on. The best way to test is to replace it and see what happens. It is a fairly cheap component, so it is not too painful to do. (Plus, there is often one in a "junk" box just waiting for such an opportunity.)

7 hours ago, maddog808 said:

Is there any reason at this point to not RMA this 3 week old drive?

If you haven't been sending a lot of disks for warranty service, you won't get any static on an RMA request. In fact, I suspect that they don't even look at the disk when it arrives except to verify the serial number before they ship out the replacement. The reason for this theory is that any time I have sent in a disk, the replacement has shipped out the same day.

Edited January 4, 2018 by Frank1940

JonathanM · January 4, 2018

9 hours ago, maddog808 said:

Is there any reason at this point to not RMA this 3 week old drive?

Yes, if you can return it for a new replacement. Much better to get a brand new drive than an RMA refurb.

maddog808 · January 5, 2018

11 hours ago, jonathanm said:

Yes, if you can return it for a new replacement. Much better to get a brand new drive than an RMA refurb.

Good point. I was planning on returning to Newegg for a brand new one. So am I correct that the disk is bad, based on the errors in the report?

pwm · January 5, 2018

The syslog doesn't show that there is anything wrong with the drive, only that there is a communications failure with the drive.

How have you eliminated the possibility of a bad cable or a bad disk controller port?

maddog808 · January 6, 2018

On 1/4/2018 at 7:22 PM, pwm said:

The syslog doesn't show that there is anything wrong with the drive, only that there is a communications failure with the drive.

How have you eliminated the possibility of a bad cable or a bad disk controller port?

That's a great point, pwm. So I just tried to troubleshoot, using the following steps:

1. Swapped cable. I was able to at least get through the SMART test after swapping the cable. So I tried to run a preclear on it, without the pre or post read. The zeroing was so slow, at about 150kb/s, that the estimated time to finish was about 989 days.
2. Swapped ports on the PCIe SATA card, still using the new cable. Again, I was able to get through the SMART test. So I tried preclear again. Same results: super slow zeroing speed. So I tried just adding the drive to the array and letting Unraid clear the disk. Same results: very slow speed, estimated 800 days to finish.
3. Tested read speeds of all disks in the server using "hdparm -tT /dev/sdx". All speeds were about what I expected, but the weird thing is that this drive, sdh, showed a decent speed for a WD Red at 158 MB/sec.
4. Tried preclear again, this time with the pre & post read. Pre-read verification failed instantly this time.
5. Tried "hdparm -tT /dev/sdh" one more time out of curiosity. Here's the output of that test:
   /dev/sdh:
   read() hit EOF - device too small
   Timing buffered disk reads: read() hit EOF - device too small
6. I've also ordered another 4 port PCIe SATA card from Amazon. It will be here today, and I'll test it with a few drives (including this one I'm having trouble with).
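When repeating the hdparm test across several disks, it can help to pull the throughput figure out of the output automatically and flag any disk where hdparm reports no rate at all. The following is a minimal sketch, assuming the usual "Timing buffered disk reads: ... = N MB/sec" line format. The function name is made up for illustration, the failing sample is the output pasted above, and the healthy sample line is invented to match the 158 MB/sec figure.

```python
import re

# Matches the throughput figure at the end of an hdparm timing line.
_RATE = re.compile(r"=\s*([\d.]+)\s*MB/sec")


def buffered_read_rate(hdparm_output):
    """Return MB/sec from the 'Timing buffered disk reads' line,
    or None if hdparm did not report a rate (for example on a read error)."""
    for line in hdparm_output.splitlines():
        if "Timing buffered disk reads" in line:
            match = _RATE.search(line)
            return float(match.group(1)) if match else None
    return None


# Output pasted in this post for the problem drive.
failed = """/dev/sdh:
read() hit EOF - device too small
 Timing buffered disk reads: read() hit EOF - device too small"""

# Made-up sample line in hdparm's usual format, for comparison only.
healthy = " Timing buffered disk reads: 476 MB in  3.01 seconds = 158.14 MB/sec"

print(buffered_read_rate(failed))   # no rate reported
print(buffered_read_rate(healthy))  # rate in MB/sec
```

A result of None here only says that hdparm could not read from the device at that moment. It does not distinguish a bad disk from a bad cable, port, or controller.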

So now I'm really stumped. I realize the disk looks fine in diagnostics and the SMART reports, with no reallocated sectors, no current pending sectors, no offline uncorrectable, etc. But based on everything I tried above, isn't it time to return this disk to Newegg, and get a brand new one?

I've also attached a fresh SMART report for the disk, which failed. And another diag in case anyone wants to check it out for me.

Thanks again to everyone who has helped with my ongoing server issues in this thread.

unraid-diagnostics-20180106-0911.zip

unraid-smart-20180106-0908.zip

Edited January 6, 2018 by maddog808

JorgeB · January 6, 2018

Several ATA errors until it finally got disabled:

Quote

Jan 6 08:51:21 Unraid kernel: ata7: hard resetting link
Jan 6 08:51:22 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 6 08:51:27 Unraid kernel: ata7.00: qc timeout (cmd 0xec)
Jan 6 08:51:27 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 6 08:51:27 Unraid kernel: ata7.00: revalidation failed (errno=-5)
Jan 6 08:51:27 Unraid kernel: ata7: hard resetting link
Jan 6 08:51:27 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 6 08:51:37 Unraid kernel: ata7.00: qc timeout (cmd 0xec)
Jan 6 08:51:37 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 6 08:51:37 Unraid kernel: ata7.00: revalidation failed (errno=-5)
Jan 6 08:51:37 Unraid kernel: ata7: hard resetting link
Jan 6 08:51:38 Unraid kernel: ata7: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Jan 6 08:52:08 Unraid kernel: ata7.00: qc timeout (cmd 0xec)
Jan 6 08:52:08 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jan 6 08:52:08 Unraid kernel: ata7.00: revalidation failed (errno=-5)
Jan 6 08:52:08 Unraid kernel: ata7.00: disabled

It's connected to a Marvell controller. These are known to be flaky sometimes, so I suggest you try again using one of the onboard ports.

P.S. Also some ATA errors on the parity disk, possibly from a bad cable
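To see at a glance which ATA ports are logging this kind of trouble in a long syslog, the messages can be tallied per port. This is only a sketch: the function name, the labels, and the list of substrings are illustrative choices based on the excerpt above, not a complete catalogue of libata messages.

```python
import re
from collections import Counter

# Kernel ATA lines look like: "... kernel: ata7.00: failed to IDENTIFY (...)"
ATA_LINE = re.compile(r"kernel: (ata\d+)(?:\.\d+)?: (.*)")

# Substrings taken from the log excerpt above; illustrative, not exhaustive.
PATTERNS = {
    "hard resetting link": "link_reset",
    "qc timeout": "timeout",
    "failed to IDENTIFY": "identify_failed",
    "disabled": "disabled",
}


def summarize_ata(lines):
    """Count matching events per (port, label) pair, e.g. ('ata7', 'timeout')."""
    counts = Counter()
    for line in lines:
        m = ATA_LINE.search(line)
        if not m:
            continue
        port, message = m.groups()
        for needle, label in PATTERNS.items():
            if needle in message:
                counts[(port, label)] += 1
    return counts


# A few lines copied from the excerpt above.
sample = [
    "Jan 6 08:51:21 Unraid kernel: ata7: hard resetting link",
    "Jan 6 08:51:27 Unraid kernel: ata7.00: qc timeout (cmd 0xec)",
    "Jan 6 08:51:27 Unraid kernel: ata7.00: failed to IDENTIFY (I/O error, err_mask=0x4)",
    "Jan 6 08:52:08 Unraid kernel: ata7.00: disabled",
]
summary = summarize_ata(sample)
for (port, label), n in sorted(summary.items()):
    print(port, label, n)
```

Run against the full syslog, a tally like this makes it easier to compare the port behind the Marvell card with the port the parity disk is on.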
