
[SOLVED] Hard drive failure or worse?


Recommended Posts

I was trying to move files to disk17 within the array and it failed a couple of times. Then "the ball" went orange. I ran a parity check just a couple of days ago and everything was fine.

 

Shouldn't the parity be correct, so I could just replace the failed drive and rebuild?

 

I had similar problems with the drive before, but I could write to it after a few retries. The syslog mentions ata4; is that a port on the motherboard?

 

On another server I have migrated data to, which has only SATA ports in use, the parity check speed is about 100MB/s, while my older server with the HW as in my sig gets about 30-50MB/s. There are 22 drives including parity, mostly 2-3TB WD Green drives (parity too). The controllers are in PCI-e 16x slots that run at 8x each when both are connected.

 

Should I get a motherboard with more SATA ports if I want the parity check to be faster?

 

The newer server has an AT5NM10T-I with an Atom D525.

syslog-2014-03-19_DISK17_FAIL.zip

Link to comment

If a cabling problem causes a drive to be "red balled" and then you fix the underlying problem, the red ball does not go away. Read the wiki section again. It should explain the way to rebuild the disk and make your array happy again.

Link to comment

A parity check soon after the reconstruct is a good idea, but does not have to be done instantly. You can power down the server or whatever. I would recommend you grab a syslog before you shut down or reboot, though, as it might contain hints if the drive had trouble during the rebuild. Also, check its SMART report to make sure you're not seeing new issues.
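If it helps, something like the following can pull out the SMART attributes most worth eyeballing after a rebuild. This is just a rough sketch that assumes smartmontools is installed and that you run it as root; /dev/sdX is a placeholder for the rebuilt drive, not a real device name.

```python
# Sketch: print the SMART attributes most relevant after a rebuild.
# Assumes smartmontools ("smartctl") is installed; /dev/sdX is a placeholder.
import subprocess

WATCH = (
    "Reallocated_Sector_Ct",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
    "UDMA_CRC_Error_Count",  # CRC errors usually point at cabling, not the drive
)

def smart_summary(device):
    """Return the SMART attribute lines we care about for a given device."""
    out = subprocess.run(["smartctl", "-A", device],
                         capture_output=True, text=True, check=False).stdout
    return [line for line in out.splitlines() if any(attr in line for attr in WATCH)]

for line in smart_summary("/dev/sdX"):  # substitute the real device
    print(line)
```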

Link to comment

Just one dumb question, but does unRAID do any checks during the rebuild, or is it just a pure write to the disk?

When rebuilding a disk, you will be writing to the disk being rebuilt, and reading from all the other disks.  No additional checks are done on the disk being written - unRAID expects the underlying OS to report a problem if a write fails.
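To picture what that means in practice: with single parity, the missing disk's contents are simply the XOR of the parity disk and every surviving data disk, so a rebuild is "read everything else, write the result". Here is a minimal Python sketch of the idea, using illustrative byte strings rather than unRAID's actual code or real block devices:

```python
# Minimal sketch of single-parity reconstruction (illustration only).
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical 4-drive array: parity = XOR of all data disks.
data_disks = [b"\x01\x02\x03", b"\x10\x20\x30", b"\x0a\x0b\x0c"]
parity = xor_blocks(data_disks)

# Simulate losing disk 1 and rebuilding it from parity + the other data disks.
surviving = [data_disks[0], data_disks[2], parity]
rebuilt = xor_blocks(surviving)

assert rebuilt == data_disks[1]  # the rebuilt disk matches the lost one
```

That is also why a read error on any of the surviving disks during a rebuild is the dangerous case: the XOR needs every other disk to be readable.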

Link to comment

Thanks itimpi for the clarification.

 

There is now about an hour left of the parity check, and this shows up in the syslog: "Mar 24 16:00:31 Tower kernel: NTFS driver 2.1.30 [Flags: R/W MODULE]".

 

What does that mean? I have never seen that before.

It was in the OP syslog. Since it says NTFS driver, I think it is safe to assume it is unrelated to your unRAID array, which uses ReiserFS.
Link to comment

Yes, thank you!  8)

 

BTW, if there were a write problem during the rebuild (a write error on a data disk), would the parity also become invalid?

 

Can you tell me why the parity check is so slow with the hardware in my sig? The drives are 2-3TB, if that matters. My new build with just 4 x 4TB drives (all on SATA II ports) gets 100MB/s, and this one does about half of that... why?

 

Is PCI-e x16 really so bad with two SASLPs?

Link to comment

A parity check is limited to the speed of the slowest drives involved at any point in the check.  It's not likely your controllers that are limiting the speed => you probably have a drive (or 2) with relatively low-density platters, whereas your new 4TB drives are probably all 1TB/platter units.

 

If you post the exact make/model of all of your drives, I can check the areal density of them and provide more specific details.

 

Link to comment

The older 2TB drives are WD EARS and EARX. The EARS, I think, are the problem. But that's great news, as I am replacing them with 4TB drives.

 

Agree the issue is likely the EARS units.  The early 2TB EARS drives were 500GB/platter (4 platters).  Later units were 667GB/platter (3 platters).    Either of those will be far slower than a modern 1TB/platter drive.

 

Note that when your parity check passes the 2TB point these drives are no longer involved.  If you're watching it at that point you should see a notable increase in the speed.
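To illustrate the effect with a toy model (the sizes and speeds below are made-up examples, not measurements of your EARS/EARX units): at any offset the check can only run as fast as the slowest drive that still spans that offset, and once the check passes the capacity of the smaller drives they drop out of the calculation entirely.

```python
# Toy model of parity-check speed: the slowest drive still "in play"
# at a given offset sets the pace. Numbers below are hypothetical.
drives = [
    # (size_in_TB, average_sequential_MBps)
    (2.0, 45),   # e.g. an older 500GB/platter 2TB drive
    (2.0, 60),   # e.g. a 667GB/platter 2TB drive
    (3.0, 110),  # e.g. a 1TB/platter 3TB drive
    (3.0, 110),
]

def check_speed_at(offset_tb):
    """Speed at a given offset: limited by the slowest drive spanning it."""
    active = [speed for size, speed in drives if size > offset_tb]
    return min(active) if active else None

for offset in (0.5, 1.5, 2.5):
    print(f"at {offset} TB: ~{check_speed_at(offset)} MB/s")
# at 0.5 TB: ~45 MB/s
# at 1.5 TB: ~45 MB/s
# at 2.5 TB: ~110 MB/s   <- the 2TB drives have dropped out
```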

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
