MyMain Smart Errors, Need I be Worried?


SuperW2

Posted

I'm having some random reboots and am not really sure why... I doubt they're specifically related to the HDs, but I wanted to get opinions on whether I should worry about these errors (or which are most worrisome).  I replaced all SATA cables with brand-new locking ones about 6 months ago.  My MyMain SMART screenshot is below.

 

My syslog indicates some errors, but I'm not 100% sure which drive is being reported (full syslog file attached).

 

Sep  7 17:25:04 Media kernel: ata11.00: exception Emask 0x10 SAct 0x1 SErr 0x780100 action 0x6 (Errors)

Sep  7 17:25:04 Media kernel: ata11.00: irq_stat 0x08000000 (Drive related)

Sep  7 17:25:04 Media kernel: ata11: SError: { UnrecovData 10B8B Dispar BadCRC Handshk } (Errors)

Sep  7 17:25:04 Media kernel: ata11.00: failed command: READ FPDMA QUEUED (Minor Issues)

Sep  7 17:25:04 Media kernel: ata11.00: cmd 60/08:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in (Drive related)

Sep  7 17:25:04 Media kernel:          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Sep  7 17:25:04 Media kernel: ata11.00: status: { DRDY } (Drive related)

Sep  7 17:25:04 Media kernel: ata11: hard resetting link (Minor Issues)

Sep  7 17:25:04 Media kernel: ata11: SATA link up 3.0 Gbps (SStatus 123 SControl 300) (Drive related)

Sep  7 17:25:04 Media kernel: ata11.00: configured for UDMA/133 (Drive related)

Sep  7 17:25:04 Media kernel: ata11: EH complete (Drive related)

Sep  7 17:25:04 Media kernel: ata11: limiting SATA link speed to 1.5 Gbps (Drive related)

Sep  7 17:25:04 Media kernel: ata11.00: exception Emask 0x0 SAct 0x1 SErr 0x980000 action 0x6 frozen (Errors)

Sep  7 17:25:04 Media kernel: ata11: SError: { 10B8B Dispar LinkSeq } (Errors)

Sep  7 17:25:04 Media kernel: ata11.00: failed command: READ FPDMA QUEUED (Minor Issues)

Sep  7 17:25:04 Media kernel: ata11.00: cmd 60/08:00:00:00:00/00:00:00:00:00/40 tag 0 ncq 4096 in (Drive related)

Sep  7 17:25:04 Media kernel:          res 40/00:00:00:00:00/00:00:00:00:00/40 Emask 0x4 (timeout) (Errors)

 

9-7-2011 5-29-19 PM.jpg

syslog-2011-09-07.txt

Posted

Looking at your smart view, you have some serious drive issues going on:

 

More Serious

 

disk7 - reallocated_sector_ct=181

disk12 - reallocated_sector_ct=70

disk14 - reallocated_sector_ct=39

 

Less Serious

 

disk4 - reallocated_sector_ct=9

disk5 - current_pending_sector=3

disk6 - reported_uncorrect=9 / ata_error_count=9

disk9 - reallocated_sector_ct=1

disk13 - reallocated_sector_ct=1

disk16 - HPA?

disk18 - HPA?

 

Reallocated Sectors / Current_Pending_Sectors

---------------------------------------------

 

Reallocated sectors indicate that the drive has detected a flawed sector that cannot store data reliably, so it has mapped a spare sector in its place.  This is a good thing.  The problem is that once sectors start going bad and being remapped, more sectors often follow, and the next thing you know all of the spare sectors are used up and the drive is toast.

 

But sometimes a few bad sectors are just a few bad sectors, and subsequent reallocations do not occur.  If this is the case the drive is fine.

 

A current_pending_sector is a sector that has been identified for future reallocation.  Normally these sectors get reallocated as the drive is used.

 

You need to run some parity checks and see if the number of reallocated sectors and/or pending sectors increase with each check.  If you can't run 3 parity checks in a row and have the reallocated sectors not increase on a drive, you should RMA that drive.
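The three-checks-in-a-row rule above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and the counts are assumed to be copied by hand from the SMART report after each parity check (baseline first, then one reading per check).

```python
def increases(history):
    """Per-check change in a raw SMART count (e.g. reallocated_sector_ct).

    history: list of readings, baseline first, then one reading after
    each parity check.
    """
    return [after - before for before, after in zip(history, history[1:])]


def stable_for(history, checks=3):
    """True if the most recent `checks` parity checks caused no increase."""
    deltas = increases(history)
    if len(deltas) < checks:
        return False  # not enough checks recorded yet to decide
    return all(delta <= 0 for delta in deltas[-checks:])


# Illustrative readings only (baseline, then after checks 1, 2, 3).
readings = {
    "disk7": [181, 182, 182, 182],   # grew by 1 on the first check
    "disk12": [70, 70, 70, 70],      # never moved
}

for disk, history in readings.items():
    verdict = "stable" if stable_for(history) else "keep checking / consider RMA"
    print(disk, increases(history), verdict)
```

A drive that grows on the first check and then holds steady needs one more clean check before it passes, since the rule asks for three clean checks in a row.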

 

Ata_Error_Count and syslog errors

----------------------------------

 

These errors usually indicate some type of cabling issue.  I would reseat the cables on this drive.

 

The ata11 from your syslog is actually your cache disk.
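For anyone curious why these point at the link rather than the platters: the SErr value in the syslog is a bitmask, and the kernel prints the decoded flag names right after it. Here is a rough Python sketch of that decoding. The bit-to-name table is written from memory of the Linux libata driver, so treat it as an assumption and check it against your kernel source before relying on it.

```python
# Assumed SError bit positions and the short names libata prints for them.
SERR_BITS = {
    0: "RecovData", 1: "RecovComm",
    8: "UnrecovData", 9: "Persist", 10: "Proto", 11: "HostInt",
    16: "PHYRdyChg", 17: "PHYInt", 18: "CommWake",
    19: "10B8B", 20: "Dispar", 21: "BadCRC", 22: "Handshk",
    23: "LinkSeq", 24: "TrStaTrns", 25: "UnrecFIS", 26: "DevExch",
}


def decode_serr(value):
    """Return the names of all flags set in a SATA SError register value."""
    return [name for bit, name in sorted(SERR_BITS.items()) if value & (1 << bit)]


# The two SErr values from the ata11 lines in the first post.
for serr in (0x780100, 0x980000):
    print(hex(serr), decode_serr(serr))
```

Flags such as 10B8B, Dispar, BadCRC and Handshk describe problems on the SATA link itself, which is why cables and connectors are the first thing to check.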

 

HPA

---

 

An HPA is not a disk error; it indicates that the BIOS has used a trick to reduce (very slightly) the size of your disk and used the hidden space to keep a backup of your BIOS settings.  For technical reasons, HPAs can cause problems with newer versions of unRAID.

 

Fortunately, I don't believe you actually have HPAs - this looks like a false positive.  myMain does not have a reference value for a 360G drive (I have actually never heard of this size before).  So I would not worry about these unless you are using a Gigabyte motherboard.

 

If you send me a screenshot of the "Details" myMain view, I can add the reference value for a 360G drive and you will not get the false positives in the future.

 

Posted

Thanks for the reply bjp999... Details screenshot sent... the 360GB drives were OEM drives, I think... got these from a friend...  They are next on the list to be upgraded (unless I need to replace some of the failing larger drives first).

 

I'm not sure how long the ATA_Error_Counts have been going on... it's possible/likely that this was occurring before the last SATA cable swap (is there a way to determine a date/time stamp for the errors?).

 

I'll run through a few parity checks over the next few days and see if any of my reallocated sector #'s increase...

Posted

I started my first parity check and it started puking errors all over ATA4 (and I hear clicking)... how exactly do I find which drive this is?

 

Sep  7 21:44:21 Media kernel:          res 40/00:58:47:5b:f3/00:00:00:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Sep  7 21:44:21 Media kernel: ata4.00: status: { DRDY } (Drive related)

Sep  7 21:44:21 Media kernel: ata4.00: failed command: READ FPDMA QUEUED (Minor Issues)

Sep  7 21:44:21 Media kernel: ata4.00: cmd 60/08:40:5f:56:f3/00:00:00:00:00/40 tag 8 ncq 4096 in (Drive related)

Sep  7 21:44:21 Media kernel:          res 40/00:58:47:5b:f3/00:00:00:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Sep  7 21:44:21 Media kernel: ata4.00: status: { DRDY } (Drive related)

Sep  7 21:44:21 Media kernel: ata4.00: failed command: READ FPDMA QUEUED (Minor Issues)

Sep  7 21:44:21 Media kernel: ata4.00: cmd 60/08:48:0f:5b:f3/00:00:00:00:00/40 tag 9 ncq 4096 in (Drive related)

Sep  7 21:44:21 Media kernel:          res 40/00:58:47:5b:f3/00:00:00:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Sep  7 21:44:21 Media kernel: ata4.00: status: { DRDY } (Drive related)

Sep  7 21:44:21 Media kernel: ata4.00: failed command: READ FPDMA QUEUED (Minor Issues)

Sep  7 21:44:21 Media kernel: ata4.00: cmd 60/08:50:3f:5b:f3/00:00:00:00:00/40 tag 10 ncq 4096 in (Drive related)

Sep  7 21:44:21 Media kernel:          res 40/00:58:47:5b:f3/00:00:00:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Sep  7 21:44:21 Media kernel: ata4.00: status: { DRDY } (Drive related)

Sep  7 21:44:21 Media kernel: ata4.00: failed command: READ FPDMA QUEUED (Minor Issues)

Sep  7 21:44:21 Media kernel: ata4.00: cmd 60/08:58:47:5b:f3/00:00:00:00:00/40 tag 11 ncq 4096 in (Drive related)

Sep  7 21:44:21 Media kernel:          res 40/00:58:47:5b:f3/00:00:00:00:00/40 Emask 0x10 (ATA bus error) (Errors)

Sep  7 21:44:21 Media kernel: ata4.00: status: { DRDY } (Drive related)

Sep  7 21:44:21 Media kernel: ata4: hard resetting link (Minor Issues)

Posted

You have seven HDs with "past failure" attribute set.

 

For Seagate HDs this means that they were run at above 55 deg.C in the past.

 

You probably should take a good look at your cooling especially when doing the "few parity checks" as this is the time when you will have the highest temperatures.

Posted

Thanks for the reply... I did have a case fan issue that I hope is now resolved.  But I'm also thinking my not-well-planned build strategy, which includes (20) 7200rpm drives stuffed into 4 IcyDock 5in4 cages in a big case that lives in my office closet, wasn't the most prudent one.

 

As I upgrade/swap drives, the new plan includes 5900 RPM drives to replace the 7200, and the case now has about 8x80mm and 2x120mm fans to hopefully move the air around better.

Posted

Just finished my 3rd consecutive new parity check since posting this thread... I saw an increase of 1 in the high fly writes on my parity drive, 1 in the reallocated sector count on Disk 7, and 6 in the UDMA CRC error count on my cache disk, all on the first check, with nothing new on the 2nd or 3rd check.

 

See Spreadsheet below.

 

9-9-2011 3-48-02 PM.jpg

 

Posted


Good that the reallocated sectors are not increasing. You should continue to monitor them over the next few routine parity checks.  If they hold steady, you should be fine.  But if the values are creeping upward on certain drives, I'd RMA them.

 

I wouldn't worry too much about the high fly writes or the CRC error count unless the normalized values start to approach the threshold.
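To make "approach the threshold" concrete: SMART reports a normalized VALUE and a THRESH for each attribute, and an attribute counts as failing once VALUE drops to THRESH or below. A small Python sketch of that comparison follows; the 10-point warning margin and the sample numbers are arbitrary assumptions, not values from the screenshots.

```python
def margin(value, thresh):
    """Headroom between a normalized SMART value and its failure threshold."""
    return value - thresh


def approaching(value, thresh, warn_margin=10):
    """True once the normalized value is within warn_margin of the threshold.

    warn_margin is an arbitrary choice for this sketch; pick whatever
    makes you comfortable.
    """
    return margin(value, thresh) <= warn_margin


# Hypothetical (VALUE, THRESH) pairs, for illustration only.
samples = {
    "UDMA_CRC_Error_Count": (200, 0),
    "Reallocated_Sector_Ct": (40, 36),
}

for name, (value, thresh) in samples.items():
    state = "getting close" if approaching(value, thresh) else "plenty of headroom"
    print(name, margin(value, thresh), state)
```

Note that normalized values count down toward the threshold as the raw error counts go up, so a small raw increase usually moves the normalized value very little.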
