hardware issue?

December 26, 2011

Please help with this. For now I've posted just the section of my log that I'm worried about (every now and again I get a bunch of errors like these); the entire syslog is attached.

Parity checks are all clean.

Dec 22 00:13:37 RCNAS kernel: ata11.01: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)
Dec 22 00:13:37 RCNAS kernel: ata11.01: BMDMA stat 0x64 (Drive related)

Dec 22 00:13:37 RCNAS kernel: ata11.01: failed command: READ DMA EXT (Minor Issues)

Dec 22 00:13:37 RCNAS kernel: ata11.01: cmd 25/00:00:17:ef:d2/00:02:2b:00:00/f0 tag 0 dma 262144 in (Drive related)

Dec 22 00:13:37 RCNAS kernel: res 51/40:00:2d:ef:d2/40:00:2b:00:00/f0 Emask 0x9 (media error) (Errors)

Dec 22 00:13:37 RCNAS kernel: ata11.01: status: { DRDY ERR } (Drive related)

Dec 22 00:13:37 RCNAS kernel: ata11.01: error: { UNC } (Errors)

Dec 22 00:13:38 RCNAS kernel: ata11.00: configured for UDMA/133 (Drive related)

Dec 22 00:13:38 RCNAS kernel: ata11.01: configured for UDMA/133 (Drive related)

And on and on... see the attachment for the full syslog. (Note: the missing lines in the syslog are just the mover script logs. You don't need to see the types of files I keep, do you?)
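For anyone reading along, the cmd and res lines in the excerpt above can be decoded by hand. Here is a minimal Python sketch, assuming the byte order libata uses in its error reports (command/feature or status/error first, then the low-order and high-order LBA bytes) and 512-byte logical sectors. The name decode_taskfile is made up for illustration.

```python
import re

# Byte order assumed from libata's error report:
#   cmd command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
#   res status/error:nsect:lbal:lbam:lbah/(same high-order layout)/device
TASKFILE = re.compile(
    r"\b(cmd|res) ([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)

def decode_taskfile(line):
    """Return the 48-bit LBA, sector count and first two registers of a cmd/res line."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no cmd/res taskfile found in line")
    first = int(m.group(2), 16)  # command (cmd line) or status (res line)
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(3).split(":"))
    _, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (int(b, 16) for b in m.group(4).split(":"))
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # A count of 0 has a special meaning in LBA48; that case is not handled here.
    count = hob_nsect << 8 | nsect
    return {"kind": m.group(1), "first": first, "feature": feature,
            "lba": lba, "count": count}

cmd = decode_taskfile("ata11.01: cmd 25/00:00:17:ef:d2/00:02:2b:00:00/f0 tag 0 dma 262144 in")
res = decode_taskfile("res 51/40:00:2d:ef:d2/40:00:2b:00:00/f0 Emask 0x9 (media error)")

print("read starts at LBA", cmd["lba"], "for", cmd["count"], "sectors")
print("drive reported the error at LBA", res["lba"])
print("UNC bit set in error register:", bool(res["feature"] & 0x40))
```

On the two lines above, the request is 512 sectors (512 x 512 bytes = 262144, matching the "dma 262144 in" in the log), and the LBA in the res line falls 22 sectors into that read.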

I believe that the ata11.01 is the cache drive.

Should I be worried? The cache drive is on the motherboard controller.

Disk devices
parity device: pci-0000:00:1f.2-scsi-1:0:1:0 host12 (sdj) WDC_WD20EARS-00MVWB0_WD-WMAZA3407269

disk1 device: pci-0000:01:00.0-scsi-1:0:0:0 host1 (sdb) WDC_WD10EACS-00D6B0_WD-WCAU40384147

disk2 device: pci-0000:00:1f.2-scsi-0:0:0:0 host11 (sdh) WDC_WD10EAVS-00D7B1_WD-WCAU46190122

disk3 device: pci-0000:01:00.0-scsi-2:0:0:0 host2 (sdc) WDC_WD10EADS-00L5B1_WD-WCAU46192923

disk4 device: pci-0000:01:00.0-scsi-3:0:0:0 host3 (sdd) WDC_WD10EADS-00M2B0_WD-WMAV50454466

disk5 device: pci-0000:02:00.0-scsi-1:0:0:0 host6 (sde) WDC_WD10EADS-00M2B0_WD-WMAV50297857

disk6 device: pci-0000:04:02.0-scsi-3:0:0:0 host10 (sdg) WDC_WD15EARS-00MVWB0_WD-WCAZA2550600

disk7 device: pci-0000:01:00.0-scsi-0:0:0:0 host0 (sda) WDC_WD20EARS-00MVWB0_WD-WMAZA3269017

disk8 device: unassigned

disk9 device: unassigned

disk10 device: unassigned

disk11 device: unassigned

disk12 device: unassigned

disk13 device: unassigned

disk14 device: unassigned

disk15 device: unassigned

disk16 device: unassigned

disk17 device: unassigned

disk18 device: unassigned

disk19 device: unassigned

disk20 device: unassigned

cache device: pci-0000:00:1f.2-scsi-0:0:1:0 host11 (sdi) WDC_WD1001FALS-00J7B0_WD-WMATV0910106

Thanks for your help!

syslog-2011-12-26.zip

December 26, 2011

Author

Thought I'd add the SMART report for the cache drive (the one with the possible issue, I assume?):

SMART status Info for /dev/sdi

smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen

Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===

Device Model: WDC WD1001FALS-00J7B0

Serial Number: WD-WMATV0910106

Firmware Version: 05.00K05

User Capacity: 1,000,204,886,016 bytes

Device is: Not in smartctl database [for details use: -P showall]

ATA Version is: 8

ATA Standard is: Exact ATA specification draft version not indicated

Local Time is: Mon Dec 26 16:09:08 2011 EST

SMART support is: Available - device has SMART capability.

SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===

SMART overall-health self-assessment test result: PASSED

General SMART Values:

Offline data collection status: (0x82) Offline data collection activity

was completed without error.

Auto Offline Data Collection: Enabled.

Self-test execution status: ( 0) The previous self-test routine completed

without error or no self-test has ever

been run.

Total time to complete Offline

data collection: (19200) seconds.

Offline data collection

capabilities: (0x7b) SMART execute Offline immediate.

Auto Offline data collection on/off support.

Suspend Offline collection upon new

command.

Offline surface scan supported.

Self-test supported.

Conveyance Self-test supported.

Selective Self-test supported.

SMART capabilities: (0x0003) Saves SMART data before entering

power-saving mode.

Supports SMART auto save timer.

Error logging capability: (0x01) Error logging supported.

General Purpose Logging supported.

Short self-test routine

recommended polling time: ( 2) minutes.

Extended self-test routine

recommended polling time: ( 221) minutes.

Conveyance self-test routine

recommended polling time: ( 5) minutes.

SCT capabilities: (0x303f) SCT Status supported.

SCT Feature Control supported.

SCT Data Table supported.

SMART Attributes Data Structure revision number: 16

Vendor Specific SMART Attributes with Thresholds:

ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE

1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 65

3 Spin_Up_Time 0x0027 236 232 021 Pre-fail Always - 8200

4 Start_Stop_Count 0x0032 097 097 000 Old_age Always - 3146

5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 3

7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0

9 Power_On_Hours 0x0032 068 068 000 Old_age Always - 23866

10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0

11 Calibration_Retry_Count 0x0032 100 253 000 Old_age Always - 0

12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 77

192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 13

193 Load_Cycle_Count 0x0032 199 199 000 Old_age Always - 3146

194 Temperature_Celsius 0x0022 117 109 000 Old_age Always - 33

196 Reallocated_Event_Count 0x0032 197 197 000 Old_age Always - 3

197 Current_Pending_Sector 0x0032 195 195 000 Old_age Always - 852

198 Offline_Uncorrectable 0x0030 200 197 000 Old_age Offline - 0

199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0

200 Multi_Zone_Error_Rate 0x0008 200 174 000 Old_age Offline - 0

SMART Error Log Version: 1

No Errors Logged

SMART Self-test log structure revision number 1

Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error

# 1 Short offline Completed without error 00% 16746 -

SMART Selective self-test log data structure revision number 1

SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS

1 0 0 Not_testing

2 0 0 Not_testing

3 0 0 Not_testing

4 0 0 Not_testing

5 0 0 Not_testing

Selective self-test flags (0x0):

After scanning selected spans, do NOT read-scan remainder of disk.

If Selective self-test is pending on power-up, resume after 0 minute delay.

December 26, 2011

Your drive is dying:

197 Current_Pending_Sector 0x0032 195 195 000 Old_age Always - 852

There are 852 unreadable sectors, pending re-allocation when next written to.

Time to RMA it. (Those are the "media errors" in your first post.)
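To pull those counters out of a report like the one above without eyeballing it, something along these lines works. It is only a sketch: it assumes the ten-column attribute layout smartctl printed above, and parse_smart_attributes and WATCH are illustrative names, not part of any tool.

```python
# Attributes worth watching for media trouble (an illustrative selection).
WATCH = ("Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable")

def parse_smart_attributes(text):
    """Map attribute name -> normalized value, threshold and raw value."""
    attrs = {}
    for line in text.splitlines():
        parts = line.split()
        # Attribute rows start with a numeric ID followed by a name and a hex flag.
        if len(parts) < 10 or not parts[0].isdigit() or not parts[2].startswith("0x"):
            continue
        try:
            attrs[parts[1]] = {
                "value": int(parts[3]),
                "thresh": int(parts[5]),
                "raw": int(parts[9]),
            }
        except ValueError:
            continue  # skip rows whose columns are not plain integers
    return attrs

# Three rows copied from the report posted above.
REPORT = """\
5 Reallocated_Sector_Ct 0x0033 199 199 140 Pre-fail Always - 3
197 Current_Pending_Sector 0x0032 195 195 000 Old_age Always - 852
198 Offline_Uncorrectable 0x0030 200 197 000 Old_age Offline - 0
"""

attrs = parse_smart_attributes(REPORT)
for name in WATCH:
    if name in attrs and attrs[name]["raw"] > 0:
        print(f"{name}: raw value {attrs[name]['raw']}")
```

In practice the text would come from the saved smartctl output rather than a pasted string.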

December 27, 2011

Author

What I was afraid you would say... Is WD good to deal with for an RMA? Or should I just buy a new one and save myself the hassle?

December 27, 2011

Author

Forget my last post... it's out of warranty (I don't have the receipt, so they use the manufacture date).

I see Joe's rep on here is immaculate, so he isn't likely to be disagreed with. Off to the store I go.

Thanks again!

December 29, 2011

Author

One more question Joe (or anyone) please...

So I replaced the drive with a spare I had so that I could RMA it (it turned out to be under warranty after all). Before sending it in, I decided to run a pre-clear on it to see how many of the pending sectors would switch to reallocated.

Zero did.

After the pre-clear it said:

0 sectors are pending re-allocation at the end of the preclear, a change of -852 in the number of sectors pending re-allocation.

Would this indicate the drive is actually ok?

What should I tell WD if I RMA it?

Thanks!

December 29, 2011

Were they re-allocated? Or, were they successfully re-written in place?

If re-written in place, then I would suspect the drive OR the power supply. If simply re-allocated, then yes, RMA it.

You need to look now at a current SMART report for that drive.

December 29, 2011

Just sorta jumping in here (following this thread for educational purposes), but isn't this something the pre-clear script should pick up? As in, seeing that the reallocated sector count has now gone up? Or is the logic too difficult to script, thus requiring a human to look at it? In which case, it might be a good idea to tell the user, "something changed, here are the possibilities, go check the SMART report." Or something to clue them in?

December 29, 2011

It would have.... but marcusone elected to only post one line from the final report, and not the entire report.

Therefore, we cannot tell, as our psychic skills are a bit rusty this late in the year.

I really have no way to tell how a manufacturer reacts when a specific drive is returned. I've seen people return a drive with only a few re-allocated sectors. I honestly doubt the manufacturers have the time to verify the returned drives when in an RMA process. They would just rather you not return a working drive.

If in doubt, RMA the drive, especially since it had over 800 sectors that it apparently either re-allocated because they could not be read, or re-wrote in place because they could not be read as originally written. (800 sectors would probably not cause a SMART failure, as most drives have several thousand spare sectors, but it is a strong clue that more sectors will fail early in the drive's life.)
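On why 852 pending sectors need not trip a SMART failure: smartctl reports an attribute as failed when its normalized VALUE drops to or below THRESH, not when the raw count gets large. A small sketch using the numbers from the report earlier in this thread (is_failing is an illustrative helper, not a smartctl function):

```python
# (name, normalized VALUE, THRESH, RAW_VALUE) copied from the SMART report
# posted earlier in this thread.
ROWS = [
    ("Reallocated_Sector_Ct", 199, 140, 3),
    ("Current_Pending_Sector", 195, 0, 852),
]

def is_failing(value, thresh):
    """An attribute counts as failed once VALUE is at or below THRESH."""
    return value <= thresh

for name, value, thresh, raw in ROWS:
    state = "FAILING" if is_failing(value, thresh) else "ok"
    print(f"{name}: raw={raw} value={value} thresh={thresh} -> {state}")
```

Neither attribute is at its threshold here, which is consistent with the overall "PASSED" result even though the raw pending count was 852.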

December 29, 2011

Fair enough, I just figured if there had been a blinking, flashing, screaming, bolded, airplane-towed banner in the report he would have included it. As such I assumed it was either not there, or just slightly more subtle.

December 29, 2011

Author

Sorry, here are the preclear reports:

preclear_reports.zip

December 29, 2011

Author

So how do I determine if it's the power supply or the hard drive?

I'm using the same power supply the LimeTech-built rigs have, a "Corsair CMPSU-650TX 650W ATX12V / EPS12V", which I put in less than a year ago.

December 29, 2011

These lines summed it up:

No SMART attributes are FAILING_NOW

852 sectors were pending re-allocation before the start of the preclear.

852 sectors were pending re-allocation after pre-read in cycle 1 of 1.

0 sectors were pending re-allocation after zero of disk in cycle 1 of 1.

0 sectors are pending re-allocation at the end of the preclear,

a change of -852 in the number of sectors pending re-allocation.

3 sectors had been re-allocated before the start of the preclear.

3 sectors are re-allocated at the end of the preclear,

the number of sectors re-allocated did not change.

So, every sector that could not be read and was pending re-allocation became readable once it was re-written in place.
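The same reasoning, written out as a sketch. The function name and the wording of the verdicts are illustrative, not the preclear script's own; the inputs are the four counts from the report lines above.

```python
def classify_preclear(pending_before, pending_after, realloc_before, realloc_after):
    """Describe what happened to sectors that were pending re-allocation."""
    cleared = pending_before - pending_after    # sectors no longer pending
    remapped = realloc_after - realloc_before   # newly re-allocated sectors
    if cleared <= 0:
        return "no pending sectors were cleared"
    if remapped == 0:
        return "all cleared sectors were re-written in place"
    if remapped >= cleared:
        return "all cleared sectors were re-allocated"
    return "some sectors were re-allocated, the rest re-written in place"

# Counts from the preclear report: pending 852 -> 0, re-allocated 3 -> 3.
verdict = classify_preclear(852, 0, 3, 3)
print(verdict)
```

A verdict of "re-written in place" points at the questions raised earlier in the thread (drive or power and environment), while a jump in the re-allocated count points more directly at the drive.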

Your power supply has a single 52 Ampere 12 volt rail, so its capacity should be OK. That leaves temperature, vibration, poor-quality voltage regulation (bad power supply splitters, back-plane, etc.) or a disk sensitive to environmental factors.
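As a rough sanity check on the 12 V rail: the 52 A figure and the nine drives (parity, seven data disks, cache) come from this thread, but the per-drive spin-up current below is an assumed placeholder, so substitute the figure from the drive datasheets. The sketch also ignores whatever the CPU and motherboard draw from the same rail.

```python
RAIL_AMPS = 52.0              # single 12 V rail, from the post above
RAIL_VOLTS = 12.0
DRIVE_COUNT = 9               # parity + disk1..disk7 + cache, from the device list
SPINUP_AMPS_PER_DRIVE = 2.0   # ASSUMPTION: placeholder value, check the datasheet

rail_watts = RAIL_AMPS * RAIL_VOLTS
spinup_amps = DRIVE_COUNT * SPINUP_AMPS_PER_DRIVE
headroom_amps = RAIL_AMPS - spinup_amps

print(f"12 V rail capacity: {rail_watts:.0f} W")
print(f"all drives spinning up at once: {spinup_amps:.0f} A of {RAIL_AMPS:.0f} A")
print(f"headroom left for everything else on the rail: {headroom_amps:.0f} A")
```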

Was the disk used in another PC first? How did it get 852 unreadable sectors? It appears as if they were marked as un-readable in a prior use? Perhaps the disk is fine in the unRAID server, but horrible in its prior use?

Joe L.

December 29, 2011

Author

It has been the cache drive in the unRAID box for 6+ months (I did a preclear before I put it in, and it didn't have those 800+ pending sectors then)... I think it always had the 3 "bad" sectors it still reports.

Can dust cause an issue? It was a little dusty when I pulled it out (I cleaned it and all the filters in the case before doing the preclear that you now have the reports for).

I'll check my power splitters, but if I remember correctly I don't use any (everything runs directly from the power supply to the drive or hot-swap cage).

The drive I replaced it with, which I'm now using as the cache drive, is in the same hot-swap bay (so if it's the back-plane of the hot-swap bay, it should cause that drive to have issues too... in theory, anyway?).

Temp never goes above 33 °C in the case that the drive is normally in (basement, with fans running over all the hard drives).

Thanks for your input, Joe... I love how active you are with unRAID!
