What are these errors?



Hi,

 

I'm scratching my head a bit here.

 

Mar 1 14:48:36 p5bplus kernel: ata9.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0 (Errors)

Mar 1 14:48:36 p5bplus kernel: ata9.00: irq_stat 0x40000001 (Drive related)

Mar 1 14:48:36 p5bplus kernel: ata9.00: failed command: READ DMA EXT (Minor Issues)

Mar 1 14:48:36 p5bplus kernel: ata9.00: cmd 25/00:a8:27:55:13/00:03:64:00:00/e0 tag 0 dma 479232 in (Drive related)

Mar 1 14:48:36 p5bplus kernel: res 51/40:17:b0:55:13/00:03:64:00:00/e0 Emask 0x9 (media error) (Errors)

Mar 1 14:48:36 p5bplus kernel: ata9.00: status: { DRDY ERR } (Drive related)

Mar 1 14:48:36 p5bplus kernel: ata9.00: error: { UNC } (Errors)

Mar 1 14:48:36 p5bplus kernel: ata9.00: configured for UDMA/133 (Drive related)

Mar 1 14:48:36 p5bplus kernel: ata9: EH complete (Drive related)

 

I'm not sure what they are.

I already changed the SATA cable to rule that out, but the problem keeps coming back.

The disk is on one of the ports on my mobo.

I might be running at the threshold of my PSU's capacity, as it started after adding a disk ... and it was not this disk :P

 

syslog attached

 

I'm running 5.0b6, but that is not related, as I had these errors before too ... it just gets on my nerves now that they are filling up my logs :P

syslog-2011-03-01.zip

Link to comment

So what are media errors?

 

I ran reiserfsck and this is the result:

 

Do you want to run this program?[N/Yes] (note need to type Yes if you do):Yes

###########

reiserfsck --check started at Tue Mar  1 18:37:51 2011

###########

Replaying journal: Done.

Reiserfs journal '/dev/md2' in blocks [18..8211]: 0 transactions replayed

Checking internal tree.. finished

Comparing bitmaps..finished

Checking Semantic tree:

finished

No corruptions found

There are on the filesystem:

        Leaves 229750

        Internal nodes 1478

        Directories 459

        Other files 2277

        Data block pointers 232091559 (0 of them are zero)

        Safe links 0

###########

reiserfsck finished at Tue Mar  1 18:54:39 2011

###########

root@p5bplus:~#

What else can I do?

 

Link to comment

I ran a short SMART report:

 

root@p5bplus:~# smartctl  -a  -d  ata  /dev/sdj
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

=== START OF INFORMATION SECTION ===
Model Family:     Western Digital Caviar Green (Adv. Format) family
Device Model:     WDC WD10EARS-00MVWB0
Serial Number:    WD-WCAZA0014151
Firmware Version: 50.0AB50
User Capacity:    1,000,204,886,016 bytes
Device is:        In smartctl database [for details use: -P show]
ATA Version is:   8
ATA Standard is:  Exact ATA specification draft version not indicated
Local Time is:    Tue Mar  1 19:03:58 2011 ICT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 (18300) seconds.
Offline data collection
capabilities:                    (0x7b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        ( 211) minutes.
Conveyance self-test routine
recommended polling time:        (   5) minutes.
SCT capabilities:              (0x3035) SCT Status supported.
                                        SCT Feature Control supported.
                                        SCT Data Table supported.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x002f   188   188   051    Pre-fail  Always       -       12249
  3 Spin_Up_Time            0x0027   174   163   021    Pre-fail  Always       -       6291
  4 Start_Stop_Count        0x0032   099   099   000    Old_age   Always       -       1026
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
  7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always       -       2995
 10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       -       148
192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       116
193 Load_Cycle_Count        0x0032   192   192   000    Old_age   Always       -       25796
194 Temperature_Celsius     0x0022   115   083   000    Old_age   Always       -       35
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1309
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       196
199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
200 Multi_Zone_Error_Rate   0x0008   192   184   000    Old_age   Offline      -       2375

SMART Error Log Version: 1
ATA Error Count: 7 (device log contains only the most recent five errors)
        CR = Command Register [HEX]
        FR = Features Register [HEX]
        SC = Sector Count Register [HEX]
        SN = Sector Number Register [HEX]
        CL = Cylinder Low Register [HEX]
        CH = Cylinder High Register [HEX]
        DH = Device/Head Register [HEX]
        DC = Device Command Register [HEX]
        ER = Error register [HEX]
        ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%      2995         -
# 2  Short offline       Interrupted (host reset)      10%      2992         -

SMART Selective self-test log data structure revision number 1
SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
    1        0        0  Not_testing
    2        0        0  Not_testing
    3        0        0  Not_testing
    4        0        0  Not_testing
    5        0        0  Not_testing
Selective self-test flags (0x0):
  After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.

root@p5bplus:~#

I don't see anything wrong, but I am a noob on drives.

Link to comment

Media errors are un-readable sectors on a disk.
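As a rough illustration, the `cmd` and `res` lines in the syslog can be decoded to see which sector the drive gave up on. This is only a sketch: the byte order is assumed from the way the Linux libata driver prints these taskfile lines, `decode_taskfile` is a made-up helper name, and the 512 bytes per sector figure is an assumption about the logical sector size.

```python
import re

# Assumed field order of a libata "cmd"/"res" line (12 hex bytes):
#   command-or-status / feature-or-error : nsect : lbal : lbam : lbah /
#   hob_feature : hob_nsect : hob_lbal : hob_lbam : hob_lbah / device
TASKFILE = re.compile(r"\b(cmd|res) ((?:[0-9a-f]{2}[/:]){11}[0-9a-f]{2})")


def decode_taskfile(line):
    """Return (kind, lba, sector_count) for a cmd/res line, or None."""
    match = TASKFILE.search(line)
    if match is None:
        return None
    kind = match.group(1)
    b = [int(x, 16) for x in re.split(r"[/:]", match.group(2))]
    # 48-bit LBA: the three "hob" bytes are the high half, lbal/lbam/lbah the low half.
    lba = (b[10] << 40) | (b[9] << 32) | (b[8] << 24) | (b[5] << 16) | (b[4] << 8) | b[3]
    # 16-bit sector count; only meaningful as a request size on the "cmd" line.
    count = (b[7] << 8) | b[2]
    return kind, lba, count


# The two lines from the syslog excerpt in this thread.
cmd = decode_taskfile("ata9.00: cmd 25/00:a8:27:55:13/00:03:64:00:00/e0 tag 0 dma 479232 in")
res = decode_taskfile("res 51/40:17:b0:55:13/00:03:64:00:00/e0 Emask 0x9 (media error)")

print("read started at LBA", cmd[1], "for", cmd[2], "sectors")
print("drive reported the error at LBA", res[1])
```

A quick sanity check on the decoding: the sector count from the `cmd` line times 512 matches the `dma 479232 in` byte count the kernel printed, and the LBA on the `res` line falls inside the range that was requested.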

Your disk is basically failing... and badly.  There are over 1300 unreadable sectors marked for re-allocation when they are next written.

197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1309
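For anyone who would rather not read the whole attribute table by eye, here is a minimal sketch that picks the sector-related counters out of `smartctl -a` output. The helper name and the choice of which attributes to watch are my own assumptions, not anything smartctl defines, and it assumes the raw value in the last column is a plain integer for those rows.

```python
# Attributes chosen for illustration only; all three count problem sectors.
WATCHED = {"Reallocated_Sector_Ct", "Current_Pending_Sector", "Offline_Uncorrectable"}


def nonzero_sector_counts(smartctl_output):
    """Map each watched attribute name to its raw value when that value is above zero."""
    found = {}
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows have 10 columns: ID, name, flag, value, worst,
        # thresh, type, updated, when_failed, raw value.
        if len(fields) >= 10 and fields[0].isdigit() and fields[1] in WATCHED:
            raw = int(fields[9])
            if raw > 0:
                found[fields[1]] = raw
    return found


# Three rows copied from the report posted above.
sample = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
197 Current_Pending_Sector  0x0032   192   192   000    Old_age   Always       -       1309
198 Offline_Uncorrectable   0x0030   199   198   000    Old_age   Offline      -       196
"""
print(nonzero_sector_counts(sample))
```

The normalized VALUE column can still look healthy (192 against a threshold of 000 here) while the raw count is in the thousands, which is why the sketch looks at the raw column.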

 

You need to replace the disk.

 

DO NOT perform a parity CHECK. If you do, parity will be updated with the incorrect data from the unreadable sectors.

Stop the array, replace the drive, and then simply "Start" the array.

The array will reconstruct the correct contents of the replaced drive from parity, as long as you do not overwrite parity by performing a "Check".
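To see why overwriting parity matters, here is a toy sketch assuming simple XOR parity across the data disks. The three-byte "disks" and the `xor_blocks` helper are invented for the example, and this is not how the actual driver is written.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


disk1 = bytes([0x11, 0x22, 0x33])
disk2 = bytes([0xA0, 0xB0, 0xC0])  # plays the role of the failing disk
disk3 = bytes([0x0F, 0xF0, 0x55])
parity = xor_blocks([disk1, disk2, disk3])

# Replacing disk2: its contents come back from parity plus the healthy disks.
rebuilt = xor_blocks([parity, disk1, disk3])
print(rebuilt == disk2)

# If parity is recomputed while disk2 returns wrong data for a sector,
# the wrong data is baked into parity and the rebuild reproduces it.
garbage = bytes([0x00, 0xB0, 0xC0])
polluted_parity = xor_blocks([disk1, garbage, disk3])
rebuilt_bad = xor_blocks([polluted_parity, disk1, disk3])
print(rebuilt_bad == disk2, rebuilt_bad == garbage)
```

With the original parity the rebuild matches the good data. With the recomputed parity it faithfully rebuilds the garbage instead.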

 

Joe L.

 

Link to comment

I might be running at the threshold of my PSU's capacity, as it started after adding a disk ... and it was not this disk :P

That power supply has multiple 12 volt rails, but ONLY one is used for all the disks, and it is probably also shared by the motherboard.

 

That one rail is apparently rated at 19 Amps peak.

 

It is probably WAY overloaded, even if you have all green drives and figure only 2 amps per drive required.

 

You have about 33 Amps being drawn upon spin up by your disks.  Add to that the motherboard needs, and the fans... and I figure you need a 12 volt rail of 40 to 45 Amps capacity.
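The arithmetic behind that estimate can be sketched as below. The 19 A rail rating and the 33 A spin-up figure are the numbers quoted in this post. The 7 A to 12 A range for the motherboard and fans is only what the 40 to 45 A total implies, not a measured value, and `rail_shortfall` is a made-up helper name.

```python
def rail_shortfall(rail_amps, loads_amps):
    """Return by how many amps the summed loads exceed the rail rating (0 if within it)."""
    return max(0.0, sum(loads_amps) - rail_amps)


RAIL_PEAK = 19.0    # amps, peak rating of the single 12 V rail feeding the disks
DISK_SPINUP = 33.0  # amps, estimated draw of all disks spinning up together
OTHER_LOW, OTHER_HIGH = 7.0, 12.0  # amps, implied range for motherboard and fans

print("disks alone exceed the rail by", rail_shortfall(RAIL_PEAK, [DISK_SPINUP]), "A")
print("total needed:", DISK_SPINUP + OTHER_LOW, "to", DISK_SPINUP + OTHER_HIGH, "A")
```

Even before the motherboard and fans are counted, the disks alone are well past the rail's peak rating at spin-up.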

 

Link to comment

I have run a large array with an older 550 watt PSU that was reported to be multi-rail, but I found out later that internally it was single rail.  (There was a time that "multi-rail" had marketing appeal I guess).  But I agree that if this is really a multi-rail PSU internally, your PSU is underpowered.

 

I would be a little surprised that an underpowered PSU would cause read errors on this one disk.  I've always thought that an underpowered PSU would cause boot problems as disks could not get enough power to spin up (spinning up is the most power hungry operation your computer does).

 

There have been several cases that we've seen in the forums with pending sectors clearing themselves and all going back to zero after a parity check.  Although I agree with Joe L. that running a regular parity check could pollute your parity, I would suggest you run a read-only parity check to see if the pending sectors clear or become true reallocated sectors.  It would also be instructive to see if you get parity sync errors.

Link to comment

OK, parity was already botched, which was my own fault, so there's no saving that info :P

Not a biggie.

I just wanted to know: is it OK to run a preclear on a disk while a parity sync is running?

I removed disk 2 from the array and now I am trying to get valid parity before I add a new disk :P

I might go for a WD Black 2 TB for parity and try preclearing the old parity drive.

 

A new PSU is planned, but I want to go for a good one, so I need to save some money for a Corsair TX850, as eventually I want to go for a 20-drive setup.

I have the iCute case already standing here, but I'm waiting for the new PSU and for a way to get 5-in-3 cages here in Thailand.

 

Link to comment
