Lots of errors first parity w/ new drive



I am a bit of a noob with this stuff, so any help would be appreciated. I am running RC3 with Simple Features and Unmenu, if that helps. Last night my server ran its first automatic parity check (scheduled by Simple Features) since I installed the new drive (in reality an old drive from my old FreeNAS server). Before this check, the new drive (sdd) had 0 errors reported; now it has 769. You can see them in the attached log, with the entries for the 18th starting after 3am. I don't understand the log well enough to figure out what the problem may be: bad cable, bad setting, or a dying drive? Any insight, or suggestions for checks to run, would be greatly appreciated. Part 1 is the start of the issue; part 2 is the later part, where it stopped...

syslog1.txt


UNC media errors are unreadable sectors on a disk (in this case, disk2).

Jun 18 03:28:42 Tower kernel: handle_stripe read error: 1938867064/2, count: 1
Jun 18 03:28:42 Tower kernel: md: disk2 read error
Jun 18 03:28:42 Tower kernel: handle_stripe read error: 1938867072/2, count: 1
Jun 18 03:28:42 Tower kernel: md: disk2 read error
Jun 18 03:28:42 Tower kernel: handle_stripe read error: 1938867080/2, count: 1
Jun 18 03:28:42 Tower kernel: md: disk2 read error
Jun 18 03:28:42 Tower kernel: handle_stripe read error: 1938867088/2, count: 1
Jun 18 03:28:50 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 18 03:28:50 Tower kernel: ata3.00: irq_stat 0x40000001
Jun 18 03:28:50 Tower kernel: ata3.00: failed command: READ DMA EXT
Jun 18 03:28:50 Tower kernel: ata3.00: cmd 25/00:30:48:cf:90/00:03:73:00:00/e0 tag 0 dma 417792 in
Jun 18 03:28:50 Tower kernel:          res 51/40:af:c0:cf:90/00:02:73:00:00/e0 Emask 0x9 (media error)
Jun 18 03:28:50 Tower kernel: ata3.00: status: { DRDY ERR }
Jun 18 03:28:50 Tower kernel: ata3.00: error: { UNC }
Jun 18 03:28:50 Tower kernel: ata3.00: configured for UDMA/133
Jun 18 03:28:50 Tower kernel: ata3: EH complete
Jun 18 03:28:56 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 18 03:28:56 Tower kernel: ata3.00: irq_stat 0x40000001
Jun 18 03:28:56 Tower kernel: ata3.00: failed command: READ DMA EXT
Jun 18 03:28:56 Tower kernel: ata3.00: cmd 25/00:00:78:d2:90/00:04:73:00:00/e0 tag 0 dma 524288 in
Jun 18 03:28:56 Tower kernel:          res 51/40:df:90:d3:90/00:02:73:00:00/e0 Emask 0x9 (media error)
Jun 18 03:28:56 Tower kernel: ata3.00: status: { DRDY ERR }
Jun 18 03:28:56 Tower kernel: ata3.00: error: { UNC }
Jun 18 03:28:56 Tower kernel: ata3.00: configured for UDMA/133
Jun 18 03:28:56 Tower kernel: ata3: EH complete
Jun 18 03:29:05 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x0
Jun 18 03:29:05 Tower kernel: ata3.00: irq_stat 0x40000001
Jun 18 03:29:05 Tower kernel: ata3.00: failed command: READ DMA EXT
Jun 18 03:29:05 Tower kernel: ata3.00: cmd 25/00:00:78:e6:90/00:04:73:00:00/e0 tag 0 dma 524288 in
Jun 18 03:29:05 Tower kernel:          res 51/40:af:c0:e7:90/00:02:73:00:00/e0 Emask 0x9 (media error)

 

As requested, post a SMART report for disk2.
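If you want a quick tally instead of scrolling through the whole syslog, something like the following Python sketch may help. It only counts the two line shapes quoted above ("md: diskN read error" and "ataX.YY: error: { ... }"); the function name summarize_errors and the regular expressions are my own illustration, not part of unRAID, and other kernel versions may word these messages differently.

```python
import re
from collections import Counter

# Patterns are based only on the log lines quoted in this thread (assumption:
# other kernels may phrase these messages differently).
MD_READ_ERROR = re.compile(r"kernel: md: (disk\d+) read error")
ATA_ERROR = re.compile(r"kernel: (ata\d+\.\d+): error: \{ (.+?) \}")


def summarize_errors(lines):
    """Count md read errors per array disk and ATA error flags per port."""
    md_errors = Counter()
    ata_errors = Counter()
    for line in lines:
        match = MD_READ_ERROR.search(line)
        if match:
            md_errors[match.group(1)] += 1
            continue
        match = ATA_ERROR.search(line)
        if match:
            # Key is (port, flags), e.g. ("ata3.00", "UNC").
            ata_errors[(match.group(1), match.group(2).strip())] += 1
    return md_errors, ata_errors


if __name__ == "__main__":
    sample = [
        "Jun 18 03:28:42 Tower kernel: md: disk2 read error",
        "Jun 18 03:28:50 Tower kernel: ata3.00: status: { DRDY ERR }",
        "Jun 18 03:28:50 Tower kernel: ata3.00: error: { UNC }",
    ]
    md, ata = summarize_errors(sample)
    for disk, count in md.items():
        print(f"{disk}: {count} md read error(s)")
    for (port, flags), count in ata.items():
        print(f"{port}: {count} x {flags}")
```

To run it against a saved log, pass an open file instead of the sample list, for example summarize_errors(open("syslog1.txt")). If every md read error lands on one disk and every ATA error is UNC on one port, that points at the media on that drive rather than a cable (cable problems usually show up as other error types, but treat that as a rule of thumb, not a guarantee).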


The disk is suffering read errors, so it needs to be rebuilt. The safest option is to replace it with a pre-cleared spare and rebuild onto that, then pre-clear the faulty drive and see if it recovers. The quickest option is to rebuild the disk onto itself. After rebuilding, run a parity check to make sure the rebuilt disk can be read.


Started 3 passes of preclear on the other drive that came from my FreeNAS server (pre-Unraid days  ;) ).  It should be ready to add to the server in a couple of days.

 

I did copy all the files from Disk 2 to my local Windows machine last night; there was only 85GB on it.  When I install the new drive, is it still smartest to let parity rebuild it, or is it OK to just clear it, install it, and copy the files back from my Windows machine?

 

How will I be able to tell whether the current drive with the errors is OK to add back into my server later?  I ran 3 preclear passes on it before installing it, and it passed without issues, or so I thought.  I am attaching the preclear results for Disk 2.  Did I miss something when I installed this drive?

preclear_finish__WD-WCAV55558002_2012-05-29.txt


These are the values to watch:

 

ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       72
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       53

 

The Current_Pending_Sector RAW value should go to zero after the disk is cleared.
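To compare these four values before and after a preclear, a small Python sketch like this can pull them out of a saved SMART report. It assumes the rows look like the ones quoted above (ten whitespace-separated columns with the raw value last); the names WATCH_IDS, parse_smart_attributes, and nonzero_watched are made up for this example.

```python
# Attribute IDs called out in this thread as the ones to watch.
WATCH_IDS = {5, 196, 197, 198}


def parse_smart_attributes(text):
    """Return {id: (name, raw_value)} from smartctl-style attribute rows."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        try:
            raw = int(fields[9])
        except ValueError:
            # Some attributes report non-integer raw values; skip those here.
            continue
        attrs[int(fields[0])] = (fields[1], raw)
    return attrs


def nonzero_watched(attrs):
    """Return {name: raw_value} for watched attributes with a nonzero raw value."""
    return {
        name: raw
        for attr_id, (name, raw) in attrs.items()
        if attr_id in WATCH_IDS and raw > 0
    }


if __name__ == "__main__":
    report = """\
  5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       72
198 Offline_Uncorrectable   0x0030   200   200   000    Old_age   Offline      -       53
"""
    for name, raw in nonzero_watched(parse_smart_attributes(report)).items():
        print(f"{name}: {raw}")
```

Run it on the report taken before the preclear and again on the one taken after. If Current_Pending_Sector is back to zero and Reallocated_Sector_Ct has not climbed, that is a better sign than if either keeps growing.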
