What just happened? 4 billion reads, 25 billion writes and a red-balled disk!?!



Just as I was heaving a sigh of relief at having rescued a failed array (read more about it here: http://lime-technology.com/forum/index.php?topic=26278.0), I upgraded the parity disk from 2TB to 3TB. The parity rebuild was successful and I had an array with three green balls, 800-odd errors on disk 2, and a parity check pending.

 

So I started the parity check and came back a couple of hours later to this horrific sight.

 

[Screenshot attached: mainuc.jpg]

 


 

syslog attached

 

SMART Test reports:

 

smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)

Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net

 

Smartctl open device: /dev/sda failed: No such device
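
I'm guessing that report failed because /dev/sda simply doesn't exist any more after the drive dropped off the bus; the syslog shows it re-attaching as /dev/sdf, so presumably something like this would get a proper report once the drive is back (the device letter is only my guess from the syslog):

smartctl -a /dev/sdf    # full SMART report; /dev/sdf is a guess from the syslog, check the device assignment first
smartctl -H /dev/sdf    # quick overall health verdict for the same drive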

 

 

 

syslog-2013-03-10.zip

Link to comment

I'm a newbie at unRAID, but I have been reading through the forums for the last six months or so, trying to get familiar with the pitfalls.

 

I had a quick look at your syslog and noticed this line:

 

Mar 10 04:37:49 NAS kernel: ata3.00: HPA detected: current 3907027055, native 3907029168

 

I'd wait until someone more knowledgeable comes along, but is there a chance you have a Gigabyte motherboard, like in this thread:

 

http://lime-technology.com/forum/index.php?topic=26432.0
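
If you want to confirm the HPA from the command line before going further, hdparm can show the current versus native sector counts. A quick check, assuming the drive is /dev/sdf as in your syslog (the device letter is only a guess):

hdparm -N /dev/sdf    # /dev/sdf is an assumption; look for something like "max sectors = 3907027055/3907029168, HPA is enabled"

Those two figures should line up with the "current 3907027055, native 3907029168" numbers in the syslog line above.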

 

Ed

Link to comment

Has the drive ever been attached to a Gigabyte MB?

 

It is possible. I would not rule that out.

 

Does the HPA relate to the drive that is showing as failed? Would the simplest way out be to replace the disk with a fresh 2TB (or preferably 3TB) drive and let it be rebuilt, putting the matter to rest?

 

If I plan to use the drive with the HPA in some future unRAID build, what would be the process to get rid of the HPA once and for all?
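
From what I've picked up reading around, the usual tool for clearing an HPA seems to be hdparm, run with the array stopped or with the drive in another machine. A sketch only, assuming the drive shows up as /dev/sdf and using the native sector count from my own syslog (I'd double-check both before running anything):

hdparm -N /dev/sdf                  # confirm the current/native mismatch and that HPA is enabled
hdparm -N p3907029168 /dev/sdf      # reset max sectors to the native value; the leading 'p' makes it permanent; count and device letter apply to this drive only

Then power-cycle the drive and re-check with hdparm -N. Would that be the right idea, or is there a safer way? I gather that if the drive goes back onto a Gigabyte board that writes its BIOS backup to the last disk, the HPA may simply be re-created.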

Link to comment

Although the disk may have an HPA, is there a chance that the SATA cable to the drive is loose? Have you been into the case recently while fixing the parity disk, and possibly dislodged the cable?

 

The following bit in the syslog shows the start of repeated attempts to connect to the drive:

 

Mar  9 22:18:04 NAS kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4810000 action 0xe frozen
Mar  9 22:18:04 NAS kernel: ata3.00: irq_stat 0x08400040, interface fatal error, connection status changed
Mar  9 22:18:04 NAS kernel: ata3: SError: { PHYRdyChg LinkSeq DevExch }
Mar  9 22:18:04 NAS kernel: ata3.00: failed command: READ DMA EXT
Mar  9 22:18:04 NAS kernel: ata3.00: cmd 25/00:00:d8:7f:27/00:04:1f:00:00/e0 tag 0 dma 524288 in
Mar  9 22:18:04 NAS kernel:          res 50/00:00:d7:7f:27/00:00:1f:00:00/e0 Emask 0x10 (ATA bus error)
Mar  9 22:18:04 NAS kernel: ata3.00: status: { DRDY }
Mar  9 22:18:04 NAS kernel: ata3: hard resetting link
Mar  9 22:18:10 NAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Mar  9 22:18:12 NAS kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar  9 22:18:12 NAS kernel: ata3.00: configured for UDMA/133
Mar  9 22:18:12 NAS kernel: ata3: EH complete

 

This repeats, with the link being downgraded from 3.0 Gbps to 1.5 Gbps along the way, until eventually:

 

Mar 10 04:37:49 NAS kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 10 04:37:49 NAS kernel: ata3.00: HPA detected: current 3907027055, native 3907029168
Mar 10 04:37:49 NAS kernel: ata3.00: ATA-8: WDC WD20EURS-63S48Y0, 51.0AB51, max UDMA/133
Mar 10 04:37:49 NAS kernel: ata3.00: 3907027055 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 10 04:37:49 NAS kernel: ata3.00: configured for UDMA/133
Mar 10 04:37:49 NAS kernel: ata3: EH complete
Mar 10 04:37:49 NAS kernel: scsi 2:0:0:0: Direct-Access     ATA      WDC WD20EURS-63S 51.0 PQ: 0 ANSI: 5
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] 3907027055 512-byte logical blocks: (2.00 TB/1.81 TiB)
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] 4096-byte physical blocks
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Write Protect is off
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 10 04:37:49 NAS kernel:  sdf: sdf1
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Attached SCSI disk
Mar 10 04:41:04 NAS kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Mar 10 04:41:04 NAS kernel: ata3: irq_stat 0x00400040, connection status changed
Mar 10 04:41:04 NAS kernel: ata3: SError: { PHYRdyChg DevExch }
Mar 10 04:41:04 NAS kernel: ata3: hard resetting link
Mar 10 04:41:10 NAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Mar 10 04:41:13 NAS kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 10 04:41:13 NAS kernel: ata3.00: configured for UDMA/133
Mar 10 04:41:13 NAS kernel: ata3: EH complete
Mar 10 04:41:13 NAS kernel: sdf: detected capacity change from 0 to 2000397852160
Mar 10 05:11:53 NAS kernel: md: disk2 read error
Mar 10 05:11:53 NAS kernel: handle_stripe read error: 3625342960/2, count: 1
Mar 10 05:11:53 NAS kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [7885 12180 0x0 SD]
Mar 10 05:11:53 NAS kernel: REISERFS (device md2): Remounting filesystem read-only
Mar 10 05:11:53 NAS kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [7885 12180 0x0 SD]
Mar 10 05:11:53 NAS kernel: md: disk2 read error

 

The kernel ends up with a read error on the drive.
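
For a quick sense of how often that happened, a couple of greps over the extracted syslog will count the resets and show each link-speed renegotiation (the filename below is just whatever comes out of the attached zip):

grep -c 'hard resetting link' syslog-2013-03-10    # number of times the link was reset on that controller; filename is a guess
grep 'SATA link up' syslog-2013-03-10              # each renegotiation, including the drop from 3.0 to 1.5 Gbps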

 

I would check the condition of the SATA cables as well before doing anything else.

 

HTH

Ed

Link to comment

Now is this strange or what? I can access the "disabled" disk, i.e. disk2, as a Windows share and can even open files on it?!? But in Main the disk is red-balled!

That will be because unRAID can emulate one missing disk using the parity disk and the remaining data disks. However, that means you have no redundancy left: if a second disk fails before you have recovered the current bad disk, that protection is gone, so you want to recover the 'bad' disk as soon as you can.
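
To illustrate what "emulate" means here: roughly speaking, single-parity unRAID keeps, for every sector position, the XOR of the corresponding sectors of all the data disks, so any one missing disk can be recomputed from parity plus the survivors. A toy one-byte sketch with made-up values, just to show the arithmetic:

printf '0x%02x\n' $(( 0xa7 ^ 0x3c ))    # parity byte for data bytes 0xa7 (disk1) and 0x3c (disk2) -> 0x9b
printf '0x%02x\n' $(( 0x9b ^ 0x3c ))    # disk1 missing: parity XOR disk2 recovers 0xa7

That reconstruction is what is happening on every read of the red-balled disk2 right now, which is why the share still works.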

Link to comment
