extremeaudio Posted March 10, 2013
I was heaving a sigh of relief at having rescued a failed array (read more about it here: http://lime-technology.com/forum/index.php?topic=26278.0) and at having been able to upgrade the parity disk from 2TB to 3TB. The parity reconstruction was successful, leaving me with an array showing 3 green balls, 800-odd errors on disk 2, and a parity check pending. So I started the parity check and came back a couple of hours later to this horrific sight.
[screenshot uploaded with ImageShack.us]
Syslog attached: syslog-2013-03-10.zip
SMART test report:
smartctl 5.40 2010-10-16 r3189 [i486-slackware-linux-gnu] (local build)
Copyright © 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl open device: /dev/sda failed: No such device
fastedd Posted March 10, 2013
I'm a newbie at unRAID, but I have been reading through the forums for the last six months or so trying to get familiar with the pitfalls. I had a quick look at your syslog and noticed this line:
Mar 10 04:37:49 NAS kernel: ata3.00: HPA detected: current 3907027055, native 3907029168
I'd wait until someone more knowledgeable comes along, but is there a chance you have a Gigabyte motherboard, as in this thread: http://lime-technology.com/forum/index.php?topic=26432.0
Ed
extremeaudio (Author) Posted March 10, 2013
Yes, I have seen that line appearing frequently in my logs. But no, I don't have a Gigabyte board. As shown in my signature, it is an Intel board, upgraded from the older MSI board on which I first based this particular unRAID build.
dgaschk Posted March 10, 2013
Has the drive ever been attached to a Gigabyte MB?
fastedd Posted March 10, 2013
Fair enough. Although it's not a Gigabyte, to these ignorant eyes it looks very much like you have an HPA issue: http://lime-technology.com/forum/index.php?topic=10858.0
The numbers match the signature, as does the 2113-sector difference. However, I would still wait until someone more knowledgeable replies, sorry.
Ed
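The 2113-sector figure can be checked directly from the "HPA detected" syslog line quoted earlier in the thread. The sketch below only does that arithmetic; the 512-byte logical sector size is an assumption here, and the variable names are made up for illustration.

```python
# Values copied from the syslog line:
#   ata3.00: HPA detected: current 3907027055, native 3907029168
SECTOR_BYTES = 512            # assumed logical sector size

current_sectors = 3907027055  # sectors the OS can currently see
native_sectors = 3907029168   # sectors the drive really has

# Sectors hidden by the Host Protected Area, and their size in bytes.
hidden_sectors = native_sectors - current_sectors
hidden_bytes = hidden_sectors * SECTOR_BYTES

# Capacity the OS sees with the HPA in place.
visible_bytes = current_sectors * SECTOR_BYTES

print(hidden_sectors)  # 2113
print(hidden_bytes)    # 1081856 (roughly 1 MiB)
print(visible_bytes)   # 2000397852160
```

So the HPA hides only about 1 MiB at the end of the disk, which is small but enough to make the drive's reported size differ from what the array expects.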
extremeaudio (Author) Posted March 10, 2013
Quoting dgaschk: "Has the drive ever been attached to a Gigabyte MB?"
It is possible. I would not rule that out. Does the HPA relate to the drive that is showing as failed? Would the simplest way out be to replace the disk with a fresh 2TB (or preferably 3TB) drive, let the drive be rebuilt, and be done with the matter? If I plan to use the drive with the HPA in some future unRAID build, what would be the process to get rid of the HPA once and for all?
dgaschk Posted March 10, 2013
Search the forum for HPA. Post a SMART report for the drive.
fastedd Posted March 11, 2013
Although the disk may have an HPA, is there a chance that the SATA cable to the drive is loose? Have you been into the case recently while fixing the parity disk and possibly dislodged the cable? The following part of the syslog shows the start of repeated attempts to reconnect to the drive:

Mar 9 22:18:04 NAS kernel: ata3.00: exception Emask 0x10 SAct 0x0 SErr 0x4810000 action 0xe frozen
Mar 9 22:18:04 NAS kernel: ata3.00: irq_stat 0x08400040, interface fatal error, connection status changed
Mar 9 22:18:04 NAS kernel: ata3: SError: { PHYRdyChg LinkSeq DevExch }
Mar 9 22:18:04 NAS kernel: ata3.00: failed command: READ DMA EXT
Mar 9 22:18:04 NAS kernel: ata3.00: cmd 25/00:00:d8:7f:27/00:04:1f:00:00/e0 tag 0 dma 524288 in
Mar 9 22:18:04 NAS kernel:          res 50/00:00:d7:7f:27/00:00:1f:00:00/e0 Emask 0x10 (ATA bus error)
Mar 9 22:18:04 NAS kernel: ata3.00: status: { DRDY }
Mar 9 22:18:04 NAS kernel: ata3: hard resetting link
Mar 9 22:18:10 NAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Mar 9 22:18:04 NAS kernel: ata3: hard resetting link
Mar 9 22:18:10 NAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Mar 9 22:18:12 NAS kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 9 22:18:12 NAS kernel: ata3.00: configured for UDMA/133
Mar 9 22:18:12 NAS kernel: ata3: EH complete

This repeats until the link is downgraded from 3.0 Gbps to 1.5 Gbps, and eventually the kernel picks up a read error:

Mar 10 04:37:49 NAS kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 10 04:37:49 NAS kernel: ata3.00: HPA detected: current 3907027055, native 3907029168
Mar 10 04:37:49 NAS kernel: ata3.00: ATA-8: WDC WD20EURS-63S48Y0, 51.0AB51, max UDMA/133
Mar 10 04:37:49 NAS kernel: ata3.00: 3907027055 sectors, multi 0: LBA48 NCQ (depth 31/32), AA
Mar 10 04:37:49 NAS kernel: ata3.00: configured for UDMA/133
Mar 10 04:37:49 NAS kernel: ata3: EH complete
Mar 10 04:37:49 NAS kernel: scsi 2:0:0:0: Direct-Access ATA WDC WD20EURS-63S 51.0 PQ: 0 ANSI: 5
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: Attached scsi generic sg0 type 0
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] 3907027055 512-byte logical blocks: (2.00 TB/1.81 TiB)
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] 4096-byte physical blocks
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Write Protect is off
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Mode Sense: 00 3a 00 00
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Mar 10 04:37:49 NAS kernel: sdf: sdf1
Mar 10 04:37:49 NAS kernel: sd 2:0:0:0: [sdf] Attached SCSI disk
Mar 10 04:41:04 NAS kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x4010000 action 0xe frozen
Mar 10 04:41:04 NAS kernel: ata3: irq_stat 0x00400040, connection status changed
Mar 10 04:41:04 NAS kernel: ata3: SError: { PHYRdyChg DevExch }
Mar 10 04:41:04 NAS kernel: ata3: hard resetting link
Mar 10 04:41:10 NAS kernel: ata3: link is slow to respond, please be patient (ready=0)
Mar 10 04:41:13 NAS kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 10 04:41:13 NAS kernel: ata3.00: configured for UDMA/133
Mar 10 04:41:13 NAS kernel: ata3: EH complete
Mar 10 04:41:13 NAS kernel: sdf: detected capacity change from 0 to 2000397852160
Mar 10 05:11:53 NAS kernel: md: disk2 read error
Mar 10 05:11:53 NAS kernel: handle_stripe read error: 3625342960/2, count: 1
Mar 10 05:11:53 NAS kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [7885 12180 0x0 SD]
Mar 10 05:11:53 NAS kernel: REISERFS (device md2): Remounting filesystem read-only
Mar 10 05:11:53 NAS kernel: REISERFS error (device md2): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [7885 12180 0x0 SD]
Mar 10 05:11:53 NAS kernel: md: disk2 read error

I would check the condition of the SATA cables as well before doing anything else.
HTH
Ed
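For anyone wanting to spot the same pattern in their own syslog (repeated hard resets on one port, followed by the link coming back at a lower speed), here is a rough Python sketch. The sample lines are copied from the excerpt in this thread; the function names `summarize` and `downgraded` are made up for illustration, and the regular expressions assume the kernel message wording shown here, which may differ between kernel versions.

```python
import re
from collections import Counter

# A few lines taken from the syslog excerpt in this thread.
SAMPLE = """\
Mar 9 22:18:04 NAS kernel: ata3: hard resetting link
Mar 9 22:18:12 NAS kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Mar 10 04:37:49 NAS kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Mar 10 04:41:04 NAS kernel: ata3: hard resetting link
Mar 10 04:41:13 NAS kernel: ata3: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
"""

RESET_RE = re.compile(r"kernel: (ata\d+): hard resetting link")
LINK_RE = re.compile(r"kernel: (ata\d+): SATA link up ([\d.]+) Gbps")


def summarize(log_text):
    """Count hard resets and collect negotiated link speeds per ATA port."""
    resets = Counter()
    speeds = {}
    for line in log_text.splitlines():
        match = RESET_RE.search(line)
        if match:
            resets[match.group(1)] += 1
            continue
        match = LINK_RE.search(line)
        if match:
            speeds.setdefault(match.group(1), []).append(float(match.group(2)))
    return resets, speeds


def downgraded(speed_list):
    """True if the link ever came up slower than its first recorded speed."""
    return len(speed_list) > 1 and min(speed_list) < speed_list[0]


resets, speeds = summarize(SAMPLE)
for port in sorted(set(resets) | set(speeds)):
    port_speeds = speeds.get(port, [])
    print(port, "resets:", resets[port], "speeds:", port_speeds,
          "downgraded:", downgraded(port_speeds))
```

To run it against a real log, read the file into a string and pass that to `summarize` instead of `SAMPLE`. A port with many resets and a speed downgrade is a hint to look at cabling or the port itself, not proof of a bad cable.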
extremeaudio (Author) Posted March 11, 2013
SMART test report attached: SMART_11_03_13.txt
extremeaudio (Author) Posted March 11, 2013
New syslog attached: syslog-2013-03-11_1.txt
Now is this strange or what? I can access the "disabled" disk, i.e. disk2, as a Windows share and even open files on it?!? But in Main the disk is red-balled!
itimpi Posted March 11, 2013
Quoting extremeaudio: "Now is this strange or what? I can access the "disabled" disk, ie disk2 as a Windows share and even access files on the disk/ open them?!? But in Main the disk is red-balled!"
That will be because unRAID can emulate one missing disk using the parity disk and the remaining data disks. However, that means you have no redundancy left. If a second disk fails before you have recovered the current bad disk, the emulation will no longer work, so you want to recover this 'bad' disk as soon as you can.
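To illustrate how a missing disk can be emulated, here is a toy Python sketch assuming single XOR parity (parity is the byte-wise XOR of all data disks, which is how unRAID's single parity is usually described). The four-byte "disks" and the helper name `xor_blocks` are hypothetical and stand in for real disk contents.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical contents of three data disks (toy values).
disk1 = bytes([0x12, 0x34, 0x56, 0x78])
disk2 = bytes([0xAB, 0xCD, 0xEF, 0x01])
disk3 = bytes([0x0F, 0xF0, 0x55, 0xAA])

# The parity disk holds the XOR of all data disks.
parity = xor_blocks([disk1, disk2, disk3])

# With disk2 missing, its contents are the XOR of everything that is left.
emulated_disk2 = xor_blocks([disk1, disk3, parity])

print(emulated_disk2 == disk2)  # prints True
```

The same sketch shows why a second failure is fatal: with two blocks missing there is only one parity equation per byte, so neither missing disk can be recomputed.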
extremeaudio (Author) Posted March 11, 2013
Yeah, I realized after typing that post that it must be the emulated (virtual) disk keeping the shares and files available. What should I do next?
dgaschk Posted March 13, 2013
http://lime-technology.com/wiki/index.php/Troubleshooting#What_do_I_do_if_I_get_a_red_ball_next_to_a_hard_disk.3F
jtech007 Posted March 13, 2013
This happened to me the other day when I forgot to stop the array before I pulled a drive out of the hot-swap tray. I put it back in, ran a parity sync, and all was back to normal. Sounds like your problem might be a bit different.