Can't rebuild array... :-(



I've been having ongoing problems with my Unraid for a number of months now. It seems to be one problem after another. I'm getting quite disheartened by it all.  :(

 

A few days ago a drive showed up with errors and turned red on the Disk status page. I tried unassigning it, rebooting and then reassigning it, but the rebuild failed. I had another 1.5 TB drive handy that I had been intending to add to increase the size of the array, so I put it in as the replacement. As soon as I assigned the new drive, everything slowed down (including the web interface). The first rebuild attempt took 2 days to get to 40% before failing. Subsequent attempts have failed almost straight away. I had two SATA cards in my machine, with two drives on each. I wondered if one of them might be faulty, so I took out the card with the failing drive on it and moved those two drives onto the remaining card. This didn't improve matters. The syslog can be viewed here: http://pastebin.com/m4f06c9fe

 

I love Unraid, and it's been the heavily used centre of my home network for quite some time now. Unfortunately, I know very little about how it actually works, and as my stored data creeps towards 8 TB I become more and more nervous of doing something catastrophic...  :o

 

Thanks in advance for any help.


 

A few days ago I had a drive show up with errors and turn red in the Disk status page. I tried unassigning it, rebooting and then reassigning it,

 

It looks like the replacement WD disk is the troublemaker.

 

(a) Try assigning this disk to a different controller card or to an on-board SATA port.

(b) Replace the SATA cable.

(c) If neither (a) nor (b) helps, replace this disk.

 

---------------------------------------------------------------------------------------------------------------------------------------------------------

Oct 16 11:41:22 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 11:41:22 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 11:41:22 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 11:41:22 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 11:41:22 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 11:41:22 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 16 11:41:22 Tower kernel: ata3.00: qc timeout (cmd 0x27)
Oct 16 11:41:22 Tower kernel: ata3.00: failed to read native max address (err_mask=0x5)
Oct 16 11:41:22 Tower kernel: ata3.00: HPA support seems broken, skipping HPA handling
Oct 16 11:41:22 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 16 11:41:22 Tower kernel: ata3.00: configured for UDMA/133
Oct 16 11:41:22 Tower kernel: scsi 2:0:0:0: Direct-Access    ATA      WDC WD15EADS-00S 04.0 PQ: 0 ANSI: 5
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] Write Protect is off
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] 2930277168 512-byte hardware sectors: (1.50 TB/1.36 TiB)
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] Write Protect is off
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] Mode Sense: 00 3a 00 00
Oct 16 11:41:22 Tower kernel: sd 2:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA

 

Oct 16 11:42:14 Tower kernel: md: unRAID driver removed
Oct 16 11:42:14 Tower kernel: md: unRAID driver 0.95.2 installed
Oct 16 11:42:14 Tower kernel: md: xor using function: pIII_sse (10028.400 MB/sec)
Oct 16 11:42:14 Tower kernel: md: import disk0: [8,160] (sdk) SAMSUNG HD154UI                          S1Y6J1KS715154      offset: 63 size: 1465138552
Oct 16 11:42:14 Tower kernel: md: import disk1: [8,0] (sda) ST3750640AS                                          5QD0SR8T offset: 63 size: 732574552
Oct 16 11:42:14 Tower kernel: md: import disk2: [8,64] (sde) ST3750640AS                                          3QD0B3P9 offset: 63 size: 732574552
Oct 16 11:42:14 Tower kernel: md: import disk3: [8,48] (sdd) WDC WD15EADS-00S2B0                          WD-WCAVY0244487 offset: 63 size: 1465138552
Oct 16 11:42:14 Tower kernel: md: disk3 replaced
Oct 16 11:42:14 Tower kernel: md: import disk4: [8,112] (sdh) Hitachi HDT721010SLA360                        STF607MS03U7PK offset: 63 size: 976762552
Oct 16 11:42:14 Tower kernel: md: import disk5: [8,96] (sdg) SAMSUNG HD103SI                          S1VSJ9CS702175      offset: 63 size: 976762552
Oct 16 11:42:14 Tower kernel: md: import disk6: [8,80] (sdf) Hitachi HDT721010SLA360                        STF607MH2RL6XK offset: 63 size: 732574552
Oct 16 11:42:14 Tower kernel: md: import disk7: [8,16] (sdb) ST3750640AS                                          5QD0SQN4 offset: 63 size: 732574552
Oct 16 11:42:14 Tower kernel: md: import disk8: [8,128] (sdi) Hitachi HDT721010SLA360                        STF607MH3TKUAK offset: 63 size: 976762552
Oct 16 11:42:14 Tower kernel: md: import disk9: [8,144] (sdj) Hitachi HDT721010SLA360                        STF607MH2VY1WK offset: 63 size: 976762552
Oct 16 11:42:33 Tower kernel: md: import disk0: [8,160] (sdk) SAMSUNG HD154UI                          S1Y6J1KS715154      offset: 63 size: 1465138552
Oct 16 11:42:33 Tower kernel: md: import disk1: [8,0] (sda) ST3750640AS                                          5QD0SR8T offset: 63 size: 732574552
Oct 16 11:42:33 Tower kernel: md: import disk2: [8,64] (sde) ST3750640AS                                          3QD0B3P9 offset: 63 size: 732574552
Oct 16 11:42:33 Tower kernel: md: import disk3: [8,48] (sdd) WDC WD15EADS-00S2B0                          WD-WCAVY0244487 offset: 63 size: 1465138552
Oct 16 11:42:33 Tower kernel: md: disk3 replaced
Oct 16 11:42:33 Tower kernel: md: import disk4: [8,112] (sdh) Hitachi HDT721010SLA360                        STF607MS03U7PK offset: 63 size: 976762552
Oct 16 11:42:33 Tower kernel: md: import disk5: [8,96] (sdg) SAMSUNG HD103SI                          S1VSJ9CS702175      offset: 63 size: 976762552
Oct 16 11:42:33 Tower kernel: md: import disk6: [8,80] (sdf) Hitachi HDT721010SLA360                        STF607MH2RL6XK offset: 63 size: 732574552
Oct 16 11:42:33 Tower kernel: md: import disk7: [8,16] (sdb) ST3750640AS                                          5QD0SQN4 offset: 63 size: 732574552
Oct 16 11:42:33 Tower kernel: md: import disk8: [8,128] (sdi) Hitachi HDT721010SLA360                        STF607MH3TKUAK offset: 63 size: 976762552
Oct 16 11:42:33 Tower kernel: md: import disk9: [8,144] (sdj) Hitachi HDT721010SLA360                        STF607MH2VY1WK offset: 63 size: 976762552
Oct 16 11:42:55 Tower kernel: md: import disk0: [8,160] (sdk) SAMSUNG HD154UI                          S1Y6J1KS715154      offset: 63 size: 1465138552
Oct 16 11:42:55 Tower kernel: md: import disk1: [8,0] (sda) ST3750640AS                                          5QD0SR8T offset: 63 size: 732574552
Oct 16 11:42:55 Tower kernel: md: import disk2: [8,64] (sde) ST3750640AS                                          3QD0B3P9 offset: 63 size: 732574552
Oct 16 11:42:55 Tower kernel: md: import disk3: [8,48] (sdd) WDC WD15EADS-00S2B0                          WD-WCAVY0244487 offset: 63 size: 1465138552
Oct 16 11:42:55 Tower kernel: md: disk3 replaced
Oct 16 11:42:55 Tower kernel: md: import disk4: [8,112] (sdh) Hitachi HDT721010SLA360                        STF607MS03U7PK offset: 63 size: 976762552
Oct 16 11:42:55 Tower kernel: md: import disk5: [8,96] (sdg) SAMSUNG HD103SI                          S1VSJ9CS702175      offset: 63 size: 976762552
Oct 16 11:42:55 Tower kernel: md: import disk6: [8,80] (sdf) Hitachi HDT721010SLA360                        STF607MH2RL6XK offset: 63 size: 732574552
Oct 16 11:42:55 Tower kernel: md: import disk7: [8,16] (sdb) ST3750640AS                                          5QD0SQN4 offset: 63 size: 732574552
Oct 16 11:42:55 Tower kernel: md: import disk8: [8,128] (sdi) Hitachi HDT721010SLA360                        STF607MH3TKUAK offset: 63 size: 976762552
Oct 16 11:42:55 Tower kernel: md: import disk9: [8,144] (sdj) Hitachi HDT721010SLA360                        STF607MH2VY1WK offset: 63 size: 976762552
Oct 16 11:43:20 Tower kernel: md: import disk0: [8,160] (sdk) SAMSUNG HD154UI                          S1Y6J1KS715154      offset: 63 size: 1465138552
Oct 16 11:43:20 Tower kernel: md: import disk1: [8,0] (sda) ST3750640AS                                          5QD0SR8T offset: 63 size: 732574552
Oct 16 11:43:20 Tower kernel: md: import disk2: [8,64] (sde) ST3750640AS                                          3QD0B3P9 offset: 63 size: 732574552
Oct 16 11:43:20 Tower kernel: md: import disk3: [8,48] (sdd) WDC WD15EADS-00S2B0                          WD-WCAVY0244487 offset: 63 size: 1465138552
Oct 16 11:43:20 Tower kernel: md: disk3 replaced
Oct 16 11:43:20 Tower kernel: md: import disk4: [8,112] (sdh) Hitachi HDT721010SLA360                        STF607MS03U7PK offset: 63 size: 976762552
Oct 16 11:43:20 Tower kernel: md: import disk5: [8,96] (sdg) SAMSUNG HD103SI                          S1VSJ9CS702175      offset: 63 size: 976762552
Oct 16 11:43:20 Tower kernel: md: import disk6: [8,80] (sdf) Hitachi HDT721010SLA360                        STF607MH2RL6XK offset: 63 size: 732574552
Oct 16 11:43:20 Tower kernel: md: import disk7: [8,16] (sdb) ST3750640AS                                          5QD0SQN4 offset: 63 size: 732574552
Oct 16 11:43:20 Tower kernel: md: import disk8: [8,128] (sdi) Hitachi HDT721010SLA360                        STF607MH3TKUAK offset: 63 size: 976762552
Oct 16 11:43:20 Tower kernel: md: import disk9: [8,144] (sdj) Hitachi HDT721010SLA360                        STF607MH2VY1WK offset: 63 size: 97

 

 

Oct 16 11:45:26 Tower kernel: ata3.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
Oct 16 11:45:26 Tower kernel: ata3.00: cmd 35/00:f8:37:04:00/00:03:00:00:00/e0 tag 0 dma 520192 out
Oct 16 11:45:26 Tower kernel:          res 40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Oct 16 11:45:26 Tower kernel: ata3.00: status: { DRDY }
Oct 16 11:45:26 Tower kernel: ata3: hard resetting link
Oct 16 11:45:32 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 11:45:36 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 11:45:36 Tower kernel: ata3: hard resetting link
Oct 16 11:45:42 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 11:45:46 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 11:45:46 Tower kernel: ata3: hard resetting link
Oct 16 11:45:52 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 11:46:21 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 11:46:21 Tower kernel: ata3: limiting SATA link speed to 1.5 Gbps
Oct 16 11:46:21 Tower kernel: ata3: hard resetting link
Oct 16 11:46:26 Tower emhttp: shcmd (90): /usr/sbin/hdparm -y /dev/sdd >/dev/null
Oct 16 11:46:26 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 11:46:26 Tower kernel: ata3: reset failed, giving up
Oct 16 11:46:26 Tower kernel: ata3.00: disabled
Oct 16 11:46:26 Tower kernel: ata3: EH complete
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Unhandled error code
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 16 11:46:26 Tower kernel: end_request: I/O error, dev sdd, sector 1079
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Unhandled error code
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 16 11:46:26 Tower kernel: end_request: I/O error, dev sdd, sector 2095
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Unhandled error code
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 16 11:46:26 Tower kernel: end_request: I/O error, dev sdd, sector 65743
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Unhandled error code
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 16 11:46:26 Tower kernel: end_request: I/O error, dev sdd, sector 40431
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Unhandled error code
Oct 16 11:46:26 Tower kernel: sd 2:0:0:0: [sdd] Result: hostbyte=0x04 driverbyte=0x00
Oct 16 11:46:26 Tower kernel: end_request: I/O error, dev sdd, sector 2367
Oct 16 11:46:26 Tower kernel: md: disk3 write error
Oct 16 11:46:26 Tower kernel: handle_stripe write error: 1016/3, count: 1
Oct 16 11:46:26 Tower kernel: md: disk3 write error
Oct 16 11:46:26 Tower kernel: handle_stripe write error: 1024/3, count: 1
Oct 16 11:46:26 Tower kernel: md: disk3 write error

 

 

 

Oct 16 12:21:45 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 12:21:49 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 12:21:49 Tower kernel: ata3: hard resetting link
Oct 16 12:21:55 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 12:21:59 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 12:21:59 Tower kernel: ata3: hard resetting link
Oct 16 12:22:05 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 12:22:22 Tower kernel: ata3: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Oct 16 12:22:22 Tower kernel: ata3.00: ATA-8: WDC WD15EADS-00S2B0, 04.05G04, max UDMA/133
Oct 16 12:22:22 Tower kernel: ata3.00: 2930277168 sectors, multi 0: LBA48 NCQ (depth 0/32)
Oct 16 12:22:22 Tower kernel: ata3.00: configured for UDMA/133
Oct 16 12:22:22 Tower kernel: ata3: exception Emask 0x10 SAct 0x0 SErr 0x0 action 0xf t4
Oct 16 12:22:22 Tower kernel: ata3: hotplug_status 0x11
Oct 16 12:22:22 Tower kernel: ata3: hard resetting link
Oct 16 12:22:28 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 12:22:32 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 12:22:32 Tower kernel: ata3: hard resetting link
Oct 16 12:22:38 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 12:22:42 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 12:22:42 Tower kernel: ata3: hard resetting link
Oct 16 12:22:48 Tower kernel: ata3: link is slow to respond, please be patient (ready=-19)
Oct 16 12:23:17 Tower kernel: ata3: COMRESET failed (errno=-16)
Oct 16 12:23:17 Tower kernel: ata3: limiting SATA link speed to 1.5 Gbps
Oct 16 12:23:17 Tower kernel: ata3: hard resetting link
Oct 16 12:23:22 Tower emhttp: disk_spinning: open: No such device or address


Howdy, and I apologize for getting so far behind in forum activity.  The previous poster is right that the problem is related to the WD 1.5TB drive (sdd, Disk 3), and his suggestion in step (a) is also right.  That particular set of cables (SATA and power) and Promise connector look really hopeless.  Your WD 1.5TB drive had problems right from the start, and within minutes was disabled ("Oct 16 11:46:26 Tower kernel: ata3.00: disabled"), and although it was later found and set up again (as sdl instead of sdd), it was very quickly disabled again.  Attempting to rebuild anything with that port or cable set was hopeless.

 

Although I have included the SATA cable as part of that set, I don't think it is SATA cable related at all, as there are no corrupted communications errors anywhere.  Communications were either non-existent or they were completely fine, for short intervals!  I see no indications of anything wrong with the drive either, but you did not have long enough access to the drive for a good test.
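To make that kind of check repeatable, here is a minimal Python sketch that tallies the libata events per port in a syslog. The event names and the function name are my own, the patterns for resets, timeouts and "disabled" are taken from the lines quoted in this thread, and the CRC pattern (BadCRC / ICRC) is an assumption about how the kernel usually reports corrupted transfers. Treat it as a rough aid, not a diagnosis.

```python
import re
from collections import Counter

# Event patterns. All but "crc" come from messages quoted in this thread;
# "crc" is an assumption about typical libata corruption markers.
PATTERNS = {
    "comreset_failed": re.compile(r"COMRESET failed"),
    "link_slow": re.compile(r"link is slow to respond"),
    "timeout": re.compile(r"\(timeout\)|qc timeout"),
    "disabled": re.compile(r"ata\d+\.\d+: disabled"),
    "crc": re.compile(r"BadCRC|ICRC"),
}

# Matches "kernel: ata3:" and "kernel: ata3.00:", capturing the port name.
PORT = re.compile(r"kernel: (ata\d+)(?:\.\d+)?:")


def summarize(lines):
    """Return {port: Counter(event -> count)} for every ata line seen."""
    summary = {}
    for line in lines:
        match = PORT.search(line)
        if not match:
            continue  # not a libata message that names a port
        counts = summary.setdefault(match.group(1), Counter())
        for name, pattern in PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return summary
```

Feeding it the lines of the syslog (for example `summarize(open(path))`, where `path` is wherever you saved the log) gives one counter per port. On the excerpt quoted above it would show COMRESET failures and a disabled device on ata3 and no CRC markers, which matches the "all or nothing" communications pattern described here.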

 

When a drive is red-balled, the first step should always be to attempt to obtain a SMART report.  To obtain it, you may have to install the drive outside of any backplane, and/or switch to a better SATA cable, and/or switch to better power cabling.  But a majority of drive problems are NOT the fault of the drive, but of the interface to the drive, including cables, power, disk controller, temps, etc.  A good SMART report will immediately help you determine whether you are looking at a bad drive or a bad interface.
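As a rough illustration of reading such a report, the Python sketch below parses the attribute table printed by `smartctl -A` and separates attributes that usually point at the drive itself from the one that usually points at the link. The function names and the choice of attributes are my own assumptions, not an official rule, and a clean result does not rule out power problems, which SMART does not report directly.

```python
import subprocess

# Assumption: non-zero raw values here usually implicate the drive itself.
DRIVE_ATTRS = ("Reallocated_Sector_Ct", "Current_Pending_Sector",
               "Offline_Uncorrectable")
# Assumption: a non-zero CRC error count usually implicates cable or port.
LINK_ATTRS = ("UDMA_CRC_Error_Count",)


def parse_attributes(text):
    """Map attribute name -> raw value from `smartctl -A` style output."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have the raw value
        # in the tenth column; rows with non-numeric raw values are skipped.
        if len(fields) >= 10 and fields[0].isdigit() and fields[9].isdigit():
            attrs[fields[1]] = int(fields[9])
    return attrs


def classify(attrs):
    """Return hints ("drive", "interface") suggested by the raw values."""
    hints = []
    if any(attrs.get(name, 0) > 0 for name in DRIVE_ATTRS):
        hints.append("drive")
    if any(attrs.get(name, 0) > 0 for name in LINK_ATTRS):
        hints.append("interface")
    return hints


def read_smart(device):
    """Run smartctl on a device such as /dev/sdd and parse its attributes."""
    result = subprocess.run(["smartctl", "-A", device],
                            capture_output=True, text=True)
    return parse_attributes(result.stdout)
```

Something like `classify(read_smart("/dev/sdd"))` would then return an empty list when none of those counters is raised. An empty list together with a syslog full of resets is the situation that points away from the drive and towards cabling, power, or the controller.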

 

Since there are no indications of communications corruption, and the drive is able to at least briefly respond completely correctly, that leads me to suspect that the drive is probably good, and the problem is the power provided, perhaps faulty, loose, or fluctuating.  Elsewhere, Joe had problems similar to yours and lost much time trying to diagnose them, until he discovered a cheap power cable splitter was causing all of his trouble.  That would be the first thing I would check (plus a SMART report!).  It is also possible that the power cable itself is defective, with loose connectors or wires somewhere.  Another possibility is that there are too many drives on the same rail.  I would spend some time looking at how power is distributed to ALL of the drives.

 

I hope this provides some ideas for your troubleshooting.  I would not try to do any rebuilding until the hardware is reliable, with no errors in the syslog.

 

This is just my opinion, not based on any known facts, but I would not be surprised if some day there were incompatibilities found between old disk controllers, like your Promise card, and the newest drives, such as your WD 1.5TB drive.  The old cards were designed and built when drives were much smaller, and some of the current drive technologies had not yet been developed.  The chance of an incompatibility between your Promise card and the big WD is very remote, but I would suggest that you might be safer, and possibly find better performance, connecting that drive to an onboard port or a newer disk controller.  But don't make changes like that yet, since you are out of ports.  Without fixing the current problem, moving another drive to that Promise port and its cables would probably cause it to be red-balled too, creating a worse situation with 2 failed drives!
