Disk problem unRAID 4,7

Rob_Dingen · July 21, 2011

Hi

I have a problem preclearing one disk, a WD20EARS 2TB.

Parity - Hit MB sata 0

Disk 1- Hit MB sata 1

Disk 2- Hit MB sata 2

Disk 3- WD MB sata 3

Disk 4- WD AOC-SASLP

Disk 5- WD AOC-SASLP

The three Hitachi disks are already precleared.

Disks 4 and 5 are preclearing now, but Disk 3 does not respond to the preclear command.

It is visible in unMENU.

Setup

Softw v : unRAID 4,7

MB : X8SIL-F

Proc : I3 540

Mem : 4G Kingston

SATA Contr : AOC-SASLP-MV8 softw v 21

3x HDD Hitachi 5K3000 2TB

3x HDD WD 20EARS 2TB

Case : NORCO 2420

I have attached the syslog for the disk and its SMART info.

Rob

Smart_disk.txt

Syslog_disk.txt

SSD · July 21, 2011

Looks like you might have a cabling issue to that drive. You should replace (or at least reseat both ends of) the SATA cable to that disk.

Try that and see if the problems stop.

Rob_Dingen · July 21, 2011

Hi

Reseating both ends is not possible because I use a NORCO 2420 case, but I can try to reseat the connectors on the MB.

Once the preclear of the other disks finishes, I can move the disk to the AOC-SASLP-MV8 and try again.

Rob

Rob_Dingen · July 21, 2011

Hi

The preclear running on Disk 4 also quit, at 54%.

Jul 21 21:17:18 Tower kernel: 72 0b 00 00 00 00 00 0c 00 0a 80 00 00 00 00 00
Jul 21 21:17:18 Tower kernel: 00 00 00 d7
Jul 21 21:17:18 Tower kernel: sd 0:0:0:0: [sdb] ASC=0x0 ASCQ=0x0 (Drive related)
Jul 21 21:17:18 Tower kernel: sd 0:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 e8 e0 81 20 00 00 08 00 (Drive related)
Jul 21 21:17:18 Tower kernel: end_request: I/O error, dev sdb, sector 3907027232 (Errors)
Jul 21 21:17:18 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 21 21:17:18 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 21 21:17:18 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 21 21:17:18 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 21 21:17:18 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 21 21:17:18 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 21 21:17:18 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 21 21:17:18 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 21 21:17:18 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 21 21:17:18 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/AS (Drive related)

What's going on?

Rob
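A side note for anyone reading the log above: the `CDB` line is a SCSI READ(10) command (opcode 0x28), and the sector in the `end_request` line can be recovered from it. The sketch below assumes the standard READ(10) layout (bytes 2 to 5 hold the big-endian LBA, bytes 7 and 8 the transfer length in blocks); the function name `decode_read10` is made up for illustration.

```python
def decode_read10(cdb_hex: str) -> dict:
    """Decode a SCSI READ(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)  # whitespace between byte pairs is ignored
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    return {
        # Bytes 2..5: logical block address, most significant byte first
        "lba": int.from_bytes(cdb[2:6], "big"),
        # Bytes 7..8: number of blocks requested
        "blocks": int.from_bytes(cdb[7:9], "big"),
    }


# CDB copied from the syslog excerpt above
result = decode_read10("28 00 e8 e0 81 20 00 00 08 00")
print(result)  # {'lba': 3907027232, 'blocks': 8}
```

The decoded LBA matches the sector reported in the `I/O error` line, so the failing request was an 8-block read starting at that sector.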

vca · July 21, 2011

Your smart log includes this:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: electrical failure 30%         4         -
# 2  Short offline       Interrupted (host reset)      90%         4         -
# 3  Short offline       Interrupted (host reset)      90%         4         -
# 4  Short offline       Interrupted (host reset)      90%         4         -

which I don't recall seeing before - at a guess it looks like the drive is having power issues (electrical failure), so maybe you have an issue with your power supply or with the power connector? Do you have any power splitters - they have been a source of grief for many on this forum.

Regards,

Stephen
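For readers who want to pick failed entries out of a self-test log like the one quoted above, here is a minimal sketch. It assumes the column layout shown in that excerpt and that a clean run is reported as "Completed without error" (so a status starting with "Completed:" indicates a failure); the regex and the helper name `parse_selftest_row` are illustrative, not part of smartctl.

```python
import re

# One row: "# N  <test>  <status> <remaining>%  <hours>  <lba or ->"
ROW = re.compile(
    r"^#\s*(?P<num>\d+)\s+(?P<test>.+?)\s{2,}(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+(?P<hours>\d+)\s+(?P<lba>\S+)$"
)


def parse_selftest_row(line):
    """Return a dict for one self-test row, or None if the line is not a row."""
    m = ROW.match(line.strip())
    if m is None:
        return None
    row = m.groupdict()
    for key in ("num", "remaining", "hours"):
        row[key] = int(row[key])
    return row


# Rows copied from the SMART log quoted in this post
log = """\
# 1  Short offline       Completed: electrical failure 30%         4         -
# 2  Short offline       Interrupted (host reset)      90%         4         -
# 3  Short offline       Interrupted (host reset)      90%         4         -
# 4  Short offline       Interrupted (host reset)      90%         4         -
"""

rows = [r for r in map(parse_selftest_row, log.splitlines()) if r]
failed = [r for r in rows if r["status"].startswith("Completed:")]
for r in failed:
    print(r["num"], r["status"])
```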

Rajahal · July 22, 2011

Are you supplying power to both of the molex connectors on each backplane in your Norco 4220? According to Norco you should only have to connect one of them, but some users have reported in the past that connecting both fixes odd power-related issues like you are seeing. I think it is worth a try. Try it with just one backplane at first (so that you don't have to go buy a bunch of new power splitters) and see if it makes any difference.

Rob_Dingen · July 23, 2011

Update to the problem.

I reconnected all SATA and power cables and ran an extra set of power cables from the PSU to the Molex connectors on the Norco backplane.

After a reboot I still got a disk error.

I removed Disk 3 and rebooted, and now everything is running smoothly.

Disk 4 is preclearing again and I hope it will finish this time.

I did some further testing on Disk 3 (WD 20EARS) and I think it was DOA: it doesn't spin up in another computer.

Rob

Rob_Dingen · July 23, 2011

OK, the preclear stopped again, during the pre-read at 54%.

Part of the syslog:

Jul 24 02:17:14 Tower kernel: sd 0:0:0:0: [sdb] ASC=0x0 ASCQ=0x0 (Drive related)
Jul 24 02:17:14 Tower kernel: sd 0:0:0:0: [sdb] CDB: cdb[0]=0x28: 28 00 e8 e0 80 98 00 00 08 00 (Drive related)
Jul 24 02:17:14 Tower kernel: end_request: I/O error, dev sdb, sector 3907027096 (Errors)
Jul 24 02:17:14 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 24 02:17:14 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 24 02:17:14 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 24 02:17:14 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 24 02:17:14 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 24 02:17:14 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00 (Drive related)
Jul 24 02:17:14 Tower kernel: ata1: status=0x41 { DriveReady Error } (Errors)
Jul 24 02:17:14 Tower kernel: ata1: error=0x04 { DriveStatusError } (Errors)
Jul 24 02:17:14 Tower kernel: sd 0:0:0:0: [sdb] Result: hostbyte=0x00 driverbyte=0x08 (System)
Jul 24 02:17:14 Tower kernel: sd 0:0:0:0: [sdb] Sense Key : 0xb [current] [descriptor] (Drive related)
Jul 24 02:17:14 Tower kernel: Descriptor sense data with sense descriptors (in hex):

I'm lost.

Rob

HDParm_2289.txt

Short_smart_test_2289.txt
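A side note on the repeated `stat/err 0x41/04` lines in the log above: both values are bit fields, and the names the kernel prints in braces correspond to individual bits. The sketch below decodes them using the bit names from the ATA register definitions as I understand them (an assumption worth checking against the spec); the tables and the `decode` helper are illustrative only.

```python
# ATA status register bits (assumed from the ATA command set definitions)
STATUS_BITS = {0x80: "BSY", 0x40: "DRDY", 0x20: "DF", 0x08: "DRQ", 0x01: "ERR"}

# ATA error register bits; ICRC is only meaningful for UDMA transfers
ERROR_BITS = {0x80: "ICRC", 0x40: "UNC", 0x10: "IDNF", 0x04: "ABRT"}


def decode(value, names):
    """Return the names of all bits that are set in value."""
    return [name for bit, name in names.items() if value & bit]


status = decode(0x41, STATUS_BITS)
error = decode(0x04, ERROR_BITS)
print(status)  # ['DRDY', 'ERR']
print(error)   # ['ABRT']
```

So 0x41 is "drive ready, error flagged" and 0x04 is a command abort, which the kernel translates to SCSI sense key 0xb (aborted command). An abort on its own does not say whether the cause is the drive, the cabling, or the power feed, which is why the SMART data matters here.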

vca · July 28, 2011

You might have another bad drive:

note:

197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1

and:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed: read failure       90%        39         2112414936
# 2  Short offline       Completed: read failure       90%        39         2112414936

In theory your drive only has one bad sector that is queued to be remapped, but I had a brand new WD20EARS that made a similar claim (though I didn't save a copy of the smart report as I usually do). When I tried to do a full smart test it failed to complete the test (though did not report anything else), so I took it out of my unRAID box and put it in my Windows PC and ran the WD Tools on it. The quick test failed to complete and after I cancelled the extended test (as it was taking too long) it reported that the drive had too many bad sectors. As in your case I initially discovered the problem during a preclear. From my notes:

Apr 25 - Apr 27: the replacement drive WMA ZA3 782 259 also failed. It appeared to have got part way through the writing-zeroes phase of the first preclear pass when it took out the unRAID server (probably too many error messages). When I put it in my desktop, CrystalDisk initially showed a normal SMART report. When I then tried to run the WD Diagnostic, the quick test ran very slowly, so I stopped it after two hours and ran the long test overnight. The next morning it was still running and was now saying about 400 hours left, so I stopped the test and got the report, which said FAIL, "08-Too many bad sectors detected". Now CrystalDisk no longer finds this drive.

Regards,

Stephen
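For anyone scripting a check on attribute lines like the `197 Current_Pending_Sector` row quoted above, a minimal sketch follows. It assumes the usual ten-column layout of the smartctl attribute table (ID, name, flag, value, worst, threshold, type, updated, when failed, raw value); the key names and the `parse_attribute` helper are my own labels, not smartctl output.

```python
KEYS = ("id", "name", "flag", "value", "worst", "thresh",
        "type", "updated", "when_failed", "raw")


def parse_attribute(line):
    """Split one SMART attribute row into named fields."""
    fields = line.split(None, 9)  # raw value may contain spaces, keep it whole
    if len(fields) != len(KEYS):
        raise ValueError("unexpected attribute line: %r" % line)
    attr = dict(zip(KEYS, fields))
    for key in ("id", "value", "worst", "thresh"):
        attr[key] = int(attr[key])
    return attr


# Row copied from the SMART report quoted in this post
line = "197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       1"
attr = parse_attribute(line)

# A non-zero raw value on attribute 197 means sectors are waiting to be remapped
if attr["id"] == 197 and attr["raw"] != "0":
    print("pending sectors:", attr["raw"])
```

Note that the normalized value (200) is still well above the threshold (0), so the drive does not report itself as failing; the raw count is the part worth watching.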
