
"disk has read errors", now what?


FLK


Hi,

 

During my last parity check I was notified about "READ ERRORS" on one of my drives, but I have absolutely no idea what I should do now.

 

Is there any way to know the files involved ?

And what exactly does the read error mean? Looking at the SMART values, there are no reallocated sectors, only "Raw Read Error Rate" with a raw value of "25".

How should I take care of those errors? Replace the disk?

 

I'm running unRAID 6 RC3 using btrfs on all disks.

 

Here is the syslog with two parity checks :

 

May 22 09:15:29 unFLK kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0

May 22 09:15:29 unFLK kernel: ata3.00: irq_stat 0x40000008

May 22 09:15:29 unFLK kernel: ata3.00: failed command: READ FPDMA QUEUED

May 22 09:15:29 unFLK kernel: ata3.00: cmd 60/20:38:c8:19:03/03:00:38:01:00/40 tag 7 ncq 409600 in

May 22 09:15:29 unFLK kernel:        res 41/40:00:d8:1b:03/00:00:38:01:00/40 Emask 0x409 (media error)

May 22 09:15:29 unFLK kernel: ata3.00: status: { DRDY ERR }

May 22 09:15:29 unFLK kernel: ata3.00: error: { UNC }

May 22 09:15:29 unFLK kernel: ata3.00: configured for UDMA/133

May 22 09:15:29 unFLK kernel: sd 3:0:0:0: [sdd] tag#7 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

May 22 09:15:29 unFLK kernel: sd 3:0:0:0: [sdd] tag#7 Sense Key : 0x3 [current] [descriptor]

May 22 09:15:29 unFLK kernel: sd 3:0:0:0: [sdd] tag#7 ASC=0x11 ASCQ=0x4

May 22 09:15:29 unFLK kernel: sd 3:0:0:0: [sdd] tag#7 CDB: opcode=0x88 88 00 00 00 00 01 38 03 19 c8 00 00 03 20 00 00

May 22 09:15:29 unFLK kernel: blk_update_request: I/O error, dev sdd, sector 5234695128

May 22 09:15:29 unFLK kernel: ata3: EH complete

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695064

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695072

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695080

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695088

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695096

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695104

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695112

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695120

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695128

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695136

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695144

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695152

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695160

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695168

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695176

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695184

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695192

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695200

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695208

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695216

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695224

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695232

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695240

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695248

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695256

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695264

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695272

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695280

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695288

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695296

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695304

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695312

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695320

May 22 09:15:29 unFLK kernel: md: disk4 read error, sector=5234695328

May 22 09:15:42 unFLK kernel: ata3.00: exception Emask 0x0 SAct 0x1c000000 SErr 0x0 action 0x0

May 22 09:15:42 unFLK kernel: ata3.00: irq_stat 0x40000008

May 22 09:15:42 unFLK kernel: ata3.00: failed command: READ FPDMA QUEUED

May 22 09:15:42 unFLK kernel: ata3.00: cmd 60/40:e0:00:21:03/05:00:38:01:00/40 tag 28 ncq 688128 in

May 22 09:15:42 unFLK kernel:        res 41/40:00:a0:23:03/00:00:38:01:00/40 Emask 0x409 (media error)

May 22 09:15:42 unFLK kernel: ata3.00: status: { DRDY ERR }

May 22 09:15:42 unFLK kernel: ata3.00: error: { UNC }

May 22 09:15:42 unFLK kernel: ata3.00: configured for UDMA/133

May 22 09:15:42 unFLK kernel: sd 3:0:0:0: [sdd] tag#28 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

May 22 09:15:42 unFLK kernel: sd 3:0:0:0: [sdd] tag#28 Sense Key : 0x3 [current] [descriptor]

May 22 09:15:42 unFLK kernel: sd 3:0:0:0: [sdd] tag#28 ASC=0x11 ASCQ=0x4

May 22 09:15:42 unFLK kernel: sd 3:0:0:0: [sdd] tag#28 CDB: opcode=0x88 88 00 00 00 00 01 38 03 21 00 00 00 05 40 00 00

May 22 09:15:42 unFLK kernel: blk_update_request: I/O error, dev sdd, sector 5234697120

May 22 09:15:42 unFLK kernel: ata3: EH complete

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697056

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697064

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697072

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697080

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697088

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697096

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697104

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697112

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697120

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697128

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697136

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697144

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697152

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697160

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697168

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697176

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697184

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697192

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697200

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697208

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697216

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697224

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697232

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697240

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697248

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697256

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697264

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697272

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697280

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697288

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697296

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697304

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697312

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697320

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697328

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697336

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697344

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697352

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697360

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697368

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697376

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697384

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697392

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697400

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697408

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697416

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697424

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697432

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697440

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697448

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697456

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697464

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697472

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697480

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697488

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697496

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697504

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697512

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697520

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697528

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697536

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697544

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697552

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697560

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697568

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697576

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697584

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697592

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697600

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697608

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697616

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697624

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697632

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697640

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697648

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697656

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697664

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697672

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697680

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697688

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697696

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697704

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697712

May 22 09:15:42 unFLK kernel: md: disk4 read error, sector=5234697720

May 22 09:15:49 unFLK kernel: ata3.00: exception Emask 0x0 SAct 0x6 SErr 0x0 action 0x0

May 22 09:15:49 unFLK kernel: ata3.00: irq_stat 0x40000008

May 22 09:15:49 unFLK kernel: ata3.00: failed command: READ FPDMA QUEUED

May 22 09:15:49 unFLK kernel: ata3.00: cmd 60/40:08:40:26:03/05:00:38:01:00/40 tag 1 ncq 688128 in

May 22 09:15:49 unFLK kernel:        res 41/40:00:57:2a:03/00:00:38:01:00/40 Emask 0x409 (media error)

May 22 09:15:49 unFLK kernel: ata3.00: status: { DRDY ERR }

May 22 09:15:49 unFLK kernel: ata3.00: error: { UNC }

May 22 09:15:49 unFLK kernel: ata3.00: configured for UDMA/133

May 22 09:15:49 unFLK kernel: sd 3:0:0:0: [sdd] tag#1 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

May 22 09:15:49 unFLK kernel: sd 3:0:0:0: [sdd] tag#1 Sense Key : 0x3 [current] [descriptor]

May 22 09:15:49 unFLK kernel: sd 3:0:0:0: [sdd] tag#1 ASC=0x11 ASCQ=0x4

May 22 09:15:49 unFLK kernel: sd 3:0:0:0: [sdd] tag#1 CDB: opcode=0x88 88 00 00 00 00 01 38 03 26 40 00 00 05 40 00 00

May 22 09:15:49 unFLK kernel: blk_update_request: I/O error, dev sdd, sector 5234698839

May 22 09:15:49 unFLK kernel: ata3: EH complete

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698768

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698776

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698784

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698792

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698800

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698808

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698816

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698824

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698832

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698840

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698848

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698856

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698864

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698872

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698880

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698888

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698896

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698904

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698912

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698920

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698928

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698936

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698944

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698952

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698960

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698968

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698976

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698984

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234698992

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699000

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699008

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699016

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699024

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699032

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699040

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699048

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699056

May 22 09:15:49 unFLK kernel: md: disk4 read error, sector=5234699064

May 22 09:16:01 unFLK sSMTP[12529]: Creating SSL connection to host

May 22 09:16:01 unFLK sSMTP[12529]: SSL connection using DHE-RSA-AES256-GCM-SHA384

May 22 09:16:03 unFLK sSMTP[12529]: Sent mail for xxxxxxxxxxxxxxxx (221 2.0.0 Bye) uid=0 username=root outbytes=766

May 22 09:25:53 unFLK kernel: ata3.00: exception Emask 0x0 SAct 0x7fffffff SErr 0x0 action 0x0

May 22 09:25:53 unFLK kernel: ata3.00: irq_stat 0x40000008

May 22 09:25:53 unFLK kernel: ata3.00: failed command: READ FPDMA QUEUED

May 22 09:25:53 unFLK kernel: ata3.00: cmd 60/40:00:60:fe:1e/05:00:39:01:00/40 tag 0 ncq 688128 in

May 22 09:25:53 unFLK kernel:        res 41/40:00:77:03:1f/00:00:39:01:00/40 Emask 0x409 (media error)

May 22 09:25:53 unFLK kernel: ata3.00: status: { DRDY ERR }

May 22 09:25:53 unFLK kernel: ata3.00: error: { UNC }

May 22 09:25:53 unFLK kernel: ata3.00: configured for UDMA/133

May 22 09:25:53 unFLK kernel: sd 3:0:0:0: [sdd] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x00 driverbyte=0x08

May 22 09:25:53 unFLK kernel: sd 3:0:0:0: [sdd] tag#0 Sense Key : 0x3 [current] [descriptor]

May 22 09:25:53 unFLK kernel: sd 3:0:0:0: [sdd] tag#0 ASC=0x11 ASCQ=0x4

May 22 09:25:53 unFLK kernel: sd 3:0:0:0: [sdd] tag#0 CDB: opcode=0x88 88 00 00 00 00 01 39 1e fe 60 00 00 05 40 00 00

May 22 09:25:53 unFLK kernel: blk_update_request: I/O error, dev sdd, sector 5253301111

May 22 09:25:53 unFLK kernel: ata3: EH complete

May 22 09:25:53 unFLK kernel: md: disk4 read error, sector=5253301040

May 22 09:25:53 unFLK kernel: md: disk4 read error, sector=5253301048

May 22 09:25:53 unFLK kernel: md: disk4 read error, sector=5253301056

May 22 09:25:53 unFLK kernel: md: disk4 read error, sector=5253301064

May 22 09:25:53 unFLK kernel: md: disk4 read error, sector=5253301072

May 22 09:25:53 unFLK kernel: md: disk4 read error, sector=5253301080

May 22 10:42:01 unFLK kernel: md: sync done. time=36719sec

May 22 10:42:01 unFLK kernel: md: recovery thread sync completion status: 0

May 22 10:47:13 unFLK emhttp: read_line: client closed the connection

May 22 10:47:13 unFLK emhttp: read_line: client closed the connection

May 22 10:47:13 unFLK emhttp: read_line: client closed the connection

May 22 10:47:13 unFLK emhttp: read_line: client closed the connection

May 22 10:47:13 unFLK emhttp: read_line: client closed the connection

May 22 11:04:56 unFLK php: /sbin/btrfs scrub start -B -r /mnt/disk4 &>/dev/null &

May 22 11:30:29 unFLK emhttp: /usr/bin/tail -n 42 -f /var/log/syslog 2>&1

May 22 11:42:02 unFLK kernel: mdcmd (46): spindown 0

May 22 11:42:02 unFLK kernel: mdcmd (47): spindown 1

May 22 12:55:09 unFLK kernel: mdcmd (48): spindown 0

May 22 12:55:29 unFLK kernel: mdcmd (49): spindown 2

May 22 13:00:02 unFLK kernel: mdcmd (50): spindown 1

May 22 14:10:45 unFLK kernel: mdcmd (51): check CORRECT

May 22 14:10:45 unFLK kernel: md: recovery thread woken up ...

May 22 14:10:45 unFLK kernel: md: recovery thread checking parity...

May 22 14:10:45 unFLK kernel: md: using 1536k window, over a total of 2930266532 blocks.

May 22 14:11:01 unFLK sSMTP[18014]: Creating SSL connection to host

May 22 14:11:01 unFLK sSMTP[18014]: SSL connection using DHE-RSA-AES256-GCM-SHA384

May 22 14:11:03 unFLK sSMTP[18014]: Sent mail for xxxxxxxxxxxxxxxx (221 2.0.0 Bye) uid=0 username=root outbytes=688

May 22 20:46:12 unFLK kernel: mdcmd (52): spindown 2

May 23 00:14:50 unFLK emhttp: /usr/bin/tail -n 42 -f /var/log/syslog 2>&1

May 23 00:16:07 unFLK kernel: md: sync done. time=36321sec

May 23 00:16:07 unFLK kernel: md: recovery thread sync completion status: 0

May 23 00:20:01 unFLK sSMTP[25703]: Creating SSL connection to host

May 23 00:20:01 unFLK sSMTP[25703]: SSL connection using DHE-RSA-AES256-GCM-SHA384

May 23 00:20:02 unFLK sSMTP[25703]: Sent mail for xxxxxxxxxxxxxxxx (221 2.0.0 Bye) uid=0 username=root outbytes=1342

May 23 00:40:09 unFLK kernel: mdcmd (53): spindown 2

May 23 00:41:55 unFLK emhttp: /usr/bin/tail -n 42 -f /var/log/syslog 2>&1

 

thanks!

Link to comment

Well, I'm a bit stumped.  Disk 4 is clearly reporting bad sectors, physical problems with the sector media, uncorrectable.  But when we ask the SMART portion of the drive, it says "nope, no problems here, everything's fine"!  So who's right?  Have we got an internal mutiny?  You recently did a SMART short test, try the long test, which will force SMART to examine every single sector.  Polling time for the long test says 393 minutes, a little over 6.5 hours!  Seems a little long, but maybe not.

 

Some minor oddities -

 

* The motherboard is claiming 6.0 gbps speeds for the 6 onboard SATA ports, but in this syslog, the last 4 ports only linked up at 3.0 gbps.  The first 2 did link at 6.0 gbps.  That means the SSD and your parity drive are on the fastest ports, which is desirable.  But there is no explanation why the other 4 ports were not faster.  I noticed this on Disk 4, which is a Red like the parity drive, and connected right next to it.  Have to wonder if your motherboard manufacturer cheated here.  I don't know if you will see a real difference or not, but it might be interesting to check and compare top speeds for all 4 WD Reds.  Use the hdparm commands below and compare the very last numbers (ignore the other numbers)-

 

Parity on 2nd port (6.0):  hdparm -tT /dev/sdc

Disk 4 on 3rd port (3.0):  hdparm -tT /dev/sdd

Disk 3 on 4th port (3.0):  hdparm -tT /dev/sde

Disk 1 on 6th port (3.0):  hdparm -tT /dev/sdg

 

* Syslog shows xenbr0 being set up and used.  I thought all xen stuff had been removed.  Or perhaps you have something manually configuring xen networking functionality?

 

* Your -rc3 syslog shows the snippet below, of an array of shareColor, shareFree, and shareSize vars.  I show only the first 2, the 0 and 1 entries, but you have 4 entries.  Another -rc3 syslog had 7 entries, so I'm guessing the count corresponds to the number of shares found?  This appears to be an incomplete feature, one built in 2 parts, one part setting up the vars, and another part using them.  Obviously, in -rc3, one of the parts was not finished, so I suspect that in -rc4 either this will be removed or a new display feature will be revealed.  To head off criticism of a new feature added to the -rc's, this appears to be only cosmetic.

May 20 10:17:17 unFLK emhttp: shareColor.0 not found

May 20 10:17:17 unFLK emhttp: shareFree.0 not found

May 20 10:17:17 unFLK emhttp: shareSize.0 not found

May 20 10:17:17 unFLK emhttp: shareColor.1 not found

May 20 10:17:17 unFLK emhttp: shareFree.1 not found

May 20 10:17:17 unFLK emhttp: shareSize.1 not found

May 20 10:17:17 unFLK emhttp: shareColor.2 not found

...etc...

Link to comment

I would suggest opening up the case and double checking that all of the SATA cables to the hard drives are securely in place on both ends.  If you have any SATA cables that use mechanical locking devices to secure the cables, replace them.  (One manufacturer changed the connector design on his hard drives and many (if not most) of the locking type SATA cables do not provide a reliable electrical connection!  This would not apply if you are using quick-change hard drive cages.)

Link to comment
...One manufacturer changed the connector design on his hard drives and many (if not most) of the locking type SATA cables do not provide a reliable electrical connection!...
I seem to have missed this newsflash. Could you elaborate? As in, name names?
Link to comment


 

 

I should have known (actually, I was dreading) that someone would ask for this information!    ;)  I did a search on the vendor I thought was involved and found the thread:

 

      http://lime-technology.com/forum/index.php?topic=36065.msg335979#msg335979

Link to comment

Well, I'm a bit stumped.  Disk 4 is clearly reporting bad sectors, physical problems with the sector media, uncorrectable.  But when we ask the SMART portion of the drive, it says "nope, no problems here, everything's fine"!  So who's right?  Have we got an internal mutiny?  You recently did a SMART short test, try the long test, which will force SMART to examine every single sector.  Polling time for the long test says 393 minutes, a little over 6.5 hours!  Seems a little long, but maybe not.

 

I've run the long test and it "Completed without error"...

 

Minor question here: while running the long test I noticed that the disk spun down, so I disabled the "Spin down delay" and restarted the test. Maybe there's a bug to fix here? (Prevent the disk from spinning down while running the test? Or maybe the test was still running, but the UI was misleading, since I got a message saying something like "can't run the test while disk is spun down".)

 

Some minor oddities -

 

* The motherboard is claiming 6.0 gbps speeds for the 6 onboard SATA ports, but in this syslog, the last 4 ports only linked up at 3.0 gbps.  The first 2 did link at 6.0 gbps.  That means the SSD and your parity drive are on the fastest ports, which is desirable.  But there is no explanation why the other 4 ports were not faster.  I noticed this on Disk 4, which is a Red like the parity drive, and connected right next to it.  Have to wonder if your motherboard manufacturer cheated here.  I don't know if you will see a real difference or not, but it might be interesting to check and compare top speeds for all 4 WD Reds.  Use the hdparm commands below and compare the very last numbers (ignore the other numbers)-

 

Parity on 2nd port (6.0):  hdparm -tT /dev/sdc

Disk 4 on 3rd port (3.0):  hdparm -tT /dev/sdd

Disk 3 on 4th port (3.0):  hdparm -tT /dev/sde

Disk 1 on 6th port (3.0):  hdparm -tT /dev/sdg

 

You are right about my motherboard: 2 ports at 6 Gbps and 4 at 3 Gbps. And the hdparm command gives me almost the same results on all the Red disks (~150 MB/s on buffered disk reads).

 

* Syslog shows xenbr0 being set up and used.  I thought all xen stuff had been removed.  Or perhaps you have something manually configuring xen networking functionality?

 

That's the bridge name in the network settings. I did not bother to change it after the last update, so I guess it will stay there forever :D

 

I also checked the cables (as suggested by Frank1940); everything looks fine to me.

 

The WD disk with read errors is a refurbished one that I got just a few weeks ago, so I'll send it back again no matter what. But how can I know if my data is safe? Will the parity disk rebuild the data affected by the read errors when I replace the disk? Is there any way to list the files affected? I'm really in the dark here, what would you do?  :o

 

Thanks for the support!

Link to comment

I've run the long test and it "Completed without error"...

That would seem to indicate the drive is fine.  Which means we still have no explanation for the read errors.  This drive is odd, doesn't surprise me when you say it's refurbished.  I just noticed another oddity, the SMART report claims it has a rotation rate of 5400 rpm!  That does not seem possible for a WD Red.  I suspect that on refurbishing, they reset much of the SMART numbers, and mistakenly set that wrong.

 

I would try a parity check now, and if it completes without any issues, then the drive and array are probably fine.  At that point, you could continue with the drive, or safely rebuild it.  I wouldn't consider rebuilding without a successful parity check.

 

Minor question here: while running the long test I noticed that the disk spun down, so I disabled the "Spin down delay" and restarted the test. Maybe there's a bug to fix here? (Prevent the disk from spinning down while running the test? Or maybe the test was still running, but the UI was misleading, since I got a message saying something like "can't run the test while disk is spun down".)

It's a minor issue we have lived with, and keep forgetting to mention to others, because it comes up rather infrequently.  It has just now been added to the webGui, so I hope that a temporary disabling of spin down is also added, for this test.  In this case, it was my fault for forgetting to mention it.

Link to comment

...    I just noticed another oddity, the SMART report claims it has a rotation rate of 5400 rpm!  That does not seem possible for a WD Red.    .....

 

 

 

I have a WD Red 3TB and the smart report says that it is a 5400rpm drive.  So that part is not an anomaly. 

Link to comment

Parity check started  8)

 

I've always assumed that my Red disks were 5400 rpm, but you're right: when I look at the others' SMART reports, there's no rotation rate indication. But then they are not running the same firmware version. When I received the refurbished one I noticed (it's written on the big sticker) that it was running "NASWARE 3.0" while the others were running "NASWARE 2.0" (Firmware Version 82.00A82 vs 80.00A80).

 

Maybe that's something "new".

 

Link to comment

...  I just noticed another oddity, the SMART report claims it has a rotation rate of 5400 rpm!  That does not seem possible for a WD Red.

 

Sounds about right.  I know the Seagate NAS units that run at 5900 rpm have slightly better performance than the Reds (except for the 6TB Red, which has a 20% higher areal density than the 1TB/platter smaller units).

 

WD doesn't actually state the rotational rate in their spec sheets ... calling it "IntelliPower" => implying a variable rotational rate, but I suspect they're pretty much fixed at ~ 5400rpm.    "IntelliPower" is just a marketing term that essentially means "we don't want to say  :)"

 

Link to comment

...    I just noticed another oddity, the SMART report claims it has a rotation rate of 5400 rpm!  That does not seem possible for a WD Red.    .....

 

 

 

I have a WD Red 3TB and the smart report says that it is a 5400rpm drive.  So that part is not an anomaly.

 

Thanks everyone, just shows my ignorance, and some bad assumptions.  I had thought the way they were talked about, and from their cost, that they were high performance drives comparable to the WD Blacks, but configured differently for the needs of a NAS.

 

OK, an ignorant question, if they only spin at 5400 rpm, WHY do they cost so much?

Link to comment

 

Pricing is a Marketing Department decision!  They evaluate things like the warranty period (a marketing decision that has nothing to do with MTBF data) and the cost of any failures during that warranty period, the intended use of the device, the price of any competitive product that is intended to fill the same market niche, and the perceived value of the product to the potential buyer.  If they think that enough of us are willing to pay a bit more for this product, they will set the price accordingly!

Link to comment

READ ERRORS, Season 1 Episode 5

 

Parity check is OK, and no read errors in the process... I really hate it when something stops working and then works again without my doing anything.  ???

 

Anyway, I started an "advance" replacement from WD, so I won't take any risks here.

 

So last question: is there any way to "ask" the disk "tell me which file is located at sector xxxx"? (Since I have the list of sectors from the syslog.)

Link to comment

... if they only spin at 5400 rpm, WHY do they cost so much?

 

My 1TB SSD doesn't spin at all and it costs almost triple the price of a 4TB WD Red  8)

 

... more seriously, the Reds are designed to run cool; have specific anti-vibration features built in to minimize drive vibrations in 24/7 operation; and clearly have at least a bit factored into the price to pay for the longer warranty.    I really don't think they're all that expensive => the 3 and 4TB versions are typically under $40/TB, and can be found on sale for ~ $35/TB or so.    By comparison, the 4TB WD Blacks are over $50/TB  [Currently $212 at Newegg and $239 at Amazon]

 

... and with the 1TB/platter areal density (1.2TB/platter for the 6TB units) the sustained data rates ... even on the slowest inner cylinders ... are well above the limitations of a Gb network, so there's little real-life impact of the slower rotation speed [the seek times are indeed longer than with 7200rpm units; so if you're doing a lot of consecutive writes without a cache drive you'll notice a small difference].
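
As a rough check of the numbers in this thread, here is a minimal Python sketch comparing link ceilings with the disk speed. The 150 MB/s figure is the approximate hdparm result reported earlier; the function names are made up for illustration, and the ceilings are theoretical maxima that ignore protocol overhead.

```python
# Back-of-the-envelope link ceilings versus the measured disk speed.
# Function names are hypothetical; figures ignore protocol overhead.

def gigabit_ethernet_mb_s():
    """Raw ceiling of a 1 Gbit/s network link in MB/s (8 bits per byte)."""
    return 1_000_000_000 / 8 / 1_000_000

def sata_payload_mb_s(line_rate_gbps):
    """Payload ceiling of a SATA link in MB/s.

    SATA uses 8b/10b encoding, so 10 line bits carry one data byte.
    """
    return line_rate_gbps * 1_000_000_000 / 10 / 1_000_000

# Approximate hdparm buffered read result reported earlier in the thread.
DISK_MB_S = 150

print("Gigabit Ethernet ceiling:", gigabit_ethernet_mb_s(), "MB/s")
print("SATA 3.0 Gbps ceiling:   ", sata_payload_mb_s(3.0), "MB/s")
print("SATA 6.0 Gbps ceiling:   ", sata_payload_mb_s(6.0), "MB/s")
print("Disk faster than network:", DISK_MB_S > gigabit_ethernet_mb_s())
print("Disk limited by 3.0 port:", DISK_MB_S >= sata_payload_mb_s(3.0))
```

On these figures the drive outruns a gigabit network but stays well below even the 3.0 Gbps SATA ceiling, which would be consistent with hdparm giving nearly the same result on the 3.0 and 6.0 Gbps ports.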

 

Link to comment

So last question: is there any way to "ask" the disk "tell me which file is located at sector xxxx"? (Since I have the list of sectors from the syslog.)

 

It's possible, but a fair amount of work.  Try Googling "badblocks ReiserFS", and look for an article or two that discuss how to calculate the file (if any) at a bad block location.  I should caution you though that your sector list is very misleading, because of the varying block sizes being used at different levels of the disk I/O process.
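
To illustrate the block size point, here is a minimal Python sketch that decodes the first failed command (09:15:29) from the syslog at the top of the thread. The field layout is the standard READ(16) CDB (8-byte big-endian LBA in bytes 2-9, 4-byte sector count in bytes 10-13). The 64-sector partition offset and the 8-sector (4 KiB) step between md lines are only inferred from this one log, not taken from unRAID documentation, so treat them as assumptions.

```python
# Values copied from the 09:15:29 entries in the syslog above.
CDB = "88 00 00 00 00 01 38 03 19 c8 00 00 03 20 00 00"
FAILED_DEVICE_SECTOR = 5234695128  # "blk_update_request: I/O error" line
FIRST_MD_SECTOR = 5234695064       # first "md: disk4 read error" line
LAST_MD_SECTOR = 5234695328        # last md line of the same burst

SECTOR_BYTES = 512
# Assumptions inferred from this log only (hypothetical names):
PARTITION_OFFSET = 64  # sectors between start of sdd and start of the md device
MD_STEP = 8            # md prints one line per 8 sectors (4 KiB)

def decode_read16(cdb_hex):
    """Return (start_lba, sector_count) from a READ(16) CDB hex string."""
    raw = bytes.fromhex(cdb_hex.replace(" ", ""))
    if raw[0] != 0x88:
        raise ValueError("not a READ(16) command")
    start = int.from_bytes(raw[2:10], "big")
    count = int.from_bytes(raw[10:14], "big")
    return start, count

def md_lines_for_failure(start, count, failed_sector):
    """Return (first_md_sector, line_count), assuming md flags every
    4 KiB block from the failing sector to the end of the request."""
    first = (failed_sector - PARTITION_OFFSET) // MD_STEP * MD_STEP
    end = start + count - PARTITION_OFFSET  # exclusive, in md sectors
    return first, (end - first) // MD_STEP

start, count = decode_read16(CDB)
first_md, n_lines = md_lines_for_failure(start, count, FAILED_DEVICE_SECTOR)
print("request start:", start, "sectors:", count, "bytes:", count * SECTOR_BYTES)
print("first md sector:", first_md, "md lines:", n_lines)
```

If those assumptions hold, the single unreadable sector reported by the drive accounts for every md line in that burst, so the four bursts in the log would point to a handful of bad spots rather than hundreds of bad sectors.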

 

For the future, you might want to look into bunker or bitrot, which run on the unRAID server, or CORZ Checksum, which runs in Windows.  They create checksum files for use in detecting bit rot and file corruption.
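
The general idea behind such tools can be sketched in a few lines of Python. This is only a generic illustration of checksum-based verification, not how bunker, bitrot or CORZ Checksum actually work; the function names and the choice of SHA-256 are assumptions made for the example.

```python
import hashlib
import os

def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large media files do not fill memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def build_manifest(root):
    """Map each file's path (relative to root) to its SHA-256 digest."""
    manifest = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            manifest[os.path.relpath(full, root)] = sha256_of(full)
    return manifest

def find_changed(root, manifest):
    """Return the sorted relative paths that are missing or no longer match."""
    changed = []
    for rel, expected in sorted(manifest.items()):
        full = os.path.join(root, rel)
        if not os.path.isfile(full) or sha256_of(full) != expected:
            changed.append(rel)
    return changed
```

The manifest would be built once while the disk is known to be good (for example with `build_manifest("/mnt/disk4")`) and stored somewhere other than that disk; after an event like these read errors, `find_changed` would list the files whose contents no longer match.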

Link to comment

+1 for Corz checksum => runs on Windows, but does a very nice job of creating checksums for your files; and makes it very easy to validate them.

 

If you suspect corruption on a disk, it's very simple to run Corz and isolate any corrupted files.

 

Link to comment

Isn't btrfs supposed to provide some sort of protection against that? I really need to get more familiar with the btrfs tools and features ^^

 

Anyway I'll check those solutions, thanks!

 

It would be nice to have one of those built in, with webUI integration 8)

 

Link to comment

Archived

This topic is now archived and is closed to further replies.
