Failed drive.. or SAS?

tyrindor · July 6, 2017

Just had a disk taken offline, one that holds my most important files. I am copying them from the emulated drive to my Windows PC as a quick backup. I'm not sure how to check the disk's SMART data; it won't let me and says "Can not read attributes". Is the disk completely dead? I'm not sure of the best way to proceed after I copy my files to another PC. It also won't let me run any tests on the drive.

It's repeating these messages over and over now:

Quote

Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jun 29 13:34:04 UNRAID kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jun 29 13:34:04 UNRAID kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jun 29 13:34:04 UNRAID kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
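
The repeated kernel lines can be decoded mechanically. Below is a minimal Python sketch, assuming the log format shown above; the function name `decode_line` and the lookup tables are illustrative and not part of any tool. Opcode 0x85 is the SCSI ATA PASS-THROUGH (16) command, whose byte 14 carries the ATA command, and host byte 0x04 is `DID_BAD_TARGET` in the Linux SCSI headers. Read that way, the lines suggest smartctl's power-mode queries (0xe5, then the legacy 0x98) failing because the host could no longer reach the disk. That is consistent with a dropped drive, but it does not say whether the disk, cable, or controller was at fault.

```python
import re

# Subset of Linux SCSI host byte codes (DID_* values from the kernel headers).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x07: "DID_ERROR",
}

# Subset of ATA command codes; only the ones relevant to this log.
ATA_COMMANDS = {
    0xE5: "CHECK POWER MODE",
    0x98: "CHECK POWER MODE (legacy opcode)",
    0xB0: "SMART",
}

# Assumed line shapes, taken from the quoted log above.
CDB_RE = re.compile(r"\[(\w+)\].*CDB: opcode=0x85 ((?:[0-9a-fA-F]{2} ?){16})")
RESULT_RE = re.compile(r"\[(\w+)\].*hostbyte=0x([0-9a-fA-F]+)")


def decode_line(line):
    """Return a short description of one kernel log line, or None if unrecognized."""
    m = CDB_RE.search(line)
    if m:
        cdb = bytes.fromhex(m.group(2))
        command = cdb[14]  # byte 14 of ATA PASS-THROUGH (16) is the ATA command
        name = ATA_COMMANDS.get(command, "unknown")
        return f"{m.group(1)}: ATA command 0x{command:02x} ({name})"
    m = RESULT_RE.search(line)
    if m:
        host = int(m.group(2), 16)
        name = HOST_BYTES.get(host, "unknown")
        return f"{m.group(1)}: hostbyte 0x{host:02x} ({name})"
    return None


if __name__ == "__main__":
    sample = [
        "Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 UNKNOWN(0x2003) "
        "Result: hostbyte=0x04 driverbyte=0x00",
        "Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 CDB: opcode=0x85 "
        "85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00",
    ]
    for entry in sample:
        print(decode_line(entry))
```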

Here's exactly where the crash happened in the log. Seems like it threw a SAS error, followed by read/write errors on disk1.

aaa.txt

JorgeB · July 6, 2017

Diagnostics?

tyrindor · July 6, 2017

Restarted the server and the drive still didn't show up. Hot-swapped it and now it shows up. SMART looks fine? It passes the quick SMART test...

Should I rebuild the data back onto it, or disable parity and build new parity? My last parity check was 36 days ago, so there is a chance parity could be wrong if I have SAS errors going around. Either way, I have already copied off my irreplaceable files.

My guess is that because this is a shingled archive drive, something got botched up and the SAS card freaked out when it couldn't write immediately?

#	Attribute name	Flag	Value	Worst	Threshold	Type	Updated	Failed	Raw value
1	Raw read error rate	0x000f	114	099	006	Pre-fail	Always	Never	70247960
3	Spin up time	0x0003	090	090	000	Pre-fail	Always	Never	0
4	Start stop count	0x0032	100	100	020	Old age	Always	Never	704
5	Reallocated sector count	0x0033	100	100	010	Pre-fail	Always	Never	0
7	Seek error rate	0x000f	079	060	030	Pre-fail	Always	Never	85358140
9	Power on hours	0x0032	089	089	000	Old age	Always	Never	10194 (1y, 1m, 28d, 18h)
10	Spin retry count	0x0013	100	100	097	Pre-fail	Always	Never	0
12	Power cycle count	0x0032	100	100	020	Old age	Always	Never	49
183	Runtime bad block	0x0032	100	100	000	Old age	Always	Never	0
184	End-to-end error	0x0032	100	100	099	Old age	Always	Never	0
187	Reported uncorrect	0x0032	100	100	000	Old age	Always	Never	0
188	Command timeout	0x0032	100	100	000	Old age	Always	Never	0
189	High fly writes	0x003a	100	100	000	Old age	Always	Never	0
190	Airflow temperature cel	0x0022	068	049	045	Old age	Always	Never	32 (min/max 32/32)
191	G-sense error rate	0x0032	100	100	000	Old age	Always	Never	0
192	Power-off retract count	0x0032	100	100	000	Old age	Always	Never	151
193	Load cycle count	0x0032	100	100	000	Old age	Always	Never	1423
194	Temperature celsius	0x0022	032	051	000	Old age	Always	Never	32 (0 20 0 0 0)
195	Hardware ECC recovered	0x001a	114	099	000	Old age	Always	Never	70247960
197	Current pending sector	0x0012	100	100	000	Old age	Always	Never	0
198	Offline uncorrectable	0x0010	100	100	000	Old age	Offline	Never	0
199	UDMA CRC error count	0x003e	200	200	000	Old age	Always	Never	0
240	Head flying hours	0x0000	100	253	000	Old age	Offline	Never	1247 (205 155 0)
241	Total lbas written	0x0000	100	253	000	Old age	Offline	Never	28089705984
242	Total lbas read	0x0000	100	253	000	Old age	Offline	Never	258281706892
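
One way to back up the "SMART looks fine" reading is to check the table programmatically. This is a rough Python sketch, assuming the tab-separated rows as posted above. `WATCHED_IDS` is an illustrative choice of attributes whose raw counts are commonly expected to stay at zero, not an official list, and the function names are hypothetical.

```python
# Attribute IDs whose raw value is commonly expected to stay at zero
# (reallocated, reported uncorrectable, pending, offline uncorrectable, CRC).
# This selection is an assumption for illustration.
WATCHED_IDS = {5, 187, 197, 198, 199}


def parse_row(row):
    """Split one tab-separated SMART row into a dict of the fields used below."""
    fields = row.split("\t")
    return {
        "id": int(fields[0]),
        "name": fields[1],
        "value": int(fields[3]),      # normalized current value
        "threshold": int(fields[5]),  # vendor failure threshold
        "raw": fields[9],
    }


def find_problems(rows):
    """Return human-readable warnings; an empty list means nothing was flagged."""
    problems = []
    for row in rows:
        attr = parse_row(row)
        # A normalized value at or below a non-zero threshold is a failed attribute.
        if attr["threshold"] > 0 and attr["value"] <= attr["threshold"]:
            problems.append(f"{attr['id']} {attr['name']}: value at or below threshold")
        # Watched counters should have a raw value of zero.
        if attr["id"] in WATCHED_IDS and int(attr["raw"].split()[0]) != 0:
            problems.append(f"{attr['id']} {attr['name']}: raw value {attr['raw']}")
    return problems


if __name__ == "__main__":
    # A few rows copied from the table above.
    rows = [
        "5\tReallocated sector count\t0x0033\t100\t100\t010\tPre-fail\tAlways\tNever\t0",
        "197\tCurrent pending sector\t0x0012\t100\t100\t000\tOld age\tAlways\tNever\t0",
        "199\tUDMA CRC error count\t0x003e\t200\t200\t000\tOld age\tAlways\tNever\t0",
    ]
    print(find_problems(rows))
```

A clean result here only says the disk has not logged these errors itself; it cannot rule out a cable, backplane, or controller fault.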

JorgeB · July 6, 2017

9 minutes ago, tyrindor said:

My guess is that because this is a shingled archive drive, something got botched up and the SAS card freaked out when it couldn't write immediately?

Possible, but IMO not likely. SMART looks fine, so rebuild to the same disk. You could swap cables/backplane with another disk before rebuilding, just to rule that out.

tyrindor · July 6, 2017

OK, rebuilding to the same disk and I'll see how it goes.

My setup is pure SAS cables, so it's 1 SAS cable for 4 drives. I don't have any spares, sadly, but I'll move the drive to another slot if it happens again.

JorgeB · July 6, 2017

12 minutes ago, tyrindor said:

but I'll move the drive to another slot if it happens again.

That works; also use a slot that uses a different SAS cable.

tyrindor · July 6, 2017

15% into the rebuild and still nothing wrong...

I'm not sure if it's a good thing or bad thing. Something definitely went wrong, but if the rebuild goes fine then who knows what.

tyrindor · July 7, 2017

Data rebuild was successful. SMART still has nothing wrong in it. No errors in log.

I don't understand what the problem was. Either a SAS cable/card/hotswap bay had a fluke... or the Seagate Archive drive had an issue with its shingling technology, which resulted in unRAID thinking the drive was unresponsive.

I'll keep my fingers crossed that it doesn't happen again.
