
Failed drive... or SAS?


tyrindor


A disk was just taken offline, and it's the one that holds my most important files. I'm copying them from the emulated drive to my Windows PC as a quick backup. I'm not sure how to check the disk's SMART data: it won't let me, and just says "Can not read attributes". Is it completely dead? I'm not sure of the best way to proceed once my files are copied to another PC. It also won't let me run any tests on the drive.

 

It's repeating these messages over and over now:


Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jun 29 13:34:04 UNRAID kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jun 29 13:34:04 UNRAID kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
Jun 29 13:34:04 UNRAID kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
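To make the repeating lines above easier to read, here is a minimal Python sketch that pulls the host byte and the CDB opcode out of each kernel line. It assumes the Linux DID_* host-byte codes from the kernel's SCSI headers (0x04 is DID_BAD_TARGET, i.e. the kernel could no longer address the device) and that opcode 0x85 is the SAT ATA PASS-THROUGH(16) command, which smartctl uses to reach SATA drives behind a SAS HBA. So these failing commands were probably smartctl's own queries to the dropped disk. The helper `decode_line` and its lookup tables are hypothetical, not part of unRAID or smartctl.

```python
import re

# Linux SCSI host-byte codes (the DID_* constants in the kernel's SCSI headers).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x02: "DID_BUS_BUSY",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
    0x05: "DID_ABORT",
    0x06: "DID_PARITY",
    0x07: "DID_ERROR",
    0x08: "DID_RESET",
}

# A few SCSI opcodes; 0x85 is ATA PASS-THROUGH(16) from the SAT standard.
OPCODES = {
    0x28: "READ(10)",
    0x2A: "WRITE(10)",
    0x85: "ATA PASS-THROUGH(16)",
    0x88: "READ(16)",
    0x8A: "WRITE(16)",
}

HOST_RE = re.compile(r"\[(sd\w+)\].*hostbyte=0x([0-9a-f]{2})", re.IGNORECASE)
CDB_RE = re.compile(
    r"\[(sd\w+)\].*CDB: opcode=0x([0-9a-f]{2})((?: [0-9a-f]{2})+)", re.IGNORECASE
)


def decode_line(line):
    """Return a short summary of one kernel log line, or None if it is not a SCSI result/CDB line."""
    m = HOST_RE.search(line)
    if m:
        code = int(m.group(2), 16)
        return f"{m.group(1)}: host byte {HOST_BYTES.get(code, hex(code))}"
    m = CDB_RE.search(line)
    if m:
        opcode = int(m.group(2), 16)
        cdb = [int(b, 16) for b in m.group(3).split()]
        summary = f"{m.group(1)}: {OPCODES.get(opcode, hex(opcode))}"
        # In a 16-byte ATA PASS-THROUGH CDB, byte 14 carries the ATA command code.
        if opcode == 0x85 and len(cdb) == 16:
            summary += f", ATA command 0x{cdb[14]:02x}"
        return summary
    return None


if __name__ == "__main__":
    sample = [
        "Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00",
        "Jun 29 13:34:04 UNRAID kernel: sd 1:0:6:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00",
    ]
    for line in sample:
        print(decode_line(line))
```

Lines that match neither pattern, such as the smartctl deprecation warning, return None and can simply be skipped.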

 

 

 

Here's exactly where the crash happened in the log. It looks like it threw a SAS error, followed by read/write errors on disk1.

 

aaa.txt


After restarting the server, the drive still didn't show up. I hot-swapped it and now it shows up. SMART looks fine? It passes a quick SMART test...


Should I rebuild the data back onto it, or disable parity and build new parity? My last parity check was 36 days ago, so there's a chance parity could be wrong if I have SAS errors going around. Either way, I've already transferred my irreplaceable files off.

 

My guess is that because this is a shingled (SMR) archive drive, something got botched and the SAS card freaked out when the drive couldn't write immediately?

 

ID   Attribute name            Flag    Value  Worst  Thresh  Type      Updated  Failed  Raw value
1    Raw read error rate       0x000f  114    099    006     Pre-fail  Always   Never   70247960
3    Spin up time              0x0003  090    090    000     Pre-fail  Always   Never   0
4    Start stop count          0x0032  100    100    020     Old age   Always   Never   704
5    Reallocated sector count  0x0033  100    100    010     Pre-fail  Always   Never   0
7    Seek error rate           0x000f  079    060    030     Pre-fail  Always   Never   85358140
9    Power on hours            0x0032  089    089    000     Old age   Always   Never   10194 (1y, 1m, 28d, 18h)
10   Spin retry count          0x0013  100    100    097     Pre-fail  Always   Never   0
12   Power cycle count         0x0032  100    100    020     Old age   Always   Never   49
183  Runtime bad block         0x0032  100    100    000     Old age   Always   Never   0
184  End-to-end error          0x0032  100    100    099     Old age   Always   Never   0
187  Reported uncorrect        0x0032  100    100    000     Old age   Always   Never   0
188  Command timeout           0x0032  100    100    000     Old age   Always   Never   0
189  High fly writes           0x003a  100    100    000     Old age   Always   Never   0
190  Airflow temperature cel   0x0022  068    049    045     Old age   Always   Never   32 (min/max 32/32)
191  G-sense error rate        0x0032  100    100    000     Old age   Always   Never   0
192  Power-off retract count   0x0032  100    100    000     Old age   Always   Never   151
193  Load cycle count          0x0032  100    100    000     Old age   Always   Never   1423
194  Temperature celsius       0x0022  032    051    000     Old age   Always   Never   32 (0 20 0 0 0)
195  Hardware ECC recovered    0x001a  114    099    000     Old age   Always   Never   70247960
197  Current pending sector    0x0012  100    100    000     Old age   Always   Never   0
198  Offline uncorrectable     0x0010  100    100    000     Old age   Offline  Never   0
199  UDMA CRC error count      0x003e  200    200    000     Old age   Always   Never   0
240  Head flying hours         0x0000  100    253    000     Old age   Offline  Never   1247 (205 155 0)
241  Total lbas written        0x0000  100    253    000     Old age   Offline  Never   28089705984
242  Total lbas read           0x0000  100    253    000     Old age   Offline  Never   258281706892
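The big raw numbers on attributes 1 (Raw read error rate) and 7 (Seek error rate) look alarming, but on Seagate drives they are commonly read as a packed 48-bit value: the upper 16 bits count errors and the lower 32 bits count operations. Seagate does not officially document this, so treat the split as an assumption. The sketch below applies it to the raw values from the table above and also checks a common (unofficial) shortlist of attributes whose raw value should stay at 0. `split_seagate_raw` and `nonzero_watched` are hypothetical helpers for illustration.

```python
# Raw values copied from the SMART table above (attribute ID -> raw value).
RAW = {1: 70247960, 5: 0, 7: 85358140, 187: 0, 188: 0, 197: 0, 198: 0, 199: 0}

# A common, unofficial shortlist of attributes whose raw value should stay at 0.
WATCHED = {
    5: "Reallocated sector count",
    187: "Reported uncorrect",
    188: "Command timeout",
    197: "Current pending sector",
    198: "Offline uncorrectable",
    199: "UDMA CRC error count",
}


def split_seagate_raw(raw):
    """Split a 48-bit Seagate raw value into (error_count, operation_count).

    Assumption: upper 16 bits = errors, lower 32 bits = operations.
    """
    if not 0 <= raw < 1 << 48:
        raise ValueError("raw value must fit in 48 bits")
    return raw >> 32, raw & 0xFFFFFFFF


def nonzero_watched(raw_values):
    """Return {attribute name: raw value} for watched attributes that are not 0."""
    return {WATCHED[i]: v for i, v in raw_values.items() if i in WATCHED and v != 0}


if __name__ == "__main__":
    for attr in (1, 7):
        errors, ops = split_seagate_raw(RAW[attr])
        print(f"attribute {attr}: {errors} errors in {ops} operations")
    print("non-zero watched attributes:", nonzero_watched(RAW) or "none")
```

Under that assumption both values decode to zero errors, and none of the shortlisted counters are non-zero, which fits the drive passing its SMART test.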
9 minutes ago, tyrindor said:

My guess is that because this is a shingled archive drive, something got botched up and the SAS card freaked out when it couldn't write immediately?

 

Possible, but IMO not likely. SMART looks fine, so rebuild to the same disk. You could swap cables/backplane slots with another disk before rebuilding, just to rule that out.


The data rebuild was successful. SMART still shows nothing wrong, and there are no errors in the log.

 

I don't understand what the problem was. Either a SAS cable/card/hot-swap bay had a fluke, or the Seagate Archive drive had an issue with its shingling technology that made unRAID think the drive was unresponsive.

 

I'll keep my fingers crossed that it doesn't happen again.


Archived

This topic is now archived and is closed to further replies.
