January 2, 2017 This morning I was looking at my server when I noticed that the parity drive had suddenly become disabled. When I checked the logs, they said:
Jan 2 10:27:26 Candle-Keep emhttp: err: mdcmd: write: Input/output error
Jan 2 10:27:26 Candle-Keep kernel: mdcmd (130): spindown 0
Jan 2 10:27:26 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 2 10:27:26 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e0 00
Jan 2 10:27:26 Candle-Keep kernel: md: do_drive_cmd: disk0: ATA_OP e0 ioctl error: -5
Jan 2 10:27:30 Candle-Keep emhttp: err: mdcmd: write: Input/output error
Jan 2 10:27:30 Candle-Keep kernel: mdcmd (131): spindown 0
Jan 2 10:27:30 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 2 10:27:30 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e0 00
Jan 2 10:27:30 Candle-Keep kernel: md: do_drive_cmd: disk0: ATA_OP e0 ioctl error: -5
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e5 00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00
Jan 2 10:27:34 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 98 00
This is followed by a ton of:
kernel: program smartctl is using a deprecated SCSI ioctl, please convert it to SG_IO
I cannot get a SMART report from the parity drive, as it produces the following error:
Terminate command early due to bad response to IEC mode page
Does anyone have an idea what happened? Is my parity drive dead? My syslog is here: http://pastebin.com/BxTDrqfT
January 2, 2017 Author "Reboot to see if it comes online and get a SMART report." Will do. I am also going to open it up and make sure all the cables are snug.
January 2, 2017 Author I stopped the box and checked all the cables, but found nothing loose. I then powered it back up and it came up with no errors, but the parity drive is still disabled. I managed to run a SMART test, which is here: http://pastebin.com/5r9XvGtY. I'm no expert, but the SMART log seems to say there is nothing wrong with the drive, which leaves me very concerned about what caused the error.
January 2, 2017 Community Expert The disk looks good. Looking at the syslog, the problem was the SASLP: it crashed, and unRAID lost contact with the parity disk. Coincidentally, the exact same thing happened to me this weekend on the same controller. In my case the disk dropped from the controller was an unassigned device used as the 2nd disk of my main VM, which made the VM crash.
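For anyone reading similar kernel lines, here is a minimal Python sketch of how the fields in the log above can be decoded. It is only an illustration, not a tool from unRAID: the function name `decode_line` and both lookup tables are made up for this example, and the tables cover just the values seen in this thread plus a few neighbours. It assumes the Linux SCSI host byte value 0x04 is `DID_BAD_TARGET` and that byte 14 of a 16-byte ATA pass-through CDB (opcode 0x85) carries the ATA command.

```python
import re

# Subset of Linux SCSI host byte codes (assumed from the kernel's SCSI headers).
HOST_BYTES = {
    0x00: "DID_OK",
    0x01: "DID_NO_CONNECT",
    0x03: "DID_TIME_OUT",
    0x04: "DID_BAD_TARGET",
}

# Subset of ATA command codes seen in this thread's log.
ATA_COMMANDS = {
    0xE0: "STANDBY IMMEDIATE",
    0xE5: "CHECK POWER MODE",
    0x98: "CHECK POWER MODE (legacy opcode)",
}

# "[sdi] ... hostbyte=0x04" -> device name and host byte
RESULT_RE = re.compile(r"\[(\w+)\].*hostbyte=0x([0-9a-fA-F]+)")
# "[sdi] ... CDB: opcode=0x85 85 06 ... 00" -> device name and 16 CDB bytes
CDB_RE = re.compile(r"\[(\w+)\].*CDB: opcode=0x85((?: [0-9a-fA-F]{2}){16})")


def decode_line(line):
    """Return (device, field, meaning) for a recognised kernel line, else None."""
    m = CDB_RE.search(line)
    if m:
        cdb = [int(b, 16) for b in m.group(2).split()]
        cmd = cdb[14]  # ATA command register in ATA PASS-THROUGH (16)
        return (m.group(1), "ata_command", ATA_COMMANDS.get(cmd, hex(cmd)))
    m = RESULT_RE.search(line)
    if m:
        hb = int(m.group(2), 16)
        return (m.group(1), "hostbyte", HOST_BYTES.get(hb, hex(hb)))
    return None


if __name__ == "__main__":
    sample = [
        "Jan 2 10:27:26 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 "
        "UNKNOWN(0x2003) Result: hostbyte=0x04 driverbyte=0x00",
        "Jan 2 10:27:26 Candle-Keep kernel: sd 1:0:1:0: [sdi] tag#0 "
        "CDB: opcode=0x85 85 06 20 00 00 00 00 00 00 00 00 00 00 40 e0 00",
    ]
    for entry in sample:
        print(decode_line(entry))
```

Read this way, the log shows spin-down and power-mode queries to sdi failing at the host adapter level rather than the drive returning an error of its own, which would fit the controller explanation above. The `ioctl error: -5` in the md lines is the usual errno 5 (EIO, Input/output error).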
January 2, 2017 Author "Disk looks good, looking at the syslog the problem was the SASLP, it crashed, unRAID lost contact with the parity disk. Coincidentally the exact same thing happened to me this weekend, on the same controller, in my case the disk ejected from mine was an unassigned device used for the 2nd disk of my main VM, making it crash." I might have to look into replacing that controller then; it is getting old, after all. Should it be safe, then, to start a Read-Check of all data disks?
January 2, 2017 Community Expert Might as well begin a parity sync instead: stop the array, unassign parity, start the array, stop the array, then reassign parity to begin the sync. The controller may be OK. I hope mine is, although if this happens again I'll replace it.
January 2, 2017 Author Thanks for your help, you have been a lifesaver. BTW, do you think this is a good replacement card: http://www.ncix.com/detail/supermicro-aoc-sas2lp-mv8-8-channel-6gb-s-bf-62032.htm
January 2, 2017 Community Expert Yes, as long as you don't use VT-d (virtualization pass-through); a few users have had issues with this card when VT-d is enabled. I have 2 without any problems, but I don't use VT-d on those servers. If you need VT-d, I recommend getting an LSI-based controller, e.g., the 9211-8i. Most people get the Dell H310 or IBM M1015 because they are cheaper on eBay and can be crossflashed to LSI IT mode, becoming for all purposes an LSI 9211-8i.
January 2, 2017 Community Expert I have found that it is relatively easy for the SASLP controller to end up not perfectly aligned with the motherboard, as the connector is so short. This tends to lead to momentary disconnects, particularly when the system is under load. It is well worth checking for that, as it is by no means obvious at a quick glance.
January 2, 2017 Author "I have found that it is relatively easy for the SASLP controller to end up not perfectly aligned with the motherboard as the connector is so short. This tends to lead to momentary disconnects, particularly when the system is under load. Well worth checking for that as it is by no means obvious at a quick glance." I thought of that, but when I opened up the system the card seemed flush. Anyway, I have ordered an LSI 9211-8i, which I hope will resolve these issues.