May 21, 2010

Hi all

Whilst watching a TV episode from one of my disks, my server crashed (the image froze in xbmc). The server was totally unresponsive, so I was forced to power it off.

On reboot, 4 drives (/dev/sdf /dev/sdg /dev/sdi /dev/sdh) were shown as red, with /dev/sdi also saying unavailable. I rebooted again; this time only /dev/sdi was shown as red, and in unmenu it was shown as DISK_DSBL.

Also, during the boot POST for the controller (sil3114), the following error was displayed:

Warning. Have option ROM can not be invoke (vendor ID:1095h, Device ID 3114h).

Since all the drives originally shown as red are on the same controller, and I also have a second sil3114 controller in the server that is just used for a non-array disk, I decided to swap the controller (and remove the non-array disk). On reboot, the boot POST error for the controller is no longer displayed, but the disk (/dev/sdi) is still shown as red.

I have looked at the syslog and can't see any errors (attached). I have also run hdparm and SMART statistics from unmenu (attached).

Is the drive actually faulty, and does it need to be replaced? Do I need to set it to trust the disk or run some other procedure? Is it likely that the original controller (currently removed from the server) is faulty?

Any advice is appreciated.

Thanks
Jake

syslog-2010-05-21.txt hdparm.txt smart.txt
May 21, 2010

For a disk to be shown in "RED", a "write" to it had to have failed. If a "write" to a disk failed, it does not have the contents it is expected to have.

If you force the server to trust parity, and subsequently force the disk with the partially written contents online, then the next time you perform a full parity check, the parity disk will then contain bits representing the partially written content of the failed drive.

If instead you un-assign the failed drive, start the server without it, then stop the server and re-assign the failed drive, it will reconstruct the contents onto the failed drive. This WILL have the contents that were originally being written to the "RED" drive when the write "failed".

Your call. If you were not "writing" anything critical to the drive when it was taken off-line, force it back online. It is nearly guaranteed to be incorrect somewhere, even if just in an update time-stamp written when the drive was being mounted. If you want the array online with parity protection as quickly as possible, force it back online. If you trust your other disks and parity, let them rebuild the failed drive.

Joe L.
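To make the two options above concrete, here is a toy sketch in Python. It assumes single-parity XOR across the data disks and, following the description above, that parity already reflects the contents the failed write was meant to store. The disk names and byte values are made up for illustration; this is not how unRAID itself is implemented.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Hypothetical array with three data disks; each "disk" is a 4-byte block.
disk1 = b"\x10\x20\x30\x40"
disk2 = b"\x01\x02\x03\x04"
intended = b"\xaa\xbb\xcc\xdd"  # what the failed write should have stored on disk3
stale = b"\xaa\xbb\x00\x00"     # what disk3 actually holds after the partial write

# Assumption: parity was kept up to date for the intended contents.
parity = xor_blocks([disk1, disk2, intended])

# Option 1: un-assign, re-assign, and rebuild disk3 from parity plus the other disks.
rebuilt = xor_blocks([disk1, disk2, parity])

# Option 2: force the disk back online; a later parity check recomputes
# parity from whatever the disks hold, including the stale data.
new_parity = xor_blocks([disk1, disk2, stale])

print("rebuild recovers intended data:", rebuilt == intended)
print("forced-online parity differs:", new_parity != parity)
```

In this model the rebuild returns the intended bytes, while forcing the disk online leaves parity describing the partially written data, so the intended contents can no longer be recovered from parity.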
May 21, 2010 (Author)

Thanks Joe L,

I ran a parity check 8 days ago with no errors and do trust the parity data, and there definitely was no write being performed at the time of the crash. I have unassigned the drive, rebooted, re-assigned it, and the rebuild has started.

I suspect this is linked to the controller. I have two identical sil3114 controllers; I only added the second last week, and the first has been running without issue for 2 years. It is the first controller that is now removed because of the error message. Do you think I can trust this controller? I have been unable to find any meaningful information on the error.

Jake