Hello,
I'm wondering how I should proceed.
I don't currently have a suitable replacement disk and won't be able to get one until tomorrow morning (all the replacement disks I have are smaller),
and then I'll have to do a triple preclear on it.
Should I do any testing of the disk before replacing it? The system has been running fine for several months,
although in February I removed one of the three 2GB RAM modules because it was starting to have errors.
I started the RMA process with OCZ over a month ago. It was approved, but I have neither heard nor received anything since.
Disk2 of 11 redballed at 8:45 am today while I was out.
At 12:00 am, at line 5098 of the syslog, it started an mdcmd check NOCORRECT.
At 08:45 am, at line 5111, it started having errors:
Apr 1 00:00:01 storage kernel: mdcmd (357): check NOCORRECT
Apr 1 00:00:01 storage kernel:
Apr 1 00:00:01 storage kernel: md: recovery thread woken up ...
Apr 1 00:00:01 storage kernel: md: recovery thread checking parity...
Apr 1 00:00:01 storage kernel: md: using 1152k window, over a total of 1953514552 blocks.
Apr 1 00:02:52 storage emhttp: shcmd (75): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 00:18:32 storage emhttp: shcmd (76): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 03:37:39 storage emhttp: shcmd (77): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 04:22:39 storage emhttp: shcmd (78): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 05:10:46 storage emhttp: shcmd (79): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 06:18:54 storage emhttp: shcmd (80): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 06:35:43 storage emhttp: shcmd (81): /usr/sbin/hdparm -y /dev/sdf >/dev/null
Apr 1 07:15:28 storage kernel: mdcmd (358): spindown 16
Apr 1 08:45:11 storage kernel: sas: command 0xe3d66300, task 0xc372a000, timed out: BLK_EH_NOT_HANDLED
Apr 1 08:45:11 storage kernel: sas: Enter sas_scsi_recover_host
Apr 1 08:45:11 storage kernel: sas: trying to find task 0xc372a000
Apr 1 08:45:11 storage kernel: sas: sas_scsi_find_task: aborting task 0xc372a000
Apr 1 08:45:11 storage kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1701:mvs_abort_task:rc= 5
Apr 1 08:45:11 storage kernel: sas: sas_scsi_find_task: querying task 0xc372a000
Apr 1 08:45:11 storage kernel: /usr/src/sas/trunk/mvsas_tgt/mv_sas.c 1645:mvs_query_task:rc= 5
Apr 1 08:45:11 storage kernel: sas: sas_scsi_find_task: task 0xc372a000 failed to abort
Apr 1 08:45:11 storage kernel: sas: task 0xc372a000 is not at LU: I_T recover
Apr 1 08:45:11 storage kernel: sas: I_T nexus reset for dev 0000000000000000
Apr 1 08:45:11 storage kernel: sas: I_T 0000000000000000 recovered
Apr 1 08:45:11 storage kernel: sas: --- Exit sas_scsi_recover_host
Apr 1 08:45:11 storage kernel: ata1: translated ATA stat/err 0x41/04 to SCSI SK/ASC/ASCQ 0xb/00/00
Apr 1 08:45:11 storage kernel: ata1: status=0x41 { DriveReady Error }
Apr 1 08:45:11 storage kernel: ata1: error=0x04 { DriveStatusError }
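For anyone who wants to locate the same two events in their own syslog (the start of the parity check and the first error after it), here is a minimal Python sketch. The regular expressions are assumptions based only on the lines quoted above, and `find_events` is a hypothetical helper name; other drivers or kernel versions may word these messages differently.

```python
import re

# Patterns are assumptions taken from the excerpt above, not a complete
# list of what the kernel can log for a failing disk.
CHECK_START = re.compile(r"mdcmd \(\d+\): check", re.IGNORECASE)
ERROR_HINT = re.compile(r"timed out|DriveStatusError|DriveReady Error")

def find_events(lines):
    """Return (check_line, first_error_line) as 1-based line numbers.

    Either value is None if the corresponding event is not found.
    Only errors that appear after the parity check started are considered.
    """
    check_no = None
    error_no = None
    for no, line in enumerate(lines, start=1):
        if check_no is None and CHECK_START.search(line):
            check_no = no
        elif check_no is not None and ERROR_HINT.search(line):
            error_no = no
            break
    return check_no, error_no

sample = [
    "Apr 1 00:00:01 storage kernel: mdcmd (357): check NOCORRECT",
    "Apr 1 00:00:01 storage kernel: md: recovery thread checking parity...",
    "Apr 1 07:15:28 storage kernel: mdcmd (358): spindown 16",
    "Apr 1 08:45:11 storage kernel: sas: command 0xe3d66300, task 0xc372a000, timed out: BLK_EH_NOT_HANDLED",
]
print(find_events(sample))
```

To scan a real log, pass an open file object instead of the sample list, since `find_events` accepts any iterable of lines.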
Further down, I start getting stripe errors intermingled with more DriveReady errors.
I last accessed that drive around 3:30 am via AirVideo over a VPN connection, with no apparent trouble.
I tried getting a SMART report for the drive, which is not responding, and it said:
Statistics for /dev/sdb WDC_WD20EARS-00M_WD-WCAZA4631006
smartctl -a -d ata /dev/sdb
smartctl 5.39.1 2010-01-28 r3054 [i486-slackware-linux-gnu] (local build)
Copyright (C) 2002-10 by Bruce Allen, http://smartmontools.sourceforge.net
Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)
A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
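The last line of that output suggests retrying with `-T permissive`. A minimal sketch of doing so from Python follows; `smartctl_cmd` and `run_smartctl` are hypothetical helper names, and it assumes `smartctl` is on the PATH. Whether a drive that fails the identify command will return anything useful even in permissive mode is uncertain.

```python
import subprocess

def smartctl_cmd(device, permissive=1):
    """Build the smartctl argument list.

    '-T permissive' is added `permissive` times, since smartctl's own
    message says "add one or more '-T permissive' options".
    """
    cmd = ["smartctl", "-a", "-d", "ata"]
    for _ in range(permissive):
        cmd += ["-T", "permissive"]
    cmd.append(device)
    return cmd

def run_smartctl(device, permissive=1):
    """Run smartctl and return whatever it printed to stdout."""
    result = subprocess.run(smartctl_cmd(device, permissive),
                            capture_output=True, text=True)
    return result.stdout

# Show the command that would be run, without touching any disk.
print(" ".join(smartctl_cmd("/dev/sdb")))
```

Calling `run_smartctl("/dev/sdb")` would actually execute the command. The device name is the one from the report above and may change after a reboot or reseating the drive.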
In the unRaid FAQ it says:
http://lime-technology.com/wiki/index.php?title=FAQ#How_do_I_recover_from_a_hard_disk_failure.3F
Thanks for your time,
Bobby
tower.unRaid.status-array.fault.txt
syslog-2012-04-01.zip