eltigro Posted May 28, 2010

Hi there,

I logged in today and found that my tower is reporting an offline drive, 'sdc'. It has been removed from the array because it has 3 errors. I don't know if it's a drive problem or not. Below is the smartctl output, but I'm not sure what it means. I've also attached a syslog; I chopped it down a bit as it was the same message repeated over and over. Can anyone help?

root@tower:~# smartctl -a -d ata /dev/sdc
smartctl version 5.38 [i486-slackware-linux-gnu] Copyright © 2002-8 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

A mandatory SMART command failed: exiting. To continue, add one or more '-T permissive' options.
root@tower:~#

The drive is still accessible through \\tower\disk1, so I'm not entirely sure what to do. There has been no disruption to the unit or drives recently. Should I just do a "Trust My Array"?

Many thanks,
Kev

Attached: syslog-a.txt
Joe L. Posted May 28, 2010

DO NOT USE "TRUST MY ARRAY" UNLESS YOU DO NOT WANT ANY OF THE DATA ON THE FAILED DISK!!!!!

It has failed. It is not responding to the smartctl command.

Do not press the button marked "Restore". It is actually a "Delete Disk Configuration and Immediately Invalidate Parity" button. Pressing it will throw away the parity data that is currently allowing the parity drive and the other drives in your array to "simulate" the failed disk.

Joe L.
Joe L. Posted May 28, 2010

Your syslog shows these errors:

May 28 13:14:36 tower kernel: ata5.00: qc timeout (cmd 0xec)
May 28 13:14:36 tower kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 28 13:14:36 tower kernel: ata5.00: revalidation failed (errno=-5)
May 28 13:14:36 tower kernel: ata5: hard resetting link
May 28 13:14:37 tower ntpd[1576]: synchronized to 88.198.39.175, stratum 3
May 28 13:14:37 tower ntpd[1576]: time reset +0.176077 s
May 28 13:14:42 tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May 28 13:15:12 tower kernel: ata5.00: qc timeout (cmd 0xec)
May 28 13:15:12 tower kernel: ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)
May 28 13:15:12 tower kernel: ata5.00: revalidation failed (errno=-5)
May 28 13:15:12 tower kernel: ata5.00: disabled
May 28 13:15:12 tower kernel: ata5: hard resetting link
May 28 13:15:12 tower kernel: ata5: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
May 28 13:15:12 tower kernel: ata5: EH complete
May 28 13:15:12 tower kernel: sd 5:0:0:0: [sdc] Unhandled error code
May 28 13:15:12 tower kernel: sd 5:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00
May 28 13:15:12 tower kernel: sd 5:0:0:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 00 5a 1f 00 00 10 00
May 28 13:15:12 tower kernel: end_request: I/O error, dev sdc, sector 23071
May 28 13:15:12 tower kernel: md: disk1 write error
May 28 13:15:12 tower kernel: handle_stripe write error: 23008/1, count: 1
May 28 13:15:12 tower kernel: md: disk1 write error
May 28 13:15:12 tower kernel: handle_stripe write error: 23016/1, count: 1
May 28 13:15:12 tower kernel: md: recovery thread woken up ...
May 28 13:15:13 tower kernel: md: recovery thread has nothing to resync
May 28 13:15:13 tower kernel: sd 5:0:0:0: [sdc] Unhandled error code
May 28 13:15:13 tower kernel: sd 5:0:0:0: [sdc] Result: hostbyte=0x04 driverbyte=0x00
May 28 13:15:13 tower kernel: sd 5:0:0:0: [sdc] CDB: cdb[0]=0x2a: 2a 00 00 00 00 bf 00 00 08 00
May 28 13:15:13 tower kernel: end_request: I/O error, dev sdc, sector 191
May 28 13:15:13 tower kernel: md: disk1 write error
May 28 13:15:13 tower kernel: handle_stripe write error: 128/1, count: 1

The disk controller was re-initialized several times in an attempt to re-establish communications with the disk. It failed when attempting to write to the disk.

You could:
1. Stop the array (press the "Stop" button)
2. Power down
3. Verify the power and data cables to disk1 are not loose
4. Power up

If you can then get a SMART report, we can proceed.

Otherwise, get yourself a new disk drive. Then:
1. Power down
2. Install it on the same port as the old one
3. Power up
4. Press "Start" to begin the process of reconstructing the old contents onto the replacement drive

Keep your fingers off the button labeled "Restore". It will throw away your ability to reconstruct the old contents onto the new drive. As I said, it has NOTHING to do with restoring data. It will delete the disk configuration and initialize a new configuration based on the WORKING drives, immediately invalidate parity (as if you had never calculated it), and begin calculating new parity for the first time. (Basically, your old disk would not be in the new config, nor would its data.)

Joe L.
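For anyone facing a long syslog full of repeats like the one above, a small filter can boil it down to the lines that matter. The following is only a sketch: `summarize_disk_errors` is a made-up helper name, not an unRAID or smartmontools tool, and it assumes the kernel messages look exactly like the ones quoted in this thread. It counts "failed to IDENTIFY" events (for any ATA port, since those lines name the port rather than the device) and lists the sectors that returned I/O errors for the given device.

```bash
#!/bin/bash
# Hypothetical helper (not part of unRAID): summarize kernel errors for one disk.
# Reads syslog text on stdin; $1 is the device name as it appears in the log, e.g. sdc.
summarize_disk_errors() {
    local dev="$1"
    awk -v dev="$dev" '
        # Matches lines such as: "end_request: I/O error, dev sdc, sector 23071"
        $0 ~ ("I/O error, dev " dev ",") {
            io++
            # The sector number is the last field on the line.
            sectors = sectors (sectors == "" ? "" : " ") $NF
        }
        # Matches lines such as: "ata5.00: failed to IDENTIFY (I/O error, err_mask=0x4)"
        # These name the ATA port, not the device, so they are counted for all ports.
        /failed to IDENTIFY/ { ident++ }
        END {
            printf "identify_failures=%d io_errors=%d sectors=%s\n", ident, io, sectors
        }
    '
}
```

Assuming the live log is at /var/log/syslog, it could be run as `summarize_disk_errors sdc < /var/log/syslog`. Repeated IDENTIFY failures alongside write errors point at the drive not answering at all, which can be the drive itself or its cabling.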
eltigro Posted May 28, 2010

Hi Joe,

Yeah, I wouldn't hit the RESTORE button; I read up about that. After I posted I did some digging. The drive is toast.

Am I right in assuming everything will 'appear' as normal, i.e. all files can still be seen because parity is being used to present the files that would normally appear? The difference being that there is no protection on the system in the event of another disk failure.

I removed the drive, rebooted, then reselected it to add it back into the array, picking the option to start/rebuild, just in case the errors were a glitch. It then threw up 292 errors. So I have now removed the duff drive and will be replacing it with a 1.5TB drive on Tuesday.

I'm guessing I follow the normal drive upgrade procedure:
1. Make sure the old drive is de-selected from devices
2. Reboot
3. Assign the new drive in its place
4. Start/expand the array

BTW, you're an asset to this community, big style. I don't know how you manage to do it.

TIA,
Kev
Joe L. Posted May 29, 2010

eltigro wrote:
> Hi Joe, Yeah, I wouldn't hit the RESTORE button; I read up about that. After I posted I did some digging. The drive is toast. Am I right in assuming everything will 'appear' as normal, i.e. all files can still be seen because parity is being used to present the files that would normally appear? The difference being that there is no protection on the system in the event of another disk failure.

Exactly. You are currently not protected from a second disk failing. It is parity in combination with all the other drives that is simulating the missing drive.

> I removed the drive, rebooted, then reselected it to add it back into the array, picking the option to start/rebuild, just in case the errors were a glitch. It then threw up 292 errors. So I have now removed the duff drive and will be replacing it with a 1.5TB drive on Tuesday.

Sounds good. (Or rather, the disk sounds like it is bad.)

> I'm guessing I follow the normal drive upgrade procedure: make sure the old drive is de-selected from devices, reboot, assign the new drive in its place, start/expand the array. BTW, you're an asset to this community, big style. I don't know how you manage to do it. TIA, Kev

Actually, all you need to do is:
1. Power down
2. Put the replacement drive in place of the defective drive
3. Power up
4. Press "Start" (after checking the "I'm sure" checkbox under it)

The reconstruction will begin. You can use the array while this is in progress, but it will slow the reconstruction down a little bit if you do. I once played 4 different ISO images from the missing "simulated" drive while it was being reconstructed in one of my early tests of my server.

There is no need to un-assign the old drive, nor to assign the new one. unRAID will see you are replacing the drive and deal with the disk assignment on its own, as long as you use the same disk controller port.

Joe L.
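The "simulated" disk described above can be illustrated with a toy example. This sketch assumes single parity computed as a bitwise XOR across the data disks; it works on one byte value per disk and is in no way unRAID's actual code. The helper name `xor_bytes` and the sample values are made up for illustration.

```bash
#!/bin/bash
# Illustrative only: XOR together one byte value from each disk passed as an argument.
# Assumption: parity is the XOR of the same byte position on every data disk.
xor_bytes() {
    local acc=0 b
    for b in "$@"; do
        acc=$(( acc ^ b ))
    done
    echo "$acc"
}

# Three data disks hold the byte values 90, 31 and 16 at some position.
parity=$(xor_bytes 90 31 16)

# If the first disk stops responding, XOR-ing parity with the surviving
# disks yields the missing value again.
missing=$(xor_bytes "$parity" 31 16)
echo "parity=$parity reconstructed=$missing"
```

The same arithmetic shows why a second failure is not survivable in this scheme: with two values unknown, one parity value is not enough to recover either of them.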
eltigro Posted June 2, 2010

Hi Joe,

So I received the new 1.5TB drive. As I had unassigned the old 750GB drive, I went into devices and assigned the new 1.5TB drive to 'disk1', on the same controller port and cable (using an incybox 5-in-3). I went back to main, the GUI offered to rebuild, and I ticked the box to confirm and started the array. It seemed to go off fine and reported about 5 hours of rebuild time.

I came back about 30 minutes later and it was idle and online, except I now have a red dot next to my 'disk1' and the array is unprotected. When I try to access information on the disk through \\tower\disk1, all the other drives flash, so I'm assuming it is still simulating the data from parity. Prior to the reboot there were a lot of write errors to the drive, so I'm guessing either the drive is a lemon or for some reason I can't rebuild from parity. I've attached a syslog. How do I initiate a rebuild?

****** UPDATE ******

So I removed the drive and managed to initiate the rebuild process, but it bombed out within a minute, and then the temperature gets set to '0' degrees. The SMART report says:

Smartctl: Device Read Identity Failed (not an ATA/ATAPI device)

Same as before. Maybe a cable issue? I'm off to give that a try. I've attached another syslog, called syslog2.

What can I do?

Kevin

Attached: syslog.txt
eltigro Posted June 2, 2010

Here is the second syslog.

Attached: syslog2.txt
Joe L. Posted June 2, 2010

At this point I would suspect the POWER cable, the SATA cable, the PORT on the drive controller, or the connector on your 5-in-3 adapter.

To try to get it to rebuild once more, you need to:
1. Un-assign disk1
2. Start the array
3. Stop the array
4. Re-assign disk1
5. Start the array once more

This is, of course, after stopping the array, powering down, securing the loose cable (if that is what it is), and then powering up once more.

What command, "exactly", did you type to attempt to get the SMART report? It should have been run against /dev/sdb.

Also, be aware that your 4.5.3 version of unRAID has a serious bug whereby, when first powering up the array, all the disks could show as unformatted. DO NOT PRESS THE FORMAT BUTTON!!! Instead, just stop the array and then press "Start" once more. (And then update to 4.5.4, where that bug is fixed.) As long as you do not press the FORMAT button, you will not lose data, but there have been people who did press it and formatted all their drives... and it was painful to un-format them.

Also, DO NOT press the button labeled "Restore", as it is actually a "Delete Disk Configuration and Immediately Invalidate Parity" button. It is NOT what you want to do with a failed disk.

Joe L.
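Since the same smartctl failure shows up twice in this thread, here is a rough way to tell "the drive did not answer" apart from "a report came back" after reseating cables. It is a sketch only: `smart_report_status` is a hypothetical helper, and it recognizes nothing beyond the two failure messages quoted earlier in the thread, so anything else is treated as a report.

```bash
#!/bin/bash
# Hypothetical helper: classify smartctl output supplied on stdin.
# Only the two failure messages seen in this thread are recognized.
smart_report_status() {
    local output
    output=$(cat)
    if [ -z "$output" ]; then
        # Nothing was printed at all (for example, the command was not found).
        echo "no-output"
    elif printf '%s\n' "$output" | grep -q \
            -e 'Device Read Identity Failed' \
            -e 'A mandatory SMART command failed'; then
        # The drive did not identify itself: check power, data cable and port.
        echo "no-response"
    else
        echo "report-received"
    fi
}
```

It could be used as `smartctl -a -d ata /dev/sdb | smart_report_status`, substituting whichever device name the disk currently has. A "no-response" result says nothing about whether the drive or the cabling is at fault, only that the drive could not be reached.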
eltigro Posted June 2, 2010

Yeah, I ran it against sdb.

Yup, I replaced the cable and all seems OK so far! I'm gutted because the cabling was so neat. Maybe the 750GB wasn't toast after all.

Thanks a mill!! Will keep you posted.

Kev
eltigro Posted June 3, 2010

Just to let you know, the rebuild went fine. Must have been those cables!

Cheers again,
Kev
Joe L. Posted June 3, 2010

Great news... Now, update to unRAID 4.5.4 before you get caught by the bug in 4.5.3 that makes all your disks appear un-formatted.

Joe L.
eltigro Posted June 3, 2010

Ha, yeah, the unformatted bug did kick in. I just ignored it, though. All good.

Kev