August 6, 2010 So it happened: a hard drive is blinking red, and unMENU is showing DSBL and DISK_DSBL. Under the SMART view I see "zzz...". I'm assuming my HD is dead, but can someone check my syslog and confirm? Edit: Seems like it's a write error, since the drive is disabled. Looks like I need a new drive; can someone please confirm? Edit2: While I'm at it, my parity drive shows the following under the SMART option: reallocated_sector_ct=1 » high_fly_writes=69 » head_flying_hours=1.97568e+12 » attribute_241=2.19267e+09 » attribute_242=622789139. Anything to be worried about? Thanks, and thank you unRAID! Edit: SMART test attached in the 7th post. syslog-2010-08-05.txt
August 6, 2010 This is the pertinent part of your syslog. It does tell us that you have had both read and write errors on this drive (disk2), but not much more than that.

Jul 27 04:56:17 Server kernel: ata6.00: status: { DRDY }
Jul 27 04:56:17 Server kernel: ata6: hard resetting link
Jul 27 04:56:18 Server kernel: ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
Jul 27 04:56:23 Server kernel: ata6.00: qc timeout (cmd 0xec)
Jul 27 04:56:23 Server kernel: ata6.00: failed to IDENTIFY (I/O error, err_mask=0x4)
Jul 27 04:56:23 Server kernel: ata6.00: revalidation failed (errno=-5)
Jul 27 04:56:23 Server kernel: ata6: hard resetting link
Jul 27 04:56:28 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jul 27 04:56:33 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:56:33 Server kernel: ata6: hard resetting link
Jul 27 04:56:38 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jul 27 04:56:43 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:56:43 Server kernel: ata6: hard resetting link
Jul 27 04:56:48 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jul 27 04:57:18 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:57:18 Server kernel: ata6: limiting SATA link speed to 1.5 Gbps
Jul 27 04:57:18 Server kernel: ata6: hard resetting link
Jul 27 04:57:23 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:57:23 Server kernel: ata6: reset failed, giving up
Jul 27 04:57:23 Server kernel: ata6.00: disabled
Jul 27 04:57:23 Server kernel: ata6: hard resetting link
Jul 27 04:57:28 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jul 27 04:57:33 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:57:33 Server kernel: ata6: hard resetting link
Jul 27 04:57:38 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jul 27 04:57:43 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:57:43 Server kernel: ata6: hard resetting link
Jul 27 04:57:48 Server kernel: ata6: link is slow to respond, please be patient (ready=0)
Jul 27 04:58:18 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:58:18 Server kernel: ata6: hard resetting link
Jul 27 04:58:23 Server kernel: ata6: COMRESET failed (errno=-16)
Jul 27 04:58:23 Server kernel: ata6: reset failed, giving up
Jul 27 04:58:23 Server kernel: ata6: EH complete
Jul 27 04:58:23 Server kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Jul 27 04:58:23 Server kernel: end_request: I/O error, dev sde, sector 598063903
Jul 27 04:58:23 Server kernel: md: disk2 read error
Jul 27 04:58:23 Server kernel: handle_stripe read error: 598063840/2, count: 1
Jul 27 04:58:33 Server kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Jul 27 04:58:33 Server kernel: end_request: I/O error, dev sde, sector 598063903
Jul 27 04:58:33 Server kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Jul 27 04:58:33 Server kernel: end_request: I/O error, dev sde, sector 598618815
Jul 27 04:58:33 Server kernel: md: disk2 write error
Jul 27 04:58:33 Server kernel: handle_stripe write error: 598063840/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 read error
Jul 27 04:58:33 Server kernel: handle_stripe read error: 598618752/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 read error
Jul 27 04:58:33 Server kernel: handle_stripe read error: 598618760/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 read error
Jul 27 04:58:33 Server kernel: handle_stripe read error: 598618768/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 read error
Jul 27 04:58:33 Server kernel: handle_stripe read error: 598618776/2, count: 1
Jul 27 04:58:33 Server kernel: md: recovery thread woken up ...
Jul 27 04:58:33 Server kernel: md: recovery thread has nothing to resync
Jul 27 04:58:33 Server kernel: sd 6:0:0:0: [sde] Result: hostbyte=0x04 driverbyte=0x00
Jul 27 04:58:33 Server kernel: end_request: I/O error, dev sde, sector 598618815
Jul 27 04:58:33 Server kernel: md: disk2 write error
Jul 27 04:58:33 Server kernel: handle_stripe write error: 598618752/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 write error
Jul 27 04:58:33 Server kernel: handle_stripe write error: 598618760/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 write error
Jul 27 04:58:33 Server kernel: handle_stripe write error: 598618768/2, count: 1
Jul 27 04:58:33 Server kernel: md: disk2 write error
Jul 27 04:58:33 Server kernel: handle_stripe write error: 598618776/2, count: 1

Your syslog doesn't contain any SMART information about the drive. First thing to do is reseat the drive, making sure all connections are secure. It could be something as simple as a loose cable. If that doesn't fix it, then run SMART on it and post the output here. Also, if there's anything in between the drive and the motherboard (such as a PCI or PCIe SATA controller card, a hot swap bay, etc.) you may want to take it out of the equation to determine that it isn't at fault. If none of this points to the culprit, then it could be your PSU. By the way, WOW you have a lot of power outages. Good thing you use a UPS.
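For a quick tally of the errors in an excerpt like the one above, a short script can count the md read/write errors per disk and collect the failing sectors per device. This is only a sketch: the two regular expressions are written against the line formats visible in this excerpt, and the names `summarize`, `MD_ERROR`, and `IO_ERROR` are made up for illustration.

```python
import re
from collections import Counter

# Matches lines such as "md: disk2 read error" (format taken from the excerpt).
MD_ERROR = re.compile(r"md: (disk\d+) (read|write) error")
# Matches lines such as "end_request: I/O error, dev sde, sector 598063903".
IO_ERROR = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")


def summarize(lines):
    """Count md read/write errors per disk and collect failing sectors per device."""
    md_errors = Counter()
    sectors = {}
    for line in lines:
        m = MD_ERROR.search(line)
        if m:
            md_errors[(m.group(1), m.group(2))] += 1
        m = IO_ERROR.search(line)
        if m:
            sectors.setdefault(m.group(1), set()).add(int(m.group(2)))
    return md_errors, sectors


# A few lines copied from the excerpt; a real run would read the syslog file.
sample = [
    "Jul 27 04:58:23 Server kernel: end_request: I/O error, dev sde, sector 598063903",
    "Jul 27 04:58:23 Server kernel: md: disk2 read error",
    "Jul 27 04:58:33 Server kernel: end_request: I/O error, dev sde, sector 598618815",
    "Jul 27 04:58:33 Server kernel: md: disk2 write error",
]
md_errors, sectors = summarize(sample)
print(md_errors)
print(sectors)
```

A tally like this only shows which disk and sectors were affected; it cannot tell a failing drive from a loose cable, which is why the reseat and SMART steps still apply.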
August 6, 2010 Author I will try re-seating the cable when I get home. I saw a post here on the forum about getting SMART data from a disabled drive (need to search for that). I will post that when I get home. As far as power goes, do you think that will have any effect on the drives? I do have a UPS, but the power in the condo I'm renting SUCKS. We constantly get power outages; I need to call the landlord. Thanks.
August 6, 2010 The UPS should protect the drives as well as the rest of the server. Unless of course it is malfunctioning...
August 7, 2010 Author Short tests seem to pass.

SMART overall-health self-assessment test result: PASSED
# 3 Short offline Completed without error 00% 9988 -

I'm running the long test now. Does it usually take forever? It still says 90% remaining after 30 mins. BTW: Would pulling out the HD and running SeaTools be as effective? That seems easier for me.
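For anyone who wants to check a self-test log entry like the one quoted above without reading the columns by eye, here is a rough sketch of a parser. It assumes the column order shown in that line (test number, test type, status, percent remaining, lifetime hours, LBA of first error) and only recognizes the "Short offline" and "Extended offline" test types; `parse_self_test` is a made-up helper name, not part of any tool.

```python
import re

# Column layout assumed from the quoted line:
#   "# 3 Short offline Completed without error 00% 9988 -"
SELF_TEST = re.compile(
    r"#\s*(?P<num>\d+)\s+"
    r"(?P<kind>Short offline|Extended offline)\s+"
    r"(?P<status>.+?)\s+"
    r"(?P<remaining>\d+)%\s+"
    r"(?P<hours>\d+)\s+"
    r"(?P<lba>\S+)\s*$"
)


def parse_self_test(line):
    """Return the fields of one self-test log entry, or None if it does not match."""
    m = SELF_TEST.search(line)
    if m is None:
        return None
    return {
        "num": int(m.group("num")),
        "kind": m.group("kind"),
        "status": m.group("status"),
        "remaining_percent": int(m.group("remaining")),
        "lifetime_hours": int(m.group("hours")),
        # "-" in the last column means no failing LBA was recorded.
        "first_error_lba": None if m.group("lba") == "-" else int(m.group("lba")),
        "passed": m.group("status") == "Completed without error",
    }


entry = parse_self_test("# 3 Short offline Completed without error 00% 9988 -")
print(entry)
```

The `passed` flag here only reflects the status text of that one entry; it says nothing about cabling or controller problems of the kind seen in the syslog.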
August 7, 2010 A long test on a large drive takes about 4 or 5 hours.
August 8, 2010 Author I have put all the SMART tests in a txt file. It looks like all of the tests passed? I'm not sure where to go from here. Thanks. Smart_Tests.txt
August 8, 2010 The disk seems to be working... It probably was a loose cable. To make the server re-construct the drive (remember, it was taken off-line because a write to it failed) you'll need to go through the following steps:

1. Stop the array by pressing "Stop".
2. On the "Devices" page, un-assign the failed disk.
3. On the main web-management page, press "Start" to start the array. Doing it with the disk un-assigned is exactly the same as having a failed drive, but it will also cause the unRAID array to forget the model/serial number of the drive.
4. Next, press "Stop" once more.
5. On the "Devices" page, re-assign the failed disk.
6. On the main web-management page, press "Start" to begin the re-construction of disk2 back onto itself. (It will think it is a replacement for itself because you made it forget the model/serial number of the drive in the step where you un-assigned it.)

Wait for the re-construction to complete. It will take a while. Hope your power stays up (very good thing that you have a UPS).

Joe L.
August 11, 2010 Author I ran SeaTools on all of my drives to double check the results, and they all passed. But it did freeze once while scanning the 750 GB drive, which is the one showing the write error. I'm starting to think it might be safer to replace the HD. Have any of you guys tried SpinRite? I'm thinking of giving it a go on the 750.
August 11, 2010 Assuming you don't need to recover data from the drive: if it is still under warranty, it doesn't make sense to spend time and effort patching it up instead of doing an RMA, since patching doesn't reduce the higher risk of future failures.
August 11, 2010 SpinRite might be able to find and recover data from unreadable sectors, but it will do nothing to correct a drive that "freezes". I'd RMA the drive. Your time and sanity are worth a lot more than an unreliable disk.
August 11, 2010 Author Do you guys think I would have any issues RMA'ing to Seagate? SeaTools passes fine (2nd try). Last question, promise: can I replace the 750 GB HD with my 1.5 TB parity drive while upgrading the parity to 2 TB, in one swoop? Or would I need to replace the 750 first, let it rebuild, then swap parity drives?
August 12, 2010 No, never do more than one disk replacement at once, unless you don't care about the data on the drive being replaced. If you care about the data on the 750 GB drive, first replace it and let unRAID rebuild the data onto it. After that completes, you can replace the parity drive. Or you could do it in the opposite order; it wouldn't really matter, as long as you do it one at a time.
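The one-at-a-time rule can be illustrated with a toy model of why a single parity drive can rebuild only one missing disk. The sketch below assumes parity is a byte-wise XOR across the data disks (an assumption about how the parity is calculated); the three tiny "disks" and the helper `xor_blocks` are invented for the example.

```python
from functools import reduce
from operator import xor


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    return bytes(reduce(xor, column) for column in zip(*blocks))


# Three tiny stand-ins for data disks, plus the parity computed from them.
disk1 = b"\x10\x20\x30"
disk2 = b"\x0f\xf0\xaa"
disk3 = b"\x01\x02\x03"
parity = xor_blocks([disk1, disk2, disk3])

# One disk missing: parity plus the surviving data disks reproduce it exactly.
rebuilt_disk2 = xor_blocks([parity, disk1, disk3])
print(rebuilt_disk2 == disk2)

# Two data disks missing: parity and disk1 only yield disk2 XOR disk3,
# which is not enough to recover either disk on its own.
combined = xor_blocks([parity, disk1])
print(combined == xor_blocks([disk2, disk3]))

# A data disk and the parity disk missing together is worse still: the
# survivors (disk1, disk3) carry no information about disk2 at all.
```

In this model, swapping the 750 GB data disk and the parity disk in the same step corresponds to the last case, which is why the replacements are done one after the other.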