October 29, 2010: Background info: unRAID ver. 4.5.4.
Original configuration:
Parity - 750GB SATA
Disk 1 - 750GB SATA
Disk 2 - 320GB PATA
I bought a 1TB SATA drive to replace the 320GB PATA drive, which was running low on space. I installed the new 1TB SATA drive in place of the 750GB SATA parity drive and let it check parity, which finished successfully. Then I removed the 320GB PATA drive and replaced it with the former 750GB SATA parity drive. When I start the array, the data rebuild begins and proceeds to about 50% of the entire process (I refresh the web interface once a minute or so to see how much has been done), and then the array hangs. The computer continues to work, the fans spin and the lights are on, but the web interface is not accessible and the monitor attached to the unRAID server does not display anything. I have now tried the rebuild twice, and both times it failed at exactly the same spot, i.e. about 50%. I ran a short SMART test on the 750GB SATA drive that's being rebuilt, and the report suggests that the drive is healthy. I have not run the long test. What should be my next step? Thank you in advance.
October 29, 2010: Did you check parity, or did you rebuild parity and then check it? Have you tried putting the original drive back in place? If that works, I'd try preclearing the troublesome drive. Give it a few runs to see how it does.
October 29, 2010 (Author): Thank you for your reply. I followed this procedure:
A special case exists when the new bigger disk is also bigger than the existing parity disk. In this case you must use your new disk to first replace parity, and then replace your small disk with your old parity disk:
1. Stop the array.
2. Power down the unit.
3. Replace smaller parity disk with new bigger disk.
4. Power up the unit.
5. Start the array.
6. Wait for Parity-Sync to complete.
7. Stop the array.
8. Power down the unit.
9. Replace smaller data disk with your old parity disk.
10. Power up the unit.
11. Start the array.
I tried putting the 320GB PATA drive back, only to get a message that the drive size is too small. What do you think needs to be done next?
October 29, 2010 (Author): I have started preclearing the 750GB SATA former parity drive by first unassigning it from the array and then running the preclear_disk.sh script. It should take another 2 hours, and then I will try to rebuild the data in the array. Tips are still welcome.
October 29, 2010: After a parity sync with the new parity drive, it's highly recommended to run a parity check.
October 29, 2010 (Author): Thank you for replying. This is what I got: Parity is Valid:. Last parity check < 1 day ago with no sync errors. Can I assume that parity was checked? I didn't run a parity check on purpose. Could it have been done automatically?
October 29, 2010: The same date/time stamp is used when you reconstruct a disk. The display is misleading you. Perform a separate parity check now that will read the data you just wrote. Until you perform that step, you'll not know if the data you wrote to the new parity drive is readable.
October 29, 2010 (Author): Joe, I took the offending drive out of the array to get it precleared. It's doing its thing now. The array is offline because of the warning that said that without that third drive the data is not protected. Are you suggesting I bring the array back online with only two drives (parity and Disk 1) and proceed with a parity check? If so, could you please direct me to the correct procedure for such an operation? Thank you.
October 29, 2010: I'm a bit confused... Are you running in degraded mode now? Do you have valid parity on the new 1TB drive? (I think you do.) The drive you are currently clearing... is it the old 750Gig parity drive? It sounds like you are doing OK... Just DO NOT format any drive, or initialize a new disk configuration, or press a button labeled "restore". All you need to do is assign the 750Gig drive to the failed drive's slot and press "Start". Joe L.
October 29, 2010 (Author): Joe, you are mostly correct. The array is stopped for now while I preclear the former parity drive to act as Disk 2, which failed to rebuild a couple of times. So far I have no indication that the drive is bad. I do have valid parity, according to the green dot next to the new 1TB parity drive and the message "Parity is Valid" when I take the array online. The drive I'm currently clearing is the old 750GB parity drive. So, once I'm done preclearing the former parity drive, I will assign it as Disk 2, just as the instructions I posted above suggest. Do I still need to perform the "separate parity check" you suggested earlier? If so, please let me know how. It's probably painfully obvious, but I don't want to screw anything up at this point. Thank you.
October 29, 2010: After you reconstruct disk2 you'll want to do a parity check to make sure the data you've just written to disk2 is readable. You might want to do that one in NOCORRECT mode. That can only be initiated from the command line by typing:
/root/mdcmd check NOCORRECT
It looks and acts exactly like the normal parity check but will not update parity to fix an error.
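For anyone wondering what a check-only pass amounts to, here is a minimal sketch. It assumes simple XOR parity across the data disks and uses made-up byte values; the function names are hypothetical and have nothing to do with how mdcmd is actually implemented. It only shows the idea: recompute parity from the data, compare it with what is stored, count mismatches, and write nothing.

```shell
# Sketch only: assumes single XOR parity; all names here are hypothetical.

# xor_parity D1 D2 -> prints the parity byte for two data bytes
xor_parity() { echo $(( $1 ^ $2 )); }

# rebuild_byte P D1 -> prints the missing disk's byte from parity and the surviving disk
rebuild_byte() { echo $(( $1 ^ $2 )); }

# check_stripe D1 D2 P -> succeeds if stored parity P matches; never writes anything
check_stripe() { [ $(( $1 ^ $2 )) -eq "$3" ]; }

# count_sync_errors: reads "D1 D2 P" triples from stdin and prints the mismatch count
count_sync_errors() {
    errors=0
    while read -r d1 d2 p; do
        [ -z "$d1" ] && continue
        check_stripe "$d1" "$d2" "$p" || errors=$((errors + 1))
    done
    echo "$errors"
}

# Three example stripes; only the last one has parity that does not match.
printf '165 60 153\n1 2 3\n7 7 1\n' | count_sync_errors
```

The same XOR that produces parity also rebuilds a missing disk, which is why a rebuilt disk is only as good as the parity and the other disks it was computed from.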
October 29, 2010 (Author): Thanks, will do. Hopefully the rebuild will complete after the preclear; otherwise I'll be in trouble, methinks.
October 30, 2010 (Author): Joe, unfortunately the rebuild failed again. I'm attaching a preclear screenshot. Is this drive bad?
October 30, 2010: Does not look bad to me.
October 30, 2010 (Author): Does it have anything to do with replacing the 320GB PATA drive with the 750GB SATA drive?
October 30, 2010: No... unRAID does not care... I've replaced two PATA drives in my own server with SATA drives in the same fashion. Time to get back to basics... Post a syslog. If you cannot, start a tail -f /var/log/syslog in one telnet session or on the system console and restart the reconstruction process. If it errors out, perhaps the syslog will provide a clue.
October 30, 2010 (Author): Joe, the syslog only keeps what's current, right? Once the server is shut down, the syslog is cleared? I'm asking because once the rebuild errors out I cannot access the array in any way, by console or via browser. Perhaps tail -f /var/log/syslog will save it even after a restart?
October 30, 2010: It will not save it, but when it dies the final lines will be on the screen.
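Since the syslog is lost on reboot, one possible workaround is to copy it to persistent storage at intervals while the rebuild runs. The sketch below is only an illustration: snapshot_log is a hypothetical helper, and the destination path in the usage comment is an assumption (on unRAID the flash drive is normally mounted at /boot).

```shell
# Sketch only: snapshot_log is a hypothetical helper, not part of unRAID.
# It copies the current log to a destination file and flushes it to disk,
# so the latest copy survives a hang followed by a power cycle.
snapshot_log() {
    src=$1
    dest=$2
    cp "$src" "$dest" && sync
}

# Possible usage from a telnet session (paths are assumptions), left commented
# out because it loops until interrupted:
#   while sleep 30; do snapshot_log /var/log/syslog /boot/syslog-rebuild.txt; done
```

Copying every 30 seconds means a few of the last lines before a hang can still be missed, and frequent writes add wear to a flash drive, so this is best kept to a troubleshooting session.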
October 30, 2010 (Author): Unfortunately, when the array died no new lines were displayed. I did save the syslog when I started the server, and I'm attaching it. Not sure where to go next. I disabled PATA in the BIOS, but this line still reports an error in the syslog:
Tower kernel: PIIX_IDE: probe of 0000:00:1f.1 failed with error -12
Attachment: syslog-2010-10-30.txt
October 30, 2010 (Author): After disabling PATA I'm running the rebuild again. Here are the errors that I have managed to capture so far. It looks like this time the rebuild passed the dreaded 50% mark where it used to hang the array:
Oct 30 12:18:38 Tower in.telnetd[2603]: connect from 192.168.1.110 (192.168.1.110)
Oct 30 12:18:41 Tower login[2604]: ROOT LOGIN on `pts/0' from `192.168.1.110'
Oct 30 13:12:06 Tower kernel: ata1.00: exception Emask 0x12 SAct 0x0 SErr 0x1000500 action 0x6
Oct 30 13:12:06 Tower kernel: ata1.00: BMDMA stat 0x5
Oct 30 13:12:06 Tower kernel: ata1: SError: { UnrecovData Proto TrStaTrns }
Oct 30 13:12:06 Tower kernel: ata1.00: failed command: READ DMA EXT
Oct 30 13:12:06 Tower kernel: ata1.00: cmd 25/00:00:47:df:37/00:04:18:00:00/e0 tag 0 dma 524288 in
Oct 30 13:12:06 Tower kernel: res 51/84:50:f7:e1:37/84:01:18:00:00/e0 Emask 0x12 (ATA bus error)
Oct 30 13:12:06 Tower kernel: ata1.00: status: { DRDY ERR }
Oct 30 13:12:06 Tower kernel: ata1.00: error: { ICRC ABRT }
Oct 30 13:12:06 Tower kernel: ata1: hard resetting link
Oct 30 13:12:06 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
Oct 30 13:12:06 Tower kernel: ata1.00: configured for UDMA/133
Oct 30 13:12:06 Tower kernel: ata1: EH complete
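As a reading aid for the captured lines, here is a small sketch that pulls the flags out of a libata "error: { ... }" line and prints a short description for two of them. The function names are hypothetical. The descriptions reflect the usual meaning of these ATA error bits: ICRC is an interface CRC error during the transfer and ABRT is an aborted command. Treating ICRC as a hint to look at the cable or connection is a common rule of thumb, not a diagnosis of this particular system.

```shell
# Sketch only: hypothetical helpers for reading libata error lines.

# ata_error_flags: reads syslog lines on stdin, prints the flags found inside
# "ataN.NN: error: { ... }" entries, one line of flags per matching entry
ata_error_flags() {
    sed -n 's/.*ata[0-9.]*: error: { \(.*\) }.*/\1/p'
}

# describe_flag FLAG -> prints a short description of an ATA error-register flag
describe_flag() {
    case $1 in
        ICRC) echo "interface CRC error during transfer (often cabling or connection)" ;;
        ABRT) echo "command aborted by the device" ;;
        *)    echo "not covered by this sketch" ;;
    esac
}

# Example with the line captured above
line='Oct 30 13:12:06 Tower kernel: ata1.00: error: { ICRC ABRT }'
for flag in $(printf '%s\n' "$line" | ata_error_flags); do
    echo "$flag: $(describe_flag "$flag")"
done
```

The same two flags are visible in the "res 51/84" line: the error register value 0x84 is the ICRC bit (0x80) plus the ABRT bit (0x04).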
October 30, 2010 (Author): The rebuild is done. The only thing I changed to achieve that was disabling PATA in the BIOS. The array is running a parity check now. I also did what Joe recommended right after the rebuild, and here's the result:
Tower login: root
Linux 2.6.32.9-unRAID.
root@Tower:~# /root/mdcmd check NOCORRECT
cmdOper=check
cmdResult=ok
root@Tower:~#
October 30, 2010: OK, now the web interface will show a parity check in progress. This time it is really a "check" and not a check/correct. It will show the errors as if it is correcting them, but it is not.
October 31, 2010 (Author): OK, the parity check is complete, with no errors found. I rebooted the server to see if the earlier errors would show up, and only one error remains:
Oct 30 20:42:33 Tower kernel: PIIX_IDE: probe of 0000:00:1f.1 failed with error -12
I'm attaching the entire syslog after a fresh boot. Could you please shed some light on what this error is and whether I should worry about it? I'm assuming that the rest of the syslog looks OK, although the following line worries me:
Oct 30 20:42:33 Tower kernel: ata1: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
I thought that a SATA drive should work at 3 Gbps. There isn't any jumper on the drive as far as I know. Thanks for looking.
Attachment: syslog-2010-10-302.txt
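On the link-speed question, the two hex values in that line can be decoded by hand. The sketch below assumes the standard SATA register layout, where bits 3:0 are DET and bits 7:4 are SPD in both SStatus and SControl; the function names are hypothetical. In SStatus, SPD is the negotiated speed. In SControl, SPD is the highest speed the host allows, with 0 meaning no limit.

```shell
# Sketch only: hypothetical helpers, assuming the standard SStatus/SControl layout.

# decode_det HEX -> prints the DET field (bits 3:0); 3 in SStatus means a device
# is present and communication is established
decode_det() { echo $(( 0x$1 & 0xF )); }

# decode_spd HEX -> prints the SPD field (bits 7:4)
decode_spd() { echo $(( (0x$1 >> 4) & 0xF )); }

# spd_name N -> prints the generation for an SPD value
spd_name() {
    case $1 in
        0) echo "none negotiated / no limit" ;;
        1) echo "Gen1 (1.5 Gbps)" ;;
        2) echo "Gen2 (3.0 Gbps)" ;;
        3) echo "Gen3 (6.0 Gbps)" ;;
        *) echo "reserved" ;;
    esac
}

# Values from the syslog line above
echo "SStatus 113:  negotiated speed $(spd_name "$(decode_spd 113)")"
echo "SControl 310: host speed limit $(spd_name "$(decode_spd 310)")"
```

Read this way, SControl 310 suggests the host side is limiting the link to Gen1, which would explain 1.5 Gbps without any jumper on the drive. Whether that limit comes from the controller's capabilities, a driver setting, or a downshift after earlier link errors cannot be told from this one line.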