October 17, 2012
I was in the process of upgrading a drive in my Unraid server (at least 12% into the rebuild) when I suddenly lost connectivity with the server. I cannot access the web main menu, unMenu, or telnet, and I cannot access any of the shared drives. I can still ping the server. The drive lights are all still flashing, so it seems to still be rebuilding the drive. What should I do? Just wait the 10 hours that it should take to complete, or restart the server? B2
October 18, 2012 Author
Well, since I did not get any response, I decided to wait. I dragged out a monitor and keyboard and hooked them up to the server. I could log in that way, but it was behaving very erratically (it kept booting me back to the login prompt). I waited until I thought the upgrade was done and then rebooted. It came back up and immediately went into a parity check. About an hour into it, it started spitting out parity errors by the boatload. I decided to let it continue fixing parity. Not sure that was the right call (I'm concerned it may screw up data on the other drives), because this morning I had lost connectivity again, and now I cannot get anything through the directly attached monitor and keyboard. The drive lights are still flashing like crazy.

Now some observations that I made when I did have terminal access:
- My memory (2G) was being totally consumed
- My processor load was at around 5 (I have an AMD X2 64 @ 2GHz)

So before starting the parity check, I upgraded my memory to 4G and added the unMenu swap space on my cache drive. I think the culprit is that I recently installed SABnzbd, SickBeard and CouchPotato.
Before I went to bed I left a telnet session up with top running, and it froze with:

top - 04:51:15 up 10:32, 2 users, load average: 4.01, 3.53, 3.85
Tasks: 111 total, 1 running, 110 sleeping, 0 stopped, 0 zombie
Cpu(s): 24.2%us, 66.6%sy, 0.0%ni, 1.5%id, 5.5%wa, 1.1%hi, 1.2%si, 0.0%st
Mem: 4151580k total, 4029776k used, 121804k free, 14712k buffers
Swap: 2072568k total, 108120k used, 1964448k free, 3850208k cached

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
1870 root 20 0 99.6m 892 788 S 104 0.0 423:01.56 shfs
1454 root 20 0 1692 580 520 S 44 0.0 139:27.30 syslogd
1458 root 20 0 1640 396 340 S 34 0.0 146:11.53 klogd
1743 root 20 0 0 0 0 S 4 0.0 37:32.40 unraidd
1621 root 20 0 0 0 0 D 3 0.0 16:01.23 mdrecoveryd
350 root 20 0 0 0 0 D 2 0.0 0:25.15 kswapd0
9022 root 20 0 216m 7436 3352 S 1 0.2 1:53.11 python
9053 root 20 0 201m 3044 2596 S 1 0.1 0:18.71 python
14929 root 20 0 2120 964 792 R 1 0.0 0:24.92 top
1 root 20 0 700 264 264 S 0 0.0 0:02.13 init
2 root 20 0 0 0 0 S 0 0.0 0:00.00 kthreadd
3 root RT 0 0 0 0 S 0 0.0 0:00.01 migration/0
4 root 20 0 0 0 0 S 0 0.0 0:00.03 ksoftirqd/0
5 root RT 0 0 0 0 S 0 0.0 0:00.02 migration/1
6 root 20 0 0 0 0 S 0 0.0 0:00.02 ksoftirqd/1
7 root 20 0 0 0 0 S 0 0.0 0:00.26 events/0
8 root 20 0 0 0 0 S 0 0.0 0:00.15 events/1

Now my question is: what should I do now?
- Reboot, disable SAB/SB/CP, and start the parity check again?
- Let it continue until I get access back? (It has already been running for about 13 hours.)
- Put the old drive back in and restart?

BTW I have 10 disks in my array for a total of 12.5T. Someone please help!! B2
October 18, 2012 Author
And I did get a status e-mail at 4:47am. Nothing since, and it's now 10am. Here is the e-mail:

Status: The unRaid array is resync/rebuilding parity.
Parity CHECK/RESYNC in progress, 21.0% complete, est. finish in 5308.7 minutes. Speed: 4843 kb/s.
Server Name: Ungol
Server IP: 192.168.15.211
Date: Thu Oct 18 04:47:14 EDT 2012
October 18, 2012 Author
I can't get a syslog...I have lost all connectivity! I am considering putting the old disk back in and rebooting...please advise soon.
October 18, 2012
Use the attached console to copy the syslog to the flash drive. No one can give advice without a syslog. See here: http://lime-technology.com/forum/index.php?topic=9880.0
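For anyone following along, copying the syslog from the console could look roughly like the sketch below. It assumes the live log is at /var/log/syslog (the location mentioned later in this thread) and that the flash drive is mounted at /boot, which is the usual unRAID layout; the function name and the timestamped file name are made up for illustration.

```shell
#!/bin/bash
# Copy the in-memory syslog to the flash drive so it survives a reboot.
# Assumptions: log lives at /var/log/syslog, flash is mounted at /boot.
# save_syslog and the output file name are hypothetical.
save_syslog() {
    src="${1:-/var/log/syslog}"
    dest_dir="${2:-/boot}"
    dest="$dest_dir/syslog-$(date +%Y%m%d-%H%M%S).txt"
    # sync flushes the write to the flash before a possible hard power-off
    cp "$src" "$dest" && sync && echo "$dest"
}
```

At the console you would just run `save_syslog` with no arguments and then attach the file it prints to your post.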
October 18, 2012
Since you cannot get a syslog in any way and you have no access, a reboot is your best call. I would advise the following:
- Turn off your system (you cannot do it in a nice way, so just press the button).
- Double check all your internal cabling to make sure nothing loose is screwing stuff up. If you suspect an issue, replace the cable.
- Take your USB drive and make a copy of everything on it, then delete all your .PLG files to make sure no plugins will be loaded when unraid restarts.
- Restart with the console attached and do a memory check (the option appears during the unraid boot cycle).
- If the memory check shows issues, replace the memory.
- If no memory errors occur, let the system start unraid. As soon as you get a login prompt, log in, go to /var/log and copy the syslog to the USB drive.
- Let the system start and see what happens. Chances are the array will not start because there are drive issues. Do not take action, but post the syslog here.

Good luck!
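The backup-then-delete step above can be sketched as a small shell function. This is only an illustration: the function name is hypothetical, it assumes you run it on a machine where the flash drive is mounted (for example at /boot on the server), and the backup directory must live somewhere other than the flash drive itself.

```shell
#!/bin/bash
# Back up the whole flash drive, then remove plugin files so nothing
# extra loads on the next boot.
# backup_and_disable_plugins is a made-up name; both paths are arguments.
backup_and_disable_plugins() {
    flash="$1"
    backup="$2"
    if [ -z "$flash" ] || [ -z "$backup" ]; then
        echo "usage: backup_and_disable_plugins FLASH_DIR BACKUP_DIR" >&2
        return 1
    fi
    mkdir -p "$backup" || return 1
    # Copy everything first; only delete if the copy succeeded.
    cp -R "$flash/." "$backup/" || return 1
    # Match .plg and .PLG alike.
    find "$flash" -type f -iname '*.plg' -exec rm -f {} +
}
```

Since the deleted files are still in the backup, you can copy individual plugins back one at a time later to find out which one misbehaves.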
October 18, 2012 Author
I changed my go file too:

#!/bin/bash
# Start the Management Utility
/usr/local/sbin/emhttp &
#/boot/unmenu/uu
#cd /boot/packages && find . -name '*.auto_install' -type f -print | sort | xargs -n1 sh -c

Basically I commented out unmenu, which I think loads all the packages that you were referring to. Will this do what you want?
October 18, 2012
"Where and what are the .PLG files? I can't find any?"
They are in /config/plugins. Also clear your /config/extra.
October 18, 2012
Commenting out unmenu in your go file is also a good thing, but (after you have backed up your flash drive) also remove all .PLG files and the files in the EXTRA directory.
October 18, 2012 Author
Ok, I do not have a config/plugin or config/extras directory. Since I ran a memtest last night, I skipped that step. I rebooted the machine and have attached the syslog. The Unraid web page says "Starting...".
syslog-restart-121018.txt
October 18, 2012
Ok, wait this out a bit to see how the unraid web page responds. I do not see anything unusual in the syslog at first glance.
October 18, 2012 Author
Ok, the status changed to "Started". Of course, this is the same thing that happened after I rebooted during the hang of the rebuild process. It was when I went to check parity that it hung this last time, with a boatload of errors. So I don't think the disk rebuilt properly. Should I start another parity check?
October 18, 2012
If your array now says "Started", that means your data should be fine. Do any drives show "red balls"? We want to avoid thinking we are running ok when a drive has actually failed. If all drives show up as green then you should be a-ok. It is a good thing to check parity, but I would not expect any errors.
October 18, 2012 Author
Ok, the web page now says "Started". However, I am concerned that the disk that I was upgrading when all of this started is now corrupt. When it hung during the rebuild process, I rebooted it when I thought it was done. It came up much like it is now. I then started a parity check to verify that the rebuild worked, it hung again, and that is where I'm essentially at now. Note that the parity check was spitting out parity errors continuously. If I start another parity check, I'm afraid I'll be in the same boat again. Is there a way to put the old disk back in and get me back to where I started? What should I do now?
October 18, 2012
I cannot tell you how you could reinsert your old disk; someone else might. But it would be a good idea to check the SMART results on the new drive. That will tell you if something is failing there.
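As a rough sketch of what checking SMART results could involve: `smartctl -A` from smartmontools prints the drive's attribute table, and the small filter below picks out a few attributes that commonly indicate trouble when their raw value is above zero. The function name and the choice of attributes are my own assumptions, and the filter expects the usual ten-column table with the raw value last; /dev/sdX is a placeholder for the new drive's device.

```shell
#!/bin/bash
# Filter a smartctl attribute table (read from stdin) down to a few
# attributes whose raw value (column 10) is non-zero.
# smart_red_flags and the attribute list are illustrative assumptions.
smart_red_flags() {
    awk '$2 ~ /^(Reallocated_Sector_Ct|Current_Pending_Sector|Offline_Uncorrectable|UDMA_CRC_Error_Count)$/ && $10 + 0 > 0 { print $2 "=" $10 }'
}

# Usage (placeholder device name):
#   smartctl -A /dev/sdX | smart_red_flags
```

No output means none of those counters are raised, which does not prove the drive is healthy. Posting the full `smartctl -a` report is still the better way to get advice.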
October 18, 2012 Author
Can you run a parity check and have it correct only one drive? I am more afraid that it might mess up data on my other drives. Or can you force it to rebuild the drive? B2
October 18, 2012 Author
So I started a parity check. I have attached the syslog here. These are the messages that are concerning me:

Oct 18 16:47:22 Ungol kernel: REISERFS warning: reiserfs-5090 is_tree_node: node level 39162 does not match to the expected one 1
Oct 18 16:47:22 Ungol kernel: REISERFS error (device md8): vs-5150 search_by_key: invalid format found in block 89473255. Fsck?
Oct 18 16:47:22 Ungol kernel: REISERFS error (device md8): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [30324 30462 0x0 SD]

parity_log.txt
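To see whether these filesystem errors are confined to one device, the syslog can be summarized per device. The sketch below counts "REISERFS error (device ...)" lines like the ones quoted above; the function name is hypothetical and the default path assumes the log is at /var/log/syslog.

```shell
#!/bin/bash
# Count REISERFS error lines per md device in a syslog file.
# reiserfs_errors_by_device is a made-up name for this sketch.
reiserfs_errors_by_device() {
    grep 'REISERFS error (device ' "${1:-/var/log/syslog}" |
        sed 's/.*REISERFS error (device \([^)]*\)).*/\1/' |
        sort | uniq -c
}
```

If only md8 shows up, the damage is likely limited to that one disk's filesystem. The kernel's own "Fsck?" hint suggests a filesystem check; a read-only `reiserfsck --check` on that md device is the usual next step, but treat that as my assumption and confirm the exact procedure for your unRAID version before running anything.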
October 19, 2012 Author
So, a recap of what happened: I was attempting to upgrade a 640GB drive to a 1TB drive in my system. Note that the 1TB drive was a drive that I had used in the system previously; I had replaced it with a 2TB drive about a week earlier. I did not pre-clear the drive...it probably would have helped if I had, but I do not have a spare SATA port. During the upgrade from 640GB to 1TB, the system went off into the weeds.

Currently, the system is recomputing parity and there are many errors (62961676 so far). I have verified that at least some of the files have been corrupted :( I am afraid that the newly calculated parity is now corrupted too :( I believe I have most of the files backed up on another system (thank goodness)!!

The question is how I should go about restoring them. I don't think the 1TB drive is bad, since it was operable a week ago. Should I delete the files on the unraid and then just copy the backed-up files back? I'll run SMART on the drive after the parity check completes, but it is saying there are 4334 minutes (72 hours) remaining :(

I believe the culprit of the original hang was adding SABnzbd/SickBeard/CouchPotato to my unraid system. I am running vanilla unraid now (4.7). Also, since I have the old disk untouched, it seems like there should be a way to either copy the data from it or downgrade back to it.
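Before deleting anything, one cautious option is to find out which files actually differ from the backup. The sketch below walks a backup directory and reports files that are missing from, or byte-for-byte different in, the array copy. The function name is hypothetical, and both directory arguments are placeholders for wherever the backup and the rebuilt disk are mounted.

```shell
#!/bin/bash
# Compare every file under BACKUP_DIR with the same relative path under
# ARRAY_DIR and report the ones that are missing or differ.
# list_mismatched_files is a made-up name; nothing is modified.
list_mismatched_files() {
    backup="$1"
    array="$2"
    ( cd "$backup" && find . -type f ) | while IFS= read -r rel; do
        if [ ! -f "$array/$rel" ]; then
            echo "MISSING $rel"
        elif ! cmp -s "$backup/$rel" "$array/$rel"; then
            echo "DIFFERS $rel"
        fi
    done
}
```

With that list in hand, only the affected files need to be copied back, and files that exist only on the array (and so are not in the backup) stay untouched.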
Archived
This topic is now archived and is closed to further replies.