January 8, 2009

Hi, this is a 'test before I buy' unRAID build using the following hardware:

Atom CPU/MB (D945GCLF)
2GB RAM (no errors in 10 passes of Memtest)
Sil3114 PCI SATA controller
1TB WD GreenDrive (via PCI)
500GB WD GreenDrive (via PCI)
1TB Seagate (set as parity, connected to MB)

Since the error first occurred (did not get the log), I would power on the system and allow a parity-check to run overnight. After confirming the check is complete and valid, I would press Stop. This sends the current browser window into a "Waiting..." state from which it never recovers. Refreshing and opening other browser windows will not connect (it will not time out either). I can still ping and telnet into the system, however. This would happen in both 4.4 and 4.4.2, and I have tried doing this with the drives spun up or down.

The log stops after the following:

Jan 7 19:29:33 UnRaid emhttp: Spinning up all drives...
Jan 7 19:29:51 UnRaid emhttp: shcmd (60): /etc/rc.d/rc.samba stop >/dev/null
Jan 7 19:29:51 UnRaid emhttp: shcmd (61): /etc/rc.d/rc.nfsd stop >/dev/null
Jan 7 19:29:52 UnRaid emhttp: Spinning up all drives...
Jan 7 19:29:52 UnRaid emhttp: shcmd (62): sync
Jan 7 19:29:52 UnRaid emhttp: shcmd (63): umount /mnt/user
Jan 7 19:29:52 UnRaid emhttp: shcmd (64): rmdir /mnt/user
Jan 7 19:29:52 UnRaid emhttp: shcmd (65): umount /mnt/disk1
Jan 7 19:29:52 UnRaid emhttp: shcmd (65): umount /mnt/disk2
Jan 7 19:29:53 UnRaid emhttp: shcmd (66): rmdir /mnt/disk1
Jan 7 19:45:15 UnRaid emhttp: shcmd (65): /usr/sbin/hdparm -y /dev/sdc >/dev/null
Jan 7 19:45:15 UnRaid emhttp: shcmd (66): /usr/sbin/hdparm -y /dev/sdb >/dev/null
Jan 7 19:45:15 UnRaid emhttp: shcmd (67): /usr/sbin/hdparm -y /dev/sda >/dev/nu

From a different thread, I've read that to restart the browser process, use "/usr/local/sbin/emhttp &". When that is done, the command fails to kill the original process and starts a second one anyways.
The browser interface is still stuck until I kill the second, newer process. Now with the browser working, both my drives show up as Unformatted (I am assuming the drives are somehow busy from the first Stop). Stopping the array again causes the browser interface to become unresponsive.

The log this time is:

Jan 7 20:06:29 UnRaid emhttp: unRAID System Management Utility version 4.4.2
Jan 7 20:06:29 UnRaid emhttp: Copyright (C) 2005-2008, Lime Technology, LLC
Jan 7 20:06:29 UnRaid emhttp: Unregistered
Jan 7 20:06:29 UnRaid emhttp: shcmd (1): cp /boot/config/passwd /etc
Jan 7 20:06:29 UnRaid emhttp: shcmd (2): cp /boot/config/smbpasswd /etc/samba/private
Jan 7 20:06:29 UnRaid emhttp: Device inventory:
Jan 7 20:06:29 UnRaid emhttp: pci-0000:00:1f.2-scsi-0:0:0:0 (sdc) ata-ST31000333AS_9TE0EWJD
Jan 7 20:06:29 UnRaid emhttp: pci-0000:04:00.0-scsi-2:0:0:0 (sda) ata-WDC_WD10EACS-00ZJB0_WD-WCASJ1234768
Jan 7 20:06:29 UnRaid emhttp: pci-0000:04:00.0-scsi-3:0:0:0 (sdb) ata-WDC_WD5000AACS-00ZUB0_WD-WCASU1780313
Jan 7 20:06:52 UnRaid emhttp: shcmd (3): rmmod md-mod >>/var/log/go 2>&1
Jan 7 20:06:52 UnRaid emhttp: shcmd: shcmd (3): exit status: 1
Jan 7 20:06:52 UnRaid emhttp: shcmd (4): modprobe md-mod super=/boot/config/super.dat slots=8,32,8,16,8,0 >>/var/log/go 2>&1
Jan 7 20:06:53 UnRaid emhttp: shcmd (5): /etc/rc.d/rc.samba stop >/dev/null
Jan 7 20:06:53 UnRaid emhttp: shcmd: shcmd (5): exit status: 1
Jan 7 20:06:53 UnRaid emhttp: shcmd (6): /etc/rc.d/rc.nfsd stop >/dev/null
Jan 7 20:06:54 UnRaid emhttp: shcmd (7): cp /etc/exports- /etc/exports
Jan 7 20:06:54 UnRaid emhttp: shcmd (8): /etc/rc.d/rc.samba start >/dev/null
Jan 7 20:06:54 UnRaid emhttp: shcmd (9): /etc/rc.d/rc.nfsd start >/dev/null
Jan 7 20:06:54 UnRaid emhttp: main: can't bind listener socket: Address already in use
Jan 7 20:07:14 UnRaid kernel: mdcmd (97): stop
Jan 7 20:07:14 UnRaid kernel: md: 2 devices still in use.
Jan 7 20:07:14 UnRaid emhttp: shcmd (68): cp /etc/exports- /etc/exports
Jan 7 20:07:14 UnRaid emhttp: shcmd (69): /etc/rc.d/rc.samba start >/dev/null
Jan 7 20:07:14 UnRaid emhttp: shcmd (70): /etc/rc.d/rc.nfsd start >/dev/null
Jan 7 20:07:15 UnRaid emhttp: Spinning up all drives...
Jan 7 20:07:15 UnRaid emhttp: Spinning up all drives...
Jan 7 20:07:28 UnRaid emhttp: shcmd (71): /etc/rc.d/rc.samba stop >/dev/null
Jan 7 20:07:28 UnRaid emhttp: shcmd (72): /etc/rc.d/rc.nfsd stop >/dev/null
Jan 7 20:07:29 UnRaid emhttp: Spinning up all drives...
Jan 7 20:07:29 UnRaid emhttp: shcmd (73): sync

From here, I attempted to do a clean shutdown and restart of the system. Any pointers as to what I can try or what the problem is?
January 8, 2009

I will need to see the entire system log. You can copy it to the Flash via telnet or console:

cp /var/log/syslog /boot/syslog.txt

This command copies to the root of the flash where it will appear in the 'flash' share, or you can plug the flash into your PC.

Something you might try is reassigning your drives so that only one is assigned, say to disk1 (not parity). Then repeat 'Stop' command. Do this until you find the drive/port that's causing the problem.
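Once the syslog has been copied, the last command emhttp logged gives a quick hint at where 'Stop' stalled. This is only a minimal sketch: it assumes the log lines look like the excerpts earlier in the thread (`emhttp: shcmd (N): <command>`), and the function name `last_shcmd` is made up for illustration.

```shell
#!/bin/sh
# Print the last "shcmd (N): ..." line that emhttp wrote to a syslog file.
# Assumption: lines have the form "... emhttp: shcmd (N): <command>".
# "exit status" lines ("emhttp: shcmd: shcmd (N): ...") are not matched.
last_shcmd() {
    # $1: path to the syslog (defaults to the live log)
    grep 'emhttp: shcmd (' "${1:-/var/log/syslog}" | tail -n 1
}

# Usage (not run here):
#   last_shcmd /boot/syslog.txt
```

If the last entry is a `sync`, `umount`, or `rmdir`, that is most likely the step that never returned, though the full log is still needed to see why.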
January 8, 2009

From a different thread, I've read that to restart the browser process, use "/usr/local/sbin/emhttp &" When that is done, the command fails to kill the original process and starts a second one anyways. The browser interface is still stuck until I kill the second, newer process. Now with the browser working, both my drives show up as Unformatted (I am assuming the drives are somehow busy from the first Stop). Stopping the array again causes the browser interface to become unresponsive.

First, you misunderstood my post where I described how to restart the emhttp process. ONLY do that if emhttp has been killed by the OS (because you ran out of RAM) and does not currently exist. I have no idea what it will do if two are run at the same time, but I know one thing for sure: it will not work normally.

I've never had to "kill" my existing emhttp process because it was unresponsive. The only time it is unresponsive is if it is clearing a disk when adding a new one. The only time "stopping" an array has caused errors is if the power supply was not up to it, or there was an interrupt conflict or DMA errors of some kind. There are many times where you will need to press the "Refresh" button on the browser to get it to update.

The process of spinning up all the drives in some versions of unRAID caused problems on certain hardware. Try the newer 4.4.2 version released yesterday. It is the best version at this time.

In any case, if you type:

ps -ef

at the command line and see an existing emhttp process, do not start another one.

You gave tiny pieces of your syslog. To help, we really need to see the whole log, before you reboot. I suggest you open a telnet window and run

tail -f /var/log/syslog

It will let you see the errors as they are logged.

I suspect your motherboard will need one of the "boot codes" as described in the wiki.
http://lime-technology.com/wiki/index.php?title=Boot_Codes

Joe L.
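The "check with ps before starting another emhttp" advice could be scripted roughly as follows. It is only a sketch: `emhttp_running` is a hypothetical helper name, and it assumes the process shows up in `ps -ef` with `emhttp` somewhere in its command line.

```shell
#!/bin/sh
# Succeeds if the process listing on stdin contains an emhttp entry.
# The [e] bracket keeps a grep process that appears in the listing
# from matching its own command line.
emhttp_running() {
    grep -q '[e]mhttp'
}

# Usage (not run here): start emhttp only when none exists.
#   if ps -ef | emhttp_running; then
#       echo "emhttp is already running; not starting another one"
#   else
#       /usr/local/sbin/emhttp &
#   fi
```

Reading the listing from stdin keeps the check separate from the start command, so it can be tried out safely before anything is launched.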
January 9, 2009 (Author)

First, you mis-understood my post where I described how to re-start the emhttp process.

Oops, you're right. That was only if the process isn't running.

Something you might try is reassigning your drives so that only one is assigned, say to disk1 (not parity). Then repeat 'Stop' command. Do this until you find the drive/port that's causing the problem.

Over the weekend, I tried adding the data drives one at a time with no problems. I would power on, add a drive, format the drive, and check to see if it's accessible over the network. If all was OK, I would stop the array, power down, and then add the second drive and repeat, with the parity drive being the last one added; the sync worked fine. I'm not sure if the problem could be caused by the parity drive, the parity-check, or the long inactivity after the parity-check. I'll try putting the parity drive on a different cable and port as well as playing with the boot codes.

Anyways, I've already rebooted, so attached is the log of a duplication of the problem.
January 9, 2009

Like h2tran, I am also in the "try before you buy" phase. I have been running a 9TB WHS system since the WHS alpha and am planning to migrate to something else... and unRAID looks like just the ticket.

My first eval installation resulted in a bricked on-mobo e1000 LAN interface (more on this in a separate posting). So instead I turned to a good ol' reliable beast: my handbuilt Frankenbox that I had been using for my WHS server. The system is based on a SuperMicro X5DAL-TG2 mobo with 2 x 3.06GHz Xeons and 4GB of ECC RAM. Storage is 8 x 1TB data drives connected to a SuperMicro AOC-SAT2-MV8 and two additional drives (intended for parity and possibly cache) connected to the on-mobo SiI3112. This configuration is tried, tested, and absolutely rock solid... I have been running it for years.

To evaluate unRAID, I scribbled 4.4.1 onto a USB stick and booted up the system. For speed of testing, I only used three drives: two data drives connected to the AOC-SAT2-MV8 and one parity drive connected to the SiI3112. The system installed and configured fine, and I had an array successfully built with parity testing OK. Then, when I tried to stop it, I ran into the same "web GUI freezes but network and console access are still OK" issue as h2tran. I have attached my syslog file from 4.4.1 to this message.

With the system failing (and having discovered this thread via Google), I tried upgrading to 4.4.2... same problem. So just for grins I tried downgrading to 4.3.3... no joy there either.

Based on what I've seen here, and on 30 years of experience as a developer, I naturally started simplifying as much as possible... turning off unused devices in the BIOS, moving all drives onto the AOC-SAT2-MV8 only, moving the AOC-SAT2-MV8 to another slot, trying umpteen combinations of boot parameters, etc. All fruitless.
Above and beyond the usual cruft one finds in a Linux dmesg trace, I didn't see anything awful in the log that could account for this behavior... but of course I may not know what to look for.

So I too would be grateful for any assistance in resolving this, and would be happy to be a test bed for any ideas you may have.

Thanks!

KoB
January 9, 2009

h2tran- your syslog indicates 'stop' is hanging on trying to un-mount disk2. Is it possible you have something accessing this disk, perhaps in the background?

King- your syslog indicates 'stop' is hanging trying to do a 'sync' (h2tran gets past this point).

In both cases it appears the drives were spun down when you clicked the 'Stop' button. Do you see the same problem (unresponsive GUI) if the drives are spun up when you click the Stop button?

King- saw your other thread about your NIC getting bricked - I think the kernel in 4.4.x has a fix for this already.
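One rough way to look for a process that is holding a disk mount open is to walk /proc and compare each process's working directory and open files against the mount point. This is a sketch under assumptions: it needs a Linux /proc and the `readlink` utility, `pids_using` is a hypothetical helper name, and it will not see memory-mapped files or kernel-level users of the filesystem.

```shell
#!/bin/sh
# Print the PID of every process whose current directory or an open file
# lies at or below the given directory. Pass the path without a trailing
# slash. Processes we are not allowed to inspect are silently skipped.
pids_using() {
    dir=$1
    for p in /proc/[0-9]*; do
        for link in "$p/cwd" "$p"/fd/*; do
            target=$(readlink "$link" 2>/dev/null) || continue
            case $target in
                "$dir"|"$dir"/*)
                    echo "${p#/proc/}"   # strip the /proc/ prefix, leaving the PID
                    break                # one hit per process is enough
                    ;;
            esac
        done
    done
}

# Usage (not run here):
#   pids_using /mnt/disk2
```

Any PID it prints can then be looked up with `ps -ef` to see what the process is; an empty result does not prove the disk is idle, only that no ordinary process has files open on it.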
January 9, 2009

The game is afoot...

Your comment about sync not completing got me thinking... so I did a bit more poking around. With a freshly-rebooted 4.4.2 system and all three disks active (parity rebuild), I logged into the system console and ran /bin/sync... and it hung! A quick look at the mount table showed, as expected, that only the USB stick and the data drives were mounted. Suspecting I might have a bad USB stick which would not sync, I installed the system on another stick... and had the same problem.

Next, I booted the new stick without any drives installed. /bin/sync worked a treat. This of course pointed the finger at the data and/or parity disks. I could have tried reinitializing the existing disks to be absolutely scientific about it, but instead I put two other, never-unRAIDed-before data drives in along with my existing parity disk. The disks were formatted to completion and parity rebuild began. Tempting fate, I logged into the console while the parity rebuild was still going and ran /bin/sync... and it worked. After the parity rebuild completed, I tried this again and it still worked.

So, on to the final test: stopping the array. I clicked, and IT WORKED. I am now proceeding with additional test/eval, but things look good.

In summary, it seems as though my problem was due to a glitch in the combination of data and parity drives I had originally formatted/built for testing. Replacing the data drives fixed the problem; it's possible, but not verified, that simply reinitializing the "bad" drive(s) would've fixed the problem too. Later, when I add the "bad" drives to the array, it'll be interesting to see if things still work OK.

Thanks for your quick response... hope this info is useful to h2tran and others.

KoB
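The manual /bin/sync test described above can be given a time limit, so that a hang is reported instead of tying up the console. This is a sketch only: `finishes_within` is a hypothetical helper name, and it runs the command in the background and polls rather than trying to kill it, since a process blocked on disk I/O may not respond to signals.

```shell
#!/bin/sh
# Run a command in the background and wait up to N seconds for it to finish.
# Returns the command's own exit status if it finishes in time, or 1 with a
# message if it is still running. The command is left running in that case.
finishes_within() {
    secs=$1
    shift
    "$@" &
    pid=$!
    i=0
    while kill -0 "$pid" 2>/dev/null; do
        if [ "$i" -ge "$secs" ]; then
            echo "still running after ${secs}s (pid $pid): $*"
            return 1
        fi
        sleep 1
        i=$((i + 1))
    done
    wait "$pid"
}

# Usage (not run here):
#   finishes_within 60 /bin/sync
```

Running it once with no array disks mounted and once with them mounted would repeat the comparison above without needing a second console to recover from a hang.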
January 11, 2009 (Author)

It seems my problem may lie with the PCI controller card. Testing with just 2 drives (1 data, 1 parity): with the controller card in place and using any port, it'll freeze at the same spot every time (at the rmdir command); however, it works fine when both are plugged directly into the motherboard.

Thanks all for the help and suggestions.