Server Issues after having hard drive mounting problems in SNAP



OK, so this started in the SNAP thread, but my problems now seem to be larger than that, so I figured I should move them over here. Here is the original thread in case you are interested: http://lime-technology.com/forum/index.php?topic=5904.210, starting around reply #217.

 

My version of unRAID is 4.5.6. I can't get to my system log to post it. I saved it to the USB drive, but I have no way that I know of to shut down my server. I can't get in via telnet or the webGUI. Also, I ran the "tail -f --lines=100 /var/log/syslog" command and now I just have a blinking cursor. No root@tower prompt to be seen.

 

I was trying to get SNAP working and get it to mount my NTFS drive correctly. Created the share, but it wasn't showing up so then the following happened:

 

I installed the ntfs add-on from the unraid menu thinking well maybe it would help, even though I shouldn't need it. So I restarted and now the last two lines on my command screen are:

/etc/rc.d/rc.inet1.conf: line 18: /boot/config/network.cfg: No such file or directory

/etc/rc.d/rc.inet1.conf: line 19: /var/tmp/network.cfg: No such file or directory

There are a bunch of lines above those that I don't recognize either.

At log-in on the server console it didn't ask for my password and let me in after root.

 

I can't access the server from telnet or the web interface. I haven't set a static IP yet, so I'm sure that doesn't help, since the router went down for a minute during the reboot.

 

Edit:

Just tried /etc/rd.d/rc.inetd restart and I get

-bash: /etc/rd.d/rc.inetd: No such file or directory
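
A note on that error: the path typed was /etc/rd.d, while the other scripts quoted in this thread (rc.inet1.conf, rc.6) live under /etc/rc.d. The sketch below checks that a script exists and is executable before running it. `run_rc_script` is a hypothetical helper name, and the assumption that rc.inetd sits in /etc/rc.d on this system is not confirmed by the thread.

```shell
#!/bin/bash
# run_rc_script is a hypothetical helper, not an unRAID command.
# It runs a startup script only if the path exists and is executable,
# and otherwise reports the bad path instead of a bare bash error.
run_rc_script() {
    local script="$1"
    shift
    if [ -x "$script" ]; then
        "$script" "$@"
    else
        echo "not found or not executable: $script" >&2
        return 1
    fi
}

# Assumed usage (path not confirmed in this thread):
#   run_rc_script /etc/rc.d/rc.inetd restart
```

Listing the directory first (ls /etc/rc.d) is another quick way to catch a mistyped path.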

 

Tried: smbstatus get

/var/cache/samba/locking.tdb not initialised

This is normal if an SMB client has never connected to your server.

-Which is false since I have been sharing my movies this way across my home network

Ugh.

 

Testparm:

Load smb config files from /etc/samba/smb.conf

rlimit_max: rlimit_max (1024) below minimum Windows limit (16384)

Can't find include file /etc/samba/smb-names.conf

Can't find include file /boot/config/smb-extra.conf

Can't find include file /etc/samba/smb-shares.conf

Loaded services file OK.

Server role: ROLE_STANDALONE

[global]

          passdb backend = smbpasswd

          syslog = 0

          syslog only = Yes

          unix extensions = No

          load printers = No

          printcap name = /dev/null

          disable spoolss = Yes

          show add printer wizard = No

          use sendfile = Yes

          msdfs root = Yes

Then:

There is something not right for sure.  Is your array still functional?  Any weird messages in syslog?  If the array parts are working then maybe the drive, cable or something like that is bad.

 

How do I dump a syslog from the command line? (Found this in the troubleshooting guide.) I have a monitor and keyboard hooked up to the server; I can't get to things any other way. What is the powerdown command? I have the script installed, but last time I tried CTRL+ALT+DEL it didn't like that.

 

OK, time to stop touching stuff. I ran: tail -f --lines=100 /var/log/syslog

and now I just have a blinking command prompt at the bottom of the screen.
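
For anyone who ends up at the same blinking cursor: the -f option makes tail keep following the file, so it does not return to the prompt until it is interrupted with Ctrl+C. A minimal sketch of a non-following alternative is below. `last_lines` is a hypothetical helper name, and the default path is simply the syslog location used in this thread.

```shell
#!/bin/bash
# last_lines is a hypothetical helper, not an unRAID command.
# It prints the last N lines of a log file and then returns to the
# prompt, because it leaves out -f (follow).
last_lines() {
    local file="${1:-/var/log/syslog}"
    local count="${2:-100}"
    tail -n "$count" "$file"
}

# Example on the server console:
#   last_lines /var/log/syslog 100
```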

 

Seems like I am having issues popping up after having the same thing happen as in this post:

http://lime-technology.com/forum/index.php?topic=8489.0

 

My unRaid 4.5.6 installation currently seems to be stuck attempting to power down.  Here is how the issue came to be:

 

After setting up my data drives, I installed a parity drive.  After several days, I ended up doing a forced reboot (I don't remember the circumstances now - I think it may have been a networking setting change).  Everything seemed to be working fine for a little while after that, but then I lost connectivity to the array (via web, telnet, and samba)...

 

*SideNote: How do I create that line in the command between syslog and more? See Below

cat /var/log/syslog | more

 

This is where I left off.

Link to comment

Well, I tried using CTRL+ALT+DEL to shut down (I do have the powerdown script installed), but it did not work. I ended up just holding the power button to power off.

Here is what I got on the command line using CTRL+ALT+DEL:

The system is going down for reboot NOW!

Sending processes the TERM signal

Jan 25 14:24:11 Tower shutdown[1470]: shutting down for system reboot

Jan 25 14:24:11 Tower init: Switching to runlevel: 6

INIT: Sending processes the KILL signal

Running shutdown script /etc/rc.d/rc.6:

Saving system time to the hardware clock (UTC).

Unmounting remote filesystems.

/etc/rc.d/rc.inet1.conf: line 18: /boot/config/network.cfg: No such file or directory

/etc/rc.d/rc.inet1.conf: line 19: /var/tmp/network.cfg: No such file or directory

INIT: no more processes left in this runlevel

 

Once I shut down, I found that unRAID had not saved my system log after I used the following commands from the troubleshooting section:

cp /var/log/syslog /boot/syslog-2011-01-24.txt

chmod a-x /boot/syslog-2011-01-24.txt

 

So I have attached the two system logs for yesterday that unraid did save on its own in the log folder.

 

Edit:

So I took out the Hitachi drive and booted the system. No weird network errors on the screen. Right now I am running a read-only parity check to see if anything was fouled up.

syslogs-20110124.zip

Link to comment

So my parity check completed with zero errors. Looking at the system log I have a few minor issues and 2 Errors. I have attached the system log. I could use some help decoding them thanks.

 

Minor Issues:

Jan 25 10:26:19 Tower kernel: NET: Registered protocol family 16

Jan 25 10:26:19 Tower kernel: ACPI: bus type pci registered

Jan 25 10:26:19 Tower kernel: PCI: MCFG configuration 0: base e0000000 segment 0 buses 0 - 255

Jan 25 10:26:19 Tower kernel: PCI: MCFG area at e0000000 reserved in E820

Jan 25 10:26:19 Tower kernel: PCI: Using MMCONFIG for extended config space

Jan 25 10:26:19 Tower kernel: PCI: Using configuration type 1 for base access

Jan 25 10:26:19 Tower kernel: bio: create slab <bio-0> at 0

Jan 25 10:26:19 Tower kernel: ACPI: EC: Look up EC in DSDT

Jan 25 10:26:19 Tower kernel: ACPI Warning for \_SB_._OSC: Return type mismatch - found Integer, expected Buffer (20090903/nspredef-1006) (Minor Issues)

Jan 25 10:26:19 Tower kernel: \_SB_:_OSC evaluation returned wrong type

Jan 25 10:26:19 Tower kernel: _OSC request data:1 6

Jan 25 10:26:19 Tower kernel: ACPI: Executed 1 blocks of module-level executable AML code

Jan 25 10:26:19 Tower kernel: ACPI: Interpreter enabled

Jan 25 10:26:19 Tower kernel: ACPI: (supports S0 S1 S5)

Jan 25 10:26:19 Tower kernel: ACPI: Using IOAPIC for interrupt routing

Jan 25 10:26:19 Tower kernel: ACPI Warning: Incorrect checksum in table [OEMB] - A0, should be 9D (20090903/tbutils-314) (Minor Issues)

Jan 25 10:26:19 Tower kernel: ACPI: No dock devices found.

Jan 25 10:26:19 Tower kernel: ACPI: PCI Root Bridge [PCI0] (0000:00)

 

Jan 25 10:26:19 Tower kernel: md: recovery thread has nothing to resync (unRAID engine)

Jan 25 10:26:20 Tower emhttp: shcmd (12): rm /etc/samba/smb-shares.conf >/dev/null 2>&1 (Other emhttp)

Jan 25 10:26:20 Tower emhttp: _shcmd: shcmd (12): exit status: 1 (Other emhttp)

Jan 25 10:26:20 Tower emhttp: shcmd (13): cp /etc/exports- /etc/exports (Other emhttp)

Jan 25 10:26:20 Tower emhttp: shcmd (14): mkdir /mnt/user (Other emhttp)

Jan 25 10:26:20 Tower emhttp: shcmd (15): /usr/local/sbin/shfs /mnt/user  -o noatime,big_writes,allow_other,default_permissions (Other emhttp)

Jan 25 10:26:21 Tower emhttp: shcmd (16): killall -HUP smbd (Minor Issues)

Jan 25 10:26:21 Tower emhttp: shcmd (17): /etc/rc.d/rc.nfsd restart | logger (Other emhttp)

Jan 25 10:26:22 Tower kernel: e1000e: eth0 NIC Link is Up 1000 Mbps Full Duplex, Flow Control: RX/TX (Network)

Jan 25 10:26:22 Tower ifplugd(eth0)[1441]: Link beat detected. (Network)

 

Errors:

Jan 25 10:26:30 Tower apcupsd[4296]: apcupsd 3.14.3 (20 January 2008) slackware startup succeeded

Jan 25 10:26:30 Tower apcupsd[4296]: NIS server startup succeeded

Jan 25 10:26:31 Tower kernel: inotifywait[5269]: segfault at a96d7f80 ip b78c2638 sp bfc6bd94 error 6 in ld-2.7.so[b78bc000+1c000] (Errors)

Jan 25 10:26:31 Tower kernel: inotifywait[5271]: segfault at a96f0f80 ip b78db638 sp bffc2d64 error 6 in ld-2.7.so[b78d5000+1c000] (Errors)

Jan 25 10:26:36 Tower ntpd[4812]: synchronized to 207.171.7.151, stratum 2

Jan 25 10:26:36 Tower ntpd[4812]: time reset -0.531180 s

Syslog-01-25-2011.txt

Link to comment

Those minor issues are nothing, or harmless.  The segfaults on the other hand *are* a problem, but unfortunately I have no experience with inotifywait, so I'll defer to someone else with more experience there.

 

In case you had not found it, that vertical line character is the 'pipe' symbol, the shift of the backslash key.
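
As a small illustration of what the pipe does: it sends the output of the command on its left to the command on its right. The sketch below pipes a log through grep so that only matching lines are shown. `show_matching` is a hypothetical helper name, and "segfault" is just the search term that matters in this thread.

```shell
#!/bin/bash
# show_matching is a hypothetical helper, not an unRAID command.
# cat writes the file to the pipe; grep reads from the pipe and
# keeps only the lines containing the pattern (case-insensitive).
show_matching() {
    local pattern="$1"
    local file="$2"
    cat "$file" | grep -i "$pattern"
}

# On the console, a second pipe pages the result one screen at a time:
#   cat /var/log/syslog | more
#   show_matching segfault /var/log/syslog | more
```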

 

Well, I just finished examining all 3 syslogs, and while I ambitiously had hoped to find problems and offer fixes, --- I can't!  Except for the segfault, there is nothing wrong with your core system.  I could not help getting a little jealous of how fast your system is, but you probably don't consider *that* a problem!

 

Earlier, I agree there must have been a malfunction.  When it could not find "/boot/config/network.cfg", that probably meant that the flash drive was down, and that explains why you could not save syslogs to it then.  You do have a number of addons loaded, I suspect there is an issue with one of them.  You might try returning to a clean system, then adding addons back one at a time, and retesting whatever you had previously been attempting to do.
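
If the flash drive really was down, a check before copying would have shown it. The sketch below copies the syslog only when the destination appears as a mount point in the mounts table. `save_syslog` is a hypothetical helper name; reading /proc/mounts and the dated file name are assumptions modeled on the commands from the troubleshooting guide quoted earlier.

```shell
#!/bin/bash
# save_syslog is a hypothetical helper, not an unRAID command.
# It copies a log into a destination directory only if that directory
# is listed as a mount point; otherwise it reports the problem.
save_syslog() {
    local src="${1:-/var/log/syslog}"
    local dest_dir="${2:-/boot}"
    local mounts="${3:-/proc/mounts}"

    # Field 2 of each line in the mounts table is the mount point.
    if ! awk -v d="$dest_dir" '$2 == d { found = 1 } END { exit !found }' "$mounts"; then
        echo "$dest_dir is not mounted; syslog not saved" >&2
        return 1
    fi

    local dest
    dest="$dest_dir/syslog-$(date +%Y-%m-%d).txt"
    # Print the destination path only when the copy succeeded.
    cp "$src" "$dest" && echo "$dest"
}

# Example on the server console:
#   save_syslog /var/log/syslog /boot
```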

Link to comment

Those minor issues are nothing, or harmless.  The segfaults on the other hand *are* a problem, but unfortunately I have no experience with inotifywait, so I'll defer to someone else with more experience there.

Hopefully someone will know. I figured the minor issues probably weren't anything, but I couldn't remember which issues I was seeing before everything got funky.

 

In case you had not found it, that vertical line character is the 'pipe' symbol, the shift of the backslash key.

I had not found it. I was not even sure what it was called when looking. I'll add that to my new command-line knowledge. Of course, now I see the symbol on the keyboard; I looked and looked. Figures. ::)

 

Well, I just finished examining all 3 syslogs, and while I ambitiously had hoped to find problems and offer fixes, --- I can't!  Except for the segfault, there is nothing wrong with your core system.  I could not help getting a little jealous of how fast your system is, but you probably don't consider *that* a problem!

Thanks for looking through all those logs. It's good to hear the system is working well apart from the segfault. As far as the speed, yeah, no problems here running Air Video ;D. The server is actually faster than my laptop, haha. For better or worse, I ended up taking the backwards route to upgrading.

 

Earlier, I agree there must have been a malfunction.  When it could not find "/boot/config/network.cfg", that probably meant that the flash drive was down, and that explains why you could not save syslogs to it then.  You do have a number of addons loaded, I suspect there is an issue with one of them.  You might try returning to a clean system, then adding addons back one at a time, and retesting whatever you had previously been attempting to do.

The flash drive being down makes sense, because I saw a bunch of info fly by about mounting a USB device at start-up, and my unRAID flash drive was the only USB device attached. Before restarting and getting those errors, I installed the ntfs-3g package through unMenu. I was thinking that it might help the system read my Hitachi NTFS drive, although I knew from the description that I shouldn't need to install it. I also installed the monthly parity check package at that time. I had been forgetting to install it and remembered right then. Not thinking about the fact that I was already having problems, I installed it. As far as returning to a clean install, can I just uninstall my packages through unMenu and SNAP via the command line? Or do I need to wipe the usb device and start clean that way?

 

Link to comment
As far as returning to a clean install, can I just uninstall my packages through unMenu and SNAP via the command line? Or do I need to wipe the usb device and start clean that way?

 

Everyone's go file is different, but usually all you would need to do is comment out any lines in the go script that start addons, to return it to its 'factory' state.  You comment them out by placing a # (pound symbol) at the start of the line.  I would make a backup of your flash drive first.
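
A rough sketch of that, for anyone who would rather not edit the file by hand: it keeps a backup copy and then puts "# " in front of every line that contains a given piece of text. `disable_go_lines` is a hypothetical helper name, /boot/config/go is assumed to be where the go script lives, and the search text in the usage comment is only an example.

```shell
#!/bin/bash
# disable_go_lines is a hypothetical helper, not an unRAID command.
# It backs up the go script, then comments out every line that
# contains the given text by prefixing it with "# ".
# The text must not contain the | character, which is used as the
# sed delimiter so that paths with slashes can be matched.
disable_go_lines() {
    local pattern="$1"
    local go_file="${2:-/boot/config/go}"

    cp "$go_file" "$go_file.bak" || return 1
    # Read from the backup and rewrite the original with matches commented.
    sed "\\|$pattern| s|^|# |" "$go_file.bak" > "$go_file"
}

# Example (the search text is illustrative only):
#   disable_go_lines "snap" /boot/config/go
```

Restoring the original is then a matter of copying the .bak file back over the go script.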

 

In UnMENU, you would turn off the Auto Install on reboot for any package you don't want running on the next boot.

 

I know I haven't been very helpful here, but hopefully I've given you a few 'tools' to try. It's going to be a trial-and-error process.

Link to comment

Rob, thanks for the tips; they give me a good starting point. While I am doing this, should I remove my parity drive? That way, if I screw something up, it doesn't start a rebuild or something that screws up my data. Or is something of that nature only triggered manually?

 

Nevermind: Found this http://lime-technology.com/forum/index.php?topic=9794.0

 

Well I uninstalled SNAP and my errors went away. I would still like to know what the following errors mean for future reference if anyone knows:

Errors:

Jan 25 10:26:30 Tower apcupsd[4296]: apcupsd 3.14.3 (20 January 2008) slackware startup succeeded

Jan 25 10:26:30 Tower apcupsd[4296]: NIS server startup succeeded

Jan 25 10:26:31 Tower kernel: inotifywait[5269]: segfault at a96d7f80 ip b78c2638 sp bfc6bd94 error 6 in ld-2.7.so[b78bc000+1c000] (Errors)

Jan 25 10:26:31 Tower kernel: inotifywait[5271]: segfault at a96f0f80 ip b78db638 sp bffc2d64 error 6 in ld-2.7.so[b78d5000+1c000] (Errors)

Jan 25 10:26:36 Tower ntpd[4812]: synchronized to 207.171.7.151, stratum 2

Jan 25 10:26:36 Tower ntpd[4812]: time reset -0.531180 s

Link to comment

SNAP uses inotifywait to detect events.  That package is installed on lots of other machines, so I can't tell you what might be happening there.  I know unMenu has a different version of inotifywait on its package management page.  SNAP includes and installs an inotify package if one is not found to be installed in /usr/bin/inotifywait.  It might be worth investigating which is installed.
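
One way to look into that from the console is sketched below. `which_inotify` is a hypothetical helper name, and /var/log/packages is assumed to be where this Slackware-based system records installed packages; neither is confirmed in this thread.

```shell
#!/bin/bash
# which_inotify is a hypothetical helper, not an unRAID or SNAP command.
# It prints where inotifywait is found on the PATH, then lists any
# entries with "inotify" in their name in the given package directory.
which_inotify() {
    local pkg_dir="${1:-/var/log/packages}"

    command -v inotifywait || echo "inotifywait not found on PATH"
    ls "$pkg_dir" 2>/dev/null | grep -i inotify \
        || echo "no inotify package listed in $pkg_dir"
}

# Example on the server console:
#   which_inotify
#   which_inotify /boot/packages
```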

Link to comment

Well when SNAP was installed the package manager said that inotify tools was installed but not downloaded. However, when I looked in the /boot/packages, inotify-tools-3.8-i486-1 was in there. Doesn't that mean that it is downloaded as well? Now that I have uninstalled SNAP, package manager says that inotify tools is not downloaded, but I still see inotify-tools-3.8-i486-1 in /boot/packages.

 

Also, I pulled out my Hitachi drive and attached it to my laptop to see if the drive had become corrupt, since unRAID wasn't showing that it could be mounted. Once attached to my laptop, the drive works fine. So I put it back into unRAID with SNAP uninstalled, but there is still no mount option in unMenu, and I have the "ntfs-3g - NTFS read/write filesystem driver" package installed.

 

Any ideas?

Link to comment

The logs are stored in a RAM disk. They do not survive a reboot. Have you run a memtest overnight? If that works, remove all addons as RobJ suggested, run a parity check, upgrade to 4.7 (we can't spend time debugging an old version), do a parity check. See if your problem still exists. Then add the addons and if the problem reoccurs we can start debugging. I'm betting the segfault is a RAM problem.

Link to comment

Thanks for the help. I'll run a memory test on it later today. I did one just a few weeks ago and let it run for 24 hours with no issues, so I guess we will see if something has come up.

Link to comment
