January 6, 2017
I was running 6.0.1. From when the console root login prompt was displayed, it used to take around 10 seconds for the webGUI to become active, and it was immediately responsive. I just updated to 6.2.4, and the webGUI now takes over a minute to come up and is then very slow and unresponsive, almost unusable: clicking on a tab (Dashboard, Main, etc.) takes anything from 30 seconds to several minutes to respond. How can I fix this, or how do I roll back to 6.0.1?
January 6, 2017 Author
Diagnostics attached. It is fairly consistently 20s for each tab to respond. I have 2 NICs; the onboard one is disabled and I use the added Intel server card, as I always have. The reason I mention this is that I've just seen some console output that I've never seen before:
[code]default via 192.168.1.2 dev eth0 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.9 linkdown
device "eth1" does not exist[/code]
rutland-diagnostics-20170106-1216.zip
January 6, 2017
Here's your problem:
Jan 6 12:16:22 rutland avahi-daemon[14303]: Failed to find user 'avahi'.
See here: https://lime-technology.com/forum/index.php?topic=48508.msg515980#msg515980
January 7, 2017 Author
Spoke too soon. The unRAID GUI was working fine for several hours, but then I started to use a share in anger, and after 5 minutes or so the server locked up and became unresponsive to pings. In all the years I've used unRAID I have not had this happen. I'll grab logs, but for my use (home NAS file server) I think I would prefer to roll back to 6.0.1. Is there a plugin to do rollbacks? That's if I can get the server to respond somehow.
The console is dead as well; it's a hard freeze. Some output on the console:
__write_cache_pages+0x249/0x353
and so on
wb_workfn
kthread
generic_writepages
and a bunch more similar, then the flashing cursor.
I've attached logs taken after I hard rebooted. It came up fine after the reboot, but for now I have kept the array offline, as it detected an unclean shutdown and a parity check is pending. I don't want to run the risk of it failing during the parity check until I get some feedback.
Before the crash I had successfully streamed an hour-long TV show. The trouble started when I began copying a handful of files over to a 2-disk share. It wrote one of the ~1GB files, then I copied another (concurrently), and then Windows lost the connection to the mapped drive. I pinged the server and it was unresponsive.
rutland-diagnostics-20170106-1837.zip
January 7, 2017 Author
OK, I may have found the issue. I had this in passwd:
[code]avahi-ipd:x:62:62:Avahi AutoIP Daemon User:/dev/null:/bin/false[/code]
instead of
[code]avahi-autoipd:x:62:62:Avahi AutoIP Daemon User:/dev/null:/bin/false[/code]
I missed the 'auto' from autoipd. Could this have caused it?
I amended the typo and rebooted. I now see this at the console, yet all networking is working:
default via 192.168.1.2 dev eth0 linkdown
192.168.1.0/24 dev eth0 proto kernel scope link src 192.168.1.9 linkdown
Current passwd and shadow:
avahi:x:61:214:Avahi Daemon User:/dev/null:/bin/false
avahi-autoipd:x:62:62:Avahi AutoIP Daemon User:/dev/null:/bin/false
avahi:!:14980:0:99999:7:::
avahi-autoipd:!:14980:0:99999:7:::
So I'm confused that it shows the link as down yet it works.
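A one-word typo in an account name is easy to miss by eye. As a minimal sketch (not an unRAID tool; the helper name `missing_users` and the sample strings are made up for illustration), passwd-format text can be checked for the two account names discussed above:

```python
# Hypothetical helper: report which expected account names are absent from
# passwd-format text. The account name is the first colon-separated field.
def missing_users(passwd_text, required):
    present = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        present.add(line.split(":", 1)[0])
    return [name for name in required if name not in present]


REQUIRED = ["avahi", "avahi-autoipd"]

# Sample input containing only the mistyped entry from the post above.
broken = "avahi-ipd:x:62:62:Avahi AutoIP Daemon User:/dev/null:/bin/false\n"

# The corrected entries as quoted in the post above.
fixed = (
    "avahi:x:61:214:Avahi Daemon User:/dev/null:/bin/false\n"
    "avahi-autoipd:x:62:62:Avahi AutoIP Daemon User:/dev/null:/bin/false\n"
)

print(missing_users(broken, REQUIRED))  # ['avahi', 'avahi-autoipd']
print(missing_users(fixed, REQUIRED))   # []
```

To run it against a real file, read that file's contents and pass them in as `passwd_text`; which passwd file unRAID actually consults at boot is not something this sketch assumes.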
January 7, 2017 Author
OK, this is repeatable. I decided to go ahead and let it check the parity, and it was running fine for several minutes. I then repeated the file copy as I did before, and it locked up again. Same output on the console; I actually saw the output come up on the console as it failed, live if you like.
My gut feeling is that it crashes when I try to write to unRAID and that it is networking related. I never had this issue once with 6.0.1 (same hardware, BTW) or with versions 4 to 5.
January 7, 2017
There were big changes made to the networking subsystem going from 6.1 to 6.2. If you have a DHCP server on your network, try deleting the config/network.cfg file on your USB boot device (/boot/config/network.cfg) and rebooting. Let it get an address from the DHCP server and then go into Settings -> Network Settings and re-enter your manual IP address (192.168.1.9). You'll have to reboot again, but hopefully that should fix that problem.
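The step above deletes network.cfg outright. A more cautious variant, which is not what the post describes, is to move the file aside so the old static settings can be restored later. A minimal Python sketch, assuming a hypothetical helper `set_aside` and a `.bak` suffix chosen purely for illustration; the demo runs in a temporary directory standing in for /boot/config:

```python
import shutil
import tempfile
from pathlib import Path


def set_aside(cfg_path, suffix=".bak"):
    """Rename cfg_path to cfg_path + suffix.

    Returns the new path, or None if cfg_path does not exist.
    """
    cfg_path = Path(cfg_path)
    if not cfg_path.exists():
        return None
    backup = cfg_path.with_name(cfg_path.name + suffix)
    shutil.move(str(cfg_path), str(backup))
    return backup


# Demo in a throwaway directory rather than on the real flash device.
with tempfile.TemporaryDirectory() as tmp:
    cfg = Path(tmp) / "network.cfg"
    cfg.write_text("USE_DHCP=no\nIPADDR=192.168.1.9\n")
    backup = set_aside(cfg)
    print(backup.name)   # network.cfg.bak
    print(cfg.exists())  # False
```

With the original name gone, the server should see no network.cfg on the next boot, which is the condition the post is aiming for; renaming the backup to network.cfg again would restore the old settings.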
January 7, 2017 Author
I do have a DHCP server. My unRAID is configured with a static IP, and my router reflects that this address is bound to it.
I'll give your suggestions a try once the parity check has completed. It has been running for 2 hours, so unRAID seems to work locally.
If it fails after that then I would prefer to roll back to 6.0.1, as I put reliability over features, but hopefully the changes will work.
January 7, 2017
I think you're right. The system is essentially working and it's just the network that remains an issue. While you're waiting for your parity check to finish, take a look at this. It's about upgrading to unRAID 6.2. The relevant section is entitled Network Changes and Issues. There's a wealth of useful information in the first few posts in that thread.
January 7, 2017 Author
I took a look at the Network Settings in the GUI before I made any changes, and it was very simple: static IP, default gateway, etc. This was also reflected in the network.cfg file:
# Generated settings:
USE_DHCP=no
IPADDR=192.168.1.9
NETMASK=255.255.255.0
GATEWAY=192.168.1.2
DHCP_KEEPRESOLV=no
DNS_SERVER1=192.168.1.2
DNS_SERVER2=
DNS_SERVER3=
I then deleted network.cfg and rebooted, and I have noticed several changes.
The console output has changed: it no longer says 'linkdown' for 192.168.1.9. It now also says 'eth1' doesn't exist. That is most likely the onboard NIC, although it is disabled in the BIOS, so I'm not sure how it has managed this; maybe it is a hangover from the issue I had at the beginning of this thread.
Under 'Network Settings' there are a heck of a lot more options that were not there previously. It has defaulted to 'Enable bonding - yes', 'Bond mode - active backup (1)', 'Bonding members - eth0', 'Enable bridging - yes' and 'IP assignment - automatic'. The old manual settings are greyed out but correct.
There is no longer a network.cfg under /boot/config, but I haven't made any changes via the GUI yet, so maybe that will create the file once I do, if I do at all (see below).
My question now is: do I leave it as is, as my router assigns the server a static lease, or do I still use the webGUI to set a static IP? If I do set a static IP, which of the Network Settings options mentioned above do I change?
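The network.cfg quoted above is a flat list of KEY=value lines, so the static settings that need to be re-entered can be pulled out with a few lines of Python. This is a rough sketch only: `parse_cfg` and `gateway_on_subnet` are hypothetical helper names, and the parsing rules (skip blanks and `#` comments, split on the first `=`) are an assumption based on the file as posted, not a description of how unRAID reads it.

```python
import ipaddress


def parse_cfg(text):
    """Parse KEY=value lines into a dict, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        settings[key.strip()] = value.strip().strip('"')
    return settings


def gateway_on_subnet(settings):
    """True if GATEWAY lies inside the network implied by IPADDR/NETMASK."""
    iface = ipaddress.ip_interface(f"{settings['IPADDR']}/{settings['NETMASK']}")
    return ipaddress.ip_address(settings["GATEWAY"]) in iface.network


# The file contents as quoted in the post above.
SAMPLE = """# Generated settings:
USE_DHCP=no
IPADDR=192.168.1.9
NETMASK=255.255.255.0
GATEWAY=192.168.1.2
DHCP_KEEPRESOLV=no
DNS_SERVER1=192.168.1.2
DNS_SERVER2=
DNS_SERVER3=
"""

cfg = parse_cfg(SAMPLE)
for name in ("IPADDR", "NETMASK", "GATEWAY", "DNS_SERVER1"):
    print(name, cfg[name])
print("gateway on subnet:", gateway_on_subnet(cfg))  # gateway on subnet: True
```

The subnet check is only a sanity test on the values themselves; it says nothing about whether the link is actually up.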
January 7, 2017
If you make any changes from the default, a new network.cfg file will be saved. You can leave it just as it is, if you want. My own DHCP server is configured to assign static leases to specific MAC addresses, too. Or you can set it to a static address. You'll need to set the IP address, netmask, gateway and DNS server(s), as you did originally, but you can leave the bridging and bonding options as they are now.
The default options have been chosen so that if you have a server with an arbitrary number of Ethernet ports, and you choose one at random and connect it to the rest of your network, then it will work. That's nice if you're starting with a fresh clean configuration but, as you've found, an inherited configuration sometimes breaks the system.
January 7, 2017 Author
I'll leave it as is; I may end up using the second NIC at some point. Now to try the file copy again and see if it has fixed the issue.
January 7, 2017 Author
Nope, it did the same thing, and it seems to take almost the same amount of time, ~20 seconds or so, every time. The server has locked up.
The file I'm copying is a 476MB file downloaded via a torrent client, which runs on XP and saves to a mapped drive; this system has worked fine for more than 5 years. So, I grab the torrent, and it saves the 478MB to the mapped drive. After downloading the torrent for around 20s, unRAID locks up. The issue seems time dependent.
As a side note, the parity check ran for 6 hours and completed OK, so I think the disk subsystem is fine.
Could it be a Samba issue?
January 7, 2017
Can you get diagnostics before you reboot (i.e. so that they show what happened)? If you can't do it via the GUI, try to telnet or ssh in and type diagnostics, or else plug in a monitor and keyboard and type the same command. It will put the zip file in the logs folder of your USB flash device (i.e. in /boot/logs).
January 7, 2017 Author
I have a monitor and keyboard plugged in, and I can see the console output when it locks up. As per my previous post, the console output has this:
Some output on console:
__write_cache_pages+0x249/0x353
and so on
wb_workfn
kthread
generic_writepages
and a bunch more similar, then the flashing cursor.
So I'm not sure how I'm going to get the logs. Does unRAID have other terminals (tty1, tty2, etc.)? As I said earlier, the console is unresponsive.
I'll let the parity check complete again, which will take around 6 hours, and then I'll try again. The good/bad news is that I can recreate this at will: I start the copy/torrent download and, almost to the second (20s), boom, it locks up and all the above messages get dumped to the console.
January 7, 2017
I don't know whether other virtual terminals are supported and I'm not able to check right now, but doesn't CTRL-C give you a login prompt?
You might have file system corruption, which would explain the inability to write to a share. I'd just try copying a regular file for the moment, rather than trying to download a torrent. I'm not sure how such files are handled: whether the full size is allocated first and the blocks filled in as they arrive, or whether it's allocated sparsely and grows as the parts arrive. I don't use torrents myself. Either way it's a complication, and simply trying to copy a regular file would be a simpler test.
If it turns out that you do have file system corruption, you'll need to start the array in Maintenance mode and run the appropriate file system repair tool on the affected disk(s).
January 7, 2017 Author
I was thinking the same thing! A simple file copy to try next. Having said that, the torrent download/file copy has worked fine on the previous unRAID versions I've used over the years. It does seem likely that I could have some file corruption going on.
January 7, 2017
Here: https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Drives_formatted_with_XFS
January 7, 2017
Actually, you can do it via the GUI. Start the array in Maintenance mode and then click on the suspect disk's name (e.g. Disk 2) at the left side of the Main page, right next to the green ball. That opens a page specific to that disk, with SMART information and (in Maintenance mode only) the option to check the file system.
https://lime-technology.com/wiki/index.php/Check_Disk_Filesystems#Checking_and_fixing_drives_in_the_webGui
January 7, 2017 Author
Things have got a little worse. I rebooted to do another parity check, and everything came up fine, the webGUI and suchlike. I then hit 'Start array and do parity check' and the webGUI seems to be in an endless refresh state; the bottom left corner of the GUI says 'Mounting disks...'. I can still SSH in and the console is fine. Just before the console login prompt comes up there is a bunch of output along the lines of XFS, which I don't think I've seen before. I've attached some screenshots.
Do I need to let the parity check complete before running a file system repair? Assuming I can get the parity check to even work now, I'm sort of stuck.
When I hit 'Start array and check parity' the console output shows a bunch of stuff, but this one might be a clue:
:sh line1: 8019 killed mount -t xfs -o noatime,nodiratime /dev/md2 /mnt/disk2 2>&1 8020 Done : logger
This is the disk the torrent was writing to. Then:
XFS (md3): Mounting V5 filesystem
XFS (md3) Ending clean mount
XFS (md4) mounting V5 filesystem
XFS (md4) ending clean mount
Then 'unRAID Server OS version 6.2.4' and the login prompt.
rutland-diagnostics-20170107-1119.zip
January 7, 2017 Author
The current situation is this:
When I reboot, the webGUI is fine.
As soon as I hit 'Start array and check parity' the webGUI freezes, but ssh and the root console work.
Consequently I currently don't know whether the array is online and doing a parity check or not.
It looks like disk2, which was the disk the torrent was writing to, hasn't mounted:
Filesystem      Size  Used Avail Use% Mounted on
rootfs          1.8G  335M  1.5G  19% /
tmpfs           1.9G  184K  1.9G   1% /run
devtmpfs        1.8G     0  1.8G   0% /dev
cgroup_root     1.9G     0  1.9G   0% /sys/fs/cgroup
tmpfs           128M  2.1M  126M   2% /var/log
/dev/sda1        15G  208M   15G   2% /boot
/dev/md1        1.9T  935G  928G  51% /mnt/disk1
/dev/md3        1.9T  139G  1.7T   8% /mnt/disk3
/dev/md4        1.9T  191G  1.7T  11% /mnt/disk4
I've brought it up in Maintenance mode, which works fine, and I'm now running a check on disk2.
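Spotting the missing /mnt/disk2 in the listing above can also be done mechanically. The following is a minimal sketch, assuming df-style text like that pasted above and an array of four data disks (an assumption drawn from md1 to md4 appearing in this thread); `mounted_array_disks` is a made-up helper name.

```python
import re


def mounted_array_disks(df_text):
    """Return the set of N for every /mnt/diskN mount point in df output."""
    disks = set()
    for line in df_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        # The mount point is the last whitespace-separated column.
        match = re.fullmatch(r"/mnt/disk(\d+)", fields[-1])
        if match:
            disks.add(int(match.group(1)))
    return disks


# df output as pasted in the post above.
DF_OUTPUT = """Filesystem Size Used Avail Use% Mounted on
rootfs 1.8G 335M 1.5G 19% /
tmpfs 1.9G 184K 1.9G 1% /run
devtmpfs 1.8G 0 1.8G 0% /dev
cgroup_root 1.9G 0 1.9G 0% /sys/fs/cgroup
tmpfs 128M 2.1M 126M 2% /var/log
/dev/sda1 15G 208M 15G 2% /boot
/dev/md1 1.9T 935G 928G 51% /mnt/disk1
/dev/md3 1.9T 139G 1.7T 8% /mnt/disk3
/dev/md4 1.9T 191G 1.7T 11% /mnt/disk4
"""

EXPECTED = {1, 2, 3, 4}  # assumed number of data disks in this array
missing = sorted(EXPECTED - mounted_array_disks(DF_OUTPUT))
print(missing)  # [2]
```

A disk that is absent from this list has simply not been mounted; the sketch cannot tell whether that is because of file system damage or because the array was started in Maintenance mode.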
January 7, 2017
You need to start the array in Maintenance mode (check the box before clicking Start Array). That will start the unRAID driver but will not try to mount the disks. Then you can fix the file system on Disk 2.