February 1, 2016: Hey guys, I have been having some issues and would appreciate some ideas. Every single night since installing unRAID 6.1.7 I have woken up to a dead network. My unRAID install is nonexistent on my network and unresponsive to keyboard input. I have been running unRAID for a couple of years inside an ESXi VM without issue. The other night I decided to run unRAID natively and then use Docker and VMs inside it. It's great except for the nightly crashes. I can't seem to get any logs because I can't SSH in or use the keyboard. All I can see on the screen is an end trace > end kernel panic. I know for a fact that the hardware is fine because it has been up running ESXi for years without a hiccup. Any suggestions would be great. Thanks.
February 2, 2016 (Author): I think it's happening each time mover starts, so I disabled mover for tonight and will see if it dies.
February 2, 2016: You want to check that any shares/folders used by Docker containers that need to stay on the cache are set to be cache-only. It almost sounds as if something is being moved that should not be.
February 2, 2016 (Author): Thank you for the suggestion. I have created a share called docker that I run the image and all of my configs from, and a share called vm that I run all my virtual machines from. Both are set to Only under the cache drive settings. I'll try leaving it on overnight and see if it crashes with mover stopped. If it does not, I'll try to break it by running mover with all Docker apps stopped, then turning them on one at a time until I find what breaks it. If it's not Docker or the VMs I don't know what to try next, but I guess I'll just have to see where this gets me. Thanks.
February 9, 2016 (Author): I managed to have the log window open while the system crashed during a mover task. Docker was running at the time, so I rebooted and ran mover again without starting Docker, and it crashed again. However, it had completed a couple of mover tasks before I installed the btsync Docker container. http://pastebin.com/mKYmMhYb
February 10, 2016 (Author): http://pastebin.com/Z9EZff0e Everything crashes the second mover starts, even when Docker and the VMs are not running.
February 10, 2016 (Author): I just disabled everything completely (Docker, plugins, etc.) and it's still crashing.
February 10, 2016: Mover is just a shell script, so if it causes a crash it will be indirectly. I suspect that either you have some sort of hardware issue, or you have file system corruption on one of the disks that is triggering a crash. I would suggest that you: (1) run a memtest from the unRAID boot menu to check out the RAM; (2) run a file system check against the disks. If neither of those shows up anything I am not sure what to suggest, but you are then looking at ways to check the power supply, motherboard, controller cards, etc.
February 10, 2016 (Author): Thanks for the reply. The first thing I did was a memtest, which completed multiple passes without error. I'll do a file system check. I ran the computer flawlessly with ESXi and a passed-through M1115; now with unRAID 6 I'm running it with the motherboard's extra ports. I might disable the extra ports and see where that gets me.
February 10, 2016: Both pastebins show the same error:

    Feb 10 23:28:17 unRAID logger: rsync: get_xattr_names: llistxattr("/mnt/user0/nathan",1024) failed: Software caused connection abort (103)
    Feb 10 06:26:29 unRAID logger: rsync: get_xattr_names: llistxattr("/mnt/user0/nathan",1024) failed: Software caused connection abort (103)

I suspect the connection abort is what caused all the connection errors that followed ("Transport endpoint is not connected"). However, it did much worse than abort the connection: it apparently caused system corruption that showed up shortly afterwards as a 'general protection fault' with important modules 'tainted' (at which point I wouldn't trust anything the system reported; you must reboot). So in both cases, the rsync command is trying to do something with extended attributes on the nathan share. I would start by doing the file system check on the drives containing nathan, including the cache drive. I have a hazy memory of rare issues in rare circumstances with extended attributes in the Reiser file system, issues that disappeared when people converted to XFS. But my memory is not very good.
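To check a saved copy of the syslog for the same failure, something like the following could be used. It is only a sketch: the function names are made up for illustration, and it assumes the log has been saved to a plain text file whose path you pass in.

```shell
#!/bin/bash
# Print every rsync extended-attribute failure found in a saved syslog.
# The log path is supplied by the caller; nothing here is unRAID-specific.
xattr_failures() {
    grep -E 'rsync: get_xattr_names: llistxattr\(.*\) failed' "$1"
}

# Print the unique paths that triggered the failure, one per line.
# The "+ patterns tolerate doubled quotes left over from copy/paste.
xattr_failure_paths() {
    xattr_failures "$1" | sed -E 's/.*llistxattr\("+([^"]+)"+,.*/\1/' | sort -u
}
```

Running `xattr_failure_paths` on each saved log shows quickly whether the failures all point at the same share, which narrows down which disks to check first.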
February 10, 2016 (Author): Thanks for the reply. I just deleted the nathan share from the cache drive, as all it contained was Android phone backups that were still stored on a USB drive. Once the files were deleted, mover seems to be working fine again with the small tests I just threw at it. The problem files were stored on the cache drive, which is formatted as btrfs. Could this be an issue? My old unRAID cache wasn't btrfs; I put 2x 120 GB SSDs in as a cache pool and that's when my problems started. Should I format to XFS?
February 10, 2016: Since deleting the problem files does not guarantee that all file system problems are gone, I would still run file system checks. Btrfs has the scrub command, which is 'supposed' to fix btrfs issues. I would try it sooner rather than later, and not trust that drive/pool until it's reported clean. If you can begin to trust your cache pool, then you may not need to look at other options.
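A minimal sketch of running a scrub from the command line follows. It assumes the cache pool is mounted at /mnt/cache, and the `scrub_pool` wrapper is a hypothetical helper, not part of unRAID. Note that scrub can only repair a bad block when a good copy exists elsewhere in the pool; otherwise it just reports the error.

```shell
#!/bin/bash
# Scrub a btrfs pool in the foreground, then show per-device error counters.
# The default mount point /mnt/cache is an assumption; pass another if needed.
scrub_pool() {
    local mnt="${1:-/mnt/cache}"
    local fstype
    # stat -f reports the file system type of the mount point.
    fstype=$(stat -f -c %T "$mnt") || return 1
    if [ "$fstype" != "btrfs" ]; then
        echo "$mnt is $fstype, not btrfs; nothing to scrub" >&2
        return 1
    fi
    # -B keeps the scrub in the foreground and prints a summary when it ends.
    btrfs scrub start -B "$mnt" && btrfs device stats "$mnt"
}
```

Calling `scrub_pool` with no argument scrubs /mnt/cache. Non-zero counters in the `btrfs device stats` output would be a reason not to trust the pool yet.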
February 10, 2016: How are you getting the files from your phone to the cache device? I'm wondering if there's a permission issue here. Next time you run into this issue, run ls -la on the nathan share on the cache drive and post the results here.
February 10, 2016 (Author): It scrubbed fine. Do you recall what format used to be the default for cache drives in unRAID? Could it be that rsync is trying to move files that have extended attributes from a btrfs-formatted drive to a ReiserFS-formatted drive?
February 10, 2016: In v5 the only supported file system was ReiserFS (assuming that is the question you were asking).
February 10, 2016 (Author): Thanks. I kind of feel like setting the cache drive to ReiserFS and trying it with the same problem files again.
February 10, 2016: The most stable file system for v6 is considered to be XFS.