N4TH4N Posted February 1, 2016
Hey guys, I have been having some issues and would appreciate some ideas. Every single night since installing unRAID 6.1.7 I have woken up to a dead network. My unRAID install is nonexistent on my network and unresponsive to keyboard input. I have been running unRAID for a couple of years inside an ESXi VM without issue. The other night I decided to run unRAID natively and then use Docker and VMs inside it. It's great except for the nightly crashes. I can't seem to get any logs because I can't SSH in or use the keyboard. All I can see on the screen is an end trace followed by "end kernel panic". I know for a fact that the hardware is fine because it's been running ESXi for years without a hiccup. Any suggestions would be great. Thanks.
N4TH4N (Author) Posted February 2, 2016
I think it's happening each time mover starts, so I disabled mover for tonight and will see if it dies.
itimpi Posted February 2, 2016
You want to check that any shares/folders used by Docker containers that need to stay on the cache are set to be cache-only. It almost sounds as if something is being moved that should not be.
N4TH4N (Author) Posted February 2, 2016
Thank you for the suggestion. I have created a share called docker that I run the image and all of my configs from, and a share called vm that I run all my virtual machines from. Both are set to Only under the cache drive settings. I'll try leaving it on overnight and see if it crashes with mover stopped. If it does not, I'll start trying to break it by running mover with all Docker apps stopped, then turning them back on until I find what breaks it. If it's not Docker or the VMs I don't know what to try next, but I guess I'll just have to see where this gets me. Thanks.
N4TH4N (Author) Posted February 9, 2016
I managed to have the log window open while the system crashed during a mover run. Docker was running at the time, so I rebooted and ran mover again without starting Docker, and it crashed again. However, it had completed a couple of mover runs before I installed the btsync Docker container. http://pastebin.com/mKYmMhYb
N4TH4N (Author) Posted February 10, 2016
http://pastebin.com/Z9EZff0e Everything crashes the second mover starts, even with Docker and the VMs not running.
N4TH4N (Author) Posted February 10, 2016
I just disabled everything completely (Docker, plugins, etc.) and it's still crashing.
itimpi Posted February 10, 2016
Mover is just a shell script, so if it causes a crash it will be indirectly. I suspect that either you have some sort of hardware issue, or you have file system corruption on one of the disks that is triggering a crash. I would suggest that you: (1) run a memtest from the unRAID boot menu to check the RAM; (2) run a file system check against the disks. If neither of those shows up anything, I am not sure what to suggest, but you are then looking at ways to check the power supply, motherboard, controller cards, etc.
N4TH4N (Author) Posted February 10, 2016
Thanks for the reply. The first thing I did was a memtest, which completed multiple passes without error. I'll do a file system check. I ran the computer flawlessly with ESXi and a passed-through M1115; now with unRAID 6 I'm running it on the motherboard's extra ports. I might disable the extra ports and see where that gets me.
RobJ Posted February 10, 2016
Both pastebins show the same error:
    Feb 10 23:28:17 unRAID logger: rsync: get_xattr_names: llistxattr("/mnt/user0/nathan",1024) failed: Software caused connection abort (103)
    Feb 10 06:26:29 unRAID logger: rsync: get_xattr_names: llistxattr("/mnt/user0/nathan",1024) failed: Software caused connection abort (103)
I suspect the connection abort is what caused all the connection errors that followed ("Transport endpoint is not connected"). However, it did much worse than abort the connection: it apparently caused system corruption that showed up shortly afterwards as a 'general protection fault' with important modules 'tainted' (at which point I wouldn't trust anything the system reported; you must reboot). So in both cases, the rsync command is trying to do something with extended attributes on the nathan share. I would start by doing the file system check on those drives containing nathan, including the Cache drive. I have a hazy memory of rare issues in rare circumstances with extended attributes in the Reiser file system, issues that disappeared when they converted to XFS. But my memory is not very good.
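For anyone wanting to confirm that a saved log shows this same failure, here is a minimal Python sketch that pulls the path and error number out of rsync's llistxattr failure lines. The function and class names are made up for illustration, and the pattern is written only against the two lines quoted in this thread, so it may need adjusting for other rsync versions or log formats.

```python
import re
from typing import Iterable, List, NamedTuple


class XattrFailure(NamedTuple):
    """One llistxattr failure reported by rsync (hypothetical helper type)."""
    path: str
    message: str
    errno: int


# Pattern based only on the log lines quoted in this thread.
# '"+' tolerates doubled quotes, which some pasted copies of the log contain.
_PATTERN = re.compile(
    r'rsync: get_xattr_names: llistxattr\("+(?P<path>[^"]+)"+,\d+\) '
    r'failed: (?P<message>.+?) \((?P<errno>\d+)\)'
)


def find_xattr_failures(lines: Iterable[str]) -> List[XattrFailure]:
    """Return every llistxattr failure found in the given log lines."""
    failures = []
    for line in lines:
        match = _PATTERN.search(line)
        if match:
            failures.append(
                XattrFailure(match["path"], match["message"], int(match["errno"]))
            )
    return failures
```

Feeding it the lines of a saved syslog copy, e.g. `find_xattr_failures(open("syslog.txt"))` (the filename is an assumption), gives one entry per failure. On Linux, error number 103 is ECONNABORTED, which is where the "Software caused connection abort" text comes from.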
N4TH4N (Author) Posted February 10, 2016
Thanks for the reply. I just deleted the nathan share from the cache drive, as all it contained was Android phone backups that were still stored on a USB drive. Once the files were deleted, mover seems to be working fine again in the small tests I just threw at it. The problem files were stored on the cache drive, which is formatted as btrfs. Could this be the issue? My old unRAID cache wasn't btrfs; I put 2x 120 GB SSDs in as a cache pool and that's when my problems started. Should I format to XFS?
RobJ Posted February 10, 2016
Since deleting problem files does not necessarily guarantee that all file system problems are gone, I would still run file system checks. Btrfs has the scrub command, which is 'supposed' to fix btrfs issues. I would try it sooner rather than later, and not trust that drive/pool until it's reported clean. If you can begin to trust your cache pool, then you may not need to look at other options.
mr-hexen Posted February 10, 2016
How are you getting the files from your phone to the cache device? I'm wondering if there's a permission issue here. Next time you run into this issue, run the following command on the nathan share on the cache drive and post the results here:
    ls -la
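Since the failing call in the logs is llistxattr, it may also help to see extended attribute names next to the permissions that `ls -la` shows. The following is a rough Python sketch of that idea, not an unRAID tool: the function name is hypothetical, and it relies on `os.listxattr`, which Python only provides on Linux.

```python
import os
import stat
from typing import List, Tuple


def describe_entries(directory: str) -> List[Tuple[str, str, int, int, str]]:
    """List (name, mode, uid, gid, xattr names) for each entry in a directory."""
    rows = []
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        info = os.lstat(path)  # lstat: describe symlinks themselves, like ls -l
        if not hasattr(os, "listxattr"):
            xattrs = "unsupported"  # os.listxattr is only available on Linux
        else:
            try:
                # follow_symlinks=False mirrors the llistxattr call rsync makes
                names = os.listxattr(path, follow_symlinks=False)
                xattrs = ",".join(names) or "-"
            except OSError as exc:
                # Record the failure instead of stopping, so the bad entry stands out
                xattrs = f"error: {exc.strerror} ({exc.errno})"
        rows.append((name, stat.filemode(info.st_mode), info.st_uid, info.st_gid, xattrs))
    return rows
```

It could be run as `for row in describe_entries("/mnt/cache/nathan"): print(row)`, where the path is an assumption based on the share name in this thread. Given that the same call appeared to take the system down here, it would be sensible to try it only after the file system checks come back clean.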
N4TH4N (Author) Posted February 10, 2016
It scrubbed fine. Do you recall what format used to be the default for cache drives in unRAID? Could it be that I'm trying to move certain files that have extended attributes from a btrfs-formatted drive to a reiserfs-formatted drive using rsync?
itimpi Posted February 10, 2016
In v5 the only supported file system was ReiserFS (assuming that is the question you were asking).
N4TH4N (Author) Posted February 10, 2016
Thanks. I kind of feel like setting the cache drive to reiserfs and trying it with the same problem files again.
itimpi Posted February 10, 2016
The most stable FS for v6 is considered to be XFS.