Mover Crashes System Instantly


N4TH4N

Hey Guys,

 

I have been having some issues and would appreciate some ideas.

 

Every single night since installing unRAID 6.1.7, I have woken up to a dead network. My unRAID install is nonexistent on my network and unresponsive to keyboard input.

 

I have been running unRAID for a couple of years now inside an ESXi VM without issue. The other night I decided to run unRAID natively and then use Docker and VMs inside of it.

 

It's great except for the nightly crashes. I can't seem to get any logs because I can't SSH in or use the keyboard.

 

I get an end trace followed by an end kernel panic message, which is all I can see on the screen.

 

Now, I know for a fact that the hardware is fine because it has been up and running ESXi for years without a hiccup.

 

Any suggestions would be great.

 

Thanks.

Link to comment

You want to check that any shares/folders used by dockers that need to stay on the cache are set to be cache-only.  It almost sounds as if you have something that is being moved that should not be.

 

Thank you for the suggestion. I have created a share called docker, from which I run the image and all of my configs, and a share called vm, from which I run all my virtual machines. Both are set to Only under the cache drive settings.

 


I'll try leaving it on overnight and see if it crashes with mover stopped. If it does not, I'll try to break it by running mover with all Docker apps stopped, then turning them on one at a time until I find what breaks it. If it's not Docker or the VMs, I don't know what to try next, but I guess I'll just have to see where this gets me.

 

Thanks.

 

 

Link to comment

Mover is just a shell script, so if it causes a crash it will be indirectly. I suspect that either you have some sort of hardware issue, or you have file system corruption on one of the disks that is triggering a crash. I would suggest that you:

  • Run a memtest from the unRAID boot menu to check out the RAM.
  • Run a file system check against the disks.

If neither of those shows up anything, I am not sure what to suggest, but you are then looking at ways to check the power supply, motherboard, controller cards, etc.

 

Link to comment


Thanks for the reply. The first thing I did was a memtest, which completed multiple passes without error. I'll do a filesystem check.

I ran the computer flawlessly with ESXi and a passed-through M1115; now with unRAID 6 I'm running it with the motherboard's extra ports. I might disable the extra ports and see where that gets me.

Link to comment

Both pastebins show the same error -

Feb 10 23:28:17 unRAID logger: rsync: get_xattr_names: llistxattr("/mnt/user0/nathan",1024) failed: Software caused connection abort (103)

Feb 10 06:26:29 unRAID logger: rsync: get_xattr_names: llistxattr("/mnt/user0/nathan",1024) failed: Software caused connection abort (103)

I suspect the connection abort is what caused all the connection errors that followed ("Transport endpoint is not connected"). However, it did much worse than abort the connection: it apparently caused system corruption that showed up shortly afterwards as a 'general protection fault' with important modules 'tainted' (at which point I wouldn't trust anything the system reports, and you must reboot).

 

So in both cases, the rsync command is trying to do something with extended attributes on the nathan share.  I would start by doing the file system check on those drives containing nathan, including the Cache drive.
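For anyone wanting to scan a longer syslog for the same failure, here is a minimal sketch in Python. The function name `parse_xattr_failure` and the returned dictionary layout are my own choices for illustration, not part of unRAID or rsync; the pattern is simply modelled on the two log lines quoted above.

```python
import re

# Pattern modelled on the rsync message in the log lines quoted above.
# The "+ parts tolerate one or more quote characters around the path.
XATTR_FAILURE = re.compile(
    r'rsync: get_xattr_names: llistxattr\("+(?P<path>[^"]+)"+,\d+\) failed: '
    r'(?P<message>.+) \((?P<errno>\d+)\)'
)


def parse_xattr_failure(line):
    """Return path, errno and message for an rsync xattr failure, or None."""
    match = XATTR_FAILURE.search(line)
    if match is None:
        return None
    return {
        "path": match.group("path"),
        "errno": int(match.group("errno")),
        "message": match.group("message"),
    }


if __name__ == "__main__":
    sample = (
        'Feb 10 23:28:17 unRAID logger: rsync: get_xattr_names: '
        'llistxattr("/mnt/user0/nathan",1024) failed: '
        'Software caused connection abort (103)'
    )
    print(parse_xattr_failure(sample))
```

Running it over every line of a saved syslog and collecting the distinct paths should show whether the failures are confined to one share, as they appear to be here.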

 

I have a hazy memory of rare issues in rare circumstances with extended attributes in the Reiser file system, issues that disappeared when they converted to XFS.  But my memory is not very good.

Link to comment


Thanks for the reply.

 

I just deleted the nathan share from the cache drive, as all it contained was Android phone backups that are still stored on a USB drive. Once the files were deleted, mover seems to be working fine again in the small tests I just threw at it.

 

The files that were causing the issue were stored on the cache drive, which is formatted as btrfs. Could this be the issue? My old unRAID cache wasn't btrfs; I put in 2x 120GB SSDs as a cache pool, and that's when my problems started. Should I format to XFS?

Link to comment

Since deleting the problem files does not necessarily guarantee that all file system problems are gone, I would still run file system checks. Btrfs has the scrub command, which is 'supposed' to fix btrfs issues. I would try it sooner rather than later, and not trust that drive/pool until it's reported clean. If you can begin to trust your cache pool, then you may not need to look at other options.
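As a rough sketch of what "run file system checks" could look like for the three file systems mentioned in this thread, the Python helper below only builds the command line for a report-only check and does not run anything. The helper name `check_command` and the example targets are hypothetical; the right device or mount point depends on your own system, and the ReiserFS and XFS tools expect the file system to be unmounted, while a btrfs scrub runs against a mounted pool.

```python
def check_command(fs_type, target):
    """Build the argv for a report-only check of the given file system.

    target is a device for reiserfs/xfs and a mount point (or device)
    for btrfs. Nothing is executed here; the caller decides when to run it.
    """
    commands = {
        # reiserfsck --check reports problems without repairing them.
        "reiserfs": ["reiserfsck", "--check", target],
        # xfs_repair -n is the no-modify mode: scan and report only.
        "xfs": ["xfs_repair", "-n", target],
        # btrfs scrub: -B waits in the foreground, -r keeps it read-only.
        "btrfs": ["btrfs", "scrub", "start", "-B", "-r", target],
    }
    try:
        return commands[fs_type.lower()]
    except KeyError:
        raise ValueError(f"no check command known for {fs_type!r}") from None


if __name__ == "__main__":
    # /mnt/cache is an assumed mount point for the cache pool.
    print(" ".join(check_command("btrfs", "/mnt/cache")))
```

Dropping the `-r` from the btrfs command lets the scrub attempt repairs, which is what the advice above is really after once a read-only pass has shown whether anything is wrong.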

Link to comment


It scrubbed fine. Do you recall what format used to be the default for cache drives in unRAID? Could it be that I'm trying to move certain files that have extended attributes from a btrfs-formatted drive to a ReiserFS-formatted drive when using rsync?
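One way to test that theory would be to find out which files on the cache actually carry extended attributes before mover touches them. The sketch below is an assumption-laden illustration in Python: `files_with_xattrs` is a name I made up, `/mnt/cache` is an assumed path, and `os.listxattr` is only available on Linux.

```python
import os


def files_with_xattrs(root):
    """Walk root and report entries that carry extended attributes.

    Returns (found, errors): found is a list of (path, attribute_names),
    errors is a list of (path, error_text) for entries that could not be read.
    """
    found = []
    errors = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            try:
                # Do not follow symlinks, so a dangling link is not an error.
                names = os.listxattr(path, follow_symlinks=False)
            except OSError as exc:
                errors.append((path, exc.strerror or str(exc)))
                continue
            if names:
                found.append((path, names))
    return found, errors


if __name__ == "__main__":
    # /mnt/cache is an assumed location; point it at the share in question.
    found, errors = files_with_xattrs("/mnt/cache")
    for path, names in found:
        print(path, names)
    for path, reason in errors:
        print("could not read:", path, reason)
```

If the only files listed sit in the share that was crashing mover, that would support the extended-attribute theory; an empty list would point elsewhere.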

Link to comment


In v5 the only supported file system was ReiserFS (assuming that is the question you were asking).
Link to comment
