mgladwin Posted March 31, 2017
I turned appdata backup off but left it installed. No lock up last night! Will continue to monitor.
Squid Posted March 31, 2017
Ultimately my opinion here is that anyone who has "hanging" issues when utilizing a scheduled backup with CA backup has other issues with the server. Whilst I'm not saying or trying to imply that the backup plugin is perfect, there is nothing in there that would cause a server to actually crash. The backup is simply an rsync command followed by an rm -rf if necessary. And in such crash issues, running the script manually succeeds with no problems.

Net result is that unfortunately, I am not of much help here. About the only suggestion I have is to limit the disks utilized to a single one (ideally XFS), either within share settings or in the backup settings.
mgladwin Posted March 31, 2017
Thanks for your input Squid and I completely understand your answer. However could it be something to do with stopping all dockers and restarting them all at once before the actual rsync command is run? I have previously limited all share writes to one xfs disk and this didn't help. This is obviously a weird issue and not many people seem to be having it, so I know it's hard to pinpoint answers. I'm a noob and just throwing my two cents around is all.
Squid Posted April 1, 2017
37 minutes ago, mgladwin said: However could it be something to do with stopping all dockers and restarting them all at once

The exact same process is done every time you stop and start the array....

38 minutes ago, mgladwin said: This is obviously a weird issue and not many people seem to be having it so I know it's hard to pin point answers.

Yeah.... Only thing that is similar is that some users also have problems with unRaid's mover, and no adequate answer has ever been given (or found for that matter)... And they both utilize rsync.

One user found that if CA backup doesn't delete the old expired backup sets, then the crash didn't happen. This tends to imply an issue with the filesystem, since the script issues a standard linux (rm -rf) command to delete a directory.

The only log that I've seen showing the crash shows that rsync just plain stopped. Didn't exit or anything. Just stopped doing what it was doing and went into (presumably) an infinite loop. Implies to me again a file system issue, or an rsync issue. The script starts the rsync command and then just sits there twiddling its electronic thumbs until the command completes (successfully or not).

To my knowledge, anyone who has seen the problem can run the backup manually and have it succeed all the time. Implies there's nothing wrong with the file system or with rsync. (And if anything, running the script manually is more likely to cause a GUI crash, since the GUI is actually somewhat involved: it starts it and monitors it. The script itself has no concept of how it was started, and does not adjust itself at all to whether it's started via cron or via the GUI.)

Possibly related issue: a few people have seen shfs running at 100% and locking up the system on unRaid. AFAIK it tends to get blamed on RFS (rightly or wrongly).

TLDR: filesystem issues (whether direct corruption or an issue with unRaid's user share system) are at the top of my list of suspects, but my arguments for it also have arguments against it.

As noted, the vast majority of users have zero issues. Only a very small handful of users are seeing this behaviour. For lack of anything else, my suggestions are:

Run memtest from the boot menu for at least a single pass.
Convert the entire array over to XFS. Don't mix filesystem types. This includes the cache drive. Anecdotal evidence continually shows that BTRFS does have some problems.
Within backup settings (not share settings), confine the destination to a single disk.
After a crash / lockup, ssh into the server, grab diagnostics if possible, and run this command:
cp /var/lib/docker/unraid/ca.backup.datastore/appdata_backup.log /boot
Then post the last bunch of lines from that file (now on the flash drive) along with the diagnostics (and a picture of what's on the locally attached monitor if there is one).
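The log-grabbing step suggested above could be wrapped in a small helper along these lines. This is only a sketch: the log path and the /boot destination are the ones quoted in the post, while the function name `save_backup_log` and the 50-line default are made up for illustration.

```shell
#!/bin/bash
# Sketch only: copy the CA backup log to the flash drive and print its tail.
# The default paths are the ones quoted in the post; the function name and
# the 50-line default are assumptions for illustration.
save_backup_log() {
    local log="${1:-/var/lib/docker/unraid/ca.backup.datastore/appdata_backup.log}"
    local dest="${2:-/boot}"
    local lines="${3:-50}"

    if [ ! -f "$log" ]; then
        echo "log not found: $log" >&2
        return 1
    fi

    # Copy first so the file is on the flash drive even if tail is interrupted.
    cp "$log" "$dest/" || return 1
    tail -n "$lines" "$dest/$(basename "$log")"
}
```

After SSHing in, calling `save_backup_log` with no arguments would use the defaults; the printed lines are the ones to paste into a reply.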
mgladwin Posted April 1, 2017
Every time I've had an issue I haven't been able to SSH in or use the local console. With that said, I might use the user scripts plugin to run the appdata backup log copy command you suggested, say every 30 seconds, and overwrite or append to a file to help capture this. Does that sound reasonable to try in my case? Or maybe just a "tail" capture would suffice?

I recently ran jb's unbalance to help move data around the array while I was converting all disks to xfs. This all happened without issue, and I believe it uses the same rsync command, though with different flags.

I have done everything else you suggested in your last post bar changing my cache from btrfs to xfs. I have activated appdata backup again to disk 3 (xfs) only. See how tonight goes.
Squid Posted April 1, 2017
The backup log will survive a reboot, so a tail isn't really necessary. If anything, install Fix Common Problems, start up troubleshooting mode, and afterwards upload the end of the backup log, FCPsyslog_tail.txt, and the last generated diagnostics.

EDIT: and a pic of what's on the monitor before you reset

Edited April 1, 2017 by Squid
mgladwin Posted April 1, 2017
OK, so I saw there was a "copy rsync log to flash" setting in the CA Backup settings, and I wondered if it did the same thing as you suggested. So I changed the settings back to backing up to a user share rather than a particular disk, and thought I would test to see what was actually output to the flash drive. I came back in 5 minutes and it's locked up just the same!

SSH'ed in and logged in, ran
cp /var/lib/docker/unraid/ca.backup.datastore/appdata_backup.log /boot
which gave me this (I assumed a typical timeout or something):
Then this was added to it after about 1 minute:
And this is the local monitor:

Right now I can't access the GUI, dockers, anything. I will hard restart, put it in troubleshooting mode, and repeat the same procedure, hopefully capturing the logs needed. Will report back again when that's done. Cheers
Squid Posted April 1, 2017
32 minutes ago, mgladwin said: Ok so I saw there was a copy rsync log to flash setting in the CA Backup Settings which I wondered if this did same thing as you suggested.

It does, but by the time it runs it is too late for what I wanted. It takes a while because it is a huge file.

IRQ 16: disabled

Now that's what I was looking for. (Well not exactly, but it is something to go on, and in retrospect makes some sense.) After you reboot, can you post the output of
cat /proc/interrupts

Edited April 1, 2017 by Squid
mgladwin Posted April 1, 2017
Sorry, I was one step ahead already. I rebooted, put it in troubleshooting mode, and re-ran a manual appdata backup. It seemed to finish OK (on the status page in CA backup settings) and I was able to use some dockers but not all (like some had restarted but not all of them). Still no GUI, and I forgot to check the local monitor! I also forgot to capture the CA appdata backup log. Doh! This time I could seemingly still use the console, so I ran "cat /proc/interrupts" as requested. All info (that I remembered to get) is attached to this post.

And now I'm in trouble with my 4 year old daughter as her movie on Plex keeps stopping! Look forward to hearing from you Squid! Cheers.

FCPsyslog_tail.txt
tower-diagnostics-20170401-1351.zip

EDIT: last few lines of appdata_backup.log (seems all OK to me)
2017/04/01 14:03:03 [30434] >f+++++++++ unifi/logs/server.log.1
2017/04/01 14:03:03 [30434] >f+++++++++ unifi/logs/server.log.2
2017/04/01 14:03:03 [30434] >f+++++++++ unifi/logs/server.log.3
2017/04/01 14:03:03 [30434] cd+++++++++ unifi/run/
2017/04/01 14:03:03 [30434] >f+++++++++ unifi/run/firmware.json
2017/04/01 14:03:03 [30434] >f+++++++++ unifi/run/update.json
2017/04/01 14:03:03 [30434] sent 23,661,736,341 bytes received 2,539,866 bytes 41,048,180.76 bytes/sec
2017/04/01 14:03:03 [30434] total size is 23,642,044,893 speedup is 1.00
Restarting Duckdns
Restarting letsencrypt
Restarting NZBGet
Restarting ombi
Restarting openvpn-as
Restarting plex
Restarting plexpy
Restarting quassel-core
Restarting radarr
Restarting sonarr
Restarting tvheadend
Restarting unifi
Backup/Restore Complete. Rsync Status: Success
Deleting Dated Backup set: /mnt/user/appdata_backup/[email protected]
Deleting /mnt/user/appdata_backup/[email protected]
Deleting Dated Backup set: /mnt/user/appdata_backup/[email protected]
Deleting /mnt/user/appdata_backup/[email protected]

Edited April 1, 2017 by mgladwin
Squid Posted April 1, 2017
I think @RobJ might be able to help out here on why IRQ 16 is getting disabled (it appears to happen under high I/O load); he might be of more help than me. Since that screenshot was presumably after the reboot (when IRQ 16 is functioning normally), can you try and replicate this again, and after the IRQ gets disabled run the same
cat /proc/interrupts
again as a comparison? (But do it after your daughter goes to sleep)
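The before/after comparison asked for above could be done with saved snapshots, roughly as follows. This is a sketch under assumptions: the function names and file locations are hypothetical, and the second argument of `snapshot_interrupts` exists only so the helper can be tried against a test file.

```shell
#!/bin/bash
# Sketch only: save /proc/interrupts to a file and compare two snapshots.
# Function names and file locations are hypothetical.

# Save the current interrupt table (or another file, for testing) to $1.
snapshot_interrupts() {
    local out="$1"
    local src="${2:-/proc/interrupts}"
    cat "$src" > "$out"
}

# Print the line for one IRQ number from a saved snapshot.
irq_line() {
    local file="$1"
    local irq="$2"
    grep -E "^[[:space:]]*${irq}:" "$file"
}

# Show the given IRQ's line in both snapshots, then a full diff.
# Returns diff's exit status: 0 if identical, 1 if the snapshots differ.
compare_interrupts() {
    local before="$1" after="$2" irq="${3:-16}"
    echo "IRQ $irq before:"; irq_line "$before" "$irq"
    echo "IRQ $irq after:";  irq_line "$after" "$irq"
    diff "$before" "$after"
}
```

For example, `snapshot_interrupts /boot/irq_before.txt` right after a reboot, `snapshot_interrupts /boot/irq_after.txt` once the "IRQ 16: disabled" message appears (if a console is still reachable), then `compare_interrupts /boot/irq_before.txt /boot/irq_after.txt`.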
mgladwin Posted April 1, 2017
No worries. Will try and replicate. Generally I don't have access to the console in any form after a lock up. How could I capture interrupts in this case? That last cat of interrupts I posted was after a lock up, but I seemed to still have the SSH console available, and I don't remember seeing the IRQ 16 message to be honest.
Squid Posted April 1, 2017
If you have no access to the console, then you won't be able to do anything. But sometime tomorrow I'm going to get you to change the rsync parameters in the options to lower its I/O bandwidth to see if it makes a difference. We're down to it being a hardware issue. No clue as to what the actual problem is or why it's happening, but we may be able to work around it nonetheless. I just gotta do some testing to see what to get you to change in the parameters.
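One way to still have something to look at when the console is gone might be to apply the timer idea raised earlier in the thread to /proc/interrupts. The sketch below is an assumption-heavy illustration, not something suggested by the plugin author: the function name and the output file are hypothetical, and there is no guarantee a write lands before the machine locks up.

```shell
#!/bin/bash
# Sketch only: append a timestamped copy of /proc/interrupts to a file so the
# last recorded state before a lockup can be read after a reboot.
# The function name and output path are hypothetical.
log_interrupts_once() {
    local out="$1"
    local src="${2:-/proc/interrupts}"
    {
        echo "=== $(date '+%Y-%m-%d %H:%M:%S') ==="
        cat "$src"
    } >> "$out"
    # Flush to disk; data still in memory is lost on a hard reset.
    sync
}

# Example loop (commented out): one snapshot every 30 seconds.
# Frequent writes wear the flash drive, so this is for short test runs only.
# while true; do
#     log_interrupts_once /boot/interrupts_history.txt
#     sleep 30
# done
```

The loop is left commented out on purpose; it would be started just before a test backup and stopped afterwards.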
mgladwin Posted April 1, 2017
If it helps. EDIT: I do have the tips and tweaks plugin with most of the "go faster" options on, in case this matters.
Squid Posted April 1, 2017
Try this out. (Note that for users that have the lockups when a backup is NOT set to run, I can guarantee that your issue has nothing to do with me, whether or not CA Appdata Backup is installed.)

It also appears that this issue is purely a hardware / driver issue. The following is a possible workaround that may help.

Update CA Appdata Backup to 2017.04.01
Create this file on the flash drive: /config/plugins/ca.backup/nice
Edit the file, and have it contain the following:
nice -n19 ionice -c3
Then try the backup again.

What that line above is doing is running the rsync and the rm command at the lowest priority for both CPU and I/O. No guarantees, but maybe just maybe....

Edited April 1, 2017 by Squid
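The file-creation step above could be done from an SSH session roughly like this. It is a sketch under assumptions: the flash drive is taken to be mounted at /boot (so the path becomes /boot/config/plugins/ca.backup/nice), the function name is hypothetical, and how the plugin reads the file is not stated in the thread.

```shell
#!/bin/bash
# Sketch only: create the 'nice' file described in the post.
# Assumption: the flash drive is mounted at /boot, so the file lives under
# /boot/config/plugins/ca.backup. The function name is hypothetical.
write_nice_file() {
    local dir="${1:-/boot/config/plugins/ca.backup}"
    mkdir -p "$dir" || return 1
    # No trailing newline, since how the plugin reads the file is not stated.
    printf '%s' 'nice -n19 ionice -c3' > "$dir/nice"
}

# What such a prefix does when placed in front of a command:
#   nice -n19   -> lowest CPU scheduling priority
#   ionice -c3  -> "idle" I/O class (disk time only when nothing else wants it)
# e.g.  nice -n19 ionice -c3 rm -rf /path/to/expired/backup
```

Note that the idle I/O class is only honoured by some I/O schedulers, so the effect of `ionice -c3` may vary from system to system.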
rippernz (Author) Posted April 1, 2017
I have just added this and will see what it does; it's due to kick off in about 1 1/2 hrs. Me, I'm off to bed now, 1.20am here.
mgladwin Posted April 1, 2017
Have also done as suggested. Will report back soon. Either way, thanks very much for your help on this Squid.

EDIT: So I added the 'nice' file as per above and updated CA Appdata Backup. I also upgraded unRAID to 6.3.3. Put unRAID in troubleshooting mode and ran a manual appdata backup. Captured appdata_backup.log and all seemed to run/finish perfectly. All dockers came back up and the Web GUI and SSH window were both responsive after the backup. I'm calling that a clean, successful backup. It also didn't seem to take any longer than previous backups. I will continue to monitor over the next few days and report back. So far so good!

Again @Squid thank you very much for your effort to help us all out, and I hope this solution can help fix others having similar problems. I'm not calling the cat out of the bag just yet but it seems we might be on a winner.

Edited April 1, 2017 by mgladwin
rippernz (Author) Posted April 1, 2017
OK, so the time is now nearly 8.15 am and my backup is still going (started at 3am). I think it was doing this yesterday as well, and it's also slowing down my parity check (I think). No dockers were started yesterday due to the backup running (I think), the parity check was running at about 4MB/s, and everything else became unresponsive after about 2 min of trying to do something. So if the backups are still going, the server acts like it is hung.

Also, yesterday afternoon I tried deleting old backups over the keep threshold; it took over an hour and the server was not responsive for a short while (like it had hung). Are we starting to see that backup & restore is hogging the system during backups?

My current parity check is going at 1.2MB/s and is going to take 3 days 21hrs to complete just under 4TB of the 8TB parity drive. One thing is for sure: I'm able to use the GUI and view the backup that's currently happening.
Squid Posted April 1, 2017
There's something else going on. My mod is a hail Mary workaround that might mask the problem, based upon the "IRQ disabled" message posted above. Ultimately this issue is out of my hands, and more info (especially screenshots of the local monitor) is what's required to help others assist you guys.
mgladwin Posted April 2, 2017
Unfortunately still getting IRQ 16 disabled and lock ups. Will continue to fault find but to be honest it's way above me. Worth starting a new thread for this @Squid?
Squid Posted April 2, 2017
I would say yes. It's beyond me as we're going into hardware issues...
mgladwin Posted April 2, 2017
No worries. Thanks for your help.
rippernz (Author) Posted April 4, 2017
So I decided to upgrade to 6.3.3 two days ago, which is why I haven't posted anything back. So far so good: the last 2 nights I haven't had any hangs/lockups. I did note that in 6.3.3 reiserfsprogs has been downgraded; not sure if this is the cause of no lockups, or if it makes any difference at all. @mgladwin have you upgraded to 6.3.3 yet?
mgladwin Posted April 4, 2017
Yeah I have. Still having issues though. I started another thread based on my issue directly, hoping someone can help with that. Have ascertained my issue is hardware related, or so it seems.
grither Posted April 5, 2017
Still having the same lockups even after rolling back to 6.2.4. They are just happening every 5 days instead of every 2. This morning I uninstalled all plugins and left my 3 dockers (plex, sab, and sonarr, all needo versions). Will see if there's any change.
grither Posted April 12, 2017
Anyone get their systems stable? I rolled back to 6.2.4 and still had lockups. Upgraded to 6.3.3, uninstalled all optional plugins, and still got a lockup after 3.5 days. Now going to try setting mover to monthly instead of hourly. It's 1TB so should be okay.