dangil Posted May 24, 2012
Since beta6a, and up to rc3, I have had an intermittent issue while stopping the array: umount crashes the server.
I have a Supermicro X8SIL-V motherboard with 6 onboard SATA ports. If I use only these ports, I don't see the issue. If there is more than one SATA controller, the issue appears.
I did several cycles of mount/umount of disk6, the disk attached to the PCI-e controller, in maintenance mode as well as in normal mode (with SMB Export set to off), and none failed.
I think this bug is triggered because the umount of disk5, the last disk on the onboard SATA controller, has not completely finished when the umount of disk6, attached to the PCI-e controller, starts next in the sequence. What led me to this conclusion is that, in my syslog, 2 disks remain busy after umount crashes the kernel. So the unmount of the second-to-last disk and the unmount of the last disk seem to conflict with each other, perhaps because of a race condition between them.
I think adding a small delay between umounts could solve this issue.
Attached is the syslog with the kernel BUG info: syslog.zip
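The delay suggested above could be sketched roughly as follows. This is only an illustration of the idea, not how unRAID's stop sequence is actually implemented: the function name unmount_with_delay, the UMOUNT_DELAY and UMOUNT_CMD variables, and the /mnt/diskN mount points are all assumptions that do not come from the thread. A delay can only narrow a race window, so treat it as a workaround rather than a fix.

```shell
#!/bin/bash
# Sketch only: unmount disks one at a time, pausing between them.
# Assumptions (not from the thread): the delay value, the variable names,
# and the mount points passed in by the caller.

UMOUNT_DELAY="${UMOUNT_DELAY:-2}"   # hypothetical pause between unmounts, in seconds
UMOUNT_CMD="${UMOUNT_CMD:-umount}"  # set to "echo umount" for a dry run

unmount_with_delay() {
    local mp status=0
    for mp in "$@"; do
        # UMOUNT_CMD is left unquoted on purpose so "echo umount" splits into words.
        if $UMOUNT_CMD "$mp"; then
            echo "unmounted $mp"
        else
            echo "failed to unmount $mp" >&2
            status=1
        fi
        # Give the previous unmount time to settle before starting the next one.
        sleep "$UMOUNT_DELAY"
    done
    return "$status"
}
```

For example, unmount_with_delay /mnt/disk5 /mnt/disk6 would unmount the two disks in that order with a pause after each; setting UMOUNT_CMD="echo umount" beforehand prints the commands instead of running them.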
dangil Posted June 4, 2012 Author
An update: since I started unmounting disk6 (connected to the PCI-E SATA controller) manually, everything is running smoothly. No more crashes.
I can press Stop, and unRAID will unmount the disks connected to the onboard SATA and stop the array.
limetech Posted June 4, 2012
I have a fix for this in -rc4 that may solve the problem. I say "may" because, though I can make it happen infrequently without the fix, with the fix I can't make it happen. But maybe after the 543rd stop/start sequence it will happen.
The problem is a race condition in reiserfs that I think was introduced when they removed the "BKL" (big kernel lock) from the code, but I don't understand all the code well enough to isolate and solve it. My solution is a workaround, though I think it works.
Maybe I should go and try and visit Hans in San Quentin.
madburg Posted June 5, 2012
Oh, that last part was too funny. I'll drink to that.
dalben Posted June 6, 2012
I'm getting the same issue on rc4. I try to stop the array and the unRAID server locks up. It doesn't even respond to a ping.
RobJ Posted June 7, 2012
Can you provide a syslog that is saved as late as possible, and/or run a console or Telnet/PuTTY 'tail -f' and show us the very last messages?
dalben Posted June 7, 2012
Here's what I think you're after.
Jun 6 18:25:57 tdm status[23414]: No active PIDS on the array
Jun 6 18:25:58 tdm rc.unRAID[23452]: Killing active pids on the array drives
Jun 6 18:25:58 tdm rc.unRAID[23480]: Umounting the drives
Jun 6 18:25:58 tdm rc.unRAID[23484]: /dev/md1 umounted
Jun 6 18:25:58 tdm rc.unRAID[23484]: /dev/md2 umounted
Jun 6 18:25:58 tdm rc.unRAID[23484]: /dev/md3 umounted
Jun 6 18:25:59 tdm rc.unRAID[23494]: Stopping the Array
Jun 6 18:25:59 tdm kernel: mdcmd (20): stop
Jun 6 18:25:59 tdm kernel: md1: stopping
Jun 6 18:25:59 tdm kernel: md2: stopping
Jun 6 18:25:59 tdm kernel: md3: stopping
Jun 6 18:26:02 tdm mdstatusdiff[23506]: --- /tmp/mdcmd.23346.1^I2012-06-06 18:25:59.481577266 +0800
Jun 6 18:26:02 tdm mdstatusdiff[23506]: +++ /tmp/mdcmd.23346.2^I2012-06-06 18:26:02.631635271 +0800
Jun 6 18:26:02 tdm mdstatusdiff[23506]: @@ -1,14 +1,14 @@
Jun 6 18:26:02 tdm mdstatusdiff[23506]: sbName=/boot/config/super.dat
Jun 6 18:26:02 tdm mdstatusdiff[23506]: sbVersion=2.1.3
Jun 6 18:26:02 tdm mdstatusdiff[23506]: sbCreated=1314335405
Jun 6 18:26:02 tdm mdstatusdiff[23506]: -sbUpdated=1338978072
Jun 6 18:26:02 tdm mdstatusdiff[23506]: -sbEvents=260
Jun 6 18:26:02 tdm mdstatusdiff[23506]: -sbState=0
Jun 6 18:26:02 tdm mdstatusdiff[23506]: +sbUpdated=1338978359
Jun 6 18:26:02 tdm mdstatusdiff[23506]: +sbEvents=261
Jun 6 18:26:02 tdm mdstatusdiff[23506]: +sbState=1
Jun 6 18:26:02 tdm mdstatusdiff[23506]: sbNumDisks=4
Jun 6 18:26:02 tdm mdstatusdiff[23506]: sbSynced=1338945606
Jun 6 18:26:02 tdm mdstatusdiff[23506]: sbSyncErrs=0
Jun 6 18:26:02 tdm mdstatusdiff[23506]: mdVersion=2.1.3
Jun 6 18:26:02 tdm mdstatusdiff[23506]: -mdState=STARTED
Jun 6 18:26:02 tdm mdstatusdiff[23506]: +mdState=STOPPED
Jun 6 18:26:02 tdm mdstatusdiff[23506]: mdNumProtected=4
Jun 6 18:26:02 tdm mdstatusdiff[23506]: mdNumDisabled=0
Jun 6 18:26:02 tdm mdstatusdiff[23506]: mdDisabledDisk=0
Looking at the time, that *could* be a console-generated shutdown command. The lockup happens more when stopping the array from the webgui.
I'm not sure what tail -f is. How do I set it up? I opened a PuTTY window, typed tail -f and got the following. Now what?
Linux 3.0.33-unRAID.
root@tdm:~# tail -f
tail: warning: following standard input indefinitely is ineffective
limetech Posted June 7, 2012
You want:
tail -f /var/log/syslog
But I don't recognize "rc.unRAID" in your syslog. What is this? Please disable it for now.
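RobJ's request for a syslog "saved as late as possible" can be approximated by copying the end of the log to the flash drive just before pressing Stop, alongside the interactive tail -f /var/log/syslog session. The helper below is a minimal sketch; the function name save_log_tail, the 200-line default, and the destination file name in the usage note are assumptions for illustration.

```shell
#!/bin/bash
# Sketch only: copy the last N lines of a log file to another location
# and flush it to disk, so the copy survives a later lockup.
# save_log_tail is a hypothetical helper name, not an unRAID command.

save_log_tail() {
    local src="$1" dest="$2" lines="${3:-200}"
    # tail -n prints the last $lines lines; sync forces the copy out to storage.
    tail -n "$lines" "$src" > "$dest" && sync
}
```

A call such as save_log_tail /var/log/syslog /boot/syslog-before-stop.txt would leave a copy on the flash drive (/boot on unRAID) that can be read after a reboot.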
dalben Posted June 7, 2012
Thanks for the command. In true demonstration mode, it didn't hang this time. I stopped the array fine from the webgui (simplefeatures), then rebooted from the webgui. All good.
Between the last hang and now I stripped everything back to bare unRAID (removed all plugins and packages) and started rebuilding the server from scratch with the add-ons I need. I can only assume that some mixture of plugins/packages caused the issues.
The only lingering issue I have is that cache won't unmount via the webgui, so I need to force a umount -f on it.
With regard to rc.unRAID, I a) don't know what it is and b) wouldn't know how to disable it.
limetech Posted June 7, 2012
"The only lingering issue I have is cache won't unmount via the webgui so I need to force a umount -f on it."
This is a problem. Please disable all plug-ins and retest. Then, if it still happens, open another issue thread.
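Before resorting to umount -f, it can help to see which processes still hold files open on the cache disk. The sketch below reads /proc directly, so it assumes a Linux /proc filesystem; pids_using_path is a hypothetical helper name, and /mnt/cache in the usage note is an assumed mount point. Where fuser or lsof is installed, those tools do the same job.

```shell
#!/bin/bash
# Sketch only: print the PIDs whose working directory or open files
# live under a given directory, by inspecting the symlinks in /proc.
# Run as root to see every process; unreadable entries are skipped.

pids_using_path() {
    local target="$1" pid link path
    for pid in /proc/[0-9]*; do
        for link in "$pid"/cwd "$pid"/fd/*; do
            # readlink fails for processes that exited or that we may not inspect.
            path=$(readlink "$link" 2>/dev/null) || continue
            case "$path" in
                "$target"|"$target"/*)
                    echo "${pid#/proc/}"
                    break   # one hit per process is enough
                    ;;
            esac
        done
    done
}
```

Running pids_using_path /mnt/cache lists candidate PIDs; stopping those processes cleanly and retrying a normal unmount is gentler than forcing it.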
dangil Posted June 7, 2012 Author
Thanks Tom! Updating to rc-4 to try this fix. Will report back.
chickensoup Posted June 7, 2012
When you update to rc-4, make sure you do a clean install on your flash to remove any traces of old plugins etc. Test this with no plugins and report the results. This includes simplefeatures, unmenu, or anything else that isn't supplied by Lime-tech's default install.
If the issue is still happening, please repost a syslog with rc4.
dalben Posted June 13, 2012
I've had this happen to me again today with RC4. I do have a few plugins installed, but none of them differ from the plugins that worked with RC3 and all previous RC and beta releases.
I found my webgui (simplefeatures) was unresponsive. unmenu worked for a while, then that hung too. From a console window I restarted unmenu fine. I clicked the "Stop Array" button and after about half a minute the server became unresponsive, not even responding to a ping.
I've attached the tail -f syslog, but the only syslog entries that appeared after I clicked "Stop Array" were the following:
Jun 13 20:21:35 tdm kernel: mdcmd (56): spinup 0
Jun 13 20:21:35 tdm kernel:
Jun 13 20:21:35 tdm kernel: mdcmd (57): spinup 1
Jun 13 20:21:35 tdm kernel:
Jun 13 20:21:35 tdm kernel: mdcmd (58): spinup 2
Jun 13 20:21:35 tdm kernel:
Jun 13 20:21:35 tdm kernel: mdcmd (59): spinup 3
Jun 13 20:21:35 tdm kernel:
Jun 13 20:21:43 tdm mountd[4117]: Caught signal 15, un-registering and exiting.
Jun 13 20:21:44 tdm kernel: nfsd: last server has exited, flushing export cache
Throughout the day I had been cleaning up all my packages and plugins and ensuring that only the latest working ones were installed. I know it was all fine earlier today, as I restarted the server a few times to make sure that the SAB/SB/CP/TransM/FlexGet combo started without any errors.
I could try to strip everything back to core unRAID rc4 again, but that sort of defeats the purpose, as I only have a live server and I really don't want to lose the SAB/SB/CP/TransM/FlexGet automation I have set up.
tdm_hang_syslog.txt
madburg Posted June 14, 2012
Well, look at it this way. Tom is changing things, right? So you can't go by a third-party plugin that (may have) worked even one version prior. It's for the plugin creator to adjust or change their plugin. For your sanity and others', first try removing all plugins and test various aspects. If all checks out for you, add one plugin at a time and test over again. If you find a plugin issue, post within the plugin forum for help (it will most likely help others as well). It's not very hard to back up all the files (.plg, go script, etc.), start fresh, and add each one back one by one once you know the RC checks out.
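The backup step mentioned above could look something like the sketch below. It assumes the files to keep (the go script, .plg files, and so on) live under a single configuration directory on the flash drive; backup_config is a hypothetical helper name, and the paths in the usage note are assumptions to adjust to wherever your files actually are.

```shell
#!/bin/bash
# Sketch only: copy a configuration directory into a timestamped backup
# folder and print the path of the new copy.
# backup_config is a hypothetical helper, not part of unRAID.

backup_config() {
    local src="$1" dest_root="$2" stamp dest
    stamp=$(date +%Y%m%d-%H%M%S)
    dest="$dest_root/config-$stamp"
    # "src/." copies the directory's contents, including hidden files.
    mkdir -p "$dest" && cp -a "$src"/. "$dest"/ && echo "$dest"
}
```

For instance, backup_config /boot/config /boot/backups would keep a dated copy on the flash drive before plugins are removed, so each one can be restored and tested individually later.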
dalben Posted June 14, 2012
Yeah, look, I'm not looking for someone's undivided attention to fix my issue if I'm not prepared to spend time doing a proper debug. I'm just reporting what I saw and put up a syslog, and if that helps Tom or others find a bug in unRAID or a package/plugin, great.
I had already removed all plugins/packages and slowly re-added what I needed one by one, stopping and restarting the array each time. All worked well, with the final steps being full server restarts. The problem appeared a day later when I wanted to stop the array.
madburg Posted June 14, 2012
It's not so much that, as helping to get to the root cause so Tom can reproduce it and say whether it is a bug or not. The plugin owner(s) mostly won't know from this post exactly what you're running, so you won't get help here if it is a plugin problem.
You should try to run the RC with no plugins, grab a spare PC, and copy a ton of data to where you normally place data (cache, etc.). Run something like that, plus anything else you normally do (understood that it won't be SAB/SB/CP/TransM/FlexGet off of unRAID), and wait a day. If this tests out and the array stops, you know it's a plugin (as you were not running any).
Then add one plugin, say SAB, run your downloads for a day, and try to stop the array. It's your choice whether to keep adding plugins (leaving the prior ones on) or to remove the prior plugin and add a different one. Rinse and repeat after a full day for each. You WILL find which one is causing the issue.
dangil Posted June 26, 2012 Author
So far so good. No crashes with rc4 (clean install, no plugins or mods). As always, if it happens again I will report.