August 24, 2011
What are your plans if you have a power failure?
I have an APC Home Theater battery backup and conditioner, and only the unRAID and PCH are connected to the battery outlets.
That's good. Should you have a power failure that lasts longer than your UPS run time, you would want your unRAID server to be able to shut down cleanly, which you have reported it does not. So even though, as you stated, you don't normally shut it down unless you perform hardware maintenance, it will not shut down cleanly in this scenario. That may not be a concern for you, but it is for others.
August 24, 2011
To add: for some reason there is a community powerdown script that overwrites yours in order to shut down unRAID successfully. Not sure why, but it may need to be looked into and incorporated as well….
No, Tom specifically set up his version of powerdown to not kill processes keeping disks busy. There are a number of reasons for this and it IS the correct approach. The community powerdown package takes a much more forceful approach, which for people running a UPS MIGHT be needed, but is not necessarily. Tom's approach in this case is the correct one.
I'm curious. Assume a UPS is connected. If processes aren't stopped, the array can't shut down, and the batteries run out, resulting in unclean file systems and possibly lost data. However, if you kill processes before the battery runs out, and can stop the array and shut down cleanly, the system should come back in a clean state. So for a UPS environment (and maybe even a non-UPS environment), wouldn't the preference be to kill the processes that stop the array from shutting down cleanly?
And isn't there now a "verification" button on the array stop? Maybe a warning that says "Hey, I see these processes using the array; if you stop it, I will kill them."
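To make the "I see these processes using the array" idea concrete, here is a minimal Python sketch of how such a check could be done on Linux by reading /proc. This is only an illustration, not how unRAID or either powerdown script works. The function names are made up for this example, and the mount points (/mnt/disk1, /mnt/user) are just sample paths.

```python
import os


def open_paths_by_pid(proc_root="/proc"):
    """Map each PID to its working directory plus the targets of its open
    file descriptors, read from /proc/<pid>/cwd and /proc/<pid>/fd/*.
    Typically needs root to see other users' processes."""
    result = {}
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        pid_dir = os.path.join(proc_root, entry)
        links = [os.path.join(pid_dir, "cwd")]
        try:
            fd_dir = os.path.join(pid_dir, "fd")
            links += [os.path.join(fd_dir, fd) for fd in os.listdir(fd_dir)]
        except OSError:
            continue  # process exited or permission denied
        paths = []
        for link in links:
            try:
                paths.append(os.readlink(link))
            except OSError:
                pass  # descriptor went away while we were looking
        result[int(entry)] = paths
    return result


def processes_blocking(open_paths, mount_points):
    """Return {pid: [paths]} for processes holding paths under a mount point."""
    prefixes = [m.rstrip("/") for m in mount_points]
    blocking = {}
    for pid, paths in open_paths.items():
        hits = [
            p for p in paths
            if any(p == m or p.startswith(m + "/") for m in prefixes)
        ]
        if hits:
            blocking[pid] = hits
    return blocking
```

A stop-array page could call `processes_blocking(open_paths_by_pid(), ["/mnt/disk1", "/mnt/user"])` and list the result before asking for confirmation. Whether to kill what it finds is still the user's call, as discussed above.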
August 24, 2011
I'm curious. Assuming a UPS is connected. If processes aren't stopped, the array can't shut down, and the batteries run out, resulting in unclean file systems and possibly lost data.
It could possibly result in data loss, though that is unlikely.
However if you kill processes before the battery runs out, and can stop the array and shut down cleanly, the system should come back in a clean state.
Again, it depends on what is running. If a process is stopped in an unclean manner, then there is a possibility of data loss there also.
So for a UPS environment (and maybe even a non-ups environment), wouldn't the preference be to kill processes stopping the array from shutting down cleanly?
That is something for the user to decide and not Tom. Tom needs to take the approach of keeping everything up if something is causing it to be that way. It will be up to a plugin dev to make his plugin conform to the proper methods and procedures.
And now isn't there a "verification" button on the array stop? Maybe something that says "Hey, I see these processes using the array, if you stop it, I will kill them" warning.
That I do not know about. I am still running 5.0b6a on my servers.
August 24, 2011
I use the Unmenu package for clean power down. If it hangs on power down, pressing the power button briefly (not holding it down) seems to trigger a clean power down. I have tested this extensively and have not had any sync issues. I hope this helps.
August 24, 2011
Sounds like we should leave Tom's powerdown the way it is, keep the community's powerdown under a different name (e.g. forcepowerdown), and have the UPS script use 'forcepowerdown' for power failure events to make sure it kills processes and shuts down...
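A rough Python sketch of that split, for illustration only. 'powerdown' is Tom's command and 'forcepowerdown' is the hypothetical rename suggested above. The function name, the runtime threshold and the "try the clean shutdown first" escalation are assumptions added for this example, not part of any existing UPS script.

```python
def pick_shutdown_command(on_battery, runtime_left_s, clean_attempt_failed,
                          min_runtime_s=300):
    """Decide which shutdown command a UPS event handler should run.

    Returns None while on mains power, "powerdown" while there is still
    enough battery to wait for a clean stop, and "forcepowerdown" once the
    clean attempt has failed or the remaining runtime is too short to wait.
    """
    if not on_battery:
        return None  # no power failure, nothing to do
    if clean_attempt_failed or runtime_left_s <= min_runtime_s:
        return "forcepowerdown"  # hypothetical renamed community script
    return "powerdown"  # stock command, does not kill busy processes
```

The simpler variant described in the post would skip the escalation and return "forcepowerdown" for every power failure event. Manual shutdowns would keep using the stock command either way.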
August 25, 2011
Update: This week should see -beta12, which uses the linux 3.0.3 kernel. So here's the explanation. First, it is very desirable to keep up with linux kernel releases. This is because the latest drivers and bug fixes almost always go into the "current" kernel, and it's up to various maintainers to port fixes into previous releases as they see fit. The Realtek r8169 driver is a good case in point. Looking through the change logs, since 2.6.39 there have been quite a few fixes, all the way up through and including kernel 3.0. For some reason Realtek NICs and their drivers have been problematic from time to time, on many platforms besides linux; and it's mainly updates to this driver which I watch to determine when to upgrade the kernel.
But something changed starting in 2.6.39 which "broke" parity-sync, or rather, caused it to slow waaaaay down, i.e., running at 25% of normal speed. I'd say I've spent probably a solid 40-50 hours trying to figure out what this problem was. I didn't know if this was a kernel problem, or an unraid driver problem being brought out now, or some other systemic problem. Turns out it was caused by a significant change in the kernel, which took about a 6-line change in the unraid driver to fix. So... I am completing testing on -beta12 and should be able to release very soon. I don't know if this will solve the 'shutdown' crash since I can't reproduce that for some reason.
Will this cause the hangs I was experiencing? (Server would become unresponsive to anything besides terminal/console. Showing uptime would show an ever-increasing load. Any type of shutdown command would fail. Only option was to halt disks and manually power off. This would happen in idle situations where it'd sit for a few hours.) I should really be more specific, sorry. Link to my original .10 release post:
http://lime-technology.com/forum/index.php?topic=14158.msg135967#msg135967
I have been hesitant to try a beta since then. I did have some parity issues... haven't found the files, however.
August 25, 2011
Tom, quick Q. Given that the r8169 driver is causing issues for those of us with certain chipsets, are you thinking either: A. Hopefully the latest kernel will fix the issue. B. You have a plan B? Cheers
August 26, 2011
Hi Tom, I think I've managed to recreate the oops: I was browsing through NFS on a different machine then clicked stop with the NFS directory still in use on the array and I got the oops. Tested stopping and starting normally and there was no oops.
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: Call Trace:
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: Code: eb f6 5d c3 55 89 e5 9c 59 fa ba 00 01 00 00 3e 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 5d c3 55 89 e5 fa ba 00 01 00 00 <3e> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 55 89 e5 fe
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: EIP: [<c131ffdf>] _raw_spin_lock_irq+0x9/0x1a SS:ESP 0068:f1371e18
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: CR2: 0000000000000040
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: Stack:
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: Process mdrecoveryd (pid: 1644, ti=f1370000 task=f1aee7c0 task.ti=f1370000)
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:13.2/usb2/2-1/2-1:1.0/host7/target7:0:0/7:0:0:0/block/sdj/stat
Message from syslogd@mediastorage at Fri Aug 26 12:32:51 2011 ... mediastorage kernel: Oops: 0002 [#1] SMP
August 26, 2011
I got this error now
Tower kernel: Oops: 0002 [#1] SMP
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:12.2/usb1/1-4/1-4:1.0/host0/target0:0:0/0:0:0:0/block/sda/stat
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: Call Trace:
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: Code: eb f6 5d c3 55 89 e5 9c 59 fa ba 00 01 00 00 f0 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 5d c3 55 89 e5 fa ba 00 01 00 00 <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 55 89 e5 fe
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: CR2: 0000000000000040
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: EIP: [<c131ffdf>] _raw_spin_lock_irq+0x9/0x1a SS:ESP 0068:f7009e18
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: Stack:
Message from syslogd@Tower at Fri Aug 26 16:23:21 2011 ... Tower kernel: Process mdrecoveryd (pid: 7405, ti=f7008000 task=f04e09f0 task.ti=f7008000)
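For anyone comparing oops reports like these, a small Python sketch that pulls out the fields that identify a crash (faulting function, process name, CR2 fault address) so two reports can be checked for a match. The function name and regular expressions are written for this example and only cover the console format seen in this thread.

```python
import re

# Patterns for the three fields that identify the crash in these reports.
EIP_RE = re.compile(r"EIP: \[<[0-9a-f]+>\] (\S+)")
PROC_RE = re.compile(r"Process (\S+) \(pid: \d+")
CR2_RE = re.compile(r"CR2: ([0-9a-f]+)")


def oops_signature(console_text):
    """Return (faulting symbol, process name, CR2 address) from oops text.

    Any field not present in the text comes back as None.
    """
    eip = EIP_RE.search(console_text)
    proc = PROC_RE.search(console_text)
    cr2 = CR2_RE.search(console_text)
    return (
        eip.group(1) if eip else None,
        proc.group(1) if proc else None,
        int(cr2.group(1), 16) if cr2 else None,
    )
```

Two reports that produce the same tuple crashed at the same spot in the same kernel thread, which is a reasonable hint (not proof) that they are the same bug.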
August 27, 2011
Hi guys, I was running unraid 4.7 perfectly fine, but curiosity got the better of me and I installed 5.0-beta11. Everything went smoothly: I installed as per the instructions and ran the permissions script. However, I was able to access only one of my eleven shares; thankfully it was my most important share folder. I was, however, able to access all the folders that reside under the shares by browsing through the disk shares. I have since reverted to 4.7 and everything is working as normal. Thankfully nothing was damaged or lost. Does anyone have any ideas as to what I may have missed or done incorrectly? Thanks in advance!
August 27, 2011
Tom, quick Q. Given that the r8169 driver is causing issues for those of us with certain chipsets, are you thinking either: A. Hopefully the latest kernel will fix the issue. B. You have a plan B? Cheers
The current answer is NO. Tom used the latest kernel available for 5beta12, but the problem persists in the driver. I'm sorry, but users of this NIC only have two options now: buy another NIC (preferably an Intel one) or use an older beta until this driver issue gets addressed. If you have a PCI slot available and don't want to waste a PCIe slot, I strongly suggest this one: http://www.ebay.com/itm/Intel-8390MT-Pro-1000MT-Gigabit-PCI-Network-PCI-Card-/110673742000?pt=LH_DefaultDomain_0&hash=item19c4ab48b0 I used one of these for about a year without a single problem.
August 30, 2011
Right, so we have to spend more money and accept higher power draw for something that could be fixed by allowing users to choose which network driver to use. Which Realtek NICs work and which don't? Sorry, but the software broke it. The software fixes it.
August 30, 2011
Right, so we have to spend more money and accept higher power draw for something that could be fixed by allowing users to choose which network driver to use. Which Realtek NICs work and which don't? Sorry, but the software broke it. The software fixes it.
From what I have gathered, all Realtek NICs except for the latest Realtek 8111E work fine. And if you check the b12 thread (read through it), there is something in there you can try that seems to have fixed the issue with the 8111E NICs.
August 31, 2011
Right, so we have to spend more money and accept higher power draw for something that could be fixed by allowing users to choose which network driver to use. Which Realtek NICs work and which don't? Sorry, but the software broke it. The software fixes it.
From what I have gathered, all Realtek NICs except for the latest Realtek 8111E work fine. And if you check the b12 thread (read through it), there is something in there you can try that seems to have fixed the issue with the 8111E NICs.
Yup, seen that. Away from my server at the moment so can't try.
August 31, 2011
It seems like the cache disk is always spun up now. Is this on purpose or am I missing a switch? I've noticed this since I switched to beta 11. I did a quick search here and found nothing... unless I'm using the wrong search words! Jim
August 31, 2011
You need to manage the cache drive yourself using hdparm. unRAID only manages the spin times for the data and parity drives; it does not manage the cache drive.
August 31, 2011
You need to manage the cache drive yourself using hdparm. unRAID only manages the spin times for the data and parity drives; it does not manage the cache drive.
You can set spindown via the unRAID webGUI for the cache drive. I believe unRAID will manage the spindown of the cache drive. I can't really test this as my cache drive never spins down.
August 31, 2011
You need to manage the cache drive yourself using hdparm. unRAID only manages the spin times for the data and parity drives; it does not manage the cache drive.
You can set spindown via the unRAID webGUI for the cache drive. I believe unRAID will manage the spindown of the cache drive. I can't really test this as my cache drive never spins down.
I'll try to find the recent post by Limetech that seemed to indicate unRAID does not manage the cache drive. Perhaps I misread. EDIT: Yeah, now on a re-read, perhaps it means unRAID handles spindown differently from the cmd-line interface, but still manages it via hdparm on the GUI.
The 'mdcmd spindown' command only spins down drives that are part of the array (parity and data drives). The cache drive is spun down using the 'hdparm' command. The reason for this is that the unraid driver keeps track of disk spinning/not spinning in order to implement "spinup groups".
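For anyone setting a cache drive timeout by hand, here is a small Python sketch of the value encoding that `hdparm -S` expects, as described in the hdparm man page: 0 disables the timeout, 1 to 240 are multiples of 5 seconds, and 241 to 251 are 1 to 11 units of 30 minutes. The function names and the /dev/sdX placeholder are made up for this example. This is not necessarily what the unRAID webGUI does internally, and some drives interpret the value differently.

```python
import math


def hdparm_standby_value(minutes):
    """Encode a spindown timeout in minutes as an `hdparm -S` value.

    Timeouts that cannot be represented exactly are rounded up to the
    next representable one (e.g. 25 minutes becomes 30 minutes).
    """
    if minutes < 0:
        raise ValueError("timeout must not be negative")
    if minutes == 0:
        return 0  # 0 disables the standby timer
    if minutes <= 20:
        # Values 1..240 are multiples of 5 seconds (5 s to 20 min).
        return max(1, math.ceil(minutes * 60 / 5))
    units = math.ceil(minutes / 30)
    if units > 11:
        raise ValueError("longest plain timeout is 330 minutes")
    # Values 241..251 are 1..11 units of 30 minutes.
    return 240 + units


def hdparm_spindown_command(device, minutes):
    """Build the argument list for setting the standby timeout on a device."""
    return ["hdparm", "-S", str(hdparm_standby_value(minutes)), device]
```

For a 15 minute timeout this builds `hdparm -S 180 /dev/sdX`, with /dev/sdX replaced by the cache drive's device. `hdparm -y /dev/sdX` puts a drive into standby immediately, which is handy for checking that it can spin down at all.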
August 31, 2011
Ok... It was set to "default", whatever that means as far as spindown goes. I changed it to 15 min and we'll see what happens... Thanks, Jim
September 6, 2011
I exchanged a 1TB disk for a 3TB disk yesterday evening and started the rebuild. This morning the server's web control panel is no longer accessible, and whenever I mount anything (which takes minutes rather than seconds), I get: "Something wrong with the volume's CNID DB, using temporary CNID DB instead. Check server messages for details. Switching to read-only mode". I believe somebody else had this problem a few posts back. My question is: if it is rebuilding and I hard-reset it (as I conclude that it has become unresponsive), is there a chance of data loss?
No. I just telnet in and manually delete the three .Apple files, then the problem is fixed!
I deleted a file called .apdisk or something like that (it was only on disk1). I'm not absolutely sure that solved the problem, but I don't seem to get the error any more...
December 9, 2011
Kernel Oops emHTTP lockup
My server has a problem when it tries to come out of sleep. There are some messages on the console about something crashing in the kernel, and the web console dies. However, I am able to use the shell prompt and reboot the server. I can then start unraid again and the disks all seem fine. The server is able to complete a full parity check (about 10 hours) without any problems. It only happens when it tries to come out of sleep. I had migrated to beta 14 and, thinking the new kernel (3.x) was the issue, moved back to 11 recently. Any suggestions? I am attaching the syslog, which shows the errors towards the end. I am running in a VM on ESXi 5 and have M1015 LSI controllers. Thanks for any help and suggestions. RS
log-dec-9-2011.txt
January 3, 2012
syslog also attached
/dev/sdb is the USB drive. Anything to worry about here?
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: Oops: 0002 [#1] SMP
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: last sysfs file: /sys/devices/pci0000:00/0000:00:11.0/0000:02:02.0/usb1/1-1/1-1:1.0/host2/target2:0:0/2:0:0:0/block/sdb/stat
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: Process mdrecoveryd (pid: 10505, ti=f7400000 task=ee8d8000 task.ti=f7400000)
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: Stack:
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: Call Trace:
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: EIP: [<c131ffdf>] _raw_spin_lock_irq+0x9/0x1a SS:ESP 0068:f7401e18
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: Code: eb f6 5d c3 55 89 e5 9c 59 fa ba 00 01 00 00 f0 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 89 c8 5d c3 55 89 e5 fa ba 00 01 00 00 <f0> 66 0f c1 10 38 f2 74 06 f3 90 8a 10 eb f6 5d c3 55 89 e5 fe
Message from syslogd@TestTower at Tue Jan 3 07:32:43 2012 ... TestTower kernel: CR2: 0000000000000040
syslog_20120103.txt
January 3, 2012
Kernel Oops emHTTP lockup My server has a problem when it tries to come out of sleep.
Since "sleep" is not officially supported by unRAID, your only option is to try the newer kernel in later betas. If it works, great. If not, wait some more for a newer kernel.
January 18, 2012
How stable is this release considered? Is 12a (or another more recent one) more stable? I'm not sure which version to use to get 3TB HD support.