June 28, 2008
I've had a couple of strange shutdown problems with 4.3.1; I was previously using 4.3b3 and that was fine. The first problem came when I tried to spin up all the discs from the web page: all access was lost, and I could not telnet in either. There was no response to a ping either, so the only thing I could do was kill the power. On reboot it began a parity sync, which I was expecting. This completed with 0 errors and everything was fine. The second time, trying to initiate a shutdown from the management page again caused the web page to stop responding. This time I could telnet in and managed to capture the log, which I have attached. After a long time (quite a few minutes) the web page came back with what looked like half the drives showing red icons and marked as missing. Something seems to have changed with 4.3.1 (or with this and the betas after b3), as I had zero problems up until this update. Any help would be much appreciated. Mark
June 28, 2008
Serious problems with the Maxtors only, every single one, and not one issue with the Hitachis. The issue occurred with the Maxtor attached to the last onboard SATA port, the 2 attached to the JMB363 controller, and the rest attached to the Promise cards, so that rules out a problem with the controllers. It appears the command sent to the Maxtors to spin them down leaves them in an unresponsive state. I would either turn off Spin Down globally, or revert to 4.2.4 for now. Perhaps Tom will have an idea. In summary, for v4.3.1 with this motherboard, those Maxtors (Maxtor_7H500F0) appear to be incompatible with Spin Down.
June 28, 2008
The problem appears to be with spinning UP. When you shut down, the system must first spin up all the drives in order to unmount them, and then it can spin them down again before killing power. The spin-up code has not changed, but, in the case of Shutdown only, we used to spin up the drives serially, which delayed Shutdown considerably, so we changed it to do the spin-ups in parallel. I'll have to look back to see exactly when this happened. With the system up and running normally, you might try clicking the "Spin down" button, waiting a minute, then clicking the "Spin up" button on the Main page to see if this causes the problem as well. If so, it's conceivable that parallel spin-up is too much for your power supply. BTW - the Spin Up code does not work by issuing any special command to the drive. It reads 512-byte sectors starting from sector 0, checking after each read whether the drive is now spun up: if so, we're done; if not, it keeps reading. The only way a read could finish with the disk still spun down is if the block is cached in the Linux buffer cache. Sequentially reading the drive will eventually hit a sector that is not in cache and thus spin up the drive.
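Editor's note: the read-until-spun-up loop described in the post above can be sketched roughly as follows. This is a minimal simulation, not the actual unRAID code: `FakeDrive`, `read_sector`, `spin_up_by_reading`, and the `max_sectors` limit are all hypothetical names introduced for illustration, and the "buffer cache" is modelled as a plain set of sector numbers.

```python
from dataclasses import dataclass, field

SECTOR_SIZE = 512  # bytes, as stated in the post


@dataclass
class FakeDrive:
    """Hypothetical stand-in for a disk. Sectors listed in `cached` are
    served from the buffer cache and therefore do not wake the drive."""
    cached: set = field(default_factory=set)
    spun_up: bool = False

    def read_sector(self, n: int) -> bytes:
        if n not in self.cached:
            # A read that misses the cache must touch the platters,
            # which forces the drive to spin up.
            self.spun_up = True
        return bytes(SECTOR_SIZE)


def spin_up_by_reading(drive: FakeDrive, max_sectors: int = 1_000_000) -> int:
    """Read sectors from 0 upward, checking after each read whether the
    drive is spun up. Returns the number of sectors read."""
    for n in range(max_sectors):
        drive.read_sector(n)
        if drive.spun_up:
            return n + 1
    raise RuntimeError("drive did not spin up within max_sectors reads")
```

With sectors 0 to 2 in the simulated cache, the loop needs four reads before one misses the cache and wakes the drive, which matches the remark that sequential reading eventually reaches an uncached sector.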
June 28, 2008 (Author)
Thanks for the replies, guys. The first time I had an issue I did use the spin up button, and that caused a complete hang; I could not get access. I did notice that the spin-up had changed to do everything in parallel, but I first noticed this in beta 3 and never had any problems there. I have changed to not spin down drives automatically for now. I don't leave the server on 24/7 so this is not an issue, but I would still like to have the problem fixed. I'll try the spin down and spin up and see what happens. I don't think it's a PSU issue: I'm running 2 good-quality 500 Watt PSUs, and like I say there were no issues in beta 3. Thanks Mark
July 1, 2008 (Author)
So I have just tried again what you suggested, Tom: click Spin Down, wait a minute, and then hit Spin Up. Doing this left me completely unable to connect to the server. No web page, no telnet, can't even ping it. Needless to say I am not happy. Now I'm going to have to do a hard power-down, and when it restarts there will be another 12 hour parity check for no reason. Unless you have a solution I'm going back to beta 3, as I've used the spin-up many times there without any issue at all. Mark
July 1, 2008
Most of us will prefer the simultaneous spin-up (as our mobo/PSU can handle it), but I can see this as a user option in the future (eventually more things will have to become user options anyway). You have a 500W PSU, but that doesn't mean all the power output lines are rated for the proper amperage. Better check this out.
July 1, 2008 (Author)
2x 500 watt power supplies to split the load. 4.3 Beta 3 had simultaneous spin-up and I had no problems there, so I don't see this as a power issue. Seeing as the drives do not even try to spin up and I lose the connection immediately (I can't even ping the box), it would seem to me it's a software issue. Mark
July 1, 2008
I think simultaneous spin-up will reveal hardware problems that may not have shown up before. I think an option for simultaneous, staggered or single-threaded spin-up should be available, simultaneous being the hardest on the system. If there are any dips in power, the chances of your hardware freezing are high. If the network adapter stops responding to pings, then the hardware has hung. It could have something to do with the IRQ re-arrangement in the latest kernel, i.e. a burst of spin-ups creates a number of interrupts which clash with one another (this shouldn't happen, but then again your hardware is hanging). The other problem could be the huge surge required to spin up simultaneously causing a dip. Two power supplies do not guarantee there is enough power from the outlet for the split-second surge. If you are on a UPS that supports power elevation on surge, then chances are the power is OK.
With staggered spin-up, I think each successive drive down the list should wait 3-5 seconds per drive. In my power control script:
Drive 0 spins up first.
Drive 1 spins up in 5 seconds.
Drive 2 spins up in 10 seconds.
Drive 3 spins up in 15 seconds.
With single-threaded spin-up, the spin-up code waits for each drive to finish spinning up. This takes the longest, but also ensures the least amount of surge on the power supply.
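Editor's note: the three spin-up modes proposed in the post above differ only in when each drive is told to start. A minimal sketch of the resulting schedules follows. The function name, the mode strings, and the fixed per-drive `spin_time_s` are assumptions made for illustration; they are not taken from unRAID or from the poster's power control script.

```python
def spin_up_start_times(mode: str, n_drives: int,
                        stagger_s: float = 5.0,
                        spin_time_s: float = 10.0) -> list:
    """Return the start offset (in seconds) for each drive.

    simultaneous: every drive starts at t=0 (largest power surge).
    staggered:    drive i starts i * stagger_s seconds after drive 0.
    serial:       drive i starts only after drive i-1 has finished,
                  assuming a fixed (hypothetical) spin_time_s per drive.
    """
    if mode == "simultaneous":
        return [0.0] * n_drives
    if mode == "staggered":
        return [i * stagger_s for i in range(n_drives)]
    if mode == "serial":
        return [i * spin_time_s for i in range(n_drives)]
    raise ValueError(f"unknown mode: {mode!r}")
```

For four drives with the 5 second stagger from the post, `spin_up_start_times("staggered", 4)` gives offsets of 0, 5, 10 and 15 seconds, matching the Drive 0 to Drive 3 schedule listed above.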
July 1, 2008 (Author)
Was there a kernel change between 4.3b3 and 4.3.1? If so, I'm thinking this is where the problem might be. Not that I can spend more time diagnosing it, as every time it hangs I have a 12 hour parity check. A real PITA. I'll say it again: simultaneous spin-up had no issues in 4.3b3. I understand that spinning all the drives up at the same time will cause a surge of power, but this was never a problem until the update to 4.3.1. Mark
July 1, 2008
Quote: "Was there a kernel change between 4.3b3 and 4.3.1?"
Changes from 4.3-beta3 to 4.3-beta4
-----------------------------------
New feature: cache disk support.
Improvement: enable SMART before reading disk temperature.
Improvement: upgrade from linux kernel 2.6.24.3 to 2.6.24.4 (refer to http://lwn.net/Articles/274741).
Improvement: upgrade from Samba 3.0.28 to Samba 3.0.28a (addresses some Vista issues, refer to http://us1.samba.org/samba/history/samba-3.0.28a.html).
Improvement: added back a few more missing libraries needed for certain user customizations.
Bug Fix: Support normal expansion of array when Parity is not installed.
Limetech needs to answer this one:
Quote: "The spin-up code has not changed, but, in the case of Shutdown only, we used to spin-up the drives serially, which delayed Shutdown considerably, so we changed it to do the spin-ups in parallel. I'll have to look back to see exactly when this happened."
July 1, 2008
If you would like a copy of unRAID Server 4.3-beta4.zip to test the above, just PM your email address to me. It's about 30MB. Edit: Try the FTP alternative a couple of posts below.
July 1, 2008
Hey RobJ, I'd like a copy of beta 4 if you could. I lost my copy. I'll send you a PM with my email. Thanks Phil
July 1, 2008
Well, I have to apologize, especially to you Phil. A server upstream from me is indicating it will not accept an emailed file over 14.6MB, a rather bizarre number! I *think* I have set up an alternative:
FTP address: ftp.jacoserv.com
User name: unraid
Password: unRAID (note capitalization)
Please let me know if this does not work.
July 1, 2008
I downloaded the file just fine through FTP. Thanks Rob. Don't worry about the email though. Things happen. That was an odd file size limitation... Phil
July 1, 2008
Quote: "A server upstream from me is indicating it will not accept an emailed file over 14.6MB, a rather bizarre number!"
Somebody set the quota to a simple decimal "15000" then.
July 1, 2008
Quote: "somebody set the quota to a simple decimal "15000" then"
The number they gave me was 14680064 !?! And I don't even know who 'they' is; it was not identified, just 'The server'. (using Thunderbird, through Earthlink)
July 2, 2008
Quote: "The number they gave me was 14680064 !?!"
That is 14MB (14 * 2^20 bytes).
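Editor's note: the arithmetic in the post above is easy to check. The short sketch below confirms that the reported limit is exactly 14 * 2^20 bytes, and shows that the same figure in decimal megabytes, truncated to one place, is the 14.6MB that looked so odd. The variable names are purely illustrative.

```python
MIB = 2 ** 20                 # 1,048,576 bytes
limit_bytes = 14_680_064      # the number reported by the mail server

whole_mib, remainder = divmod(limit_bytes, MIB)   # exact multiple of 2^20?
decimal_mb = limit_bytes / 1_000_000              # same size in 10^6-byte units
truncated_mb = int(decimal_mb * 10) / 10          # cut to one decimal place

print(whole_mib, remainder, decimal_mb, truncated_mb)
```

So the limit is a round number after all, just expressed in binary megabytes rather than decimal ones.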
July 3, 2008 (Author)
Are you out there, Tom? Any more thoughts on this issue? Could the new kernel be buggy? Mark
July 10, 2008
I have another shutdown failure scenario (or perhaps one mentioned already). I upgraded from 4.0 to 4.3.1 and had lots of issues, but thanks to the forum I appended the right string and it finally booted. Now everything is dandy except that the server doesn't completely shut down when I issue a shutdown command through the web interface. The system does seemingly become unresponsive (and unpingable), but the MB and drive chassis are all still running. On a physical reboot, the system does come up without any parity check. Still, this is not exactly what I had in mind when "shutting off". Everything worked well in 4.0.
My build:
Asus P5B-E MB
1GB Corsair value RAM
512MB Sandisk Cruzer Micro F.D.
Seasonic Super 460 Silencer
4x Chieftec 3-in-2 SATA mobile racks
3x 2 SATA port Syba PCI-e controller cards
Tom, any idea? Anyone else?
July 11, 2008 (Author)
If possible you might want to give 4.3b3 a try. That version was rock solid for me. Mark
July 14, 2008
melechmet- The P5E-VM DO is famous (or infamous) for unusual shutdown/reset behavior. If you haven't already, try to upgrade to the latest BIOS, but be sure to follow this procedure:
1. Download the BIOS and put it in the root of your Flash device.
2. Power off your server.
3. There is a jumper located just to the right of the lower-right corner of the Intel ICH9DO chip which must now be moved. From the Asus website: "Please set service mode Jumper to 2-3 during Bios update."
4. Now boot the server and hit the Del key to enter the BIOS. We have found that you might see a "CPU Fan error" message as a result of the service mode jumper being moved. This is OK.
5. Select ASUS Update Utility and use it to update your BIOS.
6. After the BIOS update, the motherboard will reboot itself. Once this starts happening, power down and restore the service mode jumper.
Now when you power up with the new BIOS, see if shutdown is working properly.
We have determined that:
1. If you don't follow the procedure of moving the service jumper before the BIOS update, the board may no longer power down properly (everything shuts off except the CPU and case fans). Once this happens there seems to be no way to fix it.
2. If, after updating the BIOS in the manner specified, the board still does not shut down properly, we have found no way to fix it, and if this is important to you, the only recourse is to RMA the board.
Hope this helps.
July 21, 2008
I wonder whether all this has anything to do with my own problems: I get a kernel panic when I try to stop my array. I have a 650 W power supply, which I thought would be adequate, especially with serial spin-up. Would lack of power lead to a kernel panic? Another thought was my RAM, but it hasn't been changed during my upgrade from 4.2.1 to 4.3.3. What HAS changed is the following:
Added a drive to the array, plus a cache drive.
Upgraded parity from a 750 WD to a 1 TB WD green drive.
New total of 9 drives installed.
Switched my Netgear gigabit card for a Broadcom one.
July 21, 2008
I am pretty sure a kernel panic could only be caused by something running at the most privileged level, most likely a driver issue or a low-level hardware fault (like a memory fault). Of the changes you mentioned, I don't see how any kind of drive change (adding, replacing, spin up or down) could be involved at all, so that leaves a low-level hardware problem, a driver change in the newer kernel, or the driver change for the network card. That leads me to suggest reverting to your Netgear card, to eliminate the Broadcom card as a suspect. If your PSU is working correctly, 650W should be adequate, and it is probably not implicated in a kernel panic anyway.
Quote: "That is 14MB (14 * 2^20)"
Thanks, Bill, that possibility never occurred to me.
July 21, 2008
Thanks for the tips. I'll report back in my thread... sorry for the jack!