October 11, 2010

Good day Lime-tech,

Our machine is locking up and requires a hard reboot about every 1.5 weeks. It is now in our production environment, has replaced our last redundant file server, and needs to be corrected quickly.

It has a single shared folder spanning the drives, holding 5 Acronis image backups (nightly incremental and weekly full), each in its own folder. In addition, 3 other folders back up offsite data on a daily basis. This brings the total to 8 folders, each containing large datasets, with up to 5 users accessing the individual folders at any given time (i.e. \\tower\backup\user1, \\tower\backup\user2, \\tower\backup\user3, etc.).

When the machine locks up, one of two things happens: either the screen is turned on and nothing is shown, or the screen is turned on and repaints so quickly that nothing can be deciphered. In both cases the machine doesn't accept any input from the terminal. Telnet is impossible because the machine is not visible on the network when this happens, which makes it impossible to copy the syslog. In one lockup, one of the users' Acronis backups reported an IO error.

Before the OS was loaded, a full test suite was run to check the memory and equipment for potential errors. Nothing was found (for example, memtest was run for 48 hours with 0 errors). Each hard drive was fully verified before being added to the machine.

Specs:
4 x 1 TB storage drives, WD1001FALS (1 parity drive)
CMPSU-400CX 400W
BIOSTAR TA785GE
Kingston 2 GB (1 stick, 3 slots still available)
AMD Sempron 140 Sargas 2.7GHz 45 Watt
Centurion 590 RC-590-KKN1-GP
Supermicro Add-on Card AOC-SASLP-MV8
CBL-SFF8087OCF-05M
ICY DOCK MB454SPF-B

All the machines connect via a Cisco SG 100-16 Gigabit Switch. No additional packages have been installed. This is using the preconfigured server pro OS.

One thought we had is that more RAM and/or a more robust drop-in processor would correct this. Other thoughts and ideas are highly desired.
If the processor is the problem, perhaps an AMD Athlon II X2 245 Regor 2.9 GHz x2 65 watt would be adequate? If not, would a comparable-speed quad core or a faster dual core be better for this application?
October 12, 2010

Note: I am not affiliated with LimeTech, I'm just a member here. If you wish to seek help from LimeTech, you should email them directly. They may or may not see this forum post.

From what you have described, I doubt that the CPU or RAM is the cause. However, could it be an overheating issue? What are your drives' normal temperatures? I would also consider looking at the network in general and the NIC. Do you have another NIC that you could try? Intel NICs tend to be the best. Have you visually inspected the motherboard for problems? Dark spots, capacitor plague, etc.

You say that this happens about every 1.5 weeks. Could you post a syslog from about a week in? There could be something in there that will help me and other forum members help you. Past that, we don't really have a lot to go on.
October 18, 2010 Author

Drives' normal temps: 39-43.

Cooling
-Front panel 'mesh', hard drives in the Icy Dock with an 80mm intake
-Lower side panel 120mm fan intake (Noctua)
-Bottom PSU exhaust
-2 top 120mm exhaust (Noctua)
-1 rear exhaust (Noctua)
-Stock CPU fan/heatsink
-All heatsinks are cool during heavy operation. The RAID card's small heatsink becomes warm, but never hot.

We do not have another NIC. We are using the onboard NIC.

The mobo has no visual imperfections, and all capacitors look OK. We've had this on older mobos, but not this one.

We just had a crash. Unfortunately it happened a couple of hours before I returned from a work trip, and therefore we do not have a long-term syslog. Attached is a syslog created post boot.

We are in the process of upgrading to 4.5.6. After this we'll put in unMenu so that we'll have e-mail notification, a scheduled parity check, and UPS monitoring.

When updating items, is it OK to kill a parity check so that upon each boot we do not need to wait 4-5 hours before proceeding to the next step?

Thank you for all the help.

syslog_-_Copy.txt
October 18, 2010

"When updating items, is it OK to kill a parity check so that upon each boot we do not need to wait 4-5 hours before proceeding to the next step?"

I'm sure the official response is no, but I did when I was having similar issues.

Have you noticed any correlation as to when the lockups occur? I.e. are there large files being written to it at that time? That is what would cause my system to hang. It turned out that the onboard NIC was the problem: I added a dedicated Intel NIC and haven't had a problem since. In my case I believe the issue was due to the same IRQ being used by the onboard NIC and the secondary SATA controller. I wasn't able to change them independently, so I disabled the onboard NIC and added the Intel. Again, I haven't had an issue since.
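The IRQ-sharing theory above can be checked on a Linux box by reading /proc/interrupts, where devices that share an interrupt line are listed comma-separated on the same row. The following is only a rough sketch in Python, assuming the common layout of that file; the function name `shared_irqs` and the sample text (including the device names `ahci` and `eth0`) are made up for illustration and are not taken from this server.

```python
def shared_irqs(interrupts_text):
    """Return {irq: [devices]} for IRQ rows that list more than one device.

    Assumes the usual /proc/interrupts layout, where devices sharing a
    line appear comma-separated at the end of the row.
    """
    shared = {}
    for line in interrupts_text.splitlines():
        irq, sep, rest = line.partition(":")
        if not sep or "," not in rest:
            continue  # header row, or a row with a single device
        parts = rest.split(",")
        head = parts[0].split()
        if not head:
            continue
        # The first device name is the last token before the first comma.
        devices = [head[-1]] + [p.strip() for p in parts[1:]]
        shared[irq.strip()] = devices
    return shared


# Hypothetical sample, NOT output from the machine in this thread.
SAMPLE = """\
           CPU0
  0:        127   IO-APIC-edge      timer
 18:      90211   IO-APIC-fasteoi   ahci, eth0
"""

print(shared_irqs(SAMPLE))
```

On a live system the text would come from `open("/proc/interrupts").read()`. A shared line is not proof of a fault by itself, only a hint about which devices to separate.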
October 18, 2010

I would look into possibly borrowing or upgrading the LAN adapter. The TA785GE seems to have a Realtek 8111DL. This may or may not be the cause of the problem during peak activity on the LAN. A hard lockup is hardware related, with issues regarding the bus and/or communications on it. It seems that when the Acronis imaging occurs there may be some significant LAN traffic. Perhaps put an Intel PCIe adapter in and see how it goes.

It could also have something to do with the SASLP adapter. Have you tried the drives on the motherboard ports?
October 18, 2010 Author

Yes, we do transfer massive files, some 20 GB+. Two or more users may need to be doing this at the same time.

Regarding the motherboard: no, we have not tried the drives on the motherboard ports. We will also look into a dedicated NIC.
October 18, 2010 Author

We are using the recommended build specs. The motherboard has one PCIe slot (currently populated by the SAS controller) and two PCI slots (empty). Currently, we use 4 hard drives. It looks like we will soon be using 7 or 8. The motherboard has only 6 SATA ports available.

We can try the onboard SATA ports for now, which would open up the PCIe slot, but assuming the PCIe slot needs to remain populated, would there be any disadvantage to using an Intel NIC in a PCI slot instead of a PCIe Intel NIC?
October 18, 2010

If a gigabit Intel PCI card is used in a PCI slot and it is the only board on the PCI slots, it may work with a small performance penalty. My benchmarks of pure PCIe gigabit Intel to PCIe gigabit Intel showed a maximum throughput of 990 Mb/s. When doing the exact same benchmark with PCI and/or PCI-X, I saw a maximum throughput of 700 Mb/s. This is without disk I/O, a pure network app to network app benchmark.
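To put those two benchmark figures in terms of the 20 GB transfers discussed in this thread, here is a small back-of-the-envelope calculation in Python. It assumes the benchmark numbers are megabits per second (a gigabit link cannot exceed 1000 Mb/s), treats 1 GB as 10^9 bytes, and ignores disk I/O and protocol overhead, so real transfers will be slower. The function name `transfer_seconds` is just for illustration.

```python
def transfer_seconds(size_gb, rate_mbit_s):
    """Seconds needed to move size_gb gigabytes at rate_mbit_s megabits/s.

    Assumes 1 GB = 10**9 bytes and 1 Mb = 10**6 bits, with no overhead.
    """
    bits = size_gb * 8e9
    return bits / (rate_mbit_s * 1e6)


# Benchmark figures quoted in the post above, read as megabits per second.
for label, rate in (("PCIe", 990), ("PCI", 700)):
    secs = transfer_seconds(20, rate)
    print(f"{label}: {rate / 8:.1f} MB/s, 20 GB in about {secs / 60:.1f} min")
```

Under these assumptions the gap between the two buses is roughly a minute per 20 GB file for a single transfer, and several simultaneous users would share whichever ceiling applies.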
October 18, 2010 Author

Excellent information!

The system just locked up. While it was locking up I was able to have it create a syslog (attached) before it (seconds later) became fully unresponsive. The screen showed the following message right before full lockdown:

REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 9 0x0 SD]

I attached the syslog as well (it has been cut down to fit).

syslog.txt
October 18, 2010

"Excellent information! The system just locked up. While it was locking up I was able to have it create a syslog (attached) before it (seconds later) became fully unresponsive. The screen showed the following message right before full lockdown: REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: i/o failure occurred trying to find stat data of [2 9 0x0 SD] I attached the syslog as well (it has been cut down to fit)."

That is probably just the file system on disk1 that is in need of checking and repair.

http://lime-technology.com/wiki/index.php?title=Check_Disk_Filesystems
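For anyone combing through a long syslog for messages like the one quoted above, a small Python sketch along these lines can pull out the affected device. The regular expression is written against the single error line posted in this thread, so other ReiserFS messages may not match it, and the md1-to-disk1 mapping simply follows the reading given in the reply above. The names `REISERFS_ERROR` and `find_reiserfs_errors` are hypothetical.

```python
import re

# Pattern modelled on the one message quoted in this thread (an assumption,
# not a complete grammar for ReiserFS kernel messages).
REISERFS_ERROR = re.compile(
    r"REISERFS error \(device (md(\d+))\): (\S+) (\w+): (.*)"
)


def find_reiserfs_errors(lines):
    """Return one dict per line that contains a ReiserFS error message."""
    hits = []
    for line in lines:
        match = REISERFS_ERROR.search(line)
        if match:
            hits.append({
                "device": match.group(1),
                "disk": "disk" + match.group(2),  # md1 read as disk1
                "code": match.group(3),
                "function": match.group(4),
                "message": match.group(5).strip(),
            })
    return hits


# The on-screen message exactly as posted above.
sample = [
    "REISERFS error (device md1): vs-13070 reiserfs_read_locked_inode: "
    "i/o failure occurred trying to find stat data of [2 9 0x0 SD]",
]

for hit in find_reiserfs_errors(sample):
    print(hit["device"], hit["disk"], hit["code"])
```

To scan a saved file, pass an open file object instead of `sample`. This only locates the messages; the actual check and repair is the procedure on the wiki page linked above.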
October 18, 2010 Author

Thanks Joe,

We let the machine run for a while after it locked up. Eventually we were able to log in to the web terminal. It showed a single drive failure via a blinking red ball. We replaced the drive with a new spare, moved the drives from the SAS controller to the motherboard's SATA ports (1-4), reassigned the disks correctly, and are now in the process of rebuilding the parity.

The 'failed' drive has no SMART errors.

We'll continue to see what happens.
October 19, 2010 Author

We are now using the SATA ports directly on the motherboard: file/folder access is MUCH quicker, boot time is much quicker, and we are not seeing errors pop up throughout the syslog. The SAS controller has been removed and we are looking into an RMA.

The version has been updated from 4.5.4 to 4.5.6, we've installed unMenu (excellent, by the way), and we are configuring a number of the add-ins.

With that being said, we've been using the onboard NIC included on the Biostar TA785GE: the Realtek 8111DL, as previously mentioned. This appears to be causing some problems when transferring large files (20 GB+, with the potential of multiple users at the same time).

The info on supported PCIe (or PCI) NICs seems to be a bit sparse; however, it appears that an Intel PCIe card is the way to go. Here is what I've found. Suggestions as to the validity of these choices, or alternatives, would be greatly appreciated. In addition, if the lower-priced card is good, then I'd like to know that too, as I'd much rather spend $30 than $180 for the same results:

Intel EXP19301CTBLK PCIe $30
http://www.newegg.com/Product/Product.aspx?Item=N82E16833106033

Intel PRO 1000 CT PCIe-FETH GETH $40
http://accessories.us.dell.com/sna/productdetail.aspx?sku=A2290104&cs=04&c=us&l=en&dgc=SS&cid=52102&lid=1342490

Intel PRO 1000PT PCIe $88
http://www.amazon.com/Intel-1000PT-Gigabit-Rohs-Pcienic/dp/B000IPO5M6/ref=sr_1_22?s=electronics&ie=UTF8&qid=1287514078&sr=1-22

Intel PRO/1000T PWLA8490T PCI (not PCIe) $145
http://www.buy.com/prod/intel-pro-1000t-network-adapter-pci-1-x-rj-45-10-100-1000base-t-intel/q/sellerid/13416188/loc/101/10261291.html

Intel PRO EXP19402PT PCIe 1000PT dual $180
http://www.newegg.com/Product/Product.aspx?Item=N82E16833106014&cm_re=Intel_PRO%2f1000-_-33-106-014-_-Product

Thanks!
October 19, 2010

Any of the cheap ones should work, but keep in mind that all but one of those are PCIe x1 cards, not PCI. If you have an open PCIe x1 slot, great, use it. However, I don't believe your motherboard has one. If you don't, then you'll need to look for a PCI network card (such as the one you linked for $145, but there are cheaper options). Definitely don't waste money on a dual port card, since unRAID can only make use of one port anyway (no teaming).

I believe that any of these should work for you: Newegg link

I don't know of any specific advantages of any of them, so I would probably just go for the cheapest one.
October 20, 2010 Author

I wanted to confirm: is an inexpensive PCI NIC adequate when we have multiple users transferring 20+ GB data sets simultaneously? If so, we will be purchasing the one that was recommended by Rajahal.

Thank you all very much for the help that you have given us. You guys have been life savers! Much appreciated.
October 20, 2010

I think you should get a second opinion before purchasing anything. I wasn't really recommending that one (since I have no hands-on experience with it), just citing an example.