July 1, 2013

This is normal for this machine, though of course not desired. During parity checks, the server (unRAID 4.5.3, Pro license) is not accessible on the network via Samba, the unRAID web interface, or the unMENU web interface. After the parity check completes, the server always looks normal again on all three access methods.

I am not sure at the moment how much memory is in this one, or which processor it has. It does have 9 data drives with parity enabled: multiple 2 TB drives and some smaller ones. (It is currently in the middle of the monthly parity check.) I believe it is a Pentium D somewhere near 2.8 GHz, if I remember right, and it should have 1 GB of RAM or less.

Since I usually push older hardware to its limits with unRAID to test failure modes, I am wondering what the best way is to determine, from the unRAID Linux console, where the bottleneck is. I assume the processor is at or very near 100% utilization during parity checks. It might also be a low-memory condition, but I think that is less likely. In the past, nothing has looked unusual in the log after the parity checks completed. This is the first time I have had a chance to look into the condition while the monthly parity check is actually running.

Which command(s) would be best to run at the console to see CPU and memory utilization? I have only done enough with Linux to get by when needed. I am mostly looking for a command, or set of commands, that does not itself use a lot of CPU or memory, so I do not crash an already overloaded server during a parity check. Is top a good command to use in this situation? I cannot seem to find a list of the resources it uses when invoked. Are there any other good commands? Should I expect that pressing q will still stop top even under overloaded conditions?

Also, is there a way to lower the priority of the parity check process, so the system stays accessible during parity checks on an overworked unRAID system? I just wanted to get a couple of ideas, since I do not want to crash it during a parity check, and that is also the only time I can test what I am seeing. This is my MAIN unRAID server.
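For a one-shot look at load and memory that is about as cheap as it gets, the kernel already publishes the numbers in /proc/loadavg and /proc/meminfo, so nothing needs to keep running the way top does. Below is a minimal sketch of reading them. It assumes a Python interpreter is available on the box, which a stock unRAID install may not include; the function names are made up for illustration. The "roughly usable" figure is only an approximation (free memory plus buffers plus page cache).

```python
"""Cheap CPU-load and memory snapshot read straight from /proc (Linux only)."""


def parse_loadavg(text):
    """Return the 1-, 5- and 15-minute load averages from /proc/loadavg text."""
    fields = text.split()
    return float(fields[0]), float(fields[1]), float(fields[2])


def parse_meminfo(text):
    """Return a dict of /proc/meminfo values, keyed by field name.

    Most fields are reported in kB; a few (for example huge page counts)
    are plain counts and are stored as-is.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def reclaimable_kb(mem):
    """Rough 'still usable' figure: free memory plus buffers and page cache."""
    return mem.get("MemFree", 0) + mem.get("Buffers", 0) + mem.get("Cached", 0)


def main():
    try:
        with open("/proc/loadavg") as f:
            load = parse_loadavg(f.read())
        with open("/proc/meminfo") as f:
            mem = parse_meminfo(f.read())
    except OSError as exc:
        print("could not read /proc:", exc)
        return
    print("load average (1/5/15 min): %.2f %.2f %.2f" % load)
    print("MemTotal: %d kB, roughly usable: %d kB"
          % (mem.get("MemTotal", 0), reclaimable_kb(mem)))


if __name__ == "__main__":
    main()
```

One caveat when reading the result: on Linux the load average also counts processes waiting on disk I/O, so it can look high during a parity check even when the CPU itself is mostly idle.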
July 1, 2013 (Author)

Interesting development... It looks like resources are NOT the problem. It looks like I have another bad Gb Ethernet card. I think this was the last of my Realtek add-in cards that I was going to replace with an Intel one. I have had many failures with the Gb Realteks over the last year, but this one had not seemed to be a problem for data transfers, so it had not yet been replaced. After the parity check is over, I will see whether the Ethernet comes back to life, and replace the card either way.

I had never checked in the past whether I had lost the link, but it is not there now. I tried different known-good cables and moved to a different switch, also known to be good with no prior operational issues. At one point the link appeared for about 1 second, then went off again. If this is what has been happening during my past parity checks, it is an odd issue indeed. I do know the Realtek uses more CPU resources than the Intel NICs. Possibly there is a timing issue with the failing NIC, and it needs quicker CPU response to work properly.

Will post again after the parity check, further NIC testing, and the swap.
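Whether the link is up can also be checked from the console without looking at the lights on the card. The sketch below reads the operstate and carrier attributes that Linux exposes under /sys/class/net. The interface name eth0 is an assumption, the helper names are hypothetical, and it again presumes Python is available on the server.

```python
import os


def read_sysfs(path):
    """Return the stripped contents of a sysfs file, or None if unreadable."""
    try:
        with open(path) as f:
            return f.read().strip()
    except OSError:
        return None


def interpret_carrier(text):
    """Map the carrier attribute ('1' or '0') to True/False; None if unknown."""
    if text is None:
        return None
    return text == "1"


def link_state(iface="eth0", root="/sys/class/net"):
    """Return (operstate, carrier) for iface; either may be None if unreadable."""
    base = os.path.join(root, iface)
    operstate = read_sysfs(os.path.join(base, "operstate"))
    carrier = interpret_carrier(read_sysfs(os.path.join(base, "carrier")))
    return operstate, carrier


if __name__ == "__main__":
    # eth0 is an assumed interface name; adjust to match the system.
    print("eth0 operstate=%s carrier=%s" % link_state())
```

Running it a few times during a parity check and again afterwards would show whether the link really drops for the whole check or only flaps.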
July 1, 2013 (Author)

OK, the link came back and network functions were fully restored, as I expected, after the parity check completed. I will swap out the current Realtek-based Gb NIC and replace it with an Intel Gb NIC. I expect all will be good then.

CPU utilization was well below 25% on average whenever I looked at it during the parity check, and RAM was no problem either, with LOTS left. NO errors were logged in the system log, and there were no parity errors either, as usual.

Hardware information update. System has:

CPU: Intel® Pentium® Dual CPU E2180 @ 2.00GHz
RAM: 2GB (not sure when I added that...). It would still have had no problems with 512MB during the parity check.
NIC: Driver: r8169, NO ERRORS LOGGED.
  RX packets:3567 errors:0 dropped:0 overruns:0 frame:0
  TX packets:2313 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
HD:

Status  Disk      Mounted     Device    Model             Reads    Writes  Errors  Size     Used     %Used  Free
OK      parity                /dev/sde  SAMSUNG_HD154UI_  5282914  47
OK      /dev/md1  /mnt/disk1  /dev/sdf  SAMSUNG_HD154UI_  5481154  6               1.50T    1.31T    88%    194.51G
OK      /dev/md2  /mnt/disk2  /dev/sdg  SAMSUNG_HD154UI_  4741222  6               1.50T    1.31T    88%    187.28G
OK      /dev/md3  /mnt/disk3  /dev/sda  ST31500541AS_     4911542  5               1.50T    1.25T    84%    247.92G
OK      /dev/md4  /mnt/disk4  /dev/sdh  ST31000340AS_     3307029  5               1.00T    924.22G  93%    75.96G
OK      /dev/md5  /mnt/disk5  /dev/hda  ST3750640A_       1601459  5               750.13G  747.23G  100%   2.91G
OK      /dev/md6  /mnt/disk6  /dev/hdb  ST3750640A_       1590664  6               750.13G  720.68G  97%    29.45G
OK      /dev/md7  /mnt/disk7  /dev/sdb  ST3750640AS_      2305626  5               750.13G  470.80G  63%    279.34G
OK      /dev/md8  /mnt/disk8  /dev/sdc  ST3750640AS_      2024220  6               750.13G  723.65G  97%    26.48G
OK      /dev/md9  /mnt/disk9  /dev/sdd  ST3750640AS_      2493748  5               750.13G  606.06G  81%    144.07G

NOTE: Parity check ran for 12:21:18.
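As a rough sanity check on that run time, the average parity check speed can be worked out from the duration and the size of the parity drive. The sketch below assumes the parity drive is a nominal 1.5 TB (1.5e12 bytes), which is inferred from the drive model and the sizes of the matching data drives rather than stated in the listing. Under that assumption the average comes out at roughly 34 MB/s over the whole check.

```python
def hms_to_seconds(text):
    """Convert an 'H:MM:SS' duration such as '12:21:18' to seconds."""
    hours, minutes, seconds = (int(part) for part in text.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def average_mb_per_s(size_bytes, duration):
    """Average throughput in MB/s (1 MB = 10**6 bytes) over the duration."""
    return float(size_bytes) / hms_to_seconds(duration) / 1e6


if __name__ == "__main__":
    parity_bytes = 1.5e12  # assumption: nominal 1.5 TB parity drive
    print("%.1f MB/s" % average_mb_per_s(parity_bytes, "12:21:18"))
```

This is only an average; the real speed varies across the check as the smaller drives drop out and as the heads move toward the slower inner tracks.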
July 2, 2013

Another possibility: during a parity check/sync all the disks are active. Is the power supply able to keep up with the demand? If you have a multi-rail power supply, it may not. (Or is the network card voltage sensitive, or does it have an interrupt conflict with your disk controllers?)
July 9, 2013 (Author)

I looked at the power supply with an oscilloscope, and the voltages all look good and stable, both during parity checks and in normal operation.

After the unRAID system had been on for a few days, the Realtek NIC started to connect only at 100Mb and would no longer connect at Gb speed. This was the case even after a full power down, unplug, drain the power supply, and restart. This is more like the failures I have had with the other Realtek NICs I have replaced. I replaced the NIC with an Intel Gb NIC and all is working well under all conditions again. :-)
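A fallback from Gb to 100Mb like this can be spotted from the console as well. The sketch below reads the negotiated speed from /sys/class/net/<interface>/speed and flags anything under the expected 1000 Mb/s. The interface name eth0 and the function names are assumptions for illustration, older kernels may not expose this attribute at all (in which case the sketch just reports the speed as unknown), and Python is again assumed to be available.

```python
def parse_speed(text):
    """Return link speed in Mb/s from sysfs text, or None if unknown."""
    try:
        speed = int(text.strip())
    except (AttributeError, ValueError):
        return None
    # Non-positive values mean the driver could not report a speed.
    return speed if speed > 0 else None


def describe_speed(speed, expected=1000):
    """Summarise a negotiated speed against the expected one (in Mb/s)."""
    if speed is None:
        return "speed unknown (link down or attribute not available)"
    if speed < expected:
        return "negotiated %d Mb/s, below the expected %d Mb/s" % (speed, expected)
    return "negotiated %d Mb/s" % speed


def read_speed(iface="eth0", root="/sys/class/net"):
    """Read <root>/<iface>/speed; None if missing or unreadable."""
    try:
        with open("%s/%s/speed" % (root, iface)) as f:
            return parse_speed(f.read())
    except OSError:
        return None


if __name__ == "__main__":
    # eth0 is an assumed interface name; adjust to match the system.
    print("eth0:", describe_speed(read_speed()))
```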