[SOLVED] unRAID network blowing up

March 24, 2011

Author

Does the system do a parity check right after a rebuild completes? I ask because the rebuild finished and the main page shows parity is OK, last checked this morning at roughly the same time the rebuild completed. I would have expected a check to take a little longer than that, unless the system simply counts the rebuild as a check.

To be safe, I am running a manual check anyway, but I still wanted to ask, as this was not what I was expecting. Not a huge deal; just one of those things that got me wondering.


March 24, 2011


The processes of reconstructing a data drive and initially calculating parity are absolutely identical, other than the disk being written and the set of disks being read. They apparently share the same "statistic" counter, so a completed rebuild zeroes it as well.

Joe L.
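To illustrate why the two processes can be identical: assuming single XOR parity (one parity disk, as in unRAID at the time), both building parity and rebuilding a data disk come down to "XOR together every disk except the one being written." The sketch below is a simplified, hypothetical model using tiny in-memory byte strings as disks; `write_target` and `xor_blocks` are illustrative names, not unRAID's driver code.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length byte strings together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column, 0) for column in zip(*blocks))


def write_target(disks: dict[str, bytes], target: str) -> bytes:
    """Compute what belongs on `target` by XOR-ing every other disk.

    With single XOR parity, one routine covers both operations:
    - target == "parity": initial parity build (reads all data disks)
    - target == a data disk: reconstruction (reads parity + the other data disks)
    """
    return xor_blocks([data for name, data in disks.items() if name != target])


# Hypothetical three-disk array, 4 bytes per disk.
array = {
    "disk1": b"\x01\x02\x03\x04",
    "disk2": b"\x10\x20\x30\x40",
    "disk3": b"\xaa\xbb\xcc\xdd",
}
array["parity"] = write_target(array, "parity")  # build parity from the data disks

# "Lose" disk2, then rebuild it with the very same routine.
original = array["disk2"]
survivors = {name: data for name, data in array.items() if name != "disk2"}
rebuilt = write_target(survivors, "disk2")
print(rebuilt == original)  # True
```

This only shows why the code path can be shared; whether rebuilt data is actually correct still depends on every surviving disk reading back accurately, which is what a later parity check verifies.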


March 24, 2011

Author


OK, thanks Joe.

In the meantime, I started another parity check, and it shows 3 errors at 5% complete. I assume these will be corrected as part of the parity check, or will I need to run another once it completes to ensure everything is OK?

I just want to ensure data integrity before I swap out the motherboard, remove the HPA (Host Protected Area) from two of my drives, and prepare for 4.7; as I learned when I first tried upgrading, I can't move to 4.7 until the HPA is gone. Slowly, I will get there and be back online, and once this is all cleared up I can hopefully return to troubleshooting the network issue that started all of this.


March 24, 2011

You should have no errors. Keep running checks until at least two in a row are error-free.

Peter
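Why repeated checks are the test: a correcting parity check walks every stripe, recomputes parity from the data disks, counts each mismatch as an error, and rewrites the stored parity. On healthy hardware the next pass should then find nothing; errors that keep reappearing suggest a disk, controller, cable, or memory is returning inconsistent data. The sketch below is a hypothetical in-memory model of that idea, not unRAID's implementation; `parity_check`, its `correct` flag, and `check_until_clean` are illustrative names.

```python
from functools import reduce


def stripe_parity(stripe: list[int]) -> int:
    """XOR parity of one stripe (the same byte offset on every data disk)."""
    return reduce(lambda a, b: a ^ b, stripe, 0)


def parity_check(data: list[list[int]], parity: list[int], correct: bool = True) -> int:
    """Count stripes whose stored parity is wrong; optionally rewrite it.

    data[d][i] is byte i of data disk d; parity[i] is the stored parity byte.
    """
    errors = 0
    for i in range(len(parity)):
        expected = stripe_parity([disk[i] for disk in data])
        if parity[i] != expected:
            errors += 1
            if correct:
                parity[i] = expected  # this model rewrites parity, never data
    return errors


def check_until_clean(data: list[list[int]], parity: list[int],
                      required_clean_runs: int = 2, max_runs: int = 10) -> list[int]:
    """Repeat checks until `required_clean_runs` consecutive passes find 0 errors."""
    history: list[int] = []
    clean_streak = 0
    while clean_streak < required_clean_runs and len(history) < max_runs:
        errors = parity_check(data, parity)
        history.append(errors)
        clean_streak = clean_streak + 1 if errors == 0 else 0
    return history


# Hypothetical two-disk array with three corrupted parity bytes.
data = [[1, 2, 3, 4], [16, 32, 48, 64]]
parity = [17, 34, 51, 68]
for i in (0, 2, 3):
    parity[i] ^= 0xFF
history = check_until_clean(data, parity)
print(history)  # [3, 0, 0]
```

The `max_runs` cap is there so the loop terminates even if errors never clear, which in real hardware would be the signal to start looking for a failing component.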


March 25, 2011

Author

OK, so the parity check finished with just the 3 errors. I see a new check box (at least I think it is new) below Spin Down that says "Sync filesystem first". Do I need that to fix the parity errors, or does the parity check do that? I want to start the next check to move towards a safe file system before swapping boards, but I don't want to do this wrong, as parity is one of the most important things in the unRAID system.


March 25, 2011

That check box has nothing to do with fixing your errors.

Peter


March 25, 2011

Author

OK, thanks Peter. Check 2 completed with no errors; running the third now.


April 6, 2011

Author

OK, I have finally completed a bit of an overhaul.

I replaced the potentially failing drive, upgraded the motherboard BIOS to disable HPA, removed the HPA from two drives, upgraded to 4.7, and upgraded a 1 TB drive to 2 TB, and the system has been running stably. Finally, it was time to plug the system back into the Trendnet switch to see whether that is the source of the problem, so I did and started a transfer.

After about 4 GB had transferred, the system locked up. So the issue was not related to a hardware failure or anything like that; could this mean the Trendnet is broken? I have a PCH A-110 plugged in through the Trendnet and it seems to function fine.

To explain my configuration: my office PC connects to a D-Link switch, which connects to the Trendnet, which the unRAID server is plugged into. I just can't imagine the issue being the switch, because otherwise I would think it wouldn't work at all, but I am not sure. It also seems my server is good, because it works when plugged into the D-Link switch.

Finally, I searched and found this thread: http://lime-technology.com/forum/index.php?topic=6893.0;wap2

but there never seems to be a solution, or that poster's issue is completely different. I am not sure, but something is screwy. I guess the one remaining test would be to plug both the PC and the server into the Trendnet and see what happens.

It is also crazy that the server actually locks up when this happens. I managed to run tail on the syslog, and the last thing on the screen is a time synchronization, so it reveals next to nothing.

Any thoughts?


April 6, 2011

What speed is the PCH NIC?


April 6, 2011

The PCH only has a 10/100 interface.


April 6, 2011

The Trendnet may work fine at 100 Mb/s but not at 1000 Mb/s. Can you attach the server to the D-Link?


April 6, 2011

Author

Well, I can place the D-Link in the rack and attach to it, but I don't want that as a solution at the moment; that is my last resort. The Trendnet shows the server's port at gigabit speed: the light is green, whereas any non-gigabit device lights up orange.

Are we suspecting that the Trendnet may be broken? I ask because it has a 5-year warranty, so it can easily be RMA'd.

The crazy thing to me, though, is why in the world a faulty switch would cause the entire unRAID system to lock up and stop responding. Even after plugging back into the D-Link, it is not accessible, and nothing registers at the console. I have not hard-reset the system yet, in the hope of getting it back without one.


April 6, 2011

ata13 (sdh) and ata14 (sdi) both show errors in the syslog. You could remove both drives from the array and run initconfig. Disable parity as well for the test, and see if the problem goes away. Then add the drives back one at a time to see when the problem returns. If the drives are not causing the problem, you can restore the array to its original condition, run initconfig, and rebuild parity.

EDIT: After the console becomes unresponsive while attached to the D-Link, does removing the network cable get the console working again?


April 6, 2011

Author

I will see what it does with the cable unplugged.

I will also pull a new syslog, as the one attached is from when I was having issues with a drive. The mention of sdh and sdi tells me those are two newer drives and may be the ones attached to my 8-port SATA card. Is there anything special that should have been done when I added the card? I know I just stuck it in and started using it.


April 6, 2011

What type of card is it? It may have BIOS settings. You may have to try pulling the card.


April 6, 2011

Author

It's a Supermicro PCI-X (64-bit) card in a PCI slot, I believe, but I will confirm when I get home, since from the looks of it I will need a hard reset anyway.


April 7, 2011

Author

One other thought I just had: could this be related to NIC speed settings? For some reason I want to say I may have things forced to gigabit. I recently heard that with the newer gigabit standards it is best to let the link auto-negotiate. Any thoughts? Maybe the D-Link just handles it better than the Trendnet.

As another test, I just sent the same files that crashed the unRAID server to my WHS (Windows Home Server) machine, and it worked. That system auto-negotiates at gigabit and is connected to the Trendnet, so the data took the same path that locked up the unRAID server. So I am thinking the problem is something between the unRAID server and the Trendnet. I just hate having to hard-reset the system to get it back.

I will reboot, post a new syslog, and also find the network settings.


April 7, 2011

Author

The card is an AOC-SAT2-MV8.

I looked for the NIC settings but cannot find them, so maybe they weren't changed, unless someone can suggest a better place to look. Maybe they are set in the kernel, outside of unRAID itself.

One thing I did find was talk about jumbo frames. Is it worth enabling them with the MTU set to 9000?
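One rough way to weigh jumbo frames is to compare how much of each frame on the wire is TCP payload. The sketch assumes IPv4 and TCP headers without options (20 bytes each) plus standard Ethernet overhead (14-byte header, 4-byte FCS, 8-byte preamble and start delimiter, 12-byte inter-frame gap); real traffic using TCP timestamps carries 12 more header bytes per segment. The function names are illustrative only.

```python
# Assumed per-frame overheads in bytes (IPv4, TCP without options).
IP_HEADER = 20
TCP_HEADER = 20
ETH_HEADER = 14      # destination MAC, source MAC, EtherType
ETH_FCS = 4          # frame check sequence
PREAMBLE_SFD = 8     # preamble + start-of-frame delimiter
INTERFRAME_GAP = 12  # minimum idle time between frames, in byte times


def tcp_efficiency(mtu: int) -> float:
    """Fraction of wire time spent carrying TCP payload at a given MTU."""
    payload = mtu - IP_HEADER - TCP_HEADER
    on_wire = mtu + ETH_HEADER + ETH_FCS + PREAMBLE_SFD + INTERFRAME_GAP
    return payload / on_wire


def max_goodput_mbps(mtu: int, link_mbps: float = 1000.0) -> float:
    """Upper bound on TCP payload throughput for a saturated link."""
    return link_mbps * tcp_efficiency(mtu)


for mtu in (1500, 9000):
    print(f"MTU {mtu}: {tcp_efficiency(mtu):.1%} efficient, "
          f"~{max_goodput_mbps(mtu):.0f} Mb/s of payload at gigabit")
# MTU 1500: 94.9% efficient, ~949 Mb/s of payload at gigabit
# MTU 9000: 99.1% efficient, ~991 Mb/s of payload at gigabit
```

Under these assumptions the throughput ceiling rises by only about 4%; the other benefit is fewer packets per second to process. A larger MTU also only works if every device on the path, including both switches, supports it.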


April 7, 2011

Author

I found a different post about NIC settings that said to run some commands and post the results, so I figured it was worth doing before anyone asked. Although it was not the same issue, that poster's eventual solution was a new switch, which I hope not to need...

Here are the outputs from the commands:

When on the D-Link, where things seem to function:

Tower login: root

Linux 2.6.32.9-unRAID.

root@Tower:~# ifconfig eth0

eth0 Link encap:Ethernet HWaddr 00:24:1d:2c:a7:5f

inet addr:192.168.0.101 Bcast:192.168.0.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:2220 errors:0 dropped:0 overruns:0 frame:0

TX packets:1615 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:257772 (251.7 KiB) TX bytes:525824 (513.5 KiB)

Interrupt:26 Base address:0x4000

root@Tower:~# ethtool eth0

Settings for eth0:

Supported ports: [ TP MII ]

Supported link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Half 1000baseT/Full

Supports auto-negotiation: Yes

Advertised link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Half 1000baseT/Full

Advertised auto-negotiation: Yes

Speed: 1000Mb/s

Duplex: Full

Port: MII

PHYAD: 0

Transceiver: internal

Auto-negotiation: on

Supports Wake-on: pumbg

Wake-on: g

Current message level: 0x00000033 (51)

Link detected: yes

root@Tower:~# ping -c google.com

ping: bad number of packets to transmit.

root@Tower:~# ping -c 5 google.com

PING google.com (74.125.93.105) 56(84) bytes of data.

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=1 ttl=48 time=47.2 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=2 ttl=48 time=51.0 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=3 ttl=48 time=49.6 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=4 ttl=48 time=51.9 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=5 ttl=48 time=52.2 ms

--- google.com ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4042ms

rtt min/avg/max/mdev = 47.291/50.454/52.276/1.824 ms

root@Tower:~#

Now, when plugged into the Trendnet switch:

root@Tower:~# ifconfig eth0

eth0 Link encap:Ethernet HWaddr 00:24:1d:2c:a7:5f

inet addr:192.168.0.101 Bcast:192.168.0.255 Mask:255.255.255.0

UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

RX packets:3037 errors:0 dropped:0 overruns:0 frame:0

TX packets:2129 errors:0 dropped:0 overruns:0 carrier:0

collisions:0 txqueuelen:1000

RX bytes:338384 (330.4 KiB) TX bytes:585960 (572.2 KiB)

Interrupt:26 Base address:0x4000

root@Tower:~# ethtool eth0

Settings for eth0:

Supported ports: [ TP MII ]

Supported link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Half 1000baseT/Full

Supports auto-negotiation: Yes

Advertised link modes: 10baseT/Half 10baseT/Full

100baseT/Half 100baseT/Full

1000baseT/Half 1000baseT/Full

Advertised auto-negotiation: Yes

Speed: 1000Mb/s

Duplex: Full

Port: MII

PHYAD: 0

Transceiver: internal

Auto-negotiation: on

Supports Wake-on: pumbg

Wake-on: g

Current message level: 0x00000033 (51)

Link detected: yes

root@Tower:~# ping -c 5 google.com

PING google.com (74.125.93.105) 56(84) bytes of data.

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=1 ttl=48 time=52.7 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=2 ttl=48 time=50.3 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=3 ttl=48 time=47.7 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=4 ttl=48 time=48.1 ms

64 bytes from qw-in-f105.1e100.net (74.125.93.105): icmp_seq=5 ttl=48 time=51.8 ms

--- google.com ping statistics ---

5 packets transmitted, 5 received, 0% packet loss, time 4035ms

rtt min/avg/max/mdev = 47.701/50.176/52.759/2.000 ms

root@Tower:~#
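Side by side, the two captures match where it matters: both links report 1000Mb/s, full duplex, auto-negotiation on, and every ifconfig error counter is zero, which also bears on the earlier question about forced speed settings. A small parser makes that comparison mechanical. This is a hypothetical helper that assumes the `Key: value` layout of `ethtool` and the `name:count` counters of net-tools `ifconfig` exactly as pasted above; other versions may format their output differently.

```python
import re

# Fields from `ethtool eth0` that describe the negotiated link.
ETHTOOL_KEYS = ("Speed", "Duplex", "Auto-negotiation", "Link detected")
# net-tools `ifconfig` counters that indicate trouble when non-zero.
ERROR_COUNTERS = ("errors", "dropped", "overruns", "frame", "carrier", "collisions")


def parse_ethtool(text: str) -> dict[str, str]:
    """Extract the negotiated-link fields from `ethtool` output."""
    settings: dict[str, str] = {}
    for line in text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep and key in ETHTOOL_KEYS:
            settings[key] = value.strip()
    return settings


def parse_ifconfig_errors(text: str) -> dict[str, int]:
    """Sum the RX and TX error-type counters from `ifconfig` output."""
    totals = {name: 0 for name in ERROR_COUNTERS}
    for name, count in re.findall(r"\b([a-z]+):(\d+)", text):
        if name in totals:
            totals[name] += int(count)
    return totals


def summarize(ethtool_text: str, ifconfig_text: str) -> dict[str, object]:
    """Merge link settings and error counters into one flat dict."""
    summary: dict[str, object] = dict(parse_ethtool(ethtool_text))
    summary.update(parse_ifconfig_errors(ifconfig_text))
    return summary


def differences(a: dict[str, object], b: dict[str, object]) -> dict[str, tuple[object, object]]:
    """Return only the fields whose values differ between two summaries."""
    return {k: (a.get(k), b.get(k)) for k in sorted(a.keys() | b.keys()) if a.get(k) != b.get(k)}


# Abbreviated excerpts of the captures above; the ethtool fields were identical in both.
ETHTOOL_EXCERPT = """Speed: 1000Mb/s
Duplex: Full
Auto-negotiation: on
Link detected: yes"""
DLINK_IFCONFIG = """RX packets:2220 errors:0 dropped:0 overruns:0 frame:0
TX packets:1615 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000"""
TRENDNET_IFCONFIG = """RX packets:3037 errors:0 dropped:0 overruns:0 frame:0
TX packets:2129 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000"""

dlink = summarize(ETHTOOL_EXCERPT, DLINK_IFCONFIG)
trendnet = summarize(ETHTOOL_EXCERPT, TRENDNET_IFCONFIG)
print(differences(dlink, trendnet) or "no differences")
print([name for name in ERROR_COUNTERS if trendnet[name]] or "no error counters set")
```

An empty result only says the link looked healthy at the moment each capture was taken. Since the lockups happened mid-transfer, the same comparison would be more telling on counters captured right after a large copy, if the console still responds.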


April 7, 2011

Author

I have attached my latest syslog, taken after a clean reboot.

syslog-2011-04-07.txt


April 8, 2011

Author

So at what point does this become an unknown error? Sure, it could be the switch, but I can copy to other machines that are connected the same way. Or is unRAID just that much more sensitive to network-related issues?


April 8, 2011

I just got a Trendnet 8-port Gig-E switch, and I'm using it with unRAID on a Biostar motherboard. It seems to be working. I will be copying a DVD (7+ GB) to the server in about an hour, and I'll let you know how it goes.

You've tried two motherboards that don't work with the Trendnet, right? Your Trendnet is broken; RMA it.


April 9, 2011

My transfer went fine. I can also stream back.


April 9, 2011

Author

Oh hell, here is a good one.

I tore everything apart tonight and started the network over. To test the Trendnet, I plugged everything into it and did some test copies and they all worked.

Next, I added my D-Link Green 8-port switch and transferred across the two switches: success.

Then I added the wireless router, hard-wired into it, and transferred across the D-Link to the Trendnet: success.

I added my 10/100 to the Trendnet; this is where the internet comes into the network and where the Dish Network satellite box connects (no unRAID traffic really passes through it, but I wanted to note that it is part of the setup). Again, everything was good.

Finally, I switched back to the original network cable, and things still worked.

Each time, I copied a 6 GB DVD folder structure and a 13 GB Blu-ray video file. I honestly have no clue: all the tests that were failing before succeeded tonight, and I have no idea how. My only guess is that there is a faulty cable somewhere, and once the data traverses two switches to unRAID, it breaks. unRAID must be far more sensitive than Windows, because a transfer that failed and crashed the unRAID box still completed to my home server. That is why it makes no sense: I have changed nothing at this point. It is all back to the way it was, and, knock on wood, as of this moment it all works.

Now, who is to say it won't fail tomorrow, but still: a transfer that broke the unRAID server worked to WHS, and the same transfer didn't break the unRAID server when both were plugged into the D-Link switch. Then, after unhooking and re-hooking everything, the same transfer that once broke the unRAID server is now working. Go figure. I will leave it for a bit and see what happens over a few days.


April 9, 2011

Consider replacing any older cables with new ones. CAT 6 cables at monoprice.com are really cheap.
