Using 10Gb adaptor in PCIe 1x slot (An Unraid 10Gb journey)


KptnKMan


On 9/25/2021 at 7:54 PM, Ford Prefect said:

I agree that there is some kind of confusion.
Maybe caused by the statement that the L3-MTU, although sitting on a higher layer, needs to be smaller than the L2-MTU, which sits on a lower layer.

What you need to understand, at least that is the way I understood it, is that communication on L3 does not travel on L3 physically, only logically.
Hence the MTU on the respective layer must be the same for each communicating device (in order to avoid fragmentation).
Physically, the communication travels down the local device's stack until it reaches the lowest layer, and only then does the packet actually get transported to the next hop.

A data packet on L3 with an MTU of 1500 can carry 1480 bytes of net data without risking fragmentation. The L3 envelope is an additional 20 bytes.
As depicted in the diagram you linked earlier, from MT, L2 adds another envelope.
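The arithmetic in the quote can be sketched in a few lines of Python. This is only an illustration: it assumes an IPv4 header without options (20 bytes), and the TCP figure additionally assumes a 20-byte TCP header without options, which the post does not mention. The function names are made up for the example.

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options (the "L3 envelope")
TCP_HEADER = 20   # bytes, TCP header without options (assumption, not from the post)

def ip_payload(l3_mtu: int) -> int:
    """Largest IP payload that fits into one unfragmented packet."""
    return l3_mtu - IPV4_HEADER

def tcp_payload(l3_mtu: int) -> int:
    """Largest TCP segment payload for the same packet size."""
    return ip_payload(l3_mtu) - TCP_HEADER

for mtu in (1500, 9000):
    print(f"L3 MTU {mtu}: IP payload {ip_payload(mtu)}, TCP payload {tcp_payload(mtu)}")
```

For an L3 MTU of 1500 this gives the 1480 bytes mentioned above; the L2 envelope is added on top of the L3 MTU and is not part of this calculation.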

Yeah, I don't think your statements confused me in exactly that way, because I think I get the L2-L3 relationship (I'm not new to networking, but with this particular use case I've been trying to be "correct"), but I guess there is no correct answer.

I was mainly looking for a recommended way forward.

You indicated what I shouldn't do, so I was really trying to understand what you would recommend I do instead.

 

Anyhow, thinking of a 20 byte IP header + payload combination makes enough sense that I can go on with that.

I also found some more examples, like this article, which suggest that this understanding is a good way to go forward.

 

On 9/25/2021 at 7:54 PM, Ford Prefect said:

This is what interfaces on my CRS326 look like:

<image>

Notice that the L2 MTU is shown with higher numbers (where the L3 MTU is 9000 and 1500 respectively).

As much as this makes sense, the MTUs in the image raise new questions for me, like how are the odd numbered MTUs related to 1500 and 9000 respectively. But as it is, I'm not going to worry too much about that right now.

 

As it is now, I've set the MTU to 9000, and L2-MTU to 9800 (Max L2 apparently being 10218), and the sky doesn't seem to have fallen yet. 😅

Everybody seems to be behaving, and the network appears to be performing as desired.

 

I've also re-enabled the active-backup/10Gb-1Gb bond0, and that seems to be working without issues. 👍

I'll be monitoring the bond setup and seeing what happens.

 

Next I guess will be to return the 1Gb transceivers, get 10Gb RJ45 transceivers, and get an additional CSS610 for across the house. 🤔

33 minutes ago, KptnKMan said:

As it is now, I've set the MTU to 9000, and L2-MTU to 9800 (Max L2 apparently being 10218), and the sky doesn't seem to have fallen yet. 😅

Everybody seems to be behaving, and the network appears to be performing as desired.

Yes, nice...my numbers are different, because - for L3-MTU 1500, the 1592 for L2-MTU seemed to be the defaults in ROS, at least all my MT devices have them like that on 1G interfaces and I didn't set them myself.

For L3-MTU 9000 between 10G links of unraid, my switches and Router, I did set L2-MTU to the max of my RB4011 (which is different/lower from the max values of CRS3xx and CSS610).

 

33 minutes ago, KptnKMan said:

I've also re-enabled the active-backup/10Gb-1Gb bond0, and that seems to be working without issues. 👍

I'll be monitoring the bond setup and seeing what happens.

 

Next I guess will be to return the 1Gb transceivers, get 10Gb RJ45 transceivers, and get an additional CSS610 for across the house. 🤔

...sounds like a plan. Good luck with your project...keep us posted as it progresses.

  • 3 weeks later...

So I'm back with an update.

 

TL;DR:

1) I haven't returned the Ubiquiti transceivers yet.

2) I finally made a quick and dirty network diagram.

3) Finally moved my main WAN router over to the server room, centralising everything except the ISP router.

4) Ordered, received, and set up the new 3rd CSS610 for the media setup.

5) Connected the new CSS610 across the house with new Mikrotik S+RJ10 transceivers and CAT7A+ S/FTP.

6) Connected the bond0 active-backup secondary 1Gb to the CRS309 using a couple new Mikrotik S-RJ10 transceivers.

 

 

Detailed:

1) I've been doing some testing on the Ubiquiti transceivers I purchased before. I'm planning to return them, but they seem to register now, though there is still no link. Gonna keep playing with them a bit more.

 

2) I've made better ones, but I finally got around to making a very quick diagram:

image.png.49d3e6acc858e0f18bbda7e53454d722.png

 

3) Yep, I finally got the CAT7 cables run over to the server room and moved the WAN router. All working well and centralised in the same location. The ISP router runs in passthrough, and is direct connected to coaxial, so I can't move that unfortunately.

Pic of the core setup:

image.png.b314b59f27ace07cef585925d9ffb23f.png

 

4+5) So with the CAT7A+ run complete with keystones, and some short CAT7 cables acquired for each end, I hooked everything up. Those S+RJ10 transceivers do get toasty though, as advised. I measured temps of 83°C, the highest I've seen so far under load, which seems kinda high, but I think it's understood that these get hot?

Got the 3rd CSS610 hooked up and it is behaving normally, link up at 10Gbit, no issues encountered.

 

6) Purchased a couple S-RJ10 transceivers so that I could hook those directly into the CRS309, and that seems to work well. Without needing to bridge over to the CSS610 it seems more stable also; I haven't seen any dropouts. Also looks like I've almost filled the CRS309 SFP+ ports as well, which I didn't expect when starting this.

 

 

 

So all in all, I'm pretty pleased with how everything has turned out.

Gonna keep tweaking as I go, and I heard on the Uncast YouTube video on 10Gb that there are apparently some tips for running 10Gb Unraid somewhere. Gonna try and find that.

On 10/18/2021 at 7:18 AM, KptnKMan said:

So I'm back with an update.

...sounds good! Nice progress you made ... thanks for sharing

 

On 10/18/2021 at 7:18 AM, KptnKMan said:

6) Connected the bond0 active-backup secondary 1Gb to the CRS309 using a couple new Mikrotik S-RJ10 transceivers.

Ummm... so you bought the 10G S-RJ10 transceivers....according to your diagram, they are sitting next to each other....please don't do that but rather mix them, so that only every 2nd port is populated with one....you can have a fiber or DAC port between them populated just fine.

 

On 10/18/2021 at 7:18 AM, KptnKMan said:

4+5) So with the CAT7A+ run complete with keystones, and some short CAT7 cables acquired for each end, I hooked everything up. Those S+RJ10 transceivers do get toasty though, as advised. I measured temps of 83°C, the highest I've seen so far under load, which seems kinda high, but I think it's understood that these get hot?

according to specs...even 90+ degC should not be an issue.

 

On 10/18/2021 at 7:18 AM, KptnKMan said:

6) Purchased a couple S-RJ10 transceivers so that I could hook those directly into the CRS309, and that seems to work well. Without needing to bridge over to the CSS610 it seems more stable also; I haven't seen any dropouts. Also looks like I've almost filled the CRS309 SFP+ ports as well, which I didn't expect when starting this.

That's what I thought....bonds are working better within the same switch/chip

 

Well, you could - according to your diagram, move one unraid bond to the css610 to gain some space and free up a RJ45 transceiver as well.

...or upgrade to: https://www.eurodk.de/de/products/crs/cloud-router-switch-317-1g-16srm ...Xmas is coming, after all ;-)

 

Edit: ...and you could, should you wish to free up that "raw-internet" wire between media and server room, tunnel this via a dedicated VLAN connection across CSS610/CRS309/CSS610/WAN-Router

 

On 10/18/2021 at 7:18 AM, KptnKMan said:

So all in all, I'm pretty pleased with how everything has turned out.

did you finally move the Mellanox to the x1 slots, then?

Edited by Ford Prefect
13 hours ago, Ford Prefect said:

Ummm... so you bought the 10G S-RJ10 transceivers....according to your diagram, they are sitting next to each other....please don't do that but rather mix them, so that only every 2nd port is populated with one....you can have a fiber or DAC port between them populated just fine.

Ah, yeah, a couple things there.

I bought 2x S+RJ10 10Gb transceivers and 2x S-RJ01 1Gb (I mistakenly stated them as S-RJ10), which are rated for much lower temperature. Currently they are next to each other, as in the diagram, but as of now they are very cool and not even nearly as lava hot as the single S+RJ10 hooked up to the media room CSS610. I'm planning to relocate the S-RJ01s soon, just haven't yet as they seem good right now.

 

13 hours ago, Ford Prefect said:

according to specs...even 90+ degC should not be an issue.

Yeah, I combed over the specs of the S+RJ10 and S-RJ01 to make sure I understood their tolerances as well.

That 10Gb transceiver is very hot, burns to the touch. I might have to put a small fan near it or something.

 

13 hours ago, Ford Prefect said:

did you finally move the Mellanox to the x1 slots, then?

Not yet, I temporarily relocated cards I don't need for a while out and they are both running in the 2nd full PCIEx16 slot of each system. I'd like to get a good idea of full performance before I eventually move them, if at all. At some point I will have to, I think.

 

13 hours ago, Ford Prefect said:

Edit: ...and you could, should you wish to free up that "raw-internet" wire between media and server room, tunnel this via a dedicated VLAN connection across CSS610/CRS309/CSS610/WAN-Router

Proper VLANs are something I have my network pre-segmented for already, so that I can do that. My IOT stuff all uses separate WLANs from my domestic household stuff, and the Home Automation cabinet is separated for this exact reason as well. I played with VLANs at home a while ago, but now I have all widespread VLAN-capable hardware so the time is getting close... very close.

I try to take things "one at a time" so I know things are stable before moving on to the next radical change. I guess that's the Ops Engineer in me.

 

13 hours ago, Ford Prefect said:

Well, you could - according to your diagram, move one unraid bond to the css610 to gain some space and free up a RJ45 transceiver as well.

...or upgrade to: https://www.eurodk.de/de/products/crs/cloud-router-switch-317-1g-16srm ...Xmas is coming, after all ;-)

Haha, I should probably be spending less money, not more, but this 10Gbit setup has been a pipe dream of mine for more than 8 years. I'm finally doing it, and very happy it's working as well as it has.

 

Saying that... something weird is happening...

 

13 hours ago, Ford Prefect said:

That's what I thought....bonds are working better within the same switch/chip

Yeah, the bonds were working well... or so I thought... but I've highlighted something very frustratingly confusing by trying to properly configure the active-backup bond for both systems.

 

First of all, I did some testing by unplugging/disabling the 10Gb link for unraid2, which seemed to fail the link over as expected, but then it refused to fail back. It refused like at all, causing the server to become unavailable, until I eventually rebooted the system and it reverted to the 10Gb as primary (Without touching/changing anything else). This doesn't seem correct, as I shouldn't have to reboot the server to fix a failover.

I could see the link was up, and registering on the CRS309 but I could only see a couple Kb of Tx and nothing Rx.

 

So I decided that I should properly configure the active-backup on the CRS309, especially as I can now do it on a single box. I did some reading, and watched some YouTube vids about it, then dived in. Simple, I thought...

I started by removing the interfaces from the default bridge (As that's what the error I experienced earlier about this referred to), and configuring a new bridge, as you advised in your earlier post. Then I added that "bonding-unraid2" interface to the bridge.

Now there are a few things I have run into that are frustrating me. I've been round and round it for hours now and I'm not sure what to do. I've set up both unraid systems identically now and have had varying results.

 

1) If I so much as touch the bonding configuration of either bond on the CRS309, they stop working, and the entire server goes dark. I can't reach it, I have to gracefully shut it down or reboot it at the box and upon reboot, it comes up available again. Even so much as looking at the interface and clicking Ok or Apply (Without changing anything) causes it to freak out and give up.

 

2) I cannot set the MTU of the bonding-unraidX interface, as much as I've tried. It refuses, and throws an error that the "MTU could not be set", then the server goes dark again until reboot. So at this point I've set everything back to 1500 on both unraid systems, just to try and have consistency to test and get working.

 

3) My second system, unraid2, refuses to play ball 90% of the time, and boots up with a link-local address of 169.254.129.35, indicating that it cannot find the network. I double/triple/quadruple checked the unraid network config and rebooted more than a dozen times to figure out what's going on. I found that a cold boot works more often and picks up an IP of 192.168.178.12, but a reboot usually stalls for a bit right before login, and then uses 169.254.129.35. I use DHCP on my network, from my WAN router, and have never had any DHCP issues.

I think something on the CRS309 is very wrong, and I'm missing something.

I've tried using "arp" link monitoring with an IP, as well as Mii, both seem to have the same behaviour.

At this point I've shut down unraid2, as the alternative would be to delete the bonding interface and go back to what was "working".

 

4) My primary system unraid1, doesn't seem to be too bothered about any of this, and seems to be working with the bonding-unraid1 configuration on the CRS309. I haven't tested the bond failover by pulling the cable yet, as I've been busy trying to figure out why unraid2 is not picking up an IP with an identical config.

 

5) I updated the CRS309 to latest RouterOS stable 6.49, and nothing seems to have helped. Same issues. I saw some mentions of link stability in the changelogs and thought it might be worth a shot.
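As a side note on point 3: the two addresses can be told apart programmatically with Python's standard `ipaddress` module. This is only a small sketch; the `describe` helper and its wording are made up for illustration, and a private address only suggests, not proves, that a DHCP lease was obtained.

```python
import ipaddress

def describe(addr: str) -> str:
    """Classify an IPv4 address the way it is read in the post above."""
    ip = ipaddress.ip_address(addr)
    # Check link-local first: 169.254.0.0/16 is self-assigned when no lease arrives.
    if ip.is_link_local:
        return "link-local (self-assigned, no DHCP lease received)"
    if ip.is_private:
        return "private (plausibly handed out by the LAN's DHCP server)"
    return "other"

for addr in ("169.254.129.35", "192.168.178.12"):
    print(addr, "->", describe(addr))
```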

 

At this point I'm not sure what to do.

 

Some config screenshots (both systems networking configured identically) if that helps:

image.thumb.png.bd1e5796f01fa748271f5997a622c03a.png

 

image.thumb.png.e4b967fa7629335069173ac0bb7e975d.png

 

image.thumb.png.6c0b14fe52b7731026ea80eeb567fe1a.png

9 hours ago, KptnKMan said:

I'm planning to relocate the S-RJ01s soon, just haven't yet as they seem good right now.

With the low-temp 1Gbps ones, there is no need to, I think....but it is definitely required for 10GBase-T ones.

9 hours ago, KptnKMan said:

Yeah, I combed over the specs of the S+RJ10 and S-RJ01 to make sure I understood their tolerances as well.

That 10Gb transceiver is very hot, burns to the touch. I might have to put a small fan near it or something.

If you dare, you could open it and check whether there is a solder pad for a fan, and then actively cool the CRS309 instead.

I've done this with my CRS326.

Some folks are mounting small cooling pads/chipset coolers on the sides of the transceiver, from what I have seen. But anything below 93degC should not be an issue for the S-RJ10.

 

9 hours ago, KptnKMan said:

Proper VLANs are something I have my network pre-segmented for already, so that I can do that. My IOT stuff all uses separate WLANs from my domestic household stuff, and the Home Automation cabinet is separated for this exact reason as well. I played with VLANs at home a while ago, but now I have all widespread VLAN-capable hardware so the time is getting close... very close.

I try to take things "one at a time" so I know things are stable before moving on to the next radical change. I guess that's the Ops Engineer in me.

for VLANs with RouterOS, read this: https://forum.mikrotik.com/viewtopic.php?f=13&t=143620

 

9 hours ago, KptnKMan said:

Saying that... something weird is happening...

 

Yeah, the bonds were working well... or so I thought... but I've highlighted something very frustratingly confusing by trying to properly configure the active-backup bond for both systems.

 

First of all, I did some testing by unplugging/disabling the 10Gb link for unraid2, which seemed to fail the link over as expected, but then it refused to fail back. It refused like at all, causing the server to become unavailable, until I eventually rebooted the system and it reverted to the 10Gb as primary (Without touching/changing anything else). This doesn't seem correct, as I shouldn't have to reboot the server to fix a failover.

I could see the link was up, and registering on the CRS309 but I could only see a couple Kb of Tx and nothing Rx.

That *is* weird...to my knowledge it should not do that.

Which side of the DAC did you pull...unraid or CRS?

 

9 hours ago, KptnKMan said:

 

So I decided that I should properly configure the active-backup on the CRS309, especially as I can now do it on a single box. I did some reading, and watched some YouTube vids about it, then dived in. Simple, I thought...

I started by removing the interfaces from the default bridge (As that's what the error I experienced earlier about this referred to), and configuring a new bridge, as you advised in your earlier post. Then I added that "bonding-unraid2" interface to the bridge.

reading that post, I believe that I was referring to the unraid side, when talking about the bridge....maybe because you were going to deploy DUAL 10G cards, weren't you?

You should not need to use an extra bridge in the CRS, also because only the first bridge will do/allow hardware-offloading.

 

  • In the CRS, you should have *one* bridge.
  • Then connect all ports to that bridge (under bridge - ports) to form a proper switch
  • This is your starting point.
  • When creating the bond, remove the (two) to-be-bonded ports from the bridge, then under interface-bonding create a bonding interface and attach the two physical ports/interfaces you previously removed from the bridge to it.
  • Then add the bonded interface to the bridge.
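
The steps above could be written down roughly as follows. This is a sketch, not a tested configuration: the Python helper only builds command strings, the command syntax is how I understand the RouterOS v6 CLI and should be checked against the MikroTik documentation, and the bridge and port names (`bridge`, `sfp-sfpplus1`, `sfp-sfpplus2`) are hypothetical. Only the bond name `bonding-unraid2` comes from this thread.

```python
def bonding_commands(bridge, bond, ports, mode="active-backup"):
    """Return RouterOS CLI lines that mirror the bullet steps above."""
    cmds = []
    # Step 1: remove the to-be-bonded ports from the bridge.
    for port in ports:
        cmds.append(f"/interface bridge port remove [find interface={port}]")
    # Step 2: create the bonding interface with those ports as slaves.
    cmds.append(
        f"/interface bonding add name={bond} mode={mode} slaves={','.join(ports)}"
    )
    # Step 3: add the bonding interface back to the (single) bridge.
    cmds.append(f"/interface bridge port add bridge={bridge} interface={bond}")
    return cmds

# Hypothetical port names; replace with the real interface names on the switch.
for line in bonding_commands("bridge", "bonding-unraid2",
                             ["sfp-sfpplus1", "sfp-sfpplus2"]):
    print(line)
```

Printing the commands instead of applying them makes it easy to review the order before pasting anything into a terminal.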

I must admit I never tested failover bonds, as I only used LACP/802.3ad bonds, which worked fine.

 

 

 

9 hours ago, KptnKMan said:

Now there are a few things I have run into that are frustrating me. I've been round and round it for hours now and I'm not sure what to do. I've set up both unraid systems identically now and have had varying results.

 

1) If I so much as touch the bonding configuration of either bond on the CRS309, they stop working, and the entire server goes dark. I can't reach it, I have to gracefully shut it down or reboot it at the box and upon reboot, it comes up available again. Even so much as looking at the interface and clicking Ok or Apply (Without changing anything) causes it to freak out and give up.

 

2) I cannot set the MTU of the bonding-unraidX interface, as much as I've tried. It refuses, and throws an error that the "MTU could not be set", then the server goes dark again until reboot. So at this point I've set everything back to 1500 on both unraid systems, just to try and have consistency to test and get working.

that also should work. I have my 10G links between MT-Switches set to 9000MTU without a problem.

Not using it on my clients/servers though, as I have too many WiFi ones anyway.

Be aware, that you should set the L2-MTU to a higher value than the L3-MTU (which is what you use, when just naming it MTU).

As you are mixing a 10G and a 1G link in that bond, the question is which is better: 9000 for the bond, assuming that the 10G will be primary most of the time and the 1G will "suffer" when failover occurs, or just keep 1500, which will work OK for 10G as well.
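
The L2/L3 rule of thumb can be expressed as a small sanity check. This is a sketch in Python, using the values reported earlier in this thread (MTU 9000 with L2-MTU 9800 and a reported maximum of 10218, and the 1500/1592 defaults). The function name and messages are made up, and the "strictly higher" condition simply follows the advice above rather than any vendor specification.

```python
def check_mtu(l3_mtu: int, l2_mtu: int, l2_mtu_max: int) -> list:
    """Return a list of problems; an empty list means the values look sane."""
    problems = []
    # Rule of thumb from this thread: L2-MTU should be higher than the L3 MTU.
    if l2_mtu <= l3_mtu:
        problems.append(f"L2-MTU {l2_mtu} is not higher than L3-MTU {l3_mtu}")
    # The hardware limit differs per device (RB4011, CRS3xx, CSS610 ...).
    if l2_mtu > l2_mtu_max:
        problems.append(f"L2-MTU {l2_mtu} exceeds the port maximum {l2_mtu_max}")
    return problems

print(check_mtu(9000, 9800, 10218))  # jumbo-frame values from this thread
print(check_mtu(1500, 1592, 10218))  # the 1G defaults mentioned earlier
print(check_mtu(9000, 1592, 10218))  # a mismatch that would be flagged
```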

 

9 hours ago, KptnKMan said:

 

3) My second system, unraid2, refuses to play ball 90% of the time, and boots up with a link-local address of 169.254.129.35, indicating that it cannot find the network. I double/triple/quadruple checked the unraid network config and rebooted more than a dozen times to figure out what's going on. I found that a cold boot works more often and picks up an IP of 192.168.178.12, but a reboot usually stalls for a bit right before login, and then uses 169.254.129.35. I use DHCP on my network, from my WAN router, and have never had any DHCP issues.

I think something on the CRS309 is very wrong, and I'm missing something.

I've tried using "arp" link monitoring with an IP, as well as Mii, both seem to have the same behaviour.

At this point I've shut down unraid2, as the alternative would be to delete the bonding interface and go back to what was "working".

 

4) My primary system unraid1, doesn't seem to be too bothered about any of this, and seems to be working with the bonding-unraid1 configuration on the CRS309. I haven't tested the bond failover by pulling the cable yet, as I've been busy trying to figure out why unraid2 is not picking up an IP with an identical config.

 

5) I updated the CRS309 to latest RouterOS stable 6.49, and nothing seems to have helped. Same issues. I saw some mentions of link stability in the changelogs and thought it might be worth a shot.

 

At this point I'm not sure what to do.

Besides the extra bridge, which I would not use, I don't see an obvious flaw in your setup....so I am at my wits' end here.

Maybe you should ask the experts over in the MT forum: https://forum.mikrotik.com/index.php

Provide the info of your setup (analogous to the diagnostics zip of unraid) from the CRS (/export hide-sensitive file=anynameyoulike)

 

9 hours ago, KptnKMan said:

 

Some config screenshots (both systems networking configured identically) if that helps:

image.thumb.png.bd1e5796f01fa748271f5997a622c03a.png

 

image.thumb.png.e4b967fa7629335069173ac0bb7e975d.png

 

image.thumb.png.6c0b14fe52b7731026ea80eeb567fe1a.png

...that all looks OK to me.

1 hour ago, Ford Prefect said:

With the low-temp 1Gbps ones, there is no need to, I think....but it is definitely required for 10GBase-T ones.

Yeah, I did some reading and came to the same conclusion before I posted that the 10Gb ones NEED to be separated, but not so much for the 1Gb ones. I'll keep an eye on them regardless, but the 1Gb transceivers are barely warm to the touch.

 

1 hour ago, Ford Prefect said:

If you dare, you could open it and check whether there is a solder pad for a fan, and then actively cool the CRS309 instead.

I've done this with my CRS326.

Some folks are mounting small cooling pads/chipset coolers on the sides of the transceiver, from what I have seen. But anything below 93degC should not be an issue for the S-RJ10.

Well, I'm quite a daring guy so that sounds harmless. When I have time I'll do some research and see if anyone has done a teardown of the CRS309, and if not then I'll take a look.

It hasn't reached 90°C, but it's been in the mid-80s under load, so I'll investigate the cooling possibilities just the same.

 

1 hour ago, Ford Prefect said:

That *is* weird...to my knowledge it should not do that.

Which side of the DAC did you pull...unraid or CRS?

Yeah super weird. Pulled it at the CRS309 side, should that make a difference?

This whole goose chase has kinda confused me, as it doesn't seem to have a good reason for it.

I might just wipe the network config by deleting the network.cfg and network-rules.cfg files and start again, not sure what else I can do on the unraid side to be honest.

 

1 hour ago, Ford Prefect said:

reading that post, I believe that I was referring to the unraid side, when talking about the bridge....maybe because you were going to deploy DUAL 10G cards, weren't you?

You should not need to use an extra bridge in the CRS, also because only the first bridge will do/allow hardware-offloading.

 

  • In the CRS, you should have *one* bridge.
  • Then connect all ports to that bridge (under bridge - ports) to form a proper switch
  • This is your starting point.
  • When creating the bond, remove the (two) to-be-bonded ports from the bridge, then under interface-bonding create a bonding interface and attach the two physical ports/interfaces you previously removed from the bridge to it.
  • Then add the bonded interface to the bridge.

I must admit I never tested failover bonds, as I only used LACP/802.3ad bonds, which worked fine.

Yeah sorry this is my bad, I miscommunicated here.

I didn't add an extra bridge, I know that would be a bad idea. There's a default configured bridge already present, and that's what I used.

I meant to say: "I started by removing the interfaces from the default bridge (As that's what the error I experienced earlier about this referred to), and configuring a new bonding interface (aka bonding-unraid2), as you advised in your earlier post. Then I added that bonding-unraid2 interface to the bridge."

Reading your instructions, that's exactly what I did so that's good to know at least.

From everything I've read and videos I've seen, setting up an active-backup bond should be simple and easy. Story of my life. 🙄

 

In the unraid interface it even states: "Mode 1 (active-backup) is the recommended setting. Other modes allow you to set up a specific environment, but may require proper switch support. Choosing a unsupported mode can result in a disrupted communication."

 

1 hour ago, Ford Prefect said:

that also should work. I have my 10G links between MT-Switches set to 9000MTU without a problem.

Not using it on my clients/servers though, as I have too many WiFi ones anyway.

Be aware, that you should set the L2-MTU to a higher value than the L3-MTU (which is what you use, when just naming it MTU).

As you are mixing a 10G and a 1G link in that bond, the question is which is better: 9000 for the bond, assuming that the 10G will be primary most of the time and the 1G will "suffer" when failover occurs, or just keep 1500, which will work OK for 10G as well.

I can set the link L2-MTU and L3-MTU, but on the bond interface, I'm not sure why, I can't set the L3-MTU (labelled 'MTU').

You can see in the 2nd screenshot I posted earlier, the L2-MTU is greyed out, and inaccessible:

image.png.192aec2bea96e9c8ffa7bdf58e3eef2b.png

 

Yeah, I thought that the active-backup bond would be simple enough, especially as it's the "default" bond in unraid, and it would be nice to fail back to the onboard controller... But it hasn't proven so simple. Maybe I should have just gotten a 2nd DAC for each server and tried a bond across the dual 10Gb instead.

Then I could have just used more simple 802.3ad link-aggregation, balance-rr or another method that requires same-speed links, like I did before when I had 1Gb cards. I guess you live and learn, still looking to nuke the network config on both systems and start again, maybe should do on the CRS309 as well, in case I (more than likely) messed something up.

 

2 hours ago, Ford Prefect said:

Besides the extra bridge, which I would not use, I don't see an obvious flaw in your setup....so I am at my wits' end here.

Maybe you should ask the experts over in the MT forum: https://forum.mikrotik.com/index.php

Provide the info of your setup (analogous to the diagnostics zip of unraid) from the CRS (/export hide-sensitive file=anynameyoulike)

Yeah, thanks I appreciate that.

Honestly I'm hesitant to start from the beginning seeking help in yet another forum just yet, so I'm gonna go with the scorched earth approach first, just nuke all the configs, start again and see if that might fix things.

I think I might just go step by step and see where it all falls apart.

5 minutes ago, KptnKMan said:

In the unraid interface it even states: "Mode 1 (active-backup) is the recommended setting. Other modes allow you to set up a specific environment, but may require proper switch support. Choosing a unsupported mode can result in a disrupted communication."

uhmmm...maybe this is the hint we need?

What if "active" means that only unraid/the active part decides which link it will use?

Can you try and just use *no* bond at all on the CRS (or 2nd alternative, if there is a passive backup mode?) and leave unraid in active backup mode (maybe only one partner can be active side at a time)?

Will it use the 10G as long as it is connected, then the 1G if you pull the 10G on one side? (maybe order of ethX in the bond config will choose which one to use as primary).
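
To make the question concrete, here is a toy model of how an active-backup bond might choose its active link. It is a sketch that assumes behaviour similar to the Linux bonding driver with a configured primary (the primary takes over again whenever it is up); whether Unraid's bond is actually configured that way is not confirmed in this thread. The interface names `eth_10g` and `eth_1g` are hypothetical.

```python
def pick_active(links, primary=None, current=None):
    """Choose the active link of an active-backup bond.

    links:   dict of interface name -> True if the link is up
    primary: preferred interface, or None if no primary is configured
    current: interface that is active right now, or None
    """
    # A configured primary wins whenever its link is up (fail-back).
    if primary is not None and links.get(primary):
        return primary
    # Otherwise stay on the current link as long as it is still up.
    if current is not None and links.get(current):
        return current
    # Otherwise fail over to the first link that is up, in bond order.
    for name, up in links.items():
        if up:
            return name
    return None  # no usable link at all

both_up = {"eth_10g": True, "eth_1g": True}
dac_pulled = {"eth_10g": False, "eth_1g": True}

print(pick_active(both_up, primary="eth_10g"))                       # normal operation
print(pick_active(dac_pulled, primary="eth_10g", current="eth_10g"))  # failover
print(pick_active(both_up, primary="eth_10g", current="eth_1g"))      # fail-back
print(pick_active(both_up, current="eth_1g"))                         # no primary set
```

In this model only the host decides which link carries traffic, and the switch just sees the bond's MAC address move from one port to the other.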

 

Sorry, can't help you out with my setup, as I am travelling abroad atm.

 

 

5 minutes ago, KptnKMan said:

I can set the link L2-MTU and L3-MTU, but on the bond interface, I'm not sure why, I can't set the L3-MTU (labelled 'MTU').

You can see in the 2nd screenshot I posted earlier, the L2-MTU is greyed out, and inaccessible:

image.png.192aec2bea96e9c8ffa7bdf58e3eef2b.png

Hm...maybe it just calculates this L2 value on its own...I am not on stable version, but on long-term only.

Also I did not check for a Bond-IF, only used it on a physical one, like a SFP+ port.

Smells like a feature, not a bug here.

6 minutes ago, Ford Prefect said:

uhmmm...maybe this is the hint we need?

What if "active" means that only unraid/the active part decides which link it will use?

Can you try and just use *no* bond at all on the CRS (or 2nd alternative, if there is a passive backup mode?) and leave unraid in active backup mode (maybe only one partner can be active side at a time)?

Will it use the 10G as long as it is connected, then the 1G if you pull the 10G on one side? (maybe order of ethX in the bond config will choose which one to use as primary).

That makes sense, and I should probably do some more reading on the subject.

I will say that the active-backup bond worked just fine without the CRS309 being setup with a special bonding config... So maybe that could be it.

 

I've just nuked the CRS309, and both unraids, all came up without issues first time, with the active-backup bond setup by default (Only with the onboard primary). I've just switched them around and rebooted.

 

9 minutes ago, Ford Prefect said:

Sorry, can't help you out with my setup, as I am travelling abroad atm.

Yeah, no worries. I appreciate all you've done to help me out.

 

12 minutes ago, Ford Prefect said:

Hm...maybe it just calculates this L2 value on its own...I am not on stable version, but on long-term only.

Also I did not check for a Bond-IF, only used it on a physical one, like a SFP+ port.

Smells like a feature, not a bug here.

I'm not really sure what to make of this, but I was also on the long-term until yesterday.

Well, I'm going ahead right now with some testing.

Everything seems to be behaving, reboots come up without fuss.

 

Also took this opportunity to move the unraid SFP+ DACs along, to give that S+RJ10 a little space.

In the new config, it looks like the 1Gb is working as primary, so I will pull some cables later and see if it fails over to the 10Gb.

From my observations, the standby always has no Rx transmissions, and the Tx is only ever a few kbps.

 

The CRS309 reset config (only set interface names):

image.thumb.png.e66a8f4432c901207a67f268506cf7a8.png

 

unraid with default config (only reordered the nics):

image.thumb.png.c5b31a06b8f8fc495ad13fc960a6fdad.png


iperf3 confirms the 1Gb is primary:

root@blaster:~# iperf3 -c 192.168.178.12
Connecting to host 192.168.178.12, port 5201
[  5] local 192.168.178.11 port 52652 connected to 192.168.178.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   106 MBytes   889 Mbits/sec    0    235 KBytes       
[  5]   1.00-2.00   sec   105 MBytes   883 Mbits/sec    0    235 KBytes       
[  5]   2.00-3.00   sec   105 MBytes   880 Mbits/sec    0    235 KBytes       
[  5]   3.00-4.00   sec   107 MBytes   898 Mbits/sec    0    238 KBytes       
[  5]   4.00-5.00   sec   106 MBytes   893 Mbits/sec    0    238 KBytes       
[  5]   5.00-6.00   sec   105 MBytes   877 Mbits/sec    0    235 KBytes       
[  5]   6.00-7.00   sec   107 MBytes   894 Mbits/sec    0    235 KBytes       
[  5]   7.00-8.00   sec   105 MBytes   882 Mbits/sec    0    235 KBytes       
[  5]   8.00-9.00   sec   106 MBytes   891 Mbits/sec    0    235 KBytes       
[  5]   9.00-10.00  sec   106 MBytes   891 Mbits/sec    0   5.66 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.03 GBytes   888 Mbits/sec    0             sender
[  5]   0.00-10.00  sec  1.03 GBytes   886 Mbits/sec                  receiver

iperf Done.

 

10 minutes ago, KptnKMan said:

I'm not really sure what to make of this, but I was also on the long-term until yesterday.

Well, I'm going ahead right now with some testing.

Everything seems to be behaving, reboots come up without fuss.

...just saying that you're only able to set L2-MTU parameter on a physical Interface..a bonding interface is not a physical one.

10 minutes ago, KptnKMan said:

Also took this opportunity to move the unraid SFP+ DACs along, to give that S+RJ10 a little space.

In the new config, it looks like the 1Gb is working as primary, so I will pull some cables later and see if it fails over to the 10Gb.

From my observations, the standby always has no Rx transmissions, and the Tx is only ever a few kbps.

The active side (unraid) should only advertise the MAC (the unraid bond/bridge) on the active link to the CRS, so it does not get confused (as the bond will have a single MAC, that of the first NIC in it, I think).

 

this is what MT help states: https://help.mikrotik.com/docs/display/ROS/Bonding#Bonding-active-backup

active-backup
This mode uses only one active slave to transmit packets. The additional slave only becomes active if the primary slave fails. The MAC address of the bonding interface is presented onto the active port to avoid confusing the switch. Active-backup is the best choice in high availability setups with multiple switches that are interconnected.
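
On the Linux (unraid) side, the bonding driver reports the mode, the configured primary and the currently active slave in /proc/net/bonding/bond0, which is a quick way to confirm which link is really carrying traffic. Below is a minimal Python sketch that parses that text. The function name `parse_bond_status` is made up for illustration, and it assumes the bond is called bond0 as in the default unraid config.

```python
from pathlib import Path


def parse_bond_status(text):
    """Split /proc/net/bonding/<bond> text into bond-level and per-slave fields."""
    bond = {}      # fields that appear before the first "Slave Interface" line
    slaves = {}    # one dict of fields per slave interface
    current = None
    for line in text.splitlines():
        if ":" not in line:
            continue
        # Split on the first colon only, so MAC addresses in values stay intact.
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
            slaves[current] = {}
        elif current is None:
            bond[key] = value
        else:
            slaves[current][key] = value
    return bond, slaves


if __name__ == "__main__":
    status_file = Path("/proc/net/bonding/bond0")  # assumption: bond is named bond0
    if status_file.exists():
        bond, slaves = parse_bond_status(status_file.read_text())
        print("mode:   ", bond.get("Bonding Mode"))
        print("primary:", bond.get("Primary Slave"))
        print("active: ", bond.get("Currently Active Slave"))
        for name, fields in slaves.items():
            print(name, fields.get("MII Status"), fields.get("Speed"))
```

If "Primary Slave" shows "None", the driver has no preferred link and simply stays on whichever slave is currently active until that one fails.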

 

2 minutes ago, KptnKMan said:

Pulled the 1Gb cable (the unraid end) on unraid1, and looks like it failed over to the 10Gb.

After plugging it back in, it stayed on the 10Gb as primary:

I need to figure out how to configure unraid to always prioritise the 10Gb, unless it's disconnected.

Any idea on that?

Hmmm....

AFAIK the first NIC in the bond will "lend" its MAC to the bond.

Normally, in unraid this is eth0.

You should check and re-arrange NIC numbering in the network settings of unraid, so that the 10G is eth0.

So when booting for the first time or after a reboot, the 10G should be connected so that the NIC behind eth0 becomes active.


Ok, so I pulled the 1Gb cable on unraid2 (the unraid end) and it also failed over to the 10Gb.

I can also see it on the CRS309:

<image>

 

iperf3 confirms the 10Gb link (inconsistent, as before, with MTU at 1500):

root@blaster:~# iperf3 -c 192.168.178.12
Connecting to host 192.168.178.12, port 5201
[  5] local 192.168.178.11 port 32828 connected to 192.168.178.12 port 5201
[ ID] Interval           Transfer     Bitrate         Retr  Cwnd
[  5]   0.00-1.00   sec   822 MBytes  6.89 Gbits/sec    0    291 KBytes       
[  5]   1.00-2.00   sec   894 MBytes  7.50 Gbits/sec    0    297 KBytes       
[  5]   2.00-3.00   sec   876 MBytes  7.35 Gbits/sec    0    300 KBytes       
[  5]   3.00-4.00   sec   930 MBytes  7.80 Gbits/sec    0    294 KBytes       
[  5]   4.00-5.00   sec   721 MBytes  6.05 Gbits/sec    0    300 KBytes       
[  5]   5.00-6.00   sec   964 MBytes  8.08 Gbits/sec    0    288 KBytes       
[  5]   6.00-7.00   sec   876 MBytes  7.35 Gbits/sec    0    303 KBytes       
[  5]   7.00-8.00   sec   916 MBytes  7.69 Gbits/sec    0    291 KBytes       
[  5]   8.00-9.00   sec   775 MBytes  6.50 Gbits/sec    0    280 KBytes       
[  5]   9.00-10.00  sec   880 MBytes  7.38 Gbits/sec    0    283 KBytes       
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  8.45 GBytes  7.26 Gbits/sec    0             sender
[  5]   0.00-10.00  sec  8.45 GBytes  7.26 Gbits/sec                  receiver

iperf Done.
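
To put a number on "inconsistent", the per-second lines of a run like the one above can be summarised as min, max and mean. This is only a rough Python sketch: it assumes the plain-text iperf3 client format shown in this post, and the names `interval_gbits` and `summarize` are made up for illustration.

```python
import re

# Matches per-interval lines such as:
# [  5]   0.00-1.00   sec   822 MBytes  6.89 Gbits/sec    0    291 KBytes
INTERVAL = re.compile(
    r"\[\s*\d+\]\s+(\d+\.\d+)-(\d+\.\d+)\s+sec\s+[\d.]+\s+\w+\s+"
    r"([\d.]+)\s+([KMG])bits/sec"
)
TO_GBITS = {"K": 1e-6, "M": 1e-3, "G": 1.0}


def interval_gbits(text):
    """Return the bitrate of each roughly 1-second interval, in Gbits/sec."""
    rates = []
    for line in text.splitlines():
        match = INTERVAL.search(line)
        if not match:
            continue
        start, end = float(match.group(1)), float(match.group(2))
        if end - start > 1.5:
            continue  # skip the sender/receiver summary lines (0.00-10.00)
        rates.append(float(match.group(3)) * TO_GBITS[match.group(4)])
    return rates


def summarize(rates):
    """Min, max, mean and the max-min spread as a percentage of the mean."""
    mean = sum(rates) / len(rates)
    return {
        "min": min(rates),
        "max": max(rates),
        "mean": mean,
        "spread_pct": (max(rates) - min(rates)) / mean * 100,
    }


if __name__ == "__main__":
    import sys
    # Usage: iperf3 -c 192.168.178.12 | python3 this_script.py
    print(summarize(interval_gbits(sys.stdin.read())))
```

For the run above, the slowest and fastest seconds are 6.05 and 8.08 Gbits/sec.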

 

Honestly, I'm less concerned about the speed and MTU settings at this point; it seems to work, so that's good enough.

I'd rather like to figure out how to set a priority/preference on the eth0 10Gb link.

1 minute ago, Ford Prefect said:

...just saying that you're only able to set L2-MTU parameter on a physical Interface..a bonding interface is not a physical one.

Ah gotcha. Thanks, that makes total sense.

 

3 minutes ago, Ford Prefect said:

The active side (unraid) should only advertise the MAC (the unraid bond/bridge) on the active link to the CRS, so it does not get confused (as the bond will have a single MAC, that of the first NIC in it, I think).

 

this is what MT help states: https://help.mikrotik.com/docs/display/ROS/Bonding#Bonding-active-backup

active-backup
This mode uses only one active slave to transmit packets. The additional slave only becomes activ

Absolutely, this seems to match what I read and understand.

I'm not sure how to set the priority in unraid apart from the NIC order in the "interface rules".

Is there a config I can set where I can assign some kind of weight to the ethX interfaces?
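
For reference, the Linux bonding driver itself has two options that cover this: `primary` (the preferred slave) and `primary_reselect` (`always`, `better` or `failure`), both exposed under /sys/class/net/bond0/bonding/. With no primary set, active-backup stays on whichever link is currently active, which would match the "stayed on the 10Gb" behaviour after re-plugging. The Python sketch below only shows the mechanism. The helper names are made up, it assumes the bond is bond0 and the 10Gb NIC is eth0, it needs root, and whether unraid keeps or overwrites these values on a network restart or reboot is not something confirmed in this thread.

```python
from pathlib import Path

# Values the bonding driver accepts for primary_reselect.
RESELECT_POLICIES = ("always", "better", "failure")


def primary_settings(nic, reselect="always"):
    """Build the two bonding options that make `nic` the preferred link."""
    if reselect not in RESELECT_POLICIES:
        raise ValueError(f"primary_reselect must be one of {RESELECT_POLICIES}")
    return {"primary": nic, "primary_reselect": reselect}


def apply_settings(settings, bond="bond0", sysfs_root="/sys/class/net"):
    """Write each option to <sysfs_root>/<bond>/bonding/<option> (needs root)."""
    bonding_dir = Path(sysfs_root) / bond / "bonding"
    for option, value in settings.items():
        (bonding_dir / option).write_text(value + "\n")


if __name__ == "__main__":
    # Assumption: eth0 is the 10Gb NIC. This only prints the values;
    # call apply_settings(...) as root to actually change the bond.
    print(primary_settings("eth0"))
```

With `always`, the bond should move back to the primary as soon as its link returns; `failure` would leave it on the backup until the backup itself fails.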

 

5 minutes ago, Ford Prefect said:

Hmmm....

AFAIK the first NIC in the bond will "lend" its MAC to the bond.

Normally, in unraid this is eth0.

You should check and re-arrange NIC numbering in the network settings of unraid, so that the 10G is eth0.

So when booting for the first time/ after reboot, the 10G should be connected in order to activate the NIC for eth0.

I made sure to rearrange the interfaces to prioritise both the 10Gb links (one being disconnected of course, but ok) and then the 1Gb link. This way I figured it would just go down the list until it reaches the 1Gb link, and switch back to the 10Gb when it returns.

It doesn't seem to follow that behaviour, unless there is a timeout I'm not aware of? I figured that 30 seconds was the norm, but the failover itself was immediate and didn't seem to skip a beat.

 

4 minutes ago, Ford Prefect said:

...as said, it must/should be the UDEV rules in network config...you should be able to tie a name (eth0) to a MAC.

Hmm, I assumed this was the order in the network-rules.cfg file or the interface order in the "bonding members of bond0".

Interestingly, there is no network.cfg file since I deleted it to reset the config. I assumed this is because I have not modified anything, and the active-backup bond0 was configured by default. If I set an interface description or something, I expect it to return.

I initially set the "interface rules" in the order I want, but I should look further into configuring Linux UDEV rules, in case there is something else I can set for weight or timeout.

I think there's something I'm missing, but I can't see anything else obvious.

5 minutes ago, Ford Prefect said:

...the rules cfg should work.
Maybe the 10G link just needs more/too much time to become active.

Yeah, this is what I'm wondering about as well.

I'm going to have to do some tests and see if there is a timeout I'm missing.

I remember seeing in the CRS309 that there were a few timeout settings, so I assumed there would be something similar for unraid.

 

7 minutes ago, Ford Prefect said:

What if you swap players?
Make unraid passive side and CRS active??

That's something I could do.

I imagine the passive side would just be regular links?

I guess that would mean disabling bond0 on unraid, then configuring the separate switch interfaces as a bond interface as before?

Would that mess with DHCP, or would I need to spoof both NICs to the same MAC? 🤔

I need to read the mikrotik wiki to find out their recommendations, or try their forum as you advised. 🤔

 

I need to go find some answers, I appreciate you taking the time to run through this with me.

  • KptnKMan changed the title to Using 10Gb adaptor in PCIe 1x slot (An Unraid 10Gb journey)

I started a new thread to investigate some issues I encountered today regarding more networking trouble.

I can troubleshoot networking issues well enough, but I'm not an expert at Unraid configuration.

Something strange is going on, and I'm really not sure why.

I'd really just like to get the bond networking stable and working well, so I don't have to mess with it.

 

 
