STP issue over bonded links



I'm setting up a brand-new server with a fresh install of CentOS 7.

 

The server has 4 network interfaces, two connected to Switch A, and two to Switch B.

 

The two interfaces connected to Switch A are bonded on both sides with LACP for increased throughput, and likewise the pair connected to Switch B are also bonded.

 

On top of this, I have created a bridge between the two bonds for resiliency, in case one or the other switch fails.

 

em1 > bond0

em2 > bond0

em3 > bond1

em4 > bond1

 

bond0 > br0

bond1 > br0
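For reference, the CentOS 7 network-scripts for this layout look roughly like the sketch below. This is an illustration, not my exact config - the BONDING_OPTS values and the IP address are placeholders:

```shell
# /etc/sysconfig/network-scripts/ifcfg-em1 (em2, em3, em4 are analogous,
# with em3/em4 pointing at bond1)
DEVICE=em1
ONBOOT=yes
MASTER=bond0
SLAVE=yes

# /etc/sysconfig/network-scripts/ifcfg-bond0 (bond1 is analogous)
DEVICE=bond0
ONBOOT=yes
BONDING_MASTER=yes
BONDING_OPTS="mode=802.3ad miimon=100"   # LACP
BRIDGE=br0

# /etc/sysconfig/network-scripts/ifcfg-br0
DEVICE=br0
TYPE=Bridge
ONBOOT=yes
STP=yes
BOOTPROTO=none
IPADDR=192.0.2.10        # placeholder address
NETMASK=255.255.255.0
```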

 

I'm having issues where the server isn't seeing the root bridge on the network (brctl showstp br0 is showing the local bridge ID as the root).

 

I've disconnected the cables from Switch B while I debug this (since, with STP not working on the server, the second uplink creates a loop).

 

Using tcpdump, I can see that the BPDU packets from Switch A (the root switch in the STP topology) are received on em1, but if I run tcpdump on bond0, the BPDUs are not seen there. I can see outgoing BPDUs from the server on both bond0 and em1 - something just seems to be dropping the incoming packets in the bond.
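In case it helps anyone reproduce this: BPDUs and LACPDUs go to different 01:80:C2 link-local multicast addresses, so you can filter for BPDUs specifically when comparing the slave and the bond. A sketch (the interface names are from my setup; the actual captures need root and real interfaces, so they are shown as comments):

```shell
# STP BPDUs and LACPDUs use different IEEE 802.1 link-local multicast
# destinations, so a capture filter can separate them:
stp_dst="01:80:c2:00:00:00"   # 802.1D bridge group address (BPDUs)
lacp_dst="01:80:c2:00:00:02"  # Slow Protocols address (LACPDUs)

# Run as root on the affected host:
#   tcpdump -e -i em1   "ether dst $stp_dst"    # BPDUs visible here
#   tcpdump -e -i bond0 "ether dst $stp_dst"    # ...but not here

echo "$stp_dst"
echo "$lacp_dst"
```

Seeing frames match that filter on em1 but not on bond0 is what tells me they are being dropped between the slave and the bond.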

 

Has anyone come across this before, or have any ideas what I can try to get this resolved?


They are trunked, as the server will be running a number of VMs which will need to be put onto different VLANs. I could change the bonding mode from LACP to active/backup, but that would defeat the purpose of the increased throughput.


Either designate or purchase additional NICs for the server for failover (keeping all of the current ports aggregated on one switch and the new ports for failover), or move all of the ports to one switch for aggregated throughput and accept having no failover.


I'm unsure how that helps. If I had eight interfaces, bonded in fours to the two switches, and then bridged for failover using STP, that's essentially the same setup. I'd get more throughput (which isn't the issue here), but the BPDU packets would still be getting lost somewhere in the bonding layer.


So mark four for primary and four for secondary; that is your failover.

 

The switches are not liking the eight ports connected at the same time, and that is creating a loop, per your original post. Either the server or the switches can't handle the connection to both switches at once (I would go with the server here).

 

You could always bring up a sniffer to determine where the packets are getting dropped, but with my limited experience with CentOS, I'd have to guess it doesn't natively support what you are doing.

 

Maybe @BudMan could shed a little more light on this. FWIW, I am fairly certain VMware would handle this properly, and you could create a CentOS guest under it, as the switch technology built into VMware is a bit more capable than what CentOS offers.


If you bridged your connections then of course you would have a loop..

 

So these switches you're trying to connect to are not stacked?


1 hour ago, BudMan said:

If you bridged your connections then of course you would have a loop..

 

So these switches you're trying to connect to are not stacked?

No, they don't support stacking.

 

The whole point is that STP will disable the link to one of the switches, and only re-enable it should the other switch fail. I have this architecture working fine in a number of other production deployments, but without the bonding layer involved. Something in the Linux bonding driver seems to be dropping the BPDU packets, so the bridge is putting both links (bond0 and bond1) into the forwarding state.

If I were to do it with just em1 and em3 - one link to each switch, no bonding - it would work fine, and STP would disable the ports as expected. The issue I'm having is that with bonded links, the BPDU packets are coming in (I can see them using tcpdump on em1) but are being dropped by the bonding driver (they're not visible when running tcpdump on bond0). The bond should be passing through all of the incoming traffic (save maybe the LACP packets).


Possibly - I don't have any experience with bond interfaces in Linux. Not a fan overall; we use EtherChannel, sure, but we never bridge interfaces on a server. You run the physical connections from the server to a stacked switch - that provides your extra bandwidth and redundancy in case of a port or switch failure.

 

If you need more bandwidth, use a faster interface - go to 10 gig, shoot, go to 40 gig ;)

 

Bridging interfaces on a host is almost always a BAD IDEA!!

 

What sort of traffic are you running over the LACP? Is this file-sharing traffic (SMB), or are you trying to get more bandwidth to, say, an httpd or something? How many clients are hitting this server? From where? Are they on the same L2 or a different L3?


1 hour ago, BudMan said:

Bridging interfaces on a host is almost always a BAD IDEA!!

As long as you have STP running to prevent loops, it works fine. As I said, I have it working successfully in a number of other production deployments - albeit without bonding in use there.

 

I've managed to get it working with just bonding. Even though the switches are not stacked, it turns out that LACP is capable of doing what I need: all four interfaces are bonded on the server, with the pairs bonded on the switches. The server simply detects two aggregators, selects one to pass the traffic, and can then fail over to the other should it go down. The failover isn't as quick as on my other setups using RSTP (it takes about 10 seconds), but given the small chance of a failure in the first place, I think it's fine.
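For anyone finding this later, the final single-bond setup looks roughly like the sketch below. The miimon and ad_select values are assumptions about sensible settings, not a definitive recipe:

```shell
# /etc/sysconfig/network-scripts/ifcfg-bond0 - all four NICs enslaved to
# one 802.3ad bond; the switches still run two separate two-port LACP
# groups, so the bond sees two aggregators and uses one at a time.
DEVICE=bond0
ONBOOT=yes
BONDING_MASTER=yes
# ad_select controls how the active aggregator is (re)chosen. The default,
# "stable", only reselects when every port in the active aggregator fails,
# which matches the failover behaviour described above.
BONDING_OPTS="mode=802.3ad miimon=100 ad_select=stable"
# em1-em4 each carry MASTER=bond0 / SLAVE=yes in their own ifcfg files.
```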

