BGP Bestpath Selection - DMZ Link Bandwidth

Last updated: December 11, 2014

You must load the initial configuration files for the section, Basic BGP Routing, which can be found in CCIE R&S v5 Topology Diagrams & Initial Configurations. Reference the Advanced Technology Labs BGP Diagram to complete this task.

Task

  • Advertise the Loopback0 interfaces of the routers in AS 100 into BGP.
  • Enable a new eBGP peering between R5 and R1 using their directly connected DMVPN Tunnel interface.
  • Configure the DMVPN Tunnel interface on R5 with a bandwidth of 50 Mbps.
  • Modify the configuration of AS 200 routers so that R5 load-balances traffic destined to AS 100 Loopback0 prefixes proportional to the bandwidth of the links connecting R5 to R4 and R5 to R1.

Configuration

As mentioned in the previous tasks, the local BGP process can perform equal-cost load balancing across paths that:

  • Have the same set of path attributes up to and including the MED (weight, Local Preference, AS_PATH length, Origin, MED).
  • Are of the same type (all learned via eBGP or all learned via iBGP).
  • Have the same IGP cost to reach their NEXT_HOP IP address.

If the above conditions are met and maximum-paths [ibgp] is configured under the BGP process, BGP installs multiple equal-cost routes into the local RIB and uses them for load balancing. We refer to these as the BGP load-balancing conditions.
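As a minimal sketch of the two forms of the command, assuming an illustrative AS number and path count that are not part of this task's solution:

```
router bgp 200
 ! allow up to 2 equal-cost eBGP-learned paths into the RIB
 maximum-paths 2
 ! allow up to 2 equal-cost iBGP-learned paths into the RIB
 maximum-paths ibgp 2
```

The two limits are independent, so a router can be configured with either one or both.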

BGP also implements a unique unequal-cost load-balancing feature. As you may remember, unequal-cost load balancing cannot be implemented easily with most IGPs, because the protocol needs a way to ensure that all alternative paths are loop-free. Among the IGPs, only EIGRP supports this feature, because all alternate unequal-cost paths are guaranteed to be loop-free by virtue of the feasible successor property. BGP guarantees that any route learned via eBGP is loop-free, based on duplicate AS number detection. Thus, it is possible to implement unequal-cost load balancing in BGP toward prefixes learned from other ASs.

This feature is called DMZ Link Bandwidth in IOS. The rationale behind this name is that load-balancing is based on the bandwidth of the links connecting the border BGP peers to their neighbors. Here is how it works for a single router with multiple eBGP peering links:

  1. You enable the feature on a border BGP router using the command bgp dmzlink-bw. With this command enabled, the BGP process will instruct the data plane to load-balance based on the bandwidth of the links used to connect to the external BGP peers. To select the links that are to be used for load-balancing, you configure the respective BGP peers using the command neighbor <IP> dmzlink-bw. The BGP process will consider the bandwidth on the links connecting to those peers when doing the unequal cost load-balancing. In Cisco terminology, those links are called the DMZ Links. The bandwidth is computed based on the bandwidth command configured on the respective interfaces, or based on the default administrative bandwidth.
  2. You enable the classic BGP equal-cost load-balancing using the command maximum-paths under the local BGP process. Now, assuming that you received the same prefix from multiple peers and all paths satisfy the BGP load-balancing conditions defined above, the BGP process inserts them into the RIB and assigns load-balancing weights proportional to the interface bandwidth values (DMZ link bandwidth).

This seems simple enough. Now what if you have multiple BGP border routers in your AS, each having just one uplink? Is it possible to implement an AS-wide load-balancing scheme based on the bandwidth of the upstream links? That is, it would be beneficial if every router learning multiple paths to the same prefix across iBGP links load-balanced toward them based on the bandwidth of the link on which they were received. Cisco IOS allows for such an implementation, using the following algorithm:

  1. When the DMZ Link bandwidth feature is enabled in the border BGP routers for the specific peers, the interface bandwidth value is copied into a new extended community attribute associated with the prefixes received from those eBGP peers. Thus, every prefix received on the eBGP peering link will carry the link's bandwidth as a special extended community attribute, if the link is enabled for the DMZ Link bandwidth feature. Remember that you need two commands in the border peers: bgp dmzlink-bw and neighbor <IP> dmzlink-bw.
  2. All BGP speakers in the AS should be configured to exchange extended communities across the iBGP peering links. This allows all internal BGP speakers to learn the bandwidth of the external link used to reach the prefixes. Use the command neighbor <IP> send-community extended to accomplish this.
  3. Provided that an internal BGP speaker has both the maximum-paths ibgp and bgp dmzlink-bw commands enabled and receives multiple paths to reach the same prefix, it performs load-balancing if the paths meet the BGP load-balancing conditions.
  4. If all paths received carry the DMZ Link bandwidth extended community, the BGP process will perform unequal cost load-balancing proportional to the extended community attribute values.
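
The steps above can be sketched as follows for a hypothetical AS 65000 with one border router (BR1) and one internal router (INT1). The router names, AS numbers, and addresses are assumptions chosen for illustration only and are not part of this lab's topology. A second border router would be configured like BR1.

```
! BR1: border router with eBGP peer 192.0.2.1 and iBGP peer 10.0.0.2 (INT1)
router bgp 65000
 bgp dmzlink-bw
 neighbor 192.0.2.1 remote-as 65001
 ! copy the bandwidth of the link to this peer into the extended community
 neighbor 192.0.2.1 dmzlink-bw
 neighbor 10.0.0.2 remote-as 65000
 ! carry the link bandwidth community across the iBGP session
 neighbor 10.0.0.2 send-community extended
!
! INT1: internal router learning the same prefix from several border routers
router bgp 65000
 bgp dmzlink-bw
 maximum-paths ibgp 2
 neighbor 10.0.0.1 remote-as 65000
```

The iBGP paths on INT1 must still meet the BGP load-balancing conditions, including equal IGP cost to each next hop, before the bandwidth values are used to weight them.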

In our scenario, R5 peers with two border routers in AS 100, R1 and R4 (after enabling the additional peering between R5 and R1). Our goal is to make R5 load-balance traffic for the Loopback0 prefixes of AS 100 across both of its uplinks. We achieve this by configuring R5 with the DMZ Link Bandwidth feature on its uplinks to AS 100. At the same time, R5 is configured for eBGP multipath so that it inserts both sets of paths into the local RIB.

R5:
router bgp 200
 maximum-paths 4
 bgp dmzlink-bw
 neighbor 155.1.0.1 remote-as 100
 neighbor 155.1.0.1 dmzlink-bw
 neighbor 155.1.45.4 dmzlink-bw
!
interface Tunnel 0
 bandwidth 50000


R4:
router bgp 100
 network 150.1.4.4 mask 255.255.255.255


R1:
router bgp 100
 neighbor 155.1.0.5 remote-as 200
 network 150.1.1.1 mask 255.255.255.255


R6:
router bgp 100
 network 150.1.6.6 mask 255.255.255.255

Verification

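Take any Loopback0 prefix learned from AS 100 and look it up in the BGP table. Notice that there are three paths, and that the two external paths are marked as “multipath.” That means that BGP is using them both, even though only one of them (the first path in this output) is elected as “best” by the BGP process. The iBGP path learned from R3 is not used. Notice the DMZ-Link Bw attribute values for the two multipaths. They are expressed in kilobytes per second, so the 50 Mbps Tunnel yields 50000/8=6250 and the 1 Gbps GigabitEthernet link yields 1000000/8=125000, with their ratio being 125000/6250=20.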

R5#show ip bgp 150.1.1.1
BGP routing table entry for 150.1.1.1/32, version 26
Paths: (3 available, best #1, table default)
Multipath: eBGP
  Advertised to update-groups:
     15         16         17        
  Refresh Epoch 3
  100
    155.1.0.1 from 155.1.0.1 (150.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, multipath, best
      DMZ-Link Bw 6250 kbytes
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 2
  100
    155.1.13.1 (metric 3328) from 155.1.0.3 (150.1.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 2
  100
    155.1.45.4 from 155.1.45.4 (150.1.4.4)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      DMZ-Link Bw 125000 kbytes
      rx pathid: 0, tx pathid: 0

Now look at the routing table entry for the same prefix. Notice that the traffic share counts are 20:1, which matches the ratio of the DMZ link bandwidth values exactly. That means that for approximately every 20 packets sent via R4, one packet is routed across R1 (although the actual proportion may differ with per-flow load balancing).

R5#show ip route 150.1.1.1
Routing entry for 150.1.1.1/32
  Known via "bgp 200", distance 20, metric 0
  Tag 100, type external
  Last update from 155.1.45.4 00:01:36 ago
  Routing Descriptor Blocks:
    155.1.45.4, from 155.1.45.4, 00:01:36 ago
      Route metric is 0, traffic share count is 20
      AS Hops 1
      Route tag 100
      MPLS label: none
  * 155.1.0.1, from 155.1.0.1, 00:01:36 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100
      MPLS label: none

Looking at how CEF programs this entry into the forwarding information base reveals more detail.

R5#show ip cef 150.1.1.1 internal
150.1.1.1/32, epoch 2, flags rib only nolabel, rib defined all labels, RIB[B], refcount 6, per-destination sharing
  sources: RIB 
  feature space:
   IPRM: 0x00018000
   Broker: linked, distributed at 4th priority
  ifnums:
   Tunnel0(10): 155.1.0.1
   GigabitEthernet1.45(13): 155.1.45.4
  path 7FC45840ED60, path list 7FC464C0DC90, share 1/1, type recursive, for IPv4
  recursive via 155.1.0.1[IPv4:Default], fib 7FC466C7DE08, 1 terminal fib, v4:Default:155.1.0.1/32
    path 7FC45840F230, path list 7FC464C0D790, share 1/1, type adjacency prefix, for IPv4
    attached to Tunnel0, adjacency IP midchain out of Tunnel0, addr 155.1.0.1 7FC466986A80
  path 7FC45840F7B0, path list 7FC464C0DC90, share 20/20, type recursive, for IPv4
  recursive via 155.1.45.4[IPv4:Default], fib 7FC466C7EC08, 1 terminal fib, v4:Default:155.1.45.4/32
    path 7FC45840F5A0, path list 7FC464C0DAB0, share 1/1, type adjacency prefix, for IPv4
    attached to GigabitEthernet1.45, adjacency IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
  output chain:
    loadinfo 7FC464124AB8, per-session, 2 choices, flags 0003, 7 locks
    flags: Per-session, for-rx-IPv4
    16 hash buckets
      < 0 > IP midchain out of Tunnel0, addr 155.1.0.1 7FC466986A80 IP adj out of GigabitEthernet1.100, addr 169.254.100.1 7FC462148F58
      < 1 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 2 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 3 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 4 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 5 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 6 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 7 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 8 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      < 9 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      <10 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      <11 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      <12 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      <13 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      <14 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
      <15 > IP adj out of GigabitEthernet1.45, addr 155.1.45.4 7FC4669875C0
    Subblocks:
     None

Note that out of the 16 hash buckets available, the DMVPN Tunnel is used for only one, and the GigabitEthernet1.45 interface is used for the remaining 15. The forwarding plane therefore approximates the 20:1 share with a 15:1 bucket distribution.

As of IOS 12.2(4)T, load balancing can also be achieved between an eBGP path and an iBGP path. As long as everything else is equal, BGP will load-balance between the two paths. The maximum-paths eibgp command must be used. In our example, R5 has three paths in total: two external paths that are being used for load balancing and an internal path from R3. If we enable maximum-paths eibgp on R5, R5 uses all three paths for load balancing.

R5(config)#router bgp 200
R5(config-router)#no maximum-paths 4
R5(config-router)#maximum-paths eibgp 4
%BGP: This may cause traffic loop if not used properly (command accepted)
%BGP-4-MULTIPATH_LOOP: This may cause traffic loop if not used properly (command accepted)
!
R5#show ip bgp 150.1.1.1
BGP routing table entry for 150.1.1.1/32, version 39
Paths: (3 available, best #1, table default)
Multipath: eiBGP
  Advertised to update-groups:
     15         16         17        
  Refresh Epoch 6
  100
    155.1.0.1 from 155.1.0.1 (150.1.1.1)
      Origin IGP, metric 0, localpref 100, valid, external, multipath, best
      DMZ-Link Bw 6250 kbytes
      rx pathid: 0, tx pathid: 0x0
  Refresh Epoch 5
  100
    155.1.13.1 (metric 3328) from 155.1.0.3 (150.1.3.3)
      Origin IGP, metric 0, localpref 100, valid, internal, multipath
      rx pathid: 0, tx pathid: 0
  Refresh Epoch 5
  100
    155.1.45.4 from 155.1.45.4 (150.1.4.4)
      Origin IGP, localpref 100, valid, external, multipath(oldest)
      DMZ-Link Bw 125000 kbytes
      rx pathid: 0, tx pathid: 0
!
R5#show ip route 150.1.1.1
Routing entry for 150.1.1.1/32
  Known via "bgp 200", distance 20, metric 0
  Tag 100, type external
  Last update from 155.1.45.4 00:01:47 ago
  Routing Descriptor Blocks:
    155.1.45.4, from 155.1.45.4, 00:01:47 ago
      Route metric is 0, traffic share count is 20
      AS Hops 1
      Route tag 100
      MPLS label: none
    155.1.13.1, from 155.1.0.3, 00:01:47 ago
      Route metric is 0, traffic share count is 20
      AS Hops 1
      Route tag 100
      MPLS label: none
  * 155.1.0.1, from 155.1.0.1, 00:01:47 ago
      Route metric is 0, traffic share count is 1
      AS Hops 1
      Route tag 100
      MPLS label: none

Note that now all three routes are being used. As you noticed earlier from the log message, this command must be used carefully because it can introduce forwarding loops.
