This post presents the solution and explanation for quiz-21.
Have a look at it to test your knowledge. ☺

Quiz Review

A large enterprise consisting of multiple remote sites uses a private MPLS cloud, with EIGRP as the PE-CE routing protocol and MPLS L3 VPNs providing the necessary connectivity.
Of particular interest, Site-A and Site-B have a backdoor link between them.
Everything works as desired until a new request reaches the network department: a new Site-ABC will be connected to PE-2 and users in this site will mostly connect to resources behind Site-A / CE-1 (192.168.1.55).
The requirement is to make sure that this traffic (from PE-2 to CE-1) will use the backdoor link instead of the MPLS cloud:

Figure: Pre-bestpath Cost Community

A simple investigation shows that in the current setup, traffic from PE-2 to CE-1 goes via the MPLS cloud:

PE-2#traceroute vrf CUST_A 192.168.1.55

Type escape sequence to abort.
Tracing the route to 192.168.1.55

  1 10.0.0.6 [MPLS: Labels 16/19 Exp 0] 60 msec 60 msec 40 msec
  2 192.168.1.1 [MPLS: Label 19 Exp 0] 36 msec 36 msec 40 msec
  3 192.168.1.2 44 msec *  20 msec
PE-2#

Problem Statement

The network engineer checks the current routing state for destination 192.168.1.55 and finds that PE-2 prefers the BGP path over the EIGRP one:

PE-2#sh ip route vrf CUST_A 192.168.1.55
Routing entry for 192.168.1.55/32
  Known via "bgp 100", distance 200, metric 156160, type internal
  Redistributing via eigrp 100
  Advertised by eigrp 100 metric 100000 10 255 1 1500
                bgp 100 (self originated)
  Last update from 10.255.255.1 00:22:30 ago
  Routing Descriptor Blocks:
  * 10.255.255.1 (Default-IP-Routing-Table), from 10.255.255.1, 00:22:30 ago
      Route metric is 156160, traffic share count is 1
      AS Hops 0

PE-2#
PE-2#sh bgp vpnv4 uni all 192.168.1.55
BGP routing table entry for 100:1:192.168.1.55/32, version 22
Paths: (1 available, best #1, table CUST_A)
  Not advertised to any peer
  Local
    10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
      Origin incomplete, metric 156160, localpref 100, valid, internal, best
      Extended Community: RT:100:1 Cost:pre-bestpath:128:156160
        0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
      mpls labels in/out nolabel/19
PE-2#

He tries to influence the BGP path selection by setting a high local preference on the redistributed EIGRP routes, but unfortunately PE-2 still chooses the prefix received over the MPLS cloud as the best path:

ip access-list standard PE1_LOOPBACK
 permit 192.168.1.55
!
route-map SET_LP_500 permit 10
 match ip address PE1_LOOPBACK
 set local-preference 500
route-map SET_LP_500 permit 999
!
router bgp 100
 address-fam ipv4 vrf CUST_A
  redistribute eigrp 100 route-map SET_LP_500
PE-2#sh bgp vpnv4 uni all 192.168.1.55
BGP routing table entry for 100:1:192.168.1.55/32, version 22
Paths: (1 available, best #1, table CUST_A)
  Not advertised to any peer
  Local
    10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
      Origin incomplete, metric 156160, localpref 100, valid, internal, best
      Extended Community: RT:100:1 Cost:pre-bestpath:128:156160
        0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
      mpls labels in/out nolabel/19
!
! the prefix received over MPLS (with default LP = 100) is still chosen as best !!
! although the redistributed one has LP = 500
!

As most of you already answered in the quiz, the reason Local Preference cannot influence the BGP best path selection here is the presence of the Cost Community, as seen in this line: Cost:pre-bestpath:128:156160.
What is that?

Pre-bestpath Cost Community

The Cost Community is a non-transitive extended community that Cisco introduced to influence the BGP best path selection in an arbitrary fashion: the router evaluates it after part of the normal process has run (or even before the process starts) and takes a decision based on local criteria. "Pre-bestpath" refers to the variant that is evaluated before the process starts. In some cases, especially in situations with backdoor links, this can also help against routing loops.
It is not (yet, and probably never will be) part of an RFC standard, as the proposed document is still in draft status. The draft is called "BGP Custom Decisions".
To achieve such a custom decision, this community uses a Point of Insertion (POI) to indicate at what point during the BGP best path selection process the router has to stop and consider the value of the Cost Community. The insertion points mentioned in the draft document are:

Figure: Insertion Points in BGP Best path selection

  • POI = 128, use Cost Community before anything else
  • POI = 129, use Cost Community after the IGP cost to next-hop has been compared
  • POI = 130, use Cost Community after the paths advertised by BGP speakers in a neighboring autonomous system (if any) have been selected
  • POI = 131, use Cost Community after BGP IDs have been compared

Of these, Cisco implemented only POI = 129 (IGP), which is the default, and POI = 128, which is the ABSOLUTE_VALUE.

WARNING

This POI 128 (absolute value) totally modifies the BGP best path selection process by making the router compare the cost values before the entire process starts - hence the name pre-bestpath cost community.
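To make the effect of POI 128 concrete, here is a minimal Python sketch of the comparison order. It is not Cisco's implementation: the `Path` class and `best_path` function are hypothetical names, and only three steps are modelled (cost at POI 128, then weight, then local preference). The values are the ones seen on PE-2 in this post, with the local preference of 500 from the failed attempt above.

```python
from dataclasses import dataclass


@dataclass
class Path:
    name: str
    weight: int
    local_pref: int
    pre_bestpath_cost: int  # cost value carried with POI 128


def best_path(paths):
    """Pick the best path in a simplified model of the decision process.

    With POI 128 the cost is compared first (lowest wins); weight and
    local preference (highest wins) are only reached on a cost tie.
    """
    return min(paths, key=lambda p: (p.pre_bestpath_cost, -p.weight, -p.local_pref))


mpls = Path("via MPLS (from PE-1)", weight=0, local_pref=100, pre_bestpath_cost=156160)
backdoor = Path("via backdoor (redistributed EIGRP)", weight=32768, local_pref=500,
                pre_bestpath_cost=158720)

winner = best_path([mpls, backdoor])
print(winner.name)  # via MPLS (from PE-1)
```

Even with a higher weight and a higher local preference, the backdoor path loses in this model, because its cost is compared first and is the larger of the two.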

EIGRP and Cost Community (Pre-bestpath)

Before presenting the solutions for the quiz, let's review some characteristics of EIGRP used as the PE-CE protocol in relation to the pre-bestpath cost community:

  • by default, EIGRP routes redistributed into BGP automatically get the Cost Community with POI 128 => this means that the cost value is evaluated/compared before any other path attribute (including weight). The community-ID is also 128.
  • the value/cost of the pre-bestpath community is the composite metric of the redistributed EIGRP route
  • routes without this cost community are evaluated as if they had a cost value of 2147483647, which represents half of the maximum possible value
  • MP-BGP uses another set of extended communities to transport EIGRP metric values from one PE to another:
    • 0x8800 = Route Flag and Tag
    • 0x8801 = AS Number and Delay
    • 0x8802 = Reliability, Next Hop, and Bandwidth
    • 0x8803 = Reserve, Load and MTU
    • 0x8804 = (for external routes) Remote AS Number and Remote ID
    • 0x8805 = (for external routes) Remote Protocol and Remote Metric
  • the MP-BGP cloud is treated as a link with a metric of zero (0)

Figure: EIGRP as PE-CE Protocol

For example, the AS Number carried in 0x8801 determines whether the prefix will be redistributed as internal (same AS number) or external (different AS numbers).
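The relation between these extended communities and the cost value can be checked with a little arithmetic. The sketch below is an illustration only. It assumes default K values (so the composite metric is simply bandwidth plus delay) and assumes that the last field of 0x8801 and 0x8802 holds the already-scaled delay and bandwidth. Both assumptions are inferred from the outputs in this post, and the function names are hypothetical.

```python
def parse_eigrp_ext(community):
    """Split a string such as '0x8801:100:130560' into (type, first, second)."""
    type_hex, first, second = community.split(":")
    return int(type_hex, 16), int(first), int(second)


def composite_metric(communities):
    """Rebuild the EIGRP composite metric, assuming default K values."""
    delay = bandwidth = None
    for community in communities:
        ctype, _first, second = parse_eigrp_ext(community)
        if ctype == 0x8801:    # AS Number and Delay
            delay = second
        elif ctype == 0x8802:  # Reliability, Next Hop, and Bandwidth
            bandwidth = second
    if delay is None or bandwidth is None:
        raise ValueError("0x8801 and 0x8802 are both required")
    return bandwidth + delay


# Values copied from the 'sh bgp vpnv4 uni all' outputs in this post.
from_pe1 = ["0x8800:32768:0", "0x8801:100:130560", "0x8802:65281:25600", "0x8803:65281:1500"]
local = ["0x8800:32768:0", "0x8801:100:133120", "0x8802:65282:25600", "0x8803:65281:1500"]

print(composite_metric(from_pe1))  # 156160
print(composite_metric(local))     # 158720
```

Both results match the cost values shown in the Cost:pre-bestpath:128 communities of the respective paths.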

Now let's put it all together and see what happens behind the scenes. As you can see in the picture below, PE-2 has the following information in the BGP table and will try to find the best path:

  • prefix 192.168.1.55 received from PE-1 over the MP-BGP with a pre-bestpath community of 128:156160 - this value represents the composite metric of the EIGRP route at the moment it was redistributed from EIGRP to BGP on PE-1
  • the MPLS cloud does not modify this cost (the MPLS cloud is transparent)
  • prefix 192.168.1.55 received from CE-2 over the EIGRP gets redistributed into BGP and immediately receives a pre-bestpath community of 128:158720 - this value represents the composite EIGRP metric at this point
  • because of the pre-bestpath cost community, the MP-BGP path is selected as the best path, since its cost (156160) is lower than that of the locally redistributed route (158720). This happens even though the locally redistributed route has a weight of 32768 (the default weight for all locally originated routes): as explained, weight does not count when pre-bestpath exists

Figure: Pre-bestpath in action

Quiz Solutions

Now, knowing that the Pre-bestpath Cost Community modifies the normal BGP best path selection process by considering the value of this community (the cost) before anything else is compared (due to the ABSOLUTE point of insertion of 128), it becomes obvious that modifying any of the "classic" path attributes, such as Local Preference, AS_PATH, MED or even Weight, will not help.
The solutions have to either modify the pre-bestpath cost or disable this community. Let's see them in action!

1. Change pre-bestpath on PE-2

One method to get the result we want is to modify the pre-bestpath community on PE-2 during redistribution from EIGRP into MP-BGP. Since we cannot use the same community-ID of 128 (it gets overwritten by the redistribution process), I will use a lower community-ID (1 in the example below) and an arbitrary cost value (9999999). According to the draft: "the Cost Community with the lowest Community-ID is considered first". The path received over MP-BGP carries no cost community with ID 1, so it is evaluated with the default cost of 2147483647 and loses to 9999999:

PE-2#sh run | s access-list|route-map|router bgp
ip access-list standard CE1_LOOPBACK
 permit 192.168.1.55
!
route-map SET_EXT_COST_COMMUNITY permit 10
 match ip address CE1_LOOPBACK
 set extcommunity cost pre-bestpath 1 9999999
route-map SET_EXT_COST_COMMUNITY permit 99
!
!
router bgp 100
 address-family ipv4 vrf CUST_A
  redistribute eigrp 100 route-map SET_EXT_COST_COMMUNITY
PE-2#sh bgp vpnv4 uni all 192.168.1.55
BGP routing table entry for 100:1:192.168.1.55/32, version 8
Paths: (1 available, best #1, table CUST_A)
  Advertised to update-groups:
     1
  Local
    192.168.2.2 from 0.0.0.0 (10.255.255.2)
      Origin incomplete, metric 158720, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:100:1
        Cost:pre-bestpath:1:9999999
        Cost:pre-bestpath:128:158720 0x8800:32768:0 0x8801:100:133120
        0x8802:65282:25600 0x8803:65281:1500
      mpls labels in/out 33/nolabel
PE-2#
PE-2#traceroute vrf CUST_A 192.168.1.55

Type escape sequence to abort.
Tracing the route to 192.168.1.55

  1 192.168.2.2 64 msec 28 msec 12 msec
  2 192.168.12.1 44 msec *  24 msec
PE-2#

Note that the pre-bestpath:128:<eigrp_metric> community also gets added during redistribution.
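The comparison behind this result can be sketched in a few lines of Python. This is my reading of the rules quoted in this post (lowest community-ID first, missing community treated as 2147483647), not a verified copy of the IOS logic. The function name and the dictionary layout (community-ID mapped to cost, all for POI 128) are hypothetical.

```python
DEFAULT_COST = 2147483647  # cost assumed for a path lacking the community


def compare_cost_communities(a, b):
    """Compare two paths' cost communities at the same POI.

    a and b map community-ID -> cost. IDs are walked lowest first and the
    first difference decides. Returns -1 if a is preferred, 1 if b is
    preferred, 0 if the cost communities do not break the tie.
    """
    for community_id in sorted(set(a) | set(b)):
        cost_a = a.get(community_id, DEFAULT_COST)
        cost_b = b.get(community_id, DEFAULT_COST)
        if cost_a != cost_b:
            return -1 if cost_a < cost_b else 1
    return 0


local_path = {1: 9999999, 128: 158720}  # redistributed on PE-2 with the route-map
mpls_path = {128: 156160}               # received from PE-1

print(compare_cost_communities(local_path, mpls_path))  # -1
```

The decision is made at community-ID 1, so the lower cost that the MPLS path carries at community-ID 128 is never reached.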

2. Change pre-bestpath on PE-1

This solution is similar to the previous one, but this time the pre-bestpath cost community is modified between the BGP peers:

PE-1#sh run | s access-list|route-map|router b
ip access-list standard CE1_LOOPBACK
 permit 192.168.1.55
!
route-map SET_EXT_COMM permit 10
 match ip address CE1_LOOPBACK
 set extcommunity cost pre-bestpath 128 7777777
route-map SET_EXT_COMM permit 999
!
!
router bgp 100
 address-family vpnv4
  neighbor 10.255.255.2 route-map SET_EXT_COMM out
PE-2#sh bgp vpnv4 uni all 192.168.1.55
BGP routing table entry for 100:1:192.168.1.55/32, version 36
Paths: (2 available, best #2, table CUST_A)
Flag: 0x820
  Advertised to update-groups:
     1
  Local
    10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
      Origin incomplete, metric 156160, localpref 100, valid, internal
      Extended Community: RT:100:1
        Cost:pre-bestpath:128:7777777 0x8800:32768:0
        0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
      mpls labels in/out 24/20
  Local
    192.168.2.2 from 0.0.0.0 (10.255.255.2)
      Origin incomplete, metric 158720, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:100:1 Cost:pre-bestpath:128:158720
        0x8800:32768:0 0x8801:100:133120 0x8802:65282:25600 0x8803:65281:1500
      mpls labels in/out 24/nolabel
PE-2#

Note that pre-bestpath:128:7777777 overwrites the initial community, as you cannot have two communities with the same point of insertion (128) and the same community-ID (128).

In this case, the comparison is done between communities with the same POI and community-ID (128), and the redistributed EIGRP route has a lower cost (158720) than the one received over MP-BGP (7777777).

3. Increase metrics using Offset-lists

Since the MPLS cloud is transparent for the EIGRP metric carried from PE-1 to PE-2, another solution to the quiz would be to modify the composite metric just before entering BGP, on PE-1, with an offset-list:

PE-1#sh run | s access-list|router eigrp
ip access-list standard CE1_LOOPBACK
 permit 192.168.1.55
!
!
router eigrp 1
 address-family ipv4 vrf CUST_A
  offset-list CE1_LOOPBACK in 1000000 FastEthernet0/0
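As a rough sanity check of the numbers, the following Python sketch assumes that the offset raises the composite metric seen by PE-1 by exactly the configured amount, and that the redistributed pre-bestpath cost follows that metric. No output for this solution is shown here, so the resulting value is an estimate, not a captured result.

```python
OFFSET = 1_000_000             # value from the offset-list on PE-1

mpls_cost_before = 156160      # cost of the MP-BGP path seen earlier on PE-2
backdoor_cost = 158720         # cost of the route redistributed on PE-2

# Assumption: the offset is added one-to-one to the composite metric.
mpls_cost_after = mpls_cost_before + OFFSET

print(mpls_cost_after)                  # 1156160
print(backdoor_cost < mpls_cost_after)  # True
```

Under these assumptions the backdoor route ends up with the lower cost at POI 128 and becomes the best path on PE-2.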

4. Disabling the Pre-bestpath Behaviour

The last solution to this quiz is to disable the pre-bestpath behaviour. The command "bgp bestpath cost-community ignore" tells the router to ignore the presence of the pre-bestpath community and to follow the normal best path selection process.
This is the least recommended solution because you have to apply this command on all BGP speakers, which is not scalable.

Not applying it on all devices will lead to routing loops, because the routers will no longer run a consistent best path selection process!

PE-1(config)#router bgp 100
PE-1(config-router)#bgp bestpath cost-community ignore
PE-1(config-router)#^Z
!
!
PE-2(config)#router bgp 100
PE-2(config-router)#bgp bestpath cost-community ignore
PE-2(config-router)#^Z
PE-2#sh bgp vpnv4 uni all 192.168.1.55
BGP routing table entry for 100:1:192.168.1.55/32, version 3
Paths: (2 available, best #2, table CUST_A)
Flag: 0x820
  Advertised to update-groups:
     1
  Local
    10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
      Origin incomplete, metric 156160, localpref 100, valid, internal
      Extended Community: RT:100:1 Cost:pre-bestpath:128:156160
        0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
      mpls labels in/out 19/18
  Local
    192.168.2.2 from 0.0.0.0 (10.255.255.2)
      Origin incomplete, metric 158720, localpref 100, weight 32768, valid, sourced, best
      Extended Community: RT:100:1 Cost:pre-bestpath:128:158720
        0x8800:32768:0 0x8801:100:133120 0x8802:65282:25600 0x8803:65281:1500
      mpls labels in/out 19/nolabel
PE-2#traceroute vrf CUST_A 192.168.1.55

Type escape sequence to abort.
Tracing the route to 192.168.1.55

  1 192.168.2.2 24 msec 16 msec 20 msec
  2 192.168.12.1 44 msec *  52 msec
PE-2#
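The output above can be explained with a small Python model in which ignoring the cost community lets the decision fall through to weight. The `Candidate` tuple and the `pick` function are hypothetical names, only cost and weight are modelled, and the weight of 0 for the path received from PE-1 is assumed, since the output shows no weight for it.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    name: str
    cost: int    # pre-bestpath cost (POI 128)
    weight: int


def pick(candidates, ignore_cost_community=False):
    """Simplified selection: lowest cost first unless ignored, then highest weight."""
    if ignore_cost_community:
        return max(candidates, key=lambda c: c.weight)
    return min(candidates, key=lambda c: (c.cost, -c.weight))


mpls = Candidate("10.255.255.1 (MP-BGP)", cost=156160, weight=0)
backdoor = Candidate("192.168.2.2 (redistributed EIGRP)", cost=158720, weight=32768)

print(pick([mpls, backdoor]).name)
# 10.255.255.1 (MP-BGP)
print(pick([mpls, backdoor], ignore_cost_community=True).name)
# 192.168.2.2 (redistributed EIGRP)
```

The two calls also hint at why a partial rollout is risky: routers with and without the command can reach different decisions for the same pair of paths.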

This brings us to the end of another veeeery long post.
Thank you for all your comments and input on the quiz!