<h1>MACsec over WAN</h1>
<p><em>Costi, CostiSer.Ro (http://costiser.ro/), 2019-10-08</em></p>
<p><span class="dropcap-bg">MAC</span>sec is an interesting alternative to existing tunneling solutions: it protects Layer 2 traffic by providing integrity, origin authentication and, optionally, encryption. The typical use case is to run MACsec between hosts and access switches, between two hosts or between two switches. This article is a leftover from <a href="/2016/08/01/macsec-implementation-on-linux/">MACsec on Linux</a>, which I first tested in 2016 when support for MACsec had just been included in the kernel. I will describe how MACsec is used together with a Layer 2 GRE tunnel to protect the traffic between two remote sites, over WAN or Internet, like a site-to-site VPN at Layer 2.</p>
<h2 id="scenario-overview">Scenario Overview</h2>
<p>Today, I am not going to present MACsec - you can read my previous post about <a href="/2016/08/01/macsec-implementation-on-linux/">MACsec implementation on Linux</a>. Instead, this post presents a <strong>Layer 2 site-to-site VPN</strong> scenario using two Linux machines that perform MACsec inside a GRETAP tunnel - GRETAP is a Layer 2 tunnel. Here is a diagram with some high level notes:</p>
<p><a href="/uploads/macsec-over-wan-overview.png" title="MACsec over WAN Overview"><img alt="MACsec over WAN Overview" src="/uploads/macsec-over-wan-overview.png" title="MACsec over WAN Overview"/></a></p>
<ul>
<li>there are two remote sites, <strong>Site 1</strong> and <strong>Site 2</strong>, connected over a private WAN or over the Internet</li>
<li>both sites use the same IP address space</li>
<li>a Layer 2 Tunnel is connecting the sites - in our demo, we are using a GRETAP tunnel, but any <a href="/2016/07/07/overlay-tunneling-with-openvswitch-gre-vxlan-geneve-greoipsec/">other L2 tunneling protocols</a> could also be used</li>
</ul>
<p>For demo purposes, I am using network namespaces (<code>netns</code>) and virtual ethernet interfaces (<code>veth</code>) to simulate the entire scenario on a single Linux machine. Let's jump right into it: assuming your machine is a Linux box, clone my <a href="https://github.com/costiser/macsec-over-wan">github repository</a> and run the <code>demo-setup-MACsecOverWAN.sh</code> bash script, as indicated below:</p>
<div class="row"><pre>
<black># preparation</black>
mkdir macsec-demo && cd macsec-demo
<black># clone my demo repository from github</black>
git clone https://github.com/costiser/macsec-over-wan.git
<black># run the setup script with "sudo"</black>
cd macsec-over-wan
<blue>sudo ./demo-setup-MACsecOverWAN.sh</blue>
</pre></div>
<h2 id="macsec-over-wan-implementation">MACsec over WAN Implementation</h2>
<h3 id="demo-overview">Demo Overview</h3>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
All commands in this demo must be executed with <b><blue>sudo</blue></b> because creating network namespaces and veth interfaces requires root privileges!
</td></tr>
</table>
<p>Running <code>sudo ./demo-setup-MACsecOverWAN.sh</code> does the following:</p>
<ul>
<li>creates two network namespaces, <strong>host1</strong> and <strong>host2</strong>, representing two host devices</li>
<li>creates two network namespaces, <strong>nsra</strong> and <strong>nsrb</strong>, representing Linux Routers that are the tunnel endpoints</li>
<li>creates a network namespace, <strong>wan</strong>, for the WAN or Internet cloud</li>
</ul>
<p><a href="/uploads/macsec-over-wan.png" title="MACsec over WAN"><img alt="MACsec over WAN" src="/uploads/macsec-over-wan.png" title="MACsec over WAN"/></a></p>
<p>Test the connectivity between the two hosts with ICMP pings - use <code>ip netns exec &lt;namespace&gt; &lt;command&gt;</code> to execute commands inside the network namespaces:</p>
<div class="row"><pre>
<blue>sudo ip netns exec host1 ping 192.168.1.2</blue>
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.204 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.129 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.123 ms
64 bytes from 192.168.1.2: icmp_seq=4 ttl=64 time=0.120 ms
...
</pre></div>
<p>While ping is running, capture the traffic in the <strong>wan</strong> namespace in order to confirm the MACsec and GRETAP overlay:</p>
<div class="row"><pre>
<blue>sudo ip netns exec wan tcpdump -neli wan1</blue>
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on wan1, link-type EN10MB (Ethernet), capture size 262144 bytes
21:21:55.814765 <blue>00:aa:aa:aa:aa:aa > 00:aa:aa:1f:1f:1f</blue>, ethertype IPv4 (0x0800), length 168: <green>1.1.1.1 > 2.2.2.2: GREv0, proto TEB (0x6558)</green>, length 134: <purple>00:00:00:00:00:01 > 00:00:00:00:00:02, ethertype Unknown (0x88e5)</purple>, length 130:
0x0000: 2c00 0000 0019 0000 0011 1111 0001 aafd ,...............
0x0010: 7858 40ec c447 2ab3 1463 6205 272b 85ef xX@..G*..cb.'+..
0x0020: 6a7b 7419 c3ec 4300 d0ab 9922 797a cf72 j{t...C...."yz.r
0x0030: 33ec 07fe 6b6b 3095 0e63 5743 06b7 813b 3...kk0..cWC...;
0x0040: 5501 7c3c 529f 72ec 668d 24d4 c443 e9f9 U.|&lt;R.r.f.$..C..
...
0x0070: 0bab fae3 ....
21:21:55.814832 <blue>00:aa:aa:1f:1f:1f > 00:aa:aa:aa:aa:aa</blue>, ethertype IPv4 (0x0800), length 168: <green>2.2.2.2 > 1.1.1.1: GREv0, proto TEB (0x6558)</green>, length 134: <purple>00:00:00:00:00:02 > 00:00:00:00:00:01, ethertype Unknown (0x88e5)</purple>, length 130:
0x0000: 2c00 0000 001a 0000 0022 2222 0001 1bd5 ,........"""....
0x0010: 10f4 a646 7057 7acb 690b 1841 b35d 606a ...FpWz.i..A.]`j
0x0020: 93f9 f78c 95fb 424f e92e 9828 7953 d99b ......BO...(yS..
0x0030: e710 1a52 3218 4ed2 0854 e8aa 339b 4129 ...R2.N..T..3.A)
0x0040: 07d0 cce5 ff29 b398 ded1 8549 b83c d094 .....).....I.<..
0x0050: 1480 434f ca8b 17e4 e5bd 9fcc 5d33 617a ..CO........]3az
0x0060: 9651 3e68 1a0b 4db1 dcc8 3444 8d2e 0579 .Q>h..M...4D...y
0x0070: 2392 8953 #..S
</pre></div>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
The GRE protocol type shown in the tcpdump output is <purple>0x6558</purple>, which indicates <code>Transparent Ethernet Bridging</code> (TEB), that is, the GRETAP tunnel.<br/>
The <purple>Unknown (0x88e5)</purple> is actually the EtherType for <code>MAC security (IEEE 802.1AE)</code>.
</td></tr>
</table>
<h3 id="solution-explained">Solution Explained</h3>
<p>In order to explain how all pieces come together, let's look at the <code>nsra</code> and <code>nsrb</code> Linux routers that provide IP routing connectivity between the sites. Since MACsec operates at Layer 2, we need to build a Layer 2 tunnel between the sites. For that, we use GRETAP but any other Layer 2 Tunneling protocol can be used instead.</p>
<p>Note that I am setting some easy-to-read MAC addresses on most of the interfaces.</p>
<div class="row"><pre>
<black># on site1's Linux router (<purple>nsra</purple>)</black>
ip netns exec nsra <blue>ip link add gretap1 type gretap local 1.1.1.1 remote 2.2.2.2</blue>
ip netns exec nsra <blue>ip link set gretap1 address 00:00:00:11:11:11</blue>
ip netns exec nsra <blue>ip link set gretap1 up</blue>
<black># on site2's Linux router (<purple>nsrb</purple>)</black>
ip netns exec nsrb <blue>ip link add gretap1 type gretap local 2.2.2.2 remote 1.1.1.1</blue>
ip netns exec nsrb <blue>ip link set gretap1 address 00:00:00:22:22:22</blue>
ip netns exec nsrb <blue>ip link set gretap1 up</blue>
</pre></div>
<p>Now, we build the MACsec tunnel inside the GRETAP interface. The MACsec endpoint addresses are the ones of the parent <code>gretap1</code> interface.</p>
<div class="row"><pre>
<black># on site1's Linux router (<purple>nsra</purple>)</black>
ip netns exec nsra <blue>ip link add link gretap1 macsec1 type macsec encrypt on</blue>
ip netns exec nsra <blue>ip macsec add macsec1 tx sa 0 pn 1 on key 01 11111111111111111111111111111111</blue>
ip netns exec nsra <blue>ip macsec add macsec1 rx address 00:00:00:22:22:22 port 1</blue>
ip netns exec nsra <blue>ip macsec add macsec1 rx address 00:00:00:22:22:22 port 1 sa 0 pn 1 on key 02 22222222222222222222222222222222</blue>
ip netns exec nsra <blue>ip link set macsec1 up</blue>
<black># on site2's Linux router (<purple>nsrb</purple>)</black>
ip netns exec nsrb <blue>ip link add link gretap1 macsec1 type macsec encrypt on</blue>
ip netns exec nsrb <blue>ip macsec add macsec1 tx sa 0 pn 1 on key 02 22222222222222222222222222222222</blue>
ip netns exec nsrb <blue>ip macsec add macsec1 rx address 00:00:00:11:11:11 port 1</blue>
ip netns exec nsrb <blue>ip macsec add macsec1 rx address 00:00:00:11:11:11 port 1 sa 0 pn 1 on key 01 11111111111111111111111111111111</blue>
ip netns exec nsrb <blue>ip link set macsec1 up</blue>
</pre></div>
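<p>The two configurations above mirror each other: the transmit key of one router is the receive key of the other, bound to the MAC address of the peer's <code>gretap1</code> interface. A small generator makes that explicit. This is only a sketch in Python: the function <code>macsec_commands</code> and its parameter names are hypothetical, introduced here for illustration, and it only prints the <code>ip</code> commands instead of executing them.</p>

```python
def macsec_commands(netns, parent, tx_key_id, tx_key,
                    peer_mac, rx_key_id, rx_key, port=1):
    """Return the ip commands that configure one MACsec endpoint."""
    prefix = f"ip netns exec {netns} "
    # The receive channel is identified by the peer's MAC address and port.
    rx_channel = f"ip macsec add macsec1 rx address {peer_mac} port {port}"
    return [
        prefix + f"ip link add link {parent} macsec1 type macsec encrypt on",
        prefix + f"ip macsec add macsec1 tx sa 0 pn 1 on key {tx_key_id} {tx_key}",
        prefix + rx_channel,
        prefix + f"{rx_channel} sa 0 pn 1 on key {rx_key_id} {rx_key}",
        prefix + "ip link set macsec1 up",
    ]


# Demo keys from this article: (key id, 128-bit key as 32 hex characters).
KEY_A = ("01", "1" * 32)
KEY_B = ("02", "2" * 32)

# Each router transmits with its own key and receives with the peer's key.
nsra = macsec_commands("nsra", "gretap1", *KEY_A, "00:00:00:22:22:22", *KEY_B)
nsrb = macsec_commands("nsrb", "gretap1", *KEY_B, "00:00:00:11:11:11", *KEY_A)

for line in nsra + nsrb:
    print(line)
```

<p>The printed lines should match the ten commands listed above. Swapping the key and MAC arguments between the two calls is all that distinguishes the two endpoints.</p>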
<p>The last piece of the puzzle is to "force" traffic into the <code>macsec1</code> interface. For that, I'm using a simple Linux bridge, <code>br0</code>, on each of the Linux routers that "bridges" the <code>macsec1</code> interface with the internal LAN interface of the site.
Note that you could achieve the same with an Open vSwitch (OVS) switch.</p>
<div class="row"><pre>
<black># on site1's Linux router (<purple>nsra</purple>)</black>
ip netns exec nsra <blue>ip link add br0 type bridge</blue>
ip netns exec nsra <blue>ip link set veth11 master br0</blue>
ip netns exec nsra <blue>ip link set macsec1 master br0</blue>
ip netns exec nsra <blue>ip link set br0 up</blue>
<black># on site2's Linux router (<purple>nsrb</purple>)</black>
ip netns exec nsrb <blue>ip link add br0 type bridge</blue>
ip netns exec nsrb <blue>ip link set veth22 master br0</blue>
ip netns exec nsrb <blue>ip link set macsec1 master br0</blue>
ip netns exec nsrb <blue>ip link set br0 up</blue>
</pre></div>
<h2 id="packet-analysis_1">Packet Analysis</h2>
<p>Let's have a quick look at the traffic protected by MACsec. Perform a <code>tcpdump</code> packet capture in the <code>wan</code> network namespace, as if someone on the Internet were sniffing the traffic. Save the capture in a file named <code>macsec.pcap</code>:</p>
<div class="row"><pre>
sudo ip netns exec wan tcpdump -i wan1 -w macsec.pcap
</pre></div>
<p>Generate some traffic from <code>host1</code>. Note that I am clearing the ARP cache first, since I also want to capture the broadcast ARP Requests:</p>
<div class="row"><pre>
<black># clear ARP cache for host2</black>
sudo ip netns exec host1 arp -d 192.168.1.2
<black># generate some traffic</black>
sudo ip netns exec host1 ping 192.168.1.2
PING 192.168.1.2 (192.168.1.2) 56(84) bytes of data.
64 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.300 ms
64 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.173 ms
64 bytes from 192.168.1.2: icmp_seq=3 ttl=64 time=0.136 ms
^C
--- 192.168.1.2 ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 31ms
rtt min/avg/max/mdev = 0.136/0.203/0.300/0.070 ms
<black># check the ARP cache</black>
sudo ip netns exec host1 arp -an
? (192.168.1.2) at 00:00:00:00:00:02 [ether] on veth1
</pre></div>
<p>Opening the <code>macsec.pcap</code> with Wireshark would show multiple MACsec packets carrying both ARP and ICMP data. MACsec protects all Layer 2 traffic, including broadcasts. Some of the important things to note are included in the diagram below:</p>
<p><a href="/uploads/macsec-over-wan-packet-capture.png" title="MACsec over WAN Packet Capture"><img alt="MACsec Over WAN Packet Capture" src="/uploads/macsec-over-wan-packet-capture.png" title="MACsec over WAN Packet Capture"/></a></p>
<p>Taking the ARP Request broadcast frame as an example, we can see that MACsec inserts a <strong>Security Tag (16 Bytes)</strong> between the original Ethernet header and the Data, then appends an <strong>Integrity Check Value / ICV (16 Bytes)</strong> at the end:</p>
<p><a href="/uploads/macsec-frame-overview.png" title="MACsec Frame Overview"><img alt="MACsec Frame Overview" src="/uploads/macsec-frame-overview.png" title="MACsec Frame Overview"/></a></p>
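<p>To make the Security Tag more concrete, here is a minimal Python sketch that decodes it. The input is the 14 bytes that follow the EtherType <code>0x88e5</code> in the first frame of the tcpdump output earlier in this article (the EtherType itself accounts for the remaining 2 of the 16 bytes). The field layout is my reading of IEEE 802.1AE, and the function name <code>parse_sectag</code> and the dictionary keys are hypothetical, chosen for illustration only.</p>

```python
import struct


def parse_sectag(data):
    """Decode the MACsec SecTAG fields that follow EtherType 0x88e5."""
    # 1 byte TCI/AN, 1 byte Short Length, 4 bytes Packet Number (big endian).
    tci_an, short_len, packet_number = struct.unpack("!BBI", data[:6])
    sectag = {
        "sc_present": bool(tci_an & 0x20),    # SC bit: an explicit SCI follows
        "encrypted": bool(tci_an & 0x08),     # E bit: user data is encrypted
        "changed_text": bool(tci_an & 0x04),  # C bit: user data was modified
        "association_number": tci_an & 0x03,  # AN: secure association in use
        "short_length": short_len & 0x3F,
        "packet_number": packet_number,
    }
    if sectag["sc_present"]:
        # The SCI is the sender's MAC address followed by a 16-bit port.
        mac = data[6:12]
        sectag["sci_mac"] = ":".join(f"{b:02x}" for b in mac)
        sectag["sci_port"] = struct.unpack("!H", data[12:14])[0]
    return sectag


# First 14 bytes of the captured frame (after the 0x88e5 EtherType).
first_bytes = bytes.fromhex("2c00 0000 0019 0000 0011 1111 0001")
tag = parse_sectag(first_bytes)
for field, value in tag.items():
    print(f"{field}: {value}")
```

<p>For this frame, the decoded SCI carries the MAC address <code>00:00:00:11:11:11</code> and port 1, which are the values configured on the <code>gretap1</code> interface of <code>nsra</code>, and the association number is 0, matching the <code>sa 0</code> used in the setup.</p>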
<h2 id="mtu">MTU</h2>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br/><i class="icon-fire icon-lg"> </i>
</div>
<div class="col-xs-12 col-sm-10 warning-right"> Every time tunneling is involved, you <red>must</red> consider MTU!</div>
</div>
<p>In this scenario, we are using several layers of extra encapsulation, so we <red>must</red> decrease the MTU on the hosts. Let's do the math on the extra overhead:</p>
<ul>
<li><strong>Original ETH</strong>: 14 bytes (since this is now transparently bridged over WAN)</li>
<li><strong>MACsec</strong>: 32 bytes</li>
<li><strong>GRETAP</strong>: 4 bytes</li>
<li><strong>IP</strong>: 20 bytes</li>
<li><strong>TOTAL</strong>: 70 bytes overhead</li>
<li><blue>MTU</blue>: 1500 - 70 = <blue>1430</blue></li>
</ul>
<p>We can easily confirm the Path MTU by sending ICMP with DF-bit set from the testing <code>host1</code> namespace. If the calculation is correct, then the maximum ICMP packet that we can send is: 1430 - 20 (IP header) - 8 (ICMP header) = 1402. Anything bigger than <strong>1402</strong> would be dropped:</p>
<div class="row"><pre>
<black># WORKING: Max size of ICMP packets: 1430(MTU)-8(ICMP)-20(IP)=1402</black>
sudo ip netns exec host1 ping -M do -c2 192.168.1.2 -s 1402
PING 192.168.1.2 (192.168.1.2) 1402(<purple>1430</purple>) bytes of data.
1410 bytes from 192.168.1.2: icmp_seq=1 ttl=64 time=0.144 ms
1410 bytes from 192.168.1.2: icmp_seq=2 ttl=64 time=0.149 ms
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 2 received, <green>0% packet loss</green>, time 19ms
rtt min/avg/max/mdev = 0.144/0.146/0.149/0.012 ms
<black># NOT WORKING: size 1403</black>
sudo ip netns exec host1 ping -M do -c2 192.168.1.2 -s 1403
PING 192.168.1.2 (192.168.1.2) 1403(<red>1431</red>) bytes of data.
--- 192.168.1.2 ping statistics ---
2 packets transmitted, 0 received, <red>100% packet loss</red>, time 2ms
</pre></div>
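<p>The overhead arithmetic can be cross-checked with a few lines of Python. This is just a sketch: the byte counts are the ones listed above, while the helper names (<code>host_mtu</code>, <code>max_ping_payload</code>, <code>outer_frame_length</code>) are hypothetical. As an extra sanity check, it also recomputes the length of the frames seen on the WAN for a default ping, which tcpdump reported as 168 bytes.</p>

```python
# Extra bytes added inside the WAN packet, as listed in the article.
OVERHEAD = {
    "inner Ethernet header": 14,
    "MACsec SecTAG + ICV": 16 + 16,
    "GRE header": 4,
    "outer IPv4 header": 20,
}
WAN_MTU = 1500
OUTER_ETHERNET_HEADER = 14  # not counted against the MTU
IPV4_HEADER = 20
ICMP_HEADER = 8


def host_mtu(wan_mtu=WAN_MTU):
    """Largest IP packet a host can send without exceeding the WAN MTU."""
    return wan_mtu - sum(OVERHEAD.values())


def max_ping_payload(mtu):
    """Largest value for 'ping -s' that still fits in the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def outer_frame_length(inner_ip_packet):
    """Length of the Ethernet frame on the WAN for a given inner IP packet."""
    return inner_ip_packet + sum(OVERHEAD.values()) + OUTER_ETHERNET_HEADER


print("total overhead:", sum(OVERHEAD.values()))
print("host MTU:", host_mtu())
print("max ping payload:", max_ping_payload(host_mtu()))
# A default ping carries 56 bytes of data, i.e. an 84-byte IP packet.
print("WAN frame for a default ping:", outer_frame_length(84))
```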
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">As you can see in the above output, when the size is <b>too big</b>, <red>nobody</red> sends back <b>Message too long</b>: everything is tunnelled at Layer 2, so there is no Layer 3 hop inside this tunnel to report the problem. Oversized frames are silently dropped!</div>
</div>
<p>Of course, instead of decreasing the MTU on the client side, another option would be to increase it on the WAN path. This would be possible if the WAN is a network under your control, but it would be impossible if you use the Internet.</p>
<h2 id="performance">Performance</h2>
<p>It is well known that encryption drastically decreases the network throughput. MACsec is no exception to that. Vendors providing MACsec capabilities claim line-rate performance for it. In Linux, there are efforts to offload MACsec onto NICs, but as far as I know, such work is still in progress.</p>
<p>In my demo, I wanted to see how much the network performance decreases when MACsec is used. For that, I first wanted to get a <strong>baseline</strong> with the two hosts connected via WAN with <strong>plain GRE</strong>, without any MACsec involved. I have another bash script that creates the <strong>plain GRE</strong> setup, and then I run iperf, with default options, between the <strong>host</strong> clients.</p>
<div class="row"><pre>
<black># create the demo with plain GRE</black>
sudo ./demo-setup-GREplain.sh
<black># start iperf server on host2</black>
sudo ip netns exec host2 iperf -s
<black># start iperf client on host1</black>
<black># the "-P" option in the subsequent commands tells iperf to use multiple parallel client threads</black>
sudo ip netns exec host1 iperf -c 192.168.1.2
sudo ip netns exec host1 iperf -c 192.168.1.2 -P 3
sudo ip netns exec host1 iperf -c 192.168.1.2 -P 5
sudo ip netns exec host1 iperf -c 192.168.1.2 -P 7
</pre></div>
<p>The reported baseline performance in my case was close to <strong>27 Gbps</strong> for a single thread and up to <strong>100 Gbps</strong> for <code>-P 7</code> (seven client threads):</p>
<div class="row"><pre>
<black># output snippets from the iperf server</black>
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 32.1 GBytes 27.6 Gbits/sec
...
[SUM] 0.0-10.0 sec 81.2 GBytes 69.7 Gbits/sec
...
[SUM] 0.0-10.0 sec 108 GBytes 92.5 Gbits/sec
...
[SUM] 0.0-10.0 sec 117 GBytes 100 Gbits/sec
</pre></div>
<p>Now clean up this <strong>plain GRE</strong> demo with <code>sudo ./demo-cleanup-GREplain.sh</code>.</p>
<p>Then we set up the MACsec demo again, manually configure the proper MTU on the clients (<em>this is not part of the demo script</em>), and perform the same iperf tests:</p>
<div class="row"><pre>
<black># create the demo with MACsec over GRETAP</black>
sudo ./demo-setup-MACsecOverWAN.sh
<black># configure proper MTU on clients</black>
<purple>sudo ip netns exec host1 ip link set veth1 mtu 1430</purple>
<purple>sudo ip netns exec host2 ip link set veth2 mtu 1430</purple>
<black># start iperf server on host2</black>
sudo ip netns exec host2 iperf -s
<black># start iperf client on host1</black>
<black># the "-P" option in the subsequent commands tells iperf to use multiple parallel client threads</black>
sudo ip netns exec host1 iperf -c 192.168.1.2
sudo ip netns exec host1 iperf -c 192.168.1.2 -P 3
sudo ip netns exec host1 iperf -c 192.168.1.2 -P 5
sudo ip netns exec host1 iperf -c 192.168.1.2 -P 7
</pre></div>
<p>The results, in my case, were <strong>2 Gbps</strong> for a single client thread and up to <strong>3.5 Gbps</strong> for seven threads (<code>-P 7</code>). Below is a snippet from the <code>iperf -s</code> output:</p>
<div class="row"><pre>
<black># output snippets from the iperf server with MACsec over GRETAP</black>
[ ID] Interval Transfer Bandwidth
[ 4] 0.0-10.0 sec 2.41 GBytes 2.07 Gbits/sec
...
[SUM] 0.0-10.0 sec 3.01 GBytes 2.58 Gbits/sec
...
[SUM] 0.0-10.0 sec 3.79 GBytes 3.25 Gbits/sec
...
[SUM] 0.0-10.0 sec 4.17 GBytes 3.57 Gbits/sec
</pre></div>
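<p>To put the two sets of numbers side by side, the short Python sketch below computes what fraction of the plain GRE throughput remains with MACsec. It assumes that the four iperf lines in each snippet correspond, in order, to the default run and to <code>-P 3</code>, <code>-P 5</code> and <code>-P 7</code>; the variable names are mine.</p>

```python
# Throughput in Gbits/sec, keyed by the number of parallel client threads.
# Assumption: the iperf snippets list the runs in the order 1, 3, 5, 7.
baseline_gbps = {1: 27.6, 3: 69.7, 5: 92.5, 7: 100.0}
macsec_gbps = {1: 2.07, 3: 2.58, 5: 3.25, 7: 3.57}

# Fraction of the plain GRE throughput that is left with MACsec enabled.
remaining = {
    threads: macsec_gbps[threads] / baseline_gbps[threads]
    for threads in sorted(baseline_gbps)
}

for threads, share in remaining.items():
    print(f"{threads} thread(s): MACsec reaches {share:.1%} of plain GRE")
```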
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
Since everything runs on a single machine, simulated with network namespaces and veth interfaces, the iperf performance depends on how busy and how powerful your Linux machine is.
</td></tr>
</table>
<p>You could try to tweak the Linux network stack as described in <a href="/2016/07/11/performance-review-of-overlay-tunnels-with-openvswitch/#tweaking-the-linux-network-stack">another post</a>, but I did not spend any time on that.</p>
<p>To clear the entire setup, use <code>sudo ./demo-cleanup-MACsecOverWAN.sh</code> script. This concludes an article that I'm happy to have done with a delay of three years from my original <a href="/2016/08/01/macsec-implementation-on-linux/">MACsec on Linux</a> post.</p>
<p><br/>
<em>As always, thank you for your interest and looking forward to your comments!</em></p>
<h1>SDN Lesson #2 – Introducing Faucet as an OpenFlow Controller</h1>
<p><em>Costi, 2017-03-07</em></p>
<p><span class="dropcap-bg">W</span>elcome back to a new article about SDN - this time introducing an OpenFlow controller called <a href="https://github.com/reannz/faucet" target="_blank"><b>Faucet</b></a>, developed as a Ryu application by the Research and Education Advanced Network New Zealand (REANNZ), Waikato University and Victoria University.
In this article, I am not going to write about Faucet's architecture and features since you can read about them on <a href="https://github.com/REANNZ/faucet" target="_blank">its github page</a> and <a href="https://faucet-sdn.blogspot.com" target="_blank">here</a> or <a href="https://faucetsdn.github.io" target="_blank">here</a>. Instead, I will describe the setup used for a demo presented at the <a href="https://inog.net/" target="_blank">Irish Network Operators Group</a> 11th meetup (iNOG::B).</p>
<p>Let me start by saying that, unfortunately, the term SDN has lately become more of a <em>buzzword</em>, especially due to all the presentations and campaigns that incorrectly use (<em>in my opinion</em>) the word <strong>SDN</strong>, prepared by vendors that try to show off products which are mostly oriented towards orchestration and automation.</p>
<p>Yes, I had this debate with a lot of people - <purple>what is SDN?</purple> - and I've been called an <strong>SDN puritan</strong> because, in my opinion, SDN is all about <red>separation between control plane and data plane</red> - this is the SDN's first principle and all advantages and flexibility come from it.</p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right"> <b>Please note that throughout all my SDN related articles, I hold on to this statement: SDN means <red>separation of control plane from the data plane</red> !
</b></div>
</div>
<h2 id="what-is-the-target-of-this-demo">What is the target of this demo ?</h2>
<p>The purpose of this demo/article is to present a few things that can be achieved with an OpenFlow controller such as Faucet:</p>
<ul>
<li>a quick introduction of Faucet</li>
<li>use Faucet to <strong>manage both physical and virtual switches</strong> to achieve a common goal</li>
<li><strong>leverage Linux</strong> to offload different network functions to virtual Linux containers (<strong>NFV</strong>)</li>
<li>use <strong>OpenFlow</strong> as the southbound protocol (especially its 1.3 version multi-table support)</li>
<li>demonstrate some of Faucet’s features such as PBR, ACLs and Port Mirroring</li>
</ul>
<p><em>Note that I intend to have separate articles about Faucet's multi-table architecture and other features!</em></p>
<h2 id="quick-faucet-overview">Quick Faucet Overview</h2>
<p>Faucet is an open source OpenFlow 1.3 controller, based on the Ryu framework (so written in Python), with features such as:</p>
<ul>
<li><em>switching</em>: VLANs, MAC learning, ACLs, configurable flooding modes</li>
<li><em>routing</em>: BGP, static routing, ACLs</li>
<li><em>other</em>: port mirroring, PBR, monitoring & statistics (with Gauge)</li>
</ul>
<p>You can get the full list of features <a href="https://faucetsdn.github.io/#features">here</a> - note that new features are constantly being added!</p>
<p>As with any good piece of software, Faucet follows Python style guides (PEP8) and contains a comprehensive list of tests that could run against both physical and virtual network topologies.</p>
<p>There are already several vendors that support it - Allied Telesis, NoviFlow, HP Enterprise/Aruba, Zodiac FX - and, of course, you can test it with software switches such as Open vSwitch or Lagopus.</p>
<h2 id="demo-overview">Demo Overview</h2>
<p>This setup was created for a demo presented at the <a href="https://inog.net/" target="_blank">Irish Network Operators Group</a> 11th meetup (iNOG::B).</p>
<h3 id="devices-used">Devices Used</h3>
<p>For this demo I have prepared the following devices:</p>
<ul>
<li>a Raspberry Pi (in black case) that will run the <strong>Faucet controller</strong></li>
<li>a Raspberry Pi (in white case) that will perform the <strong>Network Functions in Linux kernel</strong> (NFV)</li>
<li>a Raspberry Pi (in transparent case) that acts as a <strong>test user</strong></li>
<li>a <strong>Zodiac FX (card-size) switch</strong> - the world's smallest OpenFlow switch</li>
</ul>
<p><br>
<img alt="Introducing Faucet OpenFlow Controller - topology" src="/uploads/inog-faucet-demo-photo.jpg"/>
<br/></br></p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTE</div><i class="icon-book-open icon-sm"></i>
</td>
<td class="notes-right">I am aware that you may not have these devices, but don't worry!<br> Scroll down to the <a href="/2017/03/07/sdn-lesson-2-introducing-faucet-as-an-openflow-controller/#virtualize-everything_1">Virtualize Everything</a> section that shows you how to use <b>Mininet</b> to create the same topology on an Ubuntu virtual machine and test the same setup without any extra device! Yay ☺ !
</br></td></tr>
</table>
<h3 id="topology">Topology</h3>
<p>The final topology of the setup looks like this:</p>
<p><img alt="Introducing Faucet OpenFlow Controller - topology" src="/uploads/inog-faucet-demo-topology.png"/>
<br/></p>
<p>The target is to achieve the following:</p>
<ul>
<li>users (in the bottom of the diagram) are connected to first 2 ports on the Zodiac FX switch</li>
<li><blue>Zodiac FX switch</blue> (blue box in the diagram) trunks all the user traffic on to port 3 towards the NFV server</li>
<li>the <strong>NFV server</strong> (light gray box) is just a Linux box that performs different network functions (read below)</li>
<li>the <strong>Faucet controller</strong> (dark gray box) runs on a separate Linux machine and controls both the physical Zodiac switch and the software Open vSwitch inside the NFV</li>
</ul>
<h3 id="nfv-details">NFV Details</h3>
<p>Probably the most complex element of this demo is the implementation of the NFV. The target is to perform the following Network Functions <strong><em>each inside its own isolated container</em></strong>:</p>
<ul>
<li>run a <blue>DHCP daemon</blue> to serve IP addresses to the users (use <code>dnsmasq</code>)</li>
<li>run a <blue>NAT function</blue> to allow internal users to connect outside (use <code>iptables</code>)</li>
<li>run an <blue>IDS server</blue> that monitors all the traffic (use Faucet's feature for port mirroring)</li>
</ul>
<p>All of these functions run inside network namespaces that provide isolation between them. The connectivity is achieved by creating an OVS switch that is <strong><em>also managed by the external Faucet controller</em></strong>. By doing this, you have a central location for controlling both physical and logical switches, thus making sure that all of them adhere to the same network and/or security policies.</p>
<p>In order to connect the namespaces to the OVS software switch, we use Virtual Ethernet pipes (<code>veth</code>).<br/>
Physical interface <code>eth2</code>, that connects to the outside world, is moved into the NAT NFV in order to provide Internet connectivity to the test users. (<em>Note that during demo I may have used the wireless <code>wlan0</code> to provide internet access to the setup!</em>)</p>
<h2 id="implementation-details_1">Implementation Details</h2>
<p>The Raspberry Pis used for this demo run <a href="https://ubuntu-mate.org/download/">Ubuntu Mate</a>. The main reason I opted for Ubuntu Mate over Raspbian is the predictable interface naming, which was an important factor considering that I used a lot of USB adaptors to get more network connections on the Pis. <b>Make sure you run <code>sudo apt-get update; sudo apt-get upgrade</code> before installing further packages.</b></p>
<h3 id="faucet">Faucet</h3>
<p>The procedure to install Faucet - the dark gray box in the diagram <em>or</em> the black case Raspberry Pi in the picture - is very simple:</p>
<div class="row"><pre class="col-md-8">
<black># Packages needed</black>
sudo apt-get install python python-pip
sudo pip install --upgrade pip
<black># Faucet installation</black>
sudo pip install ryu-faucet
</pre></div>
<p>Small preparation and you are good to go:</p>
<div class="row"><pre class="col-md-8">
<black># Create the logging directory for faucet</black>
sudo mkdir -p /var/log/ryu/faucet/
<black># Provide Faucet a basic configuration (yaml file) to start with</black>
cd /etc/ryu/faucet
sudo cp faucet.yaml-dist faucet.yaml
</pre></div>
<p>Use <code>pip show ryu-faucet</code> to see the Faucet's location and then invoke <strong>ryu-manager</strong> (remember, Faucet is a Ryu Application):</p>
<div class="row"><pre class="col-md-12">
sudo ryu-manager --verbose /usr/local/lib/python2.7/dist-packages/ryu_faucet/org/onfsdn/faucet/faucet.py
...
...
CONSUMES EventOFPEchoRequest
CONSUMES EventOFPPortDescStatsReply
EVENT Faucet->Faucet EventFaucetResolveGateways
EVENT Faucet->Faucet EventFaucetHostExpire
EVENT Faucet->Faucet EventFaucetResolveGateways
EVENT Faucet->Faucet EventFaucetResolveGateways
...
<i>&lt;CTRL &amp; C&gt;</i>
</pre></div>
<p>Additionally (optional but you may want to do this), you can follow <a href="https://faucet-sdn.blogspot.com/2016/10/running-faucet-as-systemd-controlled.html">these instructions</a> to install Faucet as a service.</p>
<p>Now that we have Faucet working, let's adjust its configuration file <code>/etc/ryu/faucet/faucet.yaml</code> for this setup:</p>
<div class="row"><pre class="col-md-7">
version: 2
<purple>vlans:</purple>
    10:
        name: "lab-10"
        unicast_flood: True
        max_hosts: 33
    20:
        name: "lab-20"
        unicast_flood: False
    999:
        name: "IDS"
        unicast_flood: False
<purple>acls:</purple>
    99:
        - rule:
            dl_type: 0x800
            nw_proto: 17
            tp_src: 68
            tp_dst: 67
            actions:
                allow: 1
                output:
                    port: 2
        - rule:
            actions:
                allow: 1
                mirror: 4
    98:
        - rule:
            actions:
                allow: 1
                mirror: 4
<purple>dps:</purple>
    zodiac-sw:
        <red>dp_id: 0x011111</red>
        hardware: "ZodiacFX"
        interfaces:
            1:
                native_vlan: 10
            2:
                native_vlan: 10
            3:
                native_vlan: 10
    ovs-sw:
        <red>dp_id: 0x01</red>
        hardware: "Open vSwitch"
        interfaces:
            1:
                native_vlan: 10
                acl_in: 99
            2:
                native_vlan: 10
            3:
                native_vlan: 10
                acl_in: 98
            4:
                native_vlan: 999
</pre></div>
<p><em>Note that <code>dp_id</code> needs to be adjusted to match your Zodiac's DPID.</em></p>
<p>As you can see above, the first rule of <strong>ACL 99</strong> catches all DHCP traffic and sends it towards the DHCP NFV, while its second rule does <em>port mirroring</em>: it sends a copy of the traffic seen on the port the ACL is applied to towards the IDS on port 4. Similarly, <strong>ACL 98</strong> mirrors return traffic from the Internet onto the IDS port 4.</p>
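<p>The way ACL 99 treats a packet can be illustrated with a toy model in Python. This is a simplified sketch and not Faucet's code: it assumes that rules are checked in order and that the first matching rule decides the actions, and the names <code>ACL_99</code> and <code>evaluate</code>, as well as the sample packets, are hypothetical.</p>

```python
# Toy model of ACL 99 from the configuration above: match fields and actions.
ACL_99 = [
    {
        # DHCP requests: IPv4, UDP, client port 68 to server port 67.
        "match": {"dl_type": 0x800, "nw_proto": 17, "tp_src": 68, "tp_dst": 67},
        "actions": {"allow": 1, "output": {"port": 2}},
    },
    {
        # No match fields: applies to every packet not handled earlier.
        "match": {},
        "actions": {"allow": 1, "mirror": 4},
    },
]


def evaluate(acl, packet):
    """Return the actions of the first rule whose match fields equal the packet's."""
    for rule in acl:
        fields = rule["match"].items()
        if all(packet.get(name) == value for name, value in fields):
            return rule["actions"]
    return None  # no rule matched; what happens then is outside this sketch


# Illustrative packets, described only by the header fields used above.
dhcp_request = {"dl_type": 0x800, "nw_proto": 17, "tp_src": 68, "tp_dst": 67}
https_packet = {"dl_type": 0x800, "nw_proto": 6, "tp_src": 40000, "tp_dst": 443}

print("DHCP request:", evaluate(ACL_99, dhcp_request))
print("HTTPS packet:", evaluate(ACL_99, https_packet))
```

<p>Under this assumption, the DHCP request is steered to port 2, while any other packet is allowed and mirrored to port 4.</p>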
<h3 id="packet-walk">Packet walk</h3>
<p>Here is a brief description of how the traffic should work:</p>
<ul>
<li>Users are connected to ports 1 and 2 on the Zodiac FX switch</li>
<li>As per the rule defined in Faucet's configuration file, all DHCP traffic from hosts is forwarded towards the DHCP NFV (this is the <blue>PBR implementation</blue>). DHCP provides an IP address and a default gateway (<strong>10.0.0.1</strong>) that exists on the NAT NFV internal interface</li>
<li>After DHCP is successful, traffic gets forwarded by the Zodiac FX and the OVS switch <blue><em>based on flows installed by Faucet</em></blue> (its multi-table architecture is described <a href="https://github.com/REANNZ/faucet#openflow-pipeline">here</a> and <a href="https://inside-openflow.com/2016/09/16/dissecting-faucet-pipeline/">here</a>)</li>
<li>NAT NFV has forwarding enabled and uses <code>iptables</code> to <em>masquerade</em> internal traffic from users behind the IP used on the link that connects to the outside world</li>
<li>Faucet's configuration also contains a <strong>port mirroring</strong> rule (which is translated by Faucet into an OpenFlow rule or flow) that sends a copy of all traffic seen on OVS ports 1 and 3 to port 4, where the IDS is connected.</li>
</ul>
<h3 id="nfv">NFV</h3>
<p>As already mentioned above, each distinct network function will run inside a network namespace. <a href="/2016/06/26/my-sdn-testbed/#leveraging-linux-to-create-an-sdn-testbed_1">This article</a> may give you more information about namespaces and veth interfaces and how to set them up. For this demo you don't have to do any of that because I made a bash script that creates all the namespaces, the veth interfaces and the connectivity between them:</p>
<div class="row"><pre class="col-md-12">
#!/bin/bash
<black>#set -x
echo "[info] Cleanup everything..."</black>
ip netns | xargs -r -t -n 1 ip netns del
ip link | egrep -o "[dhcpnatids]{3,4}-eth[01]" | xargs -r -t -n 1 ip link del
<black>echo "[info] Delete existing OVS bridge..."</black>
ovs-vsctl del-br ovs-sw
sleep 2
<black>echo ""</black>
<black>echo "[info] Adding network namespaces..."</black>
ip netns add dhcp-ns
ip netns add nat-ns
ip netns add ids-ns
<black>echo "[info] Creating virtual ethernet interfaces..."</black>
ip link add name dhcp-eth0 type veth peer name dhcp-eth1
ip link add name nat-eth0 type veth peer name nat-eth1
ip link add name ids-eth0 type veth peer name ids-eth1
<black>echo "[info] Adding veth to network namespaces..."</black>
ip link set dev dhcp-eth0 netns dhcp-ns
ip link set dev nat-eth0 netns nat-ns
ip link set dev ids-eth0 netns ids-ns
<black>echo "[info] Set interfaces up in the root namespace..."</black>
ifconfig dhcp-eth1 0 up
ifconfig nat-eth1 0 up
ifconfig ids-eth1 0 up
<black>echo "[info] Set interfaces up and assign IP in the NFVs..."</black>
ip netns exec dhcp-ns /sbin/ifconfig dhcp-eth0 10.0.0.254/24 up
ip netns exec nat-ns /sbin/ifconfig nat-eth0 10.0.0.1/24 up
ip netns exec ids-ns /sbin/ifconfig ids-eth0 0 up
ip netns exec dhcp-ns /sbin/ip addr add 1234::ff/64 dev dhcp-eth0
ip netns exec nat-ns /sbin/ip addr add 1234::1/64 dev nat-eth0
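<gray># Test addresses on the loopback of the NAT NFV</gray>
ip netns exec nat-ns /sbin/ip link set dev lo up
ip netns exec nat-ns /sbin/ip addr add 1111::1/64 dev lo
ip netns exec nat-ns /sbin/ip addr add 1.1.1.1/32 dev lo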
<black>echo "[info] Add a new OVS switch and ports..."</black>
ovs-vsctl add-br ovs-sw
ovs-vsctl add-port ovs-sw eth1 -- set Interface eth1 ofport_request=1
ovs-vsctl add-port ovs-sw dhcp-eth1 -- set Interface dhcp-eth1 ofport_request=2
ovs-vsctl add-port ovs-sw nat-eth1 -- set Interface nat-eth1 ofport_request=3
ovs-vsctl add-port ovs-sw ids-eth1 -- set Interface ids-eth1 ofport_request=4
ip link set eth1 up
<black>echo "[info] Add Faucet as controller for the OVS switch..."</black>
ovs-vsctl set-controller ovs-sw tcp:192.168.124.10:6633
<black>echo "[info] Add WAN link to the NAT NFV..."</black>
ifdown eth2
ip link set dev eth2 netns nat-ns
<gray># [TODO] change to manual dhcp client instead of relying on ifup scripts</gray>
ip netns exec nat-ns /sbin/ifup eth2
<black>echo "[info] Start dnsmasq (DHCP server) in DHCP NFV..."</black>
ip netns exec dhcp-ns /usr/sbin/dnsmasq
<black>echo "[info] Enable NAT MASQUERADE in NAT NFV..."</black>
ip netns exec nat-ns /sbin/iptables -t nat -A POSTROUTING -o eth2 -j MASQUERADE
<black>echo "[info] Enable forwarding in NAT NFV..."</black>
ip netns exec nat-ns /sbin/sysctl -q -w net.ipv4.ip_forward=1
<gray># [TODO] to be enabled when NAT66 works
# ip netns exec nat-ns sysctl -q -w net.ipv6.conf.all.forwarding=1</gray>
</pre></div>
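<p>The namespace part of the script repeats the same steps for every NFV, so it can be sketched as a small generator. This is just an illustration in Python: the <code>nfv_commands</code> helper is hypothetical, it only builds the command strings (it does not run them), and it uses <code>ip</code> instead of <code>ifconfig</code> for the interface settings.</p>

```python
def nfv_commands(name, address=None):
    """Return the ip commands that wire one NFV namespace to the root namespace."""
    ns, inner, outer = f"{name}-ns", f"{name}-eth0", f"{name}-eth1"
    cmds = [
        f"ip netns add {ns}",                                      # new namespace
        f"ip link add name {inner} type veth peer name {outer}",  # veth pair
        f"ip link set dev {inner} netns {ns}",                     # one end goes inside
        f"ip link set dev {outer} up",                             # other end stays in root
    ]
    if address:
        cmds.append(f"ip netns exec {ns} ip addr add {address} dev {inner}")
    cmds.append(f"ip netns exec {ns} ip link set dev {inner} up")
    return cmds


# Names and addresses taken from the script above; the IDS has no IP address.
NFVS = {"dhcp": "10.0.0.254/24", "nat": "10.0.0.1/24", "ids": None}
for name, address in NFVS.items():
    print("\n".join(nfv_commands(name, address)))
```

<p>Printing the commands first makes it easy to review them before pasting them into a root shell.</p>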
<p>As you see in the above script, I rely on <code>ifup/ifdown</code> scripts to configure some of the interfaces. For this to work, I have configured the <code>/etc/network/interfaces</code> file like this:</p>
<div class="row"><pre class="col-md-7">
source-directory /etc/network/interfaces.d
auto lo
iface lo inet loopback
<gray># Builtin Ethernet port
# OpenFlow connection
# controller on 192.168.124.10</gray>
auto eth0
iface eth0 inet static
address 192.168.124.124
netmask 255.255.255.0
<gray># Top-Left USB
# DataPlane connection
# added to the OVS (no config needed)
#iface eth1 inet manual</gray>
auto eth1
iface eth1 inet static
address 0.0.0.0
<gray># Bottom-Left USB
# WAN/Internet (Outside link)
# [TODO] Do DHCP inside network namespace</gray>
auto eth2
iface eth2 inet static
address 0.0.0.0
#iface eth2 inet dhcp
<gray># Top-Right USB
# OOB management</gray>
auto eth3
iface eth3 inet static
address 192.168.1.3
netmask 255.255.255.0
</pre></div>
<p>On the DHCP NFV, I run <code>dnsmasq</code> as a dhcp/dns server (<code>sudo apt-get install dnsmasq</code> to install it) and below is the basic configuration (file <code>/etc/dnsmasq.conf</code>):</p>
<div class="row"><pre class="col-md-12">
domain-needed
bogus-priv
no-resolv
server=8.8.8.8
local=/lab/
interface=dhcp-eth0
except-interface=lo
bind-interfaces
domain=lab
dhcp-range=10.0.0.10,10.0.0.199,255.255.255.0,6h
dhcp-range=1234::100, 1234::1ff, 64, 6h
dhcp-option=option:router,10.0.0.1
dhcp-option=6,8.8.8.8,8.8.4.4
dhcp-option=option6:dns-server,[2001:4860:4860::8888],[2001:4860:4860::8844]
cache-size=2048
log-queries
log-dhcp
log-facility=/var/log/dnsmasq.log
log-async
</pre></div>
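<p>A quick sanity check for this kind of configuration is to make sure that the statically assigned addresses (the gateway on the NAT NFV and the DHCP server itself) sit inside the subnet but outside the dynamic pool. Here is a small sketch using Python's standard <code>ipaddress</code> module; the variable names are mine and the values are copied from the configuration above.</p>

```python
import ipaddress

subnet = ipaddress.ip_network("10.0.0.0/24")
pool_start = ipaddress.ip_address("10.0.0.10")   # from dhcp-range
pool_end = ipaddress.ip_address("10.0.0.199")    # from dhcp-range
static_hosts = {
    "nat-eth0 (default gateway)": ipaddress.ip_address("10.0.0.1"),
    "dhcp-eth0 (DHCP server)": ipaddress.ip_address("10.0.0.254"),
}


def in_pool(addr):
    """True when the address could be handed out dynamically by dnsmasq."""
    return pool_start <= addr <= pool_end


pool_size = int(pool_end) - int(pool_start) + 1
conflicts = [name for name, addr in static_hosts.items() if in_pool(addr)]
outside_subnet = [name for name, addr in static_hosts.items() if addr not in subnet]
print(f"pool size: {pool_size}, conflicts: {conflicts}, outside subnet: {outside_subnet}")
```

<p>With the values above, the pool holds 190 addresses and neither static address collides with it.</p>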
<h3 id="putting-it-all-together">Putting it all together</h3>
<p>At this moment, connecting everything as per the diagram above should allow the <em>test user</em> to get an IPv4 address (with default gateway and dns server information) and to connect to the internet.</p>
<p>Some troubleshooting is expected so jump to the <a href="/2017/03/07/sdn-lesson-2-introducing-faucet-as-an-openflow-controller/#troubleshooting_1">end of this article</a> to see how you can verify your setup.</p>
<h2 id="virtualize-everything_1">Virtualize Everything</h2>
<p>Not everybody has the devices needed for this setup... but you can create everything on a Linux box using namespaces, veth, OVS, etc. Or you can use <strong>Mininet</strong> that does all this for you. Follow the procedure below - it should take you less than one hour to have everything working.</p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTE</div><i class="icon-book-open icon-sm"></i>
</td>
<td class="notes-right">Note that creating the entire network/setup in a virtualized environment gives administrators the possibility to test the OpenFlow rules installed by Faucet and to validate end-to-end connectivity <b><i>before launching them in the production environment</i></b>. This is what software engineers call <b>Push On Green</b>!
</td></tr>
</table>
<h4 id="1-prepare-a-vagrant-box">1. Prepare a Vagrant Box</h4>
<p>In order to be consistent (and avoid problems due to differences between package versions, etc), I suggest you create a vagrant box running Ubuntu 16.04 (details were already provided in a <a href="/2016/06/26/my-sdn-testbed/#setup">previous post</a>) then make sure that you update your new virtual box.</p>
<ul>
<li>
<p>on your laptop/desktop, prepare Vagrant (it assumes that you have VirtualBox installed):<br/>
<div class="row"><pre class="col-md-12">
mkdir -p ~/vagrantwork/box1; cd ~/vagrantwork/box1
<black># Create the Vagrantfile</black>
cat >Vagrantfile <<EOF
Vagrant.configure(2) do |config|
config.vm.box = "ubuntu/xenial64"
config.vm.box_check_update = false
config.vm.network "private_network", ip: "192.168.56.<purple>11</purple>"
#config.vm.network "public_network", bridge: "en0: Wi-Fi (AirPort)", auto_config: false
config.vm.network "public_network", bridge: "en5: Thunderbolt Ethernet", auto_config: false
config.vm.provider "virtualbox" do |vb|
#vb.gui = true
vb.name = "vagrant_<purple>box1</purple>"
end
config.ssh.forward_x11 = true
end
EOF
<black># Boot your new box up</black>
vagrant up
<black># Connect to it</black>
vagrant ssh
</pre></div> </p>
</li>
<li>
<p>on your new Ubuntu vagrant box:<br/>
<div class="row"><pre class="col-md-7">
sudo apt-get update
sudo apt-get upgrade
</pre></div> </p>
</li>
</ul>
<h4 id="2-install-the-necessary-packages">2. Install the necessary packages</h4>
<div class="row"><pre class="col-md-7">
sudo apt-get install python mininet python-pip dnsmasq
sudo pip install --upgrade pip
sudo pip install ryu-faucet
</pre></div>
<h4 id="3-faucet-configuration">3. Faucet configuration</h4>
<p>To make Faucet work, create the location for its logs with <code>sudo mkdir -p /var/log/ryu/faucet/</code>, then create its config file with <code>sudo vi /etc/ryu/faucet/faucet.yaml</code> and copy/paste the configuration presented in the <a href="/2017/03/07/sdn-lesson-2-introducing-faucet-as-an-openflow-controller/#faucet"><em>Faucet</em> section above</a>.</p>
<p>You can either run it from the command line or create the necessary scripts to run it as a service (instructions <a href="https://faucet-sdn.blogspot.com/2016/10/running-faucet-as-systemd-controlled.html">here</a>).</p>
<h4 id="4-dnsmasq-configuration">4. dnsmasq configuration</h4>
<p><code>dnsmasq</code> should already be installed (as part of step 2), so you only have to adjust its config file <code>/etc/dnsmasq.conf</code> to match the demo (see the <a href="/2017/03/07/sdn-lesson-2-introducing-faucet-as-an-openflow-controller/#nfv"><em>NFV Section</em> above</a>).</p>
<h4 id="5-mininet-topology">5. Mininet topology</h4>
<p>The last step is to run a Mininet topology that creates all the devices as per the diagram shown at the top of this article. I have already prepared this Mininet file and shared it on GitHub:</p>
<div class="row"><pre class="col-md-7">
git clone https://github.com/costiser/faucet.git
cd faucet/inog-demo
</pre></div>
<p>Additionally, I copied Mininet's <code>m</code> utility from <a href="https://github.com/mininet/mininet/blob/master/util/m">here</a> and changed its penultimate line to this:<br/>
<code>cmd="exec sudo TERM=xterm-color debian_chroot=$host mnexec $cg -a $pid $cmd"</code> </p>
<p>So you should have these files in the same folder - <strong>you must run all these commands with <red>sudo</red></strong>:</p>
<div class="row"><pre class="col-md-10">
costiser@costi ~/faucet/inog-demo$ ls -l
total 32
-rwxr-xr-x 1 costi costi 429 Mar 6 18:06 cleanup.sh
-rwxr-xr-x 1 costi costi 2 Mar 7 14:14 m
-rwxr-xr-x 1 costi costi 5582 Mar 6 18:07 mininet_faucet_demo.py
</pre></div>
<p><br/></p>
<ul>
<li><code>cleanup.sh</code> is an additional bash script that cleans your setup for potential Mininet leftovers or namespaces/links that were not cleaned upon exiting. Run this script to make sure you start clean.</li>
<li><code>m</code> is Mininet's utility that I use to connect to Mininet hosts from external terminals</li>
<li><code>mininet_faucet_demo.py</code> is the topology file that I created to start the virtual network</li>
</ul>
<p><br/></p>
<p>To run the setup use <code>sudo ./mininet_faucet_demo.py</code></p>
<p>Then use the <code>m</code> utility to connect to the Mininet nodes and run troubleshooting commands <strong><em>inside each node</em></strong>:</p>
<div class="row"><pre class="col-md-7">
ubuntu@xenial ~/faucet$ <purple>sudo ./m nat</purple>
<green>(nat)</green>root@xenial:~/faucet#
<green>(nat)</green>root@xenial:~/faucet#
</pre></div>
<h2 id="troubleshooting_1">Troubleshooting</h2>
<p>This section is common to both physical setup (using the Zodiac FX switch) and to the virtualized setup (where all devices are emulated with Mininet). Troubleshooting is the most important part as it will help you understand better how all things work together. Since this article is already too long, I'm only listing the commands used to troubleshoot with the promise that I will explain each of them in a future article.</p>
<ul>
<li><code>tcpdump</code> on the incoming port <strong>eth1</strong> on the NFV server should display all traffic (DHCP, internet connectivity, etc)</li>
<li><code>tcpdump</code> on the <strong>dhcp-eth0</strong> interface, <purple>inside the DHCP NFV</purple> should display all incoming DHCP Requests from the test users and outgoing DHCP Replies from the server</li>
<li><code>/var/log/dnsmasq.log</code> on the <strong>DHCP NFV</strong> should show all the leases assigned</li>
<li><code>iptables -nvL -t nat</code> on the <strong>NAT NFV</strong> should display the <em>MASQUERADE</em> rule</li>
<li><code>conntrack -L</code> on the <strong>NAT NFV</strong> should display the connection table (where you see the NAT entries)</li>
<li><code>tcpdump</code> on the <strong>IDS NFV</strong> should display all the traffic traversing the OVS switch between ports 1 and 3, in both directions</li>
<li><code>/var/log/ryu/faucet/faucet.log</code> should show that both the Zodiac FX and the OVS switch successfully talk to the Faucet controller</li>
</ul>
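<p>For the physical setup, where each NFV lives in its own namespace, most of the checks above have to be run through <code>ip netns exec</code>. As a rough sketch (the <code>wrap</code> helper and the <code>CHECKS</code> list are my own illustration, and the <code>tcpdump</code> filter for DHCP is an assumption), the commands could be generated like this:</p>

```python
# (namespace, command) pairs; None means the root namespace of the NFV server.
CHECKS = [
    (None, "tcpdump -n -i eth1"),
    ("dhcp-ns", "tcpdump -n -i dhcp-eth0 port 67 or port 68"),
    ("nat-ns", "iptables -nvL -t nat"),
    ("nat-ns", "conntrack -L"),
    ("ids-ns", "tcpdump -n -i ids-eth0"),
]


def wrap(namespace, command):
    """Prefix a command so that it runs inside the given network namespace."""
    if namespace is None:
        return f"sudo {command}"
    return f"sudo ip netns exec {namespace} {command}"


for namespace, command in CHECKS:
    print(wrap(namespace, command))
```

<p>In the Mininet setup the namespaces are managed by Mininet, so there you would connect with the <code>m</code> utility instead and run the bare commands.</p>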
<p><br>
<em>As always, thank you for your interest and comments! Good luck with your setup!</em></br></p>Quiz #25 – Troubleshooting IPsec Authentication Headers (AH)2016-08-13T00:35:00+01:00Costitag:costiser.ro,2016-08-13:2016/08/13/quiz-25/<p><span class="dropcap-bg">Y</span>our company has an IPsec tunnel with another company for achieving network connectivity between servers in <code>10.10.10.0/24</code> on your side to <code>10.20.20.0/24</code> on theirs.<br/>
Lately they complained that their equipment has problems dealing with ESP and requested to migrate this existing IPsec tunnel from <strong>Encapsulating Security Payloads (ESP)</strong> to <strong>Authentication Headers (AH)</strong>, since encryption/confidentiality was never a requirement for this tunnel. <em>Please note that network connectivity is ok in the initial state, using ESP</em>.
<br><br/></br></p>
<p><a href="/uploads/quiz-25.png" title="Quiz 25 - Troubleshooting IPsec AH"><img alt="Quiz 25 - Troubleshooting IPsec AH" src="/uploads/quiz-25.png" title="Quiz 25 - Troubleshooting IPsec AH"/></a></p>
<p><br>
The only change you made was adding a new transform set <code>T_SET_AH</code> that uses <code>ah-md5-hmac</code> and updating the <code>crypto map</code> to use it.<br/>
Together with the partner company you establish a maintenance window and perform the migration from ESP to AH. The new tunnel using AH comes up, but unfortunately <red>you do not have network connectivity between the internal networks</red>. </br></p>
<p>You start troubleshooting and conclude the following: </p>
<ul>
<li>ISAKMP phase gets established</li>
<li>IPSEC phase gets established</li>
<li>the new parameters (AH & MD5) are agreed correctly</li>
<li>the protected internal subnets <code>10.10.10.0/24</code> and <code>10.20.20.0/24</code> are agreed correctly</li>
<li><red>but there is no network connectivity between them</red></li>
<li>you see packets being sent but nothing received back</li>
</ul>
<div class="row"><pre class="col-md-10">
<black>! No connectivity</black>
R1#<blue>ping 10.20.20.20 source 10.10.10.10</blue>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.20.20.20, timeout is 2 seconds:
Packet sent with a source address of 10.10.10.10
<red>.....
Success rate is 0 percent (0/5)</red>
R1#
<black>! Phase 1 OK</black>
R1#<blue>sh crypto isakmp sa</blue>
IPv4 Crypto ISAKMP SA
dst src state conn-id slot status
44.44.44.44 12.12.12.1 QM_IDLE 1001 0 ACTIVE
IPv6 Crypto ISAKMP SA
<black>! Phase 2 OK<black>
R1#<blue>sh crypto ipsec sa detail</blue>
interface: FastEthernet0/0
Crypto map tag: IPSEC_VPN, local addr 12.12.12.1
protected vrf: (none)
<green>local ident (addr/mask/prot/port): (10.10.10.0/255.255.255.0/0/0)
remote ident (addr/mask/prot/port): (10.20.20.0/255.255.255.0/0/0)</green>
current_peer 44.44.44.44 port 4500
PERMIT, flags={origin_is_acl,}
<green>#pkts encaps: 5, #pkts encrypt: 5, #pkts digest: 5</green>
<red>#pkts decaps: 0, #pkts decrypt: 0, #pkts verify: 0</red>
#pkts compressed: 0, #pkts decompressed: 0
#pkts not compressed: 0, #pkts compr. failed: 0
#pkts not decompressed: 0, #pkts decompress failed: 0
#pkts no sa (send) 5, #pkts invalid sa (rcv) 0
#pkts encaps failed (send) 0, #pkts decaps failed (rcv) 0
#pkts invalid prot (recv) 0, #pkts verify failed: 0
#pkts invalid identity (recv) 0, #pkts invalid len (rcv) 0
#pkts replay rollover (send): 0, #pkts replay rollover (rcv) 0
##pkts replay failed (rcv): 0
#pkts internal err (send): 0, #pkts internal err (recv) 0
<purple>local crypto endpt.: 12.12.12.1, remote crypto endpt.: 44.44.44.44</purple>
path mtu 1500, ip mtu 1500, ip mtu idb FastEthernet0/0
current outbound spi: 0xDEE2A85C(3739396188)
inbound esp sas:
<green>inbound ah sas:
spi: 0x7612ECC1(1980951745)
transform: ah-md5-hmac ,
in use settings ={Tunnel UDP-Encaps, }
conn id: 1, flow_id: SW:1, crypto map: IPSEC_VPN
sa timing: remaining key lifetime (k/sec): (4530158/3488)
replay detection support: Y
Status: ACTIVE</green>
inbound pcp sas:
outbound esp sas:
<green>outbound ah sas:
spi: 0xDEE2A85C(3739396188)
transform: ah-md5-hmac ,
in use settings ={Tunnel UDP-Encaps, }
conn id: 2, flow_id: SW:2, crypto map: IPSEC_VPN</green>
</black></black></pre></div>
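<p>The asymmetry between the outbound and inbound counters is the key observation in this output. If you had to check it on many routers, a small parser could flag it. The sketch below is only an illustration in Python: <code>parse_counters</code> and <code>one_way</code> are hypothetical helpers, and the regular expression only handles the simple <code>#pkts name: value</code> pairs (counters with parentheses in their name are skipped).</p>

```python
import re

# Two counter lines copied from the "sh crypto ipsec sa detail" output above.
SAMPLE = """\
#pkts encaps: 5, #pkts encrypt: 5, #pkts digest: 5
#pkts decaps: 0, #pkts decrypt: 0, #pkts verify: 0
"""


def parse_counters(text):
    """Return {counter name: value} for every simple '#pkts name: value' pair."""
    pairs = re.findall(r"#pkts ([a-z. ]+?):\s*(\d+)", text)
    return {name.strip(): int(value) for name, value in pairs}


def one_way(counters):
    """True when packets are encapsulated and sent but nothing is decapsulated."""
    return counters.get("encaps", 0) > 0 and counters.get("decaps", 0) == 0


counters = parse_counters(SAMPLE)
print(counters)
print("one-way traffic:", one_way(counters))
```

<p>This only confirms the symptom (traffic leaves but nothing comes back); it says nothing about the cause.</p>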
<p><strong><em>What is wrong?</em></strong> <em>Why does everything work when the IPsec tunnel uses ESP, but <strong>not</strong> with AH?</em></p>
<p>You asked your peer network engineer at the partner company for the configuration of his router (<strong>R3</strong>), but he declined due to internal policies. He does, however, confirm your conclusion from his side: <em>the tunnel comes up but there is no connectivity</em>!</p>
<p>Eventually, you enable crypto debugging for both phases. You also bring in a senior network consultant. He looks at the <code>show crypto</code> outputs and the <code>debug crypto</code> below and concludes that <purple><em>you cannot use AH between the two companies, so you'll have to keep ESP !</em></purple> </p>
<p>Here is the debug output:</p>
<div class="row"><pre style="height: 400px;">R1#<purple>deb crypto isakmp</purple>
Crypto ISAKMP debugging is on
R1#
R1#<purple>deb crypto ipsec</purple>
Crypto IPSEC debugging is on
R1#
R1#sh crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst src state conn-id slot status
IPv6 Crypto ISAKMP SA
R1#
R1#
R1#
R1#ping 10.20.20.20 sou
R1#ping 10.20.20.20 source 10.10.10.10
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 10.20.20.20, timeout is 2 seconds:
Packet sent with a source address of 10.10.10.10
*Mar 1 00:02:06.175: IPSEC(sa_request): ,
(key eng. msg.) OUTBOUND local= 12.12.12.1, remote= 44.44.44.44,
local_proxy= 10.10.10.0/255.255.255.0/0/0 (type=4),
remote_proxy= 10.20.20.0/255.255.255.0/0/0 (type=4),
protocol= AH, transform= ah-md5-hmac (Tunnel),
lifedur= 3600s and 4608000kb,
spi= 0x0(0), conn_id= 0, keysize= 0, flags= 0x0
*Mar 1 00:02:06.179: ISAKMP:(0): SA request profile is (NULL)
*Mar 1 00:02:06.183: ISAKMP: Created a peer struct for 44.44.44.44, peer port 500
*Mar 1 00:02:06.183: ISAKMP: New peer created peer = 0x663A6274 peer_handle = 0x80000002
*Mar 1 00:02:06.187: ISAKMP: Locking peer struct 0x663A6274, refcount 1 for isakmp_initiator
*Mar 1 00:02:06.191: ISAKMP: local port 500, remote port 500
*Mar 1 00:02:06.191: ISAKMP: set new node 0 to QM_IDLE
*Mar 1 00:02:06.191: insert sa successfully sa = 66981718
*Mar 1 00:02:06.191: ISAKMP:(0):Can not start Aggressive mode, trying Main mode.
*Mar 1 00:02:06.195: ISAKMP:(0):found peer pre-shared key matching 44.44.44.44
*Mar 1 00:02:06.203: ISAKMP:(0): constructed NAT-T vendor-rfc3947 ID
*Mar 1 00:02:06.203: ISAKMP:(0): constructed NAT-T vendor-07 ID
*Mar 1 00:02:06.207: ISAKMP:(0): constructed NAT-T vendor-03 ID
*Mar 1 00:02:06.207: ISAKMP:(0): constructed NAT-T vendor-02 ID
*Mar 1 00:02:06.207: ISAKMP:(0):Input = IKE_MESG_FROM_IPSEC, IKE_SA_REQ_MM
*Mar 1 00:02:06.207: ISAKMP:(0):Old State = IKE_READY New State = IKE_I_M.M1
*Mar 1 00:02:06.207: ISAKMP:(0): beginning Main Mode exchange
*Mar 1 00:02:06.207: ISAKMP:(0): sending packet to 44.44.44.44 my_port 500 peer_port 500 (I) MM_NO_STATE
*Mar 1 00:02:06.207: ISAKMP:(0):Sending an IKE IPv4 Packet.....
Success rate is 0 percent (0/5)
R1#
*Mar 1 00:02:16.207: ISAKMP:(0): retransmitting phase 1 MM_NO_STATE...
*Mar 1 00:02:16.207: ISAKMP (0:0): incrementing error counter on sa, attempt 1 of 5: retransmit phase 1
*Mar 1 00:02:16.207: ISAKMP:(0): retransmitting phase 1 MM_NO_STATE
*Mar 1 00:02:16.207: ISAKMP:(0): sending packet to 44.44.44.44 my_port 500 peer_port 500 (I) MM_NO_STATE
*Mar 1 00:02:16.207: ISAKMP:(0):Sending an IKE IPv4 Packet.
R1#
*Mar 1 00:02:26.207: ISAKMP:(0): retransmitting phase 1 MM_NO_STATE...
*Mar 1 00:02:26.211: ISAKMP (0:0): incrementing error counter on sa, attempt 2 of 5: retransmit phase 1
*Mar 1 00:02:26.215: ISAKMP:(0): retransmitting phase 1 MM_NO_STATE
*Mar 1 00:02:26.215: ISAKMP:(0): sending packet to 44.44.44.44 my_port 500 peer_port 500 (I) MM_NO_STATE
*Mar 1 00:02:26.219: ISAKMP:(0):Sending an IKE IPv4 Packet.
*Mar 1 00:02:26.331: ISAKMP (0:0): received packet from 44.44.44.44 dport 500 sport 500 Global (I) MM_NO_STATE
*Mar 1 00:02:26.347: ISAKMP:(0):Input = IKE_MESG_FROM_PEER, IKE_MM_EXCH
*Mar 1 00:02:26.351: ISAKMP:(0):Old State = IKE_I_MM1 New State = IKE_I_MM2
*Mar 1 00:02:26.363: ISAKMP:(0): processing SA payload. message ID = 0
*Mar 1 00:02:26.363: ISAKMP:(0): processing vendor id payload
*Mar 1 00:02:26.363: ISAKMP:(0): vendor ID seems Unity/DPD but major 245 mismatch
*Mar 1 00:02:26.363: ISAKMP (0:0): vendor ID is NAT-T v7
*Mar 1 00:02:26.363: ISAKMP:(0):found peer pre-shared key matching 44.44.44.44
*Mar 1 00:02:26.363: ISAKMP:(0): local preshared key found
*Mar 1 00:02:26.363: ISAKMP : Scanning profiles for xauth ...
*Mar 1 00:02:26.363: ISAKMP:(0):Checking ISAKMP transform 1 against priority 10 policy
*Mar 1 00:02:26.363: ISAKMP: encryption DES-CBC
*Mar 1 00:02:26.363: ISAKMP: hash SHA
*Mar 1 00:02:26.363: ISAKMP: default group 1
*Mar 1 00:02:26.363: ISAKMP: auth pre-share
*Mar 1 00:02:26.363: ISAKMP: life type in seconds
*Mar 1 00:02:26.363: ISAKMP: life duration (VPI) of 0x0 0x1 0x51 0x80
*Mar 1 00:02:26.367: ISAKMP:(0):atts are acceptable. Next payload is 0
*Mar 1 00:02:26.367: ISAKMP:(0):Acceptable atts:actual life: 0
*Mar 1 00:02:26.371: ISAKMP:(0):Acceptable atts:life: 0
*Mar 1 00:02:26.371: ISAKMP:(0):Fill atts in sa vpi_length:4
*Mar 1 00:02:26.375: ISAKMP:(0):Fill atts in sa life_in_seconds:86400
*Mar 1 00:02:26.379: ISAKMP:(0):Returning Actual lifetime: 86400
*Mar 1 00:02:26.379: ISAKMP:(0)::Started lifetime timer: 86400.
*Mar 1 00:02:26.379: ISAKMP:(0): processing vendor id payload
*Mar 1 00:02:26.379: ISAKMP:(0): vendor ID seems Unity/DPD but major 245 mismatch
*Mar 1 00:02:26.379: ISAKMP (0:0): vendor ID is NAT-T v7
*Mar 1 00:02:26.379: ISAKMP:(0):Input = IKE_MESG_INTERNAL, IKE_PROCESS_MAIN_MODE
*Mar 1 00:02:26.379: ISAKMP:(0):Old State = IKE_I_MM2 New State = IKE_I_MM2
*Mar 1 00:02:26.383: ISAKMP:(0): sending packet to 44.44.44.44 my_port 500 peer_port 500 (I) MM_SA_SETUP
*Mar 1 00:02:26.387: ISAKMP:(0):Sending an IKE IPv4 Packet.
*Mar 1 00:02:26.391: ISAKMP:(0):Input = IKE_MESG_INTERNAL, IKE_PROCESS_COMPLETE
*Mar 1 00:02:26.391: ISAKMP:(0):Old State = IKE_I_MM2 New State = IKE_I_MM3
*Mar 1 00:02:26.491: ISAKMP (0:0): received packet from 44.44.44.44 dport 500 sport 500 Global (I) MM_SA_SETUP
*Mar 1 00:02:26.495: ISAKMP:(0):Input = IKE_MESG_FROM_PEER, IKE_MM_EXCH
*Mar 1 00:02:26.499: ISAKMP:(0):Old State = IKE_I_MM3 New State = IKE_I_MM4
*Mar 1 00:02:26.511: ISAKMP:(0): processing KE payload. message ID = 0
*Mar 1 00:02:26.531: ISAKMP:(0): processing NONCE payload. message ID = 0
*Mar 1 00:02:26.531: ISAKMP:(0):found peer pre-shared key matching 44.44.44.44
*Mar 1 00:02:26.531: ISAKMP:(1001): processing vendor id payload
*Mar 1 00:02:26.531: ISAKMP:(1001): vendor ID is Unity
*Mar 1 00:02:26.531: ISAKMP:(1001): processing vendor id payload
*Mar 1 00:02:26.531: ISAKMP:(1001): vendor ID is DPD
*Mar 1 00:02:26.531: ISAKMP:(1001): processing vendor id payload
*Mar 1 00:02:26.531: ISAKMP:(1001): speaking to another IOS box!
*Mar 1 00:02:26.531: ISAKMP (0:1001): NAT found, the node outside NAT
*Mar 1 00:02:26.531: ISAKMP:(1001):Input = IKE_MESG_INTERNAL, IKE_PROCESS_MAIN_MODE
*Mar 1 00:02:26.531: ISAKMP:(1001):Old State = IKE_I_MM4 New State = IKE_I_MM4
*Mar 1 00:02:26.531: ISAKMP:(1001):Send initial contact
*Mar 1 00:02:26.531: ISAKMP:(1001):SA is doing pre-shared key authentication using id type ID_IPV4_ADDR
*Mar 1 00:02:26.531: ISAKMP (0:1001): ID payload
next-payload : 8
type : 1
address : 12.12.12.1
protocol : 17
port : 0
length : 12
*Mar 1 00:02:26.531: ISAKMP:(1001):Total payload length: 12
*Mar 1 00:02:26.531: ISAKMP:(1001): sending packet to 44.44.44.44 my_port 4500 peer_port 4500 (I) MM_KEY_EXCH
*Mar 1 00:02:26.531: ISAKMP:(1001):Sending an IKE IPv4 Packet.
*Mar 1 00:02:26.531: ISAKMP:(1001):Input = IKE_MESG_INTERNAL, IKE_PROCESS_COMPLETE
*Mar 1 00:02:26.535: ISAKMP:(1001):Old State = IKE_I_MM4 New State = IKE_I_MM5
*Mar 1 00:02:26.639: ISAKMP (0:1001): received packet from 44.44.44.44 dport 4500 sport 4500 Global (I) MM_KEY_EXCH
*Mar 1 00:02:26.647: ISAKMP:(1001): processing ID payload. message ID = 0
*Mar 1 00:02:26.647: ISAKMP (0:1001): ID payload
next-payload : 8
type : 1
address : 192.168.1.2
protocol : 17
port : 0
length : 12
*Mar 1 00:02:26.655: ISAKMP:(0):: peer matches *none* of the profiles
*Mar 1 00:02:26.655: ISAKMP:(1001): processing HASH payload. message ID = 0
*Mar 1 00:02:26.663: ISAKMP:(1001):SA authentication status:
authenticated
*Mar 1 00:02:26.663: ISAKMP:(1001):SA has been authenticated with 44.44.44.44
*Mar 1 00:02:26.663: ISAKMP: Trying to insert a peer 12.12.12.1/44.44.44.44/4500/, and inserted successfully 663A6274.
*Mar 1 00:02:26.663: ISAKMP:(1001):Input = IKE_MESG_FROM_PEER, IKE_MM_EXCH
*Mar 1 00:02:26.663: ISAKMP:(1001):Old State = IKE_I_MM5 New State = IKE_I_MM6
*Mar 1 00:02:26.663: ISAKMP:(1001):Input = IKE_MESG_INTERNAL, IKE_PROCESS_MAIN_MODE
*Mar 1 00:02:26.663: ISAKMP:(1001):Old State = IKE_I_MM6 New State = IKE_I_MM6
*Mar 1 00:02:26.663: ISAKMP:(1001):Input = IKE_MESG_INTERNAL, IKE_PROCESS_COMPLETE
*Mar 1 00:02:26.663: ISAKMP:(1001):<purple>Old State = IKE_I_MM6 New State = IKE_P1_COMPLETE</purple>
*Mar 1 00:02:26.663: ISAKMP:(1001):beginning Quick Mode exchange, M-ID of 2112503013
*Mar 1 00:02:26.663: ISAKMP:(1001):QM Initiator gets spi
*Mar 1 00:02:26.671: ISAKMP:(1001): sending packet to 44.44.44.44 my_port 4500 peer_port 4500 (I) QM_IDLE
*Mar 1 00:02:26.675: ISAKMP:(1001):Sending an IKE IPv4 Packet.
*Mar 1 00:02:26.679: ISAKMP:(1001):Node 2112503013, Input = IKE_MESG_INTERNAL, IKE_INIT_QM
*Mar 1 00:02:26.679: ISAKMP:(1001):Old State = IKE_QM_READY New State = IKE_QM_I_QM1
*Mar 1 00:02:26.679: ISAKMP:(1001):Input = IKE_MESG_INTERNAL, IKE_PHASE1_COMPLETE
*Mar 1 00:02:26.679: ISAKMP:(1001):Old State = IKE_P1_COMPLETE New State = IKE_P1_COMPLETE
*Mar 1 00:02:26.775: ISAKMP (0:1001): received packet from 44.44.44.44 dport 4500 sport 4500 Global (I) QM_IDLE
*Mar 1 00:02:26.783: ISAKMP:(1001): processing HASH payload. message ID = 2112503013
*Mar 1 00:02:26.787: ISAKMP:(1001): processing SA payload. message ID = 2112503013
*Mar 1 00:02:26.787: ISAKMP:(1001):Checking IPSec proposal 1
*Mar 1 00:02:26.791: ISAKMP: transform 1, AH_MD5
*Mar 1 00:02:26.791: ISAKMP: attributes in transform:
*Mar 1 00:02:26.791: ISAKMP: encaps is 3 (Tunnel-UDP)
*Mar 1 00:02:26.795: ISAKMP: SA life type in seconds
*Mar 1 00:02:26.795: ISAKMP: SA life duration (basic) of 3600
*Mar 1 00:02:26.795: ISAKMP: SA life type in kilobytes
*Mar 1 00:02:26.795: ISAKMP: SA life duration (VPI) of 0x0 0x46 0x50 0x0
*Mar 1 00:02:26.795: ISAKMP: authenticator is HMAC-MD5
*Mar 1 00:02:26.795: ISAKMP:(1001):atts are acceptable.
*Mar 1 00:02:26.795: IPSEC(validate_proposal_request): proposal part #1
*Mar 1 00:02:26.795: IPSEC(validate_proposal_request): proposal part #1,
(key eng. msg.) INBOUND local= 12.12.12.1, remote= 44.44.44.44,
local_proxy= 10.10.10.0/255.255.255.0/0/0 (type=4),
remote_proxy= 10.20.20.0/255.255.255.0/0/0 (type=4),
protocol= AH, transform= NONE (Tunnel-UDP),
lifedur= 0s and 0kb,
spi= 0x0(0), conn_id= 0, keysize= 0, flags= 0x0
*Mar 1 00:02:26.795: Crypto mapdb : proxy_match
src addr : 10.10.10.0
dst addr : 10.20.20.0
protocol : 0
src port : 0
dst port : 0
*Mar 1 00:02:26.795: ISAKMP:(1001): processing NONCE payload. message ID = 2112503013
*Mar 1 00:02:26.799: ISAKMP:(1001): processing ID payload. message ID = 2112503013
*Mar 1 00:02:26.803: ISAKMP:(1001): processing ID payload. message ID = 2112503013
*Mar 1 00:02:26.811: ISAKMP:(1001): Creating IPSec SAs
*Mar 1 00:02:26.811: inbound SA from 44.44.44.44 to 12.12.12.1 (f/i) 0/ 0
(proxy 10.20.20.0 to 10.10.10.0)
*Mar 1 00:02:26.811: has spi 0x7612ECC1 and conn_id 0
*Mar 1 00:02:26.811: lifetime of 3600 seconds
*Mar 1 00:02:26.811: lifetime of 4608000 kilobytes
*Mar 1 00:02:26.811: outbound SA from 12.12.12.1 to 44.44.44.44 (f/i) 0/0
(proxy 10.10.10.0 to 10.20.20.0)
*Mar 1 00:02:26.811: has spi 0xDEE2A85C and conn_id 0
*Mar 1 00:02:26.811: lifetime of 3600 seconds
*Mar 1 00:02:26.811: lifetime of 4608000 kilobytes
*Mar 1 00:02:26.811: ISAKMP:(1001): sending packet to 44.44.44.44 my_port 4500 peer_port 4500 (I) QM_IDLE
*Mar 1 00:02:26.811: ISAKMP:(1001):Sending an IKE IPv4 Packet.
*Mar 1 00:02:26.811: ISAKMP:(1001):deleting node 2112503013 error FALSE reason "No Error"
*Mar 1 00:02:26.815: ISAKMP:(1001):Node 2112503013, Input = IKE_MESG_FROM_PEER, IKE_QM_EXCH
*Mar 1 00:02:26.819: ISAKMP:(1001):Old State = IKE_QM_I_QM1 <purple>New State = IKE_QM_PHASE2_COMPLETE</purple>
*Mar 1 00:02:26.827: IPSEC(key_engine): got a queue event with 1 KMI message(s)
*Mar 1 00:02:26.827: Crypto mapdb : proxy_match
src addr : 10.10.10.0
dst addr : 10.20.20.0
protocol : 0
src port : 0
dst port : 0
*Mar 1 00:02:26.827: IPSEC(crypto_ipsec_sa_find_ident_head): reconnecting with the same proxies and peer 44.44.44.44
*Mar 1 00:02:26.831: IPSEC(policy_db_add_ident): src 10.10.10.0, dest 10.20.20.0, dest_port 0
<purple>*Mar 1 00:02:26.831: IPSEC(create_sa): sa created,
(sa) sa_dest= 12.12.12.1, sa_proto= 51,
sa_spi= 0x7612ECC1(1980951745),
sa_trans= ah-md5-hmac , sa_conn_id= 1
*Mar 1 00:02:26.831: IPSEC(create_sa): sa created,
(sa) sa_dest= 44.44.44.44, sa_proto= 51,
sa_spi= 0xDEE2A85C(3739396188),
sa_trans= ah-md5-hmac , sa_conn_id= 2</purple>
*Mar 1 00:02:26.831: IPSEC(update_current_outbound_sa): updated peer 44.44.44.44 current outbound sa to SPI DEE2A85C
R1#
R1#
</pre></div>
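<p>A debug like this is long, and most of its value is in the state transitions. As a reading aid, here is a small Python sketch that pulls the <code>Old State = ... New State = ...</code> pairs out of such a log. The <code>transitions</code> helper is my own illustration and the sample lines are copied from the output above.</p>

```python
import re

# State names are upper-case letters, digits and underscores.
STATE_RE = re.compile(r"Old State = ([A-Z0-9_]+)\s+New State = ([A-Z0-9_]+)")


def transitions(log_text):
    """Return (old, new) pairs, skipping lines where the state did not change."""
    return [(old, new) for old, new in STATE_RE.findall(log_text) if old != new]


SAMPLE = """\
*Mar 1 00:02:26.663: ISAKMP:(1001):Old State = IKE_I_MM5 New State = IKE_I_MM6
*Mar 1 00:02:26.663: ISAKMP:(1001):Old State = IKE_I_MM6 New State = IKE_I_MM6
*Mar 1 00:02:26.663: ISAKMP:(1001):Old State = IKE_I_MM6 New State = IKE_P1_COMPLETE
*Mar 1 00:02:26.819: ISAKMP:(1001):Old State = IKE_QM_I_QM1 New State = IKE_QM_PHASE2_COMPLETE
"""

for old, new in transitions(SAMPLE):
    print(f"{old} -> {new}")
```

<p>Running it over the full debug gives a compact view of how far Main Mode and Quick Mode progressed.</p>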
<p><strong><em>What did the senior engineer find? What is the problem?</em></strong> </p>
<p><br/></p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em></p>
<p><br/></p>MACsec Implementation on Linux2016-08-01T02:40:00+01:00Costitag:costiser.ro,2016-08-01:2016/08/01/macsec-implementation-on-linux/<p><span class="dropcap-bg">A</span>s you noticed from the previous articles, lately I have been playing with various tunnelling techniques and today I am presenting MACsec.<br/>
Most of the documentation about MACsec available on the web at this moment covers vendor implementations, especially Cisco's approach.<br/>
Although it's not a new topic, support for MACsec in the Linux kernel was added only recently, in version 4.6.</p>
<h2 id="quick-overview">Quick Overview</h2>
<p>MAC Security (MACsec), defined in <strong>IEEE 802.1AE</strong> standard, is intended to provide secure access to the network, ensuring data integrity, data origin authentication and, optionally, encryption for the traffic between the host and the access switch - everything at Layer 2 !<br/>
Its primary use case is to secure communication to & from endpoints at the access edge, and in this role, it is usually used together with 802.1X that provides port-based authentication and transmits the necessary keying material to both host and switch. 802.1X had several revisions that cover MACsec Key Agreement protocol (MKA), a protocol that discovers MACsec enabled peers and dynamically distributes keying material to them. </p>
<p>Additional use cases could be to protect switch to switch links or, why not, host to host, and <em>usually</em> in these cases you have to use static association keys (with Cisco you can also do switch-to-switch MACsec link security with Radius and AAA, as well as manual). In this case, MACsec represents an alternative to IPsec for WAN links if they use Ethernet.</p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
MACsec provides <b><blue>hop-by-hop</blue></b>, wire-speed link protection - as opposed to IPsec, which provides end-to-end (over multiple hops) Layer 3 security.<br> For this reason, MACsec is sometimes referred to as <b>Linksec</b>.
</br></td></tr>
</table>
<h3 id="features">Features</h3>
<p>Here are some notes about MACsec without covering the details: </p>
<ul>
<li>protection (integrity and/or encryption) is performed at <strong>Layer 2</strong>, so it is transparent for the network</li>
<li>as opposed to IPsec, which raises performance challenges, MACsec is intended to run at <strong>line-rate</strong>, in the hardware ASIC (for this reason, not all hardware supports MACsec)</li>
<li>the Ethertype for the protected MACsec frame is <strong>0x88e5</strong></li>
<li>the Ethernet Header and SecTag are sent in clear text but they are always integrity-protected (by ICV)</li>
<li>the default cipher suite is <strong>GCM-AES-128</strong></li>
<li>supports <em>optional replay protection</em> with a configurable replay window</li>
</ul>
<h2 id="implementation-on-linux_1">Implementation on Linux</h2>
<p>Before I start, I would like to mention the name Sabrina Dubroca - she is the person behind the work of bringing MACsec support to the Linux kernel. </p>
<p>In order to test MACsec I am going to use two VirtualBox VMs (managed via Vagrant) running <strong>Ubuntu</strong>, similar to <a href="/2016/06/26/my-sdn-testbed/">the lab described here</a>. At the time of writing, <red>July 2016</red>, the following steps are necessary to make it work: </p>
<ul>
<li>install the latest version of the kernel, <strong>kernel-4-7-rc7</strong> (<em>yes! a release candidate version</em>)</li>
<li>compile the macsec module support</li>
<li>install the latest version of iproute2 tools</li>
</ul>
<p>Let's start! </p>
<h3 id="install-latest-kernel-and-compile-macsec-support">Install latest kernel and compile MACsec support</h3>
<p>I am only briefly describing this process, since the internet contains plenty of articles on this topic. Here is <a href="https://wiki.ubuntu.com/KernelTeam/GitKernelBuild" target="_blank">one official document</a>.</p>
<div class="row"><pre>
<black># preparation</black>
cd $HOME
mkdir kernel-4-7-rc7
cd kernel-4-7-rc7
<black># clone the git repository or download a release candidate version</black>
wget https://cdn.kernel.org/pub/linux/kernel/v4.x/testing/linux-4.7-rc7.tar.xz
tar xf linux-4.7-rc7.tar.xz
<black># enter the extracted source tree (.config must live there)</black>
cd linux-4.7-rc7
<black># copy existing kernel config to keep previous settings</black>
cp /boot/config-`uname -r` .config
<black># make the below changes</black>
vi .config
<black># inside .config: unset CONFIG_DEBUG_INFO to speed up the build and save disk space,
# and enable MACsec support as a module; the two lines should read:
#   # CONFIG_DEBUG_INFO is not set
#   <purple>CONFIG_MACSEC=m</purple></black>
<black># bring file up to date and cleanup</black>
make oldconfig
make clean
<black># build the deb packages</black>
make -j `getconf _NPROCESSORS_ONLN` deb-pkg LOCALVERSION=<purple>-custom</purple>
<black># the packages are written to the parent directory: go there, then install the linux-image and linux-headers packages</black>
cd ..
sudo dpkg -i linux-image-*<purple>-custom</purple>-*.deb
sudo dpkg -i linux-headers-*<purple>-custom</purple>-*.deb
</pre>
</div>
<p>That's it! When that finishes, just do <code>sudo reboot</code>.</p>
<h3 id="install-the-latest-version-of-iproute2">Install the latest version of iproute2</h3>
<div class="row"><pre>
<black># some prerequisites needed</black>
sudo apt-get install pkg-config bison flex xtables-addons-common xtables-addons-source
git clone git://git.kernel.org/pub/scm/linux/kernel/git/shemminger/iproute2.git
cd iproute2/
./configure
make
sudo make install
<black># verify that the below command works</black>
ip macsec show
</pre></div>
<h3 id="configure-macsec-between-two-linux-machines">Configure MACsec between Two Linux Machines</h3>
<p>Repeat the above steps on both virtual machines and then you are ready to configure and test MACsec. If you use the same setup as me, it means that you already have two <em>Host-Only</em> adaptors between the VMs - I am going to use one of them, <code>enp0s8</code>, to create the MACsec interface/device on top of it.<br/>
Here is how you configure MACsec on Linux - instructions are inline. <em>The highlighted MAC address corresponds to the other end (just as with IPsec, where you provide the IP address of the peer endpoint)</em>: </p>
<p><a href="/uploads/macsec-implementation-on-linux.png" title="MACsec Implementation on Linux"><img alt="MACsec Implementation on Linux" src="/uploads/macsec-implementation-on-linux.png" title="MACsec Implementation on Linux"/></a></p>
<h4 id="on-the-first-virtual-machine">On the first Virtual Machine</h4>
<div class="row"><pre>
<black># on vagrant box-1
# ----------------
# Clear IP configuration on the Host-Only adaptor between the VMs</black>
sudo ifconfig enp0s8 0.0.0.0
<black># Load the MACsec kernel module</black>
sudo modprobe macsec
<black># Create the MACsec device on top of the physical one</black>
sudo ip link add link enp0s8 macsec0 <purple>type macsec</purple>
<black># Configure the Transmit SA and keys</black>
sudo ip macsec <purple>add macsec0 tx</purple> sa 0 pn 100 on key 01 11111111111111111111111111111111
<black># Configure the Receive Channel and SA:
# MAC address of the peer
# port number, packet number and key</black>
sudo ip macsec <purple>add macsec0 rx </purple><red>address 08:00:27:f2:1d:8c</red> port 1
sudo ip macsec <purple>add macsec0 rx </purple><red>address 08:00:27:f2:1d:8c</red> port 1 sa 0 pn 100 on key 02 22222222222222222222222222222222
<black># Bring up the interface</black>
sudo ip link set dev macsec0 up
<black># Configure an IP address on it for connectivity between the hosts</black>
sudo ifconfig <purple>macsec0 1.1.1.1/24</purple>
</pre></div>
<h4 id="on-the-second-virtual-machine">On the second Virtual Machine</h4>
<p>Follow the same steps - again, make sure that you have the correct MAC addresses for the Layer 2 endpoints:</p>
<div class="row"><pre>
<black># on vagrant box-2
# ----------------</black>
sudo modprobe macsec
sudo ip link add link enp0s8 macsec0 <purple>type macsec</purple>
sudo ip macsec <purple>add macsec0 tx</purple> sa 0 pn 100 on key 02 22222222222222222222222222222222
sudo ip macsec <purple>add macsec0 rx </purple><red>address 08:00:27:ae:4d:62</red> port 1
sudo ip macsec <purple>add macsec0 rx </purple><red>address 08:00:27:ae:4d:62</red> port 1 sa 0 pn 100 on key 01 11111111111111111111111111111111
sudo ip link set dev macsec0 up
sudo ifconfig <purple>macsec0 1.1.1.2/24</purple>
</pre></div>
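<p>The key pairing is the part that is easiest to get wrong: the TX key of one box must be the RX key of the other, and each RX channel must point to the MAC address of the peer. Here is a minimal sketch that only <em>prints</em> the commands for one end, so you can compare both sides before running anything. The helper names <code>gen_key</code> and <code>macsec_cmds</code> are hypothetical (they are not part of iproute2), and the fixed values (<code>macsec0</code>, <code>port 1</code>, <code>sa 0</code>, <code>pn 100</code>) are simply taken from the commands above:</p>

```shell
#!/bin/sh
# Sketch only: prints the commands for one end of a static-key MACsec link,
# it does not execute them. gen_key and macsec_cmds are hypothetical helpers.

# Generate a random 128-bit key as 32 hex characters.
gen_key() {
    od -An -tx1 -N16 /dev/urandom | tr -d ' \n'
}

# macsec_cmds PARENT_DEV PEER_MAC TX_KEYID TX_KEY RX_KEYID RX_KEY
macsec_cmds() {
    parent=$1
    peer=$2
    tx_id=$3
    tx_key=$4
    rx_id=$5
    rx_key=$6
    echo "ip link add link $parent macsec0 type macsec"
    echo "ip macsec add macsec0 tx sa 0 pn 100 on key $tx_id $tx_key"
    echo "ip macsec add macsec0 rx address $peer port 1"
    echo "ip macsec add macsec0 rx address $peer port 1 sa 0 pn 100 on key $rx_id $rx_key"
    echo "ip link set dev macsec0 up"
}

K1=$(gen_key)
K2=$(gen_key)
echo "# box-1 (peer is box-2)"
macsec_cmds enp0s8 08:00:27:f2:1d:8c 01 "$K1" 02 "$K2"
echo "# box-2 (peer is box-1): same keys, swapped"
macsec_cmds enp0s8 08:00:27:ae:4d:62 02 "$K2" 01 "$K1"
```

<p>Printing instead of executing is a deliberate choice here: the real commands need root and a kernel with the macsec module, while the printed output can be reviewed or pasted into each box with <code>sudo</code>.</p>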
<h3 id="verification-and-troubleshooting_1">Verification and Troubleshooting</h3>
<h4 id="common-problems">Common Problems</h4>
<p>If you encounter any problems, here are some things that you can check:</p>
<ol>
<li>
<p>command <strong>sudo modprobe macsec</strong> fails with error <code>modprobe: FATAL: Module macsec not found in directory /lib/modules/</code>.<br/>
As indicated in the error message, your kernel does not contain a module for macsec support. You need to re-compile the kernel as per above-mentioned procedure and make sure that the <strong><em>.config</em></strong> file contains this line <code>CONFIG_MACSEC=m</code> (which enables MACsec support as a module)</p>
</li>
<li>
<p>command <strong>ip macsec</strong> fails with error <code>Object "macsec" is unknown, try "ip help".</code>.<br/>
This means that you do not have the latest version of iproute2 utilities. Follow the procedure above to fix this.</p>
</li>
<li>
<p>command <strong>sudo ip macsec add macsec0</strong> fails with error <code>RTNETLINK answers: Cannot allocate memory</code>.<br/>
This means that you are not running a suitable kernel version - you need version 4.7.0 release candidate 7 (rc7) or later. </p>
</li>
<li>
<p>command <strong>ip macsec show</strong> fails with error <code>RTNETLINK answers: No such file or directory - Error talking to the kernel</code>.<br/>
In this case make sure that you loaded the MACsec module into the kernel with command <code>sudo modprobe macsec</code>.</p>
</li>
</ol>
<h4 id="verification">Verification</h4>
<p>To verify that everything works, we can check the following: </p>
<ul>
<li>
<p>connectivity over the MACsec interface
<div class="row"><pre>ubuntu@box-1 ~$ <blue>ping 1.1.1.2</blue>
PING 1.1.1.2 (1.1.1.2) 56(84) bytes of data.
64 bytes from 1.1.1.2: icmp_seq=1 ttl=64 time=0.663 ms
64 bytes from 1.1.1.2: icmp_seq=2 ttl=64 time=0.489 ms
64 bytes from 1.1.1.2: icmp_seq=3 ttl=64 time=0.527 ms
^C
--- 1.1.1.2 ping statistics ---
3 packets transmitted, 3 received, <purple>0% packet loss</purple>, time 2004ms
rtt min/avg/max/mdev = 0.489/0.559/0.663/0.079 ms
</pre></div> </p>
</li>
<li>
<p>output of ip command showing increased Packet Number (PN) counter - please note that I provided an initial <code>pn 100</code> in the above configuration commands, so the starting point is <strong>100</strong>
<div class="row"><pre>ubuntu@box-1 ~$ <blue>sudo ip macsec show</blue>
5: macsec0: protect on validate strict sc off sa off <red>encrypt off</red> send_sci on end_station off scb off replay off
cipher suite: GCM-AES-128, using ICV length 16
TXSC: 0100624dae270008 on SA 0
0: <purple>PN 113, state on,</purple> key 01000000000000000000000000000000
RXSC: 01008c1df2270008, state on
0: <purple>PN 113, state on,</purple> key 02000000000000000000000000000000
ubuntu@box-1 ~$
</pre></div><br/>
As outlined here, encryption <em><strong>was not configured</strong></em> and it is <strong>off</strong> by default. </p>
</li>
<li>
<p>tcpdump on the physical <code>enp0s8</code> interface
<div class="row"><pre>ubuntu@box-2 ~$ <blue>sudo tcpdump -nli enp0s8</blue>
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on enp0s8, link-type EN10MB (Ethernet), capture size 262144 bytes
23:27:10.072761 <purple>08:00:27:ae:4d:62 > 08:00:27:f2:1d:8c, ethertype Unknown (0x88e5)</purple>, length 130:
0x0000: 2000 0000 0078 0800 27ae 4d62 0001 0800 .....x..'.Mb....
0x0010: 4500 0054 7b8d 4000 4001 bb17 0101 0101 E..T{.@.@.......
0x0020: 0101 0102 0800 c280 0be2 0004 5389 9e57 ............S..W
0x0030: 0000 0000 6be5 0d00 0000 0000 1011 1213 ....k...........
0x0040: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0050: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0060: 3435 3637 e209 f7db 1d85 1ea6 7532 d240 4567........u2.@
0x0070: a10e 1f8a ....
23:27:10.072846 <purple>08:00:27:f2:1d:8c > 08:00:27:ae:4d:62, ethertype Unknown (0x88e5)</purple>, length 130:
0x0000: 2000 0000 0078 0800 27f2 1d8c 0001 0800 .....x..'.......
0x0010: 4500 0054 8965 0000 4001 ed3f 0101 0102 E..T.e..@..?....
0x0020: 0101 0101 0000 ca80 0be2 0004 5389 9e57 ............S..W
0x0030: 0000 0000 6be5 0d00 0000 0000 1011 1213 ....k...........
0x0040: 1415 1617 1819 1a1b 1c1d 1e1f 2021 2223 .............!"#
0x0050: 2425 2627 2829 2a2b 2c2d 2e2f 3031 3233 $%&'()*+,-./0123
0x0060: 3435 3637 e6cc 6ca0 dec9 0520 fd25 b73d 4567..l......%.=
0x0070: fc62 f9d1 .b..</pre></div></p>
</li>
</ul>
<p>As you can see here, tcpdump does not recognize ethertype <strong>0x88e5</strong> (it reports it as <em>Unknown</em>), so it cannot tell what kind of traffic this is.</p>
<p>To get a better picture, I captured the traffic and opened it with Wireshark - it recognizes <strong>Ethertype 0x88e5</strong> as MACsec, but it still lacks the dissectors needed to interpret the data:</p>
<p><a href="/uploads/macsec-packet-capture.png" title="MACsec Packet Capture"><img alt="MACsec Packet Capture" src="/uploads/macsec-packet-capture.png" title="MACsec Packet Capture"/></a></p>
<p>I have also included a <em><strong>Total Bytes calculation</strong></em>, starting from an ICMP packet of 64B (the default on Ubuntu) - note that the SecTag has a length of 16B here (though default is 8 bytes, here it includes the optional Secure Channel Identifier (SCI) encoding which adds an extra 8 bytes). </p>
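<p>The same byte count can be reproduced with a few lines of shell arithmetic. This is only a sketch: <code>macsec_frame_len</code> is a hypothetical helper name, and the field sizes are the ones discussed above (8-byte SecTag including the MACsec Ethertype, optional 8-byte SCI, 16-byte ICV). The IP length of 84 bytes is the 64-byte ICMP packet plus the 20-byte IPv4 header, as seen in the ping output:</p>

```shell
#!/bin/sh
# macsec_frame_len IP_LEN WITH_SCI
# Prints the on-wire length (without FCS) of a MACsec-protected Ethernet frame
# carrying an IP packet of IP_LEN bytes. WITH_SCI is 1 if the SecTag carries the SCI.
macsec_frame_len() {
    ip_len=$1
    with_sci=$2
    macs=12        # destination + source MAC address
    sectag=8       # MACsec Ethertype (2) + TCI/AN (1) + SL (1) + PN (4)
    if [ "$with_sci" -eq 1 ]; then
        sectag=$((sectag + 8))   # optional Secure Channel Identifier
    fi
    inner_type=2   # original Ethertype (0x0800), now part of the protected data
    icv=16         # Integrity Check Value
    echo $((macs + sectag + inner_type + ip_len + icv))
}

macsec_frame_len 84 1   # prints 130, the length reported by tcpdump above
macsec_frame_len 84 0   # prints 122, the same frame without the SCI
```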
<p>As always, I uploaded the <a href="https://www.cloudshark.org/captures/653a5b97a1dd" target="_blank">packet capture and you can view it here</a>.</p>
<h3 id="enabling-encryption_1">Enabling Encryption</h3>
<p>As indicated above, by default MACsec performs data integrity and authentication but no encryption. In our scenario, to enable encryption on the macsec interface, use the following command:</p>
<div class="row"><pre class="col-md-8">
sudo ip link set macsec0 type macsec <purple>encrypt on</purple>
</pre></div>
<p>Since neither tcpdump nor Wireshark is (yet!) capable of dissecting the MACsec payload, the packet captures with <strong>encrypt off</strong> and <strong>encrypt on</strong> look almost identical - you cannot tell which one carries encrypted data (since in both cases the data is <em>not</em> dissected). But the SecTag contains a flag, <strong>the "E" bit</strong>, that indicates whether encryption is on or off - see below: </p>
<p><a href="/uploads/macsec-encrypt-on-vs-off.png" title="MACsec Encrypt OFF vs ON"><img alt="MACsec Encrypt OFF vs ON" src="/uploads/macsec-encrypt-on-vs-off.png" title="MACsec Encrypt OFF vs ON"/></a> </p>
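<p>If you only have a hex dump at hand, the flag can be read from the first byte after the MACsec Ethertype (the TCI/AN byte). Here is a small sketch, assuming the bit layout from IEEE 802.1AE as I understand it (SC = 0x20, E = 0x08, C = 0x04, AN = the two lowest bits); <code>decode_tci</code> is a hypothetical helper name. The value <code>20</code> is the first byte shown by tcpdump in the capture above, while <code>2c</code> is an illustrative value with E and C set, not something taken from my capture:</p>

```shell
#!/bin/sh
# decode_tci HEXBYTE
# Decodes the TCI/AN byte of a MACsec SecTag (bit positions assumed per IEEE 802.1AE).
decode_tci() {
    tci=$((0x$1))
    sc=$(( (tci >> 5) & 1 ))   # SCI is present in the SecTag
    e=$(( (tci >> 3) & 1 ))    # Encryption bit
    c=$(( (tci >> 2) & 1 ))    # Changed text bit
    an=$(( tci & 3 ))          # Association Number (the "sa 0" from the config)
    echo "SC=$sc E=$e C=$c AN=$an"
}

decode_tci 20   # prints: SC=1 E=0 C=0 AN=0   (integrity only)
decode_tci 2c   # prints: SC=1 E=1 C=1 AN=0   (illustrative encrypted frame)
```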
<p>Another useful command is <code>ip -s macsec show</code>, which shows separate counters for each type of protection: integrity-only (<em>encrypt off</em>) and encryption (<em>encrypt on</em>):</p>
<div class="row"><pre>
ubuntu@box-2 ~$ <blue>ip -s macsec show</blue>
5: macsec0: protect on validate strict sc off sa off encrypt off send_sci on end_station off scb off replay off
cipher suite: GCM-AES-128, using ICV length 16
TXSC: 01008c1df2270008 on SA 0
stats: OutPktsUntagged InPktsUntagged OutPktsTooLong InPktsNoTag InPktsBadTag InPktsUnknownSCI InPktsNoSCI InPktsOverrun
0 0 0 0 0 0 0 0
stats: <purple>OutOctetsProtected OutOctetsEncrypted OutPktsProtected OutPktsEncrypted
150247 0 227442534 0</purple>
0: PN 150347, state on, key 02000000000000000000000000000000
<purple>OutPktsProtected OutPktsEncrypted</purple>
150247 0
RXSC: 0100624dae270008, state on
stats: InOctetsValidated InOctetsDecrypted InPktsUnchecked InPktsDelayed InPktsOK InPktsInvalid InPktsLate InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
6855406 0 0 0 68431 0 0 0 0 0
0: PN 68539, state on, key 01000000000000000000000000000000
InPktsOK InPktsInvalid InPktsNotValid InPktsNotUsingSA InPktsUnusedSA
68431 0 0 0 0
ubuntu@box-2 ~$</pre></div>
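<p>A side note on the <strong>TXSC</strong> and <strong>RXSC</strong> values: they are the Secure Channel Identifiers, made of the MAC address plus the port number. Judging by the MAC addresses used in this article, the iproute2 build I used prints these 8 bytes in reversed order - this is an inference from the output, and other versions may print them differently. Under that assumption, a hypothetical helper (<code>decode_sci</code>) to turn them back into something readable could look like this:</p>

```shell
#!/bin/sh
# decode_sci SCI_HEX
# Assumes the 16 hex digits are the SCI bytes printed in reversed order:
# after reversing, the first 6 bytes are the MAC address and the last 2 the port.
decode_sci() {
    sci=$1
    rev=""
    i=14
    # walk the 8 bytes from last to first
    while [ "$i" -ge 0 ]; do
        byte=$(printf '%s' "$sci" | cut -c$((i + 1))-$((i + 2)))
        rev="$rev$byte"
        i=$((i - 2))
    done
    mac=$(printf '%s' "$rev" | cut -c1-12 | sed 's/../&:/g; s/:$//')
    port_hex=$(printf '%s' "$rev" | cut -c13-16)
    echo "$mac port $((0x$port_hex))"
}

decode_sci 01008c1df2270008   # prints: 08:00:27:f2:1d:8c port 1
decode_sci 0100624dae270008   # prints: 08:00:27:ae:4d:62 port 1
```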
<p>Although my initial plan was to include another use case - GRE over MACsec - I will leave it for the next post, since I seem to have a tendency to write very long articles.</p>
<p><br>
<em>As always, thank you for your interest and comments!</em></br></p>Performance Review of Overlay Tunnels with Open vSwitch2016-07-11T01:40:00+01:00Costitag:costiser.ro,2016-07-11:2016/07/11/performance-review-of-overlay-tunnels-with-openvswitch/<p>In my <a href="/2016/07/07/overlay-tunneling-with-openvswitch-gre-vxlan-geneve-greoipsec/">previous article</a> I presented various encapsulation techniques used to extend Layer 2 reachability across separate networks using tunnels created with Open vSwitch.<br/>
Although the initial intention was to include some iperf test results, I decided to leave these for a separate post (<em>this one!</em>) because I hit a few problems. While I was prepared to deal with MTU issues - always a topic when adding extra encapsulation - there were other things that I had to take care of. </p>
<div class="row"><div class="col-md-12">
<div class="panel panel-red">
<div class="panel-heading"><i class="fa fa-binoculars"></i> DISCLAIMER: </div>
<div class="panel-body">
The tests presented in this post do <red>not</red> follow a typical network performance procedure; they are simply iperf tests (mostly with the default options) intended to give the reader a <b>simple overview</b>. These tests were <red>not</red> performed between physical machines over physical wires; instead, they were carried out between virtual elements <b>in a fully virtualized environment</b> (OS, networking, connection, etc).
</div>
</div>
</div></div>
<h3 id="baseline">Baseline</h3>
<p>As always when trying to measure performance, you need a baseline that gives you a rough idea of the total capacity you have. In this case, I assigned the <em>Host-Only</em> adapter to each of the network namespaces so that they can reach each other directly.<br/>
After measuring the baseline, I followed the instructions in <a href="/2016/07/07/overlay-tunneling-with-openvswitch-gre-vxlan-geneve-greoipsec/">my previous post</a> to create the tunnels and repeated the measurements each time. In the end, I summarized the iperf results for all the tests performed. </p>
<p>Let's start! On each vagrant box, I create a network namespace (<strong><em>left</em></strong> and <strong><em>right</em></strong>) and assign the first <em>Host-Only Adapter</em> (<strong>enp0s8</strong>) to it, thus achieving a direct connection.</p>
<div align="center"><a href="/uploads/overlay-networks-with-openvswitch-baseline.png"><img alt="Overlay Networks with Open vSwitch" src="/uploads/overlay-networks-with-openvswitch-baseline.png" title="Overlay Networks with Open vSwitch" width="80%"/></a></div>
<div class="row">
<pre class="col-md-9">
<black># on vagrant box-1
# -----------------</black>
<black># create a new namespace</black>
sudo ip netns add <purple>left</purple>
<black># reset enp0s8 interface and assign it to the namespace</black>
sudo ifconfig enp0s8 0.0.0.0 down
sudo ip link set dev enp0s8 netns left
<black># configure that interface inside the namespace</black>
sudo ip netns exec left ifconfig enp0s8 <purple>10.0.0.1/24</purple> up
<black># on vagrant box-2
# ----------------</black>
<black># same as above, with different name and IP for the namespace</black>
sudo ip netns add <purple>right</purple>
sudo ifconfig enp0s8 0.0.0.0 down
sudo ip link set dev enp0s8 netns right
sudo ip netns exec right ifconfig enp0s8 <purple>10.0.0.2/24</purple> up
<black># check the connectivity between left and right namespaces
# from vagrant box-1</black>
ubuntu@box-1 ~$ sudo ip netns exec left ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.415 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.264 ms
...
</pre>
</div>
<p>Now, let's perform an iperf test and record the results. On the <strong><em>'left'</em></strong> namespace (vagrant box-1), run the server with command <code>sudo ip netns exec left iperf -s</code> and on the <strong><em>'right'</em></strong> start the iperf client:</p>
<div class="row">
<pre class="col-md-9">
ubuntu@box-2 ~$ <purple>sudo ip netns exec right iperf -c 10.0.0.1</purple>
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 46616 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec <red>1.05 GBytes 1.80 Gbits/sec</red>
</pre></div>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li>I'm only testing TCP, but you may also want to run UDP tests</li>
<li>I'm only using the iperf defaults, but you may want to tweak them to achieve better results - for example, by setting a different TCP window size and/or using parallel connections</li>
</ul></td></tr>
</table>
<p>If you follow these steps to do your own tests, you should undo the above baseline configuration before you start creating the tunnels - the easiest way is to reboot the vagrant box.
After the reboot, create a <a href="http://costiser.ro/2016/07/07/overlay-tunneling-with-openvswitch-gre-vxlan-geneve-greoipsec/#gretap">GRETAP tunnel between two OVS bridges</a> and repeat the iperf tests. </p>
<h3 id="considerations-with-overlay-networks">Considerations with Overlay Networks</h3>
<p>Every time you add extra encapsulation to your traffic, you have to think about the MTU: '<red><em>do I have to increase or decrease it?</em></red>' or '<em><red>is Path MTU Discovery working, is it enabled or disabled in my network?</red></em>' or '<em><red>what is the overhead that my tunnel adds?</red></em>' - these are questions that you should always consider (and you had better know the answers ☺)!<br/>
As I said in the beginning, I was expecting MTU problems and I hoped that I could deal with them - unfortunately, I was wrong! Let's see what happened, chronologically: </p>
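<p>For the last question, the overhead is plain arithmetic. The sketch below assumes an IPv4 outer header with no options, a basic 4-byte GRE header (no key, checksum or sequence number) and no VLAN tags; the helper names are hypothetical. With a 1500-byte underlay it yields 1476 for GRE and 1462 for GRETAP, which are the default MTUs that Linux assigns to the <code>gre0</code> and <code>gretap0</code> devices:</p>

```shell
#!/bin/sh
# tunnel_overhead TYPE - bytes added in front of the inner payload
tunnel_overhead() {
    case $1 in
        gre)    echo 24 ;;   # outer IPv4 (20) + GRE (4)
        gretap) echo 38 ;;   # outer IPv4 (20) + GRE (4) + inner Ethernet header (14)
        vxlan)  echo 50 ;;   # outer IPv4 (20) + UDP (8) + VXLAN (8) + inner Ethernet header (14)
        *)      echo "unknown tunnel type: $1" >&2; return 1 ;;
    esac
}

# inner_mtu UNDERLAY_MTU TYPE - largest inner IP packet that fits without fragmentation
inner_mtu() {
    oh=$(tunnel_overhead "$2") || return 1
    echo $(( $1 - oh ))
}

# underlay_mtu INNER_MTU TYPE - underlay MTU needed to carry a full inner packet
underlay_mtu() {
    oh=$(tunnel_overhead "$2") || return 1
    echo $(( $1 + oh ))
}

inner_mtu 1500 gretap      # prints 1462
underlay_mtu 1500 gretap   # prints 1538
```

<p>So with a GRETAP tunnel you either lower the MTU inside the namespaces to 1462 or raise the underlay MTU to at least 1538.</p>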
<h4 id="first-test">First Test</h4>
<p>With the GRE tunnel up and connectivity working between the two network namespaces, I started the iperf server in the '<em>left</em>' namespace with command <code>ip netns exec left iperf -s</code> and the client in '<em>right</em>' with command <code>ip netns exec right iperf -c 10.0.0.1</code>. I waited for a minute and nothing was shown on the console. After a few more minutes I opened a new ssh session and started a tcpdump, but there was nothing there either. I left the command running and after <red><em>15 minutes (!!)</em></red> it finished and returned <red><strong><em>759 bits/sec</em></strong></red> (not Megs, not Kilos, simple bits per second). </p>
<div class="row"><pre class="col-md-10">
ubuntu@box-2 ~$ <purple>sudo ip netns exec right iperf -c 10.0.0.1</purple>
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 40708 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-931.1 sec 86.3 KBytes <red>759 bits/sec</red>
</pre></div>
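<p>In case you think that number is a display glitch, it can be cross-checked from the other two columns of the same line. A quick sketch, assuming that iperf counts a KByte as 1024 bytes (<code>iperf_bps</code> is a hypothetical helper name):</p>

```shell
#!/bin/sh
# iperf_bps KBYTES SECONDS - average rate in bits per second
iperf_bps() {
    awk -v kb="$1" -v s="$2" 'BEGIN { printf "%.0f\n", kb * 1024 * 8 / s }'
}

iperf_bps 86.3 931.1   # prints 759
```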
<h3 id="solutions_1">Solutions</h3>
<h4 id="adjusting-the-mtu">Adjusting the MTU</h4>
<p>My first action was to increase the MTU on the physical link between the vagrant boxes (the <em>Host-only</em> adapter between the VirtualBox VMs), and I chose a value high enough to fit the GRE overhead, such as <code>1600</code>. </p>
<div class="row"><pre class="col-md-10">
ubuntu@box-2 ~$ <purple>sudo ip link set dev enp0s8 mtu 1600</purple>
ubuntu@box-2 ~$
ubuntu@box-2 ~$ <blue>sudo ip netns exec right iperf -c 10.0.0.1</blue>
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 40710 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-11.0 sec 5.00 MBytes <red>3.80 Mbits/sec</red>
</pre></div>
<p><strong><em>What? Only <red>3.80 Mbps</red>?</em></strong> That was unexpectedly low! I did not have high expectations, but certainly nothing like that!</p>
<h4 id="tweaking-the-linux-network-stack">Tweaking the Linux Network Stack</h4>
<p>The next thing that I had to do was to tweak the Linux network stack, in particular the fragmentation/segmentation offloading. Normally, the Linux TCP/IP stack is responsible for splitting large UDP/TCP data chunks into packets, which consumes CPU cycles. Features like TSO (<strong>TCP Segmentation Offload</strong>), GSO (<strong>Generic Segmentation Offload</strong>) and others reduce this cost by deferring the segmentation as late as possible, ideally to the NIC itself, and they are mostly enabled by default. Use the <code>ethtool</code> command to display or modify these offload settings: </p>
<div class="row"><pre>
ubuntu@box-2 ~$ <purple>ethtool -k gre_sys</purple>
Features for gre_sys:
rx-checksumming: off [fixed]
tx-checksumming: on
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: on
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: on
<red>tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: on
tx-tcp-mangleid-segmentation: on
tx-tcp6-segmentation: on</red>
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]
</pre></div>
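<p>That is a long list, and only the entries that are <code>on</code> without the <code>[fixed]</code> marker can actually be changed. Here is a small sketch that filters them out of the <code>ethtool -k</code> output; <code>changeable_offloads_on</code> is a hypothetical helper name and it only relies on the <code>name: on</code> line format shown above:</p>

```shell
#!/bin/sh
# changeable_offloads_on - reads `ethtool -k <dev>` output on stdin and prints
# the features that are currently "on" and not marked "[fixed]".
changeable_offloads_on() {
    awk -F': ' '$2 == "on" { sub(/^[ \t]+/, "", $1); print $1 }'
}

# Usage (needs ethtool and the gre_sys interface, so it is left commented out):
#   ethtool -k gre_sys | changeable_offloads_on
```

<p>Each name it prints is a candidate to toggle with <code>ethtool -K</code> while you re-run iperf.</p>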
<p>I played a bit with disabling these offloads (TSO, GSO, GRO, LRO), but the one tweak that worked for my test environment was to disable the TCP Segmentation Offload on the <strong><em>gre_sys</em></strong> interface: </p>
<div class="row"><pre class="col-md-10">
ubuntu@box-2 ~$ <purple>sudo ethtool -K gre_sys tso off</purple>
ubuntu@box-2 ~$
ubuntu@box-2 ~$ <blue>sudo ip netns exec right iperf -c 10.0.0.1</blue>
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 40744 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0-10.0 sec 1.14 GBytes <green>980 Mbits/sec</green>
ubuntu@box-2 ~$
ubuntu@box-2 ~$
ubuntu@box-2 ~$
ubuntu@box-2 ~$ <blue>sudo ip netns exec right iperf -c 10.0.0.1 -t 60 -i 5</blue>
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 50982 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 644 MBytes <green>1.08 Gbits/sec</green>
[ 3] 5.0-10.0 sec 614 MBytes <green>1.03 Gbits/sec</green>
[ 3] 10.0-15.0 sec 590 MBytes 990 Mbits/sec
[ 3] 15.0-20.0 sec 631 MBytes <green>1.06 Gbits/sec</green>
[ 3] 20.0-25.0 sec 603 MBytes 1.01 Gbits/sec
[ 3] 25.0-30.0 sec 608 MBytes 1.02 Gbits/sec
[ 3] 30.0-35.0 sec 613 MBytes 1.03 Gbits/sec
[ 3] 35.0-40.0 sec 612 MBytes 1.03 Gbits/sec
[ 3] 40.0-45.0 sec 609 MBytes 1.02 Gbits/sec
[ 3] 45.0-50.0 sec 579 MBytes 971 Mbits/sec
[ 3] 50.0-55.0 sec 609 MBytes 1.02 Gbits/sec
[ 3] 55.0-60.0 sec 616 MBytes 1.03 Gbits/sec
[ 3] 0.0-60.0 sec 7.16 GBytes 1.02 Gbits/sec
</pre></div>
<p>Much better this time! I was able to hit 1 Gbps! This time I also saw the CPU going close to 100% - the process causing this was "<strong><em>ksoftirqd</em></strong>", the kernel thread that processes soft interrupts when they arrive faster than they can be handled inline, which happens when the system is under heavy load.</p>
<h4 id="disabling-path-mtu-discovery">Disabling Path MTU Discovery</h4>
<p>By default, Path MTU Discovery (PMTUD) is enabled, so the outer IP header has the <blue><strong>Don't Fragment</strong></blue> bit set. Disabling it, so that the outer IP header does not have the DF bit set, is also a possible solution - though not something that I would recommend!</p>
<div class="row"><pre>
<black># on vagrant box-1
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw1 tun0 -- set Interface tun0 <purple>type=gre</purple> options:remote_ip=192.168.56.12</blue> <red>options:df_default=false</red>
<black># on vagrant box-2
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw2 tun0 -- set Interface tun0 <purple>type=gre</purple> options:remote_ip=192.168.56.11</blue> <red>options:df_default=false</red>
</pre></div>
<p>Let's see the results - note the default MTU on the physical <strong>enp0s8</strong> interface between the hypervisors:</p>
<div class="row"><pre>
ubuntu@box-2 ~$ <purple>ip link</purple>
1: lo: &lt;LOOPBACK,UP,LOWER_UP&gt; mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 02:27:53:60:41:f1 brd ff:ff:ff:ff:ff:ff
<blue>3: enp0s8: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; <red>mtu 1500</red> qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:f2:1d:8c brd ff:ff:ff:ff:ff:ff</blue>
4: enp0s9: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
link/ether 08:00:27:9d:a4:ee brd ff:ff:ff:ff:ff:ff
5: sw2-p1@if6: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
link/ether 26:05:35:81:94:24 brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: ovs-system: &lt;BROADCAST,MULTICAST&gt; mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether c2:bd:52:03:08:44 brd ff:ff:ff:ff:ff:ff
8: sw2: &lt;BROADCAST,MULTICAST,UP,LOWER_UP&gt; mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/ether 2e:2b:1e:62:f4:44 brd ff:ff:ff:ff:ff:ff
9: gre0@NONE: &lt;NOARP&gt; mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1
link/gre 0.0.0.0 brd 0.0.0.0
10: gretap0@NONE: &lt;BROADCAST,MULTICAST&gt; mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff
<black># Let's test now (with PMTUD disabled and default 1500 MTU)</black>
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 20 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 50968 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 2.25 MBytes <red>3.77 Mbits/sec</red>
[ 3] 5.0-10.0 sec 1.38 MBytes <red>2.31 Mbits/sec</red>
[ 3] 10.0-15.0 sec 1.88 MBytes <red>3.15 Mbits/sec</red>
^C[ 3] 0.0-18.4 sec 6.38 MBytes <red>2.90 Mbits/sec</red>
<black># Let's disable TSO</black>
ubuntu@box-2 ~$ <purple>sudo ethtool -K gre_sys tso off</purple>
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 60 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[ 3] local 10.0.0.2 port 50974 connected with 10.0.0.1 port 5001
[ ID] Interval Transfer Bandwidth
[ 3] 0.0- 5.0 sec 220 MBytes <blue>370 Mbits/sec</blue>
[ 3] 5.0-10.0 sec 175 MBytes <blue>294 Mbits/sec</blue>
[ 3] 10.0-15.0 sec 194 MBytes <blue>325 Mbits/sec</blue>
[ 3] 15.0-20.0 sec 200 MBytes <blue>336 Mbits/sec</blue>
^C[ 3] 20.0-25.0 sec 193 MBytes <blue>324 Mbits/sec</blue>
</pre></div>
<h3 id="summary-of-iperf-results_1">Summary of iPerf Results</h3>
<p>After finding the tweaks needed to get the best performance, I went back to creating each type of tunnel, as per <a href="/2016/07/07/overlay-tunneling-with-openvswitch-gre-vxlan-geneve-greoipsec/">the instructions in the previous post</a>, and recorded the <em>iperf</em> test results for each one.</p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li>as mentioned in the disclaimer, these tests were performed in a fully virtualized environment, not between physical machines!</li>
<li>I performed only TCP tests</li>
<li>in all of these cases, the performance depends on the CPU - the more cycles you have, the better results you get</li>
<li>the table below contains only the best results</li>
</ul>
</td></tr>
</table>
<p>Here is what I got. These results were obtained between two VirtualBox VMs, each having <purple><strong><em>2x CPUs running @2394 MHz</em></strong></purple> (though the number of CPUs does not seem to matter - this needs more testing):</p>
<!--Basic Table-->
<div class="row"><div class="col-md-12">
<div class="panel panel-orange margin-bottom-20">
<div class="panel-heading">
<i class="fa fa-edit"></i> iPerf Results Overview
</div>
<table class="table table-striped">
<thead>
<tr>
<th>Tunnel Type</th>
<th><blue>MTU 1500</blue> (enp0s8)<br/>offload <red>on</red></th>
<th><blue>MTU 1600</blue> (enp0s8)<br/>offload <red>on</red></th>
<th><blue>MTU 1600</blue> (enp0s8)<br/>offload <red>off</red></th>
</tr>
</thead>
<tbody>
<tr>
<td><b><green>Baseline</green></b></td>
<td>1.80 Gbits/sec</td>
<td>N/A</td>
<td>TSO off (enp0s8)<br/><blue>1.92 Gbits/sec</blue></td>
</tr>
<tr>
<td><b>GRETAP</b></td>
<td>759 bits/sec</td>
<td>3.15 Mbits/sec</td>
<td>TSO off (gre_sys)<br/><blue>1.08 Gbits/sec</blue></td>
</tr>
<tr>
<td><b>VXLAN</b></td>
<td>N/A</td>
<td>1.10 Gbits/sec</td>
<td>UFO off (vxlan_sys_4789)<br/><blue>1.12 Gbits/sec</blue></td>
</tr>
<tr>
<td><b>GENEVE</b></td>
<td>N/A</td>
<td>1.09 Gbits/sec</td>
<td>UFO off (genev_sys_6081)<br/><blue>1.15 Gbits/sec</blue></td>
</tr>
<tr>
<td><b>GREoIPSEC</b></td>
<td>N/A</td>
<td>4.19 Mbits/sec</td>
<td>TSO off (gre_sys)<br/><blue>594 Mbits/sec</blue></td>
</tr>
</tbody>
</table>
</div>
</div></div>
<!--End Basic Table-->
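<p>One way to see why the MTU of the physical interface matters here: each encapsulation adds outer headers in front of the original Ethernet frame. The sketch below simply adds up the usual header sizes. It assumes IPv4 as the outer protocol, no GRE key, no Geneve options and no VLAN tags; the function name <code>required_mtu</code> is made up for this illustration. GREoIPsec is left out because the ESP overhead depends on the negotiated ciphers.</p>
<div class="row">
<pre>
# header sizes in bytes (outer IPv4 without options)
OUTER_IP=20; GRE=4; UDP=8; VXLAN=8; GENEVE=8; INNER_ETH=14

# required_mtu TYPE INNER_MTU
# prints the MTU needed on the physical interface so that an inner
# packet of INNER_MTU bytes fits without fragmentation
required_mtu() {
  case "$1" in
    gretap) echo $(( $2 + INNER_ETH + GRE + OUTER_IP )) ;;
    vxlan)  echo $(( $2 + INNER_ETH + VXLAN + UDP + OUTER_IP )) ;;
    geneve) echo $(( $2 + INNER_ETH + GENEVE + UDP + OUTER_IP )) ;;
    *)      return 1 ;;
  esac
}

for t in gretap vxlan geneve; do
  echo "$t: $(required_mtu "$t" 1500)"
done
</pre></div>
<p>Under these assumptions, GRETAP needs 1538 bytes and VXLAN or Geneve need 1550 bytes, all of which fit within an MTU of 1600. The GRETAP overhead of 38 bytes also matches the 1462 MTU that the kernel assigns to <code>gretap0</code> (1500 - 38).</p>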
<p>That's it! What results did you get? If you run the same tests and want to share the results, leave a comment below (please also specify the CPU).</p>
<p><br>
<em>Thanks for your interest!</em><br/>
<br/></br></p>Overlay Tunneling with Open vSwitch - GRETAP, VXLAN, Geneve, GREoIPsec2016-07-07T11:30:00+01:00Costitag:costiser.ro,2016-07-07:2016/07/07/overlay-tunneling-with-openvswitch-gre-vxlan-geneve-greoipsec/<p><span class="dropcap-bg">B</span>uilding overlay networks using tunnels was always done to achieve connectivity between isolated networks that needed to share the same policies, VLANs or security domains. In particular, they represent a strong use-case in the data center, where tunnels are created between the hypervisors in different locations allowing virtual machines to be provisioned independently from the physical network.<br/>
In this post I am going to present how to build such tunnels between Open vSwitch bridges running on separate machines, thus creating an overlay virtual Layer 2 network on top of the physical Layer 3 one.<br/>
By itself, this article does not bring anything new - there are multiple blogs describing various tunneling protocols. The particularity of this post is that I present multiple encapsulations with packet capture and iperf tests and the fact that, instead of hypervisors and VMs, I am going to use OVS bridges and network namespaces - both of these are extensively used in emerging data center standards and products such as OpenStack or CloudStack.</p>
<p>I encourage you to follow the steps described in this post, perform the same iperf tests (and packet captures, if you want) and share with me the results you've got ! [<em>UPDATE: the iperf tests required some additional tweaking - MTU, TCP Segmentation Offload, etc - so I will present all of those in a separate article</em>] </p>
<p>Before we start I'd like to mention the inspirational articles on this topic from <a href="http://blog.scottlowe.org/2013/05/07/using-gre-tunnels-with-open-vswitch/" target="_blank">Scott Lowe</a> and <a href="http://networkstatic.net/open-vswitch-gre-tunnel-configuration/" target="_blank">Brent Salisbury</a>.</p>
<h2 id="initial-state">Initial State</h2>
<p>This lab is based on the setup explained in <a href="/2016/06/26/my-sdn-testbed/">this post</a> - up to the point of creating the network namespaces. I am using two virtual machines (VirtualBoxes managed via Vagrant) called <strong><blue>vagrant box-1</blue></strong> and <strong><blue>vagrant box-2</blue></strong> connected via <em>Host-Only Adapters</em> (<code>192.168.56.0/24</code> and <code>192.168.57.0/24</code>).</p>
<p>The task is to <purple>achieve Layer 2 connectivity between two network namespaces</purple> (<em>think of VMs in a data center world</em>) created on these two vagrant boxes (<em>think of hypervisors</em>). </p>
<h2 id="overlay-tunnels-using-open-vswitch">Overlay Tunnels using Open vSwitch</h2>
<p>Now we are going to use Open vSwitch commands to create tunnels between the OVS bridges in order to connect the <strong><em>left</em></strong> and <strong><em>right</em></strong> namespaces at Layer 2. Before you proceed, make sure that you are back in the initial state (by rebooting both vagrant boxes).</p>
<p>Below is the diagram describing the <em>target connectivity</em>:</p>
<div align="center"><a href="/uploads/overlay-networks-with-openvswitch-gre-vxlan-greoipsec.png"><img alt="Overlay Networks with Open vSwitch - GRE - VXLAN - Geneve - GREoIPSEC" src="/uploads/overlay-networks-with-openvswitch-gre-vxlan-greoipsec.png" title="Overlay Networks with Open vSwitch - GRE - VXLAN - Geneve - GREoIPSEC" width="85%"/></a></div>
<p>Let's create everything except the tunnels - more info about the setup can be found in <a href="/2016/06/26/my-sdn-testbed/">this post</a>:</p>
<div class="row">
<pre class="col-md-9">
<black># on vagrant box-1
# -----------------</black>
sudo ip netns add left
sudo ip link add name veth1 type veth peer name sw1-p1
sudo ip link set dev veth1 netns left
sudo ip netns exec left ifconfig veth1 10.0.0.1/24 up
sudo ovs-vsctl add-br sw1
sudo ovs-vsctl add-port sw1 sw1-p1
sudo ip link set sw1-p1 up
sudo ip link set sw1 up
<black># on vagrant box-2
# -----------------</black>
sudo ip netns add right
sudo ip link add name veth1 type veth peer name sw2-p1
sudo ip link set dev veth1 netns right
sudo ip netns exec right ifconfig veth1 10.0.0.2/24 up
sudo ovs-vsctl add-br sw2
sudo ovs-vsctl add-port sw2 sw2-p1
sudo ip link set sw2-p1 up
sudo ip link set sw2 up
</pre></div>
<h3 id="gretap">GRETAP</h3>
<p>First encapsulation that we are going to test is GRETAP, which encapsulates the entire Layer 2 frame into a GRE packet. Note that the <em>Protocol Type</em> in the GRE header is <strong>0x6558</strong> - <em>Transparent Ethernet Bridging</em> - which denotes that the payload is the Ethernet frame, as opposed to <strong>0x0800</strong> used in case of carrying Layer 3 IP packets.<br/>
Here are the commands to create the GRE tunnel between the OVS bridges:</p>
<div class="row">
<pre>
<black># on vagrant box-1
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw1 tun0 -- set Interface tun0 <purple>type=gre</purple> options:remote_ip=192.168.56.</blue><red>12</red>
<black># on vagrant box-2
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw2 tun0 -- set Interface tun0 <purple>type=gre</purple> options:remote_ip=192.168.56.</blue><red>11</red>
<black># Test that now there is connectivity between 'left' and 'right'
# ----------------------------------------------------------------</black>
ubuntu@box-1 ~$ sudo ip netns exec left ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.644 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.436 ms
...
</pre></div>
<p>One question you may ask is: <strong><em>how does the tunnel work between OVS switches sw1 and sw2 since the physical interfaces (enp0s8) do not belong to them? It looks like the OVS bridge is not connected to the outside world at all!</em></strong><br/>
The answer is not that obvious, unfortunately. <em><purple>The tunnel is created by the OVS daemon</purple> that runs on each of these boxes/hypervisors</em>, box-1 and box-2, and it uses their networking stack to build it. A very interesting post on this topic was written by Scott Lowe <a href="http://blog.scottlowe.org/2013/05/15/examining-open-vswitch-traffic-patterns/#scenario-3-the-isolated-bridge">here</a>.</p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
You can (and should) always use the commands <code>ovs-vsctl show</code> and <code>ovs-ofctl show &lt;bridge&gt;</code> to check the configuration and status of the bridges and interfaces!
</td></tr>
</table>
<div class="row">
<pre class="col-md-7">root@box-1 ~$ <purple>ovs-vsctl show</purple>
2963c9d3-3069-4edd-88dc-b313da7366de
Bridge "sw1"
<green>Port "tun0"
Interface "tun0"
type: gre
options: {remote_ip="192.168.56.12"}</green>
Port "sw1"
Interface "sw1"
type: internal
Port "sw1-p1"
Interface "sw1-p1"
ovs_version: "2.5.90"
</pre></div>
<p><br/>
Since the GRETAP traffic goes via the physical <strong><em>enp0s8</em></strong> interface, let's run tcpdump on it and dissect the capture with Wireshark - <a href="https://www.cloudshark.org/captures/6aa1b9a2dc12" target="_blank">here you can view the entire packet capture</a>:</p>
<p><a href="/uploads/icmp-inside-gretap.png" title="Packet Capture GRETAP"><img alt="Packet Capture GRETAP" src="/uploads/icmp-inside-gretap.png" title="Packet Capture GRETAP"/></a></p>
<p>Notice that the original Layer 2 frame (containing ICMP/IP - between <strong><em>10.0.0.1</em></strong> and <strong><em>10.0.0.2</em></strong>) is entirely encapsulated into GRE/IP, with outer addresses <strong>192.168.56.11</strong> and <strong>192.168.56.12</strong>.<br/>
<br/></p>
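<p>For reference, a capture like the one above can be taken with something along the following lines. The output file name is arbitrary, and the <code>ip proto 47</code> filter restricts the capture to GRE packets.</p>
<div class="row">
<pre>
<black># capture GRE (IP protocol 47) on the physical interface
# and save it to a file that can be opened in Wireshark</black>
sudo tcpdump -ni enp0s8 -w gretap.pcap ip proto 47
</pre></div>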
<h3 id="vxlan">VXLAN</h3>
<p>Before you continue, make sure that you delete the GRE interface created in the previous section:<br/>
<code>sudo ovs-vsctl del-port tun0</code> <br/>
The second tunneling protocol to be tested is VXLAN, a technique that encapsulates Layer 2 frames within Layer 4 UDP packets, using the destination <strong>UDP port 4789</strong>.<br/>
Using the same idea as above, with GRE, I will add a new port, of <code>type vxlan</code>, to the OVS bridge, specify the remote endpoint IP and an <em>optional</em> key.</p>
<div class="row">
<pre>
<black># on vagrant box-1
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw1 tun0 -- set interface tun0 <purple>type=vxlan</purple> options:remote_ip=192.168.56.<red>12</red> options:<purple>key=123</purple></blue>
<black># on vagrant box-2
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw2 tun0 -- set interface tun0 <purple>type=vxlan</purple> options:remote_ip=192.168.56.<red>11</red> options:<purple>key=123</purple>
</blue></pre></div>
<p>Here is what the communication between internal VMs <strong>10.0.0.1</strong> and <strong>10.0.0.2</strong> looks like - now encapsulated in UDP 4789: </p>
<p><a href="/uploads/vxlan.png" title="VXLAN between two OVS bridges"><img alt="VXLAN between two Open vSwitch bridges" src="/uploads/vxlan.png" title="VXLAN between two Open vSwitch bridges"/></a></p>
<p>Notice that <code>key=123</code> is used as the VXLAN Network Identifier (<strong>VNI</strong>). You can view <a href="https://www.cloudshark.org/captures/670aeb7bad79" target="_blank">the entire packet capture here</a>.<br/>
<br/></p>
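<p>The key can also be checked on the OVS side, without decoding a packet capture. A small sketch, assuming the port is still called <code>tun0</code> and the physical interface is <code>enp0s8</code>:</p>
<div class="row">
<pre>
<black># print the options map of the tunnel interface (remote_ip and key)</black>
sudo ovs-vsctl get Interface tun0 options
<black># watch only the VXLAN traffic on the physical interface</black>
sudo tcpdump -ni enp0s8 udp port 4789
</pre></div>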
<h3 id="geneve">Geneve</h3>
<p>If you followed this post, before testing Geneve, make sure you delete the previous VXLAN tunnel:<br/>
<code>sudo ovs-vsctl del-port tun0</code><br/>
The next encapsulation to be presented is <strong><em>Geneve</em></strong>, a tunneling technique with a flexible format that allows metadata to be carried inside <em>Variable Length Options</em> and provides service chaining (think firewall, load balancing, etc). The Geneve header resembles an IPv6 header, with <strong><em>basic fixed-length fields</em></strong> and <strong><em>extension headers</em></strong> used to enable different functions.<br/>
Let's have a look at its configuration and packet capture: </p>
<div class="row">
<pre>
<black># on vagrant box-1
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw1 tun0 -- set interface tun0 <purple>type=geneve</purple> options:remote_ip=192.168.56.<red>12</red> options:<purple>key=123</purple></blue>
<black># on vagrant box-2
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw2 tun0 -- set interface tun0 <purple>type=geneve</purple> options:remote_ip=192.168.56.<red>11</red> options:<purple>key=123</purple>
</blue></pre></div>
<p>Here is <a href="https://www.cloudshark.org/captures/ba56581a5845" target="_blank">the full packet capture</a> - unfortunately, the CloudShark provider, where I store these captures, does not have a dissector for Geneve traffic, but Wireshark does (see image below):</p>
<p><a href="/uploads/geneve.png" title="Geneve between two OVS bridges"><img alt="Geneve between two Open vSwitch bridges" src="/uploads/geneve.png" title="GENEVE between two Open vSwitch bridges"/></a> </p>
<p><br/></p>
<h3 id="greoipsec">GREoIPsec</h3>
<p>Again, if you followed along, delete the previously created Geneve tunnel on both vagrant boxes:<br/>
<code>sudo ovs-vsctl del-port tun0</code><br/>
<strong><em>GREoIPsec</em></strong> does not need any introduction, so let's do the configuration. Note, though, that again here, the GRE payload is the Ethernet frame (same as with the GRETAP example presented above): </p>
<div class="row">
<pre>
<black># on vagrant box-1
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw1 tun0 -- set interface tun0 <purple>type=ipsec_gre</purple> options:remote_ip=192.168.56.<red>12</red> options:<purple>psk=test123</purple></blue>
<black># on vagrant box-2
# -----------------</black>
<blue>sudo ovs-vsctl add-port sw2 tun0 -- set interface tun0 <purple>type=ipsec_gre</purple> options:remote_ip=192.168.56.<red>11</red> options:<purple>psk=test123</purple></blue>
<black># test connectivity between 'left' and 'right' namespaces
# notice the high times reported for the first pings
# needed for the IPsec tunnel to get established</black>
ubuntu@box-1 ~$ sudo ip netns exec left ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 <blue>time=2000 ms</blue>
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 <blue>time=1001 ms</blue>
64 bytes from 10.0.0.2: icmp_seq=3 ttl=64 time=2.14 ms
64 bytes from 10.0.0.2: icmp_seq=4 ttl=64 time=0.793 ms
...
</pre></div>
<p>If you are curious how this works: OVS uses the <strong>racoon</strong> package to build and manage the IPsec tunnel. Have a look at the racoon config files <code>/etc/racoon/racoon.conf</code> and <code>/etc/racoon/psk.txt</code> - you will notice that the configuration is automatically generated by Open vSwitch.</p>
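<p>Once the first pings have gone through, the resulting IPsec state can also be inspected from the kernel side with <code>setkey</code>, which is part of the <code>ipsec-tools</code> package. This is just a quick way to look around; the exact output depends on what racoon negotiated.</p>
<div class="row">
<pre>
<black># dump the Security Associations (SAD)</black>
sudo setkey -D
<black># dump the Security Policies (SPD)</black>
sudo setkey -DP
</pre></div>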
<p><a href="https://www.cloudshark.org/captures/1d3b907580eb" target="_blank">Here is the full packet capture</a>, but of course, as it's IPsec, you will only see the outer IP header (<strong>192.168.56.11</strong> and <strong>192.168.56.12</strong>), while the payload (GRE/ETH/IP/ICMP) is encrypted and you only see ESP information.<br/>
But if you use Wireshark, you can provide the keys and it will decrypt it for you - see below:</p>
<p><a href="/uploads/greoipsec.png" title="GREoIPSEC between two OVS bridges"><img alt="GREoIPSEC between two Open vSwitch bridges" src="/uploads/greoipsec.png" title="GREoIPSEC between two Open vSwitch bridges"/></a></p>
<p>If you are curious how Wireshark decrypted it, I will present that in a separate blog post!</p>
<p>Since this post became very long, I decided to leave the iperf tests for a separate article, also because you will have to deal with MTU issues and TCP Segmentation Offload (tso) - it will be better to explain all of that in a separate post! </p>
<p><br/>
<em>Thanks for your interest! Stay tuned for the follow-up articles on this topic!</em></p>
<p><br/></p>My SDN Testbed2016-06-26T18:30:00+01:00Costitag:costiser.ro,2016-06-26:2016/06/26/my-sdn-testbed/<p><span class="dropcap-bg">O</span>ver the next few articles, I will write about OpenFlow, Open vSwitch and other SDN related topics. As always, I'm combining the theory part with some hands-on practice and for this, I put this article together describing one way of building such a testing environment.</p>
<p>In the subsequent SDN articles, I want to focus on the topic being discussed and not on how to build the lab and this is the reason of writing this post. </p>
<p>Among other things, SDN means flexibility - as such, there are multiple ways of creating your own SDN lab, some simpler than others, depending on your skills. <em>At the heart of an SDN testbed is <strong>virtualization</strong></em> - you virtualize almost everything: hosts, network links, switches, routers, etc. Most of the virtualization techniques are based on Linux - for example: isolation using namespaces (especially network namespaces), Linux bridging, virtual ethernet pairs (veth), OS-level virtualization vs. lightweight containers - all these are topics that will help you get the full picture. I won't spend time explaining them since there are plenty of resources out there on the big wide web.</p>
<div class="row"><div class="col-md-12">
<div class="panel panel-orange">
<div class="panel-heading"><i class="fa fa-binoculars"></i> Note:</div>
<div class="panel-body">
By far, the simplest way to get yourself an SDN lab is to <a href="http://mininet.org/download/" target="_blank">download the Mininet VM image</a>. You may as well follow their <a href="http://mininet.org/walkthrough/" target="_blank">Mininet Walkthrough</a> if you haven't done that already!
</div>
</div>
</div></div>
<p>Again, I want to emphasize that this is just an example of a lab, just one way of testing ! There are many other ways to achieve the same things - using LXC, QEMU, KVM, just to name a few. I'm not going to reason <em>why</em> I use this instead of something else - it all comes down to your personal preference.</p>
<p>If you already have your favorite virtual environment and you are interested only in the OVS testbed, then please jump directly to <a href="#open-vswitch-setup_1">Open vSwitch Setup</a> section. </p>
<h3 id="components">Components</h3>
<p>Here is a list of the components used to build the lab environment: </p>
<ul>
<li><a href="https://www.virtualbox.org/wiki/Downloads"><strong><green>VirtualBox</green></strong></a> - used to create different virtual machines</li>
<li><a href="https://www.vagrantup.com/downloads.html"><strong><green>Vagrant</green></strong></a> - used to manage the VMs in a fast and reproducible fashion (thus making sure that everyone uses the same environment, avoiding troubleshooting VirtualBox issues and focusing entirely on the topic being tested)</li>
<li><strong>Network Namespaces</strong> - used to isolate (or partition) the network interfaces and, as a result, the routing tables, much like VRFs do. We are going to use them to simulate test hosts.</li>
<li><strong>veth pairs</strong> - Virtual Ethernet interfaces that are used to connect the virtual elements (hosts, software switches) to each other. <em>veth</em> interfaces always come in pairs - think of a pair as a pipe: whatever goes in one end comes out the other.</li>
<li><strong>X11 server</strong> - used for X forwarding. Depending on your OS, you may want to install <strong>Xming</strong> (if you're on Windows) or <strong>XQuartz</strong> (if you're on MacOS)</li>
</ul>
<p>Please follow the links above to install VirtualBox and Vagrant. </p>
<h3 id="setup">Setup</h3>
<h4 id="virtualbox">VirtualBox</h4>
<p>Once you have downloaded and installed VirtualBox, let's create a couple of <em>Host-Only</em> network adapters. To do so, open <purple>VirtualBox Preferences</purple>, then navigate to <purple>Network</purple> -> <purple>Host-Only Networks</purple> and add: </p>
<ul>
<li><code>vboxnet0</code> - IPv4 address: <strong>192.168.56.1/24</strong></li>
<li><code>vboxnet1</code> - IPv4 address: <strong>192.168.57.1/24</strong></li>
</ul>
<h4 id="vagrant">Vagrant</h4>
<p>We are going to use <strong>Vagrant</strong> in order to manage the VirtualBox VMs. The Operating System of the VMs will be the latest LTS version of Ubuntu (currently, Xenial - 16.04) - this is how we tell vagrant to add such a box to its inventory:<br/>
<code>vagrant box add ubuntu/xenial64</code>.<br/>
Personally I keep all my vagrant boxes in a folder called <em>vagrantwork</em>: <code>mkdir -p ~/vagrantwork</code>. </p>
<p>After this point, the workflow is pretty standard - <red><em>every time you want to create a new virtual machine, you follow the steps below</em></red>: </p>
<ul>
<li>
<p>create a new directory for each VM (I call them <strong>box1</strong>, <strong>box2</strong>, ...): <code>mkdir -p ~/vagrantwork/box1</code></p>
</li>
<li>
<p>create an init vagrant file inside that directory:
<div class="row">
<pre>cd ~/vagrantwork/box1
cat >Vagrantfile <<EOF
Vagrant.configure(2) do |config|
config.vm.box = "ubuntu/xenial64"
config.vm.box_check_update = false
config.vm.network "private_network", ip: "192.168.56.<purple>11</purple>"
config.vm.network "private_network", ip: "192.168.57.<purple>11</purple>"
config.vm.provider "virtualbox" do |vb|
vb.name = "vagrant_<purple>box1</purple>"
end
config.ssh.forward_x11 = true
end
EOF</pre></div>
In case you create multiple VMs, I highlighted what you need to change - for example, a second VM, stored in its own <code>~/vagrantwork/box2</code> folder, will get IP addresses <strong>192.168.56.<red>12</red></strong> and <strong>192.168.57.<red>12</red></strong> and the name <code>vagrant_box2</code>. </p>
</li>
<li>
<p>bring up your box (make sure you are in the right folder):<br/>
<div class="row">
<pre class="col-md-7">cd ~/vagrantwork/box1
<purple>vagrant up</purple></pre></div></p>
</li>
<li>
<p>That's it! Now connect to your VM:
<div class="row">
<pre class="col-md-7">cd ~/vagrantwork/box1
<purple>vagrant ssh</purple></pre></div></p>
</li>
<li>
<p>(extra) run <code>sudo apt-get update; sudo apt-get upgrade</code> </p>
</li>
<li>
<p>(extra) set a meaningful hostname - for example, set the name "<red><strong>box-1</strong></red>" in the <code>/etc/hostname</code> and <code>/etc/hosts</code> files. </p>
</li>
</ul>
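<p>The per-VM steps above can also be wrapped in a small helper. This is only a sketch: the function name <code>make_box</code> and its optional second argument are made up for this illustration, and it assumes the same box image, directory layout and addressing scheme as above (box N gets the addresses ending in 10+N).</p>
<div class="row">
<pre>
# make_box N [BASEDIR]
# creates BASEDIR/boxN (default: ~/vagrantwork/boxN) and writes a
# Vagrantfile for VM number N into it
make_box() {
  n="$1"
  dir="${2:-$HOME/vagrantwork}/box$n"
  last=$(( 10 + n ))
  mkdir -p "$dir"
  printf '%s\n' \
    'Vagrant.configure(2) do |config|' \
    '  config.vm.box = "ubuntu/xenial64"' \
    '  config.vm.box_check_update = false' \
    "  config.vm.network \"private_network\", ip: \"192.168.56.$last\"" \
    "  config.vm.network \"private_network\", ip: \"192.168.57.$last\"" \
    '  config.vm.provider "virtualbox" do |vb|' \
    "    vb.name = \"vagrant_box$n\"" \
    '  end' \
    '  config.ssh.forward_x11 = true' \
    'end' > "$dir/Vagrantfile"
}
</pre></div>
<p>For example, <code>make_box 2</code> would create <code>~/vagrantwork/box2/Vagrantfile</code> with the <strong>.12</strong> addresses, after which <code>vagrant up</code> can be run from that folder as usual.</p>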
<p>Read more about Vagrant on <a href="https://www.vagrantup.com/docs/cli/">their documentation page</a>. Some other very useful vagrant commands:</p>
<ul>
<li><code>vagrant status</code> = displays the state of the machine</li>
<li><code>vagrant box list</code> = lists all boxes installed into Vagrant</li>
<li><code>vagrant snapshot [ save | list | delete ]</code> = manage snapshots</li>
<li><code>vagrant [ suspend | resume ]</code> = suspends (and resumes) the guest machine rather than fully shutting it down</li>
<li><code>vagrant destroy</code> = destroy the machine</li>
</ul>
<h3 id="open-vswitch-setup_1">Open vSwitch Setup</h3>
<p>Here you also have the option of going the <em>easy way</em> by installing Mininet and everything that comes with it (I recommend installing from sources) - <em><red>Don't do this (read further) !</red></em>: </p>
<div class="row">
<pre class="col-md-10">git clone https://github.com/mininet/mininet
cd mininet
util/install.sh -nfv <em><black># use '-h' (help) to understand what you are installing</black></em></pre></div>
<p>Or, another simple option is to install the Ubuntu packages ( <em><red> Again, don't do this just yet (read further) !</red></em> ): <code>sudo apt-get install openvswitch-switch openvswitch-common</code></p>
<p>But if you want a setup identical to mine (<em>in case you want to recreate the labs that I'm going to present in future posts</em>), then <blue><em>follow the instructions below</em></blue> - it may seem complicated, but you'll learn something new! In summary, the goal is to install the latest Linux kernel version and the latest version of Open vSwitch, because these are the ones that contain a lot of interesting features.</p>
<h4 id="install-latest-kernel">Install Latest Kernel</h4>
<p>Lately, my tests required me to compile the kernel from sources, and although the process is not as painful as it once was (read: 10 years ago), it is still time consuming and error prone. I found that installing the kernel using <strong>.deb</strong> packages from the <blue><em>kernel-ppa/mainline</em></blue> is a much more straightforward process. This is how it is done:</p>
<div class="row">
<pre><black># Navigate to http://kernel.ubuntu.com/~kernel-ppa/mainline/
# scroll down to the very bottom (where the latest kernel version is)
# then download these 3 .deb files:
# linux-headers-...-&lt;your_arch&gt;.deb
# linux-headers-...-all.deb
# linux-image-...-&lt;your_arch&gt;.deb</black>
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7-rc6-yakkety/linux-headers-4.7.0-040700rc6-generic_4.7.0-040700rc6.201607040332_amd64.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7-rc6-yakkety/linux-headers-4.7.0-040700rc6_4.7.0-040700rc6.201607040332_all.deb
wget http://kernel.ubuntu.com/~kernel-ppa/mainline/v4.7-rc6-yakkety/linux-image-4.7.0-040700rc6-generic_4.7.0-040700rc6.201607040332_amd64.deb
<black># then install them and reboot</black>
sudo dpkg -i *.deb
sudo reboot
</pre></div>
<h4 id="install-latest-open-vswitch">Install Latest Open vSwitch</h4>
<p>Now I install Open vSwitch by building the Debian packages from the latest version (master) Git tree:</p>
<div class="row">
<pre>
<black># Get the latest version (master)</black>
mkdir -p ~/ovs-master; cd ~/ovs-master
git clone https://github.com/openvswitch/ovs.git
cd ovs
<black># Install necessary packages for the build
# use command 'dpkg-checkbuilddeps' to find the list of packages that you need to install</black>
sudo apt-get install build-essential fakeroot
sudo apt-get install pkg-config graphviz autoconf automake debhelper dh-autoreconf libssl-dev libtool python-all python-qt4 python-twisted-conch python-zopeinterface
<black># Now build the packages
# Start this process only after the command 'dpkg-checkbuilddeps' reports nothing missing
# This step takes about 15 min</black>
DEB_BUILD_OPTIONS='parallel=8' fakeroot debian/rules binary
<black># Now install the packages,
# but first install some required dependencies</black>
sudo apt-get install ipsec-tools racoon
cd ..
<blue>sudo dpkg -i openvswitch-common*.deb openvswitch-switch*.deb openvswitch-ipsec*.deb openvswitch-pki*.deb python-openvswitch*.deb openvswitch-testcontroller*.deb</blue>
</pre></div>
<p>Let's test that everything is ok (of course, your version may be higher than the one displayed below, since you are installing the latest stuff): </p>
<div class="row">
<pre class="col-md-10">ubuntu@box-1 ~$ <purple>sudo ovs-vsctl show</purple>
cae0df42-9c95-4af0-a4b9-f10386ed7bee
<blue>ovs_version: "2.5.90"</blue>
</pre></div>
<h3 id="leveraging-linux-to-create-an-sdn-testbed_1">Leveraging Linux to Create an SDN Testbed</h3>
<p>Again, using Mininet is, for sure, the easy option to create a testbed. Run <code>sudo mn</code> and you'll get the default OVS switch with two hosts connected to it.</p>
<p>Instead, I will manually create each element - thus learning some new stuff that will help later on ! Below is the diagram showing the target that we want to reach: <strong><em>2 hosts connected via veth pairs to an OVS switch</em></strong>.</p>
<p><a href="/uploads/sdn-network-namespace-veth-ovs-base.png">
<img alt="SDN Network Namespaces - veth - OVS" src="/uploads/sdn-network-namespace-veth-ovs-base.png" width="60%"/>
</a></p>
<h4 id="creating-hosts-using-network-namespaces">Creating Hosts using Network Namespaces</h4>
<p>We will use network namespaces in order to simulate hosts:</p>
<div class="row">
<pre class="col-md-8">sudo ip netns add <purple>h1</purple>
sudo ip netns add <purple>h2</purple>
</pre></div>
<h4 id="creating-an-ovs-switch">Creating an OVS Switch</h4>
<p>Let's create a virtual OVS switch named <strong>sw1</strong>:</p>
<div class="row">
<pre class="col-md-8">sudo ovs-vsctl add-br <purple>sw1</purple>
</pre></div>
<h4 id="creating-network-connectivity-using-veth-pairs">Creating Network Connectivity using veth pairs</h4>
<p>We will need to create veth links in order to achieve network connectivity:</p>
<div class="row">
<pre class="col-md-8">sudo ip link add name <purple>h1-eth1</purple> type veth peer name <purple>sw1-port1</purple>
sudo ip link add name <purple>h2-eth1</purple> type veth peer name <purple>sw1-port2</purple>
</pre></div>
<p>Now we will connect each host to the switch using the veth pairs created above. One end of the veth pair (<strong><em>hX-eth1</em></strong>) will be attached to the network namespace while the other end (<strong><em>sw1-portX</em></strong>) will be added to the switch.</p>
<div class="row">
<pre class="col-md-8"><black># for h1:</black>
sudo ip link set dev h1-eth1 netns h1
sudo ovs-vsctl add-port sw1 sw1-port1
<black># for h2:</black>
sudo ip link set dev h2-eth1 netns h2
sudo ovs-vsctl add-port sw1 sw1-port2
</pre></div>
<h4 id="configure-ip-addresses-and-bring-interfaces-up">Configure IP addresses and Bring Interfaces Up</h4>
<p>By default, all these veth interfaces are in DOWN state and unconfigured. Since one end is assigned to the network namespace, all the configuration commands for that end need to be executed <strong><em>inside</em></strong> the namespace, with the command <code>sudo ip netns exec &lt;host&gt; &lt;command&gt;</code>.</p>
<p>Let's see:</p>
<div class="row">
<pre class="col-md-8">
<black># for h1:</black>
sudo ip netns exec h1 ifconfig h1-eth1 10.0.0.1/24 up
<black># for h2:</black>
sudo ip netns exec h2 ifconfig h2-eth1 10.0.0.2/24 up
</pre></div>
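<p>On systems where <code>ifconfig</code> (from the <code>net-tools</code> package) is not installed, the same result can be obtained with <code>ip</code> alone. The commands below are an equivalent sketch for <strong>h1</strong>, to be used instead of the <code>ifconfig</code> line, not in addition to it; repeat them with <code>h2-eth1</code> and <code>10.0.0.2/24</code> for <strong>h2</strong>.</p>
<div class="row">
<pre class="col-md-8">
<black># assign the address and bring the interface up inside the namespace</black>
sudo ip netns exec h1 ip addr add 10.0.0.1/24 dev h1-eth1
sudo ip netns exec h1 ip link set dev h1-eth1 up
<black># verify the result</black>
sudo ip netns exec h1 ip addr show dev h1-eth1
</pre></div>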
<p>The other end of each veth pair, the one connected to the virtual switch, just needs to be brought up - using standard Linux commands:</p>
<div class="row">
<pre class="col-md-8">sudo ip link set sw1-port1 up
sudo ip link set sw1-port2 up
</pre></div>
<h4 id="test-connectivity-between-hosts">Test Connectivity between hosts</h4>
<p>At this moment you should have a working lab environment with connectivity between <strong>h1</strong> (<green>10.0.0.1</green>) and <strong>h2</strong> (<green>10.0.0.2</green>) namespaces:</p>
<div class="row">
<pre>ubuntu@box-1 ~$ <purple>sudo ip netns exec h1 ping 10.0.0.2</purple>
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.334 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.059 ms
^C
--- 10.0.0.2 ping statistics ---
2 packets transmitted, 2 received, <green>0% packet loss</green>, time 999ms
rtt min/avg/max/mdev = 0.059/0.196/0.334/0.138 ms
ubuntu@box-1 ~$
</pre></div>
<p>You may ask <red><em>why does it work, since we did not add any flows ?</em></red><br/>
The answer is: <strong><em>the default behaviour of the OVS switch</em></strong> is that of a traditional layer 2 switch. You can see this by inspecting the flow table with <code>ovs-ofctl dump-flows</code> command:</p>
<div class="row">
<pre>ubuntu@box-1 ~$ <purple>sudo ovs-ofctl dump-flows sw1</purple>
NXST_FLOW reply (xid=0x4):
cookie=0x0, duration=1818.260s, table=0, n_packets=44, n_bytes=3704, idle_age=4, priority=0 <blue>actions=NORMAL</blue>
</pre></div>
<p>By default, there is this single flow that has no match fields (so basically it matches everything) and <code>actions=NORMAL</code>, which means that the switch uses traditional, non-OpenFlow Layer 2 switching.</p>
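<p>To make "traditional Layer 2 switching" concrete, here is a minimal Python sketch of a MAC learning switch. It is only an illustration, not how Open vSwitch implements <code>NORMAL</code> (which also handles VLANs, ageing and more); the <code>LearningSwitch</code> class is a hypothetical helper, while the port names and MAC addresses are taken from the outputs in this post.</p>

```python
class LearningSwitch:
    """Toy model of a traditional Layer 2 learning switch (hypothetical helper)."""

    def __init__(self, ports):
        self.ports = list(ports)
        self.mac_table = {}  # maps a source MAC to the port where it was last seen

    def receive(self, in_port, src_mac, dst_mac):
        """Return the list of ports the frame is sent out of."""
        # Learn: remember which port the source MAC lives behind.
        self.mac_table[src_mac] = in_port
        out_port = self.mac_table.get(dst_mac)
        if out_port is None:
            # Unknown (or broadcast) destination: flood to all other ports.
            return [p for p in self.ports if p != in_port]
        if out_port == in_port:
            return []  # destination sits behind the ingress port, nothing to forward
        return [out_port]


sw1 = LearningSwitch(["sw1-port1", "sw1-port2"])
# h1's ARP request is a broadcast, so it is flooded.
print(sw1.receive("sw1-port1", "7e:be:1f:55:e7:fb", "ff:ff:ff:ff:ff:ff"))
# h2's ARP reply is unicast to a MAC address the switch has already learned.
print(sw1.receive("sw1-port2", "4a:a6:8d:0b:27:d3", "7e:be:1f:55:e7:fb"))
```

<p>The first call floods the frame out of <code>sw1-port2</code>, the second one forwards it only to <code>sw1-port1</code>. This is the same learn-and-forward behaviour that lets the ping succeed without any manually added flows.</p>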
<h3 id="troubleshooting_1">Troubleshooting</h3>
<p>If you have got this far but for some reason the two hosts cannot ping each other, you can follow these troubleshooting steps to check what's wrong:</p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTE</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
The terms bridge and switch define the same thing and are used interchangeably !
</td></tr>
</table>
<p><br/></p>
<ul>
<li>
<p>check the status of the OVS bridge (that it contains the correct ports)
<div class="row">
<pre>ubuntu@box-1 ~$ sudo ovs-vsctl show
99804df1-b2f3-4e33-b1a4-59a7c6f260db
Bridge "sw1"
<green>Port "sw1-port2"
Interface "sw1-port2"
Port "sw1-port1"
Interface "sw1-port1"</green>
Port "sw1"
Interface "sw1"
type: internal
ovs_version: "2.5.0"
</pre></div></p>
</li>
<li>
<p>check that interfaces are UP on the switch:
<div class="row">
<pre>ubuntu@box-1 ~$ sudo ovs-ofctl show sw1
OFPT_FEATURES_REPLY (xid=0x2): dpid:0000f66282ccb549
n_tables:254, n_buffers:256
capabilities: FLOW_STATS TABLE_STATS PORT_STATS QUEUE_STATS ARP_MATCH_IP
actions: output enqueue set_vlan_vid set_vlan_pcp strip_vlan mod_dl_src mod_dl_dst mod_nw_src mod_nw_dst mod_nw_tos mod_tp_src mod_tp_dst
<green>1(sw1-port1): addr:fe:2f:14:90:d7:c9
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max
2(sw1-port2): addr:ca:a4:43:30:6a:06
config: 0
state: 0
current: 10GB-FD COPPER
speed: 10000 Mbps now, 0 Mbps max</green>
LOCAL(sw1): addr:f6:62:82:cc:b5:49
config: PORT_DOWN
state: LINK_DOWN
speed: 0 Mbps now, 0 Mbps max
OFPT_GET_CONFIG_REPLY (xid=0x4): frags=normal miss_send_len=0
</pre></div>
You don't have to worry about the LOCAL(sw1) port showing as DOWN.</p>
</li>
<li>
<p>check that hosts (network namespaces) have network interfaces configured and UP:
<div class="row">
<pre>ubuntu@box-1 ~$ <purple>sudo ip netns exec h1 ifconfig</purple>
h1-eth1 Link encap:Ethernet HWaddr 7e:be:1f:55:e7:fb
<blue>inet addr:10.0.0.1 Bcast:10.0.0.255 Mask:255.255.255.0</blue>
inet6 addr: fe80::7cbe:1fff:fe55:e7fb/64 Scope:Link
<blue>UP</blue> BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
<green>RX packets:41</green> errors:0 dropped:0 overruns:0 frame:0
<green>TX packets:33</green> errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3410 (3.4 KB) TX bytes:2762 (2.7 KB)
</pre></div></p>
</li>
</ul>
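<p>If you repeat the port check often, it can be scripted. The sketch below parses text in the format of the <code>ovs-ofctl show</code> output above and reports the ports whose state is <code>LINK_DOWN</code>. The function name <code>port_states</code> and the regular expressions are my own assumptions based only on that sample output, so other OVS versions may need adjustments.</p>

```python
import re

# Matches lines such as "1(sw1-port1): addr:fe:2f:..." or "LOCAL(sw1): addr:...".
PORT_RE = re.compile(r"^\s*\w+\(([^)]+)\):\s+addr:")
# Matches lines such as "state: 0" or "state: LINK_DOWN".
STATE_RE = re.compile(r"^\s*state:\s+(.*)$")


def port_states(ofctl_show_output):
    """Map each port name to its 'state:' value found in `ovs-ofctl show` text."""
    states = {}
    current = None
    for line in ofctl_show_output.splitlines():
        port = PORT_RE.match(line)
        if port:
            current = port.group(1)
            continue
        state = STATE_RE.match(line)
        if state and current is not None:
            states[current] = state.group(1).strip()
            current = None
    return states


sample = """\
1(sw1-port1): addr:fe:2f:14:90:d7:c9
config: 0
state: 0
2(sw1-port2): addr:ca:a4:43:30:6a:06
config: 0
state: 0
LOCAL(sw1): addr:f6:62:82:cc:b5:49
config: PORT_DOWN
state: LINK_DOWN
"""

states = port_states(sample)
down = [name for name, state in states.items() if "LINK_DOWN" in state]
print("ports that are down:", down)
```

<p>With the sample above, only the <code>sw1</code> LOCAL port is reported, which (as mentioned) is nothing to worry about. To feed it live data, you could pass in the text returned by <code>subprocess.run(["sudo", "ovs-ofctl", "show", "sw1"], capture_output=True, text=True).stdout</code>.</p>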
<h4 id="capture-traffic">Capture Traffic</h4>
<p>As I like to say, this environment is the "<blue><em>heaven for a network engineer</em></blue>" because you can capture traffic at any point you want. Use the commands in the diagram below to run tcpdump in different parts of your network. </p>
<p><a href="/uploads/sdn-network-namespace-veth-ovs-tcpdump.png"><img alt="sdn-network-namespace-veth-ovs-tcpdump" src="/uploads/sdn-network-namespace-veth-ovs-tcpdump.png"/></a></p>
<p>You may need to open multiple windows - in the first window you leave a continuous ping running on h1 (<code>sudo ip netns exec h1 ping 10.0.0.2</code>) and in the other windows you perform packet capture. You should see traffic leaving <strong>h1-eth1</strong>, reaching <strong>sw1-port1</strong>, then outgoing on <strong>sw1-port2</strong> and finally reaching <strong>h2-eth1</strong>. </p>
<p>Here is an example of capture performed on <strong>sw1-port1</strong>:</p>
<div class="row">
<pre>ubuntu@box-1 ~$ <purple>sudo tcpdump -nli sw1-port1</purple>
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on sw1-port1, link-type EN10MB (Ethernet), capture size 262144 bytes
04:40:00.861168 <green>ARP, Request</green> who-has 10.0.0.2 tell 10.0.0.1, length 28
04:40:00.861411 <green>ARP, Reply</green> 10.0.0.2 is-at 4a:a6:8d:0b:27:d3, length 28
04:40:00.861417 IP 10.0.0.1 > 10.0.0.2: <green>ICMP echo request</green>, id 2551, seq 1, length 64
04:40:00.861512 IP 10.0.0.2 > 10.0.0.1: <green>ICMP echo reply</green>, id 2551, seq 1, length 64
</pre></div>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
Please note that some of the configuration (network namespaces - netns - and virtual ethernet - veth) <red>will be lost</red> upon reboot of your vagrant box. If you want to keep them, you need to add all these commands into the <code><blue>/etc/rc.local</blue></code> file!<br>
The OVS sw1 configuration will survive the reboot since it is stored in the OVSDB server !
</br></div>
</div>
<p>Now you are ready to start some SDN labs. I'll post them soon. </p>
<p><br/></p>Website Reborn and Migrated to Pelican2016-02-04T21:00:00+00:00Costitag:costiser.ro,2016-02-04:2016/02/04/website-migrated-to-pelican/<p><span class="dropcap-bg">H</span><strong><em>ello and Welcome to CostiSer.Ro in 2016 !!</em></strong></p>
<p>After a long period of inactivity, I decided to resume my on-line activity in the blogging space.<br/>
Being a perfectionist, I was not happy with the Wordpress platform mostly because of the continuous upgrades that I was supposed to perform. <em>Don't get me wrong</em>, Wordpress is a very powerful platform and I used it for a very long time with great success. </p>
<p>Another very important driver for migration was the fact that I wanted something based on Python - as this is something that I'm using on a daily basis in my work. </p>
<p>Last, but not least, I wanted a new HTML template that I would have full control of, that I could tailor the way I wanted to and, also very important, that would be responsive and mobile friendly.</p>
<h3 id="the-journey">The Journey</h3>
<p>Finding the right platform for my needs (which I won't list now) was not easy. I started with <a href="http://mezzanine.jupo.org/">Mezzanine</a>, a very powerful content management platform based on the Django framework that resembles Wordpress in some ways. </p>
<p>While playing with it, I started thinking more and more about static site generators such as <a href="http://blog.getpelican.com/">Pelican</a> - hmmm, maybe I was (a bit ☺) influenced by my friend <a href="http://www.trueneutral.eu/">@trueneutral.eu</a>. I gave it a try and that was it ! </p>
<p><em>One of the most painful (and time consuming) actions</em> was the migration of each article from Wordpress to Markdown. Yes, there are tools out there to do that and I used them, but I had loads of manual HTML code that I used in each Wordpress article (how stupid of me!). </p>
<p>Also, I created a new theme for Pelican based on an HTML template that I bought on-line. </p>
<h3 id="the-result">The result</h3>
<p>The result is what you are browsing right now ! I am very pleased with it ! <br/>
But that is not all: migrating to a static site decreased the response time by half, as seen in the graph below: </p>
<p><img alt="Response Times for CostiSer.Ro after migrating to Pelican" src="/uploads/response-times-costiser-ro.png" title="Response Times for CostiSer.Ro after migrating to Pelican"/></p>
<h3 id="what-else-did-i-do-in-2015">What else did I do in 2015 ?</h3>
<p>Heh, not much blogging...<br/>
But I would like to use this article to introduce a nice initiative that was born late last year, the <a href="https://inog.net/">Irish Network Operators Group</a>, a community that grew from 5 people to over 130 in 6 months.<br/>
If you are a <strong><em>network engineer in Ireland</em></strong>, make sure you visit the <a href="https://inog.net/">official website</a> (<em>https and dual-stack</em>) and join the <a href="https://groups.google.com/d/forum/inog">Google Group</a> or access the <a href="https://inog.slack.com/">iNOG Slack</a>.</p>
<h3 id="plans-in-2016">Plans in 2016 ?</h3>
<p>Lots of things that I want to do, lots of ideas, lots of work... <em>but so few hours in a day</em> ☹ <br/>
I will, <em>most probably</em>, not have too much time to write quizzes, although I will try to make few from time to time. Instead I will, <em>hopefully</em>, write more about network automation and SDN.<br/>
In parallel, I introduced a new section, called <a href="http://costiser.ro/blog/qotd/index.html">Question of the Day</a>, small and fast to read posts. </p>
<p>For more details, see the <a href="http://costiser.ro/blog/about.html">about me</a> or <a href="http://costiser.ro/blog/faq.html">faq</a> pages.</p>What is the broadcast address in IPv6 ?2016-01-19T00:00:00+00:00Costitag:costiser.ro,2016-01-19:2016/01/19/question-of-the-day-19th-jan-2016/<p><span class="dropcap">T</span>his is a <em><strong>tricky</strong></em> question ! If you did come up with a broadcast address in IPv6, <em>then you are wrong</em> ! <br />
<em><strong>There is no broadcast concept in IPv6!</strong></em><br />
Instead, IPv6 uses multicast addresses with different scopes, such as global, organisation-local, site-local or link-local (just a few examples, the list is longer).<br />
Also, as a review, IPv6 supports these types of addresses: unicast, multicast and anycast. </p>How many penalty points does a BGP route get for each flap, when Route Dampening is enabled ?2016-01-11T00:00:00+00:00Costitag:costiser.ro,2016-01-11:2016/01/11/question-of-the-day-11th-jan-2016/<p><span class="dropcap">A</span>fter BGP Dampening is enabled, every time a route flaps, it gets a penalty of <strong><em>1000</em></strong>. This penalty is cumulative and, when it exceeds the <em>suppress</em> limit value, the route is suppressed.<br />
Note that this penalty value is <strong>not</strong> configurable. </p>How do two adjacent routers know that they have a two-way OSPF communication ?2016-01-08T00:00:00+00:00Costitag:costiser.ro,2016-01-08:2016/01/08/question-of-the-day-8th-jan-2016/<p><span class="dropcap">T</span>he fact that a router sees a Hello packet from a neighbor only puts that neighbor in the <em>INIT</em> state; it does not guarantee two-way communication. Routers include the list of neighbors in their Hello packets, so a <strong><em>2-way</em></strong> state is reached when a router sees its own ID in the Hellos received from that particular peer.<br />
Hello packets also contain the DR and BDR information - or 0.0.0.0 if these have not been elected yet.</p>How many bits does the VLAN ID have in the 802.1Q header ?2016-01-05T00:00:00+00:00Costitag:costiser.ro,2016-01-05:2016/01/05/question-of-the-day-5th-jan-2016/<p><span class="dropcap">T</span>he VLAN ID field in the 802.1Q header has 12 bits, hence the number of possible VLAN IDs is 2 to the power of 12, which means 4096 (IDs 0 and 4095 are reserved, which leaves 4094 usable VLANs).</p>How does Path MTU Discovery (PMTUD) work ?2016-01-04T00:00:00+00:00Costitag:costiser.ro,2016-01-04:2016/01/04/question-of-the-day-4th-jan-2016/<p><span class="dropcap">P</span>ath MTU Discovery is performed by setting the <strong>DF</strong> (Don't Fragment) bit "<em>on</em>" in the IP Header. Routers along the path that have a smaller MTU on the outgoing link will drop the packet and send back an ICMP <strong><em>Fragmentation Needed and Don't Fragment was set</em></strong> message (type 3, code 4) - the IPv4 counterpart of the ICMPv6 <em>Packet Too Big</em> message -, in which they specify the outgoing MTU.</p>OSPF Default-Information Originate – Side Effects of ALWAYS keyword2014-12-03T00:00:00+00:00Costitag:costiser.ro,2014-12-03:2014/12/03/ospf-default-information-originate-side-effects-of-always-keyword/<p><span class="dropcap">T</span>his post represents the solution and explanation for <a href="/2014/05/10/quiz-24/">quiz-24</a>. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>Quiz #24 opens the discussion about a scenario in which traffic is black-holed when a certain link fails. Let's summarize the quiz: </p>
<p><a href="/uploads/default-information-originate-always.png" title="Side Effects of ALWAYS keyword"><img alt='default-information-originate-always "Side Effects of ALWAYS keyword"' src="/uploads/default-information-originate-always.png" title="Side Effects of ALWAYS keyword"/></a> </p>
<ul>
<li>company ABC runs OSPF internally, in all 3 buildings</li>
<li>internet access is provided via 2 Border Routers (<strong>BR-B</strong> and <strong>BR-C</strong>) each connected to a separate ISP</li>
<li>each BR receives a default route from its directly connected ISP (via eBGP)</li>
<li>each BR is configured with <code>default-information originate always</code> (the network administrator considered that the Border Routers / BRs will <em>always</em> be an exit point for the internet traffic sourced from the internal networks/buildings)</li>
<li>buildings A and D are single-connected to only 1 Border Router (BR-B and BR-C, respectively)</li>
</ul>
<p>In the above configuration, it was noticed that <em>when ISP-1 fails, <red><strong>CORE-A</strong> (and all users in Building-A) cannot reach the internet any more</red></em>, even though CORE-A still has a default route: </p>
<div class="row">
<pre class="col-md-10">CORE-A#<purple>sh ip ospf data</purple>
...
Type-5 AS External Link States
Link ID ADV Router Age Seq# Checksum Tag
<green>0.0.0.0 192.168.12.2 664 0x80000003 0x00CA6E 1</green>
<green>0.0.0.0 192.168.15.1 1114 0x80000002 0x00BD7A 1</green>
CORE-A#
CORE-A#<purple>sh ip route ospf</purple>
...
<green>O*E2 0.0.0.0/0 [110/1] via 192.168.15.1, 00:51:22, FastEthernet0/0</green>
CORE-A#</pre>
</div>
<p>The network engineer <em>expected</em> that when ISP-1 fails, internet connectivity from Building-A would not be impacted as it would go from CORE-A to BR-B, then BR-C and to Internet via ISP-2. </p>
<h3 id="problem-description">Problem description</h3>
<p>Unfortunately the network admin was wrong ... simple troubleshooting shows that if/when ISP-1 fails, traffic from Building-A is black-holed on BR-B.<br/>
Let's see what happens in details (again, <u>when ISP-1 fails</u>): </p>
<ul>
<li>
<p><strong>on CORE-A:</strong></p>
<ul>
<li>as shown above, CORE-A still has both Type 5 LSAs for 0.0.0.0 (injected by BR-B and BR-C) and installs a default route via BR-B (all fine here) </li>
</ul>
</li>
<li>
<p><strong>on BR-B:</strong></p>
<ul>
<li>due to the <code>always</code> keyword, BR-B generates a Type-5 LSA (external) for default 0.0.0.0</li>
<li>BR-B receives also the Type-5 External LSA for 0.0.0.0 originated by the other border router BR-C</li>
<li>
<p>checking the routing table, we see that <red><strong><em>BR-B does <u>not</u> have any default route</em></strong></red> even though it <u>does</u> have both Type-5 LSAs in the OSPF database:</p>
<p><div class="row">
<pre>BR-B#
*Mar 1 00:02:58.203: %BGP-5-ADJCHANGE: <red>neighbor 1.1.1.1 Down</red> Interface flap
BR-B#
BR-B#sh ip route 0.0.0.0
<red>% Network not in table</red>
BR-B#
BR-B#
BR-B#<purple>sh ip ospf database</purple>
...
Type-5 AS External Link States
Link ID ADV Router Age Seq# Checksum Tag
0.0.0.0 192.168.12.2 293 0x80000001 0x00CE6C 1
0.0.0.0 192.168.15.1 294 0x80000001 0x00BF79 1
BR-B#</pre>
</div></p>
</li>
<li>
<p>there were people suggesting (in the comments of the quiz) that BR-B disregards the 0.0.0.0 Type-5 LSA received from the other router BR-C because of a higher metric (including cost to the FA/Forwarding Address)... but I do not agree with such an explanation: first, BR-B cannot consider its own generated LSA for SPF calculation (see <a href="http://tools.ietf.org/html/rfc1583">RFC 1583, Section 16.4 "Calculating AS external routes", point 2</a>), so it cannot compare BR-C's LSA against something that is already out of the calculation.<br/>
Let's see these LSAs in detail:</p>
<p><div class="row">
<pre>BR-B#<purple>sh ip ospf database external</purple>
OSPF Router with ID (192.168.15.1) (Process ID 1)
Type-5 AS External Link States
LS age: 1329
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 0.0.0.0 (External Network Number )
<green>Advertising Router: 192.168.12.2</green>
LS Seq Number: 80000001
Checksum: 0xCE6C
Length: 36
Network Mask: /0
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 1
Forward Address: 0.0.0.0
External Route Tag: 1
LS age: 1330
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 0.0.0.0 (External Network Number )
<green>Advertising Router: 192.168.15.1</green>
LS Seq Number: 80000001
Checksum: 0xBF79
Length: 36
Network Mask: /0
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 1
Forward Address: 0.0.0.0
External Route Tag: 1
BR-B#
</pre>
</div></p>
<p><em>Note that <strong>none</strong> of these LSAs have the "Routing Bit Set" (a first sign that something is "fishy" with the LSA received from BR-C)</em> </p>
</li>
<li>
<p>performing an SPF debug will reveal the reason why the BR-C's 0.0.0.0 Type-5 LSA is not considered for the SPF calculation (which explains the missing "Routing Bit Set") => explains why BR-B does not have a default route via BR-C:</p>
<p><div class="row">
<pre>BR-B#<purple>debug ip ospf spf</purple>
OSPF spf events debugging is on
OSPF spf intra events debugging is on
OSPF spf inter events debugging is on
OSPF spf external events debugging is on
BR-B#
... OSPF: Start partial processing Type 5 External LSA 0.0.0.0, mask 0.0.0.0,
adv 192.168.12.2, age 1, seq 0x80000003, metric 1, metric-type 2
BR-B#
... <red>OSPF: We originate default always. Don't install default from others
... OSPF: delete lsa id 0.0.0.0, type 5, adv rtr 192.168.12.2 from delete list</red>
</pre>
</div></p>
</li>
</ul>
</li>
</ul>
<div class="row"><div class="col-md-12">
<div class="panel panel-orange">
<div class="panel-heading"><i class="fa fa-binoculars"></i> Conclusion:</div>
<div class="panel-body">
Any OSPF router that originates a default with <u>always</u> keyword will <u>never</u> accept other 0.0.0.0 Type-5 LSA from another neighbor !
</div>
</div>
</div></div>
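<p>The behaviour described in the conclusion can be summarised with a small Python model. This is only a sketch of the decision observed in the debug output above, not the actual IOS implementation; the <code>ExternalLsa</code> class and the <code>default_route_candidates</code> function are hypothetical names, while the router IDs are the ones from the outputs (192.168.15.1 for BR-B, 192.168.12.2 for BR-C).</p>

```python
from dataclasses import dataclass


@dataclass
class ExternalLsa:
    prefix: str
    adv_router: str
    metric: int


def default_route_candidates(router_id, originate_always, lsdb):
    """Return the Type-5 default LSAs a router may use to install 0.0.0.0/0.

    Simplified model of the behaviour seen in the debug output.
    """
    candidates = []
    for lsa in lsdb:
        if lsa.prefix != "0.0.0.0/0":
            continue
        if lsa.adv_router == router_id:
            continue  # self-originated LSAs are skipped in the external calculation
        if originate_always:
            continue  # "We originate default always. Don't install default from others"
        candidates.append(lsa)
    return candidates


lsdb = [
    ExternalLsa("0.0.0.0/0", "192.168.12.2", 1),  # originated by BR-C
    ExternalLsa("0.0.0.0/0", "192.168.15.1", 1),  # originated by BR-B itself
]

# BR-B with "always": no usable default, so traffic is black-holed.
print(default_route_candidates("192.168.15.1", True, lsdb))
# BR-B without "always": the default from BR-C becomes a candidate.
print(default_route_candidates("192.168.15.1", False, lsdb))
```

<p>The first call returns an empty list, the second one returns only the LSA advertised by 192.168.12.2, which matches what the routing table of BR-B shows in each case.</p>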
<h3 id="quiz-solutions">Quiz Solutions</h3>
<h4 id="1-remove-always">1. Remove ALWAYS</h4>
<p>The simplest solution is to remove the <code>always</code> keyword. After doing so, the SPF debug shows that the Type-5 LSA 0.0.0.0 from BR-C (192.168.12.2) is accepted into the SPF calculations and a default route via BR-C is installed:</p>
<div class="row">
<pre>*Mar 1 01:00:40.447: OSPF: Start partial processing Type 5 External LSA 0.0.0.0, mask 0.0.0.0,
adv 192.168.12.2, age 1583, seq 0x80000002, metric 1, metric-type 2
*Mar 1 01:00:40.447: Add better path to LSA ID 0.0.0.0, gateway 192.168.12.2, dist 1
*Mar 1 01:00:40.447: Add path: next-hop 192.168.12.2, interface FastEthernet0/0
*Mar 1 01:00:40.447: network update dest_addr 0.0.0.0 mask 0.0.0.0 gateway 192.168.12.2
*Mar 1 01:00:40.447: Add External Route to 0.0.0.0. Metric: 1, Next Hop: 192.168.12.2
*Mar 1 01:00:40.451: OSPF: insert route list LS ID 0.0.0.0, type 5, adv rtr 192.168.12.2
BR-B#
BR-B#<purple>sh ip route ospf</purple>
<green>O*E2 0.0.0.0/0 [110/1] via 192.168.12.2, 00:01:50, FastEthernet0/0</green>
BR-B#</pre>
</div>
<p>Of course, without the "always" keyword and with the link to ISP-1 down, BR-B does not generate a 0.0.0.0 LSA, so the only one in the OSPF database is the one from neighbor BR-C (note the "<em><strong>Routing Bit Set</strong></em>" on this LSA):</p>
<div class="row">
<pre class="col-md-8">BR-B#sh ip ospf database external
OSPF Router with ID (192.168.15.1) (Process ID 1)
Type-5 AS External Link States
<green>Routing Bit Set on this LSA</green>
LS age: 1708
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 0.0.0.0 (External Network Number )
<green>Advertising Router: 192.168.12.2</green>
LS Seq Number: 80000002
Checksum: 0xCC6D
Length: 36
Network Mask: /0
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 1
Forward Address: 0.0.0.0
External Route Tag: 1
</pre>
</div>
<p>In my opinion, as a matter of best practice, this is the recommended solution because <em><u>the injection of the default route should be conditional</u></em> in most scenarios (use a <code>default-information originate</code> without <strong>always</strong> or, even better, together with a <strong>route-map</strong>). </p>
<h4 id="2-ibgp-between-brs">2. iBGP between BRs</h4>
<p>Another solution would be to form an iBGP peering between BR-B and BR-C and keep the <code>always</code> keyword.<br/>
This way, in case the ISP-1 fails, BR-B will install the default via iBGP from BR-C and connectivity in Building-A will not be impacted. </p>
<div class="row">
<pre class="col-md-8">BR-C(config-router)#router bgp 65001
BR-C(config-router)#<purple>neigh 192.168.12.1 remote-as 65001</purple>
BR-C(config-router)#<purple>neigh 192.168.12.1 next-hop-self</purple>
</pre>
</div>
<p><br/></p>
<p><em>Thanks again for all your comments in the quiz !<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>SDN Lesson #1 – Introduction to Mininet2014-08-07T00:00:00+01:00Costitag:costiser.ro,2014-08-07:2014/08/07/sdn-lesson-1-introduction-to-mininet/<p><span class="dropcap-bg">W</span>elcome to a new series of articles that will be structured as lessons with the target of bringing SDN closer to everyone's understanding.<br/>
Each article will present a topic plus one or more exercises that will show that topic in action. The lessons will wrap up with some questions asking the readers to exercise on their own and provide the answers. As you see, the approach is pretty similar to the networking quizzes, but with SDN ones, I also make an introduction to the topic since all this is relatively new. </p>
<p>When I started my personal SDN journey, it was difficult to find a place to start, so I began reading articles, viewing presentations or listening to different videos, all trying to demonstrate (mostly theoretically) the advantages, the applications or limitations of this new technology.<br/>
I am a person who learns best by "doing", by individual testing, so I have decided to start these articles in order to help other network engineers with their beginnings in SDN. </p>
<p>In parallel, I will continue my series of network quizzes and solutions, hoping that time will allow me to do so... </p>
<p>The fact that I have started covering SDN topics shows that I believe in this new approach to networking. After reading some introductory presentations and papers, one of the arguments that convinced me was the analogy with the evolution of the computer industry, which moved from vertically integrated, closed, proprietary systems with specialized hardware, specialized features and relatively slow innovation to a horizontal model with open interfaces and rapid innovation: </p>
<p><a href="/uploads/sdn_analogy_with_computer_evolution_2.png" title="Computer evolution vs. Networking evolution"><img alt='sdn_analogy_with_computer_evolution_2 "SDN vs. Traditional Networking"' src="/uploads/sdn_analogy_with_computer_evolution_2.png" title="Computer evolution vs. Networking evolution"/></a></p>
<p><font size="-1"><em>[<a href="#references">1</a>] Taken from Nick McKeown's presentation "How SDN will shape networking"</em></font> </p>
<h4 id="what-is-software-defined-networking-sdn">What is Software Defined Networking (SDN)?</h4>
<p>SDN (Software Defined Networking) is a <em>network architecture that breaks the vertical integration by separating the control logic (control plane) from the underlying routers and switches that will become simple forwarding elements (data plane)</em>. The <strong>four pillars</strong> to remember about SDN are: [<a href="#references">2</a>]</p>
<ol>
<li><font color="blue">the control and data planes are decoupled</font>.</li>
<li>forwarding decisions are <font color="blue">flow-based instead of destination-based</font></li>
<li>control logic is moved to an external entity, called <font color="blue">SDN controller or NOS (Network Operating System)</font></li>
<li><font color="blue">network is programmable</font> through applications running on top of the controller (the fundamental characteristic of SDN)</li>
</ol>
<p>In order to put all the theory into practice, we will use Mininet, a virtual network environment that runs on a single machine and provides many of the OpenFlow features built-in. Mininet will emulate an entire network of switches, virtual hosts (running standard Linux software), the links between them and, optionally, a SDN controller (I will talk in details about the SDN controller and OpenFlow in future articles). </p>
<p>The switches generated with Mininet will be just simple forwarding devices, without any "brain" of their own (no control plane). <strong><em>The new networking paradigm with SDN</em></strong> <em>is: control plane is separated from the data plane so such switches will only do what they are instructed by an external controller. Whenever a switch (or a forwarding device, generally speaking) does not know how to deal with some packets, it will simply send them to the controller</em> (or contact the controller referencing some buffer-ids - more on this later on). </p>
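<p>As a rough illustration of this paradigm, the Python sketch below models a forwarding element that only applies flows installed from outside and hands every unmatched packet to the controller. It is a toy model, not OpenFlow code: the <code>FlowSwitch</code> class, its field names and the action strings are all hypothetical.</p>

```python
class FlowSwitch:
    """Toy forwarding element: it only applies flows installed by someone else."""

    def __init__(self):
        self.flows = []               # list of (match dict, action string)
        self.sent_to_controller = []  # packets the switch could not handle

    def add_flow(self, match, action):
        self.flows.append((match, action))

    def handle(self, packet):
        for match, action in self.flows:
            # A flow matches when every field it names equals the packet's value.
            if all(packet.get(field) == value for field, value in match.items()):
                return action
        # Table miss: the switch has no logic of its own, so it asks the controller.
        self.sent_to_controller.append(packet)
        return "send-to-controller"


s1 = FlowSwitch()
packet = {"in_port": 1, "eth_type": 0x0800, "ip_dst": "10.0.0.2"}

print(s1.handle(packet))                 # no flows yet, so this is a table miss
s1.add_flow({"in_port": 1}, "output:2")  # what a controller (or dpctl) would install
print(s1.handle(packet))                 # now the packet matches the flow
```

<p>The first call returns <code>send-to-controller</code> and the second returns <code>output:2</code>. Note that a flow can match on any combination of fields (flow-based forwarding), not just on the destination address.</p>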
<p>Mininet supports research, development, learning, prototyping, testing, debugging, and any other tasks that could benefit from having a complete experimental network on a laptop or other PC.[<a href="#references">3</a>]<br/>
This may require a little bit of programming knowledge (Python), but you can also get it along the way... </p>
<h4 id="installing-and-running-mininet">Installing and Running Mininet</h4>
<p>The installation of Mininet is pretty straightforward - in summary:</p>
<ul>
<li>1. <a href="https://github.com/mininet/mininet/wiki/Mininet-VM-Images">download the Mininet VM image</a>,</li>
<li>2. open it with your favourite <a href="http://mininet.org/vm-setup-notes/">virtualization system (VirtualBox, VMware Workstation, VMware Fusion, QEMU or KVM)</a></li>
<li>3. then perform <a href="http://mininet.org/walkthrough/">the walkthrough that will teach you how to use it</a> (the entire walkthrough should take under an hour - no need to go through all of it now, just familiarize yourself with the basic commands).</li>
</ul>
<p>The default run of Mininet <code>sudo mn</code> will create a topology consisting of one controller (<strong><em>c0</em></strong>), one switch (<strong><em>s1</em></strong>) and two hosts (<strong><em>h1</em> </strong>and <strong><em>h2</em></strong>). </p>
<p>To help you start up, here are the most important options for running Mininet:</p>
<ul>
<li><strong><blue>--topo=TOPO</blue></strong> represents the topology of the virtual network, where TOPO could be:<ul>
<li><strong><blue>minimal</blue></strong> - this is the <u>default topology</u> with 1 switch and 2 hosts</li>
<li><strong><blue>single,X</blue></strong> - a single switch with X hosts attached to it</li>
<li><strong><blue>linear,X</blue></strong> - creates X switches connected in a linear/daisy-chain fashion, each switch with one host attached</li>
<li><strong><blue>tree,X</blue></strong> - a tree topology with X fanout</li>
</ul>
</li>
<li><strong><blue>--switch=SWITCH</blue></strong> creates different type of switches, such as:<ul>
<li><strong><blue>ovsk</blue></strong> - this is the <u>default Open vSwitch</u> that comes preinstalled in the VM</li>
<li><strong><blue>user</blue></strong> - this is a software switch running in user space (much slower)</li>
</ul>
</li>
<li><strong><blue>--controller=CONTROLLER</blue></strong> where CONTROLLER can be:<ul>
<li><strong><blue>ovsc</blue></strong> - this creates the <u>default OVS Controller</u> that comes preinstalled in the VM</li>
<li><strong><blue>nox</blue></strong> - this creates the well-known NOX controller</li>
<li><strong><blue>remote</blue></strong> - does <strong><em>not</em> </strong>create a controller; instead, the switches try to connect to an external controller (at 127.0.0.1:6633 by default)</li>
</ul>
</li>
<li><strong><blue>--mac</blue></strong> set easy-to-read MAC addresses for the devices</li>
</ul>
<p>For our exercise, we will create a virtual network with one switch and 3 hosts, using the command shown below: </p>
<p><a href="/uploads/lesson-1_2.png" title="Mininet working topology"><img alt="lesson-1_2" src="/uploads/lesson-1_2.png" title="Mininet working topology"/></a> </p>
<p>For the moment, we will not touch the topic of SDN controller so our test network will have no controller. For such a switch to be able to forward traffic, it needs to be told how to handle the flows (again a topic for future articles).<br/>
In order to "control" the switch (to add and view the status of the flows on the switch) we will use a utility called <strong>DPCTL</strong> that allows direct control and visibility over the switch's flow table <u><em>without the need to add debugging code to the controller</em></u>. Most OpenFlow switches have a passive listening port running on <strong>TCP 6634</strong> (by default) that can be used to poll flows and counters or to manually insert flow entries. <br/>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">Do <b>not</b> confuse the <b>dpctl</b> with a controller (it's <b>not</b> the same thing) - dpctl is just a management/monitoring utility!
</div>
</div></p>
<p>You can use this utility directly under the mininet prompt or in a separate console by connecting to the listening port as indicated below:<br/>
<code>dpctl COMMAND tcp:127.0.0.1:6634 OPTIONS</code> <br/>
where COMMAND can be:</p>
<ul>
<li><strong>show</strong></li>
<li><strong>dump-flows</strong></li>
<li><strong>add-flow</strong></li>
</ul>
<p>In our first exercise, let's configure our virtual network in order to make host h1 successfully ping host h2. Follow these steps: </p>
<p><strong>STEP 1</strong>: Start Mininet with a single switch (the default, Open vSwitch = <em>ovsk</em>) and 3 hosts: </p>
<div class="row">
<pre>mininet@mininet-vm:~$ <purple>sudo mn --topo=single,3 --mac --switch=ovsk --controller=remote</purple>
*** Creating network
*** Adding controller
<red>Unable to contact the remote controller at 127.0.0.1:6633</red>
*** Adding hosts:
h1 h2 h3
*** Adding switches:
s1
*** Adding links:
(h1, s1) (h2, s1) (h3, s1)
*** Configuring hosts
h1 h2 h3
*** Starting controller
*** Starting 1 switches
s1
*** Starting CLI:
mininet>
mininet> <purple>nodes</purple>
available nodes are:
c0 h1 h2 h3 s1
mininet>
mininet>
mininet> <purple>net</purple>
h1 h1-eth0:s1-eth1
h2 h2-eth0:s1-eth2
h3 h3-eth0:s1-eth3
s1 lo: s1-eth1:h1-eth0 s1-eth2:h2-eth0 s1-eth3:h3-eth0
c0
mininet>
mininet> <purple>dump</purple>
<Host h1: h1-eth0:10.0.0.1 pid=9730>
<Host h2: h2-eth0:10.0.0.2 pid=9731>
<Host h3: h3-eth0:10.0.0.3 pid=9732>
<OVSSwitch s1: lo:127.0.0.1,s1-eth1:None,s1-eth2:None,s1-eth3:None pid=9735>
<RemoteController c0: 127.0.0.1:6633 pid=9723>
mininet></pre>
</div>
<p>As you can see, we receive this message: "<em>Unable to contact the remote controller at 127.0.0.1:6633</em>". This is because, for the time being, we are going to use mininet <em><strong>without any controller</strong></em> (cli argument <code>--controller=remote</code> tells mininet that we are going to use an external controller, but for this lesson, we don't need it).<br/>
In order to double-check that everything started correctly, use the following mininet commands: </p>
<ul>
<li><strong>nodes</strong> - to list all virtual devices in the topology</li>
<li><strong>net</strong> - to list the links between them</li>
<li><strong>dump</strong> - to see more info about the hosts</li>
</ul>
<p><strong>STEP 2</strong>: Open terminals for each host and run tcpdump on each:<br/>
<em>Attention: Windows users, make sure you have installed and started Xming and enabled X-forwarding in your ssh session to the Mininet VM!</em> </p>
<p><strong>STEP 3</strong>: Test connectivity between h1 and h2: on host h1 perform a <code>ping -c3 10.0.0.2</code> (the IP address of host h2) </p>
<p><a href="/uploads/sdn-lesson-1-no-flows.png"><img alt="sdn-lesson-1-no-flows" src="/uploads/sdn-lesson-1-no-flows.png"/></a> </p>
<p>Results:</p>
<ul>
<li><red><em>ping will fail</em></red>, because the switch does <strong>NOT</strong> know what to do with such traffic (and remember, <strong><em>we don't run any controller</em></strong>)</li>
<li>checking the list of flows on the switch (with command <code>dpctl dump-flows</code>) will show an empty list (again, nobody told the switch how to deal with the traffic)</li>
</ul>
<p><strong>STEP 4</strong>: Manually add flows on the switch to allow connectivity between h1 and h2<br/>
Use the <code>dpctl add-flow</code> utility to manually install flows on the switch that will allow connectivity between host h1 and host h2.<br/>
<em>Attention: we need two rules to achieve bidirectional connectivity (echo requests and echo replies)</em>: </p>
<p><a href="/uploads/sdn-lesson-1-dpctl-add-flow.png"><img alt="sdn-lesson-1-dpctl-add-flow" src="/uploads/sdn-lesson-1-dpctl-add-flow.png"/></a></p>
<p>The 2 commands basically say:</p>
<ul>
<li><code>dpctl add-flow tcp:127.0.0.1:6634 in_port=1,actions=output:2</code> = everything received on port 1 (<strong>in_port</strong>) is sent out on port 2</li>
<li><code>dpctl add-flow tcp:127.0.0.1:6634 in_port=2,actions=output:1</code> = everything received on port 2 (return traffic) is sent out on port 1</li>
</ul>
<p>Result:</p>
<ul>
<li>ping is <green><strong>successful</strong></green></li>
<li>tcpdump on <strong>host h2</strong> shows the traffic from/to h1 (ARP and ICMP)</li>
<li>tcpdump on <strong>host h3</strong> does <red>not</red> see anything (<red><em>not even the ARP which should be broadcast</em></red>)!</li>
</ul>
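<p>The behaviour seen in STEP 3 and STEP 4 can be sketched with a toy flow table. This is only an illustration in Python, not Open vSwitch code: the <code>forward</code> function and the dictionary layout are hypothetical, and the model matches on <code>in_port</code> only, exactly like the two flows installed above.</p>

```python
# Toy model of the switch in this lesson: a flow table keyed only on in_port.
# Illustration only; "forward" and the dict layout are hypothetical names.

def forward(flow_table, in_port):
    """Return the list of output ports for a frame arriving on in_port.

    An empty list means the frame is dropped, which is what this lesson
    observes when no flow matches and no controller is running.
    """
    return flow_table.get(in_port, [])


# STEP 3: no flows installed, so every frame is dropped.
empty_table = {}

# STEP 4: the two dpctl add-flow commands from the lesson.
flow_table = {
    1: [2],  # in_port=1,actions=output:2
    2: [1],  # in_port=2,actions=output:1
}

print(forward(empty_table, 1))  # nothing installed: ping fails
print(forward(flow_table, 1))   # h1 traffic goes to port 2 only, ARP included
print(forward(flow_table, 3))   # port 3 (h3) has no matching flow
```

<p>Because the match is on the input port alone, an ARP broadcast from h1 is treated like any other frame and never reaches port 3, which is why tcpdump on h3 stays silent.</p>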
<h4 id="sdn-exercise-1">SDN Exercise #1</h4>
<p>The first exercise in the SDN series will use the above setup, but with the additional requirement of <strong><em>treating ARP traffic as broadcast</em></strong>:</p>
<ul>
<li>ARP requests (regardless of input port) are flooded on all switch ports</li>
<li>ICMP traffic between hosts h1 and h2 is unicast on the relevant ports</li>
</ul>
<p><strong><em>What are the relevant <blue>dpctl add-flow</blue> commands to achieve this?</em></strong>[<a href="#references">4</a>]<br/>
Post your answer in the 'Comments' section below and subscribe to this blog to get more interesting SDN lessons and quizzes. </p>
<p><blue>Remember</blue>: if this is the first time you worked with Mininet, now would be a good time to do the <a href="http://mininet.org/walkthrough/">Mininet Walkthrough</a> !
<hr/></p>
<p>After completing this exercise, you should be able to see this: </p>
<p><a href="/uploads/sdn-lesson-1-target.png"><img alt="sdn-lesson-1-target" src="/uploads/sdn-lesson-1-target.png"/></a></p>
<ul>
<li>ARP requests from host h1 are received by both h2 and h3 (highlighted in green) </li>
<li>ICMP echo requests from host h1 to h2 are only seen by h2 (highlighted in yellow) </li>
</ul>
<p><br/></p>
<h4 id="references">References</h4>
<ul>
<li>[1] <a href="https://www.youtube.com/watch?v=c9-K5O_qYgA">Nick McKeown's presentation "How SDN will shape networking"</a></li>
<li>[2] <a href="http://arxiv.org/abs/1406.0440">Software-Defined Networking: A Comprehensive Survey</a></li>
<li>[3] <a href="http://mininet.org/overview/">Mininet Overview</a></li>
<li>[4] <a href="http://ranosgrant.cocolog-nifty.com/openflow/dpctl.8.html"><strong>dpctl</strong> command reference</a></li>
</ul>
<p><br>
<em>Thanks again for all your comments !</em> </br></p>
<p><br/></p>QoS Pre-Classify - Where to Apply the Service Policy ?2014-06-02T00:00:00+01:00Costitag:costiser.ro,2014-06-02:2014/06/02/qos-pre-classify-where-to-apply-the-service-policy/<p><span class="dropcap-bg">C</span>This post represents the solution and explanation for <a href="/2014/04/07/quiz-23/index.html">quiz-23</a>.<br/>
Have a look at the quiz and test your knowledge before reading this solution. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>The quiz shows a scenario where the network engineer has to configure Low Latency Queuing (LLQ) for some traffic that will be encrypted into an IPsec tunnel.<br/>
The configuration of the policy-map is given but it has not been applied yet anywhere, as shown below: </p>
<p><a href="/uploads/quiz-23.png" title="QoS Pre-Classify - Where to Apply the Service Policy ?"><img alt="quiz-23" src="/uploads/quiz-23.png" title="QoS Pre-Classify - Where to Apply the Service Policy ?"/></a> </p>
<p>The final question is <strong><em>"what is missing to finish this task ?"</em></strong> giving the impression that the answer to the quiz is very simple: <em>apply the policy-map</em>... </p>
<h3 id="where-to-apply-the-service-policy">Where to Apply the Service Policy</h3>
<p>Unfortunately, the answer is not that simple...<br/>
<font color="blue"><em>1. applying the policy-map onto the physical interface</em></font> </p>
<div class="row">
<pre class="col-md-8">R2#conf t
R2(config)#<red>int s0/0</red>
R2(config-if)#<red>service-policy output LLQ</red>
R2(config-if)#end</pre>
</div>
<p>This configuration does not have the effect that we want because the ACL searches for traffic between <em>host 192.168.1.1 and host 192.168.5.5</em> (<purple><em>the inner header</em></purple>), but the policy-map <u>sees only the outer header</u> of the tunnel.<br/>
See the zero counters (after sending 100 pings from 192.168.1.1 to 192.168.5.5 via the tunnel):</p>
<div class="row">
<pre class="col-md-8">R2#sh policy-map int
Serial0/0
Service-policy output: LLQ
Class-map: IMPORTANT_TRAFFIC (match-all)
<red><u>0 packets, 0 bytes</u></red>
5 minute offered rate 0 bps, drop rate 0 bps
Match: access-group name ACL_IMPORTANT_TRAFFIC
Queueing
Strict Priority
Output Queue: Conversation 264
Bandwidth 33 (%)
Bandwidth 509 (kbps) Burst 12725 (Bytes)
(pkts matched/bytes matched) 0/0
(total drops/bytes drops) 0/0
<red>Class-map: class-default (match-any)
<u>101 packets, 15624 bytes</u></red>
5 minute offered rate 2000 bps, drop rate 0 bps
Match: any
R2#</pre>
</div>
<p><font color="blue"><em>2. applying the policy-map onto the tunnel interface</em></font><br/>
At first glance, this looks like the correct solution, and it might work with a different type of policy-map, but not with Class Based Weighted Fair Queueing (CBWFQ):</p>
<div class="row">
<pre class="col-md-11">R2(config)#int tun0
R2(config-if)#service-policy output LLQ
*Mar 1 00:02:08.547: <red>Class Based Weighted Fair Queueing not supported on interface Tunnel0</red>
R2(config-if)#end</pre>
</div>
<p>As you can see, the IOS parser does not allow the LLQ policy-map to be applied directly onto the VTI (Tunnel0) interface.<br/>
On Cisco, <strong><em>logical interfaces (tunnel interfaces, sub-interfaces, etc) have no notion of congestion (since they are logical) and, as a result, you cannot apply a queueing mechanism to them</em></strong>. There is a workaround, described later on. </p>
<h3 id="solutions">Solutions</h3>
<p>There is a workaround for each of the two situations described above: </p>
<h4 id="1-qos-pre-classify-applying-the-policy-map-onto-the-physical-interface">1. QoS Pre-Classify - applying the policy-map onto the physical interface</h4>
<p>The recommended solution for this quiz is to use the <font color="blue"><em>QoS Pre-Classify</em></font> feature and apply the policy-map to the physical interface.<br/>
This feature tells the router to keep a temporary copy of the packet's header in its memory (the inner header, before encapsulation and/or encryption) and use it to make QoS decisions such as priority queueing or classification. </p>
<div class="row">
<pre class="col-md-10">R2#conf t
R2(config)#<green>int s0/0</green>
R2(config-if)#<green>service-policy output LLQ</green>
!
!
R2(config)#<green>int tun0</green>
R2(config-if)#qos ?
pre-classify Enable QOS classification before packets are tunnel
encapsulated
R2(config-if)#<green>qos pre-classify</green>
</pre>
</div>
<div class="row">
<pre class="col-md-10">R2#<purple>sh policy-map interface</purple>
Serial0/0
Service-policy output: LLQ
Class-map: IMPORTANT_TRAFFIC (match-all)
<green><u>100 packets, 15600 bytes</u></green>
5 minute offered rate 2000 bps, drop rate 0 bps
Match: access-group name ACL_IMPORTANT_TRAFFIC
Queueing
Strict Priority
Output Queue: Conversation 264
Bandwidth 33 (%)
Bandwidth 509 (kbps) Burst 12725 (Bytes)
(pkts matched/bytes matched) 0/0
(total drops/bytes drops) 0/0
Class-map: class-default (match-any)
1 packets, 24 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Match: any
R2#</pre>
</div>
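<p>The difference between the two counter outputs can be sketched as a choice of which header the classifier looks at. This is a simplified model in Python, not IOS behaviour in detail: the type and function names are hypothetical, the ACL is reduced to one direction (192.168.1.1 to 192.168.5.5), and the tunnel endpoint addresses 10.0.0.1 and 10.0.0.2 are invented for the example.</p>

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    src: str
    dst: str


@dataclass(frozen=True)
class TunnelPacket:
    inner: Header  # original header, before encapsulation/encryption
    outer: Header  # tunnel header, the only one visible on the physical interface


def matches_acl(header):
    # Simplified stand-in for ACL_IMPORTANT_TRAFFIC (one direction only).
    return header.src == "192.168.1.1" and header.dst == "192.168.5.5"


def classify(packet, pre_classify):
    # With pre-classify the router keeps a copy of the inner header and
    # classifies on it; without it only the outer header is available.
    header = packet.inner if pre_classify else packet.outer
    return "IMPORTANT_TRAFFIC" if matches_acl(header) else "class-default"


ping = TunnelPacket(
    inner=Header("192.168.1.1", "192.168.5.5"),
    outer=Header("10.0.0.1", "10.0.0.2"),  # hypothetical tunnel endpoints
)

print(classify(ping, pre_classify=False))  # class-default
print(classify(ping, pre_classify=True))   # IMPORTANT_TRAFFIC
```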
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTE</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
The qos pre-classify feature can be applied either to the tunnel interface or to crypto-maps (for IPsec tunnels).
</ul>
</td></tr>
</table>
<h4 id="2-hierarchical-queueing-framework-hqf-applying-the-policy-map-onto-the-tunnel-interface">2. Hierarchical Queueing Framework (HQF) - applying the policy-map onto the tunnel interface</h4>
<p>The workaround for scenario #2 above is to configure a hierarchical QoS (HQF) service policy that can be applied to the logical (Tunnel0) interface.</p>
<div class="row">
<pre class="col-md-10">R2#conf t
R2(config)#<green>policy-map HQF</green>
R2(config-pmap)#class class-default
R2(config-pmap-c)#<green>shape average 1544000</green>
R2(config-pmap-c)#<green>service-policy LLQ</green>
R2(config-pmap-c)#exit
R2(config-pmap)#exit
R2(config)#<green>int tun0</green>
R2(config-if)#<green>service-policy output HQF</green>
R2(config-if)#exit</pre>
</div>
<div class="row">
<pre class="col-md-10">R2#<purple>sh policy-map interface</purple>
Tunnel0
Service-policy output: HQF
Class-map: class-default (match-any)
102 packets, 10499 bytes
5 minute offered rate 2000 bps, drop rate 0 bps
Match: any
Traffic Shaping
Target/Average Byte Sustain Excess Interval Increment
Rate Limit bits/int bits/int (ms) (bytes)
1544000/1544000 9650 38600 38600 25 4825
Adapt Queue Packets Bytes Packets Bytes Shaping
Active Depth Delayed Delayed Active
- 0 102 10099 0 0 no
Service-policy : LLQ
<green>Class-map: IMPORTANT_TRAFFIC (match-all)
<u>100 packets, 10400 bytes</u></green>
5 minute offered rate 2000 bps, drop rate 0 bps
Match: access-group name ACL_IMPORTANT_TRAFFIC
Queueing
Strict Priority
Output Queue: Conversation 72
Bandwidth 33 (%)
Bandwidth 509 (kbps) Burst 12725 (Bytes)
(pkts matched/bytes matched) 0/0
(total drops/bytes drops) 0/0
Class-map: class-default (match-any)
2 packets, 99 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Match: any</pre>
</div>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
REMEMBER<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
Since HQF is a logical engine, it needs shaping to be configured in order to be able to provide other QoS functions.
</div>
</div>
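<p>The shaper values in the output above are tied together by simple arithmetic. The sketch below, in Python, assumes the usual token-bucket relationships (interval = Bc / rate, increment = Bc in bytes, byte limit = Bc + Be in bytes) and that the 33% priority bandwidth is computed from the 1544000 bps shape rate; the variable names are made up for the example.</p>

```python
# Values copied from the "sh policy-map interface" output for Tunnel0.
cir_bps = 1544000   # shape average 1544000
bc_bits = 38600     # Sustain bits/int
be_bits = 38600     # Excess bits/int

# Integer arithmetic keeps the results exact.
interval_ms = bc_bits * 1000 // cir_bps    # Tc = Bc / CIR, in milliseconds
increment_bytes = bc_bits // 8             # credit added per interval, in bytes
byte_limit = (bc_bits + be_bits) // 8      # largest burst, in bytes
llq_kbps = cir_bps * 33 // 100 // 1000     # 33% priority class, truncated to kbps

print(interval_ms, increment_bytes, byte_limit, llq_kbps)  # 25 4825 9650 509
```

<p>All four numbers agree with the Interval, Increment, Byte Limit and Bandwidth columns shown by the router.</p>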
<p>Some final notes:</p>
<ul>
<li>each of the above solutions has pros and cons in specific scenarios and you may need to evaluate them before choosing the right solution</li>
<li>be aware that on some platforms (low end ones, usually) using the <strong>shape</strong> command on tunnel interfaces might cause high CPU usage</li>
<li>applying the service policy on the physical interface does account for the tunnel overhead</li>
</ul>
<p><em>Thanks again for all your comments in the quiz !<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>Quiz #24 – OSPF Default-Information Originate Always2014-05-10T00:00:00+01:00Costitag:costiser.ro,2014-05-10:2014/05/10/quiz-24/<p><span class="dropcap">C</span>ompany ABC has multiple buildings (A, B, C and D) and two internet connections to ISP-1 (in Building-B) and ISP-2 (in Building-C). Building-A has a CORE router connected to the Border Router in Building-B (BR-B).<br/>
Both BR-B and BR-C receive a default route via eBGP from the ISPs and are configured identically to inject it into the OSPF Area 0 that covers all internal routers as shown in the diagram below: </p>
<p><a href="/uploads/quiz-24.png" title="OSPF Default-Information Originate Always"><img alt='quiz-24 "OSPF Default-Information Originate Always"' src="/uploads/quiz-24.png" title="OSPF Default-Information Originate Always"/></a> </p>
<p><br>
As you can see, <blue>Area 0 contains two external LSAs for the default-information (0.0.0.0)</blue> injected by each Border Router, BR-B (192.168.15.1) and BR-C (192.168.12.2).<br/>
At this time, connectivity to the Internet (eg. 34.34.34.4) is working fine: </br></p>
<div class="row">
<pre class="col-md-7">CORE-A#<blue>traceroute 34.34.34.4</blue>
Type escape sequence to abort.
Tracing the route to 34.34.34.4
1 192.168.15.1 24 msec 48 msec 8 msec
2 1.1.1.1 32 msec 124 msec 40 msec
3 34.34.34.4 88 msec * 68 msec
CORE-A#</pre>
</div>
<p><font size="-1"><em>Note that each BR performs NAT Overload on outside interface s0/0 (but this does not affect the quiz)!</em></font> </p>
<p>At some moment, the link to ISP-1 (1.1.1.1) is brought down as the ISP requires some maintenance on the circuit. You assume that everything will work fine since traffic will reach the internet via <code>BR-C -> ISP-2 (2.2.2.2)</code>. Unfortunately, you soon find out that this is not the case: <red><i><b>the entire Building-A loses internet access for as long as the link to ISP-1 is down</b></i></red>: </p>
<div class="row">
<pre class="col-md-7">CORE-A#<blue>traceroute 34.34.34.4</blue>
Type escape sequence to abort.
Tracing the route to 34.34.34.4
1 192.168.15.1 52 msec 40 msec 36 msec
<red>2 192.168.15.1 !H * !H</red>
CORE-A#</pre>
</div>
<p>Routing and OSPF database for Area 0 look fine: </p>
<div class="row">
<pre>CORE-A#<blue>sh ip ospf data</blue>
...
Type-5 AS External Link States
Link ID ADV Router Age Seq# Checksum Tag
<purple>0.0.0.0 192.168.12.2 664 0x80000003 0x00CA6E 1</purple>
<purple>0.0.0.0 192.168.15.1 1114 0x80000002 0x00BD7A 1</purple>
CORE-A#
CORE-A#<blue>sh ip route ospf</blue>
...
<purple>O*E2 0.0.0.0/0 [110/1] via 192.168.15.1, 00:51:22, FastEthernet0/0</purple>
CORE-A#</pre>
</div>
<p><strong><em>What is the problem ?</em></strong> </p>
<p><br>
DEVICES' CONFIGURATIONS:</br></p>
<p><br>
<!-- Tab v1 -->
<div class="row">
<div class="tab-v1">
<ul class="nav nav-tabs col-md-9">
<li class="active"><a data-toggle="tab" href="#tab-1">CORE-A</a></li>
<li><a data-toggle="tab" href="#tab-2">BR-B</a></li>
<li><a data-toggle="tab" href="#tab-3">BR-C</a></li>
<li><a data-toggle="tab" href="#tab-4">ISP-1</a></li>
<li><a data-toggle="tab" href="#tab-5">ISP-2</a></li>
</ul>
<div class="tab-content col-md-9">
<div class="tab-pane fade in active" id="tab-1">
<pre class="configs">
hostname CORE-A
!
!
!
interface FastEthernet0/0
ip address 192.168.15.5 255.255.255.0
speed 100
full-duplex
!
!
router ospf 1
log-adjacency-changes
network 192.168.0.0 0.0.255.255 area 0
!
!
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-2">
<pre class="configs">
hostname BR-B
!
!
!
interface FastEthernet0/0
ip address 192.168.12.1 255.255.255.0
ip nat inside
ip virtual-reassembly
speed 100
full-duplex
!
interface Serial0/0
ip address 1.1.1.2 255.255.255.252
ip nat outside
ip virtual-reassembly
clock rate 2000000
!
interface FastEthernet0/1
ip address 192.168.15.1 255.255.255.0
ip nat inside
ip virtual-reassembly
speed 100
full-duplex
!
router ospf 1
log-adjacency-changes
network 192.168.0.0 0.0.255.255 area 0
default-information originate always
!
router bgp 65001
no synchronization
bgp log-neighbor-changes
neighbor 1.1.1.1 remote-as 100
no auto-summary
!
ip nat inside source list ACL_NAT interface Serial0/0 overload
!
ip access-list standard ACL_NAT
permit 192.168.0.0 0.0.255.255
!
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-3">
<pre class="configs">
hostname BR-C
!
!
interface FastEthernet0/0
ip address 192.168.12.2 255.255.255.0
ip nat inside
ip virtual-reassembly
speed 100
full-duplex
!
interface Serial0/0
ip address 2.2.2.1 255.255.255.252
ip nat outside
ip virtual-reassembly
clock rate 2000000
!
!
router ospf 1
log-adjacency-changes
network 192.168.0.0 0.0.255.255 area 0
default-information originate always
!
router bgp 65001
no synchronization
bgp log-neighbor-changes
neighbor 2.2.2.2 remote-as 200
no auto-summary
!
ip nat inside source list ACL_NAT interface Serial0/0 overload
!
ip access-list standard ACL_NAT
permit 192.168.0.0 0.0.255.255
!
!
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-4">
<pre class="configs">
hostname ISP-1
!
!
interface FastEthernet0/0
ip address 34.34.34.3 255.255.255.0
speed 100
full-duplex
!
interface Serial0/0
ip address 1.1.1.1 255.255.255.252
clock rate 512000
!
!
router bgp 100
no synchronization
bgp log-neighbor-changes
neighbor 1.1.1.2 remote-as 65001
neighbor 1.1.1.2 default-originate
neighbor 34.34.34.4 remote-as 200
no auto-summary
!
ip route 2.2.2.0 255.255.255.252 34.34.34.4
!
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-5">
<pre class="configs">
hostname ISP-2
!
!
interface FastEthernet0/0
ip address 34.34.34.4 255.255.255.0
speed 100
full-duplex
!
interface Serial0/0
ip address 2.2.2.2 255.255.255.252
clock rate 512000
!
router bgp 200
no synchronization
bgp log-neighbor-changes
neighbor 2.2.2.1 remote-as 65001
neighbor 2.2.2.1 default-originate
neighbor 34.34.34.3 remote-as 100
no auto-summary
!
ip route 1.1.1.0 255.255.255.252 34.34.34.3
!
!
!
</pre>
</div>
</div>
</div>
</div>
<!-- End Tab v1 --></br></p>
<p><em><strong>Post your answer in the ‘Comments’ section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</strong></em> </p>
<p><br/></p>How do ACLs handle fragments ?2014-04-22T00:00:00+01:00Costitag:costiser.ro,2014-04-22:2014/04/22/how-do-acls-handle-fragments/<p><span class="dropcap-bg">C</span></p>
<p>This post represents the solution and explanation for <a href="/2014/02/03/quiz-22/index.html">quiz-22</a>.<br/>
Have a look at it to understand the problem. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>This quiz starts an interesting discussion about fragmentation and Router ACL behavior. Here are the details of the network configuration from the quiz:</p>
<ul>
<li>Network configuration of your company:<ul>
<li>the company has 3 sites (behind R1, R2 and R3)</li>
<li>Site-1 (R1) and Site-2 (R2) each have a dedicated internet uplink, while Site-3 (R3) connects to everything (intranet and internet) via R2</li>
<li>for backup purposes, a backdoor link exists between the sites, R1 and R2</li>
<li>your main application server, <strong><blue>1.1.1.10</blue></strong>, is hosted in Site-1 (behind R1)</li>
<li>the main applications on this server are using <strong><blue>TCP 1001</blue></strong> and <strong><red>TCP 1002</red></strong></li>
</ul>
</li>
</ul>
<p><a href="/uploads/quiz-22-acl-for-fragments-flows.png" title="How do ACLs handle fragments ?"><img alt="quiz-22-acl-for-fragments-flows How do ACLs handle fragments ?" src="/uploads/quiz-22-acl-for-fragments-flows.png" title="How do ACLs handle fragments ?"/></a><br/>
<br/></p>
<ul>
<li>Connectivity from the Site-3 (behind R3, from <strong>172.16.1.0/24</strong>):<ul>
<li>a GRE tunnel is built between R3 and R2 with an MTU of 1440 (due to constraints in the transit network between them)</li>
<li>since the TCP 1001 application is consuming a lot of bandwidth, Policy Based Routing (<strong>PBR</strong>) was configured on R2 to forward TCP 1001 over the backdoor link (so that internet access for users in Site-2 will not be impacted)</li>
<li>traffic for the TCP 1002 application (and for other potential applications) will be NAT-ed and sent over the Internet</li>
</ul>
</li>
<li>As shown in the above diagram, <blue><em>the TCP connectivity is ok for both applications, TCP 1001 and 1002 (<strong>SYN - SYN/ACK - ACK</strong>)</em></blue></li>
</ul>
<h3 id="problem-description">Problem Description</h3>
<p>Unfortunately, Site-3 users (172.16.1.10) are reporting the following, while trying to upload data to the server:</p>
<ul>
<li>the application on <blue><strong>TCP 1001 works OK</strong></blue>, using the backdoor link</li>
<li>the application on <red><strong>TCP 1002 does <em><u>not</u></em> work</strong></red>: <em>the connection to server 1.1.1.10 <u>gets established</u> but <red>the data transfer freezes soon after the connection is made and, in the end, times out</red></em></li>
</ul>
<p>At first it may seem strange that the control channel (the TCP handshake) works fine while the data transfer does not. There's no firewall in the path, only routing is involved!<br/>
One of the things you should always consider in such scenarios is MTU! Reading the quiz more closely shows that MTU is indeed involved (set to 1440 on the GRE tunnel between R3 and R2), and its effect is also visible in the 2 packet captures attached to the quiz - both of them show <blue><em>"Fragmented IP protocol"</em></blue> (fragments): </p>
<div class="row">
<div class="col-sm-6"><b>TCP 1001 (working)</b>
<a href="/uploads/TCP-1001-working.png"><img src="/uploads/TCP-1001-working.png"/></a>
</div>
<div class="col-sm-6"><b>TCP 1002 (not working)</b>
<a href="/uploads/TCP-1002-not-working.png"><img src="/uploads/TCP-1002-not-working.png"/></a>
</div>
</div>
<h3 id="how-do-acls-handle-fragments">How do ACLs handle fragments ?</h3>
<p>The trick in this quiz is the way ACLs handle fragments, especially ACLs that contain lines referring to Layer 4 (port numbers). The problem is that not all fragments contain the Layer 4 information.<br/>
Let's see how ACLs deal with fragments: </p>
<ul>
<li><blue>ACL entry contains only Layer-3 information:</blue><ul>
<li>if match, perform the action of that ACL entry (permit or deny)</li>
<li>if NO match, evaluate the next ACL entry/line</li>
</ul>
</li>
<li><blue>ACL entry contains only Layer-3 information with the "<strong>fragments</strong>" keyword:</blue><ul>
<li>if packet is a fragment, then perform the action (permit or deny)</li>
<li>if packet is NOT a fragment, then evaluate the next ACL entry</li>
</ul>
</li>
<li><blue>ACL entry contains Layer-3 and Layer-4 information:</blue><ul>
<li><strong>non-fragment</strong>s (contain both L3 and L4) => ACL can evaluate all fields and perform indicated action</li>
<li><strong>initial fragments</strong> also contain both L3 & L4 => ACL can evaluate all fields and perform indicated action</li>
<li><red><u>non-initial fragments contain <strong>only</strong> Layer-3 info</u></red> => ACL cannot evaluate all fields:<ul>
<li>L3 info in the fragment matches the ACL and action is PERMIT => packet allowed</li>
<li>L3 info in the fragment matches the ACL and action is DENY => move to next ACL entry</li>
</ul>
</li>
</ul>
</li>
</ul>
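<p>The rules above can be sketched as a small evaluation function. This is only a toy model in Python, not how IOS is implemented: the <code>Packet</code> and <code>AclEntry</code> types and the <code>evaluate</code> function are hypothetical names, matching is reduced to exact source/destination addresses plus a destination port, the "fragments" keyword is modelled as applying to non-initial fragments (an assumption of this sketch), and the single example entry is assumed from the quiz description (TCP 1001 from 172.16.1.10 to 1.1.1.10).</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src: str
    dst: str
    frag_offset: int = 0             # non-zero means a non-initial fragment
    dst_port: Optional[int] = None   # not present in non-initial fragments


@dataclass(frozen=True)
class AclEntry:
    action: str                      # "permit" or "deny"
    src: str
    dst: str
    dst_port: Optional[int] = None   # None means a Layer-3-only entry
    fragments: bool = False          # the "fragments" keyword


def evaluate(acl, packet):
    """Return "permit" or "deny" for the packet, following the rules above."""
    non_initial = packet.frag_offset > 0
    for entry in acl:
        if (entry.src, entry.dst) != (packet.src, packet.dst):
            continue                 # Layer-3 mismatch: try the next entry
        if entry.fragments:
            if non_initial:
                return entry.action  # entry with the "fragments" keyword
            continue
        if entry.dst_port is None:
            return entry.action      # Layer-3-only entry
        if non_initial:
            if entry.action == "permit":
                return "permit"      # no Layer-4 info to compare: allowed
            continue                 # deny entry: move to the next entry
        if entry.dst_port == packet.dst_port:
            return entry.action      # full Layer-3 + Layer-4 match
    return "deny"                    # implicit deny at the end of the ACL


# Assumed PBR ACL on R2: only TCP 1001 should take the backdoor link.
acl_backdoor = [AclEntry("permit", "172.16.1.10", "1.1.1.10", dst_port=1001)]

first = Packet("172.16.1.10", "1.1.1.10", frag_offset=0, dst_port=1002)
later = Packet("172.16.1.10", "1.1.1.10", frag_offset=1416)

print(evaluate(acl_backdoor, first))  # deny: port 1002 does not match
print(evaluate(acl_backdoor, later))  # permit: only Layer-3 can be compared
```

<p>In a PBR route-map, "permit" here means the packet is policy-routed and "deny" means it follows the normal routing table, so the two fragments of the same TCP 1002 packet end up on different paths.</p>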
<p>Now, getting back to our quiz and the PBR on R2: </p>
<p><a href="/uploads/quiz-22-ac-for-fragments-explanation.png" title="How do ACLs handle fragments ?"><img alt="quiz-22-ac-for-fragments-explanation How do ACLs handle fragments ?" src="/uploads/quiz-22-ac-for-fragments-explanation.png" title="How do ACLs handle fragments ?"/></a> </p>
<p><font size="-1"><em><strong>NOTE</strong> that this example is for an end-to-end Path MTU of 1500.<br/>
The calculations for our quiz (with GRE header involved) are a bit more complicated.<br/>
</em></font> </p>
<p>Applying the rules explained above, here is what happens in case of <strong><em>flows over TCP 1002</em></strong>:</p>
<ul>
<li>non-fragments do <u>not</u> match ACL (TCP port does not match) => no PBR & sent over the internet WITH NAT</li>
<li>initial fragments do <u>not</u> match ACL (TCP port does not match) => no PBR & sent over the internet WITH NAT</li>
<li><red>non-initial fragments match ACL</red> (these fragments don't contain Layer-4 port numbers and since the Layer-3/IP addresses match, they will be allowed) => <red>PBR is applied and packets are sent over the Backdoor link WITHOUT NAT</red></li>
</ul>
<p>To demonstrate this, let's perform a packet capture at the server side for TCP 1002: </p>
<div class="row">
<pre>root@vb02-freebsd9:~ # <blue>tcpdump -vvv -n -i em2</blue>
tcpdump: listening on em2, link-type EN10MB (Ethernet), capture size 65535 bytes
<font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">#
# This is the SYN-SYN/ACK-ACK
# Note the MSS=1436
# Note the TTL=60 for all packets routed over the internet
#</font>
IP (tos 0x0, ttl 60, id 413, offset 0, flags [none], proto TCP (6), length 60)
<green>2.2.2.2.51668</green> > 1.1.1.10.1002: Flags [S], cksum 0x8f4f (correct), seq 1602060910, win 65535, options [<red>mss 1436</red>,nop,wscale 6,sackOK,
TS val 1442555 ecr 0], length 0
IP (tos 0x0, ttl 64, id 227, offset 0, flags [none], proto TCP (6), length 60)
1.1.1.10.1002 > <green>2.2.2.2.51668</green>: Flags [S.], cksum 0x063d (incorrect -> 0x1ac9), seq 454032707, ack 1602060911, win 65535, options [mss 1436,nop,wscale 6,sackOK,
TS val 2252658141 ecr 1442555], length 0
IP (tos 0x0, ttl 60, id 414, offset 0, flags [none], proto TCP (6), length 52)
<green>2.2.2.2.51668</green> > 1.1.1.10.1002: Flags [.], cksum 0x4510 (correct), seq 1, ack 1, win 1045, options [nop,nop,
TS val 1442642 ecr 2252658141], length 0
<font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">#
#
# A fragment follows without NAT - note the following:
# - the source address is real IP instead of NAT
# - there is no Layer-4 information
# - the non-zero fragment offset (this is a fragment)
# - the TTL=61 (lower number of hops since it goes via backdoor)
#</font>
IP (tos 0x0, <red>ttl 61</red>, id 415, <red>offset 1416, flags [none]</red>, proto TCP (6), length 60)
<red><u>172.16.1.10</u></red> > 1.1.1.10: ip-proto-6
<font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">#
#
# This is an initial-fragment - note the following:
# - it arrives out of order
# - Layer-3 info: fragment offset=0 (1st fragment), flags ="More Fragments"
# - it contains Layer-4 info
#</font>
IP (tos 0x0, <red>ttl 60, id 415, offset 0, flags [+]</red>, proto TCP (6), length 1436)
<green>2.2.2.2.51668</green> > 1.1.1.10.1002: Flags [.], seq 1:1385, ack 1, win 1045, options [nop,nop,TS val 1442642 ecr 2252658141], length 1384
<font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">...
...
#
# This repeats over and over (retransmissions) until it times-out
#
#############################################
#
# The NAT table on R2:
#</font>
R2#sh ip nat tra
Pro Inside global Inside local Outside local Outside global
tcp <red>2.2.2.2:51668</red> 172.16.1.10:51668 1.1.1.10:1002 1.1.1.10:1002
R2#</pre>
</div>
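<p>The fragment sizes and offsets in this capture can be reproduced with a little arithmetic. The sketch below, in Python, assumes that the client sent a 1476-byte IP packet (the advertised mss 1436 plus 40 bytes of IP and TCP headers), that it was fragmented at the 1440-byte tunnel MTU, and that IP headers are 20 bytes with no options; the <code>fragment</code> function is a hypothetical helper, not part of any tool used here.</p>

```python
def fragment(total_length, mtu, ip_header=20):
    """Split an IPv4 packet into (offset, length, more_fragments) tuples.

    Offsets are in bytes, as tcpdump prints them. Every fragment except the
    last one must carry a payload that is a multiple of 8 bytes.
    """
    if total_length <= mtu:
        return [(0, total_length, False)]      # fits: no fragmentation

    payload = total_length - ip_header         # bytes after the IP header
    max_chunk = (mtu - ip_header) // 8 * 8     # largest 8-byte-aligned payload
    fragments = []
    offset = 0
    while offset < payload:
        chunk = min(max_chunk, payload - offset)
        more = offset + chunk < payload        # "More Fragments" flag
        fragments.append((offset, chunk + ip_header, more))
        offset += chunk
    return fragments


print(fragment(1476, 1440))  # [(0, 1436, True), (1416, 60, False)]
```

<p>The result matches the capture: an initial fragment of length 1436 with the "More Fragments" flag set, and a second fragment of length 60 at offset 1416.</p>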
<h3 id="solutions">Solutions</h3>
<p>The best solution (and the recommendation from a design point of view) is to eliminate the fragments, and the best way to achieve this is with Path MTU Discovery (PMTUD).<br/>
Attention though, you cannot solve this quiz by tweaking the ACL used for the PBR (and fulfilling the requirements at the same time) - details below! </p>
<p><strong>1. Enable PMTUD</strong> <br/>
If you want to review how Path MTU Discovery (PMTUD) works, have a look at <a href="/how-could-mtu-affect-bgp-sessions.html#path-mtu-discovery-pmtud"><em><strong>this section that I wrote in a separate article</strong></em></a>.<br/>
Most of the servers already have PMTUD enabled by default...but since we don't live in a perfect world, there might be exceptions.<br/>
Another thing that you must be aware of is that servers do cache the PMTUD value per destination. Every time the server wants to initiate a TCP connection to that particular destination, it sets the MSS (Maximum Segment Size) to the value of MTU - 40 (20 bytes of IP header plus 20 bytes of TCP header). For the MTU, it uses either the cached PMTUD value OR the MTU of the outgoing interface (usually 1500) in case there's no cached value for that particular destination. </p>
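<p>The "MTU - 40" rule can be checked with a few values. This is a minimal sketch in Python that assumes 20-byte IP and TCP headers with no options; <code>mss_for</code> is a hypothetical helper name.</p>

```python
IP_HEADER = 20   # bytes, IPv4 header without options
TCP_HEADER = 20  # bytes, TCP header without options


def mss_for(mtu):
    """MSS a host advertises for a given (cached or interface) MTU."""
    return mtu - IP_HEADER - TCP_HEADER


print(mss_for(1500))  # 1460: no cached value, Ethernet interface MTU
print(mss_for(1476))  # 1436: the mss seen in the packet capture
print(mss_for(1440))  # 1400: what the GRE tunnel MTU of 1440 would require
```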
<div class="row">
<pre><font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">#
# Check if PMTUD is enabled (1) or disabled (0)
#</font>
root@vb03-freebsd:~/client # <blue>sysctl -a | grep mtu_discovery</blue>
<red>net.inet.tcp.path_mtu_discovery: 0</red>
<font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">#
# Check if there's a cache list of hosts:
# The Count number should tell how many hosts are cached
#</font>
root@vb03-freebsd:~/client # <blue>sysctl -a | grep cache | grep tcp | grep count</blue>
net.inet.tcp.hostcache.count: 1
<font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">#
# Check the cached PMTUD values
#</font>
root@vb03-freebsd:~/client # <purple>sysctl -o net.inet.tcp.hostcache.list</purple>
net.inet.tcp.hostcache.list:
IP address MTU SSTRESH RTT RTTVAR BANDWIDTH CWND SENDPIPE RECVPIPE HITS UPD EXP
<red>1.1.1.10 1476</red> 0 97ms 31ms 0 11252 0 0 16 3 1800
root@vb03-freebsd:~/client #</pre>
</div>
<p>As you saw in my quiz, the client has a cached PMTUD value of 1476 and, at this moment, <red><em>PMTUD is disabled</em></red>!<br/>
You may wonder <em><strong>why is there a cached PMTUD value if PMTUD is disabled?</strong></em> The answer is: to demonstrate this quiz, I did the following:</p>
<ul>
<li>with the default host config (PMTUD enabled) and no MTU configured on the GRE tunnel -> made a test connection -> the client learned the PMTUD = <strong>1476</strong> (1500-24/GRE)</li>
<li>then I configured a lower MTU of 1440 on the GRE tunnels</li>
<li>I also disabled PMTUD on the client with the command <code>sysctl -w net.inet.tcp.path_mtu_discovery="0"</code> so that it cannot learn the new PMTUD value</li>
</ul>
<p>You will say that <em><strong>it was not nice of me to hack it this way</strong></em>, but I'll say: it was worth it to demonstrate this quiz ☺ </p>
<p>As already mentioned, the best solution for the quiz is to re-enable PMTUD on the client so that it will discover the new MTU: </p>
<div class="row">
<pre>root@vb03-freebsd:~/client # <green>sysctl -w net.inet.tcp.path_mtu_discovery=1</green>
net.inet.tcp.path_mtu_discovery: 0 -> 1
root@vb03-freebsd:~/client #
root@vb03-freebsd:~/client # <blue>nc 1.1.1.10 1002 < test.file</blue> <font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">!! This is my test and it's successful !!</font>
root@vb03-freebsd:~/client #
root@vb03-freebsd:~/client # <blue>sysctl -o net.inet.tcp.hostcache.list</blue>
net.inet.tcp.hostcache.list:
IP address MTU SSTRESH RTT RTTVAR BANDWIDTH CWND SENDPIPE RECVPIPE HITS UPD EXP
1.1.1.10 <green><u>1440</u></green> 0 106ms 28ms 0 11252 0 0 26 6 3600
</pre>
</div>
<p><br/></p>
<p><strong>2. MSS Clamping</strong> </p>
<p>MSS Clamping means that the network engineer configures the routers to rewrite the MSS value that the client and server exchange during the TCP 3-way handshake. For the sake of the exercise, I will use a value of 1300: </p>
<div class="row">
<pre class="col-sm-8">R3#conf t
R3(config)#int tun1
R3(config-if)#ip tcp adjust-mss ?
<500-1460> Maximum segment size in bytes
R3(config-if)#<purple>ip tcp adjust-mss 1300</purple>
R3(config-if)#end</pre>
</div>
<div class="row">
<pre>root@vb03-freebsd:~/client # <blue>nc 1.1.1.10 1002 < test.file</blue> <font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">!! Test ok !!</font>
root@vb03-freebsd:~/client #
root@vb03-freebsd:~/client # sysctl -o net.inet.tcp.hostcache.list
net.inet.tcp.hostcache.list:
IP address MTU SSTRESH RTT RTTVAR BANDWIDTH CWND SENDPIPE RECVPIPE HITS UPD EXP
<red>1.1.1.10 <u>1476</u></red> 0 111ms 40ms 0 11252 0 0 13 3 3600
root@vb03-freebsd:~/client #</pre>
</div>
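<p>As shown, the cached PMTUD value is still the wrong one, 1476, since PMTUD is disabled in the quiz! However, this no longer affects the communication because the largest TCP segment now carries 1300 bytes of payload, as set by the MSS, so the resulting IP packets (1340 bytes including the IP and TCP headers) fit within the tunnel MTU of 1440.<br/>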
<br/></p>
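<p>The arithmetic behind MSS clamping can be sketched in a few lines of Python. This is only an illustration: the 20-byte IPv4 and TCP header sizes assume no options, and the function and constant names are made up for this example.</p>

```python
IP_HEADER = 20    # bytes, IPv4 header without options (assumption)
TCP_HEADER = 20   # bytes, TCP header without options (assumption)

TUNNEL_MTU = 1440   # MTU configured on the GRE tunnels in the quiz
STALE_PMTU = 1476   # value still cached on the client
CLAMPED_MSS = 1300  # value set with "ip tcp adjust-mss 1300"


def packet_size(mss):
    """Largest IPv4 packet produced by a full-sized TCP segment."""
    return mss + IP_HEADER + TCP_HEADER


def mss_from_mtu(mtu):
    """MSS a host would derive from a given (path) MTU."""
    return mtu - IP_HEADER - TCP_HEADER


def fits(size, mtu=TUNNEL_MTU):
    """True when a packet of this size crosses the tunnel unfragmented."""
    return size <= mtu


if __name__ == "__main__":
    stale = packet_size(mss_from_mtu(STALE_PMTU))
    clamped = packet_size(CLAMPED_MSS)
    print("stale PMTU   ->", stale, "bytes, fits:", fits(stale))
    print("clamped MSS  ->", clamped, "bytes, fits:", fits(clamped))
```

<p>With the stale value of 1476 the client builds packets that are larger than the tunnel MTU, while with the clamped MSS every packet stays below it, so the cached PMTUD value no longer matters.</p>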
<p><strong>3. Modify the ACL PBR to send TCP 1002 over backdoor (same as working TCP 1001)</strong> </p>
<p>As some of you already indicated in the quiz, you can modify the ACL used for the PBR to send TCP 1002 over the backdoor link, same as for TCP 1001: </p>
<div class="row">
<pre class="col-md-9">R2#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip access-list extended ACL_BACKDOOR
R2(config-ext-nacl)#<purple>15 permit tcp host 172.16.1.10 host 1.1.1.10 eq 1002</purple>
R2(config-ext-nacl)#end</pre>
</div>
<p><strong><em>Problems with this solution</em></strong>:<br/>
- it violates the quiz requirements (which could represent a business requirement that you, as a network engineer, would have to follow!)<br/>
- <red>it still does <u><strong>not</strong></u> solve the problem for other flows that contain <strong>fragments</strong> (TCP ports other than 1001 and 1002)</red><br/>
<br/></p>
<p><strong>4. Deny fragments from PBR - breaks TCP 1001 !!</strong> </p>
<p>Since the problem is that fragments of TCP 1002 match the PBR ACL, one could modify this ACL to deny such fragments from being routed over the backdoor by the PBR: </p>
<div class="row">
<pre class="col-md-9">R2#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R2(config)#ip access-list extended ACL_BACKDOOR
R2(config-ext-nacl)#<red>5 deny ip host 172.16.1.10 host 1.1.1.10 fragments</red>
R2(config-ext-nacl)#end
R2#sh access-list
Extended IP access list ACL_BACKDOOR
5 deny ip host 172.16.1.10 host 1.1.1.10 fragments
10 permit tcp host 172.16.1.10 host 1.1.1.10 eq 1001 (296 matches)
20 deny ip any any (433 matches)
</pre>
</div>
<div class="row">
<pre>root@vb03-freebsd:~/client # <blue>nc 1.1.1.10 1002 < test.file</blue> <font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">!! Test ok !!</font>
root@vb03-freebsd:~/client #
root@vb03-freebsd:~/client # <red>nc 1.1.1.10 1001 < test.file</red> <font color="black" size="-1" style="font-family:'Lucida Grande',Verdana,Arial,sans-serif';">!! TCP 1001 does not work anymore !!</font>
<red>^C</red>
root@vb03-freebsd:~/client #</pre>
</div>
<p>As you can see, we made it work for TCP 1002 but at the same time <blue><em>we broke TCP 1001</em></blue>... Why is that?<br/>
It's because TCP 1001 also uses fragments, and those fragments are now denied by the new ACL entry #5, which makes the router send them via the Internet (while the other TCP 1001 packets are still permitted by the ACL and policy-routed over the backdoor)!<br/>
<br/></p>
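<p>To make the fragment behaviour easier to follow, here is a simplified Python model of how, to my understanding, IOS extended ACLs treat non-initial fragments: a <code>permit</code> entry with a port matches them on Layer 3 information only, a <code>deny</code> entry with a port is skipped, and an entry with the <code>fragments</code> keyword matches only non-initial fragments. The class and function names are invented for this sketch, and the protocol field (tcp vs ip) is not modelled.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Packet:
    src: str
    dst: str
    dport: Optional[int]               # None: no TCP header (non-initial fragment)
    non_initial_fragment: bool = False


@dataclass
class Entry:
    action: str                        # "permit" or "deny"
    src: str                           # host address or "any"
    dst: str
    dport: Optional[int] = None        # None: entry has Layer 3 info only
    fragments: bool = False            # the "fragments" keyword


def l3_match(entry, pkt):
    return entry.src in ("any", pkt.src) and entry.dst in ("any", pkt.dst)


def evaluate(acl, pkt):
    """Return 'permit' (policy-routed) or 'deny' (normal routing)."""
    for entry in acl:
        if not l3_match(entry, pkt):
            continue
        if entry.fragments:
            if pkt.non_initial_fragment:
                return entry.action
            continue                   # keyword entries ignore everything else
        if entry.dport is None:
            return entry.action        # Layer 3 only entry matches any packet
        if pkt.non_initial_fragment:
            if entry.action == "permit":
                return "permit"        # ports cannot be checked, L3 is enough
            continue                   # deny entries with ports are skipped
        if pkt.dport == entry.dport:
            return entry.action
    return "deny"                      # implicit deny at the end


CLIENT, SERVER = "172.16.1.10", "1.1.1.10"
ORIGINAL_ACL = [
    Entry("permit", CLIENT, SERVER, dport=1001),
    Entry("deny", "any", "any"),
]
FIXED_ACL = [Entry("deny", CLIENT, SERVER, fragments=True)] + ORIGINAL_ACL

if __name__ == "__main__":
    first_1002 = Packet(CLIENT, SERVER, dport=1002)
    fragment = Packet(CLIENT, SERVER, dport=None, non_initial_fragment=True)
    print("original ACL, first packet of TCP 1002:", evaluate(ORIGINAL_ACL, first_1002))
    print("original ACL, any later fragment:      ", evaluate(ORIGINAL_ACL, fragment))
    print("with entry 5, any later fragment:      ", evaluate(FIXED_ACL, fragment))
```

<p>A non-initial fragment carries no port, so the model cannot tell a TCP 1001 fragment from a TCP 1002 one. That is why the original ACL sends TCP 1002 fragments over the backdoor, and why entry 5 pushes the TCP 1001 fragments away from it.</p>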
<p><strong>5. Adjust (lower) the MTU on all links</strong> </p>
<p>Adjusting the MTU on all links will eliminate the need for the servers to use PMTUD and thus the fragments will disappear... Unfortunately, even though it is recommended and desirable to have the same MTU value along the whole path, it is not always possible, especially for links that are not under your administration.<br/>
<br/></p>
<p><em>Thanks again for all your comments in the quiz !<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>Quiz #23 – QoS on IPsec Tunnels2014-04-07T00:00:00+01:00Costitag:costiser.ro,2014-04-07:2014/04/07/quiz-23/<p><span class="dropcap">C</span>ompany ABC runs a static VTI-based VPN tunnel between Site-1, hosting <strong>192.168.1.1</strong>, and Site-2, hosting <strong>192.168.5.5</strong>.<br/>
BGP is configured between the two sites over the VTI tunnel, so all traffic between the sites is encrypted/protected by IPsec. </p>
<p>A new requirement is received from the customer, asking that all traffic from 192.168.1.1 (in Site-1) to 192.168.5.5 (in Site-2) be prioritized. The network engineer creates the configuration shown below (<code>access-list</code>, <code>class-map IMPORTANT_TRAFFIC</code> and <code>policy-map LLQ</code>): </p>
<p><a href="/uploads/quiz-23.png" title="QoS on IPsec Tunnels"><img alt="quiz-23 QoS on IPsec Tunnels" src="/uploads/quiz-23.png" title="QoS on IPsec Tunnels"/></a> </p>
<p><br>
<strong><em>What is missing to finish this task ?</em></strong> </br></p>
<p><br>
DEVICES' CONFIGURATIONS:
<br>
<!-- Tab v1 -->
<div class="row">
<div class="tab-v1">
<ul class="nav nav-tabs col-md-8">
<li class="active"><a data-toggle="tab" href="#tab-1">R1</a></li>
<li><a data-toggle="tab" href="#tab-2">R2</a></li>
<li><a data-toggle="tab" href="#tab-3">R3</a></li>
<li><a data-toggle="tab" href="#tab-4">R4</a></li>
<li><a data-toggle="tab" href="#tab-5">R5</a></li>
</ul>
<div class="tab-content col-md-8">
<div class="tab-pane fade in active" id="tab-1">
<pre class="configs">
hostname R1
!
no aaa new-model
ip cef
!
interface Loopback0
ip address 192.168.1.1 255.255.255.255
!
interface FastEthernet0/0
ip address 192.168.12.1 255.255.255.0
speed 100
full-duplex
!
interface FastEthernet0/1
no ip address
shutdown
duplex auto
speed auto
!
router ospf 1
log-adjacency-changes
network 192.168.0.0 0.0.255.255 area 0
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-2">
<pre class="configs">
hostname R2
!
no aaa new-model
ip cef
!
class-map match-all IMPORTANT_TRAFFIC
match access-group name ACL_IMPORTANT_TRAFFIC
!
!
policy-map LLQ
class IMPORTANT_TRAFFIC
priority percent 33
!
!
!
crypto isakmp policy 10
encr 3des
authentication pre-share
crypto isakmp key cisco123 address 0.0.0.0 0.0.0.0
!
!
crypto ipsec transform-set TSET esp-3des esp-md5-hmac
!
crypto ipsec profile IPSEC_PROFILE
set transform-set TSET
!
!
interface Tunnel0
ip address 192.168.255.2 255.255.255.252
tunnel source 23.23.23.2
tunnel destination 34.34.34.4
tunnel mode ipsec ipv4
tunnel protection ipsec profile IPSEC_PROFILE
!
interface FastEthernet0/0
ip address 192.168.12.2 255.255.255.0
speed 100
full-duplex
!
interface Serial0/0
ip address 23.23.23.2 255.255.255.248
clock rate 2000000
!
!
router ospf 1
log-adjacency-changes
network 192.168.12.2 0.0.0.0 area 0
default-information originate
!
router bgp 65200
no synchronization
bgp log-neighbor-changes
redistribute connected route-map INTERNAL_INTERFACES
redistribute ospf 1
neighbor 192.168.255.1 remote-as 65100
no auto-summary
!
ip forward-protocol nd
ip route 0.0.0.0 0.0.0.0 23.23.23.3
!
!
ip access-list extended ACL_IMPORTANT_TRAFFIC
permit ip host 192.168.1.1 host 192.168.5.5
!
!
route-map INTERNAL_INTERFACES permit 10
match interface FastEthernet0/0
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-3">
<pre class="configs">
hostname R3
!
no aaa new-model
ip cef
!
!
interface FastEthernet0/0
ip address 34.34.34.3 255.255.255.248
speed 100
full-duplex
!
interface Serial0/0
ip address 23.23.23.3 255.255.255.248
clock rate 2000000
!
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-4">
<pre class="configs">
hostname R4
!
no aaa new-model
ip cef
!
!
crypto isakmp policy 10
encr 3des
authentication pre-share
crypto isakmp key cisco123 address 0.0.0.0 0.0.0.0
!
!
crypto ipsec transform-set TSET esp-3des esp-md5-hmac
!
crypto ipsec profile IPSEC_PROFILE
set transform-set TSET
!
!
interface Tunnel0
ip address 192.168.255.1 255.255.255.252
tunnel source 34.34.34.4
tunnel destination 23.23.23.2
tunnel mode ipsec ipv4
tunnel protection ipsec profile IPSEC_PROFILE
!
interface FastEthernet0/0
ip address 34.34.34.4 255.255.255.248
speed 100
full-duplex
!
interface FastEthernet0/1
ip address 192.168.45.4 255.255.255.0
speed 100
full-duplex
!
router ospf 1
log-adjacency-changes
network 192.168.45.4 0.0.0.0 area 0
default-information originate
!
router bgp 65100
no synchronization
bgp log-neighbor-changes
redistribute connected route-map INTERNAL_INTERFACES
redistribute ospf 1
neighbor 192.168.255.2 remote-as 65200
no auto-summary
!
ip route 0.0.0.0 0.0.0.0 34.34.34.3
!
!
route-map INTERNAL_INTERFACES permit 10
match interface FastEthernet0/1
!
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-5">
<pre class="configs">
hostname R5
!
!
no aaa new-model
ip cef
!
!
!
!
interface Loopback0
ip address 192.168.5.5 255.255.255.255
!
interface FastEthernet0/0
ip address 192.168.45.5 255.255.255.0
speed 100
full-duplex
!
router ospf 1
log-adjacency-changes
network 192.168.0.0 0.0.255.255 area 0
!
</pre>
</div>
</div>
</div>
</div>
<!-- End Tab v1 --></br></br></p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>Pre-bestpath Cost Community – What is it?2014-03-27T00:00:00+00:00Costitag:costiser.ro,2014-03-27:2014/03/27/pre-bestpath-cost-community/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/11/29/quiz-21/index.html">quiz-21</a>. <br/>
Have a look at it to test your knowledge. ☺</p>
<h3 id="quiz-review">Quiz Review</h3>
<p>A large enterprise consisting of multiple remote sites uses a private MPLS cloud, with EIGRP as the PE-CE protocol and MPLS L3 VPNs, to achieve the necessary connectivity.<br/>
Of particular interest, Site-A and Site-B have a <strong><em>Backdoor Link</em></strong> between them.<br/>
Everything works as desired until a new request reaches the network department: a new Site-ABC will be connected to PE-2 and users in this site will mostly connect to resources behind Site-A / CE-1 (<strong>192.168.1.55</strong>).<br/>
The requirement is to make sure that this traffic (from PE-2 to CE-1) uses the backdoor link instead of the MPLS cloud: </p>
<p><a href="/uploads/quiz-21-solution.png" title="Pre-bestpath Cost Community"><img alt="quiz-21-solution Pre-bestpath Cost Community" src="/uploads/quiz-21-solution.png" title="Pre-bestpath Cost Community"/></a> </p>
<p>A simple investigation shows that in the current setup, traffic from PE-2 to CE-1 goes via the MPLS cloud: </p>
<div class="row">
<pre class="col-md-10">PE-2#<blue>traceroute vrf CUST_A 192.168.1.55</blue>
Type escape sequence to abort.
Tracing the route to 192.168.1.55
1 10.0.0.6 [MPLS: Labels 16/19 Exp 0] 60 msec 60 msec 40 msec
2 192.168.1.1 [MPLS: Label 19 Exp 0] 36 msec 36 msec 40 msec
3 192.168.1.2 44 msec * 20 msec
PE-2#</pre>
</div>
<h3 id="problem-statement">Problem Statement</h3>
<p>The network engineer tries to understand the current routing status for destination 192.168.1.55 and finds out that PE-2 prefers the BGP path over the EIGRP one: </p>
<div class="row">
<pre class="col-md-11">PE-2#<blue>sh ip route vrf CUST_A 192.168.1.55</blue>
Routing entry for 192.168.1.55/32
<red>Known via "bgp 100"</red>, distance 200, metric 156160, type internal
Redistributing via eigrp 100
Advertised by eigrp 100 metric 100000 10 255 1 1500
bgp 100 (self originated)
Last update from 10.255.255.1 00:22:30 ago
Routing Descriptor Blocks:
<red>* 10.255.255.1 (Default-IP-Routing-Table), from 10.255.255.1, 00:22:30 ago</red>
Route metric is 156160, traffic share count is 1
AS Hops 0
PE-2#
PE-2#<blue>sh bgp vpnv4 uni all 192.168.1.55</blue>
BGP routing table entry for 100:1:192.168.1.55/32, version 22
Paths: (1 available, best #1, table CUST_A)
Not advertised to any peer
Local
<red>10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
Origin incomplete, metric 156160, localpref 100, valid, internal, best</red>
Extended Community: RT:100:1 Cost:pre-bestpath:128:156160
0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
mpls labels in/out nolabel/19
PE-2#</pre>
</div>
<p>He tries to influence the BGP path selection by setting a high local preference on the redistributed EIGRP routes, but unfortunately PE-2 still chooses the prefix received over MPLS as the best path: </p>
<div class="row">
<pre class="col-md-7">ip access-list standard PE1_LOOPBACK
permit 192.168.1.55
!
route-map SET_LP_500 permit 10
match ip address PE1_LOOPBACK
<purple>set local-preference 500</purple>
route-map SET_LP_500 permit 999
!
router bgp 100
address-fam ipv4 vrf CUST_A
redistribute eigrp 100 route-map SET_LP_500
</pre>
</div>
<div class="row">
<pre class="col-md-11">PE-2#<blue>sh bgp vpnv4 uni all 192.168.1.55</blue>
BGP routing table entry for 100:1:192.168.1.55/32, version 22
Paths: (1 available, best #1, table CUST_A)
Not advertised to any peer
Local
10.255.255.1 (metric 3) <red>from 10.255.255.1 (10.255.255.1)</red>
Origin incomplete, metric 156160,<red> localpref 100</red>, valid, internal,<red> best
Extended Community: RT:100:1 <u>Cost:pre-bestpath:128:156160</u></red>
0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
mpls labels in/out nolabel/19
<red><i>!
! the prefix received over MPLS (with default LP = 100) is still chosen as best !!
! although the redistributed one has LP = 500
!</i></red></pre>
</div>
<p>As most of you already answered in <a href="/2013/11/29/quiz-21/">the quiz</a>, the reason the BGP Best Path selection cannot be influenced with the Local Preference is the presence of the <strong><em>Cost Community</em></strong>, as seen in this line: <red><code>Cost:pre-bestpath:128:156160</code></red>.<br/>
<strong>What's that ?</strong> </p>
<h3 id="pre-bestpath-cost-community">Pre-bestpath Cost Community</h3>
<p>Pre-bestpath is an <em>extended non-transitive community</em> that Cisco introduced to influence the BGP Best Path selection in an arbitrary fashion, after partial computations of the normal process (or even <u><strong>before</strong></u> it starts), and to take a decision based on local criteria. In some cases, especially in situations with Backdoor links, this can also help against routing loops.<br/>
This is not (yet, and probably will never be) part of an RFC standard as the proposed document is still in draft status. This draft is called "<a href="http://tools.ietf.org/html/draft-retana-bgp-custom-decision-02"><em>BGP Custom Decisions</em></a>".<br/>
To achieve such a custom decision, this community uses a <u><blue><em>Point of Insertion (POI)</em></blue></u> to indicate at what point during the BGP Best Path Selection process the router has to stop and consider the value of the Cost Community. The <em><blue>breaking points</blue> or <blue>insertion points</blue></em> mentioned in the draft document are:</p>
<p><a href="/uploads/quiz-21-cost-community.png" title="Insertion Points in BGP Best path selection"><img alt="quiz-21-cost-community Insertion Points in BGP Best path selection" src="/uploads/quiz-21-cost-community.png" title="Insertion Points in BGP Best path selection"/></a> </p>
<ul>
<li><red>POI = 128</red>, use Cost Community <u>before anything else</u></li>
<li><green>POI = 129</green>, use Cost Community <u>after the IGP cost to next-hop has been compared</u></li>
<li>POI = 130, use Cost Community <u>after the paths advertised by BGP speakers in a neighboring autonomous system (if any) have been selected</u></li>
<li><purple>POI = 131</purple>, use Cost Community <u>after BGP IDs have been compared</u></li>
</ul>
<p>Out of all these, Cisco implemented only POI = 129 (IGP) that represents the default and POI = 128 that represents the <em>ABSOLUTE_VALUE</em>.<br/>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
This POI 128 (absolute value) totally modifies the BGP best path selection process by making the router compare the cost values <u>before</u> the entire process starts - hence the name <blue>pre-bestpath cost community</blue>.
</div>
</div></p>
<h3 id="eigrp-and-cost-community-pre-bestpath">EIGRP and Cost Community (Pre-bestpath)</h3>
<p>Before presenting the solutions for <a href="/2013/11/29/quiz-21/">the quiz</a>, let's review some of the characteristics of EIGRP used as PE-CE protocol in relation with the pre-bestpath cost community:</p>
<ul>
<li>by default, EIGRP routes redistributed into BGP <blue><em>automatically get the Cost Community with POI 128</em></blue>, which means that the cost value is evaluated/compared before any other path attribute (including weight). The community-ID is also 128.</li>
<li><blue><em>the value/cost of the pre-bestpath community is the composite metric of the redistributed EIGRP route</em></blue></li>
<li>routes without this cost community are evaluated as if they had a cost value of 2147483647, which represents half of the maximum possible value</li>
<li>MP-BGP uses another set of extended communities to transport the EIGRP metric values from one PE to another:<ul>
<li>0x8800 = Route Flag and Tag</li>
<li>0x8801 = AS Number and Delay</li>
<li>0x8802 = Reliability, Next Hop, and Bandwidth</li>
<li>0x8803 = Reserve, Load and MTU</li>
<li>0x8804 = (for external routes) Remote AS Number and Remote ID</li>
<li>0x8805 = (for external routes) Remote Protocol and Remote Metric</li>
</ul>
</li>
<li><strong><em>the MP-BGP cloud is interpreted as a metric zero (0)</em></strong></li>
</ul>
<p><a href="/uploads/quiz-21-eigrp-as-pe-ce.png" title="EIGRP as PE-CE Protocol"><img alt="quiz-21-eigrp-as-pe-ce EIGRP as PE-CE Protocol" src="/uploads/quiz-21-eigrp-as-pe-ce.png" title="EIGRP as PE-CE Protocol"/></a> </p>
<p>For example, 0x8801 AS Number determines if the prefix will be redistributed as internal (same AS number) or external (different AS numbers). </p>
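<p>As a quick sanity check of how these communities relate to the pre-bestpath cost, the sketch below parses the extended communities exactly as they are printed in the <code>sh bgp vpnv4</code> outputs of this post. The helper names are made up, and treating the second field of 0x8801 and 0x8802 as the already scaled delay and bandwidth (with default K-values) is my assumption - it happens to match the numbers shown.</p>

```python
def parse_ext_community(text):
    """Split '0x8801:100:130560' into (type, first_field, second_field)."""
    kind, first, second = text.split(":")
    return int(kind, 16), int(first), int(second)


def composite_metric(communities):
    """Scaled bandwidth + scaled delay, assuming default K-values."""
    fields = {}
    for community in communities:
        kind, first, second = parse_ext_community(community)
        fields[kind] = (first, second)
    scaled_delay = fields[0x8801][1]       # 0x8801 = AS Number and Delay
    scaled_bandwidth = fields[0x8802][1]   # 0x8802 = ..., Bandwidth
    return scaled_bandwidth + scaled_delay


# Values copied from the outputs on PE-2 in this post.
VIA_MPLS = ["0x8800:32768:0", "0x8801:100:130560",
            "0x8802:65281:25600", "0x8803:65281:1500"]
VIA_BACKDOOR = ["0x8800:32768:0", "0x8801:100:133120",
                "0x8802:65282:25600", "0x8803:65281:1500"]

if __name__ == "__main__":
    print("path via MPLS:    ", composite_metric(VIA_MPLS))
    print("path via backdoor:", composite_metric(VIA_BACKDOOR))
```

<p>The two sums equal the cost values seen in <code>Cost:pre-bestpath:128:156160</code> and <code>Cost:pre-bestpath:128:158720</code>, which is consistent with the cost being the composite EIGRP metric at the point of redistribution.</p>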
<p>Now let's put it all together and reveal what happens behind the scenes. As you can see in the picture below, PE-2 has the following information in the BGP table and will try to find the best path:</p>
<ul>
<li>prefix 192.168.1.55 received from PE-1 over the MP-BGP with a pre-bestpath community of 128:<strong><u>156160</u></strong> - this value represents the composite metric of the EIGRP route <u><em>at the moment it was redistributed from EIGRP to BGP on PE-1</em></u></li>
<li>the MPLS cloud does not modify this cost (the MPLS cloud is transparent)</li>
<li>prefix 192.168.1.55 received from CE-2 over the EIGRP gets redistributed into BGP and immediately receives a pre-bestpath community of 128:<strong><u>158720</u></strong> - this value represents the composite EIGRP metric at this point</li>
<li>due to the pre-bestpath community, the MP-BGP path is selected as the best path (its cost of 156160 is lower than 158720), even though the locally redistributed one has a weight of 32768 (the default weight for all locally originated routes) - as explained, weight does not count when pre-bestpath exists</li>
</ul>
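<p>The decision described in the list above can be illustrated with a deliberately simplified model of the best path selection: the cost at POI 128 is compared first (lower wins), and only then weight and local preference (higher wins). This is a sketch under those assumptions, not the full BGP algorithm, and all names in it are hypothetical.</p>

```python
DEFAULT_COST = 2147483647  # used when a path carries no cost community


def best_path(paths, ignore_cost_community=False):
    """Pick the best path name: cost first, then weight, then local pref."""
    def rank(path):
        cost = path.get("cost", DEFAULT_COST)
        if ignore_cost_community:
            cost = 0                      # same value for everyone: no effect
        return (cost, -path["weight"], -path["local_pref"])
    return min(paths, key=rank)["name"]


# Values taken from the outputs on PE-2.
via_mpls = {"name": "MP-BGP", "cost": 156160, "weight": 0, "local_pref": 100}
via_backdoor = {"name": "EIGRP backdoor", "cost": 158720,
                "weight": 32768, "local_pref": 100}

if __name__ == "__main__":
    print(best_path([via_mpls, via_backdoor]))
    # Raising local preference on the redistributed route changes nothing.
    boosted = dict(via_backdoor, local_pref=500)
    print(best_path([via_mpls, boosted]))
    # Without the cost community step, weight decides.
    print(best_path([via_mpls, via_backdoor], ignore_cost_community=True))
```

<p>In this model the MP-BGP path wins as long as the cost is compared, no matter what weight or local preference the redistributed route has.</p>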
<p><a href="/uploads/quiz-21-explanation.png" title="Pre-bestpath in action"><img alt="quiz-21-explanation Pre-bestpath in action" src="/uploads/quiz-21-explanation.png" title="Pre-bestpath in action"/></a> </p>
<h3 id="quiz-solutions">Quiz Solutions</h3>
<p>Now, knowing that the Pre-bestpath Cost Community modifies the normal BGP best path selection process by considering the value of this community (the cost) <u>before anything else is compared</u> (due to the ABSOLUTE point of insertion of 128), it becomes obvious that modifying any of the "classic" path attributes, such as Local Preference, AS PATH, MED or even Weight, will <strong>not</strong> help.<br/>
The solutions would have to find a way to modify the pre-bestpath cost or to disable this community. Let's see them in action ! </p>
<h4 id="1-change-pre-bestpath-on-pe-2">1. Change pre-bestpath on PE-2</h4>
<p>One method to get the result we want is to <font color="blue"><em>modify the pre-bestpath community on PE-2 during redistribution from EIGRP into MP-BGP</em></font>. Since we cannot use the same community-ID of 128 (because it gets overwritten by the redistribution process), I will use a lower community-ID (1 in the example below) and a random cost value (9999999) - according to the RFC Draft: "<em><u>the Cost Community with the lowest Community-ID is considered first</u></em>": </p>
<div class="row">
<pre class="col-md-8">PE-2#sh run | s access-list|route-map|router bgp
ip access-list standard CE1_LOOPBACK
permit 192.168.1.55
!
route-map SET_EXT_COST_COMMUNITY permit 10
match ip address CE1_LOOPBACK
<purple>set extcommunity cost pre-bestpath 1 9999999</purple>
route-map SET_EXT_COST_COMMUNITY permit 99
!
!
router bgp 100
address-family ipv4 vrf CUST_A
<purple>redistribute eigrp 100 route-map SET_EXT_COST_COMMUNITY</purple>
</pre>
</div>
<div class="row">
<pre>PE-2#<blue>sh bgp vpnv4 uni all 192.168.1.55</blue>
BGP routing table entry for 100:1:192.168.1.55/32, version 8
Paths: (1 available, best #1, table CUST_A)
Advertised to update-groups:
1
Local
192.168.2.2 from 0.0.0.0 (10.255.255.2)
Origin incomplete, metric 158720, localpref 100, weight 32768, valid, sourced, <green>best</green>
Extended Community: RT:100:1
<purple>Cost:pre-bestpath:1:9999999
Cost:pre-bestpath:128:158720</purple> 0x8800:32768:0 0x8801:100:133120
0x8802:65282:25600 0x8803:65281:1500
mpls labels in/out 33/nolabel
PE-2#
PE-2#<blue>traceroute vrf CUST_A 192.168.1.55</blue>
Type escape sequence to abort.
Tracing the route to 192.168.1.55
1 192.168.2.2 64 msec 28 msec 12 msec
2 192.168.12.1 44 msec * 24 msec
PE-2#</pre>
</div>
<p><font size="-1"><em>Note that the pre-bestpath:128:<eigrp_metric> also gets added during redistribution</em></font> </p>
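<p>Why does a cost of 9999999 beat 156160? The sketch below shows my reading of the comparison: community-IDs are walked from the lowest to the highest, and a path that lacks a given ID is treated as having the default cost of 2147483647. The function name and the dictionary layout are made up for this illustration.</p>

```python
DEFAULT_COST = 0x7FFFFFFF  # 2147483647, assumed for a missing community-ID


def prefer(a, b):
    """Compare two paths given as {community_id: cost} at the same POI.

    Returns 'a', 'b' or 'tie'. The lowest community-ID is considered first
    and the lower cost wins.
    """
    for community_id in sorted(set(a) | set(b)):
        cost_a = a.get(community_id, DEFAULT_COST)
        cost_b = b.get(community_id, DEFAULT_COST)
        if cost_a != cost_b:
            return "a" if cost_a < cost_b else "b"
    return "tie"


if __name__ == "__main__":
    via_mpls = {128: 156160}
    # Before the change: only the automatic community-ID 128 exists.
    print(prefer({128: 158720}, via_mpls))
    # After the change on PE-2: community-ID 1 is evaluated first.
    print(prefer({1: 9999999, 128: 158720}, via_mpls))
```

<p>With community-ID 1 present only on the locally redistributed route, the MP-BGP path is compared as if it had the default cost, so the backdoor path wins before community-ID 128 is even looked at.</p>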
<h4 id="2-change-pre-bestpath-on-pe-1">2. Change pre-bestpath on PE-1</h4>
<p>A similar solution to the one above, but this time we change the pre-bestpath cost community between the BGP peers: </p>
<div class="row">
<pre class="col-md-8">PE-1#sh run | s access-list|route-map|router b
ip access-list standard CE1_LOOPBACK
permit 192.168.1.55
!
route-map SET_EXT_COMM permit 10
match ip address CE1_LOOPBACK
<purple>set extcommunity cost pre-bestpath 128 7777777</purple>
route-map SET_EXT_COMM permit 999
!
!
router bgp 100
address-family vpnv4
<purple>neighbor 10.255.255.2 route-map SET_EXT_COMM out</purple>
</pre>
</div>
<div class="row">
<pre>PE-2#<blue>sh bgp vpnv4 uni all 192.168.1.55</blue>
BGP routing table entry for 100:1:192.168.1.55/32, version 36
Paths: (2 available, best #2, table CUST_A)
Flag: 0x820
Advertised to update-groups:
1
Local
10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
Origin incomplete, metric 156160, localpref 100, valid, internal
Extended Community: RT:100:1
<purple>Cost:pre-bestpath:128:7777777</purple> 0x8800:32768:0
0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
mpls labels in/out 24/20
Local
192.168.2.2 from 0.0.0.0 (10.255.255.2)
Origin incomplete, metric 158720, localpref 100, weight 32768, valid, sourced, <green>best
Extended Community: RT:100:1 Cost:pre-bestpath:128:158720</green>
0x8800:32768:0 0x8801:100:133120 0x8802:65282:25600 0x8803:65281:1500
mpls labels in/out 24/nolabel
PE-2#</pre>
</div>
<p><font size="-1"><em>Note that the pre-bestpath:128:7777777 overwrites the initial one, as you cannot have two communities for the same point of insertion, 128 and the same community-ID, 128</em></font> </p>
<p>In this case, the comparison is done between the same POI and community-ID (128), but the EIGRP redistributed route has a lower cost (158720) than the one received over MP-BGP (7777777). </p>
<h4 id="3-increase-metrics-using-offset-lists">3. Increase metrics using Offset-lists</h4>
<p>Since the MPLS cloud is transparent for the EIGRP metric carried from PE-1 to PE-2, another solution to the quiz would be to modify the composite metric just <em>before</em> entering BGP, on PE-1, with an offset-list: </p>
<div class="row">
<pre class="col-md-8">PE-1#sh run | s access-list|router eigrp
ip access-list standard CE1_LOOPBACK
permit 192.168.1.55
!
!
router eigrp 1
address-family ipv4 vrf CUST_A
<purple>offset-list CE1_LOOPBACK in 1000000 FastEthernet0/0</purple>
</pre>
</div>
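<p>The post does not show the resulting output for this solution, so here is only the expected arithmetic, assuming that the offset is simply added to the composite metric that PE-1 redistributes into BGP (and therefore to the pre-bestpath cost). The names below are hypothetical.</p>

```python
METRIC_ON_PE1 = 156160         # cost carried over MP-BGP before the change
BACKDOOR_METRIC_ON_PE2 = 158720
OFFSET = 1000000               # value used in the offset-list above


def metric_with_offset(metric, offset):
    """Composite metric after the offset-list is applied (assumed additive)."""
    return metric + offset


if __name__ == "__main__":
    new_cost = metric_with_offset(METRIC_ON_PE1, OFFSET)
    print("cost via MPLS after offset:", new_cost)
    print("backdoor preferred:", BACKDOOR_METRIC_ON_PE2 < new_cost)
```

<p>Under that assumption the cost of the path via MPLS becomes higher than the one of the backdoor path, so PE-2 would prefer the backdoor.</p>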
<h4 id="4-disabling-the-pre-bestpath-behaviour">4. Disabling the Pre-bestpath Behaviour</h4>
<p>The last solution to this quiz is to disable the pre-bestpath behaviour. To achieve this, the command "<strong><u>bgp bestpath cost-community ignore</u></strong>" tells the router to ignore the presence of the pre-bestpath community and to follow the normal best path selection process.<br/>
This is the least recommended solution because you have to apply this command on all BGP speakers, which is not scalable. </p>
<p><red><i>Not applying it on all devices will lead to routing loops due to an inconsistent best path selection process!</i></red> </p>
<div class="row">
<pre class="col-md-8">PE-1(config)#router bgp 100
PE-1(config-router)#<purple>bgp bestpath cost-community ignore</purple>
PE-1(config-router)#^Z
!
!
PE-2(config)#router bgp 100
PE-2(config-router)#<purple>bgp bestpath cost-community ignore</purple>
PE-2(config-router)#^Z
</pre>
</div>
<div class="row">
<pre class="col-md-11">PE-2#<blue>sh bgp vpnv4 uni all 192.168.1.55</blue>
BGP routing table entry for 100:1:192.168.1.55/32, version 3
Paths: (2 available, best #2, table CUST_A)
Flag: 0x820
Advertised to update-groups:
1
Local
10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
Origin incomplete, metric 156160, localpref 100, valid, internal
Extended Community: RT:100:1 <purple>Cost:pre-bestpath:128:<u>156160</u></purple>
0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
mpls labels in/out 19/18
Local
192.168.2.2 from 0.0.0.0 (10.255.255.2)
Origin incomplete, metric 158720, localpref 100, <green><u>weight 32768</u></green>, valid, sourced, <green>best
Extended Community: RT:100:1 Cost:pre-bestpath:128:<u>158720</u></green>
0x8800:32768:0 0x8801:100:133120 0x8802:65282:25600 0x8803:65281:1500
mpls labels in/out 19/nolabel
PE-2#<blue>traceroute vrf CUST_A 192.168.1.55</blue>
Type escape sequence to abort.
Tracing the route to 192.168.1.55
1 192.168.2.2 24 msec 16 msec 20 msec
2 192.168.12.1 44 msec * 52 msec
PE-2#</pre>
</div>
<p><em>This brings us to the end of another veeeery long post.<br/>
Thank you for all your comments and inputs in the quiz !</em> </p>
<p><br/></p>How Can MSTP Configuration Changes Impact Your Network2014-02-13T00:00:00+00:00Costitag:costiser.ro,2014-02-13:2014/02/13/how-can-mstp-configuration-changes-impact-your-network/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/10/04/quiz-19/index.html">quiz-19</a>.<br/>
Have a look at it to understand the problem. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>This quiz talks about making configuration changes to the MSTP by modifying the vlan to instance mapping. There are 4 switches (Dist-1, Dist-2, Acc-1 and Acc-2), all configured to run MSTP with one region:</p>
<p><a href="/uploads/quiz-19.png" title="MSTP with one region"><img alt="MSTP with one region" src="/uploads/quiz-19.png" title="MSTP with one region"/></a>
<br> </br></p>
<ul>
<li>Dist-1 is primary root for MST0 and MST1 and secondary root for MST2</li>
<li>Dist-2 is primary root for MST2 and secondary root for MST0 and MST1</li>
</ul>
<p><br>
At this moment, the network engineer <red><strong><em>creates vlan 200 and then he maps it to instance/MST 2</em></strong></red> on all switches, in this order: Acc-1 --> Acc-2 --> Dist-1 --> Dist-2 which causes some sensitive applications (connected to Acc-1 and Acc-2) to experience short network cuts, alerting the server team. </br></p>
<h3 id="mstp-review">MSTP Review</h3>
<p>Before explaining why this happens, let's review some of the characteristics of MSTP:</p>
<ul>
<li>MSTP uses the concept of <strong><em>regions</em></strong> = a collection of switches that share the same MSTP configuration</li>
<li>the following <strong><u>must match</u></strong> for two switches to consider themselves in the <u>same region</u>:<ul>
<li>configuration name</li>
<li>revision number</li>
<li>vlan to instance mapping</li>
</ul>
</li>
</ul>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
If <i>any of the above (configuration name, revision number or vlan-to-instance mapping) is different</i>, the switches will be in <red><b><i>separate regions</i></b></red> !
</div>
</div>
<ul>
<li>switches do not exchange the vlan-to-instance mapping but instead, they compute a hash of this mapping and exchange it between them</li>
<li>vlans that are not mapped to a specific instance will be automatically in IST / MST 0</li>
</ul>
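<p>To illustrate why a single vlan moved to another instance is enough to split a region, the sketch below computes a digest over the whole vlan-to-instance table. As far as I recall, IEEE 802.1Q defines this digest as an HMAC-MD5 over 4096 two-byte entries with a fixed key; the key value and the table layout used here are assumptions from memory, and the function names are made up.</p>

```python
import hashlib
import hmac
import struct

# Signature key as I remember it from IEEE 802.1Q (assumption).
MST_DIGEST_KEY = bytes.fromhex("13AC06A62E47FD51F95D2BA243CD0346")


def config_digest(vlan_to_instance):
    """Digest of the full vlan-to-instance table (unmapped vlans = MST 0)."""
    table = [0] * 4096                    # entries 0 and 4095 stay at zero
    for vlan, instance in vlan_to_instance.items():
        if not 1 <= vlan <= 4094:
            raise ValueError("vlan out of range: %d" % vlan)
        table[vlan] = instance
    data = struct.pack(">4096H", *table)  # 4096 big-endian 16-bit values
    return hmac.new(MST_DIGEST_KEY, data, hashlib.md5).hexdigest()


def same_region(a, b):
    """a and b are (name, revision, digest) tuples; all three must match."""
    return a == b


if __name__ == "__main__":
    before = ("REGION1", 1, config_digest({150: 2}))
    after = ("REGION1", 1, config_digest({150: 2, 200: 2}))
    print("digest before:", before[2])
    print("digest after: ", after[2])
    print("same region:  ", same_region(before, after))
```

<p>The region name and revision number in the usage example are placeholders. The point is that the switches never compare the mapping itself, only the digest, so any difference in the table puts them in separate regions.</p>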
<p>Usually you configure a single MSTP Region for your network. Of course, there might be cases when more regions would make more sense, but these are corner cases.<br/>
<br>
In case that <blue>multiple MSTP Regions</blue> exist, you have to remember the following <blue><strong>rules</strong></blue>:</br></p>
<ul>
<li>the switch with the lowest Bridge ID among all regions will be selected as <strong><em>CIST Root</em></strong> (Common and Internal Spanning Tree)</li>
<li>the links between the regions are known as <strong>boundaries</strong></li>
<li>switches that contain boundary links are known as <strong><em>boundary switches</em></strong></li>
<li>each region will elect a <strong><em>Regional Root</em></strong> based on the lowest <em>external</em> cost toward the CIST Root - <u>only boundary switches</u> are eligible for this election !</li>
</ul>
<p>One result of the above rules is: between two regions, only one boundary port will be in FWD state - the rest of boundary ports will be in BLOCKING state. </p>
<h3 id="quiz-explanation">Quiz Explanation</h3>
<p>As soon as the junior engineer changes the vlan-to-instance mapping (by moving vlan 200 to instance 2), <red>he unintentionally creates additional MSTP regions</red> because the hash of the vlan-to-instance mapping will be different. This will trigger some links to become boundary ports and to transition from FWD to BLK or vice-versa.<br/>
The next diagram shows how the ports' state will change for vlans in instance 2 / MST 2, as the engineer made the configuration change on the switches in this order: <strong>Acc-1 --> Acc-2 --> Dist-1 --> Dist-2</strong>: </p>
<p><a href="/uploads/mstp-states-for-mst-2.png" title="How Can MSTP Configuration Changes Impact Your Network"><img alt="mstp-states-for-mst-2 How Can MSTP Configuration Changes Impact Your Network" src="/uploads/mstp-states-for-mst-2.png" title="How Can MSTP Configuration Changes Impact Your Network"/></a> </p>
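<p>The step-by-step effect of the rollout order can also be followed with a small simulation. It only groups the switches by identical vlan-to-instance mapping after each step; it ignores the physical topology and port states shown in the diagram, and the mappings used (vlan 150 and vlan 200 in MST 2) are the only ones mentioned in this post.</p>

```python
def distinct_configs(switch_mappings):
    """Group switch names that share exactly the same mapping."""
    groups = {}
    for name, mapping in switch_mappings.items():
        groups.setdefault(frozenset(mapping.items()), []).append(name)
    return sorted(groups.values())


def rollout(order, old_mapping, new_mapping):
    """Apply the new mapping one switch at a time and record the groups."""
    state = {name: old_mapping for name in order}
    steps = []
    for name in order:
        state[name] = new_mapping
        steps.append(distinct_configs(state))
    return steps


if __name__ == "__main__":
    order = ["Acc-1", "Acc-2", "Dist-1", "Dist-2"]
    old = {150: 2}
    new = {150: 2, 200: 2}
    for switch, groups in zip(order, rollout(order, old, new)):
        print("after", switch, "->", groups)
```

<p>Only after the last switch is configured do all four share one configuration again, which matches the idea that every intermediate step creates boundaries somewhere in the network.</p>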
<p>A continuous ping between two hosts in vlan 150 (mapped to MST2), connected to Acc-1 and Acc-2 respectively, shows that connectivity is lost during re-convergence. The time of impact could be different depending on whether the portfast/edge feature is configured or not on the ports connected to the hosts: </p>
<p>- <u>portfast not enabled</u> - note the 30 sec outage (2x forward delay):</p>
<div class="row">
<pre class="col-md-10">host-2#ping 192.168.150.1 timeout 1 repeat 10000000
Type escape sequence to abort.
Sending 10000000, 100-byte ICMP Echos to 192.168.150.1, timeout is 1 seconds:
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
<red>...............................</red>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</pre>
</div>
<p>- <u>portfast enabled</u> - note the 2 sec outage:</p>
<div class="row">
<pre class="col-md-10">!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<red>..</red>!!!!!!!!!!!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</pre>
</div>
<h3 id="solutions">Solutions</h3>
<p>For the first time since writing quizzes on this website, <red><em>this quiz does not have any solution</em></red> !! It is <strong><em>not possible</em></strong> to change the vlan-to-instance mapping <u>on all 4 switches</u> without causing at least a short network cut !<br/>
The only way to do such configuration changes to the MSTP in production environments is by scheduling a maintenance window approved by all teams impacted.<br/>
Of course, not all MST instances are impacted the same: for example, after reconvergence some interfaces could maintain the same status (BLK or FWD) for some instances/vlans but not for other ones. </p>
<p>My point of view is: in networks like the one in the diagram, with MSTP between Access Layer and Aggregation Layer, you can count how many possible ways/paths exist between the Access Layer and Aggregation Layer (usually equal to the number of uplinks) - in this case (and in most cases) this number is 2: <u>from access switches to aggregation there are <strong>only two uplinks / paths</strong></u>. In this case, with <strong>two MST instances</strong> you cover all situations. As a result, you <u><em>map half of the vlans to instance 1 and the other half to instance 2</em></u>, so that you will never have to change the mapping again.<br/>
Remember that it is always a good practice with MSTP to <strong><em>avoid leaving vlans in IST / MST 0 instance</em></strong> (more about this in future quizzes/posts). </p>
<p>At the end of this article, I'm attaching another diagram with the states for vlan 200 (the one that is newly added into production). During the process, this vlan 200 will be mapped to IST / MST 0 in some regions and to MST 2 in others. As you can see from this diagram, the impact on vlan 200 is much less than for vlans 100-199 (MST 2) shown above - this is because most of the time, the same interfaces remain blocked:<br/>
<a href="/uploads/vlan-200-states1.png"><img alt="vlan-200-states" src="/uploads/vlan-200-states1.png"/></a> </p>
<p><em>Thank you for your comments on the quiz !<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>Quiz #22 – Policy Based Routing (PBR) Problem or Not ?2014-02-03T00:00:00+00:00Costitag:costiser.ro,2014-02-03:2014/02/03/quiz-22/<p><span class="dropcap">Y</span>our company has 3 sites, each with a dedicated border router, R1, R2 and R3.<br/>
Site-1 (R1) and Site-2 (R2) have their own internet uplinks, but Site-3 (R3) connects to internet via R2. A GRE tunnel is built between R2 and R3, with an MTU of 1440 applied to it due to some constraints in the transit network between them. </p>
<p>Here are details about network configuration:</p>
<ul>
<li>for backup purposes, a backdoor link exists between sites R1 and R2</li>
<li>R2 performs NAT for all internal addresses of Site-2 and Site-3 (172.16.0.0/12 & 192.168.0.0/16) for traffic that is sent toward the Internet</li>
<li>the main server, <purple><strong>1.1.1.10</strong></purple>, which runs in Site-1 (behind R1), hosts two applications that use <purple><strong>TCP 1001</strong></purple> and <purple><strong>TCP 1002</strong></purple></li>
<li>since the TCP 1001 application is consuming a lot of bandwidth, Policy Based Routing (<strong>PBR</strong>) was configured on R2 to forward TCP 1001 over the backdoor link (so that internet access for users in Site-2 will not be impacted)</li>
<li>traffic for the TCP 1002 application (and for other potential applications) will be NAT-ed and sent over the Internet toward server in HQ</li>
</ul>
<p><a href="uploads/quiz-22.png" title="PBR Problem or Not?"><img alt="quiz-22 PBR Problem or Not?" src="uploads/quiz-22.png" title="PBR Problem or Not?"/></a> </p>
<p>After you applied the configuration in the figure above, the users in Site-3 (<strong>172.16.1.10</strong>) tried to upload data to the application server and sent you the following feedback:</p>
<ul>
<li><green>TCP 1001 works OK, using the backdoor link</green></li>
<li><red>TCP 1002 <strong>does <em>not</em> work</strong></red>: <em>the <u>connections</u> from Site-3 to server 1.1.1.10 <u>get established</u> but the transfer of data stalls soon after the connection is established and, in the end, it times out</em></li>
</ul>
<p>You check the PBR configured on R2 and everything looks all right:</p>
<ul>
<li>TCP 1001 is forwarded over the backdoor link and works fine</li>
<li>TCP 1002 is not matching the PBR and it gets NAT-ed and forwarded to server over the internet (which is what you want/expect)</li>
</ul>
<p>As a last resort, you installed a sniffer and captured all incoming traffic on R2 sent by R3. Your conclusions were the following:</p>
<ul>
<li>TCP session (SYN/SYN-ACK/ACK) gets established for <u>both TCP 1001 and TCP 1002</u></li>
<li>you notice a lot of fragments and retransmissions for data transfers of <u>both applications, TCP 1001 and TCP 1002</u></li>
<li>TCP 1001 finishes the data transfer with FIN/FIN-ACK (and customer confirms that <blue><em>TCP 1001 works ok</em></blue>)</li>
<li>TCP 1002 transfers get stuck, there are no FIN/FIN-ACK (and customer complains that <red><em>TCP 1002 is not working</em></red>)</li>
</ul>
<p><em>Here are some snapshots of the captured traffic:</em></p>
<div class="row">
<div class="col-sm-6"><b>TCP 1001 (working)</b>
<a href="/uploads/TCP-1001-working.png"><img src="/uploads/TCP-1001-working.png"/></a>
</div>
<div class="col-sm-6"><b>TCP 1002 (not working)</b>
<a href="uploads/TCP-1002-not-working.png"><img src="uploads/TCP-1002-not-working.png"/></a>
</div>
</div>
<p><strong><em>What is the problem and how can you solve it ?</em></strong> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em></p>
<p><br> </br></p>How Could MTU affect BGP Sessions ?2014-01-25T00:00:00+00:00Costitag:costiser.ro,2014-01-25:2014/01/25/how-could-mtu-affect-bgp-sessions/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/08/28/quiz-18/index.html">quiz-18</a>.
Have a look at it to understand the problem.</p>
<h3 id="quiz-review">Quiz Review</h3>
<p>A company using multi-vendor routing platforms (Cisco and Juniper) has a HQ and multiple spoke sites connected by an MPLS provider. Each remote site has a GRE tunnel with the Headquarter (HQ) and runs BGP over it. </p>
<p>After attending a security training, your Security Team raised concerns about ICMP-based attacks and decided <purple>to block ICMP messages on all physical interfaces connected to outside networks</purple>, on all border routers, in all sites.<br/>
Some time later, all the BGP sessions between Cisco and Juniper devices started flapping up/down, impacting the connectivity between HQ and Juniper-based sites, while the BGP sessions between HQ (Cisco-based) and other Cisco-based sites were ok. </p>
<p>As most of you spotted already, dropping all ICMP messages affects Path MTU Discovery (PMTUD) which in turn impacts end to end connectivity (in this case, BGP session)... but why is there a difference between Cisco and Juniper ? We will see that, but first let's do some simple math:</p>
<div class="row"><div class="col-md-12">
<div class="panel panel-green">
<div class="panel-heading"><i class="fa fa-binoculars"></i> Review of different MTU values</div>
<div class="panel-body"><ul>
<li>by default, <red><b>Ethernet MTU is 1500 bytes</b></red> (a full Ethernet frame is <b>1518 = 1500 bytes payload + 14 bytes Ethernet II header + 4 bytes checksum</b>)</li>
<li>by default, <red><b>GRE tunnel MTU is 1476</b></red> = 1500 - (20 bytes IP header + 4 bytes GRE header)</li>
<li><red><b>MPLS adds a 4-byte</b></red> overhead for each label - by default, if MPLS MTU is not configured, only <red><b>1492 bytes</b></red> are left for the IP packet (1500 minus 2 labels)</li>
<li>by default, the <red><b>TCP MSS</b></red> (Maximum Segment Size) is automatically calculated by subtracting <red><b>40</b></red> (<b>20-bytes IP header + 20-bytes TCP header</b>) from the MTU of the outgoing interface: <ul>
<li>TCP MSS is the maximum size of the TCP payload</li>
<li>TCP MSS is negotiated (the lower should be chosen) between source and destination during the TCP 3-way handshake, in the SYN & SYN/ACK packets</li>
<li>for example: MSS for a TCP session going out an Ethernet interface would be 1500 - 40 = <red><b>1460 bytes</b></red></li>
<li>another example: MSS for a TCP session going out a GRE tunnel interface would be 1476 - 40 = <red><b>1436 bytes</b></red></li></ul></li>
</ul>
</div>
</div>
</div></div>
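<p>The arithmetic from the review above can be checked with a few lines of Python. This is only a sketch: the constant and function names are my own for illustration, and the header sizes assume no IP or TCP options and a basic GRE header without key or checksum:</p>

```python
IP_HEADER = 20    # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options
GRE_HEADER = 4    # bytes, basic GRE header (no key, no checksum)
ETHERNET_MTU = 1500


def gre_tunnel_mtu(physical_mtu):
    """MTU left inside the tunnel after the outer IP and GRE headers."""
    return physical_mtu - (IP_HEADER + GRE_HEADER)


def tcp_mss(interface_mtu):
    """Largest TCP payload that fits in one packet on this interface."""
    return interface_mtu - (IP_HEADER + TCP_HEADER)


def effective_mss(local_mss, remote_mss):
    """Each side announces its MSS in SYN / SYN-ACK; the lower value wins."""
    return min(local_mss, remote_mss)


gre_mtu = gre_tunnel_mtu(ETHERNET_MTU)
mss_ethernet = tcp_mss(ETHERNET_MTU)
mss_gre = tcp_mss(gre_mtu)

print("GRE tunnel MTU:", gre_mtu)
print("MSS over Ethernet:", mss_ethernet)
print("MSS over GRE:", mss_gre)
print("MSS used by the session:", effective_mss(mss_ethernet, mss_gre))
```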
<p><a href="/uploads/solution-quiz-18-all.png" title="How Could MTU affect BGP Sessions"><img alt="solution-quiz-18-all How Could MTU affect BGP Sessions" src="/uploads/solution-quiz-18-all.png" title="How Could MTU affect BGP Sessions"/></a><br/>
<font size="-1"><em>Note the entire frame size of 1522 = 1508 (packet with 2 MPLS labels) + 14 (Ethernet II header)</em></font> </p>
<p>Now, for the BGP sessions, the math is like this:</p>
<ul>
<li>the maximum BGP UPDATE message would have a size of <purple><strong>1436 bytes</strong></purple> = this is the TCP MSS for a BGP over GRE tunnel (see above)</li>
<li>when such a packet reaches the PE, its size would be <purple><strong>1500 bytes</strong> = <strong>1436</strong></purple> (BGP payload) + <purple><strong>20</strong></purple> (TCP header) + <purple><strong>20</strong></purple> (IP header) + <purple><strong>4</strong></purple> (GRE header) + <purple><strong>20</strong></purple> (outer IP header)</li>
<li>the quiz does not make any reference to the MTU size inside the MPLS cloud as there is no MTU configuration on the MPLS links - this is done on purpose to create the quiz => <red>a packet of 1500 bytes is <strong>too large</strong> for the MPLS links</red> (because PE needs to add 2 labels = another <strong>8 bytes</strong>)</li>
<li>as a result, <red><em>the PE will need to perform fragmentation of the BGP UPDATE message</em></red> </li>
</ul>
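<p>The same math as a short Python sketch, adding up each header of a full-size BGP UPDATE and comparing the labeled packet against the link. The names are illustrative, and the 1500-byte link MTU reflects the quiz setup where no MPLS MTU was configured:</p>

```python
MPLS_LABEL = 4    # bytes per label
LINK_MTU = 1500   # MPLS links in the quiz, no "mpls mtu" configured

# Layers of a full-size BGP UPDATE sent over the GRE tunnel, inner to outer
layers = [
    ("BGP payload (TCP MSS)", 1436),
    ("TCP header", 20),
    ("inner IP header", 20),
    ("GRE header", 4),
    ("outer IP header", 20),
]

packet_at_pe = sum(size for _, size in layers)
labeled_packet = packet_at_pe + 2 * MPLS_LABEL
fits_on_link = labeled_packet <= LINK_MTU

for name, size in layers:
    print(f"{name:24} {size:5}")
print("packet reaching the PE:", packet_at_pe)
print("after pushing 2 labels:", labeled_packet)
print("fits on the MPLS link:", fits_on_link)
```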
<h3 id="path-mtu-discovery-pmtud">Path MTU Discovery (PMTUD)</h3>
<p>For completeness of this article, in short, <strong>Path MTU Discovery</strong> consists of:</p>
<ul>
<li>source host sets DF-bit in the IP header to indicate that packet must not be fragmented in transit</li>
<li>intermediate routers (PE in our quiz) will drop these large packets: because they exceed the MTU of outgoing interface and because they are not allowed to fragment them due to DF-bit setting</li>
<li>intermediate routers will send an ICMP "Fragmentation Needed and DF set" back to source host (CE router for BGP session, in our quiz)</li>
<li><strong><em>very important:</em></strong> the ICMP "Fragmentation Needed" message also contains the recommended MTU value (the MTU of the outgoing link)</li>
</ul>
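<p>The decision taken by an intermediate router can be sketched in Python as follows. This is a simplified model with names of my own (<code>Packet</code>, <code>forward</code>), not the code of any real router:</p>

```python
from dataclasses import dataclass


@dataclass
class Packet:
    size: int   # total IP packet size in bytes
    df: bool    # Don't Fragment bit


def forward(packet, next_hop_mtu):
    """Return (action, icmp_reply) for a packet leaving via a link with next_hop_mtu."""
    if packet.size <= next_hop_mtu:
        return "forward", None
    if packet.df:
        # Too big and not allowed to fragment: drop it and tell the source
        icmp = {"type": 3, "code": 4, "next_hop_mtu": next_hop_mtu}
        return "drop", icmp
    return "fragment", None


# 1500-byte GRE packet, only 1492 bytes available on the MPLS link
print(forward(Packet(size=1500, df=False), 1492))  # DF not set
print(forward(Packet(size=1500, df=True), 1492))   # DF set
```

<p>PMTUD only works if that ICMP reply actually reaches the source; when it is filtered on the way back, the source keeps sending packets that are too large and they keep being dropped.</p>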
<p><a href="/uploads/icmp-fragmentation-needed.png" title="ICMP Fragmentation Needed"><img alt="icmp-fragmentation-needed" src="/uploads/icmp-fragmentation-needed.png" title="ICMP Fragmentation Needed"/></a> </p>
<h3 id="cisco-vs-juniper">Cisco vs. Juniper</h3>
<p>The difference between the BGP sessions established between Cisco-only sites (that were not impacted) and Cisco-Juniper ones (sites impacted) lies in the DF-bit setting ! </p>
<ul>
<li>By default, <purple><strong>Cisco does <u>not</u> set DF-bit for GRE tunnels</strong></purple> => this means that a BGP UPDATE of 1500-bytes would be fragmented by the PE before being sent over the 1492-bytes MPLS links.</li>
<li><purple>Junipers</purple>, on the other hand, by default <purple><strong><u>set the DF-bit</u> for GRE tunnels</strong></purple> => so a 1500-bytes BGP UPDATE with DF-bit set would not fit the 1492-bytes MPLS links. The PEs would <red><em>drop</em></red> them and send back to CEs an ICMP "Fragmentation Needed" indicating the MTU of the outgoing link (see above screenshot: 1492).</li>
</ul>
<p>This is visible on both Cisco PE and Juniper CE:<br/>
- debugging ICMP on PE:</p>
<div class="row">
<pre>R5#
*Mar 1 00:22:32.851: ICMP: dst (192.168.255.1) frag. needed and DF set unreachable sent to 192.168.255.2
*Mar 1 00:22:33.747: ICMP: dst (192.168.255.1) frag. needed and DF set unreachable sent to 192.168.255.2
R5#
*Mar 1 00:22:40.291: ICMP: dst (192.168.255.1) frag. needed and DF set unreachable sent to 192.168.255.2
*Mar 1 00:22:41.699: ICMP: dst (192.168.255.1) frag. needed and DF set unreachable sent to 192.168.255.2
</pre>
</div>
<p>- firewall logs on Juniper CE with the drops:</p>
<div class="row">
<pre>root@Router-1> <blue>show firewall</blue>
Filter: DENY_ICMP-ge-0/0/0.0-i
Counters:
Name Bytes Packets
deny-icmp-ge-0/0/0.0-i 22400 <red>400</red>
root@Router-1> <blue>show firewall log</blue>
Log :
Time Filter Action Interface Protocol Src Addr Dest Addr
23:14:23 DENY_ICMP-ge-0/0/0.0-i D ge-0/0/0.0 ICMP 192.168.2.1 192.168.255.2
23:14:07 DENY_ICMP-ge-0/0/0.0-i D ge-0/0/0.0 ICMP 192.168.2.1 192.168.255.2
23:13:59 DENY_ICMP-ge-0/0/0.0-i D ge-0/0/0.0 ICMP 192.168.2.1 192.168.255.2
...
root@Router-1> <blue>show firewall log detail</blue>
Time of Log: 2014-01-24 23:14:23 UTC, <red>Filter: DENY_ICMP-ge-0/0/0.0-i, Filter action: discard</red>, Name of interface: ge-0/0/0.0
Name of protocol: ICMP, Packet Length: 54189, Source address: 192.168.2.1, Destination address: 192.168.255.2
<red>ICMP type: 3, ICMP code: 4</red>
Time of Log: 2014-01-24 23:14:07 UTC, Filter: DENY_ICMP-ge-0/0/0.0-i, Filter action: discard, Name of interface: ge-0/0/0.0
Name of protocol: ICMP, Packet Length: 54189, Source address: 192.168.2.1, Destination address: 192.168.255.2
ICMP type: 3, ICMP code: 4
</pre>
</div>
<p><br>
If you are curious how the BGP session behaves on each end, here it is:</br></p>
<p><strong>Cisco CE in HQ</strong><br/>
The BGP session gets established but it does <strong>not</strong> learn any route. Notice:<br/>
- the 0 counter on the PfxRcd<br/>
- the Up/Down timer never gets more than "1:29" = 90 sec (the BGP default holdtime)</p>
<div class="row">
<pre>CE-HQ#
%BGP-5-ADJCHANGE: <red>neighbor 192.168.12.2 Up</red>
%BGP-3-NOTIFICATION: sent to neighbor 192.168.12.2 4/0 <red>(hold time expired)</red> 0 bytes
%BGP-5-NBR_RESET: Neighbor 192.168.12.2 reset (BGP Notification sent)
%BGP-5-ADJCHANGE: neighbor 192.168.12.2 Down BGP Notification sent
CE-HQ#
*Jan 24 22:42:18.519: <red>%BGP-5-ADJCHANGE: neighbor 192.168.12.2 Up</red>
CE-HQ#sh ip bgp summary
...
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.12.2 4 65200 2 8 1935 0 0 <red>00:01:27 0</red>
192.168.13.2 4 65300 11 15 1935 0 0 00:04:35 848
CE-HQ#
%BGP-3-NOTIFICATION: sent to neighbor 192.168.12.2 4/0 (hold time expired) 0 bytes
</pre>
</div>
<p><br>
<strong>Juniper CE in remote site</strong><br/>
The BGP session gets established and prefixes are learned over it. Notice the Flaps counter is non-zero</br></p>
<div class="row">
<pre>root@Router-1> <blue>show bgp summary</blue>
Groups: 1 Peers: 1 Down peers: 0
Table Tot Paths Act Paths Suppressed History Damp State Pending
inet.0 1934 1934 0 0 0 0
Peer AS InPkt OutPkt OutQ Flaps Last Up/Dwn State|#Active/Received...
192.168.12.1 65000 9 16 0 10 <red>1:27 1934/1934/1934/0</red>
root@Router-1> <blue>show firewall</blue>
Filter: DENY_ICMP-ge-0/0/0.0-i
Counters:
Name Bytes Packets
deny-icmp-ge-0/0/0.0-i 5152 <red>92</red>
root@Router-1> show firewall log detail
Time of Log: 2014-01-24 22:46:38 UTC, <red>Filter: DENY_ICMP-ge-0/0/0.0-i, Filter action: <u>discard</u></red>,
Name of interface: ge-0/0/0.0
Name of protocol: ICMP, Packet Length: 54189,
Source address: 192.168.2.1, Destination address: 192.168.255.2
<red>ICMP type: 3, ICMP code: 4</red>
</pre>
</div>
<h3 id="solutions">Solutions</h3>
<h4 id="1-set-the-higher-mtu-inside-mpls">1. Set a higher MTU inside MPLS</h4>
<p>As mentioned above, the MPLS MTU was not set to take into account the labels, for the sake of this quiz. Considering the default MTU of physical interface of 1500, the MPLS MTU would be <strong>1492</strong> (for 2 labels). This value is easily seen in the ICMP "Fragmentation Needed" messages, as shown above. </p>
<p>Usually MPLS providers do provide an MTU of 1500 bytes to their customers. To do this we need to increase the MPLS MTU to at least 1508 - usually you set the MPLS MTU to 1516 (to accommodate 4 labels), but for this quiz we use only 2 MPLS labels: </p>
<div class="row">
<pre class="col-md-9">PE-2#<blue>sh mpls interfaces</blue>
Interface IP Tunnel Operational
FastEthernet0/1 Yes (ldp) No Yes
PE-2#
PE-2#conf t
Enter configuration commands, one per line. End with CNTL/Z.
PE-2(config)#int fa0/1
PE-2(config-if)#<purple>mpls mtu 1508</purple>
PE-2#
*Mar 1 00:47:45.651: %SYS-5-CONFIG_I: Configured from console by console
PE-2#sh mpls int detail
Interface FastEthernet0/1:
IP labeling enabled (ldp):
Interface config
LSP Tunnel labeling not enabled
BGP tagging not enabled
Tagging operational
Fast Switching Vectors:
IP to MPLS Fast Switching Vector
MPLS Turbo Vector
<green>MTU = 1508</green>
PE-2#</pre>
</div>
<p>This is, by far, the best solution, because it avoids fragmentation!! </p>
<h4 id="2-allow-icmp-fragmentation-needed-into-the-acl-on-juniper-side">2. Allow ICMP "Fragmentation Needed" into the ACL (on Juniper side)</h4>
<p>Another solution to this problem is to modify the access-list / Juniper filter to permit the ICMP messages of type 3 (destination unreachable) - code 4 (fragmentation needed) that are used to achieve the PMTUD (Path MTU Discovery).<br/>
In general, it's a good practice to allow the ICMP "Fragmentation needed" messages into access-lists, whenever ICMP protocol is filtered.<br/>
For this quiz, these ICMP messages need to be allowed by the firewall filter only on the Juniper devices (because it's the Juniper that sets DF-bit in the GRE packets). </p>
<div class="row">
<pre class="col-md-9">root@Router-1> <blue>show configuration firewall filter DENY_ICMP</blue>
interface-specific;
<green>term ALLOW_PMTUD {
from {
protocol icmp;
icmp-type unreachable;
icmp-code fragmentation-needed;
}
then {
count allow-pmtud;
log;
accept;
}
}</green>
term DENY_ICMP {
from {
protocol icmp;
}
then {
count deny-icmp;
log;
discard;
}
}
term ALLOW_ALL {
then accept;
}</pre>
</div>
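<p>The order of the terms matters, because the filter is evaluated top-down and the first matching term decides the action. Here is a small Python sketch of that first-match logic. It is a simplified model with made-up field names, not how Junos evaluates filters internally:</p>

```python
# (term name, match conditions, action) in the same order as the filter above
FILTER_TERMS = [
    ("ALLOW_PMTUD", {"protocol": "icmp", "icmp_type": 3, "icmp_code": 4}, "accept"),
    ("DENY_ICMP", {"protocol": "icmp"}, "discard"),
    ("ALLOW_ALL", {}, "accept"),
]


def evaluate(terms, packet):
    """Return (term, action) of the first term whose conditions all match."""
    for name, conditions, action in terms:
        if all(packet.get(field) == value for field, value in conditions.items()):
            return name, action
    # A filter ends with an implicit discard when no term matches
    return None, "discard"


frag_needed = {"protocol": "icmp", "icmp_type": 3, "icmp_code": 4}
echo_request = {"protocol": "icmp", "icmp_type": 8, "icmp_code": 0}
bgp_segment = {"protocol": "tcp", "dst_port": 179}

for packet in (frag_needed, echo_request, bgp_segment):
    print(packet, "->", evaluate(FILTER_TERMS, packet))

# With DENY_ICMP placed first, the PMTUD messages would be dropped again
swapped = [FILTER_TERMS[1], FILTER_TERMS[0], FILTER_TERMS[2]]
print("swapped order:", evaluate(swapped, frag_needed))
```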
<p>After committing this change, the BGP between Cisco-HQ and Juniper-sites became stable. The firewall counters and logs show that ICMP "Fragmentation Needed" messages are allowed on Juniper: </p>
<div class="row">
<pre>root@Router-1> <blue>show firewall</blue>
Filter: __default_bpdu_filter__
Filter: DENY_ICMP-ge-0/0/0.0-i
Counters:
Name Bytes Packets
<green>allow-pmtud-ge-0/0/0.0-i 224 4</green>
deny-icmp-ge-0/0/0.0-i 168 2
root@Router-1> <blue>show firewall log detail</blue>
Time of Log: 2014-01-22 22:55:30 UTC, Filter: <green>DENY_ICMP-ge-0/0/0.0-i, Filter action: <u>accept</u></green>, Name of interface: ge-0/0/0.0
Name of protocol: ICMP, Packet Length: 54189, Source address: 192.168.2.1, Destination address: 192.168.255.2
ICMP type: 3, ICMP code: 4
</pre>
</div>
<h4 id="3-apply-the-allow-fragmentation-on-the-tunnel-interface-on-juniper">3. Apply the <code>allow-fragmentation</code> on the Tunnel Interface (on Juniper)</h4>
<p>By default, GRE packets will be dropped if they exceed the MTU of the outgoing physical interface. Instead of dropping them, you can tell the Juniper router to split them into several IP fragments - this is achieved with the command <code>allow-fragmentation</code> under the gr- (tunnel) interface: </p>
<div class="row">
<pre class="col-md-7">root@Router-1> <blue>show configuration interfaces gr-0/0/0</blue>
unit 0 {
tunnel {
source 192.168.255.2;
destination 192.168.255.1;
<purple>allow-fragmentation</purple>;
}
family inet {
address 192.168.12.2/30;
}
}</pre>
</div>
<p>Since you allow fragmentation of the GRE packets, the router no longer sets the DF-bit. This is the reason why I consider this solution to be more of a <em>workaround</em>, since in fact you don't solve the problem: large BGP UPDATE messages are still sent and they get fragmented on MPLS PE routers.<br/>
A real solution would be to avoid fragmentation ! </p>
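<p>To give an idea of what this fragmentation looks like, here is a minimal Python sketch that splits a 1500-byte packet for the 1492 bytes available on the MPLS links. The function name is my own, and it assumes a 20-byte IP header without options:</p>

```python
def fragment_sizes(packet_size, mtu, ip_header=20):
    """Sizes (header included) of the IP fragments needed to fit the given MTU."""
    payload = packet_size - ip_header
    # Every fragment except the last must carry a multiple of 8 bytes
    max_chunk = (mtu - ip_header) // 8 * 8
    sizes = []
    while payload > 0:
        chunk = min(payload, max_chunk)
        sizes.append(chunk + ip_header)
        payload -= chunk
    return sizes


fragments = fragment_sizes(1500, 1492)
print(fragments)
```

<p>Every full-size BGP UPDATE becomes two packets on the MPLS links, a large one and a tiny one, and the receiving side has to reassemble them.</p>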
<h4 id="4-implement-mss-clamping">4. Implement MSS Clamping</h4>
<p>Another good solution to avoid fragmentation is to use "MSS Clamping". This feature will modify (usually decrease) the MSS value in the SYN and SYN/ACK packets to the configured value. As shown above the MSS value for the BGP sessions that run over GRE tunnels is 1436 (= 1476 (GRE MTU) - 40 (IP+TCP headers)).<br/>
On Cisco devices, this is implemented at the global level with <code>ip tcp mss</code> or at the interface level with <code>ip tcp adjust-mss</code>: </p>
<div class="row">
<pre>CE-HQ(config)#<purple>ip tcp mss 1400</purple>
CE-HQ(config)#end
CE-HQ#
CE-HQ#clear ip bgp 192.168.12.2
CE-HQ#
%BGP-5-ADJCHANGE: neighbor 192.168.12.2 Down User reset
%BGP_SESSION-5-ADJCHANGE: neighbor 192.168.12.2 IPv4 Unicast topology base removed from session User reset
%BGP-5-ADJCHANGE: neighbor 192.168.12.2 Up
CE-HQ#sh ip bgp s
CE-HQ#sh ip bgp summary
...
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
192.168.12.2 4 65200 13 12 2835 0 0 <green>00:00:19 999</green>
192.168.13.2 4 65300 83 88 2835 0 0 01:10:53 848
CE-HQ#
CE-HQ#sh ip bgp nei 192.168.12.2 | i max
Number of NLRIs in the update sent: max 1010, min 0
minRTT: 48 ms, maxRTT: 484 ms, ACK hold: 200 ms
Datagrams (<green>max data segment is 1400 bytes</green>):
CE-HQ#</pre>
</div>
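<p>A quick Python sketch shows why an MSS of 1400 is enough for this quiz. The helper names are hypothetical; it assumes the same header sizes as in the MTU review (20-byte IP, 20-byte TCP, 4-byte GRE) and 2 MPLS labels on a 1500-byte link:</p>

```python
IP_HEADER = 20
TCP_HEADER = 20
GRE_HEADER = 4
MPLS_LABEL = 4
LINK_MTU = 1500


def clamp_mss(advertised_mss, configured_mss):
    """MSS clamping never raises the MSS, it only lowers it to the configured value."""
    return min(advertised_mss, configured_mss)


def labeled_packet_size(mss, labels=2):
    """Size on the MPLS link of a full TCP segment carried over the GRE tunnel."""
    return mss + TCP_HEADER + IP_HEADER + GRE_HEADER + IP_HEADER + labels * MPLS_LABEL


def max_safe_mss(link_mtu=LINK_MTU, labels=2):
    """Largest MSS whose labeled GRE packet still fits on the link unfragmented."""
    overhead = TCP_HEADER + IP_HEADER + GRE_HEADER + IP_HEADER + labels * MPLS_LABEL
    return link_mtu - overhead


default_size = labeled_packet_size(1436)
clamped_mss = clamp_mss(1436, 1400)
clamped_size = labeled_packet_size(clamped_mss)

print("default MSS 1436 ->", default_size, "bytes on the MPLS link")
print("clamped MSS", clamped_mss, "->", clamped_size, "bytes on the MPLS link")
print("largest MSS that avoids fragmentation:", max_safe_mss())
```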
<p>This is a screenshot of the TCP 3-way handshake for the BGP between HQ and remote-site:<br/>
<a href="/uploads/mss-values.png" title="MSS Values"><img alt="mss-values MSS Values" src="/uploads/mss-values.png" title="MSS Values"/></a> </p>
<h4 id="5-additional-tests-run-on-juniper">5. Additional tests run on Juniper</h4>
<p>On Juniper, I tried several other options that, theoretically, represent solutions to this quiz:</p>
<ol>
<li>use <code>no-gre-path-mtu-discovery</code> to disable PMTUD for GRE. This can be applied either on the GRE interface or under <strong>system internet-options</strong><br/>
For unknown reasons (I suspect due to virtual hardware that I used for testing) this solution did not work for me.</li>
<li>use <code>no-path-mtu-discovery</code> to disable PMTUD for all outgoing TCP connections.<br/>
This can also be applied either on the GRE interface or under "system internet-options".<br/>
Although this may look like a solution at first sight, it does not work because it disables PMTUD for the TCP session (BGP sessions, in our case), which is the inner header, not for the outer IP header.</li>
</ol>
<p>Last but not least, let me mention here that, with current IOS version, BGP performs PMTUD by default: </p>
<div class="row">
<pre class="col-md-6">CE-HQ#sh ip bgp nei 192.168.12.2 | i path-mtu
<purple>Transport(tcp) path-mtu-discovery is enabled</purple>
CE-HQ#</pre>
</div>
<p>Uuuuu, I had no idea that this post would be sooo long... but I wanted to touch all aspects of <a href="/2013/08/28/quiz-18/">the quiz</a> and I hope it will be an interesting read!</p>
<p><em>Thank you for your comments and interest in the quiz!<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>TOP 5 Most Commented Quizzes in 20132014-01-08T00:00:00+00:00Costitag:costiser.ro,2014-01-08:2014/01/08/top-5-most-commented-quizzes-in-2013/<p><span class="dropcap-bg">H</span><strong><em>ello and welcome to CostiSer.Ro in 2014 !!</em></strong></p>
<p>As most people do, I also have my own resolutions for the new year and one of them is to write more articles, come up with more interesting (not necessarily more difficult) quizzes and (hopefully) discuss topics, such as SDN, that are "burning" the minds of all the network engineers.<br />
2014 will be the year that I will have to re-certify my R&S by going for the 2nd CCIE - the only question will be: "<em>what track: Security or Service Provider?</em>"... and, as I usually do, the answer will depend on what I do more in my current employment. </p>
<p>My first post of this year will be a review of Top 5 most commented quizzes from 2013. Here it goes: ... rat-a-tat-a-tat-a (I hear the drums, don’t ask me why)... </p>
<h3>Number 1: Quiz #6 – Routing protocols over IPsec (59 comments)</h3>
<div class="row"><div class="col-sm-6">
<a href="/2013/02/03/quiz-6/" title="Quiz-6 Routing Protocols over IPsec"><img alt="Quiz-6 Routing Protocols over IPsec" src="/uploads/quiz-6.png" title="Quiz-6 Routing Protocols over IPsec" /></a>
</div></div>
<p>The winner of 2013 is quiz-6, with only 1 comment more than the second place.<br />
This quiz talks about running routing protocols over tunnels (GRE over IPsec) and the problems that appear when the tunnel destinations are learnt/advertised via the tunnel itself, a situation known as recursive routing.<br />
In this particular case, tunnel destinations were advertised into the routing protocol due to a "network 0.0.0.0" command.<br />
The solutions were discussed <a href="/recursive-routing.html">here</a> and talk about filtering tunnel destinations from being sent/received via the tunnel or about setting static routes that point to the physical interfaces. </p>
<h3>Number 2: Quiz #9 – BGP peering over a Cisco ASA (58 comments)</h3>
<div class="row"><div class="col-sm-6">
<a href="/2013/02/20/quiz-9/" title="BGP Peering over a Cisco ASA"><img alt="quiz-9 BGP Peering over a Cisco ASA" src="/uploads/quiz-9.png" title="BGP Peering over a Cisco ASA" /></a>
</div></div>
<p>The runner-up of 2013 is a quiz about establishing BGP sessions protected by MD5 passwords over networks that involve stateful firewalls in between the peers.<br />
The catch in this case is that BGP uses TCP Option 19 to perform the authentication and firewalls usually clear or drop TCP sessions that contain TCP Options. Besides this, TCP sequence numbers are randomized by most firewalls, which also impacts BGP authentication.<br />
More explanation about this can be found in <a href="/bgp-md5-authentication.html">the post with the solution</a> to this quiz. </p>
<h3>Number 3: Quiz #12 – OSPF Improper Path Selection (55 comments)</h3>
<div class="row"><div class="col-sm-6">
<a href="/2013/04/04/quiz-12/" title="OSPF Improper Path Selection"><img alt="quiz-12 OSPF Improper Path Selection" src="/uploads/quiz-12.png" title="OSPF Improper Path Selection" /></a>
</div></div>
<p>Although in 3rd place in 2013, I consider quiz-12 to be one of the most difficult quizzes that I came up with. This is because it involves an OSPF feature that is not widely used/understood: Forwarding Address (FA).<br />
The difficulty of the quiz appears due to special conditions needed to be true in order to set a non-zero FA. The full explanation of these conditions and solution to the quiz can be found in <a href="/ospf-understanding-the-forwarding-address-fa.html">this post</a>. </p>
<h3>Number 4: Quiz #4 – BGP over ISP (54 comments)</h3>
<div class="row"><div class="col-sm-6">
<a href="/2013/01/22/quiz-4/" title="quiz-4 BGP over ISP"><img alt="quiz-4 BGP over ISP" src="/uploads/quiz-4.png" title="quiz-4 BGP over ISP" /></a>
</div></div>
<p>This quiz-4 shows a situation where an eBGP peering does not get established due to the fact that the peer is reachable over the default route (from routing perspective).<br />
The <a href="/bgp-over-a-default-route.html">solution</a> to this quiz also shows that a BGP session will be established when the receiver (server side of the BGP session) gets the session initiation request from a peer reachable over the default route, but it will never act as a requester (client side) towards such a peer. </p>
<h3>Number 5: Quiz #16 – BGP Filtering Updates (43 comments)</h3>
<div class="row"><div class="col-sm-6">
<a href="/2013/08/01/quiz-16/" title="quiz-16 BGP Filtering Updates"><img alt="quiz-16 BGP Filtering Updates" src="/uploads/quiz-16.png" title="quiz-16 BGP Filtering Updates" /></a>
</div></div>
<p>Another corner-case scenario is represented by quiz-16. When trying to suppress inactive routes from being advertised to other BGP peers, some conditions about the next-hop must be fulfilled - see details in the <a href="/bgp-suppress-inactive-and-next-hop-matches.html">solution post</a>. </p>
<p>Although not in top 5, probably due to the fact that it was published towards the end of 2013, I would also like to mention another quiz that I consider to be very difficult: <a href="/2013/11/29/quiz-21/">quiz-21</a> about EIGRP used as a CE-PE routing protocol. </p>
<p><em>Thank you for all your comments to my quizzes in 2013 !!<br />
I will try to continue this work over 2014.</em> </p>
<p><em><strong>Happy New 2014 !!</strong></em> </p>
<p>Costi</p>Quiz #21 – EIGRP as CE-PE2013-11-29T00:00:00+00:00Costitag:costiser.ro,2013-11-29:2013/11/29/quiz-21/<p><span class="dropcap">Y</span>ou have just received a nice job at a big enterprise that has multiple sites connected over their own managed MPLS Core. Each site runs EIGRP as the CE - PE routing protocol. </p>
<p>Two of these sites, Site-A and Site-B, have an additional direct link between each other as in the below diagram. </p>
<p>With the standard configuration, each site is reachable via its respective PE (for example, all traffic from MPLS cloud - other sites - to Site-A is via PE-1/CE-1 link), while the traffic between Site-A and Site-B uses the direct link between CE-1 and CE-2. </p>
<p><a href="/uploads/quiz-21.png" title="Quiz 21 - EIGRP as CE-PE Protocol"><img alt="Quiz 21 - EIGRP as CE-PE Protocol" src="/uploads/quiz-21.png" title="Quiz 21 - EIGRP as CE-PE Protocol"/></a> </p>
<p><br>
At this moment, traffic <strong>from PE-2 to Site-A</strong>'s 192.168.1.55 will go <blue>via PE-1</blue>: </br></p>
<div class="row">
<pre class="col-md-8">PE-2#<blue>traceroute vrf CUST_A 192.168.1.55</blue>
Type escape sequence to abort.
Tracing the route to 192.168.1.55
1 10.0.0.6 [MPLS: Labels 16/19 Exp 0] 60 msec 60 msec 40 msec
2 192.168.1.1 [MPLS: Label 19 Exp 0] 36 msec 36 msec 40 msec
3 192.168.1.2 44 msec * 20 msec
PE-2#</pre>
</div>
<p>Because in the near future a new site will be connected to PE-2, you have been assigned <strong><em>the task of configuring the network in such a way that traffic from PE-2 to Site-A's 192.168.1.55 will go via Site-B (CE-2)</em></strong> <em>instead of going over MPLS core!</em> </p>
<p><font color="blue"><strong><em>How would you complete this task?</em></strong> - preferably only for prefix 192.168.1.55 !</font> </p>
<p>You have checked the routing information on PE-2 and noticed that the prefix is learned from BGP over the MPLS cloud: </p>
<div class="row">
<pre class="col-md-11">PE-2#<blue>sh ip route vrf CUST_A 192.168.1.55</blue>
Routing entry for 192.168.1.55/32
<purple>Known via "bgp 100"</purple>, distance 200, metric 156160, type internal
Redistributing via eigrp 100
Advertised by eigrp 100 metric 100000 10 255 1 1500
bgp 100 (self originated)
Last update from 10.255.255.1 00:22:30 ago
Routing Descriptor Blocks:
<purple>* 10.255.255.1 (Default-IP-Routing-Table), from 10.255.255.1, 00:22:30 ago</purple>
Route metric is 156160, traffic share count is 1
AS Hops 0
PE-2#
PE-2#<blue>sh bgp vpnv4 uni all 192.168.1.55</blue>
BGP routing table entry for 100:1:192.168.1.55/32, version 22
Paths: (1 available, best #1, table CUST_A)
Not advertised to any peer
Local
<purple>10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
Origin incomplete, metric 156160, localpref 100, valid, internal, best</purple>
Extended Community: RT:100:1 Cost:pre-bestpath:128:156160
0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
mpls labels in/out nolabel/19
PE-2#</pre>
</div>
<p>You tried to influence the BGP path selection by setting a high local preference on the redistributed EIGRP routes, but <em>unfortunately PE-2 still chooses the prefix received over the MPLS core as the best path</em>: </p>
<div class="row">
<pre class="col-md-8">ip access-list standard CE1_LOOPBACK
permit 192.168.1.55
!
route-map SET_LP_500 permit 10
match ip address CE1_LOOPBACK
<red>set local-preference 500</red>
route-map SET_LP_500 permit 999
!
router bgp 100
address-fam ipv4 vrf CUST_A
<red>redistribute eigrp 100 route-map SET_LP_500</red>
</pre>
</div>
<div class="row">
<pre class="col-md-11">PE-2#sh bgp vpnv4 uni all 192.168.1.55
BGP routing table entry for 100:1:192.168.1.55/32, version 22
Paths: (1 available, best #1, table CUST_A)
Not advertised to any peer
Local
<red>10.255.255.1 (metric 3) from 10.255.255.1 (10.255.255.1)
Origin incomplete, metric 156160, localpref 100, valid, internal, best</red>
Extended Community: RT:100:1 Cost:pre-bestpath:128:156160
0x8800:32768:0 0x8801:100:130560 0x8802:65281:25600 0x8803:65281:1500
mpls labels in/out nolabel/19
<red>!!
!! the prefix received over MPLS (with default LP = 100) is still chosen as best !!
!!</red></pre>
</div>
<p><strong><em>Why is that happening? How would you configure the network to achieve the desired result ?</em></strong> </p>
<p><br>
<!-- Tab v1 -->
<div class="row">
<div class="tab-v1">
<ul class="nav nav-tabs col-md-8">
<li class="active"><a data-toggle="tab" href="#tab-1">CE-1</a></li>
<li><a data-toggle="tab" href="#tab-2">CE-2</a></li>
<li><a data-toggle="tab" href="#tab-3">PE-1</a></li>
<li><a data-toggle="tab" href="#tab-4">PE-2</a></li>
<li><a data-toggle="tab" href="#tab-5">P-Core</a></li>
</ul>
<div class="tab-content col-md-8">
<div class="tab-pane fade in active" id="tab-1">
<pre class="configs">
hostname CE-1
!
ip cef
!
!
interface Loopback0
ip address 192.168.1.55 255.255.255.255
!
interface FastEthernet0/0
ip address 192.168.1.2 255.255.255.252
speed 100
full-duplex
!
interface FastEthernet0/1
ip address 192.168.12.1 255.255.255.252
speed 100
full-duplex
!
router eigrp 100
network 192.168.0.0 0.0.255.255
no auto-summary
!
</pre> <br/>
</div>
<div class="tab-pane fade in" id="tab-2">
<pre class="configs">
hostname CE-2
!
ip cef
!
!
interface Loopback0
ip address 192.168.2.55 255.255.255.255
!
interface FastEthernet0/0
ip address 192.168.2.2 255.255.255.252
speed 100
full-duplex
!
interface FastEthernet0/1
ip address 192.168.12.2 255.255.255.252
speed 100
full-duplex
!
router eigrp 100
network 192.168.0.0 0.0.255.255
no auto-summary
! <br/>
</pre>
</div>
<div class="tab-pane fade in" id="tab-3">
<pre class="configs">
hostname PE-1
!
ip cef
!
ip vrf CUST_A
rd 100:1
route-target export 100:1
route-target import 100:1
!
!
interface Loopback0
ip address 10.255.255.1 255.255.255.255
!
interface FastEthernet0/0
ip vrf forwarding CUST_A
ip address 192.168.1.1 255.255.255.252
speed 100
full-duplex
!
interface FastEthernet0/1
ip address 10.0.0.1 255.255.255.252
speed 100
full-duplex
mpls ip
!
router eigrp 1
auto-summary
!
address-family ipv4 vrf CUST_A
redistribute bgp 100 metric 100000 10 255 1 1500
network 192.168.1.1 0.0.0.0
no auto-summary
autonomous-system 100
exit-address-family
!
router ospf 1
log-adjacency-changes
network 10.0.0.0 0.255.255.255 area 0
!
router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 10.255.255.2 remote-as 100
neighbor 10.255.255.2 update-source Loopback0
!
address-family vpnv4
neighbor 10.255.255.2 activate
neighbor 10.255.255.2 send-community extended
exit-address-family
!
address-family ipv4 vrf CUST_A
redistribute eigrp 100
no synchronization
exit-address-family
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-4">
<pre class="configs">
hostname PE-2
!
ip cef
!
ip vrf CUST_A
rd 100:1
route-target export 100:1
route-target import 100:1
!
!
interface Loopback0
ip address 10.255.255.2 255.255.255.255
!
interface FastEthernet0/0
ip vrf forwarding CUST_A
ip address 192.168.2.1 255.255.255.252
speed 100
full-duplex
!
interface FastEthernet0/1
ip address 10.0.0.5 255.255.255.252
speed 100
full-duplex
mpls ip
!
router eigrp 1
auto-summary
!
address-family ipv4 vrf CUST_A
redistribute bgp 100 metric 100000 10 255 1 1500
network 192.168.2.1 0.0.0.0
no auto-summary
autonomous-system 100
exit-address-family
!
router ospf 1
log-adjacency-changes
network 10.0.0.0 0.255.255.255 area 0
!
router bgp 100
no bgp default ipv4-unicast
bgp log-neighbor-changes
neighbor 10.255.255.1 remote-as 100
neighbor 10.255.255.1 update-source Loopback0
!
address-family vpnv4
neighbor 10.255.255.1 activate
neighbor 10.255.255.1 send-community extended
exit-address-family
!
address-family ipv4 vrf CUST_A
redistribute eigrp 100 route-map SET_LP_500
no synchronization
exit-address-family
!
ip access-list standard CE1_LOOPBACK
permit 192.168.1.55
!
route-map SET_LP_500 permit 10
match ip address CE1_LOOPBACK
set local-preference 500
!
route-map SET_LP_500 permit 999
!
</pre>
</div>
<div class="tab-pane fade in" id="tab-5">
<pre class="configs">
hostname P-CORE
!
ip cef
!
!
interface FastEthernet0/0
ip address 10.0.0.2 255.255.255.252
speed 100
full-duplex
mpls ip
!
interface FastEthernet0/1
ip address 10.0.0.6 255.255.255.252
speed 100
full-duplex
mpls ip
!
router ospf 1
log-adjacency-changes
network 10.0.0.0 0.255.255.255 area 0
!
</pre>
</div>
</div>
</div>
</div>
<!-- End Tab v1 --> </br></p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>Cisco vs. Juniper – Advertising Inactive Routes into BGP2013-11-21T00:00:00+00:00Costitag:costiser.ro,2013-11-21:2013/11/21/cisco-vs-juniper-advertising-inactive-routes-into-bgp/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/08/11/quiz-17/index.html">quiz-17</a>.<br/>
Have a look at the quiz to understand the problem.</p>
<h3 id="quiz-review">Quiz Review</h3>
<p>The quiz presents a situation when the network is refreshed by swapping the Cisco routers with Juniper ones.<br/>
It is not my intention to discuss which one is better... the reason for this quiz is to present the different approaches chosen by these two vendors when implementing BGP advertisements.<br/>
There are a lot of differences but this article discusses the default behaviour for <em>advertising inactive routes by BGP</em>. </p>
<p>The <blue><strong>inactive routes</strong></blue> are routes that are <em><strong>not installed into the RIB (not selected as best path)</strong></em>, most of the time because they are also learned from another routing protocol that has a better (read lower) administrative distance, or route preference in Juniper terminology.<br/>
As a refresher, below is a table of Cisco's AD and Juniper's Route Preference for some of the routing protocols: </p>
<p><a href="/uploads/cisco-ad-vs-juniper-route-pref.png" title="Cisco vs Juniper Route Preferences - Administrative Distance"><img alt="cisco-ad-vs-juniper-route-pref" src="/uploads/cisco-ad-vs-juniper-route-pref.png" title="Cisco vs Juniper Route Preferences - Administrative Distance"/></a>
<em><font size="-1">Note that this table does <strong>not</strong> contain all routing sources!</font></em> </p>
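<p>The "lowest value wins" rule can be sketched in a few lines of Python. This is only an illustration: the dictionaries hold the commonly documented default values for a handful of sources (they are not copied from the table above), and the function names are hypothetical.</p>

```python
# Commonly documented default preferences for a few sources (lower wins).
# "eigrp" here means internal EIGRP; these dicts are illustrative only.
CISCO_AD = {"connected": 0, "static": 1, "ebgp": 20,
            "eigrp": 90, "ospf": 110, "rip": 120, "ibgp": 200}
JUNIPER_PREF = {"direct": 0, "static": 5, "ospf": 10, "rip": 100, "bgp": 170}


def active_source(candidates, table):
    """Return the source whose route is installed (lowest preference)."""
    return min(candidates, key=lambda source: table[source])


def inactive_sources(candidates, table):
    """Return the sources that lose the comparison and stay inactive."""
    winner = active_source(candidates, table)
    return [source for source in candidates if source != winner]


# The same prefix learned via OSPF and iBGP, as in the quiz.
print(active_source(["ospf", "ibgp"], CISCO_AD))
print(inactive_sources(["ospf", "bgp"], JUNIPER_PREF))
```

<p>In both cases OSPF wins, so the BGP copy of the prefix is the inactive route discussed in the rest of this article.</p>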
<p>Getting back to the quiz, R1 and R2 are part of the OSPF Area 0 and also run an iBGP session between them. R1 advertises local subnets in both OSPF and BGP. The configuration applied to Juniper devices "matches" the Cisco configuration, meaning: there are no import/export policies applied (Juniper's BGP Default Policy is Accept All/Advertise All, same as Cisco's). </p>
<p><a href="/uploads/quiz-17-solution-2.png" title="Cisco vs Juniper - quiz solution"><img alt="quiz-17-solution-2" src="/uploads/quiz-17-solution-2.png" title="Cisco vs Juniper - quiz solution"/></a>
<em><font size="-1">Note that this article does <strong>not</strong> discuss BGP design "best practices"</font></em> </p>
<p>In this topology, when R2 is a Cisco device, R3 will receive the 192.168.100.0/24 and 192.168.200.0/24 prefixes... but with Juniper as R2, these routes are not received by R3. </p>
<h3 id="default-behaviour-on-cisco-vs-juniper">Default behaviour on Cisco vs. Juniper</h3>
<p>The different result seen on router R3 is due to the different default behavior:</p>
<ul>
<li><red>by default, <strong>CISCO advertises inactive routes</strong></red> - this can be disabled with command <code>bgp suppress-inactive</code>, but only in <a href="/2013/10/19/bgp-suppress-inactive-and-next-hop-matches/"><em>special situations, depending on whether the next-hop matches or not</em></a> !</li>
<li><red>by default, <strong>JUNIPER does <u>not</u> advertise inactive routes</strong></red> - this can be enabled with command <code>advertise-inactive</code></li>
</ul>
<p>In my opinion, in a good network design (please read "in most situations", as I don't want to debate here when & why a network design is better than another) you would not have to deal with BGP inactive routes. In the routing world, where all advertisements/redistribution are done from the RIB / active routes, the Juniper approach seems logical. On the other hand, Cisco seems to support designs where prefixes are "leaked" into the BGP domain on devices that are not at the edge of the network (like in this quiz: 192.168.x00.0/24 get into BGP on R1 instead of edge router, R2). </p>
<p>Please note that for both vendors, the inactive route needs to be selected as <u>best path in the BGP table</u> in order to have the option of being advertised ! </p>
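<p>As a rough model of the two defaults, the decision can be written as a small Python function. This is a simplified sketch, not vendor code: the function and parameter names are made up for illustration, and the Cisco branch deliberately ignores the next-hop match condition mentioned in the list above.</p>

```python
def is_advertised(vendor, bgp_best, installed_in_rib, knob_configured=False):
    """Decide whether a BGP route is advertised to a peer.

    vendor          -- "cisco" or "juniper"
    bgp_best        -- route is the best path in the BGP table
    installed_in_rib -- BGP route is the active route in the RIB
    knob_configured -- "bgp suppress-inactive" on Cisco,
                       "advertise-inactive" on Juniper
    Simplification: the Cisco next-hop match condition is not modelled.
    """
    if not bgp_best:
        return False          # both vendors require a BGP best path
    if installed_in_rib:
        return True           # active routes are always candidates
    if vendor == "cisco":
        return not knob_configured   # inactive routes advertised by default
    if vendor == "juniper":
        return knob_configured       # inactive routes need advertise-inactive
    raise ValueError("unknown vendor: " + vendor)


# R2 in the quiz: the BGP route is best in BGP but loses to OSPF in the RIB.
print(is_advertised("cisco", True, False))
print(is_advertised("juniper", True, False))
print(is_advertised("juniper", True, False, knob_configured=True))
```

<p>The first call models the original Cisco R2 (prefix advertised), the second the Juniper replacement (prefix missing on R3), and the third the Juniper router with <code>advertise-inactive</code>.</p>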
<h3 id="displaying-the-inactive-routes-on-cisco-and-juniper">Displaying the inactive routes on Cisco and Juniper</h3>
<p>Inactive routes appear in the BGP table with the prefix of "r" which means "RIB-failure":</p>
<div class="row">
<pre class="col-md-11">R2#<blue>show ip bgp</blue>
BGP table version is 5, local router ID is 192.168.23.2
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
<red>r RIB-failure</red>, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
<red>r</red>>i192.168.100.0 192.168.12.1 0 100 0 i
<red>r</red>>i192.168.200.0 192.168.12.1 0 100 0 i
R2#
R2#sh ip route 192.168.100.0
Routing entry for 192.168.100.0/24
<red>Known via "ospf 1"</red>, distance 110, metric 2, type intra area
Last update from 192.168.12.1 on FastEthernet0/0, 00:01:21 ago
Routing Descriptor Blocks:
* 192.168.12.1, from 192.168.200.1, 00:01:21 ago, via FastEthernet0/0
Route metric is 2, traffic share count is 1
</pre>
</div>
<p>Spotting the inactive routes on Juniper is much easier due to the fact that the output of the command "show route" contains information about all routing sources:</p>
<div class="row">
<pre class="col-md-9">root@Router-2> <blue>show route 192.168.100.0</blue>
inet.0: 7 destinations, 9 routes (7 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
192.168.100.0/24 <red>*[OSPF/10]</red> 00:11:28, metric 1
> to 192.168.12.1 via em1.0
<strong>[BGP/170]</strong> 00:11:26, localpref 100
AS path: I
> to 192.168.12.1 via em1.0
</pre>
</div>
<p>This simple command "show route" displays both the active route (OSPF, preference 10, marked with a "<red>*</red>") and the inactive route (BGP, preference 170). </p>
<p>Using the detailed/extensive version, "show route extensive", you will also see detailed output for each route and, in the case of the inactive BGP route, the output will contain the reason why it is inactive ! </p>
<h3 id="solutions">Solutions</h3>
<p>The best solution, for this scenario, is to use the "advertise-inactive" command on Juniper router R2:</p>
<div class="row">
<pre class="col-md-8">root@Router-2> <blue>show configuration protocols bgp</blue>
group AS_65100 {
type internal;
neighbor 192.168.12.1 {
peer-as 65100;
}
}
group AS_65300 {
type external;
<red>advertise-inactive</red>;
neighbor 192.168.23.3 {
peer-as 65300;
}
}</pre>
</div>
<p>Of course, other solutions are possible, in order of my own preference:</p>
<ul>
<li>announce internal routes into BGP on the edge router R2, instead of the "internal" router R1</li>
<li>redistribute the OSPF routes into BGP on router R2</li>
<li>change the default route preference, either make BGP "better" (read lower) than OSPF or vice-versa. The best approach would be to change the default preference with a routing policy rather than changing it for the whole protocol, which may create even bigger problems than the one you are trying to solve</li>
</ul>
<p><em>Thank you for your comments and interest in the quiz!<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>Quiz #20 – NAT between Two Partner Companies2013-10-26T00:00:00+01:00Costitag:costiser.ro,2013-10-26:2013/10/26/quiz-20/<p><span class="dropcap">Y</span>our company has a border router (R2) that is connected to two partner companies: <strong>Partner-DB (R1)</strong> providing database services and <strong>Partner-APP (R3)</strong> providing different application services to your web servers in DMZ (200.200.200.0/24).<br/>
R2 is also used to perform NAT between <strong>internal LAN</strong> (fa1/1 = <blue>ip nat inside</blue>) and <strong>the ISP</strong> (fa1/2 = <blue>ip nat outside</blue>). At this moment there is <strong><em>no NAT</em></strong> configured on the DMZ interface (fa1/0) and the 2 connections to the partners (R1 & R3). </p>
<p><a href="/uploads/quiz-20.png" title="NAT between Two Partner Companies"><img alt="quiz-20" src="/uploads/quiz-20.png" title="NAT between Two Partner Companies"/></a> <br/>
Currently, your web server in DMZ (200.200.200.4) can connect to both DB (R1) and APP (R3) and it does <u>not</u> need any NAT. </p>
<p>After a short time, <strong><em>a new requirement</em></strong> appears: the two partners, DB and APP, require connectivity between themselves, via your router R2... but both of them share the same internal addressing (192.168.0.0/16) and their border routers (R1 and R3) do <u>not have NAT capabilities</u>. </p>
<p>You have been requested to provide the connectivity between R1 and R3 and, since you have unused addresses in your public DMZ range (200.200.200.0/24), you suggest the following solution:</p>
<ul>
<li>Partner-DB (R1) / <purple><strong>192.168.1.1 will be translated to 200.200.200.1</strong></purple>, only when going to other partner</li>
<li>Partner-APP (R3) / <purple><strong>192.168.3.3 will be translated to 200.200.200.3</strong></purple>, only when going to other partner</li>
</ul>
<p>How could you do this, <br/>
<red><strong><em>considering that you cannot enable NAT on the DMZ interface (since this will break other production services in DMZ) !</em></strong></red> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>BGP – Suppress-Inactive and Next-Hop Matches2013-10-19T00:00:00+01:00Costitag:costiser.ro,2013-10-19:2013/10/19/bgp-suppress-inactive-and-next-hop-matches/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/08/01/quiz-16/index.html">quiz-16</a>. Have a look at the quiz to understand the problem. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>The network in the quiz consists of 3 sites, each represented by a sub-confederation, AS65100 / AS65200 / AS65300, and a partner site (AS 400) represented by R4.<br/>
The problem appears due to a rather transient situation in the production network: at this moment BGP runs only between R3 <-> R1 and R1 <-> R2, while the physical link between R3 and R2 was recently installed and no BGP has been configured on it. During this temporary situation (until full BGP peering is configured between R3 and R2) there are 2 requirements:</p>
<ul>
<li>site-3/R3 needs to use this direct link to reach site-2/R2 (so a static route is configured on R3)</li>
<li>while site-2/R2 runs in pre-production mode, its prefix (192.168.200.0/24) <strong><em>must not</em></strong> be advertised to partner company, R4</li>
</ul>
<p><a href="/uploads/quiz-16.png" title="BGP Suppress Inactive and Next Hop Matches"><img alt="quiz-16-1" src="/uploads/quiz-16.png" title="BGP Suppress Inactive and Next Hop Matches"/></a> </p>
<p>A requirement that I "forced" into the quiz was: <em>"do <strong><u>not</u></strong> use route-maps or other policies applied to the BGP neighbors"</em>. I added it to steer the reader towards discovering the RIB-Failure route on R3 and towards the Suppress-Inactive discussion. </p>
<h3 id="what-suppress-inactive-command-does">What <code>Suppress-Inactive</code> Command Does</h3>
<p>Let me start by defining the <blue><strong>inactive routes</strong></blue> = routes that are not installed into the RIB, most of the time because they are also learned from another routing protocol that has a better (read lower) administrative distance.<br/>
Such routes are marked in the BGP table with a "<blue><strong>r</strong></blue>" and also displayed with command <blue><code>show ip bgp rib-failure</code></blue>: </p>
<div class="row">
<pre class="col-md-10">R3#<blue>sh ip bgp</blue>
BGP table version is 5, local router ID is 192.168.255.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 10.0.0.0 10.0.0.4 0 0 400 i
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
<red><u>r</u></red>> 192.168.200.0 192.168.12.2 0 100 0 (65100 65200) i
R3#
R3#<purple>sh ip bgp rib-failure</purple>
Network Next Hop RIB-failure RIB-NH Matches
192.168.200.0 192.168.12.2 <red>Higher admin distance</red> n/a
R3#
R3#<blue>sh ip route 192.168.200.0</blue>
Routing entry for 192.168.200.0/24
<red>Known via "static", distance 1</red>, metric 0
Routing Descriptor Blocks:
* 192.168.23.2
Route metric is 0, traffic share count is 1</pre>
</div>
<p>As seen above, the reason for the RIB-Failure is the existence of the static route (of course, lower AD compared to BGP). </p>
<p>The default behavior on Cisco routers, as opposed to Juniper, is to advertise these inactive BGP prefixes as long as they are selected as "best" in the BGP table. I don't want to debate here which approach regarding the default behavior is better, Cisco's or Juniper's... my personal position in this matter would support the Juniper approach because, in my opinion, advertising inactive BGP routes is an exception to normal designs, not a rule (when I say "rule" I mean that you don't design routes to be advertised by both BGP and IGP at the same time) ! </p>
<p>Of course, you can disable this default advertising of inactive routes with the command <purple><code>bgp suppress-inactive</code></purple> under the BGP process. </p>
<p><strong>Default behavior</strong> </p>
<div class="row">
<pre class="col-md-11">
R3#<blue>sh ip bgp neigh 10.0.0.4 advertised-routes</blue>
BGP table version is 5, local router ID is 192.168.255.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
<red>r> 192.168.200.0 192.168.12.2 0 100 0 (65100 65200) i</red>
Total number of prefixes 2
R3#</pre>
</div>
<p><strong>With <code>bgp suppress-inactive</code></strong> </p>
<div class="row">
<pre class="col-md-11">R3#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)#<purple>router bgp 65300</purple>
R3(config-router)#<purple>bgp suppress-inactive</purple>
R3(config-router)#^Z
R3#
R3#sh ip bgp rib-failure
Network Next Hop RIB-failure RIB-NH Matches
192.168.200.0 192.168.12.2 <red>Higher admin distance</red> No
R3#
R3#<blue>sh ip bgp neigh 10.0.0.4 advertised-routes</blue>
BGP table version is 7, local router ID is 192.168.255.3
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal,
r RIB-failure, S Stale
Origin codes: i - IGP, e - EGP, ? - incomplete
Network Next Hop Metric LocPrf Weight Path
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
Total number of prefixes 1
R3#</pre>
</div>
<h3 id="rib-next-hop-matches">RIB Next-Hop Matches</h3>
<p>An interesting fact about this command is the existence of this <u>condition</u>: command works (suppresses the advertising of BGP inactive routes) only when the comparison of these two next-hops (NH) returns a "<blue><strong>NO</strong></blue>":</p>
<ul>
<li>the next-hop of the BGP route selected as "best"</li>
<li>the next-hop of the route as installed in the RIB (the NH from the protocol that "beats" BGP)</li>
</ul>
<p><em>IF these two Next-Hops do <strong><u>not</u></strong> match,<br/>
Then the <u>inactive route is suppressed</u></em>: </p>
<div class="row">
<pre class="col-md-11">R3#<blue>sh ip bgp rib-failure</blue>
Network Next Hop RIB-failure <purple>RIB-NH Matches</purple>
192.168.200.0 192.168.12.2 Higher admin distance <red><u>No</u></red>
R3#
R3#<blue>sh ip bgp neigh 10.0.0.4 advertised-routes</blue>
...
Network Next Hop Metric LocPrf Weight Path
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
Total number of prefixes 1
R3#</pre>
</div>
<p><em>IF these two Next-Hops <u>match</u>,<br/>
Then the inactive route is <u>not suppressed</u></em>.<br/>
Note that I'm modifying the static route on R3 to match the BGP NH: </p>
<div class="row">
<pre class="col-md-11">R3#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R3(config)#<red>ip route 192.168.200.0 255.255.255.0 <u>192.168.12.2</u></red>
R3(config)#^Z
R3#
R3#<blue>sh ip bgp rib-failure</blue>
Network Next Hop RIB-failure <purple>RIB-NH Matches</purple>
192.168.200.0 <red><u>192.168.12.2</u></red> Higher admin distance <red><u>Yes</u></red>
R3#
R3#sh ip bgp neigh 10.0.0.4 advertised-routes
...
Network Next Hop Metric LocPrf Weight Path
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
<red>r> 192.168.200.0 192.168.12.2 0 100 0 (65100 65200) i</red>
Total number of prefixes 2
R3#</pre>
</div>
<p>This means that if the two next-hops match, there is no way to suppress the inactive BGP route. My own explanation for this: since the BGP next-hop matches the one of the routing protocol with the better AD, this prefix is "almost active" or "as good as active", so it will always be advertised to other BGP peers according to the BGP rules. </p>
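<p>The condition can be summarized in a short Python sketch. It is only a model of the behavior observed in the outputs of this post, with hypothetical function names, and it assumes the route is already the BGP best path and is marked as RIB-failure.</p>

```python
def rib_nh_matches(suppress_inactive, bgp_next_hop, rib_next_hop):
    """Value shown in the 'RIB-NH Matches' column of 'show ip bgp rib-failure'."""
    if not suppress_inactive:
        return "n/a"   # field is only evaluated with bgp suppress-inactive
    return "Yes" if bgp_next_hop == rib_next_hop else "No"


def inactive_route_advertised(suppress_inactive, bgp_next_hop, rib_next_hop):
    """True if the RIB-failure route is still advertised to BGP peers."""
    if not suppress_inactive:
        return True    # Cisco default: inactive best paths are advertised
    return bgp_next_hop == rib_next_hop


BGP_NH = "192.168.12.2"

# Static route via the direct link (192.168.23.2): next-hops differ.
print(rib_nh_matches(True, BGP_NH, "192.168.23.2"),
      inactive_route_advertised(True, BGP_NH, "192.168.23.2"))

# Static route changed to point at the BGP next-hop: next-hops match.
print(rib_nh_matches(True, BGP_NH, "192.168.12.2"),
      inactive_route_advertised(True, BGP_NH, "192.168.12.2"))
```

<p>The first case corresponds to the quiz solution (192.168.200.0/24 is no longer sent to R4), the second to the modified static route, where the prefix is advertised again.</p>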
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
ATTENTION<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
The <blue>RIB-NH Match</blue> is <u>relevant only when using <code>bgp suppress-inactive</code> command</u>. If this command is not configured (the default behavior) this field is <blue>n/a</blue> (see below)
</div>
</div>
<div class="row">
<pre>R3#<purple>sh ip bgp</purple>
BGP table version is 5, local router ID is 192.168.255.3
...
Network Next Hop Metric LocPrf Weight Path
*> 10.0.0.0 10.0.0.4 0 0 400 i
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
<red><u>r</u></red>> 192.168.200.0 192.168.12.2 0 100 0 (65100 65200) i
R3#
R3#<purple>sh ip bgp rib-failure</purple>
Network Next Hop RIB-failure RIB-NH Matches
192.168.200.0 192.168.12.2 Higher admin distance <red><u>n/a</u></red>
R3#
</pre>
</div>
<p><em>Thank you for your comments and interest in the quiz!<br/>
Subscribe to this blog to get more interesting quizzes and detailed solutions.</em> </p>
<p><br/></p>Quiz #19 – Short Network Cuts with MSTP2013-10-04T00:00:00+01:00Costitag:costiser.ro,2013-10-04:2013/10/04/quiz-19/<p><span class="dropcap">Y</span>ou are a senior network administrator managing your company's data center.<br/>
Upon receiving complaints from the server team that yesterday there were multiple short network cuts that impacted some very sensitive applications running in the data center, you investigate and find out that one of the level 1 network engineers performed the following changes on these 4 switches:<br/>
<br/></p>
<ul>
<li>he created a new <strong>vlan 200</strong> on all 4 switches</li>
<li>he added <strong>vlan 200 to instance 2</strong> on all switches in this order: Acc-1 -> Acc-2 -> Dist-1 -> Dist-2</li>
</ul>
<p><a href="/uploads/quiz-19.png" title="Quiz 19 - Network Cuts with MSTP"><img alt="quiz-19" src="/uploads/quiz-19.png" title="Quiz 19 - Network Cuts with MSTP"/></a> <br/>
<em>Note: the trunks between the switches allow all vlans!</em> </p>
<p>Considering the actions performed by the junior engineer,<br/>
<strong><em>1. how can you explain the network cuts ?<br/>
2. how could he have performed the task (to assign new vlan 200 to instance 2) without any network cut ?</em></strong> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>RIP – Auto Summarization and Impact on Discontiguous Networks2013-09-13T00:00:00+01:00Costitag:costiser.ro,2013-09-13:2013/09/13/rip-auto-summary-and-discontiguous-networks/<p><span class="dropcap-bg">T</span>his article discusses the solutions for <a href="/2013/06/03/quiz-15/">quiz 15</a>.<br/>
Have a look at the quiz to understand the problem</p>
<p>Who said the dinosaurs are dead ? No, no... they are still among us in the form of RIP, Frame Relay and few others.<br/>
This is not a new episode from Discovery Channel, but just another topic that Cisco (still) keeps inside the curriculum for the CCIE exam. </p>
<p>Some people have asked me WHY I talk about RIP.<br/>
Let me ask you something back: imagine you go for the CCIE lab and you need to configure RIP or EIGRP and after reading the entire content of the exam you realize that they don't say anything about <strong>the default</strong> auto-summarization - so, what do you do? do you disable it or not ?<br/>
My point here: you need to be ready for anything and a proper preparation for CCIE exam needs to cover RIP and other topics that are considered "dinosaurs" in today's network world. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>The quiz is about a simple RIP network that worked for some time until the interface connected to users on 1st floor (Fa0/1 on R1) goes down, which impacts the entire network. The first problem visible here is the use of <strong><em>discontiguous networks</em></strong>: <red>classful <code>172.16.0.0/16</code> is separated by <code>192.168.1.x/30</code> in the middle/transit</red> - this is especially important here because RIP is involved. </p>
<p>I intentionally did <strong>not</strong> show any configuration in the quiz (<em>it would have been too obvious</em>), but only the output of the routing table.<br/>
A closer look at the quiz reveals <strong><em>a classful 172.16.0.0/16 route that exists <u>only</u> on R2 and <u>not</u> on R1</em></strong>. This discrepancy between R1 and R2 should be the starting point for explaining the problem.<br/>
Also, <a href="/2013/06/03/quiz-15/">the hint section</a> shows the output of <blue><code>show ip protocols</code></blue> containing the <red><strong><code>Automatic network summarization <u>is</u> in effect</code></strong></red> sentence. </p>
<p><a href="/uploads/quiz-15-with-config.png" title="Quiz 15 - RIP Auto Summarization"><img alt="quiz-15-with-config" src="/uploads/quiz-15-with-config.png" title="Quiz 15 - RIP Auto Summarization"/></a> </p>
<p>When <strong>R1's Fa0/1</strong> interface goes down, the default summary <code>172.16.0.0/16</code> will loop between R1, R2 and R3 and any traffic for it (<em>different than the connected networks</em>) will loop at layer 3 until TTL expires, burdening the network (details to follow below): </p>
<div class="row">
<pre class="col-md-8">R3#<blue>traceroute 172.16.1.1</blue>
Type escape sequence to abort.
Tracing the route to 172.16.1.1
1 192.168.1.2 72 msec 16 msec 0 msec
2 192.168.1.10 16 msec 12 msec 8 msec
3 192.168.1.5 16 msec 8 msec 12 msec
4 192.168.1.2 8 msec 12 msec 8 msec
5 192.168.1.10 20 msec 16 msec 20 msec
6 192.168.1.5 32 msec 20 msec 20 msec
7 192.168.1.2 20 msec 20 msec 20 msec
<black>... and so on...</black></pre>
</div>
<p>... read further to understand why it happens like that ... </p>
<h3 id="rip-automatic-summarization">RIP Automatic Summarization</h3>
<p>To understand how the loop is formed, we need to know/remember:</p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
REMEMBER<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
RIP performs <red>auto-summarization on the classful network boundary</red> every time the major/classful advertised network is different from the major network of the interface onto which the updates are sent !
</div>
</div>
<p>In our quiz, because the transit links have addressing in major network <code>192.168.1.0/24</code>, it means that R1 will always generate summaries for 172.16.x.x when sent out over these transit links.<br/>
This behavior applies to all distance vector protocols: RIP, IGRP, EIGRP. When using EIGRP and RIPv2, you can disable this automatic summarization. </p>
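<p>The rule from the box above can be sketched in Python with the standard <code>ipaddress</code> module. This is a simplified model, not router code: the function names are hypothetical, and the interface address 192.168.1.1 is an assumed address on one of the transit links, used only for illustration.</p>

```python
import ipaddress


def classful_network(address):
    """Return the major (classful) network that contains address."""
    first_octet = int(ipaddress.IPv4Address(address)) >> 24
    if (first_octet & 0x80) == 0x00:
        prefix_len = 8       # class A
    elif (first_octet & 0xC0) == 0x80:
        prefix_len = 16      # class B
    elif (first_octet & 0xE0) == 0xC0:
        prefix_len = 24      # class C
    else:
        raise ValueError("class D/E addresses have no classful network")
    return ipaddress.ip_network(f"{address}/{prefix_len}", strict=False)


def advertised_prefix(route, out_interface_ip):
    """Prefix that auto-summary sends for route out of the given interface."""
    network = ipaddress.ip_network(route)
    route_major = classful_network(str(network.network_address))
    interface_major = classful_network(out_interface_ip)
    # Different major network on the outgoing interface: summarize.
    if route_major != interface_major:
        return route_major
    return network


# 192.168.1.1 is an assumed transit-link address on R1.
print(advertised_prefix("172.16.1.0/24", "192.168.1.1"))
print(advertised_prefix("192.168.1.8/30", "192.168.1.1"))
```

<p>The 172.16.1.0/24 subnet crosses a major network boundary and leaves as 172.16.0.0/16, while the 192.168.1.8/30 transit subnet stays within its own major network and is advertised unchanged.</p>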
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li>auto-summary is enabled by default</li>
<li>it does <strong><u>not</u></strong> install a Null0 for the auto-summaries that it generates</li>
<li>RIPv2 allows turning off the automatic summarization, but RIPv1 does not have this capability</li>
</ul>
</td></tr>
</table>
<h3 id="how-is-the-loop-formed">How is the loop formed ?</h3>
<p>As explained above, due to auto-summarization being enabled on R1, the major network summary 172.16.0.0/16 will loop between R1, R2 and R3. Let's see, step by step, how this happens: </p>
<p><a href="/uploads/quiz-15_states-2.png" title="RIP - Routing Loops due to Auto Summary"><img alt="quiz-15_states-2" src="/uploads/quiz-15_states-2.png" title="RIP - Routing Loops due to Auto Summary"/></a><br/>
<em><font size="-1">Note: the above diagram shows <strong>only</strong> the status of summary 172.16.0.0/16 (the other RIP routes are omitted)</font></em> </p>
<p><strong>Stage 1:</strong> Initial (apparently working) status<br/>
Because auto-summarization was mistakenly left enabled on R1, this router generates the summary 172.16.0.0/16 and advertises it to both R2 and R3.<br/>
The condition to generate the auto-summary is the existence of contributing prefixes, which is satisfied: R1 has a connected 172.16.1.0/24 route. RIP does not install a Null0 route for the summaries it generates ! </p>
<p><strong>Stage 2 (for the next 180 sec):</strong> R1's Fa0/1 interface goes down<br/>
R1's Fa0/1 interface goes down and the following occurs:</p>
<ul>
<li>R1 drops the connected 172.16.1.0/24 from its routing table, <font color="red"><strong>but it still has another contributing route: the RIP 172.16.2.0/24 received from R2 ==> it continues to generate the summary 172.16.0.0/16</strong></font></li>
<li>since the summary is based on the RIP 172.16.2.0/24 received from R2 and due to split-horizon rules, R1 will <strong><u>stop</u></strong> advertising the summary towards R2</li>
<li>since the summary is based on the RIP 172.16.2.0/24 received from R2, R1 will advertise the summary with an <strong><u>increased metric</u></strong> towards R3</li>
<li>R3 will see two equal cost (metric=2) routes for the summary (the new one from R1 and the old one from R2)</li>
<li>R2 will not see the summary from R1 and it needs to wait 180 seconds before invalidating it</li>
</ul>
<p><strong>Stage 3 (for the next 60 sec):</strong> R2 is holding down the summary<br/>
R2 does not receive the summary from R1 for 180 seconds so it will flush it and poison it (advertise it with unreachable metric of 16). The following happens:</p>
<ul>
<li>R2 marks the 172.16.0.0/16 summary as down in its routing table and for the next 60 sec does not accept worse metric updates for it</li>
<li>R3 sees the poisoned summary from R2 and it drops this path from its routing table - R3 will only keep the summary via R1</li>
</ul>
<p><strong>Stage 4 (next update &lt;30 sec):</strong> Flushing expires on R2<br/>
The flushing (holding down the summary) expires after a total of 240 sec, so now R2 can happily accept the summary with a worse metric (metric = 3) received from R3 and it sends it further to R1, thus creating the loop in the control plane:</p>
<ul>
<li>R2 accepts the worse metric (metric=3) via R3 (as opposed to the old metric of 1 from R1)</li>
<li>R1 will receive (for the first time) this summary with a metric of 4, from R2</li>
<li>R1 also receives the child subnet 172.16.2.0/24 from R2 so <font color="red">it will continue to generate the summary towards R3</font></li>
</ul>
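<p>The 180-second and 240-second marks in the stages above line up with the default RIP invalid and flush timers on Cisco IOS. As a minimal sketch (this is not taken from the quiz configuration, and it assumes the routers run with default timers), the defaults can be written out explicitly, which makes it easier to see which timer drives each stage:</p>
<div class="row">
<pre class="col-md-9">router rip
 ! update 30s, invalid 180s, holddown 180s, flush 240s (IOS defaults)
 timers basic 30 180 180 240
!
! check the timers actually in use on a router
R2#show ip protocols</pre>
</div>
<p>With these values, the route is declared invalid at 180 seconds and removed at 240 seconds, which would explain why the hold-down in Stage 3 effectively lasts only 60 seconds.</p>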
<h3 id="solutions-to-solve-the-routing-loop">Solutions to Solve the Routing Loop</h3>
<p><strong>1. Correct the mistake</strong><br/>
The problem is caused by a mistake in configuration on R1: auto-summary is enabled ! </p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
REMEMBER<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
<red>Always disable auto-summarization</red> when performing routing between <u><red>discontiguous</red></u> networks !
</div>
</div>
<div class="row">
<pre class="col-md-9">R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#router rip
R1(config-router)#<green>no auto-summary</green>
R1(config-router)#end
R1#
----------------------------------------------
R3#<blue>clear ip route *</blue>
R3#sh ip route rip
172.16.0.0/24 is subnetted, 1 subnets
R 172.16.2.0 [120/1] via 192.168.1.6, 00:00:15, Serial0/1
192.168.1.0/30 is subnetted, 3 subnets
R 192.168.1.8 [120/1] via 192.168.1.6, 00:00:15, Serial0/1
[120/1] via 192.168.1.2, 00:00:19, Serial0/0
R3#
R3#<blue>traceroute 172.16.1.1</blue>
Type escape sequence to abort.
Tracing the route to 172.16.1.1
1 * * *
2 * * *
...
R3#</pre>
</div>
<p><strong>2. Manually generate a Null0 for the summary</strong><br/>
DISCLAIMER: I would not consider this a solution, but it can solve the problem since the Null0 route will break the routing loop!</p>
<div class="row">
<pre class="col-md-9">R1#conf t
Enter configuration commands, one per line. End with CNTL/Z.
R1(config)#<green>ip route 172.16.0.0 255.255.0.0 null 0</green>
R1(config)#end
----------------------------------------------
R3#traceroute 172.16.1.1
Type escape sequence to abort.
Tracing the route to 172.16.1.1
1 192.168.1.2 40 msec 20 msec 4 msec
2 192.168.1.2 !H * !H
R3#</pre>
</div>
<h3 id="juniper-approach">Juniper Approach</h3>
<p>The above problems with auto-summaries under RIP cannot occur on Juniper devices because of the different approach used by JunOS: <strong><em>all advertisements are based on routing policies used to import or export routes</em></strong>.<br/>
JunOS does <strong>not</strong> have commands/keywords such as <code>auto-summary</code>, <code>ip summary-address</code>, <code>default-originate</code> or similar that do "magic" things in the background.<br/>
Unlike Cisco, Juniper would never advertise a summary unless you manually export it under the RIP protocol and such a summary exists in the routing table. </p>
<p>The detailed configuration of this quiz on Juniper devices and more information can be found <a href="/rip-basic-configuration-on-juniper.html">here</a>. </p>
<p><br>
<em>Thank you for your comments and interest in the quiz!</em><br/>
<em>Subscribe to this blog to get other more interesting quizzes and detailed solutions.</em> </br></p>
<p><br/></p>RIP – Basic Configuration on Juniper2013-09-13T00:00:00+01:00Costitag:costiser.ro,2013-09-13:2013/09/13/rip-basic-configuration-on-juniper/<p><span class="dropcap-bg">T</span>his article is a continuation of the previous post about <a href="/rip-auto-summary-and-discontiguous-networks.html">RIP Auto-Summarization and its impact on discontiguous networks in Cisco networks</a>, but this time from Juniper's perspective. Using the default <code>auto-summary</code> on Cisco devices can lead to routing loops in case of discontiguous networks, as shown in <a href="/2013/06/03/quiz-15/">quiz 15</a>. </p>
<p>As mentioned in the previous article, JunOS is not susceptible to such a problem for the simple reason that there's no <code>auto-summary</code> command/option for RIP in the Juniper world.<br/>
Another very important note is: according to the <blue><strong>default export policy for RIP under JunOS, <u>nothing is advertised</u> (no direct/connected interface, no other RIP learned routes, nothing)</strong></blue> ! </p>
<p>Here is what the basic, minimum configuration looks like:<br/>
<a href="/uploads/RIP-on-junos.png" title="RIP Configuration on Juniper Junos"><img alt="RIP-on-junos" src="/uploads/RIP-on-junos.png" title="RIP Configuration on Juniper Junos"/></a> </p>
<p>The use of groups and neighbor commands looks odd, but it provides freedom to apply different policies to different neighbors. It is very unlikely that RIP will be used for large-scale networks, but it could be used at the edge, where different groups can be useful. </p>
<p>Another oddity (at least for a Cisco-oriented engineer) is the fact that <em><strong>the above configuration does not produce any exchange of routing information</strong></em> - as described above, the <u>default export policy for RIP is <code>reject all</code></u>. If you want your RIP speaker to advertise routing information, you must configure routing policies: policy statements that will be applied as an <strong><em>export</em></strong> policy under the RIP protocol, as shown below: </p>
<div class="row">
<pre class="col-md-8">protocols {
    rip {
        group INTERNAL {
            <green>export RIP_EXPORT</green>;
            neighbor em0.0;
            neighbor em1.0;
        }
    }
}
policy-options {
    <green>policy-statement RIP_EXPORT {
        term direct {
            from protocol direct;
            then accept;
        }
        term rip {
            from protocol rip;
            then accept;
        }
    }</green>
}</pre>
</div>
<p><em><font size="-1">Note-1: there are multiple ways of configuring policies (different policy-statements, or a single term, or using route filters, etc) - but it's outside the scope of this article</font></em> </p>
<p><em><font size="-1">Note-2: the <strong>export</strong> can be applied under different levels of hierarchy (rip, group, neighbor)</font></em> </p>
<p>As a side note about routing policies, the default <strong>import</strong> policy is "accept all", so unless you need to apply some granular import policy, all RIP routes will be imported (accepted).<br/>
The RIP routes in the routing table: </p>
<div class="row;">
<pre class="col-md-10">root@R3-BR> <blue>show route protocol rip</blue>
inet.0: 8 destinations, 10 routes (8 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
172.16.1.0/24 *[RIP/100] 00:12:14, metric 2, tag 0
> to 192.168.1.2 via em0.0
172.16.2.0/24 *[RIP/100] 00:12:17, metric 2, tag 0
> to 192.168.1.6 via em1.0
192.168.1.0/30 [RIP/100] 00:12:17, metric 3, tag 0
> to 192.168.1.6 via em1.0
192.168.1.4/30 [RIP/100] 00:12:14, metric 3, tag 0
> to 192.168.1.2 via em0.0
192.168.1.8/30 *[RIP/100] 00:12:17, metric 2, tag 0
to 192.168.1.2 via em0.0
> to 192.168.1.6 via em1.0
224.0.0.9/32 *[RIP/100] 00:12:26, metric 1
MultiRecv
root@R3-BR> <blue>show route 192.168.1.0/30 extensive exact </blue>
inet.0: 8 destinations, 10 routes (8 active, 0 holddown, 0 hidden)
192.168.1.0/30 (2 entries, 0 announced)
<green>Direct Preference: 0</green>
Next hop type: Interface
Next-hop reference count: 1
Next hop: via em0.0, selected
State: &lt;Active Int&gt;
Age: 12:29
Task: IF
AS path: I
RIP Preference: 100
Next hop type: Router, Next hop index: 549
Next-hop reference count: 4
Next hop: 192.168.1.6 via em1.0, selected
State: &lt;Int&gt;
<red>Inactive reason: Route Preference</red>
Age: 12:21 Metric: 3 Tag: 0
Task: RIPv2
AS path: I
Route learned from 192.168.1.6 expires in 170 seconds
</pre>
</div>
<p>Some conclusions from the above output:</p>
<ul>
<li>RIP preference (the equivalent of Cisco's administrative distance) is 100</li>
<li>active routes are only the ones marked with an asterisk *</li>
<li>extensive output provides a lot of useful information, such as: state, inactive reason, age of the route, etc.</li>
</ul>
<p>In order to verify the status of RIP, the following two commands may be used: </p>
<div class="row">
<pre>root@R3-BR> <blue>show rip neighbor </blue>
Source Destination Send Receive In
Neighbor State Address Address Mode Mode Met
-------- ----- ------- ----------- ---- ------- ---
em0.0 Up 192.168.1.1 224.0.0.9 mcast both 1
em1.0 Up 192.168.1.5 224.0.0.9 mcast both 1
root@R3-BR> <blue>show rip statistics </blue>
RIPv2 info: port 520; holddown 120s.
rts learned rts held down rqsts dropped resps dropped
5 0 0 0
em0.0: 3 routes learned; 0 routes advertised; timeout 180s; update interval 30s
Counter Total Last 5 min Last minute
------- ----------- ----------- -----------
Updates Sent 0 0 0
Triggered Updates Sent 0 0 0
Responses Sent 0 0 0
Bad Messages 0 0 0
RIPv1 Updates Received 0 0 0
RIPv1 Bad Route Entries 0 0 0
RIPv1 Updates Ignored 0 0 0
<green>RIPv2 Updates Received 15 10 2</green>
RIPv2 Bad Route Entries 0 0 0
RIPv2 Updates Ignored 0 0 0
Authentication Failures 0 0 0
RIP Requests Received 1 0 0
RIP Requests Ignored 0 0 0
…</pre>
</div>
<p>Finally, as part of troubleshooting, traceoptions will provide you with all the debugging information that you might be interested in: </p>
<div class="row">
<pre>rip {
<purple>traceoptions {
file rip-debug;
flag update;
flag route;</purple>
}
}
!
!
root@R3-BR> <blue>show log rip-debug </blue>
Sep 12 17:49:57 R3-BR clear-log[1922]: logfile cleared
Sep 12 17:49:58.702387 received response: sender 192.168.1.2, command 2, version 2, mbz: 0; 4 routes.
Sep 12 17:49:58.702432 192.168.1.8/30: metric-in: 2, change: 2 -> 2; # gw: 2, pkt_upd_src 192.168.1.2, inx: 0, rte_upd_src 192.168.1.2
Sep 12 17:49:58.702446 192.168.1.4/30: metric-in: 3, change: 3 -> 3; # gw: 1, pkt_upd_src 192.168.1.2, inx: 0, rte_upd_src 192.168.1.2
Sep 12 17:49:58.702456 172.16.2.0/24: metric-in: 3, change: 2 -> 3; # gw: 1, pkt_upd_src 192.168.1.2, inx: 1, rte_upd_src 0.0.0.0
Sep 12 17:49:58.702465 172.16.1.0/24: metric-in: 2, change: 2 -> 2; # gw: 1, pkt_upd_src 192.168.1.2, inx: 0, rte_upd_src 192.168.1.2
Sep 12 17:50:04.827218 Preparing to send RIPv2 updates on nbr em1.0, group: INTERNAL.
Sep 12 17:50:14.936562 Preparing to send RIPv2 updates on nbr em0.0, group: INTERNAL.
</pre>
</div>
<p><em>Thanks for reading !</em> </p>
<p><br/></p>Quiz #18 – Cisco vs. Juniper - Filtering ICMP between BGP Peers2013-08-28T00:00:00+01:00Costitag:costiser.ro,2013-08-28:2013/08/28/quiz-18/<p><span class="dropcap-bg">Y</span>our company uses multi-vendor routing platforms (Cisco and Juniper) and has multiple sites connected via MPLS from a service provider.<br/>
Each remote site has a GRE tunnel to the headquarters (HQ) and a BGP session over this tunnel, in order to learn prefixes that you don't want to be exchanged with your MPLS provider. </p>
<p>After attending a security training, your Security Team raised concerns about ICMP-based attacks and decided to <red>block ICMP messages on all physical interfaces connected to outside networks</red> on all border routers in all sites. They implemented this protection as shown in the diagram below: </p>
<p><a href="/uploads/quiz-18.png" title="Quiz-18 Cisco vs Juniper"><img alt="quiz-18-1" src="/uploads/quiz-18.png" title="Quiz-18 Cisco vs Juniper"/></a> </p>
<p>Some time after the Security Team implemented the above changes, you notice that the BGP session with <strong>Site-2 (Juniper-based CE)</strong> started to flap, impacting the connectivity to this site.<br/>
After getting some more info, it seems that <strong>all Juniper-based CE sites are affected</strong> (BGP sessions go UP, they try to exchange prefixes but then NOTIFICATION is received and BGP goes down), while the BGP sessions to the <strong>Cisco-based CE sites are ok</strong>. </p>
<p><strong><em>What is the problem and how to solve it?</em></strong> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>Redistributing Internal BGP (iBGP) into an IGP – why is it dangerous ?2013-08-19T00:00:00+01:00Costitag:costiser.ro,2013-08-19:2013/08/19/redistributing-internal-bgp-ibgp-into-an-igp-why-is-it-dangerous/<p><span class="dropcap-bg">T</span>his article discusses the solutions for <a href="/2013/05/23/quiz-14/">quiz 14</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>The network engineer noted that CORE has a default route received from the Border Router (BR) via internal BGP (iBGP) and he wanted to push this default into OSPF toward the DIST router. To achieve this, he configured <strong><blue><code>default-information originate</code></blue></strong> on the CORE router, yet DIST did not learn it. </p>
<p>To understand the problem you need to know that the command "default-information originate" represents a <strong><em>redistribution of prefix 0.0.0.0/0 into OSPF</em></strong> (and, as with any redistribution, it requires the prefix to be in the routing table - except when using the keyword <code>always</code>).<br/>
The key to the quiz is that <font color="red"><em>internal BGP (iBGP) routes are <strong>not</strong>, by default, redistributed into any IGP (OSPF, in our case)</em></font> - this is the reason why CORE does not send the default route to DIST.</p>
<div class="row">
<pre class="col-md-9">CORE#sh ip ospf database external
OSPF Router with ID (192.168.255.2) (Process ID 1)
CORE#</pre>
</div>
<p>As you can see, the default does not appear in the OSPF database. </p>
<p>As many of you replied to the quiz, to make CORE redistribute the iBGP default route into OSPF you need to configure <strong><font color="blue"><code>bgp redistribute-internal</code></font></strong> under the BGP process. While the purpose of the quiz was to open the discussion about this topic, you need to know that by using this command you open the door to routing loops. </p>
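<p>For reference, a minimal sketch of that configuration is shown here. It assumes the local AS is 200 (the AS number mentioned for the border routers later in this article) and is included only to make the discussion concrete, not as a recommendation:</p>
<div class="row">
<pre class="col-md-9">CORE#conf t
CORE(config)#router bgp 200
! allow iBGP-learned routes to be redistributed into an IGP (off by default)
CORE(config-router)#bgp redistribute-internal
CORE(config-router)#end</pre>
</div>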
<h3 id="why-is-bgp-redistribute-internal-dangerous">Why is <code>bgp redistribute-internal</code> dangerous ?</h3>
<p>As mentioned, redistributing internal BGP (iBGP) into any IGP is disabled by default and it can be overridden using this command, but it is <strong><em>never recommended</em></strong>. In my opinion, a network that requires this command needs to be re-designed! </p>
<p>Everybody's talking about how this command is dangerous but there are few scenarios on the web showing the problem, so I will try to present such a routing loop.<br/>
The potential for routing loops is given by several facts:</p>
<ul>
<li>redistribution means that you lose some information about the "real" source of that prefix</li>
<li>the iBGP loop prevention mechanism relies only on a full mesh and does not consider other information (such as metrics, real originator, etc.) + using Route Reflection, you lose this protection</li>
<li>iBGP administrative distance (AD) is higher than the AD of any IGP, so redistributed routes will always be preferred via the IGP</li>
</ul>
<p>Let's consider the same scenario as in the quiz but with the addition of the 2nd core, with full mesh of iBGP between BR and the two COREs, as shown below: </p>
<p><a href="/uploads/quiz-14-solution.png"><img alt="quiz-14-solution" src="/uploads/quiz-14-solution.png"/></a> </p>
<p>CORE-1 redistributes the iBGP default route into the OSPF domain => <strong>CORE-2 receives <u>both</u> the iBGP (<em>from BR with AD 200</em>) and the OSPF (<em>from CORE-1 with AD 110</em>) and chooses CORE-1 as best</strong>, due to lower AD of OSPF vs. iBGP: </p>
<div class="row">
<pre class="col-md-11">CORE-2#<blue>sh ip route | i 0.0.0.0</blue>
...
<red>O*E2 0.0.0.0/0 [110/1] via 192.168.1.17, 01:51:31, FastEthernet1/0</red>
CORE-2#
CORE-2#<blue>sh ip route 0.0.0.0</blue>
Routing entry for 0.0.0.0/0, supernet
Known via <red>"ospf 1", distance 110</red>, metric 1, candidate default path
Tag 1, type extern 2, forward metric 1
Last update from 192.168.1.17 on FastEthernet1/0, 01:51:21 ago
Routing Descriptor Blocks:
* 192.168.1.17, from 192.168.255.2, 01:51:21 ago, via FastEthernet1/0
Route metric is 1, traffic share count is 1
Route tag 1
CORE-2#
CORE-2#<blue>sh ip bgp</blue>
...
Network Next Hop Metric LocPrf Weight Path
<red>r></red>i0.0.0.0 192.168.255.1 0 100 0 100 i
CORE-2#
CORE-2#<purple>sh ip bgp rib-failure</purple>
Network Next Hop RIB-failure RIB-NH Matches
0.0.0.0 192.168.255.1 Higher admin distance n/a
</pre>
</div>
<p>You can immediately notice two (small) problems:<br/>
- CORE-2 does not correctly identify the "real" source of the default route<br/>
- CORE-2 chooses a longer path (via CORE-1) to reach internet prefixes (to exit), instead of the direct path to BR </p>
<p>I said "small" problems because there's nothing wrong with the connectivity - CORE-2 can ping the ISP link on BR: </p>
<div class="row">
<pre class="col-md-10">CORE-2#<blue>ping 1.1.1.2</blue>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 36/62/88 ms
CORE-2#
CORE-2#<blue>traceroute 1.1.1.2</blue>
Type escape sequence to abort.
Tracing the route to 1.1.1.2
1 192.168.1.17 52 msec 28 msec 16 msec
2 192.168.1.1 64 msec * 64 msec
</pre>
</div>
<p>Now let's imagine that the link between BR and CORE-1 goes down. Since the iBGP peering is configured between loopbacks, <font color="red"><em>the BGP session will remain up</em></font> (CORE-1 and BR can reach each other via CORE-2). As a result: </p>
<ul>
<li>
<p><strong>CORE-1</strong> has a default route via iBGP from BR with a next-hop 192.168.1.18 = CORE-2 (BGP performs recursive routing to resolve the BGP next-hops): </p>
<p><div class="row">
<pre class="col-md-11">CORE-1#<blue>sh ip route | i 0.0.0.0</blue>
...
<red>B* 0.0.0.0/0 [200/0] via 192.168.255.1, 00:00:24</red>
CORE-1#
CORE-1#<blue>sh ip cef 192.168.255.1</blue>
192.168.255.1/32, version 60, epoch 0, cached adjacency 192.168.1.18
0 packets, 0 bytes
<red>via 192.168.1.18, FastEthernet1/0</red>, 1 dependency
next hop 192.168.1.18, FastEthernet1/0
valid cached adjacency
CORE-1#</pre>
</div></p>
</li>
<li>
<p><strong>CORE-2</strong> still has two sources of information for default, OSPF and iBGP, and chooses OSPF back to CORE-1: </p>
<p><div class="row">
<pre class="col-md-11">CORE-2#<blue>sh ip route 0.0.0.0</blue>
Routing entry for 0.0.0.0/0, supernet
Known <red>via "ospf 1", distance 110</red>, metric 1, candidate default path
Tag 1, type extern 2, forward metric 1
Last update from 192.168.1.17 on FastEthernet1/0, 00:02:28 ago
Routing Descriptor Blocks:
<red>* 192.168.1.17, from 192.168.255.2, 00:02:28 ago, via FastEthernet1/0</red>
Route metric is 1, traffic share count is 1
Route tag 1
CORE-2#</pre>
</div></p>
</li>
</ul>
<p>The result of this: <red><u>routing loop between CORE-1 and CORE-2</u></red>, visible in the broken connectivity towards the internet from all devices: </p>
<div class="row">
<pre class="col-md-9"><purple>CORE-2#</purple><blue>ping 1.1.1.2</blue>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.2, timeout is 2 seconds:
<red>.....
Success rate is 0 percent</red> (0/5)
CORE-2#
<black>===================================================================</black>
<purple>DIST#</purple><blue>ping 1.1.1.2</blue>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 1.1.1.2, timeout is 2 seconds:
<red>.....
Success rate is 0 percent</red> (0/5)
DIST#
DIST#<blue>traceroute 1.1.1.2</blue>
Type escape sequence to abort.
Tracing the route to 1.1.1.2
1 192.168.1.5 36 msec 24 msec 24 msec
2 192.168.1.18 44 msec 28 msec 24 msec
3 192.168.1.17 36 msec 68 msec 36 msec
4 192.168.1.18 72 msec 48 msec 36 msec
5 192.168.1.17 68 msec 64 msec 88 msec
6 192.168.1.18 56 msec 40 msec 68 msec
7 192.168.1.17 72 msec 100 msec 56 msec
<black>... and so on...</black></pre>
</div>
<p>A final note: this command is not necessary in case of MPLS VPNs, but this is another topic, maybe a new quiz :-) </p>
<h3 id="solutions-for-the-quiz">Solutions for the quiz</h3>
<p>In the end, let's consider all solutions for the quiz. </p>
<p><em><strong>1. Using "bgp redistribute-internal" on CORE-1</strong></em></p>
<p>This solution and its potential danger were detailed above.<br/>
<em><font size="-1">The quiz was purposely intended towards the "bgp redistribute-internal" solution, in order to open this discussion/article.</font></em> </p>
<p><em><strong>2. Inject the Default Route on the BR</strong></em></p>
<p>Since the "real" last resort in AS 200 is/are the border router(s), from a design perspective, the recommended solution is to generate the default route on the BRs, not on the CORE(s). </p>
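<p>A minimal sketch of this approach follows. It rests on two assumptions that are not confirmed by the quiz diagram: that BR participates in the same OSPF process 1 as the COREs, and that BR holds the default route in its own routing table (for example, learned from the ISP), so that it can be originated without the <code>always</code> keyword:</p>
<div class="row">
<pre class="col-md-9">BR#conf t
BR(config)#router ospf 1
! originate 0.0.0.0/0 into OSPF only while BR itself has a default route
BR(config-router)#default-information originate
BR(config-router)#end</pre>
</div>
<p>With the default originated at the border, the COREs would no longer need <code>bgp redistribute-internal</code> or their own <code>default-information originate</code>.</p>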
<p><em><strong>3. Using <code>always</code> option on CORE-1 or Using any Route-Map</strong></em></p>
<p>Using "default-information originate always" is not equivalent to a redistribution because it does not consider/check whether a default route exists before being advertised into OSPF. </p>
<div class="row">
<pre class="col-md-9">CORE#conf t
Enter configuration commands, one per line. End with CNTL/Z.
CORE(config)#router ospf 1
CORE(config-router)#<purple>default-information originate always</purple>
CORE(config-router)#end
CORE#sh ip osp database external
OSPF Router with ID (192.168.255.2) (Process ID 1)
Type-5 AS External Link States
LS age: 127
Options: (No TOS-capability, DC)
LS Type: AS External Link
<red>Link State ID: 0.0.0.0 (External Network Number )</red>
Advertising Router: 192.168.255.2
...</pre>
</div>
<p><red><em>NOTE</em>: using <code>always</code> in this case creates the same routing loop scenario between CORE-1 and CORE-2 when the uplink between CORE-1 and BR fails !</red> </p>
<p>Using <strong>a route-map</strong> has the same result as with "always" as long as the match condition in the route-map is true, no matter whether a default route exists or not. </p>
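<p>As an illustration of the route-map variant, here is a hedged sketch. The names <code>BR-LOOPBACK</code> and <code>DEFAULT-COND</code> are hypothetical, and the choice of BR's loopback (192.168.255.1, the BGP next hop seen in the outputs above) as the tracked prefix is an assumption made for this example only:</p>
<div class="row">
<pre class="col-md-9">CORE#conf t
! hypothetical names; the condition is "BR's loopback is in the routing table"
CORE(config)#ip prefix-list BR-LOOPBACK seq 5 permit 192.168.255.1/32
CORE(config)#route-map DEFAULT-COND permit 10
CORE(config-route-map)#match ip address prefix-list BR-LOOPBACK
CORE(config-route-map)#exit
CORE(config)#router ospf 1
CORE(config-router)#default-information originate route-map DEFAULT-COND
CORE(config-router)#end</pre>
</div>
<p>Note that in the failure scenario described earlier, BR's loopback stays reachable via CORE-2, so a condition like this one would remain true and the default would still be originated.</p>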
<p><br>
<em>Thank you for your comments and interest in the quiz!</em><br/>
<em>Subscribe to this blog to get other more interesting quizzes and detailed solutions.</em> </br></p>
<p><br/></p>Quiz #17 – Cisco vs. Juniper - BGP Advertisements2013-08-11T00:00:00+01:00Costitag:costiser.ro,2013-08-11:2013/08/11/quiz-17/<p><span class="dropcap">Y</span>our company decided to replace the existing Cisco devices with Juniper. </p>
<p>In the diagram below, the two routers in AS 65100 (R1 and R2) were replaced with two Juniper routers, while for the moment the router in AS 65300 (R3) is still a Cisco one. </p>
<p><br>
<a href="/uploads/quiz-17-1.png"><img alt="quiz-17-1" src="/uploads/quiz-17-1.png"/></a> </br></p>
<p>You performed the configuration in the diagram, OSPF and BGP sessions are up, but <strong>with the new Juniper devices</strong>, <strong><red><em>R3 does not receive any BGP routes from R2</em></red></strong>. </p>
<p><a href="/uploads/quiz-17-2.png"><img alt="quiz-17-2" src="/uploads/quiz-17-2.png"/></a> </p>
<p>Troubleshooting on R2 returns the following output:<br/>
<a href="/uploads/quiz-17-3.png"><img alt="quiz-17-3" src="/uploads/quiz-17-3.png"/></a> </p>
<p><strong><em>What is the problem and how to solve it?</em></strong> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>Frame Relay – Understanding Static and Dynamic Mappings2013-08-07T00:00:00+01:00Costitag:costiser.ro,2013-08-07:2013/08/07/frame-relay-understanding-static-and-dynamic-mappings/<p><span class="dropcap-bg">T</span>his is the solution for <a href="/2013/04/27/quiz-13/">quiz-13</a>, which covers Frame Relay, a technology that is rarely implemented in today's networks but often seen in certification exams, as it is still part of the curriculum for some of them. </p>
<p>In the real world, Frame Relay was mostly replaced by MPLS but there could still be companies that have Frame Relay, probably due to the long-time contracts signed with their providers. Anyway, even in these cases, it’s almost certain that you will not find an end-to-end Frame Relay connection these days, but rather FR on the last mile and MPLS in the provider’s core. </p>
<p>One of the challenges with Frame Relay networks is the <em>difference between the layer 2 connectivity and layer 3 view</em> (what layer 3 protocols “think” about layer 2): in the absence of broadcast, it can be very difficult to discover all the devices connected at layer 2 - this is most common in partial-mesh or hub-and-spoke topologies.<br/>
For example, all devices connected to the same Ethernet broadcast domain are able to see each other (using broadcast or multicast), but in the case of Frame Relay several devices can be connected to the same link without necessarily having direct connectivity to each other (spoke-to-spoke communication occurs via the hub when there is no virtual circuit between them). </p>
<p>In order to find the layer 3 address of other DTE neighbors, Frame Relay routers use dynamic mapping, via Inverse ARP, or static entries. </p>
<h3 id="address-resolution-via-dynamic-mapping-inverse-arp">Address Resolution via Dynamic Mapping: Inverse ARP</h3>
<p>Each DTE router learns the DLCI via the LMI messages but it does not know the IP address of the neighboring device or devices. The process of discovering the IP address of the remote end is called Inverse ARP (InARP), and it creates a mapping between the local DLCI and the remote end's protocol address.<br/>
The name “<strong><em>Inverse ARP</em></strong>” comes from the fact that “normal” ARP is used to discover the layer-2 MAC address given the layer-3 IP address, while in the case of Frame Relay, the DLCI (layer-2 address) is already learnt via LMI but the router does not know the layer-3 address (so it is an “inverse” logic).<br/>
Once an IP address is configured on an interface connected to the Frame Relay cloud, InARP messages containing that IP address are sent on all DLCIs on that interface. </p>
<p><a href="/uploads/figure-4.png" title="ARP via Dynamic Mapping - Inverse ARP"><img alt="figure-4" src="/uploads/figure-4.png" title="ARP via Dynamic Mapping - Inverse ARP"/></a> </p>
<p>In the diagram, R1 sends InARP requests with its IP 192.168.1.1 on both DLCI 102 (towards R2) and DLCI 103 (towards R3) and, based on the received InARP replies, it will map <strong>IP 192.168.1.2 to DLCI 102</strong> and <strong>IP 192.168.1.3 to DLCI 103</strong>. </p>
<p>There are several <font color="blue">important notes</font> that need to be remembered: </p>
<ol>
<li>Inverse ARP cannot work without <del datetime="2013-08-15T10:21:47+00:00">LMI</del> [...see Roman's comment below...] both DLCI & IP address, because LMI is the mechanism used to learn about the DLCI associated with that interface (without LMI the router does not learn any DLCI, so it cannot send InARP messages) [...added: if you disable LMI <strong>but you manually configure a DLCI on that interface</strong> then InARP will work...]</li>
<li>all DLCIs learned via LMI are automatically associated with the main interface, so the Inverse ARP requests are generated only by the main interface</li>
<li>by default, Inverse ARP supports broadcasts - notice the <strong>broadcast</strong> word in the output of <strong>"show frame-relay map"</strong></li>
<li>Inverse ARP is automatically enabled by LMI (unless disabled by static mapping as we will see below)</li>
</ol>
<h3 id="address-resolution-via-static-mapping">Address Resolution via Static Mapping</h3>
<p>Inverse ARP is a dynamic mechanism of mapping IP address to a DLCI, but the same result can be achieved with static mapping via manual configuration.<br/>
A very important note here (and easily overlooked during exams) is that the <font color="red"><strong>static mapping disables the Inverse ARP for the pair (protocol, DLCI)</strong></font> - where the protocol is IP. <strong>This is actually the key to the <a href="/2013/04/27/quiz-13/">quiz</a></strong>.<br/>
Suppose that you create static mapping for IP 192.168.1.2 to DLCI 102 – then, this will automatically disable InARP for <strong>the pair (IP, DLCI 102)</strong>.<br/>
Actually, the static mapping does not disable InARP completely - <strong><em>only the InARP requests</em></strong> - but the router will still reply to the InARP messages, if it receives any! </p>
<p>For a full explanation of Frame Relay technology, please read the <a href="http://resources.intenseschool.com/demystifying-frame-relay/">Demystifying Frame Relay</a> article. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>The two routers in the quiz (R1 and R2) initially use dynamic mappings (via Inverse ARP) to learn about each other: </p>
<p><a href="/uploads/quiz-13.png"><img alt="quiz-13" src="/uploads/quiz-13.png"/></a> </p>
<p>Later on, when the network engineer adds the static mapping for its own IP address, everything looks ok, at least for a while: </p>
<div class="row">
<pre class="col-md-10">R1#
interface Serial0/0
ip address 192.168.1.1 255.255.255.0
encapsulation frame-relay
<green>frame-relay map ip 192.168.1.1 102</green>
end
<red>R1#sh frame map
Serial0/0 (up): <u>ip 192.168.1.1 dlci 102</u>(0x66,0x1860), <u>static</u>,
CISCO, status defined, active
Serial0/0 (up): <u>ip 192.168.1.2 dlci 102</u>(0x66,0x1860), <u>dynamic</u>,
broadcast,, status defined, active</red>
R1#
R1#ping 192.168.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/40/116 ms
R1#
R1#ping 192.168.1.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/39/84 ms
R1#</pre>
</div>
<div class="row">
<pre class="col-md-10">R2#
interface Serial0/0
ip address 192.168.1.2 255.255.255.0
encapsulation frame-relay
<green>frame-relay map ip 192.168.1.2 201</green>
end
<red>R2#sh frame map
Serial0/0 (up): <u>ip 192.168.1.2 dlci 201</u>(0xC9,0x3090), <u>static</u>,
CISCO, status defined, active
Serial0/0 (up): <u>ip 192.168.1.1 dlci 201</u>(0xC9,0x3090), <u>dynamic</u>,
broadcast,, status defined, active</red>
R2#
R2#ping 192.168.1.1
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/38/80 ms
R2#
R2#ping 192.168.1.2
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/36/148 ms
R2#</pre>
</div>
<p>The problem is that this apparently working status will last only as long as the dynamic entry exists.<br/>
If for any reason (interface reset, clearing of dynamic mappings, device reboot) <strong><em>the dynamic entry disappears</em></strong>, it will never re-appear because of the additional static mapping configured in the meantime.<br/>
As explained in the above section <strong><em>"Address Resolution via Static Mapping"</em></strong>, the static mapping disables the Inverse ARP for the pair (protocol, DLCI) - in our case, Inverse ARP will be disabled for pair DLCI 102 - protocol IP.<br/>
So, you cannot use both dynamic and static mapping for the same DLCI/protocol pair! </p>
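<p>To see this in a lab, one possible check (a sketch only, not output captured from the quiz routers) is to clear the dynamically learned mappings on R1 and then look at the map table again. Based on the behaviour described above, the dynamic entry for 192.168.1.2 would not be expected to come back while the static mapping on DLCI 102 is in place:</p>
<div class="row">
<pre class="col-md-10">! remove the mappings learned through Inverse ARP
R1#clear frame-relay-inarp
! then inspect which entries remain
R1#show frame-relay map</pre>
</div>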
<h3 id="quiz-solutions">Quiz Solutions</h3>
<p>Considering the requirement of the quiz (make the routers be able to ping their own IP addresses), I suggest the following solutions: </p>
<p><strong>1. Create static entries for both IP addresses (itself and peer)</strong> </p>
<p>In this case we avoid having both dynamic and static mappings for the same protocol/DLCI pair, a situation that appeared due to a race condition (the dynamic entry was created before the static one): </p>
<p><a href="/uploads/quiz-13_solution-1.png"><img alt="quiz-13_solution-1" src="/uploads/quiz-13_solution-1.png"/></a> </p>
<p><strong>2. Create point-to-point sub-interfaces</strong> </p>
<p>This is also a solution because point-to-point sub-interfaces do not "need" mappings; routers just send any data onto the sub-interface: </p>
<p><a href="/uploads/quiz-13_solution-2.png"><img alt="quiz-13_solution-2" src="/uploads/quiz-13_solution-2.png"/></a><br/>
<em>Note that this solution may not be suitable/available in all scenarios</em>. </p>
<p><em>Thank you for your comments and interest in the quiz!</em> </p>
<p><br/></p>Quiz #16 – BGP Filtering Updates2013-08-01T00:00:00+01:00Costitag:costiser.ro,2013-08-01:2013/08/01/quiz-16/<p><span class="dropcap">C</span>ompany ABC is in the process of configuring a BGP confederation between its sites.<br/>
For the moment, BGP has been configured between <strong>R1 - R3</strong> and <strong>R1 - R2</strong>, while for the following weeks there will be <strong><red>no BGP peering between R3 and R2</red></strong>.<br/>
The direct link between R3 and R2 was recently installed and a <strong>static route</strong> was configured on R3 for site-2 network, <code>192.168.200.0/24</code>, to use the new link. </p>
<p><a href="/uploads/quiz-16.png" title="Quiz-16"><img alt="quiz-16-1" src="/uploads/quiz-16.png" title="Quiz-16"/></a> </p>
<p>Until BGP is configured between R3 and R2, you <strong><red>do not</red></strong> want to <strong><red>advertise the site-2 (R2) network, <em>192.168.200.0/24</em></red></strong>, towards your external partner, AS 400.</p><br/>
Here is the current status:
<div class="row">
<pre class="col-md-11"><black>============= on R3 ===================</black>
R3#<blue>sh ip bgp</blue>
...
Network Next Hop Metric LocPrf Weight Path
*> 10.0.0.0 10.0.0.4 0 0 400 i
*> 192.168.100.0 192.168.13.1 0 100 0 (65100) i
r> 192.168.200.0 192.168.12.2 0 100 0 (65100 65200) i
<black>============= on R4 ===================</black>
R4#<blue>sh ip bgp</blue>
...
Network Next Hop Metric LocPrf Weight Path
*> 10.0.0.0 0.0.0.0 0 32768 i
*> 192.168.100.0 10.0.0.3 0 123 i
*> 192.168.200.0 10.0.0.3 0 123 i
</pre>
</div>
<p><strong>How can you achieve the task (don't advertise 192.168.200.0/24 to R4) <em><u><red>without</red></u></em> modifying the ALLOW-ALL route-map (or applying other policies to the BGP neighbors) ?</strong><br/>
<em>The configuration needs to be applied <u>only</u> on R3.</em> </p>
<p><em>Post your answer in the ‘Comments’ section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>Quiz #15 – RIPv2 Problems after Link Failure2013-06-03T00:00:00+01:00Costitag:costiser.ro,2013-06-03:2013/06/03/quiz-15/<p><span class="dropcap">I</span>n the below scenario, subnet <code>192.168.1.0/24</code> is used for transit links between routers and <code>172.16.0.0/16</code> is used for the user vlans situated on different floors. RIP version 2 runs on all three routers and additional <em>static default routes</em> are configured on <strong>R1</strong> and <strong>R2</strong> to forward internet traffic towards border router R3. </p>
<p>The network has been up and running for more than a few months without any complaints.</p>
<p><a href="/uploads/quiz-15.jpg" title="Quiz 15 - RIP version 2"><img alt="quiz-15" src="/uploads/quiz-15.jpg" title="Quiz 15 - RIP version 2"/></a> </p>
<p><br><br/>
At some moment, the <em><red>Fa0/1 interface on R1</red></em> (connected to users on <strong><em><red>Floor 1</red></em></strong>) <em><red>goes down</red></em>. Since no-one is working on this floor that day, the network administrator doesn't give priority to this issue and goes to lunch.<br/>
While on lunch, he receives phone calls from users on the other floor, <strong>Floor 2</strong>, complaining that <red>the entire network experiences slowness, including communication toward the internet</red>. </br></p>
<p><strong>Considering the information that you have in the above diagram, could you try to indicate <em>what is the problem ?</em></strong> </p>
<p><em>Try answering this quiz only with the above information</em>. If you have no idea what the problem is, reveal the following hidden tip: </p>
<div class="panel panel-default">
<div class="panel-heading panel-title">
<a class="accordion-toggle" data-parent="#accordion-1" data-toggle="collapse" href="#collapse-Two">
Give me a hint about the problem !
</a>
</div>
<div class="panel-collapse collapse" id="collapse-Two">
<div class="panel-body">
<p>Upon his return, the network administrator uses the following command on R1 and immediately spots the problem:
<div class="row">
<pre class="col-md-10">R1#<blue>sh ip protocols</blue>
Routing Protocol is "rip"
Outgoing update filter list for all interfaces is not set
Incoming update filter list for all interfaces is not set
Sending updates every 30 seconds, next due in 13 seconds
Invalid after 180 seconds, hold down 180, flushed after 240
Redistributing: rip
Default version control: send version 2, receive version 2
Interface Send Recv Triggered RIP Key-chain
FastEthernet0/0 2 2
Serial0/0 2 2
FastEthernet0/1 2 2
Automatic network summarization is in effect
Maximum path: 4
Routing for Networks:
172.16.0.0
192.168.1.0
Routing Information Sources:
Gateway Distance Last Update
192.168.1.10 120 00:00:27
192.168.1.1 120 00:00:26
Distance: (default is 120)</pre>
</div>
</p>
</div>
</div>
</div>
<p><strong><em>What solution(s) would you suggest to this problem?</em></strong> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>OSPF – Understanding the Forwarding Address (FA)2013-05-26T00:00:00+01:00Costitag:costiser.ro,2013-05-26:2013/05/26/ospf-understanding-the-forwarding-address-fa/<p><span class="dropcap-bg">H</span>ere I come with the solution for <a href="/2013/04/04/quiz-12/">quiz-12</a>, which I consider to be one of the most difficult quizzes published on this blog so far.<br/>
The difficulty (and the underlying problem in the quiz) comes from the fact that some LSAs contain a non-zero FA (Forwarding Address) while others have it set to 0.0.0.0. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>In the beginning, let's see why FA (Forwarding Address) exists in the first place. We all know that packets destined to external destinations are routed through the advertising ASBR. According to <a href="http://www.ietf.org/rfc/rfc2328.txt">RFC 2328 (see page 141)</a>, there might be situations when this behaviour is not desirable, so they have introduced the concept of FA in order to avoid extra hops in the path.<br/>
Consider the diagram below, where both <strong>RT-A</strong> and <strong>RT-B</strong> are connected to RT-X (a partner company), but only RT-A speaks eBGP with partner company (let's say that RT-B does not have enough memory to run BGP). <strong>RT-A</strong> redistributes the BGP routes (ex: <code>172.16.10.0/24</code>) into the OSPF domain, thus becoming an ASBR. </p>
<p><a href="/uploads/Common_problem_without_FA.png" title="OSPF - Common problem without the Forwarding Address (FA)"><img alt="Common_problem_without_FA" src="/uploads/Common_problem_without_FA.png" title="OSPF - Common problem without the Forwarding Address (FA)"/></a> </p>
<p>Without the concept of FA, traffic from <strong>RT-B</strong> towards those external destinations will go via the ASBR (RT-A), as shown in the traceroute output.<br/>
If RT-A set FA = 192.168.1.3, then RT-B would route directly to FA instead of ASBR, as you can see below: </p>
<p><a href="/uploads/traceroutes_with_and_without_FA.png" title="Traceroute with and without Forwarding Address (FA)"><img alt="traceroutes_with_and_without_FA" src="/uploads/traceroutes_with_and_without_FA.png" title="Traceroute with and without Forwarding Address (FA)"/></a> </p>
<h3 id="forwarding-address-fa">Forwarding Address (FA)</h3>
<p>Now, let's see what conditions are required to have a non-zero FA. According to Cisco documentation, <strong><red>all</red></strong> of these conditions need to be true: </p>
<ul>
<li>1. <u>OSPF is enabled</u> on the ASBR's next hop interface <red>AND</red></li>
<li>2. ASBR's next hop interface is <u>non-passive under OSPF</u> <red>AND</red></li>
<li>3. ASBR's next hop interface is <u>not point-to-point</u> <red>AND</red></li>
<li>4. ASBR's next hop interface is <u>not point-to-multipoint</u> <red>AND</red></li>
<li>5. ASBR's next hop interface address falls <u>under the network range</u> specified in the router ospf command</li>
</ul>
<p><em><font size="-1">Note the <strong>AND</strong> logical operator between each condition. An easier way to remember this (using the "inverse logic"): <blue>next-hop interface must be a broadcast interface that is natively advertised in OSPF</blue>.</font></em> </p>
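<p>As a rough illustration, the five conditions can be written as a single boolean check. This is only a sketch: the dataclass <code>NextHopInterface</code>, the function <code>sets_nonzero_fa</code>, the interface addresses of R1 and R2 and the <code>10.0.0.0/8</code> internal range are assumptions made for the example, and conditions 1 and 5 are merged into "the interface address is covered by a <code>network</code> statement".</p>

```python
from dataclasses import dataclass, replace
from ipaddress import ip_address, ip_network


@dataclass(frozen=True)
class NextHopInterface:
    """ASBR interface that faces the external next hop (hypothetical model)."""
    address: str
    network_type: str          # "broadcast", "point-to-point", "point-to-multipoint"
    passive: bool
    network_statements: tuple  # ranges configured with "network" under router ospf


def sets_nonzero_fa(intf):
    """True only when ALL conditions for a non-zero FA hold."""
    # Conditions 1 and 5: OSPF runs on the interface via a matching network range.
    in_ospf = any(ip_address(intf.address) in ip_network(net)
                  for net in intf.network_statements)
    # Conditions 3 and 4: neither point-to-point nor point-to-multipoint.
    multi_access = intf.network_type not in ("point-to-point", "point-to-multipoint")
    # Condition 2: the interface is not passive.
    return in_ospf and not intf.passive and multi_access


# Assumed addressing: only the FA values 192.168.1.5 and 192.168.2.6 come from the quiz.
r1_to_r5 = NextHopInterface("192.168.1.1", "broadcast", False, ("0.0.0.0/0",))
r2_to_r6 = NextHopInterface("192.168.2.2", "point-to-point", False, ("0.0.0.0/0",))

print(sets_nonzero_fa(r1_to_r5))   # all conditions hold
print(sets_nonzero_fa(r2_to_r6))   # fails the point-to-point condition

# Each change below breaks (or fixes) exactly one condition.
r1_specific = replace(r1_to_r5, network_statements=("10.0.0.0/8",))
r1_passive = replace(r1_to_r5, passive=True)
r2_broadcast = replace(r2_to_r6, network_type="broadcast")
```

<p>Flipping a single field is enough to change the result, which is why there are several independent ways of making both ASBRs behave the same.</p>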
<p>Going back to the quiz, we see that <strong>both R1 and R2 are ASBR</strong> for external destinations (static routes for <code>172.16.10.0/24</code> and <code>172.16.11.0/24</code>). Each of them injects <em>Type-5 LSAs but with different information</em>:<br/>
- <strong><blue><em>R1 sets the FA address to R5's address (192.168.1.5)</em></blue></strong> because all of the above conditions are true !<br/>
- <strong><blue><em>R2 sets the FA address to zero (0.0.0.0)</em></blue></strong> because the connection to R6 is a <red><strong>point-to-point interface</strong></red> ! </p>
<p><a href="/uploads/quiz-12.png" title="Quiz 12 - Forwarding Address in OSPF"><img alt="quiz-12" src="/uploads/quiz-12.png" title="Quiz 12 - Forwarding Address in OSPF"/></a> </p>
<p><strong><em>R3</em></strong> (<em>and all other OSPF routers</em>) receives two external LSAs for the same destinations and chooses the best path based on the <strong><purple>forwarding metric</purple></strong> with the following comparison: </p>
<ul>
<li>Type-5 LSA generated by <strong>ASBR R1 with FA = 192.168.1.5</strong> - with a <em><purple>metric to reach the FA (in this case: <strong>metric 3</strong>)</purple></em></li>
<li>Type-5 LSA generated by <strong>ASBR R2 with FA = 0.0.0.0</strong> - with a <em><purple>metric to reach the ASBR (in this case: <strong>metric 2</strong>)</purple></em></li>
</ul>
<p>thus, it considers that <u>best path is via ASBR R2</u>. </p>
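<p>The comparison made by R3 can be sketched as follows. It assumes that both Type-5 LSAs advertise the same external (type 2) metric, so the forward metric acts as the tie-breaker; the dictionary layout and the function name <code>forward_metric</code> are hypothetical, while the costs 3 and 2 are the ones from the quiz.</p>

```python
def forward_metric(lsa, cost_to):
    """Cost to the FA when it is non-zero, otherwise the cost to the ASBR."""
    target = lsa["fa"] if lsa["fa"] != "0.0.0.0" else lsa["asbr"]
    return cost_to[target]


# The two Type-5 LSAs seen by R3 for the same external destination.
lsas = [
    {"asbr": "R1", "fa": "192.168.1.5"},
    {"asbr": "R2", "fa": "0.0.0.0"},
]

# Internal OSPF costs from R3, as given in the quiz.
cost_from_r3 = {"192.168.1.5": 3, "R2": 2}

best = min(lsas, key=lambda lsa: forward_metric(lsa, cost_from_r3))
print(best["asbr"])   # R2, because 2 is lower than 3
```

<p>The two LSAs are measured against different targets (the FA for R1, the ASBR itself for R2), so the comparison is not like-for-like.</p>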
<h3 id="solutions">Solutions</h3>
<p>There are several solutions to this quiz, each with a different result, and you have to consider what you want to achieve: load balancing on both exit links (<strong>R1 and R2</strong>) or using only the Fast Ethernet exit on <strong>R1</strong>.<br/>
Basically, the idea is to have same behavior on both ASBRs: either both will set a non-zero FA or both set it to 0.0.0.0. <br/>
<br> </br></p>
<ul>
<li>
<p><font color="blue">on <strong>R1</strong>, replace the <code>network 0.0.0.0</code> under the ospf process with <em>more specific statements</em> (only for internal interfaces)</font><br/>
This may or may not be useful in certain scenarios when you want to passively advertise connected interfaces. <br/>
This will break the 1st condition of setting a non-zero FA as mentioned above, so both ASBRs will set 0.0.0.0 as Forwarding Address.<br/>
<br> <br/>
<a href="/uploads/FA-specific-network.png"><img alt="FA-specific-network" src="/uploads/FA-specific-network.png"/></a>
With this solution you achieve load-balancing over both exit points from any router that is equally far from R1 and R2 (for example, R3 is equally far from exit points R1 and R2).<br/>
<br><br/></br></br></p>
</li>
<li>
<p><font color="blue">on <strong>R1</strong>, make the <strong>interface</strong> connected to partner router as <strong>passive</strong> under the ospf process</font><br/>
This will break the 2nd condition of setting a non-zero FA, so both ASBRs will set 0.0.0.0 as Forwarding Address.<br/>
<a href="/uploads/FA-passive-interface.png"><img alt="FA-passive-interface" src="/uploads/FA-passive-interface.png"/></a><br/>
You get the same load-balancing effect from R3.<br/>
<br><br/></br></p>
</li>
<li>
<p><font color="blue">on <strong>R2</strong>, configure the interface connected to partner router as <strong>broadcast OSPF type</strong></font><br/>
This will make all conditions TRUE, on R2, for setting a non-zero FA.<br/>
<a href="/uploads/FA-serial-as-broadcast.png"><img alt="FA-serial-as-broadcast" src="/uploads/FA-serial-as-broadcast.png"/></a><br/>
In this solution, both Type-5 LSAs have a non-zero FA so the best path is chosen based on the forward metric: </p>
<ul>
<li>LSA Type-5 from R1 (<strong>FA = 192.168.1.5</strong>) has a <em><strong><green>forward metric of 3</green></strong></em> </li>
<li>LSA Type-5 from R2 (<strong>FA = 192.168.2.6</strong>) has a <strong><em><green>forward metric of 66</green></em></strong></li></ul></li></ul>
<p><br>
In case the primary link between R1 and R5 fails, the external destinations are reachable via serial between R2 and R6 - notice the forward metric: </br></p>
<p><a href="/uploads/FA-primary-link-down.png"><img alt="FA-primary-link-down" src="/uploads/FA-primary-link-down.png"/></a> </p>
<p>Of course, there is much more to discuss about the Forwarding Address, but this is one of the first articles on this subject - more quizzes to follow :-) </p>
<p><em>Thank you for your comments and interest in the quiz!</em> </p>
<p><br/></p>Quiz #14 – Default Originate into OSPF2013-05-23T00:00:00+01:00Costitag:costiser.ro,2013-05-23:2013/05/23/quiz-14/<p><span class="dropcap-bg">Y</span>our company's network follows a standard three-tier hierarchical design (Core, Distribution, Access) and a WAN module that consists of two border routers, each having a separate connection to a different ISP (<strong>eBGP</strong> sessions running with the ISPs). <br/>
Inside your network, you run OSPF with a flat Area 0 everywhere. Between the Border Routers and the Core routers you run <strong>iBGP 200</strong> with <em>next-hop-self</em> used on BRs towards the CORES.<br/>
The ISP sends you a default route via BGP. You want to push a default route down to the Distribution routers and you configure command <purple><code>default-information originate</code></purple> on the CORE, as shown below: </p>
<p><a href="/uploads/quiz-14.png" title="Quiz-14 Default Originate into OSPF"><img alt="quiz-14" src="/uploads/quiz-14.png" title="Quiz-14 Default Originate into OSPF"/></a> </p>
<p><br>
<em>For some reason, the Distribution does not get the default route from Core:</em> </br></p>
<div class="row">
<pre class="col-md-10"><purple>CORE</purple>#<blue>sh ip route</blue>
...
Gateway of last resort is 192.168.255.1 to network 0.0.0.0
192.168.255.0/32 is subnetted, 2 subnets
C 192.168.255.2 is directly connected, Loopback0
O 192.168.255.1 [110/2] via 192.168.1.1, 00:49:25, FastEthernet0/0
192.168.1.0/30 is subnetted, 2 subnets
C 192.168.1.0 is directly connected, FastEthernet0/0
C 192.168.1.4 is directly connected, FastEthernet0/1
<green>B* 0.0.0.0/0 [200/0] via 192.168.255.1, 00:49:20</green>
</pre>
</div>
<div class="row">
<pre class="col-md-10"><purple>DIST</purple>#<blue>sh ip route</blue>
...
Gateway of last resort is not set
192.168.255.0/32 is subnetted, 2 subnets
O 192.168.255.2 [110/2] via 192.168.1.5, 00:49:43, FastEthernet0/0
O 192.168.255.1 [110/3] via 192.168.1.5, 00:49:43, FastEthernet0/0
192.168.1.0/30 is subnetted, 2 subnets
O 192.168.1.0 [110/2] via 192.168.1.5, 00:49:43, FastEthernet0/0
C 192.168.1.4 is directly connected, FastEthernet0/0
</pre>
</div>
<p><em><font size="-1">* Note that the output does not show any info about the right-hand side routers (gray colored in the diagram)</font></em> </p>
<p><strong><em>What is the problem?</em></strong> How to solve it (without splitting the OSPF into separate areas) ? </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>NAT – Order of Operation2013-05-12T00:00:00+01:00Costitag:costiser.ro,2013-05-12:2013/05/12/nat-order-of-operation/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/03/27/quiz-11/index.html">quiz-11</a>. <br/>
Have a look at the quiz to understand the problem. </p>
<p>A very important topic when configuring Network Address Translation (NAT) is the order of operation. The most frequently asked question is <purple><strong><em>which is performed first: NAT or Routing?</em></strong></purple>. Unfortunately, for Cisco IOS, the answer is: <purple><strong><em>it depends !</em></strong></purple> ... yes, it depends on which side the packet arrives, <em>inside</em> or <em>outside</em> (as defined by the <code>ip nat</code> commands). </p>
<div class="row"><div class="col-xs-12">
<div class="panel panel-blue">
<div class="panel-heading"><i class="fa fa-binoculars"></i> NAT - Order of Operation</div>
<div class="panel-body"><ul>
<li>if the packet arrives <b><red>on the inside interface</red></b>, the order is: <b><red>ROUTING (1st) --> NAT (local to global)</red></b></li>
<li>if the packet arrives <b><red>on the outside interface</red></b>, the order is: <b><red>NAT (global to local) --> ROUTING (2nd)</red></b></li>
</ul>
</div>
</div>
</div></div>
<p>Most of the time, NAT is used to hide or translate the source of the packet and you leave the destination unchanged, so you don't have to deal with the order of operations as long as routing is configured correctly. </p>
<p>But sometimes you may also want to translate the destination of the packet, as required in the scenario described in the quiz.
<strong>The NAT configuration was correct</strong>, but still <em>end-to-end connectivity between 192.168.11.1 and partner 192.168.44.4 was not achieved</em>. </p>
<p>Let's review the quiz:</p>
<ul>
<li>your company’s server <code>192.168.11.1</code> will be seen as <code>172.16.23.1</code> on the partner side</li>
<li>partner server <code>192.168.44.4</code> will be seen as <code>192.168.1.4</code> inside your company’s network</li>
</ul>
<p>As shown in the diagram below, packets received on the outside interface from partner 192.168.44.4 are correctly translated and routed (because translation occurs <em>before</em> routing).<br/>
But packets received on the inside interface from 192.168.11.1 are not translated: routing is performed first, and the destination 192.168.1.4 (to be translated to 192.168.44.4) has a routing entry on Fa0/0 (connected subnet), so the packet never heads towards the outside interface: </p>
<p><a href="/uploads/nat-order-of-operation-fail1.png" title="NAT - Order of Operation"><img alt="nat-order-of-operation--fail" src="/uploads/nat-order-of-operation-fail1.png" title="NAT - Order of Operation"/></a> </p>
<div class="row">
<pre class="col-md-9">R2#<blue>sh ip route 192.168.1.4</blue>
Routing entry for 192.168.1.0/24
Known via "connected", distance 0, metric 0 (connected, via interface)
Routing Descriptor Blocks:
* directly connected, <red>via FastEthernet0/0</red>
Route metric is 0, traffic share count is 1
[...]
R4#<blue>ping 172.16.23.1 source lo0</blue>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.23.1, timeout is 2 seconds:
Packet sent with a source address of 192.168.44.4
<red>.....
Success rate is 0 percent (0/5)</red></pre>
</div>
<p><br>
In order to solve the quiz, we need to adjust the routing so that the address 192.168.1.4 is routed towards the outside partner: </br></p>
<div class="row">
<pre>R2(config)#<purple>ip route 192.168.1.4 255.255.255.255 172.16.23.3</purple>
R2(config)#end
R2#<green>sh ip route 192.168.1.4</green>
Routing entry for 192.168.1.4/32
Known via "static", distance 1, metric 0
Routing Descriptor Blocks:
* 172.16.23.3
Route metric is 0, traffic share count is 1
[...]
R4#ping 172.16.23.1 source lo0
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 172.16.23.1, timeout is 2 seconds:
Packet sent with a source address of 192.168.44.4
<green>!!!!!
Success rate is 100 percent (5/5)</green>, round-trip min/avg/max = 48/79/100 ms</pre>
</div>
<p><a href="/uploads/nat-order-of-operation-success.png" title="NAT - Order of Operation"><img alt="nat-order-of-operation-success" src="/uploads/nat-order-of-operation-success.png" title="NAT - Order of Operation"/></a> </p>
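<p>The inside-to-outside path can be sketched in a few lines. This is a simplified model, not IOS behaviour in every detail: the helper names are hypothetical, each route is tagged directly with the side of its outgoing interface instead of resolving next hops, and the connected subnet <code>192.168.1.0/24</code> is assumed to sit on a non-outside interface.</p>

```python
from ipaddress import ip_address, ip_network

INSIDE, OUTSIDE = "inside", "outside"

# Translations from the quiz review, applied in the inside-to-outside direction.
SOURCE_NAT = {"192.168.11.1": "172.16.23.1"}        # company server as seen by the partner
DESTINATION_NAT = {"192.168.1.4": "192.168.44.4"}   # partner server as seen internally


def egress_side(routes, destination):
    """Longest-prefix match; returns the side of the outgoing interface."""
    matches = [(ip_network(prefix), side) for prefix, side in routes
               if ip_address(destination) in ip_network(prefix)]
    if not matches:
        return None
    return max(matches, key=lambda match: match[0].prefixlen)[1]


def forward_from_inside(routes, src, dst):
    side = egress_side(routes, dst)   # 1st: routing
    if side == OUTSIDE:               # 2nd: NAT, only when the packet leaves via the outside
        src = SOURCE_NAT.get(src, src)
        dst = DESTINATION_NAT.get(dst, dst)
    return side, src, dst


routes_before = [("192.168.1.0/24", INSIDE)]                     # connected subnet only
routes_after = routes_before + [("192.168.1.4/32", OUTSIDE)]     # static /32 towards 172.16.23.3

before = forward_from_inside(routes_before, "192.168.11.1", "192.168.1.4")
after = forward_from_inside(routes_after, "192.168.11.1", "192.168.1.4")
print(before)   # ('inside', '192.168.11.1', '192.168.1.4')
print(after)    # ('outside', '172.16.23.1', '192.168.44.4')
```

<p>The more specific <code>/32</code> route wins the lookup, the packet is steered to the outside interface, and only then do the translations apply.</p>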
<p>Of course, this situation is triggered by the fact that I used the <code>connected</code> subnets for NAT, but even when using different subnets, you need to know the order of operation to make it work. </p>
<p><em>Thank you for your comments and interest in the quiz!</em> </p>
<p><br/></p>Quiz #13 – Frame Relay self pinging2013-04-27T00:00:00+01:00Costitag:costiser.ro,2013-04-27:2013/04/27/quiz-13/<p><span class="dropcap">Y</span>ou have joined a new company that has a few offices connected over the Frame Relay cloud, as shown in the diagram below.<br/>
For monitoring purposes, you created a TCL script that pings all interfaces connected to the FR cloud (including itself). </p>
<p><br> </br></p>
<p><a href="/uploads/quiz-13.png" title="Quiz-13 Frame Relay"><img alt="quiz-13" src="/uploads/quiz-13.png" title="Quiz-13 Frame Relay"/></a> </p>
<p>But you noticed that each router cannot ping itself: </p>
<div class="row">
<div class="col-md-6">
<pre>R1#<purple>ping 192.168.1.1</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<red>.....</red>
<red>Success rate is 0 percent</red> (0/5)
R1#
R1#<purple>ping 192.168.1.2</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 1/22/72 ms
R1#
R1#sh frame map
Serial0/0 (up): ip 192.168.1.2 dlci 102(0x66,0x1860), dynamic,
broadcast,, status defined, active
R1#</pre>
</div>
<div class="col-md-6">
<pre>R2#<purple>ping 192.168.1.1</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 1/10/28 ms
R2#
R2#<purple>ping 192.168.1.2</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
<red>.....
Success rate is 0 percent</red> (0/5)
R2#
R2#sh frame map
Serial0/0 (up): ip 192.168.1.1 dlci 201(0xC9,0x3090), dynamic,
broadcast,, status defined, active
R2#</pre>
</div>
</div>
<p>After some troubleshooting, you notice that each router misses a DLCI mapping for its own IP address, so you perform the following configuration: </p>
<div class="row">
<pre class="col-md-10">R1#
interface Serial0/0
ip address 192.168.1.1 255.255.255.0
encapsulation frame-relay
<green>frame-relay map ip 192.168.1.1 102</green>
end
R1#sh frame map
<blue>Serial0/0 (up): ip 192.168.1.1 dlci 102(0x66,0x1860), static,
CISCO, status defined, active
Serial0/0 (up): ip 192.168.1.2 dlci 102(0x66,0x1860), dynamic,
broadcast,, status defined, active</blue>
R1#
R1#<purple>ping 192.168.1.1</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/40/116 ms
R1#
R1#<purple>ping 192.168.1.2</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/39/84 ms
R1#</pre>
</div>
<div class="row">
<pre class="col-md-10">R2#
interface Serial0/0
ip address 192.168.1.2 255.255.255.0
encapsulation frame-relay
<green>frame-relay map ip 192.168.1.2 201</green>
end
R2#sh frame map
<blue>Serial0/0 (up): ip 192.168.1.2 dlci 201(0xC9,0x3090), static,
CISCO, status defined, active
Serial0/0 (up): ip 192.168.1.1 dlci 201(0xC9,0x3090), dynamic,
broadcast,, status defined, active</blue>
R2#
R2#<purple>ping 192.168.1.1</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/38/80 ms
R2#
R2#<purple>ping 192.168.1.2</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 192.168.1.2, timeout is 2 seconds:
<green>!!!!!
Success rate is 100 percent</green> (5/5), round-trip min/avg/max = 4/36/148 ms
R2#</pre>
</div>
<p>All looks ok now: each router can ping the neighbor device over FR and also itself... </p>
<p><strong><em>Is there any problem with the end-result configuration?</em></strong> </p>
<p><em>Post your answer in the ‘Comments’ section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>OSPF on PE-CE Links and the Understanding the Down Bit2013-04-15T00:00:00+01:00Costitag:costiser.ro,2013-04-15:2013/04/15/ospf-on-pe-ce-links-and-the-understanding-the-don-bit/<p><span class="dropcap-bg">T</span>
his post represents the solution and explanation for <a href="/2013/03/17/quiz-10/index.html">quiz-10</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>This quiz has a detailed solution already presented in one of my very first posts, from 2011, back in the times when I was "experimenting" with the term of "blogging" ☺ For this reason, <strong><em>please have a look at <a href="/ospf-on-ce-pe-links-2.html">that post</a> to understand the theoretic part</em></strong> and then read this article to get the full picture. </p>
<p>In short, <red>the problem is caused by the fact that the PEs set the Down-bit (DN bit) in all Type-3 LSAs sent towards the respective CEs</red>. As explained in the above-mentioned post, this is used to prevent a routing loop in situations when that LSA reaches another PE (for example, if there is a backdoor between two CEs, then that LSA will reach other PEs via the backdoor link, thus creating a routing loop). </p>
<p>The consequence of the Down bit is that <red>not only the PEs but <strong>any router running VRFs</strong> (with or without MPLS = VRF-lite or not) <u>will drop</u> the LSAs having the DN-bit set</red> and in our quiz, CE2(R5) is using VRF-lite to keep separate routing tables for internal network vs. partner routes.<br/>
<em>Note that the LSA is not actually dropped, it exists in the OSPF database (as shown in the quiz), but it is <strong>not</strong> considered for SPF calculations!</em> </p>
<p><a href="/uploads/lsa-with-dn-bit-set11.jpg" title="LSA with DN bit set"><img alt="LSA with DN bit set" src="/uploads/lsa-with-dn-bit-set11.jpg" title="LSA with DN bit set"/></a> </p>
<p><strike>It is interesting to note that Cisco does not fully respect the RFC standards</strike> [UPDATE: RFC4577 actually talks about both methods: DN bit and route tagging, though it considers tagging as an "old implementation". Note that RFC4576 only talks about DN bit]. <a href="http://www.ietf.org/rfc/rfc4577.txt">RFC 4577</a> (OSPF as the Provider/Customer Edge Protocol for BGP/MPLS IP VPNs) and <a href="http://www.ietf.org/rfc/rfc4576.txt">RFC 4576</a> (Using a LSA Options Bit to Prevent Looping in BGP/MPLS IP VPNs) say:<br/>
<green><em>(quote) When a type 3, 5, or 7 LSA is sent from a PE to a CE, the DN bit MUST be set. The DN bit MUST be clear in all other LSA types.(end-of-quote)</em></green><br/>
but Cisco sets the DN-bit <u>only</u> in the Type-3 LSAs (and it is using route tagging for External Type-5 LSA as loop prevention mechanism). </p>
<p><br>
<em>How does the PE decide what type of LSA to generate when an OSPF route is received over the MPLS cloud (the superbackbone) ?</em> </br></p>
<p>This is based on the attributes that travel along with the route within the VPNv4 prefix as <strong><em><blue>BGP extended communities</blue></em></strong>.<br/>
<br>Let's see: Office #1 <code>route 192.168.1.254/32</code> (lo0 on CE1/<strong>R1</strong>) reaches PE1 (<strong>R2</strong>) which sends it over the MPLS/BGP cloud as a VPNv4 and reaches PE2 (<strong>R4</strong>). </br></p>
<p><a href="/uploads/quiz-10.png" title="Quiz-10"><img alt="Quiz-10" src="/uploads/quiz-10.png" title="Quiz-10"/></a> </p>
<p>PE2/<strong>R4</strong> selects it as BGP best route (according to BGP rules) and then it redistributes it into Area 2 (towards R5/CE2), but before doing that, it needs to decide which type of LSA to generate for it. Under <em>normal BGP-to-OSPF redistribution</em> conditions, this route should be sent as a <em>Type-5 External</em>, but this is not true from customer's perspective - this route is NOT an external-AS one.
Companies, which choose MPLS VPN services to interconnect their offices, expect the routes from other offices to be seen as internal, not
external. <br/>
For this to happen, the second PE (<strong>R4</strong>) compares the domain-id contained in the VPNv4 route with the domain-id of the OSPF running with CE2. By default, in Cisco implementations for IPv4 (OSPFv2), the domain-id is set to be the same as the process-id. <em>(Note that for OSPFv3, the default value is NULL).</em> </p>
<p>The domain-id can be configured to a non-default value with the following command under the OSPF process: </p>
<div class="row">
<pre class="col-md-8">R4(config)#router ospf 100 vrf CUST_A
R4(config-router)#<green>domain-id ?</green>
A.B.C.D OSPF domain ID in IP address format
Null Null Domain-ID
type OSPF domain ID type in Hex format
</pre>
</div>
<p>If the <strong>domain-id</strong> contained in the VPNv4 route <strong>is different</strong> from the one configured/existing in the local OSPF process on PE2, then <blue>the OSPF route is considered external (O E2)</blue> and it is injected as a <strong>Type-5 LSA</strong>. </p>
<p>In our quiz, as there's no special configuration for the domain-id, both PEs (R2 and R4) use the <u>process-id (100 on both routers)</u>. Here is the output from PE2 (<strong>R4</strong>):</p>
<p><a href="/uploads/vpnv4-route.png" title="VPNV4 Route"><img alt="vpnv4-route" src="/uploads/vpnv4-route.png" title="VPNV4 Route"/></a> </p>
<p>As a result, <purple>the route is considered internal (inter-area route, O IA)</purple>, therefore injected as a <b>Type-3 LSA</b> with the Down-bit (DN-bit) set as per loop prevention requirements: </p>
<div class="row">
<pre class="col-md-8">R5#<purple>sh ip ospf data summary 192.168.1.254</purple>
OSPF Router with ID (192.168.2.254) (Process ID 100)
Summary Net Link States (Area 2)
LS age: 1119
Options: (No TOS-capability, DC, <red>Downward</red>)
LS Type: Summary Links(Network)
Link State ID: 192.168.1.254 (summary Network Number)
Advertising Router: 192.168.2.4
LS Seq Number: 80000004
Checksum: 0x2BB3
Length: 28
Network Mask: /32
TOS: 0 Metric: 2</pre>
</div>
<p>As I mentioned above, these LSAs with DN-bit set are <em><strong>not</strong> considered for SPF calculations</em> by any OSPF router running VRFs => that summary LSA does not appear in the RIB on CE2/R5 router!</p>
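<p>The decision described in this section can be summarised in a short sketch. It assumes the route was an internal OSPF route at the originating site (as in the quiz) and follows the Cisco behaviour described above; the function names and the dictionary layout are hypothetical.</p>

```python
def lsa_for_vpnv4_route(route_domain_id, local_domain_id):
    """LSA generated by the receiving PE for an OSPF route learned over MP-BGP."""
    if route_domain_id == local_domain_id:
        # Same domain: the route stays internal and carries the DN-bit.
        return {"lsa_type": 3, "route_code": "O IA", "dn_bit": True}
    # Different domain: external route, protected by a route tag instead of the DN-bit.
    return {"lsa_type": 5, "route_code": "O E2", "dn_bit": False}


def used_in_spf(lsa, receiver_runs_vrf):
    """A router running VRFs ignores LSAs that have the DN-bit set."""
    return not (lsa["dn_bit"] and receiver_runs_vrf)


# Quiz situation: the default domain-id equals the process-id, 100 on both PEs.
quiz_lsa = lsa_for_vpnv4_route(100, 100)
print(quiz_lsa["route_code"], used_in_spf(quiz_lsa, receiver_runs_vrf=True))

# With different domain-ids the route becomes external and is accepted.
other_lsa = lsa_for_vpnv4_route(100, 200)
print(other_lsa["route_code"], used_in_spf(other_lsa, receiver_runs_vrf=True))
```

<p>The same Type-3 LSA is perfectly usable on a CE that does not run VRFs; it is only the combination of DN-bit and VRF that keeps the route out of the RIB on R5.</p>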
<h3 id="solutions">Solutions</h3>
<p>Now, let's see what solutions exist for this quiz: </p>
<h4 id="1-using-capability-vrf-lite-command">1. Using <code>capability vrf-lite</code> Command</h4>
<p>Cisco provides an "elegant" solution for exactly this type of problem: the command <b><code>capability vrf-lite</code></b>, configured on the router running VRF-lite (which is <strong><u>not</u></strong> a PE), makes it consider that LSA (with the DN-bit set) for the SPF calculations. </p>
<p><a href="/uploads/capability-vrf-lite.png" title="Capability VRF-Lite"><img alt='capability-vrf-lite "Capability VRF-Lite"' src="/uploads/capability-vrf-lite.png" title="Capability VRF-Lite"/></a><br/>
<em>Note that the OSPF adjacency gets reset when this command is configured!</em> </p>
<h4 id="2-setting-different-domain-id">2. Setting Different Domain ID</h4>
<p>Another solution is to manually configure different OSPF domain-ids on the PEs (using the <code>domain-id</code> command) in order to force the OSPF routes learned over MP-BGP to be injected as <b>external Type-5 LSAs</b>.<br/>
The Cisco implementation does <strong>not</strong> set the DN-bit in Type-5 LSAs (and for this reason, this command represents a candidate solution for our quiz): </p>
<p>Let's set a different domain-id on PE2/R4 and see the result on CE2/R5: </p>
<p><a href="/uploads/different-domain-id.png" title="OSPF - Different Domain ID"><img alt="different-domain-id" src="/uploads/different-domain-id.png" title="OSPF - Different Domain ID"/></a><br/>
<em>Notice the routes learned as external "O E2"</em> </p>
<div class="row">
<pre class="col-md-10">R5#<purple>sh ip osp data ex 192.168.1.254</purple>
OSPF Router with ID (192.168.2.254) (Process ID 100)
Type-5 AS External Link States
Routing Bit Set on this LSA
LS age: 302
Options: (No TOS-capability, DC)
LS Type: AS External Link
Link State ID: 192.168.1.254 (External Network Number )
Advertising Router: 192.168.2.4
LS Seq Number: 80000001
Checksum: 0x9C6B
Length: 36
Network Mask: /32
Metric Type: 2 (Larger than any link state path)
TOS: 0
Metric: 2
Forward Address: 0.0.0.0
<green>External Route Tag: 3489660929</green>
</pre>
</div>
<p><em>Notice that there's <strong>no DN-bit set on the Type-5 LSA (normally indicated by the <u>Downward</u> keyword)</strong> (contrary to what the RFC says!), but there is a route tag of 3489660929 = 11010000.00000000.<red>00000000.00000001</red> (in binary), which follows RFC 4577 (section 4.2.5.2):
<strong>"1 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 <red><em>AS number of the VPN Backbone</em></red>" (in our case <red>1</red>)</strong></em> </p>
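<p>For reference, the change behind this solution is small. The lines below are only a sketch: the VRF name <code>CUSTOMER</code> and the domain-id value <code>0.0.0.4</code> are assumptions made for illustration, and any value that differs from the domain-id in use on PE1/R2 should have the same effect:</p>
<div class="row">
<pre class="col-md-8">R4#
!
router ospf 100 vrf CUSTOMER
 ! hypothetical value, it only has to differ from the domain-id of the other PE
 domain-id 0.0.0.4
!</pre>
</div>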
<h5 id="21-setting-different-process-id">2.1 Setting Different Process ID</h5>
<p>Since, by default, the OSPF domain-id for OSPFv2 (in the Cisco implementation) is set equal to the process-id number, configuring different process-ids on the PEs (under the relevant OSPF VRF instances) puts us in exactly the same situation as solution 2 above. </p>
<h4 id="3-using-a-sham-link_1">3. Using a Sham-Link (??)</h4>
<p>Configuring a sham-link between PEs was indicated by a lot of readers as a possible solution to this quiz.<br/>
Unfortunately, <strong><em>it does <u>not</u> represent a solution right away</em></strong> because of 2 things that appear in the quiz: </p>
<ul>
<li>the "<red>network 0.0.0.0 255.255.255.255 area X</red>" that is configured on both PEs -- remember: Sham-Links Endpoint Addresses MUST NOT be advertised by OSPF</li>
<li>the bigger impediment is the <red>difference in the Area numbers</red> on the PE-CE Areas of the offices (<strong>Area 1</strong> on R1-R2 vs <strong>Area 2</strong> on R4-R5)<br/>
Of course, nothing stops you from choosing one of the area numbers (let's say 1) and using it in the command <strong><code>area 1 sham-link x.x.x.x y.y.y.y</code></strong> on both PEs => the sham-link and its adjacency will come up.
However, if you continue to use a different area towards CE2/R5, then PE2/R4 will still send those routes as summary Type-3 LSAs and, of course, the DN-bit will still be set on them! </li>
</ul>
<p>To make this solution work, we need to solve the two "extra" issues above, and here is the final configuration: </p>
<p><a href="/uploads/sham-link-solution.png" title="Sham-link solution"><img alt="sham-link-solution" src="/uploads/sham-link-solution.png" title="Sham-link solution"/></a> </p>
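<p>In text form, the two fixes could be sketched roughly as follows for PE2/R4 (PE1/R2 would mirror it). Everything that does not appear in the quiz is an assumption made for illustration: the VRF name <code>CUSTOMER</code>, the loopback number and the <code>10.255.255.x</code> endpoint addresses. The backbone AS 1 and the 192.168.2.4/31 PE-CE subnet come from the outputs in this post and in the quiz:</p>
<div class="row">
<pre class="col-md-10">R4#
!
! sham-link endpoint: a /32 loopback inside the VRF (hypothetical addresses)
interface Loopback100
 ip vrf forwarding CUSTOMER
 ip address 10.255.255.4 255.255.255.255
!
router ospf 100 vrf CUSTOMER
 ! a specific network statement instead of "network 0.0.0.0 255.255.255.255",
 ! so that the sham-link endpoint is not advertised by OSPF
 network 192.168.2.4 0.0.0.1 area 1
 area 1 sham-link 10.255.255.4 10.255.255.2
!
router bgp 1
 address-family ipv4 vrf CUSTOMER
  ! the endpoint is advertised to the remote PE via BGP only
  network 10.255.255.4 mask 255.255.255.255
!</pre>
</div>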
<p>As a result, due to the sham-link, the customer CE2/R5 (now configured in new Area 1) will see the OSPF routes redistributed by PE2/R4 as intra-area, which solves the quiz (since <green>there is <strong>no DN-bit</strong> in type-1 LSAs</green>): </p>
<div class="row">
<pre class="col-md-10">R5#sh ip route vrf MY_NETWORK ospf
192.168.1.0/24 is variably subnetted, 2 subnets, 2 masks
O 192.168.1.0/31 [110/3] via 192.168.2.4, 00:40:44, FastEthernet0/0
O 192.168.1.254/32 [110/4] via 192.168.2.4, 00:40:44, FastEthernet0/0
</pre>
</div>
<p><br>
<em>Thank you for your comments and interest in the quiz!</em> </br></p>
<p><br/></p>Quiz #12 – OSPF Improper Path Selection2013-04-04T00:00:00+01:00Costitag:costiser.ro,2013-04-04:2013/04/04/quiz-12/<p><span class="dropcap">C</span>ompany ABC closes a deal with a Partner Company that requires redundant network paths between the two networks.
This requirement is met by enabling 2 connections: <blue><strong>one FastEthernet</strong> between R1 and R5</blue> and <red><strong>one Serial</strong> link between R2 and R6</red>. </p>
<p><strong>Static routes</strong> are configured <strong>on R1 and R2</strong> for the 2 subnets in partner's network (<code>172.16.10.0/24</code> and <code>172.16.11.0/24</code>) and they are <strong>redistributed</strong> into the OSPF Area 0: </p>
<p><br>
<a href="/uploads/quiz-12.png" title="Quiz 12 - OSPF improper Path Selection"><img alt="quiz-12" src="/uploads/quiz-12.png" title="Quiz 12 - OSPF improper Path Selection"/></a> </br></p>
<p>Unfortunately, the network administrator is not happy with the OSPF best path performed on the company router <strong>R3</strong> for those external routes.<br/>
As displayed above, <em><strong>R3</strong> prefers the <red>path via R4 -> then R2 -> then Serial link</red> to reach the Partner's network</em>. </p>
<div class="row">
<pre class="col-md-10">R3#<purple>traceroute 172.16.10.5</purple>
...
<red>1 10.0.34.4 80 msec 48 msec 28 msec
2 10.0.24.2 64 msec 40 msec 68 msec
3 192.168.2.6 32 msec 56 msec 64 msec
4 172.16.10.5 124 msec * 148 msec</red>
R3#
R3#<purple>sh ip route 172.16.10.0</purple>
Routing entry for 172.16.10.0/24
Known via "ospf 1", distance 110, metric 20, type extern 2, forward metric 2
Last update from 10.0.34.4 on FastEthernet0/1, 00:06:56 ago
Routing Descriptor Blocks:
* 10.0.34.4, from 192.168.2.2, 00:06:56 ago, <red>via FastEthernet0/1</red>
Route metric is 20, traffic share count is 1
R3#</pre>
</div>
<p><strong><em>Why is that? Is there a problem with OSPF, or is the selected path really the shortest/correct one?</em></strong> </p>
<p><em><strong>Notes</strong></em>:<br/>
- there was <strong><red>no manipulation</red></strong> of OSPF costs !!<br/>
- don't worry about return traffic: static routing for network 10.0.0.0/8 was configured on Partner devices ! </p>
<p><em>Post your solution in the 'Comments' section below and subscribe to this blog to get the solution and more interesting quizzes.</em> </p>
<p><br/></p>Protecting the BGP Session with MD5 Authentication2013-03-31T00:00:00+00:00Costitag:costiser.ro,2013-03-31:2013/03/31/bgp-md5-authentication/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/02/20/quiz-9/index.html">quiz-9</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<p>The quiz brings up a well-known problem of BGP peering through a firewall (in this case, a Cisco ASA). The problem is caused by two things: BGP uses a TCP option to perform authentication, and firewalls do not "like" IP or TCP OPTIONS. </p>
<p><strong><em><red>IP OPTIONS</red></em></strong> are considered a security risk as they have the potential of leaking information about the internal network setup and, by default, firewalls drop all packets with IP OPTIONS. </p>
<p>As for the <strong><em><red>TCP OPTIONS</red></em></strong>, they can receive different treatment (depending on the type of firewall and its configuration):</p>
<ul>
<li>forwarded unchanged, for example: <em>selective ACK, window scale, MSS</em>, etc</li>
<li>modified and then forwarded, for example in case of <em>MSS clamping</em></li>
<li>stripped off (actually, replaced with the NOP TCP Option, which acts more like padding), for example: <em>BGP MD5 TCP Option</em></li>
<li>dropped (if they are not known and/or based on firewall configuration)</li>
</ul>
<p>BGP controls most of the Internet traffic so it is very important to keep it secure. Besides the attacks against the routing protocol itself, BGP shares the same weaknesses as any TCP-based application and is vulnerable to:</p>
<ul>
<li><em><blue>attacks against confidentiality</blue></em> - capture the TCP communication between two devices and learn the information exchanged ("eavesdropping")</li>
<li><em><blue>attacks against integrity</blue></em> - insert forged BGP messages into the BGP peering of two devices (<em>"man in the middle"</em>)</li>
<li><em><blue>attacks against the TCP session</blue></em>:<ul>
<li>reset the TCP session by falsely terminating the BGP established connection (send a packet with a RST bit set or a packet with SYN bit set, while spoofing all the other fields of the already established connection)</li>
<li>turn up a TCP session with an unauthorized party</li>
<li>re-routing or looping the connectivity between the BGP peers (in case of multihop BGP)</li>
</ul>
</li>
<li><em><blue>denial of service (DOS) attacks</blue></em>:<ul>
<li>send a large number of SYN packets to consume device's memory - SYN flooding</li>
<li>saturate the link so that the BGP will time out</li>
</ul>
</li>
</ul>
<p>There are more methods to protect against these attacks, but today we will only cover <strong><purple>BGP Authentication</purple></strong>. </p>
<div class="row"><div class="col-md-10">
<div class="panel panel-red">
<div class="panel-heading"><i class="fa fa-binoculars"></i> Remember</div>
<div class="panel-body">
<a href="http://www.ietf.org/rfc/rfc2385.txt">RFC 2385</a> describes a TCP extension that enhances BGP security by defining a new <blue>TCP OPTION</blue> (<red><b>option-kind 19</b></red>) that carries an MD5 digest (this means that <i>BGP MD5 is not a function of BGP but an extension of TCP</i>)!
</div>
</div>
</div></div>
<p>This BGP MD5 <strong><em><green>will protect against</green></em></strong>:</p>
<ul>
<li>session hijacking and replay attacks (see above integrity attacks)</li>
<li>unauthorized BGP session turn-ups</li>
</ul>
<p>but it <strong><em><red>does not protect against</red></em></strong>:</p>
<ul>
<li>confidentiality - there's no encryption of the packet</li>
<li>denial of service - as the router's CPU needs to perform MD5 hashing against all packets sent by an attacker</li>
</ul>
<p>According to the RFC, the MD5 digest is computed over several fields from both the IP and TCP headers, as shown in the following diagram: </p>
<p><a href="/uploads/tcp-options-calculating-bgp-md5-digest.png" title="TCP Options - Calculating BGP MD5 Digest"><img alt="tcp-options-calculating-bgp-md5-digest" src="/uploads/tcp-options-calculating-bgp-md5-digest.png" title="TCP Options - Calculating BGP MD5 Digest"/></a> </p>
<ul>
<li>the <red>TCP Pseudo-Header</red> (includes: IP source, IP destination, zero-padded protocol number and TCP segment length)<ul>
<li><em>The pseudo-header is so called because it refers to some fields from the Layer 3/IP header</em></li>
</ul>
</li>
<li>the <red>TCP Header</red> excluding options (includes: source port, destination port, sequence numbers, etc)</li>
<li>the TCP segment <red>data</red> (if any)</li>
<li>the neighbor configured <red>password</red></li>
</ul>
<p>Cisco PIX/ASA version 7.x and later have some default protections that impact the BGP MD5 authentication. <em><strong>By default, Cisco ASA will</strong></em>:<br/>
- <red>re-write any TCP OPTION 19 (BGP MD5) with NOP (No Operation)</red> that is used like padding<br/>
- <red>randomize the TCP sequence numbers</red> (by offsetting them with a random number) </p>
<p>The BGP MD5 digest needs to match on each side and for this to happen, <em><u>the information used for hashing needs to be identical</u></em>, such as: <em>IP source and destination, source and destination port, sequence numbers, TCP segment data, password</em>, etc... </p>
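<p>On the routers themselves, the shared password is the only MD5-related setting. A minimal sketch, with a hypothetical AS number, neighbor address and password (none of them taken from the quiz):</p>
<div class="row">
<pre class="col-md-9">ROUTER#
!
router bgp 65001
 neighbor 192.0.2.2 remote-as 65002
 ! enables the TCP MD5 option (RFC 2385) for this session;
 ! the same password must be configured on the peer
 neighbor 192.0.2.2 password MY_SECRET
!</pre>
</div>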
<p><em><strong>Solution</strong></em><br/>
For an authenticated BGP session to get established, it is not enough to allow TCP port 179 through the firewall; you also need to disable the above-mentioned protections:</p>
<ul>
<li><strong><em>disable TCP MD5 option rewriting</em></strong></li>
<li><strong><em>disable TCP sequence number randomization</em></strong></li>
</ul>
<p>This also means that it's <strong><em>not possible to perform NAT between BGP peers that use authentication</em></strong>! </p>
<p>In the end, let's see what the configuration looks like on a Cisco ASA to allow the routers to have a successful BGP authenticated session: </p>
<div class="row">
<pre class="col-md-9">FIREWALL#
!
access-list ACL_BGP extended permit tcp any any eq bgp
access-list ACL_BGP extended permit tcp any eq bgp any
!
<green>tcp-map TCP_MAP_ALLOW_OPTIONS
tcp-options range 19 19 allow</green>
!
class-map BGP_MD5
match access-list ACL_BGP
!
policy-map global_policy
class BGP_MD5
<green>set connection random-sequence-number disable
set connection advanced-options TCP_MAP_ALLOW_OPTIONS</green>
!
service-policy global_policy global
!</pre>
</div>
<p><em>Thank you for your comments in the quiz!</em> </p>
<p><br/></p>Quiz #11 – NAT both Source and Destination2013-03-27T00:00:00+00:00Costitag:costiser.ro,2013-03-27:2013/03/27/quiz-11/<p><span class="dropcap">Y</span>our company (Company ABC) makes a new contract with a Partner for a new research project. This requires connectivity between the Research Departments of the two companies.<br/>
Unfortunately, both companies use the same private addressing space, <code>192.168.0.0/16</code>, for their internal networks.<br/>
In order to perform the connectivity, Network Address Translation needs to be configured, but your partner company does not have a dedicated network team (they outsource the network changes to external parties, when needed) so in order to make this work as soon as possible, the NAT is performed on Company ABC's border router (R2) as shown below: </p>
<p><a href="/uploads/quiz-11.png" title="Quiz 11 - NAT"><img alt="quiz-11" src="/uploads/quiz-11.png" title="Quiz 11 - NAT"/></a> </p>
<p>The target is: </p>
<ul>
<li>your company's server <strong>192.168.11.1</strong> will be seen as <strong>172.16.23.1</strong> on the partner side</li>
<li>partner server <strong>192.168.44.4</strong> will be seen as <strong>192.168.1.4</strong> inside your company's network</li>
</ul>
<p>Another constraint that exists within your network is the fact that your border router (R2) is already using the old-style NAT (as opposed to the new NVI, NAT Virtual Interface, feature) to hide/translate your internal network behind the serial0/0 interface towards the Internet (R5=ISP), so for the moment, you have to use the legacy methods. </p>
<p>Unfortunately, there's something wrong with the new NAT because the connectivity between R1 and R4 is not working.<br/>
Here is the troubleshooting that you perform (<em><red>when pinging from 192.168.44.4 towards 192.168.11.1</red></em>): </p>
<div class="row">
<pre>R2#<purple>deb ip nat detailed</purple>
IP NAT detailed debugging is on
R2#
*Mar 1 01:26:24.335: NAT*: o: icmp (192.168.44.4, 3) -> (172.16.23.1, 3) [16]
*Mar 1 01:26:24.335: NAT*: s=192.168.44.4->192.168.1.4, d=172.16.23.1 [16]
*Mar 1 01:26:24.335: NAT*: s=192.168.1.4, d=172.16.23.1->192.168.11.1 [16]
R2#
R2#<purple>sh ip nat translations</purple>
Pro Inside global Inside local Outside local Outside global
--- --- --- 192.168.1.4 192.168.44.4
icmp 172.16.23.1:3 192.168.11.1:3 192.168.1.4:3 192.168.44.4:3
--- 172.16.23.1 192.168.11.1 --- ---
R2#un all
</pre>
</div>
<p><strong><em>What is the problem?</em></strong> </p>
<p><em>Post your answer in the 'Comments' section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>MPLS LDP-IGP Synchronization2013-03-23T00:00:00+00:00Costitag:costiser.ro,2013-03-23:2013/03/23/mpls-ldp-igp-synchronization/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/02/13/quiz-8/index.html">quiz-8</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<h3 id="quiz-review-solution">Quiz Review & Solution</h3>
<p>In this scenario, the MPLS Core team brings up a new link between PE-1 and P-4 routers. All links have the same OSPF cost, so, as observed in the diagram, the new link will be chosen as the best path between the PEs:</p>
<p><a href="/uploads/quiz-8.png" title="Quiz-8 LDP IGP Synchronization"><img alt="ldp-igp-synchronization" src="/uploads/quiz-8.png" title="Quiz-8 LDP IGP Synchronization"/></a> </p>
<p>The problem hidden in this quiz is that the network engineer <em>forgot</em> to enable LDP on the new link. This mistake causes connectivity outage for the customers using MPLS VPN services because <red> the OSPF (the IGP inside the MPLS core) will choose this new link as the best path and the VPN packets will be sent untagged over it (since LDP is not <em>yet</em> configured)</red>. As a result, the core of the MPLS (P routers) will drop them (as they don't have information about VPNs). </p>
<p>I agree with your comments that it is very unlikely for a core engineer to forget this, but that quiz represents an introduction for this current article, and anyway, you should never exclude any kind of mistakes when implementing network changes (in the end, even the core engineer is human and prone to mistakes). </p>
<p>The <em><strong>solution</strong></em> to this quiz is to enable MPLS on the new interface (either using <blue><code>mpls ip</code></blue> at the interface level or via other methods such as <blue><code>mpls ldp autoconfig</code></blue> under OSPF process). </p>
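<p>As a sketch of the interface-level fix, using the interface names of the new link as they appear in the test cases of this post:</p>
<div class="row">
<pre class="col-md-8">PE-1#
!
interface FastEthernet0/1
 mpls ip
!
P-4#
!
interface FastEthernet0/0
 mpls ip
!</pre>
</div>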
<p>Contrary to what some people commented, the suggestion to enable LDP-IGP synchronization <em>could be a solution</em> by itself, alone, <em><u>only</u></em> in certain scenarios! To cover all situations, LDP-IGP sync (<blue><code>mpls ldp sync</code></blue>) should be accompanied by LDP Autoconfiguration (<blue><code>mpls ldp autoconfig</code></blue>). I will present a few test cases on this topic below.</p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">REVIEW</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
<green>MPLS LDP-IGP Synchronization</green> is a feature enabled under the IGP process (OSPF or IS-IS) that protects against packet loss when the <u>IGP peering is established <b>before</b> LDP label exchange</u> is completed (because, in these cases, packets will be sent untagged on that link, which will break MPLS VPN connectivity).
</td></tr></table>
<h3 id="test-cases">Test cases</h3>
<h4 id="ldp-missing-on-both-ends-of-the-link">LDP missing on both ends of the link</h4>
<div class="row">
<div class="col-md-6">
<pre>PE-1#
<blue>router ospf 1
mpls ldp sync</blue>
network 0.0.0.0 255.255.255.255 area 0
!
interface FastEthernet0/1
ip address 10.10.6.1 255.255.255.252
!</pre>
</div>
<div class="col-md-6">
<pre>P-4#
<blue>router ospf 1
mpls ldp sync</blue>
network 0.0.0.0 255.255.255.255 area 0
!
interface FastEthernet0/0
ip address 10.10.6.2 255.255.255.252
!</pre>
</div>
</div>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
The LDP-IGP synchronization feature <red><b>does not protect</b></red> against packet loss in this situation (and it does not represent a solution for our quiz !).
</div>
</div>
<p>Although sync is enabled, its status is <code>not required</code> so the protection is not triggered, as seen below: </p>
<div class="row">
<pre class="col-md-10">PE-1#<purple>sh ip osp nei</purple>
Neighbor ID Pri State Dead Time Address Interface
<red>4.4.4.4</red> 1 FULL/DR 00:00:35 10.10.6.2 FastEthernet0/1
2.2.2.2 1 FULL/DR 00:00:38 10.10.1.2 FastEthernet0/0
PE-1#<purple>sh ip osp mpls ldp int fa0/1</purple>
FastEthernet0/1
Process ID 1, Area 0
LDP is not configured through LDP autoconfig
<red>LDP-IGP Synchronization : <u>Not required</u></red>
Holddown timer is disabled
Interface is up
PE-1#
PE-1#<purple>sh mpls ldp igp sync</purple>
FastEthernet0/1:
<red>LDP not configured</red>; LDP-IGP Synchronization enabled.
Sync status: <red>sync not achieved</red>; peer reachable.
Sync delay time: 0 seconds (0 seconds left)
IGP holddown time: infinite.
IGP enabled: OSPF 1
</pre>
</div>
<div class="row">
<pre class="col-md-10">PE-1#sh ip cef 3.3.3.3
3.3.3.3/32, version 297, epoch 0, cached adjacency 10.10.6.2
...
tag rewrite with <red>Fa0/1, 10.10.6.2, tags imposed: {}</red>
CE-2#ping 192.168.1.1 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.1.1
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!<red>...................</red>
</pre>
</div>
<p>As shown, <em>the protection is not triggered</em> and from the moment when the OSPF adjacency is UP and the new best path towards PE-3 (3.3.3.3) is via the new link that has no LDP on it, the end-to-end customer connectivity is lost!</p>
<h4 id="ldp-missing-on-only-one-end-of-the-link">LDP missing on only one end of the link</h4>
<div class="row">
<div class="col-md-6">
<pre>PE-1#
<blue>router ospf 1
mpls ldp sync</blue>
network 0.0.0.0 255.255.255.255 area 0
!
interface FastEthernet0/1
ip address 10.10.6.1 255.255.255.252
<blue>mpls ip</blue>
!</pre>
</div>
<div class="col-md-6">
<pre>P-4#
<blue>router ospf 1
mpls ldp sync</blue>
network 0.0.0.0 255.255.255.255 area 0
!
interface FastEthernet0/0
ip address 10.10.6.2 255.255.255.252
!</pre>
</div>
</div>
<p>The LDP-IGP synchronization <blue><em>does protect against the packet loss in this situation</em></blue>. It is interesting to note that the <red><em>OSPF adjacency does not come up</em></red> (on the side that has MPLS enabled) because LDP-IGP synchronization will keep the interface down for the OSPF process: </p>
<div class="row">
<pre class="col-md-10">PE-1#<purple>sh ip ospf int br</purple>
Interface PID Area IP Address/Mask Cost State Nbrs F/C
Fa0/1 1 0 10.10.6.1/30 1 <red>DOWN 0/0</red>
Lo0 1 0 1.1.1.1/32 1 LOOP 0/0
Fa0/0 1 0 10.10.1.1/24 1 BDR 1/1
PE-1#
PE-1#sh ip ospf mpls ldp interface fa0/1
FastEthernet0/1
Process ID 1, Area 0
LDP is not configured through LDP autoconfig
LDP-IGP Synchronization : <green>Required</green>
Holddown timer is not configured
<green><u>Interface is down and pending LDP</u></green>
PE-1#
PE-1#<purple>sh mpls ldp igp sync</purple>
FastEthernet0/1:
<green>LDP configured; LDP-IGP Synchronization enabled.
Sync status: sync not achieved; peer reachable.</green>
Sync delay time: 0 seconds (0 seconds left)
IGP holddown time: infinite.
IGP enabled: OSPF 1</pre>
<pre class="col-md-10">PE-1#sh ip cef 3.3.3.3
3.3.3.3/32, version 279, epoch 0, cached adjacency 10.10.1.2
...
tag rewrite with <green>Fa0/0, 10.10.1.2, tags imposed: {18}</green>
CE-2#ping 192.168.1.1 repeat 100
Type escape sequence to abort.
Sending 100, 100-byte ICMP Echos to 192.168.1.1
<green>!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!</green>
</pre>
</div>
<p>Note that the LDP-IGP <strong><em>sync is enabled & required</em></strong> on the side where MPLS is configured, and <em><strong>sync is not achieved</strong></em> <u>which triggers the protection</u> ! </p>
<h4 id="ldp-initially-enabled-at-both-ends-and-removed-later">LDP initially enabled at both ends and removed later</h4>
<div class="row">
<div class="col-md-6">
<pre>PE-1#
<blue>router ospf 1
mpls ldp sync</blue>
network 0.0.0.0 255.255.255.255 area 0
!
interface FastEthernet0/1
ip address 10.10.6.1 255.255.255.252
<blue>mpls ip</blue>
!</pre>
</div>
<div class="col-md-6">
<pre>P-4#
<blue>router ospf 1
mpls ldp sync</blue>
network 0.0.0.0 255.255.255.255 area 0
!
interface FastEthernet0/0
ip address 10.10.6.2 255.255.255.252
<blue>mpls ip</blue>
!</pre>
</div>
</div>
<p>If the <blue><code>mpls ip</code></blue> command is later removed from either end of the link, then the connectivity is lost as the router will change its status to <strong><em>"sync is not required"</em></strong> and remove the protection. </p>
<p>Actually, it is always recommended to use the <em>"LDP Autoconfiguration" feature</em> under the OSPF process by configuring the command <strong><code>(config-router)#<blue>mpls ldp autoconfig</blue></code></strong>, which will enable LDP on every interface associated with that IGP - OSPF in this case - <em>and will not allow manual removal with "no mpls ip" from individual interfaces!</em></p>
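<p>Combined with the synchronization feature from the previous test cases, the OSPF process on both routers would then look roughly like this (a sketch that reuses the process number and network statement from the configurations above):</p>
<div class="row">
<pre class="col-md-8">router ospf 1
 ! protect links whose LDP session is not up
 mpls ldp sync
 ! enable LDP on every interface that runs this OSPF process
 mpls ldp autoconfig
 network 0.0.0.0 255.255.255.255 area 0
!</pre>
</div>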
<h4 id="ldp-session-initially-ok-but-going-down-at-a-later-time">LDP session initially ok but going down at a later time</h4>
<p>Let's see how the LDP-IGP Synchronization helps in situations when, for whatever reason, the LDP session goes down on a link that is part of the IGP best path between PEs. To simulate this, I'll use an ACL to block the LDP session: </p>
<div class="row">
<pre class="col-md-8">ip access-list extended DENY_LDP
deny udp any eq 646 any log
deny tcp any any eq 646 log
deny tcp any eq 646 any log
deny udp any any eq 646 log
permit ip any any
!
PE-1(config)#int fa0/1
PE-1(config-if)#<red>ip access-group DENY_LDP in</red>
!</pre>
</div>
<p>After this, the following will happen: </p>
<ul>
<li>LDP session between PE-1 (1.1.1.1) and P-4 (4.4.4.4) will go down</li>
<li>OSPF peering between them will continue to stay up</li>
<li>LDP-IGP synchronization will be activated and will make <strong><em>OSPF announce the max-metric</em></strong> on that link</li>
<li>OSPF will reconverge using the other link where LDP is UP (fa0/0 on PE-1) and VPN connectivity will <strong>not</strong> be lost</li>
</ul>
<div class="row">
<pre>PE-1(config-if)#
*Mar 1 02:41:46.987: %SEC-6-IPACCESSLOGP: <red>list DENY_LDP denied tcp 4.4.4.4(49095) -> 1.1.1.1(646)</red>, 1 packet
PE-1(config-if)#<purple>do sh mpl ldp discover</purple>
Local LDP Identifier:
1.1.1.1:0
Discovery Sources:
Interfaces:
FastEthernet0/0 (ldp): xmit/recv <red>!!! the only LDP peer is on Fa0/0 (P-2)</red>
LDP Id: 2.2.2.2:0
FastEthernet0/1 (ldp): xmit <red>!!! no LDP peer on Fa0/1 due to ACL</red>
PE-1(config-if)#<purple>do sh ip osp nei</purple>
Neighbor ID Pri State Dead Time Address Interface
4.4.4.4 1 FULL/DR 00:00:32 10.10.6.2 FastEthernet0/1
2.2.2.2 1 FULL/DR 00:00:36 10.10.1.2 FastEthernet0/0
PE-1(config-if)#<purple>do sh ip cef 3.3.3.3</purple>
3.3.3.3/32, version 241, epoch 0, cached adjacency 10.10.1.2
0 packets, 0 bytes
tag information set, shared
local tag: 18
fast tag rewrite with <red>Fa0/0, 10.10.1.2, <u>tags imposed: {18}</u></red>
...
PE-1(config-if)#<purple>do sh ip osp database router 4.4.4.4</purple>
OSPF Router with ID (1.1.1.1) (Process ID 1)
Router Link States (Area 0)
LS age: 476
Options: (No TOS-capability, DC)
LS Type: Router Links
<red>Link State ID: 4.4.4.4
Advertising Router: 4.4.4.4</red>
LS Seq Number: 8000003C
Checksum: 0xD224
Length: 72
Number of Links: 4
Link connected to: a Transit Network
(Link ID) Designated Router address: 10.10.6.2
(Link Data) Router Interface address: 10.10.6.2
Number of TOS metrics: 0
TOS 0 <red><u>Metrics: 65535</u></red>
...
PE-1(config-if)#do sh ip ospf mpls ldp int
FastEthernet0/1
Process ID 1, Area 0
LDP is configured through LDP autoconfig
LDP-IGP Synchronization : Required
Holddown timer is not configured
<red><u>Interface is up and sending maximum metric</u></red>
</pre>
</div>
<p>In the end, I'd like to mention that the quiz may also have another solution by using <strong><em>LDP Session Protection</em> via Targeted LDP Hellos</strong>... but this will be detailed in a separate post, maybe after another quiz ☻ </p>
<p><em>Thank you for your comments in the quiz!</em> </p>
<p><br/></p>Quiz #10 – OSPF on CE-PE links2013-03-17T00:00:00+00:00Costitag:costiser.ro,2013-03-17:2013/03/17/quiz-10/<p><span class="dropcap">C</span>ompany ABC has multiple offices interconnected via an MPLS provider and each office runs OSPF with a separate Area number.<br/>
In <em>office #2</em> there is a department that is collaborating with a PARTNER company and the office router (R5) has a link connected to the partner's router. For security reasons, it has been decided to split the routing tables into separate logical instances using VRFs, as displayed below: </p>
<p><br>
<a href="/uploads/quiz-10.png" title="Quiz 10 - CE-PE links"><img alt="quiz-10" src="/uploads/quiz-10.png" title="Quiz 10 - CE-PE links"/></a> </br></p>
<p>After you finish the configuration, you notice that there is <em><strong>no</strong> connectivity between office #1 and office #2</em> while connectivity with your PARTNER is ok.<br/>
You troubleshoot the OSPF and notice that adjacency with your PE is ok, the OSPF database contains the correct data but the routing table is not ok: </p>
<div class="row">
<pre class="col-md-10">R5#<purple>sh ip ospf nei</purple>
Neighbor ID Pri State Dead Time Address Interface
192.168.2.4 1 FULL/BDR 00:00:32 192.168.2.4 FastEthernet0/0
R5#
R5#<purple>sh ip ospf database</purple>
OSPF Router with ID (192.168.2.254) (Process ID 100)
Router Link States (Area 2)
Link ID ADV Router Age Seq# Checksum Link count
192.168.2.4 192.168.2.4 175 0x80000002 0x00224E 1
192.168.2.254 192.168.2.254 174 0x80000002 0x0097CA 3
Net Link States (Area 2)
Link ID ADV Router Age Seq# Checksum
192.168.2.5 192.168.2.254 174 0x80000001 0x00A0E5
Summary Net Link States (Area 2)
Link ID ADV Router Age Seq# Checksum
192.168.1.0 192.168.2.4 113 0x80000001 0x0017CB
192.168.1.254 192.168.2.4 113 0x80000001 0x0031B0
R5#
R5#<purple>sh ip route vrf MY_NETWORK</purple>
...
Gateway of last resort is not set
192.168.2.0/24 is variably subnetted, 3 subnets, 3 masks
C 192.168.2.48/28 is directly connected, FastEthernet0/1
C 192.168.2.4/31 is directly connected, FastEthernet0/0
C 192.168.2.254/32 is directly connected, Loopback0
R5#</pre>
</div>
<p><strong><em>What is the problem?</em></strong> </p>
<p><em>Post your answer in the ‘Comments’ section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>Catalyst MLS QOS - part I2013-03-14T00:00:00+00:00Costitag:costiser.ro,2013-03-14:2013/03/14/catalyst-mls-qos-part-i/<p><span class="dropcap-bg">A</span>fter a nice vacation, I come back with the solution for <a href="/2013/02/10/quiz-7/index.html">quiz-7</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>As indicated in the quiz, the network administrator discovered an access switch without MLS QOS (while all the other switches in the network were configured with it) and he took the initiative to enable it without thinking of the consequences.<br/>
As a result, from the moment he enabled MLS QOS, the switch started re-writing the DSCP bits of all packets and since no other QOS-related configuration was performed, <red><em>the switch reset all QOS values to zero</em></red>. This can be easily spotted with the <blue><code>show mls qos</code></blue> command: </p>
<div class="row">
<pre class="col-md-7">Access-4(config)#<purple>mls qos</purple>
Access-4(config)#^Z
Access-4#
Access-4#<purple>sh mls qos</purple>
<red>QoS is enabled
QoS ip packet dscp rewrite is enabled</red>
</pre>
</div>
<p>Since packets will have the DSCP reset to zero, the rest of the network will treat them in the best-effort class, which explains the latency problems.<br/>
The small trick hidden in the quiz is the question "<strong><em>Why everything was ok before his actions?</em></strong>". As most of you replied within the comments section, the answer is simple: <em>with MLS QOS disabled, the switch does not change anything because there's no concept of trust or untrust, so packets sent with various DSCP/COS/IPPrec settings were forwarded by the access switch and treated accordingly by all other devices in the network (of course, this means that marking was performed by the application running on the end host)</em>. </p>
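<p>One common way to keep the existing markings once MLS QOS is enabled is to trust them on the relevant ports. The lines below are only a sketch, assuming a Catalyst access switch of the 3560/3750 family and a hypothetical port number; whether end-host markings should be trusted at all is a design decision:</p>
<div class="row">
<pre class="col-md-7">Access-4#
!
mls qos
!
interface FastEthernet0/1
 ! keep the DSCP value received on this port instead of resetting it to 0
 mls qos trust dscp
!</pre>
</div>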
<h3 id="qos-review">QOS Review</h3>
<p>Now, let's step back a little and review some things about QOS: </p>
<p>First of all, let's see various headers and the QOS bits within them (marked with red color):<br/>
<a href="uploads/QOS-bits.png" title="QOS bits in various headers"><img alt="QOS-bits" src="uploads/QOS-bits.png" title="QOS bits in various headers"/></a> </p>
<h4 id="classification">Classification</h4>
<p>To apply a different treatment to different traffic, the switch must distinguish packets from one another in a process called <em>classification</em> that results in generating a QOS label that will identify further actions which will be performed with the packet. </p>
<p><a href="/uploads/classification.png" title="QOS Classification"><img alt="classification" src="/uploads/classification.png" title="QOS Classification"/></a> </p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li>The <blue>COS-to-DSCP</blue> and the <blue>IPPrec-to-DSCP maps</blue> have default values that may or may not be appropriate for your network.</li>
<li>The default <blue>DSCP-to-DSCP-mutation map</blue> and the default <blue>policed-DSCP map</blue> are null maps; they bind an incoming DSCP value to the same DSCP value.</li>
<li>The <blue>DSCP-to-DSCP-mutation map</blue> is the only map you apply to a specific port. All other maps apply to the entire switch.</li>
</ul>
</td></tr>
</table>
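<p>To make the notes above more concrete, here is a small Python sketch of the maps. It assumes the commonly documented default COS-to-DSCP map in which each COS value maps to COS × 8 (check <code>show mls qos maps</code> on your own platform); the names <code>DEFAULT_COS_TO_DSCP</code>, <code>NULL_DSCP_MAP</code> and <code>classify</code> are mine and purely illustrative:</p>

```python
# Assumed default COS-to-DSCP map: COS n -> DSCP n * 8 (0, 8, 16, ... 56).
DEFAULT_COS_TO_DSCP = {cos: cos * 8 for cos in range(8)}

# A "null" map binds every incoming DSCP value to the same DSCP value.
NULL_DSCP_MAP = {dscp: dscp for dscp in range(64)}


def classify(cos: int, cos_to_dscp=DEFAULT_COS_TO_DSCP,
             mutation_map=NULL_DSCP_MAP) -> int:
    """Derive the internal DSCP of a frame received on a COS-trusting port."""
    dscp = cos_to_dscp[cos]
    # The mutation map is the only per-port map; by default it changes nothing.
    return mutation_map[dscp]


# Voice frames marked COS 5 end up as DSCP 40 with the assumed default map,
# which is why many designs override the map so that COS 5 becomes 46 (EF).
voice_map = dict(DEFAULT_COS_TO_DSCP)
voice_map[5] = 46
```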
<h4 id="policing-and-marking">Policing and Marking</h4>
<p>As packets move through the switch, you can force them to comply with configured policies or profiles: policing determines whether a packet is <em><strong>in</strong></em> or <em><strong>out of profile</strong></em> (based on bandwidth/resource usage limits) and passes the result to the marking process, which reads the configuration and takes one of the following actions: </p>
<ul>
<li>pass the packet on without modification</li>
<li>mark it down (lower its QOS label) </li>
<li>drop it. </li>
</ul>
<p>If the packet is out of profile and a mark-down action is specified, the switch will use the <code>policed-DSCP map</code> to generate a new QOS label. </p>
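<p>The policing and marking steps above can be sketched as follows. This is a generic single-rate token bucket written under my own assumptions; the class <code>Policer</code>, the function <code>mark</code> and the action names are hypothetical and do not describe the actual hardware implementation:</p>

```python
class Policer:
    """Single-rate token bucket: decides whether a packet is in profile."""

    def __init__(self, rate_bps: int, burst_bytes: int):
        self.rate_bytes_per_s = rate_bps / 8
        self.burst = burst_bytes
        self.tokens = float(burst_bytes)  # bucket starts full
        self.last = 0.0

    def in_profile(self, size_bytes: int, now: float) -> bool:
        # Refill the bucket for the elapsed time, capped at the burst size.
        elapsed = now - self.last
        self.last = now
        self.tokens = min(self.burst,
                          self.tokens + elapsed * self.rate_bytes_per_s)
        if size_bytes <= self.tokens:
            self.tokens -= size_bytes
            return True
        return False


def mark(dscp: int, in_profile: bool, exceed_action: str, policed_dscp_map: dict):
    """Return the new DSCP, or None when the packet is dropped."""
    if in_profile:
        return dscp                      # pass without modification
    if exceed_action == "drop":
        return None                      # drop the packet
    # Mark down: look the new value up in the policed-DSCP map
    # (an empty map behaves like the default null map).
    return policed_dscp_map.get(dscp, dscp)
```

<p>For example, with <code>Policer(rate_bps=8000, burst_bytes=1500)</code> a first 1500-byte packet is in profile, while a second one arriving half a second later is not and would be marked down or dropped.</p>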
<h4 id="queueing-and-scheduling">Queueing and Scheduling</h4>
<p>Since the total inbound bandwidth of all ports can exceed the bandwidth of the internal ring, <code>ingress queues</code> are used before packets are forwarded into the switch fabric. Later on, <code>outbound queues</code> hold the packets until they are sent out of the egress port. </p>
<p>In a future post, I'll try to come up with more explanations about queueing and scheduling, as this topic is rather large. </p>
<h3 id="quiz-solutions_1">Quiz Solutions</h3>
<p>In the end, let's get back to our quiz and say that the network administrator did not make a mistake by enabling MLS QOS, as <em>one of the QOS design rules</em> is to classify and mark data applications as close to their sources as possible. <red>The mistake was his lack of knowledge about the DSCP re-write</red>.<br/>
Hosts and servers are capable of marking the COS and DSCP values but it is a question of whether to trust them or not. <blue><em>Without MLS QOS, the switch does not interfere with these markings.</em></blue><br/>
When enabling MLS QOS, the administrator needs to define the trust boundaries. Devices can be trusted (fully or partially/conditionally - see IP Phones) or untrusted. The access switches are the closest to the endpoints, so this is where you usually define your trust boundaries. </p>
<p><em><strong>Solutions</strong></em> to the quiz:</p>
<ul>
<li>trust the QOS markings received from the application running on the end-host (or use DSCP-to-DSCP mutation maps)</li>
<li>enable <em>DSCP Transparency Mode</em> using the <blue><strong><code>no mls qos rewrite ip dscp</code></strong></blue> command, which instructs the switch not to modify the DSCP values of incoming packets</li>
<li>use QOS ACLs to match the desired traffic generated by the specific application and apply the DSCP values to it</li>
</ul>
<p><em>Thanks everyone for your comments in the quiz !</em> </p>
<p><br/></p>Quiz #9 – BGP peering over a Cisco ASA2013-02-20T00:00:00+00:00Costitag:costiser.ro,2013-02-20:2013/02/20/quiz-9/<p><span class="dropcap">Y</span>our company has 2 offices that are interconnected via a firewall (Cisco ASA) as shown below.<br/>
You received the task to configure a BGP session between the border routers of each office.<br/>
After performing the configuration shown below, you notice that the BGP peering does not come up. </p>
<p><br/></p>
<p><a href="/uploads/quiz-9.png" title="Quiz-9"><img alt="quiz-9" src="/uploads/quiz-9.png" title="Quiz-9"/></a> </p>
<p><br>
Suspecting that the problem could be related to the firewall, you check the Cisco ASA configuration and confirm that BGP traffic is allowed between the border routers. To be really sure, you perform a capture on each interface of the firewall:<br> </br></br></p>
<ul>
<li>
<p><strong><em>on the "office1" interface</em></strong>:
<div class="row">
<pre>ciscoasa# sh capt test-ins
3 packets captured
1: 00:57:13.134865 <purple>192.168.1.1.33736 > 192.168.2.1.179: S 3598735645:3598735645(0)</purple> win 16384 &lt;mss 536,opt-19:3c8a0d9ea430ac1492a1f21cbf41220f,eol&gt;
2: 00:57:15.176718 <purple>192.168.1.1.33736 > 192.168.2.1.179: S 3598735645:3598735645(0)</purple> win 16384 &lt;mss 536,opt-19:3c8a0d9ea430ac1492a1f21cbf41220f,eol&gt;
3: 00:57:19.139854 <purple>192.168.1.1.33736 > 192.168.2.1.179: S 3598735645:3598735645(0)</purple> win 16384 &lt;mss 536,opt-19:3c8a0d9ea430ac1492a1f21cbf41220f,eol&gt;
3 packets shown
</pre>
</div></p>
</li>
<li>
<p><strong><em>on the "office2" interface</em></strong>:
<div class="row">
<pre>ciscoasa# sh capt test-out
3 packets captured
1: 00:57:15.176718 <purple>192.168.1.1.33736 > 192.168.2.1.179: S 4134390026:4134390026(0)</purple> win 16384 &lt;mss 536,opt-19:3c8a0d9ea430ac1492a1f21cbf41220f,eol&gt;
2: 00:57:16.090052 <purple>192.168.2.1.52869 > 192.168.1.1.179: S 1806197614:1806197614(0)</purple> win 16384 &lt;mss 536,opt-19:a115c6c0687490096359a370e9ea1955,eol&gt;
3: 00:57:19.139854 <purple>192.168.1.1.33736 > 192.168.2.1.179: S 4134390026:4134390026(0)</purple> win 16384 &lt;mss 536,opt-19:3c8a0d9ea430ac1492a1f21cbf41220f,eol&gt;
3 packets shown
ciscoasa#
</pre>
</div></p>
</li>
</ul>
<p>As seen in the captures, <em>TCP SYN packets for BGP (port 179) are received on "office1" interface and allowed/forwarded onto the "office2" interface... <red>though, the BGP peering does not get established</red></em>. </p>
<p><strong><em>What is the problem ?</em></strong> </p>
<p><em>Post your solution in the ‘Comments’ section below and subscribe to this blog to get the solution and more interesting quizzes.</em> </p>
<p><br/></p>Recursive Routing with Tunnels - study case: GRE over IPsec2013-02-17T00:00:00+00:00Costitag:costiser.ro,2013-02-17:2013/02/17/recursive-routing/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/02/03/quiz-6/index.html">quiz-6</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<p>This quiz was made more difficult by the addition of the IPsec configuration, which was intended to bring it closer to real-life scenarios - remember: real networks run a lot of features blended together! </p>
<h3 id="quiz-review">Quiz Review</h3>
<p>Before explaining the problem, let's briefly discuss this scenario, as per below diagram: </p>
<p><a href="/uploads/GRE_over_IPsec.png" title="Recursive Routing Problem with Tunnels"><img alt="GRE_over_IPsec" src="/uploads/GRE_over_IPsec.png" title="Recursive Routing Problem with Tunnels"/></a><br> </br></p>
<ul>
<li>the scenario uses a point-to-point GRE tunnel over IPsec</li>
<li>GRE (Generic Routing Encapsulation) is a protocol that "carries" other passenger protocols - for example, IP multicast, which allows us to run routing protocols</li>
<li>with GRE over IPsec, all traffic between GRE endpoints is encrypted by IPsec</li>
<li>this scenario uses the crypto-map method - a single line in the crypto ACL is enough: the GRE protocol (IP protocol 47) between the endpoints</li>
<li>crypto maps are the "old" method, as opposed to the newer method using VTIs (Virtual Tunnel Interfaces) - VTIs are more flexible, with the following benefits: simpler configuration, multicast support, the ability to run IP routing protocols, etc.</li>
</ul>
<p>Going back to the quiz, the problem within this scenario is a typical case of recursive routing! <br/>
Depending on your level of logging, you may be able to see the following entries in the logs: </p>
<div class="row">
<pre>HQ-Router#
*Mar 1 00:01:39.027: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to up
*Mar 1 00:01:39.587: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 100: Neighbor 172.16.255.2 (Tunnel1) is up: new adjacency
HQ-Router#
HQ-Router#
*Mar 1 00:01:48.031: <red>%TUN-5-RECURDOWN: Tunnel1 temporarily disabled due to recursive routing</red>
*Mar 1 00:01:49.027: %LINEPROTO-5-UPDOWN: Line protocol on Interface Tunnel1, changed state to down
*Mar 1 00:01:49.063: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 100: Neighbor 172.16.255.2 (Tunnel1) is down: interface down</pre>
</div>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
REMEMBER<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
<b><i><red>Recursive Routing</red></i></b> occurs whenever the tunnel destination address is learned, by a dynamic routing protocol, via the tunnel itself !
</div>
</div>
<p><em>Before the tunnel is established</em>:
<div class="row">
<pre class="col-md-7">Remote-Router#sh ip cef 155.1.1.1
<green>0.0.0.0/0</green>, version 7, epoch 0, cached adjacency to Serial0/0
0 packets, 0 bytes
via 155.2.2.1, 0 dependencies, recursive
<green>next hop 155.2.2.1, Serial0/0 via 155.2.2.0/30</green>
valid cached adjacency
</pre>
</div></p>
<p><em>Tunnel is UP and tunnel destination is learned by routing protocols running over the tunnel</em>
<div class="row">
<pre class="col-md-7">Remote-Router#sh ip cef 155.1.1.1
<red>155.1.1.0/30</red>, version 29, epoch 0
0 packets, 0 bytes
<red>via 172.16.255.1, Tunnel1</red>, 0 dependencies
next hop 172.16.255.1, Tunnel1
valid adjacency
</pre>
</div></p>
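<p>The two CEF outputs above can be turned into a small check for recursive routing. The sketch below is only an illustration under my own assumptions (the helper names <code>lookup</code> and <code>is_recursive</code> are invented, and the routing table is reduced to a list of prefix/interface pairs); it performs a longest-prefix match for the tunnel destination and reports a problem when the best route points into the tunnel itself:</p>

```python
import ipaddress


def lookup(rib, destination):
    """Longest-prefix match; rib is a list of (prefix, interface) pairs."""
    addr = ipaddress.ip_address(destination)
    candidates = [(ipaddress.ip_network(prefix), interface)
                  for prefix, interface in rib
                  if addr in ipaddress.ip_network(prefix)]
    if not candidates:
        return None
    # The most specific (longest) matching prefix wins.
    return max(candidates, key=lambda entry: entry[0].prefixlen)[1]


def is_recursive(rib, tunnel_name, tunnel_destination):
    """True when the tunnel destination is reached via the tunnel itself."""
    return lookup(rib, tunnel_destination) == tunnel_name


# Remote-Router before the tunnel comes up: only the default route exists.
rib_before = [("0.0.0.0/0", "Serial0/0")]
# After EIGRP converges over the tunnel, 155.1.1.0/30 is learned via Tunnel1.
rib_after = rib_before + [("155.1.1.0/30", "Tunnel1")]
```

<p>With <code>rib_before</code> the tunnel destination 155.1.1.1 resolves via Serial0/0; with <code>rib_after</code> it resolves via Tunnel1, which is exactly the condition that triggers the <code>%TUN-5-RECURDOWN</code> message.</p>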
<p>In order to solve recursive routing problems, we need to ensure that tunnel destination is <strong><em>never</em></strong> learned via the tunnel itself - let's see some solutions: </p>
<h3 id="1-do-not-advertise-the-tunnel-destination-into-the-routing-protocol">1. Do not advertise the tunnel destination into the routing protocol</h3>
<p>From design perspective, especially when deploying the tunnels over a private WAN, it's desired to split the addressing so that WAN addresses (used as tunnel destinations) will not leak into the routing protocol. <br/>
In our quiz, we don't have a private WAN and the tunnel destinations are public IP addresses, but the <strong><em>error</em> </strong> exists in the command <red><b><code>network 0.0.0.0</code></b></red> that is configured under the EIGRP routing process => the first solution would be to configure more specific network commands that do <strong>not</strong> contain the tunnel destination, like below: </p>
<p><em>on the HQ Router</em>
<div class="row">
<pre class="col-md-7">HQ-Router#sh run | s router eigrp
router eigrp 100
passive-interface default
no passive-interface FastEthernet0/0
no passive-interface FastEthernet0/1
no passive-interface FastEthernet1/0
no passive-interface Tunnel1
<purple>network 172.16.0.0</purple>
no auto-summary</pre>
</div></p>
<p><em>on the Remote Office</em>
<div class="row">
<pre class="col-md-7">Remote-Router#sh run | s router eigrp
router eigrp 100
passive-interface default
no passive-interface Tunnel1
<purple>network 172.16.0.0</purple>
no auto-summary</pre>
</div></p>
<h3 id="2-filter-tunnel-destinations-from-being-sentlearned-via-the-routing-protocol">2. Filter tunnel destinations from being sent/learned via the routing protocol</h3>
<p>When the tunnels use the same addressing/subnets as the internal network (private WAN), the solution would be to filter tunnel destinations from being advertised (sent) or learned (received) over the IP routing protocol.<br/>
In the example below, we configure a distribute list that denies tunnel destinations from being learned via EIGRP over the tunnel: </p>
<p><em>on the HQ Router</em>
<div class="row">
<pre class="col-md-7">HQ-Router#sh run | i access-list
<purple>access-list 10 deny 155.2.2.0</purple>
access-list 10 permit any
!
HQ-Router#sh run | s router eigrp
router eigrp 100
passive-interface default
no passive-interface FastEthernet0/0
no passive-interface FastEthernet0/1
no passive-interface FastEthernet1/0
no passive-interface Tunnel1
network 0.0.0.0
<purple>distribute-list 10 in</purple>
no auto-summary</pre>
</div></p>
<p><em>on the Remote Office</em>
<div class="row">
<pre class="col-md-7">Remote-Router#sh run | i access-list
<purple>access-list 10 deny 155.1.1.0</purple>
access-list 10 permit any
Remote-Router#
Remote-Router#sh run | s router eigrp
router eigrp 100
passive-interface default
no passive-interface Tunnel1
network 0.0.0.0
<purple>distribute-list 10 in</purple>
no auto-summary
Remote-Router#</pre>
</div></p>
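<p>As a rough illustration of what the distribute list does to incoming updates, here is a Python sketch. It assumes the usual standard-ACL semantics (an entry without a wildcard matches exactly that address, and there is an implicit deny at the end); the function names and the 172.16.1.0/24 prefix are hypothetical examples, not taken from the lab:</p>

```python
import ipaddress


def _to_int(dotted: str) -> int:
    return int(ipaddress.IPv4Address(dotted))


def acl_permits(acl, prefix: str) -> bool:
    """Match the network address of a route against a standard ACL."""
    network = int(ipaddress.ip_network(prefix).network_address)
    for action, address, wildcard in acl:
        care_bits = ~_to_int(wildcard) & 0xFFFFFFFF  # bits that must match
        if (network & care_bits) == (_to_int(address) & care_bits):
            return action == "permit"
    return False  # implicit deny at the end of every ACL


def distribute_list_in(acl, updates):
    """Keep only the prefixes the ACL permits."""
    return [prefix for prefix in updates if acl_permits(acl, prefix)]


# access-list 10 on the Remote-Router, written as (action, address, wildcard).
ACL_10_REMOTE = [
    ("deny", "155.1.1.0", "0.0.0.0"),
    ("permit", "0.0.0.0", "255.255.255.255"),  # "permit any"
]

accepted = distribute_list_in(ACL_10_REMOTE, ["155.1.1.0/30", "172.16.1.0/24"])
```

<p>In this sketch the tunnel-destination subnet 155.1.1.0/30 is filtered out, while the other prefix is still accepted.</p>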
<h3 id="3-use-static-routes-for-tunnel-destinations">3. Use static routes for tunnel destinations</h3>
<p>Another solution is to configure static host routes for the tunnel destinations (a /32 route is the most specific match, and a static route's default administrative distance of 1 beats that of any dynamic routing protocol): </p>
<p><em>on the HQ Router</em>
<div class="row">
<pre class="col-md-8">HQ-Router#sh run | i ip route
ip route 0.0.0.0 0.0.0.0 155.1.1.2
<purple>ip route 155.2.2.2 255.255.255.255 155.1.1.2</purple>
</pre>
</div></p>
<p><em>on the Remote Router</em>
<div class="row">
<pre class="col-md-8">Remote-Router#sh run | i ip route
ip route 0.0.0.0 0.0.0.0 155.2.2.1
<purple>ip route 155.1.1.1 255.255.255.255 155.2.2.1</purple>
</pre>
</div></p>
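<p>Why the static host route wins can be shown with a short sketch of route selection. It is a simplification under my own assumptions: the function <code>best_route</code> and the dictionary layout are invented, and the administrative distances used are the usual defaults (1 for static, 90 for internal EIGRP):</p>

```python
import ipaddress

STATIC_AD = 1           # default administrative distance of a static route
EIGRP_INTERNAL_AD = 90  # default administrative distance of internal EIGRP


def best_route(routes, destination):
    """Pick the forwarding entry for a destination.

    The longest prefix wins; administrative distance only breaks ties
    between routes to the same prefix.
    """
    addr = ipaddress.ip_address(destination)
    candidates = [r for r in routes
                  if addr in ipaddress.ip_network(r["prefix"])]
    if not candidates:
        return None
    return min(candidates,
               key=lambda r: (-ipaddress.ip_network(r["prefix"]).prefixlen,
                              r["ad"]))


# Remote-Router: default route, the EIGRP route learned over the tunnel,
# and the static host route for the tunnel destination.
routes = [
    {"prefix": "0.0.0.0/0", "ad": STATIC_AD, "via": "155.2.2.1"},
    {"prefix": "155.1.1.0/30", "ad": EIGRP_INTERNAL_AD, "via": "Tunnel1"},
    {"prefix": "155.1.1.1/32", "ad": STATIC_AD, "via": "155.2.2.1"},
]
```

<p>With the /32 in place, the lookup for 155.1.1.1 returns the static route via 155.2.2.1; remove it and the same lookup falls back to the /30 learned via Tunnel1.</p>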
<p><em>Thanks everyone for your comments in the quiz !</em> </p>
<p><br/></p>Quiz #8 – MPLS: inside Provider's Core2013-02-13T00:00:00+00:00Costitag:costiser.ro,2013-02-13:2013/02/13/quiz-8/<p><span class="dropcap">Y</span>ou are a network engineer at a company that provides MPLS services to multiple customers. As shown below, Customer ABC has 2 offices running EIGRP with PE routers in both locations. Inside the MPLS core, you run OSPF and MP-BGP between PE routers and everything runs smoothly. </p>
<p><br/></p>
<p><a href="/uploads/quiz-8.png" title="Quiz 8 - MPLS inside Provider Core"><img alt="quiz-8.png" src="/uploads/quiz-8.png" title="Quiz 8 - MPLS inside Provider Core"/></a> </p>
<p>In order to increase the redundancy of the CORE network, your team decides to add a new link between PE-1 (Fa0/1) and P-4 (Fa0/0) routers. </p>
<p>You finish the change and you verify that OSPF on the new link is established. You're also happy with the OSPF routing after the MPLS Core converges (for info: all links are equal cost). You also double-checked that the <em>routing table for vrf VPNA <strong>did not change on either PE router</strong></em> (it is the same as in the picture above). </p>
<div class="row">
<pre>PE-1#<purple>sh ip osp nei</purple>
Neighbor ID Pri State Dead Time Address Interface
4.4.4.4 1 FULL/DR 00:00:33 10.0.0.9 FastEthernet0/1
2.2.2.2 1 FULL/DR 00:00:36 10.0.0.1 FastEthernet0/0
PE-1#</pre>
</div>
<p>Soon after your change, you receive a ticket in which the customer complains that connectivity between the two offices is down: </p>
<div class="row">
<pre class="col-md-10">CE-2#ping 192.168.1.1
Sending 5, 100-byte ICMP Echos to 192.168.1.1, timeout is 2 seconds:
<red>.....
Success rate is 0 percent (0/5)</red>
CE-2#</pre>
</div>
<p><strong><em>What could be wrong?</em></strong> </p>
<p><em>Post your solution in the ‘Comments’ section below and subscribe to this blog to get the solution and more interesting quizzes.</em> </p>
<p><br/></p>Quiz #7 – MLS QOS2013-02-10T00:00:00+00:00Costitag:costiser.ro,2013-02-10:2013/02/10/quiz-7-mls-qos/<p><span class="dropcap">Y</span>ou have recently moved to a new company as a network administrator and you've started doing an audit of the existing network. </p>
<p>Your network uses an end-to-end QOS approach between multiple offices. Access switches trust QOS markings received from IP Phones and higher layer devices trust the markings received from access switches, as seen in diagram below.<br/>
<br><br/></br></p>
<p><a href="/uploads/quiz-7.png" title="Quiz-7"><img alt="quiz-7" src="/uploads/quiz-7.png" title="Quiz-7"/></a> </p>
<p>During your audit, you've discovered that <strong><em>MLS QOS is not enabled</em></strong> on one of the access switches in Office-1 (access-4):</p>
<div class="row">
<pre class="col-md-6">Access-4#<purple>sh mls qos</purple>
<red>QoS is disabled</red></pre>
</div>
<p>You check the entire configuration of the <em>Access-4</em> switch and notice that everything is configured identically to all other existing access switches, except that MLS QOS is disabled globally on the switch. So, you take the initiative and enable MLS QOS on it. </p>
<p>Not long after you did that, you are contacted by the server team about the following: they run a LAB-1 SERVER connected to that switch (Access-4) that is used to test some new applications that are very sensitive to delay and jitter - and they complain that this application is reporting errors due to high latency that started a few hours ago (approximately the same time you enabled QOS on the access switch). </p>
<p>You did not change anything on the port configuration. You only enabled QOS globally (as part of standard config for all switches).<br/>
<strong><em>Why does communication between the Lab-1 and Lab-2 servers experience high delays? What is the solution to this problem?</em></strong> </p>
<p><em>Post your answer in the ‘Comments’ section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em></p>
<p><br/></p>OSPF – P-bit Setting in Type-7 LSAs2013-02-07T00:00:00+00:00Costitag:costiser.ro,2013-02-07:2013/02/07/ospf-p-bit-in-type-7-lsa/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/01/29/quiz-5/index.html">quiz-5</a>.<br/>
Have a look at the quiz to understand the problem. </p>
<h3 id="quiz-explained">Quiz Explained</h3>
<p>At first look, the quiz seemed tricky because of the OSPFv3/IPv6, but in reality the same problem may affect OSPFv2 in the same manner. I used IPv6 intentionally to give the impression that the problem is related to OSPFv3 only. </p>
<p>As most of you already indicated in the <em>Comments</em> section, the junior engineer did make a mistake: he put/left the <strong>Vlan102 interface in area 0.0.0.0</strong> (clearly shown in the output of command <strong><code>show ipv6 ospf int br</code></strong>). As a result <strong><em><purple>Dist-2 router becomes an ABR</purple></em></strong> (Area Border Router). </p>
<p>You may think: "<em><blue>so, what?... why can't an ABR accept that default route received from another ABR (the COREs in our quiz)?</blue></em>" ... let's see why: </p>
<p>We all know that the motivation for having an NSSA is to allow external routes (learned from different routing domains) into that area, where they are carried as Type-7 LSAs. </p>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">As specified in RFC 1587, there are some bits in the <b><i>Options field</i></b> that are important in our context:<br><ul>
<li><blue><b>N-bit</b></blue> = used in the Hello packets to indicate that the router has NSSA capability on that interface (two routers will not form an adjacency unless they agree on the N-bit, meaning they are both configured as NSSA for that interface)</li>
<li><blue><b>P-bit</b></blue> = (P - propagate) only used in type-7 LSAs to tell the ABRs to translate that type-7 LSA into a type-5 LSA. <blue>This P-bit represents a routing loop prevention mechanism</blue>.</li>
</ul>
</br></td></tr>
</table>
<p>Have a look at the picture below: </p>
<p><a href="/uploads/NSSA_and_P-bit_in_type-7_lsa.png" title="NSSA and P-bit in Type 7 LSA"><img alt="NSSA_and_P-bit_in_type-7_lsa" src="/uploads/NSSA_and_P-bit_in_type-7_lsa.png" title="NSSA and P-bit in Type 7 LSA"/></a> </p>
<p>On the <em>left-hand side</em>, the ABR-1 and ABR-2 (COREs in our quiz) originate <strong>a type-7 default route and they <red>MUST NOT set the P-bit</red></strong>. When these type-7 LSAs reach other ABRs, since they <strong>don't</strong> have the P-bit, they will <strong>not</strong> be considered for SPF calculations and will <strong>not</strong> get to the routing table. </p>
<p>On the <em>right-hand side</em>, the ASBR (connected to other routing domains) will send the external prefixes as type-7 LSAs <strong>with the P-bit set</strong>. These LSAs will be translated into type-5 LSAs on both ABRs.<br/>
<br/></p>
<p>Let's have a look at OSPF's database on our Dist-2: </p>
<div class="row">
<pre class="col-md-9">Dist-2#<purple>sh ipv ospf database</purple>
...
Type-7 AS External Link States (Area 192.168.1.0)
ADV Router Age Seq# Prefix
192.168.255.1 231 0x80000001 <red>::/0</red>
192.168.255.2 220 0x80000001 <red>::/0</red>
...
Dist-2#<purple>sh ipv ospf database nssa-external</purple>
OSPFv3 Router with ID (192.168.255.4) (Process ID 1)
Type-7 AS External Link States (Area 192.168.1.0)
LS age: 263
LS Type: AS External Link
Link State ID: 1
<green>Advertising Router: 192.168.255.1</green>
LS Seq Number: 80000001
Checksum: 0x6565
Length: 28
<green>Prefix Address: ::</green>
Prefix Length: 0, <red>Options: None</red>
Metric Type: 2 (Larger than any link state path)
Metric: 1
LS age: 252
LS Type: AS External Link
Link State ID: 1
<green>Advertising Router: 192.168.255.2</green>
LS Seq Number: 80000001
Checksum: 0x5F6A
Length: 28
<green>Prefix Address: ::</green>
Prefix Length: 0, <red>Options: None</red>
Metric Type: 2 (Larger than any link state path)
Metric: 1
Dist-2#<purple>sh ipv route ::/0</purple>
<red>% Route not found</red>
</pre>
</div>
<p>As you can see, Dist-2 knows the two default routes injected by the cores in the OSPF database, but does not put them into the routing table - <em><blue>as an ABR, it does not accept type-7 LSA <u>without P-bit</u>, in order to avoid routing loops</blue></em>. </p>
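<p>The rule that Dist-2 applies here can be written down as a tiny decision function. This is a sketch based on my reading of the NSSA rules, not on any vendor code: the function names are invented, and the bit position <code>0x08</code> for the P-bit in the OSPFv3 PrefixOptions field is taken from my reading of RFC 5340, so treat it as an assumption:</p>

```python
P_BIT = 0x08  # assumed position of the P-bit in OSPFv3 PrefixOptions


def p_bit_set(prefix_options: int) -> bool:
    """True when the P (propagate) bit is set in the options byte."""
    return bool(prefix_options & P_BIT)


def installs_type7_default(is_abr: bool, prefix_options: int) -> bool:
    """Decide whether a type-7 default route is considered for the routing table.

    An area border router ignores a type-7 default whose P-bit is clear
    (loop prevention); an internal NSSA router accepts it.
    """
    if is_abr and not p_bit_set(prefix_options):
        return False
    return True


# "Options: None" in the show output corresponds to an options value of 0.
OPTIONS_NONE = 0x00
OPTIONS_P = P_BIT
```

<p>With Dist-2 acting as an ABR and <code>OPTIONS_NONE</code> on both default LSAs, the function returns <code>False</code>, which matches the empty routing table above.</p>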
<h3 id="solution">Solution</h3>
<p>Now, let's fix the mistake (configure area 192.168.1.0 on the Vlan102 interface) and check again: </p>
<div class="row">
<pre class="col-md-9">Dist-2(config)#int vlan 102
Dist-2(config-if)#<green>ipv6 ospf 1 area 192.168.1.0</green>
Dist-2(config-if)#end
Dist-2#
Dist-2#<purple>sh ipv route ::/0</purple>
...
<green>ON2 ::/0 [110/1]</green>
via FE80::C001:1CFF:FE3C:10, <green>FastEthernet0/1</green>
via FE80::C000:1CFF:FE3C:10, <green>FastEthernet0/0</green>
Dist-2#
</pre>
<pre class="col-md-9">Dist-2#<purple>sh ipv osp int br</purple>
Interface PID Area Intf ID Cost State Nbrs F/C
<green>Vl102 1 192.168.1.0 30 1 DR 0/0</green>
Vl201 1 192.168.1.0 35 1 DR 0/0
Vl200 1 192.168.1.0 34 1 DR 0/0
Vl105 1 192.168.1.0 33 1 DR 0/0
Vl104 1 192.168.1.0 32 1 DR 0/0
Vl103 1 192.168.1.0 31 1 DR 0/0
Vl101 1 192.168.1.0 29 1 DR 0/0
Vl100 1 192.168.1.0 28 1 DR 0/0
Fa0/1 1 192.168.1.0 5 1 DR 1/1
Fa0/0 1 192.168.1.0 4 1 DR 1/1
Lo0 1 192.168.1.0 27 1 LOOP 0/0
</pre>
</div>
<p><em>Voila!</em>... DIST-2 is not an ABR anymore - it's an internal OSPF router with all interfaces in NSSA area 192.168.1.0, so it accepts the default routes from ABRs (the core). </p>
<p>Interestingly enough, if you have a look again at the OSPF database: </p>
<div class="row">
<pre class="col-md-9">Dist-2#<purple>sh ipv ospf database nssa-external</purple>
OSPFv3 Router with ID (192.168.255.4) (Process ID 1)
Type-7 AS External Link States (Area 192.168.1.0)
<green>Routing Bit Set on this LSA</green>
LS age: 543
LS Type: AS External Link
Link State ID: 1
Advertising Router: 192.168.255.1
LS Seq Number: 80000001
Checksum: 0x6565
Length: 28
Prefix Address: ::
Prefix Length: 0, <green>Options: None</green>
Metric Type: 2 (Larger than any link state path)
Metric: 1
<green>Routing Bit Set on this LSA</green>
LS age: 532
LS Type: AS External Link
Link State ID: 1
Advertising Router: 192.168.255.2
LS Seq Number: 80000001
Checksum: 0x5F6A
Length: 28
Prefix Address: ::
Prefix Length: 0, <green>Options: None</green>
Metric Type: 2 (Larger than any link state path)
Metric: 1
</pre>
</div>
<p>you'll notice the additional information <blue><code>Routing Bit Set on this LSA</code></blue> (<strong><em>that did not exist before !!</em></strong>).
This is a Cisco implementation detail indicating that the <blue><strong>LSA is valid</strong></blue> (passed all sanity checks) for SPF calculation. </p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
ATTENTION<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
This "routing-bit" is <b>not</b> an actual bit in the LSA Option field, it is stored only locally, <b>not</b> propagated to other OSPF routers!
</div>
</div>
<p>In the end, let's have a look at what <strong>a type-7 LSA with the P-bit SET</strong> looks like: suppose the ASBR (Dist-1 in our case) redistributes the external prefix 2003:1:1:1::/64 </p>
<div class="row">
<pre class="col-md-7">Dist-2#<purple>sh ipv route 2003:1:1:1::/64</purple>
...
<green>ON2 2003:1:1:1::/64</green> [110/20]
via FE80::C001:1CFF:FE3C:10, FastEthernet0/1
via FE80::C000:1CFF:FE3C:10, FastEthernet0/0
Dist-2#
Dist-2#<purple>sh ipv ospf database nssa-external</purple>
...
<green>Routing Bit Set on this LSA</green>
LS age: 42
LS Type: AS External Link
Link State ID: 2
<green>Advertising Router: 192.168.255.3</green>
LS Seq Number: 80000001
Checksum: 0x89B4
Length: 36
<green>Prefix Address: 2003:1:1:1::</green>
Prefix Length: 64, <red>Options: P</red>
Metric Type: 2 (Larger than any link state path)
Metric: 20
</pre>
</div>
<p>As you can see, this type-7 LSA will be flooded into the entire NSSA area and it has the P-bit set (also notice the "Routing Bit Set on this LSA"). </p>
<p><em>Thanks everyone for your comments in the quiz !</em> </p>
<p><br/></p>Quiz #6 – Routing protocols over IPsec2013-02-03T00:00:00+00:00Costitag:costiser.ro,2013-02-03:2013/02/03/quiz-6/<p><span class="dropcap">Y</span>our company is extending their network by opening a new Remote Office in a different city. In the HeadQuarters, you run EIGRP between buildings and the design team has decided to add the Remote Office into the EIGRP domain and to maintain the same level of protection/privacy over the Internet.<br/>
To achieve this you must configure an IPsec tunnel and run EIGRP over it - you've chosen to use the old method of crypto maps with a GRE tunnel that allows you to run dynamic routing protocols over IPsec. </p>
<p>You perform the configuration below: </p>
<p><a href="/uploads/quiz-6.png" title="Quiz 6 - Routing Protocols over IPsec"><img alt="Quiz-6" src="/uploads/quiz-6.png" title="Quiz 6 - Routing Protocols over IPsec"/></a> </p>
<p>and you see that you have a successful IPsec tunnel between HQ and Remote Office and the EIGRP peering is also UP: </p>
<div class="row">
<pre>HQ-Router#ping 172.16.255.2
Sending 5, 100-byte ICMP Echos to 172.16.255.2, timeout is 2 seconds:
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 52/76/100 ms
HQ-Router#
HQ-Router#sh crypto isakmp sa
IPv4 Crypto ISAKMP SA
dst src state conn-id slot status
<green>155.1.1.1 155.2.2.2 QM_IDLE </green> 1001 0 ACTIVE
HQ-Router#
HQ-Router#sh crypto ipsec sa
local ident (addr/mask/prot/port): (155.1.1.1/255.255.255.255/47/0)
remote ident (addr/mask/prot/port): (155.2.2.2/255.255.255.255/47/0)
...
<green> #pkts encaps: 9, #pkts encrypt: 9</green>, #pkts digest: 9
<green> #pkts decaps: 8, #pkts decrypt: 8</green>, #pkts verify: 8
...
HQ-Router#
*Mar 1 01:35:27.435: %DUAL-5-NBRCHANGE: IP-EIGRP(0) 100: <green>Neighbor 172.16.255.2 (Tunnel1) is up: new adjacency</green>
HQ-Router#
HQ-Router#sh ip ei nei
IP-EIGRP neighbors for process 100
H Address Interface Hold Uptime SRTT RTO Q Seq
(sec) (ms) Cnt Num
<green>3 172.16.255.2 Tu1 14 00:00:06 84 5000 0 29</green>
2 172.16.3.2 Fa1/0 10 00:05:49 52 312 0 120
1 172.16.2.2 Fa0/1 13 00:05:51 53 318 0 119
0 172.16.1.2 Fa0/0 11 00:05:53 50 300 0 121
HQ-Router#</pre>
</div>
<p>After a while, you are approached by the NOC telling you that the new IPsec tunnel and the EIGRP neighborship keep flapping up and down, impacting the connectivity between HQ and the remote office. </p>
<p><strong><em>What is the problem ?</em></strong> </p>
<p><em>Post your answer in the ‘Comments’ section below and subscribe to this blog to get the detailed solution and more interesting quizzes.</em> </p>
<p><br/></p>EBGP Peering Not Established over a Default Route2013-02-02T00:00:00+00:00Costitag:costiser.ro,2013-02-02:2013/02/02/bgp-over-a-default-route/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2013/01/22/quiz-4/index.html">quiz-4</a>.<br/>
Have a look at it to understand the problem. </p>
<p>As described in the quiz, you try to establish a BGP external peering between the border routers of two offices. These routers can reach each other using the default route that each of them has via the ISP. </p>
<p>The quiz shows that, although the BGP speakers can reach each other, they do <em><strong>not</strong></em> establish an eBGP session.<br/>
<br>
<a href="/uploads/quiz-4.png" title="BGP Peering Not Established over a Default Route"><img alt="quiz-4" src="/uploads/quiz-4.png"/></a> </br></p>
<div class="row">
<pre class="col-md-10">Office-1#<purple>sh ip cef 2.2.2.2</purple>
<red>0.0.0.0/0</red>, version 9, epoch 0, cached adjacency to Serial0/0
0 packets, 0 bytes
via 155.1.1.2, 0 dependencies, recursive
next hop 155.1.1.2, Serial0/0 via 155.1.1.0/30
valid cached adjacency</pre>
</div>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
The key to the quiz is that the BGP speaker will <b>not</b> try to establish an EBGP session if/when its peer is reachable over a default route.<br>
</br></div>
</div>
<p>[UPDATE] According to the <a href="http://www.cisco.com/c/en/us/support/docs/ip/border-gateway-protocol-bgp/19167-bgp-rec-routing.html#solution" target="_blank">Cisco documentation</a> this is done in order to avoid route flapping and routing loops. </p>
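<p>The check described in the warning above can be sketched like this. It is only a model of the observed behaviour under my own assumptions (the function name <code>has_non_default_route</code> is invented and the routing table is reduced to a list of prefixes); it is not the actual IOS logic:</p>

```python
import ipaddress


def has_non_default_route(prefixes, peer: str) -> bool:
    """True when the peer is covered by at least one route other than 0.0.0.0/0."""
    addr = ipaddress.ip_address(peer)
    for prefix in prefixes:
        network = ipaddress.ip_network(prefix)
        # A prefix length of 0 is the default route, which does not count.
        if network.prefixlen > 0 and addr in network:
            return True
    return False


# Office-1 as in the quiz: the peer 2.2.2.2 is reachable only via the default.
rib_quiz = ["0.0.0.0/0", "155.1.1.0/30"]
# The same table once a host route for the peer has been added.
rib_fixed = rib_quiz + ["2.2.2.2/32"]
```

<p>For <code>rib_quiz</code> the function returns <code>False</code>, so in this model the router would not attempt the active open; for <code>rib_fixed</code> it returns <code>True</code>.</p>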
<p>In this situation, <strong><purple>debug ip bgp events</purple></strong> doesn't show useful information, but the problem can be easily spotted with <strong><blue>debug ip bgp ipv4 unicast</blue></strong>: </p>
<div class="row">
<pre>Office-1#<purple>deb ip bgp event</purple>
BGP events debugging is on
Office-1#
*Mar 1 00:12:39.931: BGP: Import timer expired. Walking from 1 to 1
*Mar 1 00:12:54.931: BGP: Import timer expired. Walking from 1 to 1
Office-1#
Office-1#deb ip bgp ipv4 unicast
BGP debugging is on for address family: IPv4 Unicast
Office-1#
*Mar 1 00:31:36.603: BGP: 2.2.2.2 active open failed - <red>no route to peer</red>, open active delayed 31267ms (35000ms max, 28% jitter)
*Mar 1 00:32:07.871: BGP: 2.2.2.2 active open failed - <red>no route to peer</red>, open active delayed 32532ms (35000ms max, 28% jitter)
Office-1#
</pre>
</div>
<p>Let's configure a static route on Office-1 router - soon after that the BGP session goes up: </p>
<div class="row">
<pre>Office-1(config)#<red>ip route 2.2.2.2 255.255.255.255 155.1.1.2</red>
Office-1(config)#end
Office-1#
Office-1#
*Mar 1 00:22:34.607: <red>%BGP-5-ADJCHANGE: neighbor 2.2.2.2 Up</red>
Office-1#
Office-1#sh ip bgp s
BGP router identifier 1.1.1.1, local AS number 65100
BGP table version is 1, main routing table version 1
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
2.2.2.2 4 65200 4 4 1 0 0 00:00:17 0
Office-1#
Office-1#
Office-1#sh ip bgp nei 2.2.2.2 | inc state|host
<red>BGP state = Established</red>, up for 00:00:41
Connection state is ESTAB, I/O status: 1, unread input bytes: 0
<red>Local host: 1.1.1.1, Local port: 53209</red>
Foreign host: 2.2.2.2, Foreign port: 179
Office-1#
Office-1#sh tcp brief
TCB Local Address Foreign Address (state)
6752A090 <red>1.1.1.1.53209 2.2.2.2.179</red> ESTAB
Office-1#
</pre>
</div>
<p>As you can see in both outputs <strong><code>sh ip bgp nei</code></strong> and <strong><code>sh tcp brief</code></strong>, Office-1 router initiates the TCP connection (<red>acts as a client</red>) - local port 53209 - to Office-2 router (2.2.2.2) on port 179 (that will act as a server). <em>Note that I added the static route only on 1.1.1.1 router - 2.2.2.2 router still uses the default route</em>. </p>
<p>Another interesting thing is that the route to the peer can also be learned via BGP. Some people replied in the quiz that "the route to the peer <strong><em>must not be</em></strong> default <strong>nor</strong> learned via BGP" - <em>the 2nd part is not true</em>! Here is an output showing that the BGP session is established even though the route to the BGP peer is learned via another BGP session: </p>
<div class="row">
<pre>Office-2#<purple>sh ip bgp sum</purple>
...
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
<red>1.1.1.1 4 65100 10 10 2 0 0 00:05:50 0</red>
155.2.2.2 4 100 11 11 2 0 0 00:07:00 1
Office-2#
Office-2#<purple>sh ip route 1.1.1.1</purple>
Routing entry for 1.1.1.1/32
<red>Known via "bgp 65200"</red>, distance 20, metric 0
...
Office-2#
Office-2#<purple>sh tcp br</purple>
TCB Local Address Foreign Address (state)
650B7CB8 <red>2.2.2.2.55415 1.1.1.1.179</red> ESTAB
</pre>
</div>
<p>In this output, Office-2 router has an eBGP peering with the ISP and learns the IP of Office-1 (1.1.1.1) via this peering - which is enough to start a TCP connection to Office-1. </p>
<p><br>
<div class="row"><div class="col-xs-12">
<div class="panel panel-orange">
<div class="panel-heading"><i class="fa fa-binoculars"></i> Conclusion:</div>
<div class="panel-body">
<ul>
<li>a BGP speaker <red>will <strong>not</strong> initiate</red> the TCP session to establish a BGP peering <red>if the peer is reachable only <b>over a default route.</b></red><br>
You'll need a more specific route than the default - it can be learned statically or dynamically (including via another BGP session - it's still ok)</br></li>
<li>a BGP speaker <blue>will accept/respond</blue> to a TCP session and will establish a BGP peering even if the peer is reachable over a default route. Thus, the non-default route is only needed on one side, but it's always recommended to exist on both sides</li>
<li>once established, the BGP peering will not be broken if the more specific route is lost and connectivity remains over the default route</li>
</ul>
</div>
</div>
</div></div></br></p>
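<p>The first conclusion can be sketched as a small Python model. This is only an illustration of the rule, not how IOS implements it: the function names <code>best_route</code> and <code>can_initiate_session</code> and the list-based routing table are assumptions made for this example.</p>

```python
import ipaddress

def best_route(rib, peer):
    """Longest-prefix match: most specific RIB entry covering the peer, or None."""
    addr = ipaddress.ip_address(peer)
    matches = [(net, source) for net, source in rib if addr in net]
    return max(matches, key=lambda m: m[0].prefixlen, default=None)

def can_initiate_session(rib, peer):
    """Model of the rule: no active open if the peer is reachable only via 0.0.0.0/0."""
    match = best_route(rib, peer)
    return match is not None and match[0].prefixlen > 0

# Hypothetical routing tables: (prefix, how it was learned)
default_only = [(ipaddress.ip_network("0.0.0.0/0"), "static")]
with_static = default_only + [(ipaddress.ip_network("2.2.2.2/32"), "static")]
with_bgp = default_only + [(ipaddress.ip_network("1.1.1.1/32"), "bgp")]

print(can_initiate_session(default_only, "2.2.2.2"))  # False
print(can_initiate_session(with_static, "2.2.2.2"))   # True
print(can_initiate_session(with_bgp, "1.1.1.1"))      # True
```

<p>The model only covers the initiating (client) side - accepting an incoming TCP session is not subject to this check, as described in the second conclusion.</p>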
<p><em>Thanks everyone for your comments in the quiz !</em>
<br/></p>Quiz #5 – OSPFv3 Default Route into a NSSA Area2013-01-29T00:00:00+00:00Costitag:costiser.ro,2013-01-29:2013/01/29/quiz-5/<p><span class="dropcap">Y</span>our company's network consists of a CORE module running OSPF Area 0 and multiple buildings with 2x distribution switches per building running OSPF NSSA areas. </p>
<p>You have asked your junior network administrator to configure OSPFv3 (for IPv6) to match the same design as OSPFv2 (for IPv4) in building 1 as per below diagram. The design requires that both COREs will inject a default route into the NSSA area, so each DIST will have 2 paths for default route. </p>
<p>After he finishes the configuration, your junior colleague realizes that there are default routes 0.0.0.0 for IPv4 (received from COREs) but there is <strong><red>no default ::/0 for IPv6</red></strong>. </p>
<p><a href="/uploads/quiz-5.png" title="Quiz-5"><img alt="quiz-5" src="/uploads/quiz-5.png" title="Quiz-5"/></a> </p>
<p>You perform various troubleshooting, including the commands below: </p>
<div class="row">
<pre class="col-md-11">Dist-2#<purple>sh ipv6 osp int br</purple>
Interface PID Area Intf ID Cost State Nbrs F/C
Vl102 1 0.0.0.0 30 1 DR 0/0
Lo0 1 192.168.1.0 27 1 LOOP 0/0
Vl201 1 192.168.1.0 35 1 DR 0/0
Vl200 1 192.168.1.0 34 1 DR 0/0
Vl105 1 192.168.1.0 33 1 DR 0/0
Vl104 1 192.168.1.0 32 1 DR 0/0
Vl103 1 192.168.1.0 31 1 DR 0/0
Vl101 1 192.168.1.0 29 1 DR 0/0
Vl100 1 192.168.1.0 28 1 DR 0/0
Fa0/1 1 192.168.1.0 5 1 DR 1/1
Fa0/0 1 192.168.1.0 4 1 DR 1/1
Dist-2#
Dist-2#<purple>sh ipv6 osp nei</purple>
Neighbor ID Pri State Dead Time Interface ID Interface
192.168.255.2 1 FULL/BDR 00:00:36 6 FastEthernet0/1
192.168.255.1 1 FULL/BDR 00:00:37 6 FastEthernet0/0
Dist-2#</pre>
</div>
<p><strong><em>Is there a problem with the design or did your junior colleague make a mistake in the implementation? What is the problem?</em></strong> </p>
<p><em>Post your solution in the ‘Comments’ section below and subscribe to this blog to get the solution and more interesting quizzes</em>.<br/>
<br/></p>NAT – Port Forwarding in Both Directions2013-01-26T00:00:00+00:00Costitag:costiser.ro,2013-01-26:2013/01/26/nat-port-forwarding-in-both-directions/<p><span class="dropcap">T</span>his post represents the solution and explanation for <a href="/2013/01/19/quiz-3/index.html">quiz-3</a>. Have a look at it to understand the problem. </p>
<p>NAT on Cisco IOS can be pretty frustrating sometimes ☻ and it may take time and practice to master it.<br/>
Let's consider <em>Company ABC</em> with internal network <strong>192.168.1.0/24</strong> and a /31 public IP (<strong>155.1.23.2</strong>) assigned to the external interface of its border router R2 (fa0/1). To provide internet access, PAT (Port Address Translation) is configured on the border router, overloading the entire internal network onto the IP address of the external interface. </p>
<p>The following tests are performed from R1 (192.168.1.1): </p>
<ul>
<li>"<code>ping 3.3.3.3</code>" - see the first translation on R2 in picture below</li>
<li>"<code>telnet 3.3.3.3</code>" - see the 2nd translation on R2 below</li>
<li>"<code>telnet 3.3.3.3 12345</code>" - see the 3rd translation</li>
</ul>
<p><a href="/uploads/quiz-3-simple-nat-overloading.png" title="Simple PAT - NAT Overloading"><img alt="quiz-3 simple nat overloading" src="/uploads/quiz-3-simple-nat-overloading.png" title="Simple PAT - NAT Overloading"/></a> </p>
<p>As the quiz says, when your users try to connect to external partner 3.3.3.3 on port 12345, the PAT translation occurs on R2 (see the translation table and the "<code>debug ip nat detailed</code>" output): </p>
<div class="row">
<pre>R1#<blue>telnet 3.3.3.3 12345</blue>
Trying 3.3.3.3, 12345 ...
<red>% Connection timed out; remote host not responding</red>
R2#
*Mar 1 00:44:02.151: NAT*: <red>i: tcp (192.168.1.1, 18714)</red> -> (3.3.3.3, 12345) [39406]
*Mar 1 00:44:02.151: NAT*: <red>s=192.168.1.1->155.1.23.2</red>, d=3.3.3.3 [39406]
R2#
*Mar 1 00:44:06.147: NAT*: i: tcp (192.168.1.1, 18714) -> (3.3.3.3, 12345) [39406]
*Mar 1 00:44:06.147: NAT*: s=192.168.1.1->155.1.23.2, d=3.3.3.3 [39406]
R2#
*Mar 1 00:44:14.143: NAT*: i: tcp (192.168.1.1, 18714) -> (3.3.3.3, 12345) [39406]
*Mar 1 00:44:14.143: NAT*: s=192.168.1.1->155.1.23.2, d=3.3.3.3 [39406]
R2#<blue>sh ip nat trans</blue>
Pro Inside global Inside local Outside local Outside global
tcp <red>155.1.23.2:18714 192.168.1.1:18714</red> 3.3.3.3:12345 3.3.3.3:12345</pre>
</div>
<p>but the ISP (R3) does not allow TCP port 12345 (seen also in the debug on R2: there's only small "i" and there's <strong>no "o"</strong>) </p>
<div class="row">
<pre>R3
*Mar 1 00:44:00.895: <red>%SEC-6-IPACCESSLOGP: list ACL_LOG denied</red> tcp 155.1.23.2(18714) -> <red>3.3.3.3(12345)</red>, 1 pack
</pre>
</div>
<p>Now comes the part where we make mistakes: we have to configure the outgoing <em>traffic towards partner server 3.3.3.3 on destination port 12345 to be translated to destination port 8080 (allowed by ISP)</em>. Most of us will use this logic: <br/>
&lt;&lt;<em><strong>outgoing traffic = traffic from inside to outside ... so I will use <code>ip nat inside source ...</code></strong></em>&gt;&gt; </p>
<p>which, unfortunately, is not doing what we want because <strong><code>ip nat inside source</code></strong> translates source IPs and ports for traffic received on inside interface, while in our case we want to translate destination port !! </p>
<p>My piece of advice: consider that you always translate sources (there are exceptions, but forget about them for now). Then, most of the time when you have to translate a destination IP or port, look at the problem from the opposite direction: </p>
<ul>
<li>if you have to change the destination IP and/or port when going from inside to outside, then use <code>ip nat outside source ...</code></li>
<li>if you have to change the destination IP and/or port when going from outside to inside, then use <code>ip nat inside source ...</code></li>
</ul>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
Please note that this logic "works" because static NAT is bi-directional ☺ !
</td></tr></table>
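<p>The rule of thumb above can be written down as a small lookup table. This is just a memory aid in Python; the function name <code>nat_statement</code> and the direction labels are made up for this example and are not IOS keywords.</p>

```python
def nat_statement(direction, field):
    """Pick the IOS NAT statement family for a translation.

    direction: 'in->out' or 'out->in' (direction of the first packet)
    field:     'source' or 'destination' (what has to be rewritten)
    """
    table = {
        # Source rewrites: the statement matches the side the traffic comes from.
        ("in->out", "source"): "ip nat inside source",
        ("out->in", "source"): "ip nat outside source",
        # Destination rewrites: look at the flow from the opposite direction.
        ("in->out", "destination"): "ip nat outside source",
        ("out->in", "destination"): "ip nat inside source",
    }
    return table[(direction, field)]

# The quiz: rewrite the destination port for traffic going from inside to outside.
print(nat_statement("in->out", "destination"))  # ip nat outside source
```

<p>The table only names the statement family - the full command still needs the addresses, ports and keywords that fit your scenario.</p>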
<p>So, in this quiz, you have to translate the destination port for traffic going from internal clients (on inside) to partner server (on outside). </p>
<h4 id="solution-for-the-quiz">Solution for the quiz</h4>
<p><a href="/uploads/quiz-3-nat-outgoing-port-forwarding.png" title="NAT - outgoing port forwarding"><img alt="quiz-3 nat outgoing port forwarding" src="/uploads/quiz-3-nat-outgoing-port-forwarding.png" title="NAT - outgoing port forwarding"/></a> </p>
<p>Again, we perform the same tests from R1: "<code>ping 3.3.3.3</code>", "<code>telnet 3.3.3.3</code>" and "<code>telnet 3.3.3.3 12345</code>". This time all of them are successful (notice the "Open" reply for the telnet on port 12345, meaning that the TCP session is established): </p>
<div class="row">
<pre>R1#<purple>telnet 3.3.3.3 12345</purple>
Trying 3.3.3.3, <green>12345 ... Open</green>
R2(config)#<purple>do sh ip nat transl</purple>
Pro Inside global Inside local Outside local Outside global
tcp --- --- 3.3.3.3:12345 3.3.3.3:8080
tcp 155.1.23.2:44060 192.168.1.1:44060 <green>3.3.3.3:12345 3.3.3.3:8080</green>
icmp 155.1.23.2:16 192.168.1.1:16 3.3.3.3:16 3.3.3.3:16
tcp 155.1.23.2:46177 192.168.1.1:46177 3.3.3.3:23 3.3.3.3:23
R2(config)#
R2(config)#
*Mar 1 00:59:43.327: NAT*: <red>i</red>: tcp (192.168.1.1, 44060) -> <green>(3.3.3.3, 12345)</green> [14693]
*Mar 1 00:59:43.327: NAT*: TCP s=44060, <green>d=12345->8080</green>
*Mar 1 00:59:43.327: NAT*: s=192.168.1.1->155.1.23.2, d=3.3.3.3 [14693]
*Mar 1 00:59:43.375: NAT*: <red>o</red>: <green>tcp (3.3.3.3, 8080)</green> -> (155.1.23.2, 44060) [42254]
*Mar 1 00:59:43.375: NAT*: TCP <green>s=8080->12345</green>, d=44060
*Mar 1 00:59:43.375: NAT*: s=3.3.3.3, d=155.1.23.2->192.168.1.1 [42254]
*Mar 1 00:59:43.403: NAT*: i: tcp (192.168.1.1, 44060) -> (3.3.3.3, 12345) [14694]
*Mar 1 00:59:43.407: NAT*: TCP s=44060, d=12345->8080
*Mar 1 00:59:43.407: NAT*: s=192.168.1.1->155.1.23.2, d=3.3.3.3 [14694]
R3#
*Mar 1 00:59:42.083: %SEC-6-IPACCESSLOGP: list <green>ACL_LOG permitted tcp 155.1.23.2(44060) -> 3.3.3.3(8080)</green>, 1 packet
R3#
*Mar 1 01:02:51.783: %SEC-6-IPACCESSLOGDP: list ACL_LOG permitted icmp 155.1.23.2 -> 3.3.3.3 (8/0), 1 packet
R3#
*Mar 1 01:03:30.791: %SEC-6-IPACCESSLOGP: list ACL_LOG permitted tcp 155.1.23.2(46177) -> 3.3.3.3(23), 1 packet</pre>
</div>
<p>Notice in the debug output that there is traffic/translations from both sides "i" and "o". </p>
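<p>What the debug output shows can be replayed with a toy model in Python. It is a sketch under simplifying assumptions: packets are plain tuples, the PAT keeps the original source port (as it happens to do in the outputs above) and the function names are invented for this example.</p>

```python
OUTSIDE_LOCAL = ("3.3.3.3", 12345)   # what the inside hosts dial
OUTSIDE_GLOBAL = ("3.3.3.3", 8080)   # what the partner server listens on
INSIDE_GLOBAL_IP = "155.1.23.2"      # external interface of R2 (PAT address)

def inside_to_outside(pkt, pat_table):
    """The 'i' lines: rewrite destination port (static entry), then source IP (PAT)."""
    src_ip, src_port, dst_ip, dst_port = pkt
    if (dst_ip, dst_port) == OUTSIDE_LOCAL:
        dst_ip, dst_port = OUTSIDE_GLOBAL
    pat_table[(INSIDE_GLOBAL_IP, src_port)] = (src_ip, src_port)
    return (INSIDE_GLOBAL_IP, src_port, dst_ip, dst_port)

def outside_to_inside(pkt, pat_table):
    """The 'o' lines: the same static entry used in reverse, then the PAT lookup."""
    src_ip, src_port, dst_ip, dst_port = pkt
    if (src_ip, src_port) == OUTSIDE_GLOBAL:
        src_ip, src_port = OUTSIDE_LOCAL
    dst_ip, dst_port = pat_table[(dst_ip, dst_port)]
    return (src_ip, src_port, dst_ip, dst_port)

pat = {}
sent = inside_to_outside(("192.168.1.1", 44060, "3.3.3.3", 12345), pat)
reply = outside_to_inside(("3.3.3.3", 8080, "155.1.23.2", 44060), pat)
print(sent)   # ('155.1.23.2', 44060, '3.3.3.3', 8080)
print(reply)  # ('3.3.3.3', 12345, '192.168.1.1', 44060)
```

<p>From the point of view of R1, the server still answers from port 12345 - this is the bi-directional behaviour of the static entry mentioned in the note above.</p>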
<h4 id="what-if-you-have-used-ip-nat-inside-source-instead">What if you had used <code>ip nat inside source</code> instead?</h4>
<p>Let's simulate this. This type of translation is used when you have servers in the DMZ within private addressing space and you want to make them reachable from the internet.<br/>
<em>For example</em>: you accept HTTP traffic on the router's external IP address (155.1.23.2 port 80) and translate both destination IP and port to the real/private IP and port of the DMZ server: </p>
<p><a href="/uploads/quiz-3-nat-incoming-port-forwarding.png" title="NAT - Incoming port forwarding"><img alt="quiz-3 nat incoming port forwarding" src="/uploads/quiz-3-nat-incoming-port-forwarding.png" title="NAT - Incoming port forwarding"/></a> </p>
<p>Let's generate some traffic from Internet (R3 = ISP) towards the DMZ server: </p>
<div class="row">
<pre>R3#<purple>telnet 155.1.23.2 80</purple>
Trying <green>155.1.23.2, 80 ... Open</green>
R2#<purple>sh ip nat trans</purple>
Pro Inside global Inside local Outside local Outside global
tcp --- --- 3.3.3.3:12345 3.3.3.3:8080
<green>tcp 155.1.23.2:80 192.168.1.33:8080</green> 155.1.23.3:21841 155.1.23.3:21841
tcp 155.1.23.2:80 192.168.1.33:8080 --- ---
*Mar 1 01:18:32.411: NAT*: o: tcp (155.1.23.3, 21841) -> <green>(155.1.23.2, 80)</green> [60062]
*Mar 1 01:18:32.411: NAT*: TCP s=21841, <green>d=80->8080</green>
*Mar 1 01:18:32.411: NAT*: s=155.1.23.3, <green>d=155.1.23.2->192.168.1.33</green> [60062]
*Mar 1 01:18:32.447: NAT*: i: tcp (192.168.1.33, 8080) -> (155.1.23.3, 21841) [49]
*Mar 1 01:18:32.447: NAT*: TCP s=8080->80, d=21841
*Mar 1 01:18:32.447: NAT*: s=192.168.1.33->155.1.23.2, d=155.1.23.3 [49]</pre>
</div>
<p>In the end, here is the output of "<strong>show ip nat trans</strong>" on the border router when all these kinds of traffic are generated. <em>Take it as a practice exercise and try to identify what each translation is doing:</em> </p>
<div class="row">
<pre class="col-md-10">R2#<purple>sh ip nat trans</purple>
Pro Inside global Inside local Outside local Outside global
tcp --- --- 3.3.3.3:12345 3.3.3.3:8080
icmp 155.1.23.2:17 192.168.1.1:17 3.3.3.3:17 3.3.3.3:17
tcp 155.1.23.2:59833 192.168.1.1:59833 3.3.3.3:12345 3.3.3.3:8080
tcp 155.1.23.2:80 192.168.1.33:8080 155.1.23.3:21841 155.1.23.3:21841
tcp 155.1.23.2:80 192.168.1.33:8080 --- ---</pre>
</div>
<p><br><br/></br></p>Quiz #4 – BGP over ISP2013-01-22T00:00:00+00:00Costitag:costiser.ro,2013-01-22:2013/01/22/quiz-4/<p><span class="dropcap">Y</span>our company has several offices and each of them has a separate internet connection. The default route for each office points towards the ISP. Also, within each office you run iBGP using private AS numbers.<br/>
In the below diagram, you have Office-1 running AS 65100 and Office-2 running AS 65200. </p>
<p>You have been asked to configure an eBGP peering between Office-1 and Office-2. You did that by establishing a BGP session between the loopbacks of the two border routers (as shown below). </p>
<p><a href="/uploads/quiz-4.png" title="Quiz #4"><img alt="quiz-4" src="/uploads/quiz-4.png"/></a> </p>
<p>Soon, you notice that the new eBGP peering does not come up. You perform the following troubleshooting: </p>
<div class="row"><pre>Office-1#<purple>sh ip bgp summ</purple>
BGP router identifier 1.1.1.1, local AS number 65100
BGP table version is 1, main routing table version 1
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd
<blue>2.2.2.2 4 65200 0 0 0 0 0 never </blue><red>Active</red>
Office-1#
Office-1#
Office-1#<purple>ping 2.2.2.2 source lo0</purple>
Type escape sequence to abort.
Sending 5, 100-byte ICMP Echos to 2.2.2.2, timeout is 2 seconds:
Packet sent with a source address of 1.1.1.1
!!!!!
Success rate is 100 percent (5/5), round-trip min/avg/max = 1/16/56 ms
Office-1#
Office-1#
Office-1#
Office-1#<purple>traceroute 2.2.2.2 source lo0</purple>
Type escape sequence to abort.
Tracing the route to 2.2.2.2
1 155.1.1.2 4 msec 4 msec 16 msec
2 155.2.2.1 32 msec * 52 msec
Office-1#
Office-1#
Office-1#
Office-1#<purple>sh ip cef 2.2.2.2</purple>
0.0.0.0/0, version 9, epoch 0, cached adjacency to Serial0/0
0 packets, 0 bytes
via 155.1.1.2, 0 dependencies, recursive
next hop 155.1.1.2, Serial0/0 via 155.1.1.0/30
valid cached adjacency</pre>
</div>
<p>As you can see, the routers can ping each other's loopback, but the BGP session does not come up.<br/>
<em>Note that Office-2 Border Router displays similar output</em> </p>
<p><strong><em>What is the problem ?</em></strong> </p>
<p><em>Post your solution in the 'Comments' section below and subscribe to this blog to get the solution and more interesting quizzes.</em> </p>
<p><br/></p>Quiz #3 – NAT port redirection from inside to outside2013-01-19T00:00:00+00:00Costitag:costiser.ro,2013-01-19:2013/01/19/quiz-3/<p><span class="dropcap">A</span>s a network administrator of <em>Company ABC</em>, you've been requested to configure the internal clients (192.168.1.0/24) to connect to a <em><strong>partner server 3.3.3.3 on port 12345</strong></em>. Unfortunately, you discover that your ISP (R3) blocks traffic on that particular TCP port 12345 and allows only some well-known ports, including 8080. </p>
<p>After several discussions with the Partner Company, they have agreed to configure their server 3.3.3.3 to listen on port 8080 for this purpose. </p>
<p>Now, it appears there's another problem: behind R1, the applications are already configured to use port 12345 and it cannot be changed. As a result, you've been requested to configure <em><strong>port redirection</strong></em> on the border router (R2) for all traffic going to server 3.3.3.3 on port 12345 to be redirected to port 8080. </p>
<p>On the border router (R2) you already have NAT that is hiding the entire internal network behind the external interface of your router, as you can see below: </p>
<p><img alt="quiz-3" src="/uploads/quiz-3.png" title="Quiz #3"/><br/>
What command(s) do you need on R2 so that traffic originating from your internal network and going to external server 3.3.3.3 has its destination port translated from 12345 to 8080? </p>
<p><em>Post your solution in the comments section below and subscribe to this blog to get the solution and more interesting quizzes.</em> </p>
<p><br/></p>OSPF – Redistribution between different ospf processes2013-01-10T00:00:00+00:00Costitag:costiser.ro,2013-01-10:2013/01/10/redistribute-different-ospf-processes/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2012/12/30/quiz-2/index.html">quiz-2</a>.<br/>
Have a look at <a href="/2012/12/30/quiz-2/">the quiz</a> and try solving it before reading this post. </p>
<p>I will start this article with one of the most important rules in OSPF regarding best path selection: for the same prefix, OSPF prefers routes in the following order: <purple><strong><em>O (intra-area) vs. O IA (inter-area) vs. O E1 (external type-1) vs. O E2 (external type-2)</em></strong></purple> </p>
<p>But in the above scenario, in regards to <code>net A (33.33.33.1/24 = Lo1 on R3)</code>, this rule is not very useful on R1 (or R2). Why?<br/>
Because <red><b><u>the routes do not belong to the same OSPF process</u></b></red>. Let's see: R1 has 2 paths for Net A: </p>
<ul>
<li><strong>net A (33.33.33.1/32) <span style="text-decoration: underline;">from R3</span> as <purple>intra-area (O) in OSPF process-id 1</purple></strong></li>
<li><strong>net A (33.33.33.1/32) <span style="text-decoration: underline;">from R2</span> as <purple>external (O E2) in OSPF process-id 2</purple></strong></li>
</ul>
<p><a href="/uploads/quiz-2_ospf.png" title="OSPF Redistribution"><img alt="OSPF Redistribution" src="/uploads/quiz-2_ospf.png" title="OSPF Redistribution"/></a></p>
<p>Since the process-id is different, R1 cannot apply the above-mentioned rule, so in this case it treats them as coming from 2 different routing protocols. What happens when the router receives the same prefix from 2 different routing protocols? It <em>compares the Administrative Distance</em>. </p>
<p>But in our case, the <strong>AD is the same (110) for both internal OSPF Pid 1 and external OSPF Pid 2</strong>... so, it comes down to a race condition: whichever route is learned first is inserted into the routing table. </p>
<p>Now, the above process also occurs on R2, not only on R1. So, what happens if the OSPF database exchange with R3 takes longer on R1 than on R2? R1 learns Net A as an external route from R2 first and installs that one. The result is suboptimal routing on R1 (or on R2, in the opposite case), seen here: </p>
<div class="row"><pre class="col-sm-8">R1#trace 33.33.33.1
Type escape sequence to abort.
Tracing the route to 33.33.33.1
1 172.16.1.2 20 msec 44 msec 16 msec
2 192.168.1.3 40 msec 56 msec 20 msec</pre></div>
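<p>The race condition described above can be reproduced with a toy routing table in Python. It is a sketch of the behaviour as explained in this post (lower AD wins, on a tie the already installed route stays), not of the real IOS code; the class <code>Rib</code> and the helper <code>winner</code> are names invented for this example.</p>

```python
class Rib:
    """Toy routing table: lower AD wins; on an AD tie the installed route stays."""

    def __init__(self):
        self.routes = {}

    def offer(self, prefix, ad, source):
        current = self.routes.get(prefix)
        # Replace only when nothing is installed yet or the new AD is strictly better.
        if current is None or current[0] > ad:
            self.routes[prefix] = (ad, source)

def winner(arrival_order):
    """Feed the candidates for Net A in the given order and return the installed one."""
    rib = Rib()
    for ad, source in arrival_order:
        rib.offer("33.33.33.1/32", ad, source)
    return rib.routes["33.33.33.1/32"][1]

intra = (110, "OSPF 1 intra-area via R3")
external = (110, "OSPF 2 external via R2")

print(winner([intra, external]))  # OSPF 1 intra-area via R3
print(winner([external, intra]))  # OSPF 2 external via R2
```

<p>The second call is the situation from the traceroute above: the external copy arrived first, so the direct intra-area path is never installed.</p>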
<p>The problem becomes worse if the source of Net A (33.33.33.1/32), R3, loses its OSPF peering with R1/R2, as this will cause a routing loop for Net A (shutdown fa0/0 on R3, as shown in the <a href="/2012/12/30/quiz-2/" title="Quiz-2">quiz</a>): </p>
<div class="row"><pre class="col-sm-8">R4#traceroute 33.33.33.1
Type escape sequence to abort.
Tracing the route to 33.33.33.1
1 172.16.1.2 68 msec 20 msec 20 msec
2 192.168.1.1 36 msec 48 msec 32 msec
3 172.16.1.2 52 msec 32 msec 20 msec
4 192.168.1.1 20 msec 56 msec 52 msec
5 172.16.1.2 40 msec 60 msec 52 msec
6 192.168.1.1 108 msec 44 msec 68 msec
7 172.16.1.2 68 msec 96 msec 68 msec <black>...and so on...</black></pre>
</div>
<p>How could we solve this? Well, remember where the problem started in the first place: comparing the AD of internal OSPF Pid-1 with the AD of external OSPF Pid-2. There are several options to solve this problem, but let's start by configuring both OSPF processes to have a higher AD for external routes: </p>
<div class="row">
<div class="col-md-8">
<pre>router ospf 1
redistribute ospf 2 subnets
<purple>distance ospf external 222</purple>
!
router ospf 2
redistribute ospf 1 subnets
<purple>distance ospf external 222</purple></pre>
</div>
<div class="col-md-4">
<img alt="OSPF Redistribution" src="/uploads/ospf-redistribution-same-external-AD.png" title="OSPF Redistribution">
</img></div>
</div>
<p>This solves the initial problem, but introduces the same situation for any external prefixes learned via either OSPF Pid 1 or OSPF Pid 2 => those prefixes may be reached via a suboptimal path or may loop indefinitely (see in the above picture the External Net X learned via OSPF Pid 1) </p>
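<p>A short Python sketch shows both effects of raising the external AD: the intra-area route now wins regardless of arrival order, while two external copies of the same prefix tie again. The helper <code>installed</code> and the route labels are hypothetical; the AD values are the ones from the configuration above.</p>

```python
def installed(candidates):
    """candidates: list of (ad, label) in arrival order.

    The lowest AD wins; min() returns the earliest entry on a tie,
    which models 'whichever route is learned first stays installed'.
    """
    return min(candidates, key=lambda c: c[0])[1]

INTERNAL_AD, EXTERNAL_AD = 110, 222  # after 'distance ospf external 222'

# Net A: intra-area in Pid 1, external in Pid 2 -> order no longer matters.
net_a = [(EXTERNAL_AD, "Net A via Pid 2 (external)"),
         (INTERNAL_AD, "Net A via Pid 1 (intra-area)")]
print(installed(net_a))        # Net A via Pid 1 (intra-area)
print(installed(net_a[::-1]))  # Net A via Pid 1 (intra-area)

# Net X: external in both processes -> same AD, so the race is back.
net_x = [(EXTERNAL_AD, "Net X via Pid 2 (redistributed)"),
         (EXTERNAL_AD, "Net X via Pid 1 (original external)")]
print(installed(net_x))        # Net X via Pid 2 (redistributed)
print(installed(net_x[::-1]))  # Net X via Pid 1 (original external)
```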
<p>The solution to the new problem could be to configure different AD for the OSPF processes: </p>
<div class="row">
<div class="col-md-8">
<pre>router ospf 1
redistribute ospf 2 subnets
<purple>distance ospf external 199</purple>
!
router ospf 2
redistribute ospf 1 subnets
<purple>distance ospf external 222</purple></pre>
</div>
<div class="col-md-4">
<img alt="OSPF Redistribution - external AD" src="/uploads/ospf-redistribution-same-different-ext-AD.png" title="OSPF Redistribution - external AD">
</img></div>
</div>
<p>This will solve the problems for any external prefix (see Net X) learned via OSPF Pid-1, but will <em><strong>not</strong></em> solve the problem for any external prefix (see <red>Net Y</red> above) learned via OSPF Pid-2... so we can't find a general solution for all external prefixes learned via either OSPF Pid-1 or via OSPF Pid-2. </p>
<p>In the end, the solution will be to <strong><em>combine setting the administrative distance with a filtering method</em></strong>. You have to match the prefixes from one domain (with an ACL or a tag) and keep them from being re-advertised into the domain they originated from.<br/>
Let's see the solutions in detail: </p>
<h3 id="solution-1-different-ad-filtering-based-on-acls">Solution 1: different AD + filtering based on ACLs</h3>
<p>Using the above configuration, you configure ACLs matching all prefixes originated or learned in each OSPF process: </p>
<div class="row"><pre class="col-md-10">
router ospf 1
redistribute ospf 2 subnets route-map OSPF_DOMAIN_2
 distance ospf external 199
!
router ospf 2
redistribute ospf 1 subnets route-map OSPF_DOMAIN_1
distance ospf external 222
!
route-map OSPF_DOMAIN_2 permit 10
match ip address 2
!
route-map OSPF_DOMAIN_1 permit 10
match ip address 1
!
access-list 1 permit 33.33.33.1
access-list 1 permit &lt;net X&gt; <black># all routes from OSPF Pid-1 (including external ones)</black>
!
access-list 2 permit &lt;net Y&gt; <black># all routes from OSPF Pid-2 (including external ones)</black></pre>
</div>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li><b><i>Pro</i></b>: it solves most of the problems
<li><b><i>Con</i></b>: you have to maintain long ACLs with the prefixes from each domain
</li></li></ul>
</td></tr>
</table>
<p><strong><em>Variant solution 1.1:</em></strong> instead of setting the AD for all external prefixes, you may opt to set the AD only for the prefixes identified by an ACL, with the command<br/>
<blue><code>distance 222 0.0.0.0 255.255.255.255 &lt;acl&gt;</code></blue></p>
<h3 id="solution-2-filtering-based-on-route-tags">Solution 2: filtering based on route tags</h3>
<p>During the redistribution, you can set tags so all <blue>the prefixes that belong to the same routing domain will share the same tag</blue>. Later, on the remote routers (second redistribution point), you will filter those prefixes based on tags. </p>
<p>Depending on whether the redistributed routing domains have the same AD (as in our case) or different (let's say: OSPF and EIGRP), you may need to perform filtering <em>while installing/accepting the tagged prefixes in the routing table</em> OR you may perform the filtering of the tagged prefixes <em>while performing a new redistribution</em> (back into the original domain). </p>
<p>Filtering the tagged routes while installing them into the routing table works in all situations because if a route is <strong>not</strong> installed in the routing table, it will also <strong>not</strong> be redistributed back into the original domain, hence you will not create a routing loop. </p>
<div class="row">
<div class="col-md-7">
<pre>router ospf 1
redistribute ospf 2 subnets tag 2
distribute-list route-map DENY_TAG_2 in
!
route-map DENY_TAG_2 deny 10
match tag 2
route-map DENY_TAG_2 permit 20
!
!
router ospf 2
redistribute ospf 1 subnets tag 1
distribute-list route-map DENY_TAG_1 in
!
route-map DENY_TAG_1 deny 10
match tag 1
route-map DENY_TAG_1 permit 20</pre>
</div>
<div class="col-md-5">
<img alt="OSPF Redistribution with tags" src="/uploads/ospf-redistribution-using-tags.png" title="OSPF Redistribution with tags">
</img></div>
</div>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li><b><i>Pro</i></b>: less administrative overhead (you don't have to maintain ACLs/prefix-list for every route that is learned)
<li><b><i>Con</i></b>: R1 and R2 do not accept each other's prefixes from the different routing domain (OSPF processes) due to the distribute-list
</li></li></ul>
</td></tr>
</table>
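<p>The tag logic of Solution 2 can be followed step by step with a few lines of Python. This is a simplified sketch: routes are plain dictionaries, <code>redistribute</code> and <code>distribute_list_in</code> are helper names invented here, and "Net A" / "Net Y" stand for prefixes native to OSPF Pid 1 and Pid 2.</p>

```python
def redistribute(routes, tag):
    """Redistribution point: copy routes into the other process, stamping a tag."""
    return [dict(route, tag=tag) for route in routes]

def distribute_list_in(routes, denied_tag):
    """Model of 'deny 10 / match tag n' followed by an empty 'permit 20'."""
    return [route for route in routes if route.get("tag") != denied_tag]

pid1_native = [{"prefix": "Net A"}]
pid2_native = [{"prefix": "Net Y"}]

# R1: 'redistribute ospf 2 subnets tag 2' under 'router ospf 1'
pid1_database = pid1_native + redistribute(pid2_native, tag=2)

# R2: 'distribute-list route-map DENY_TAG_2 in' under 'router ospf 1'
r2_installed_from_pid1 = distribute_list_in(pid1_database, denied_tag=2)

# R2: 'redistribute ospf 1 subnets tag 1' - only installed routes are candidates
r2_into_pid2 = redistribute(r2_installed_from_pid1, tag=1)

print([route["prefix"] for route in r2_installed_from_pid1])  # ['Net A']
print([route["prefix"] for route in r2_into_pid2])            # ['Net A']
```

<p>Net Y never makes it back into Pid 2 through R2, which is exactly the loop prevention, but it also means that R2 ignores the copy of Net Y that R1 advertised (the Con listed above).</p>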
<h4 id="variant-solution-21">Variant solution 2.1</h4>
<p>A variant solution is to perform the filtering during the redistribution (<purple><code>redistribute ospf &lt;pid&gt; subnets route-map DENY_TAG_X</code></purple>) instead of using distribute-list.<br/>
<ul>
<li><b><i>Pro</i></b>: R1 / R2 will accept each other's prefixes (see above Cons)
<li><b><i>Con</i></b>: suboptimal routing for R1/R2 because they accept each other's routes
</li></li></ul></p>
<h3 id="solution-3-filter-redistributed-prefixes-with-a-ad-of-255_1">Solution 3: Filter redistributed prefixes with an AD of 255</h3>
<p>In the previous solution, the redistribution routers were filtering the redistributed prefixes based on tags. This time, we can do the same filtering by setting the administrative distance to 255 - so, we'll set up the following rule: on all redistribution points/routers (in our case, R1 &amp; R2), do <strong>not</strong> accept any prefix that is learned from the other/redundant redistribution routers: </p>
<div class="row">
<div class="col-md-6">
<pre>R1
router ospf 1
redistribute ospf 2 subnet
distance 255 2.2.2.123 0.0.0.0 ACL_DOMAIN_2
!
ip access-list stand ACL_DOMAIN_2 <black>... # prefixes learned via Pid-2</black>
!
router ospf 2
redistribute ospf 1 subnet
distance 255 2.2.2.124 0.0.0.0 ACL_DOMAIN_1
!
ip access-list stand ACL_DOMAIN_1 <black>... # prefixes learned via Pid-1</black></pre>
</div>
<div class="col-md-6">
<pre>R2
router ospf 1
redistribute ospf 2 subnet
distance 255 1.1.1.123 0.0.0.0 ACL_DOMAIN_2
!
ip access-list stand ACL_DOMAIN_2 <black>... # prefixes learned via Pid-2</black>
!
router ospf 2
redistribute ospf 1 subnet
distance 255 1.1.1.124 0.0.0.0 ACL_DOMAIN_1
!
ip access-list stand ACL_DOMAIN_1 <black>... # prefixes learned via Pid-1</black></pre>
</div>
</div>
<p>Again, this solution has drawbacks: you have to maintain ACLs for each routing domain and <red><b>R1/R2 don't back each other up</b></red> (because they don't accept each other's prefixes). </p>
<p>As always, posts about OSPF are veeery long, but hopefully useful.<br/>
Thanks for your comments on the <a href="/2012/12/30/quiz-2/">initial quiz</a>. </p>
<p><br/></p>MSTP – Inside a Region2013-01-06T00:00:00+00:00Costitag:costiser.ro,2013-01-06:2013/01/06/mstp-inside-a-region/<p><span class="dropcap-bg">T</span>his post represents the solution and explanation for <a href="/2012/12/27/quiz-1/index.html">quiz-1</a>.<br/>
Have a look at <a href="/2012/12/27/quiz-1/">the quiz</a> and try solving it before reading this post.</p>
<p><br>
Before jumping to the solution, let's recap some of the features of MSTP and I will start with the history of Spanning Tree Protocol:<br/>
<a href="/uploads/stp-history.png" title="STP History"><img alt="STP history" src="/uploads/stp-history.png" title="STP History"/></a></br></p>
<p>Multiple Spanning Tree (MST) is an IEEE standard derived from the Cisco proprietary Multiple Instances Spanning Tree Protocol (MISTP) implementation. The key things that you have to remember about 802.1s MSTP are here:<br/>
<a href="/uploads/dot1s-mstp.png" title="Understanding MSTP"><img alt="Understanding MSTP" src="/uploads/dot1s-mstp.png" title="Understanding MSTP"/></a></p>
<ul>
<li>it sends BPDU only for the IST (Internal Spanning Tree)</li>
<li>MST Instances (MSTi) info is piggybacked as special M-records</li>
<li>in order to be in the same region, switches need to share <red>all of these</red>:<ul>
<li><code>configuration name</code></li>
<li><code>revision number</code></li>
<li><code>vlan-to-instance mapping</code></li>
</ul>
</li>
<li>instances:<ul>
<li><strong>IST (Internal Spanning Tree)</strong> = special instance, also called MSTI0/MST 0, that extends the Common Spanning Tree (CST) inside the MST region and represents the entire region as a virtual CST bridge to the outside world</li>
<li>up to 15 <strong>MST Instances (MSTIs)</strong> = RSTP instances that exist only within a region</li>
</ul>
</li>
<li>CIST Regional Root can only be a <em>boundary switch</em> (the one with the lowest cost to the CIST Root) </li>
</ul>
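<p>The region membership rule from the list above can be sketched in Python as a comparison of the three attributes. This is a simplification (real switches compare a digest of the vlan-to-instance table carried in the BPDUs, not the table itself), and the region name <code>CAMPUS</code>, the revision and the instance number are made-up values for this example.</p>

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MstConfig:
    name: str
    revision: int
    vlan_to_instance: tuple  # sorted (vlan, instance) pairs; unmapped VLANs stay in the IST

def make_config(name, revision, mapping):
    """Build a comparable MST configuration from a {vlan: instance} dictionary."""
    return MstConfig(name, revision, tuple(sorted(mapping.items())))

def same_region(a, b):
    """Two switches are in the same region only if all three attributes match."""
    return a == b

core1 = make_config("CAMPUS", 1, {vlan: 1 for vlan in range(100, 200)})
core2 = make_config("CAMPUS", 1, {vlan: 1 for vlan in range(100, 200)})
dist1 = make_config("CAMPUS", 1, {vlan: 1 for vlan in range(100, 199)})  # VLAN 199 left in the IST

print(same_region(core1, core2))  # True
print(same_region(core1, dist1))  # False
```

<p>A single VLAN mapped differently is enough to put a switch into a different region, even when the name and the revision number are identical.</p>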
<p><br>
<strong><green>Quiz Solution</green></strong></br></p>
<p>In quiz #1, the MSTP implementation does not take into account the following fact: <blue><strong>IST/MST0 instance is active on all ports inside a region</strong></blue> (this is a consequence of the fact that there are no individual BPDUs for each instance but, on the contrary, the BPDUs are sent only for IST while MSTI info is piggybacked into those BPDUs).<br/>
You can see this by running the command "<code>show span mst 0</code>" on CORE-2:</p>
<div class="row"><pre class="col-sm-9">CORE-2#sh span mst 0
##### MST0 vlans mapped: 1-99,200-4094
[...]
Interface Role Sts Cost Prio.Nbr Type
---------------- ---- --- --------- -------- --------------------------------
<red>Gi0/13 Root FWD 20000 128.13 P2p</red>
Gi0/14 Altn BLK 20000 128.14 P2p
Gi0/16 Desg FWD 20000 128.16 P2p
Gi0/19 Desg FWD 20000 128.19 P2p </pre>
</div>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
Although vlan 11 (part of IST/MST0) is not allowed on the trunk on Gi0/13 (only vlans 100-199 are allowed on it), IST does not care about this - <strong>IST just runs on all ports in the region !</strong>
</div>
</div>
<p>The best command that shows the problem is "<code>show span vlan 11</code>" on CORE-2:</p>
<div class="row"><pre class="col-sm-9">CORE-2#sh span vlan 11
MST0
[...]
Interface Role Sts Cost Prio.Nbr Type
------------------- ---- --- --------- -------- --------------------------------
Gi0/14 Altn BLK 20000 128.14 P2p
Gi0/16 Desg FWD 20000 128.16 P2p
Gi0/19 Desg FWD 20000 128.19 P2p </pre>
</div>
<p><red><em>Oooh, there's no ROOT port !!</em></red><br/>
Since Gi0/13 (the root port for MST0) does not allow vlan 11 on its trunk, it does not appear in the above output => this is the best evidence of the problem: the only inter-core link that carries vlan 11 (Gi0/14) is blocked by MST0, while the forwarding link (Gi0/13) does not carry vlan 11 at all.</p>
<p>From the design point of view, when choosing MSTP for your network, <strong><em>you must remember the rules below:</em></strong></p>
<ul>
<li><red>map all vlans that you use in your network to some instances (don't leave them in IST)</red></li>
<li><red>vlans mapped to IST must be allowed on all links</red></li>
<li><red>don't manually prune individual vlans off a trunk - if you want to do this, then remove all vlans mapped to the same MST instance (not for IST - see above rule)</red></li>
</ul>
<p><strong><em>The solutions</em></strong> for this problem are (in order of preference):</p>
<ul>
<li>create another instance and map vlan 11 (plus other vlans used in the network) to it (let's say MST 2)</li>
<li>allow all vlans on Gi0/13 trunk</li>
<li>configure cost or port-priority for MST0 to force Gi0/14 to be ROOT port for MST0</li>
</ul>
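<p>As a rough sketch of the first and third options (instance number 2 follows the example above, while the cost value is only an assumption that needs to be higher than the 20000 seen on Gi0/14), the configuration could look like this. Keep in mind that the vlan-to-instance mapping is part of the region definition, so it has to be identical on every switch in the region:</p>
<div class="row"><pre class="col-sm-9">! option 1 - on all switches in the MST region
spanning-tree mst configuration
 instance 2 vlan 11
!
! option 3 - on CORE-2 only (cost value is an assumption)
interface GigabitEthernet0/13
 spanning-tree mst 0 cost 200000
!
! verification on CORE-2
CORE-2#show spanning-tree mst 2
CORE-2#show spanning-tree vlan 11</pre>
</div>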
<p>Thanks everyone for your interest and comments in the <a href="/2012/12/27/quiz-1/">quiz.</a>
<br><br/></br></p>Quiz #2 – OSPF Redistribution between different processes2012-12-30T00:00:00+00:00Costitag:costiser.ro,2012-12-30:2012/12/30/quiz-2/<p><span class="dropcap">R</span>edistribution between different or same routing protocols can become a nasty thing when there are 2 or more redistribution points. </p>
<p>In this quiz, I present the following scenario with redistribution between 2 different OSPF processes on 2 routers (R1 and R2) - see the diagram and configuration below:<br/>
<br><br/></br></p>
<p><a href="/uploads/quiz-2_ospf.png" title="quiz-2 OSPF"><img alt="quiz-2_ospf" src="/uploads/quiz-2_ospf.png" title="quiz-2 OSPF"/></a> </p>
<p><br>
Although, at first glance, everything looks ok, this scenario poses some problems. Try to think of the problems from the perspective of <code>Net A (33.33.33.1/24)</code>.<br/>
Then reveal the hints below to understand when the problem will come to the surface: </br></p>
<div class="panel panel-default">
<div class="panel-heading panel-title">
<a class="accordion-toggle" data-parent="#accordion-1" data-toggle="collapse" href="#collapse-One">
Show me the first hint !
</a>
</div>
<div class="panel-collapse collapse" id="collapse-One">
<div class="panel-body">
<p>At some moment in time you'll notice that R1 is reaching Net A via the bottom OSPF Area:
<div class="row"><pre class="col-sm-9">R1#trace 33.33.33.1
Type escape sequence to abort.
Tracing the route to 33.33.33.1
1 172.16.1.2 20 msec 44 msec 16 msec
2 192.168.1.3 40 msec 56 msec 20 msec</pre></div>
<p><red><i>Why does R1 prefer R2 to reach Net A (lo1 on R3)?</i></red></p></p></div>
</div></div>
<div class="panel panel-default">
<div class="panel-heading panel-title">
<a class="accordion-toggle" data-parent="#accordion-1" data-toggle="collapse" href="#collapse-Two">
Show me the second hint !
</a>
</div>
<div class="panel-collapse collapse" id="collapse-Two">
<div class="panel-body">
<p>The cable connected to Fa0/0 on R3 gets unplugged. At this moment there is a <red><i>routing loop in your network</i></red>:
<div class="row"><pre class="col-sm-9">R4#traceroute 33.33.33.1
Type escape sequence to abort.
Tracing the route to 33.33.33.1
1 172.16.1.2 68 msec 20 msec 20 msec
2 192.168.1.1 36 msec 48 msec 32 msec
3 172.16.1.2 52 msec 32 msec 20 msec
4 192.168.1.1 20 msec 56 msec 52 msec
5 172.16.1.2 40 msec 60 msec 52 msec
6 192.168.1.1 108 msec 44 msec 68 msec
7 172.16.1.2 68 msec 96 msec 68 msec <black>...and so on...</black></pre></div>
<p></p></p></div>
</div></div>
<p><strong>How would you solve this problem?</strong> </p>
<p>Leave a comment with your opinion and subscribe to email newsletter to get the latest quizzes and the solutions to each of them.<br/>
<br/></p>Quiz #1 – MSTP2012-12-27T00:00:00+00:00Costitag:costiser.ro,2012-12-27:2012/12/27/quiz-1/<p><span class="dropcap">H</span>ere I come with my first quiz about a (rather common) MSTP misconfiguration that does not look very obvious on the first encounter.</p>
<p>This is my first quiz and I chose an MSTP misconfiguration scenario that could be easily overlooked on the first encounter.</p>
<p>You have a typical network with 2 COREs and 2 DIST switches running Multiple Spanning Tree Protocol. The server team requested that the servers in vlans 100 to 199 be treated as <em>privileged</em>. You implemented MSTP in the following way:</p>
<ul>
<li>created a dedicated instance in MSTP (vlans 100-199 mapped to MST 1) and made CORE-2 its ROOT</li>
<li>left all other vlans in MST 0 with CORE-1 as ROOT</li>
<li>first trunk between CORES (core-1:Gi0/13 <> core-2:Gi0/13) was dedicated only to vlans 100-199 (<code>switchport trunk allowed vlan 100-199</code>)</li>
<li>second trunk between CORES (core-1:Gi0/14 <> core-2:Gi0/14) allows all vlans
<br/></li>
</ul>
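<p>For reference, the implementation described above could be sketched roughly as follows. The region name and revision number are assumptions (they are not given in the quiz); the instance mapping, the root placement and the allowed vlan list come from the description:</p>
<div class="row"><pre class="col-sm-9">! all four switches (name and revision are assumptions)
spanning-tree mode mst
spanning-tree mst configuration
 name REGION1
 revision 1
 instance 1 vlan 100-199
!
! CORE-1
spanning-tree mst 0 root primary
!
! CORE-2
spanning-tree mst 1 root primary
!
! CORE-1 and CORE-2, first trunk between the COREs
interface GigabitEthernet0/13
 switchport trunk allowed vlan 100-199</pre>
</div>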
<p><a href="/uploads/quiz-1_mstp.png" title="quiz-1_mstp"><img alt="quiz-1_mstp" src="/uploads/quiz-1_mstp.png" title="quiz-1_mstp"/></a></p>
<p>After your implementation, server team confirmed that all communications are ok (in both privileged vlans 100-199 and also in all other existing vlans). <br/>
<red>What's wrong with this design?</red> <br/>
<br/></p>
<div class="panel panel-default">
<div class="panel-heading panel-title">
<a class="accordion-toggle" data-parent="#accordion-1" data-toggle="collapse" href="#collapse-Two">
Give me a hint about the problem !
</a>
</div>
<div class="panel-collapse collapse" id="collapse-Two">
<div class="panel-body">
<p>All good, everybody's happy ... until a cable got disconnected on DIST-2: the fiber between <code>DIST-2:Gi0/13 and CORE-1:Gi0/19</code>.
<br>No problem, you say, DIST-2 still has one uplink to CORE-2 ... but very soon you receive <red>complaints from admins that the 2 servers in vlan 11 are no longer able to communicate with each other</red>.
</br></p>
</div>
</div>
</div>
<p>What do you think the problem is? What would be the best commands to troubleshoot this issue and on what device(s)?
<br><br/></br></p>Auto-RP and usage of "ip pim autorp listener"2012-05-27T00:00:00+01:00Costitag:costiser.ro,2012-05-27:2012/05/27/auto-rp-and-usage-of-ip-pim-autorp-listener/<p><span class="dropcap-bg">A</span>fter reading several forums with people arguing about what routers should be configured with the "<code>ip pim autorp listener</code>" command, I have decided to write this post explaining things in my own way (of course, with a lot of pictures, as usual).<br/>
Before starting, just a few words about Auto-RP: it's a legacy Cisco proprietary method for selecting the RP that uses two functional roles: the Candidate RPs (C-RP) and the Mapping Agent (MA).</p>
<p>Candidate RPs (C-RP):</p>
<ul>
<li>routers willing to be RP</li>
<li>announce themselves to the MA via 224.0.1.39</li>
</ul>
<p>Mapping Agent (MA):</p>
<ul>
<li>decides which of the Candidate RPs will be the RP</li>
<li>informs the rest of the network about the elected RP via 224.0.1.40</li>
</ul>
<p><strong>Auto-RP uses, in fact, <red>a DENSE mode fashion</red> for distributing the announcements and discoveries.</strong> For this:</p>
<ul>
<li><strong>MA</strong> must listen for <strong>224.0.1.39</strong></li>
<li><strong>all</strong> multicast speaking routers must listen for <strong>224.0.1.40</strong></li>
</ul>
<p>Let's see how it all works. Consider a topology with 4 routers connected in a daisy-chain fashion: R1 &lt;--&gt; R2 &lt;--&gt; R3 &lt;--&gt; R4.</p>
<h3 id="step-1">Step 1</h3>
<p>Configuring "<code><strong>ip pim sparse-mode</strong></code>" on all 4 routers will automatically install a <code><purple>(*,224.0.1.40)</purple></code> entry on all of them (at least in IOS 12.4 - not tested in other IOS versions!)</p>
<p><a href="/uploads/ip-pim-sparse-mode.jpg" title="ip pim sparse-mode"><img alt="ip pim sparse-mode" src="/uploads/ip-pim-sparse-mode.jpg" title="ip pim sparse-mode"/></a></p>
<h3 id="step-2">Step 2</h3>
<p>Configure <em>R1</em> to be a Candidate-RP: "<code><strong>ip pim send-rp-announce Lo 0</strong></code>"</p>
<p>=> R1 announces itself as candidate RP via <purple><strong>224.0.1.39</strong></purple> in a DENSE-mode fashion (to all its neighbors):</p>
<p><a href="/uploads/ip-pim-send-rp-announce.jpg" title="ip pim send-rp-announce"><img alt="ip pim send-rp-announce" src="/uploads/ip-pim-send-rp-announce.jpg" title="ip pim send-rp-announce"/></a></p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
OBSERVATION<br><i class="fa fa-bell"></i>
</br></div>
<div class="col-xs-12 col-sm-8 warning-right">
<red><strong>R2 will not forward these announcements !!</strong></red>
</div>
</div>
<h3 id="step-3">Step 3</h3>
<p>Configure <em>R2</em> to be a Mapping Agent: "<code><strong>ip pim send-rp-discovery Lo 0</strong></code>"<ul class="list-unstyled">
<li>=> R2 selects the RP from all the candidates (in our case only R1)</li>
<li>=> R2 informs <em><strong>everybody</strong></em> about the selected RP <strong>via 224.0.1.40</strong> in a DENSE-mode fashion</li>
<li>=> R2 also forwards the announcements that it received from R1 (<strong>via 224.0.1.39</strong>) !! - this is because there may be redundant MAs in the network</li></ul></p>
<p><a href="/uploads/ip-pim-send-rp-discovery.jpg" title="ip pim send-rp-discovery"><img alt="ip pim send-rp-discovery" src="/uploads/ip-pim-send-rp-discovery.jpg" title="ip pim send-rp-discovery"/></a></p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
OBSERVATION<br><i class="fa fa-bell"></i>
</br></div>
<div class="col-xs-12 col-sm-8 warning-right">
<red><strong>R3 will not forward these announcements !!</strong></red>
</div>
</div>
<h3 id="step-4">Step 4</h3>
<p>Configure <em>R3</em> to be a <em><strong>redundant</strong></em> Mapping Agent: "<code><strong>ip pim send-rp-discovery Lo 0</strong></code>"
<ul class="list-unstyled">
<li>=> R3 receives the announcements sent by R1 (because it happens to be a neighbor of MA-1, R2, which forwards them)</li>
<li>=> R3 selects R1 as RP</li>
<li>=> R3 informs <em><strong>everybody</strong></em> about the selected RP <strong>via 224.0.1.40</strong> in a DENSE-mode fashion</li>
<li>=> like R2, R3 also forwards the announcements that it received from R1 (<strong>via 224.0.1.39</strong>) !!</li></ul></p>
<p><a href="/uploads/ip-pim-send-rp-discovery-ma.jpg" title="ip pim send-rp-discovery-ma"><img alt="ip pim send-rp-discovery-ma" src="/uploads/ip-pim-send-rp-discovery-ma.jpg" title="ip pim send-rp-discovery-ma"/></a></p>
<p>At this moment, all 4 routers know the RP via Auto-RP with the following observations:</p>
<ul>
<li>R1 sees the source of info as being 2.2.2.2 (R2 = MA)</li>
<li>R2 sees the source of info as being 1.1.1.1 (R1 = C-RP)</li>
<li>R3 sees the source of info as being 1.1.1.1 (R1 = C-RP)</li>
<li>R4 sees the source of info as being 3.3.3.3 (R3 = redundant MA)</li>
</ul>
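<p>Putting steps 1 to 4 together, a minimal sketch of the configuration might look like the one below. The physical interface name and the <code>scope</code> (TTL) value of 16 are assumptions; the commands quoted in the steps above are abbreviated, and in the IOS releases I am familiar with the <code>scope</code> keyword has to be given explicitly:</p>
<div class="row"><pre class="col-xs-12 col-md-8">! all 4 routers (interface name is an assumption)
ip multicast-routing
interface FastEthernet0/0
 ip pim sparse-mode
!
! R1 - Candidate RP
interface Loopback0
 ip pim sparse-mode
ip pim send-rp-announce Loopback0 scope 16
!
! R2 and R3 - Mapping Agents
interface Loopback0
 ip pim sparse-mode
ip pim send-rp-discovery Loopback0 scope 16
!
! verification on any router
R4#show ip pim rp mapping</pre>
</div>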
<div class="row"><div class="col-xs-12 col-md-10">
<div class="panel panel-orange">
<div class="panel-heading"><i class="fa fa-fire"> </i>Important to remember</div>
<div class="panel-body">
<i>I did <red><b>not</b></red> use the "<code><strong>ip pim autorp listener</strong></code>" command on any of the 4 routers !!!</i><br>
I only took advantage of the fact that the C-RPs and MAs are directly connected to all other routers and they forward the information for 224.0.1.39 & 224.0.1.40 in a DENSE fashion.
</br></div>
</div>
</div></div>
<p>If a 5th router had been connected to R4, then in order for R5 to receive the Auto-RP messages we would have needed to configure "ip pim autorp listener" on R4 !<br/>
<br/></p>
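<p>For that hypothetical R5 case, the extra configuration on R4 would be a single global command. It makes the two Auto-RP groups (224.0.1.39 and 224.0.1.40) flood in dense mode across interfaces that are otherwise running sparse mode:</p>
<div class="row"><pre class="col-xs-12 col-md-8">! R4 - lets the Auto-RP groups pass through towards R5
ip multicast-routing
ip pim autorp listener</pre>
</div>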
<p><strong>Conclusion</strong>: <red>"<strong>ip pim autorp listener</strong>" is <strong>not needed</strong> on routers that are adjacent to the MA !</red></p>
<p><strong>Recommendation</strong>: despite the above conclusion, <blue>use this command <strong>on all PIM routers running Auto-RP</strong> to avoid any headaches</blue>, especially during the CCIE lab exam!
<br><br/></br></p>ACLs Supported on 3560&3750 Switches– Part II (VLAN Maps)2011-06-27T00:00:00+01:00Costitag:costiser.ro,2011-06-27:2011/06/27/acls-supported-on-3560-3750-switches-part-ii-vlan-maps/<p><span class="dropcap-bg">T</span>his post resumes the topic about <a href="/acls-supported-on-3560-3750-switches-part-i-port-acl.html" title="ACLs that you can apply on 3560 or 3750 series switches">ACLs that you can apply on 3560 or 3750 series switches</a>. It is going to be very brief and will only enumerate the most important things that you need to remember about VLAN Maps:
<br/></p>
<ul>
<li>they control all traffic in that VLAN, such as:<ul>
<li>bridged traffic within that particular VLAN</li>
<li>routed traffic INTO or OUT of that VLAN</li>
</ul>
</li>
</ul>
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right">
VLAN Maps are <strong>the only way</strong> to filter traffic within a VLAN !
</td></tr>
</table>
<ul>
<li>they are <strong>not</strong> defined by direction</li>
<li>they are <strong>only</strong> processed in hardware (ACL fields that are not supported in hardware will be ignored)</li>
<li>logging is not supported</li>
</ul>
<div class="row"><div class="col-xs-12 col-md-8">
<div class="panel panel-red">
<div class="panel-heading"><i class="fa fa-bell"></i>IMPORTANT to remember</div>
<div class="panel-body">
Every time you modify the ACL, you <strong>must</strong> re-apply the VLAN Map !
</div>
</div>
</div></div>
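<p>Since the points above come without any configuration, here is a minimal sketch of a VLAN Map. All names, the VLAN number and the traffic being dropped are assumptions chosen only for illustration: it drops Telnet inside VLAN 10 and forwards everything else.</p>
<div class="row">
<pre class="col-xs-12 col-md-8">! ACL that selects the traffic (hypothetical name)
ip access-list extended ACL_TELNET
 permit tcp any any eq telnet
!
! VLAN Map: drop what the ACL matches, forward the rest
vlan access-map VMAP_DEMO 10
 match ip address ACL_TELNET
 action drop
vlan access-map VMAP_DEMO 20
 action forward
!
! apply the map (no direction keyword exists)
vlan filter VMAP_DEMO vlan-list 10
!
! re-applying the map after an ACL change
no vlan filter VMAP_DEMO vlan-list 10
vlan filter VMAP_DEMO vlan-list 10</pre>
</div>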
ACLs Supported on 3560 & 3750 Switches - Part I (Port ACL & Router ACL)2011-06-26T00:00:00+01:00Costitag:costiser.ro,2011-06-26:2011/06/26/acls-supported-on-3560-3750-switches-part-i-port-acl/<p><span class="dropcap-bg">A</span>fter a nice vacation, here I come again with a new post, this time about several ACL types that are supported on 3560 and 3750 series switches. As you will see, each type has its own features and restrictions. This post is the first part covering Port ACLs and Router ACLs. </p>
<p>Have a look at this overview:</p>
<p><a href="/uploads/acls-supported-on-switches.jpg" title="ACLs Supported on 3560 & 3750 Switches"><img alt="ACLs Supported on 3560 & 3750 Switches" src="/uploads/acls-supported-on-switches.jpg" title="ACLs Supported on 3560 & 3750 Switches"/></a></p>
<p>As you can see in the above chart, there are 3 types of ACLs supported on these switches:</p>
<ul>
<li><strong>Port ACLs</strong></li>
<li><strong>Router ACLs</strong></li>
<li><strong>VLAN ACLs (VLAN Maps)</strong></li>
</ul>
<p>Let's take them one by one.</p>
<h3 id="1-port-acls">1. Port ACLs</h3>
<p>Port ACLs are ACLs that are applied to Layer 2 interfaces on a switch.<br/>
They are supported <blue>only on physical interfaces (not on EtherChannel interfaces)</blue> and can be applied <blue>only in the inbound direction</blue>.
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><ul>
<li>if you apply a port ACL to a trunk port, the ACL filters traffic on all VLANs present on the trunk port</li>
<li>if you apply a port ACL to a port with voice VLAN, the ACL filters traffic on both data and voice VLANs</li>
</ul>
</td></tr></table></p>
<p>Here is how you can use them: </p>
<h4 id="numbered-standard-ip-acl-acl-numbers-1-99-1300-1999">Numbered Standard IP ACL (ACL numbers 1-99, 1300-1999)</h4>
<div class="row">
<pre class="col-xs-12 col-md-8">(config)# access-list <purple>5 deny host</purple> 192.168.1.1
(config)# access-list <purple>5 permit any</purple></pre>
</div>
<h4 id="acl-numbers-100-199-2000-2699">Numbered Extended IP ACL (ACL numbers 100-199, 2000-2699)</h4>
<div class="row">
<pre>(config)# access-list <purple>105 deny tcp</purple> 192.168.1.0 0.0.0.255 172.16.1.0 0.0.0.255 eq telnet
(config)# access-list <purple>105 deny udp</purple> 192.168.1.0 0.0.0.255 172.16.1.0 0.0.0.255
(config)# access-list <purple>105 permit ip</purple> any any</pre>
</div>
<h4 id="named-standard-and-extended-acls">Named Standard and Extended ACLs</h4>
<div class="row">
<pre class="col-xs-12 col-md-8">(config)# ip access-list <purple>standard</purple> Stnd_ACL
(config-std-nacl)# permit 192.168.1.1</pre>
</div>
<div class="row">
<pre>(config)# ip access-list <purple>extended</purple> Extd_ACL
(config-ext-nacl)# deny tcp 192.168.1.0 0.0.0.255 172.16.1.0 0.0.0.255 eq telnet
(config-ext-nacl)# deny udp 192.168.1.0 0.0.0.255 172.16.1.0 0.0.0.255
(config-ext-nacl)# permit ip any any</pre>
</div>
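<p>The ACLs above only become Port ACLs once they are attached to a Layer 2 interface. A minimal sketch, assuming an access port (the interface number is just an example) and keeping in mind that only the inbound direction is accepted:</p>
<div class="row">
<pre class="col-xs-12 col-md-8">(config)# interface gigabitethernet1/0/2
(config-if)# switchport mode access
(config-if)# ip access-group Extd_ACL in</pre>
</div>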
<h4 id="named-mac-extended-acls">Named MAC Extended ACLs</h4>
<p>As for the <red>non-IPv4 traffic</red>, this can be filtered using either a VLAN Map (see Part II) or a Port ACL applied to the Layer 2 interface by using named MAC extended ACLs.<br/>
The procedure is similar to that of configuring other extended named ACLs.<br/>
* You cannot apply named MAC extended ACLs to Layer 3 interfaces.<br/>
* You cannot use the <code>mac access-group</code> command on EtherChannel ports.</p>
<div class="row">
<pre class="col-xs-12 col-md-8">(config)# <purple>mac access-list extended</purple> ACL_MAC
(config-ext-macl)# permit any host 00c0.00a0.03fa netbios
(config-ext-macl)# deny any any 0x4321 0
(config)# interface gigabitethernet1/0/2
(config-if)# <purple>mac access-group</purple> ACL_MAC in</pre>
</div>
<h3 id="2-router-acls_1">2. Router ACLs</h3>
<p>There are not so many things to be said about the Router ACLs. Take a look at the section above and you'll see how to create standard or extended, numbered or named ACLs.</p>
<p>The important differences that you need to remember are:
<table class="notes">
<tr><td class="notes-left">
<div class="hidden-xs">NOTES</div><i class="icon-book-open icon-sm"> </i>
</td>
<td class="notes-right"><strong>Router ACLs</strong> (vs. Port ACLs):<ul>
<li>control traffic that is routed between VLANs (SVIs) or L3 interfaces</li>
<li>may be applied in both directions (inbound and outbound)</li>
</ul>
</td></tr></table></p>
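<p>As a small sketch of the difference, the same ACLs defined in the Port ACL section could be attached to an SVI in either direction (the VLAN number is an assumption):</p>
<div class="row">
<pre class="col-xs-12 col-md-8">(config)# interface vlan 10
(config-if)# ip access-group Extd_ACL in
(config-if)# ip access-group 105 out</pre>
</div>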
<p>The last section, VLAN ACLs, will be discussed in the next post:
<a href="/acls-supported-on-3560-3750-switches-part-ii-vlan-maps.html" title="ACLs Supported on 3560 & 3750 Switches - Part II (VLAN Maps)">ACLs Supported on 3560 & 3750 Switches - Part II (VLAN Maps)</a></p>OSPF on CE-PE links2011-05-27T00:00:00+01:00Costitag:costiser.ro,2011-05-27:2011/05/27/ospf-on-ce-pe-links-2/<p><span class="dropcap-bg">H</span>ere comes a pretty long post that summarizes the characteristics of the OSPF protocol when it is used on a CE-PE link. For those without too much free time (myself included), you can take a quick look at the drawing above that serves as a fast summary / review.<br/>
If you want to find out more details, then go ahead and read this post till the end (more cool drawings will come ☺ ).<br/>
UPDATE: A special post was dedicated to <a href="/2013/04/15/ospf-on-pe-ce-links-and-the-understanding-the-down-bit/" title="Understanding the Down Bit in OSPF on CE-PE links">Understanding the Down Bit in OSPF on CE-PE links.</a></p>
<p><a href="/uploads/ospf-on-ce_pe-links1.jpg" title="OSPF on CE-PE links"><img alt="OSPF on CE-PE links" src="/uploads/ospf-on-ce_pe-links1.jpg" title="OSPF on CE-PE links"/></a></p>
<h3 id="some-basic-commentsfeatures">Some basic comments/features</h3>
<p>Let's start with reviewing some basic comments/features:</p>
<ul>
<li>MP-BGP is used to transport the routing info across the MPLS backbone</li>
<li><blue>all internal OSPF routes (learned from LSA 1, LSA 2, LSA 3) are carried to the remote site as inter-area summary Type-3 LSAs</blue></li>
<li>PE routers appear as ABR (OSPF Area Border Routers) for the devices in customer OSPF domains</li>
<li>there are NO OSPF ADJACENCIES or flooding across the MPLS cloud (except when sham-links are used)</li>
<li>MP-BGP cloud can be seen as a "<blue><strong>superbackbone</strong></blue>" / "super area 0" that gives the following advantages:<ul>
<li>we may have non-zero areas at different VPN sites without the need of an area 0</li>
<li>we may have area 0 at different sites together with non-zero areas attached to the superbackbone</li>
</ul>
</li>
<li>OSPF information is carried across the MPLS VPN cloud using BGP extended communities</li>
</ul>
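<p>To make these points a bit more concrete, here is a minimal sketch of the PE side. AS 100, VRF <code>VPN_A</code> and OSPF process 1 are borrowed from the examples later in this post; the network statement and the area number are assumptions:</p>
<div class="row">
<pre class="col-xs-12 col-md-7">
! OSPF towards the CE, inside the VRF
router ospf 1 vrf VPN_A
 network 10.1.1.0 0.0.0.3 area 1
 redistribute bgp 100 subnets
!
! OSPF routes carried across the superbackbone by MP-BGP
router bgp 100
 address-family ipv4 vrf VPN_A
  redistribute ospf 1 vrf VPN_A match internal external 1 external 2</pre>
</div>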
<p><a href="/uploads/ospf-on-ce-pe-links-overview.jpg" title="OSPF on CE-PE links - Overview"><img alt="OSPF on CE-PE links - Overview" src="/uploads/ospf-on-ce-pe-links-overview.jpg" title="OSPF on CE-PE links - Overview"/></a></p>
<h3 id="bgp-extended-communities">BGP extended communities</h3>
<p>The BGP extended communities used to carry OSPF information over the MPLS cloud (via MP-BGP) are:</p>
<ul>
<li><blue><strong>router-id</strong></blue> = router ID of the PE in the relevant VRF instance of OSPF</li>
<li><blue><strong>domain-id</strong></blue><ul>
<li>by default, equal to the OSPF process number</li>
<li>can be manually configured via the command "domain-id"</li>
<li>if the domain ID of the route does not match the domain ID on the receiving PE, the route is translated into an external OSPF route (<red><strong>LSA Type 5</strong></red>) with metric-type <red><strong>E2</strong></red></li>
</ul>
</li>
<li><blue><strong>OSPF route-type</strong></blue></li>
<li><blue><strong>MED (metric)</strong></blue><ul>
<li>routes sent across the MP-BGP cloud do not increment their metric</li>
<li>MED can be manually manipulated to influence path selection</li>
</ul>
</li>
</ul>
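<p>A short sketch of setting the domain ID manually, so that it no longer depends on the OSPF process number. The value itself and the prefix used in the verification command are assumptions:</p>
<div class="row">
<pre class="col-xs-12 col-md-7">
router ospf 1 vrf VPN_A
 domain-id 0.0.0.10
!
! the extended communities are visible in the VPNv4 entry
PE#show ip bgp vpnv4 vrf VPN_A 10.1.1.0</pre>
</div>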
<p><a href="/uploads/show-ip-bgp-vpnv4.jpg" title="show ip bgp vpnv4"><img alt="show ip bgp vpnv4" src="/uploads/show-ip-bgp-vpnv4.jpg" title="show ip bgp vpnv4"/></a></p>
<h3 id="loop-prevention-mechanisms">Loop Prevention Mechanisms</h3>
<p>The addition of the super-backbone area also introduces the possibility of routing loops. In order to prevent them, several basic loop prevention rules apply:</p>
<h4 id="1-down-bit">1. "DOWN" Bit</h4>
<p>The DOWN Bit is used to prevent loops in multi-homed sites and it is automatically set in all summary LSA 3 (<red><strong>not in LSA 5</strong></red>) when routes are redistributed from MP-BGP (back) into OSPF.<br/>
When an LSA with the DOWN bit set is received on an interface that is configured with a VRF, that LSA is not really dropped: it stays in the database, but it is not considered during the SPF calculations!</p>
<p><a href="/uploads/lsa-with-downn-bit-set.jpg" title="LSA with DOWN bit set"><img alt="LSA with DN bit set" src="/uploads/lsa-with-downn-bit-set.jpg" title="LSA with DN bit set"/></a></p>
<p>This feature has an undesirable effect when using VRF-lite in the customer cloud:</p>
<p><a href="/uploads/down-bit-and-multi-vrf.jpg" title="DOWN bit and multi-vrf"><img alt="DN bit and multi-vrf" src="/uploads/down-bit-and-multi-vrf.jpg" title="DN bit and multi-vrf"/></a></p>
<p>The solutions to this problem:<br/>
A) configure the command "<code><strong>capability vrf-lite</strong></code>" on the customer router(s), but this may not be supported on all IOS versions:</p>
<div class="row">
<pre class="col-xs-12 col-md-7">
router ospf 1 vrf VPN_A
capability vrf-lite</pre>
</div>
<p>B) configure different domain-IDs as this will force all redistributed routes to become external (LSA 5, thus bypassing the DN bit check)</p>
<p>UPDATE: There is another post that describes in better detail the usage of the Down bit: <a href="/2013/04/15/ospf-on-pe-ce-links-and-the-understanding-the-down-bit/" title="Understanding the Down Bit in OSPF on CE-PE links/">OSPF on CE-PE Links and Understanding the Down Bit</a></p>
<p><center><em> * </em></center></p>
<h4 id="2-route-tagging">2. Route Tagging</h4>
<p>The DOWN Bit helps prevent routing loops when Summary Type-3 LSAs are redistributed, but <em>not</em> when external routes are announced (<strong>the DN bit is not set on LSA Type 5/7</strong>). Instead, routes redistributed via a particular PE will carry the OSPF route tag which, by default, is the BGP AS number.<br/>
This tag is preserved when the external route is propagated across the entire OSPF domain (including redistribution into other OSPF domains).<br/>
The loop prevention mechanism works this way: <em>when another PE receives this route and sees that its own local AS number matches the AS number in the tag, it will ignore this LSA.</em><br/>
Here is the configuration for it:</p>
<div class="row">
<pre class="col-xs-12 col-md-7">
router ospf 1 vrf VPN_A
domain-tag 777</pre>
</div>
<p>or</p>
<div class="row">
<pre class="col-xs-12 col-md-7">
router ospf 1 vrf VPN_A
redistribute bgp 100 subnets tag 777</pre>
</div>
<p><a href="/uploads/ospf-route-tagging.jpg" title="OSPF Route tagging"><img alt="OSPF Route tagging" src="/uploads/ospf-route-tagging.jpg" title="OSPF Route tagging"/></a></p>
<h3 id="sham-link_1">Sham-Link</h3>
<p>Situation/problem description: when there is a backup link between sites (called <em>a backdoor</em>), this link will always be used for inter-site traffic because intra-area routes (LSAs received via the backdoor) are preferred over inter-area routes (LSA 3 received from the PE).<br/>
The solution to this problem is the usage of OSPF sham-links = special tunnels, similar to virtual-links, built between the PE routers and configured in the same area as the PEs.</p>
<p>Sham-links have the following characteristics:</p>
<ul>
<li>OSPF adjacency established via MPLS cloud</li>
<li>routes in the OSPF database are now seen as intra-area (even though they are received via the super-backbone)</li>
<li>the information across the sham-link is ONLY used for SPF calculations - the actual forwarding is being done based on the info learned via MP-BGP</li>
</ul>
<p><a href="/uploads/ospf-sham-link.jpg" title="OSPF sham-link"><img alt="OSPF sham-link" src="/uploads/ospf-sham-link.jpg" title="OSPF sham-link"/></a></p>
<p>One last note about sham-links: the IP addresses of the sham-link endpoints should be advertised into the VRF by means other than OSPF (commonly via BGP), and they must be known before the sham-links are created.</p>
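<p>A minimal sham-link sketch for one PE could look like this. The loopback number, the /32 addresses, the area and the cost are all assumptions; AS 100 and VRF <code>VPN_A</code> follow the earlier examples. The endpoint is advertised through BGP and deliberately not covered by an OSPF network statement:</p>
<div class="row">
<pre class="col-xs-12 col-md-7">
! PE1 (PE2 mirrors this with the two addresses swapped)
interface Loopback100
 ip vrf forwarding VPN_A
 ip address 10.255.0.1 255.255.255.255
!
router bgp 100
 address-family ipv4 vrf VPN_A
  network 10.255.0.1 mask 255.255.255.255
!
router ospf 1 vrf VPN_A
 area 0 sham-link 10.255.0.1 10.255.0.2 cost 10
!
PE1#show ip ospf sham-links</pre>
</div>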
<p>I hope you did not fall asleep reading this long post. At least the drawings helped you do a faster "reading".</p>Difference for PORTFAST & BPDUFILTER used globally or at interface-level2011-05-23T00:00:00+01:00Costitag:costiser.ro,2011-05-23:2011/05/23/subtle-difference-for-portfast-bpdufilter-used-together-globally-or-at-interface-level/<p><span class="dropcap-bg">P</span>ortfast + bpdufilter (used together) can be enabled globally or at interface level. Although the first impression is that the only difference is the global or per-interface effect, this is not entirely true. Let's start with a summary table:</p>
<p><a href="/uploads/subtle-difference-for-portfast-and-bpdufilter-used-together.jpg" title="Subtle difference for portfast and bpdufilter used together"><img alt="Subtle difference for portfast and bpdufilter used together" src="/uploads/subtle-difference-for-portfast-and-bpdufilter-used-together.jpg" title="Subtle difference for portfast and bpdufilter used together"/></a></p>
<p><em>Globally</em>:</p>
<div class="row"><pre class="col-xs-12 col-md-7">
(config)# <red>spanning-tree portfast bpdufilter default</red>
</pre></div>
<p><em>At interface level</em>:</p>
<div class="row"><pre class="col-xs-12 col-md-7">
(config)# interface x/x
(config-if)# spanning-tree portfast
(config-if)# <red>spanning-tree bpdufilter enable</red>
</pre></div>
<p>The scope (global or per-interface) is not the only thing that separates the two forms: another subtle and important difference is described below.</p>
<p>By default, <strong>a port configured with portfast still sends out BPDUs</strong>. If you want portfast-enabled ports to stop sending BPDUs you may rush to use the command:
<code>(config-if)# <strong>spanning-tree bpdufilter enable</strong></code> on the same interface.<br/>
While this gives you what you want (don't send BPDUs on portfast interfaces), you have the following problem: <strong>you completely disable STP on that port</strong>, meaning that you stop both sending and receiving BPDUs.</p>
<div class="row warning">
<div class="col-xs-12 col-sm-2 warning-left">
WARNING<br><i class="icon-fire icon-lg"> </i>
</br></div>
<div class="col-xs-12 col-sm-10 warning-right">
Disabling STP as shown above is <red><strong>NOT SAFE</strong> and not recommended</red> as it may result in STP loops (in case you connect a BPDU-enabled device on that port)!
</div>
</div>
<p>A better option (and here comes the subtle difference that I talked about) is to enable bpdufilter globally for all portfast-enabled ports: <code>(config)# <strong>spanning-tree portfast bpdufilter default</strong></code>.<br/>
This command also stops sending BPDUs on the portfast interfaces, but in case a BPDU is received on that port, <strong>it will resume STP operations on it</strong>, thus preventing STP loops. If a BPDU is received, that port loses its portfast status immediately and starts following the STP rules/states.</p>Matching packets based on their size2011-05-16T00:00:00+01:00Costitag:costiser.ro,2011-05-16:2011/05/16/matching-packets-based-on-their-size/<p><span class="dropcap-bg">W</span>elcome to the very first post on my brand-new technical blog!<br>This post shows different ways to match packets based on their length. While this may not be very common in real production, you will find it useful during your CCIE preparations.<br>
Below is a summary of this post:</br></br></p>
<p><a href="/uploads/matching-packets-based-on-their-size.jpg" title="Matching packets based on their size"><img alt="Matching packets based on their size" src="/uploads/matching-packets-based-on-their-size.jpg" title="Matching packets based on their size"/></a></p>
<p>There are 3 ways (at least, those that I am aware of) to match packets by their size:</p>
<h3 id="1-modular-qos-cli-mqc">1. Modular QoS CLI (MQC)</h3>
<p>In the following example, we are given the task to limit to 500 Kbps all the packets that are over 1000 bytes in size. Using the MQC, you create a class-map matching the length of the packet, then create a policy-map to apply the police action to this class and finally apply it to an interface:</p>
<div class="row"><pre class="col-xs-12 col-md-8">
class-map match-all CLASS_BIG_PKTS
 <red>match packet length min 1000</red>
policy-map MY_POLICY
 class CLASS_BIG_PKTS
  police 500000</pre></div>
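<p>The policy still has to be attached to an interface. A minimal sketch, with the interface and direction taken from the verification output further down, and with a destination address and ping sizes that are pure assumptions (one ping well above and one well below the configured threshold):</p>
<div class="row"><pre class="col-xs-12 col-md-8">
interface FastEthernet0/0
 service-policy output MY_POLICY
!
! test traffic of different sizes (destination is an assumption)
Router-1#ping 10.0.0.2 size 1400 repeat 10
Router-1#ping 10.0.0.2 size 500 repeat 10</pre></div>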
<p>After applying the service-policy to an interface, you can test it by sending ICMP packets of different sizes. To verify it, use the following command:</p>
<div class="row"><pre class="col-xs-12 col-md-8">
Router-1#<strong>sh policy-map inter fa0/0</strong>
FastEthernet0/0
Service-policy output: MY_POLICY
Class-map: CLASS_BIG_PKTS (match-all)
10 packets, 10150 bytes
5 minute offered rate 2000 bps, drop rate 0 bps
<purple>Match: packet length min 1000</purple>
police:
cir 500000 bps, bc 15625 bytes
<purple>conformed 10 packets, 10150 bytes; actions: transmit</purple>
exceeded 0 packets, 0 bytes; actions:
drop
conformed 2000 bps, exceed 0 bps
Class-map: class-default (match-any)
26 packets, 2310 bytes
5 minute offered rate 0 bps, drop rate 0 bps
Match: any
</pre></div>
<h3 id="2-policy-based-routing-pbr">2. Policy Based Routing (PBR)</h3>
<p>Route-maps also give you the possibility to match packets by their size. In the example below, all ICMP packets with a size between 200 and 1200 bytes will be dropped:</p>
<div class="row"><pre class="col-xs-12 col-md-8">
ip access-list extended ACL_ICMP
permit icmp any any
route-map PBR_ICMP permit 10
match ip address ACL_ICMP
<red>match length 200 1200</red>
set interface Null0
interface Serial0/0.1 multipoint
ip policy route-map PBR_ICMP
</pre></div>
<p>To verify it, initiate pings with different sizes and then check the route-map on the router:</p>
<div class="row"><pre class="col-xs-12 col-md-8">
Router-1#<strong>sh route-map</strong>
route-map PBR_ICMP, permit, sequence 10
<purple> Match clauses:</purple>
ip address (access-lists): ACL_ICMP
interface FastEthernet0/0
<purple>length 200 1200</purple>
Set clauses:
interface Null0
Policy routing matches: 10 packets, 1290 bytes
</pre></div>
<h3 id="3-flexible-packet-matching-fpm">3. Flexible Packet Matching (FPM)</h3>
<p>Flexible Packet Matching is a solution that belongs more to Security than to Routing & Switching, so I'm not going to go into many details about it (maybe in a separate post?). Here's a quick reference:</p>
<p>FPM is a set of classes and policies that provides pattern matching capability for more granular and customized packet filters for Layer 2 to 7, bit/byte matching capability, deep into the packet at any offset within the packet header and payload.</p>
<div class="row"><div class="col-xs-12 col-md-8">
<div class="panel panel-orange">
<div class="panel-heading"><i class="fa fa-binoculars"></i>Flexible Packet Matching - configuration steps:</div>
<div class="panel-body">
<ul class="list-unstyled">
<li>Step 1. Load the protocol header description file(s) (PHDF)</li>
<li>Step 2. Define the protocol stack (IP-UDP, IP-TCP, etc.)</li>
<li>Step 3. Define the FPM match criteria filter (class-map)</li>
<li>Step 4. Define the action to take on the classes (policy-map)</li>
<li>Step 5. Apply the service policy to an interface</li>
</ul>
</div>
</div>
</div></div>
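<p>To tie the five steps back to the topic of this post, here is a rough sketch that drops UDP packets whose IP length is greater than 1000 bytes. The PHDF location, all class and policy names, the threshold and the interface are assumptions; treat it as an outline of the steps rather than a tested configuration:</p>
<div class="row"><pre class="col-xs-12 col-md-8">
! Step 1 - load the PHDFs (file location is an assumption)
load protocol flash:ip.phdf
load protocol flash:udp.phdf
!
! Step 2 - protocol stack: UDP over IP
class-map type stack match-all STACK_IP_UDP
 match field ip protocol eq 0x11 next udp
!
! Step 3 - match criteria: IP length above 1000 bytes
class-map type access-control match-all BIG_UDP
 match field ip length gt 1000
!
! Step 4 - actions
policy-map type access-control FPM_UDP_POLICY
 class BIG_UDP
  drop
policy-map type access-control FPM_POLICY
 class STACK_IP_UDP
  service-policy FPM_UDP_POLICY
!
! Step 5 - apply to an interface
interface FastEthernet0/0
 service-policy type access-control input FPM_POLICY</pre></div>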
<p>Here is the <a href="http://www.cisco.com/en/US/prod/collateral/iosswrel/ps6537/ps6586/ps6723/prod_white_paper0900aecd80633b0a.html">Cisco Documentation for FPM</a> !</p>