In my previous article I presented various encapsulation techniques used to extend Layer 2 reachability across separate networks using tunnels created with Open vSwitch.
Although the initial intention was to include some iperf test results, I decided to leave these for a separate post (this one!) because I hit a few problems. While I was prepared to deal with MTU issues - always a topic when adding extra encapsulation - there were other things I had to take care of.
Baseline
As always when trying to measure performance, you need a baseline that gives you a rough idea of the total capacity you have. In this case, I assigned the Host-Only adapter to each of the network namespaces so that they could reach each other directly.
After measuring the baseline, I followed the instructions in my previous post to create the tunnels and repeated the performance measurements each time. In the end, I summarized the iperf results for all the tests performed.
Let's start! On each vagrant box, I create a network namespace (left and right) and assign the first Host-Only adapter (enp0s8) to it, thus achieving a direct connection.
# on vagrant box-1
# -----------------
# create a new namespace
sudo ip netns add left
# reset enp0s8 interface and assign it to the namespace
sudo ifconfig enp0s8 0.0.0.0 down
sudo ip link set dev enp0s8 netns left
# configure that interface inside the namespace
sudo ip netns exec left ifconfig enp0s8 10.0.0.1/24 up

# on vagrant box-2
# ----------------
# same as above, with different name and IP for the namespace
sudo ip netns add right
sudo ifconfig enp0s8 0.0.0.0 down
sudo ip link set dev enp0s8 netns right
sudo ip netns exec right ifconfig enp0s8 10.0.0.2/24 up

# check the connectivity between left and right namespaces
# from vagrant box-1
ubuntu@box-1 ~$ sudo ip netns exec left ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.415 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.264 ms
...
Now, let's perform an iperf test and record the results. On the 'left' namespace (vagrant box-1), start the server with the command sudo ip netns exec left iperf -s
and on the 'right' start the iperf client:
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 46616 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.05 GBytes  1.80 Gbits/sec
NOTES
If you follow these steps to do your own tests, undo the above configuration before you start creating the tunnels - it was needed only for the baseline, and the easiest way is to reboot the vagrant box. After the reboot, create a GRETAP tunnel between two OVS bridges and repeat the iperf tests.
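For orientation, here is a condensed sketch of that GRETAP setup on box-2, as I understand it from the previous post; box-1 is symmetric with sw1 and remote_ip=192.168.56.12, and the veth names are placeholders of mine:

# OVS bridge plus a veth pair whose far end lives in the 'right' namespace
sudo ovs-vsctl add-br sw2
sudo ip link add sw2-p1 type veth peer name right-p1
sudo ovs-vsctl add-port sw2 sw2-p1
sudo ip link set sw2-p1 up
sudo ip netns add right
sudo ip link set right-p1 netns right
sudo ip netns exec right ifconfig right-p1 10.0.0.2/24 up
# GRE tunnel port on the bridge; enp0s8 keeps its Host-Only address (192.168.56.12 here) as the tunnel endpoint
sudo ovs-vsctl add-port sw2 tun0 -- set Interface tun0 type=gre options:remote_ip=192.168.56.11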
Considerations with Overlay Networks
Every time you add extra encapsulation to your traffic, you have to think about the MTU, since the tunnel headers eat into the payload that fits on the physical link.
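To put numbers on it, here is the rough GRETAP overhead arithmetic (assuming plain GRE with no key or checksum options) and a quick way to probe the usable size from inside a namespace:

# GRETAP overhead over a 1500-byte physical MTU:
#   20 (outer IPv4) + 4 (GRE) + 14 (inner Ethernet) = 38 bytes
#   1500 - 38 = 1462 bytes left for the inner IP packet
#   (1462 is also the MTU the kernel assigns to gretap interfaces, as visible later in the ip link output)
#
# probe from the 'left' namespace: 1434 bytes of ICMP payload + 8 (ICMP) + 20 (IP) = 1462
sudo ip netns exec left ping -M do -s 1434 10.0.0.2   # fits after encapsulation
sudo ip netns exec left ping -M do -s 1472 10.0.0.2   # a full 1500-byte inner packet, too big once encapsulated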
As I said in the beginning, I was expecting MTU problems and I hoped that I could deal with them - unfortunately, I was wrong! Let's see what happened, chronologically:
First Test
After bringing the GRE tunnel up and confirming connectivity between the two network namespaces, I started the iperf server on the 'left' namespace with the command ip netns exec left iperf -s and the client on 'right' with the command ip netns exec right iperf -c 10.0.0.1. I waited for a minute and nothing was shown on the console. After a few more minutes I opened a new ssh session and started a tcpdump, but again, nothing there either. I left the client running and, after roughly fifteen minutes, it finally printed a result:
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 40708 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-931.1 sec  86.3 KBytes  759 bits/sec
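For what it's worth, while the client appears hung like this, a capture on the physical interface is the quickest way to check whether any encapsulated frames are leaving the box at all; something along these lines (standard tcpdump, interface name from my setup):

# watch for GRE traffic (IP protocol 47) on the underlay interface
sudo tcpdump -ni enp0s8 ip proto 47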
Solutions
Adjusting the MTU
My first action was to increase the MTU on the physical link between the vagrant boxes (the Host-Only adapter between the VirtualBox VMs), choosing a value high enough to fit the GRE overhead, such as 1600.
ubuntu@box-2 ~$ sudo ip link set dev enp0s8 mtu 1600
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 40710 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-11.0 sec  5.00 MBytes  3.80 Mbits/sec
What? Only 3.80 Mbits/sec?
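At this point a couple of cheap, generic checks can help narrow down where the packets are going; whether anything shows up in them depends on where the loss actually happens:

# per-interface TX/RX error and drop counters on the underlay link
ip -s link show enp0s8
# IP-level fragmentation / reassembly statistics
netstat -s | grep -i -E 'fragment|reassembl'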
Tweaking the Linux Network Stack
The next thing I had to do was tweak the Linux network stack, in particular the fragmentation/segmentation offloading. Normally, the Linux TCP/IP stack is responsible for fragmenting/segmenting large UDP/TCP data chunks, which consumes CPU cycles. Features like TSO (TCP Segmentation Offload), GSO (Generic Segmentation Offload) and others reduce these CPU cycles by offloading the segmentation to the NIC driver, and they are mostly enabled by default. Use the ethtool command to display or modify these offload settings:
ubuntu@box-2 ~$ ethtool -k gre_sys
Features for gre_sys:
rx-checksumming: off [fixed]
tx-checksumming: on
        tx-checksum-ipv4: off [fixed]
        tx-checksum-ip-generic: on
        tx-checksum-ipv6: off [fixed]
        tx-checksum-fcoe-crc: off [fixed]
        tx-checksum-sctp: off [fixed]
scatter-gather: on
        tx-scatter-gather: on
        tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
        tx-tcp-segmentation: on
        tx-tcp-ecn-segmentation: on
        tx-tcp-mangleid-segmentation: on
        tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]
I played a bit with disabling these offloads (TSO, GSO, GRO, LRO), but the one tweak that worked for my test environment was to disable the TCP Segmentation Offload on the gre_sys interface:
ubuntu@box-2 ~$ sudo ethtool -K gre_sys tso off
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 40744 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.14 GBytes   980 Mbits/sec
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 60 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50982 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   644 MBytes  1.08 Gbits/sec
[  3]  5.0-10.0 sec   614 MBytes  1.03 Gbits/sec
[  3] 10.0-15.0 sec   590 MBytes   990 Mbits/sec
[  3] 15.0-20.0 sec   631 MBytes  1.06 Gbits/sec
[  3] 20.0-25.0 sec   603 MBytes  1.01 Gbits/sec
[  3] 25.0-30.0 sec   608 MBytes  1.02 Gbits/sec
[  3] 30.0-35.0 sec   613 MBytes  1.03 Gbits/sec
[  3] 35.0-40.0 sec   612 MBytes  1.03 Gbits/sec
[  3] 40.0-45.0 sec   609 MBytes  1.02 Gbits/sec
[  3] 45.0-50.0 sec   579 MBytes   971 Mbits/sec
[  3] 50.0-55.0 sec   609 MBytes  1.02 Gbits/sec
[  3] 55.0-60.0 sec   616 MBytes  1.03 Gbits/sec
[  3]  0.0-60.0 sec  7.16 GBytes  1.02 Gbits/sec
Much better this time! I was able to hit 1 Gbps! This time I also saw the CPU going close to 100% - the process responsible was "ksoftirqd", the kernel thread that handles soft interrupts when they arrive too fast to be processed inline, which is typical when the system is under heavy load.
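If you want to watch this yourself while iperf is running, the per-CPU softirq counters and the process list tell the story (assuming the usual watch/procps tools are installed):

# per-CPU softirq counters, refreshed every second (NET_RX / NET_TX are the interesting rows)
watch -d -n 1 cat /proc/softirqs
# the ksoftirqd/N threads climbing towards the top of the CPU ranking are the symptom
top -o %CPU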
Disabling Path MTU Discovery
By default, Path MTU Discovery (PMTUD) is enabled, so the outer IP header carries the DF (Don't Fragment) bit. Disabling it on the OVS tunnel ports with the df_default=false option allows the encapsulated packets to be fragmented on the physical link:
# on vagrant box-1
# -----------------
sudo ovs-vsctl add-port sw1 tun0 -- set Interface tun0 type=gre options:remote_ip=192.168.56.12 options:df_default=false

# on vagrant box-2
# -----------------
sudo ovs-vsctl add-port sw2 tun0 -- set Interface tun0 type=gre options:remote_ip=192.168.56.11 options:df_default=false
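If you want to double-check that OVS picked up the option, the Interface record can be queried directly (tun0 is the port name used above):

# dump the options column of the tunnel interface
sudo ovs-vsctl get Interface tun0 options
# it should list df_default="false" alongside the remote_ip setting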
Let's see the results - note the default MTU on the physical enp0s8 interface between the hypervisors:
ubuntu@box-2 ~$ ip link
1: lo: mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 02:27:53:60:41:f1 brd ff:ff:ff:ff:ff:ff
3: enp0s8: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:f2:1d:8c brd ff:ff:ff:ff:ff:ff
4: enp0s9: mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:9d:a4:ee brd ff:ff:ff:ff:ff:ff
5: sw2-p1@if6: mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 26:05:35:81:94:24 brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: ovs-system: mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c2:bd:52:03:08:44 brd ff:ff:ff:ff:ff:ff
8: sw2: mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 2e:2b:1e:62:f4:44 brd ff:ff:ff:ff:ff:ff
9: gre0@NONE: mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/gre 0.0.0.0 brd 0.0.0.0
10: gretap0@NONE: mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

# Let's test now (with PMTUD disabled and default 1500 MTU)
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 20 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50968 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  2.25 MBytes  3.77 Mbits/sec
[  3]  5.0-10.0 sec  1.38 MBytes  2.31 Mbits/sec
[  3] 10.0-15.0 sec  1.88 MBytes  3.15 Mbits/sec
^C[  3]  0.0-18.4 sec  6.38 MBytes  2.90 Mbits/sec

# Let's disable TSO
ubuntu@box-2 ~$ sudo ethtool -K gre_sys tso off
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 60 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50974 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   220 MBytes   370 Mbits/sec
[  3]  5.0-10.0 sec   175 MBytes   294 Mbits/sec
[  3] 10.0-15.0 sec   194 MBytes   325 Mbits/sec
[  3] 15.0-20.0 sec   200 MBytes   336 Mbits/sec
^C[  3] 20.0-25.0 sec  193 MBytes   324 Mbits/sec
Summary of iPerf Results
After finding the tweaks needed to get the best performance, I went on to create all the different types of tunnels, following the instructions in the previous post, and recorded the iperf results for each type of tunnel.
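For completeness, here is what the VXLAN variant looks like on box-1; GENEVE is identical with type=geneve, the values follow the GRE example above, and the UFO tweak matches what the table below lists for the kernel's vxlan_sys_4789 / genev_sys_6081 interfaces:

# VXLAN tunnel port instead of GRE (box-2 is symmetric with sw2 and remote_ip=192.168.56.11)
sudo ovs-vsctl add-port sw1 tun0 -- set Interface tun0 type=vxlan options:remote_ip=192.168.56.12
# per the table below, the offload that had to be disabled for VXLAN/GENEVE was UDP fragmentation offload
sudo ethtool -K vxlan_sys_4789 ufo off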
Here is what I got. These results were obtained between the two VirtualBox VMs used throughout this post.
Tunnel Type | with offload | without offload | disabled offload
---|---|---|---
Direct (baseline) | 1.80 Gbits/sec | N/A | TSO off (enp0s8)
GRETAP | 759 bits/sec | 3.15 Mbits/sec | TSO off (gre_sys)
VXLAN | N/A | 1.10 Gbits/sec | UFO off (vxlan_sys_4789)
GENEVE | N/A | 1.09 Gbits/sec | UFO off (genev_sys_6081)
GREoIPSEC | N/A | 4.19 Mbits/sec | TSO off (gre_sys)
That's it! What results did you get? If you run the same tests and want to share the results, leave a comment below (please also specify the CPU).
Thanks for your interest!