Performance Review of Overlay Tunnels with Open vSwitch

In my previous article I presented various encapsulation techniques used to extend Layer 2 reachability across separate networks using tunnels created with Open vSwitch.
Although the initial intention was to include some iperf test results, I decided to leave these for a separate post (this one!) because I hit few problems. While I was prepared to deal with MTU issues - always a topic when adding extra encapsulation - there were other things that I had to take care of.

DISCLAIMER:

The tests presented in this post do not follow a typical network performance procedure, but are more just iperf tests (mostly with the default options) intended to give the reader a simple overview. These tests were not performed between physical machines over physical wires, but instead they were carried between virtual elements in a fully virtualized environment (OS, networking, connection, etc).

Baseline

As always when trying to measure performance, you need a baseline that gives you a rough idea about the total capacity you have. In this case, I assigned the Host-Only adapter to the each of the network namespace so that they can reach each other directly.
After measuring the baseline, I followed the instructions in my previous post to create tunnels and re-do the performance measuring each time. In the end, I summarized the iperf results for all the tests performed.

Let's start ! On each of the vagrant box, I create a network namespace (left and right) and assign the first Host-Only Adapter (enp0s8) to them, thus achieving a direct connection.

# on vagrant box-1
# -----------------

# create a new namespace
sudo ip netns add left

# reset enp0s8 interface and assign it to the namespace
sudo ifconfig enp0s8 0.0.0.0 down
sudo ip link set dev enp0s8 netns left

# configure that interface inside the namespace
sudo ip netns exec left ifconfig enp0s8 10.0.0.1/24 up


# on vagrant box-2
# ----------------

# same as above, with different name and IP for the namespace
sudo ip netns add right
sudo ifconfig enp0s8 0.0.0.0 down
sudo ip link set dev enp0s8 netns right
sudo ip netns exec right ifconfig enp0s8 10.0.0.2/24 up


# check the connectivity between left and right namespaces
# from vagrant box-1
ubuntu@box-1 ~$ sudo ip netns exec left ping 10.0.0.2
PING 10.0.0.2 (10.0.0.2) 56(84) bytes of data.
64 bytes from 10.0.0.2: icmp_seq=1 ttl=64 time=0.415 ms
64 bytes from 10.0.0.2: icmp_seq=2 ttl=64 time=0.264 ms
...

Now, let's perform an iperf test and record the results. On the 'left' namespace (vagrant box-1), run the server with command sudo ip netns exec left iperf -s and on the 'right' start the iperf client:

ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 46616 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.05 GBytes  1.80 Gbits/sec

NOTES

I'm only testing TCP, but you may want to perform also UDP tests
I'm only using the iperf defaults, but you may want to tweak it to achieve better results - for example, setting a different TCP window size or/and using parallel connections, etc...

If you follow these steps to do your own tests, before you start creating the tunnels, you should undo the above configuration, that was needed for the baseline - the easiest way is to reboot the vagrant box. After reboot, create a GRETAP tunnel between two OVS bridges and repeat the iperf tests.

Considerations with Overlay Networks

Everytime you add extra encapsulation to your traffic, you have to think about the MTU: 'do I have to increase or decrease it?' or 'is Path MTU Discovery working, is it enabled or disabled in my network?' or 'what is the overhead that my tunnel adds?' - these are questions that you should always consider (and you better know the answer ☺) !
As I said in the beginning, I was expecting MTU problems and I hoped that I can deal with them - unfortunately, I was wrong ! Let's see what happened, chronologically:

First Test

After having the GRE tunnel up and connectivity working between the two network namespaces, I started the iperf server on the 'left' namespace with command ip netns exec left iperf -s and the client on 'right' with command ip netns exec right iperf -c 10.0.0.1. I waited for a minute and nothing was shown on the console. After few more minutes I opened a new ssh session and started doing tcpdump, but again, nothing there too. I left the command running and after 15 minutes (!!) it finished and returned 759 bits/sec (not Megs, not Kilos, simple bits per second).

ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 40708 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-931.1 sec  86.3 KBytes   759 bits/sec

Solutions

Adjusting the MTU

My first action was to increase the MTU on the physical link between the vagrant boxes (the Host-only adapter between the Virtual Boxes) and I have chosen a value high enough to fit the GRE overhead, such as 1600.

ubuntu@box-2 ~$ sudo ip link set dev enp0s8 mtu 1600
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 40710 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-11.0 sec  5.00 MBytes  3.80 Mbits/sec

What ? Only 3.80 Mbps? That was unexpectedly small! I did not have high expectations, but for sure, nothing like that !

Tweaking the Linux Network Stack

Next thing that I had to do was to tweak the Linux Network Stack, in particular, the fragmentation/segmentation offloading. Normally, the Linux TCP/IP Stack is responsible for performing the fragmentation/segmentation of large UDP/TCP data chunks, thus consuming CPU cycles. Features like TSO (TCP Segmentation Offload), GSO (Generic Segmentation Offload) and other, reduce these CPU cycles by offloading the segmentation to the NIC driver and they are mostly enabled by default. Use the ethtool command to display or modify these offload settings:

ubuntu@box-2 ~$ ethtool -k gre_sys
Features for gre_sys:
rx-checksumming: off [fixed]
tx-checksumming: on
    tx-checksum-ipv4: off [fixed]
    tx-checksum-ip-generic: on
    tx-checksum-ipv6: off [fixed]
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: on
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: on
    tx-tcp-mangleid-segmentation: on
    tx-tcp6-segmentation: on
udp-fragmentation-offload: on
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off [fixed]
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: on [fixed]
netns-local: on [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
busy-poll: off [fixed]
hw-tc-offload: off [fixed]

I played a bit with disabling these offloads (TSO, GSO, GRO, LRO), but the one tweak that worked for my test environment was to disable the TCP Segmentation Offload on the gre_sys interface:

ubuntu@box-2 ~$ sudo ethtool -K gre_sys tso off
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 40744 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.14 GBytes   980 Mbits/sec
ubuntu@box-2 ~$
ubuntu@box-2 ~$
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 60 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50982 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   644 MBytes  1.08 Gbits/sec
[  3]  5.0-10.0 sec   614 MBytes  1.03 Gbits/sec
[  3] 10.0-15.0 sec   590 MBytes   990 Mbits/sec
[  3] 15.0-20.0 sec   631 MBytes  1.06 Gbits/sec
[  3] 20.0-25.0 sec   603 MBytes  1.01 Gbits/sec
[  3] 25.0-30.0 sec   608 MBytes  1.02 Gbits/sec
[  3] 30.0-35.0 sec   613 MBytes  1.03 Gbits/sec
[  3] 35.0-40.0 sec   612 MBytes  1.03 Gbits/sec
[  3] 40.0-45.0 sec   609 MBytes  1.02 Gbits/sec
[  3] 45.0-50.0 sec   579 MBytes   971 Mbits/sec
[  3] 50.0-55.0 sec   609 MBytes  1.02 Gbits/sec
[  3] 55.0-60.0 sec   616 MBytes  1.03 Gbits/sec
[  3]  0.0-60.0 sec  7.16 GBytes  1.02 Gbits/sec

Much better this time ! I was able to hit 1 Gbps! This time I also saw the CPU going close to 100% - the process causing this was called "ksoftirqd", a soft-interrupts process that queues IRQ when they come too fast due to the system being under heavy load.

Disabling Path MTU Discovery

By default, Path MTU Discovery (PMTUD) is enabled so the outer IP header contains the Don't Fragment bit set. Disabling it, so that the outer IP header will not have the DF bit set, represents also a solution - not something that I would recommend !!

# on vagrant box-1
# -----------------

sudo ovs-vsctl add-port sw1 tun0 -- set Interface tun0 type=gre options:remote_ip=192.168.56.12 options:df_default=false



# on vagrant box-2
# -----------------

sudo ovs-vsctl add-port sw2 tun0 -- set Interface tun0 type=gre options:remote_ip=192.168.56.11 options:df_default=false

Let's see the results - note the default MTU on the physical enp0s8 interface between the hypervisors:

ubuntu@box-2 ~$ ip link
1: lo:  mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: enp0s3:  mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 02:27:53:60:41:f1 brd ff:ff:ff:ff:ff:ff
3: enp0s8:  mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:f2:1d:8c brd ff:ff:ff:ff:ff:ff
4: enp0s9:  mtu 1500 qdisc pfifo_fast state UP mode DEFAULT group default qlen 1000
    link/ether 08:00:27:9d:a4:ee brd ff:ff:ff:ff:ff:ff
5: sw2-p1@if6:  mtu 1500 qdisc noqueue master ovs-system state UP mode DEFAULT group default qlen 1000
    link/ether 26:05:35:81:94:24 brd ff:ff:ff:ff:ff:ff link-netnsid 0
7: ovs-system:  mtu 1500 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether c2:bd:52:03:08:44 brd ff:ff:ff:ff:ff:ff
8: sw2:  mtu 1500 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/ether 2e:2b:1e:62:f4:44 brd ff:ff:ff:ff:ff:ff
9: gre0@NONE:  mtu 1476 qdisc noop state DOWN mode DEFAULT group default qlen 1
    link/gre 0.0.0.0 brd 0.0.0.0
10: gretap0@NONE:  mtu 1462 qdisc noop state DOWN mode DEFAULT group default qlen 1000
    link/ether 00:00:00:00:00:00 brd ff:ff:ff:ff:ff:ff

# Let's test now (with PMTUD disabled and default 1500 MTU)
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 20 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50968 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec  2.25 MBytes  3.77 Mbits/sec
[  3]  5.0-10.0 sec  1.38 MBytes  2.31 Mbits/sec
[  3] 10.0-15.0 sec  1.88 MBytes  3.15 Mbits/sec
^C[  3]  0.0-18.4 sec  6.38 MBytes  2.90 Mbits/sec


# Let's disable TSO
ubuntu@box-2 ~$ sudo ethtool -K gre_sys tso off
ubuntu@box-2 ~$
ubuntu@box-2 ~$ sudo ip netns exec right iperf -c 10.0.0.1 -t 60 -i 5
------------------------------------------------------------
Client connecting to 10.0.0.1, TCP port 5001
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.2 port 50974 connected with 10.0.0.1 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0- 5.0 sec   220 MBytes   370 Mbits/sec
[  3]  5.0-10.0 sec   175 MBytes   294 Mbits/sec
[  3] 10.0-15.0 sec   194 MBytes   325 Mbits/sec
[  3] 15.0-20.0 sec   200 MBytes   336 Mbits/sec
^C[  3] 20.0-25.0 sec   193 MBytes   324 Mbits/sec

Summary of iPerf Results

After finding the tweaks needed to get the best performance results, I resumed creating all different types of tunnels, as per instructions in the previous post and for each type of tunnel, I recorded the iperf tests results.

NOTES

as mentioned in the disclaimer, these tests were performed in a fully virtualized environment, not between physical machines !
I performed only TCP tests
in all of these cases, the performance is a factor of CPU - the more cycles you have, the better results you get
below table contains only the best results

Here is what I've got. These results were obtained between two VirtualBoxes, each having 2x CPUs running @2394 Mhz (though, the number of CPUs does not matter - this needs more testing):

iPerf Results Overview

Tunnel Type	MTU 1500 (enp0s8) offload on	MTU 1600 (enp0s8) offload on	MTU 1600 (enp0s8) offload off
Baseline	1.80 Gbits/sec	N/A	TSO off (enp0s8) 1.92 Gbits/sec
GRETAP	759 bits/sec	3.15 Mbits/sec	TSO off (gre_sys) 1.08 Gbits/sec
VXLAN	N/A	1.10 Gbits/sec	UFO off (vxlan_sys_4789) 1.12 Gbits/sec
GENEVE	N/A	1.09 Gbits/sec	UFO off (genev_sys_6081) 1.15 Gbits/sec
GREoIPSEC	N/A	4.19 Mbits/sec	TSO off (gre_sys) 594 Mbits/sec

That's it ! What results have you got ? If you run the same tests and you want to share the results, leave a comment below (please specify also the CPU).

Thanks for interest !

Baseline

Considerations with Overlay Networks

First Test

Solutions

Adjusting the MTU

Tweaking the Linux Network Stack

Disabling Path MTU Discovery

Summary of iPerf Results

Post Tags:

Share this article:

Comments

Subscribe

Follow Me

Recent Posts

Recent Comments

Blog Tags

About Me

Most Commented Quizzes

Useful links