VXLAN Topology

VXLAN stands for Virtual Extensible Local Area Network. As the name suggests, it lets you extend a VLAN across the network. In this post we will look at how that extension works and why VXLAN is needed in today's network topologies. VXLAN is defined in RFC 7348.

Virtualization now plays an important role in delivering services to customers. Applications, hosts, services, and even networks are being virtualized, and cloud computing grows more important every day. As a result, a datacenter network can contain a very large number of VMs, and the datacenter switches need large MAC address tables to let those VMs communicate. VMs are also grouped into different VLANs based on their use, so the limit of 4096 VLAN IDs (a 12-bit field) becomes a constraint on network segmentation. Sometimes a customer also needs layer 2 connectivity to a different datacenter, and VXLAN provides this extension. Another motivation is to use the links efficiently by forwarding traffic active-active. STP blocks some links to prevent loops, which leaves those links idle.

VXLAN is a layer 2 extension over a layer 3 infrastructure that works by encapsulating the client's Ethernet frame in UDP. In other words, you can extend a VLAN across a layer 3 topology. Each VXLAN segment has its own VXLAN Network Identifier (VNI), which is 24 bits long, so you can create up to about 16 million segments and go beyond the 4096 VLAN limit. Consider an Ethernet frame created by a VM. Once the switch receives this frame, it adds, in order, a VXLAN header, a UDP header, an outer IP header, and an outer Ethernet header. The critical question is how a switch knows the next hop, or how it should behave when it receives this kind of packet. The switch has to learn this information somehow. Some topologies use EVPN to carry it, while others use a central authority server. Arista, for example, uses CVX, which runs as a cluster of VMs on different hosts.
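To make the size difference concrete, here is a small arithmetic sketch. The constant names are my own, and the field widths are the 12-bit VLAN ID from 802.1Q and the 24-bit VNI from RFC 7348.

```python
# Field widths: 12-bit VLAN ID (802.1Q) versus 24-bit VNI (RFC 7348).
VLAN_ID_BITS = 12
VNI_BITS = 24

vlan_ids = 2 ** VLAN_ID_BITS      # number of distinct VLAN ID values
vni_values = 2 ** VNI_BITS        # number of distinct VNI values

print(vlan_ids)                   # 4096
print(vni_values)                 # 16777216
print(vni_values // vlan_ids)     # 4096 times more segment IDs
```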

Let’s examine the path of a packet in the topology below.

[Figure: network topology]

First, I will briefly explain the topology. There are two clients connected to the leaf switches via LACP. The leaf switches run MC-LAG and connect to the spine devices, which sit in AS 65590. All 4 leaf switches use BGP to peer with the spine devices over separate cross links. The connection between AS 65500 and AS 65590 is purely routed. In the topology you will notice a small trick: each leaf pair uses the same loopback address. This is critical for using all links efficiently, because we use ECMP (see RFC 2991) with BGP to load-balance the traffic that the 4 leafs exchange through the spines. A hash over the packet headers picks the link, so all existing links are used. With STP, some links have to be blocked, so if layer 2 had been chosen for this communication, some links would sit idle. The loopback addresses are advertised from each leaf to the other leafs, so traffic between loopback addresses is load-balanced. VXLAN encapsulation needs this loopback-to-loopback reachability between leafs to build the tunnels.
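The idea of hash-based link selection can be sketched in a few lines. This is only an illustration: real switches use their own vendor-specific hash functions and fields, and the function name `pick_uplink`, the uplink names, and the IP addresses below are made up for the example.

```python
import zlib

def pick_uplink(src_ip, dst_ip, src_port, dst_port, uplinks):
    """Pick one equal-cost uplink from a hash of the outer flow fields.

    CRC32 is used here only as a stand-in for the switch's real hash.
    """
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return uplinks[zlib.crc32(key) % len(uplinks)]

uplinks = ["spine1", "spine2"]  # hypothetical equal-cost next hops

# The same flow always hashes to the same link, so packets stay in order.
first = pick_uplink("10.0.0.1", "10.0.0.2", 49152, 4789, uplinks)
again = pick_uplink("10.0.0.1", "10.0.0.2", 49152, 4789, uplinks)
print(first == again)  # True

# Different flows (different source ports) may land on different links.
for port in range(49152, 49156):
    print(port, pick_uplink("10.0.0.1", "10.0.0.2", port, 4789, uplinks))
```

Because the choice depends only on the flow fields, no link has to be blocked the way STP would block it. The load is spread per flow, not per packet.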

As we said before, the layer 2 database has to be shared somehow. In this topology there are central authority servers. All switches in the topology advertise their own MAC address tables to these servers, so the servers behave like a controller and know every switch’s MAC address table. They also advertise the learned MAC addresses to the other switches. In a VXLAN topology, a switch that terminates a tunnel is called a VTEP (VXLAN Tunnel End Point). Thus, these servers know each VTEP IP, its VNIs, and the MAC addresses behind it. Below is a simple example of the MAC address table received by the authority server.
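As a rough mental model, the controller's database can be thought of as a mapping from (VNI, MAC) to the VTEP IP behind which that MAC lives. The sketch below is not how CVX is implemented. The class and method names, the VNI 10010, and all addresses are assumptions made up for illustration.

```python
class ControllerTable:
    """Toy model of the central layer 2 database: (VNI, MAC) -> VTEP IP."""

    def __init__(self):
        self._entries = {}

    def advertise(self, vtep_ip, vni, mac):
        # A VTEP reports a MAC address it learned locally in a given VNI.
        self._entries[(vni, mac.lower())] = vtep_ip

    def lookup(self, vni, mac):
        # Returns the VTEP IP behind which the MAC lives, or None if unknown.
        return self._entries.get((vni, mac.lower()))

    def entries_for(self, vtep_ip):
        # What the controller would push to a VTEP: MACs learned elsewhere.
        return {key: ip for key, ip in self._entries.items() if ip != vtep_ip}


table = ControllerTable()
table.advertise("10.0.0.1", 10010, "00:00:00:AA:AA:01")  # Client 1 behind leaf pair 1
table.advertise("10.0.0.2", 10010, "00:00:00:BB:BB:02")  # Client 2 behind leaf pair 2

print(table.lookup(10010, "00:00:00:bb:bb:02"))  # 10.0.0.2
print(table.entries_for("10.0.0.1"))             # only the entry for Client 2
```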

[Figure: MAC address table received by the authority server]

Now let’s look at the headers:

  • The VXLAN header is 8 bytes, 24 bits of which carry the VNI.

[Figure: VXLAN header]

  • The outer UDP header carries the source and destination ports. The source port is chosen by the sending VTEP, and the destination port is 4789, the default UDP port for VXLAN. This header is also 8 bytes, as shown below. When ECMP is in use, a hashing algorithm decides which link carries the packet. Varying the source port per flow helps that algorithm spread traffic across the links (RFC 7348 recommends deriving it from a hash of the inner frame’s headers).

[Figure: outer UDP header]

  • The outer IP header is 20 bytes and includes the source IP address of the sending VTEP. The destination address can be multicast or unicast. When it is unicast, it is the IP address of the destination VTEP.

[Figure: outer IP header]

  • The outer Ethernet header is 18 bytes (14 bytes plus an optional 4-byte 802.1Q VLAN tag) and includes the MAC address of the source VTEP. The destination MAC address is either that of the destination VTEP directly or that of the next layer 3 device that forwards the packet.

[Figure: outer Ethernet header]
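The header sizes above can be checked with a short sketch that packs the VXLAN and UDP headers with Python's `struct` module and adds up the encapsulation overhead. The layout follows RFC 7348 (flags byte with the "I" bit, 24-bit VNI, reserved fields). The function names, the VNI 10010, and the source port 49152 are my own choices for the example.

```python
import struct

VXLAN_FLAG_VNI_VALID = 0x08  # the "I" flag: the VNI field is valid
VXLAN_UDP_PORT = 4789        # default destination port for VXLAN

def pack_vxlan_header(vni):
    """Build the 8-byte VXLAN header for a 24-bit VNI."""
    if not 0 <= vni < 2 ** 24:
        raise ValueError("VNI must fit in 24 bits")
    # Byte 0: flags, bytes 1-3: reserved, bytes 4-6: VNI, byte 7: reserved.
    return struct.pack("!II", VXLAN_FLAG_VNI_VALID << 24, vni << 8)

def unpack_vni(header):
    """Read the VNI back out of an 8-byte VXLAN header."""
    flags_word, vni_word = struct.unpack("!II", header)
    if not (flags_word >> 24) & VXLAN_FLAG_VNI_VALID:
        raise ValueError("VNI flag is not set")
    return vni_word >> 8

def pack_udp_header(src_port, payload_len, dst_port=VXLAN_UDP_PORT):
    """Build the 8-byte outer UDP header; the checksum is left at zero."""
    return struct.pack("!HHHH", src_port, dst_port, 8 + payload_len, 0)

header = pack_vxlan_header(10010)
print(len(header), header.hex())  # 8 0800000000271a00
print(unpack_vni(header))         # 10010

# Overhead added to every inner frame, using the sizes listed above.
overhead = {"outer_ethernet": 18, "outer_ip": 20, "outer_udp": 8, "vxlan": 8}
print(sum(overhead.values()))     # 54 bytes (50 without the 802.1Q tag)
```

This overhead is why the layer 3 links between leaf and spine usually need a larger MTU than the client-facing ports.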

When a VTEP comes up, it uses IGMP to join the multicast groups for the VNIs it uses. If a VTEP does not use a given VNI, it does not need to join that multicast group. However, authority servers like Arista CVX bring a benefit here. Since every VTEP, VNI, and MAC address is known by the server, there is no need to multicast across the whole datacenter for the first communication. It is enough to send the packet only to the VTEPs that are in the same VNI, which are found by looking at the VXLAN table.
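Here is a minimal sketch of that lookup, assuming the controller has already told each switch which VTEPs participate in which VNI. The `vni_membership` table, the function name, and the addresses are hypothetical.

```python
def flood_list(vni_membership, vni, local_vtep):
    """Return the remote VTEPs that should receive flooded traffic for a VNI."""
    members = vni_membership.get(vni, set())
    return sorted(ip for ip in members if ip != local_vtep)

# Hypothetical VNI-to-VTEP membership learned from the controller.
vni_membership = {
    10010: {"10.0.0.1", "10.0.0.2"},
    10020: {"10.0.0.2", "10.0.0.3"},
}

print(flood_list(vni_membership, 10010, "10.0.0.1"))  # ['10.0.0.2']
print(flood_list(vni_membership, 10020, "10.0.0.2"))  # ['10.0.0.3']
print(flood_list(vni_membership, 10030, "10.0.0.1"))  # []
```

Each copy is sent as a unicast packet to one VTEP in the list, so a VTEP that does not use the VNI never sees the traffic.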

As soon as a host connects to a switch port, the switch learns its MAC address and advertises it to the controller server. But what happens when a broadcast frame comes from a host? Suppose Client 1 wants to communicate with Client 2. Client 1 creates an ARP request to learn the MAC address for the destination IP, sets the destination in the Ethernet header to FF:FF:FF:FF:FF:FF, and sends the frame to the switch. Since the client port belongs to a VLAN and that VLAN is bound to a VNI, the switch adds a VXLAN header, an outer UDP header, an outer IP header, and an outer Ethernet header to the frame. Because the switch cannot tell which VTEP should receive this packet, it sends it to every VTEP that uses the same VNI. Each VTEP decapsulates the packet as soon as it receives it. Client 2 then responds to the ARP request, and the reply is encapsulated again and sent back to Client 1’s VTEP. The source and destination VTEP addresses are carried in the outer IP header. With this, Client 1 learns Client 2’s MAC address and the real communication can start. The switch now knows the destination MAC address of Client 1’s frames and sends them directly to the relevant VTEP, using the table received from the central servers. VXLAN was designed with multicast in mind for this kind of traffic, but Arista in particular chooses not to use it here, in order to avoid multicast traffic across the whole datacenter.
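The forwarding decision in this walk-through can be summarized as a small function: broadcast frames go to every VTEP in the VNI, and known unicast frames go to a single VTEP. This is a simplified sketch with made-up names and addresses. Flooding unknown unicast frames is my assumption here, not something the topology above spells out.

```python
BROADCAST = "ff:ff:ff:ff:ff:ff"

def next_vteps(dst_mac, vni, mac_table, flood_targets):
    """Decide which remote VTEPs receive the encapsulated frame."""
    dst_mac = dst_mac.lower()
    if dst_mac == BROADCAST:
        return list(flood_targets)        # e.g. the ARP request
    vtep = mac_table.get((vni, dst_mac))
    if vtep is None:
        return list(flood_targets)        # assumption: flood unknown unicast
    return [vtep]                         # known unicast: one tunnel only

# Hypothetical state on Client 1's leaf pair (VTEP 10.0.0.1).
mac_table = {(10010, "00:00:00:bb:bb:02"): "10.0.0.2"}
flood_targets = ["10.0.0.2", "10.0.0.3"]

# Step 1: Client 1's ARP request is a broadcast, so it is flooded.
print(next_vteps("FF:FF:FF:FF:FF:FF", 10010, mac_table, flood_targets))
# ['10.0.0.2', '10.0.0.3']

# Step 2: once Client 2's MAC is known, traffic goes to its VTEP only.
print(next_vteps("00:00:00:BB:BB:02", 10010, mac_table, flood_targets))
# ['10.0.0.2']
```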

In this post, I tried to explain VXLAN technology and how it is used in modern datacenters. There are other ways to learn this layer 2 database, such as BGP EVPN. I hope this post helps you.

By Mahmut Aydin

CCIE R&S #63405
