Eyle Brinkhuis
Eyle has worked at SURF for 5 years and, coming from the Jong Talent programme…
Network function virtualization (NFV) is becoming a reality for education and research. Over the past two years I've worked with colleagues on an NFV infrastructure that SURF will soon put into production. This infrastructure will allow us to perform network functions such as switching and firewalling in software, in a dedicated cloud, instead of on physical devices at each institution's own location. What will our NFV infrastructure look like?
Pilot with Firewall as a Service
At SURF we've been working for some time on network function virtualization (NFV): a technique which replaces the physical hardware of a network with virtual, software-based components. NFV offers the possibility of realizing functionalities such as routing, switching and firewalling. The big advantage is that virtual functions are easily scalable: you only pay for what you use.
In 2018, we ran a pilot to see whether Firewall as a Service was a viable concept: offering a firewall as a service in the cloud, replacing the physical firewall equipment at the institution. We wanted to use NFV for that. In that respect the pilot did not succeed: we ended up using physical firewall equipment that institutions could reach via network connections (lightpaths). In the more than two years since, we have kept working steadily on how to implement NFV successfully in our services.
Building it ourselves
In the 2018 pilot, we wanted to deploy NFV using Juniper Contrail. However, this software did not meet our requirements. The experience did teach us what NFV software should be able to do for us, and that the best solution was to build the NFV infrastructure ourselves.
Off-the-shelf NFV software is widely used in the telecom industry, by large telecom operators like KPN and T-Mobile. But in that world, the requirements differ from those we have for our services:
We want to be able to migrate workloads live to another server if necessary, without service interruption. In telecom, an interruption is usually limited to a small hiccup in a 4G or 5G connection, which the user does not notice. An interruption in the workload of a firewall has greater consequences for the service: the institution's internet connection is simply cut off.
We work with much faster connections in our network than is usual in the telecom sector: up to 100 Gbit/s. Standard NFV solutions are not built for that.
Processing network traffic at high speed
The most important aspect of an NFV infrastructure is processing network traffic at high speeds on standard hardware. Ordinary servers are not suitable for this out of the box. Therefore, we must instruct the network cards not to process packets on an interrupt basis ('Hi, I'm a network packet, what are you going to do with me?') but on a polling basis ('Bring on the packets'). We then ensure that these network packets are placed in an easily accessible location in memory as quickly as possible.
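The difference between the two models can be illustrated with a small simulation. This is only a sketch in plain Python, not how VPP or a network card driver is implemented: the function names, the `rx_ring` queue and the batch size of 256 are assumptions chosen for the illustration. It counts how often the CPU has to start handling work in each model.

```python
from collections import deque


def interrupt_style(packets, handler):
    """Handle packets one by one, as if every arrival raised an interrupt."""
    wakeups = 0
    for pkt in packets:
        wakeups += 1       # one wake-up (and context switch) per packet
        handler([pkt])
    return wakeups


def poll_style(rx_ring, handler, batch_size=256):
    """Drain a receive ring in batches, as a poll-mode driver would.

    A real poll loop keeps spinning on a dedicated core even when the
    ring is empty; this sketch stops once the ring has been drained.
    """
    polls = 0
    while rx_ring:
        count = min(batch_size, len(rx_ring))
        batch = [rx_ring.popleft() for _ in range(count)]
        polls += 1         # one pass handles a whole batch of packets
        handler(batch)
    return polls


packets = [f"pkt-{i}" for i in range(1000)]
processed = []

wakeups = interrupt_style(packets, processed.extend)
polls = poll_style(deque(packets), processed.extend)

print(f"interrupt model: {wakeups} wake-ups")
print(f"polling model:   {polls} polls")
```

The per-packet overhead of an interrupt is what becomes unaffordable at high packet rates. Polling trades that overhead for CPU cores that are permanently busy fetching packets.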
Within our NFV infrastructure, we use OpenStack Train, and we speed up the data plane using VPP (Vector Packet Processing). Within VPP, we use the RDMA driver, which was written specifically to drive our Mellanox ConnectX-5 network cards on a polling basis. VPP was developed by Cisco and donated to the open-source community. A plug-in was later written for it that allows us to connect VPP to OpenStack.
Up to 100 Gbit/s per server
By using these techniques and software, we can process up to 100 Gbit/s per server. To get the traffic to the right place in memory, we use only 2 CPU cores. The rest are available for firewalling, routing or switching.
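To get a feel for what 100 Gbit/s means per packet, the line rate can be converted into packets per second. The calculation below is a back-of-the-envelope sketch, not a measurement of our infrastructure. It assumes standard Ethernet framing, where each frame carries 20 extra bytes on the wire (preamble, start-of-frame delimiter and inter-frame gap), and the frame sizes are chosen only as examples.

```python
LINE_RATE_BPS = 100e9          # 100 Gbit/s
ETHERNET_OVERHEAD_BYTES = 20   # preamble (7) + start-of-frame (1) + inter-frame gap (12)


def packets_per_second(frame_bytes, line_rate_bps=LINE_RATE_BPS):
    """Maximum number of frames of a given size that fit on the wire per second."""
    bits_on_wire = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return line_rate_bps / bits_on_wire


for size in (64, 512, 1518):   # example frame sizes in bytes
    pps = packets_per_second(size)
    ns_per_packet = 1e9 / pps
    print(f"{size:>5} B frames: {pps / 1e6:7.2f} Mpps, {ns_per_packet:7.2f} ns per packet")
```

With the smallest frames, the time available per packet is only a few nanoseconds, which is why packets have to be handled in batches rather than one interrupt at a time.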
The network cards of the servers are in turn redundantly connected to two Mellanox SN2100 100 Gbit/s switches. We use Multi-Chassis Link Aggregation (MC-LAG) to ensure that we do not have a single point of failure here. The Mellanox SN2100 switches are then connected to the SURF network with two connections, one to each of the two Juniper MX2008 chassis. For loop prevention we use Ethernet Segment Identifiers (ESIs), a feature of the EVPN technique that we use within the SURF network.
64 logical CPU cores
This completes the network part. The next question was which servers to use for running the firewall software. We chose Lenovo SR635 servers with an AMD EPYC2 processor. This gives us 32 CPU cores, which allows us to create 64 logical cores. That's a lot, but we also need a lot of power. We use 8 logical cores for host matters, such as running the OS and processing traffic in the data plane. That leaves us with 56 logical cores for the actual service. In total, we have 6 of these Lenovo servers, divided across our two locations, AMS1 and AMS2. Together, the six make up one large cluster.
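The core budget described above can be written out as a short calculation. This is only an illustrative sketch: the class and field names are made up for this example, and it assumes two hardware threads per physical core, which is what turns 32 cores into 64 logical cores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServerCoreBudget:
    physical_cores: int = 32
    threads_per_core: int = 2   # assumption: two hardware threads per core
    host_reserved: int = 8      # logical cores for the OS and the data plane

    @property
    def logical_cores(self) -> int:
        return self.physical_cores * self.threads_per_core

    @property
    def service_cores(self) -> int:
        return self.logical_cores - self.host_reserved


SERVERS_IN_CLUSTER = 6

budget = ServerCoreBudget()
cluster_service_cores = SERVERS_IN_CLUSTER * budget.service_cores

print(f"logical cores per server: {budget.logical_cores}")
print(f"service cores per server: {budget.service_cores}")
print(f"service cores in cluster: {cluster_service_cores}")
```

Across the whole cluster this comes to 336 logical cores for virtual network functions, before any capacity is set aside for live migration or failover.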
Overview of SURF's NFV infrastructure. The example shows an institution that purchases a virtual firewall as part of the NFV infrastructure: in red the route taken by incoming traffic from the internet to the firewall; in green the route from the firewall to the institution.
Automatically set up and monitor functions
Now that we have set all this up, the NFV infrastructure can be put to use. Operation is almost fully automatic. If an institution wants to configure a firewall (or other function) in its network, it can do so itself via a portal. The firewall is then set up entirely automatically, without anyone from SURF needing to be involved, and it is monitored automatically as well. Setting up and monitoring are done with Open Source MANO, an open-source software package made available by ETSI.
Almost ready for production
We are now almost ready for production with our NFV infrastructure. We want to offer not only firewall services on it, but soon also virtual switches, routers and so on. We also want to run our own eduVPN service via NFV. We expect to gain a lot of performance, because NFV allows us to handle traffic more efficiently and to scale up or down more easily than our current solution.
All in all, we hope that our NFV infrastructure will save institutions from having to buy a lot of devices that take up cabinet space, age quickly and can only do one job.
Are you interested in using our NFV infrastructure? Or would you like to know more about it? Then please contact me at eyle.brinkhuis@surf.nl.
This is a translation of a Dutch article.