The technology known as NFV (Network Functions Virtualization) is attracting a lot of attention. Indeed, it was one of the main topics at this year’s Mobile World Congress, where everyone from the large Network Equipment Manufacturers to new software companies focused on virtualization presented solutions targeting areas like vEPC, vIMS, vCPE and more. The Internet is also contributing to the excitement, with each new white paper, webinar and piece of market analysis declaring that NFV is a real revolution in network technology. After an initial period of uncertainty, it seems the question today is not “if” the Communication Service Providers (CSPs) will adopt NFV technology, but rather “when”.
Despite all the attention NFV is getting, there are areas that still need to be explored. For example, most of the discussion today revolves around the technology and the various options for adopting it, but few are thinking further down the road about how to provide Service Assurance in these highly abstracted and complex virtual environments. Honestly, I doubt that any CSP, after initial laboratory trials, can afford to roll out NFV without the appropriate tools to guarantee at least the same level of service quality delivered today by the “legacy” (i.e. hardware-based) network infrastructure. Service Assurance includes various functions, but for the purposes of this blog I would like to focus on how to provide visibility of the traffic within NFV via “passive monitoring”, rather than contributing to the debate about how the current OSS should evolve.
Passively monitoring virtualized networks with probes poses various challenges, depending on the level of “virtualization” that will be implemented (i.e. how many network functions will be moved into VMs). These challenges include:
- Lack of visibility of the traffic once the network interfaces (e.g. S11, Gn, S6a…) which today are accessible on physical links, move into a “Cloud” (i.e. virtualized). Once in the virtualized cloud, they could be implemented as inter-VM communications (in NFV terminology, east-west interfaces as opposed to north-south interfaces identifying traffic flowing in/out of the Cloud environment).
- Keeping up with the dynamic nature of virtualized environments: the NFV’s Orchestrator component can decide to move the network functions (e.g. MME, SGW, PGW, PCRF…) across the underlying hardware infrastructure, requiring the re-configuration of the probes (possibly, automatically).
- Keeping pace with network elasticity: one of the promises of NFV is to automatically expand (or collapse) the network on the fly depending on the traffic conditions. As an example, in the case of a sudden increase in customers (or machine sensors), the Orchestrator should be able to increase the capacity of one or more functions (obviously, only if the hardware resources are available) and, when the traffic spike is over, decommission those resources. The passive probe system should also be able to scale up and down as needed.
- Time to identify the root cause: once the monitoring tool detects a problem, it is necessary to understand whether it is due to a specific VNF, to the interworking between VNFs, or to the hosting infrastructure (e.g. resource depletion on the physical servers hosting the VNFs).
Given these challenges, it will be critical for CSPs to have the same level of monitoring capability, or better, within an NFV environment. But how should organizations think about gaining that equivalent or better functionality? I have identified three possible approaches.
- The first is to deploy virtual TAPs within the NFV infrastructure (in the form of dedicated VMs), extract the desired traffic, and forward it to an external aggregator that delivers the packets to physical probes. This approach has the advantage of allowing the CSP to re-use existing probes, but it does not allow automatic scaling of the probe system (because it is still based on hardware appliances), and it introduces some doubt about the accuracy of the QoS measurements (e.g. MOS), because the network packets have to flow through multiple components before reaching the probe. Furthermore, it still requires physical aggregators deployed outside the cloud: the complexity and inflexibility of such an approach is evident. It is also unclear why a virtual TAP should replicate a job that a virtual switch can do (i.e., switching packets from source to destination based upon specific rules).
- The second approach is to integrate virtual probe functionality into the virtualized nodes. For example, a vEPC vendor will provide its own virtual probes, natively integrated within the VNF, which can feed external applications that collect and analyze aggregated measurements based on standards like NetFlow. Although this sounds appealing, this method also presents various disadvantages. The only data exposed to the external applications is what the vendor has decided to export. Furthermore, it provides only aggregated measurements that can be used for Performance Monitoring, whereas Troubleshooting requires visibility down to a single packet. Lastly, some CSPs may question using a monitoring solution from their network infrastructure vendor rather than an independent monitoring solution. Many CSPs already faced this issue in the past, when they tried to implement Service Assurance based entirely on data coming from the network nodes and realized it was not an optimal solution.
- The third and potentially best approach is to deploy virtual probes that are fully independent of the VNF systems and receive a copy of the traffic to monitor from the virtual switches (just like the hardware probes today receive data from a mirror port or a physical tap). These probes, combined with a flexible centralized data collection and correlation system, can provide a unified view of the traffic down to single subscriber detail and are fully independent from the NFV vendor. Because these virtual probes run as close as possible to the respective VNF they are monitoring, they provide very accurate measurements. Additionally, virtual probes can automatically “scale-up or scale-down” as needed, together with the NFV infrastructure being monitored (this is one of the promises of NFV, a.k.a. network elasticity). For example, if the Orchestrator instantiates more vEPC components to satisfy an increasing traffic demand (a.k.a. “scale-up”), virtual probe capacity can also be increased accordingly. Once the traffic peak is over, the Orchestrator will release the hardware resources for both the vEPC and the virtual probes (a.k.a. “scale-down”).
Finally, because these virtual probes are independent from the VNF vendors, they can be easily expanded to provide additional measurements as soon as new services are provided by the CSP.
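The scale-up and scale-down behaviour described for the third approach can be illustrated with a small sizing calculation. This is only a sketch under stated assumptions: the `ProbePool` class, its fields and the per-probe capacity and headroom figures are hypothetical and do not come from any orchestrator or probe product. It shows how the number of virtual probes could follow the monitored traffic volume.

```python
from dataclasses import dataclass


@dataclass
class ProbePool:
    """Hypothetical sizing model for a pool of virtual probes."""
    capacity_per_probe_mbps: int   # assumed traffic one virtual probe can analyze
    headroom_percent: int = 20     # assumed safety margin on top of measured traffic
    min_probes: int = 1            # keep at least one probe running

    def required_probes(self, monitored_traffic_mbps: int) -> int:
        # Integer arithmetic avoids floating-point rounding surprises.
        target = monitored_traffic_mbps * (100 + self.headroom_percent)
        per_probe = self.capacity_per_probe_mbps * 100
        needed = -(-target // per_probe)  # ceiling division
        return max(self.min_probes, needed)


def scaling_delta(pool: ProbePool, current_probes: int, traffic_mbps: int) -> int:
    """Positive result: probes to add (scale-up). Negative: probes to release (scale-down)."""
    return pool.required_probes(traffic_mbps) - current_probes


pool = ProbePool(capacity_per_probe_mbps=10_000)
print(scaling_delta(pool, current_probes=2, traffic_mbps=26_000))  # traffic spike
print(scaling_delta(pool, current_probes=4, traffic_mbps=5_000))   # spike is over
```

In a real deployment the decision would be taken by the Orchestrator, using whatever capacity figures the probe vendor publishes. The point of the sketch is only that probe capacity can be derived from the same traffic signal that drives the scaling of the monitored VNFs.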
Support for this approach can be seen in an initiative started last year by the OpenStack community called TaaS (Tap-as-a-Service). This service has been designed to simplify the task of configuring the virtual switch to forward traffic to virtual probes. As reported on their site, it is “a project developed to introduce the functionality of port mirroring in OpenStack Neutron provisioned networks. This feature allows tenants and administrators to mirror network traffic (ingress/egress) from Neutron ports they have VM’s connected on to another port. This feature will greatly benefit tenants who want to debug their virtual networks and gain visibility into their VMs by monitoring and analyzing the network traffic associated with them”.
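To make the port mirroring idea concrete, here is a toy model of a virtual switch copying ingress or egress traffic from monitored ports to a probe port. It is a conceptual sketch only: `MirrorSession`, `MirrorRule` and the port names are hypothetical illustrations and are not the TaaS or Neutron API.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import FrozenSet, List, Tuple


class Direction(Enum):
    INGRESS = "ingress"
    EGRESS = "egress"


@dataclass(frozen=True)
class MirrorRule:
    """Hypothetical rule: mirror traffic of one source port in the given directions."""
    source_port: str
    directions: FrozenSet[Direction]


@dataclass
class MirrorSession:
    """Hypothetical session: all matching traffic is copied to one probe port."""
    probe_port: str
    rules: List[MirrorRule] = field(default_factory=list)

    def watch(self, source_port: str, *directions: Direction) -> None:
        # With no direction given, mirror both ingress and egress.
        chosen = frozenset(directions) if directions else frozenset(Direction)
        self.rules.append(MirrorRule(source_port, chosen))

    def copies_for(self, port: str, direction: Direction,
                   packet: bytes) -> List[Tuple[str, bytes]]:
        """Return the (destination port, packet) copies the switch would emit."""
        for rule in self.rules:
            if rule.source_port == port and direction in rule.directions:
                return [(self.probe_port, packet)]
        return []


session = MirrorSession(probe_port="probe-port-1")
session.watch("mme-s11-port", Direction.INGRESS)
print(session.copies_for("mme-s11-port", Direction.INGRESS, b"gtp-c message"))
print(session.copies_for("mme-s11-port", Direction.EGRESS, b"gtp-c message"))
```

The original packet still travels to its normal destination; the probe only ever receives a copy, which is what keeps this kind of monitoring passive.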
Is your company thinking about deploying NFV? Make sure that in the rush to deploy virtual infrastructure, you don’t forget about how you are going to properly monitor that environment. Empirix has solutions to help you make this transition and ensure service levels are met or exceeded.