In last week’s post about Network Virtualization and the Software Defined Data Center, I pointed out that whatever approach you take towards your modern data center network – SDN, NFV, or a blend – an element of network virtualization will become fundamental to its architecture. I concluded by noting that retaining visibility will be paramount to its service-oriented success. But before I move on to this visibility theme, I thought I’d add a graphic to help illustrate the complementary nature of SDN and NFV.

SDN and NFV seem quite complementary

New visibility challenges with Network Virtualization

This shift towards network virtualization doesn’t change the importance of visibility into network traffic – but it does change the approach. In the past, the best way to collect data was to position a passive probe that could see the initiation, termination, or transit path of an application’s conversations. Given the client-server paradigm of the monitored TCP connection, you could accurately report on the “who, what, when, where and why” of degraded performance. By combining that with insight into the application from the end-user perspective and with server performance details, you had a reasonably complete story.

Then came fully meshed environments, where probe placement became harder or, more accurately, less practical. This can be addressed by introducing flow-based data to the story, complementing richer probe-based data. The key to all of these approaches was access to the traffic in transit; even the introduction of physical WAN optimization controllers (devices that essentially “broke” that end-to-end TCP connection) could be accounted for by stitching the pre- and post-optimized flows together to represent a single flow. But abstracting network functions and obscuring in-transit traffic inside a virtual black box means you have to tackle the data acquisition question head-on.

Visibility: more than just “nice to have”

Of course, this network data is still a vital part of the complete story; in fact, you can argue that visibility into this area has increased in importance. First, this is likely to be something of a first-time project for organizations, and as a result many will have no real idea how their applications will perform in such an environment or, consequently, what impact the applications will have on this new network architecture. The traffic and performance characteristics of previous architectures may well no longer hold, and without insight into how these have changed, resolving problems or optimizing performance may be quite difficult.

A recent Stanford University study found that the performance of a workload running inside a VM is affected not only by other VMs on the same physical machine, but also by the VM’s location relative to the other VMs it communicates with across the network topology, observing significant impact from the latency and utilization characteristics of the network links. The authors noted that in data centers with oversubscribed network links, the topological location of a virtual machine significantly determines its performance, and showed that a form of network-aware VM placement can improve performance by over 70% compared to initial random placements. Remember, just like traditional networks and the promise of automation, this still isn’t a set-and-forget, self-healing environment – yet.
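To make the placement idea concrete, here is a toy sketch of network-aware VM placement – not the study’s method, and the host names, latencies and traffic volumes below are all invented for illustration. It scores every VM-to-host assignment by traffic volume weighted by the latency between the hosts involved, and keeps the cheapest assignment.

```python
from itertools import product

def placement_cost(placement, traffic, link_latency):
    """Sum of traffic volume weighted by the latency between the
    hosts that each pair of communicating VMs landed on."""
    return sum(volume * link_latency[(placement[a], placement[b])]
               for (a, b), volume in traffic.items())

def best_placement(vms, hosts, traffic, link_latency):
    """Exhaustively score every VM-to-host assignment (toy scale only;
    capacity constraints are deliberately ignored) and keep the cheapest."""
    best, best_cost = None, float("inf")
    for assignment in product(hosts, repeat=len(vms)):
        candidate = dict(zip(vms, assignment))
        cost = placement_cost(candidate, traffic, link_latency)
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best, best_cost
```

With a latency map where same-host beats same-rack beats cross-rack, the search naturally co-locates the chattiest VM pairs – the same intuition the study exploits, minus the scale. A real placer would use a heuristic rather than exhaustive search.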

A flattened network can reduce latency and congestion – if you are aware of traffic paths

The next aspect is the operational perspective. The network still represents the largest part of the application’s journey from the server to the end user, and therefore the biggest opportunity for performance degradation. This could be simple congestion, or an undesirable behavior introduced by encapsulation, such as a broken tunnel or packet drops. In addition, this is likely to be a fairly dynamic network environment, so knowing where those flows have been, are currently, and should be is fundamental to managing service delivery.
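That “have been, are currently, and should be” idea can be sketched as a simple path audit. A minimal illustration, assuming you can collect successive hop-by-hop path observations for a flow (the hop names and record shape here are hypothetical):

```python
def audit_flow_path(history, expected):
    """Given successive path observations for one flow (each a list of
    hops) and the intended path, report whether the flow ever deviated,
    how many times its path changed, and where it is now."""
    deviations = [p for p in history if p != expected]
    changes = sum(1 for a, b in zip(history, history[1:]) if a != b)
    return {
        "deviated": bool(deviations),      # did it ever leave the intended path?
        "path_changes": changes,           # how often did the path flap?
        "current": history[-1] if history else None,
    }
```

In a dynamic fabric, a non-zero flap count or a deviation from the designed path is exactly the kind of early signal that separates “the overlay moved a flow” from “the application got slow for no reason.”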

Familiar metrics

It’s all very well saying you need visibility into this virtual network, but what exactly should you be looking for? Fundamentally, you need to understand the mechanics of the virtual network deployment. For example, in an overlay network, visibility into active tunnels, their endpoints, and the tunnel types is a good starting point. This provides a basic level of insight, letting you confirm that the virtual network has been configured and deployed correctly, is working as designed, and is tuned effectively. That core level of visibility – understanding which applications, users and servers are traversing which tunnels – will give you an indication of the load in your environment, helping you assess performance degradations that might be caused by oversubscription.
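As an illustration of that core tunnel-level visibility, here is a minimal sketch – the flow records, tunnel identifiers and capacity figure are all assumed for the example, not taken from any particular product – that aggregates per-tunnel load and flags tunnels exceeding a capacity budget:

```python
from collections import defaultdict

# Hypothetical flow records: which overlay tunnel (src endpoint, dst
# endpoint, type) carried each flow, and the bytes it moved this interval.
flows = [
    {"app": "crm",  "tunnel": ("10.0.0.1", "10.0.0.2", "vxlan"), "bytes": 4_000_000},
    {"app": "erp",  "tunnel": ("10.0.0.1", "10.0.0.2", "vxlan"), "bytes": 9_000_000},
    {"app": "mail", "tunnel": ("10.0.0.1", "10.0.0.3", "gre"),   "bytes": 1_000_000},
]

TUNNEL_CAPACITY_BYTES = 10_000_000  # assumed per-interval budget per tunnel

def tunnel_load(flows):
    """Aggregate per-tunnel byte counts so busy tunnels stand out."""
    load = defaultdict(int)
    for f in flows:
        load[f["tunnel"]] += f["bytes"]
    return dict(load)

def oversubscribed(load, capacity=TUNNEL_CAPACITY_BYTES):
    """Return the tunnels whose aggregate load exceeds the assumed budget."""
    return [t for t, b in load.items() if b > capacity]
```

The same aggregation keyed on `app` instead of `tunnel` answers the “which applications are traversing which tunnels” question from the other direction.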

This perspective can then also be used to understand usage patterns and determine, for example, whether one application is adversely impacting another. To complete that picture you’ll want to include packet loss in your metric collection. This should include insight at the server, and also at points within the virtual infrastructure, to help you determine whether the virtual infrastructure is negatively impacting the application traffic. Essentially, what you’re looking for is application performance, usage and loss rate – the same metrics we cared about when networks were simpler. We’ve already alluded to the fact that new software-defined network fabrics make data acquisition more complicated than simply placing a probe in the path; expect to leverage a number of techniques for collecting these metrics, including flow-based data (e.g. NetFlow, IPFIX) from the virtual infrastructure itself, packet-level data from agents and/or probes on virtual taps, and information taken directly from the controller.
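A rough sketch of how those familiar metrics might be rolled up from flow records observed at two points – the server and a virtual-switch tap downstream – follows; the record field names are hypothetical, and real loss estimation would also need to align counters in time:

```python
def loss_rate(sent_pkts, delivered_pkts):
    """Fraction of packets lost between two observation points,
    e.g. the server vNIC and a virtual-switch tap downstream."""
    if sent_pkts == 0:
        return 0.0
    return max(0.0, (sent_pkts - delivered_pkts) / sent_pkts)

def summarize(records):
    """Roll flow records up per application into the familiar trio:
    usage (bytes), packet counts at both taps, and estimated loss."""
    summary = {}
    for r in records:
        s = summary.setdefault(r["app"], {"bytes": 0, "sent": 0, "delivered": 0})
        s["bytes"] += r["bytes"]
        s["sent"] += r["pkts_at_server"]
        s["delivered"] += r["pkts_at_vswitch"]
    for s in summary.values():
        s["loss"] = loss_rate(s["sent"], s["delivered"])
    return summary
```

A per-application loss rate that is high at the virtual-switch tap but clean at the server points the finger at the virtual infrastructure rather than the workload – exactly the distinction the paragraph above is after.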


However you approach it, network virtualization presents a great opportunity for organizations to reduce costs, improve SWaP (space, weight and power) in the data center, decrease time to market, and deliver the speed and agility today’s service delivery expectations demand. But to reap the benefits of the technology, you need visibility into the network and the services it delivers. That visibility is critical for optimizing operations as you plan and build out new infrastructure, providing actionable operational data in production, and ultimately offering a path to that utopia of true automation somewhere in the future.