By Sven Olav Lund, Sr. Product Manager, Napatech
EMC estimates that by 2020, about 1.7 megabytes of new information will be created every second for every human being on the planet. And IDC projects that by then, today's 4.4 zettabytes of data will grow to approximately 44 zettabytes. This unprecedented level of data generation has created demand for higher-capacity networks. In response, telecom core networks and enterprise data centers have recently introduced 100G technology and 100G links, taking the transition from 10G to 100G out of the theoretical and into the real.
Transition Brings Challenges
The changeover to 100G network analysis from its smaller 10G cousin might seem like a simple matter, but in fact it introduces new challenges in terms of system performance and offload requirements. Analyzing bidirectional flows on 100G network links requires capturing up to 2 x 100G line-rate traffic, transferring the captured traffic to server memory, and fully utilizing the multicore server system to process it. This presents three challenges:
- Server system use: In most use cases, network analysis involves correlating the bidirectional flows. Efficient application processing requires that both directions of each flow be handled by the same CPU core. The challenge is to correlate upstream and downstream flows, transfer packets from two Ethernet ports to the same CPU core and, at the same time, use all the available CPU cores in the server system efficiently. Communication between the CPU sockets goes over the Intel QuickPath Interconnect (QPI), which introduces overhead and should be avoided.
- Zero packet loss: Analysis applications depend on complete capture of traffic under all conditions, including bursts of 64-byte packets at 100G line rate. A specialized network adapter is needed to guarantee full packet capture.
- The PCIe bottleneck: PCIe Gen3 supports a packet transfer rate of up to about 115G once PCIe overhead is taken into account, depending on the network adapter and server system used. Depending on the system design approach, the PCIe bottleneck can be a limitation when monitoring 100G links.
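The core-affinity requirement in the first challenge can be sketched in a few lines. A symmetric (order-independent) hash of the flow endpoints maps both directions of a flow to the same CPU core. This is only an illustrative sketch, with an assumed 5-tuple and core count; in practice the network adapter performs this distribution in hardware:

```python
# Sketch of symmetric flow hashing: sorting the two endpoints before
# hashing makes (A -> B) and (B -> A) map to the same CPU core.
# NUM_CORES and the tuple layout are illustrative assumptions.

NUM_CORES = 16

def flow_core(src_ip, src_port, dst_ip, dst_port, proto):
    # Order the endpoints so both flow directions hash identically.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return hash((a, b, proto)) % NUM_CORES

up   = flow_core("10.0.0.1", 1234, "8.8.8.8", 53, 17)  # query
down = flow_core("8.8.8.8", 53, "10.0.0.1", 1234, 17)  # reply
assert up == down  # both directions land on the same core
```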
Three Possible Solutions
Just as there are three primary challenges, there are also three different approaches for monitoring up/down link 100G traffic over optical taps. Each approach has pros and cons in relation to full packet capture and system performance.
The first approach involves a single PCI slot. In this scenario, the network adapter has two ports and interfaces to the host system over a single PCIe interface. The upstream and downstream flows captured on the two Ethernet ports are correlated and transferred to the CPU queues on the server system. Only the CPU cores on the CPU socket attached to the PCIe slot with the network adapter have direct access to the data. Using other CPU cores in the system requires communication over QPI, which introduces overhead and reduces application performance.
Creating an efficient load distribution over the CPU cores takes well-designed distribution techniques based on data retrieved from packet headers, outside or inside tunnels and encapsulations. The obvious limitation of the single-PCI-slot solution is the PCIe bandwidth bottleneck: the solution cannot handle sustained bandwidth above 115G. On-board buffer memory can partly compensate for this limitation and enable capture of bursts of line-rate traffic.
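The 115G figure can be reproduced with a back-of-envelope calculation. The link width (x16), payload size and per-transfer overhead below are assumptions for illustration, not figures from the vendor specification; actual numbers vary with the adapter's DMA design:

```python
# Rough PCIe effective-throughput estimate. Assumptions: x16 link,
# 256-byte TLP payloads, ~24 bytes of TLP/DLLP framing per transfer.

def pcie_effective_gbps(gt_per_s, lanes=16, payload=256, overhead=24):
    raw = gt_per_s * lanes * 128 / 130           # 128b/130b line encoding
    return raw * payload / (payload + overhead)  # TLP framing overhead

gen3 = pcie_effective_gbps(8)  # PCIe Gen3: 8 GT/s per lane
print(f"Gen3 x16 effective: {gen3:.0f} Gbit/s")  # ~115 Gbit/s
```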
The second approach is a dual PCI slot with independent flow transfer. It uses two PCI slots and captures up/down link traffic on two separate network adapters. This approach eliminates the PCIe bandwidth limitation completely and guarantees full line rate capture and delivery to the server memory, assuming sufficient server system performance.
Traffic from the upstream flow direction is delivered to CPU cores on one CPU socket, and traffic from the downstream flow direction is delivered to CPU cores on another. Correlation between the upstream and downstream flow directions has to be handled by the application. Since upstream and downstream traffic is delivered to different CPU sockets, the correlation process involves communication over QPI. For CPU-bound applications, this overhead can be critical.
The third approach is a variation of the second: a dual PCI slot with inter-PCI-slot flow transfer. A hardware interconnect between the two PCIe network adapters is introduced; this can, for example, be done by using two Napatech NT100E3 accelerators with an interconnect cable. The purpose of the interconnect is to direct the upstream and downstream flows to the correct CPU cores without the need for application communication over the QPI. Each network adapter is configured to distribute the upstream and downstream flows to the correct CPU sockets. This approach completely eliminates the QPI communication overhead.
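The steering idea can be sketched as follows. If each adapter independently computes the same symmetric flow hash (a hypothetical simplification of what the adapters do in hardware), a flow's target socket is deterministic, so each adapter knows whether to deliver a packet locally over its own PCIe slot or forward it over the interconnect cable:

```python
# Sketch of inter-adapter flow steering. The symmetric hash and the
# two-socket split are illustrative assumptions, not the NT100E3's
# actual distribution scheme.

NUM_SOCKETS = 2

def flow_socket(src_ip, src_port, dst_ip, dst_port):
    # Order-independent hash: both flow directions pick the same socket.
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return hash((a, b)) % NUM_SOCKETS

def route(local_socket, src_ip, src_port, dst_ip, dst_port):
    target = flow_socket(src_ip, src_port, dst_ip, dst_port)
    return "local PCIe" if target == local_socket else "interconnect cable"

# One adapter sees the upstream packet, the other the downstream reply;
# both resolve the flow to the same socket, so the application never
# has to correlate across QPI.
up_target   = flow_socket("10.0.0.1", 1234, "8.8.8.8", 53)
down_target = flow_socket("8.8.8.8", 53, "10.0.0.1", 1234)
assert up_target == down_target
```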
However, an asymmetrical load distribution may lead to oversubscription of one of the PCIe interfaces. In practice, a highly loaded 100G link carries a large number of flows, so the load will generally be balanced evenly over the CPU cores.
How can an organization decide which way to go? The design decision should be based on optimizing the main bottleneck in the system. When analyzing 100G links, the application is the bottleneck in most cases. Consequently, a design that optimizes server resource use is preferable, making the third approach the best solution.
The first approach can be an alternative if the use case allows data reduction, such as dropping certain traffic categories based on filter criteria. Data reduction can compensate for the PCIe Gen3 bandwidth bottleneck and also reduce the need for balancing the load across the CPU sockets.
Preparing for What’s Ahead
PCIe Gen4 solves the PCIe Gen3 bottleneck and enables full line-rate capture on 2-port 100G network adapters in a single PCI slot. It doubles the capacity of PCIe Gen3, opening the way for network adapter designs that support line-rate transfer of 2 x 100G traffic from network ports to server host memory. It does not, however, address server system use: the QPI and related system overhead are still involved in distributing flows across all CPU cores. It remains to be seen whether Intel will improve QPI performance or debut an entirely different concept for inter-CPU communication. Time will tell.
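The doubling is easy to check with a quick calculation. An x16 link and 128b/130b encoding (used by both generations) are assumed; real effective throughput is lower once TLP overhead is included:

```python
# Approximate raw x16 link bandwidth per PCIe generation.

def raw_gbps(gt_per_s, lanes=16):
    return gt_per_s * lanes * 128 / 130  # 128b/130b line encoding

gen3 = raw_gbps(8)    # PCIe Gen3: 8 GT/s per lane  -> ~126 Gbit/s
gen4 = raw_gbps(16)   # PCIe Gen4: 16 GT/s per lane -> ~252 Gbit/s
assert gen4 == 2 * gen3  # Gen4 doubles Gen3's raw capacity
assert gen4 > 200        # headroom for 2 x 100G capture
```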
About the author:
Sven Olav Lund is a Sr. Product Manager at Napatech and has over 30 years of experience in the IT and Telecom industry. Prior to joining Napatech in 2006, Sven Olav was a Software Architect for home media gateway products at Triple Play Technologies. From 2002 to 2004 he worked as a Software Architect for mobile phone platforms at Microcell / Flextronics ODM and later at Danish Wireless Design / Infineon AG. Sven Olav started his career as a Software Engineer, architecting and developing software for various gateway and router products at Intel and Case Technologies. He has an MSc degree in Electrical Engineering from the Technical University of Denmark.