SuperComputing 2016
ExaScale Scientific Data Transfers using Software Defined Networking

Table of Contents

SDN Controlled High performance Network Infrastructure

Single Server 1Tbps RDMA Data Transfer

NVMe over Fabrics (Ethernet)

WAN Demonstrations

SDN Controlled High performance Network Infrastructure

Software defined networking (SDN) has become a key component of the modern data center and wide area network. SDN narrows the gap between applications and infrastructure, not only by improving infrastructure automation but also by provisioning end-to-end network paths efficiently and dynamically, which improves the quality of service delivered to applications. SDN software has matured beyond its infancy, and several commercial-grade controllers are now available. The Caltech HEP group has been actively involved in the design and development of several SDN components on top of the OpenDaylight [1] SDN controller, currently the Beryllium release. The SDN application operator panel is nearing completion, with ongoing development to make it production ready. The specific software developments are:

1) Northbound user interface for topology visualization

2) End-to-end path provisioning (layer 2, layer 3, tunnels, gateways)

3) Monitoring and graphing of individual flows, aggregate flows, and port utilization

The topology used at the SC16 conference is shown below in Figure 1. It comprises various booths and data transfer node (DTN) interconnections controlled using the OpenFlow protocol. Paths are provisioned end to end in less than a minute, which brings the application to the level of a real production environment.


Figure 1: SDN-controlled high-performance network infrastructure
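As an illustration of what end-to-end path provisioning involves, the sketch below computes a fewest-hop path across a small switch topology with breadth-first search, then lists the per-switch forwarding entries a controller would need to install. The topology, node names, and port numbers are hypothetical, and the result is plain Python data rather than OpenDaylight API calls. It is a minimal sketch under those assumptions, not the Caltech application's implementation.

```python
from collections import deque

# Hypothetical topology: node -> {neighbor: local port facing that neighbor}.
# Names and port numbers are illustrative only.
TOPOLOGY = {
    "dtn-a": {"sw1": 1},
    "sw1": {"dtn-a": 1, "sw2": 2, "sw3": 3},
    "sw2": {"sw1": 1, "sw4": 2},
    "sw3": {"sw1": 1, "sw4": 2},
    "sw4": {"sw2": 1, "sw3": 2, "dtn-b": 3},
    "dtn-b": {"sw4": 1},
}

def shortest_path(topology, src, dst):
    """Return the fewest-hop path from src to dst using BFS, or None."""
    previous = {src: None}
    queue = deque([src])
    while queue:
        node = queue.popleft()
        if node == dst:
            # Walk the predecessor chain back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in topology[node]:
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None

def forwarding_entries(topology, path):
    """List (switch, in_port, out_port) for each intermediate node on the path."""
    entries = []
    for prev_hop, switch, next_hop in zip(path, path[1:], path[2:]):
        entries.append(
            (switch, topology[switch][prev_hop], topology[switch][next_hop])
        )
    return entries

path = shortest_path(TOPOLOGY, "dtn-a", "dtn-b")
rules = forwarding_entries(TOPOLOGY, path)
print(path)
print(rules)
```

A real controller would also install the reverse direction and would typically weigh links by available capacity rather than hop count; both are left out here to keep the sketch short.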

Single Server 1Tbps RDMA Data Transfer

At the Supercomputing 2016 conference, a single pair of servers in the Caltech and Data Intensive Science booths was connected over an Infinera Cloud Xpress Data Center Interconnect (DCI). Supermicro 4028GR-TR2 servers were used, which provide an additional PCIe expansion board connected to a single processor using two PCIe Gen3 x16 links. The network topology is shown in Figure 2.

Figure 2: Single server architecture supporting 1Tbps

Remote Direct Memory Access over Converged Ethernet (RoCE) transports data from server to server, directly between application memory, without CPU involvement on a lossless Ethernet network. Because of the limited number of PCIe lanes available in the current generation of Intel CPUs, we used custom-designed NIC firmware to generate traffic directly from the NIC. The goals of designing such a system are threefold:

1) Offload the CPU when transporting massive application traffic

2) Maximize network utilization at the lowest possible latency

3) Demonstrate the current limitations of PCIe lanes and the PCIe switch fabric
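The PCIe constraint can be made concrete with a short calculation. The sketch below uses the published PCIe Gen3 signaling rate (8 GT/s per lane with 128b/130b encoding) to estimate the raw one-direction bandwidth of an x16 link and the number of such links a 1 Tbps flow would need. It ignores transaction-layer packet overhead, so achievable rates are lower, and the function names are illustrative.

```python
import math

GEN3_GT_PER_S = 8.0        # PCIe Gen3 signaling rate per lane, in GT/s
GEN3_ENCODING = 128 / 130  # 128b/130b line encoding efficiency

def link_gbps(lanes):
    """Raw one-direction bandwidth of a PCIe Gen3 link, in Gbps."""
    return GEN3_GT_PER_S * GEN3_ENCODING * lanes

def links_needed(target_gbps, lanes=16):
    """Minimum number of links of the given width to carry target_gbps."""
    return math.ceil(target_gbps / link_gbps(lanes))

x16 = link_gbps(16)
print(f"One Gen3 x16 link:  {x16:.1f} Gbps")
print(f"Two Gen3 x16 links: {2 * x16:.1f} Gbps")
print(f"x16 links needed for 1 Tbps: {links_needed(1000)}")
```

Under these assumptions, two x16 links toward a processor carry roughly 252 Gbps, well short of 1 Tbps, which is consistent with the choice to generate traffic on the NIC itself.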

Server-to-server traffic was captured using sFlow configured on the Arista 7060CX Ethernet switch. The samples were gathered by the InMon sFlow [2] collector, and the result is displayed in Figure 3.

Figure 3: 850Gbps data traffic between a pair of servers
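sFlow measures traffic by sampling roughly one frame in N and scaling the sampled byte count back up by N. The sketch below shows that scaling step in isolation. The sampling rate, frame size, and window length are hypothetical values chosen for illustration; the settings used on the switch at SC16 are not stated here.

```python
def estimate_gbps(sampled_frame_bytes, sampling_rate, interval_s):
    """Scale sampled frame sizes up to an estimated rate in Gbps.

    sampled_frame_bytes: sizes of the frames that were sampled
    sampling_rate: N, where on average 1 in N frames is sampled
    interval_s: length of the observation window in seconds
    """
    estimated_bytes = sum(sampled_frame_bytes) * sampling_rate
    return estimated_bytes * 8 / interval_s / 1e9

# Hypothetical window: 1,000 sampled frames of 4,096 bytes each,
# 1-in-25,000 sampling, observed over 1 second.
samples = [4096] * 1000
print(f"{estimate_gbps(samples, 25_000, 1.0):.1f} Gbps")
```

Because the estimate is statistical, its accuracy improves with the number of samples in the window, so short windows at high sampling ratios give noisier graphs.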

NVMe over Fabrics (Ethernet)

In this demonstration we used two Supermicro 1U servers. Each server is based on a single-socket CPU design and holds two 3.2TB LIQID [3] NVMe cards and one 100GE Chelsio T6 [4] NIC. NVMe offers low latency and parallelism through a simplified architecture. NVMe block devices in the target server were mounted on the client server. The fio [5] disk benchmark showed a throughput of 10.18 GB/s on the remote drives. NVMe over Fabrics using RDMA allows complete CPU bypass and performance close to the maximum throughput at the remote server. Figure 4 shows the actual throughput of about 9 GB/s when reading from the server and writing to the client drives.

Figure 4: NVMe over Fabrics disk throughput over the network
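The disk figures are quoted in gigabytes per second while the NIC is rated in gigabits per second, so a quick conversion helps compare them. The sketch below assumes decimal units (1 GB = 10^9 bytes) and compares both reported rates with the nominal 100GE line rate; it does not account for protocol overhead.

```python
def gbytes_to_gbits(gb_per_s):
    """Convert gigabytes per second to gigabits per second (decimal units)."""
    return gb_per_s * 8

LINE_RATE_GBPS = 100  # nominal rate of the 100GE NIC

for label, gb_per_s in [("fio benchmark", 10.18), ("Figure 4 transfer", 9.0)]:
    gbps = gbytes_to_gbits(gb_per_s)
    print(f"{label}: {gbps:.1f} Gbps, {gbps / LINE_RATE_GBPS:.0%} of 100GE")
```

Under that assumption, the benchmark corresponds to about 81 Gbps and the measured transfer to about 72 Gbps on the single 100GE link.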


WAN Demonstrations

Caltech worked with many regional and international commercial and research network providers to bring a large number of 100GE waves to the Salt Lake City convention center. Another important goal was to enable the many UC campuses connected to the Pacific Research Platform (PRP) through CENIC to participate in the large data-intensive transfers. Figure 5 provides a larger view of the WAN topology, the service providers, and the circuits terminating in the booth.

Figure 5: Caltech - WAN and Inter-Booth Connections

For this purpose, with invaluable help from CENIC, CenturyLink, and SCinet, 6 x 100GE circuits were installed connecting the Salt Lake convention center to CENIC’s Los Angeles and Sunnyvale PoPs. On the show floor, three of the 100GE circuits were terminated in the Caltech booth and connected to a single Dell R930 server using the Caltech SDN infrastructure.

Different VLANs were laid to SDSU, USC, UCSC, and other perfSONAR test endpoints, as shown in the PRP topology in Figure 6. Despite various challenges at the end sites, we achieved FDT data transfers at the following rates:

    Source / Destination                      Throughput

1.  SDSU <=> Caltech Booth <=> SDC Booth      40Gbps
2.  Caltech Booth => USC                      30.3Gbps
3.  USC => Caltech Booth                      24Gbps
4.  Stanford => Caltech Booth                 74Gbps


Figure 6: Pacific Research Platform over the CENIC backbone