Supercomputing 2014
Enabling Scientific Discoveries with the LHC Data Distribution Over Terabit Networks

Tools Used at SC14

FDT - One of the key advances in this demonstration was Fast Data Transport (FDT; http://monalisa.cern.ch/FDT), an open-source Java application developed by the Caltech team in close collaboration with the Politehnica Bucharest team. FDT runs on all major platforms and uses the Java NIO libraries to achieve stable disk reads and writes, coordinated with smooth data flow across long-range networks.
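To illustrate what building on the NIO libraries can look like, here is a minimal Java sketch (not taken from FDT's source) that streams one file into any writable channel with `FileChannel.transferTo`. Where the operating system supports it, this call can move bytes from the filesystem cache to the target without copying them through user space. The class name `NioFileSender` is hypothetical; in a real transfer the target would be a connected `SocketChannel` rather than an in-memory sink.

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical helper: streams one file into a writable channel using Java NIO. */
final class NioFileSender {
    private NioFileSender() {}

    /**
     * Copies the whole file to the target channel and returns the byte count.
     * transferTo may move fewer bytes than requested, so keep calling it
     * until the entire file has been written.
     */
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel in = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = in.size();
            long position = 0;
            while (position < size) {
                position += in.transferTo(position, size - position, target);
            }
            return position;
        }
    }
}
```

For example, `NioFileSender.sendFile(Path.of("run.root"), socketChannel)` would push one file over an already connected, blocking socket channel.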

The FDT application streams a large set of files over an open TCP socket, so that a data set composed of thousands of files, as is typical in high-energy physics applications, can be sent or received at full speed without the network transfer restarting between files. FDT works with Caltech's MonALISA system to monitor the capability of the storage systems and of the network path dynamically and in real time, and it sends data to the network at a moderated rate that keeps the flow smooth across long-range networks.
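The idea of sending many files back to back on one open connection, at a moderated rate, can be sketched as follows. The framing (a `writeUTF` file name, an 8-byte length, then the raw bytes, with an empty name marking the end of the set) and the average-rate pacer are assumptions chosen for illustration; they are not FDT's actual wire protocol or rate-control algorithm, and `MultiFileStream` is a hypothetical name.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Hypothetical framing (NOT FDT's wire format): many files share one stream,
 * each preceded by a header (name, length); an empty name ends the set,
 * so file names must be non-empty.
 */
final class MultiFileStream {
    private static final int CHUNK = 64 * 1024; // bytes written between pacing checks

    private MultiFileStream() {}

    /** Writes every file back to back on one connection; maxBytesPerSec <= 0 means unpaced. */
    static void send(Map<String, byte[]> files, OutputStream raw, long maxBytesPerSec)
            throws IOException, InterruptedException {
        DataOutputStream out = new DataOutputStream(raw);
        long start = System.nanoTime();
        long sent = 0;
        for (Map.Entry<String, byte[]> f : files.entrySet()) {
            byte[] data = f.getValue();
            out.writeUTF(f.getKey());
            out.writeLong(data.length);
            for (int off = 0; off < data.length; off += CHUNK) {
                int len = Math.min(CHUNK, data.length - off);
                out.write(data, off, len);
                sent += len;
                pace(start, sent, maxBytesPerSec);
            }
        }
        out.writeUTF(""); // end-of-set marker
        out.flush();
    }

    /** Sleeps until the average rate since startNanos is back at or below the cap. */
    private static void pace(long startNanos, long bytesSent, long maxBytesPerSec)
            throws InterruptedException {
        if (maxBytesPerSec <= 0) return; // pacing disabled
        long targetNanos = startNanos + (long) (bytesSent * 1e9 / maxBytesPerSec);
        long waitMillis = (targetNanos - System.nanoTime()) / 1_000_000L;
        if (waitMillis > 0) Thread.sleep(waitMillis);
    }

    /** Reads files until the end marker; the connection is never reopened between files. */
    static Map<String, byte[]> receive(InputStream raw) throws IOException {
        DataInputStream in = new DataInputStream(raw);
        Map<String, byte[]> files = new LinkedHashMap<>();
        for (String name = in.readUTF(); !name.isEmpty(); name = in.readUTF()) {
            byte[] data = new byte[Math.toIntExact(in.readLong())];
            in.readFully(data);
            files.put(name, data);
        }
        return files;
    }
}
```

Pacing against the average rate since the start, rather than sleeping a fixed amount per chunk, lets the sender catch up after a slow disk read while keeping the long-run average at or below the cap.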

MonALISA - MonALISA, which stands for Monitoring Agents using a Large Integrated Services Architecture, has been developed by Caltech and its partners with the support of the U.S. CMS software and computing program. The framework is based on a Dynamic Distributed Service Architecture and provides complete monitoring, control and global optimization services for complex systems.

The MonALISA system is designed as an ensemble of autonomous, multi-threaded, self-describing agent-based subsystems that are registered as dynamic services and can collaborate and cooperate in performing a wide range of information-gathering and processing tasks. The system is designed to integrate existing monitoring tools and procedures easily and to provide this information in a dynamic, customized, self-describing way to any other services or clients.

For more information, please visit: http://monalisa.caltech.edu
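The pattern of agents that register as services and then stream self-describing measurements to any interested client can be illustrated with a toy in-process registry. This is a hypothetical sketch for Java 17 or later (it uses a record); `AgentRegistry` and `Measurement` are invented names, and MonALISA's real lookup services, discovery mechanism and APIs are considerably richer and are not modeled here.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** A self-describing value: who produced it, what it measures, and the reading. */
record Measurement(String source, String metric, double value) {}

/**
 * Toy lookup service (hypothetical, not MonALISA's API): agents register under
 * a name, and clients subscribe to receive that agent's measurements as they
 * are published.
 */
final class AgentRegistry {
    private final Map<String, List<Consumer<Measurement>>> subscribers = new ConcurrentHashMap<>();

    /** Registers an agent and returns the publisher it uses to emit measurements. */
    Consumer<Measurement> register(String agentName) {
        List<Consumer<Measurement>> subs =
                subscribers.computeIfAbsent(agentName, k -> new CopyOnWriteArrayList<>());
        // Each published measurement is delivered to every current subscriber.
        return m -> subs.forEach(s -> s.accept(m));
    }

    boolean isRegistered(String agentName) {
        return subscribers.containsKey(agentName);
    }

    /** Attaches a client to a registered agent's measurement stream. */
    void subscribe(String agentName, Consumer<Measurement> client) {
        List<Consumer<Measurement>> subs = subscribers.get(agentName);
        if (subs == null) throw new IllegalArgumentException("unknown agent: " + agentName);
        subs.add(client);
    }
}
```

Because each `Measurement` carries its own source and metric name, a client can consume values from agents it knows nothing about in advance, which is the sense of "self-describing" used above.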

OLiMPS - OpenDaylight - Addressing a still-unsolved issue in LHCONE, namely how to interconnect multiple network domains efficiently over more than one connection, Caltech is investigating, and has made significant progress in, the use of multipath connectivity through OpenFlow networks. In the OpenFlow Link-layer MultiPath Switching (OLiMPS) project, funded by DOE/OASCR, the group has developed an OpenFlow controller based on Big Switch’s Floodlight open-source controller, significantly improving the basic controller toward a more flexible, production-ready version. Specific improvements include a more versatile and extensible internal architecture, configuration management features, an improved command-line interface, and a range of advanced features.

Later in 2014, with funding from Cisco Research, the code was partially ported to the OpenDaylight SDN platform, as demonstrated at the SC14 conference. The code currently targets the Hydrogen release and will be matured for the Helium release in the coming months of 2015.
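The text does not describe the OLiMPS path-selection algorithm, so the sketch below swaps in a generic, well-known technique to illustrate multipath connectivity between domains: hash-based path selection (as in ECMP), which keeps every packet of a flow on one link while spreading different flows across the parallel links. `FlowKey`, `MultipathSelector` and the link names are hypothetical, and the sketch assumes Java 17 or later. In a Floodlight or OpenDaylight controller the chosen link would then be turned into OpenFlow forwarding rules, which is omitted here.

```java
import java.util.List;
import java.util.Objects;

/** Identifies a TCP/UDP flow by its 5-tuple. */
record FlowKey(String srcIp, String dstIp, int srcPort, int dstPort, int protocol) {}

/**
 * Generic hash-based multipath selection (ECMP-style), NOT the OLiMPS algorithm.
 * The same flow always maps to the same link; different flows spread out.
 */
final class MultipathSelector {
    private final List<String> links;

    MultipathSelector(List<String> links) {
        if (links.isEmpty()) throw new IllegalArgumentException("need at least one link");
        this.links = List.copyOf(links);
    }

    /** Picks a link deterministically from the flow's 5-tuple hash. */
    String linkFor(FlowKey flow) {
        int h = Objects.hash(flow.srcIp(), flow.dstIp(), flow.srcPort(), flow.dstPort(), flow.protocol());
        return links.get(Math.floorMod(h, links.size())); // floorMod avoids negative indices
    }
}
```

Keeping each flow on a single path avoids TCP packet reordering; the trade-off is that one very large flow cannot use more than one link's capacity.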

For more information, please visit: http://www.uslhcnet.org/projects/olimps/

PhEDEx - PhEDEx is the data-placement management tool for the CMS experiment at the LHC. It manages the scheduling of all large-scale WAN transfers in CMS, ensuring reliable delivery of the data. It consists of several components:

  • an Oracle database, hosted at CERN
  • a website and data-service, which users (human or machine) use to interact with and control PhEDEx
  • a set of central agents that deal with routing, request-management, bookkeeping and other activities. These agents are also hosted at CERN, though they could be run anywhere. The key point is that there is only one set of central agents per PhEDEx instance
  • a set of site-agents, one set for every site that receives data

PhEDEx maintains knowledge and history of transfer performance, and the central agents use that information to choose among source replicas when a user makes a request (users specify the destination; PhEDEx chooses the source).
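PhEDEx's actual source-selection logic is not described here, so the following is only a hypothetical Java sketch of the general idea: keep a history of transfer performance per link, and when a user names a destination, pick the source replica whose link to it has performed best. The exponentially weighted moving average, its weight `ALPHA = 0.3`, ranking untested links last, and the site names are all assumptions for illustration.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

/**
 * Toy source chooser (hypothetical, not PhEDEx code): remembers recent transfer
 * rates per (source, destination) link and picks the best-performing replica.
 */
final class ReplicaChooser {
    private static final double ALPHA = 0.3; // weight of the newest observation (assumed)
    private final Map<String, Double> rateMBps = new HashMap<>();

    private static String link(String src, String dst) {
        return src + "->" + dst;
    }

    /** Folds a completed transfer's observed rate (MB/s) into the link's running average. */
    void record(String src, String dst, double observedMBps) {
        rateMBps.merge(link(src, dst), observedMBps,
                (old, now) -> (1 - ALPHA) * old + ALPHA * now);
    }

    /** The user names the destination; the chooser picks among the replica sites. */
    Optional<String> chooseSource(Collection<String> replicaSites, String dst) {
        return replicaSites.stream()
                .filter(src -> !src.equals(dst)) // a site cannot be its own source
                .max((a, b) -> Double.compare(estimate(a, dst), estimate(b, dst)));
    }

    private double estimate(String src, String dst) {
        return rateMBps.getOrDefault(link(src, dst), 0.0); // untested links rank last
    }
}
```

For instance, after `record("T1_A", "T2_X", 100)` and `record("T1_B", "T2_X", 300)`, `chooseSource(List.of("T1_A", "T1_B"), "T2_X")` returns `Optional[T1_B]`. The moving average lets a link that starts performing badly lose its preference gradually rather than after a single slow transfer.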

The central agents then queue the transfer for processing by the site agents. PhEDEx operates in data-pull mode: the destination site pulls the data to itself when it is ready. This gives sites more control over the activity at their site, so they can ensure that neither their network nor their storage is overloaded.
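The pull model can be sketched with a central queue that only holds work and site agents that fetch new tasks when they have free slots. This hypothetical sketch for Java 17 or later (`CentralQueue`, `SiteAgent`, `TransferTask` and the fixed slot limit are illustrative inventions, not PhEDEx components) shows why pulling protects a site: the amount of concurrent work is bounded by the site's own `maxActive` setting, not by how much the central agents have queued.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentLinkedQueue;

/** A queued transfer task: copy one file from a chosen source to a destination. */
record TransferTask(String file, String source, String destination) {}

/** Central side: queues tasks per destination but never pushes data itself. */
final class CentralQueue {
    private final Map<String, Queue<TransferTask>> byDestination = new ConcurrentHashMap<>();

    void enqueue(TransferTask t) {
        byDestination.computeIfAbsent(t.destination(), d -> new ConcurrentLinkedQueue<>()).add(t);
    }

    /** Hands out at most max tasks, in FIFO order, for the given destination. */
    List<TransferTask> poll(String destination, int max) {
        List<TransferTask> out = new ArrayList<>();
        Queue<TransferTask> q = byDestination.get(destination);
        while (q != null && out.size() < max) {
            TransferTask t = q.poll();
            if (t == null) break;
            out.add(t);
        }
        return out;
    }
}

/** Site side: pulls new work only while it has free transfer slots. */
final class SiteAgent {
    private final String site;
    private final int maxActive; // assumed local limit protecting network and storage
    private final List<TransferTask> active = new ArrayList<>();

    SiteAgent(String site, int maxActive) {
        this.site = site;
        this.maxActive = maxActive;
    }

    /** Called periodically; pulls just enough tasks to fill the free slots. */
    int pull(CentralQueue central) {
        List<TransferTask> got = central.poll(site, maxActive - active.size());
        active.addAll(got);
        return got.size();
    }

    void complete(TransferTask t) {
        active.remove(t);
    }

    int activeCount() {
        return active.size();
    }
}
```

With five tasks queued and `maxActive = 2`, the first `pull` takes two tasks, a second `pull` takes none, and after one `complete` the next `pull` takes exactly one more.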

For more information, please visit: http://www.uslhcnet.org/projects/anse/