Section: New Results
A Flexible Framework for Asynchronous In Situ and In Transit Analytics for Scientific Simulations
High performance computing systems today comprise tens of thousands of processors and deep memory hierarchies, and the next generation of machines will further widen the imbalance between I/O capabilities and processing power. To reduce the pressure on I/O, the in situ analytics paradigm proposes to process the data as close as possible to where and when they are produced. Processing can be embedded in the simulation code, executed asynchronously on helper cores on the same nodes, or performed in transit on staging nodes dedicated to analytics. Today, software environments as well as usage scenarios still need to be investigated before in situ analytics becomes standard practice.

In this paper [3], we introduce a framework for designing, deploying, and executing in situ scenarios. Based on a component model, the scientist designs analytics workflows by first developing processing components that are then assembled into a dataflow graph through a Python script, as sketched below. At runtime the graph is instantiated according to the execution context, with the framework taking care of deploying the application on the target architecture and coordinating the analytics workflows with the simulation execution. Component coordination, zero-copy intra-node communication, and inter-node data transfers all rely on distributed per-node daemons.

We evaluate various scenarios performing in situ and in transit analytics on large molecular dynamics systems simulated with Gromacs using up to 1664 cores. We show in particular that analytics can be performed on the fraction of resources the simulation does not use efficiently, resulting in a limited impact on the simulation performance (less than
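
To make the assembly step concrete, here is a minimal, self-contained sketch of what such a Python assembly script could look like. The paper's actual API is not reproduced in this section, so every name below (Component, Graph, connect, the toy synchronous scheduler, and the Gromacs stand-in) is a hypothetical illustration of the component-plus-dataflow idea, not the framework's real interface.

    # Hypothetical sketch: all names are illustrative, not the framework's API.
    from collections import defaultdict, deque

    class Component:
        """A processing component exposing a single process() step."""
        def __init__(self, name):
            self.name = name

        def process(self, msg):
            """Consume one message; return an output message or None."""
            raise NotImplementedError

    class SimulationProxy(Component):
        """Stand-in for the simulation side emitting data each timestep."""
        def produce(self, steps=3):
            for t in range(steps):
                yield {"step": t, "positions": [0.1 * t] * 4}

    class DensityAnalysis(Component):
        def process(self, msg):
            density = sum(msg["positions"]) / len(msg["positions"])
            return {"step": msg["step"], "density": density}

    class Logger(Component):
        def process(self, msg):
            print(f"[{self.name}] step {msg['step']}: density={msg['density']:.3f}")

    class Graph:
        """Dataflow graph: directed edges between component instances."""
        def __init__(self):
            self.edges = defaultdict(list)

        def connect(self, src, dst):
            self.edges[src].append(dst)

        def run(self, source):
            # Toy synchronous scheduler, for illustration only.
            for msg in source.produce():
                queue = deque((dst, msg) for dst in self.edges[source])
                while queue:
                    comp, m = queue.popleft()
                    out = comp.process(m)
                    if out is not None:
                        queue.extend((dst, out) for dst in self.edges[comp])

    # Assemble the workflow: simulation -> analysis -> logger.
    sim = SimulationProxy("gromacs_proxy")
    density = DensityAnalysis("density")
    log = Logger("logger")

    graph = Graph()
    graph.connect(sim, density)
    graph.connect(density, log)
    graph.run(sim)

In the actual framework, components would instead run asynchronously (embedded in the simulation, on helper cores, or on staging nodes), with messages traveling through the per-node daemons rather than an in-process loop as above.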
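
The section also states that zero-copy intra-node communication goes through per-node daemons. As a loose, standard-library illustration of that general pattern (an assumption for exposition, not the paper's implementation), the sketch below has a producer fill a shared-memory segment that a separate analytics process then reads in place, without serializing or copying the data:

    # Hypothetical illustration of zero-copy intra-node data exchange;
    # not the framework's daemon implementation.
    import statistics
    from multiprocessing import Process, shared_memory

    N = 1024  # number of doubles in the segment

    def analytics(name):
        # Attach to the existing segment: this creates a new mapping of
        # the same physical pages; the data itself is never copied.
        shm = shared_memory.SharedMemory(name=name)
        values = shm.buf.cast("d")  # view the raw bytes as doubles
        print("mean position:", statistics.fmean(values))
        values.release()
        shm.close()

    if __name__ == "__main__":
        # The "simulation" side allocates the segment and fills it in place.
        shm = shared_memory.SharedMemory(create=True, size=N * 8)
        values = shm.buf.cast("d")
        for i in range(N):
            values[i] = i / N
        values.release()

        # A separate analytics process reads the same memory.
        p = Process(target=analytics, args=(shm.name,))
        p.start()
        p.join()

        shm.close()
        shm.unlink()

The point this illustrates is that only a handle to the segment moves between processes, while the bulk data never leaves the node's memory.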