Research and development
A number of observations arise out of the experience gained at the CEA’s HPC complex over the past 20 years.
Growing user requirements have driven a rapid increase in demand for processing power and data management capabilities. Changing physical models are generating significantly larger flows of data from the computer to the storage and visualisation systems. Moore’s law no longer translates into higher processor frequencies, but into more cores per processor.
The calculation capacity of a single core remains more or less constant; only parallelism makes it possible to increase the processing capacity of a machine.
Calculation software (codes) is the translation, through numerical algorithms, of the mathematical formulations of the physical models studied. Upstream and downstream of the calculation, environment software manages the complex operations of preparing calculations and analysing their results.
The numerical simulation of complex multi-physical phenomena, while respecting the scales in space and time, requires numerous calculations, which use and generate large volumes of data, on powerful computers: this is high performance computing.
The power of individual processors remains essentially constant; currently, the only way to increase a computer’s processing power is to scale the number of processors.
Clearly, developing future generations of supercomputers calls for disruptive technologies, in particular in the area of electric power consumption management. To this end, the CEA decided to adopt a pro-active co-design methodology for future computers, alongside Atos/Bull, with which the CEA has built up a partnership over more than 15 years.
The agreement between CEA and Atos/Bull to develop an exascale supercomputer by 2020 sets out the R&D objectives that must be achieved in order to address major challenges in areas such as energy performance, operating reliability and mass interconnection.
The following major R&D themes have been identified for HPC:
- Cluster architecture, from the compute nodes to the integration of the cluster in the computing center, including the topology of the interconnecting network.
- Grid technologies.
- Parallel programming libraries.
- Parallel file systems and associated storage.
The CEA/DAM is a "Lustre Center of Excellence". The functionalities developed within the centre are:
- Data life cycle within Lustre. The CEA/DAM has developed support for OST pools (available in Lustre 1.8) and leads the Lustre HSM (Hierarchical Storage Management) project. The goal of this project is to provide virtually unlimited storage space within Lustre by connecting it to external archiving systems (HSM_Migration).
- Lustre management tools. The CEA/DAM is developing Shine, a Lustre configuration and management tool, in collaboration with Bull.
- Use of Lustre over long distance networks. The CEA/DAM is investigating sharing data between various computer centres via a file system. This experiment is being undertaken as part of the Carriocas project.
Computational grids are virtual infrastructures that take advantage of the computing power of resources distributed across different areas.
In the framework of the PRACE and DEISA projects, the CEA/DAM contributes to the development of grid services that offer a common view of, and seamless access to, supercomputers distributed across European computing centers. The main grid topics in CEA/DAM R&D are:
- Architecture of services that bridge the user workstation world and the supercomputer world. These services cover interactive access to the supercomputers, job submission and data transfer.
- Middleware for accessing the resources available within the grid.
The CEA/DAM contributes to the UNICORE middleware (www.unicore.eu). UNICORE (Uniform Interface to Computing Resources) provides a grid system including client and server software. The CEA/DAM is a member of the UNICORE Forum (www.unicore.eu/forum/) and of the UNICORE Technical Advisory Board. Examples of features that have been developed:
- High availability of UNICORE components. The CEA/DAM has developed, in collaboration with FZJ (www.fz-juelich.de/jsc/), load balancing across multiple TSI components as well as SSL encryption of communications between UNICORE/X and TSI components (available in UNICORE 6.3).
- Use of an LDAP directory to check user access rights, via an authorization plug-in for the UNICORE/X component.
The CEA/DAM has also studied methodologies to access grid resources based on a Kerberos authentication mechanism (fr.wikipedia.org/wiki/Kerberos_(protocole)) with an X.509 certificate credential. A PAM module using the Kerberos PKINIT extension has been developed and enables a seamless connection to Kerberos-based sites via GSI-SSH: pam_pkinit (sourceforge.net/projects/pam-pkinit/).
New architectures and models for programming
When working with high-performance numerical simulation, developing a new program code is not simply a case of writing a set of programming instructions. The program designer must also take account of the architecture of the massively parallel supercomputers, so that the calculations are efficiently distributed over the hundreds, or even thousands, of processors and the exchange of information is optimised.
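The distribution of calculations over processors described above can be illustrated with a small sketch. It is not taken from any CEA code: it assumes a one-dimensional, non-periodic mesh split into contiguous blocks, and the function names `block_partition` and `halo_neighbours` are hypothetical. Each process receives a nearly equal share of the cells and only exchanges boundary information with its immediate neighbours.

```python
def block_partition(n_cells, n_procs):
    """Split n_cells contiguous cells into n_procs nearly equal blocks.

    Returns a list of half-open (start, stop) index ranges, one per process.
    """
    if n_procs <= 0:
        raise ValueError("n_procs must be positive")
    base, extra = divmod(n_cells, n_procs)
    ranges = []
    start = 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional cell each,
        # so block sizes differ by at most one cell.
        size = base + (1 if rank < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


def halo_neighbours(rank, n_procs):
    """Ranks with which `rank` exchanges boundary cells (1D, non-periodic)."""
    return [r for r in (rank - 1, rank + 1) if 0 <= r < n_procs]


if __name__ == "__main__":
    n_procs = 4
    for rank, cells in enumerate(block_partition(10, n_procs)):
        print(rank, cells, halo_neighbours(rank, n_procs))
```

Keeping the blocks contiguous limits each process to at most two neighbours in this one-dimensional case, which is one simple way of keeping the exchange of information small relative to the local work. Real simulation codes work on two- or three-dimensional meshes and typically rely on a message-passing library for the exchange itself.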
Mastering the complexity of the modelling, the mathematical methods and algorithms, the relevant computer science and the validation techniques as well as the software production are all key factors in the success of a new numerical simulation project.
Numerical models and mathematical methods
Together with the increase in computing power (Moore's Law), the mathematical and numerical techniques used to solve the equations of physical models have progressed greatly, despite the many difficulties related to the complexity and non-linearity of these models.
Examples include the methods for hyperbolic systems and gas dynamics, for transport and diffusion equations and for coupled systems. New techniques have improved precision whilst preserving robustness. Dynamic adaptation of meshes, by refining or coarsening, is a good example. Front-capturing techniques are another, as are the various finite element methods. For transport equations, improvements relate to both the statistical (Monte Carlo) and deterministic approaches. Other examples include the algorithms for the efficient solution of large linear systems and the techniques for evaluating the uncertainties and sensitivity of the simulation parameters.
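As an illustration of dynamic mesh adaptation, the sketch below refines and coarsens a one-dimensional mesh. It is a deliberately simplified toy, not the method used in any CEA code: the adaptation criterion (the jump of a function across a cell compared with a tolerance `tol`) and the names `refine`, `coarsen` and `steep_profile` are assumptions chosen for the example.

```python
import math


def steep_profile(x):
    """Hypothetical test function with a sharp front at x = 0.5."""
    return math.tanh(20.0 * (x - 0.5))


def refine(nodes, f, tol):
    """Insert a midpoint in every cell across which f varies by more than tol."""
    new_nodes = [nodes[0]]
    for left, right in zip(nodes, nodes[1:]):
        if abs(f(right) - f(left)) > tol:
            new_nodes.append(0.5 * (left + right))
        new_nodes.append(right)
    return new_nodes


def coarsen(nodes, f, tol):
    """Merge pairs of adjacent cells across which f varies by less than tol."""
    kept = [nodes[0]]
    i = 1
    while i < len(nodes) - 1:
        if abs(f(nodes[i + 1]) - f(kept[-1])) < tol:
            # Drop interior node i: the two cells around it become one.
            kept.append(nodes[i + 1])
            i += 2
        else:
            kept.append(nodes[i])
            i += 1
    if kept[-1] != nodes[-1]:
        kept.append(nodes[-1])  # always keep the right boundary
    return kept


if __name__ == "__main__":
    mesh = [0.0, 0.25, 0.5, 0.75, 1.0]
    print(refine(mesh, steep_profile, tol=0.5))
    print(coarsen([0.0, 0.05, 0.1, 0.15, 0.2], steep_profile, tol=0.5))
```

Refinement concentrates nodes around the front, where the function changes quickly, while coarsening removes nodes from the flat region, where they add cost without improving precision.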
This progress in numerical models and mathematical methods comes from a continuous research effort in applied mathematics, numerical analysis and computer science. This research is performed in cooperation with the various research communities concerned: university laboratories, other higher education institutes and research organisations. This can lead to more formal collaborations. The research results in peer-reviewed international publications and conference presentations. It is also reflected in the supervision of trainees (Masters and third-year engineering students), graduate students and postdoctoral researchers.