On October 1st and 2nd, 2012, CEA and JSC organized a workshop on "Challenges for Tools for Exascale" at CEA's TGCC. CEA and JSC each operate a Tier-0 system for PRACE in Europe, and both sites are heavily engaged in projects to design and deliver the next generation of their production systems.
The workshop gathered hardware vendors, experts in programming languages, and tool developers along with users, giving them the opportunity to discuss and understand the challenges we face in moving from the petascale to the exascale era in an HPC production environment. From the description of hardware evolutions (current and potential) and the trends in computer languages, attendees were able to grasp the pressure placed on tool developers. Each major tool architect then presented the current status of their software and their plans for keeping tools usable and insightful at the exascale level.
The workshop emphasized the following points:
- Hardware evolutions will force developers to think about how to handle an increasing number of threads (millions and above), and how to cope with very dynamic environments caused by failing and energy-adapting components. This underlines the growing importance of runtimes and their coupling with languages and tools. A convergence in the runtime area is highly recommended.
- The number of items to present to the user introduces new challenges in visualizing the results of analyzing code behavior at scale. New techniques have been suggested. In addition, more intelligent performance analytics (beyond simple visualizations) are necessary to make the tools more insightful and to cope with the ever-increasing volume of performance data.
- In HPC, we must stop thinking about performance (and using tools) as an afterthought, that is, only once we find out the program is not efficient enough. Performance-aware design, development and deployment of HPC software is needed.
- No "one size fits all" tool seems possible. Yet we noticed that a convergence is taking place at the level of performance profile and trace formats, allowing for better integration between tools. This matters to users because it maximizes the insight gained while minimizing the number of runs necessary for a given level of understanding.
The final panel of the workshop extended the debate by trying to answer the difficult question "What is a balanced machine?". The audience agreed on the need for balance at all levels: between memory performance and compute power, and among the different kinds of software components (tools, languages, runtimes, OS...).
However, the answer also depends on the type of workload, so a definitive answer seems impossible.