Workshop organised in conjunction with the eScience 2018 Conference (29th October 2018) in Amsterdam, The Netherlands
On the 29th of October, 35 eScientists joined the half-day workshop “Platform-driven e-infrastructure innovations” to discuss extreme data services and their impacts, development paths and the road towards sustainability. The workshop, organised by the PROCESS project, included contributions from the ESCAPE-2, EPEEC, EOSC-hub and DEEP-HybridDataCloud projects. One of its main goals was to bring together projects with potential synergies and common interests that could serve as a starting point for deeper and broader joint activities in the future.
The workshop was opened by Nils gentschen Felde (PROCESS project’s deputy coordinator), who presented the PROCESS project’s basic goals and organisational structure. This was followed by an opening talk, “Towards Exascale Computing”, by project coordinator Dieter Kranzlmüller. He argued that translating HPC system benchmark figures into accelerated science is a multifaceted challenge that requires attention to several organisational, technical and operational issues that may not be obvious in the abstract. As a concrete, recent example of the scientific impact of increasing the available computational capabilities by orders of magnitude, Dieter Kranzlmüller reported the initial findings of genetic mapping related to breast cancer therapies. A research team was given access to the full 6.8 PFLOP system for the duration of a weekend (between the end of a maintenance break and the start of normal, scheduled use). In addition to the expected quantitative benefits, this activity produced immediate results that were qualitatively different from those achieved before.
The vision of the Exascale Landscape was followed by a data-driven literature review presented by Stijn Helders from NLeSC. Due to the large volume of exascale-related publications, a traditional literature review would be difficult (if not impossible). To overcome this limitation, the full texts of the exascale-related publications were downloaded from Elsevier Scopus, then analysed and clustered based on, for example, keywords and citation networks. Even a preliminary analysis identified some less obvious issues that are relevant to determining the essential competences for successful exascale deployment. For example, the fault tolerance of the system was a recurring theme in the publications analysed, which might make it possible to identify communities with relevant, complementary skills and knowledge that could speed up exascale deployment. On the other hand, the applications used in the domain sciences that the next generation of HPC is meant to support were typically not discussed (at least not in detail) in the analysed exascale publications.
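As an illustration of what keyword-based clustering of publications can look like, the following is a minimal sketch and not the method used in the review, which was not described at this level of detail. It assumes each paper is reduced to a set of keywords, links two papers when their Jaccard similarity reaches a threshold, and reports the connected groups. The function names, the threshold of 0.5 and the toy data are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Fraction of keywords shared by two keyword sets (0.0 to 1.0)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_by_keywords(papers, threshold=0.5):
    """Group papers whose keyword sets overlap by at least `threshold`.

    Papers are linked pairwise and the connected components are returned
    as a list of sets of paper names. The threshold is an assumption.
    """
    parent = {name: name for name in papers}

    def find(name):
        # Follow parent links to the group representative (union-find).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(papers, 2):
        if jaccard(papers[a], papers[b]) >= threshold:
            parent[find(a)] = find(b)

    clusters = {}
    for name in papers:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=lambda c: sorted(c))


# Hypothetical toy data, invented purely for illustration.
papers = {
    "paper_a": {"exascale", "fault tolerance", "checkpointing"},
    "paper_b": {"exascale", "fault tolerance", "resilience"},
    "paper_c": {"exascale", "energy efficiency", "accelerators"},
}
clusters = cluster_by_keywords(papers)
print(clusters)
```

With this toy input, the two fault-tolerance papers end up in one group and the energy-efficiency paper in another. A real analysis would also need to use the citation network, which this sketch leaves out.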
These overview presentations were followed by a presentation of the PROCESS use cases, the system architecture and its components, and finally a short live demonstration of the first prototype implementation of the PROCESS platform. The SKA/LOFAR pilot was presented in detail by Jason Maassen (NLeSC), with Matti Heikkurinen (LMU) presenting an overview of the use cases and some details of the four other pilots.
After a coffee break, the contributing projects presented their approaches and goals. The ESCAPE-2 project was presented by Andreas Mueller (ECMWF). The project’s goal is to make it possible to increase the resolution of global weather model applications from the current 18 km to 5 km. This requires a comprehensive rethinking of the software and hardware solutions, both to reach sufficient performance and to remain within the constraints imposed by energy consumption. The work also presents a formidable data management challenge, with the current archives of weather prediction data exceeding 290 PB and growing at a rate of 200 TB per day.
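To put the ESCAPE-2 figures in perspective, the following back-of-the-envelope sketch works through the arithmetic. Only the four input numbers come from the presentation as reported above. The rest are simplifying assumptions: decimal units (1 PB = 1000 TB), a constant growth rate, and a count of horizontal grid points only, ignoring vertical levels and shorter time steps.

```python
# Input figures as reported in the text above.
CURRENT_RES_KM = 18.0
TARGET_RES_KM = 5.0
ARCHIVE_PB = 290.0
GROWTH_TB_PER_DAY = 200.0

# Assumption: decimal storage units.
TB_PER_PB = 1000.0


def horizontal_grid_factor(current_km, target_km):
    """Increase in horizontal grid points when the grid spacing shrinks."""
    return (current_km / target_km) ** 2


def yearly_growth_pb(tb_per_day, days_per_year=365):
    """Archive growth per year in PB, assuming a constant daily rate."""
    return tb_per_day * days_per_year / TB_PER_PB


grid_factor = horizontal_grid_factor(CURRENT_RES_KM, TARGET_RES_KM)
growth_pb = yearly_growth_pb(GROWTH_TB_PER_DAY)
years_to_double = ARCHIVE_PB / growth_pb

print(f"Horizontal grid points: x{grid_factor:.2f}")
print(f"Archive growth: {growth_pb:.0f} PB per year")
print(f"Time to add another {ARCHIVE_PB:.0f} PB: {years_to_double:.1f} years")
```

Under these assumptions, the finer grid has roughly 13 times as many horizontal points, and the archive grows by about 73 PB per year, so it would double in about four years even without the higher resolution.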
The EPEEC project was presented by Antonio J. Peña. The project aims at creating a parallel programming environment that helps application developers utilise exascale supercomputers efficiently. This support needs to cover a heterogeneous application mix: compute-intensive, data-intensive, and extreme data solutions. The EPEEC solution includes advanced analysis of the application code, including automatic code annotation and analysis of code patterns (often-repeated software components that can have a fundamental impact on the reliability and efficiency of the application).
The DEEP-HybridDataCloud project was presented by Martin Bobak (UISAV). The project aims at creating modular, reusable components for deep learning, post-processing and on-line analysis of data streams. The component library approach (which could be seen as similar, but complementary, to the code pattern approach used by the EPEEC project) aims at providing interoperability between existing and emerging IaaS solutions in addition to performance benefits.
As a final contributed presentation, Mark van de Sander (SURFsara) presented the vision and current status of the EOSC initiative. The EOSC service portal was a topic of special interest, and the ways in which the projects participating in the workshop could become service providers were discussed in some detail. The final set of presentations before the panel covered the PROCESS architecture in detail, including the containerised approach and the principles of the service orchestration.
A panel discussion with all the speakers concluded the workshop. The discussion was kicked off by a provocative question: how do the current exascale and extreme data initiatives differ from the initiatives that preceded them during the last two decades? The panel concluded that while the ideas themselves may have reached maturity earlier on, “things start to work” only once the surrounding infrastructure and the interest from key stakeholders reach a sufficient level. On the other hand, it was noted that some of the working solutions from the earlier efforts (such as GridFTP) tend to survive and evolve in the context of new efforts.
In conclusion, there was a strong consensus that further cross-project events like this initial workshop would be very useful tools both for increasing awareness of the solutions being developed and for encouraging reuse and uptake of existing solutions. Thus, in addition to possible joint publications and bilateral collaborations, the participants agreed to study the possibility of organising a similar event in roughly six months’ time, ideally with a larger number of projects representing a broader range of key stakeholders.