THE EXTREMES PREDICTION USE CASE - HIPEAC

Page created by Ken Lucas
 
CONTINUE READING
THE EXTREMES PREDICTION USE CASE - HIPEAC
TECHNICAL DIMENSION

The TransContinuum Initiative: exploiting the full range of digital technologies for the prediction
of weather and climate extremes

The extremes prediction
use case
By PETER BAUER, MARC DURANTON and MICHAEL MALMS
Dealing responsibly with extreme events requires not only a drastic change in the ways society
addresses its energy and population crises. It also requires a new capability for using present and
future information on the Earth system to reliably predict the occurrence and impact of such
events. A breakthrough in Europe’s predictive capability can be made manifest through science
and technology solutions delivering as yet unseen levels of predictive reliability with real value
for society.
The “TransContinuum Initiative”, initiated by ETP4HPC, ECSO, BDVA, 5GIA, EU MATHS IN,
CLAIRE, AIOTI and HiPEAC, offers unprecedented opportunities to overcome the technological
limitations currently hampering progress in this area. Beyond providing this use case with better
technology solutions, the initiative offers a foundation for an Earth system – computational
science collaboration that will eventually lead to science and technology being truly co-developed,
and thus to sustainable benefit for one of today’s most relevant applications and for European
technology providers.

Key insights                                  Key recommendations
• The reliable prediction of weather          • Future systems will require at least     • Interoperable machine learning tool-
  and climate extremes represents an            100 times more computational power         kits facilitating the portability of data
  extreme-scale computing and data              for producing reliable predictions of      processing in the cloud and workflow
  handling challenge.                           Earth-system extremes with short lead      management options in the cloud for
                                                times. It will require implementing an     orchestrating the rather complex data
• The European and global Earth system
                                                Earth-system digital twin – a cyber-       assimilation and Earth-system simulation
  science community has established a
                                                physical entanglement. A solution is a     workloads should be developed
  solid network of technology solutions
                                                layered federation with fewer elements
  for its complex and time critical work-                                                • Several domains need to be simultane-
                                                near the heavy workloads and more
  flows; however, they will not scale to                                                   ously promoted:
                                                elements at the observational data
  meet future requirements.                                                                • for software: interactive workflows,
                                                pre-processing front-end and the data
                                                                                             mathematical methods and algorithms,
• Climate change and the urgency for            analytics post-processing back-end.
                                                                                             high-productivity programming envi-
  societies to adapt to future extremes
                                              • The extensive use of edge computing          ronments, performance models and
  require new technology solutions that
                                                needs low-cost yet high-performance          optimization tools.
  provide breakthroughs for how we
                                                computing facilities and to overcome       • for hardware: heterogeneous proces-
  observe and simulate the Earth system.
                                                the data-transfer bottlenecks between        sor configurations through accele-
• The cross-disciplinary “TransContinuum        the computing and data intensive parts       rators and data-flow engines, high-
  Initiative (TCI)” promises true co-design     of the digital twin and downstream           bandwidth memory, deep memory
  opportunities exploiting the entire           applications                                 hierarchies for I/O and storage, super-
  breadth of digital technologies for the                                                    fast interconnects and configurable
  benefit of skilful extremes prediction                                                     computing including the supporting
  capabilities in Europe.                                                                    system software stack.

                                                                44
THE EXTREMES PREDICTION USE CASE - HIPEAC
THE EXTREMES PREDICTION USE CASE

The use case                                       which is crucially important for decision-         is the ultimate goal. The loop shown in the
   Natural hazards represent some of the           making on tight schedules [4].                     figure would be open as humans are continu-
most important socio-economic chal-                                                                   ally influencing nature, for example through
lenges our society faces in the decades to            Europe currently leads the world in             CO2 emissions at global scale or irrigation in
come. Natural hazards have caused over             medium-range weather prediction and is             agriculture at small scales. But natural vari-
1 million fatalities and over 3 trillion Euros     also a major competence centre for global          ability can also affect the system and needs
of economic loss worldwide in the last             climate prediction [5,6]. For decades, HPC         to be replicated, for example extreme events
twenty years, and this trend is accelerat-         has been a key enabler for this track-record       such as volcanic eruptions injecting large
ing as a result of the drastic rise in demand      and has led to weather and climate predic-         amounts of ash into the atmosphere leading
for resources and population growth. The           tion becoming one of the leading use cases         to a change in radiative forcing.
combination of likelihood and impact               for computing and data handling at large
makes extremes and climate action failure          scale. The apparent change in computing                Beyond advancing present-day Earth-
the leading threats for our society [1,2,3].       architectures has stimulated a wide range          system simulation and observation capa-
                                                   of programmes that aim to prepare the              bilities, the digital twin would close the gap
    The “Earth system extremes” use case           operational Earth-system monitoring and            between Earth system science and socio-
relies on very complex numerical Earth             prediction infrastructures for future tech-        economic sectors in which it might be
system simulation models that ingest               nologies [7]. These programmes increas-            applied and facilitate a high level of flexible
hundreds of millions of observations per           ingly realize that it will take more than          intervention by non-science/technology
day to help improve the formulation of             progress in HPC to fulfil future extremes          experts. Hiding the complexity of the digi-
the physical process representation as well        prediction targets. This is where the need         tal TransContinuum from the user is key
as produce the initial conditions used for         to exploit the opportunities offered by the        to success for user-driven digital twins [9].
launching predictions of the future. This          entire digital TransContinuum [8] comes
logic applies equally to weather timescales        into play, and where new ways of co-devel-            As shown in Fig. 1, the TransContin-
of days or weeks and climate timescales            oping Earth system and computational               uum embeds smart sensors and IoT within
of decades. These systems are also run as          science need to be found.                          smart networks supplying data to extreme-
ensembles whereby each ensemble member                                                                scale computing and big data handling
represents both initial condition and model           Implementing an Earth-system digital            platforms. These platforms exploit cloud
uncertainty. The result is a prediction of         twin – called the cyber-physical entangle-         services across workflows giving access
state, but also a prediction of uncertainty,       ment in Fig. 1 – through these technologies        to additional distributed computing and

Figure 1: Main elements of digital continuum and relevance for extremes prediction use case including readiness of both application and technologies
(green = established in production, but not optimal in performance; yellow = research, not in production mode; red = novel and unexplored).

                                                                        45
THE EXTREMES PREDICTION USE CASE - HIPEAC
TECHNICAL DIMENSION

data analytics capabilities. Mathematical         observations and perform on-the-fly data           the desired quality of service for different
methods and algorithms as well as artificial      pre-processing.                                    data flows. The extremes prediction use
intelligence and machine learning provide                                                            case is particularly HPC- and big data-
the glue between the elements of the Trans-          However, observations from commod-              driven and creates a significant footprint for
Continuum. Cybersecurity acts as a shield         ity devices deployed on e.g. phones, car           the supporting communication networks.
for the entire loop. Such a methodological        sensors and specialized industrial devices         Pre-processing in edge devices can already
framework based on extremes prediction            monitoring agriculture, renewable energy           offload some of this burden as this use case
also offers gains for other domains.              sources and infrastructures also offer data        requires near real-time data availability at
                                                  that is currently inaccessible to operational      the HPC facilities. The extensive use of edge
Challenges along the TransContinuum               services and that can fill vast observational      computing needs low-cost yet high-perfor-
   Assessing the challenges for the               gaps in less developed countries, offers           mance computing facilities that will interact
extremes prediction use case with respect         much finer resolution in densely populated         with end devices as well as with one another.
to the TransContinuum requires a closer           areas and in regions of significant socio-
look at the production workflow in today’s        economic interest [11]. Beyond techni-                 Given the decade-long evolution of
systems and how they are likely to evolve         cal challenges, a generic approach for fast        interconnected data collection, transmis-
in the future.                                    access to commercial data for public use           sion, pre-processing, centralized HPC
                                                  via public-private partnerships is required,       and post-processing, and dissemination
Smart sensors and Internet of Things              which is already on the agenda of inter-           in weather prediction workflows there is
   At present, Earth-system observa-              governmental organisations like the World          little room for optimization of present-day
tions that are used operationally already         Meteorological Organization. This element          systems. However, workloads, data volumes
comprise hundreds of millions of observa-         of the TransContinuum is only developing           and data diversity grow much faster than
tions collected daily to monitor the atmos-       now and has huge potential.                        in the past and the demand for providing
phere, oceans, cryosphere, biosphere and                                                             more skilful predictions is more urgent than
the solid Earth, the largest data volumes         Smart networks and services                        ever before. Therefore, network and service
being provided close to a hundred satellite          Collecting and transferring massive             solutions need to orchestrate and dynami-
instruments [10]. This volume is expected         amounts of diverse data from devices scat-         cally manage data routing and computing
to increase by several orders of magni-           tered in multiple areas via distributed pre-       resources with much more flexibility and
tude in the next decade, bringing with it a       processing centres to centralized digital          scalability. Whether both performance and
need to ingest such observations in digital       platforms and computing centres requires           cost-effectiveness need dedicated solutions
twin systems within hours. Smart sensor           a suitable network infrastructure. Evolved         or whether it can be achieved by a mixture
technology is highly relevant for satellite       networks and services should offer secure          of commercial and institutional systems is
instruments that can perform targeted             and trustable solutions that will support          an as yet unsolved question.

Figure 2: a) The volume of worldwide climate and daily weather forecast data is expanding rapidly, creating challenges for both physical archiving
and sharing, as well as for ease of access, particularly for non-experts. The figure shows the projected increase in global climate data holdings
for climate models, remotely sensed data, and in situ instrumental/proxy data (from [12]). b) Daily data volumes produced by the 50-member
ECMWF weather forecast ensemble at today’s spatial resolution of 18 km, the next upgrade to 9 km and the expected configuration in 2025.

                                                                        46
THE EXTREMES PREDICTION USE CASE - HIPEAC
THE EXTREMES PREDICTION USE CASE

Big data and data analytics                      water, food, energy, health, finance and        the decomposition of workflows such that
   The volume of Earth-system observa-           civil protection. Machine learning-based        the production chain can exploit the full
tion and simulation data in weather predic-      quality control and error correction, infor-    processing and analysis potential of the
tion already exceeds hundreds of TBytes/         mation extraction and data compression as       TransContinuum. A likely solution is a
day while climate projection programmes          well as processing acceleration will be key.    layered federation with fewer elements near
produce PByte-sized archives, which take                                                         the heavy workloads and more elements at
climate scientists years to analyse (see         High-performance and cloud computing            the observational data pre-processing front-
Fig. 2) [12]. The extrapolation of data              Today, experimental and operational         end and the data analytics post-processing
volumes and production rates to advanced         Earth-system simulations use petascale          back-end. This distribution also needs to
Earth-system digital twins will prohibit the     HPC infrastructures, and the expectation        be driven by the urgency of product deliv-
effective and timely information extraction      is that future systems will require at least    ery as, particularly for extremes prediction,
that is critical for timely actions to antici-   100 times more computational power for          processing speed trumps all other consid-
pate and mitigate the effects of extremes.       producing reliable predictions of Earth-        erations.
Both simulations and observations need to        system extremes with lead times that
be generated and combined in the Earth-          are sufficient for society and industry to         Adding flexibility without sacrificing
system digital twin within minutes to hours      respond [13]. This need translates into a       performance through the TransContin-
of time-critical workflows towards near-         new software paradigm to gain full and          uum is where most of the strategic recom-
real-time decision-making.                       sustainable access to low-energy process-       mendations for research and innovation
                                                 ing capabilities, dense memory hierar-          compiled by the European Technology
   Overcoming the data-transfer bottle-          chies as well as post-processing and data       Platform for HPC (ETP4HPC) [15], the Big
necks between the computing and data             dissemination pipelines that are optimally      Data Value Association (BDVA) [16] and
intensive parts of the digital twin and          configured across centralized and cloud-        High Performance Embedded Architec-
downstream applications is crucial, and          based facilities. European leadership in this   ture and Compilation (HiPEAC) [17] are
future workflow management needs to              software domain offers a unique opportu-        positioned. For software, these are interac-
make such applications an integral part of       nity to turn European investment in HPC         tive workflows, mathematical methods and
the observation and prediction infrastruc-       digital technology into real value.             algorithms, high-productivity program-
ture. Powerful data analytics technology                                                         ming environments, performance models
and methodologies offer the only option             A big challenge is the definition of the     and optimization tools. For hardware,
to make the effective conversion from            evolution of today’s strongly centralized       heterogeneous processor configurations
PBytes/day of raw data to Mbytes/day of          workload and data lifecycle management          through accelerators and dataflow engines,
usable information. This conversion must         towards a more distributed system – still       high-bandwidth memory, deep memory
be tailored to individual sectors needing to     being data-centric to keep big data near        hierarchies for I/O and storage, super-fast
prepare and respond to extremes, namely          extreme-scale computing – that allows           interconnects and configurable computing

Figure 3: Main components of extremes prediction workflow, their alignment with edge, high-performance and cloud computing elements of the
TransContinuum, and key contributions of machine learning to workflow enhancements.

                                                                      47
THE EXTREMES PREDICTION USE CASE - HIPEAC
TECHNICAL DIMENSION

including the supporting system software         data analytics. Here, the biggest gaps are      tions will not scale to meet future require-
stack are part of the agenda [14].               interoperable machine learning toolkits         ments. True near-sensor, edge processing
                                                 facilitating the portability of data process-   does not yet exist and IoT devices are not
   For the prediction of extremes, the main      ing in the cloud and workflow management        yet used; smart and configurable networks
computing tasks are currently performed          options in the cloud for orchestrating the      as well as flexible HPC and cloud solutions
on centralized, dedicated systems where          rather complex data assimilation and Earth-     are not available; and even for the extreme-
the main data storage facilities also lie.       system simulation workloads (see Fig. 3).       scale computing and data handling tasks,
The observational input data flow crosses                                                        the present algorithmic and programming
all levels of edge, cloud and centralized        Conclusions                                     frameworks do not allow the exploitation of
computing during which selected pre-                Predicting environmental extremes            the real potential of emerging technologies.
processing steps are exercised, for example,     clearly has an extreme-scale computing
for satellite data collection and pre-process-   and data footprint. It has reached a point         Building on past European science
ing at distributed receiving stations of the     where only through significant investment       and technology programmes such as the
ground segment, their further dissemina-         in all parts of the TransContinuum can          Future and Emerging Technology for HPC
tion and processing by space agencies and        the future needs of our society for reliable    (FET-HPC) [18] under Horizon 2020, the
meteorological centres, and assimilation         information on the present and future of        recent implementation of the EuroHPC
into models at prediction centres. Similar       our planet be fulfilled. These investments      Joint Undertaking, and the Copernicus
data flow mechanisms exist for ground-           must go hand in hand with investments in        Programme, Europe has actually rec-
based meteorological observation taken           basic Earth system science, to better under-    ognized the present gaps and needs for
from networks, stations and (few) commer-        stand key processes and their interaction,      serious investments. This has motivated the
cial providers.                                  and for service provision ensuring that         ambitious Destination Earth action [19]. In
                                                 Earth system data is translated into useful     support of the Green Deal and the European
   Increasingly however, the back-end of         information for society.                        strategy for data [20], Destination Earth
model output data processing interfacing                                                         promises to bring the Earth system science,
with commercial service providers is being          While the present digital infrastructures    digital TransContinuum and service devel-
placed into the cloud, which also offers         support weather and climate prediction well     opment strands together assigning the
better access to machine learning-based          enough, the chosen domain-specific solu-        highest priority to extremes prediction and

                                                                      48
THE EXTREMES PREDICTION USE CASE - HIPEAC
THE EXTREMES PREDICTION USE CASE

climate adaptation. The programme has
adapted the concept of digital twins of the
Earth system for this purpose as promoted
by HiPEAC and ETP4HPC and its strong
digital infrastructure contribution has huge
potential for achieving European technol-
ogy leadership by addressing one of the
principal challenges for today’s society.

References
[1] UNISDR, “Economic losses, poverty & disasters, 1998-
     2017”, (UNISDR, Geneva, 2018)
[2] AON, “Weather, climate & catastrophe insight”, Annual
     report, (AON, Chicago, 2020)
[3] WEF, “The global risks report 2020”, Insight report
     15th edition, (World Economic Forum, Geneva, 2020)
[4] P. Bauer, A. Thorpe, and G. Brunet (2015). The quiet
     revolution of numerical weather prediction. Nature,
     525(7567), 47-55.
[5] R. A. Kerr (2012). One Sandy forecast a bigger winner
     than others. Science, 736-737.
[6] P. Voosen (2019). New climate models predict a
     warming surge. Science, 16.
[7] P. Neumann, J. Biercamp, (2019, September).
     ESiWACE: On European Infrastructure Efforts for
     Weather and Climate Modeling at Exascale. In 2019
     15th International Conference on eScience (eScience)
     (pp. 498-501). IEEE.
[8] “TransContinuum Initiative (TCI): our vision”, https://
     www.etp4hpc.eu/transcontinuum-initiative.html
[9] P. Bauer, B. Stevens, & W. Hazeleger (2020). A digital
     twin of Earth for the green transition. Nature Climate
     Change, submitted.
[10] W. Balogh, and K. Toshiyuki. “The World
     Meteorological Organization and Space-Based
     Observations for Weather, Climate, Water and Related
     Environmental Services.” In Space Capacity Building in
     the XXI Century, pp. 223-232. Springer, Cham, 2020.
[11] J. M. Talavera, L. E. Tobón, J. A. Gómez,
     M. A. Culman, J. M. Aranda, D. T. Parra, ... &
     L. E. Garreta, (2017). Review of IoT applications in
     agro-industrial and environmental fields. Computers and
     Electronics in Agriculture, 142, 283-297.
[12] J. T. Overpeck, G. A. Meehl, S. Bony &
     D. R. Easterling (2011). Climate data challenges in the
     21st century. Science, 331(6018), 700-702.
                                                                                                         This document is part of the HiPEAC
[13] T. C. Schulthess, P. Bauer, N. Wedi, O. Fuhrer,
     T. Hoefler & C. Schär (2018). Reflecting on the goal      Peter Bauer is the Deputy Director        Vision available at hipeac.net/vision.
     and baseline for exascale computing: a roadmap based                                                V.1, January 2021.
     on weather and climate simulations. Computing in
                                                               of the Research Department at the
                                                               European Centre for Medium-Range          Cite as: P. Bauer, M. Duranton, and M. Malms.
     Science & Engineering, 21(1), 30-41.
                                                                                                         The extremes prediction use case.
[14] P. Bauer, P. D. Dueben, T. Hoefler, T. Quintino,          Weather Forecasts, Reading, UK.
     T. Schulthess & N. P. Wedi (2020), The digital                                                      In M. Duranton et al., editors, HiPEAC Vision
     revolution of Earth-system science. Nature                                                          2021, pages 44-49, Jan 2021.
     Computational Science, submitted.                                                                   The HiPEAC project has received funding from
[15] ETP4HPC, https://www.etp4hpc.eu/pujades/files/            Marc Duranton is researcher at the        the European Union’s Horizon 2020
     ETP4HPC_SRA4_2020_web(1).pdf                              Research and Technology Department        research and innovation programme under
[16] BDVA, https://www.bdva.eu/BDVA_SRIA_v4                    of CEA (Alternative energies and Atomic   grant agreement number 871174.
[17] HiPEAC, https://www.hipeac.net/vision/#/latest/
[18] “European HPC technology research projects”, https://     Energy Commission), France and the        © HiPEAC 2021
     www.scientific-computing.com/white-paper/european-        coordinator of the HiPEAC Vision 2021.
     hpc-technology-research-projects
[19] “Destination Earth (DestinE)”, https://ec.europa.eu/
     digital-single-market/en/destination-earth-destine
[20] “A European Strategy for Data”, https://ec.europa.eu/     Michael Malms is the SRA lead editor
     digital-single-market/en/european-strategy-data           within the office team of ETP4HPC and
                                                               main initiator and coordinator of the
                                                               TransContinuum Initiative.

                                                                                 49
You can also read