In biology, the vast majority of systems can be modeled as ordinary differential equations (ODEs). Modeling biological objects more finely increases the number of equations, as does simulating ever larger systems. As a result, the ODE systems to be solved are growing rapidly in size. A major bottleneck is the limitation of ODE numerical...
The end of Moore's law encourages us to explore new approaches for the future of computing. One promising approach is heterogeneous architecture with reconfigurable devices such as field-programmable gate arrays and coarse-grained reconfigurable architectures, which leverage hardware specialization and dataflow computing. In this break-out session, we will discuss subjects and...
The 2D block-cyclic pattern is a well-known solution for distributing the data of a matrix among homogeneous nodes. Its ease of implementation and good performance make it widely used.
With the increased popularity and efficiency of task-based distributed runtime systems, it becomes feasible to consider more exotic patterns. We have recently proposed improvements in two different...
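As background for the distribution patterns discussed above, the standard 2D block-cyclic owner computation can be sketched in a few lines (a minimal illustration of the classic pattern, not the proposed improvements themselves):

```python
def owner(i, j, mb, nb, P, Q):
    """Process (p, q) owning global entry (i, j) under a 2D block-cyclic
    distribution with mb x nb blocks on a P x Q process grid."""
    return ((i // mb) % P, (j // nb) % Q)

# Example: an 8x8 matrix in 2x2 blocks on a 2x2 process grid.
grid = [[owner(i, j, 2, 2, 2, 2) for j in range(8)] for i in range(8)]
```

Each process thus receives an interleaved set of blocks, which balances the load of algorithms whose active region shrinks over time (e.g., LU or QR factorization).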
Batched linear solvers, which solve many small related but independent problems, are important in several applications. This is increasingly the case for highly parallel processors such as graphics processing units (GPUs), which need a substantial amount of work to keep them operating efficiently and solving smaller problems one-by-one is not an option. Because of the small size of each...
Numerical linear algebra building blocks are used in many modern scientific application codes. Ginkgo is an open-source numerical linear algebra library that is designed around the principles of portability, flexibility, usability, and performance. The Ginkgo library is integrated into the deal.II, MFEM, OpenFOAM, HYTEG, Sundials, XGC, HiOp, and OpenCARP scientific applications, ranging from...
Aims
libyt
provides researchers a way to analyze and visualize data using yt
(a Python package for analyzing and visualizing volumetric data) or any other Python package during a simulation's runtime. Users can either use a Python script or enter Python statements to analyze data from the ongoing simulation and get feedback instantly. This improves disk usage efficiency and makes...
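The in-situ analysis idea behind libyt can be illustrated with a toy loop (the names `analyze` and `run_simulation` are our own illustration, not libyt's actual API): instead of dumping every step to disk, the simulation hands its current fields to a user-supplied Python callback.

```python
def analyze(step, field):
    """User-supplied analysis routine, e.g. track the maximum value."""
    return max(field)

def run_simulation(steps, analyze_every=2):
    field = [1.0] * 8
    results = {}
    for step in range(steps):
        field = [x * 1.1 for x in field]   # stand-in for the solver update
        if step % analyze_every == 0:      # in-situ hook: no disk I/O needed
            results[step] = analyze(step, field)
    return results
```

The callback sees live simulation data, so only the (small) analysis results need to be stored, rather than full snapshots.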
The architecture of supercomputers has evolved over the years to support the different needs of applications that address human concerns. Heterogeneity now plays an important role in processors and also in the memory-storage system. In processors, we can observe CPUs, GPUs, and other accelerators coexisting. In the same fashion, different kinds of memory have appeared over the years,...
With the increasing size and complexity of simulations, the need for interaction rises. JuMonC is a user-controlled application that runs in parallel with the simulation and offers a REST API for system monitoring; it is expandable through plugins to allow simulation monitoring and steering as well. This information can then be used in multiple ways, for example to be displayed in Jupyter notebooks...
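The plugin-based expandability described above can be sketched as follows (the `MonitoringRegistry` class and its paths are illustrative, not JuMonC's actual API): each plugin registers the REST paths it serves together with a callable producing the current value.

```python
class MonitoringRegistry:
    """Toy registry mapping REST paths to provider callables."""

    def __init__(self):
        self._endpoints = {}

    def register(self, path, provider):
        """A plugin adds an endpoint by registering a zero-argument callable."""
        self._endpoints[path] = provider

    def handle(self, path):
        """Dispatch a GET-like request; returns (status, payload)."""
        if path not in self._endpoints:
            return 404, None
        return 200, self._endpoints[path]()

registry = MonitoringRegistry()
registry.register("/system/load", lambda: {"load1": 0.42})
```

A real REST layer would sit in front of such a registry; the point is that new monitoring or steering capabilities only need to register a path and a provider.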
Simulation-based training of deep neural networks (DNNs), such as surrogates and inference models, is technically challenging and expensive in both memory and computation.
Large-scale deep learning applications for the sciences (fluid dynamics, climate prediction, molecular structure exploration) demand novel approaches. One of them is online training, where the simulations are generated...
The INRIA AIRSEA team in Grenoble specializes in ocean and atmosphere models.
This short talk will briefly outline our current HPC research topics:
- Domain-specific languages: for the patch-based ocean simulations CROCO and NEMO
- Load balancing: Non-equal time-varying workload, AMR, coupling, homogeneous and heterogeneous systems
- Data-intensive data-flows: part of data assimilation...
Quantum computing is computation using the properties of quantum states and is considered an important building block for the post-Moore era. This break-out session aims to introduce research and activities in the quantum computing area from different institutes. In particular, we would like to focus on hybrid/cooperative computations combining quantum and classical computing.
There will...
Independent Parallel Particle Layer (IPPL) is an open-source, performance-portable C++ library for generic computations with grids and particles. It is primarily used for large-scale kinetic plasma simulations. In this talk, I will briefly introduce the library and show some of the recent benchmarks we performed on pre-exascale leadership computing systems with thousands of GPUs and CPU cores....
The Inria Beagle project-team at LIRIS has been developing evolutionary models (Experimental Evolution In Silico) for more than 15 years, and in particular the Aevol software, which makes it possible to identify predictive molecular markers in evolution (emergence of variants, resistance to antibiotics, environmental changes). These markers can be environmental characteristics (living...
Numerical simulation is a key technology for many application domains. It is nowadays considered the third pillar of science (alongside experiment and theory) and is critical to gaining a competitive position. Thanks to the democratization of high-performance computing (HPC), complex physics, molecular biology, and more generally complex systems can now be routinely simulated. Aevol (http://aevol.fr)...
Density Functional Theory (DFT) is a popular quantum mechanical framework for computing the properties of molecules and materials. Recent advances in linear-scaling algorithms and computing power have made it possible to apply DFT to systems of an unprecedented size. This has significant consequences for the research paradigms employed by DFT users. In this talk, we will present our research...
In this project, we aim to enable Charm++ based HPC applications to run natively on a Kubernetes cloud platform. The Charm++ programming model provides a shrink/expand capability which matches well with the elastic cloud philosophy. We investigate how to enable running Charm++ applications with dynamic scaling of resources on Kubernetes. In order to run Charm++ applications on Kubernetes, we...
Subatomic particles have a size of about one femtometer and are
studied through measurement of scattering events at various particle
accelerator facilities around the world. An experimental event is a particle
collision that triggers a detector response, which then collects
various signals that allow the properties of the measured final-state
particles to be reconstructed. For imaging...
Modern scientific instruments, such as detectors at synchrotron light sources, generate data at such high rates that online processing is needed for data reduction, feature detection, experiment steering, and other purposes. Leadership computing facilities (e.g., ALCF) are deploying mechanisms that would enable these applications to acquire (a portion of) HPC resources on-demand. These...
This short talk provides an introduction to the ongoing research at UGA/TUM (EuroHPC Time-X) on an application-driven dynamic resource approach for HPC. Time-X targets the area of parallel-in-time (PinT) integration, where resource-dynamic strategies have been shown to improve the performance and efficiency of PinT algorithms.
However, current approaches to enable dynamic resources for...
Deep learning (DL) is widely used to solve previously intractable classification problems, such as face recognition, and presents clear use cases with privacy requirements. Homomorphic encryption (HE) enables operations on encrypted data, at the expense of a vast increase in data size. Current RAM sizes limit the use of HE for DL to severely reduced use cases. Recently emerged persistent memory...
Exascale systems draw a significant amount of power. As the applications
deployed on them map to the various heterogeneous computing elements of these
platforms, managing how power is distributed across components becomes a
priority. The ECP Argo project is developing an infrastructure for node-local control loops that can observe application behavior and adjust resources dynamically, power...
Process malleability and dynamic resources have demonstrated, in several studies, to increase the productivity of HPC facilities, in terms of completed jobs per unit of time. In this regard, changing the number of resources assigned to an application during its execution accelerates global job processing. Furthermore, the users of malleable applications can also benefit from malleability when...
Recent advancements in high-speed NICs have brought speeds of 400 Gbps and achieved the status of SmartNICs by enabling offloads for cryptography and virtualization. Data Processing Units (DPUs) are taking this development further by integrating performant processing cores on the SmartNIC itself.
The DOCA API for programming BlueField DPUs requires proficiency in network technologies. We...
The management and allocation of resources to users in HPC infrastructures often relies on the resource and job management system (RJMS).
One key component for an optimized resource allocation, with respect to some objectives, is the scheduler.
Scheduling theory is interesting as it provides algorithms with performance guarantees.
These guarantees come at the cost of tedious and complex modeling effort.
The growing...
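As an example of the performance guarantees mentioned above, Graham's classic list-scheduling algorithm places each job on the currently least-loaded of m identical machines and guarantees a makespan within a factor of (2 - 1/m) of the optimum; a minimal sketch (our own illustration, not the scheduler under study):

```python
import heapq

def list_schedule(jobs, m):
    """Graham's list scheduling on m identical machines.
    The resulting makespan is at most (2 - 1/m) times the optimum."""
    loads = [(0, i) for i in range(m)]   # (current load, machine id)
    heapq.heapify(loads)
    assignment = [None] * len(jobs)
    for k, p in enumerate(jobs):
        load, i = heapq.heappop(loads)   # least-loaded machine
        assignment[k] = i
        heapq.heappush(loads, (load + p, i))
    return assignment, max(load for load, _ in loads)
```

For jobs [3, 3, 2, 2, 2] on two machines, list scheduling yields makespan 7, while the optimum is 6 ({3, 3} vs. {2, 2, 2}), consistent with the 1.5x bound for m = 2.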
On a recent visit to NSF, I was asked about how Blue Waters was decommissioned. After describing the process, they asked me if I would write a paper/report on the process and the environmental impact. This expanded from the typical paper about the energy used by supercomputers to interest in the e-waste and other impacts. In fact, some recent decadal reports from science domains (e.g....
New memory technologies are emerging to provide larger RAM sizes at reasonable cost and energy consumption. In addition to conventional DRAM, recent memory infrastructures contain byte-addressable persistent memory (PMEM) technology that offers capacities higher than DRAM and better access times than NAND-based technologies such as SSDs.
In such hybrid infrastructures, users have...
We have extended the Ray framework to enable automatic scaling of workloads on high-performance computing (HPC) clusters managed by SLURM© and bursting to a Cloud managed by Kubernetes®. Our implementation allows a single Python-based parallel workload to be run concurrently across an HPC cluster and a Cloud. The Python-level abstraction provided by our solution offers a transparent user...
Data compression is becoming a major topic in the HPC, remote sensing, and scientific instrument communities. As a result, a variety of compression software has been developed. In addition, there is huge interest in applying scientific data compression in real-time and streaming processing scenarios. However, software-only implementations may find it challenging or impossible to meet the...
This talk will highlight recent updates in the collaboration for streaming data compression for instruments between Argonne National Laboratory and Riken R-CCS. Since the last JLESC, we've shared our compression approaches between organizations, and attempted to use each other's compression approaches. We share our findings, lessons learned, and other progress.
Today's scientific applications and advanced instruments are producing extremely large volumes of data every day, so error-controlled lossy compression has become a critical technique for scientific data storage and management. Existing lossy scientific data compressors, however, are designed mainly around an error-control-driven mechanism, which cannot be efficiently applied in the...
Super-resolution tools were originally invented for image super-resolution but are increasingly used to improve scientific simulations or data storage. Examples range from cosmology to urban prediction. One particular network framework, physics-informed enhanced super-resolution generative adversarial networks (PIESRGANs), has been shown to be a powerful tool for subfilter...
Checkpointing large amounts of related data concurrently to stable storage is a common I/O pattern of many HPC applications in a variety of scenarios: checkpoint-restart fault tolerance, coupled workflows that combine simulations with analytics, adjoint computations, etc. This pattern is challenging because it needs to happen frequently and typically leads to I/O bottlenecks that negatively...
Performance tuning, software/hardware co-design, and job scheduling are among the many tasks that rely on models to predict application performance. We propose and evaluate low rank tensor decomposition for modeling application performance. We use tensors to represent regular grids that discretize the input and configuration domain of an application. Application execution times mapped within...
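The low-rank modeling idea can be illustrated with a rank-1 alternating-least-squares fit on a small 2D grid of execution times (a toy sketch under simplified assumptions, not the proposed tensor method itself): the grid t[i][j] is approximated as an outer product a[i] * b[j], which can then predict unmeasured configurations.

```python
def rank1_als(T, iters=20):
    """Fit t[i][j] ~ a[i] * b[j] by alternating least squares."""
    n, m = len(T), len(T[0])
    a, b = [1.0] * n, [1.0] * m
    for _ in range(iters):
        for i in range(n):   # solve for a with b fixed
            a[i] = sum(T[i][j] * b[j] for j in range(m)) / sum(x * x for x in b)
        for j in range(m):   # solve for b with a fixed
            b[j] = sum(T[i][j] * a[i] for i in range(n)) / sum(x * x for x in a)
    return a, b

# Toy grid of execution times that is exactly rank 1:
# time = size_factor * core_count_factor.
T = [[s * c for c in (1.0, 0.55, 0.3)] for s in (2.0, 4.0, 8.0)]
a, b = rank1_als(T)
pred = a[1] * b[2]   # predicted time for the (4.0, 0.3) configuration
```

Real application data is rarely exactly low rank, but higher ranks and higher-order tensors follow the same alternating pattern.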
Computing at large scales has become extremely challenging due to increasing heterogeneity in both hardware and software. A positive feedback loop of more scientific insight leading to more complex solvers which in turn need more computational resources has been a continuous driver for development of more powerful platforms. The field of computer architecture is poised for more radical changes...
The new IMPROVE project at ANL is collecting and curating AI models for cancer and similar precision medicine problems. Comparing these models across a large configuration space of hyperparameters and data sets is a challenging problem. The IMPROVE team is building a scalable workflow suite to answer a range of questions that arise when attempting to run diverse models developed by different...
From the sensor to the laptop, from the telescope to the supercomputer, from the microscope to the database, scientific discovery is part of a connected digital continuum that is dynamic and fast. In this new digital continuum, artificial intelligence (AI) is providing tremendous breakthroughs, making data analysis and automated responses possible across the digital continuum. Sage is a...
Per industry definition, objectives of Digital Twins (DTs) include facilitating real-time or on-demand investigation, impact studies, and recalibrating for monitoring, diagnostics and prognostics of virtual ecosystems representing real world scenarios. Efficient and secure implementation of complex workflows encompassing a wide range of experimental, observational and simulation computational...
BoS: Next-generation Numerical Linear Algebra Libraries
Exa-class system development has achieved some successful results, and full-scale systems are in operation. RIKEN is currently conducting a feasibility study of technological trends for developing a successor to Fugaku. Based on the experience developing the numerical library for Fugaku, RIKEN is now studying library development trends...
Interactive exploration and analysis of large amounts of data from scientific simulations, in-situ visualization, and application control are convincing scenarios for explorative sciences. It is the task of high-performance computing (HPC) centers to enable, support, and simplify these workflows for the users of today's supercomputers. Especially technical work simplifications in...
Python is emerging as a high-productivity language favored by many application scientists and engineers in simulation/modeling, data analytics, and machine learning. Interactive parallel computing is another related trend, especially for analyzing graphs in addition to the above. The CharmTyles project is aimed at addressing these needs while providing a highly efficient and adaptive parallel...
The Blue Waters system and project was one of the most thoroughly measured and monitored systems at scale. Over a petabyte of monitoring, system activity, reliability, security, networking, and performance data is now available to researchers, covering its entire operational period of over 9 years. This talk will summarize the types of data available and possible open questions that collaborators may...
With the growth and evolution of supercomputers and the incorporation of diverse technologies, monitoring their usage has become a vital necessity for administrators and users alike. In this context, the LLview monitoring infrastructure, developed by the Jülich Supercomputing Centre, stands out for providing extensive views of system and job operations. The recently released new version of...
With an increasing workload diversity and hardware complexity in HPC, the boundaries of today's runtimes are pushed to their limits. This evolution needs to be matched by corresponding increases in the capabilities of system management solutions.
Power management is a key element of the upcoming exascale era: first, to stay within the power budget, but also for the applications...
The in situ paradigm is a relevant alternative to classical post hoc workflows, as it bypasses disk accesses and thus reduces the I/O bottleneck. However, most in situ data analytics tools are built on MPI and are complicated to set up and use, especially to parallelize irregular algorithms. In a previous work, we provided a tool that couples MPI simulations with in situ...
This is the report for the project 'Optimization of Fault-Tolerance Strategies for Workflow Applications'
Checkpoint operations are periodic, high-volume I/O operations and, as such, are particularly sensitive to interference. Indeed, HPC applications execute on dedicated nodes but share the I/O system. As a consequence, interference surges when several applications perform I/O...
The use of accelerators such as GPUs has become mainstream to
achieve high performance on modern computing systems. GPUs come with
their own (limited) memory and are connected to the main memory of
the machine through a bus (with limited bandwidth). When a
computation is started on a GPU, the corresponding data needs to
be transferred to the GPU before the computation starts....
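The trade-off between limited GPU memory and bus transfers can be illustrated by simulating an LRU-managed device memory and counting host-to-device transfers for a sequence of tasks (a toy model of our own, not the policy studied in the talk):

```python
from collections import OrderedDict

def count_transfers(task_data, capacity):
    """Simulate a GPU memory holding `capacity` equal-size blocks with LRU
    eviction.  Each task needs one data block; a hit costs no transfer."""
    mem = OrderedDict()
    transfers = 0
    for block in task_data:
        if block in mem:
            mem.move_to_end(block)       # reuse: mark as most recently used
        else:
            transfers += 1               # fetch the block over the bus
            if len(mem) >= capacity:
                mem.popitem(last=False)  # evict the least recently used block
            mem[block] = True
    return transfers
```

Reordering tasks so that accesses to the same block are close together reduces the transfer count, which is exactly the kind of decision a runtime scheduler must make.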
Applications traditionally leverage MPI to run efficiently on HPC systems and scale up to thousands of processors. For the past decade, developers have also been adapting their applications to heterogeneous systems by offloading the most time-consuming computation kernels to the available GPUs. To achieve optimal performance in such applications, developers must use the non-blocking and...
Scientists have lots of data that they need to store, transport, and use. Lossy compression could be the solution, but there are 32+ compressors, each with its own interface, and the interfaces of the most recent compressors often evolve. Moreover, compressors are missing key features: provenance and configuration parameter optimization. LibPressio addresses all these issues by providing a...
Task-based systems have become popular due to their ability to utilize the computational power of complex heterogeneous systems. A typical programming model used is the Sequential Task Flow (STF) model, which unfortunately only supports static task graphs. This can result in submission overhead and a task graph that is not well-suited for execution on heterogeneous systems. A common approach...
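The dependency inference at the heart of the STF model can be sketched as follows (an illustrative simplification: tasks are submitted in program order with declared data accesses, and edges follow from read-after-write, write-after-read, and write-after-write hazards):

```python
def build_task_graph(tasks):
    """tasks: list of access lists, each a list of (data, mode) with mode
    "R" or "W".  Returns the inferred set of dependency edges (src, dst)."""
    last_writer, readers, edges = {}, {}, set()
    for tid, accesses in enumerate(tasks):
        for data, mode in accesses:
            if mode == "R":
                if data in last_writer:            # read-after-write
                    edges.add((last_writer[data], tid))
                readers.setdefault(data, []).append(tid)
            else:
                if data in last_writer:            # write-after-write
                    edges.add((last_writer[data], tid))
                for r in readers.pop(data, []):    # write-after-read
                    if r != tid:
                        edges.add((r, tid))
                last_writer[data] = tid
    return edges

# t0 writes x; t1 and t2 read x; t3 overwrites x.
graph = build_task_graph([[("x", "W")], [("x", "R")], [("x", "R")],
                          [("x", "W")]])
```

Because the graph is derived from the sequential submission order, the runtime only discovers it incrementally, which is the root of the submission overhead mentioned above.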
This talk will focus on the design of device support in the Template Task Graph. Specifically, TTG employs C++ coroutines to suspend tasks during times of data motion and kernel execution. This design allows TTG to support transparent device memory oversubscription by delegating memory management to the underlying PaRSEC runtime system. TTG will also offer coroutines as a means for describing...
Task-based programming models have proven to be a robust and versatile way to approach the development of applications for distributed environments. The programming model itself feels natural and close to classical algorithms; the distribution of tasks can achieve a high degree of performance. All this is achieved with a minimal impact on programmability. However, execution on this...
Proteins and other biological molecules are responsible for many vital cellular functions, such as transport, signaling, or catalysis, and dysfunction can result in diseases. Information on the 3-dimensional (3D) structures of biological molecules and their dynamics is essential to understand mechanisms of their functions, leading to medicinal applications such as drug design. Different...
The growing complexity arising in the development of HPC libraries and applications impedes speedy code development. To rein in this complexity, CI tools and workflows are a great way to automate large portions of test-driven development cycles.
In this short talk, we want to present the current impact of our CI-HPC tools for automating such workflows. Our FMM library FMsolvr will be used as a...
This JLESC collaboration focuses on the prediction of flow fields using machine learning (ML) techniques. The basis for the project are jointly developed convolutional neural networks (CNNs) with an autoencoder-decoder type architecture, inspired by the work in [1]. These CNNs are used to investigate dimension-reduction techniques for a three-dimensional flow field [2]. That is, the CNNs are...
Many HPC applications display iterative patterns, where a series of computations and communications are repeated a specific number of times. This pattern happens, for example, in multi-step simulations, iterative mathematical methods and machine learning training. When these applications are coded using data-flow programming models, much time is spent creating tasks and processing dependencies...
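The record-and-replay idea for such iterative patterns can be sketched as follows (an illustrative toy, not a specific runtime's API): the task list is built once for a representative iteration and then replayed each step, avoiding repeated task creation and dependency processing.

```python
class RecordedGraph:
    """Toy task graph recorded once and replayed many times."""

    def __init__(self):
        self.tasks = []

    def record(self, fn, *args):
        """Register a task; no dependency analysis is repeated at replay."""
        self.tasks.append((fn, args))

    def replay(self, state):
        """Run the recorded tasks in order, threading the state through."""
        for fn, args in self.tasks:
            state = fn(state, *args)
        return state

graph = RecordedGraph()
graph.record(lambda s, k: s + k, 1)   # stand-in compute task
graph.record(lambda s, k: s * k, 2)   # stand-in communication/reduction task

state = 0
for _ in range(3):                    # iterate without re-submitting tasks
    state = graph.replay(state)
```

The replay loop pays only the execution cost per iteration; the one-time recording cost is amortized over all iterations.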
Super-resolution networks (SRNs) are employed for enhancing the resolution of Computer Tomography (CT) images. In previous works of the JSC group, respiratory flow simulations were integrated into a data processing pipeline to facilitate diagnosis and treatment planning in rhinology [1]. However, obtaining accurate simulation results is often hindered by low CT image resolutions in clinical...
Elasticity, or the ability to adapt a system to a dynamically changing workload, has been a core feature of cloud computing storage since its inception more than two decades ago. In the meantime, HPC applications have mostly continued to rely on static parallel file systems to store their data. This picture is now changing as more and more applications adopt custom data services tailored to...