Presentations

Here are a few key links for this workshop:

Demos

Exploring Data Science with Arkouda: A Practical Introduction to Scalable Data Science

Work-in-progress: CUDA Python object models and parallelism models

Seamlessly scale your Python program from a single CPU core to a multi-GPU, multi-node HPC cluster with cuNumeric

  • Presenters: Wonchan Lee, Manolis Papadakis, Mike Bauer, Bo Dong

  • SC Presentation

Visualizing Workflows with the Dragon Telemetry Service

Accelerating Python Applications with Dask and ProxyStore

PyOMP: Parallel programming for CPUs and GPUs with OpenMP and Python

Lightning Talks

Accelerated massive data analytics for semiconductors

Presenter: Quynh L. Nguyen

X-ray experiments at the Linac Coherent Light Source (LCLS), SLAC National Accelerator Laboratory, enable new scientific discoveries about matter. An accompanying challenge is extracting important insights from the massive amount of data generated at terabytes per hour so that experiments can be steered effectively; this rate will increase to terabytes per second with our newly commissioned LCLS-II/HE facilities. We developed functions in cuNumeric that are relevant for scientific computing and implemented them for live analysis during an experiment. We found a 6x speedup compared to our routine data analytics using NumPy. With this new approach, we extract comprehensive information on material properties with higher efficiency.
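
To illustrate the drop-in nature of this approach, the sketch below shows a hypothetical detector-batch reduction in which the only change from a plain NumPy implementation is the import line. The specific correction and statistics are illustrative assumptions, not the actual LCLS analysis code.

```python
import cunumeric as np  # drop-in replacement for "import numpy as np"


def reduce_detector_batch(frames, dark, gain, threshold):
    """Apply a dark/gain correction to a batch of detector frames and
    return per-frame summary statistics (illustrative, not the LCLS code)."""
    corrected = (frames - dark[None, :, :]) * gain               # broadcast correction
    corrected = np.where(corrected > threshold, corrected, 0.0)  # suppress readout noise
    flat = corrected.reshape(corrected.shape[0], -1)
    return flat.sum(axis=1), flat.max(axis=1)                    # intensity and peak per frame


if __name__ == "__main__":
    # Synthetic stand-in for a slice of the detector stream: 1024 frames of 512x512 pixels.
    frames = np.random.random((1024, 512, 512))
    dark = np.random.random((512, 512)) * 0.01
    sums, peaks = reduce_detector_batch(frames, dark, gain=1.05, threshold=0.02)
    print(float(sums.mean()), float(peaks.mean()))
    # Run with e.g. `legate --gpus 4 script.py`; cuNumeric partitions the arrays
    # and operations across the available GPUs/nodes without code changes.
```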

In-Transit Machine Learning of Plasma Simulations on Exascale systems

Presenter: Vineeth Gutta

Traditional ML workflows use offline training, where the data is stored on disk and subsequently loaded into accelerator (CPU, GPU, etc.) memory during training or inference. We recently devised a novel and scalable in-transit ML workflow for a plasma-physics application (chosen as one of eight compelling codes in the country) on the world's fastest supercomputer, Frontier, with the aim of building a high-energy laser particle accelerator. This in-transit workflow solves the challenge of coupling full-scale particle-in-cell simulations with distributed ML training in PyTorch using DDP, enabling the model to learn correlations between emitted radiation and particle dynamics within the simulation in an unsupervised manner. Simulations on exascale systems create volumes of data that are infeasible to store on HPC file systems, and the high volume and rate of data generation create a mismatch with modern memory hierarchies. The workflow demonstrates the use of data reduction combined with inversion using invertible neural networks to reconstruct the simulation. We use continuous learning, where data is consumed in batches as the simulation produces it and discarded after each batch is trained on. We demonstrate this at scale on Frontier using 400 AMD MI250X GPUs and show the flexibility of such workflows beyond the plasma-simulation science case, opening up the possibility of running in-transit ML with other surrogate models and foundation models.
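
A minimal sketch of such a continuous, in-transit training loop is shown below, assuming a torchrun launch. The simulation coupling is replaced by a placeholder generator and the model is a toy autoencoder standing in for the invertible-network surrogate, so the names and sizes are illustrative only, not the actual Frontier workflow.

```python
# Launch with e.g. `torchrun --nproc_per_node=8 train_in_transit.py`.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def stream_simulation_batches(n_batches, batch_size, n_features, device):
    """Stand-in for the in-transit data feed: yields batches as the simulation
    produces them; in the real workflow this is a memory/network channel,
    not the file system."""
    for _ in range(n_batches):
        yield torch.randn(batch_size, n_features, device=device)


def main():
    dist.init_process_group(backend="nccl" if torch.cuda.is_available() else "gloo")
    rank = dist.get_rank()
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    device = torch.device(f"cuda:{local_rank}" if torch.cuda.is_available() else "cpu")

    # Toy autoencoder standing in for the invertible-network surrogate.
    model = torch.nn.Sequential(
        torch.nn.Linear(256, 64), torch.nn.ReLU(), torch.nn.Linear(64, 256)
    ).to(device)
    model = DDP(model, device_ids=[local_rank] if device.type == "cuda" else None)
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = torch.nn.MSELoss()

    for step, batch in enumerate(
        stream_simulation_batches(n_batches=100, batch_size=4096,
                                  n_features=256, device=device)
    ):
        opt.zero_grad()
        loss = loss_fn(model(batch), batch)  # unsupervised reconstruction objective
        loss.backward()                      # DDP all-reduces gradients across ranks
        opt.step()
        del batch                            # data is discarded, never written to disk
        if rank == 0 and step % 10 == 0:
            print(f"step {step}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()
```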