ISBN: 9780738110868 (print)
The proceedings contain 6 papers. The topics discussed include: accelerating microstructural analytics with Dask for volumetric X-ray images; enabling system-wide shared memory for performance improvement in PyCOMPSs applications; experiences in developing a distributed agent-based modeling toolkit with Python; data engineering for HPC with Python; Python workflows on HPC systems; and distributed asynchronous array computing with the JetLag environment.
The recent successes and widespread application of compute-intensive machine learning and data analytics methods have been boosting the usage of the Python programming language on HPC systems. While Python provides many advantages for users, it was not designed with a focus on multi-user environments or parallel programming, which makes it quite challenging to maintain stable and secure Python workflows on an HPC system. In this paper, we analyze the key problems induced by the usage of Python on HPC clusters and sketch appropriate workarounds for efficiently maintaining multi-user Python software environments, securing and restricting the resources of Python jobs, and containing Python processes, with a focus on deep learning applications running on GPU clusters.
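A minimal sketch of the kind of job containment discussed above, assuming a POSIX system. The helper `run_contained` is hypothetical and not taken from the paper: it launches a Python job in a child process, caps the thread pools of common math libraries through their environment variables, exposes only selected GPUs via `CUDA_VISIBLE_DEVICES`, ignores per-user packages with `PYTHONNOUSERSITE`, and applies a CPU-time limit with `resource.setrlimit`. On a production cluster these controls would normally come from the batch scheduler and cgroups rather than from Python itself.

```python
import os
import resource  # POSIX-only module
import subprocess
import sys


def run_contained(code, cpu_seconds=60, threads=4, gpus="0"):
    """Run a Python snippet in a child process with capped threads, CPU time and GPUs."""
    env = dict(os.environ)
    env.update({
        # Keep common threaded math libraries from oversubscribing a shared node.
        "OMP_NUM_THREADS": str(threads),
        "OPENBLAS_NUM_THREADS": str(threads),
        "MKL_NUM_THREADS": str(threads),
        # Expose only the GPUs assigned to this job to CUDA-based frameworks.
        "CUDA_VISIBLE_DEVICES": gpus,
        # Ignore packages in the user's ~/.local so the shared environment is used.
        "PYTHONNOUSERSITE": "1",
    })

    def apply_limits():
        # Runs in the child between fork and exec: cap CPU seconds (soft and hard).
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))

    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        preexec_fn=apply_limits,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# The child reports the settings it actually sees.
probe = (
    "import os, resource; "
    "print(os.environ['OMP_NUM_THREADS'], os.environ['CUDA_VISIBLE_DEVICES'], "
    "resource.getrlimit(resource.RLIMIT_CPU)[0])"
)
print(run_contained(probe, cpu_seconds=30, threads=2, gpus="1"))  # 2 1 30
```

`preexec_fn` is not safe in a multithreaded parent; a small launcher script or the scheduler's own resource limits avoid that caveat.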
Distributed agent-based modeling (ABM) on high-performance computing resources promises to capture unprecedented detail in large-scale complex systems. However, the specialized knowledge required for developing such ABMs creates barriers to wider adoption and utilization. Here we present our experiences in developing an initial implementation of Repast4Py, a Python-based distributed ABM toolkit. We build on our experiences in developing ABM toolkits, including Repast for High Performance Computing (Repast HPC), to identify the key elements of a useful distributed ABM toolkit. We leverage the Numba, NumPy, and PyTorch packages and the Python C-API to create a scalable modeling system that can exploit the largest HPC resources and emerging computing architectures.
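A central mechanic of distributed ABM is spatial decomposition: each process owns a region of the model space and the agents in it, and an agent that moves across a region boundary must be handed to the owning process. The sketch below simulates this in one process, with plain Python lists standing in for ranks on a periodic 1-D domain. `Agent`, `owner` and `step` are hypothetical names, not the Repast4Py API; a real toolkit would perform the exchange between processes (for example with MPI) and typically also manage copies of neighbouring agents near boundaries.

```python
import random
from dataclasses import dataclass


@dataclass
class Agent:
    uid: int
    x: float  # position on a periodic 1-D domain [0, width)


def owner(x, width, n_ranks):
    """Rank owning position x when [0, width) is split into equal strips."""
    return min(int(x / width * n_ranks), n_ranks - 1)


def step(ranks, width, rng):
    """Move agents locally, then hand off those that left their rank's strip."""
    n_ranks = len(ranks)
    outboxes = [[] for _ in range(n_ranks)]
    for r in range(n_ranks):
        kept = []
        for agent in ranks[r]:
            agent.x = (agent.x + rng.uniform(-1.0, 1.0)) % width  # random walk
            dest = owner(agent.x, width, n_ranks)
            (kept if dest == r else outboxes[dest]).append(agent)
        ranks[r] = kept
    # Exchange phase: in a real distributed run this is inter-process messaging.
    for r in range(n_ranks):
        ranks[r].extend(outboxes[r])


rng = random.Random(42)
width, n_ranks = 100.0, 4
ranks = [[] for _ in range(n_ranks)]
for uid in range(1000):
    agent = Agent(uid, rng.uniform(0.0, width))
    ranks[owner(agent.x, width, n_ranks)].append(agent)

for _ in range(10):
    step(ranks, width, rng)

print([len(local) for local in ranks])  # agents per simulated rank
```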
We describe JetLag, a Python-based environment that provides access to a distributed, interactive, asynchronous many-task (AMT) computing framework called Phylanx. This environment encompasses the entire computing process, from a Jupyter front-end for managing code and results to the collection and visualization of performance data. We use a Python decorator to access the abstract syntax tree of Python functions and transpile them into a set of C++ data structures, which are then executed by the HPX runtime. The environment includes services for sending functions and their arguments to run as jobs on remote resources. A set of Docker and Singularity containers is used to simplify the setup of the JetLag environment. The JetLag system is suitable for a variety of array computational tasks, including machine learning and exploratory data analysis.
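To illustrate the decorator technique described above, here is a minimal sketch using only the standard library (`inspect`, `ast`): the decorator recovers a function's source, parses it into an abstract syntax tree, and lowers a tiny subset of arithmetic expressions into nested tuples. The tuple format and the names `capture_ast` and `lower` are hypothetical stand-ins for the C++ data structures built by Phylanx; this is not the JetLag or Phylanx API. Because it relies on `inspect.getsource`, the decorated function must be defined in a source file where its source can be found.

```python
import ast
import inspect
import textwrap

# Hypothetical operator names for the target representation.
_OPS = {ast.Add: "add", ast.Sub: "sub", ast.Mult: "mul", ast.Div: "div"}


def lower(node):
    """Recursively lower a small subset of Python expressions to nested tuples."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return (_OPS[type(node.op)], lower(node.left), lower(node.right))
    if isinstance(node, ast.Name):
        return ("var", node.id)
    if isinstance(node, ast.Constant):
        return ("const", node.value)
    raise NotImplementedError(f"unsupported node: {ast.dump(node)}")


def capture_ast(fn):
    """Decorator: parse fn's source and attach a lowered form of its return expression."""
    tree = ast.parse(textwrap.dedent(inspect.getsource(fn)))
    func_def = tree.body[0]  # the FunctionDef (decorators are in its decorator_list)
    last = func_def.body[-1]
    if not isinstance(last, ast.Return):
        raise NotImplementedError("this sketch only handles a trailing return statement")
    params = [arg.arg for arg in func_def.args.args]
    fn.ir = ("lambda", params, lower(last.value))
    return fn  # the original Python function stays callable


@capture_ast
def axpy(a, x, y):
    return a * x + y


print(axpy.ir)
# ('lambda', ['a', 'x', 'y'], ('add', ('mul', ('var', 'a'), ('var', 'x')), ('var', 'y')))
print(axpy(2, 3, 4))  # 10
```

Returning the original function keeps ordinary Python execution available for debugging, while the attached representation is what a backend runtime would consume.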
Data engineering is becoming an increasingly important part of scientific discovery with the adoption of deep learning and machine learning. Data engineering deals with a variety of data formats, storage, data extraction, transformation, and data movement. One goal of data engineering is to transform data from its original form into the vector/matrix/tensor formats accepted by deep learning and machine learning applications. There are many structures, such as tables, graphs, and trees, for representing data in these data engineering phases. Among them, tables are a versatile and commonly used format for loading and processing data. In this paper, we present a distributed Python API based on a table abstraction for representing and processing data. Unlike existing state-of-the-art data engineering tools written purely in Python, our solution adopts high-performance compute kernels in C++, with an in-memory table representation and Cython-based Python bindings. In the core system, we use MPI for distributed-memory computations with a data-parallel approach for processing large datasets in HPC clusters.
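A common data-parallel pattern for distributed table operations is a hash shuffle followed by purely local computation. The sketch below simulates a distributed group-by sum in one process, with plain Python lists standing in for MPI ranks. `partition`, `local_groupby_sum` and `distributed_groupby_sum` are hypothetical names, and rows-as-dicts is a simplification, not the C++ in-memory table format described in the paper.

```python
from collections import defaultdict
from zlib import crc32


def partition(rows, key, n_workers):
    """Hash-partition rows so every row with the same key goes to the same worker."""
    parts = [[] for _ in range(n_workers)]
    for row in rows:
        # crc32 is stable across processes, unlike the salted built-in hash() for str.
        dest = crc32(str(row[key]).encode()) % n_workers
        parts[dest].append(row)
    return parts


def local_groupby_sum(rows, key, value):
    """Aggregate the rows held by one worker."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key]] += row[value]
    return dict(totals)


def distributed_groupby_sum(shards, key, value):
    """shards[i] is the part of the table initially held by worker i."""
    n_workers = len(shards)
    inboxes = [[] for _ in range(n_workers)]
    # Shuffle: each worker partitions its shard and sends part j to worker j
    # (an all-to-all exchange in an MPI implementation).
    for shard in shards:
        for dest, part in enumerate(partition(shard, key, n_workers)):
            inboxes[dest].extend(part)
    # After the shuffle each key lives on exactly one worker, so local results are final.
    return [local_groupby_sum(inbox, key, value) for inbox in inboxes]


pairs = [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5), ("c", 6)]
table = [{"site": s, "bytes": b} for s, b in pairs]
per_worker = distributed_groupby_sum([table[:3], table[3:]], "site", "bytes")
merged = {k: v for part in per_worker for k, v in part.items()}
print(merged)  # totals per site; key order depends on the hash partitioning
```

Because the shuffle guarantees disjoint keys per worker, no second reduction step is needed; aggregations that are not decomposable this way would need a different exchange pattern.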
While X-ray microtomography has become indispensable in 3D inspections of materials, efficient processing of such volumetric datasets continues to be a challenge. This paper describes a computational environment for HPC that facilitates parallelization of the computer vision and machine learning algorithms needed for microstructure characterization and interpretation. The contribution is to accelerate microstructural analytics by employing Dask high-level parallel abstractions, which scale NumPy workflows to enable multi-dimensional image analysis of diverse specimens. We illustrate our results using an example from materials science, emphasizing the benefits of parallel execution of image-dependent tasks. Preliminary results show that the proposed environment configuration and scientific software stack, deployed using JupyterLab at NERSC Cori, enables near-real-time analyses of complex, high-resolution experiments.
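Dask parallelizes array workflows by splitting an array into chunks; neighbourhood operations such as image filters also need a halo of overlapping elements from adjacent chunks, which Dask exposes as `map_overlap`. The standard-library sketch below illustrates that idea on a 1-D signal with a 3-point mean filter and checks that the chunked result matches the serial one. It is not Dask code, and `smooth_serial` and `smooth_chunked` are hypothetical names; the threads only demonstrate the decomposition, since pure-Python arithmetic holds the GIL.

```python
from concurrent.futures import ThreadPoolExecutor


def mean3(block):
    """3-point moving average of the interior of block (one halo element per side)."""
    return [(block[i - 1] + block[i] + block[i + 1]) / 3 for i in range(1, len(block) - 1)]


def pad_nearest(values):
    """Global boundary handling: repeat each edge value once."""
    return [values[0], *values, values[-1]]


def smooth_serial(values):
    return mean3(pad_nearest(values))


def smooth_chunked(values, chunk_size, workers=4):
    padded = pad_nearest(values)
    n = len(values)

    def work(start):
        stop = min(start + chunk_size, n)
        # Output j depends on padded[j:j + 3], so chunk [start, stop) reads
        # padded[start:stop + 2]: its own elements plus a one-element halo per side.
        return mean3(padded[start:stop + 2])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        pieces = list(pool.map(work, range(0, n, chunk_size)))  # order preserved
    # Stitch the per-chunk results back together.
    return [v for piece in pieces for v in piece]


signal = [float((7 * i) % 11) for i in range(50)]
print(smooth_chunked(signal, chunk_size=8) == smooth_serial(signal))  # True
```

For a 3-D volume the same reasoning applies per axis, with the halo depth set by the filter radius.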
Python has been gaining traction for years in the world of scientific applications. However, the high-level abstraction it provides may not allow developers to use machines at their peak performance. To address this, multiple strategies, sometimes complementary, have been developed to enrich the software ecosystem, either by relying on additional libraries dedicated to efficient computation (e.g., NumPy) or by providing a framework to better use HPC-scale infrastructures (e.g., PyCOMPSs). In this paper, we present a Python extension based on SharedArray that enables support for system-provided shared memory, and its integration into the PyCOMPSs programming model as an example of integration into a complex Python environment. We also evaluate the impact such a tool may have on performance in two types of distributed execution flows: one for linear algebra, with a blocked matrix multiplication application, and the other for data clustering, with a k-means application. We show that with very little modification of the original decorator (3 lines of code to be modified) of the task-based application, the performance gain can rise above 40% for tasks relying heavily on data reuse in a distributed environment, especially when data loading is a prominent part of the execution time.
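The gain described above comes from placing data in system shared memory once and letting tasks on the same node attach to it, instead of each task loading or receiving a private copy. The sketch below shows that pattern with the standard library's `multiprocessing.shared_memory` module (Python 3.8+) rather than the SharedArray package used in the paper; `publish` and `task_sum` are hypothetical names. For brevity the tasks run in one process; a task in another process on the same node would attach by name in the same way.

```python
import struct
from multiprocessing import shared_memory


def publish(values):
    """Copy float64 values into a new named shared-memory block, once."""
    shm = shared_memory.SharedMemory(create=True, size=8 * len(values))
    struct.pack_into(f"{len(values)}d", shm.buf, 0, *values)
    return shm


def task_sum(name, count):
    """A task attaches to the existing block by name instead of receiving a copy."""
    shm = shared_memory.SharedMemory(name=name)
    try:
        return sum(struct.unpack_from(f"{count}d", shm.buf, 0))
    finally:
        shm.close()  # detach only; the creator is responsible for unlink()


data = [float(i) for i in range(1000)]
block = publish(data)
try:
    # Several "tasks" reuse the same data without reloading it.
    results = [task_sum(block.name, len(data)) for _ in range(3)]
finally:
    block.close()
    block.unlink()  # release the segment once all tasks are done
print(results)  # [499500.0, 499500.0, 499500.0]
```

Only the block name and element count travel with each task, which is what makes data reuse cheap when loading the data dominates execution time.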
ISBN: 9780738110868 (digital)
ISBN: 9781665422864 (print)
Presents the introductory welcome message from the conference proceedings. May include the conference officers' congratulations to all involved with the conference event and publication of the proceedings record.