ISBN (print): 9781450354806
The proceedings contain 8 papers. The topics discussed include: enabling Python to execute efficiently in heterogeneous distributed infrastructures with PyCOMPSs; efficient pattern matching in Python; real-time financial risk measurement of dynamic complex portfolios with Python and PyOpenCL; Python in the NERSC Exascale Science Applications Program for Data; real-time thermal medium-based breathing analysis with Python; GPUMap: a transparently GPU-accelerated Python map function; nbodykit: a Python toolkit for cosmology simulations and data analysis on parallel HPC systems; and Python and HPC for high energy physics data analyses.
ISBN (print): 9781450351249
Pattern matching is a powerful tool for symbolic computations. Applications include term rewriting systems, as well as the manipulation of symbolic expressions, abstract syntax trees, and XML and JSON data. It also allows for an intuitive description of algorithms in the form of rewrite rules. We present the open source Python module MatchPy, which offers functionality and expressiveness similar to the pattern matching in Mathematica. In particular, it includes syntactic pattern matching, as well as matching for commutative and/or associative functions, sequence variables, and matching with constraints. MatchPy uses new and improved algorithms to efficiently find matches for large pattern sets by exploiting similarities between patterns. The performance of MatchPy is investigated on several real-world problems.
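As a rough illustration of the kind of usage the abstract describes, the sketch below declares a commutative and associative operation and matches a pattern containing a wildcard against a subject term. The names (Operation, Arity, Symbol, Wildcard, Pattern, match) are recalled from MatchPy's documented interface and should be treated as assumptions that may differ between versions.

```python
# Hedged sketch of MatchPy-style matching; class and function names are
# assumptions based on the library's documented interface.
from matchpy import Operation, Arity, Symbol, Wildcard, Pattern, match

# a variadic operation declared commutative and associative
f = Operation.new('f', Arity.variadic, commutative=True, associative=True)
a, b = Symbol('a'), Symbol('b')
x_ = Wildcard.dot('x')            # matches exactly one argument

subject = f(a, b)
pattern = Pattern(f(x_, b))       # commutativity lets x_ bind to a

for substitution in match(subject, pattern):
    print(substitution)           # expected: a substitution mapping x -> a
```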
ISBN (print): 9781450351249
We present nbodykit, an open source, massively parallel Python toolkit for cosmology simulations and data analysis developed for high-performance computing machines. We discuss the challenges encountered while designing parallel and scalable software in Python that still exploits the unique interactive tools provided by the Python stack. Using the mpi4py library, nbodykit implements a fully parallel, canonical set of algorithms in the field of large-scale structure cosmology and also includes a set of distributed data containers, insulated from the algorithms themselves. We use the dask library to provide a straightforward method for users to manipulate data without worrying about the costs of parallel IO operations. We take advantage of the readability of Python as an interpreted language by implementing nbodykit in pure Python, while ensuring high performance by relying on external, compiled libraries optimized for specific tasks. We demonstrate the ease of use and performance capabilities of nbodykit with several real-world scenarios in the field of cosmology.
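The central design point of the abstract, dask-backed data containers layered over mpi4py communication, can be illustrated with a generic sketch such as the one below. It deliberately does not use nbodykit's own classes; the array sizes, chunking, and reduction are hypothetical and only show how lazy dask columns and explicit MPI reductions can coexist.

```python
# Generic sketch (not nbodykit's API): each rank holds a local slab of
# particle positions, exposes it lazily through dask, and reduces a
# statistic across ranks with mpi4py.
import numpy as np
import dask.array as da
from mpi4py import MPI

comm = MPI.COMM_WORLD

# hypothetical local data: 100k positions in the unit box per rank
local_pos = np.random.default_rng(comm.rank).random((100_000, 3))

# lazy column view: users can transform data without triggering IO
pos = da.from_array(local_pos, chunks=(10_000, 3))
shifted = (pos + 0.5) % 1.0                    # element-wise, still lazy

# algorithms reduce across ranks explicitly with MPI
local_mean = shifted.mean(axis=0).compute()    # evaluate the local piece
global_mean = comm.allreduce(local_mean, op=MPI.SUM) / comm.size
if comm.rank == 0:
    print(global_mean)
```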
ISBN (print): 9781450351249
We describe a new effort at the National Energy Research Scientific Computing Center (NERSC) in performance analysis and optimization of scientific Python applications targeting the Intel Xeon Phi (Knights Landing, KNL) manycore architecture. The Python-centered work outlined here is part of a larger effort called the NERSC Exascale Science Applications Program (NESAP) for Data. NESAP for Data focuses on applications that process and analyze high-volume, high-velocity data sets from experimental or observational science (EOS) facilities supported by the US Department of Energy Office of Science. We present three case study applications from NESAP for Data that use Python. These codes vary in terms of "Python purity" from applications developed in pure Python to ones that use Python mainly as a convenience layer for scientists without expertise in lower-level programming languages such as C, C++, or Fortran. The science case, requirements, constraints, algorithms, and initial performance optimizations for each code are discussed. Our goal with this paper is to contribute to the larger conversation around the role of Python in high-performance computing today and tomorrow, highlighting areas for future work and emerging best practices.
ISBN (print): 9781450351249
Python has been adopted as a programming language by a large number of scientific communities. In addition to its easy programming interface, the large number of libraries and modules contributed by its community has taken the language to the top of the list of the most popular programming languages for scientific applications. However, one main drawback of Python is its limited support for concurrency and parallelism. PyCOMPSs is a proven approach to supporting task-based parallelism in Python that enables applications to be executed in parallel on distributed computing platforms. This paper presents PyCOMPSs and how it has been tailored to execute tasks in heterogeneous and multi-threaded environments. We present an approach that combines the task-level parallelism provided by PyCOMPSs with the thread-level parallelism provided by MKL. Performance and behavioral results on heterogeneous distributed computing clusters show the benefits and capabilities of PyCOMPSs in both HPC and Big Data infrastructures.
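A minimal sketch of the task-based model described above is shown below, combining PyCOMPSs-style task decorators with NumPy/MKL-backed kernels inside each task. The decorator and synchronization call follow the PyCOMPSs documentation as recalled here and should be treated as assumptions; the block sizes and workload are invented for illustration.

```python
# Hedged sketch of PyCOMPSs task-level parallelism combined with MKL
# thread-level parallelism inside each task; module paths are assumptions
# based on the PyCOMPSs documentation.
import numpy as np
from pycompss.api.task import task
from pycompss.api.api import compss_wait_on

@task(returns=1)
def block_square(block):
    # each call becomes an asynchronous task that the runtime may place on
    # any node; the matrix product itself is threaded by NumPy/MKL
    return block @ block

if __name__ == '__main__':
    blocks = [np.random.rand(512, 512) for _ in range(8)]
    futures = [block_square(b) for b in blocks]   # submitted, not yet computed
    results = compss_wait_on(futures)             # synchronize and fetch results
    print(sum(float(np.trace(r)) for r in results))
```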
ISBN (print): 9781450351249
Risk measures, such as value-at-risk and expected shortfall, are widely used to keep track of the risk to which a financial portfolio is exposed. This analysis is not only a key part of the daily operation of financial institutions worldwide, but it is also strictly enforced by regulators. While nested Monte Carlo simulations are the most flexible approach, able to deal even with portfolios containing complicated derivatives, they traditionally suffer from high computational complexity. This limits their application to certain intervals of time, mostly daily, by temporarily keeping the composition of the portfolio static. In this work, we bring together for the first time nested Monte Carlo simulations with the real-time, continuous risk measurement of complex portfolios that dynamically change their composition during intraday operation. By combining the development productivity offered by Python, state-of-the-art mathematical optimizations, and the high-performance capabilities offered by PyOpenCL targeting heterogeneous computing systems, our new approach reaches a throughput between 16 and 191 trading orders per second per computing node, corresponding to the worst-case and best-case scenarios, respectively. We have also made use of the Jupyter Notebook as an interactive interface in an interdisciplinary research environment.
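The abstract does not give implementation details, but the flavor of offloading the innermost pricing step to an OpenCL device via PyOpenCL can be sketched as below. The payoff function, the host-generated lognormal terminal prices, and all sizes are hypothetical; a real nested simulation would also generate scenarios and random numbers on the device and apply discounting.

```python
# Illustrative PyOpenCL sketch of a device-side payoff evaluation step;
# the payoff, prices, and sizes are hypothetical.
import numpy as np
import pyopencl as cl
import pyopencl.array as cla

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

prg = cl.Program(ctx, """
__kernel void call_payoff(__global const float *s_t,
                          const float strike,
                          __global float *payoff)
{
    int i = get_global_id(0);
    float p = s_t[i] - strike;
    payoff[i] = p > 0.0f ? p : 0.0f;   /* European call payoff */
}
""").build()

n = 1_000_000
terminal = (100.0 * np.random.lognormal(0.05, 0.2, n)).astype(np.float32)
s_t = cla.to_device(queue, terminal)
payoff = cla.empty_like(s_t)

prg.call_payoff(queue, (n,), None, s_t.data, np.float32(100.0), payoff.data)
print(payoff.get().mean())             # undiscounted Monte Carlo estimate
```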
ISBN (print): 9781450351249
Respiration monitoring is an important physiological measurement taken to determine the health of an individual. In clinical sleep studies, respiration activity is monitored to detect sleep disorders such as sleep apnea and respiratory conditions such as Chronic Obstructive Pulmonary Disease (COPD). Existing methods of respiration monitoring either place sensors on the patient's body, causing discomfort to the patient, or monitor respiration remotely with lower accuracy. We present a method of respiratory analysis that is non-contact, but also measures the exhaled air of a human subject directly through a medium-based exhale visualization technique. In this method, we place a thin medium perpendicular to the exhaled airflow of an individual and use a thermal camera to record the heat signature from the exhaled breath on the opposite side of the material. Respiratory behaviors are extracted from the thermal data in real time using Python. Our prototype is an embedded, low-power device that performs image and signal processing in real time with Python, making use of powerful existing Python modules for scientific computing and visualization. Our proposed respiration monitoring technique accurately reports breathing rate, and may provide other metrics not obtainable through other non-contact methods. This method can be useful for medical applications where long-term respiratory analysis is necessary, and for applications that require additional information about breathing behavior.
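As a purely illustrative sketch (not the authors' pipeline), the snippet below shows one way a per-frame thermal intensity signal could be turned into a breathing rate in Python: band-pass filter the signal around plausible breathing frequencies and count exhale peaks. The frame rate, filter band, and synthetic input are assumptions.

```python
# Illustrative only: estimate a breathing rate from a per-frame thermal
# intensity signal; the camera frame rate, filter band, and synthetic
# signal are assumptions, not the authors' implementation.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

fps = 30.0                                   # assumed thermal camera frame rate
t = np.arange(0, 60, 1 / fps)                # one minute of data
signal = np.sin(2 * np.pi * 0.25 * t) + 0.1 * np.random.randn(t.size)  # ~15 breaths/min

# band-pass around plausible human breathing frequencies (0.1-0.7 Hz)
b, a = butter(2, [0.1, 0.7], btype='band', fs=fps)
filtered = filtfilt(b, a, signal)

# one peak per exhale; require peaks to be at least one second apart
peaks, _ = find_peaks(filtered, distance=fps)
rate_bpm = 60.0 * peaks.size / (t[-1] - t[0])
print(f"estimated breathing rate: {rate_bpm:.1f} breaths/min")
```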
ISBN (print): 9781450351249
High-level abstractions in Python that can utilize computing hardware well seem to be an attractive option for writing data reduction and analysis tasks. In this paper, we explore the features available in Python which are useful and efficient for end-user analysis in High Energy Physics (HEP). A typical vertical slice of an HEP data analysis is somewhat fragmented: the state of the reduction/analysis process must be saved at certain stages to allow for selective reprocessing of only parts of a generally time-consuming workflow. Also, algorithms tend to be modular because of the heterogeneous nature of most detectors and the need to analyze different parts of the detector separately before combining the information. This fragmentation causes difficulties for interactive data analysis, and as data sets increase in size and complexity (from O(10) TiB for a "small" neutrino experiment to the O(10) PiB currently held by the CMS experiment at the LHC), data analysis methods traditional to the field must evolve to make optimum use of emerging HPC technologies and platforms. Mainstream big data tools, while suggesting a direction in terms of what can be done if an entire data set is available across a system and analysed with high-level programming abstractions, are not designed with either scientific computing generally, or modern HPC platform features in particular, such as data caching levels, in mind. Our example HPC use case is a search for a new elementary particle which might explain the phenomenon known as "Dark Matter". Using data from the CMS detector, we use HDF5 as our input data format and MPI with Python to implement our use case.
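A hedged sketch of the HDF5-plus-MPI pattern mentioned for the use case is given below: each rank reads its own slab of a dataset and the ranks reduce a selection count. The file name, dataset path, and selection cut are hypothetical, and opening with driver='mpio' assumes h5py built against parallel HDF5.

```python
# Hypothetical sketch of slab-decomposed HDF5 reading with MPI; the file,
# dataset, and cut are invented, and driver='mpio' requires a parallel
# HDF5 build of h5py.
import h5py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD

with h5py.File('events.h5', 'r', driver='mpio', comm=comm) as f:
    dset = f['events/missing_et']                  # hypothetical dataset
    n = dset.shape[0]
    lo = comm.rank * n // comm.size
    hi = (comm.rank + 1) * n // comm.size
    local = dset[lo:hi]                            # each rank reads its slab

local_pass = int(np.count_nonzero(local > 200.0))  # hypothetical selection cut
total_pass = comm.reduce(local_pass, op=MPI.SUM, root=0)
if comm.rank == 0:
    print(f"events passing selection: {total_pass}")
```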
We present dispel4py, a novel data-intensive and high-performance computing middleware provided as a standard Python library for describing stream-based workflows. It allows its users to develop their scientific applica...