ISBN: 9783030238735; 9783030238728 (print)
The Cellular Potts Model (CPM) is a mathematical model used to simulate biological systems across a wide range of scales, from cells to organs. The model uses a Monte Carlo approach to determine, for each cell, its new state and actions such as mitosis, movement, or the emission of pseudopods. The literature describes multiple implementations of CPM, some incorporating parallel processing. These works use a data-division approach that requires either taking locks on data structures or exchanging information between tasks, which slows down simulations. This work proposes a fast implementation of CPM that uses software transactional memory to synchronize parallel tasks, and applies it to ductal carcinoma in situ (DCIS) of the breast. Execution times and speedups are measured, and the results show appreciable speedups.
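The Monte Carlo step mentioned in this abstract can be illustrated with a minimal sequential sketch. It assumes the standard Metropolis acceptance rule and a simplified energy that only counts unlike neighbour pairs; the abstract does not give the Hamiltonian the authors use, so the names `boundary_energy`, `copy_attempt`, and `j_contact` are hypothetical, and the transactional-memory synchronization of the paper is not shown.

```python
import math
import random


def metropolis_accept(delta_h, temperature, u):
    """Return True if a proposal with energy change delta_h is accepted.

    u is a uniform sample in [0, 1), passed in so the rule is easy to test.
    """
    if delta_h <= 0:
        return True
    return u < math.exp(-delta_h / temperature)


def boundary_energy(grid, j_contact=1.0):
    """Sum j_contact over unlike horizontal and vertical neighbour pairs.

    Assumption: a contact-only energy; real CPM energies usually add
    further terms such as a volume constraint.
    """
    rows, cols = len(grid), len(grid[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            if c + 1 < cols and grid[r][c] != grid[r][c + 1]:
                energy += j_contact
            if r + 1 < rows and grid[r][c] != grid[r + 1][c]:
                energy += j_contact
    return energy


def copy_attempt(grid, src, dst, temperature, rng):
    """Try to copy the cell id at src onto dst; return True if the grid changed."""
    (sr, sc), (dr, dc) = src, dst
    if grid[sr][sc] == grid[dr][dc]:
        return False
    before = boundary_energy(grid)
    old = grid[dr][dc]
    grid[dr][dc] = grid[sr][sc]
    delta_h = boundary_energy(grid) - before
    if metropolis_accept(delta_h, temperature, rng.random()):
        return True
    grid[dr][dc] = old  # rejected: restore the previous cell id
    return False
```

Recomputing the whole energy for every attempt keeps the sketch short; a practical implementation would evaluate only the local change around `dst`, which is also the region a parallel version has to protect from concurrent updates.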
ISBN: 9783030483401; 9783030483395 (print)
The stream processing paradigm is present in many applications that apply computations over data flowing continuously in the form of streams (e.g., video feeds, image processing, and data analytics). Adding self-adaptivity to stream processing applications can provide higher-level programming abstractions and autonomic resource management. However, there are cases where the resulting performance is suboptimal. In this paper, the goal is to optimize parallelism adaptations in terms of stability and accuracy, which can improve the performance of parallel stream processing applications. To that end, we present a new optimized self-adaptive strategy and evaluate it experimentally. The proposed solution provides high-level programming abstractions, reduces the adaptation overhead, and achieves performance competitive with the best static executions.
ISBN: 9781728174457 (print)
This talk will present ongoing work on a Chapel implementation of Random Forest, a popular ensemble learning method used for both predictive modeling and feature selection. Language features in Chapel make it possible to easily express shared-memory and distributed-memory implementations of this algorithm. Furthermore, Chapel's built-in Python interoperability made it easier to implement a Python front end, making the implementation accessible from a language popular among data scientists.
ISBN: 9781728174457 (print)
Since their introduction, Java streams have been quickly embraced by industry and are now used at a large scale. The parallelism they enable is very easy to achieve, but it is constrained either by the parallelism model used (in some cases) or by the set of operations that can be specified using streams. In this paper we investigate the possibility of extending the types of computation that can be defined with the Java streams API by introducing PowerList-based computation into this infrastructure. PowerLists are recursive data structures that, together with their associated algebraic theory, offer both abstractions that ease the development of parallel applications and a methodology for designing parallel algorithms. The Java streaming infrastructure can be adapted to support them to a large extent. We present such an adaptation here, and we analyse and discuss its advantages and constraints. The analysis is illustrated with example applications.
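To make the PowerList idea concrete, here is a small sketch of its two decomposition operators, tie (split into halves) and zip (split into even- and odd-indexed elements), on plain Python lists whose length is a power of two. It is not the Java streams adaptation described in the abstract; the helper names are hypothetical, and the recursion runs sequentially, although the two recursive calls in each function are independent and could be evaluated in parallel.

```python
def tie_split(xs):
    """Split a powerlist into its left and right halves (inverse of tie)."""
    half = len(xs) // 2
    return xs[:half], xs[half:]


def zip_split(xs):
    """Split a powerlist into even- and odd-indexed elements (inverse of zip)."""
    return xs[0::2], xs[1::2]


def zip_join(p, q):
    """Interleave two equal-length powerlists: p[0], q[0], p[1], q[1], ..."""
    out = [None] * (2 * len(p))
    out[0::2] = p
    out[1::2] = q
    return out


def check_powerlist(xs):
    """Raise if the length is not a power of two (1, 2, 4, ...)."""
    if not xs or len(xs) & (len(xs) - 1):
        raise ValueError("powerlist length must be a power of two")


def pl_reduce(op, xs):
    """Reduce with an associative op by recursive tie decomposition."""
    check_powerlist(xs)
    if len(xs) == 1:
        return xs[0]
    left, right = tie_split(xs)
    return op(pl_reduce(op, left), pl_reduce(op, right))


def pl_inv(xs):
    """Bit-reversal permutation: inv(p tie q) = inv(p) zip inv(q)."""
    check_powerlist(xs)
    if len(xs) == 1:
        return list(xs)
    left, right = tie_split(xs)
    return zip_join(pl_inv(left), pl_inv(right))
```

For example, `pl_reduce(lambda a, b: a + b, [1, 2, 3, 4])` sums the list by combining the results of the two halves, and `pl_inv` shows a function that is defined by deconstructing with tie and reconstructing with zip.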
ISBN: 9781728174457 (print)
A broad set of data science and engineering questions may be organized as graphs, providing a powerful means for describing relational data. Although experts now routinely run graph algorithms on huge, unstructured graphs using high performance computing (HPC) or cloud resources, this practice hasn't yet broken into the mainstream. Such computations require great expertise, yet users often need rapid prototyping and development to quickly customize existing code. Toward that end, we are exploring the use of the Chapel programming language as a means of making some important graph analytics more accessible, examining the breadth of characteristics that would make for a productive programming environment: one that is expressive, performant, portable, and robust. In this talk we describe our early explorations of this space, based on miniTri [4], a miniapp from the Mantevo suite [1], and the mean hitting time algorithm [2], one of the analytics being explored within Grafiki [3], both of which are designed for use in distributed-memory parallel processing environments. These implementations are posed in terms of key linear algebra operations and algorithms, specifically sparse matrix-matrix multiplication operating on integer datatypes, and the Conjugate Gradient method applied to a graph Laplacian matrix.
ISBN: 9783030581442; 9783030581435 (print)
We present a novel approach to parallelizing the SpMV kernel included in the LASs (Linear Algebra routines on OmpSs) library, following an in-depth review and analysis of several well-known approaches. LASs is based on OmpSs, a task-based runtime that extends OpenMP directives and provides more flexibility for applying new strategies. Based on tasking and nesting, and aiming to reduce the workload imbalance inherent in the SpMV operation, we present a strategy that is especially useful for highly imbalanced input matrices. In this approach, the number of tasks created is decided dynamically in order to maximize the use of the platform's resources. Throughout this paper, the behavior of SpMV under each strategy (state-of-the-art and proposed) is analyzed in depth, laying the groundwork for a future auto-tunable code able to select the most suitable approach for a given input matrix. The experiments were carried out on a set of 12 matrices from the SuiteSparse Matrix Collection, each with different sparsity characteristics. They were performed on a node of the MareNostrum 4 supercomputer (two Intel Xeon sockets with 24 cores each) and on a node of the Dibona cluster (using one ARM ThunderX2 socket with 32 cores). Our tests show that, on Intel Xeon, the best parallelization strategy reduces execution time by up to 67% compared with the reference multi-threaded MKL version. On ARM ThunderX2, the reduction is up to 56% with respect to the OmpSs parallel reference.
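The workload imbalance this abstract refers to comes from rows having very different numbers of nonzeros. The sketch below is not the LASs strategy (which uses nested OmpSs tasks and decides the task count dynamically); it only illustrates, under the assumption of a CSR matrix, a common static alternative that cuts row blocks by nonzero count instead of by row count. The function names and the `n_tasks` parameter are hypothetical, and the blocks are processed in a sequential loop where a tasking runtime would spawn one task per block.

```python
from bisect import bisect_left


def spmv_rows(indptr, indices, data, x, y, row_begin, row_end):
    """Compute y[r] = sum over k of A[r, k] * x[k] for rows in [row_begin, row_end)."""
    for r in range(row_begin, row_end):
        acc = 0.0
        for k in range(indptr[r], indptr[r + 1]):
            acc += data[k] * x[indices[k]]
        y[r] = acc


def split_rows_by_nnz(indptr, n_tasks):
    """Return row boundaries so each block holds roughly nnz / n_tasks nonzeros."""
    n_rows = len(indptr) - 1
    nnz = indptr[-1]
    bounds = [0]
    for t in range(1, n_tasks):
        target = nnz * t / n_tasks
        row = bisect_left(indptr, target)  # first row offset >= target
        bounds.append(min(max(row, bounds[-1]), n_rows))
    bounds.append(n_rows)
    return bounds


def spmv(indptr, indices, data, x, n_tasks=2):
    """CSR sparse matrix-vector product, one row block per (conceptual) task."""
    y = [0.0] * (len(indptr) - 1)
    bounds = split_rows_by_nnz(indptr, n_tasks)
    for begin, end in zip(bounds, bounds[1:]):
        # Each block writes a disjoint slice of y, so blocks are independent.
        spmv_rows(indptr, indices, data, x, y, begin, end)
    return y
```

For a 4x4 matrix whose first row is dense and whose other rows hold one nonzero each, splitting by row count would give blocks of 5 and 2 nonzeros, whereas `split_rows_by_nnz` with two tasks gives blocks of 4 and 3.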
ISBN: 9781728199986 (print)
GPGPUs are widely used in high-performance computing systems to accelerate scientific and machine learning workloads. Developing efficient GPU kernels is critically important to obtain "bare-metal" performance on GPU-based clusters. In this paper, we describe the design and implementation of GVProf, the first value profiler that pinpoints value-related inefficiencies in applications running on NVIDIA GPU-based clusters. The novelty of GVProf resides in its ability to detect temporal and spatial value redundancies, which provides useful information to guide code optimization. GVProf can monitor production multi-node, multi-GPU executions in clusters. Our experiments with well-known GPU benchmarks and HPC applications show that GVProf incurs acceptable overhead and scales to large executions. Using GVProf, we optimized several HPC and machine learning workloads on one NVIDIA V100 GPU. In one case study of LAMMPS, optimizations based on information from GVProf led to whole-program speedups ranging from 1.37x on a single GPU to 1.08x on 64 GPUs.
ISBN: 9780738110707 (print)
A common simplification made when modeling the performance of a parallel program is the assumption that the performance behavior of all processes or threads is largely uniform. Empirical performance-modeling tools such as Extra-P exploit this common pattern to make their modeling process more noise resilient, mitigating the effect of outliers by summarizing performance measurements of individual functions across all processes. While the underlying assumption does not hold equally for all applications, knowing the qualitative differences in how the performance of individual processes changes as execution parameters are varied can reveal important performance bottlenecks such as malicious patterns of load imbalance. A challenge for empirical modeling tools, however, arises from the fact that the behavioral class of a process may depend on the process configuration, letting process ranks migrate between classes as the number of processes grows. In this paper, we introduce a novel approach to modeling spatially diverging performance, based on a certain type of process clustering. We apply our technique to identify a previously unknown performance bottleneck in the BoSSS fluid-dynamics code. Removing it made the code regions in question run up to 20 times faster and the application as a whole up to 4.5 times faster.
ISBN: 9781119431152 (digital)
ISBN: 9781119431107 (print)
A complete textbook and reference for engineers to learn the fundamentals of computer programming with modern C++. Introduction to Programming with C++ for Engineers is an original presentation teaching the fundamentals of computer programming and modern C++ to engineers and engineering students. Professor Cyganek, a highly regarded expert in his field, walks users through the basics of data structures and algorithms with the help of a core subset of C++ and the Standard Library, progressing to the object-oriented domain and advanced C++ features, computer arithmetic, memory management, and the essentials of parallel programming, showing with real-world examples how to complete tasks. He also guides users through the software development process and good programming practices, without shying away from explaining low-level features and programming tools. Although it is a textbook, its summarizing tables and diagrams also make the book a highly useful reference for C++ programmers at all levels. Introduction to Programming with C++ for Engineers teaches how to program by: guiding users from simple techniques with modern C++ and the Standard Library to more advanced object-oriented design methods and language features; providing meaningful examples that facilitate understanding of the programming techniques and the C++ language constructs; fostering good programming practices that create better professional programmers; minimizing text descriptions, opting instead for comprehensive figures, tables, diagrams, and other explanatory material; granting access to a complementary website that contains example code and useful links to resources that further improve the reader's coding ability; and including test and exam questions for the reader's review at the end of each chapter. Engineering students, students of other sciences who rely on computer programming, and professionals in various fields will find this book invaluable when learning to program with C++.
ISBN: 9783030576752; 9783030576745 (print)
Current scientific workflows are large and complex. They typically perform thousands of simulations whose results, combined with search and data analytics algorithms in order to infer new knowledge, generate a very large amount of data. To do so, workflows comprise many tasks, and some of them may fail. Most work on failure management in workflow managers and runtimes focuses on recovering from failures caused by resources (for example, retrying or resubmitting the failed computation on other resources). However, some failures are caused by the application itself (corrupted data, algorithms that do not converge under certain conditions, etc.), and these fault tolerance mechanisms are not sufficient to complete the workflow successfully. In these cases, developers have to add code to their applications to prevent and manage the possible failures. In this paper, we propose a simple interface and a set of transparent runtime mechanisms to simplify how scientists deal with application-based failures in task-based parallel workflows. We have validated our proposal with use cases from e-science and machine learning to show the benefits of the proposed interface and mechanisms in terms of programming productivity and performance.