Investigating fracture mechanics problems with computational tools has always been challenging because of the singularities present at the crack tip. FRAC3D is an effective finite element tool that benefits from the enriched element methodology. The dynamic version of this code enables the analysis of structures with stationary cracks subjected to impact loading. The response of the components in these problems is highly influenced by stress wave propagation. In this study, bimaterial interface cracking in an electronic packaging structure is analyzed considering transient behavior. Besides the complications associated with the finite element solution of such a problem, long computational times may also be an issue for large models. Multiprocessing of finite element codes can save significant time if the corresponding algorithms are restructured efficiently with parallel processing tools. Reductions in run time of up to 75% were obtained for the given example using the newly implemented multiprocessing code.
Although the teaching of programming has evolved over 50 years, all methodologies rely on a simple structure that was born a long time ago: the loop, shared by all high-level programming languages and the preferred choice for any repetitive task programmers face. We analyze here how loops skew the way programmers solve problems and prevent them from taking advantage of the available parallel and distributed computing architectures. Our initial hypothesis is that eliminating loops will allow a more natural parallel programming approach. The idea mimics a practice that is common today but was established in the past for a different purpose: prohibiting goto statements to improve code maintainability. This paper describes a new computer programming teaching strategy that we tested for 7 years and provides evidence on how loop prohibition, in the context of functional programming, makes students aware of data dependencies and produces 21st-century programmers who benefit from widely available parallel architectures.
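To make the idea of loop-free code exposing data dependencies more concrete, here is a minimal Python sketch. It is not taken from the paper or its course material; the function names `square`, `sum_of_squares`, and `sum_of_squares_chunked` are hypothetical. It shows a computation written only with `map` and `reduce`, where the absence of cross-element dependencies makes it straightforward to split the work into independent chunks.

```python
from functools import reduce
from operator import add


def square(x: int) -> int:
    """Pure function: the result depends only on its own argument."""
    return x * x


def sum_of_squares(values) -> int:
    # map: every element is transformed independently of the others.
    # reduce with add: addition is associative, so partial sums can be
    # combined in any grouping without changing the result.
    return reduce(add, map(square, values), 0)


def sum_of_squares_chunked(values, chunks: int = 4) -> int:
    # Split the input into independent slices (hypothetical chunking scheme),
    # compute each partial sum separately, then combine the partial sums.
    # A parallel runtime could evaluate the slices on different workers.
    parts = map(lambda i: values[i::chunks], range(chunks))
    return reduce(add, map(sum_of_squares, parts), 0)
```

Because neither function carries state from one element to the next, the chunked version returns the same value as the direct one, which is the property a parallel or distributed backend would rely on.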
ISBN (digital): 9781665404761
ISBN (print): 9781665446426
This paper compares serial, shared-memory parallel, and distributed-memory parallel implementations of the dynamic programming algorithm for the Knapsack Problem. The Knapsack Problem is one of the most popular optimization problems. It is a decision-making problem that arises in real-world situations such as business projects, airline cargo operations, cryptography, and industrial decision-making processes. The algorithm under consideration is the table-based dynamic programming algorithm based on Bellman's optimality principle. We used the C++ programming language. To solve the problem on shared-memory systems, we used OpenMP. For the distributed-memory parallelization, we employed MPI. The structure of the algorithm, the data distribution, synchronization, and communication schemes are explained in detail. Extensive experiments were carried out for the developed algorithms, and the results were used for a comparative analysis.
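As a rough illustration of the table-based dynamic programming scheme mentioned above, the following Python sketch solves the 0/1 Knapsack Problem row by row. It is a serial stand-in and not the authors' C++, OpenMP, or MPI code; the function name `knapsack` is hypothetical. The point it tries to show is that each table row reads only the previous row, so the cells within a row have no dependencies on one another.

```python
def knapsack(weights, values, capacity):
    """Return the best total value for the 0/1 knapsack with the given capacity."""
    # prev[c] holds the best value achievable with the items seen so far
    # and a capacity limit of c.
    prev = [0] * (capacity + 1)
    for w, v in zip(weights, values):
        # Every cell of the new row reads only from prev, never from the row
        # being built. The cells are therefore independent of each other and
        # could be computed concurrently, with a synchronization point
        # between consecutive rows.
        cur = [
            max(prev[c], prev[c - w] + v) if c >= w else prev[c]
            for c in range(capacity + 1)
        ]
        prev = cur
    return prev[capacity]
```

A call such as `knapsack([10, 20, 30], [60, 100, 120], 50)` evaluates three rows of 51 cells each. In a shared-memory setting one would plausibly split the range of capacities among threads, while a distributed-memory version would additionally have to exchange the parts of the previous row that other processes need; both remarks are assumptions about a typical design, not a description of the paper's scheme.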
Skyline queries have been widely used in various application domains, including multi-criteria decision making, search pruning, and personalized recommendation systems. Given multiple criteria, skyline queries prune the search space of a large collection of multi-dimensional objects to a small set by returning objects that are not dominated by or superior to others. As an extension of traditional skyline queries, probabilistic skyline queries aim to cope with uncertain datasets. This paper presents a novel MapReduce-based framework, ProbSky, in support of fast parallel distributed evaluation of probabilistic skyline queries on large high-dimensional data. ProbSky efficiently evaluates exact p-skyline queries on large uncertain data without compromising the quality of query results. From the theoretical point of view, we formally prove two pruning lemmas integrated with ProbSky to strengthen its early pruning capacity. ProbSky builds on three optimization techniques: dominant instance pruning, slab-based partitioning, and reference point-based acceleration. Extensive experiments driven by both real and synthetic datasets reveal that, compared to state-of-the-art methods, ProbSky speeds up the evaluation of exact p-skyline queries on large high-dimensional data by at least one order of magnitude in most cases. Our experimental results also validate that, by balancing memory consumption and execution time among machines, ProbSky curbs the bottleneck effect that causes severe system performance deterioration.
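For readers unfamiliar with dominance, the following Python sketch shows a plain (non-probabilistic) skyline query on certain data. It is not ProbSky and does not include any of its pruning or partitioning techniques; the names `dominates` and `skyline` are hypothetical, and the sketch assumes that smaller values are better in every dimension.

```python
def dominates(a, b) -> bool:
    """True if a dominates b: a is no worse in every dimension and strictly
    better in at least one (assumption: smaller values are better)."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def skyline(points):
    """Return the points that no other point dominates (quadratic-time baseline)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

With two criteria such as price and distance, `skyline([(1, 5), (2, 2), (3, 3), (5, 1), (4, 4)])` keeps the three points that represent distinct trade-offs and drops `(3, 3)` and `(4, 4)`, which `(2, 2)` dominates. The pairwise comparison is quadratic in the number of points, which is the cost that pruning and partitioning strategies aim to reduce.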
In a fast-changing, data-driven world, real-time data processing systems are becoming ubiquitous in everyday applications. The growing volume of data we produce, such as audio, video, images, and text, demands fast and efficient computation. Stream parallelism can accelerate this computation for real-time processing, but it remains a challenging task that is mostly reserved for experts. In this paper, we present SPBench, a framework for benchmarking stream processing applications. It aims to support users with a set of real-world stream processing applications, which are made accessible through an Application Programming Interface (API) and executable via a Command Line Interface (CLI) to create custom benchmarks. We tested SPBench by implementing parallel benchmarks with Intel Threading Building Blocks (TBB), FastFlow, and SPar. This evaluation provided useful insights and showed the feasibility of the proposed framework in terms of usage, customization, and performance analysis. SPBench proved to be a high-level, reusable, extensible, and easy-to-use abstraction for building parallel stream processing benchmarks on multi-core architectures.
Access transparency means that both local and remote resources are accessed using identical operations. With transparency, unmodified single-machine applications could run over disaggregated compute, storage, and memory resources. Hiding the complexity of distributed systems through transparency would have great benefits, like scaling out local-parallel scientific applications over flexible disaggregated resources in the *** paper presents a performance evaluation where we assess the feasibility of access transparency over state-of-the-art Cloud disaggregated resources for Python multiprocessing applications. We have interfaced the multiprocessing module with an implementation that transparently runs processes on serverless functions and uses an in-memory data store for shared *** evaluate transparency, we run in the Cloud four unmodified applications: Uber Research's Evolution Strategies, Baselines-AI's Proximal Policy Optimization, ***'s dataframe, and Scikit Learn's Hyperparameter tuning. We compare the execution time and scalability of the same application running over disaggregated resources using our library with those of the single-machine Python multiprocessing libraries in a large VM. For equal resources, applications that use message-passing abstractions efficiently achieve comparable results despite the significant overheads of remote communication. Other shared-memory intensive applications do not perform due to high remote memory *** results show that Python's multiprocessing library design is an enabler towards transparency: legacy applications using efficient disaggregated abstractions can transparently scale beyond VM-limited resources for increased parallelism without changing the underlying code or architecture. (c) 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://***/licenses/by/4.0/).
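The notion of accessing different backends through identical operations can be sketched with Python's standard library alone. The example below is not the library evaluated in the paper and does not involve serverless functions; it only uses `multiprocessing.dummy.Pool`, the thread-backed pool from the standard library that mirrors the `multiprocessing.Pool` interface, as a stand-in for an alternative backend. The names `word_count` and `total_words` are hypothetical.

```python
from multiprocessing.dummy import Pool as ThreadBackedPool


def word_count(line: str) -> int:
    """Count whitespace-separated words in one line."""
    return len(line.split())


def total_words(lines, pool_factory, workers: int = 2) -> int:
    # The application code uses only the common Pool interface (context
    # manager and map). Which backend actually runs the tasks is decided
    # by the pool_factory the caller passes in.
    with pool_factory(workers) as pool:
        return sum(pool.map(word_count, lines))
```

Calling `total_words(lines, ThreadBackedPool)` runs the tasks on threads. Passing `multiprocessing.Pool` instead (inside an `if __name__ == "__main__":` guard on platforms that spawn processes) runs the same unmodified function on worker processes. A remote implementation of the same interface would, under this assumption, slot in the same way.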
This work presents a novel solution for accelerating dynamic optimal power flow using a distributed-memory parallelization approach. Unlike two-stage relaxation-based approaches (such as ADMM), the proposed approach constructs the entire dynamic optimal power flow problem in parallel and solves it using a parallel primal-dual interior point method with an iterative Krylov subspace linear solver and a block-Jacobi preconditioning scheme. The parallel primal-dual interior point method has been implemented in the open-source Portable, Extensible Toolkit for Scientific Computation (PETSc) library. The formulation, the implementation, and numerical results on multicore computers are presented to demonstrate the performance of the proposed approach on medium- to large-scale networks with varying time horizons. The results show that a significant speedup is achieved by using a block-Jacobi preconditioner with an iterative Krylov subspace method for solving dynamic optimal power flow problems.
Latent Dirichlet Allocation (LDA) is a statistical approach for topic modeling with a wide range of applications. Attracted by the exceptional computing and memory throughput capabilities of GPUs, this work introduces ezLDA, which achieves efficient and scalable LDA training on GPUs with the following three contributions. First, ezLDA introduces a three-branch sampling method that takes advantage of the convergence heterogeneity of various tokens to reduce the redundant sampling task. Second, to enable a sparsity-aware format for both D and W on GPUs with fast sampling and updating, we introduce a hybrid format for W, along with a corresponding token partition to T and inverted index designs. Third, we design a hierarchical workload balancing solution to address the extremely skewed workload imbalance problem on GPUs and to scale ezLDA across multiple GPUs. Taken together, ezLDA achieves superior performance over state-of-the-art attempts with lower memory consumption.
Cloud warehouses are increasingly adopting CPU-GPU collaborative systems to leverage diverse types and levels of parallelism in applications. These environments are shared among multiple clients to achieve maximum resource utilization with energy efficiency and scalability. While OpenCL simplifies resource provisioning in such heterogeneous systems, ensuring the effective distribution of tasks remains challenging because the available CPU-GPU architectures and workload characteristics can vary significantly. This study addresses the challenge of efficiently provisioning resources in OpenCL-based CPU-GPU cloud environments. To tackle this challenge, we introduce MultiProvision, a Design Space Exploration tool for multi-tenant resource provisioning in CPU-GPU environments. MultiProvision facilitates the identification of the most suitable provisioning strategy for a given workload and architecture scenario in a transparent manner. Through comprehensive evaluations encompassing various architecture combinations and workloads, we demonstrate that the choice of the most efficient
The Bellman operator constitutes the foundation of dynamic programming (DP). An alternative is presented by the Gauss-Seidel operator, whose evaluation, differently from that of the Bellman operator where the states a...