ISBN: 9781479920815 (print)
In a perfect world, code would be written only once and would run on different devices with high efficiency. A programmer's time would be spent primarily on thinking about algorithms and data structures, not on implementing them. To a degree, that used to be the case in the era of frequency scaling on a single core. However, due to power limitations, parallel programming has become necessary to obtain performance gains. But parallel architectures differ substantially from each other, often require specialized knowledge, and typically necessitate reimplementation and fine-tuning of application code. These slow tasks frequently result in situations where most of the time is spent reimplementing old code rather than writing new code. The goal of our research is to find new programming techniques that increase productivity, maintain high performance, and provide abstraction to free the programmer from these unnecessary and time-consuming tasks. However, such techniques usually come at the cost of substantial performance degradation. This paper investigates current approaches to portable accelerator programming, seeking to answer whether they make it possible to combine high efficiency with sufficient algorithm abstraction. It discusses OpenCL as a potential solution and presents three approaches to writing portable code: GPU-centric, CPU-centric, and combined. By applying the three approaches to a real-world program, we show that it is at least sometimes possible to run exactly the same code on many different devices with minimal performance degradation using parameterization. The main contributions of this paper are an extensive review of the current state of the art regarding the stated problem and our original way of addressing it through a generalized excessive-parallelism approach.
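To make the parameterization idea concrete, the following OpenCL C kernel is a minimal hypothetical sketch (the kernel name and arguments are invented for illustration, not taken from the paper): the same source runs unchanged on CPU and GPU devices, and only the global work size chosen by the host changes, which through the grid-stride loop controls how much work each work-item performs.

    // Hypothetical sketch of a parameterized, device-portable OpenCL C kernel.
    __kernel void scale_add(__global const float *a,
                            __global const float *b,
                            __global float *out,
                            const float alpha,
                            const int n)
    {
        // Grid-stride loop: a large global size (GPU) gives each work-item a few
        // elements and exposes massive parallelism; a small global size (CPU)
        // gives each work-item a long, cache-friendly loop.
        for (int i = get_global_id(0); i < n; i += get_global_size(0))
            out[i] = alpha * a[i] + b[i];
    }

One way to read the excessive-parallelism approach named above is along these lines: express more fine-grained parallelism than any single device needs and let the launch parameters fold it onto the actual hardware.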
ISBN: 9783642400469 (print)
The proceedings contain 73 papers. The topics discussed include: alignment-based metrics for trace comparison; validation and uncertainty assessment of extreme-scale HPC simulation through Bayesian inference; energy-efficient scheduling with time and processors eligibility restrictions; scheduling jobs with multiple non-uniform tasks; workflow fairness control on online and non-clairvoyant distributed computing platforms; how to be a successful thief: feudal work stealing for irregular divide-and-conquer applications on heterogeneous distributed systems; scheduling HPC workflows for responsiveness and fairness with networking delays and inaccurate estimates of execution times; enhancing concurrency in distributed transactional memory through commutativity; adaptive granularity control in task parallel programs using multiversioning; and online dynamic dependence analysis for speculative polyhedral parallelization.
ISBN: 9780769549804 (print)
In this paper, an application of analogical modeling and numerical simulation of technical processes through the matrix of partial derivatives of the state vector (M-pdx) associated with a Taylor series is presented. The residual water blunting process is modeled as a distributed-parameter process using partial differential equations. In this approach, the variation of the main output signal of the process (the chemical's pH value) with respect to both time and the length of the tanks is considered.
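The M-pdx formulation itself is not reproduced in the abstract; purely as an illustration of the underlying idea of advancing a distributed-parameter state with a truncated Taylor series, a first-order expansion of the output signal pH(t, l) in time t and tank length l would read

    pH(t + Δt, l + Δl) ≈ pH(t, l) + (∂pH/∂t) Δt + (∂pH/∂l) Δl

with the paper's M-pdx collecting such partial derivatives for the whole state vector; the expression above only indicates the general form, not the specific model.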
ISBN: 9781479932177 (print)
In the last decades, many kinds of task execution models, such as grid and cloud computing, have been developed. In such distributed systems, each task is processed by a respective processor in multicore computers, e.g., household PCs, which have become easy to harness in recent years. If there were a single policy to automatically decide the "best" combination and number of processors (and computers), those computational resources could be utilized effectively, so that a large number of jobs could be executed in parallel. In this paper, we propose a method for mapping execution units in such environments. The method applies a remapping step after the processor-execution unit mapping of [11] is finished. Experimental comparisons by simulation show the advantages of the proposed method.
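The remapping step itself is not described in the abstract; the C fragment below is only a generic, hypothetical illustration of what such a pass can look like (it is not the method of [11] or of this paper): starting from an initial processor-execution unit mapping, units are moved greedily from the most loaded processor to the least loaded one as long as that narrows the load gap.

    /* Generic, hypothetical remapping pass: move execution units from the most
     * loaded processor to the least loaded one while that reduces the imbalance. */
    #include <stdio.h>

    #define NPROC 4
    #define NUNIT 8

    int main(void)
    {
        double cost[NUNIT] = {4, 1, 3, 2, 5, 1, 2, 6};  /* estimated work per unit */
        int    map[NUNIT]  = {0, 0, 0, 0, 1, 1, 2, 3};  /* initial unit -> processor map */
        double load[NPROC] = {0};

        for (int i = 0; i < NUNIT; i++) load[map[i]] += cost[i];

        for (;;) {
            int hi = 0, lo = 0;
            for (int p = 1; p < NPROC; p++) {
                if (load[p] > load[hi]) hi = p;
                if (load[p] < load[lo]) lo = p;
            }
            int best = -1;              /* cheapest unit on the most loaded processor */
            for (int i = 0; i < NUNIT; i++)
                if (map[i] == hi && (best < 0 || cost[i] < cost[best])) best = i;
            if (best < 0 || cost[best] >= load[hi] - load[lo])
                break;                  /* no move would reduce the imbalance */
            map[best] = lo; load[hi] -= cost[best]; load[lo] += cost[best];
        }

        for (int i = 0; i < NUNIT; i++)
            printf("unit %d -> processor %d\n", i, map[i]);
        return 0;
    }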
ISBN: 9783642400476 (print)
Distributed parallel applications executed on heterogeneous and dynamic environments need to adapt their configuration (in terms of parallelism degree and parallelism form for each component) in response to unpredictable factors related to the physical platform and the application semantics. In emerging Cloud computing scenarios, reconfigurations induce economic costs and performance degradation during execution. In this context, it is of paramount importance to define smart adaptation strategies able to achieve properties such as control optimality (optimizing the application's global QoS) and reconfiguration stability, expressed in terms of the number of reconfigurations and the average time for which a configuration is not modified. In this paper we introduce a methodology to address this issue, based on Control Theory and Optimal Control foundations. We present a first validation of our approach in a simulation environment, outlining its effectiveness and feasibility.
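The abstract does not spell out the control strategy; as a rough, hypothetical stand-in for the trade-off it targets, the C fragment below nudges a component's parallelism degree toward a service-time target while a hysteresis band suppresses reconfigurations that would bring little benefit (control optimality versus reconfiguration stability, in miniature; the function and the 20% threshold are invented for illustration).

    /* Hypothetical sketch: adjust the parallelism degree toward a service-time
     * target, reconfiguring only when the deviation exceeds a 20% band. */
    #include <stdio.h>
    #include <math.h>

    int next_degree(int degree, double observed_ms, double target_ms)
    {
        if (fabs(observed_ms - target_ms) <= 0.2 * target_ms)
            return degree;                       /* stability: keep the configuration */
        /* assume service time scales roughly as 1/degree */
        int wanted = (int)ceil(degree * observed_ms / target_ms);
        return wanted < 1 ? 1 : wanted;
    }

    int main(void)
    {
        double samples[] = {120.0, 95.0, 60.0, 210.0};   /* monitored service times */
        int degree = 4;
        for (int i = 0; i < 4; i++) {
            degree = next_degree(degree, samples[i], 100.0);
            printf("step %d: degree = %d\n", i, degree);
        }
        return 0;
    }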
ISBN: 9783642400476 (print)
We study a simple parallel algorithm for computing matchings in a graph. A variant for unweighted graphs finds a maximal matching using linear expected work and O(log^2 n) expected running time in the CREW PRAM model. Similar results also apply to the External Memory, MapReduce, and distributed-memory models. In the maximum-weight case the algorithm guarantees a 1/2-approximation. Although the parallel execution time is linear for worst-case weights, an experimental evaluation indicates good scalability on distributed-memory machines and on GPUs. Furthermore, the solution quality is very good in practice.
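One simple scheme consistent with this description is matching of locally dominant edges: every vertex points to its heaviest remaining incident edge, and any edge chosen by both of its endpoints enters the matching; the vertex preferences are independent of each other, which is what makes the idea attractive for PRAM, MapReduce, and distributed-memory settings, and it gives a 1/2-approximation in the weighted case. The sequential C sketch below illustrates this idea (an illustration only, not the authors' code).

    /* Sequential sketch of locally-dominant-edge matching on a small edge list. */
    #include <stdio.h>

    #define NV 6
    #define NE 7

    int    eu[NE] = {0, 0, 1, 2, 3, 4, 1};       /* edge endpoints */
    int    ev[NE] = {1, 2, 2, 3, 4, 5, 5};
    double w [NE] = {4, 3, 5, 2, 6, 1, 2};       /* edge weights */

    int main(void)
    {
        int matched[NV] = {0}, alive[NE];
        for (int e = 0; e < NE; e++) alive[e] = 1;

        for (;;) {
            int pref[NV];                        /* heaviest live incident edge per vertex */
            for (int v = 0; v < NV; v++) pref[v] = -1;
            for (int e = 0; e < NE; e++) {
                if (!alive[e]) continue;
                if (pref[eu[e]] < 0 || w[e] > w[pref[eu[e]]]) pref[eu[e]] = e;
                if (pref[ev[e]] < 0 || w[e] > w[pref[ev[e]]]) pref[ev[e]] = e;
            }
            int added = 0;
            for (int e = 0; e < NE; e++)
                if (alive[e] && pref[eu[e]] == e && pref[ev[e]] == e) {
                    printf("match (%d,%d) weight %.1f\n", eu[e], ev[e], w[e]);
                    matched[eu[e]] = matched[ev[e]] = 1;
                    added = 1;
                }
            if (!added) break;
            for (int e = 0; e < NE; e++)         /* drop edges touching matched vertices */
                if (alive[e] && (matched[eu[e]] || matched[ev[e]])) alive[e] = 0;
        }
        return 0;
    }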
ISBN: 9783642400476 (print)
This paper introduces FLEX-MPI, a novel runtime approach for the dynamic load balancing of MPI-based SPMD applications running on heterogeneous platforms in the presence of dynamic external loads. To effectively balance the workload, FLEX-MPI monitors the actual performance of applications via hardware counters and the MPI profiling interface, with negligible overhead and minimal code modifications. Our results show that by using this approach the execution time of an application may be significantly reduced.
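The MPI profiling interface is what makes this kind of monitoring possible without source changes: every MPI routine has a PMPI_-prefixed twin, so a wrapper can intercept a call, record a measurement, and forward it to the real implementation. The fragment below is a minimal sketch of that mechanism (compiled into a library linked with the application), not FLEX-MPI's actual instrumentation.

    /* Minimal PMPI interposition sketch: time every MPI_Send on this rank. */
    #include <mpi.h>

    static double comm_time = 0.0;   /* seconds spent inside MPI_Send */

    int MPI_Send(const void *buf, int count, MPI_Datatype type,
                 int dest, int tag, MPI_Comm comm)
    {
        double t0 = MPI_Wtime();
        int rc = PMPI_Send(buf, count, type, dest, tag, comm);
        comm_time += MPI_Wtime() - t0;
        return rc;
    }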
ISBN: 9781450321747 (print)
Graphs are used to model many real objects such as social networks and web graphs. Many real applications in various fields require efficient and effective management of large-scale graph-structured data. Although distributed graph engines such as GBase and Pregel handle billion-scale graphs, the user needs to be skilled at managing and tuning a distributed system in a cluster, which is a nontrivial job for the ordinary user. Furthermore, these distributed systems need many machines in a cluster in order to provide reasonable performance. In order to address this problem, a disk-based parallel graph engine called GraphChi has recently been proposed. Although GraphChi significantly outperforms all representative (disk-based) distributed graph engines, we observe that GraphChi still has serious performance problems for many important types of graph queries due to 1) limited parallelism and 2) separate steps for I/O processing and CPU processing. In this paper, we propose a general, disk-based graph engine called TurboGraph to process billion-scale graphs very efficiently using modern hardware on a single PC. TurboGraph is the first truly parallel graph engine that exploits 1) full parallelism, including multi-core parallelism and FlashSSD I/O parallelism, and 2) full overlap of CPU processing and I/O processing as much as possible. Specifically, we propose a novel parallel execution model called pin-and-slide. TurboGraph also provides engine-level operators, such as BFS, which are implemented under the pin-and-slide model. Extensive experimental results with large real datasets show that TurboGraph consistently and significantly outperforms GraphChi by up to four orders of magnitude! Our implementation of TurboGraph is available at "http://***/turbograph" as executable files.
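Pin-and-slide itself is specific to TurboGraph, but the overlap of I/O and CPU processing it relies on can be illustrated with a generic double-buffering pattern (a hypothetical sketch unrelated to the actual engine): one thread fetches the next chunk while another processes the current one, so the FlashSSD and the cores are kept busy at the same time.

    /* Double buffering: a reader thread and a compute thread overlap I/O and CPU work. */
    #include <pthread.h>
    #include <semaphore.h>
    #include <stdio.h>
    #include <string.h>

    #define CHUNK   4096
    #define NCHUNKS 8

    static char  bufs[2][CHUNK];
    static sem_t filled[2], empty[2];

    static void *reader(void *arg)
    {
        (void)arg;
        for (int i = 0; i < NCHUNKS; i++) {
            int slot = i % 2;
            sem_wait(&empty[slot]);
            memset(bufs[slot], 'a' + i, CHUNK);      /* stand-in for reading one page */
            sem_post(&filled[slot]);
        }
        return NULL;
    }

    int main(void)
    {
        for (int s = 0; s < 2; s++) { sem_init(&filled[s], 0, 0); sem_init(&empty[s], 0, 1); }
        pthread_t t;
        pthread_create(&t, NULL, reader, NULL);

        long sum = 0;
        for (int i = 0; i < NCHUNKS; i++) {          /* processing chunk i overlaps reading chunk i+1 */
            int slot = i % 2;
            sem_wait(&filled[slot]);
            for (int j = 0; j < CHUNK; j++) sum += bufs[slot][j];
            sem_post(&empty[slot]);
        }
        pthread_join(t, NULL);
        printf("checksum %ld\n", sum);
        return 0;
    }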
ISBN: 9783642400476 (print)
Nowadays we are facing an exponential growth of new data that is overwhelming the capabilities of companies, institutions, and society in general to manage and use it properly. Ever-increasing investments in Big Data, cutting-edge technologies, and the latest advances in both application development and underlying storage systems can help deal with data of such magnitude. Parallel and distributed approaches in particular will enable new data management solutions that operate effectively at large scale.
ISBN: 9783642400476 (print)
This topic provides a forum for the presentation of new results and practical experience in the development of parallel and distributed programs. The development of high-performance, correct, portable, and scalable parallel programs is a hard task, requiring advanced algorithms, realistic modeling, adequate programming abstractions and models, efficient design tools, high-performance languages and libraries, and experimental evaluation. Current challenges in this topic concern improved solutions for reconciling the transparency and expressiveness of programming abstractions and models with the new issues arising in modern applications of increasing problem size and complexity, and in heterogeneous computing infrastructures with varying performance, scalability, failure, and dynamic behaviors. This motivates, for example, abstractions for handling concurrency, parallelism, and distribution, and support for predictable performance, self-adaptation, fault tolerance, and large-scale deployment.