In this paper we show that, under different circumstances, data scheduling and loop scheduling are both useful models for parallel programs executing on shared virtual memory (SVM) systems. We therefore propose a unified programming model that permits both types of scheduling. We show that, given affine array references, a program segment which is parallel under loop scheduling can always be transformed to make it parallel under data scheduling and vice-versa, and hence that the two types of scheduling are equally powerful at exploiting parallelism. We review existing Fortran dialects for SVM and propose compiler directives that allow program segments to be data scheduled.
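To make the distinction concrete, the minimal Python sketch below uses the single affine reference `A[i+1] = 2*B[i]` and simulates the processors one after another. Loop scheduling partitions the iteration space. Data scheduling partitions the array `A` and lets each processor run exactly the iterations that write its own elements ("owner computes"). Because the subscript is affine, the owner's iteration set is simply a shifted block. The function names and the block distribution are illustrative assumptions, not the compiler directives proposed in the paper.

```python
from typing import List, Optional, Tuple


def block(n: int, p: int, nprocs: int) -> Tuple[int, int]:
    """Half-open range [lo, hi) of the p-th contiguous block of n items."""
    size = -(-n // nprocs)  # ceiling division
    lo = min(p * size, n)
    return lo, min(lo + size, n)


def loop_scheduled(b: List[int], nprocs: int) -> Tuple[List[int], List[Optional[int]]]:
    n = len(b) + 1
    a: List[int] = [0] * n
    writer: List[Optional[int]] = [None] * n
    for p in range(nprocs):
        lo, hi = block(n - 1, p, nprocs)  # partition the ITERATIONS i = 0..n-2
        for i in range(lo, hi):
            a[i + 1] = 2 * b[i]
            writer[i + 1] = p
    return a, writer


def data_scheduled(b: List[int], nprocs: int) -> Tuple[List[int], List[Optional[int]]]:
    n = len(b) + 1
    a: List[int] = [0] * n
    writer: List[Optional[int]] = [None] * n
    for p in range(nprocs):
        lo, hi = block(n, p, nprocs)  # partition the DATA (elements of a)
        # Owner computes: run iteration i iff a[i + 1] lies in [lo, hi).
        # For the affine subscript i + 1 that is i in [lo - 1, hi - 1), clipped.
        for i in range(max(lo - 1, 0), min(hi - 1, n - 1)):
            a[i + 1] = 2 * b[i]
            writer[i + 1] = p
    return a, writer
```

Both versions compute the same array. Only the data-scheduled one guarantees that every element is written by the processor that owns it. On an SVM system, that is the property one would expect to reduce page transfers.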
New compact, low-power implementation technologies for processors and imaging arrays can enable a new generation of portable video products. However, the need for software compatibility with large bodies of existing applications written in C prevents more efficient, higher-performance data parallel architectures from being used in these embedded products. If this software could be automatically retargeted to explicit data parallel execution, product designers could incorporate these architectures into embedded products. The key challenge is exposing the parallelism that is inherent in these applications but obscured by artifacts of sequential programming languages. This paper presents a recognition-based approach for automatically extracting a data parallel program model from sequential image processing code and retargeting it to data parallel execution mechanisms. The explicitly parallel model presented, called multidimensional data flow (MDDF), captures how operations on data regions (e.g., rows, columns, and tiled blocks) are composed and interact. To extract an MDDF model, a partial recognition technique is used. It focuses on identifying array access patterns in loops and transforms only those program elements that hinder parallelization, leaving the core algorithmic computations intact. The paper presents results of retargeting a set of production programs to a representative data parallel processor array, demonstrating the capacity of this technique to extract parallelism. The retargeted applications yield a potential execution throughput limited only by the number of processing elements, exceeding thousands of instructions per cycle in massively parallel implementations.
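The abstract's notion of operations on "tiled blocks" can be illustrated with a deliberately small example. This is not the MDDF extraction technique itself. The sketch below takes a sequential per-pixel loop (a threshold, chosen here as an assumption) and runs it as independent rectangular tiles, each of which could be assigned to a separate processing element. All function names and the tile sizes are hypothetical.

```python
from typing import Iterator, List, Tuple

Image = List[List[int]]


def threshold_sequential(img: Image, t: int) -> Image:
    """The 'sequential C-style' form: one nested loop over the whole image."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            out[r][c] = 255 if img[r][c] > t else 0
    return out


def tiles(h: int, w: int, th: int, tw: int) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (row_lo, row_hi, col_lo, col_hi) for each tile, clipped at edges."""
    for r0 in range(0, h, th):
        for c0 in range(0, w, tw):
            yield r0, min(r0 + th, h), c0, min(c0 + tw, w)


def threshold_tiled(img: Image, t: int, th: int, tw: int) -> Image:
    """Same computation expressed as independent tile tasks."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    # Tiles share no data, so each iteration of this loop could run on its
    # own processing element; here they simply run one after another.
    for r0, r1, c0, c1 in tiles(h, w, th, tw):
        for r in range(r0, r1):
            for c in range(c0, c1):
                out[r][c] = 255 if img[r][c] > t else 0
    return out
```

A pointwise operation needs no communication between tiles. Neighbourhood operations such as convolutions would additionally need halo (border) regions exchanged between tiles.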
ISBN (print): 0872629163
This research addresses two critical issues in the development of an integrated route assignment and traffic simulation system for ATMS-ATIS applications. The first issue concerns the conceptual and algorithmic aspects of the models. The second concerns computational and implementation efficiency, which motivated the exploration of advanced parallel computing architectures. We propose an integrated system that has been implemented on a massively parallel computing architecture. This paper presents the structure of the proposed system, along with a brief description of each component.
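The abstract does not say which assignment algorithm the system uses. As a hedged reference point, the sketch below shows textbook all-or-nothing route assignment: each origin-destination (OD) demand is loaded entirely onto its shortest path, computed with Dijkstra's algorithm. The graph format and function names are assumptions. In an integrated assignment-simulation loop, link costs would typically be refreshed from simulated travel times and the assignment repeated.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Graph = Dict[str, List[Tuple[str, float]]]  # node -> [(neighbour, link cost)]


def shortest_path(graph: Graph, src: str, dst: str) -> Optional[List[str]]:
    """Dijkstra's algorithm; returns the node sequence or None if unreachable."""
    dist = {src: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, cost in graph.get(u, []):
            nd = d + cost
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def all_or_nothing(graph: Graph, demand: Dict[Tuple[str, str], float]) -> Dict[Tuple[str, str], float]:
    """Load every OD demand onto its shortest path and sum the link flows."""
    flows: Dict[Tuple[str, str], float] = defaultdict(float)
    # OD pairs are independent, so on a parallel machine they could be
    # split across processors and the link flows reduced afterwards.
    for (origin, dest), volume in demand.items():
        path = shortest_path(graph, origin, dest)
        if path is None:
            continue
        for a, b in zip(path, path[1:]):
            flows[(a, b)] += volume
    return dict(flows)
```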
ISBN (print): 0769523811
This paper focuses on load balancing issues associated with hybrid programming models for the parallelization of fully permutable nested loops onto SMP clusters. Hybrid parallel programming models usually suffer from intrinsic load imbalance between threads. This is mainly because most existing message passing libraries provide limited multi-threading support and allow only the master thread to perform inter-node message passing communication. To mitigate this effect, we propose a generic method for applying static load balancing to the coarse-grain hybrid model, so that the computational load is distributed appropriately among the working threads. We experimentally evaluate the efficiency of the proposed scheme on a micro-kernel benchmark and demonstrate the potential of such load balancing schemes for extracting maximum performance from hybrid parallel programs.
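The core idea of static load balancing in the coarse-grain hybrid model can be sketched simply: because only the master thread communicates, it should receive proportionally less computation. In the sketch below, the weighting rule (master weight `1 - comm_fraction`) and the `comm_fraction` parameter are assumptions for illustration, not the formula from the paper. Leftover iterations are handed out by largest remainder.

```python
from typing import List


def static_partition(n_iters: int, n_threads: int, comm_fraction: float) -> List[int]:
    """Split n_iters loop iterations among threads; thread 0 is the master.

    comm_fraction (assumed 0 <= comm_fraction < 1) is the share of the
    master's time spent in inter-node communication, e.g. from profiling.
    """
    weights = [1.0 - comm_fraction] + [1.0] * (n_threads - 1)
    total = sum(weights)
    exact = [n_iters * w / total for w in weights]
    counts = [int(x) for x in exact]  # floor, since all values are >= 0
    # Distribute the iterations lost to flooring, largest remainder first
    # (sorted() is stable, so ties go to lower-numbered threads).
    leftover = n_iters - sum(counts)
    order = sorted(range(n_threads), key=lambda t: exact[t] - counts[t], reverse=True)
    for t in order[:leftover]:
        counts[t] += 1
    return counts
```

For example, `static_partition(100, 4, 0.5)` gives the master about half the share of each worker thread. In practice `comm_fraction` would have to be measured or modelled for the target cluster.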
ISBN (print): 1845641744
Multistage stochastic programming is a popular technique for dealing with uncertainty in optimization models. However, the need to adequately capture the underlying distributions leads to large problems that are usually beyond the scope of general-purpose solvers. Dedicated methods exist but restrict the type of model they can be applied to. Parallelism makes these problems potentially tractable, but it is generally not exploited in today's general-purpose solvers. We apply a structure-exploiting parallel primal-dual interior-point solver for linear, quadratic and nonlinear programming problems. The solver efficiently exploits the structure of these models. Its design relies on object-oriented programming principles, treating each substructure of the problem as an object carrying its own dedicated linear algebra routines. We demonstrate its effectiveness on a wide range of financial planning problems, resulting in linear, quadratic or nonlinear formulations. Coarse-grain parallelism is also exploited in a generic way that is efficient on any parallel architecture, from Ethernet-linked PCs to massively parallel computers. On a 1280-processor machine with a peak performance of 6.2 TFlops, we can solve a quadratic financial planning problem exceeding 10⁹ decision variables.
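The object-oriented design described above can be illustrated on the kind of bordered block-diagonal linear system that structured interior-point iterations produce, roughly. In the sketch, each substructure is an object with its own `solve` method. The parent object combines its children through a Schur complement on the border. This is a hedged toy: diagonal leaf blocks, a border of width 1, and all class names are assumptions, and it is not the authors' solver.

```python
from dataclasses import dataclass
from typing import List, Tuple


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


@dataclass
class DiagonalBlock:
    """A leaf substructure that knows how to solve systems with itself."""
    diag: List[float]

    def solve(self, rhs: List[float]) -> List[float]:
        return [r / d for r, d in zip(rhs, self.diag)]


@dataclass
class BorderedBlockDiagonal:
    """Matrix [[A_1, 0, b_1], [0, A_2, b_2], [b_1^T, b_2^T, c]], border width 1."""
    blocks: List[DiagonalBlock]
    borders: List[List[float]]  # column b_i coupling block i to the border
    corner: float               # scalar c

    def solve(self, rhs_blocks: List[List[float]],
              rhs_corner: float) -> Tuple[List[List[float]], float]:
        schur, reduced = self.corner, rhs_corner
        # Each iteration touches only its own block, so this loop is the
        # natural place to distribute blocks across processors.
        for blk, b_col, rhs in zip(self.blocks, self.borders, rhs_blocks):
            schur -= dot(b_col, blk.solve(b_col))    # c - sum b_i^T A_i^{-1} b_i
            reduced -= dot(b_col, blk.solve(rhs))    # r - sum b_i^T A_i^{-1} r_i
        y = reduced / schur
        # Back-substitute: x_i = A_i^{-1} (r_i - b_i y), again block by block.
        xs = [blk.solve([r - bc * y for r, bc in zip(rhs, b_col)])
              for blk, b_col, rhs in zip(self.blocks, self.borders, rhs_blocks)]
        return xs, y
```

Because the parent only calls `solve` on its children, a child could itself be another `BorderedBlockDiagonal`. That nesting is how multistage scenario trees map onto nested objects in this style of design.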
This paper describes a research proposal related to the design of parallel software for image processing. The proposal focuses on the design of a tool, which generates the best implementation of a given application ta...
ISBN (print): 9781509024032
Building massively parallel numerical simulations is difficult because parallel programming models keep changing and many different software technologies are needed. We develop a component-based graphical parallel programming approach that lowers the difficulty of coding applications in scientific and engineering computing and supports rapid development of large-scale simulations based on a domain-specific framework. Parallel applications can be constructed simply by configuring components and assembling them interactively in predefined flowcharts. A large part of the code for an application is automatically generated from its graphical configuration. The approach facilitates the rapid design and development of parallel numerical simulations by shielding domain experts from much of the required knowledge and technology. Real applications demonstrate that the approach for developing complex numerical simulations is both practical and efficient.
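The abstract does not describe the code generator itself. The sketch below is a hypothetical, much-reduced illustration of the underlying idea: a flowchart of configured components is turned into source code automatically. The template names (`init_pulse`, `diffuse`, `record_total`), the fixed "setup, then time loop" flowchart shape, and the 1-D diffusion kernel are all assumptions chosen for brevity. A real framework would emit parallel code against its own component library.

```python
from typing import Dict, List, Tuple

# Hypothetical component library: each entry is a code template whose
# placeholders are filled from the parameters a user sets in the GUI.
TEMPLATES: Dict[str, str] = {
    "init_pulse": "u = [0.0] * {n}\nu[{n} // 2] = {peak}",
    "diffuse": ("u = [u[0]] + [u[i] + {alpha} * (u[i - 1] - 2 * u[i] + u[i + 1])"
                " for i in range(1, len(u) - 1)] + [u[-1]]"),
    "record_total": "results.append(sum(u))",
}

Step = Tuple[str, Dict[str, object]]  # (component name, configured parameters)


def generate(setup: List[Step], loop: List[Step], n_steps: int) -> str:
    """Emit Python source for a 'setup, then time loop' flowchart."""
    lines = ["results = []"]
    for name, params in setup:
        lines.extend(TEMPLATES[name].format(**params).splitlines())
    lines.append(f"for step in range({n_steps}):")
    for name, params in loop:
        lines.extend("    " + line for line in TEMPLATES[name].format(**params).splitlines())
    return "\n".join(lines)
```

A usage example: `generate([("init_pulse", {"n": 5, "peak": 1.0})], [("diffuse", {"alpha": 0.25}), ("record_total", {})], n_steps=2)` returns a short script. Running it with `exec` fills a `results` list with the total of `u` after each step. The user only chose components and parameters; the loop structure and kernel code were generated.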
ISBN (print): 9781467323703; 9781467323727
As our expectations of what computer systems can do and our ability to capture data improve, the desire to perform ever more computationally intensive tasks increases. Often these tasks, comprising vast numbers of repeated computations, are highly interdependent: a closely coupled problem. The process of Landscape-Evolution Modelling is an example of such a problem. To produce realistic models, it is necessary to process landscapes containing millions of data points over time periods extending up to millions of years. This leads to intractable execution times, often on the order of years. Researchers therefore seek to reduce the execution time of these models by multiple orders of magnitude. The massively parallel programming environment provided by General-Purpose Graphics Processing Units (GPGPUs) offers the potential for speedups of multiple orders of magnitude. In this paper we demonstrate how the time-dominant parts of a Landscape-Evolution Model can be recoded for a massively parallel architecture, reducing execution time by two orders of magnitude.
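The abstract does not name the time-dominant parts of the model. Flow routing is a common candidate in landscape-evolution codes, so the sketch below assumes that step and uses the standard D8 rule: each cell drains to the neighbour with the steepest downhill slope, with diagonal distances scaled by √2. It is written in plain Python for readability. The point is that every cell is computed independently, which is what makes a one-GPU-thread-per-cell port natural. The function names and the `None`-for-pits convention are assumptions.

```python
import math
from typing import List, Optional, Tuple

Offset = Tuple[int, int]
NEIGHBOURS: List[Offset] = [(-1, -1), (-1, 0), (-1, 1), (0, -1),
                            (0, 1), (1, -1), (1, 0), (1, 1)]


def d8_direction(elev: List[List[float]], r: int, c: int) -> Optional[Offset]:
    """Offset to the steepest-descent neighbour of (r, c), or None for a pit/flat."""
    best: Optional[Offset] = None
    best_slope = 0.0
    for dr, dc in NEIGHBOURS:
        rr, cc = r + dr, c + dc
        if 0 <= rr < len(elev) and 0 <= cc < len(elev[0]):
            dist = math.sqrt(2.0) if dr and dc else 1.0
            slope = (elev[r][c] - elev[rr][cc]) / dist
            if slope > best_slope:  # strictly downhill only
                best, best_slope = (dr, dc), slope
    return best


def d8_grid(elev: List[List[float]]) -> List[List[Optional[Offset]]]:
    # No cell reads another cell's result, so in a CUDA port each (r, c)
    # would simply map to one thread.
    return [[d8_direction(elev, r, c) for c in range(len(elev[0]))]
            for r in range(len(elev))]
```

Flow accumulation, the step that usually follows, is less trivially parallel because each cell depends on its upstream cells. It typically needs iterative or dependency-ordered GPU formulations.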
Driven by the ever-growing demand for computing power, computers are becoming more and more powerful. However, in recent years, due to physical limitations, this increased computing power has come not in the form of higher CPU clock speeds but in the form of more cores (processors) on a single chip die. The computer industry has started to use this multi-core technology to mass-produce systems for both stand-alone desktop PCs and high-end servers. In the near future, multi-core clusters will become one of the most economical supercomputer architectures. To utilize the full power of multi-core systems, some form of parallel computing is necessary. However, parallel programming is notoriously challenging. This paper analyzes different parallel programming models, compares their strengths and weaknesses on multi-core based systems, and introduces an ongoing project to provide a better parallel programming environment based on a novel View-Oriented Parallel Programming (VOPP) model. Copyright is held by the author/owner(s).
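In VOPP, as we understand it, shared data is partitioned into "views," and a process must acquire a view before accessing it and release it afterwards. The sketch below imitates that discipline with Python threads. Each view is a partition of shared data guarded by its own lock, and a `with` block stands in for acquire/release. The class, function, and method names are hypothetical and are not the VOPP API. The sketch also ignores the memory-consistency machinery that a real VOPP implementation on a cluster would provide.

```python
import threading
from typing import Dict, List, Tuple


class View:
    """A named partition of shared data with its own lock (hypothetical)."""

    def __init__(self, name: str, data: List[int]) -> None:
        self.name = name
        self.data = data
        self._lock = threading.Lock()

    def __enter__(self) -> List[int]:  # stands in for "acquire view"
        self._lock.acquire()
        return self.data

    def __exit__(self, *exc) -> bool:  # stands in for "release view"
        self._lock.release()
        return False


def worker(views: Dict[str, View], plan: List[Tuple[str, int]]) -> None:
    # Shared data is touched only inside an acquired view.
    for name, amount in plan:
        with views[name] as data:
            data[0] += amount


def run_demo(n_threads: int = 4, n_iters: int = 1000) -> Dict[str, int]:
    views = {"a": View("a", [0]), "b": View("b", [0])}
    plan = [("a", 1), ("b", 2)] * n_iters
    threads = [threading.Thread(target=worker, args=(views, plan))
               for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {name: v.data[0] for name, v in views.items()}
```

Because each view has its own lock, threads working on different views never block each other. That is the kind of data-partitioned synchronization a view-oriented model encourages programmers to think in.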
ISBN (print): 9781509048151
The paper analyses computational models based on dynamic programming for platforms with multicore processors and for heterogeneous architectures with FPGAs. The models are applied to a canonical dispatching problem whose computation time depends significantly on the problem scale factor. Parallel algorithms for this NP-hard dispatching problem are complex and require intensive data exchange with RAM. To reduce the computation time, it is suggested to use an FPGA as a coprocessor that provides massively parallel computation and increases the operational performance of the system by an order of magnitude.
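The abstract does not give the dynamic-programming formulation, so the sketch below assumes a simple NP-hard dispatching variant: assigning jobs with integer processing times to two identical machines to minimise the makespan. This is the partition problem, solvable by a pseudo-polynomial subset-sum DP. Holding the DP table as a bitset turns each job's update into one wide shift-OR. Every bit updates independently, which is the kind of regular, data-parallel operation an FPGA coprocessor can implement in wide hardware. The function name and problem choice are assumptions.

```python
from typing import List


def min_makespan_two_machines(times: List[int]) -> int:
    """Smallest makespan when dispatching jobs to two identical machines."""
    total = sum(times)
    reachable = 1  # bit s is set <=> some subset of jobs has total time s
    for p in times:
        # DP step for one job: every reachable load s also makes s + p
        # reachable. All bit positions are updated at once.
        reachable |= reachable << p
    best = total
    for s in range(total + 1):
        if (reachable >> s) & 1:
            # Machine 1 gets load s, machine 2 gets the rest.
            best = min(best, max(s, total - s))
    return best
```

The table width grows with the total processing time, not the number of jobs. That width is why such DPs are memory-intensive on CPUs and benefit from on-chip parallel logic.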