Parsec is a parallel programming environment whose goal is to simplify the development of multicomputer programs without, as is often the case, sacrificing performance. We have reconciled these objectives by "compiling" the structure of parallel applications into information that configures each of a small set of communication primitives on a context-sensitive basis. In this paper, we show how Parsec can be used to implement a high-performance processor farm, and we compare Parsec and hand-optimized implementations to demonstrate that Parsec can achieve a similar level of performance. Extensive static analysis and optimization are necessary to achieve these results. We discuss both the tools that perform these tasks and the user interface that provides the necessary declarative structural information. Using the processor farm, we show how Parsec simplifies the task of specifying the structure of a parallel application and improves the result by supporting abstraction, reuse and scalability.
With the shrinking of transistors continuing to follow Moore's Law and the non-scalability of conventional out-of-order processors, multi-core systems are becoming the design choice for industry. Performance extraction is thus largely shifted from the hardware to the programmer/compiler camp, who now have to expose Thread-Level Parallelism (TLP) to the underlying system in the form of explicitly parallel applications. Unfortunately, parallel programming is hard and error-prone. The programmer has to parallelize the work, perform the data placement, and deal with thread synchronization. Systems that support speculative multithreaded execution, like Thread-Level Speculation (TLS), offer an interesting alternative since they relieve the programmer of the burden of parallelizing applications and correctly synchronizing them. Since systems that support speculative multithreading usually treat all threads equally, they are energy-inefficient. This inefficiency stems from the fact that speculation occasionally fails and, thus, power is spent on threads that will have to be discarded. In this paper we propose a power allocation scheme for TLS systems, based on Dynamic Voltage and Frequency Scaling (DVFS), that tries to remedy this inefficiency. More specifically, we propose a profitability-based power allocation scheme, where we "steal" power from non-profitable threads and use it to speed up more useful ones. We evaluate our techniques for a state-of-the-art TLS system and show that, with minimal hardware support, they lead to improvements in ED of up to 39.6%, with an average of 21.2%, for a subset of the SPEC 2000 Integer benchmark suite.
ISBN: (Print) 9781479979301
Satellite remote sensing radar technologies provide powerful tools for geohazard monitoring and risk management at synoptic scale. In particular, advanced Multi-Temporal SAR Interferometric algorithms are capable of detecting ground deformations and structural instabilities with millimetric precision, but impose strong requirements in terms of hardware resources. Recent advances in GPU computing and programming hold promise for time-efficient implementation of imaging algorithms, thus enhancing the development of advanced Emergency Management Services based on Earth Observation technologies. In this study, a preliminary assessment of the potential of GPU processing is carried out by comparing CPU (single- and multi-thread) and GPU implementations of time-consuming InSAR algorithm kernels. In particular, we focus on the fine coregistration of SAR interferometric pairs, a crucial step in the interferogram generation process. Experimental results are discussed.
In this paper, we propose a new family of interconnection networks, called cyclic networks (CNs), in which an intercluster connection is defined on a set of nodes whose addresses are cyclic shifts of one another. The node degrees of basic CNs are independent of system size, but can vary from a small constant (e.g., 3) to as large as required, thus providing flexibility and an effective tradeoff between cost and performance. The diameters of suitably constructed CNs can be asymptotically optimal within their lower bounds, given the degrees. We show that packet routing and ascend/descend algorithms can be performed in Θ(log_d N) communication steps on some CNs with N nodes of degree Θ(d). Moreover, CNs can also efficiently emulate homogeneous product networks (e.g., hypercubes and high-dimensional meshes). As a consequence, we obtain a variety of efficient algorithms on such networks, thus proving the versatility of CNs.
Although Java was not specifically designed for the computationally intensive numeric applications that are the typical fodder of highly parallel machines, its widespread popularity and portability make it an interesting candidate vehicle for massively parallel programming. With the advent of high-performance optimizing Java compilers, the open question is: how can Java programs best exploit massive parallelism? The authors have been contemplating this question via libraries of Java routines for specifying and coordinating parallel codes. It would be most desirable to have these routines written in 100%-Pure Java; however, a more expedient solution is to provide Java wrappers (stubs) to existing parallel coordination libraries, such as MPI. MPI is an attractive alternative since, like Java, it is portable. We discuss both approaches here. In undertaking this study, we have also identified some minor modifications of the current language specification that would make 100%-Pure Java parallel programming more natural.
Parallel programming is difficult. The need for correct and efficient parallel programs is important, and one way to meet this requirement is to work on the refinement chain. Beginning with a specification written in TLA+ (for instance), we can transform it, or refine it, into finer-grained specifications. At some step, enough structure will have appeared that we can bridge a gap and fill in this structure. We introduce a more concrete version of TLA+, CTLA, in which structuring concerns are to be expressed but distribution, mapping and implementation problems are avoided. Indeed, we firmly believe that it is a mistake to go immediately from TLA+ to a real language like CC++, since the gap is still too wide. A numerical example supports our claim.
ISBN: (Print) 9781424465330
The Pilot library offers a new method for programming parallel clusters in C. Formal elements from Communicating Sequential Processes (CSP) were used to realize a process/channel model of parallel computation that reduces opportunities for deadlock and other communication errors. This simple model, plus an application programming interface (API) fashioned on C's formatted I/O, is designed to make the library easy for novice scientific C programmers to learn. Optional runtime services, including deadlock detection, help the programmer to debug communication issues. Pilot forms a thin layer on top of the standard Message Passing Interface (MPI), preserving the latter's portability and efficiency with little performance impact. MPI's powerful collective operations can still be accessed within the conceptual model.
This paper addresses the issues of programming a multi-level parallel computer. The computer has an architecture that combines multi-level parallelism for efficient computation. To exploit the full potential of this architecture, special features are added to its programming language along with special functions in its library. The base language is similar to OpenCL. We keep the original OpenCL hierarchical (global, local and private) memory organization while extending OpenCL with features and library functions for message passing and remote function calls. We also add short vector types and operations that are frequently used in graphics and image processing. These features and library functions facilitate effective parallel programming using a combination of multi-level parallelism.
ISBN: (Print) 9781581137620
Recent efforts in adapting computer networks to systems-on-chip (SOC), or networks-on-chip, are set back by the lack of an effective programming model, while not taking full advantage of the almost unlimited on-chip bandwidth. In this paper, we propose a new programming model, called context-flow, that is simple, safe and highly parallelizable, yet transparent to the underlying architectural details. An SOC platform architecture is then designed to support this programming model while fully exploiting the physical proximity between the processing elements. We demonstrate the performance efficiency of this architecture over bus-based and packet-switched networks with two case studies using a multi-processor architecture simulator.
This paper is devoted to research on bitmap image processing based on wavelet functions. The Daubechies wavelet function was used as a mathematical model for the filtering, compression and smoothing of two-dimensional signals, because an analysis of existing wavelet functions showed that the Daubechies wavelet family is the most effective for image processing. OpenMP parallel programming in C/C++ was used to parallelize the computing processes in image processing problems.