Voronoi mesh as the basic primitive in the field of computational geometry has been widely applied on solving fluid-related problem by finite volume method. By analyzing various limited conditions for reasonable point...
详细信息
Researches on technologies about testing IOPS and data transfer speed of disk arrays in mass storage systems. We propose a parallel testing technology towards the high performance disk array and realize the testing wo...
详细信息
Parallaxis is a machine-independent language for data-parallelprogramming, based on sequential Modula-2. programming in Parallaxis is done on a level of abstraction with virtual processors and virtual connections, wh...
详细信息
ISBN:
(纸本)0780320182
Parallaxis is a machine-independent language for data-parallelprogramming, based on sequential Modula-2. programming in Parallaxis is done on a level of abstraction with virtual processors and virtual connections, which may be defined by the application programmer. This paper describes Parallaxis-III, the current version of the language definition, together with a number of parallel sample algorithms.
Computed tomography (CT) technology has been used in many fields. But the slow speed of CT image reconstruction is unbearable in some situation. The parallel processing based on graphic processing unit (GPU) is a grea...
详细信息
parallelism suffers from a lack of programming languages both simple to handle and able to take advantage of the power of present parallel computers. If parallelism expression is too high level, compilers have to perf...
详细信息
ISBN:
(纸本)0818678836
parallelism suffers from a lack of programming languages both simple to handle and able to take advantage of the power of present parallel computers. If parallelism expression is too high level, compilers have to perform complex optimizations leading often to poor performances. One the other hand, too low level parallelism transfers difficulties reward the programmer. In this paper, we propose a new programming language that integrates both a synchronous data-parallel progamming model and an asynchronous execution model. The synchronous data-parallelprogramming model allows a safe program designing. The asynchronous execution model yields an efficient execution on present MIMD architectures without any program transformation. Our language relies on a logical instruction ordering exploited by specific send/receive communications. It allows to express only the effective data dependences between processors. This ability is enforced by a possible send/receive unmatching useful for irregular algorithms. A sparse vector computation exemplifies our language potentialities.
parallel computing is notoriously challenging due to the difficulty in developing correct and efficient programs. With the arrival of multi-core processors for desktop systems, desktop applications must now be paralle...
详细信息
Electromagnetic researchers are often faced with long execution time and therefore algorithmic and implementation-level optimization can dramatically increase the overall performance of electromagnetism simulation usi...
详细信息
irregular particle-based applications that use trees, far example hierarchical N-body applications, are important consumers of multiprocessor cycles, and are argued to benefit greatly in programming ease from a cohere...
详细信息
ISBN:
(纸本)0818684038
irregular particle-based applications that use trees, far example hierarchical N-body applications, are important consumers of multiprocessor cycles, and are argued to benefit greatly in programming ease from a coherent shared address space programming model. As more and more supercomputing platforms that can support different programming models become available to users, from tightly-coupled hardware-coherent machines to clusters of workstations or SMPs, to truly deliver on its ease of programing advantages to application users it is important that the shared address space model nor only perform and scale well in the rightly-coupled case but also port well in performance across the range of platforms (as the message passing model can). For tree-based N-body applications, this is currently not true: While the actual computation of interactions ports well, the parallel tree building phase can become a severe bottleneck on coherent shared address space platforms, in particular an platforms with less aggressive, commodity-oriented communication architectures (even though it rakes less than 3 percent of the time in most sequential executions). We therefore investigate the performance of five parallel tree building methods in the context of a complete galaxy simulation on four very different platforms that support this programming model: an SGI Origin2000 (an aggressive hardware cache-coherent machine with physically distributed memory), an SGI Challenge bits-based shared memory multiprocessor art Intel Paragon running a shared virtual memory protocol in software at page granularity, and a Wisconsin Typhoon-zero in which the granularity of coherence can be varied using hardware support but the protocol runs in software (in the last case using both a page-based and a fine-grained protocol). We find that the algorithms used successfully and widely distributed so far for the first two platforms cause overall application performance to be very poor on the latter two commodit
Grain packing is an important problem to the development of efficient parallel programs. It is desirable that the grain packing can be performed automatically, so that the programmer can write parallel programs withou...
详细信息
Grain packing is an important problem to the development of efficient parallel programs. It is desirable that the grain packing can be performed automatically, so that the programmer can write parallel programs without being troubled by the details of parallel-programming languages and parallelarchitectures, and the same parallel program can be executed efficiently on different machines. This paper presents a 2D Compression (2DC) grain packing method for determining optimal grain size and inherent parallelism concurrently. This ability is mainly due to 2DC's continuing efforts for achieving conflicting objectives. Experimental results demonstrate that 2DC increases the solution effectiveness, in comparison with state-of-art approaches that aim at economizing either speedup or resource utilization. Additionally, 2DC can determine inherent parallelism, which means that users will no longer be required to specify the number of processors before the compilation stage.
A library, called PAD, of basic parallel algorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. Main motivations of the PAD project is to study the PRAM ...
详细信息
A library, called PAD, of basic parallel algorithms and data structures for the PRAM is currently being implemented using the PRAM programming language Fork95. Main motivations of the PAD project is to study the PRAM as a practical programming model, and to provide an organized collection of basic PRAM algorithms for the SB-PRAM under completion at the University of Saarbruecken. We give a brief survey of Fork95, and describe the main components of PAD. Finally we report on the status of the language and library and discuss further developments.
暂无评论