The Voronoi mesh, a basic primitive in computational geometry, has been widely applied to solving fluid-related problems with the finite volume method. By analyzing various limiting conditions for reasonable point...
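As a loose illustration of the primitive this entry builds on (not the paper's method), the following Python sketch constructs a Voronoi diagram from random seed points with scipy.spatial.Voronoi and lists the shared faces across which a finite volume scheme would exchange fluxes; the point cloud and all names here are assumptions.

# Hypothetical sketch: Voronoi cells as finite volume cells.
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(0)
points = rng.random((20, 2))          # assumed seed points, one per Voronoi cell
vor = Voronoi(points)

# Each ridge separates two seed points, i.e. two finite volume cells.
for (i, j), verts in zip(vor.ridge_points, vor.ridge_vertices):
    if -1 in verts:                   # skip unbounded faces on the hull
        continue
    face = vor.vertices[verts]        # endpoints of the shared face
    length = np.linalg.norm(face[0] - face[1])
    print(f"cells {i} and {j} share a face of length {length:.3f}")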
This work researches technologies for testing the IOPS and data transfer speed of disk arrays in mass storage systems. We propose a parallel testing technology for high-performance disk arrays and realize the testing wo...
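As a rough illustration of the kind of measurement this entry targets (not the proposed testing technology), the sketch below issues random 4 KiB reads from several threads against an assumed test file and reports aggregate IOPS; PATH, block size, and worker count are made-up parameters.

# Hypothetical parallel random-read IOPS measurement (POSIX os.pread).
import os, time, random, threading

PATH = "/tmp/testfile"      # assumed test file larger than BLOCK; a real test would target the array's device
BLOCK = 4096
OPS_PER_WORKER = 10_000
WORKERS = 8

size = os.path.getsize(PATH)
counts = [0] * WORKERS

def worker(idx):
    fd = os.open(PATH, os.O_RDONLY)
    for _ in range(OPS_PER_WORKER):
        offset = random.randrange(0, size - BLOCK) // BLOCK * BLOCK
        os.pread(fd, BLOCK, offset)   # one random 4 KiB read
        counts[idx] += 1
    os.close(fd)

threads = [threading.Thread(target=worker, args=(i,)) for i in range(WORKERS)]
start = time.perf_counter()
for t in threads: t.start()
for t in threads: t.join()
elapsed = time.perf_counter() - start
print(f"aggregate IOPS: {sum(counts) / elapsed:.0f}")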
ISBN (print): 0818678836
Parallelism suffers from a lack of programming languages that are both simple to handle and able to take advantage of the power of present parallel computers. If the expression of parallelism is too high level, compilers have to perform complex optimizations, often leading to poor performance. On the other hand, too low-level parallelism transfers the difficulties toward the programmer. In this paper, we propose a new programming language that integrates both a synchronous data-parallel programming model and an asynchronous execution model. The synchronous data-parallel programming model allows safe program design. The asynchronous execution model yields efficient execution on present MIMD architectures without any program transformation. Our language relies on a logical instruction ordering exploited by specific send/receive communications. It allows expressing only the effective data dependences between processors. This ability is reinforced by a possible send/receive unmatching, useful for irregular algorithms. A sparse vector computation exemplifies the language's potential.
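The central idea of communicating only the effective data dependences through tagged point-to-point send/receive can be loosely illustrated with mpi4py; this is not the proposed language, just a sketch under the assumption of two ranks sharing pieces of a sparse vector.

# Hypothetical sketch; run with: mpiexec -n 2 python sparse_step.py (requires mpi4py).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Each rank owns part of a sparse vector, stored as {index: value}.
local = {0: 1.0, 2: 3.0} if rank == 0 else {5: 2.0}

STEP = 7                                   # logical step encoded in the message tag
if rank == 0:
    # Send only the nonzero entries rank 1 actually depends on.
    needed = {i: v for i, v in local.items()}
    comm.send(needed, dest=1, tag=STEP)    # point-to-point send, no global synchronization
elif rank == 1:
    remote = comm.recv(source=0, tag=STEP) # matched by (source, logical step)
    partial = sum(remote.values()) + sum(local.values())
    print("rank 1 partial sum:", partial)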
Computed tomography (CT) technology has been used in many fields, but the slow speed of CT image reconstruction is unacceptable in some situations. Parallel processing based on the graphics processing unit (GPU) is a grea...
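The parallelism this entry exploits comes from the fact that backprojection treats every pixel independently. The sketch below is a hypothetical, CPU-only stand-in using numpy (not the paper's GPU implementation); the sinogram is random placeholder data.

# Hypothetical vectorized backprojection step: each pixel is independent,
# which is exactly the data parallelism a GPU kernel would exploit.
import numpy as np

N, n_angles = 128, 180
sinogram = np.random.rand(n_angles, N)        # placeholder projection data
angles = np.deg2rad(np.arange(n_angles))

xs = np.arange(N) - N / 2
X, Y = np.meshgrid(xs, xs)
image = np.zeros((N, N))

for a, theta in enumerate(angles):
    # Detector coordinate of every pixel for this projection angle.
    t = X * np.cos(theta) + Y * np.sin(theta)
    idx = np.clip(np.round(t + N / 2).astype(int), 0, N - 1)
    image += sinogram[a, idx]                 # accumulate, pixel-parallel
image /= n_angles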
Parallel computing is notoriously challenging due to the difficulty of developing correct and efficient programs. With the arrival of multi-core processors for desktop systems, desktop applications must now be paralle...
Electromagnetic researchers are often faced with long execution times, and therefore algorithmic and implementation-level optimization can dramatically increase the overall performance of electromagnetism simulation usi...
ISBN (print): 0818684038
Irregular particle-based applications that use trees, for example hierarchical N-body applications, are important consumers of multiprocessor cycles and are argued to benefit greatly in programming ease from a coherent shared address space programming model. As more and more supercomputing platforms that can support different programming models become available to users, from tightly-coupled hardware-coherent machines to clusters of workstations or SMPs, it is important, to truly deliver on its ease-of-programming advantages to application users, that the shared address space model not only perform and scale well in the tightly-coupled case but also port well in performance across the range of platforms (as the message passing model can). For tree-based N-body applications, this is currently not true: while the actual computation of interactions ports well, the parallel tree building phase can become a severe bottleneck on coherent shared address space platforms, in particular on platforms with less aggressive, commodity-oriented communication architectures (even though it takes less than 3 percent of the time in most sequential executions). We therefore investigate the performance of five parallel tree building methods in the context of a complete galaxy simulation on four very different platforms that support this programming model: an SGI Origin2000 (an aggressive hardware cache-coherent machine with physically distributed memory), an SGI Challenge bus-based shared memory multiprocessor, an Intel Paragon running a shared virtual memory protocol in software at page granularity, and a Wisconsin Typhoon-zero in which the granularity of coherence can be varied using hardware support but the protocol runs in software (in the last case using both a page-based and a fine-grained protocol). We find that the algorithms used successfully and widely distributed so far for the first two platforms cause overall application performance to be very poor on the latter two commodity...
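For orientation only, the sketch below shows one common way to avoid contention on shared tree nodes during parallel tree building: pre-partition bodies by the top-level quadrants so workers build subtrees independently and only the root is assembled jointly. It is a hypothetical illustration, not one of the five methods the paper evaluates, and build_subtree is a placeholder.

# Hypothetical contention-free parallel tree build by top-level partitioning.
import numpy as np
from multiprocessing import Pool

def build_subtree(points):
    # Placeholder for a sequential Barnes-Hut subtree build over local bodies;
    # here we just return the count and centre of mass of the group.
    return {"count": len(points),
            "com": points.mean(axis=0) if len(points) else None}

if __name__ == "__main__":
    bodies = np.random.rand(100_000, 2)
    # Top-level quadrant of each body: the first split of the quadtree.
    quad = (bodies[:, 0] >= 0.5).astype(int) * 2 + (bodies[:, 1] >= 0.5).astype(int)
    groups = [bodies[quad == q] for q in range(4)]
    with Pool(4) as pool:
        subtrees = pool.map(build_subtree, groups)   # independent, no shared nodes
    root = {"children": subtrees}                    # only the root is built jointly
    print([s["count"] for s in subtrees])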
Grain packing is an important problem in the development of efficient parallel programs. It is desirable that grain packing be performed automatically, so that the programmer can write parallel programs without being troubled by the details of parallel programming languages and parallel architectures, and so that the same parallel program can be executed efficiently on different machines. This paper presents a 2D Compression (2DC) grain packing method for determining the optimal grain size and the inherent parallelism concurrently. This ability is mainly due to 2DC's continuing effort to reconcile conflicting objectives. Experimental results demonstrate that 2DC increases solution effectiveness in comparison with state-of-the-art approaches that aim at optimizing either speedup or resource utilization. Additionally, 2DC can determine the inherent parallelism, which means that users no longer need to specify the number of processors before the compilation stage.
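To make the problem concrete, here is a classic grain-packing heuristic on a toy task graph, not the 2DC method: merge a task into its predecessor's grain when the communication cost it saves exceeds the task's own computation. The task graph and costs are invented for illustration.

# Hypothetical grain-packing heuristic.
# Task graph: task -> (compute_cost, [(predecessor, comm_cost), ...])
tasks = {
    "a": (2, []),
    "b": (3, [("a", 5)]),      # heavy edge a->b: worth packing into one grain
    "c": (4, [("a", 1)]),      # light edge a->c: keep as a separate grain
    "d": (2, [("b", 1), ("c", 1)]),
}

grain_of = {t: t for t in tasks}           # each task starts as its own grain

def find(t):
    while grain_of[t] != t:
        t = grain_of[t]
    return t

for t, (cost, preds) in tasks.items():
    for pred, comm in preds:
        # Pack t with pred when communication dominates t's own computation.
        if comm > cost:
            grain_of[find(t)] = find(pred)

grains = {}
for t in tasks:
    grains.setdefault(find(t), []).append(t)
print(grains)   # {'a': ['a', 'b'], 'c': ['c'], 'd': ['d']}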
These keynotes discuss the following: Parallel and Interactive Computing of Big Data; Approximation Algorithms: Methodologies, Applications and Empirical Evaluation.
ISBN (print): 0769509363
The proceedings contain 40 papers. The topics discussed include: parallel real-time computation: sometimes quantity means quality; a parallel tabu search and its hybridization with genetic algorithm; reconfigurable mesh-connected processor arrays using row-column bypassing and direct replacement; batched circuit switched routing for efficient service of requests; environment of multiprocessor simulator development; portable runtime support for graph-oriented parallel and distributed programming; comprehensive evaluation of an instruction reissue mechanism; an accurate analysis of reliability parameters in meshes with fault-tolerant adaptive routing; on the effect of link failures in fibre channel storage area networks; and an approximation algorithm for multiprocessor scheduling of trees with communication delays.