Pangaea is a system that can distribute centralized Java programs, based on static source code analysis and using arbitrary distribution middleware as a back-end. As Pangaea handles the entire distribution aspect tran...
In both database transaction management and parallel programming, parallel execution of operations is one of the most essential features. Although they look quite different, we will show that many important similariti...
In order to be truly portable, a program must be tolerant of a wide range of development and execution environments, and a parallel program is just one which must be tolerant of a very wide range. First, the term "tolerant programming" is defined. Then, a formal model called F-Nets is described in which parallel algorithms are expressed as folded partial-orderings of operations, and this is argued to provide a suitable framework for building tolerant programs. Finally, Software Cabling (SC), a very-high-level graphical programming language, demonstrates how many of the features normally expected from today's computer languages (e.g. data abstraction and data parallelism) can be obtained within the F-Net paradigm.
ISBN (print): 0769503438
Multithreading is widely accepted in shared memory systems such as the symmetric multiprocessor (SMP) machine. However, in cluster environments, multithreading is considered expensive and difficult, and the process is still used as the basic scheduling unit in the majority of parallel programming systems on clusters. To investigate the overhead a runtime system may incur in supporting multithreading, an extended multithreaded Cilk runtime system, called distributed Cilk, was evaluated on a cluster of SMP PCs. Distributed Cilk embodies the algorithmic features of the Cilk language, supports software distributed shared memory (DSM) for all Cilk threads in the cluster, and provides user-level cluster-wide locking. Its performance was evaluated using three problems, parallelized with different strategies to represent various parallel programming paradigms. The results were also compared with those of TreadMarks, a typical software DSM implementation for clusters with no support for multithreading. The experiments show that multithreaded Cilk programs written in a divide-and-conquer fashion with good data locality can achieve performance comparable to that of the corresponding multiple-process TreadMarks programs. The results shed light on efficient multithreading for clusters.
Most current shared-memory parallel programming environments are based on thread packages that allow the exploitation of a single level of parallelism. These thread packages do not enable the spawning of new parallelism from a previously activated parallel region. Current initiatives (like OpenMP) include in their definition the exploitation of multiple levels of parallelism through the nesting of parallel constructs. This paper analyzes the requirements for efficient multi-level parallelization and reports some conclusions drawn from experience in parallelizing two benchmark applications. The underlying system is based on: i) an OpenMP compiler that accepts some extensions to the original definition and ii) a user-level threads library that supports the exploitation of both fine-grain and multi-level parallelism.
This paper discusses high performance clustering through a series of critical topics: architectural design, system software infrastructure, and programming environment. This is accomplished through an overview of a large scale, high performance SuperCluster (Roadrunner). This SuperCluster is based almost entirely on freely available, vendor-independent software: for example, its operating system (Linux), job scheduler (PBS), compilers (GNU/EGCS), and parallel programming libraries (MPI). The Globus toolkit, also available for this platform, allows high performance distributed computing applications to use geographically distributed resources such as this SuperCluster. In addition to describing the design and analysis of the Roadrunner SuperCluster, we provide experimental analyses from grand challenge applications and future directions for SuperClusters.
Because of their ability to exploit the tolerance for imprecision and uncertainty in real-world problems, and their robustness and parallelism, artificial neural networks (ANNs) and their techniques have become increasingly important for modeling and optimization in many areas of science and engineering. As a consequence, the market is flooded with new, increasingly technical software and hardware products. This paper presents an analytical overview of the most popular ANNs, both in hardware and software modes. After an overview of ANNs, the paper discusses global optimization for ANN training, then presents the NOVEL hybrid method and discusses its performance. The paper then discusses the techniques and means for parallelizing neurosimulations of ANNs, both at a high programming level and at a low hardware-emulation level. It then presents vector microprocessor architectures and the Spert II fixed-point system as applied to multimedia and human-machine interfaces. Finally, it introduces the most recently explored concept of cellular neural networks (CNNs) and analyzes their performance and operation. The paper closes with conclusions and recommendations.
The performance of parallel sorting is not well understood on hardware cache-coherent shared address space (CC-SAS) multiprocessors, which increasingly dominate the market for tightly-coupled multiprocessing. We study two high-performance parallel sorting algorithms, radix and sample sorting, under three major programming models (a load-store CC-SAS, message passing, and the segmented SHMEM model) on a 64-processor SGI Origin2000. We observe surprisingly good speedups on this demanding application. The performance of radix sort is greatly affected by the programming model and particular implementation used. Sample sort exhibits more uniform performance across programming models on this platform, but it is usually not as good as the best radix sort for larger data sets when each algorithm is allowed to use its own best programming model. The best combination of algorithm and programming model is radix sorting under the SHMEM model for larger data sets and sample sorting under CC-SAS for smaller data sets.
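For readers unfamiliar with sample sorting, the following is a minimal sequential sketch of its three phases (choose splitters from a random sample, route keys to buckets, sort each bucket). It is not the implementation studied in the paper: the function name `sample_sort`, the parameters `num_buckets`, `oversample`, and `seed`, and the use of Python are all assumptions made for illustration. Each bucket stands in for the keys one processor would own.

```python
import bisect
import random


def sample_sort(data, num_buckets=4, oversample=8, seed=0):
    """Sequential sketch of sample sort (hypothetical helper, not from the paper)."""
    data = list(data)
    if num_buckets < 2 or len(data) <= num_buckets:
        return sorted(data)

    rng = random.Random(seed)

    # Phase 1: draw a random sample and take evenly spaced splitters from it.
    sample_size = min(len(data), num_buckets * max(1, oversample))
    sample = sorted(rng.sample(data, sample_size))
    step = sample_size // num_buckets
    splitters = [sample[i * step] for i in range(1, num_buckets)]

    # Phase 2: route every key to the bucket whose key range contains it.
    # In a parallel setting this is the all-to-all communication step.
    buckets = [[] for _ in range(num_buckets)]
    for key in data:
        buckets[bisect.bisect_right(splitters, key)].append(key)

    # Phase 3: sort each bucket independently and concatenate in bucket order.
    result = []
    for bucket in buckets:
        result.extend(sorted(bucket))
    return result
```

Because bucket ranges are disjoint and ordered, concatenating the locally sorted buckets yields a globally sorted sequence; oversampling is a common way to keep bucket sizes balanced.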