In this paper, two programming tools are presented that facilitate the development of portable parallel applications on distributed memory systems. The Orchid system is a software platform, i.e. a set of facilities for parallel programming. It consists of mechanisms for transparent message passing and a set of primitive functions supporting the distributed shared memory programming model. To free the user from the tedious task of parallel programming, a new environment for logic programming is introduced: the Daffodil framework. Daffodil, implemented on top of Orchid, evaluates pure PROLOG programs, exploiting their inherent AND/OR parallelism. Both systems have been implemented and evaluated on various platforms, since the layered structure of Orchid ensures portability by requiring only a small part of the code to be re-engineered.
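AND/OR parallelism can be illustrated with a minimal sketch. It assumes a ground (variable-free) propositional program, so no unification is needed. The names `Program`, `solve` and `solve_body` are hypothetical and are not Daffodil's interface.

The alternatives for a goal form an OR-node, and they are tried concurrently. The subgoals of one clause body form an AND-node, and because they share no variables they can also run concurrently.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical representation of a ground (variable-free) pure Prolog program:
# each head maps to its alternative clause bodies; a fact has one empty body.
Program = Dict[str, List[List[str]]]


def solve(program: Program, goal: str) -> bool:
    """OR-node: the goal succeeds if any alternative clause body succeeds.
    The alternatives are evaluated concurrently (OR-parallelism)."""
    bodies = program.get(goal, [])
    if not bodies:
        return False  # no clause for this goal: failure
    # A fresh pool per node, so a parent blocked in map() never starves its children.
    with ThreadPoolExecutor(max_workers=len(bodies)) as pool:
        results = list(pool.map(lambda body: solve_body(program, body), bodies))
    return any(results)


def solve_body(program: Program, body: List[str]) -> bool:
    """AND-node: a clause body succeeds if all of its subgoals succeed.
    Without variables the subgoals are independent, so they run concurrently
    (independent AND-parallelism)."""
    if not body:
        return True  # empty body: the clause is a fact
    with ThreadPoolExecutor(max_workers=len(body)) as pool:
        results = list(pool.map(lambda g: solve(program, g), body))
    return all(results)


program = {
    "grandparent": [["parent_ab", "parent_bc"]],
    "parent_ab": [[]],
    "parent_bc": [[]],
    "linked": [["missing"], ["parent_ab", "missing"]],
}
print(solve(program, "grandparent"))  # True
print(solve(program, "linked"))       # False
```

The sketch has several limitations:

- It assumes an acyclic program, since recursive clauses would recurse forever.
- It does not cancel sibling branches once one succeeds.
- It handles none of the variable-binding conflicts that a real AND/OR-parallel PROLOG engine must resolve.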
In this paper, the control architecture and the synchronization characteristics of an industrial application are presented. The control procedure is implemented on a loosely coupled distributed real-time system in which parallel processing is possible. The network has five nodes: one master actuator, three slave actuators, and a machine controller. All nodes are implemented using Motorola's 68332 controller. The position and velocity of the master are transmitted to the slaves as a command. The network protocol used is CAN (Controller Area Network). On-line correction and synchronization are performed over this serial network. The paper introduces the synchronization methods, the characteristics of CAN, the control architecture, and the electronics used.
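The master-to-slave command can be sketched as follows. The abstract gives no frame layout, so everything here is assumed:

- Position and velocity are packed as two signed 32-bit big-endian integers. Together they fill the 8-byte limit of a classical CAN data field.
- A slave compensates a known transmission delay by linear extrapolation.

The function names are hypothetical.

```python
import struct
from typing import Tuple

# Hypothetical payload layout (not given in the paper): master position and
# velocity as two signed 32-bit big-endian integers, e.g. encoder counts and
# counts per millisecond. Big-endian matches the 68k-family 68332.
CAN_MAX_DATA_BYTES = 8  # a classical CAN data frame carries at most 8 data bytes
_LAYOUT = struct.Struct(">ii")
assert _LAYOUT.size <= CAN_MAX_DATA_BYTES  # 4 + 4 bytes fits in one frame


def encode_master_command(position: int, velocity: int) -> bytes:
    """Pack the master's state into one CAN data field (struct.error on overflow)."""
    return _LAYOUT.pack(position, velocity)


def decode_master_command(payload: bytes) -> Tuple[int, int]:
    """Unpack (position, velocity) on a slave node."""
    position, velocity = _LAYOUT.unpack(payload)
    return position, velocity


def slave_setpoint(position: int, velocity: int, delay_ms: int) -> int:
    """Hypothetical on-line correction: extrapolate the master position over the
    known transmission delay so the slave tracks where the master is now."""
    return position + velocity * delay_ms
```

The actual CAN transmission (identifier, arbitration, controller registers) is hardware-specific and deliberately left out.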
ISBN (print): 3540603212
The proceedings contain 31 papers. The special focus of this conference is on Programming Methods, Compiling Techniques, Mapping and Scheduling. The topics include: regular versus irregular problems and algorithms; algorithmic skeletons for adaptive multigrid methods; run-time techniques for parallelizing sparse matrix problems; fast execution of irregularly structured programs with low communication frequency on the hypercube; run-time parallelization of irregular DOACROSS loops; instruction scheduling and global register allocation for SIMD multiprocessors; general bounds for the assignment of irregular dependency graphs; a new scheme for dynamic processor assignment for irregular problems; an efficient mean field annealing formulation for mapping unstructured domains to hypercubes; partitioning and mapping of unstructured meshes to parallel machine topologies; integrating software pipelining and graph scheduling for iterative scientific computations; on the scope of applicability of the ETF algorithm; optimal mapping of neighbourhood-constrained systems; parallel processing in DNA analysis; solving computational fluid dynamics problems on unstructured grids with distributed parallel processing; parallel decomposition of unstructured FEM meshes; a parallel processing paradigm for irregular applications; load balancing strategies for a parallel system of particles; a reconfigurable parallel algorithm for sparse Cholesky factorization; adapted wavelet analysis on moderate parallel distributed memory MIMD architectures; a new parallel approach to the constrained two-dimensional cutting stock problem; parallel search for combinatorial optimization; better algorithms for parallel backtracking; parallel game tree search on SIMD machines; asynchronous parallel branch and bound and anomalies; and fast priority queues for parallel branch-and-bound.
This paper presents the results of a performance analysis of a seismic analysis kernel code on KSR multiprocessors. The purpose of the analysis is to understand the performance behavior of a class of applications on shared memory parallel machines. The g5 kernel code, commonly used in seismic analysis applications, is parallelized, and its computational and I/O performance is analyzed on a 32-node KSR-1 and a 64-node KSR-2.
Numerous applications in science and engineering have a problem space that can be represented as a 2-dimensional grid. While some of these problems exhibit uniform computational requirements over all regions of the grid, others are non-uniform: that is, some regions of the grid have more data points than others. We introduce a new block decomposition method, Fair Binary Recursive Decomposition (FBRD), which is suitable for a collection of heterogeneous processors, and extend it to accommodate non-uniform problems (NUFBRD). Mathematical comparisons of the NUFBRD method and other common partitioning schemes are presented to show the expected performance level of this new decomposition technique.
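A heterogeneity-aware binary recursive decomposition can be sketched as follows. The splitting rules are our assumptions and are not the paper's exact definitions of FBRD/NUFBRD:

- Split the processor list in half by index.
- Cut the block across its longer side.
- Place the cut so the first half's share of the load (the sum of `grid[y][x]`) best matches that group's share of total processor speed.

A uniform problem is a grid of equal weights; a non-uniform one weights cells by their data points. The function names are hypothetical.

```python
from typing import Dict, List, Tuple

Rect = Tuple[int, int, int, int]  # half-open block (x0, y0, x1, y1)


def fair_binary_decompose(grid: List[List[float]], speeds: List[float]) -> Dict[int, Rect]:
    """Assign each processor p a rectangle of grid[y][x] whose load is roughly
    proportional to speeds[p]."""
    height, width = len(grid), len(grid[0])
    return _split(grid, (0, 0, width, height), list(range(len(speeds))), speeds)


def _split(grid, rect: Rect, procs: List[int], speeds: List[float]) -> Dict[int, Rect]:
    if len(procs) == 1:
        return {procs[0]: rect}
    x0, y0, x1, y1 = rect
    left, right = procs[: len(procs) // 2], procs[len(procs) // 2:]
    # Target share of the load for the first processor group.
    frac = sum(speeds[p] for p in left) / sum(speeds[p] for p in procs)
    vertical = (x1 - x0) >= (y1 - y0)  # cut across the longer side
    if vertical:  # per-column loads
        lines = [sum(grid[y][x] for y in range(y0, y1)) for x in range(x0, x1)]
    else:  # per-row loads
        lines = [sum(grid[y][x] for x in range(x0, x1)) for y in range(y0, y1)]
    if len(lines) < 2:
        raise ValueError("block too small to split among the remaining processors")
    cut = _best_cut(lines, frac * sum(lines))
    if vertical:
        a, b = (x0, y0, x0 + cut, y1), (x0 + cut, y0, x1, y1)
    else:
        a, b = (x0, y0, x1, y0 + cut), (x0, y0 + cut, x1, y1)
    return {**_split(grid, a, left, speeds), **_split(grid, b, right, speeds)}


def _best_cut(lines: List[float], target: float) -> int:
    """Number of leading rows/columns whose total load is closest to target,
    kept in [1, len(lines) - 1] so both halves are non-empty."""
    best_k, best_err, acc = 1, float("inf"), 0.0
    for k in range(1, len(lines)):
        acc += lines[k - 1]
        if abs(acc - target) < best_err:
            best_k, best_err = k, abs(acc - target)
    return best_k
```

Consider a uniform 4 × 4 grid with `speeds = [3, 1]`. The faster processor receives the first three columns, `(0, 0, 3, 4)`. In a non-uniform grid, a single heavy column can by itself make up a fair share.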
All-to-all personalized communication, or complete exchange, is at the heart of numerous applications in parallel computing. An efficient complete exchange algorithm is proposed for square 2n × 2n wormhole-routed tori. Previous work has considered complete exchange algorithms only for mesh networks. The proposed algorithm makes effective use of the bisection bandwidth of a torus, which is twice that of an equal-sized mesh, to achieve complete exchange in almost half the best known complete exchange time on an equal-sized mesh.
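The factor-of-two claim follows from a standard bisection-bandwidth argument, sketched below. This is the textbook lower bound, not the paper's algorithm. It assumes one equal-sized message per node pair, full-duplex links of equal bandwidth, and an even side length `k`. The function name is hypothetical.

```python
def complete_exchange_bound(k: int, torus: bool, msg_size: float = 1.0,
                            link_bw: float = 1.0) -> float:
    """Bisection-bandwidth lower bound on complete-exchange time for a k x k
    mesh (torus=False) or torus (torus=True), assuming full-duplex links."""
    if k % 2:
        raise ValueError("k must be even so the network bisects into equal halves")
    n = k * k
    # Each of the n/2 nodes on one side sends a message to each of the n/2 nodes
    # on the other side, so (n/2)^2 messages cross the bisection in each direction.
    crossing = (n // 2) ** 2
    # A straight cut through a k x k mesh severs k links; in a torus it also
    # severs the k wraparound links, doubling the bisection width.
    links = 2 * k if torus else k
    return crossing * msg_size / (links * link_bw)
```

For `k = 8` the bound is 128 time units on the mesh and 64 on the torus. An algorithm that saturates the torus bisection can therefore approach half the mesh time, which is consistent with the result stated above.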
Factors affecting the characteristic flux pinning in light rare earth (RE)-Ba-Cu-O (RE: Nd, Sm, Eu, Gd) superconductors fabricated by the oxygen-controlled-melt-growth (OCMG) process have been investigated through a comparative study. At 77 K and for an applied field parallel to the c-axis of the sample (H//c), the flux pinning of all OCMG-processed REBa2Cu3Oy (RE123) samples studied was very sensitive to the oxygen partial pressure (PO2) controlled during melt growth. As PO2 was lowered, the peak field (Bpk) in the M-H loops shifted to higher fields and the irreversibility line (IL) shifted to a higher H-T region. For an Nd123 sample, as the oxygen annealing temperature increased above 300 °C, both Bpk and IL were systematically depressed. However, Bpk for all systems was insensitive to the amount of second-phase inclusions (Nd4Ba2Cu2O10 (Nd422) for Nd123 and RE2BaCuO5 (RE211) for the others) in the superconducting RE123 matrix, supporting the view that the characteristic flux pinning is due to the superconducting matrix.
This paper discusses the development of algorithms for parallel interpretation-tree model matching for 3-D computer vision applications such as object recognition. The algorithms are developed with a prototyping appro...
The SPLASH-2 suite of parallel applications has recently been released to facilitate the study of centralized and distributed shared-address-space multiprocessors. In this context, this paper has two goals. One is to quantitatively characterize the SPLASH-2 programs in terms of the fundamental properties and architectural interactions that are important to understanding them well. The properties we study include computational load balance, communication-to-computation ratio and traffic needs, important working set sizes, and issues related to spatial locality, as well as how these properties scale with problem size and the number of processors. The other, related goal is methodological: to help people who will use the programs in architectural evaluations prune the space of application and machine parameters in an informed and meaningful way. For example, by characterizing the working sets of the applications, we describe which operating points in terms of cache size and problem size are representative of realistic situations, which are not, and which are redundant. Using SPLASH-2 as an example, we hope to convey the importance of understanding the interplay of problem size, number of processors, and working sets in designing experiments and interpreting their results.
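Working sets are commonly identified as "knees" in a curve of miss rate versus cache size. The sketch below uses the classical LRU stack-distance method (Mattson et al.), which yields the miss rate for every cache size in one pass over an address trace. This is our choice of illustration, not the tooling used for SPLASH-2. It assumes a trace of block addresses and a fully associative LRU cache measured in blocks. The function name is hypothetical.

```python
from typing import Dict, Hashable, List, Optional, Sequence


def lru_miss_rates(trace: Sequence[Hashable], cache_sizes: Sequence[int]) -> Dict[int, float]:
    """One-pass LRU stack simulation: a reference hits in a fully associative LRU
    cache of C blocks iff fewer than C distinct blocks were touched since its
    previous use (its stack distance is < C)."""
    if not trace:
        raise ValueError("empty trace")
    stack: List[Hashable] = []  # most recently used block first
    distances: List[Optional[int]] = []
    for block in trace:
        if block in stack:
            depth = stack.index(block)  # distinct blocks used since last touch
            stack.pop(depth)
            distances.append(depth)
        else:
            distances.append(None)  # cold (compulsory) miss
        stack.insert(0, block)
    return {
        size: sum(1 for d in distances if d is None or d >= size) / len(trace)
        for size in cache_sizes
    }
```

Take a loop that sweeps 4 blocks five times (`[0, 1, 2, 3] * 5`). It misses on every reference with a 2-block cache, then drops to the 0.2 cold-miss floor at 4 blocks and stays there at 8. The knee at 4 is the working set. Cache sizes beyond a knee are the kind of redundant operating point described above.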
The purpose of this paper is to present an efficient distributed cycle/knot detection algorithm for general graphs that determines whether a given node is a member of a knot or a cycle. This is relevant to applications such as parallel simulation [20, 15], in which (1) cycles and knots can arise frequently, (2) the graph is very large, and (3) it is necessary to know whether a given node is in a cycle [14, 18] or a knot [1, 15, 20]. The algorithm is based on a diffusing computation [10]. It incurs a lower communication cost than preceding algorithms [4, 6, 9, 17] and is the first algorithm capable of detecting both cycles and knots. It differs from classical diffusing computation methods in its use of incomplete search messages to speed up the computation. The algorithm requires a total of at most 2m messages, where m is the number of links. By comparison, the Chandy-Misra algorithm [6] requires at least 3m + n messages, where n is the number of nodes. The algorithm requires O(log n) bits of memory. Various applications of the cycle/knot detection algorithm are presented. In particular, we demonstrate its importance for deadlock detection in parallel simulation algorithms that employ a blocking paradigm and a deadlock-breaking technique known as TNE/DLTNE [1, 15].
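The two properties the algorithm decides can be pinned down with a small centralized reference check. This is not the paper's distributed diffusing computation, which is not reproduced here.

We assume the usual definitions for a directed graph:

- A node is in a cycle if it can reach itself by a path of length at least 1.
- A node is in a knot if it reaches at least one node and every node it reaches can reach it back.

The function names are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, List[Hashable]]  # node -> successors


def reachable(graph: Graph, start: Hashable) -> Set[Hashable]:
    """Nodes reachable from start by a path of length >= 1 (iterative DFS)."""
    seen: Set[Hashable] = set()
    stack = list(graph.get(start, ()))
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph.get(v, ()))
    return seen


def in_cycle(graph: Graph, v: Hashable) -> bool:
    return v in reachable(graph, v)


def in_knot(graph: Graph, v: Hashable) -> bool:
    # Every node v can reach must lead back to v; this also implies in_cycle(v).
    r = reachable(graph, v)
    return bool(r) and all(v in reachable(graph, w) for w in r)
```

Consider the graph `{"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"]}`. Here `a` lies on a cycle but not in a knot, because it can escape to `{c, d}`. Both `c` and `d` form a knot. In wait-for graphs this distinction matters: a knot means every process it reaches is blocked, while a mere cycle need not be permanent in some request models.

This check costs O(n · (n + m)) time at a single site. Avoiding that centralized cost is the point of the distributed algorithm.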