ISBN: 0818681187 (print)
We exploited recent advances in Internet connectivity and Web technologies to build Web-based parallel programming environments (WPPEs) that facilitate the development and execution of parallel programs on remote high-performance computers. A Web browser running on the user's machine provides a user-friendly interface to server-site user accounts and gives convenient access to parallel computing platforms and software. The user may create, edit, and execute files through this Web browser interface. This new Web-based client-server architecture has the potential to serve as a future front end to high-performance computer systems. We discuss the design and implementation of several prototype WPPEs that are currently in use at the Northeast Parallel Architectures Center and the Cornell Theory Center. These initial prototypes support high-level parallel programming with Fortran 90 and High Performance Fortran (HPF), as well as explicit low-level programming with the Message Passing Interface (MPI). We detail the lessons learned during development and outline the tradeoffs among the design choices. We concentrate especially on providing server-site user accounts, mechanisms for accessing those accounts through the Web, and Web-related system security issues.
Programming massively parallel machines is a daunting task for any human programmer, and parallelization may even be impossible for any compiler. Instead, the functional programming paradigm may prove to be an ideal solution by providing an implicitly parallel interface to the programmer. We describe the Sisal project (Streams and Iteration in a Single Assignment Language) and its goal of providing a general-purpose user interface for a wide range of parallel processing platforms.
ISBN: 0780342291 (print)
Derived from a thorough analysis of the properties of a wide class of image processing algorithms, a parallel RISC architecture has been developed. The architecture gains performance from data-level parallelism as well as from instruction-level parallelism. From the beginning of the concept phase, high-level programming capabilities have been one of the major design goals. Thus, there has been a steady interaction between the design of the software development toolkit (optimizing assembler and C++ compiler) and the architecture itself. The RISC-typical register files are among the most critical elements, both for die size and clock frequency and for the assembler's ability to perform VLIW scheduling. Running at 100 MHz (200 mm², 0.35 µm CMOS), the processor reaches a sustained performance of more than 2 GOPS for a wide range of image processing algorithms.
The MINCUT problem for graphs is to find a linear arrangement with minimum cut. The problem is NP-complete for general graphs while polynomial-time solvable for trees. The PLANAR MINCUT problem does not allow edge cro...
In this paper we present a knowledge base for the generation and optimization of query execution plans for parallel database systems. This knowledge base forms the basis of a novel extended blackboard architecture fo...
ISBN: 0818678704 (print)
Much work has been done to implement declarative languages in parallel form. Most implementations resort to imperative features for some purposes, particularly for describing the parallelism. We propose parallel computation on associative networks, a machine-independent parallel programming model, for automatic extraction of the available inherent parallelism and optimization of declarative programs. Associative networks are used to represent both program-like and data-like information. The computation follows the transformation style of information processing. All computational mechanisms are oriented toward processing incomplete information and perform parallel partial evaluation. This partial evaluation is the basis of the proposed technique for automatically transforming, optimizing, and parallelizing declarative programs.
ISBN: 0818682596 (print)
We study efficient parallel solutions to the problem of selecting $r$ elements at specified ranks from a set of $n$ arbitrary elements, known as multiselection, on a hypercube with $p$ processors, $p, r \le n$. We propose two parallel algorithms based on different approaches, where one requires processors to operate in SIMD mode and the other in MIMD mode. Our SIMD algorithm runs in time $O((\log n \log\log n) \min\{r, \log n\})$ when $p = \Theta(n)$, and $O(n^{\epsilon} \min\{r, (1-\epsilon)\log n\})$ when $p = n^{\epsilon}$ for any $0 < \epsilon < 1$, where the latter is cost optimal when $r \ge p$. Our MIMD algorithm runs in $O(\log n \log\log n \log r)$ time when $p = \Theta(n)$, and in $O(n^{\epsilon} \log r)$ time when $p = n^{\epsilon}$ for any $0 < \epsilon < 1$, which is cost optimal for any $r$. Both algorithms are more efficient than the straightforward solutions and than direct simulation of the optimal EREW algorithm.
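To make the multiselection problem itself concrete, the following is a minimal sequential sketch in Python. It is not the hypercube algorithm of the abstract: it assumes a randomized quickselect-style partition, and the function name `multiselect` and its 1-based rank convention are illustrative choices rather than anything taken from the paper.

```python
import random


def multiselect(items, ranks):
    """Return {rank: element} for each 1-based rank in `ranks`.

    Sequential illustration only (assumed design, not the paper's method):
    partition around a random pivot and recurse only into the parts that
    still contain requested ranks.
    """
    items = list(items)
    wanted = sorted(set(ranks))
    if any(k < 1 or k > len(items) for k in wanted):
        raise ValueError("ranks must lie in 1..len(items)")
    result = {}

    def solve(values, pending, offset):
        # `offset` counts elements known to be smaller than everything in `values`.
        if not pending:
            return
        pivot = random.choice(values)
        less = [v for v in values if v < pivot]
        equal = [v for v in values if v == pivot]
        greater = [v for v in values if v > pivot]
        lo = offset + len(less)   # ranks <= lo are answered inside `less`
        hi = lo + len(equal)      # ranks in (lo, hi] are equal to the pivot
        solve(less, [k for k in pending if k <= lo], offset)
        for k in pending:
            if lo < k <= hi:
                result[k] = pivot
        solve(greater, [k for k in pending if k > hi], hi)

    solve(items, wanted, 0)
    return result
```

Calling `multiselect([9, 1, 7, 3, 5], [1, 3, 5])` returns the smallest, median, and largest elements in one pass over the recursion tree, instead of running a separate selection per rank.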
ISBN: 0818682596 (print)
In this paper, a Wavelength Division Multiple Access (WDM) ring is proposed for interconnection in workstation clusters or parallel machines. The network consists of ring-connected routers, each of which selectively passes signals carried on certain wavelengths. Signals on other wavelengths are first converted to electrical signals and then retransmitted on different wavelengths. Wavelengths are assigned to divisors of the number of nodes in the system. Using the regular WDM ring with imaginary nodes, the diameter and average distance are reduced even if the number of nodes has few divisors. The WDM ring provides a better diameter and average distance than the unidirectional torus. Although its diameter and average distance are worse than those of Shuffle Net, the physical structure of the WDM ring is simple and the available number of nodes is flexible.
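The effect of divisor-based shortcuts on diameter and average distance can be explored with a toy model. The sketch below assumes, purely for illustration, that every node on a unidirectional ring of `n` nodes can make one hop of length `d` for each proper divisor `d` of `n`; the abstract does not specify the actual routing rules, so the helper names `divisors` and `ring_metrics` and this hop model are hypothetical.

```python
from collections import deque


def divisors(n):
    """Proper divisors of n (1 is always included for n > 1)."""
    return [d for d in range(1, n) if n % d == 0]


def ring_metrics(n, hops):
    """Diameter and average distance of a unidirectional ring with jump links.

    Assumed toy model: from node u, one hop reaches (u + h) % n for each h
    in `hops`. The graph looks the same from every node, so a breadth-first
    search from node 0 is enough.
    """
    dist = {0: 0}
    queue = deque([0])
    while queue:
        u = queue.popleft()
        for h in hops:
            v = (u + h) % n
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    diameter = max(dist.values())
    average = sum(dist.values()) / (n - 1)
    return diameter, average
```

Comparing `ring_metrics(12, [1])` with `ring_metrics(12, divisors(12))` shows how much the extra hop lengths shorten paths in this simplified setting; a prime `n` has only the divisor 1, which is the case the abstract's imaginary nodes are meant to address.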
Inter-module bandwidth is one of the major constraints on the performance of current and future parallel systems. In this paper, we propose and evaluate several high-performance bus-based parallel architectures, including bus-based cyclic networks (BCNs) and quotient cyclic networks (BQCNs), which are particularly efficient in view of their respective inter-module communication patterns. The inter-cluster connection in a BCN is defined on a set of nodes whose addresses are cyclic shifts of one another. The node degree of a basic BCN is 3, while those of BQCNs and enhanced BCNs can vary from a small constant (e.g., 2) to as large as required, thus providing flexibility and an effective tradeoff between cost and performance. A variety of algorithms can be performed efficiently on these networks, demonstrating the versatility of BCNs and BQCNs.
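The phrase "addresses that are cyclic shifts of one another" can be illustrated by enumerating such address sets. The sketch below assumes addresses are fixed-length digit tuples; how the paper actually maps these sets onto buses and clusters is not stated in the abstract, and the names `cyclic_shifts` and `shift_classes` are hypothetical.

```python
from itertools import product


def cyclic_shifts(address):
    """All rotations of a digit tuple, e.g. (0, 0, 1) -> 3 distinct rotations."""
    return {address[i:] + address[:i] for i in range(len(address))}


def shift_classes(radix, length):
    """Partition all `length`-digit addresses into cyclic-shift classes.

    Illustrative assumption: each class would be one set of nodes joined by
    an inter-cluster connection.
    """
    seen = set()
    classes = []
    for digits in product(range(radix), repeat=length):
        if digits in seen:
            continue
        group = cyclic_shifts(digits)
        seen |= group
        classes.append(sorted(group))
    return classes
```

For binary addresses of length 3, `shift_classes(2, 3)` groups the eight addresses into classes such as `{001, 010, 100}`, while `000` and `111` each form a class of their own.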
A novel, comprehensive, and coherent approach to increasing instruction-level parallelism (ILP) is devised. The key new tool in our envisioned system update is the addition of a parallel prefix-sum (PS) instruction, which will have an efficient hardware implementation, to the instruction-set architecture. This addition provides, for the first time, a concrete way of recruiting the whole knowledge base of parallel algorithms for that purpose. The potential increase in ILP is demonstrated by experimental results for a test application. The main technical contribution is in the form of a "completeness theorem". Perhaps surprisingly, the current abstract proves that in an envisioned system which employs parallel PS functional units, a proper use of a serial programming language suffices for the following. With moderate effort, one can program a parallel algorithm (in a serial language) so that a parallelizing compiler (even without run-time methods!) will be able to extract the same (i.e., "complete") ILP from such serial code as from code written in a parallel language. Alternatively, rather than have the programmer produce the serial code, a precompiler could derive it from a parallel language. The most interesting idea in the proof is the reliance on the new parallel PS for circumventing collision ambiguity in references to memory. Other new ideas in the paper include the hardware design of a prefix-sum unit and an on-line algorithm for high-bandwidth register files. An informal upshot of this paper is the following general insight: to accommodate parallelism in uniprocessor systems (from algorithms to ILP), it is sufficient to add (and, of course, incorporate) parallel prefix-sum functional units to standard serial system designs.
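One well-known way in which prefix sums remove collisions between memory writes is array compaction, sketched below in Python. This is a generic textbook use of prefix sums, assumed here for illustration; the abstract does not give the semantics of the proposed PS instruction, and the function name `compact` is hypothetical.

```python
from itertools import accumulate


def compact(values, keep):
    """Pack the elements satisfying `keep` into a dense output list.

    An exclusive prefix sum over 0/1 flags gives every kept element its own
    destination slot, so no two writes target the same location.
    """
    flags = [1 if keep(v) else 0 for v in values]
    inclusive = list(accumulate(flags))
    # Exclusive prefix sum: number of kept elements strictly before each position.
    slots = [s - f for s, f in zip(inclusive, flags)]
    out = [None] * (inclusive[-1] if inclusive else 0)
    # Written serially here, but each iteration touches a distinct slot,
    # so the iterations could be executed in any order or in parallel.
    for v, f, s in zip(values, flags, slots):
        if f:
            out[s] = v
    return out
```

Because the destination of each write is known from the prefix sum alone, a compiler or hardware scheduler does not need to guess whether two iterations might write to the same address.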