ISBN (print): 0818681187
Music synthesis often relies on very computationally intensive algorithms. Various strategies have been used to deal with the complexity, including using simpler but more limited algorithms, using specialized hardware, and executing the algorithms in non-real-time for later playback. Although several implementations using parallel hardware exist, very little has been done with distributed implementations on clusters of workstations. Distributed music synthesis is typical of distributed multimedia applications that use multiple servers to perform computations generating high-bandwidth audio/video data from low-bandwidth control information. This work demonstrates distributed music synthesis and describes the effects of using different communication protocols and networks. The implementation is a version of the Csound music synthesis package that has been modified to distribute the synthesis load across multiple servers. The network performance results should also apply to applications that use a high-bandwidth pipeline of processes, as would be appropriate for audio and video post-processing.
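As a rough, hypothetical sketch of the control-in/audio-out pattern this abstract describes (not the paper's modified Csound), the following distributes note-event rendering across a pool of local worker processes standing in for remote synthesis servers:

```python
# Sketch: a coordinator ships small note events (low-bandwidth control
# data) to several workers, each of which renders a high-bandwidth audio
# block locally; the coordinator then mixes the results. The render
# function and process pool are illustrative stand-ins only.
import math
from multiprocessing import Pool

SAMPLE_RATE = 44100

def render_note(event):
    """Render one (frequency, duration) control event into PCM samples."""
    freq, dur = event
    n = int(dur * SAMPLE_RATE)
    return [math.sin(2 * math.pi * freq * t / SAMPLE_RATE) for t in range(n)]

def mix(blocks):
    """Sum the sample blocks returned by the workers."""
    n = max(len(b) for b in blocks)
    return [sum(b[i] for b in blocks if i < len(b)) for i in range(n)]

if __name__ == "__main__":
    score = [(261.6, 0.5), (329.6, 0.5), (392.0, 0.5)]  # C-E-G chord
    with Pool(processes=3) as servers:       # stand-ins for remote hosts
        audio = mix(servers.map(render_note, score))
    print(f"rendered {len(audio)} samples from {len(score)} control events")
```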
ISBN (print): 3540631380
Efficient scheduling of parallel tasks onto the processing elements of concurrent computer systems has been an important research issue for decades. Communication overhead often limits the speedup of parallel programs in distributed systems. Duplication Based Scheduling (DBS) has been proposed to reduce this overhead by duplicating remote parent tasks on local processing elements. DBS algorithms need task selection heuristics that decide the order in which tasks are considered for scheduling. This paper explores the speedup obtained by employing task duplication in the scheduling process and also investigates the effect of different task selection heuristics on a DBS algorithm. Our simulation results show that task duplication achieves considerable improvement in parallel execution time, but that different task selection heuristics do not affect the speedup significantly.
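The core duplication decision in a DBS-style scheduler can be sketched as below; the single-parent model and abstract time units are simplifying assumptions for illustration, not the paper's exact algorithm:

```python
# A child task's start time on a processor is bounded by the arrival of
# its parent's result. If re-running (duplicating) the parent locally
# finishes earlier than waiting for the network message, the parent is
# duplicated on the local processing element.

def start_time_with_duplication(parent_finish_remote, comm_cost,
                                parent_compute, local_ready):
    """Return (start_time, duplicated?) for a child whose parent ran remotely.

    parent_finish_remote: finish time of the parent on its own processor
    comm_cost:            time to ship the parent's result over the network
    parent_compute:       cost of re-running the parent on the local PE
    local_ready:          time at which the local PE becomes free
    """
    wait_for_message = parent_finish_remote + comm_cost
    duplicate_locally = local_ready + parent_compute
    if duplicate_locally < wait_for_message:
        return duplicate_locally, True
    return wait_for_message, False

# Example: a 3-unit re-run of the parent beats a 10-unit network transfer.
print(start_time_with_duplication(parent_finish_remote=5, comm_cost=10,
                                  parent_compute=3, local_ready=5))
# -> (8, True): the child starts at t=8 instead of t=15
```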
Several methods have been proposed in the literature for distributing data on distributed memory machines, oriented either to dense or to sparse structures. Many real applications, however, deal with both kinds of data jointly. This paper presents techniques for integrating dense and sparse array accesses in a way that optimizes locality and allows efficient loop partitioning within a data-parallel compiler. Our approach is evaluated through an experimental survey across several compilers and parallel platforms. The results demonstrate the benefits of the BRS sparse distribution when combined with CYCLIC in mixed algorithms, and the poor efficiency achieved by well-known distribution schemes when sparse elements arise in the source code.
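A minimal sketch of why a cyclically scattered sparse-row distribution composes well with a CYCLIC dense distribution; the owner functions below are illustrative assumptions, not the paper's BRS implementation:

```python
# Row i of the sparse matrix and element i of the dense vector land on
# the same processor, so a y = A*x style mixed loop touches mostly
# local data under both distributions.

P = 4  # number of processors (illustrative)

def owner_sparse_row(i):   # cyclic scatter of sparse rows
    return i % P

def owner_dense_elem(i):   # CYCLIC distribution of the dense array
    return i % P

# Every row/element pair with the same index is co-located:
assert all(owner_sparse_row(i) == owner_dense_elem(i) for i in range(1000))

def local_rows(n, me):
    """Row indices processor `me` owns under the cyclic scatter."""
    return range(me, n, P)

print(list(local_rows(10, me=1)))  # -> [1, 5, 9]
```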
Reduction operations are very useful in parallel and distributed computing, with applications in barrier synchronization, distributed snapshots, termination detection, global virtual time computation, and more. In the context of parallel discrete-event simulations, we have previously introduced a class of adaptive synchronization algorithms based on fast reductions. Here, we explore the implementation of fast reductions on a popular high-performance computing platform: a network of workstations. The specific platform is a set of Pentium Pro PCs running the Linux operating system, interconnected by Myrinet, a Gbps network. The general reduction model on which our synchronization algorithms are based is introduced first, followed by a description of how this model can be implemented. We discuss several design trade-offs that must be made to achieve the driving goal of high-speed reductions, and provide innovative algorithms to meet the correctness and performance requirements of the reduction model.
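As a sketch of the reduction pattern such synchronization algorithms build on, here is a min-reduction over per-workstation local virtual times; the in-memory pairwise combining stands in for what would really be log(p) rounds of Myrinet messages:

```python
# Each workstation contributes a local value (e.g., its local virtual
# time) and all of them must learn the global minimum, the classic
# global virtual time (GVT) computation.

def tree_reduce(values, op):
    """Combine values pairwise in ceil(log2(p)) rounds, as a reduction
    tree over p workstations would."""
    while len(values) > 1:
        half = (len(values) + 1) // 2
        values = [op(values[i], values[i + half]) if i + half < len(values)
                  else values[i]
                  for i in range(half)]
    return values[0]

local_virtual_times = [42.0, 17.5, 99.1, 23.3, 64.2]  # one per workstation
gvt = tree_reduce(local_virtual_times, min)
print(gvt)  # -> 17.5, a lower bound no simulator process may fall behind
```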
The range tree is a fundamental data structure for multidimensional point sets and, as such, is central to a wide range of geometric and database applications. In this paper, we describe the first non-trivial adaptation of range trees to the parallel distributed memory setting (BSP-like models). Given a set L of n points in d-dimensional Cartesian space, we show how to construct on a coarse grained multicomputer a distributed range tree T in time O(s/p + T_c(s,p)), where s = n log^{d-1} n is the size of the sequential data structure and T_c(s,p) is the time to perform an h-relation with h = Θ(s/p). We then show how T can be used to answer a given set Q of m = O(n) range queries in time O((s log n)/p + T_c(s,p)) for the associative-function mode and O((s log n)/p + T_c(s,p) + k/p) for the report mode, where k is the number of results to be reported. These parallel construction and search algorithms are both highly efficient, in that their running times are the sequential time divided by the number of processors, plus a constant number of parallel communication rounds.
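For intuition, here is a minimal sequential 2-d range tree with a counting query (counting with + is one instance of the associative-function mode); it illustrates the structure T being distributed, not the coarse-grained construction itself:

```python
# A balanced tree over points sorted by x; each node keeps its subtree's
# points sorted by y. A query decomposes [x1,x2] into O(log n) canonical
# subtrees and binary-searches each one's y-array: O(log^2 n) per query.
import bisect

class Node:
    __slots__ = ("xs", "ys", "left", "right")
    def __init__(self, pts):               # pts must be sorted by x
        self.xs = [p[0] for p in pts]
        self.ys = sorted(p[1] for p in pts)  # secondary structure
        if len(pts) > 1:
            mid = len(pts) // 2
            self.left, self.right = Node(pts[:mid]), Node(pts[mid:])
        else:
            self.left = self.right = None

def count(node, x1, x2, y1, y2):
    """Count points with x in [x1,x2] and y in [y1,y2]."""
    if node is None or x2 < node.xs[0] or node.xs[-1] < x1:
        return 0
    if x1 <= node.xs[0] and node.xs[-1] <= x2:   # canonical subtree:
        lo = bisect.bisect_left(node.ys, y1)     # search its y-array only
        hi = bisect.bisect_right(node.ys, y2)
        return hi - lo
    return count(node.left, x1, x2, y1, y2) + count(node.right, x1, x2, y1, y2)

root = Node(sorted([(2, 3), (5, 1), (7, 8), (4, 4), (9, 2)]))
print(count(root, 3, 8, 2, 8))  # -> 2, the points (4,4) and (7,8)
```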
The Internet, best known to most users as the World-Wide Web, continues to expand at an amazing pace. We propose a new infrastructure to harness combined resources, such as CPU cycles or disk storage, and make them available to everyone interested. This infrastructure has the potential to support parallel supercomputing applications involving thousands of cooperating components. Our approach is based on recent advances in Internet connectivity and on the implementation of safe distributed computing embodied in languages such as Java. We have developed a prototype of a global computing infrastructure, called SuperWeb, that consists of hosts, brokers, and clients. Hosts register a fraction of their computing resources (CPU time, memory, bandwidth, disk space) with resource brokers; client computations are then mapped by the broker onto the registered resources. We examine an economic model for trading computing resources and discuss several technical challenges associated with such a global computing environment.
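A toy sketch of the host/broker/client interaction described above; the class names, resource fields, and first-fit placement policy are all assumptions made for illustration, not SuperWeb's actual design:

```python
# Hosts register a fraction of their resources with the broker; the
# broker maps each client job onto a host that can satisfy it.

class Broker:
    def __init__(self):
        self.hosts = {}                      # host id -> offered resources

    def register(self, host_id, cpu_mhz, mem_mb, disk_mb):
        """A host offers part of its CPU, memory, and disk to the pool."""
        self.hosts[host_id] = {"cpu": cpu_mhz, "mem": mem_mb, "disk": disk_mb}

    def place(self, job):
        """Map a client job onto the first host meeting its requirements."""
        for host_id, res in self.hosts.items():
            if all(res[k] >= job[k] for k in job):
                return host_id
        return None                          # no registered host suffices

broker = Broker()
broker.register("alice-pc", cpu_mhz=200, mem_mb=64, disk_mb=500)
broker.register("bob-ws", cpu_mhz=300, mem_mb=128, disk_mb=2000)
print(broker.place({"cpu": 250, "mem": 100}))   # -> "bob-ws"
```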
Studies done with academic CC-NUMA machines and simulators indicate good potential for application performance. Our goal, therefore, is to investigate whether the CONVEX Exemplar, a commercial distributed shared memory machine, lives up to the expected potential of CC-NUMA machines, and if not, to understand which architectural or implementation decisions make it less efficient. On evaluating the delivered performance of the Exemplar, we find that, while a moderate-scale Exemplar machine works well for several applications, it does not for some important classes. Further, performance was affected by five fundamental characteristics of the machine, all stemming from basic implementation and design choices made on the Exemplar. These are: the effect of processor clustering together with limited node-to-network bandwidth, the effect of tertiary caches, the limited user control over data placement, the sequential memory consistency model together with a cache-based cache coherence protocol, and, lastly, longer remote latencies.
In this paper, we evaluate the use of software distributed shared memory (DSM) on a message passing machine as the target for a parallelizing compiler. We compare this approach to compiler-generated message passing, hand-coded software DSM, and hand-coded message passing. For this comparison, we use six applications: four regular and two irregular. Our results are gathered on an 8-node IBM SP/2 using the TreadMarks software DSM system. We use the APR shared-memory (SPF) compiler to generate the shared memory programs and the APR XHPF compiler to generate the message passing programs. The hand-coded message passing programs run with the IBM PVMe optimized message passing library. On the regular programs, both the compiler-generated and the hand-coded message passing outperform the SPF/TreadMarks combination: the compiler-generated message passing by 5.5% to 40%, and the hand-coded message passing by 7.5% to 49%. On the irregular programs, the SPF/TreadMarks combination outperforms the compiler-generated message passing by 38% and 89%, and only slightly underperforms the hand-coded message passing, differing by 4.4% and 16%. We also identify the factors that account for the performance differences, estimate their relative importance, and describe methods to improve performance.
The purpose of adaptive fault-tolerance (AFT) is to meet dynamically and widely changing fault-tolerance requirements by efficiently and adaptively utilizing a limited and dynamically changing amount of available redundant processing resources. In this paper we present one concrete AFT scheme, named the adaptable distributed recovery block (ADRB) scheme, which extends the Distributed Recovery Block (DRB) scheme for reliable execution of real-time applications with tolerance of both hardware and software faults in distributed/parallel computer systems. An ADRB station dynamically switches its operating mode in response to significant changes in resource and application modes; different operating modes have different resource requirements and yield different fault-tolerance capabilities. A modular implementation model for the ADRB scheme is also presented. An efficient execution support mechanism for the ADRB scheme has been implemented as part of a timeliness-guaranteed kernel developed at the University of California, Irvine.
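The recovery-block execution underlying the DRB/ADRB family can be sketched as follows; the two-mode switch shown is a simplified stand-in for ADRB's adaptation to changing resource availability, not the paper's scheme:

```python
# A primary routine runs first and its output is checked by an acceptance
# test; on failure, an alternate routine takes over. When redundant
# resources are scarce ("reduced" mode), the fallback is skipped and the
# failure is surfaced instead, trading fault tolerance for resources.

def recovery_block(primary, alternate, acceptance_test, x, mode="full"):
    """Run primary, validate, and fall back to alternate if mode allows."""
    try:
        result = primary(x)
        if acceptance_test(x, result):
            return result
    except Exception:
        pass                              # treat a crash as a failed test
    if mode == "full":                    # redundancy available: retry
        result = alternate(x)
        if acceptance_test(x, result):
            return result
    raise RuntimeError("both try blocks failed the acceptance test")

# Example: fast but fragile primary, slower but safe alternate.
fast_sqrt = lambda x: x ** 0.5
safe_sqrt = lambda x: x ** 0.5 if x >= 0 else 0.0
test = lambda x, r: abs(r * r - max(x, 0)) < 1e-6
print(recovery_block(fast_sqrt, safe_sqrt, test, 2.0))  # -> 1.41421...
```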
Searching is one of the most important algorithmic problems, used as a subroutine in many applications. Accordingly, designing search algorithms has been at the center of research on data structures for decades. In this p...