ISBN: 0769523129 (print)
The parallel-Horus framework, developed at the University of Amsterdam, is a unique software architecture that allows non-expert parallel programmers to develop fully sequential multimedia applications for efficient execution on homogeneous Beowulf-type commodity clusters. Previously obtained results for realistic, but relatively small-sized applications have shown the feasibility of the parallel-Horus approach, with parallel performance consistently being found to be optimal with respect to the abstraction level of message passing programs. In this paper we discuss the most serious challenge parallel-Horus has had to deal with so far: the processing of over 184 hours of video included in the 2004 NIST TRECVID evaluation, i.e., the de facto international standard benchmark for content-based video retrieval. Our results and experiences confirm that parallel-Horus is a very powerful support tool for state-of-the-art research and applications in multimedia processing.
Since the organizers of this conference refused to accept my protestations that I didn't have anything meaningful to contribute, I acquiesced and accepted their invitation. When sobriety returned, I realized I had to (a) first understand what distributed processing is and what it is not, and (b) see if my newfound knowledge could produce anything useful. In this talk I propose to give you my understanding of distributed processing, how it differs fundamentally from non-distributed processing, and show that in this world of continually increasing complexity, it is the only paradigm that makes sense. Not surprisingly, the effective (and ineffective) uses of distributed processing are all around us. Several enlightening examples: an effective military, organized religion, the American game of football, and university education. I will discuss each. I submit that these examples have a unifying theory that could be relevant to microarchitecture. I describe it as a two-phased approach: preparation and execution. The microarchitect historically has designed central processing units. But to keep pace with current and future design points, including processing power that device technology continues to provide, distributed processing technology will become essential. Along the way, I will introduce and explain the relevance of my levels of transformation, multichip processors, the refrigerator, and data flow.
For large scale computational grids, where the resources are distributed over areas spanning thousands of miles, achieving efficiency of collective communication operations such as broadcast becomes of paramount importance. We propose a broadcast algorithm constructed in terms of point-to-point communication operations that occur according to a topology determined using a generalization of the single source shortest path algorithm such that the point-to-point operations are ordered according to a heuristic. We show that the proposed approach is competitive with, and in some cases exceeds, the performance of the broadcast operation implemented in MPICH-G2, the most used grid-enabled implementation of MPI.
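The abstract does not spell out the generalized shortest-path construction or the ordering heuristic, so the following is only a minimal sketch of the general idea under stated assumptions: plain Dijkstra stands in for the generalized algorithm, and "send to the child with the largest subtree first" stands in for the unspecified heuristic. The function names and the latency-map input format are hypothetical.

```python
import heapq
from collections import defaultdict


def shortest_path_tree(latency, root):
    """Plain Dijkstra (an assumption; the paper uses a generalization).

    latency maps each node to {neighbor: link_cost}.
    Returns {node: parent} describing the shortest-path tree from root.
    """
    dist = {root: 0.0}
    parent = {root: None}
    heap = [(0.0, root)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in latency[u].items():
            nd = d + w
            if v not in dist or nd < dist[v]:
                dist[v] = nd
                parent[v] = u
                heapq.heappush(heap, (nd, v))
    return parent


def broadcast_schedule(latency, root):
    """Return the ordered list of point-to-point sends (sender, receiver)."""
    parent = shortest_path_tree(latency, root)
    children = defaultdict(list)
    for node, par in parent.items():
        if par is not None:
            children[par].append(node)

    def subtree_size(n):
        return 1 + sum(subtree_size(c) for c in children[n])

    sends = []
    frontier = [root]
    while frontier:
        next_frontier = []
        for u in frontier:
            # Hypothetical heuristic: serve the largest subtree first,
            # breaking ties by node name for determinism.
            ordered = sorted(children[u], key=lambda c: (-subtree_size(c), c))
            sends.extend((u, c) for c in ordered)
            next_frontier.extend(ordered)
        frontier = next_frontier
    return sends
```

For instance, with a root linked cheaply to nodes `a` and `b`, and `b` linked to `c` and `d`, the sketch sends to `b` before `a` because `b` still has to forward the message to two further nodes.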
The use of toolkits for writing parallel and/or distributed applications has been shown to greatly enhance developer productivity. Such an approach hides many of the complexities associated with writing these applications, rather than relying solely on programming language aids and parallel library support, such as MPI or PVM. In this work, we evaluate three different middleware systems that have been used to implement a computation- and I/O-intensive data analysis application from the domain of computer vision. This study shows the benefits and overheads associated with each of the middleware systems, in different homogeneous computational environments and with different workloads. Our results lead the way toward being able to make better decisions for tuning the application environment, for selecting the appropriate middleware, and also for designing more powerful middleware systems to efficiently build and run highly complex applications in both parallel and distributed computing environments.
Previously, DAG scheduling schemes used the mean (average) of computation or communication time in dealing with temporal heterogeneity. However, it is not optimal to consider only the means of computation and communication times in DAG scheduling on a temporally (and spatially) heterogeneous distributed computing system. In this paper, it is proposed that the second order moments of computation and communication times, such as the standard deviations, be taken into account in addition to their means, in scheduling "stochastic" DAGs. An effective scheduling approach which accurately estimates the earliest start time of each node and derives a schedule leading to a shorter average parallel execution time has been developed. Through an extensive computer simulation, it has been shown that a significant improvement (reduction) in the average parallel execution times of stochastic DAGs can be achieved by the proposed approach.
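A small worked example may help show why means alone mislead. A node can start only after its slowest predecessor finishes, and the expected value of a maximum is larger than the maximum of the expected values whenever the finish times vary. The sketch below is not the paper's estimator, which the abstract does not describe; it uses Clark's classical formula for the expected maximum of two random variables, under the assumption that the two finish times are independent and normally distributed. All function names are hypothetical.

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def mean_only_start(mu1, mu2):
    """Earliest start time estimated from the means alone."""
    return max(mu1, mu2)


def expected_max(mu1, sd1, mu2, sd2):
    """Clark's formula for E[max(X, Y)], assuming X and Y are independent normals."""
    theta = math.sqrt(sd1 ** 2 + sd2 ** 2)
    if theta == 0.0:
        return max(mu1, mu2)  # no variability: reduces to the mean-only case
    alpha = (mu1 - mu2) / theta
    return (mu1 * normal_cdf(alpha)
            + mu2 * normal_cdf(-alpha)
            + theta * normal_pdf(alpha))


# Two predecessors, each finishing at time 10 on average with standard deviation 2.
naive = mean_only_start(10.0, 10.0)
aware = expected_max(10.0, 2.0, 10.0, 2.0)
print(f"mean-only estimate: {naive:.3f}, moment-aware estimate: {aware:.3f}")
```

With these inputs the mean-only estimate is 10, while the moment-aware estimate is about 11.13, so a scheduler that ignores the standard deviations would systematically start the successor too early in its plan.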
In distributed heterogeneous Grid environments the protocols used to exchange bits are crucial. As researchers work hard to discover the best new protocol for the Grid, application developers struggle with ways to use these new protocols. A stable, consistent, and intuitive framework is needed to aid in the implementation and use of these protocols. While the application must not be burdened with protocol details, some of them may need to be exposed to take advantage of potential optimizations. In this paper we examine how the Globus XIO API provides this framework. We explore the performance implications of using this abstraction layer and the benefits gained in application as well as protocol development.
A distributed system is a collection of computers that are connected via a communication network. Distributed systems have become commonplace due to the wide availability of low-cost, high-performance computers and network devices. However, the management infrastructure often does not scale well when distributed systems get very large. The main considerations in building a distributed system are the choice of network topology and the method used to construct the system, so as to optimize scalability and reliability, lower the cost of linking nodes together, minimize message delay in transmission, and simplify system resource management. We have developed a new distributed management system that is able to handle dynamic increases in system size, detect and recover from unexpected failures of system services, and manage system resources. The topologies used in the system are the tree-structured network and the ring-structured network.
This paper extends the previous work on the maximal allowable workload (MAW) problem [2] by investigating a resource allocation problem for distributed real-time systems that contain replicable applications. The systems may use multiple resources of a single type and be affected by multiple environmental factors. The approach searches for a feasible allocation that maximizes a user-defined metric of stability. Several algorithms were developed and experiments were conducted to demonstrate the relative strength of these algorithms. The results showed that Simulated Annealing provides results that are the closest to the optimal for maximizing environmental parameter settings. In addition, a modified greedy first-fit algorithm is shown to be the best performing algorithm for finding feasible allocations.
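The abstract does not say how the greedy first-fit algorithm was modified, so the sketch below shows only the textbook first-fit baseline for finding a feasible allocation: each application goes to the first host that still has enough capacity. The single-number demand and capacity model, and the name `first_fit`, are illustrative assumptions rather than the paper's formulation.

```python
def first_fit(demands, capacities):
    """Textbook first fit (not the paper's modified variant).

    demands: {application_name: resource_demand}
    capacities: list of host capacities, indexed by host number.
    Returns {application_name: host_index}, or None if some application
    cannot be placed on any host.
    """
    remaining = list(capacities)  # copy so the caller's list is untouched
    placement = {}
    for app, demand in demands.items():
        for host, free in enumerate(remaining):
            if demand <= free:
                remaining[host] -= demand
                placement[app] = host
                break
        else:
            return None  # no host could take this application
    return placement
```

For example, `first_fit({"a": 4, "b": 3, "c": 2}, [5, 5])` places `a` on host 0 and both `b` and `c` on host 1, while a single demand of 6 against the same hosts yields `None`.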
With the rapidly increasing diversity of parallel architectures and the increasing time and labor for developing parallel applications, the performance portability of parallel programs is becoming increasingly important and should be considered when designing parallel execution models, APIs, and runtime system software. This paper analyzes both code portability and performance portability of parallel programs based on the EARTH model - an event-driven fine-grain multi-threaded execution and architecture model. We discuss several design considerations of the EARTH system that contribute to the performance portability of parallel applications. Experiments with four representative benchmarks are conducted on several different parallel architectures, including two clusters listed in the 23rd supercomputer TOP500 list. The results demonstrate that EARTH-based programs can achieve robust performance portability across the selected hardware platforms without any code modification or tuning.
Most distributed Garbage Collection (DGC) algorithms are not complete as they fail to reclaim distributed cycles of garbage. Those that achieve such a level of completeness are very costly as they require either some kind of synchronization or consensus between processes. Others use mechanisms such as backtracking, global counters, a central server, distributed tracing phases, and/or impose additional load and restrictions on local garbage collection. All these approaches hinder scalability and/or performance significantly. We propose a solution to this problem, i.e., we describe a DGC algorithm capable of reclaiming distributed cycles of garbage asynchronously and efficiently. Our algorithm does not require any particular coordination between processes and it tolerates message loss. We have implemented the algorithm both on Rotor (a free source version of ***) and on OBIWAN (a platform supporting mobile agents, object replication and remote invocation); we observed that applications are not disrupted.