ISBN: 081864222X (print)
Debugging distributed programs is much more difficult than debugging sequential programs. One reason is that communication among programs (processes) may happen concurrently and nondeterministically. Being able to analyze such communication events is therefore essential for any distributed program debugger. This paper describes the design and preliminary implementation of a layered distributed program debugger. The debugger helps a user locate bugs, analyze a distributed program, and fix bugs.
ISBN: 0818684038 (print)
The Loop-Level Process Control (LLPC) policy [9] dynamically adjusts the number of threads an application is allowed to execute based on the application's available parallelism and the overall system load. This study demonstrates the feasibility of incorporating the LLPC strategy into an existing commercial operating system and parallelizing compiler, and provides further evidence of the performance improvement that is possible using this dynamic allocation strategy. In this implementation, applications are automatically parallelized and enhanced with the appropriate LLPC hooks so that each application interacts with the modified version of the Solaris operating system. The parallelism of the applications is then adjusted dynamically and automatically when they are executed in a multiprogrammed environment, so that all applications obtain a fair share of the total processing resources.
ISBN: 9780769549712 (print)
The multiple access channel is a well-known communication model that captures properties of many network systems, such as Aloha multi-access systems, local area Ethernet networks, satellite communication systems, and packet radio networks. The fundamental aspect of this model is to provide efficient communication and computation in the presence of restricted access to the communication resource: at most one station can successfully transmit at a time, and a wasted round occurs when more than one station attempts to transmit at the same time. In this work we consider the problem of contention resolution in a multiple access channel in a realistic scenario in which up to k stations out of n join the channel at different times. The goal is to let at least one station transmit alone, which results in successful delivery of the message through the channel. We present three deterministic algorithms: two of them work under some constrained scenarios and achieve asymptotically optimal time complexity Theta(k log(n/k)), while the third, general algorithm accomplishes the goal in time O(k log n log log n).
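The channel rule stated in this abstract (a round succeeds only when exactly one station transmits) can be illustrated with a minimal Python sketch. This is not one of the three algorithms from the paper: it is a naive round-robin baseline that needs up to n rounds, and it assumes a global round counter shared by all stations. The names `channel_round`, `round_robin`, `join_times`, and `max_rounds` are hypothetical and chosen for illustration only.

```python
def channel_round(transmitters):
    """Return the station heard in this round, or None on silence or collision."""
    # The channel delivers a message only when exactly one station transmits.
    return next(iter(transmitters)) if len(transmitters) == 1 else None


def round_robin(n, join_times, max_rounds):
    """Naive baseline (assumption, not the paper's algorithm).

    Station s transmits only in rounds t with t % n == s, and only after it
    has joined the channel. Returns (round, station) for the first successful
    transmission, or None if none occurs within max_rounds.
    """
    for t in range(max_rounds):
        transmitters = {
            s for s, joined in join_times.items()
            if joined <= t and t % n == s
        }
        winner = channel_round(transmitters)
        if winner is not None:
            return t, winner
    return None


# Two of eight stations join: station 5 at round 0, station 2 at round 3.
first_success = round_robin(8, {5: 0, 2: 3}, max_rounds=16)
```

Because each station owns a distinct slot, collisions never occur here, but the cost grows linearly with n rather than with the number k of active stations, which is the gap the bounds in the abstract address.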
Image matching based on image feature pixels involves heavily iterated computation and frequent memory access. The key to increasing speed is to employ parallelism on either parallel machines or workstation clusters. This paper presents the development of a parallel image matching system that uses a divide-and-conquer method to implement the proposed hierarchical matching scheme on a networked workstation cluster. Our investigation shows that a distributed workstation cluster can best meet the demand for high computation and memory access in image processing. The performance of our proposed matching scheme is evaluated in terms of execution time.
ISBN: 0769516866 (print)
This work presents the design of the Coven framework for construction of Problem Solving Environments (PSEs) for parallel computers. PSEs are an integral part of modern high performance computing (HPC), and Coven attempts to simplify PSE construction. Coven targets Beowulf cluster parallel computers but is independent of any particular domain for the PSE. Multi-threaded parallel applications created with Coven are capable of supporting most of the constructs in a typical parallel programming language. Coven uses an agent-based front-end which allows multiple custom interfaces to be constructed. Examples of the use of Coven in the construction of prototype PSEs are shown, and the effectiveness of these PSEs is evaluated in terms of the performance of the applications they generate.
ISBN: 0818676833 (print)
Designing a good task allocation algorithm faces the challenge of allowing high levels of throughput, so that tasks are executed fast and processor parallelism is exploited, while still guaranteeing a low level of memory contention, so that performance does not suffer because of limitations on processor-to-memory bandwidth. In this work, we present a comparative study of throughput and contention guarantees provided by load balancing networks, a new class of distributed asynchronous algorithms for real-time task allocation in shared memory multiprocessors. Load balancing networks generalize balancing networks to accommodate tasks with varying completion times. On the theoretical side, we formulate precise and crisp definitions for capturing the quality of load balancing provided by general task allocation algorithms; we use these definitions to formally evaluate the throughput performance of specific constructions of load balancing networks that we propose. Furthermore, we introduce a formal, complexity-theoretic measure of contention incurred by tasks with varying completion times, and use it to analyse the contention performance of these constructions. Our theoretical results display precise and subtle trade-offs between throughput and contention performance for load balancing networks. On the practical side, we propose an experimental platform for evaluating the actual performance of load balancing networks through a series of carefully designed experiments that simulate these networks on real shared memory multiprocessor machines. Our experimental approach encompasses a rigorous methodology for randomly generating tasks that are not merely "random", but rather belong to common classes of tasks such as periodic and sporadic. Our experimental results reveal that load balancing networks substantially outperform classical, centralized methods for task allocation.
ISBN: 0818684038 (print)
Thread migration is established as a mechanism for achieving dynamic load sharing. However, fine-grained migration has not been used due to the high thread and messaging overheads. This paper describes a fine-grained thread migration system whose extensible event mechanism permits an efficient interface between threads and communications, without compromising the modularity and performance of either. Migration is supported by user-level primitives, based on which applications may implement different migration policies. The system is portable and can be used directly or serve as a compilation target for parallel languages. The system runs on a cluster of SMPs, and observed performance is orders of magnitude better than other reported measurements.
In this paper, we consider the problem of selection on coarse-grained distributed memory parallel computers. We discuss several deterministic and randomized algorithms for parallel selection. Experimental results on the CM-5 demonstrate that randomized algorithms are superior to their deterministic counterparts.
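For readers unfamiliar with the selection problem (finding the k-th smallest of n elements), a minimal sequential sketch of randomized selection in Python follows. It is not the parallel algorithm evaluated on the CM-5; it only illustrates the randomized pivot idea on a single processor. The function name `quickselect` and its parameters are assumptions made for this illustration.

```python
import random


def quickselect(values, k, rng=None):
    """Return the k-th smallest element (0-indexed) of values.

    Sequential randomized selection: pick a random pivot, keep only the
    partition that must contain the answer, and repeat.
    """
    if not 0 <= k < len(values):
        raise IndexError("k out of range")
    rng = rng or random.Random()
    items = list(values)
    while True:
        pivot = rng.choice(items)
        smaller = [v for v in items if v < pivot]
        equal = [v for v in items if v == pivot]
        if k < len(smaller):
            # The answer lies strictly below the pivot.
            items = smaller
        elif k < len(smaller) + len(equal):
            # The answer is the pivot itself.
            return pivot
        else:
            # Discard the pivot and everything below it, and shift k.
            k -= len(smaller) + len(equal)
            items = [v for v in items if v > pivot]


median = quickselect([7, 2, 9, 4, 1], 2)
```

In a distributed-memory setting each processor would hold only part of the data, so the partition counts would have to be combined across processors before deciding which side to keep; that coordination is outside this sketch.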
This paper discusses the benefits of incorporating a separate processor into a processor node of massively parallel architectures to handle messages, and proposes hardware solutions to provide atomicity between the main processor and the processor dedicated to handling messages. The proposed design is aimed at improving performance by delegating the responsibility of handling messages to a separate processor. The hardware modifications are kept to a minimum in order not to disturb the original functionality of a modern RISC processor.
ISBN: 0818684038 (print)
On-Line Analytical Processing (OLAP) techniques are used for data analysis and decision support systems. The multidimensionality of the underlying data is well represented by multidimensional databases. OLAP calculations can be used effectively for data mining in knowledge discovery, and high performance parallel systems are required to make such analysis interactive. Precomputed aggregate calculations in a Data Cube can provide efficient query processing for OLAP applications. In this article, we present parallel Data Cube construction on distributed-memory parallel computers from a relational database. The Data Cube is used for data mining of associations using Attribute Focusing. Results are presented for these on the IBM-SP2, which show that our algorithms and techniques are scalable to a large number of processors, providing a high performance platform for such applications.
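The idea of precomputed aggregates in a Data Cube can be sketched sequentially in Python: for d dimensions, one aggregate table is stored for each of the 2^d subsets of dimensions. This is only a naive single-machine illustration, not the parallel construction described in the abstract; the function `data_cube` and the sample `sales` rows are invented for this example.

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """Sum `measure` over every subset of `dims` (all 2^d group-bys).

    Returns a dict mapping each tuple of grouping dimensions to a dict
    from group key to aggregated total.
    """
    cube = {}
    for r in range(len(dims) + 1):
        for group in combinations(dims, r):
            totals = defaultdict(int)
            for row in rows:
                key = tuple(row[d] for d in group)
                totals[key] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical sample data, for illustration only.
sales = [
    {"region": "east", "product": "a", "units": 3},
    {"region": "east", "product": "b", "units": 5},
    {"region": "west", "product": "a", "units": 2},
]
cube = data_cube(sales, ("region", "product"), "units")
```

Once the cube is built, a query such as total units per region is a dictionary lookup (`cube[("region",)]`) rather than a scan of the base table. This sketch rescans the rows for every group-by, which is the kind of redundant work a more careful construction would avoid.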