In this paper, we use an object recognition application as a case study to illustrate our parallelization Methodology, for architecture, design and development. Our goal was to identify a methodology with a streamline...
详细信息
ISBN:
(纸本)9780889868786
In this paper, we use an object recognition application as a case study to illustrate our parallelization Methodology, for architecture, design and development. Our goal was to identify a methodology with a streamlined process for problem analysis, design and development, and use this design methodology in developing a high quality parallel object recognition application, with an efficient and scalable implementation. Our parallelization methodology consists of application design, concurrency analysis, implementation, debugging and testing, and performance tuning phases. In this paper, we focus on the application design, concurrency analysis and implementation phases of the parallelization methodology, to illustrate selection and use of parallel design patterns, during design and implementation phases. In the concurrency analysis phase, we identified key application computations, and performed application decomposition and dependency analysis. During application design, appropriate parallel design patterns were mapped to the problem for efficient parallelization of computations. We illustrate how parallel design patterns were used to guide algorithmic choices during application design, and how design patterns were used in exploring the design space for an efficient implementation, while satisfying the constraints and requirements of the object recognition problem. Applications in the area of object recognition tend to have similar computational flows with variations in input size. A design pattern based approach to parallel software architecture provided us with a common vocabulary for design and development, and helped in meeting the requirements and constraints of the original problem. We found that using our parallelization methodology resulted in a streamlined process for project planning and execution. In this paper, we step through the application design process using parallel design patterns. We review related work in the area of object recognition, and provi
Long-running distributed applications that automate critical decision processes require Byzantine fault tolerance to ensure progress in spite of arbitrary failures. Existing replication protocols for data servers guar...
详细信息
ISBN:
(纸本)9780889866386
Long-running distributed applications that automate critical decision processes require Byzantine fault tolerance to ensure progress in spite of arbitrary failures. Existing replication protocols for data servers guarantee that externally requested operations execute correctly even if a bounded number of replicas fail arbitrarily. However, since these protocols only support passive state machine replication, they are insufficient to support continued correct execution of autonomous long-running distributed applications. Building on the Castro and Liskov Byzantine Fault Tolerance protocol for replicated state machines (CLBFT), we present a practical algorithm for Byzantine fault-tolerant execution of autonomous distributed applications. The algorithm supports replicated clients that actively execute application logic by issuing operation requests on replicated data servers as well as other replicated clients. Our work facilitates dynamic upgrades to replica groups, supports both synchronous and asynchronous operation requests, and provides fault isolation between replica groups with respect to both correctness and performance. The algorithm scales well to large replica groups with only twice the latency and message complexity as compared to CLBFT, which supports only unreplicated clients.
Adaptive task partitioning may introduce implicit hot spots in the computations which affect the parallel performance of the adaptive multivariate integration algorithm. To maintain parallel efficiency, one needs to b...
详细信息
ISBN:
(纸本)088986392X
Adaptive task partitioning may introduce implicit hot spots in the computations which affect the parallel performance of the adaptive multivariate integration algorithm. To maintain parallel efficiency, one needs to balance the load of the processors by distributing the computationally intensive parts of the problem domain. We study and improve the performance of the scheduler based or centralized load balancing method in the adaptive task partitioning algorithm implemented in ParInt.
Despite the well known advantages of distributed processing for intensive computations like simulation, frameworks often fail to exploit them. A distributed simulation is harder to develop than a sequential one, becau...
详细信息
ISBN:
(纸本)9780889866379
Despite the well known advantages of distributed processing for intensive computations like simulation, frameworks often fail to exploit them. A distributed simulation is harder to develop than a sequential one, because it is necessary to interface and map activities to processors and handle the ensuing communication and synchronization problems. Very often the designer has to explicitly specify extra information concerning distribution for the framework to make an effort to exploit parallelism. This paper presents Automated distributed Simulation (ADS), which allows the designer to forget about distribution concerns while benefiting from the advantages. ADS relies on the actor formalism. It is realized as an open source implementation for the Ptolemy II simulation framework. Experiments compare different topologies, granularities and number of blocks, achieving linear speedups for practical cases. We implement pipelining techniques so iterative models with purely sequential topologies can benefit from ADS.
This paper presents improvements of the parallel-FIMI method for statical load balancing of mining of all frequent itemsets on a distributed-memory (DM) parallel machine. This method probabilistically partitions the s...
详细信息
ISBN:
(纸本)9780889867048
This paper presents improvements of the parallel-FIMI method for statical load balancing of mining of all frequent itemsets on a distributed-memory (DM) parallel machine. This method probabilistically partitions the space of all frequent itemsets into partitions of approximately the same size. The improvements consist in paralelization of the approximate partitioning of the search space and of dynamic reordering of items during construction of prefix-based equivalence classes. The new versions of the method achieve nearly linear speedups up to 10 processors.
Sombrero is an object-oriented single address space operating system whose virtual address space can be distributed over multiple nodes. This paper presents recent additions to the design and implementation of the low...
详细信息
ISBN:
(纸本)9780889866386
Sombrero is an object-oriented single address space operating system whose virtual address space can be distributed over multiple nodes. This paper presents recent additions to the design and implementation of the lower level architecture of Sombrero. This design exploits Sombrero's object-oriented development model to provide an implementation of a lower-level architecture for the Alpha platform. Software representations of the hardware system's major components have been implemented. The existing middle level architectural modules now use the lower-level services, These include board and processor level hardware interfaces as well as interrupt processing, device driver and protocol component architectures. We contrast differences in the ways these things are designed and operate in a single address space environment with the way they are done in a conventional multiple address space operating system. Features covered include the use of passive servers, simple efficient thread switching and threads blocking within interrupts. Passive and active device drivers are described and the rational for each mode given. The performance of the Sombrero protocol stack was closely examined with attention to its passive service architecture and thread switching method. The current Sombrero Prototype is described.
In this work, we introduce a robust parallel surface-to-surface contact solution strategy for three dimensional frictional contact problems. A new algorithm is described which has improved overall capability with resp...
详细信息
ISBN:
(纸本)9780889866386
In this work, we introduce a robust parallel surface-to-surface contact solution strategy for three dimensional frictional contact problems. A new algorithm is described which has improved overall capability with respect to numerical robustness and spatial convergence rates, showing also good parallel efficiency. In part because numerical methods for contact problems typically result in ill-conditioned and highly unsymmetric stiffness matrices, direct methods for equation solving have typically been preferred in such applications. However, for large-scale contact numerical modeling, the direct method appears to be limited by its requirements for memory and computation time, which scale with the number of unknowns n as )(n(1.5)) and O(n(2)), respectively. In this paper, we have implemented a parallel contact algorithm using iterative solution methods, and have applied it to several numerical examples using an Intel Xeon based Linux cluster and an HP-Compaq AlphaServer with up to 512 processors. Relatively good scalable performance has been observed in the applications tested.
In this paper, we propose an algorithm to solve the group k-mutual exclusion (Gk-ME) problem for distributedsystems. The Gk-ME problem is concerned with controlling the concurrent accesses of some resource by at most...
详细信息
ISBN:
(纸本)088986392X
In this paper, we propose an algorithm to solve the group k-mutual exclusion (Gk-ME) problem for distributedsystems. The Gk-ME problem is concerned with controlling the concurrent accesses of some resource by at most k nodes with the constraint that no two distinct resources can be accessed simultaneously. The proposed algorithm utilizes a token circulation mechanism and does not require the nodes to have identifications. The delay and session switches of the proposed algorithm are Ω(n2) and Ω(n), respectively.
In many distributed applications, identifying the sequence of event occurrence is useful. Doing so requires local clocks to be properly synchronized with each other. This paper presents a method of pseudo-synchronizat...
详细信息
ISBN:
(纸本)9780889867048
In many distributed applications, identifying the sequence of event occurrence is useful. Doing so requires local clocks to be properly synchronized with each other. This paper presents a method of pseudo-synchronization of clocks by making use of the occurrence time of events included in direct "happened-before" relations. The basic idea is to approximate the actual clock shift between two clocks using the minimum transmission delay of messages between a pair of hosts. Evaluation results suggest that non-base clocks are closely synchronized with a base clock without violating the direct "happened-before" relations.
Group mutual exclusion (GME) was first introduced by Joung [5] as a generalization of the n-process mutual exclusion problem, and subsequently modeled as the Congenial Talking Philosophers problem. GME allows processe...
详细信息
ISBN:
(纸本)9780889866386
Group mutual exclusion (GME) was first introduced by Joung [5] as a generalization of the n-process mutual exclusion problem, and subsequently modeled as the Congenial Talking Philosophers problem. GME allows processes requesting for the same resource to concurrently access the shared resource. However, a process requesting a resource that is different from the one currently being used will not be able to access the requested resource at the same time. Two solutions to the GME problem for a ring network exist. One by Wu and Joung had an unbounded message size problem [11, 12], the second one by Cantarell, et. al. provided a bound for message size, but could potentially generate an unbounded number of messages [2, 3]. This work presents a GME algorithm for a token-passing network that solves both problems by providing a bound both to the number of messages and to message size. Analysis results of time and message complexities and of correctness are presented. Performance is evaluated by simulation experiments, which further validate complexity analyses.
暂无评论