ISBN: 9780889867741 (print)
To meet the demand for a higher performance-to-power ratio, most chip vendors are adding multicore processors to their product lines. Multicore processors are frequently deployed with multilevel cache memories. Parallel thread execution on such a multicore system is difficult because cache sharing must be managed to achieve the best performance. Because execution time becomes less predictable, supporting real-time applications on multicore systems with multilevel caches is a challenge. Studies show that predictability can be improved using cache locking techniques. However, locking the entire level-1 cache may be inefficient when the data is small compared with the cache size. Also, way locking at the level-1 cache is not permitted on some processors (such as the PowerPC 750GX), whereas way locking at the level-2 cache is possible. By locking at the level-2 cache, the Xenon processor achieves an effect similar to that of the local storage used by the Cell SPEs. In this work, we simulate a multicore parallel computing system with two levels of cache to explore the impact of level-2 cache locking on performance, power consumption, and predictability. Experimental results show that performance and predictability can be increased and power consumption decreased by adding a level-2 cache locking mechanism to an efficient cache sharing structure.
ISBN: 9780889866386 (print)
Clusters of nondedicated heterogeneous nodes promise high utilization and performance. Market-based resource allocation allows effective, decentralized management and motivates participants to contribute to the functionality of a cluster. The major issues for a user who wants to execute a task on a nondedicated cluster are ensuring data integrity and reasonably evaluating a target processor. We have proposed an auditing mechanism that allows a user to establish suitable prices for cluster computational resources in an untrusted environment of heterogeneous nondedicated clusters. During price evaluation, a user can detect resources that behave incorrectly.
ISBN: 9780889866379 (print)
In this paper we propose a new load balancing algorithm for the grid computing service. The proposed algorithm is based on the CPU speed of the workers in the grid system. We developed a simulation model using NS2 to evaluate the performance of the algorithm. Our simulation results show that the algorithm behaves in an asymptotically optimal way.
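The abstract above does not give the algorithm itself, so the following is only a minimal sketch of one plausible reading of "based on the CPU speed of the workers": tasks are split in proportion to each worker's speed. The function name `assign_tasks`, the largest-remainder rounding, and the example speeds are all assumptions made for illustration, not the authors' method.

```python
def assign_tasks(num_tasks, cpu_speeds):
    """Split num_tasks among workers in proportion to their CPU speeds.

    Hypothetical illustration only: the rounding rule (largest remainder)
    is an assumption, not taken from the paper.
    """
    total = sum(cpu_speeds)
    # Ideal (possibly fractional) share of the work for each worker.
    ideal = [num_tasks * speed / total for speed in cpu_speeds]
    # Start from the integer part of each ideal share.
    shares = [int(x) for x in ideal]
    # Give the leftover tasks to the workers with the largest remainders.
    leftover = num_tasks - sum(shares)
    order = sorted(range(len(cpu_speeds)),
                   key=lambda i: ideal[i] - shares[i],
                   reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # Three workers; the second is assumed to be twice as fast as the others.
    print(assign_tasks(100, [1.0, 2.0, 1.0]))
```

Under this assumption a worker that is twice as fast receives roughly twice as many tasks, so all workers would finish at about the same time if the tasks were of equal size.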
ISBN: 9780889866386 (print)
This paper describes a novel approach to deterministic multithreading for active replication of Java objects. Unlike other existing approaches, the presented deterministic thread scheduler fully supports the native Java synchronisation mechanisms, including reentrant locks, condition variables, and time bounds on wait operations. Furthermore, this paper proposes source-code transformation as a novel approach for intercepting Java synchronisation statements. This allows the reuse of existing object implementations and simplifies application development.
ISBN: 9780889866386 (print)
We have developed a distributed asynchronous Web-based training system. To improve the scalability and robustness of this system, all contents, together with a function that scores users' answers, are implemented as mobile agents. These agents are distributed across computers and can be obtained through a P2P network based on a modified Content-Addressable Network. In this system, the service as a whole remains available even if some computers break down, but contents are lost when the agent holding them disappears. In this study, we address this problem by distributing backups of agents to other computers. If the failure of a computer is detected, another computer continues the service using the backups of the agents that belonged to the failed computer. The developed algorithm is examined experimentally.
ISBN: 9780889866386 (print)
The efficient characterization of adaptive parallel applications is usually challenging due to their complexity and large problem size. Unlike traditional profiling approaches, which trace events or determine performance parameters for subroutines, the approach described in this paper attempts to discover the inherent adaptivity of parallel applications mapped to the computation domain/mesh. This adaptivity is independent of the runtime environment, so it can aid the performance tuning of parallel applications, especially dynamic load balancing and repartitioning. Our profiling scheme requires only a single execution of the target program on any platform to generate a sequence of timestamped traces. The traces can then be fed to simulations under various system configurations, independently of the real application. Preliminary experiments have been performed to evaluate the proposed profiling techniques.
ISBN: 9780889866386 (print)
Distributed service architectures are mandatory to handle the platform scale and dynamicity that hinder the development of grid and P2P applications. These large-scale distributed applications are difficult to design, develop and tune because of both theoretical and practical issues. This paper presents the GRAS framework, which allows developers to first implement and experiment with such an infrastructure in simulation, benefiting from a controlled environment. The infrastructure can then be deployed in situ without code modification. We detail our design goals and contrast them with the state of the art. We study the exchange of a message (from the Pastry protocol) using either GRAS or several other solutions. We quantify both the code complexity and the performance and find that GRAS performs better according to both metrics.
ISBN: 9780889866386 (print)
This paper studies how to select a subtree with exactly k leaves and a diameter of at most l that minimizes the distance from the farthest vertex to the subtree. We call such a subtree a (k, l)-center of a tree network. We propose an efficient parallel algorithm for finding a (k, l)-center of a tree network. The algorithm runs on the EREW PRAM in O(log n) time using O(n) work.
ISBN: 9780889867741 (print)
A stabilizing system guarantees that, regardless of the current configuration, the system reaches a legal configuration in a bounded number of steps and the system configuration remains legal thereafter. A stabilizing system that maintains no explicit variables in its processes is referred to as an inherently stabilizing system; all of its system states are therefore legal by construction. Because of this attribute, inherently stabilizing systems are immune to transient faults and do not experience any delay due to arbitrary system initialization. We view a fault that perturbs the system configuration but not the program as a transient fault. These features make inherently stabilizing distributed protocols desirable for peer-to-peer, sensor and mobile networks. Hypercubes, star networks and their variations, which provide an increased degree of scalability, were initially designed for parallel networks. However, their scalability and the presence of multiple disjoint paths in these topologies make them viable alternatives to existing peer-to-peer and sensor network topologies. In this paper, we propose an inherently stabilizing algorithm for delivering messages over all node-disjoint paths from one process to another in star networks. The proposed algorithm has numerous applications, including VLSI layout, reliable network routing, secure message transmission, and network survivability. The proposed routing algorithm is optimal with respect to its state space and the lengths of the node-disjoint paths.
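For readers unfamiliar with the topology named in the abstract above, here is a small sketch of the standard star graph S_n: nodes are the permutations of n symbols, and two nodes are adjacent when one is obtained from the other by swapping the first symbol with the symbol in some other position. Every node therefore has n - 1 neighbors, so at most n - 1 node-disjoint paths can leave a process. The function name `star_neighbors` is an illustrative choice; this is not the paper's routing algorithm, only the adjacency rule it operates on.

```python
from itertools import permutations


def star_neighbors(node):
    """Return the neighbors of `node` (a tuple of symbols) in the star graph.

    Each neighbor is obtained by swapping the first symbol with the symbol
    at position i, for i = 1 .. n-1.
    """
    neighbors = []
    for i in range(1, len(node)):
        swapped = list(node)
        swapped[0], swapped[i] = swapped[i], swapped[0]
        neighbors.append(tuple(swapped))
    return neighbors


if __name__ == "__main__":
    # In S_4 every one of the 24 nodes has exactly 3 distinct neighbors.
    for node in permutations((1, 2, 3, 4)):
        assert len(set(star_neighbors(node))) == 3
    print(star_neighbors((1, 2, 3, 4)))
```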
ISBN: 9780889866386 (print)
As new genes are sequenced, it is common for molecular biologists to compare the new gene's DNA to known sequences. One simple form of DNA sequence comparison is done by solving the Longest Common Subsequence (LCS) problem. In this paper, we propose a parallel algorithm and specialized FPGA-based processor (the associative ASC Processor with reconfigurable 2D mesh) to solve the exact and approximate match LCS problems. This solution uses inexpensive hardware and can be reconfigured as new analysis techniques are developed, making it particularly attractive for processing biosequences.
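As a point of reference for the LCS problem mentioned in the abstract above, the following is a minimal sketch of the textbook sequential dynamic-programming solution for the exact-match case. It is not the parallel associative algorithm or the FPGA design the paper proposes; the function name `lcs` and the example sequences are illustrative assumptions.

```python
def lcs(a, b):
    """Return one longest common subsequence of strings a and b."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    # Walk back from the bottom-right corner to recover one subsequence.
    out = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            out.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return "".join(reversed(out))


if __name__ == "__main__":
    # Two short made-up DNA fragments.
    print(lcs("ACCGGTCG", "GTCGTTCG"))
```

This version takes time and memory proportional to the product of the two sequence lengths, which is the cost that parallel and hardware approaches aim to reduce.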