ISBN: 9780769543284 (print)
This paper studies the impact of using automatic data-layout techniques on the process of coding the well-known multigrid (MG) NAS parallel benchmark. We describe the sequential problem in detail, and discuss the parallel version and its optimizations. Then, we implement the parallel algorithm using Hitmap, a highly efficient modular library for hierarchical tiling and mapping of arrays. We describe how to use the library's plug-in system to add a new data-layout module that encapsulates a generalization of the data-alignment policy of the MG benchmark. The module system applies this policy to automatically adapt the data distribution and communication code to any grain level. We describe the impact of these techniques qualitatively and quantitatively, in terms of development effort and performance. Our results show that it is possible to introduce flexible automatic data-layout techniques in current parallel compiler technology without sacrificing performance.
ISBN: 1601320841 (print)
This article presents an approach for converting a sequential program into a distributed program. An Architecture Description Language (ADL) serves as the interface model between the sequential code and the distributed code. The implementation is derived directly from the ADL description, which leads to a better understanding of the system. First, the information required to create the ADL description is extracted from the sequential code. Then the ADL description is produced from this information, and the implementation framework is built according to it. Features of this environment include a behavioral description of each component in terms of its communication protocol with other components, and a procedure for implementing asynchronous communication between the components of the architecture.
ISBN: 1601320841 (print)
In this paper, we propose a novel recovery approach for handling lost and orphan messages in distributed computing environments. The proposed scheme addresses the complex issue of handling concurrent failures. It avoids the complex recovery scheme associated with the asynchronous approach: once the system recovers from a failure, processes can restart from their respective most recent checkpoints (thus avoiding the domino effect), irrespective of any lost or orphan messages among these checkpoints. It also considerably reduces the re-computation time per process after a failure occurs.
ISBN: 9781728165820 (print)
Cellular Automata (CA) are parallel models well suited for studying complex systems that are based on local rules of evolution. Notable applications are found in fluid dynamics, crowd simulation, flow simulation and many other fields. CA can also be fruitfully exploited in support of numerical approaches such as finite element and finite volume methods. Although CA are easily parallelized by partitioning the domain among the nodes of a parallel system, their performance and scalability on parallel/distributed machines are limited by the need to synchronize nodes at each computational step. With the aim of reducing this synchronization burden, we present a preliminary study on techniques stemming from the Discrete Event Simulation field for the optimization of CA on distributed memory architectures. Preliminary results, obtained in a distributed memory environment, show the usefulness of the considered approach in reducing execution times and therefore in improving the speedup of the parallel execution of the test case.
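The per-step synchronization described above can be illustrated with a minimal sketch. It is not the paper's method or test case: the one-dimensional grid, the XOR neighbour rule, the periodic boundary, and the names `step`, `run_global` and `run_partitioned` are all assumptions chosen for illustration. Each block needs the border cells of its neighbours (a "halo") before every step, which is the synchronization point the abstract refers to.

```python
from typing import List


def step(cells: List[int], left: int, right: int) -> List[int]:
    """One step of an illustrative local rule (XOR of the two neighbours).

    `left` and `right` are the halo values just outside the block.
    """
    padded = [left] + cells + [right]
    return [padded[i - 1] ^ padded[i + 1] for i in range(1, len(padded) - 1)]


def run_global(cells: List[int], steps: int) -> List[int]:
    """Reference run on the whole domain with a periodic boundary."""
    for _ in range(steps):
        cells = step(cells, cells[-1], cells[0])
    return cells


def run_partitioned(cells: List[int], steps: int, parts: int) -> List[int]:
    """Same computation with the domain split into `parts` equal blocks."""
    assert len(cells) % parts == 0, "sketch assumes equal-sized blocks"
    size = len(cells) // parts
    blocks = [cells[p * size:(p + 1) * size] for p in range(parts)]
    for _ in range(steps):
        # Synchronization point: every block reads its neighbours' border
        # cells from the previous step before it can advance.
        halos = [(blocks[p - 1][-1], blocks[(p + 1) % parts][0])
                 for p in range(parts)]
        blocks = [step(b, l, r) for b, (l, r) in zip(blocks, halos)]
    return [c for b in blocks for c in b]
```

In a real distributed run each block would live on a different node and the halo collection would be a message exchange; here it is a list comprehension, so the sketch shows only where the exchange must happen, not its cost.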
ISBN: 9781538637906 (print)
Skyline queries help users handle the huge amount of available data by finding a set of interesting points. As dataset sizes are constantly increasing and skyline queries are computationally expensive, it is critical to compute such queries by exploiting parallelism. Existing works deal exclusively with totally ordered attribute domains. In this paper, we present a framework, named PSLP, for parallel skyline evaluation over data with both totally and partially ordered domains. We introduce a new partial-to-order mapping scheme that guarantees the correctness of the mapping by preserving incomparability and preference at a low mapping cost. We also propose a novel logical partitioning for parallel processing, in which the data space is partitioned according to incomparability and preference relationships by using a pivot point. The logical partitioning can prune away partitions that do not contain any skyline point during the partitioning process. An extensive performance evaluation confirms the efficiency and effectiveness of the proposed approach.
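For readers unfamiliar with skyline queries, the following minimal sketch shows the basic definition on totally ordered attributes only. It is not PSLP: the partial-order mapping and pivot-based logical partitioning of the paper are not reproduced, and the convention that smaller values are preferred, as well as the names `dominates`, `skyline` and `skyline_partitioned`, are assumptions. The partitioned variant uses a generic scheme (local skylines followed by a merge), not the paper's partitioning.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dominates(a: Point, b: Point) -> bool:
    """a dominates b if a is no worse in every attribute and strictly
    better in at least one (assumption: smaller values are preferred)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))


def skyline(points: Sequence[Point]) -> List[Point]:
    """Points not dominated by any other point (quadratic-time sketch)."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def skyline_partitioned(points: Sequence[Point], parts: int) -> List[Point]:
    """Compute a local skyline per partition, then merge the candidates.

    A global skyline point is never dominated inside its own partition,
    so it always survives the local phase; the merge removes the rest.
    """
    chunks = [points[i::parts] for i in range(parts)]
    candidates = [p for chunk in chunks for p in skyline(chunk)]
    return skyline(candidates)
```

The local phase is the part that could run in parallel, since partitions are independent until the merge.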
ISBN: 1601320841 (print)
This research investigates the problem of robust static resource allocation for distributed computing systems operating under imposed Quality of Service (QoS) constraints. Often, such systems are expected to function in an environment where uncertainties in system parameters are common. In such an environment, the amount of processing required to complete a task may fluctuate substantially. Determining a resource allocation that accounts for this uncertainty, in a way that can provide a probability that a given level of QoS is achieved, is an important area of research. We present two techniques for maximizing the probability that a given level of QoS is achieved. The performance results for our techniques are presented for a simulated environment that models a heterogeneous cluster-based radar data processing center.
ISBN: 1892512416 (print)
The telecommunication industry traditionally uses clusters to meet its carrier-class requirements of high availability and reliability. As security has also become a major issue, a Distributed Security Infrastructure (DSI) has been initiated for carrier-class Linux clusters. DSI is a security framework that focuses on providing distributed security services and simplifying security administration. This paper presents one of those services: the distributed access control service (DisAC). This service manages access rights throughout the whole cluster with process-level granularity. Rules are configured through a single security policy, which is propagated to each node of the cluster. DisAC enforces this policy not only at node level but also for inter-node access control, with process-level granularity.
ISBN: 9781932415582 (print)
The fat tree is a network topology well suited for use as the interconnection network in systems such as parallel computers. Its large number of paths between every source/destination pair gives the fat tree the ability to provide high throughput. This also gives it a high probability of tolerating network faults statically, but few algorithms for dynamically tolerating faults in fat trees have previously been proposed. In this paper we present a deadlock-free routing method that provides dynamic fault tolerance through misrouting downwards in the network. We show that the algorithm is one-fault tolerant, and that it can, with a certain probability, tolerate a large number of faults.
ISBN: 9781479986484 (print)
Dumping large amounts of related data simultaneously to local storage devices instead of a parallel file system is a frequent I/O pattern of HPC applications running at large scale. Since local storage resources are prone to failures and have limited potential to serve multiple requests in parallel, techniques such as replication are often used to enable resilience and high availability. However, replication introduces overhead, both in terms of the network traffic necessary to distribute replicas and in terms of extra storage space requirements. To reduce this overhead, state-of-the-art techniques often apply redundancy elimination (e.g. compression or deduplication) before replication, ignoring the natural redundancy that is already present. By contrast, this paper proposes a novel scheme that treats redundancy elimination and replication as a single co-optimized phase: remotely duplicated data is detected and directly leveraged to maintain a desired replication factor by keeping only as many replicas as needed and adding more if necessary. In this context, we introduce a series of high-performance algorithms specifically designed to operate under tight and controllable constraints at large scale. We present how this idea can be leveraged in practice and demonstrate its viability for two real-life HPC applications.
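The core idea of counting naturally occurring duplicates towards the replication factor can be sketched as follows. This is only an illustration of the bookkeeping, not the paper's algorithms: the chunk granularity, the use of SHA-256 fingerprints, the in-memory dictionary of nodes, and the names `fingerprint` and `extra_replicas_needed` are all assumptions.

```python
import hashlib
from collections import Counter
from typing import Dict, List


def fingerprint(chunk: bytes) -> str:
    """Content fingerprint used to detect identical chunks across nodes."""
    return hashlib.sha256(chunk).hexdigest()


def extra_replicas_needed(node_chunks: Dict[str, List[bytes]],
                          factor: int) -> Dict[str, int]:
    """For each distinct chunk, how many additional copies must be created
    so that it is present on at least `factor` nodes."""
    copies: Counter = Counter()
    for chunks in node_chunks.values():
        # A chunk stored twice on the same node still counts as one copy,
        # because both copies are lost together if that node fails.
        for fp in {fingerprint(c) for c in chunks}:
            copies[fp] += 1
    return {fp: max(0, factor - n) for fp, n in copies.items()}
```

With three nodes holding `[A, B]`, `[A, C]` and `[A]` and a replication factor of 2, chunk `A` already meets the target through natural redundancy and needs no transfer, while `B` and `C` each need one additional replica.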
ISBN: 1892512416 (print)
This paper discusses the concept of a platform for sharing distributed objects. The platform is based on object wrappers as a method for providing transparent replication that results in increased availability and efficiency. In contrast to many other object sharing systems, we do not focus only on supporting strong consistency models (sequential consistency, linearizability). Consistency maintenance may be configured by the developer by providing a semantic description of objects, which directly tunes the properties of the coherence protocols used by the system. The system also solves the problem of nested invocations, which arises when using active replication. Theoretical assumptions have been verified in a prototype implementation called RAP, which provides efficiency-oriented replication for Java objects.