This paper presents a novel cascaded conference network that provides distributed processing and signal transmission among members of disjoint sets of generic send/receive devices called conferees. It assumes an online request model in which idle groups of conferees may request the formation of a conference interconnection. Once a conference is established, all conferees remain connected until the entire conference is dissolved. The Hypercube Sandwich Network (HSN) consists of two components: a bidirectional permutation network, used for routing to and from a hypercube of special processing elements, and the hypercube itself, which is used for conference formation. The HSN achieves strictly nonblocking performance for N conferees using O(N √(log N)) processing elements, and this is shown to be tight to within a log^(1/4) N factor. Previous constructions either required a quadratic number of processing elements for strictly nonblocking performance or could provide only wide-sense nonblocking conferencing. If the stronger requirement is made that the communication delay be logarithmic in the conference size, a simple algorithm is presented for wide-sense nonblocking conferencing in an HSN with O(N log N) processing elements.
ISBN (print): 1892512416
This paper defines a distributed logging architecture and extends the Java Logging APIs to support such a framework with variable leveling capabilities. In its current form, the Java Logging API has minimal support for logging in a distributed environment. Rather than using the available SocketHandler class, an RMI solution is considered so as to maintain the integrity of log messages on systems that may not be the points of message generation. Toward the end, a configurable RMI Server is presented to complement the RMI Handler.
ISBN (print): 1601320841
The study of Monte Carlo and quasi-Monte Carlo methods has been one of the most interesting topics in computational science in the past decades. In this paper, we report on our studies of the two schemes, including their application to the computation of invariant measures for dynamical systems. The fundamental ideas can easily be applied to other cases of parallel computing.
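The difference between the two schemes can be illustrated on a much simpler problem than the invariant-measure computation the abstract mentions. The following is only a minimal sketch under stated assumptions: the integrand x² on [0, 1], the sample count, and the choice of the base-2 van der Corput sequence as the low-discrepancy point set are all illustrative and are not taken from the paper.

```python
import random


def van_der_corput(i, base=2):
    """Return the i-th element (i >= 1) of the van der Corput sequence."""
    q, denom = 0.0, 1.0
    while i > 0:
        denom *= base
        i, digit = divmod(i, base)
        q += digit / denom
    return q


def mc_estimate(f, n, seed=0):
    """Plain Monte Carlo: average f over n pseudo-random points in [0, 1)."""
    rng = random.Random(seed)
    return sum(f(rng.random()) for _ in range(n)) / n


def qmc_estimate(f, n):
    """Quasi-Monte Carlo: average f over the first n van der Corput points."""
    return sum(f(van_der_corput(i)) for i in range(1, n + 1)) / n


def integrand(x):
    # Illustrative integrand; its exact integral over [0, 1] is 1/3.
    return x * x


n = 4096
mc = mc_estimate(integrand, n)
qmc = qmc_estimate(integrand, n)
print(f"MC error:  {abs(mc - 1 / 3):.2e}")
print(f"QMC error: {abs(qmc - 1 / 3):.2e}")
```

Both estimators are sums of independent function evaluations, so either one can be split across workers by giving each worker a disjoint range of sample indices.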
ISBN (print): 9780769552071
Bit-reproducibility has many advantages in the context of high-performance computing. Besides making debugging and testing simpler and more accurate, it allows applications to be deployed on heterogeneous systems while maintaining the consistency of the computations. In this work we analyze the basic operations performed by scientific applications and identify the possible sources of non-reproducibility. In particular, we consider the tasks of evaluating transcendental functions and performing reductions using non-associative operators. We present a set of techniques to achieve reproducibility and propose improvements over existing algorithms that perform reproducible computations in a portable way while obtaining good performance and accuracy. By applying these techniques to more complex tasks, we show that bit-reproducibility can be achieved on a broad range of scientific applications.
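Floating-point addition is a typical non-associative reduction operator: the result of a sum can change with the order in which the terms are combined, for example when a parallel reduction splits the work differently. The sketch below shows the effect and one simple way to remove it. It is not the technique proposed in the paper; it assumes that an exact rational sum followed by a single rounding is acceptable, which is slow but easy to check. The function names and the sample data are hypothetical.

```python
from fractions import Fraction


def naive_sum(values):
    """Left-to-right floating-point sum; the result depends on the order."""
    total = 0.0
    for v in values:
        total += v
    return total


def reproducible_sum(values):
    """Order-independent sum: add exactly as rationals, then round once."""
    exact = sum((Fraction(v) for v in values), Fraction(0))
    return float(exact)


# The same four numbers in two different orders.
order_a = [1e16, 1.0, -1e16, 1.0]
order_b = [1e16, -1e16, 1.0, 1.0]

print(naive_sum(order_a), naive_sum(order_b))
print(reproducible_sum(order_a), reproducible_sum(order_b))
```

Because every double is an exact rational number, the intermediate sum carries no rounding error, and the final conversion rounds only once, so any ordering of the inputs yields the same bits.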
ISBN (print): 9780769530895
In distributed Java environments, locality of objects and threads is crucial for the performance of parallel applications. We introduce dynamic locality optimizations in the context of JavaParty, a programming and runtime environment for parallel Java applications. Until now, an optimal distribution of the individual objects of an application had to be found manually, which has several drawbacks. Based on a former static approach, we develop a dynamic methodology for automatic locality optimizations. By measuring processing and communication times of remote method calls at runtime, a placement strategy can be computed that maps each object of the distributed system to its optimal virtual machine. Objects are then migrated between the processing nodes in order to realize this placement strategy. We evaluate our approach by comparing the performance of two benchmark applications with manually distributed versions. It is shown that our approach is particularly suitable for dynamic applications where the optimal object distribution varies at runtime.
ISBN (print): 0769516807
This paper presents the design and preliminary evaluation of hierarchical partitioning and load-balancing techniques for distributed Structured Adaptive Mesh Refinement (SAMR) applications. The overall goal of these techniques is to enable the load distribution to reflect the state of the adaptive grid hierarchy and exploit it to reduce synchronization requirements, improve load balance, and enable concurrent communications and incremental redistribution. The hierarchical partitioning algorithm (HPA) partitions the computational domain into subdomains and assigns them to hierarchical processor groups. Two variants of HPA are presented in this paper. The Static Hierarchical Partitioning Algorithm (SHPA) assigns portions of the overall load to processor groups. In SHPA, the group size and the number of processors in each group are set up during initialization and remain unchanged during application execution. It is experimentally shown that SHPA reduces communication costs compared to the Non-HPA scheme and reduces overall application execution time by up to 41%. The Adaptive Hierarchical Partitioning Algorithm (AHPA) dynamically partitions the processor pool into hierarchical groups that match the structure of the adaptive grid hierarchy. Initial evaluations of AHPA show that it can reduce communication costs by up to 70%.
ISBN (print): 1892512416
This paper discusses metrics and models that aim to quantify the heterogeneity level of distributed computing systems. Many of the metrics proposed in previous works are not entirely suitable for supporting both load and process scheduling mechanisms, since practical results show that they are not as general as suggested in the literature. A novel metric, constructed from a new approach based on the standard deviation, is proposed in this paper. This metric is shown to be adequate for all the case studies adopted, and it has the potential to support most of the load and process scheduling mechanisms used in parallel/distributed computing.
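The abstract does not give the formula of the proposed metric, so the following is only a hypothetical illustration of how a standard-deviation-based heterogeneity level could be computed. It assumes each node is described by a single positive capacity value (for example, a benchmark score) and uses the coefficient of variation, which is an assumption of this sketch and not the paper's definition.

```python
import statistics


def heterogeneity_level(capacities):
    """Hypothetical metric: population standard deviation of the node
    capacities divided by their mean (coefficient of variation).

    Returns 0.0 for a perfectly homogeneous system and grows as the
    capacities spread out.
    """
    if not capacities:
        raise ValueError("at least one node capacity is required")
    mean = statistics.fmean(capacities)
    if mean <= 0:
        raise ValueError("capacities must have a positive mean")
    return statistics.pstdev(capacities) / mean


# Illustrative systems; the numbers are made up for the example.
homogeneous = [100.0, 100.0, 100.0, 100.0]
mixed = [50.0, 100.0, 150.0, 400.0]

print(heterogeneity_level(homogeneous))
print(heterogeneity_level(mixed))
```

Dividing by the mean makes the value independent of the unit in which capacity is measured, so two systems benchmarked on different scales remain comparable.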
ISBN (print): 0818686030
The DAG-based task graph model has been found effective in scheduling for performance prediction and optimization of parallel applications. However, the scheduling complexity and solution normally depend on the problem size. In this paper we propose a symbolic scheduling scheme for a parameterized task graph, which models coarse-grain DAG parallelism independently of the problem size. The algorithm first derives symbolic clusters that group tasks in order to minimize communication while preserving parallelism, and then evenly assigns task clusters to processors. The run-time system executes clusters on each processor in a multithreaded fashion. This paper also presents preliminary experimental results to demonstrate the effectiveness of our techniques.
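The second phase described above, evenly assigning task clusters to processors, can be sketched with a standard greedy heuristic. This is not the symbolic algorithm of the paper: it assumes the clusters and their numeric costs are already known, and it uses the longest-processing-time-first rule as a stand-in. The cluster names and costs are hypothetical.

```python
import heapq


def assign_clusters(cluster_costs, num_procs):
    """Greedy balancing: take clusters from heaviest to lightest and put
    each one on the processor with the smallest load so far.

    Returns (assignment, loads), where assignment maps a cluster id to a
    processor index and loads[p] is the total cost placed on processor p.
    """
    if num_procs < 1:
        raise ValueError("num_procs must be at least 1")
    heap = [(0.0, p) for p in range(num_procs)]  # (current load, processor)
    heapq.heapify(heap)
    assignment = {}
    loads = [0.0] * num_procs
    ordered = sorted(cluster_costs.items(), key=lambda kv: (-kv[1], kv[0]))
    for cluster_id, cost in ordered:
        load, proc = heapq.heappop(heap)
        assignment[cluster_id] = proc
        loads[proc] = load + cost
        heapq.heappush(heap, (loads[proc], proc))
    return assignment, loads


# Hypothetical clusters with made-up execution costs.
costs = {"c0": 4.0, "c1": 3.0, "c2": 3.0, "c3": 2.0}
assignment, loads = assign_clusters(costs, num_procs=2)
print(assignment)
print(loads)
```

The heuristic ignores communication between clusters, which is reasonable only if the clustering phase has already kept heavily communicating tasks together.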
ISBN (print): 9781538637906
With the explosive growth of energy consumption in current heterogeneous distributed systems, the energy consumption constraint has become one of the primary design issues. Minimizing the schedule length of parallel applications while satisfying the energy consumption constraint is one of the most important problems studied recently. Previous studies have proposed a preassignment approach, based on the dynamic voltage and frequency scaling (DVFS) technique, that solves the problem by presupposing the minimum energy consumption assignment for unassigned tasks. However, preassigning unassigned tasks with the minimum energy consumption does not necessarily minimize the schedule length. In this study, we propose an efficient scheduling algorithm that uses a relative average assignment for tasks. The results of experiments on two real parallel applications show that the proposed algorithm obtains shorter schedule lengths than state-of-the-art methods in various situations while satisfying the energy consumption constraint.
ISBN (print): 9781450388689
Shared resource interference is observed by applications as dynamic performance asymmetry. Prior art has developed approaches to reduce the impact of performance asymmetry mainly at the operating system and architectural levels. In this work, we study how application-level scheduling techniques can leverage moldability (i.e., the flexibility to run as either a single-threaded or a multithreaded task) and explicit knowledge of task criticality to handle scenarios in which system performance is not only unknown but also changing over time. Our proposed task scheduler dynamically learns the performance characteristics of the underlying platform and uses this knowledge to devise better schedules that are aware of dynamic performance asymmetry, hence reducing the impact of interference. Our evaluation shows that both criticality-aware scheduling and parallelism tuning are effective schemes to address interference in both shared and distributed memory applications.