ISBN (print): 9780889868649
In recent years, Service Oriented Architecture (SOA) has found new relevance through emerging technologies such as cloud computing. ANU-SOAM, a service oriented middleware, aims to provide a convenient API, a unique data service extension and suitable load-balancing techniques for high performance scientific computing. The data service extension offers both a Common Data Service (CDS) and a Local Data Service (LDS). The CDS holds data common to all service instances and lets consumers manipulate it through functions such as add, get, put and sync. The LDS allows a consumer to partially replicate data among service instances to improve memory scalability. Comparable paradigms such as MPI are largely agnostic about, and unresponsive to, heterogeneous conditions; the SOA approach enables ANU-SOAM to implement load balancing with the help of a Resource Manager. Experiments with N-Body Solver and Heat Transfer applications show that ANU-SOAM performs as well as most of its MPI counterparts, especially under heterogeneous conditions.
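To make the data service extension concrete, the sketch below shows how a consumer might use a CDS and an LDS. The class and method names (CommonDataService, LocalDataService) and the in-memory dictionary store are illustrative assumptions, not the actual ANU-SOAM API; only the operation names add, get, put and sync come from the abstract above.

# Hypothetical consumer-side sketch of the data service extension; the class
# and method names are illustrative, not the real ANU-SOAM interface.

class CommonDataService:
    """Data shared by all service instances (CDS)."""
    def __init__(self):
        self._store = {}

    def add(self, key, value):      # publish a new shared item
        self._store[key] = value

    def get(self, key):             # read a shared item
        return self._store[key]

    def put(self, key, value):      # update a shared item
        self._store[key] = value

    def sync(self):                 # placeholder: real middleware would broadcast updates here
        pass

class LocalDataService:
    """Partial replica held by one service instance (LDS)."""
    def __init__(self, cds, keys):
        # replicate only the slice of the data this instance needs,
        # which is what improves memory scalability
        self.replica = {k: cds.get(k) for k in keys}

# Example: a consumer registers shared data and keeps a partial local replica.
cds = CommonDataService()
cds.add("particles", list(range(1000)))
lds = LocalDataService(cds, ["particles"])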
ISBN (print): 9780889867840
Permutations are among the communication patterns most frequently demanded in massively parallel computers, especially those of the SIMD type. A permutation is said to be "admissible" to a given interconnection network if it causes no blocking in that network under a chosen routing algorithm. Determining the admissibility of a given permutation to various static interconnection topologies is a fundamental problem. Based on the notion of congruence from number theory, this paper presents a simple method that solves the admissibility problem for regular permutations on uniaxial 2D and 3D tori under the deterministic dimension-order routing commonly used in practice. Here "uniaxial" means that in every routing step all data items participating in a permutation may move along the same axis only. All nodes of the system are assumed to work synchronously, which is also characteristic of SIMDs. The efficiency of the method is illustrated by examples that check the admissibility of permutations frequently used in parallel programming, belonging to either the Omega or the BPC (bit-permute-complement) class.
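The congruence-based test itself is not reproduced here; as a stand-in, the sketch below checks admissibility by brute force: it simulates uniaxial dimension-order routing on a small torus and reports whether any outgoing link is ever requested by two data items in the same synchronous step. The example permutation (adding half the ring size along each axis, i.e. complementing the most significant coordinate bit) belongs to the BPC class; all other details are illustrative assumptions.

from itertools import product

def admissible(perm, dims):
    """Brute-force admissibility check (not the paper's congruence test):
    simulate uniaxial dimension-order routing on a torus with the given
    dimensions and report whether any output link is requested twice in
    one synchronous step."""
    nodes = list(product(*[range(d) for d in dims]))
    pos = {src: list(src) for src in nodes}           # current position of each item
    dest = {src: list(perm(src)) for src in nodes}    # destination of each item
    for axis, size in enumerate(dims):                # route one axis at a time
        while any(pos[p][axis] != dest[p][axis] for p in nodes):
            links = set()
            for p in nodes:
                cur, goal = pos[p][axis], dest[p][axis]
                if cur == goal:
                    continue
                fwd = (goal - cur) % size             # move in the shorter ring direction
                step = 1 if fwd <= size - fwd else -1
                link = (tuple(pos[p]), axis, step)    # outgoing link requested this step
                if link in links:                     # two items want the same link
                    return False
                links.add(link)
                pos[p][axis] = (cur + step) % size
    return True

# Example: a BPC permutation on a 4x4 torus (shift by half the ring in each axis).
print(admissible(lambda s: ((s[0] + 2) % 4, (s[1] + 2) % 4), (4, 4)))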
ISBN (print): 9780889867840
Recent investigations into the resilience of large-scale high-performance computing (HPC) systems have shown a continuous trend of decreasing reliability and availability. Newly installed systems have a lower mean-time to failure (MTTF) and a higher mean-time to recover (MTTR) than their predecessors. Modular redundancy is used in many mission-critical systems today to provide resilience, for example in aerospace and command-and-control systems. The primary argument against modular redundancy for resilience in HPC has always been that the capability of an HPC system, and the respective return on investment, would be significantly reduced. We argue that modular redundancy can significantly increase compute node availability, as it removes the impact of scale from single compute node MTTR. We further argue that single compute nodes can be much less reliable, and therefore less expensive, and still be highly available, as long as their MTTR/MTTF ratio is maintained.
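A back-of-the-envelope illustration of the MTTR/MTTF argument, with made-up numbers: steady-state availability is MTTF / (MTTF + MTTR), so a cheap node and an expensive one with the same ratio have the same availability, and duplicating the cheap node (assuming independent failures) pushes availability far beyond either single node.

# Illustration only; the failure/recovery times are invented.

def availability(mttf, mttr):
    return mttf / (mttf + mttr)

cheap_node = availability(mttf=1_000.0, mttr=10.0)        # MTTR/MTTF ratio 0.01
pricey_node = availability(mttf=100_000.0, mttr=1_000.0)  # same ratio 0.01
print(cheap_node, pricey_node)          # both ~0.990: identical availability

# With dual modular redundancy and independent failures, the pair is
# unavailable only when both replicas are down at once:
pair = 1 - (1 - cheap_node) ** 2
print(pair)                             # ~0.9999: far higher than either node alone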
ISBN (print): 9780889868649
The simplex method has been used successfully to solve linear programming problems for many years, and parallel approaches have been studied extensively because of the intensive computations required, especially for large linear problems. In this paper we present a highly scalable parallel implementation framework for the standard full-tableau simplex method on a distributed memory environment. Specifically, we have designed and implemented a column distribution scheme (similar to the one presented in [24]) as well as a row distribution scheme (similar to the one presented in [3]), and we have tested both implementations thoroughly on a considerably powerful parallel environment (a Linux cluster of eight powerful Xeon processors connected via a high-speed Myrinet network interface). We then compare our approaches (a) against each other for varying problem sizes (numbers of rows and columns) and (b) against the corresponding implementations of [3] and [24], which are two of the most recent and valuable related efforts. In most cases the column distribution scheme performs considerably better than the row distribution scheme. Moreover, both schemes (even the row distribution scheme on large-scale problems) lead to particularly high speed-up and efficiency values, which are in all cases considerably better than those achieved by the corresponding implementations of [3] and [24].
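The following serial sketch mimics the column distribution idea on the full tableau: columns are split into blocks standing in for workers, the entering column is chosen by a reduction over the blocks, and the pivot update is then applied column by column. MPI communication, the precise schemes of [3] and [24], and numerical safeguards are omitted; function and variable names are our own.

import numpy as np

def simplex_column_blocks(T, basis, n_blocks):
    """Full-tableau simplex; columns are notionally split into n_blocks workers."""
    m = T.shape[0] - 1                                   # last row holds the objective
    cols = np.array_split(np.arange(T.shape[1] - 1), n_blocks)  # RHS stays replicated
    while True:
        # 1) each block proposes its most negative reduced cost (local minimum),
        #    then a global reduction picks the entering column
        local = [(T[-1, blk].min(), blk[T[-1, blk].argmin()]) for blk in cols if len(blk)]
        red_cost, enter = min(local)
        if red_cost >= -1e-9:
            return T, basis                              # optimal
        # 2) the owner of the entering column "broadcasts" it; ratio test on the RHS
        col, rhs = T[:m, enter], T[:m, -1]
        ratios = np.where(col > 1e-9, rhs / np.where(col > 1e-9, col, 1), np.inf)
        leave = int(ratios.argmin())
        # 3) every block eliminates the entering variable from its own columns
        T[leave, :] /= T[leave, enter]
        for r in range(m + 1):
            if r != leave:
                T[r, :] -= T[r, enter] * T[leave, :]
        basis[leave] = enter

# maximize x1 + x2  s.t.  x1 + 2*x2 <= 4,  3*x1 + x2 <= 6  (slack-form tableau)
T = np.array([[1., 2., 1., 0., 4.],
              [3., 1., 0., 1., 6.],
              [-1., -1., 0., 0., 0.]])
T, basis = simplex_column_blocks(T, basis=[2, 3], n_blocks=2)
print(T[-1, -1])    # optimal objective value (2.8)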
ISBN (print): 9780889868649
An efficient parallelization strategy is presented for a Hierarchical Run Length Encoded (HRLE) data structure, implemented for the Sparse Field Level Set method. In order to achieve high parallel efficiency, computational work must be distributed evenly over all available CPU threads. Since the Level Set surface must be allowed to deform and evolve, thereby increasing the simulation area, there must be a way to grow the surface domain while keeping an efficient parallelization strategy in place. This is achieved by assigning the same number of calculations to each available CPU. Because data can be added to an HRLE data structure only in sequential, lexicographical order, parallelization becomes more complex; the presented solution therefore uses as many HRLE data structures as there are CPUs available. Approximately 90% of the operations can be performed in parallel with the presented strategy, leading to efficiencies of 96% with two and 78.5% with sixteen CPU cores of an AMD Opteron 8435 processor clocked at 2.6 GHz. Topographies with one and two moving interfaces were simulated using multi-threading, showing the speedup and efficiency of the presented strategy.
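The "one HRLE structure per CPU" idea can be pictured with the toy sketch below: defined grid points are sorted lexicographically, split into equally sized contiguous segments, and each thread run-length-encodes its own segment independently. The plain (start, length) encoding used here is a stand-in for the actual hierarchical structure.

from concurrent.futures import ThreadPoolExecutor

def rle(segment):
    """Encode a lexicographically sorted segment as (start_index, run_length) pairs."""
    runs = []
    for idx in segment:
        if runs and idx == runs[-1][0] + runs[-1][1]:
            runs[-1] = (runs[-1][0], runs[-1][1] + 1)   # extend the current run
        else:
            runs.append((idx, 1))                        # start a new run
    return runs

def parallel_encode(defined_points, n_threads):
    defined_points = sorted(defined_points)              # lexicographic order is required
    chunk = -(-len(defined_points) // n_threads)         # same amount of work per thread
    segments = [defined_points[i:i + chunk] for i in range(0, len(defined_points), chunk)]
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(rle, segments))             # one encoded structure per thread

print(parallel_encode([0, 1, 2, 7, 8, 9, 10, 42], n_threads=2))
# [[(0, 3), (7, 1)], [(8, 3), (42, 1)]]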
Computer Generated Holography (CGH) is considered a promising candidate for realizing 3D displays with complete depth features. However, realizing a large-scale CGH requires a huge computation time. This paper proposes a method for decomposing an input object into sub-objects and generating sub-holograms from them using interpolation. The major advantage of this method is that the generation processes become mutually independent and can be executed in parallel without communication or synchronization. After presenting the theory of the interpolation method, we show how the method can be implemented on a GPU using data-parallel operations. We also show simulation and optical reconstruction results that verify the correctness of the method. Finally, we present the preliminary results of our experiments on a GPU.
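The decomposition argument can be illustrated as follows. The sketch computes one sub-hologram per sub-object with a plain point-source superposition and sums the results; the paper's interpolation-based generation and GPU kernels are omitted, and the wavelength, pixel pitch and object points are invented values. The point is only that the per-sub-object computations share no data and can run in parallel without communication or synchronization.

import numpy as np

WAVELENGTH = 532e-9         # assumed wavelength, metres
PITCH = 8e-6                # assumed pixel pitch of the hologram plane

def sub_hologram(points, n=256):
    """Complex field contributed by one sub-object on an n x n hologram."""
    ys, xs = np.mgrid[0:n, 0:n] * PITCH
    field = np.zeros((n, n), dtype=complex)
    for px, py, pz, amp in points:              # each point source adds a spherical wave
        r = np.sqrt((xs - px) ** 2 + (ys - py) ** 2 + pz ** 2)
        field += amp * np.exp(2j * np.pi * r / WAVELENGTH) / r
    return field

object_points = [(x * PITCH * 16, y * PITCH * 16, 0.1, 1.0)
                 for x in range(8) for y in range(8)]
sub_objects = [object_points[i::4] for i in range(4)]    # decompose into 4 sub-objects

# Each call below is independent, so it could run on its own GPU thread block.
hologram = sum(sub_hologram(p) for p in sub_objects)
print(hologram.shape)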
High performance servers in heterogeneous computing environments, as found in data centers for cloud computing, consume immense amounts of energy even though they are usually underutilized. When not all computing capacity is needed, the task is to distribute the computational load in a power-efficient manner. The question to be answered is which load partition should be assigned to each physical server so that all work is done with minimal energy consumption. This problem is closely related to selecting physical servers that can be switched off completely to further reduce power consumption. In this work, we present algorithms that calculate a power-efficient distribution of a divisible workload among multiple heterogeneous physical servers. We assume a fully divisible load in order to calculate an optimized utilization of each server. Based on this distribution, an iterative process identifies servers that can be switched off to further reduce power consumption. With that information, the workload is redistributed so that appropriate subloads are assigned to the remaining servers, and as before, the calculated partitioning minimizes power consumption.
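An illustrative sketch of the two-step idea, not the paper's algorithm: servers are modelled with a linear power curve P(u) = idle + (peak - idle) * u and a capacity in load units, load is spread to minimise power, and servers are then switched off one at a time as long as the remaining capacity still covers the load and total power keeps dropping. All server figures are invented.

def total_power(servers, load):
    """Fill servers in order of marginal energy cost; return (power, shares)."""
    order = sorted(servers, key=lambda s: (s["peak"] - s["idle"]) / s["cap"])
    power, shares, remaining = 0.0, {}, load
    for s in order:
        share = min(remaining, s["cap"])
        u = share / s["cap"]
        power += s["idle"] + (s["peak"] - s["idle"]) * u
        shares[s["name"]] = share
        remaining -= share
    if remaining > 1e-9:
        return float("inf"), {}          # the load does not fit on this server set
    return power, shares

def power_efficient_distribution(servers, load):
    active = list(servers)
    best_power, best_shares = total_power(active, load)
    while len(active) > 1:
        # try switching off the server with the worst full-load efficiency
        candidate = max(active, key=lambda s: s["peak"] / s["cap"])
        trial = [s for s in active if s is not candidate]
        power, shares = total_power(trial, load)
        if power >= best_power:
            break                         # switching off no longer helps
        active, best_power, best_shares = trial, power, shares
    return best_power, best_shares

servers = [{"name": "A", "cap": 100, "idle": 80, "peak": 200},
           {"name": "B", "cap": 60, "idle": 60, "peak": 120},
           {"name": "C", "cap": 40, "idle": 70, "peak": 110}]
print(power_efficient_distribution(servers, load=120))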
ISBN (print): 9780769540597
The networked application environment has motivated the development of multitasking operating systems for sensor networks and other low-power electronic devices, but their multitasking capability is severely limited because traditional stack management techniques perform poorly on small-memory systems. In this paper, we show that combining binary translation and a new kernel runtime can lead to efficient OS designs on resource-constrained platforms. We introduce SenSmart, a multitasking OS for sensor networks, and present new OS design techniques for supporting preemptive multi-task scheduling, memory isolation, and versatile stack management. We have implemented SenSmart on MICA2/MICAz motes. Evaluation shows that SenSmart performs efficient binary translation and demonstrates a significantly better capability in managing concurrent tasks than other sensornet operating systems.
ISBN (print): 9780889869431
This paper presents a distributed on-line service selection (probe/access) scheme: an optimal stopping web service selection scheme based on the rate-of-return problem from optimal stopping theory. Our scheme differs from conventional schemes in three ways. First, it does not need to probe all web services; it probes only a few of them. Second, it maximizes the average QoS (Quality of Service) return per unit of cost over all probe-and-access stages in the long run, rather than maximizing the QoS return of a single probe-and-access stage as usual schemes do. Third, it develops a return function based on three factors: QoS return, the user's requirement and probe cost, which have seldom been considered simultaneously before. Through theoretical analysis and computation, we demonstrate that, compared with conventional schemes, our scheme offers additional advantages while achieving equally good performance.
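A toy sketch of the threshold flavour of such a rate-of-return rule, under strong simplifying assumptions (i.i.d. uniform QoS values, fixed probe and access costs, no user requirement term): the threshold that maximises expected QoS per unit of cost is found by grid search, and services are then probed one by one until one exceeds it. This is not the paper's return function.

import random

PROBE_COST, ACCESS_COST = 0.05, 0.2     # invented cost figures

def return_rate(t):
    """Expected QoS per unit cost when accepting the first service with QoS >= t."""
    expected_reward = (1 + t) / 2                           # E[Q | Q >= t] for uniform Q
    expected_cost = PROBE_COST / (1 - t) + ACCESS_COST      # geometric number of probes
    return expected_reward / expected_cost

# choose the threshold by grid search instead of solving the stopping rule analytically
threshold = max((i / 1000 for i in range(999)), key=return_rate)

def select_service(probe):
    """Probe services sequentially; stop at the first one above the threshold."""
    while True:
        qos = probe()
        if qos >= threshold:
            return qos                   # access this service

print(threshold, select_service(random.random))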
ISBN (print): 9780889866379
We consider the problem of scheduling parallel applications, represented by directed acyclic graphs (DAGs), onto Grid-style resource pools. The core issue is that the availability and performance of Grid resources, which are already heterogeneous by nature, can be expected to vary dynamically, even during the course of an execution. Typical scheduling methods in the literature address this issue only partially because they assume static heterogeneous computing environments (i.e. heterogeneous resources that are dedicated and do not change over time). This paper presents the Grid Task Positioning (GTP) scheduling method, which addresses the problem by allowing an executing application to be rescheduled in response to significant variations in resource characteristics. GTP takes into account the partial completion of tasks and the cost of task migration. We compare the performance of GTP with that of the well-known, static, Heterogeneous Earliest Finish Time (HEFT) algorithm.
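A much-simplified sketch of the rescheduling idea (ignoring DAG precedence constraints and the actual GTP ranking): when the measured speed of a resource drifts beyond a tolerance, the unfinished tasks are re-mapped with a greedy earliest-finish-time rule that credits partial completion and charges a migration penalty. The constants and task data are invented.

MIGRATION_COST = 5.0        # assumed fixed cost of moving a task's state
TOLERANCE = 0.25            # assumed relative drift that triggers rescheduling

def needs_reschedule(old_speeds, new_speeds):
    return any(abs(new_speeds[r] - old_speeds[r]) / old_speeds[r] > TOLERANCE
               for r in old_speeds)

def remap(tasks, speeds):
    """Greedy earliest-finish-time mapping of the unfinished work."""
    ready_at = {r: 0.0 for r in speeds}
    plan = {}
    for t in sorted(tasks, key=lambda t: -t["work"]):        # biggest tasks first
        remaining = t["work"] * (1 - t["progress"])          # credit partial completion
        def finish(r):
            move = MIGRATION_COST if (t["progress"] > 0 and r != t["host"]) else 0.0
            return ready_at[r] + move + remaining / speeds[r]
        best = min(speeds, key=finish)
        ready_at[best] = finish(best)
        plan[t["name"]] = best
    return plan

tasks = [{"name": "t1", "work": 100, "progress": 0.4, "host": "r1"},
         {"name": "t2", "work": 60, "progress": 0.0, "host": None}]
old, new = {"r1": 10.0, "r2": 8.0}, {"r1": 3.0, "r2": 8.0}   # r1 has slowed down
if needs_reschedule(old, new):
    print(remap(tasks, new))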