Distributed Hash Tables (DHTs) provide a means to build a completely decentralized, large-scale persistent storage service from the individual storage capacities contributed by each node of the peer-to-peer overlay. H...
Multicore processors provide large computational capability but also involve complicated parallel programming. One of the major considerations in parallel programming is performance. Traditional design methodologies...
ISBN: 9781467375887 (print)
The X10 language and the Asynchronous Partitioned Global Address Space (APGAS) model are an emerging mechanism for programming high-performance computers and commodity clusters. However, little work exists on distributed programming frameworks for dynamic programming (DP) problems based on X10 and the APGAS model. In this paper we present DPX10, an efficient distributed X10 framework for DP applications. DPX10 enables developers to write highly efficient DP programs without much effort. A DPX10 program is specified by a directed acyclic graph (DAG) pattern and a compute method for the vertices. DPX10 provides eight commonly used DAG patterns and a simple API to create custom patterns. The system handles all the tiresome work of implementing parallelization, including DAG distribution, vertex scheduling, and vertex communication. Moreover, a new recovery method for distributed arrays is developed to provide transparent fault tolerance. We describe the design of the framework and use four DP applications with up to a billion vertices on 120 cores to demonstrate its simplicity, efficiency, and scalability.
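The "DAG pattern plus compute method" programming model described in this abstract can be illustrated with a small single-process sketch. This is not DPX10's API (DPX10 is an X10 framework, and its real interfaces are not given here); the names `grid_pattern`, `run_dag`, and `lcs_length` are hypothetical, and the sketch omits distribution, scheduling, and fault tolerance entirely. It only shows how a DP problem (longest common subsequence) reduces to a dependency pattern and a per-vertex compute function.

```python
from typing import Callable, Dict, List, Tuple

Vertex = Tuple[int, int]


def grid_pattern(i: int, j: int) -> List[Vertex]:
    """Hypothetical 2D DAG pattern: each vertex depends on its upper,
    left, and upper-left neighbours (when they exist)."""
    deps: List[Vertex] = []
    if i > 0:
        deps.append((i - 1, j))
    if j > 0:
        deps.append((i, j - 1))
    if i > 0 and j > 0:
        deps.append((i - 1, j - 1))
    return deps


def run_dag(
    height: int,
    width: int,
    pattern: Callable[[int, int], List[Vertex]],
    compute: Callable[[int, int, Dict[Vertex, int]], int],
) -> Dict[Vertex, int]:
    """Evaluate every vertex sequentially. Row-major order is a valid
    topological order for grid_pattern; a real framework would instead
    schedule ready vertices across workers."""
    results: Dict[Vertex, int] = {}
    for i in range(height):
        for j in range(width):
            deps = {d: results[d] for d in pattern(i, j)}
            results[(i, j)] = compute(i, j, deps)
    return results


def lcs_length(a: str, b: str) -> int:
    """Longest common subsequence length, written only as a compute method."""

    def compute(i: int, j: int, deps: Dict[Vertex, int]) -> int:
        if i == 0 or j == 0:
            return 0  # empty-prefix boundary
        if a[i - 1] == b[j - 1]:
            return deps[(i - 1, j - 1)] + 1
        return max(deps[(i - 1, j)], deps[(i, j - 1)])

    table = run_dag(len(a) + 1, len(b) + 1, grid_pattern, compute)
    return table[(len(a), len(b))]
```

For example, `lcs_length("ABCBDAB", "BDCABA")` evaluates a 8 x 7 grid of vertices and returns 4. The user-facing code is limited to the pattern and the compute function, which is the separation of concerns the abstract attributes to the framework.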
Many institutions already have networks of workstations, which could potentially be harnessed as a powerful parallel processing resource. A new, automatic task allocation system has been built on top of MPI, an environment that permits parallel programming using the message-passing paradigm and is implemented as extensions to C and FORTRAN. This system, known as 'Hector', supports dynamic migration of tasks and automatic run-time performance optimization. MPI programs can be run without modification under Hector, and can be run on existing networks of workstations. Thus Hector permits institutions to harness existing computational resources quickly and transparently.
ISBN: 1424409101 (print)
The proceedings contain 448 papers. The topics discussed include: building the tree of life on terascale systems; efficient block device sharing over Myrinet with memory bypass; conserving memory bandwidth in chip multiprocessors with runahead execution; towards a better understanding of workload dynamics on data-intensive clusters and grids; energy-aware self-stabilization in mobile ad hoc networks: a multicasting case study; optimizing multiple distributed stream queries using hierarchical network partitions; fast failure detection in a process group; route table partitioning and load balancing for parallel searching with TCAMs; load balancing in the bulk-synchronous-parallel setting using process migrations; capacity sharing and stealing in dynamic server-based real-time systems; and power-aware routing for well-nested communications on the circuit switched tree.
Distributed detection has been intensively studied in the past. In this correspondence, we consider the design of local decision rules in the presence of nonideal transmission channels between the sensors and the fusion center. Under the conditional independence assumption among multiple sensor observations, we show that the optimal local decisions that minimize the error probability at the fusion center amount to a likelihood-ratio test (LRT) given a particular constraint on the fusion rule. This constraint turns out to be quite general and is easily satisfied for most sensible fusion rules. A design example using a parallel sensor fusion structure with binary-symmetric channels (BSCs) between local sensors and the fusion center is given to illustrate the usefulness of the result in obtaining optimal thresholds for local sensor observations. The study that incorporates the transmission channel in sensor system design may have potential applications in the emerging field of wireless sensor networks.
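A deliberately simplified numerical sketch may help make the role of the channel concrete. It is not the paper's design example: it assumes a single sensor, a Gaussian shift-in-mean observation model (x ~ N(0, 1) under H0, x ~ N(mu, 1) under H1, with mu > 0), and a fixed fusion rule that declares H1 exactly when the received bit is 1. Under that observation model the likelihood ratio is monotone in x, so the local LRT is a threshold test on x. All function names are illustrative.

```python
import math


def q_function(x: float) -> float:
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def local_rates(threshold: float, mu: float) -> tuple:
    """(false-alarm, detection) probabilities of the local test 'send 1 if x > threshold'."""
    return q_function(threshold), q_function(threshold - mu)


def through_bsc(p_one: float, eps: float) -> float:
    """Probability that the fusion center receives 1 when the sensor
    sends 1 with probability p_one over a BSC with crossover eps."""
    return p_one * (1.0 - eps) + (1.0 - p_one) * eps


def error_probability(threshold: float, mu: float, eps: float,
                      prior_h1: float = 0.5) -> float:
    """Fusion-center error probability for the assumed rule 'decide H1 iff received bit is 1'."""
    p_fa, p_d = local_rates(threshold, mu)
    recv_one_h0 = through_bsc(p_fa, eps)
    recv_one_h1 = through_bsc(p_d, eps)
    return (1.0 - prior_h1) * recv_one_h0 + prior_h1 * (1.0 - recv_one_h1)


def best_threshold(mu: float, eps: float, prior_h1: float = 0.5) -> float:
    """Coarse grid search (step 0.01 over [-3, 5]) for the error-minimizing local threshold."""
    grid = [k * 0.01 for k in range(-300, 501)]
    return min(grid, key=lambda t: error_probability(t, mu, eps, prior_h1))
```

With equal priors this toy model gives an error of 0.5 * [1 + (1 - 2*eps) * (P_fa - P_d)], so for eps < 0.5 the channel scales the achievable error but leaves the best threshold at mu / 2. With unequal priors or several sensors the channel generally does shift the optimal thresholds, which is the situation the abstract addresses.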
ISBN: 9781424477739 (print)
This paper presents an efficient and low-power-consumption parallel face-detection technology based on Haar-like features and implemented with a massive-parallel memory-embedded SIMD matrix. The massive-parallel memory-embedded SIMD matrix architecture has up to 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. For experimental verification of this matrix processing architecture, the parallel Haar-like-feature-based face-detection technique has been implemented on an evaluation board and tested in practice. Evaluation results show that a total processing time of about 313 ms at 162 MHz clock frequency and 150 mW power dissipation can be realized. Thus, the reported parallel face-detection method with the massive-parallel memory-embedded SIMD matrix is a practical technology and a promising solution for real-time mobile multimedia applications.
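For readers unfamiliar with Haar-like features, the sketch below shows how one two-rectangle feature is commonly evaluated in software using an integral image (summed-area table). This is the conventional sequential formulation, given only as background; it is an assumption that it resembles the computation being parallelized, and it says nothing about how the SIMD matrix described above maps the work onto its processing elements. Function names are illustrative.

```python
from typing import List


def integral_image(img: List[List[int]]) -> List[List[int]]:
    """Summed-area table with one extra zero row and column, so that
    ii[y][x] is the sum of img over rows < y and columns < x."""
    h = len(img)
    w = len(img[0]) if h else 0
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii: List[List[int]], top: int, left: int,
             height: int, width: int) -> int:
    """Sum of pixels in a rectangle using four table lookups."""
    bottom, right = top + height, left + width
    return ii[bottom][right] - ii[top][right] - ii[bottom][left] + ii[top][left]


def two_rect_horizontal(ii: List[List[int]], top: int, left: int,
                        height: int, width: int) -> int:
    """Two-rectangle Haar-like feature: left half minus right half.
    The width is assumed to be even."""
    half = width // 2
    left_sum = rect_sum(ii, top, left, height, half)
    right_sum = rect_sum(ii, top, left + half, height, half)
    return left_sum - right_sum
```

Once the table is built, every rectangle sum costs four lookups regardless of its size, which is why detectors built on these features evaluate many of them per window.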
We describe an experimental time utility for synchronizing the operating system clocks on the SP1 and SP2 parallel system nodes. It synchronizes the node clocks typically within 5 microseconds of each other utilizing the synchronous feature of the SP1 and SP2 interconnection network. This is 2 to 3 orders of magnitude better than what can be achieved by previous methods. Synchronized clocks are useful for parallel program performance measurement and tuning, parallel program tracing and debugging, and gang scheduling of parallel processes, to name a few. We also measure the performance of a widely used time synchronization utility using the SP1 and SP2 interconnection network.
ISBN: 0769524346 (print)
In this paper we present a model and simulator for many clusters of heterogeneous PCs belonging to a local network. These clusters are assumed to be connected to each other through a global network, and each cluster is managed via a local scheduler which is shared by many users. We validate our simulator by comparing the experimental and analytical results of an M/M/4 queuing system. These studies indicate that the simulator is consistent. Next, we compare it with a real batch system and obtain an average error of 10.5% for the response time and 12% for the makespan. We conclude that the simulator is realistic and describes well the behaviour of a large-scale system. Thus we can study the scheduling of our system, called DIRAC, in a high-throughput context. We justify our decentralized, adaptive and opportunistic approach in comparison to a centralized approach in such a context.
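The analytical side of an M/M/4 validation is typically the standard Erlang C result for an M/M/c queue. The sketch below computes the mean response time from that textbook formula; the arrival and service rates used in the example are arbitrary illustrative values, not figures from the paper, and the function names are hypothetical.

```python
import math


def erlang_c(servers: int, arrival_rate: float, service_rate: float) -> float:
    """Probability that an arriving job must wait in an M/M/c queue."""
    offered = arrival_rate / service_rate      # offered load a = lambda / mu
    rho = offered / servers                    # per-server utilization
    if rho >= 1.0:
        raise ValueError("queue is unstable: utilization must be below 1")
    tail = (offered ** servers / math.factorial(servers)) / (1.0 - rho)
    head = sum(offered ** k / math.factorial(k) for k in range(servers))
    return tail / (head + tail)


def mean_response_time(servers: int, arrival_rate: float,
                       service_rate: float) -> float:
    """Mean time in system: mean queueing delay plus mean service time."""
    wait = erlang_c(servers, arrival_rate, service_rate) / (
        servers * service_rate - arrival_rate
    )
    return wait + 1.0 / service_rate
```

As an illustration, `mean_response_time(4, 3.0, 1.0)` gives roughly 1.509 time units (75% utilization), and a simulated M/M/4 run with the same rates should converge to that value as the number of simulated jobs grows.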
ISBN: 9781509060580 (print)
More and more companies rely on cloud services to provide their online software solutions. Cloud services are offered by a multitude of providers, each of them offering services through proprietary, mostly incompatible interfaces. Developing applications that employ these vendor-specific interfaces can create the "vendor lock-in" problem (i.e., the application is tightly coupled to the underlying cloud provider). Consequently, such applications cannot be ported without incurring significant costs and delays. A cloud services consumer can decide to switch to a different cloud provider based on criteria such as changes in business requirements, continuously evolving offerings from cloud providers, and cost control. Maintaining the flexibility to change cloud providers in an efficient way can be a challenging task. We propose an efficient model-driven framework for cloud application portability. Our approach enables applications consuming REST resources in the cloud to be transferred to different cloud providers without the need to refactor the applications. The framework supports a wide range of cloud resources. The framework produces an intermediation layer which translates the calls between the format of the initial cloud platform and the new target cloud platform. The intermediation layer can be consumed by any programming language. We demonstrate that cloud application portability can be achieved. Our solution successfully maps cloud-based services with an overall median of 100% for requests, and 74.8% for responses. Furthermore, we show that the intermediation layer introduces minimal additional latency.