Google’s MapReduce enables automatic parallelization of programs by partitioning input data and replicating functions, but it does not directly support more complex parallel modes such as pipelining. However, many such parallel modes are helpful for optimizing the solution of parallel computing problems. In this paper, we propose EFC (Execution Flow Control), a novel programming model and its implementation. EFC exposes an execution-flow control interface that makes the model compatible with a wider range of parallel modes and allows users to modify the execution flow as needed. The new model enables a simple, compact expression of most parallel modes.
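A minimal sketch of what such an execution-flow control interface could look like, layered over a map/reduce-style framework; the Stage and ExecutionFlow names are illustrative assumptions, not the EFC API described in the paper:

// Hypothetical sketch: user-composed execution flow (here, pipeline mode)
// instead of a fixed map -> reduce order. Names are illustrative only.
#include <functional>
#include <iostream>
#include <vector>

using Record = int;
using Stage = std::function<std::vector<Record>(const std::vector<Record>&)>;

class ExecutionFlow {
public:
    ExecutionFlow& then(Stage s) { stages_.push_back(std::move(s)); return *this; }
    std::vector<Record> run(std::vector<Record> data) const {
        for (const auto& s : stages_) data = s(data);   // stages chained as a pipeline
        return data;
    }
private:
    std::vector<Stage> stages_;
};

int main() {
    ExecutionFlow flow;
    flow.then([](const std::vector<Record>& in) {       // "map"-like stage
            std::vector<Record> out;
            for (Record r : in) out.push_back(r * r);
            return out;
        })
        .then([](const std::vector<Record>& in) {       // filtering stage
            std::vector<Record> out;
            for (Record r : in) if (r % 2 == 0) out.push_back(r);
            return out;
        });
    for (Record r : flow.run({1, 2, 3, 4})) std::cout << r << ' ';  // prints: 4 16
}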
We introduce the Concurrent Collections (CnC) programming model. CnC supports flexible combinations of task and data parallelism while retaining determinism. CnC is implicitly parallel, with the user providing high-level operations along with semantic ordering constraints that together form a CnC graph. We formally describe the execution semantics of CnC and prove that the model guarantees deterministic computation. We evaluate the performance of CnC implementations on several applications and show that CnC offers performance and scalability equivalent to or better than that offered by lower-level parallel programming models.
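A toy, sequential C++ rendering of the CnC ingredients named above (item collections for data, tag collections for control, and a step prescribed by tags); it only illustrates the semantics and is not the actual CnC runtime or its API:

// Ordering is implied solely by data and control dependences, so distinct
// step instances could run in parallel without changing the result.
#include <functional>
#include <iostream>
#include <map>

template <typename K, typename V>
struct ItemCollection {                    // single-assignment key -> value store
    std::map<K, V> items;
    void put(const K& k, const V& v) { items.emplace(k, v); }
    const V& get(const K& k) const { return items.at(k); }
};

struct TagCollection {                     // putting a tag prescribes a step instance
    std::function<void(int)> prescribed_step;
    void put(int tag) { prescribed_step(tag); }
};

int main() {
    ItemCollection<int, double> in, out;
    TagCollection tags;
    // Step: reads in[tag], writes out[tag].
    tags.prescribed_step = [&](int tag) { out.put(tag, 2.0 * in.get(tag)); };

    for (int i = 0; i < 4; ++i) in.put(i, i + 0.5);
    for (int i = 0; i < 4; ++i) tags.put(i);        // launch step instances
    std::cout << out.get(3) << '\n';                // prints 7
}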
Technical advances are leading to a pervasive computational ecosystem that integrates computing infrastructures with embedded sensors and actuators, and are giving rise to a new paradigm for monitoring, understanding, and managing natural and engineered systems, one that is information/data-driven. In this paper, we present a programming system that can support such end-to-end sensor-based dynamic data-driven applications. Specifically, the programming system enables these applications at two levels. First, it provides programming abstractions for integrating sensor systems with computational models for scientific and engineering processes and with other application components in an end-to-end experiment. Second, it provides programming abstractions and system software support for developing in-network data processing mechanisms. The former supports complex querying of the sensor system, while the latter enables the development of in-network data processing mechanisms such as aggregation, adaptive interpolation and assimilation. Furthermore, for the latter, we also explore the use of temporal and spatial correlations of sensor measurements in the targeted application domains to trade off the complexity of coordination among sensor clusters against the savings that result from having fewer sensors for in-network processing, while maintaining an acceptable error threshold. The research is evaluated using two application scenarios: the management and optimization of an instrumented oil field and the management and optimization of an instrumented data center. Experimental results show that the provided programming system reduces overheads while achieving near-optimal and timely management and control in both application scenarios. (C) 2010 Elsevier B.V. All rights reserved.
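A small sketch of one in-network processing idea mentioned above: a cluster head that exploits temporal correlation by suppressing readings that stay within an error threshold of the last reported value. The ClusterHead class and the threshold are illustrative assumptions, not the paper's actual system:

#include <cmath>
#include <iostream>
#include <vector>

class ClusterHead {
public:
    explicit ClusterHead(double error_threshold) : eps_(error_threshold) {}
    // Returns true if the reading must be forwarded upstream to the model.
    bool report(double reading) {
        if (!has_last_ || std::fabs(reading - last_reported_) > eps_) {
            last_reported_ = reading;
            has_last_ = true;
            return true;
        }
        return false;   // suppressed: the consumer interpolates from the last value
    }
private:
    double eps_;
    double last_reported_ = 0.0;
    bool has_last_ = false;
};

int main() {
    ClusterHead head(0.5);   // tolerate up to 0.5 units of interpolation error
    std::vector<double> temps = {21.0, 21.2, 21.4, 22.1, 22.2};
    for (double t : temps)
        std::cout << t << (head.report(t) ? " -> forwarded\n" : " -> suppressed\n");
}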
The paper presents the SmartGridRPC model, an extension of the GridRPC model which aims to achieve higher performance. The traditional GridRPC provides a programming model and API for mapping individual tasks of an application in a distributed Grid environment, and is based on the client-server model characterized by a star network topology. SmartGridRPC provides a programming model and API for mapping a group of tasks of an application in a distributed Grid environment, based on a fully connected network topology. The SmartGridRPC programming model and API, and their performance advantages over the GridRPC model, are outlined in this paper. In addition, experimental results using a real-world application are also presented. Copyright (C) 2010 John Wiley & Sons, Ltd.
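A conceptual sketch, using hypothetical names rather than the real GridRPC or SmartGridRPC bindings, of why mapping a group of tasks over a fully connected topology can beat per-task mapping over a star topology: dependent tasks can be co-located so intermediate results need not pass back through the client:

#include <iostream>
#include <map>
#include <string>
#include <vector>

struct Task { std::string name; std::string depends_on; };   // "" = no dependency

// GridRPC-style: choose a server per call; intermediates flow via the client.
std::map<std::string, std::string> map_individually(const std::vector<Task>& ts) {
    std::map<std::string, std::string> placement;
    int next = 0;
    for (const auto& t : ts) placement[t.name] = "server" + std::to_string(next++ % 2);
    return placement;
}

// SmartGridRPC-style: map the whole group at once, co-locating dependent tasks.
std::map<std::string, std::string> map_group(const std::vector<Task>& ts) {
    std::map<std::string, std::string> placement;
    int next = 0;
    for (const auto& t : ts) {
        if (!t.depends_on.empty() && placement.count(t.depends_on))
            placement[t.name] = placement[t.depends_on];      // reuse the producer's server
        else
            placement[t.name] = "server" + std::to_string(next++ % 2);
    }
    return placement;
}

int main() {
    std::vector<Task> app = {{"factorize", ""}, {"solve", "factorize"}};
    std::cout << "per-task mapping:\n";
    for (const auto& [task, server] : map_individually(app))
        std::cout << "  " << task << " -> " << server << '\n';   // different servers
    std::cout << "group mapping:\n";
    for (const auto& [task, server] : map_group(app))
        std::cout << "  " << task << " -> " << server << '\n';   // co-located
}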
The SARC architecture is composed of multiple processor types and a set of user-managed Direct Memory Access (DMA) engines that let the runtime scheduler overlap data transfer and computation. The runtime system automatically allocates tasks to the heterogeneous cores and schedules the data transfers through the DMA engines. SARC's programming model supports various highly parallel applications, with matching support from specialized accelerator processors.
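A double-buffering sketch of the transfer/compute overlap that user-managed DMA makes possible, assuming hypothetical dma_get_async/dma_wait primitives in place of the real runtime calls:

#include <cstddef>
#include <vector>

constexpr std::size_t BLOCK = 1024;

// Hypothetical asynchronous DMA primitives provided by the runtime (stubs here).
void dma_get_async(float* local, const float* remote, std::size_t n, int tag) {}
void dma_wait(int tag) {}

void compute(float* block, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) block[i] *= 2.0f;     // placeholder kernel
}

void process(const float* remote, std::size_t nblocks) {
    std::vector<float> buf[2] = {std::vector<float>(BLOCK), std::vector<float>(BLOCK)};
    dma_get_async(buf[0].data(), remote, BLOCK, /*tag=*/0);   // prefetch the first block
    for (std::size_t b = 0; b < nblocks; ++b) {
        int cur = static_cast<int>(b % 2), nxt = static_cast<int>((b + 1) % 2);
        if (b + 1 < nblocks)                                  // start the next transfer early
            dma_get_async(buf[nxt].data(), remote + (b + 1) * BLOCK, BLOCK, /*tag=*/nxt);
        dma_wait(cur);                                        // wait only for the block we need
        compute(buf[cur].data(), BLOCK);                      // overlaps with the in-flight DMA
    }
}

int main() { std::vector<float> data(4 * BLOCK, 1.0f); process(data.data(), 4); }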
ISBN (print): 9783642119491
Efficient transaction nesting is one of the ongoing challenges for hardware transactional memory. To increase the efficiency of closed nesting, this paper proposes a conditional partial rollback (CPR) scheme which supports conditional partial rollback without significantly increasing hardware complexity. Instead of rolling back to the outermost transaction as in the commonly used flattening model, the CPR scheme rolls back only to the conflicting transaction itself or to one of its outer-level transactions if given conditions are satisfied. By recording the access status of each nested transaction, the scheme uses one global data set for all of the nested transactions rather than an independent data set for each nested transaction. A hardware transactional memory architecture with support for the CPR scheme is also proposed, based on a multi-core processor and the current cache coherence mechanism. The system is implemented in simulation and evaluated using seven benchmark applications. Evaluation results show that the CPR scheme achieves better performance and scalability than the flattening model commonly used in hardware transactional memory.
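A toy model of the rollback decision, contrasting flattening with conditional partial rollback; the condition used here (restart from the outermost nesting level that recorded an access to the conflicting address) is a simplified stand-in for the paper's conditions, not the actual CPR hardware logic:

#include <algorithm>
#include <iostream>
#include <vector>

struct NestLevel { std::vector<int> access_set; };    // addresses accessed at this depth

// Returns the nesting depth to restart from when `addr` conflicts: the outermost
// nested transaction that touched `addr`. If that is depth 0, the behaviour
// degenerates to the flattening model.
int rollback_level_cpr(const std::vector<NestLevel>& nest, int addr) {
    for (std::size_t level = 0; level < nest.size(); ++level) {
        const auto& as = nest[level].access_set;
        if (std::count(as.begin(), as.end(), addr) > 0)
            return static_cast<int>(level);           // partial rollback to this level
    }
    return 0;                                         // not recorded: flatten to outermost
}

int main() {
    std::vector<NestLevel> nest = {{{10, 11}}, {{20}}, {{30, 31}}};   // depths 0..2
    std::cout << "conflict on 31 -> restart level " << rollback_level_cpr(nest, 31) << '\n'; // 2
    std::cout << "conflict on 10 -> restart level " << rollback_level_cpr(nest, 10) << '\n'; // 0
}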
ISBN (print): 9783642156717
The Compute Unified Device Architecture (CUDA) programming environment from NVIDIA is a milestone towards making the programming of many-core GPUs more accessible to programmers. However, there are still many challenges for programmers when using CUDA. One is having to deal explicitly with GPU device memory and with data transfers between host memory and GPU device memory. In this study, source-to-source compilation and runtime library technologies are used to implement an experimental programming system based on CUDA, called memCUDA, which can automatically map GPU device memory to host memory. With a few pragma directives, the programmer can use host memory directly in CUDA kernel functions, while the tedious and error-prone data transfers and device memory management are shielded from the programmer. Performance is also improved with several near-optimal techniques. Experimental results show that memCUDA programs achieve an effect similar to well-optimized CUDA programs with more compact source code.
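For contrast, a host-side C++ sketch of the explicit device-memory and transfer boilerplate (standard CUDA runtime API calls) that memCUDA's directives are intended to generate automatically; scale_kernel is a hypothetical kernel name and its launch is left as a comment:

#include <cuda_runtime.h>
#include <vector>

void scale_on_gpu(std::vector<float>& host_data) {
    const size_t bytes = host_data.size() * sizeof(float);

    float* dev_data = nullptr;
    cudaMalloc(reinterpret_cast<void**>(&dev_data), bytes);                 // device allocation
    cudaMemcpy(dev_data, host_data.data(), bytes, cudaMemcpyHostToDevice);  // host -> device

    // scale_kernel<<<blocks, threads>>>(dev_data, host_data.size());
    // With memCUDA-style directives the kernel would reference host_data
    // directly and the surrounding transfers would be generated by the tool.

    cudaMemcpy(host_data.data(), dev_data, bytes, cudaMemcpyDeviceToHost);  // device -> host
    cudaFree(dev_data);                                                     // device cleanup
}

int main() { std::vector<float> v(1024, 1.0f); scale_on_gpu(v); }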
In principle, the entire world can exploit ubiquitous and pervasive systems to great societal benefit. In practice, however, there is as yet no fundamental basis or widely accepted programming model for such systems...
ISBN (print): 9781424482641
Parallel programming is an important tool used in flash memories to achieve high write speed. In parallel programming, a common program voltage is applied to many cells for simultaneous charge injection. This property significantly simplifies the memory hardware, but it is also a constraint that limits the storage capacity of flash memories. Another important property is that cells differ in their hardness for charge injection, which makes the injected charge differ across cells even when the same program voltage is applied to them. In this paper, we study the parallel programming of flash memory cells, focusing on the above two properties. We present algorithms for parallel programming when there is information on the cells' hardness for charge injection but no feedback information on cell levels during programming. We then proceed to the programming model with feedback information on cell levels, and study how well the information on the cells' hardness for charge injection can be obtained. The results can be useful for understanding the storage capacity of flash memories with parallel programming.
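A toy open-loop simulation of the setting described above: one shared sequence of program-voltage pulses, per-cell hardness scaling the injected charge, and no feedback on cell levels. All parameters and the conservative deselection rule are illustrative assumptions, not the paper's algorithms:

#include <iostream>
#include <vector>

struct Cell { double hardness; double target; double level = 0.0; };

void program_in_parallel(std::vector<Cell>& cells, double pulse, int max_pulses) {
    for (int p = 0; p < max_pulses; ++p) {
        bool applied = false;
        for (auto& c : cells) {
            // A cell stays selected only while the next increment cannot push
            // it past its target level (cells can only be charged, never erased).
            if (c.level + c.hardness * pulse <= c.target) {
                c.level += c.hardness * pulse;    // shared voltage, per-cell hardness
                applied = true;
            }
        }
        if (!applied) break;                      // every cell is as close as it can get
    }
}

int main() {
    std::vector<Cell> cells = {{1.0, 4.0}, {0.6, 4.0}, {1.4, 4.0}};
    program_in_parallel(cells, /*pulse=*/0.5, /*max_pulses=*/20);
    for (const auto& c : cells) std::cout << c.level << ' ';   // undershoots, never overshoots
}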
In the past several years, grid computing has emerged as a way to harness computing resources geographically distributed across multiple organizations. Due to its inherently large-scale, distributed and heterogeneous nature, grid computing has increased the importance of specific requirements, such as scalability, performance and the need for an adequate programming model. Several programming models have been proposed for grid programming; nonetheless, so far none of them has met all the requirements. In contrast, in the field of high-performance cluster computing, the message-passing model became a true standard, with a large number of libraries and legacy applications. This work proposes a hybrid framework that combines the high performance and wide acceptance of the MPI standard with intuitive extensions that enable developers to design grid applications, or "gridify" existing ones, with the flexibility of a component-based runtime modeling the resource hierarchy and offering support for inter-cluster communication. The proposed solution relies on the addition of new MPI communicators and a related API, which may offer support well suited to programmers used to MPI in order to reflect a hierarchical topology within the deployed application. Experiments with three applications (a Monte Carlo simulation, a Mergesort and a Poisson3D solver) have shown that the "gridification" of applications improves their performance on grid environments. Even if the goal is not to compete against existing MPI distributions, the performance of the solution is comparable with MPI performance, and even better in some cases. From the results obtained in the evaluation of this prototype, we conclude that the overhead introduced by the components is not negligible but stays within expected bounds. However, we can expect the benefits to grid applications to outweigh the generated overhead. Moreover, the extended interface may offer users adequate abstractions to design parallel algorithms in a hierarchical way that addresses grid environments.
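A sketch of the underlying idea using only standard MPI calls (not the paper's extended communicators or API): MPI_Comm_split carves MPI_COMM_WORLD into intra-cluster communicators so that collectives and point-to-point traffic can respect the resource hierarchy; the cluster_of() mapping below is a made-up placeholder for information a real deployment would take from the resource description:

#include <mpi.h>
#include <cstdio>

// Hypothetical mapping from a global rank to the cluster hosting it.
static int cluster_of(int world_rank) { return world_rank / 4; }   // 4 ranks per cluster

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int world_rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // All ranks with the same color end up in the same intra-cluster communicator.
    MPI_Comm cluster_comm;
    MPI_Comm_split(MPI_COMM_WORLD, cluster_of(world_rank), world_rank, &cluster_comm);

    int cluster_rank = 0, cluster_size = 0;
    MPI_Comm_rank(cluster_comm, &cluster_rank);
    MPI_Comm_size(cluster_comm, &cluster_size);
    std::printf("world rank %d -> cluster %d (local rank %d of %d)\n",
                world_rank, cluster_of(world_rank), cluster_rank, cluster_size);

    MPI_Comm_free(&cluster_comm);
    MPI_Finalize();
    return 0;
}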