ISBN (print): 9780769548791
The mesh-connected processor array is a popular architecture in parallel processing. Extensive studies have addressed reconfiguration algorithms for processor arrays with faults, but little work exists on parallel algorithms that accelerate the reconfiguration. This paper presents a fast algorithm to reconfigure two-dimensional mesh-connected processor arrays with faults. A traditional algorithm is accelerated through multithreading, without loss of harvest. The proposed algorithm reconfigures the processor array using a route-distance mechanism in order to avoid routing errors. Simulation results show that the proposed algorithm accelerates reconfiguration by nearly 15 times on a 64 x 64 array in comparison to the traditional algorithm cited in this paper.
ISBN (print): 9781538649756
Algorithmic codes for scientific computing may exhibit diverse levels of tolerance to memory errors, depending on how the program behaves when accessing data. Some factors that can be controlled in an HPC program may influence its degree of tolerance to memory errors. Characterizing the degree of vulnerability an application exhibits can help to improve its security as well as save time and resources. In this work, we study some of the main factors that may affect the propagation of errors originating from memory accesses.
ISBN (print): 0818674601
Cost-sensitive applications for parallel computing require system designs using commodity hardware. Off-the-shelf processing nodes have already been implemented in parallel systems. This article proposes the use of ATM (Asynchronous Transfer Mode) for interconnection networks. Because ATM was not designed as a communication technology for parallel systems, some adaptation is needed to meet the special requirements of parallel systems. This paper discusses the advantages and drawbacks of this approach and shows solutions for adapting ATM technology to this special environment while preserving some unique features of ATM.
ISBN (print): 9781479980062
Latency-sensitive multiparty applications involve intensive communication between multiple participating nodes. Relays are usually adopted for matchmaking end hosts, filtering unwanted traffic, bypassing routing outages, and so on. Speeding up relay communication is increasingly important for improving the QoE of clients. Currently, no rigorous guarantees have been made for latency-optimal relay communication. We propose a novel framework to truthfully represent relay communication in the latency space. Real-world data sets show that nearly 90% of node triples obey the average triangle inequality, while our new model allows asymmetry and triangle inequality violations to occur. We propose the general triangle to rigorously locate a candidate relay closer to multiple nodes, with which we systematically analyze the feasibility of finding an optimal relay node for arbitrarily sized groups. Our results show that distributed greedy methods are able to locate optimal relays with modest communication overhead and few search hops.
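The latency-space notions in the abstract above can be illustrated with a minimal sketch. This is not the authors' framework or their distributed greedy method: it assumes a hypothetical asymmetric latency matrix `lat` (in milliseconds, `lat[a][b]` being the one-way delay from `a` to `b`), and the function names `violates_triangle` and `best_relay` as well as the summed-latency objective are chosen here for illustration only. A relay `r` shortens the path between `a` and `b` exactly when the detour through `r` violates the triangle inequality.

```python
from itertools import permutations


def violates_triangle(lat, a, b, r):
    """True if relaying a -> r -> b is strictly faster than the direct path a -> b."""
    return lat[a][r] + lat[r][b] < lat[a][b]


def best_relay(lat, group, candidates):
    """Brute-force choice of the candidate relay with the lowest total relayed
    latency over all ordered pairs in the group (an assumed objective)."""
    def cost(r):
        return sum(lat[a][r] + lat[r][b] for a, b in permutations(group, 2))
    return min(candidates, key=cost)


# Hypothetical asymmetric one-way latencies in milliseconds for nodes 0..3.
lat = [
    [0, 100, 20, 60],
    [90, 0, 30, 50],
    [25, 30, 0, 70],
    [60, 55, 70, 0],
]

relay_helps = violates_triangle(lat, 0, 1, 2)   # detour via node 2 costs 20 + 30
chosen = best_relay(lat, group=[0, 1], candidates=[2, 3])
```

The exhaustive search is only practical for small candidate sets; it serves as a reference point against which a greedy search could be compared.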
This paper presents a novel Adaptive Dynamic Grid-based Data Distribution Management (DDM) scheme, which we refer to as ADGB. The main objective of our protocol is to optimize DDM time through matching probability (MP...
Mapping a pipelined application onto a distributed and parallel platform is a challenging problem. The problem becomes even more difficult when multiple optimization criteria are involved, and when the target resource...
ISBN (print): 9781509035397
With the rapid development of information communication, computer, and control technology, the smart grid has become the direction and trend of development for the electric power industry. The ultimate goal of the smart grid is to build a panoramic real-time system that covers the whole production process of the power system. However, current power systems struggle to meet the dispatching department's demand for storing and processing large-scale data. To address this, this paper develops the Mysql-CIM model, realizes distributed cloud storage of power network data, and develops an application for parallel topology processing of the power network. A case study verifies that the Mysql-CIM model fully meets the smart grid's requirements for system reliability, availability, and high throughput. The CIM-based parallel topology processing application achieves fast network topology processing and topology island formation, with running time brought to the millisecond scale, greatly improving the work efficiency of the system.
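Topology island formation, mentioned in the abstract above, amounts to grouping buses that remain connected through closed branches. The following is a minimal sequential sketch using a textbook union-find structure. It is not the paper's CIM-based parallel method, and the function name `find_islands` and the 6-bus example data are hypothetical.

```python
def find_islands(num_buses, closed_branches):
    """Group buses into islands: buses linked by closed branches share an island."""
    parent = list(range(num_buses))

    def find(x):
        # Path halving keeps the trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in closed_branches:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    islands = {}
    for bus in range(num_buses):
        islands.setdefault(find(bus), []).append(bus)
    return sorted(islands.values())


# Hypothetical 6-bus network. The breaker between buses 2 and 3 is open,
# so that branch is not listed; bus 5 has no closed connection at all.
branches = [(0, 1), (1, 2), (3, 4)]
islands = find_islands(6, branches)
```

A parallel variant would typically partition the branch list across workers and merge the partial results, but how the paper does this is not stated in the abstract.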
ISBN (print): 9781467379526
This paper presents the novel heterogeneous DSP architecture ePUMA and demonstrates its features through an implementation of sorting of larger data sets. We derive a sorting algorithm with fixed-size merging tasks suitable for distributed memory architectures, which allows very simple scheduling and predictable data-independent sorting time. The implementation on ePUMA utilizes the architecture's specialized compute cores and control cores, and local memory parallelism, to separate and overlap sorting with data access and control for close to stall-free sorting. Penalty-free unaligned and out-of-order local memory access is used in combination with proposed application-specific sorting instructions to derive highly efficient local sorting and merging kernels used by the system-level algorithm. Our evaluation shows that the proposed implementation can rival the sorting performance of high-performance commercial CPUs and GPUs, with two orders of magnitude higher energy efficiency, which would allow high-performance sorting on low-power devices.
ISBN (print): 9783540747413
Congestion control is a necessary tool in Transaction Processing Systems (TPS) to avoid excessive degradation of response times. It should, however, behave autonomically, adapting automatically to request characteristics to deliver the best possible service. Given that every request has an explicit or implicit deadline (a maximum acceptable response time), we analyze strategies that target request deadlines and throughput. These include maximum-throughput-seeking strategies, strategies using feedback control on miss rate, and an additional proposal for preventive control of miss rates and throughput. A further goal of our proposal is for the control to be external and not to rely on analytic models (whose predictions may be erroneous due to physical system issues). We analyze and compare alternative designs for the control, including our own proposals and related ones. Our experiments consider both varied request inter-arrival and duration distributions and a real transaction processing benchmark.
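Feedback control on miss rate, one of the strategy families named in the abstract above, can be illustrated with a minimal external controller. The sketch below uses a generic additive-increase/multiplicative-decrease rule, which is a stand-in technique and not the strategy evaluated in the paper. The function name `adjust_limit`, the 5% target, and the step and backoff values are all assumptions.

```python
def adjust_limit(limit, miss_rate, target=0.05, step=1, backoff=0.5, min_limit=1):
    """Return the next admission limit (maximum concurrent requests).

    If the observed deadline miss rate exceeds the target, cut the limit
    multiplicatively; otherwise probe for more throughput by raising it.
    """
    if miss_rate > target:
        return max(min_limit, int(limit * backoff))
    return limit + step


# Hypothetical sequence of measured miss rates, one per control interval.
limit = 20
history = []
for observed in [0.01, 0.02, 0.10, 0.03]:
    limit = adjust_limit(limit, observed)
    history.append(limit)
```

The controller only needs the measured miss rate, so it can sit outside the transaction system and requires no analytic model of it.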
ISBN (print): 9781479986705
Parallelizing software applications through the use of existing optimized primitives is a common trend that mediates between the complexity of manual parallelization and the use of less efficient directive-based programming models. Parallel primitive libraries allow software engineers to map any sequential code to a target many-core architecture by identifying the most computationally intensive code sections and mapping them onto one or more existing primitives. On the other hand, the spread of this primitive-based programming model and the variety of GPU architectures have led to a large and increasing number of third-party libraries, which often provide different implementations of the same primitive, each one optimized for a specific architecture. From the developer's point of view, this shifts the actual problem from parallelizing the software application to selecting, among the several implementations, the most efficient primitives for the target platform. This paper presents a profiling framework for GPU primitives, which allows measuring the implementation quality of a given primitive by considering the characteristics of the target architecture. The framework collects the information provided by a standard GPU profiler and combines it into optimization criteria. The criteria evaluations are weighted to distinguish the impact of each optimization on the overall quality of the primitive implementation. The paper shows how the different weights were tuned through the analysis of five of the most widespread existing primitive libraries and how the framework was eventually applied to improve the implementation performance of a standard primitive.
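Combining weighted criteria evaluations into a single quality figure, as described in the abstract above, might look like the following minimal sketch. The abstract does not specify the criteria, the weights, or the combining formula, so the weighted average, the function name `quality_score`, and the criterion names are all hypothetical.

```python
def quality_score(criteria, weights):
    """Weighted average of per-criterion scores, each assumed to lie in [0, 1]."""
    total_weight = sum(weights[name] for name in criteria)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(criteria[name] * weights[name] for name in criteria) / total_weight


# Hypothetical criteria and weights; a real framework would derive the
# per-criterion scores from GPU profiler counters.
weights = {"occupancy": 2.0, "memory_coalescing": 3.0, "branch_divergence": 1.0}
impl_a = {"occupancy": 0.5, "memory_coalescing": 1.0, "branch_divergence": 0.5}
impl_b = {"occupancy": 1.0, "memory_coalescing": 0.5, "branch_divergence": 0.5}

score_a = quality_score(impl_a, weights)
score_b = quality_score(impl_b, weights)
preferred = "impl_a" if score_a >= score_b else "impl_b"
```

With these made-up numbers the heavier weight on memory coalescing favors `impl_a`, which shows how weight tuning changes which implementation is ranked first.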