This paper presents a novel parallel algorithm to synthesize textures in patches. It decomposes the synthesis process into two steps following a chessboard pattern, with the first step placing patches in the black grids, ...
ISBN: 9783642038686 (print)
In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offloading parts of the computations. However, designing an execution model that unifies all computing units and associated embedded memory remains a major challenge. We have thus designed STARPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of STARPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other hand. We have developed several strategies that can be selected seamlessly at run time, and we have demonstrated their efficiency by analyzing the impact of those scheduling policies on several classical linear algebra algorithms that take advantage of multiple cores and GPUs at the same time. In addition to substantial improvements regarding execution times, we obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine.
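To make the idea of a pluggable scheduling policy over heterogeneous units more concrete, here is a minimal Python sketch of one possible policy: greedy earliest-finish-time assignment. It is not the STARPU API and not one of the strategies from the paper; the function name `schedule`, the worker names, and the cost estimates are all hypothetical and chosen only for illustration.

```python
def schedule(tasks, workers):
    """Greedy earliest-finish-time assignment (illustrative toy policy).

    tasks:   list of (task_name, {worker_kind: estimated_cost}) in submission order
    workers: dict mapping worker_name -> worker_kind (e.g. "cpu" or "gpu")
    Returns (assignment, makespan), where assignment maps task_name -> worker_name.
    """
    ready_at = {w: 0.0 for w in workers}  # time at which each worker becomes free
    assignment = {}
    for name, costs in tasks:
        best = None
        for w, kind in workers.items():
            if kind not in costs:
                continue  # this task has no implementation for this kind of unit
            finish = ready_at[w] + costs[kind]
            if best is None or finish < best[0]:
                best = (finish, w)
        if best is None:
            raise ValueError(f"no worker can run task {name!r}")
        finish, w = best
        ready_at[w] = finish
        assignment[name] = w
    return assignment, max(ready_at.values())


# Hypothetical machine: two CPU cores and one GPU.
workers = {"cpu0": "cpu", "cpu1": "cpu", "gpu0": "gpu"}
# Hypothetical cost estimates per kind of unit (arbitrary time units).
tasks = [
    ("gemm0", {"cpu": 8, "gpu": 1}),
    ("gemm1", {"cpu": 8, "gpu": 1}),
    ("potrf", {"cpu": 2, "gpu": 3}),
    ("gemm2", {"cpu": 8, "gpu": 1}),
]
assignment, makespan = schedule(tasks, workers)
```

In this toy setting the GPU-friendly tasks queue on the GPU while the task that runs faster on a CPU goes to an idle core, which is the kind of behavior a heterogeneous scheduler is meant to produce. A real runtime would also have to account for task dependencies and data transfers, which this sketch ignores.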
ISBN: 9781424438686 (print)
This paper addresses the design and hardware implementation of a novel coarse-grain dynamically reconfigurable computing system called DReAC-2. A complete DReAC-2 system integrates a Nios II processor, which manages the whole reconfigurable system, and a dynamically reconfigurable coprocessor, which comprises an 8x8 processing node array designed for highly regular, computation-intensive tasks. A hardware prototype of DReAC-2 has been implemented on the ALTERA STRATIX II EP2S180 development board. Depending on the nature of the task, the MIMD computing array can select either a parallel-pipelined pattern or an array-parallel pattern to obtain better performance. The experimental results show that DReAC-2 achieves performance 10x to 100x higher than Nios II processors, and 2x to 4x higher performance and higher precision than some other reconfigurable processors(1).
ISBN: 9783642030949 (print)
This paper proposes a parallel particle swarm optimization (PPSO) that divides the search space into sub-spaces and uses different swarms to optimize different parts of the space. In the PPSO framework, the search space is regarded as a solution vector and is divided into two sub-vectors. Two cooperative swarms work in parallel, and each swarm optimizes only one of the sub-vectors. An adaptive asynchronous migration strategy (AAMS) is designed for the swarms to communicate with each other. The PPSO benefits from two aspects. First, the PPSO divides the search space so that each swarm can focus on optimizing a smaller-scale problem. This reduces the problem complexity and makes the algorithm promising for large-scale problems. Second, the AAMS adapts the migration to the search environment, which results in timely and efficient communication. Experiments on benchmark functions have demonstrated the good performance of the PPSO with AAMS in both solution accuracy and convergence speed when compared with the traditional serial PSO (SPSO) and the PPSO with fixed migration frequency.
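The two-swarm decomposition described above can be sketched in a few lines of Python. This is only an illustration under several assumptions: the swarms are stepped one after the other rather than in parallel, migration happens at a fixed interval (`migrate_every`) instead of the adaptive AAMS, whose details the abstract does not give, and the function name, search bounds, and PSO coefficients are hypothetical choices.

```python
import random


def cooperative_pso(f, dim, iters=100, swarm_size=8, migrate_every=5, seed=0):
    """Two cooperating swarms, each optimizing one half of the solution vector."""
    rng = random.Random(seed)
    bounds = (-5.0, 5.0)                      # assumed search range
    half = dim // 2
    parts = [(0, half), (half, dim)]          # index ranges of the two sub-vectors

    best = [rng.uniform(*bounds) for _ in range(dim)]  # best full solution seen
    best_f = f(best)
    views = [best[:], best[:]]                # each swarm's copy of the full vector
    view_f = [best_f, best_f]

    swarms = []
    for lo, hi in parts:
        n = hi - lo
        pos = [[rng.uniform(*bounds) for _ in range(n)] for _ in range(swarm_size)]
        swarms.append({"pos": pos,
                       "vel": [[0.0] * n for _ in range(swarm_size)],
                       "pbest": [p[:] for p in pos],
                       "pbest_f": [float("inf")] * swarm_size})

    for it in range(iters):
        for s, (lo, hi) in enumerate(parts):
            sw, view = swarms[s], views[s]
            for i in range(swarm_size):
                x, v, pb = sw["pos"][i], sw["vel"][i], sw["pbest"][i]
                for d in range(hi - lo):
                    # Standard PSO update; view[lo + d] plays the role of gbest.
                    v[d] = (0.7 * v[d]
                            + 1.5 * rng.random() * (pb[d] - x[d])
                            + 1.5 * rng.random() * (view[lo + d] - x[d]))
                    x[d] += v[d]
                # Evaluate the sub-vector inside this swarm's full-vector view.
                trial = view[:lo] + x + view[hi:]
                fx = f(trial)
                if fx < sw["pbest_f"][i]:
                    sw["pbest"][i], sw["pbest_f"][i] = x[:], fx
                if fx < view_f[s]:
                    view[lo:hi] = x
                    view_f[s] = fx
                if fx < best_f:
                    best, best_f = trial, fx

        if (it + 1) % migrate_every == 0:
            # Migration: each swarm receives the other swarm's best sub-vector.
            a, b = views
            (lo0, hi0), (lo1, hi1) = parts
            a[lo1:hi1], b[lo0:hi0] = b[lo1:hi1], a[lo0:hi0]
            for s, (lo, hi) in enumerate(parts):
                view = views[s]
                view_f[s] = f(view)
                if view_f[s] < best_f:
                    best, best_f = view[:], view_f[s]
                # Personal bests were scored against the old context; rescore them.
                sw = swarms[s]
                for i in range(swarm_size):
                    sw["pbest_f"][i] = f(view[:lo] + sw["pbest"][i] + view[hi:])
    return best, best_f


def sphere(x):
    """Simple benchmark function with its minimum at the origin."""
    return sum(v * v for v in x)


solution, value = cooperative_pso(sphere, dim=6, iters=50, seed=1)
```

Between migrations each swarm works against a possibly stale copy of the other half of the vector, which is what makes the migration schedule matter and is the part an adaptive strategy would tune.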
ISBN: 9781424438686 (print)
As more computing cores are integrated onto a single chip, the effect of network communication latency is becoming more and more significant on Multi-core Network-on-Chips (NoCs). For data-parallel applications, we study the model of parallel speedup by including network communication latency in Amdahl's law. The speedup analysis considers the effect of network topology, network size, traffic model and computation/communication ratio. We also study the speedup efficiency. In our Multi-core NoC platform, a real data-parallel application, i.e. matrix multiplication, is used to validate the analysis. Our theoretical analysis and the application results show that the speedup improvement is nonlinear and the speedup efficiency decreases as the system size is scaled up. Such analysis can be used to guide architects and programmers to improve parallel processing efficiency by reducing network latency with optimized network design and increasing the computation proportion in the program.
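As a rough illustration of what including communication latency in Amdahl's law can look like, the Python sketch below adds a communication term to the usual formula. The abstract does not give the paper's actual model, so the overhead used here (a cost `comm_per_core` that grows linearly with the number of additional cores) and all names are assumptions for illustration only.

```python
def speedup(n, parallel_fraction, comm_per_core=0.0):
    """Amdahl-style speedup on n cores with a hypothetical communication cost.

    Single-core run time is normalized to 1. On n cores the serial part takes
    (1 - p), the parallel part takes p / n, and communication is assumed to
    cost comm_per_core * (n - 1), an illustrative choice.
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    p = parallel_fraction
    time_n = (1.0 - p) + p / n + comm_per_core * (n - 1)
    return 1.0 / time_n


def efficiency(n, parallel_fraction, comm_per_core=0.0):
    """Speedup per core."""
    return speedup(n, parallel_fraction, comm_per_core) / n
```

With `comm_per_core=0` this reduces to the classical Amdahl formula. With any positive communication cost in this toy model, efficiency falls as `n` grows, which matches the qualitative trend reported in the abstract.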
ISBN: 9783642038686 (print)
We build wavelet-based adaptive numerical methods for the simulation of advection-dominated flows that develop multiple spatial scales, with an emphasis on fluid mechanics problems. Wavelet-based adaptivity is inherently sequential, and in this work we demonstrate that these numerical methods can be implemented in software that is capable of harnessing the capabilities of multi-core architectures while maintaining their computational efficiency. Recent designs in frameworks for multi-core software development allow us to rethink parallelism as task-based, where parallel tasks are specified and automatically mapped onto physical threads. This way of exposing parallelism enables the parallelization of algorithms that were considered inherently sequential, such as wavelet-based adaptive simulations. In this paper we present a framework that combines wavelet-based adaptivity with task-based parallelism. We demonstrate good scaling performance obtained by simulating diverse physical systems on different multi-core and SMP architectures using up to 16 cores.
ISBN: 9781424451081 (print)
Motivated by a peer-to-peer estimation algorithm in which adaptive weights are optimized to minimize the estimation error variance, we formulate and solve a novel non-convex Lipschitz optimization problem that guarantees global stability of a large class of peer-to-peer consensus-based algorithms for wireless sensor networks. Because of packet losses, the solution of this optimization problem cannot be achieved efficiently with either traditional centralized methods or distributed Lagrangian message passing. We prove that the optimal solution can be obtained by solving a set of nonlinear equations. A fast distributed algorithm, which requires only local computations, is presented for solving these equations. Analysis and computer simulations illustrate the algorithm and its application to various network topologies.
ISBN: 9783642030949 (print)
Multi-wavelength data cross-match among multiple catalogs is a basic and unavoidable step to make distributed digital archives accessible and inter-operable. As current catalogs often contain millions or billions of objects, it is a typical data-intensive computation problem. In this paper, a highly efficient parallel approach to astronomical cross-match is introduced. We present our partitioning and parallelization approach, then address some problems introduced by task partitioning and give the corresponding solutions, including the sky-splitting function HEALPix, which we selected and which plays a key role in both task partitioning and database indexing, and a fast bit-operation algorithm that we developed to resolve the block-edge problem. Our experiments show that the function has a marked performance advantage over previous functions and is fully applicable to large-scale cross-match.
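The block-edge problem mentioned above arises because two matching sources can fall into different partitions. The Python sketch below illustrates it with a deliberately simplified stand-in: a flat square grid replaces HEALPix, plain Euclidean distance replaces angular separation, and neighboring cells are scanned instead of using the paper's bit-operation algorithm, which the abstract does not describe. The function name and parameters are hypothetical.

```python
from collections import defaultdict
import math


def cross_match(cat_a, cat_b, radius, cell):
    """Pair points of cat_a with points of cat_b that lie within `radius`.

    Points are (x, y) tuples on a flat plane. `cell` is the grid cell size and
    must be >= radius, so that scanning the 3x3 block of cells around a point
    is enough to find every possible partner.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    if cell < radius:
        raise ValueError("cell size must be >= match radius")

    # Partition catalog B by grid cell.
    grid = defaultdict(list)
    for j, (x, y) in enumerate(cat_b):
        grid[(math.floor(x / cell), math.floor(y / cell))].append(j)

    matches = []
    for i, (x, y) in enumerate(cat_a):
        cx, cy = math.floor(x / cell), math.floor(y / cell)
        # Scan the point's own cell and its 8 neighbors to handle block edges.
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for j in grid.get((cx + dx, cy + dy), ()):
                    bx, by = cat_b[j]
                    if math.hypot(x - bx, y - by) <= radius:
                        matches.append((i, j))
    return matches


# Two sources on opposite sides of a cell boundary at x = 1.0 still match.
pairs = cross_match([(0.99, 0.5), (3.0, 3.0)], [(1.01, 0.5), (5.0, 5.0)],
                    radius=0.05, cell=1.0)
```

Because each grid cell can be processed independently once the neighbor lookup is in place, the outer loop over `cat_a` is the natural unit to distribute across workers.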
ISBN: 9780769535449 (print)
Physically-based illumination is an essential factor for realistic rendering. In this context, hierarchical radiosity is one of the most accurate global illumination methods. One of the key features of the radiosity approach is that it obtains view-independent global illumination results. Unfortunately, global illumination has high computational and memory requirements, and hierarchical radiosity, though more efficient than other radiosity solutions, is no exception. The progressive popularization of multiprocessor and multi-core processor systems makes the design and implementation of efficient parallel algorithms an appealing alternative in this field. In this paper we present a novel parallel radiosity method addressing the hierarchical radiosity computation on current homogeneous multi-core environments. One of the main contributions of our work is the use of different tasks to exploit the independent interactions among the geometric elements in the scene. Our parallel solution leads to a versatile radiosity implementation that takes advantage of the multiple computational resources in the system, such as multi-core processors and SMT (Simultaneous Multithreading) capabilities. Good results in terms of performance have been achieved.
ISBN: 9780769539294 (print)
In this paper we present PasS (Privacy as a Service), a set of security protocols for ensuring the privacy and legal compliance of customer data in cloud computing architectures. PasS allows for the secure storage and processing of users' confidential data by leveraging the tamper-proof capabilities of cryptographic coprocessors. Using tamper-proof facilities provides a secure execution domain in the computing cloud that is physically and logically protected from unauthorized access. The central design goal of PasS is to maximize users' control in managing the various aspects related to the privacy of sensitive data. This is achieved by implementing user-configurable software protection and data privacy mechanisms. Moreover, PasS provides a privacy feedback process which informs users of the different privacy operations applied to their data and makes them aware of any potential risks that may jeopardize the confidentiality of their sensitive information. To the best of our knowledge, PasS is the first practical cloud computing privacy solution that utilizes previous research on cryptographic coprocessors to solve the problem of securely processing sensitive data in cloud computing infrastructures.