The SOBER family of ciphers is widely used in embedded devices. To improve the processing speed of these ciphers, this paper introduces a reconfigurable processing architecture design for them. According to need, the...
ISBN: 9780769538839 (print)
This paper presents a computational performance analysis of accelerated medical image registration using Graphics Processing Units (GPUs). In our previous work, a multi-resolution approach using normalized mutual information (NMI) proved useful in medical image registration. In this paper, we propose accelerating the NMI procedure with a GPU implementation to exploit the GPU's parallel processing capabilities. Registration algorithms were implemented on NVIDIA's GeForce 9600 GT graphics processor with the Compute Unified Device Architecture (CUDA) programming environment. Experimental results showed that the GPU implementation improves the computational performance of registration with a speedup factor of 23.4x. In addition, the maximum speedup can be achieved with diligent data profiling.
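As a rough illustration of the NMI measure that this abstract accelerates, the sketch below computes it on the CPU from a joint histogram. It assumes the common definition NMI = (H(A) + H(B)) / H(A, B); the abstract does not state which variant the authors use, and the function names, the bin count, and the 8-bit intensity range are illustrative choices rather than details from the paper.

```python
import math
from collections import Counter


def entropy(counts, total):
    """Shannon entropy (in bits) of a histogram given as raw counts."""
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)


def normalized_mutual_information(a, b, bins=8, max_value=255):
    """NMI of two equally sized intensity sequences (flattened images).

    Assumed definition: (H(A) + H(B)) / H(A, B), which ranges from 1.0
    (statistically independent) to 2.0 (identical after quantization).
    """
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and of equal size")
    scale = max_value + 1
    # Quantize intensities into `bins` histogram buckets.
    qa = [v * bins // scale for v in a]
    qb = [v * bins // scale for v in b]
    n = len(a)
    h_a = entropy(Counter(qa).values(), n)
    h_b = entropy(Counter(qb).values(), n)
    h_ab = entropy(Counter(zip(qa, qb)).values(), n)  # joint histogram
    if h_ab == 0:
        # Both images are constant; NMI is undefined, so return the
        # lower bound by convention (an assumption of this sketch).
        return 1.0
    return (h_a + h_b) / h_ab
```

The histogram accumulation over all pixel pairs is the part that parallelizes naturally, which is presumably what makes the measure a good fit for a GPU; a registration loop would call this function once per candidate transform.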
In this paper, we present a parallel connected component labeling method and its VLSI architecture design. The proposed method can assign labels to three pixels simultaneously for the raster scan input and then genera...
ISBN: 9783642038686 (print)
In the field of HPC, the current hardware trend is to design multiprocessor architectures that feature heterogeneous technologies such as specialized coprocessors (e.g., Cell/BE SPUs) or data-parallel accelerators (e.g., GPGPUs). Approaching the theoretical performance of these architectures is a complex issue. Indeed, substantial efforts have already been devoted to efficiently offloading parts of the computations. However, designing an execution model that unifies all computing units and associated embedded memory remains a main challenge. We have thus designed StarPU, an original runtime system providing a high-level, unified execution model tightly coupled with an expressive data management library. The main goal of StarPU is to provide numerical kernel designers with a convenient way to generate parallel tasks over heterogeneous hardware on the one hand, and to easily develop and tune powerful scheduling algorithms on the other hand. We have developed several strategies that can be selected seamlessly at run time, and we have demonstrated their efficiency by analyzing the impact of those scheduling policies on several classical linear algebra algorithms that take advantage of multiple cores and GPUs at the same time. In addition to substantial improvements regarding execution times, we obtained consistent superlinear parallelism by actually exploiting the heterogeneous nature of the machine.
Recently, an increasing number of ancient documents are being digitized in text form, but it is difficult to apply natural language processing techniques to these documents because the language resources for ancient l...
ISBN: 9783642030949 (print)
This paper proposes a parallel particle swarm optimization (PPSO) that divides the search space into sub-spaces and uses different swarms to optimize different parts of the space. In the PPSO framework, the search space is regarded as a solution vector and is divided into two sub-vectors. Two cooperative swarms work in parallel, and each swarm optimizes only one of the sub-vectors. An adaptive asynchronous migration strategy (AAMS) is designed for the swarms to communicate with each other. The PPSO benefits from the following two aspects. First, the PPSO divides the search space so that each swarm can focus on optimizing a smaller-scale problem. This reduces the problem complexity and makes the algorithm promising for large-scale problems. Second, the AAMS adapts the migration to the search environment and results in timely and efficient communication. Experiments based on benchmark functions have demonstrated the good performance of the PPSO with AAMS in both solution accuracy and convergence speed when compared with the traditional serial PSO (SPSO) and the PPSO with a fixed migration frequency.
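The two-swarm decomposition described above can be sketched roughly as follows. This is not the authors' implementation: the abstract does not specify how AAMS decides when to migrate, so the sketch substitutes a fixed migration interval (the baseline the paper compares against), runs the two swarms one after the other instead of in parallel, and uses conventional PSO coefficients. All names (`SubSwarm`, `cooperative_pso`, `migrate_every`) are hypothetical.

```python
import random


def sphere(x):
    """Benchmark objective: sum of squares, minimum 0 at the origin."""
    return sum(v * v for v in x)


class SubSwarm:
    """A plain PSO swarm that searches only one sub-vector of the solution."""

    def __init__(self, dim, size, rng, lo=-5.0, hi=5.0):
        self.rng = rng
        self.pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(size)]
        self.vel = [[0.0] * dim for _ in range(size)]
        self.pbest = [p[:] for p in self.pos]
        self.pbest_fit = [float("inf")] * size
        self.gbest, self.gbest_fit = self.pos[0][:], float("inf")

    def step(self, evaluate, w=0.7, c1=1.5, c2=1.5):
        # Update personal and swarm bests, then move every particle.
        for i, p in enumerate(self.pos):
            fit = evaluate(p)
            if fit < self.pbest_fit[i]:
                self.pbest_fit[i], self.pbest[i] = fit, p[:]
            if fit < self.gbest_fit:
                self.gbest_fit, self.gbest = fit, p[:]
        for i, p in enumerate(self.pos):
            for d in range(len(p)):
                r1, r2 = self.rng.random(), self.rng.random()
                self.vel[i][d] = (w * self.vel[i][d]
                                  + c1 * r1 * (self.pbest[i][d] - p[d])
                                  + c2 * r2 * (self.gbest[d] - p[d]))
                p[d] += self.vel[i][d]

    def refresh(self, evaluate):
        # Re-score remembered bests after the other half of the vector changed.
        self.pbest_fit = [evaluate(p) for p in self.pbest]
        i = min(range(len(self.pbest)), key=self.pbest_fit.__getitem__)
        self.gbest, self.gbest_fit = self.pbest[i][:], self.pbest_fit[i]


def cooperative_pso(f, dim=4, swarm_size=10, iters=60, migrate_every=5, seed=0):
    rng = random.Random(seed)
    half = dim // 2
    left = SubSwarm(half, swarm_size, rng)
    right = SubSwarm(dim - half, swarm_size, rng)
    # Each swarm evaluates its sub-vector next to the other swarm's last
    # migrated best, which serves as the "context" for the full solution.
    ctx_left, ctx_right = left.gbest[:], right.gbest[:]
    for it in range(iters):
        left.step(lambda sub: f(sub + ctx_right))
        right.step(lambda sub: f(ctx_left + sub))
        if (it + 1) % migrate_every == 0:  # fixed interval, NOT the paper's AAMS
            ctx_left, ctx_right = left.gbest[:], right.gbest[:]
            left.refresh(lambda sub: f(sub + ctx_right))
            right.refresh(lambda sub: f(ctx_left + sub))
    best = left.gbest + right.gbest
    return best, f(best)
```

Calling `cooperative_pso(sphere)` returns the concatenated best sub-vectors and their fitness. An adaptive strategy in the spirit of AAMS would replace the fixed `migrate_every` test with a condition based on search progress, but the abstract gives no details of that condition.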
ISBN: 9781605587011 (print)
By leveraging modern networking hardware (RDMA-enabled network cards), we can shift priorities in distributed database processing significantly. Complex and sophisticated mechanisms to avoid network traffic can be replaced by a scheme that takes advantage of the bandwidth and low latency offered by such interconnects. We illustrate this phenomenon with cyclo-join, an efficient join algorithm based on continuously pumping data through a ring-structured network. Our approach is capable of exploiting the resources of all CPUs and distributed main-memory available in the network for processing queries of arbitrary shape and datasets of arbitrary size. Copyright 2009 ACM.
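To make the ring idea concrete, the following single-process simulation sketches an equi-join in which one relation stays partitioned across the nodes while fragments of the other rotate around the ring. It is only an illustration under stated assumptions: the abstract does not say which relation rotates, how fragments are joined locally, or how RDMA transfers are scheduled, so the hash-based local join and the name `cyclo_join` as a Python function are choices made here, and list rotation stands in for the network.

```python
from collections import defaultdict


def cyclo_join(r_fragments, s_fragments):
    """Equi-join R and S on their first tuple field using a simulated ring.

    r_fragments[i] and s_fragments[i] are the lists of (key, payload)
    tuples initially held by node i. S stays put; R fragments rotate.
    """
    n = len(r_fragments)
    if n != len(s_fragments):
        raise ValueError("each node needs one R fragment and one S fragment")

    # Every node builds a hash table over its stationary S fragment once.
    tables = []
    for fragment in s_fragments:
        table = defaultdict(list)
        for key, payload in fragment:
            table[key].append(payload)
        tables.append(table)

    results = []
    rotating = list(r_fragments)
    for _ in range(n):
        # Each node probes its local table with the R fragment it holds now.
        for node in range(n):
            for key, r_payload in rotating[node]:
                for s_payload in tables[node].get(key, ()):
                    results.append((key, r_payload, s_payload))
        # Pass every fragment on to the next node in the ring.
        rotating = [rotating[(node - 1) % n] for node in range(n)]
    return results
```

After `n` rounds every R fragment has visited every node exactly once, so each matching pair is produced exactly once without any node ever holding the full input. In a real deployment the inner loop over nodes would run concurrently on separate hosts while the next fragment arrives over the interconnect.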
ISBN: 9783642038686 (print)
We build wavelet-based adaptive numerical methods for the simulation of advection-dominated flows that develop multiple spatial scales, with an emphasis on fluid mechanics problems. Wavelet-based adaptivity is inherently sequential, and in this work we demonstrate that these numerical methods can be implemented in software that is capable of harnessing the capabilities of multi-core architectures while maintaining their computational efficiency. Recent designs in frameworks for multi-core software development allow us to rethink parallelism as task-based, where parallel tasks are specified and automatically mapped onto physical threads. This way of exposing parallelism enables the parallelization of algorithms that were considered inherently sequential, such as wavelet-based adaptive simulations. In this paper we present a framework that combines wavelet-based adaptivity with task-based parallelism. We demonstrate good scaling performance obtained by simulating diverse physical systems on different multi-core and SMP architectures using up to 16 cores.
ISBN: 9781424436927 (print)
This paper presents a high-efficiency carrier tracking loop for GPS receivers operating under high-dynamic conditions. Based on the FFT parallel capture method, fast frequency discrimination and its realization scheme are studied. We propose a non-linear carrier NCO unit that tracks precisely by selecting the order of the interpolating filter. Carrier tracking is performed by the interpolating filter, which is derived from the frequency discriminator. As usual, carrier correction can be performed by phase rotation, and our research shows that a third-order interpolation filter should give high tracking accuracy. We also put forward a signal processing architecture that combines a DSP and an FPGA, and we study baseband parallel processing technology in GPS receivers in depth. The simulation demonstrates that the digital frequency tracking loop achieves high tracking precision and acquisition performance and enables fast frequency-domain search under high-dynamic conditions.
Multimedia and some scientific applications have achieved good performance on the stream processor architecture by employing the stream programming model. In order to find out the way to accelerate the symmetric crypt...