ISBN: 9783642281440; 9783642281457 (print)
Erasure codes can improve the availability of distributed storage in comparison with replication systems. In this paper, we investigate how to systematically map the Reed-Solomon and Cauchy Reed-Solomon erasure codes onto the Cell/B.E. and GPU multicore architectures. A method for the systematic mapping of computation kernels of encoding/decoding algorithms onto the Cell/B.E. architecture is proposed. This method takes into account properties of the architecture on all three levels of its parallel processing hierarchy. The performance results are shown to be very promising. The possibility of using GPUs is studied as well, based on the Cauchy version of Reed-Solomon codes.
ISBN: 9781479970612 (print)
The Discrete Periodic Radon Transform (DPRT) has many important applications in reconstructing images from their projections and has recently been used in fast and scalable architectures for computing 2D convolutions. Unfortunately, the direct computation of the DPRT involves O(N^3) additions and memory accesses, which can be very costly on single-core architectures. The current paper presents new and efficient algorithms for computing the DPRT and its inverse on multi-core CPUs and GPUs. The results are compared against specialized hardware implementations (FPGAs/ASICs) and provide significant evidence of the success of the new algorithms. On an 8-core CPU (Intel Xeon), with support for two threads per core, FastDirDPRT and FastDirInvDPRT achieve a speedup of approximately 10x (up to 12.83x) over the single-core CPU implementation. On a 2048-core GPU (GTX 980), FastRayDPRT and FastRayInvDPRT achieve speedups in the range of 526 (for 127 x 127) to 873 (for 1021 x 1021), which approach the ideal achievable speedups. The DPRT can be computed exactly and in real time (30 frames per second) for 1471 x 1471 images using FastRayDPRT on the GPU. Furthermore, the GPU algorithms approximate the performance of an efficient FPGA implementation using 2N parallel cores at 100 MHz.
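To make the O(N^3) cost of the direct computation concrete, the following is a minimal Python sketch of a commonly used DPRT definition for an N x N image with N prime. It is not the paper's FastDirDPRT or FastRayDPRT algorithm; the function name `dprt` and the choice of which image axis the rays step along are assumptions made for illustration.

```python
def dprt(f):
    """Direct DPRT of an N x N image (N prime), given as a list of rows.

    Assumed convention: projection m, offset d sums f[i][(d + m*i) mod N]
    over i; the extra projection m = N sums each row of the image.
    """
    n = len(f)
    # Projections m = 0..N-1: N directions x N offsets x N terms each,
    # which is where the O(N^3) additions and memory accesses come from.
    r = [[sum(f[i][(d + m * i) % n] for i in range(n)) for d in range(n)]
         for m in range(n)]
    # Projection m = N: plain row sums.
    r.append([sum(f[d][j] for j in range(n)) for d in range(n)])
    return r
```

Each of the N + 1 projections visits every pixel exactly once, so every projection sums to the same total as the image itself, which is a convenient sanity check for any faster implementation.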
ISBN: 9783642131189 (print)
With the rapid development of Bluetooth networks and wireless communications, mobile devices share information with each other more easily than ever before. However, this convenient communication technology brings privacy and security issues. Currently, Bluetooth adopts peer-to-peer and Frequency Hopping Spread Spectrum (FHSS) mechanisms to avoid data disclosure, but a malicious attacker can collect the transmission data of a relay station over a long period of time and then break into the system. In this study, we treat a Piconet as a cube and transform a Scatternet into a cluster (N-cube) structure. Subsequently, this study exploits the Elliptic Curve Diffie-Hellman (ECDH) [1] and Conference Key (CK) schemes to perform session key agreement and secure data transmission. The proposed scheme needs only a short 160-bit key to achieve a security level comparable to that of 1024-bit Diffie-Hellman (DH) [2], and each node uses little CPU, memory, and bandwidth to complete security operations. As a result, the proposed fault-tolerant routing algorithm with secure data transmission performs rapidly and efficiently, and is well suited to Bluetooth networks with limited resources.
ISBN: 0769515126 (print)
This paper presents a two-level parallel evolutionary algorithm for solving function optimization problems with multiple solutions. The algorithm combines the characteristics of global search and local search: the former draws individuals closer to each optimal solution while preserving the genetic diversity of the individuals. Then different individuals are selected for local evolution in their appropriate neighborhoods. This simple and easy-to-handle algorithm turns out to be very practical according to the numerical experiments, which indicate that all optimal solutions can be found in a single run of the algorithm within a fairly short period of time.
ISBN: 9783642131356 (print)
Enumerating chemical compounds with given path frequencies is a fundamental procedure in Chemo- and Bio-informatics. The applications include structure determination, novel molecular development, etc. The problem has been proven to be NP-hard. Many methods have been proposed to solve this problem. However, most of them are heuristic algorithms. Fujiwara et al. propose a sequential branch-and-bound algorithm. Although it reaches all solutions and avoids exhaustive searching, the computation time still increases significantly when the number of atoms increases. Hence, in this paper, a parallel algorithm is presented for solving this problem. The experimental results showed that computation time was reduced even when more processes were launched. Moreover, the speed-up ratio for most of the test cases was satisfactory and, furthermore, it showed potential for use in drug design.
A novel reconfigurable architecture based on a Multi-Ring Multiprocessor Network is described. The reconfigurable architecture is shown to combine low network diameter with a low degree of connectivity for each node i...
ISBN: 9781467308595 (print)
This paper describes the design of a unified support vector machine (SVM) circuit for pedestrian and car detection. By unifying the algorithms and architectures of linear and nonlinear SVM classification, the proposed circuit can support both linear and nonlinear classification very efficiently in terms of circuit size and performance. The circuit size is minimized by sharing most of the resources required in the computation for both classification types. A parallel, pipelined architecture is adopted to accelerate processing and handle the large number of operations needed for real-time processing. 48x96 and 64x64 sliding windows with 6 window strides are used to detect pedestrians and cars, respectively. The circuit, synthesized using a 65nm standard cell library, consists of 848,349 gates, and its maximum operating frequency is 435MHz. The circuit can process 91.9 640x480 image frames per second, assuming three cameras mounted at the front, right, and left side positions of the vehicle.
ISBN: 0769526365 (print)
We present the first parallel algorithm for building a Hausdorff Voronoi diagram (HVD). Our algorithm is targeted towards cluster computing architectures and computes the Hausdorff Voronoi diagram for non-crossing objects in time O((n log^4 n)/p) for input size n and p processors. In addition, our parallel algorithm also implies a new sequential HVD algorithm that constructs HVDs for non-crossing objects in time O(n log^4 n). This improves on previous sequential results and solves an open problem posed by Papadopoulou and Lee [18].
ISBN: 9783540729044 (print)
This paper presents a parallel architecture that can simultaneously perform block-matching motion estimation (ME) and the discrete cosine transform (DCT). Because DCT and ME are both processed block by block, it is preferable to put them in one module for resource sharing. Simulation results obtained using Simulink demonstrate that the parallel architecture improves running time by 18.6% compared to the conventional sequential architecture.
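As a rough illustration of what block-by-block motion estimation involves, the sketch below performs a full-search block match in Python. The abstract does not state the matching criterion or search strategy, so the sum of absolute differences (SAD) cost, the exhaustive search window, and the names `sad` and `best_motion_vector` are all assumptions for illustration, not the paper's architecture.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy). SAD is an assumed cost."""
    return sum(abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
               for y in range(bs) for x in range(bs))


def best_motion_vector(cur, ref, bx, by, bs, search):
    """Exhaustively test every displacement within +/- search pixels and
    return (cost, dx, dy) for the lowest-cost candidate inside the frame."""
    h, w = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates whose block would fall outside the reference frame.
            if by + dy < 0 or bx + dx < 0 or by + dy + bs > h or bx + dx + bs > w:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, bs)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

Both this search and a block DCT read the same bs x bs block of the current frame, which is the kind of shared block access that makes combining the two stages in one module attractive.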
ISBN: 9781467344265; 9781467344258 (print)
In recent years, streaming accelerators like GPUs have emerged as an effective step towards parallel computing. The wish list for these devices spans from support for thousands of small cores to a nature very close to general-purpose computing. This makes the design space very vast for future accelerators containing thousands of parallel streaming cores, and it complicates choosing the right architectural configuration for next-generation devices. However, accurate design space exploration tools developed for massively parallel architectures can ease this task. The main objectives of this work are twofold. (i) We present a complete environment of a trace-driven simulator named SArcs (Streaming Architectural Simulator) for streaming accelerators. (ii) We use our simulation tool-chain for design space explorations of GPU-like streaming architectures. Our design space explorations for different architectural aspects of a GPU-like device are made with reference to a baseline established for NVIDIA's Fermi architecture (GPU Tesla C2050). The explored aspects include the performance effects of variations in the configurations of Streaming Multiprocessors, Global Memory Bandwidth, Channels between SMs down to the Memory Hierarchy, and the Cache Hierarchy. The explorations are performed using application kernels from Vector Reduction, 2D-Convolution, Matrix-Matrix Multiplication, and 3D-Stencil. Results show that the configurations of the computational resources for the current Fermi GPU device can deliver higher performance with further improvement in the global memory bandwidth for the same device.