This paper develops a new approach to compiling C programs for multiple-address-space, multiprocessor DSPs. It integrates a novel data transformation technique, which exposes the processor location of partitioned data, into a parallelization strategy. Combined with a new address resolution mechanism, this yields efficient programs that run on multiple address spaces without using message passing. The approach is applied to the UTDSP benchmark suite and evaluated on a four-processor TigerSHARC board, where it outperforms existing approaches and gives an average speedup of 3.25 on the parallel benchmarks.
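The idea of exposing which processor owns each element can be illustrated with a simple index transformation. The sketch below assumes a plain block partition of a one-dimensional array over p processors, with the array length divisible by p; the names `locate`, `partition` and `partial_sum` are hypothetical, and the paper's actual transformation and address resolution mechanism are not reproduced here.

```python
from typing import List, Sequence, Tuple


def locate(i: int, n: int, p: int) -> Tuple[int, int]:
    """Map global index i of an n-element array, block-partitioned over p
    processors, to (processor, local offset)."""
    if n % p:
        raise ValueError("this sketch assumes n is divisible by p")
    block = n // p
    return divmod(i, block)


def partition(a: Sequence[float], p: int) -> List[List[float]]:
    """Rewrite a 1-D array a[n] as a 2-D array b[p][n // p]. The first index of b
    is the owning processor, so b[q][j] holds a[q * (n // p) + j]."""
    if len(a) % p:
        raise ValueError("this sketch assumes len(a) is divisible by p")
    block = len(a) // p
    return [list(a[q * block:(q + 1) * block]) for q in range(p)]


def partial_sum(b: List[List[float]], me: int) -> float:
    """Work done by processor `me`: after the transformation the outer loop
    index is the processor id, so each processor touches only its own row."""
    return sum(b[me])


a = [float(x) for x in range(16)]
b = partition(a, 4)
total = sum(partial_sum(b, q) for q in range(4))  # combine the per-processor results
```

Because the processor number appears explicitly as the first array index, no division or modulo is needed inside the per-processor loop, which is the property the transformation is meant to expose.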
A novel GF(p) crypto processor core architecture is presented in this paper. The core is used to implement a GF(p) elliptic curve cryptosystem (ECC). The architecture allows either a single core to implement ECC or a two-core solution to be adopted. As a result, it can exploit the parallelism that exists in elliptic curve point addition and doubling. The architecture offers several advantages over conventional implementations with regard to speed and power consumption.
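To make the parallelism in point addition and doubling concrete, here is a plain software sketch of affine GF(p) point arithmetic. It is not the paper's hardware datapath. The curve y^2 = x^3 + 2x + 2 over GF(17) with base point (5, 1) is a small textbook example used only for testing, and the comments merely mark which operations are mutually independent and could therefore be mapped to two cores. `pow(x, -1, p)` requires Python 3.8 or later.

```python
from typing import Optional, Tuple

Point = Optional[Tuple[int, int]]  # None stands for the point at infinity


def ec_add(P: Point, Q: Point, a: int, p: int) -> Point:
    """Add two points on y^2 = x^3 + a*x + b over GF(p) in affine coordinates."""
    if P is None:
        return Q
    if Q is None:
        return P
    x1, y1 = P
    x2, y2 = Q
    if x1 == x2 and (y1 + y2) % p == 0:
        return None  # P + (-P), including doubling a point with y = 0
    if P == Q:
        # Doubling: the numerator 3x^2 + a and the denominator 2y are independent.
        lam = (3 * x1 * x1 + a) * pow((2 * y1) % p, -1, p) % p
    else:
        # Addition: the differences y2 - y1 and x2 - x1 are likewise independent.
        lam = (y2 - y1) * pow((x2 - x1) % p, -1, p) % p
    x3 = (lam * lam - x1 - x2) % p
    y3 = (lam * (x1 - x3) - y1) % p
    return (x3, y3)


def ec_mul(k: int, P: Point, a: int, p: int) -> Point:
    """Right-to-left double-and-add scalar multiplication k*P."""
    R: Point = None
    while k:
        if k & 1:
            R = ec_add(R, P, a, p)  # this point addition uses the current P ...
        P = ec_add(P, P, a, p)      # ... and so does this doubling: the two are independent
        k >>= 1
    return R


G = (5, 1)  # base point of the example curve (a = 2, p = 17)
```

In the double-and-add loop, the addition R + P and the doubling 2P both read the same P, so a two-core design can run them side by side in each iteration.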
3D graphics performance is increasing faster than that of any other computing application. Almost all PC systems now include 3D graphics accelerators for games, computer-aided design (CAD), or visualization applications. This paper investigates the suitability of Field Programmable Gate Array (FPGA) devices as a low-cost solution for implementing 3D affine transformations. A proposed solution based on large matrix multiplication has been implemented for large 3D models on the Celoxica RC1000-PP development board using Handel-C, a C-like language that supports parallelism, flexible data sizes, and compilation of high-level programs directly into FPGA hardware.
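The link between affine transformations and large matrix multiplication can be shown in software. In the sketch below, all vertices of a model are packed as columns of a 4xN homogeneous matrix, so one 4x4 by 4xN product transforms the whole model. The helper names are hypothetical and nothing here models the Handel-C hardware; it only shows the arithmetic being accelerated.

```python
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(A: Matrix, B: Matrix) -> Matrix:
    """Plain matrix product; every output entry is independent of the others."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    return [[1, 0, 0, tx], [0, 1, 0, ty], [0, 0, 1, tz], [0, 0, 0, 1]]


def scaling(sx: float, sy: float, sz: float) -> Matrix:
    return [[sx, 0, 0, 0], [0, sy, 0, 0], [0, 0, sz, 0], [0, 0, 0, 1]]


def rotation_z(theta: float) -> Matrix:
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0, 0], [s, c, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]


def transform(M: Matrix, vertices: Sequence[Tuple[float, float, float]]):
    """Apply the 4x4 affine matrix M to every vertex with a single matrix
    multiplication, using homogeneous coordinates (w = 1)."""
    V = [[v[0] for v in vertices], [v[1] for v in vertices],
         [v[2] for v in vertices], [1.0] * len(vertices)]
    R = matmul(M, V)
    return [(R[0][j], R[1][j], R[2][j]) for j in range(len(vertices))]
```

Composite transformations are formed by multiplying the 4x4 matrices first: `matmul(translation(10, 0, 0), scaling(2, 2, 2))` scales a vertex and then translates it. Since every output column depends only on one input column, the columns can be processed in parallel, which is what makes the problem attractive for FPGA hardware.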
On-chip inductive coupling has been shown to depend on the distance over which wires run in parallel. It has also been shown to depend on the distance separating an attacker and the victim. This has a major effect on global signal buses in high-performance microprocessors, as they are usually routed as a bundle containing a large number of signals that travel long distances. A solution has been proposed through a process known as swizzling, in which the order of the signal wires in the bus is repeatedly rearranged along the route to move attackers and victims away from each other. This technique has the advantage of reducing the mutual inductance between neighboring wires at zero area or routing-resource cost. In this paper, we present a formulation of the swizzling technique and report simulation results that highlight the resulting improvements.
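A crude way to see why swizzling helps is to count, for every pair of wires, in how many route segments the two wires sit next to each other. The sketch below uses a hypothetical swizzle that applies a perfect-shuffle permutation (even positions first, then odd positions) at each segment boundary. It is not the paper's formulation, and adjacency count is only a rough proxy for mutual inductance, which also depends on non-adjacent wires and on segment lengths.

```python
from collections import Counter
from typing import List


def shuffle(order: List[int]) -> List[int]:
    """Hypothetical swizzle: wires at even positions first, then odd positions."""
    return order[0::2] + order[1::2]


def swizzle_schedule(n_wires: int, n_segments: int, swizzle: bool = True) -> List[List[int]]:
    """Wire order (left to right) in each route segment."""
    order = list(range(n_wires))
    schedule = []
    for _ in range(n_segments):
        schedule.append(order)
        if swizzle:
            order = shuffle(order)
    return schedule


def adjacency_counts(schedule: List[List[int]]) -> Counter:
    """Number of segments in which each unordered pair of wires is adjacent."""
    counts: Counter = Counter()
    for order in schedule:
        for a, b in zip(order, order[1:]):
            counts[frozenset((a, b))] += 1
    return counts


plain = adjacency_counts(swizzle_schedule(8, 3, swizzle=False))
swizzled = adjacency_counts(swizzle_schedule(8, 3, swizzle=True))
```

For an 8-bit bus split into three segments, an unswizzled neighbor pair stays adjacent along the whole route, whereas with this shuffle no pair is adjacent in more than one segment.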
ISBN: 0769515126 (print)
The proceedings contain 78 papers. The topics discussed include: efficient weighted multiselection in parallel architectures; local block factorization and its parallelization to block tridiagonal matrices; parity declustering data layout for tolerating dependent disk failures in network RAID systems; an analysis of update ordering in a cluster of replicated servers; performance of dynamic load balancing algorithm on cluster of workstations and PCs; universal parallel numerical computing for 3D convection-diffusion equation with variable coefficients; efficient loop partitioning for parallel codes of irregular scientific computations; an evolutionary algorithm of contracting search space based on partial ordering relation for constrained optimization problems; a new divide and conquer algorithm for real symmetric band generalized eigenvalue problem; a framework of using cooperating mobile agents to achieve load sharing in distributed web server groups; and design and analysis of finite difference domain decomposition algorithms for the two-dimensional heat equation.
This paper presents a single-FPGA implementation of a real-time sound localization system using two microphones. The implementation, which uses a cross-correlation technique based on a modified version of the phase transform, successfully localizes sound sources in noisy environments with an SNR as low as 10 dB. Using the same algorithm and a similar hardware architecture, it is shown that up to five parallel real-time systems (using 10 microphones) can be implemented on a single FPGA while using only an estimated 77-108 mW per microphone.
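The core of the localization step is estimating the time delay between the two microphone signals. The sketch below uses the standard generalized cross-correlation with phase transform weighting (GCC-PHAT), not the paper's modified version, and a direct O(N^2) DFT so that it needs only the standard library; the function names and the synthetic test signal are assumptions for illustration.

```python
import cmath
import random
from typing import List, Sequence


def dft(x: Sequence[complex], inverse: bool = False) -> List[complex]:
    """Direct O(N^2) discrete Fourier transform (an FFT would be used in practice)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n) for t in range(n))
        out.append(s / n if inverse else s)
    return out


def gcc_phat_delay(x1: Sequence[float], x2: Sequence[float], eps: float = 1e-12) -> int:
    """Estimate by how many samples x2 lags x1 using GCC with standard PHAT weighting."""
    X1, X2 = dft(x1), dft(x2)
    cross = [b * a.conjugate() for a, b in zip(X1, X2)]  # cross-power spectrum
    weighted = [c / (abs(c) + eps) for c in cross]        # PHAT: keep phase, drop magnitude
    r = dft(weighted, inverse=True)                        # generalized cross-correlation
    n = len(r)
    peak = max(range(n), key=lambda i: r[i].real)
    return peak - n if peak > n // 2 else peak             # map circular lag to a signed lag


rng = random.Random(0)
mic1 = [rng.gauss(0.0, 1.0) for _ in range(64)]
mic2 = [mic1[(t - 5) % 64] for t in range(64)]  # mic2 hears the same signal 5 samples later
delay = gcc_phat_delay(mic1, mic2)
```

Under a far-field assumption, a delay of tau seconds between microphones spaced d apart corresponds to an arrival angle of arcsin(c * tau / d), where c is the speed of sound. The PHAT weighting discards magnitude information so that the correlation peak stays sharp in reverberant or noisy conditions.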
Author: Huang, H. (Supercomputing Center, Computer Network Information Center, Chinese Academy of Sciences, Beijing 100080, People's Republic of China)
ISBN: 0769515126 (print)
In this paper, we use a new language, TPL (Tensor Product Language), to compute the Fast Fourier Transform (FFT). TPL can provide good performance and portability. We detail the method and its application to the FFT in TPL, and extend it to the Sande-Tukey FFT algorithm.
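In tensor-product notation, one radix-2 Sande-Tukey (decimation-in-frequency) step consists of a butterfly stage F_2 ⊗ I_{N/2}, a diagonal twiddle-factor scaling, two independent half-size transforms I_2 ⊗ F_{N/2}, and a stride permutation that interleaves the outputs. The Python sketch below follows those four stages recursively. It illustrates the factorization only; it is not TPL code, and it assumes the input length is a power of two.

```python
import cmath
from typing import List, Sequence


def dft(x: Sequence[complex]) -> List[complex]:
    """Reference O(N^2) DFT used to check the FFT."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def fft_dif(x: Sequence[complex]) -> List[complex]:
    """Recursive radix-2 decimation-in-frequency (Sande-Tukey) FFT."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    m = n // 2
    # Butterfly stage (F_2 ⊗ I_m): sums to the upper half, differences to the lower half.
    top = [x[i] + x[i + m] for i in range(m)]
    # Twiddle factors W_n^i = exp(-2*pi*j*i/n) scale the lower half.
    bottom = [(x[i] - x[i + m]) * cmath.exp(-2j * cmath.pi * i / n) for i in range(m)]
    # Two independent half-size transforms (I_2 ⊗ F_m) ...
    even, odd = fft_dif(top), fft_dif(bottom)
    # ... followed by a stride permutation that interleaves X[2k] and X[2k+1].
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out
```

The Cooley-Tukey (decimation-in-time) variant applies the same building blocks in the opposite order, with the permutation on the input and the butterflies last.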
ISBN: 0769515126 (print)
For a Markov cipher, the transition probability matrix is doubly stochastic. The eigenvalue of this matrix with the largest magnitude less than one plays an important role in designing Markov ciphers. This paper provides a parallel algorithm for computing this eigenvalue for the 65535x65535 doubly stochastic matrix A that arises from a shrunken model of a Markov cipher with 16-bit plaintext and ciphertext, and also analyzes the complexity of the parallel algorithm.
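One way to reach this eigenvalue is power iteration after deflating the trivial eigenvalue 1. The serial Python sketch below illustrates that idea only; it is not the paper's parallel algorithm, and it assumes a symmetric doubly stochastic matrix (for a general one with complex subdominant eigenvalues the norm ratio may oscillate). The matrix-vector product is the step whose rows would be split across processors. For scale, a dense 65535x65535 matrix in double precision would occupy about 34 GB (65535² × 8 bytes).

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]


def matvec(A: Matrix, v: Sequence[float]) -> List[float]:
    # Each output row is independent: this is the step to distribute across processors.
    return [sum(a * x for a, x in zip(row, v)) for row in A]


def subdominant_magnitude(A: Matrix, iters: int = 500, tol: float = 1e-12) -> float:
    """Power iteration restricted to sum-zero vectors.

    For a doubly stochastic A the all-ones vector is both a right and a left
    eigenvector for eigenvalue 1, so the subspace {v : sum(v) = 0} is invariant
    under A. Iterating inside it estimates the largest magnitude among the
    remaining eigenvalues (assumes A is symmetric).
    """
    n = len(A)
    v = [(1.0 if i == 0 else 0.0) - 1.0 / n for i in range(n)]  # e_0 projected to sum zero
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    est = 0.0
    for _ in range(iters):
        w = matvec(A, v)
        mean = sum(w) / n
        w = [x - mean for x in w]  # re-project to cancel rounding drift toward the ones vector
        norm = math.sqrt(sum(x * x for x in w))  # ||A v|| with ||v|| = 1
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        if abs(norm - est) < tol:
            return norm
        est = norm
    return est
```

For example, the lazy random walk on a 4-cycle, with 0.5 on the diagonal and 0.25 to each neighbor, has eigenvalues 1, 0.5, 0 and 0.5, so the sketch should return 0.5.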
ISBN: 0769515126 (print)
This paper presents investigations of the parallel computation of non-ideal three-dimensional detonation wave propagation on a high-performance computer based on the CC-NUMA architecture. After analyzing and testing the previous serial program, the computation of curvature and of the first-order and second-order differences were identified as the main targets of parallelization. Several techniques were applied to convert the serial program into a parallel program, such as a divide-and-conquer strategy and load balancing. Numerical simulation with the parallel program shows a large increase in the computing speed of non-ideal three-dimensional detonation wave propagation.
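The divide-and-conquer and load-balancing steps can be illustrated on a single difference operator. The sketch below is a one-dimensional stand-in (the paper works with three-dimensional fields, and with curvature and first-order differences as well): it splits the interior grid points into contiguous chunks whose sizes differ by at most one and computes a central second-order difference on each chunk independently. The function names are hypothetical, and threads are used only to show the independent units of work; in CPython they will not speed up this pure-Python arithmetic.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple


def balanced_chunks(start: int, stop: int, p: int) -> List[Tuple[int, int]]:
    """Split [start, stop) into p contiguous chunks whose sizes differ by at most one."""
    base, extra = divmod(stop - start, p)
    chunks, lo = [], start
    for k in range(p):
        hi = lo + base + (1 if k < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def second_difference(u: Sequence[float], h: float, lo: int, hi: int) -> List[float]:
    """Central difference (u[i-1] - 2u[i] + u[i+1]) / h^2 for i in [lo, hi).
    Each chunk reads one neighbor (halo) value beyond each of its ends."""
    return [(u[i - 1] - 2.0 * u[i] + u[i + 1]) / (h * h) for i in range(lo, hi)]


def parallel_second_difference(u: Sequence[float], h: float, p: int) -> List[float]:
    chunks = balanced_chunks(1, len(u) - 1, p)  # interior points only
    with ThreadPoolExecutor(max_workers=p) as pool:
        parts = pool.map(lambda c: second_difference(u, h, *c), chunks)
        return [d for part in parts for d in part]  # reassemble in chunk order
```

Because each chunk performs exactly the same floating-point operations as the serial loop, the reassembled result is identical to the serial one; only the assignment of work to processors changes.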
ISBN: 0769515126 (print)
An algorithm is proposed that solves cooperative concurrent computing tasks by using the idle cycles of a number of high-performance heterogeneous workstations interconnected by a high-speed network. To obtain better parallel computation performance, this paper gives a model and an algorithm for task scheduling among heterogeneous workstations, in which the costs of loading data, computing, communication, and collecting results are considered. Using this algorithm, an optimal subset of heterogeneous workstations with the shortest parallel execution time for the tasks can be selected.
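A toy version of subset selection under a cost model that includes loading, communication and result collection is sketched below. The cost model (per-workstation overheads paid one after another on the shared network, plus work divided in proportion to speed) and the exhaustive search are assumptions made for illustration; the paper's model and selection algorithm may differ.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Optional, Sequence, Tuple


@dataclass(frozen=True)
class Workstation:
    name: str
    speed: float     # work units per second (hypothetical parameter)
    overhead: float  # seconds to load data, communicate and collect results (hypothetical)


def parallel_time(subset: Sequence[Workstation], work: float) -> float:
    """Hypothetical cost model: overheads are serialized on the shared network, and the
    work is split in proportion to speed so that all selected machines finish together."""
    return sum(w.overhead for w in subset) + work / sum(w.speed for w in subset)


def best_subset(stations: Sequence[Workstation],
                work: float) -> Tuple[Optional[Tuple[Workstation, ...]], float]:
    """Exhaustively try every non-empty subset (practical only for a few machines)."""
    best, best_t = None, float("inf")
    for k in range(1, len(stations) + 1):
        for subset in combinations(stations, k):
            t = parallel_time(subset, work)
            if t < best_t:
                best, best_t = subset, t
    return best, best_t


stations = [Workstation("fast", 10.0, 1.0),
            Workstation("medium", 5.0, 1.0),
            Workstation("slow", 1.0, 5.0)]
chosen, t = best_subset(stations, 100.0)
```

Here adding the slow machine would shorten the computing term slightly, but its overhead outweighs the gain, so the best subset is the fast and medium workstations. This is the kind of trade-off that selecting a subset, rather than always using every idle machine, is meant to capture.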