In this paper, an FPGA implementation of a novel and highly scalable hardware architecture for fast inversion of triangular matrices is presented. Manipulation of large matrices is an integral part of modern signal processing and communications applications; therefore, scalable and flexible hardware architectures are increasingly sought after. The traditional triangular array architecture with n(n+1)/2 communicating processors, where n is the number of inputs, is mapped to a linear structure with only n processors. The linear and triangular architectures are compared in terms of area consumption, latency, and maximum clock speed. The paper also shows that the linear array structure avoids drawbacks such as non-scalability, large area, and high power consumption. The implementation is based on a numerically stable recurrence algorithm, which has excellent properties for hardware implementation.
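The abstract does not spell out its recurrence, so the following is only a minimal software sketch of the textbook forward-substitution recurrence for inverting a lower triangular matrix; it is not the paper's hardware algorithm. The function name `invert_lower_triangular` and the use of exact `Fraction` arithmetic are assumptions made for illustration. Each of the n(n+1)/2 non-zero entries of the inverse is produced by one recurrence step, which matches the processor count of the triangular array.

```python
from fractions import Fraction


def invert_lower_triangular(L):
    """Return the inverse X of a lower triangular matrix L (list of rows).

    Hypothetical helper: solves L X = I column by column with the recurrence
        X[j][j] = 1 / L[j][j]
        X[i][j] = -(sum_{k=j}^{i-1} L[i][k] * X[k][j]) / L[i][i]   for i > j
    """
    n = len(L)
    X = [[Fraction(0)] * n for _ in range(n)]
    for j in range(n):
        # Diagonal entry of the inverse is the reciprocal of the diagonal of L.
        X[j][j] = 1 / Fraction(L[j][j])
        for i in range(j + 1, n):
            # Entries below the diagonal depend only on entries already computed
            # higher up in the same column.
            acc = sum(Fraction(L[i][k]) * X[k][j] for k in range(j, i))
            X[i][j] = -acc / Fraction(L[i][i])
    return X
```

Different columns of the inverse are independent of one another in this recurrence, which is one reason such schemes can be spread over an array of processors.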
ISBN: 9781581137422 (print)
This paper presents a software implementation of a very fast parallel Reed-Solomon decoder on the second generation of the MorphoSys reconfigurable computation platform, which targets streamed applications such as multimedia and DSP. Numerous modifications to the first-generation architecture have produced a scalable, computation- and communication-intensive architecture capable of extracting fine-grained parallelism at the instruction level. Many algorithms, as well as a complete digital video broadcasting base-band receiver, have been mapped onto the second-generation architecture with impressive performance. The mapping of the Reed-Solomon decoder proposed in the paper highly parallelizes all of its sub-algorithms, including syndrome computation, the Berlekamp algorithm, Chien search, and error value computation, in a SIMD fashion. The mapping is tested on a cycle-accurate simulator, "Mulate", and the performance compares favorably with other architectures. The decoding speed of the RS (255,239,16) decoder is 1.319 Gbps and 2.534 Gbps for two different methods of GF multiplication, respectively. Furthermore, since no functionality is specifically tailored to the Reed-Solomon decoder, the result demonstrates the capability of the MorphoSys architecture to extract instruction-level parallelism from streamed applications.
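As an illustration of the first decoding stage named above, here is a minimal scalar sketch of syndrome computation for a code with 255 symbols and 16 parity symbols. Several details are assumptions not given in the abstract: the field polynomial 0x11D, generator roots alpha^0 through alpha^15, log/antilog table multiplication, and all function names. It does not reproduce the paper's SIMD mapping or either of its GF multiplication methods.

```python
# GF(2^8) log/antilog tables, assuming the primitive polynomial
# x^8 + x^4 + x^3 + x^2 + 1 (0x11D); the abstract does not name one.
PRIM_POLY = 0x11D
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= PRIM_POLY
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]  # duplicate so LOG[a] + LOG[b] never needs a modulo


def gf_mul(a, b):
    """Multiply two GF(2^8) elements via table lookup."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(received, num_parity=16):
    """Evaluate the received polynomial at alpha^0 .. alpha^(num_parity-1).

    received[0] is taken as the highest-degree coefficient (an assumption).
    Each syndrome is independent of the others, so the outer loop is the
    natural place for data-parallel execution.
    """
    result = []
    for i in range(num_parity):
        root = EXP[i]
        s = 0
        for symbol in received:  # Horner's rule over GF(2^8)
            s = gf_mul(s, root) ^ symbol
        result.append(s)
    return result
```

An error-free codeword gives all-zero syndromes; a single error of value e at degree d gives S_i = e * alpha^(i*d), which is what the later decoding stages exploit.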
ISBN: 0769515126 (print)
The proceedings contain 78 papers. The topics discussed include: efficient weighted multiselection in parallel architectures; local block factorization and its parallelization to block tridiagonal matrices; parity declustering data layout for tolerating dependent disk failures in network RAID systems; an analysis of update ordering in a cluster of replicated servers; performance of dynamic load balancing algorithm on cluster of workstations and PCs; universal parallel numerical computing for 3D convection-diffusion equation with variable coefficients; efficient loop partitioning for parallel codes of irregular scientific computations; an evolutionary algorithm of contracting search space based on partial ordering relation for constrained optimization problems; a new divide and conquer algorithm for real symmetric band generalized eigenvalue problem; a framework of using cooperating mobile agents to achieve load sharing in distributed web server groups; and design and analysis of finite difference domain decomposition algorithms for the two-dimensional heat equation.
The benchmark for the reliability quality of networks depends mainly on the accuracy of the reliability parameters. Analytical solutions for such parameters exist for simple network architectures. In this paper, simulations are used to determine reliability parameters for complex architectures such as the MPLS backbone planned for the next-generation Internet. Besides analyzing single-figure, average-type reliability parameters such as availability, average downtime over one year, and probability of zero downtime, the paper also considers the downtime distribution among a population of identically designed systems. The distribution approach provides more comprehensive information about the behavior of the individual systems. Results obtained from the simulations are compared to the analytical results for a simple parallel architecture.
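The comparison against a simple parallel architecture can be sketched as a small Monte Carlo experiment. Everything specific here is an assumption for illustration: two redundant components per system, exponentially distributed times to failure and repair, and the function names. The system is down only while both components are down, so the analytical unavailability is (MTTR / (MTTF + MTTR))^2.

```python
import random


def down_intervals(rng, mttf, mttr, horizon):
    """Simulate one component; return its (start, end) down intervals."""
    t, intervals = 0.0, []
    while True:
        t += rng.expovariate(1.0 / mttf)  # time until the next failure
        if t >= horizon:
            return intervals
        end = min(t + rng.expovariate(1.0 / mttr), horizon)  # repair completes
        intervals.append((t, end))
        t = end


def overlap(a, b):
    """Total time during which both sorted interval lists are down at once."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def simulate_population(num_systems, mttf, mttr, horizon, seed=0):
    """Downtime of each system in a population of identical parallel systems."""
    rng = random.Random(seed)
    return [
        overlap(down_intervals(rng, mttf, mttr, horizon),
                down_intervals(rng, mttf, mttr, horizon))
        for _ in range(num_systems)
    ]
```

The returned list is the downtime distribution across the population: its mean divided by the horizon estimates unavailability, and the fraction of zeros estimates the probability of zero downtime.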
New characteristics for e-commerce applications, such as highly distributed data and unpredictable system nature, require us to revisit query processing for distributed database systems. As join operations involve rel...
Describes two different approaches to optimizing the performance of SoC architectures in the architecture exploration phase. Both solve the problem of mapping and scheduling a task graph on a target architecture, with special consideration of on-chip communications. A constructive algorithm is presented that extends previous work by taking potential future data transfers into account. The second approach is a recursive procedure based on local search techniques in a specially defined neighborhood of the critical path. Simulated annealing and tabu search are used as search algorithms. Both approaches find solutions with better performance than established methodologies. The recursive technique yields better results than the constructive approach but is limited to small and mid-sized problems, whereas the constructive algorithm has no such limitation.
This paper develops a new approach to compiling C programs for multiple-address-space, multi-processor DSPs. It integrates a novel data transformation technique, which exposes the processor location of partitioned data, into a parallelization strategy. When this is combined with a new address resolution mechanism, it generates efficient programs that run on multiple address spaces without using message passing. This approach is applied to the UTDSP benchmark suite and evaluated on a four-processor TigerSHARC board, where it is shown to outperform existing approaches and give an average speedup of 3.25 on the parallel benchmarks.
ISBN: 0780375963 (print)
Parallel processing is a vital tool for many scientific and industrial applications where real-time constraints apply; in many applications, the use of parallel processing and multiprocessor platforms appears to be the most favourable way to achieve acceptable throughput. Hence, parallel processing algorithms are vital tools for achieving a good trade-off between hardware cost, system efficiency, and power. In this paper, the one-dimensional generalised parallel block filter algorithm based on the overlap-add approach is implemented on a multi-DSP platform. The mathematical concepts of the input stage, the output stage, and the generalised direct filter equation are given. The 1-D parallel algorithm is also shown, and a suitable parallel architecture is presented.
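The overlap-add idea behind the block filter can be sketched in a few lines. This is a sequential illustration only, not the paper's generalised filter equation or its multi-DSP mapping; the function names and the direct-form convolution are assumptions. The input is cut into blocks, each block is filtered on its own, and the tails that spill past each block boundary are added into the output.

```python
def convolve(x, h):
    """Direct linear convolution of two sequences."""
    y = [0.0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for j, hj in enumerate(h):
            y[i + j] += xi * hj
    return y


def overlap_add(x, h, block_len):
    """Filter x with FIR taps h by overlap-add over blocks of block_len samples."""
    y = [0.0] * (len(x) + len(h) - 1)
    for start in range(0, len(x), block_len):
        block = x[start:start + block_len]   # input stage: split into blocks
        partial = convolve(block, h)         # each block is filtered independently
        for k, value in enumerate(partial):  # output stage: add overlapping tails
            y[start + k] += value
    return y
```

Because the per-block convolutions do not depend on one another, they are the part that could be handed to separate processors, with only the final accumulation requiring coordination.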
Author: Huang, H.
Chinese Acad Sci, Supercomp Ctr, Comp Network Informat Ctr, Beijing 100080, Peoples R China
ISBN: 0769515126 (print)
In this paper, we use a new language, TPL (Tensor Product Language), to compute the Fast Fourier Transform. It can provide good performance and portability. We detail the method and its application to the FFT in TPL, and extend it to the Sande-Tukey FFT algorithm.
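For orientation, the Sande-Tukey (decimation-in-frequency) algorithm can be sketched in plain Python rather than TPL, whose syntax is not shown in the abstract. In tensor-product notation, the radix-2 step is commonly written as F_n = L (I_2 ⊗ F_{n/2}) T (F_2 ⊗ I_{n/2}), where T is a diagonal twiddle matrix and L a stride permutation; the code below follows that reading right to left. The function names are assumptions, and the input length is assumed to be a power of two.

```python
import cmath


def fft_dif(x):
    """Radix-2 decimation-in-frequency FFT of a power-of-two-length sequence."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    # Butterflies (F_2 tensor I_{n/2}), with twiddle factors on the lower half.
    top = [x[k] + x[k + half] for k in range(half)]
    bottom = [(x[k] - x[k + half]) * cmath.exp(-2j * cmath.pi * k / n)
              for k in range(half)]
    # Two independent half-size transforms (I_2 tensor F_{n/2}).
    even = fft_dif(top)     # yields X[0], X[2], X[4], ...
    odd = fft_dif(bottom)   # yields X[1], X[3], X[5], ...
    # Stride permutation: interleave the two halves into natural order.
    out = [0j] * n
    out[0::2] = even
    out[1::2] = odd
    return out


def naive_dft(x):
    """O(n^2) reference DFT, included only for checking fft_dif."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]
```

The two recursive calls are independent, which is the structure a tensor-product formulation makes explicit when generating parallel or vectorized code.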
ISBN: 0769515126 (print)
The transition probability matrix of a Markov cipher is doubly stochastic. The eigenvalue of this matrix with the largest magnitude less than one plays an important role in the design of Markov ciphers. This paper provides a parallel algorithm for computing that eigenvalue for a doubly stochastic matrix A of size 65535x65535, which arises from a shrunken model of a Markov cipher with 16-bit plaintext and ciphertext. The complexity of the parallel algorithm is also analyzed.
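The abstract does not describe its algorithm, so the following is only one plausible sequential sketch: power iteration restricted to the subspace orthogonal to the all-ones vector. For a doubly stochastic matrix, the all-ones vector is both a left and a right eigenvector for eigenvalue 1, so removing the mean of the iterate at each step deflates that eigenvalue. The function name, starting vector, and iteration count are assumptions, and the estimate is only reliable when the subdominant eigenvalue is real and strictly dominates the remaining ones in magnitude.

```python
import math


def subdominant_eigenvalue_magnitude(A, iterations=200):
    """Estimate the largest eigenvalue magnitude below 1 of a doubly
    stochastic matrix A (list of rows, size at least 2x2)."""
    n = len(A)
    # Deterministic start vector, made orthogonal to the all-ones vector.
    v = [float(i) for i in range(n)]
    mean = sum(v) / n
    v = [x - mean for x in v]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    estimate = 0.0
    for _ in range(iterations):
        # Matrix-vector product; each row is independent of the others.
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        # Re-project to suppress drift back towards the eigenvalue-1 direction.
        mean = sum(w) / n
        w = [x - mean for x in w]
        estimate = math.sqrt(sum(x * x for x in w))
        if estimate == 0.0:
            return 0.0
        v = [x / estimate for x in w]
    return estimate
```

The matrix-vector product dominates the cost at O(n^2) per iteration, and its rows can be distributed across processors, which is where a parallel version of this kind of iteration would gain its speedup.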