In this paper we present a parallel algorithm that solves the Toeplitz least squares problem. We exploit the displacement structure of Toeplitz matrices and parallelize the generalized Schur method. The stability prob...
ISBN: 0780370007 (print)
To fulfil real-time signal processing tasks in airborne radar, such as clutter rejection, moving target detection (MTD) and constant false alarm rate (CFAR) control, an airborne radar parallel signal processing system (ARPS2) is proposed with DSP chips as its kernel processing nodes. The DSP chips are arranged in a parallel architecture, and each node has its own private input and output memory. The system adopts several parallel techniques, namely parallel storage, parallel processing, parallel code loading and parallel data organization, to achieve high efficiency. It has a simple structure, is highly flexible and is easy to develop for. ARPS2 is going to be applied to an airborne radar, and it can also run high-speed real-time signal processing algorithms in other kinds of radar.
We present a high-speed VLSI fuzzy logic controller that is well suited to real-time applications. The main distinction of our approach is that it can complete the max-min calculation within one clock cycle. The speedup is achieved by an effective format for the membership functions and a careful analysis of the conditions of the max-min calculation. As a result, the latency of a fuzzy inference can be considerably reduced. Based on this idea, a pipelined parallel architecture is proposed to fully utilize the parallelism inherent in fuzzy inference. The VLSI fuzzy logic controller was implemented and simulated using a 0.35 μm cell library as the target technology. Experimental data show that the proposed architecture achieves higher performance than other approaches.
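For readers unfamiliar with the max-min calculation mentioned in this abstract, the following is a minimal software sketch of textbook max-min fuzzy inference. It is not the paper's hardware design or membership-function format; the function name `max_min_inference`, the rule layout and the example labels are all assumptions made for illustration.

```python
def max_min_inference(rules, memberships):
    """Textbook max-min inference (illustrative, not the paper's VLSI design).

    rules: list of (antecedent_labels, consequent_label) pairs.
    memberships: dict mapping each antecedent label to a degree in [0, 1].
    Returns a dict mapping each consequent label to its aggregated strength.
    """
    output = {}
    for antecedents, consequent in rules:
        # "min" step: a rule fires as strongly as its weakest antecedent.
        strength = min(memberships[label] for label in antecedents)
        # "max" step: rules sharing a consequent are combined by maximum.
        output[consequent] = max(output.get(consequent, 0.0), strength)
    return output


# Hypothetical fuzzified inputs and rule base.
memberships = {"temp_high": 0.7, "humidity_low": 0.4, "temp_low": 0.2}
rules = [
    (("temp_high", "humidity_low"), "fan_fast"),
    (("temp_low",), "fan_slow"),
    (("temp_high",), "fan_fast"),
]
result = max_min_inference(rules, memberships)
```

In software the rules are evaluated one after another; the abstract's claim is that a hardware pipeline can perform the equivalent comparison within a single clock cycle.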
A recent latency tolerance technique, read-miss clustering, restructures code to send demand-miss references in parallel to the underlying memory system. An alternative, widely used latency tolerance technique is software prefetching, which initiates data fetches ahead of expected demand-miss references by a certain distance. Since both techniques seem to target the same types of latencies and use the same system resources, it is unclear which technique is superior or whether the two can be combined. This paper shows that the two techniques are actually mutually beneficial, each helping to overcome limitations of the other. We perform our study for uniprocessor and multiprocessor configurations, in simulation and on a real machine (the Convex Exemplar). Compared to prefetching alone (the state of the art implemented in systems today), the combination of the two techniques reduces execution time by an average of 21% across all cases studied in simulation, and by an average of 16% for 5 out of 10 cases on the Exemplar. Relative to clustering alone, the combination reduces execution time by an average of 15% for 6 out of 11 cases in simulation and 20% for 6 out of 10 cases on the Exemplar.
In this paper, we propose a flexible VLSI-based parallel processing architecture for an improved three-step search (ITSS) motion estimation algorithm that is superior to the existing three-step search (TSS) algorithm in all cases, and also to the recently proposed new three-step search (NTSS) algorithm when used for low bit-rate video coding, as with the H.261 standard. Based on a VLSI tree processor and an FPGA addressing circuit, the architecture can successfully implement the ITSS algorithm on silicon with the minimum number of gates. Because of the flexibility of the architecture, it can also be extended to implement other three-step search algorithms.
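As background for this abstract, here is a minimal sketch of the classic three-step search (TSS) that ITSS and NTSS are compared against. It is not the paper's ITSS algorithm or its VLSI architecture. The function names, the 4x4 block size, the sum-of-absolute-differences (SAD) cost and the synthetic frames are assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, mx, my, n):
    """Sum of absolute differences between the n x n block of `cur` at
    (bx, by) and the block of `ref` displaced by the vector (mx, my)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + my + y][bx + mx + x])
    return total


def three_step_search(cur, ref, bx, by, n=4, step=4):
    """Classic TSS: test the 8 neighbours of the current best vector at the
    given step size, move to the cheapest, halve the step, and repeat."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best  # centre stays fixed while this step's points are tested
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would fall outside the frame.
                if not (0 <= bx + mx and bx + mx + n <= width
                        and 0 <= by + my and by + my + n <= height):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2
    return best, best_cost


# Synthetic 16x16 reference frame with a unique value per pixel, and a
# current frame whose content is the reference displaced by (4, 4).
ref = [[x + 16 * y for x in range(16)] for y in range(16)]
cur = [[ref[min(y + 4, 15)][min(x + 4, 15)] for x in range(16)]
       for y in range(16)]
vector, cost = three_step_search(cur, ref, bx=6, by=6)
```

With steps of 4, 2 and 1 the search covers displacements up to ±7 pixels while evaluating at most 25 candidates per block instead of the 225 a full search would need. The price is that TSS is a heuristic and can settle on a local minimum, which is the weakness that refinements such as NTSS and ITSS aim to reduce.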
Over the last decades, genetic algorithms (GA) and genetic programming (GP) have proven to be efficient tools for a wide range of applications. However, in order to solve human-competitive problems they require large a...
Efficient use of data-reuse transformations, combined with a custom memory hierarchy that exploits the temporal locality of data-related memory accesses, can have a significant impact on system power consumption, especially in data-dominated applications such as multimedia processing. This paper explores the effect of data-reuse decisions on the power consumption, area and performance of multimedia applications implemented on uni- and dual-processor embedded cores. The work shows that conclusions about the effect of the transformations on multi-processor architectures can be drawn from the corresponding effect on the uniprocessor architecture. In this way the exploration space can be significantly reduced. A motion estimation algorithm, namely the two-dimensional logarithmic search, and a discrete cosine transform (DCT) algorithm are used as demonstrator applications.
Multiplication is one of the most critical operations in many computational systems. In this paper, we present an improved architecture for a multiplexer-based multiplication algorithm. Intensive HSPICE simulation shows that, owing to its smaller internal capacitance, the multiplexer-based array multiplier outperforms the modified Booth multiplier in both speed and power dissipation by 13% to 26%. In addition, we demonstrate that using area-efficient full adder circuits (SERF and 10T) can help reduce the overall routing capacitance, resulting in lower power consumption for multipliers built upon those adder circuits. Therefore, a multiplexer-based multiplier following the suggested architecture, along with area-efficient full adder circuits, can be used for low-power, high-performance parallel multiplier designs.
ISBN: 9789812792037 (digital)
ISBN: 9789810244811 (print)
ICA3PP 2000 was an important conference that brought together researchers and practitioners from academia, industry and governments to advance the knowledge of parallel and distributed computing. The proceedings constitute a well-defined set of innovative research papers in two broad areas of parallel and distributed computing: (1) architectures, algorithms and networks; (2) systems and applications.
This paper presents a path of parallelism exploitation in commercial programmable DSP processors. DSP processors have grown in complexity and have recently adopted some very sophisticated parallelism extraction techniques, namely very long instruction word (VLIW) and SIMD designs. The intention is to show the development path of digital signal processors (DSPs), focusing on the features that allow parallel processing of algorithms.