We present a high-speed VLSI fuzzy logic controller that is well suited to real-time applications. The main distinction of our approach is that it can complete the max-min calculation within one clock cycle. The speedup is achieved by an effective format for the membership function and a careful analysis of the conditions of the max-min calculation. As a result, the latency of a fuzzy inference can be considerably reduced. Based on this idea, a pipelined parallel architecture is proposed to fully utilize the parallelism inherent in fuzzy inference. The VLSI fuzzy logic controller was implemented and simulated using a 0.35 μm cell library as the target technology. Experimental data show that the proposed architecture achieves higher performance than other approaches.
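For readers unfamiliar with the max-min calculation mentioned in this abstract, the following is a minimal software sketch of generic max-min fuzzy inference. It is not the paper's single-cycle hardware method or its membership-function format. The function name, the rule encoding, and the labels in the usage note are hypothetical and chosen only for illustration.

```python
def max_min_inference(rules, memberships):
    """Generic max-min fuzzy inference (illustrative sketch only).

    rules: list of (antecedent_labels, consequent_label) pairs (assumed encoding).
    memberships: dict mapping an input fuzzy-set label to its degree in [0, 1].
    Returns a dict mapping each consequent label to its aggregated strength.
    """
    output = {}
    for antecedents, consequent in rules:
        # "min" step: a rule fires as strongly as its weakest antecedent.
        strength = min(memberships[label] for label in antecedents)
        # "max" step: rules sharing a consequent are combined by maximum.
        output[consequent] = max(output.get(consequent, 0.0), strength)
    return output
```

As a usage example, with degrees `{"err_small": 0.7, "derr_neg": 0.4, "err_large": 0.2, "derr_pos": 0.9}` and three rules, two of which conclude `"low"`, the strength of `"low"` is the larger of the two rule strengths. In software the min and max steps run sequentially per rule; the abstract's contribution is performing this calculation in hardware within one clock cycle.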
A recent latency tolerance technique, read-miss clustering, restructures code to send demand-miss references in parallel to the underlying memory system. An alternative, widely used latency tolerance technique is software prefetching, which initiates data fetches ahead of expected demand-miss references by a certain distance. Since both techniques seem to target the same types of latencies and use the same system resources, it is unclear which technique is superior or whether the two can be combined. This paper shows that the two techniques are actually mutually beneficial, each helping to overcome limitations of the other. We perform our study for uniprocessor and multiprocessor configurations, in simulation and on a real machine (the Convex Exemplar). Compared to prefetching alone (the state of the art implemented in systems today), the combination of the two techniques reduces execution time by an average of 21% across all cases studied in simulation, and by an average of 16% for 5 out of 10 cases on the Exemplar. Relative to clustering alone, the combination reduces execution time by an average of 15% for 6 out of 11 cases in simulation and by 20% for 6 out of 10 cases on the Exemplar.
In this paper, we propose a flexible VLSI-based parallel processing architecture for an improved three-step search (ITSS) motion estimation algorithm that is superior to the existing three-step search (TSS) algorithm in all cases, and also to the recently proposed new three-step search (NTSS) algorithm when used for low bit-rate video coding, as with the H.261 standard. Based on a VLSI tree processor and an FPGA addressing circuit, the architecture can implement the ITSS algorithm on silicon with the minimum number of gates. Because of the flexibility of the architecture, it can also be extended to implement other three-step search algorithms.
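As background for this abstract, the sketch below shows the classic three-step search (TSS) baseline for block motion estimation in software. It is not the ITSS variant or the VLSI architecture the paper proposes. The function names, the sum-of-absolute-differences (SAD) cost, the 8x8 block size, and the initial step of 4 are common textbook choices assumed here for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n-by-n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=8, step=4):
    """Classic TSS: test 9 candidates around the current best displacement,
    recentre on the cheapest one, halve the step, and repeat until step 1."""
    height, width = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best  # centre of this step's 3x3 candidate pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block would fall outside the frame.
                if not (0 <= bx + mx and bx + mx + n <= width
                        and 0 <= by + my and by + my + n <= height):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2
    return best, best_cost
```

With steps of 4, 2 and 1 the search covers displacements of up to 7 pixels in each direction while evaluating at most 25 distinct positions instead of all 225. Because the search is greedy, it can settle on a local minimum, which is one motivation for refinements such as NTSS and ITSS.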
Over the last decades Genetic algorithms (GA) and Genetic Programming (GP) have proven to be efficient tools for a wide range of applications. However, in order to solve human-competitive problems they require large a...
Efficient use of data-reuse transformations, combined with a custom memory hierarchy that exploits the temporal locality of data-related memory accesses, can have a significant impact on system power consumption, especially in data-dominated applications such as multimedia processing. This paper explores the effect of data-reuse decisions on the power consumption, area and performance of multimedia applications implemented on uni- and dual-processor embedded cores. The work shows that conclusions about the effect of the transformations on multiprocessor architectures can be drawn from the corresponding effect on the uniprocessor architecture. In this way the exploration space can be significantly reduced. A motion estimation algorithm, namely the two-dimensional logarithmic search, and a discrete cosine transform (DCT) algorithm are used as demonstrator applications.
Multiplication is one of the most critical operations in many computational systems. In this paper, we present an improved architecture for a multiplexer-based multiplication algorithm. Intensive HSPICE simulation shows that, owing to its smaller internal capacitance, the multiplexer-based array multiplier outperforms the modified Booth multiplier in both speed and power dissipation by 13% to 26%. In addition, we demonstrate that using area-efficient full adder circuits (SERF and 10T) can help reduce the overall routing capacitance, resulting in lower power consumption for multipliers built upon those adder circuits. Therefore, a multiplexer-based multiplier following the suggested architecture, along with area-efficient full adder circuits, can be used for low-power, high-performance parallel multiplier designs.
ISBN (digital): 9789812792037
ISBN (print): 9789810244811
ICA3PP 2000 was an important conference that brought together researchers and practitioners from academia, industry and governments to advance the knowledge of parallel and distributed computing. The proceedings constitute a well-defined set of innovative research papers in two broad areas of parallel and distributed computing: (1) architectures, algorithms and networks; (2) systems and applications.
This paper presents a path of parallelism exploitation in commercial programmable DSP processors. DSP processors have grown in complexity and have recently adopted some very sophisticated parallelism extraction techniques, namely very long instruction word (VLIW) and SIMD designs. The intention is to show a development path of digital signal processors (DSPs) and to focus on the features that allow parallel processing of algorithms.
ISBN (print): 0769507166
In this paper we present an approach to determine scheduling functions suitable for the design of processor arrays. The considered scheduling functions support a subsequent LSGP partitioning of the processor array by allowing the tasks of the processors of the full-size array that are mapped onto one processor of the partitioned array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem that generates the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms onto parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as much as possible.
ISBN (print): 3540679561
The emergence of multimedia technology in recent years is strongly driven by an enormous commercial potential. For the scientific community this development is interesting because a number of attractive disciplines of computer science and engineering flow together into the multimedia mainstream: image processing, computer graphics, data compression, encoding, cryptography, and broadband communication, to mention just a few. These fields have always been driving forces behind the design of massively parallel architectures and algorithms as well as special-purpose processors and storage systems.