One promising direction in the design of complex control and data processing systems is the development of specialized processors for large data arrays with an internal performance on the order of 2 × 10^8 operations per second, together with the modeling of complex systems that execute complex operations in parallel in real time. The relevant nonrecurrent Boolean functions for these systems have been classified in earlier work, which also described multifunctional logic modules (MLMs) realizing ordered Boolean functions from the given class. Here an MLM (module) means an adjustable automaton whose output values depend only on the values of its input signals. The present work describes an MLM for the computation of disordered functions.
The paper considers the existence of oblivious many-to-many packet routing in multi-link binary hypercubes. Modular destination graphs are decomposed according to a certain number of links for a particular dimension n of the hypercube in order to establish conflict-free, tight minimum routing within a network cycle consisting of n hops. Quasi-dimension-order routing is applied, in which the bit positions that differ between source and destination are corrected in cyclic order. Sample results of graph decomposition for a number of dimensions n ≤ 7 are presented in the Appendix.
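As an illustration of the routing idea only, the following minimal sketch corrects the differing bit positions between source and destination in cyclic order. The function name, the `start` parameter, and the choice to scan upward from `start` are assumptions made for this example; the abstract does not specify the exact rule, and the sketch does not model multiple links or conflict avoidance.

```python
def quasi_dimension_order_route(src: int, dst: int, n: int, start: int = 0) -> list[int]:
    """Return the nodes visited from src to dst in an n-dimensional binary hypercube.

    Bit positions are scanned cyclically beginning at `start` (an assumed
    convention); each position where the current node differs from dst is
    corrected with one hop along that dimension.
    """
    if not (0 <= src < 2 ** n and 0 <= dst < 2 ** n):
        raise ValueError("src and dst must be valid n-bit node labels")
    path = [src]
    node = src
    for k in range(n):
        bit = (start + k) % n          # cyclic scan over the n dimensions
        if ((node ^ dst) >> bit) & 1:  # this bit still differs from dst
            node ^= 1 << bit           # hop across dimension `bit`
            path.append(node)
    return path


if __name__ == "__main__":
    # Two routes between the same pair, starting the scan at different bits.
    print(quasi_dimension_order_route(0b000, 0b101, n=3, start=0))
    print(quasi_dimension_order_route(0b000, 0b101, n=3, start=1))
```

Every route produced this way has length equal to the Hamming distance between source and destination, so it never exceeds n hops.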
An optimized software implementation of a high-quality MPEG AAC-LC (low complexity) audio encoder is presented in this paper. The standard reference encoder is improved by several algorithmic optimizations (fast psychoacoustic model, new tonality estimation, new time-domain block switching, optimized quantizer and Huffman coder) and by very careful code optimizations for PC CPU architectures with a SIMD (single-instruction-multiple-data) instruction set. The psychoacoustic model uses the MDCT filterbank for energy estimation and peak detection as a measure of tonality. The block size decision is based on local perceptual entropies as well as LPC analysis of the time signal. Algorithmic optimizations in the quantizer include modification of the loop control module and an optimized Huffman search. Code optimization is based on parallel processing, replacing vector algebra and math functions with their optimized equivalents from the Intel® Signal Processing Library (SPL). The implemented codec outperforms consumer MP3 encoders at a 30% lower bitrate while achieving encoding speeds several times faster than real time.
This paper gives an overview of analogic cellular array architectures that can also be used to approximate partial differential equations (PDEs). Cellular arrays are massively parallel computing structures composed of cells placed on a regular grid. These cells interact locally, and the array can have both local and global dynamics. The software of this architecture is an analogic algorithm that builds on analog and logical spatio-temporal instructions of the underlying hardware, which is a locally connected cellular nonlinear network (CNN). Within this framework two classes of PDEs, motivated also by image processing methodologies, are discussed: (i) reaction-diffusion (local) types and (ii) contrast modification (global) types. It is shown that, based on cellular diffusion and wave-computing formulations, these classes can be approximated on existing CNN Universal Machine (CNN-UM) chips. Thus, the latest generation of stored-program topographic array microprocessors with integrated sensing and computing could also be viewed as the first prototypes of analogic cellular PDE machines implemented on silicon.
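To make the notion of locally interacting cells concrete, here is a minimal sketch of one explicit time step of discretized diffusion on a regular grid, where each cell is updated only from its four nearest neighbours. This is a digital emulation written for illustration; the function name, the coefficient `d`, and the zero-flux boundary handling are assumptions, and it is not the analog CNN template or any CNN-UM program from the paper.

```python
def diffusion_step(u: list[list[float]], d: float = 0.2) -> list[list[float]]:
    """Apply one explicit diffusion step to grid u and return the new grid.

    Each cell exchanges value with its 4 nearest neighbours only.
    The scheme is stable for 0 <= d <= 0.25.
    """
    h, w = len(u), len(u[0])
    nxt = [row[:] for row in u]
    for y in range(h):
        for x in range(w):
            c = u[y][x]
            # Zero-flux boundary (assumed): a missing neighbour is replaced
            # by the cell's own value, so nothing leaks out of the grid.
            up = u[y - 1][x] if y > 0 else c
            down = u[y + 1][x] if y < h - 1 else c
            left = u[y][x - 1] if x > 0 else c
            right = u[y][x + 1] if x < w - 1 else c
            nxt[y][x] = c + d * (up + down + left + right - 4 * c)
    return nxt


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][2] = 1.0                  # single initial spike
    for _ in range(10):
        grid = diffusion_step(grid)
    print(sum(map(sum, grid)))        # total stays close to 1.0
```

On a cellular array every cell would perform this update simultaneously; the nested loops here only serialize that parallel step.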
The design procedure for high-order single-amplifier BP filters is presented. A method for the design of 2nd- and 4th-order band-pass (BP) active-RC filters using a modified low-pass to band-pass (LP-BP) frequency transformation has already been presented in previous works. It showed that a BP filter can be realized simply by replacing the resistors and capacitors of the ladder in a low-pass (LP) prototype filter with series and parallel RC circuits. Such a substitution results from a so-called "lossy" LP-BP transformation. In this paper, the design procedure is extended to higher-order BP filters, such as 6th- and 8th-order. The design procedure is simple, and closed-form design equations are presented. Furthermore, it is shown that "impedance tapering" decreases sensitivities to component tolerances for the LP prototype as well as for the resulting BP filter. Schoeffler's sensitivity measure is used for the sensitivity analysis.
ISBN: 0780374886 (print)
Multiply and multiply-accumulate (MAC) instructions (see ARM DDI0100E, ARM Architecture Reference Manual) are fundamental instructions in DSP applications. In an embedded digital signal processing (DSP) core and in a high-performance enhanced DSP instruction processor core, the implementation of high-performance multiply and MAC instructions is very important. An algorithm for the VLSI implementation of 32×32 multiply and MAC instructions with a 32×8 multiplier-accumulator in DSP applications is presented. The 32×32 multiplication is achieved by four 32×8 multiplications. The result of one 32×8 multiplication serves as a partial product of the next 32×8 operation; when the results of four such multiplications are accumulated, the 32×32 result is obtained. Only the 32×8 multiplication is implemented in hardware, as a Booth multiplier. This algorithm for implementing multiply and MAC instructions is a good trade-off between a serial multiplier and a parallel multiplier.
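The decomposition described above can be checked arithmetically with a short sketch. It assumes unsigned operands and uses ordinary integer multiplication for the 32×8 step; it does not model Booth recoding, signed operands, or the paper's datapath, and the names `mul32x8` and `mac32x32` are hypothetical.

```python
MASK32 = 0xFFFFFFFF


def mul32x8(a: int, b8: int) -> int:
    """Stand-in for the 32x8 hardware multiplier: unsigned 32-bit times 8-bit."""
    return (a & MASK32) * (b8 & 0xFF)


def mac32x32(a: int, b: int, acc: int = 0) -> int:
    """Compute acc + a * b using four 32x8 multiplications.

    The multiplier b is consumed one byte per pass, least significant byte
    first; each 32x8 product is shifted to its byte weight and accumulated.
    """
    result = acc
    for i in range(4):
        byte = (b >> (8 * i)) & 0xFF             # i-th byte of the multiplier
        result += mul32x8(a, byte) << (8 * i)    # accumulate weighted partial product
    return result


if __name__ == "__main__":
    a, b = 0x12345678, 0x9ABCDEF0
    print(hex(mac32x32(a, b)), hex(a * b))       # the two values should match
```

Passing a non-zero `acc` gives the multiply-accumulate form; with `acc = 0` the same four passes produce a plain 32×32 product.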
ISBN: 0780370570 (print)
In this paper, we propose a flexible VLSI-based parallel processing architecture for an improved three-step search (ITSS) motion estimation algorithm that is superior to the existing three-step search (TSS) algorithm in all cases, and also to the recently proposed new three-step search (NTSS) algorithm when used for low bit-rate video coding, as with the H.261 standard. Based on a VLSI tree processor and an FPGA addressing circuit, the architecture can successfully implement the ITSS algorithm on silicon with the minimum number of gates. Because of the flexibility of the architecture, it can also be extended to implement other three-step search algorithms.
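For context, the baseline three-step search that ITSS is compared against can be sketched as follows. This is the classic TSS with a sum-of-absolute-differences (SAD) cost, written in plain Python for illustration; it is not the ITSS algorithm or the VLSI architecture of the paper, and the function names, block size, and frame layout (lists of rows) are assumptions.

```python
def sad(cur, ref, bx, by, dx, dy, bs):
    """Sum of absolute differences between the current block at (bx, by)
    and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(bs):
        for x in range(bs):
            total += abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
    return total


def three_step_search(cur, ref, bx, by, bs=4, step=4):
    """Return ((dx, dy), cost) for the block at (bx, by) using classic TSS.

    Starting from zero displacement, the 3x3 pattern of candidates spaced
    `step` apart is evaluated around the current best; the step is then
    halved (4, 2, 1) and the search repeats around the new best.
    """
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, bs)
    while step >= 1:
        cx, cy = best                       # centre for this step
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose reference block leaves the frame.
                if not (0 <= bx + mx and bx + mx + bs <= w
                        and 0 <= by + my and by + my + bs <= h):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, bs)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2
    return best, best_cost
```

TSS evaluates at most 25 candidates instead of the 225 of a full ±7 search, at the price of possibly settling on a local minimum of the cost surface.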
ISBN: 0769511538 (print)
We present parallel algorithms to find cut vertices, bridges, and Hamiltonian paths in bounded interval tolerance graphs. For a graph with n vertices, the algorithms require O(log n) time and use O(n) processors on the Concurrent Read Exclusive Write Parallel RAM (CREW PRAM) model of computation. Our approach transforms the original graph problem into a problem in computational geometry. The total work done by the parallel algorithms is comparable to the work done by the best known sequential algorithms for the more restricted classes of graphs, namely interval graphs and permutation graphs. In this sense our algorithms have optimal complexity.
ISBN: 0769511538 (print)
In this paper, we present an adaptive version of the parallel Distributive Join (DJ) algorithm that we proposed in [1]. The adaptive parallel DJ algorithm can handle data skew in the operand relations efficiently. We implemented the original and adaptive parallel DJ algorithms on a network of Alpha workstations using the Parallel Virtual Machine (PVM). We analyzed the performance of the algorithms and compared it with that of the parallel Hybrid-Hash (HH) join algorithms. Our results show that the parallel DJ algorithms perform comparably with the parallel HH join algorithms over the entire range of the number of processors used and for different join selectivities. A significant advantage of the parallel DJ algorithms is that they can easily support non-equijoin operations.
ISBN: 0769511538 (print)
Techniques for scheduling parallel I/O are presented for both uniprogrammed systems that run single jobs in isolation and multiprogrammed environments that execute multiple parallel jobs simultaneously. The performance of the scheduling algorithms is evaluated on a network of workstations. A new scheduling algorithm proposed in this paper is observed to perform very well for systems running single jobs in isolation. The algorithms that use knowledge of job characteristics are observed to produce superior performance in multiprogrammed parallel environments.