ISBN: 0780370007 (print)
To fulfil real-time signal processing tasks such as clutter rejection, moving target detection (MTD) and constant false alarm rate (CFAR) control in airborne radar, an airborne radar parallel signal processing system (ARPS2) is proposed with DSP chips as its kernel processing nodes. The DSP chips are arranged in a parallel architecture, and each node has its own private input and output memory. The system adopts several parallel techniques, such as parallel storage, parallel processing, parallel code loading and parallel data organization, to achieve high efficiency. It has a simple structure, is highly flexible and is easy to develop for. ARPS2 is going to be applied to an airborne radar. It can also be used to run high-speed real-time signal processing algorithms in other kinds of radar.
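As a rough illustration of the CFAR task named in this abstract, the sketch below shows a one-dimensional cell-averaging CFAR detector. The abstract does not say which CFAR variant ARPS2 uses, so the cell-averaging form, the function name `ca_cfar`, and the parameters `train`, `guard` and `scale` are all assumptions made for this example.

```python
def ca_cfar(power, train=2, guard=1, scale=3.0):
    """Cell-averaging CFAR over a list of power samples (illustrative sketch).

    For each cell under test, the noise level is estimated as the mean of
    `train` training cells on each side, skipping `guard` guard cells next
    to the cell under test. A detection is declared when the cell exceeds
    `scale` times that estimate. All parameter values are hypothetical.
    """
    hits = []
    reach = train + guard
    # Only cells with a full window on both sides are tested.
    for i in range(reach, len(power) - reach):
        lead = power[i - reach : i - guard]          # training cells before
        lag = power[i + guard + 1 : i + reach + 1]   # training cells after
        noise = (sum(lead) + sum(lag)) / (2 * train)
        if power[i] > scale * noise:
            hits.append(i)
    return hits
```

Because each cell under test depends only on its own local window, the loop iterations are independent, which is the kind of workload that could be split across parallel processing nodes.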
In this paper, we propose a flexible VLSI-based parallel processing architecture for an improved three-step search (ITSS) motion estimation algorithm. ITSS is superior to the existing three-step search (TSS) algorithm in all cases, and also to the recently proposed new three-step search (NTSS) algorithm when used for low bit-rate video coding, as with the H.261 standard. Based on a VLSI tree processor and an FPGA addressing circuit, the architecture can implement the ITSS algorithm on silicon with the minimum number of gates. Because the architecture is flexible, it can also be extended to implement other three-step search algorithms.
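For readers unfamiliar with the baseline, the following is a minimal sketch of the classic three-step search (TSS) for block matching, not the ITSS variant or the VLSI architecture proposed in the paper. The function names, the sum-of-absolute-differences cost, the block size `n` and the initial step of 4 are assumptions chosen for illustration.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in
    `cur` and the block displaced by (dx, dy) in `ref`."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def three_step_search(cur, ref, bx, by, n=4, step=4):
    """Classic TSS: test the 3 x 3 grid of candidates around the current
    best vector, move to the cheapest one, halve the step, and repeat."""
    h, w = len(ref), len(ref[0])
    best = (0, 0)
    best_cost = sad(cur, ref, bx, by, 0, 0, n)
    while step >= 1:
        cx, cy = best  # centre of this step's search pattern
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                mx, my = cx + dx, cy + dy
                # Skip candidates whose block falls outside the reference frame.
                if not (0 <= bx + mx and bx + mx + n <= w
                        and 0 <= by + my and by + my + n <= h):
                    continue
                cost = sad(cur, ref, bx, by, mx, my, n)
                if cost < best_cost:
                    best_cost, best = cost, (mx, my)
        step //= 2
    return best, best_cost
```

With an initial step of 4 the search covers displacements of up to 7 pixels in each direction using at most 25 distinct cost evaluations, instead of the 225 needed by an exhaustive search over the same window. The nine candidates in each step are independent of one another, which is what makes the method attractive for parallel hardware.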
Efficient use of data-reuse transformations, combined with a custom memory hierarchy that exploits the temporal locality of data-related memory accesses, can have a significant impact on system power consumption, especially in data-dominated applications such as multimedia processing. This paper explores the effect of data-reuse decisions on the power consumption, area and performance of multimedia applications implemented on uni- and dual-processor embedded cores. The work shows that conclusions about the effect of the transformations on multi-processor architectures can be drawn from the corresponding effect on the uniprocessor architecture. In this way the exploration space can be significantly reduced. A motion estimation algorithm, namely the two-dimensional logarithmic search, and a discrete cosine transform (DCT) algorithm are used as demonstrator applications.
ISBN: 9539676940 (print)
Software pipelining is an instruction-level loop scheduling method for achieving high-performance fine-grain parallelism on VLIW (very long instruction word) processors. This paper presents a novel software pipelining method for non-pipelining parallel processors based on integer scaling and retiming transformations. This approach generalises and simplifies the analogous extended retiming model of T.W. O'Neil et al. (see Proc. ISCA 12th Int. Conf. Parallel & Distributed Computing Syst., p.292-7, 1999; Proc. of ICASSP'99 Conf., vol.4, p.2001-4, 1999). Matrix techniques are used to simplify the corresponding graph transformations. Some general properties taken from algebraic graph theory are applied to obtain general scheduling techniques: node and cycle methods. The two-phase scheduling method considered is first defined by means of two standard linear programming problems. We then transform these problems into variants of the maximum cost-to-time ratio problem and the shortest path problem in order to obtain efficient polynomial-time algorithms. An example of software pipelining optimization of a digital correlator is also given.
ISBN: 0780365429 (print)
This paper presents a new implementation of a 2D wavelet transform in a VLSI circuit for real-time digital signal processing. The parallel algorithm of the 2D wavelet transform (2D-WT) used for designing and implementing this new architecture enhances the performance of computations. The proposed multi-elementary processor architecture of the 2D-WT yields a very flexible hardware configuration. This approach computes the wavelet coefficients at a high processing speed relative to other methods. The 2D-WT is a powerful tool for several applications, the most important one being image processing.
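To make the 2D-WT concrete, here is a minimal sketch of one decomposition level computed separably, first along rows and then along columns. The abstract does not name the wavelet or the filter structure, so the unnormalised averaging Haar wavelet and the function names `haar_1d` and `haar_2d` are assumptions for illustration only.

```python
def haar_1d(v):
    """One level of an unnormalised Haar transform on an even-length list:
    pairwise averages (low-pass) followed by pairwise half-differences
    (high-pass)."""
    half = len(v) // 2
    avg = [(v[2 * i] + v[2 * i + 1]) / 2 for i in range(half)]
    diff = [(v[2 * i] - v[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def haar_2d(img):
    """One level of a separable 2D Haar transform: transform every row,
    then every column of the row-transformed image."""
    rows = [haar_1d(r) for r in img]
    cols = [haar_1d(list(c)) for c in zip(*rows)]  # zip(*rows) yields columns
    return [list(r) for r in zip(*cols)]           # transpose back to rows
```

For example, `haar_2d([[1, 2], [3, 4]])` returns `[[2.5, -0.5], [-1.0, 0.0]]`: the top-left entry is the image mean and the remaining entries are detail coefficients. Every row (and then every column) is transformed independently, so the work can be distributed across several elementary processors.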
ISBN: 9789812792037 (digital)
ISBN: 9789810244811 (print)
ICA3PP 2000 was an important conference that brought together researchers and practitioners from academia, industry and governments to advance the knowledge of parallel and distributed computing. The proceedings constitute a well-defined set of innovative research papers in two broad areas of parallel and distributed computing: (1) architectures, algorithms and networks; (2) systems and applications.
This paper examines implementations of a multi-layer perceptron (MLP) on bus-based shared memory (SM) and on distributed memory (DM) multiprocessor systems. The goal has been to optimize the HW and SW architectures in order to obtain the fastest possible response. Parallel MLP algorithms for up to 8 processing nodes, with DM as well as SM memory, were prototyped using the CSP-based TRANSIM tool. The results of prototyping MLPs of different sizes on various numbers of processing nodes demonstrate the feasible speedups, efficiency and time responses for a given CPU speed, link speed or bus bandwidth.
ISBN: 0769505686 (print)
In this paper we propose a novel associative parallel algorithm for selecting a critical cycle in directed weighted graphs by means of an abstract model of the SIMD type with vertical data processing (the STAR-machine). This problem arises when performing Edmonds' algorithm for finding optimum branchings. The algorithm is represented as the corresponding STAR procedure, whose correctness is verified and whose time complexity is evaluated.
ISBN: 3540414290 (print)
The proceedings contain 51 papers. The special focus in this conference is on System Software and Algorithms. The topics include: Charon message-passing toolkit for scientific computations; dynamic slicing of concurrent programs; an efficient run-time scheme for exploiting parallelism on multiprocessor systems; characterization and enhancement of static mapping heuristics for heterogeneous systems; optimal segmented scan and simulation of reconfigurable architectures on fixed connection networks; reducing false causality in causal message ordering; the working-set based adaptive protocol for software distributed shared memory; evaluation of the optimal causal message ordering algorithm; register efficient mergesorting; applying patterns to improve the performance of fault tolerant CORBA; design, implementation and performance evaluation of a high performance CORBA group membership protocol; analyzing the behavior of event dispatching systems through simulation; a domain-specific semi-automatic parallelization tool; practical experiences with Java compilation; performance prediction and analysis of parallel out-of-core matrix factorization; integration of task and data parallelism; parallel and distributed computational fluid dynamics; parallel congruent regions on a mesh-connected computer; can scatter communication take advantage of multidestination message passing?; a first class design constraint for future architectures; embedded computing; instruction level distributed processing; speculative multithreaded processors; a fast tree-based barrier synchronization on switch-based irregular networks; meta-data management system for high-performance large-scale scientific data access; and parallel sorting algorithms with sampling techniques on clusters with processors running at different speeds.
ISBN: 0769507166 (print)
In this paper we present an approach to determine scheduling functions suitable for the design of processor arrays. The scheduling functions considered support a subsequent LSGP partitioning of the processor array by allowing the tasks of those processors of the full-size array that are mapped onto one processor of the partitioned array to be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem that generates the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms onto parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as much as possible.