ISBN: 9783540772194 (print)
The theory of bulk-synchronous parallel computing has produced a large number of attractive algorithms, which are provably optimal in some sense, but typically require that the aggregate random access memory (RAM) of the processors be sufficient to hold the entire data set of the parallel problem instance. In this work we investigate the performance of parallel algorithms for problem instances that are extremely large relative to the available RAM. We describe a system, the Parallel External Memory System (PEMS), which allows existing parallel programs designed for a large number of processors without disks to be adapted easily to smaller, realistic numbers of processors, each with its own disk system. Our experiments with PEMS show that this approach is practical and promising, and that run times scale predictably with the number of processors and with the problem size.
Xetal-II is a SIMD processor with 320 processing elements delivering a peak performance of 107GOPS on 16b data while dissipating 600mW. A 10Mb on-chip memory can store up to 4 VGA frames allowing efficient implementat...
The implementation of parallel asynchronous iterative algorithms on message passing architectures is considered. Several issues related to communication via message passing interfaces or libraries such as MPI-1, MPI-2, PVM or SHMEM are discussed in this survey paper. Practical implementations are proposed.
ISBN: 9783540715900 (print)
Reflective symmetry is useful in areas such as computer vision, medical imaging, and 3D model retrieval. This paper presents an intuitive reflective symmetry detection method for 3D polygon objects. Without any mapping process, the method detects the reflective symmetry plane by parallel projection. The paper defines a continuous measure that estimates how reflectively symmetric an object is with respect to a projection plane through the center of the object, and it explores how to detect the reflective symmetry plane with this measure. On the Princeton Shape Benchmark, the proposed method detects up to 99% of reflective symmetry planes to within 4 degrees for perfectly symmetric objects, and up to 85% to within 10 degrees for nearly symmetric objects.
The Kalman filter is a set of mathematical equations that provides an efficient computational (recursive) means to estimate the state of a process, in a way that minimizes the mean of the squared error. This filter is ...
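The recursive predict/update structure mentioned in this abstract can be illustrated with a minimal scalar sketch. This is not taken from the paper: the constant-state process model, the function name `kalman_1d`, and the noise parameters `q` and `r` are assumptions chosen only for illustration.

```python
def kalman_1d(measurements, q=1e-5, r=0.01, x0=0.0, p0=1.0):
    """Scalar Kalman filter for a constant state observed with noise.

    q: assumed process-noise variance, r: assumed measurement-noise variance,
    x0 / p0: assumed initial state estimate and its variance.
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is modelled as constant, so only the
        # uncertainty grows by the process noise.
        p = p + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates
```

Each step uses only the previous estimate and its variance, which is what makes the filter recursive: no history of measurements has to be stored.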
Network processors employ a multithreaded, chip-multiprocessing architecture to effectively hide memory latency and deliver high performance for packet processing applications. In such a parallel paradigm, when multip...
We investigate the efficient storage of row-sorted 1-variant (m + 1) X (n + 1) matrices, m > n, that have the following properties: the rows are sorted in strictly increasing order and the set of elements of each r...
Multiprocessor system-on-a-chip (MPSoC) architectures have received a lot of attention in recent years, but few advances in compilation techniques target these architectures. This is particularly true for the exploitation of data locality. Most of the compilation techniques discussed in the literature for parallel architectures are based on a single loop nest. However, most multimedia and image processing applications are composed of several loop nests. In this paper, new techniques based on program transformations are proposed to optimize these types of applications. In a monoprocessor architecture, the loop fusion technique is well known. In this paper, loop fusion is generalized and adapted to an MPSoC architecture. Another technique, called "computation propagation", is proposed. It completely removes the temporary arrays and significantly reduces the memory accesses, the memory space and the processing time. Experimental results show that this new technique yields a significant reduction in the number of data cache misses (35%), in processing time (30%) and in channel transactions (85%).
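As a reminder of what classic monoprocessor loop fusion does, here is a minimal sketch. It shows only the well-known single-processor transformation, not the paper's MPSoC generalization or its "computation propagation" technique; the function names and the toy computation (`2 * x + 1`) are assumptions for illustration.

```python
def unfused(a):
    # Loop nest 1: produce a temporary array.
    tmp = [0] * len(a)
    for i in range(len(a)):
        tmp[i] = 2 * a[i]
    # Loop nest 2: consume the temporary array.
    out = [0] * len(a)
    for i in range(len(a)):
        out[i] = tmp[i] + 1
    return out


def fused(a):
    # Fused loop: each element is produced and consumed in the same
    # iteration, so the temporary array shrinks to a scalar.
    out = [0] * len(a)
    for i in range(len(a)):
        t = 2 * a[i]
        out[i] = t + 1
    return out
```

Both functions compute the same result; the fused version touches each element once and needs no intermediate array, which is the data-locality benefit the abstract refers to.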
Parallel multi-context reconfigurable architectures provide very attractive platforms with respect to computational performance and reconfigurable features. Today's challenge is to exploit this reconfigurable and computational potential to obtain efficient solutions for mapping applications onto these architectures. The demand for appropriate tools is evident. In this paper we combine and mutually adapt two separate tools to create a continuous design flow for parallel multi-context reconfigurable architectures. In particular, we present the interaction of a parameterized mapping tool for mapping compute-intensive algorithms onto processor arrays with a subsequent verification of the mapping results using the Configurable Reconfigurable Core (CRC) architecture model. The SystemC implementation of the CRC model provides a cycle-accurate functional simulation of the realization. Using this continuous design flow, we derive an efficient realization of the edge detection algorithm (EDA) on a parallel multi-context reconfigurable architecture. We describe in detail how the parallel realization of the EDA has to be translated into a specification for programming the CRC model.
Block-based motion estimation is widely used in video compression applications to remove temporal redundancy. In this paper we implement the six-level nested do-loop full-search block-matching motion estimation algorithm proposed by *** and *** by breaking the respective frames into macroblocks. We used Matlab to simulate the algorithm, and the results obtained are presented in this paper. We first implemented the algorithm on 25 movie frames without breaking them into macroblocks; in the next phase we implemented it after breaking the frames into their respective macroblocks. We also propose a multiprocessing-based architecture for the hardware implementation of the simulated algorithm.
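The six-level loop structure of full-search block matching can be sketched as follows. This is a generic illustration, not the authors' Matlab code: the sum of absolute differences (SAD) cost, the function name `full_search`, and the `block` and `search` parameters are assumptions. Frames are plain lists of rows of integer pixel values.

```python
def full_search(ref, cur, block=2, search=1):
    """Return {(block_row, block_col): (dy, dx)} motion vectors.

    For each block of the current frame `cur`, every displacement within
    +/- `search` pixels is tried in the reference frame `ref`, and the one
    with the smallest SAD is kept.
    """
    h, w = len(cur), len(cur[0])
    vectors = {}
    for by in range(0, h - block + 1, block):              # loop 1: block rows
        for bx in range(0, w - block + 1, block):          # loop 2: block columns
            best = None
            for dy in range(-search, search + 1):          # loop 3: vertical shift
                for dx in range(-search, search + 1):      # loop 4: horizontal shift
                    ry, rx = by + dy, bx + dx
                    # Skip candidates that fall outside the reference frame.
                    if ry < 0 or rx < 0 or ry + block > h or rx + block > w:
                        continue
                    sad = 0
                    for i in range(block):                 # loop 5: pixel rows
                        for j in range(block):             # loop 6: pixel columns
                            sad += abs(cur[by + i][bx + j] - ref[ry + i][rx + j])
                    if best is None or sad < best[0]:
                        best = (sad, dy, dx)
            vectors[(by, bx)] = (best[1], best[2])
    return vectors
```

The two outermost loops are independent across blocks, which is what makes the algorithm a natural candidate for a multiprocessing architecture: each processor can search for the motion vector of a different macroblock.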