The main contribution of this work is to propose a number of broadcast-efficient VLSI architectures for computing the sum and the prefix sums of a wk-bit, k ≥ 2, binary sequence using, as basic building blocks, linea...
In this work we present a procedure for automatic parallel code generation for algorithms described by a Set of Affine Recurrence Equations (SARE). Starting from the original SARE description in an N-dimensional iteration space, the algorithm is converted into parallel code for an m-dimensional distributed memory parallel machine (m
A fully parallel iterative thinning algorithm called MB2 is presented. It competes favourably with the best known algorithms regarding homotopy, mediality, thickness, rotation invariance and noise immunity, while featuring a speed improvement by a factor of two or more owing to a smaller number of operations to perform. MB2 is grounded on a simple physics-based thinning principle that conveys quality, efficiency and conceptual clarity. It is particularly suited to data parallel execution.
In this paper we present a load-adaptive parallel algorithm and implementation for computing the 2D Discrete Wavelet Transform (DWT) on multithreading machines. In a 2D DWT computation, the problem size shrinks at every decomposition level and the lengths of the emerging computation paths also vary. The parallel algorithm proposed in this paper dynamically scales itself to the varying problem size. Experimental results are reported based on implementations of the proposed algorithm on a 20-node multithreading emulation platform, EARTH-MANNA. We show that multithreading implementations of the proposed algorithm are at least 2 times faster than the MPI-based message-passing implementations reported in the literature. We further show that the proposed algorithm and implementations scale linearly with respect to problem and machine sizes.
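The shrinking problem size mentioned above can be illustrated with a minimal sequential sketch. It assumes a square image whose side is a power of two and uses the Haar wavelet with averaging normalization; both choices, and the names `haar_1d`, `haar_2d_level` and `haar_2d`, are illustrative assumptions and are not taken from the paper, whose filter and threading scheme are not given in the abstract.

```python
def haar_1d(row):
    """One Haar analysis step: pairwise averages, then pairwise differences."""
    half = len(row) // 2
    avg = [(row[2 * i] + row[2 * i + 1]) / 2 for i in range(half)]
    diff = [(row[2 * i] - row[2 * i + 1]) / 2 for i in range(half)]
    return avg + diff


def haar_2d_level(block):
    """Apply the 1D step to every row, then to every column of the result."""
    rows = [haar_1d(r) for r in block]
    cols = [haar_1d(list(c)) for c in zip(*rows)]  # zip(*rows) transposes
    return [list(r) for r in zip(*cols)]           # transpose back


def haar_2d(image, levels):
    """Multi-level 2D decomposition (assumed: square, power-of-two side).

    Returns the coefficient array and the side length processed at each
    level, which halves from one level to the next.
    """
    n = len(image)
    out = [list(r) for r in image]
    sizes = []
    for _ in range(levels):
        sizes.append(n)
        # Only the top-left n x n approximation block is transformed again.
        sub = haar_2d_level([row[:n] for row in out[:n]])
        for i in range(n):
            out[i][:n] = sub[i]
        n //= 2
    return out, sizes


if __name__ == "__main__":
    coeffs, sizes = haar_2d([[8] * 4 for _ in range(4)], levels=2)
    print(sizes)
    print(coeffs[0][0])
```

Each level works on a quarter of the data handled by the previous one, which is why a fixed assignment of work to threads becomes unbalanced at deeper levels and why a load-adaptive scheme is of interest.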
Discretization of image restoration problems often leads to a discrete inverse ill-posed problem: the discretized operator is so badly conditioned that it can be actually considered as undetermined. In this case one s...
ISBN: 0769500439 (print)
The crossbreeding between advanced microprocessor design and Field Programmable Gate Arrays (FPGAs) has produced the Field Programmable Processor Array, code named FPPA. The first integrated version has been targeted for low power consumption parallel processing. The FPPA is composed of a 10x10 array of RISC microcontrollers offering up to 500 MIPS at 5 MHz for processors (20 MHz for communications). The very low power feature of the core processor results in a 1 Watt power consumption for the whole array at 5 MHz and makes it particularly interesting for portable devices that require quite complex algorithms. In addition, the FPPA principle, i.e., a fault-tolerant large array of cells interconnected with an asynchronous communication scheme, is applicable to alternative structures for the cell architecture.
This paper presents parallel approaches to the complete transient numerical analysis of stochastic reward nets (SRNs) for both shared and distributed-memory machines. Parallelization concepts and implementation issues...
This paper describes the implementation of the weighted distance transform for digital image analysis on a two-processor shared memory computer. The computer is an SMP (symmetric multiprocessing) machine and the algorithm is implemented using POSIX threads.
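For orientation, a weighted distance transform can be sketched as the classic two-pass chamfer scan. The sketch below is sequential and assumes the common 3-4 weights (3 for edge neighbours, 4 for diagonal neighbours); the abstract states neither the weights nor how the work is divided between the two threads, so the function name `chamfer_distance` and its parameters are hypothetical and this is not the paper's implementation.

```python
INF = 10**9  # stands in for "not yet reached"


def chamfer_distance(mask, a=3, b=4):
    """Two-pass weighted (chamfer) distance transform.

    mask[y][x] is truthy for feature pixels (distance 0). `a` is the
    assumed cost of a horizontal or vertical step, `b` of a diagonal step.
    """
    h, w = len(mask), len(mask[0])
    d = [[0 if mask[y][x] else INF for x in range(w)] for y in range(h)]

    # Neighbours already visited in a top-left to bottom-right scan ...
    forward = [(-1, -1, b), (-1, 0, a), (-1, 1, b), (0, -1, a)]
    # ... and in the reverse, bottom-right to top-left scan.
    backward = [(1, 1, b), (1, 0, a), (1, -1, b), (0, 1, a)]

    def sweep(ys, xs, offsets):
        for y in ys:
            for x in xs:
                for dy, dx, cost in offsets:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w and d[ny][nx] + cost < d[y][x]:
                        d[y][x] = d[ny][nx] + cost

    sweep(range(h), range(w), forward)
    sweep(range(h - 1, -1, -1), range(w - 1, -1, -1), backward)
    return d


if __name__ == "__main__":
    mask = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
    for row in chamfer_distance(mask):
        print(row)
```

Each pixel in a scan depends on neighbours updated earlier in the same scan, so a threaded version has to respect that ordering; how the paper handles this is not described in the abstract.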
Clustering and scheduling of tasks for parallel implementation is a well-researched problem. Several techniques have been presented in the literature to improve performance and reduce problem execution times. In this paper we present a novel approach where clustering and scheduling of tasks can be tuned to achieve maximal speedup or efficiency. The proposed scheme is based on the relation between the costs of computation and communication of task clusters. We show how clustering can be adapted to suit different architectures and numbers of available processors. The proposed clustering and scheduling algorithm is flexible: it can be tuned to suit a bounded or unbounded number of processors and/or the parallel computing environment. Comparative studies indicate superior efficiency compared to most other schemes proposed in recent years.