ISBN: (print) 0769505716
In this paper, we mainly study the parallelization aspects of accelerated waveform relaxation algorithms for the transient simulation of semiconductor devices on parallel distributed-memory computers, since these methods are competitive with standard pointwise methods on serial machines but are significantly faster on parallel computers. We use an efficient parallel version of the biconjugate gradient method (BiCG) proposed in [6], which combines elements of numerical stability and parallel algorithm design, to solve the sequence of time-varying sparse linear differential-algebraic initial-value problems (IVPs) arising at each linearization step of waveform Newton. The algorithm is derived such that all inner products and matrix-vector multiplications of a single iteration step are independent; therefore, the cost of global communication can be significantly reduced. Experimental results obtained on Parsytec massively parallel systems are described, including a comparison with other accelerated waveform relaxation approaches, such as convolution SOR and waveform GMRES, and with pointwise methods.
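As a point of reference for the abstract above, the following is a minimal sketch of the textbook BiCG iteration for a small dense system, not the parallel variant of [6]. The function names (`bicg`, `matvec`, `dot`) and the 3x3 test matrix are illustrative assumptions. The comments mark the inner products that would each require a global reduction on a distributed-memory machine.

```python
def matvec(A, x):
    """Dense matrix-vector product (rows of A times x)."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def dot(u, v):
    """Inner product; on a distributed machine this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, tol=1e-10, max_iter=100):
    """Textbook (unpreconditioned) BiCG for A x = b, starting from x = 0."""
    At = transpose(A)
    x = [0.0] * len(b)
    r = list(b)                 # residual b - A x for x = 0
    rt = list(r)                # shadow residual
    p, pt = list(r), list(rt)   # search directions
    rho = dot(rt, r)            # reduction
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:   # convergence check (also a reduction)
            break
        q = matvec(A, p)
        qt = matvec(At, pt)
        sigma = dot(pt, q)           # reduction that must wait for q
        if sigma == 0.0 or rho == 0.0:
            break                    # breakdown; not handled in this sketch
        alpha = rho / sigma
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rt = [ri - alpha * qi for ri, qi in zip(rt, qt)]
        rho_new = dot(rt, r)         # reduction that must wait for the updates
        beta = rho_new / rho
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x


# Illustrative nonsymmetric system whose exact solution is [1, 2, 3].
A = [[4.0, 1.0, 0.0],
     [2.0, 5.0, 1.0],
     [0.0, 1.0, 3.0]]
b = [6.0, 15.0, 11.0]
solution = bicg(A, b)
```

In this standard form, `sigma` cannot be computed before `q` is available and `rho_new` cannot be computed before the residual updates, so each iteration contains separate synchronization points. The variant described in the abstract is derived so that these operations become independent.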
General-purpose microprocessors have long been considered a computing platform unsuited to image processing and vision tasks. The so-called von Neumann paradigm and the associated memory bottleneck have motivated research into various forms of parallel processing and into special processors for vision. The SIMD approach, adopted in massively parallel processors, has been introduced in a minimal form in the multimedia extensions to the instruction set architectures of standard microprocessors. This paper examines the characteristics of SIMD processing that have been mapped into these extensions.
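To make the idea of SIMD processing inside a standard register concrete, here is a small software model of a packed operation, assuming a 64-bit word treated as eight unsigned 8-bit pixel lanes with saturating addition (the behaviour of instructions such as MMX `PADDUSB`). The helper names are hypothetical and the loop only emulates what the hardware does in a single instruction.

```python
LANES = 8          # eight 8-bit lanes in one 64-bit word (assumption)
LANE_MASK = 0xFF


def pack_u8(values):
    """Pack eight unsigned bytes into one 64-bit word, lane 0 in the low byte."""
    word = 0
    for lane, value in enumerate(values):
        word |= (value & LANE_MASK) << (8 * lane)
    return word


def unpack_u8(word):
    """Split a 64-bit word back into its eight unsigned byte lanes."""
    return [(word >> (8 * lane)) & LANE_MASK for lane in range(LANES)]


def packed_add_saturate_u8(x, y):
    """Lane-wise unsigned addition that clamps at 255 instead of wrapping."""
    result = 0
    for lane in range(LANES):
        shift = 8 * lane
        a = (x >> shift) & LANE_MASK
        b = (y >> shift) & LANE_MASK
        result |= min(a + b, LANE_MASK) << shift
    return result


# Brightening eight pixels at once: bright pixels clamp to 255.
pixels = pack_u8([200, 100, 0, 255, 1, 2, 3, 4])
offset = pack_u8([100, 100, 0, 1, 1, 2, 3, 4])
brightened = unpack_u8(packed_add_saturate_u8(pixels, offset))
```

Saturation matters for image data because a wrapped result would turn a nearly white pixel dark, which is why such extensions provide it alongside ordinary modular arithmetic.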
This paper presents a novel loop transformation, Loop Regularization (LR), that increases the execution efficiency of image and video processing programs running on instruction-level parallel (ILP) processors. LR is specifically devised for ILP processors that do not include hardware mechanisms for instruction reordering and register renaming, such as today's low-cost processors for embedded systems and digital signal processors. This paper shows the effects of LR and reports on a set of system-level experiments that validate the technique.
This paper proposes a concept of vision-application-adaptable architecture called FreeTIV (Free architecture dedicated to image processing and vision), based on an adaptable message-passing router called RouTIV (Router dedicated to image processing and vision). This router adapts the interconnection of the available computing resources in order to reduce the implementation and execution costs of the running application's data movements. The adaptable router concept makes it possible to obtain a fast and reliable application-dedicated parallel machine at low cost.
This paper presents an implementation of a topological segmentation on the Associative Mesh, a SIMD massively parallel computer based on reconfigurability and asynchronism. This architecture provides powerful computational primitives that can apply an associative operator over the connected sets of a graph, so the basic primitives combine communication and computation. These primitives can be easily and efficiently realized in hardware by means of asynchronous operations and are suited to a large number of image analysis primitives. We try to show how well the Associative Mesh computing model matches the different data movements generated by the various approaches to image analysis. We are interested here in a new approach: image topology. We indicate how to obtain a homotopic kernel and a leveling kernel with parallel algorithms. Such kernels may be seen as 'ultimate' topological simplifications of an image. This kind of image is similar to a very good split because it is based on the topological information of the image. We show one example of a merge: we implement a segmentation method that does not require defining and tuning parameters.
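The central primitive mentioned above, applying an associative operator over each connected set of a graph, can be modelled sequentially as follows. This is only a sketch of the semantics, assuming an undirected graph given as an edge list. The function name `associate` is hypothetical, and the union-find bookkeeping stands in for what the Associative Mesh performs asynchronously in hardware.

```python
def associate(values, edges, op):
    """Give every node the reduction of `op` over its connected component.

    values: one value per node
    edges:  iterable of (u, v) pairs of node indices (undirected)
    op:     associative and commutative binary operator, e.g. max or addition
    """
    parent = list(range(len(values)))

    def find(i):
        # Follow parent links to the component representative, halving paths.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u

    # Reduce the values of each component into its representative.
    reduced = {}
    for node, value in enumerate(values):
        root = find(node)
        reduced[root] = value if root not in reduced else op(reduced[root], value)

    # Broadcast the component result back to every member.
    return [reduced[find(node)] for node in range(len(values))]


# Two components: {0, 1, 2} and {3, 4}.
grey_levels = [3, 7, 1, 9, 4]
links = [(0, 1), (1, 2), (3, 4)]
component_max = associate(grey_levels, links, max)
component_sum = associate(grey_levels, links, lambda a, b: a + b)
```

With `max` as the operator, every pixel of a region receives the region's brightest grey level; with addition, it receives the region's total, which is the kind of combined communication and computation the abstract describes.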
With the development of network technology, distributed parallel computing based on multiple processors is becoming a new kind of efficient parallel computing method. We introduce a parallel neurocomputing environment, HCPC (heterogeneous computer parallel computing), which is realized on a network of UNIX workstations and microcomputers. The capabilities of HCPC are tested through the realization of an ART1 neural network model.
Recently, bit-parallel architectures for hardware implementation of arithmetic in GF(2^m) have become of practical concern. We present a new inner product multiplication algorithm, developed as an alternative in a polynomial basis for the field GF(2^m) generated by an irreducible all-one polynomial (AOP). The algorithm allows a low-complexity bit-parallel architecture for computing AB multiplication to be constructed more efficiently. The designed multiplier has a latency of only m+2 clock cycles, and its basic cell comprises one 2-input AND gate, one 2-input XOR gate, and four latches. Meanwhile, the designed multiplication tree, based on the characteristics of a binary tree, uses the ideal AB multiplier to compute exponentiation in GF(2^m). The latency of exponentiation is only (m+2)[log_2 m]+1 clock cycles. The cycle time (clock period) of the presented architectures is one gate delay. For computing exponentiation in GF(2^m), the designed exponentiator turns out to be more efficient, as it leads to a simpler architecture and accelerates computation.
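The appeal of an all-one polynomial can be seen in a short software model. Because (x + 1)(x^m + ... + x + 1) = x^(m+1) + 1 over GF(2), x^(m+1) reduces to 1, so a product can be formed as m+1 inner products of one operand with cyclic shifts of the other, followed by a single correction step. The sketch below illustrates that identity only; it is not the authors' cell-level architecture, the function name is hypothetical, and it assumes m is chosen so that the AOP is irreducible (for example m = 2, 4, 10 or 12).

```python
def aop_multiply(a, b, m):
    """Multiply two elements of GF(2^m) defined by the degree-m all-one polynomial.

    a, b: lists of m bits, coefficient of x^0 first.
    Assumes the AOP of degree m is irreducible over GF(2).
    """
    n = m + 1
    a_ext = list(a) + [0]   # extend to m+1 coefficients (x^m term is zero)
    b_ext = list(b) + [0]

    # Cyclic convolution modulo x^(m+1) + 1: each output bit is an inner
    # product of a_ext with a cyclically shifted copy of b_ext.
    c = [0] * n
    for k in range(n):
        acc = 0
        for i in range(n):
            acc ^= a_ext[i] & b_ext[(k - i) % n]
        c[k] = acc

    # Reduce the x^m term using x^m = x^(m-1) + ... + x + 1.
    top = c[m]
    return [c[i] ^ top for i in range(m)]


# GF(4) with x^2 + x + 1: x * x = x + 1 and x * (x + 1) = 1.
square_of_x = aop_multiply([0, 1], [0, 1], 2)
x_times_x_plus_1 = aop_multiply([0, 1], [1, 1], 2)

# GF(16) with x^4 + x^3 + x^2 + x + 1: x^4 = 1 + x + x^2 + x^3, and x^4 * x = 1.
x5 = aop_multiply([1, 1, 1, 1], [0, 1, 0, 0], 4)
```

Each output bit uses only AND and XOR operations, which is consistent with the gate types named in the abstract, although the mapping to latched cells and the m+2 cycle latency are specific to the paper's design.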
A new approach for computing the DFT of arbitrary length is proposed, based on the arithmetic Fourier transform (AFT). The algorithm needs only O(N) multiplications and has a simple computational structure, so it can easily be performed in parallel and is very suitable for VLSI design. The algorithm is faster than the classical FFT when the length of the DFT contains relatively large factors. It is especially efficient for computing the DFT of prime length, where the FFT does not work. The algorithm is competitive with the FFT in terms of accuracy. A method to enhance the accuracy of the algorithm is also proposed for cases where higher accuracy is required.
This paper describes a high-level environment dedicated to implementing biorthogonal wavelet transforms on the Xilinx XC 4000 series. The system hides the low-level hardware details of the FPGA structure and thus allows the user to concentrate on experimentation rather than on the low-level architecture. The implementation of the biorthogonal 9/7 wavelet shows the effectiveness of the approach.
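For readers unfamiliar with the biorthogonal 9/7 wavelet, a software reference model of one decomposition level is sketched below. It uses the lifting factorization with the commonly quoted coefficients, which is an assumption made here for brevity: the abstract does not say how the filter bank is realized on the FPGA. Function names, the mirrored boundary handling, the scaling convention, and the restriction to even-length signals are all illustrative choices.

```python
# Commonly quoted lifting coefficients for the 9/7 wavelet (rounded).
ALPHA = -1.586134342
BETA = -0.05298011854
GAMMA = 0.8829110762
DELTA = 0.4435068522
K = 1.149604398   # scaling constant; normalization conventions vary


def _predict(s, d, c):
    """d[i] += c * (s[i] + s[i+1]), mirroring at the right edge."""
    for i in range(len(d)):
        right = s[i + 1] if i + 1 < len(s) else s[i]
        d[i] += c * (s[i] + right)


def _update(s, d, c):
    """s[i] += c * (d[i-1] + d[i]), mirroring at the left edge."""
    for i in range(len(s)):
        left = d[i - 1] if i > 0 else d[0]
        s[i] += c * (left + d[i])


def forward_97(x):
    """One level of the 9/7 transform; returns (approximation, detail)."""
    if len(x) < 2 or len(x) % 2:
        raise ValueError("this sketch expects an even-length signal")
    s, d = list(x[0::2]), list(x[1::2])
    _predict(s, d, ALPHA)
    _update(s, d, BETA)
    _predict(s, d, GAMMA)
    _update(s, d, DELTA)
    return [v * K for v in s], [v / K for v in d]


def inverse_97(s, d):
    """Undo forward_97 by running the lifting steps backwards."""
    s, d = [v / K for v in s], [v * K for v in d]
    _update(s, d, -DELTA)
    _predict(s, d, -GAMMA)
    _update(s, d, -BETA)
    _predict(s, d, -ALPHA)
    x = [0.0] * (2 * len(s))
    x[0::2], x[1::2] = s, d
    return x


signal = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0]
approx, detail = forward_97(signal)
restored = inverse_97(approx, detail)
```

Because every lifting step is undone exactly by the same step with the opposite sign, the round trip reconstructs the input up to floating-point rounding regardless of the boundary rule, which makes such a model convenient for checking a hardware implementation.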
Using an on-board multiple beam antenna (MBA) to obtain the direction of arrival (DOA) is a new method for communication satellite interference location. After reviewing the interference location techniques for communication satellites, we present a new, highly accurate location method based on the radial basis function (RBF) neural network. The neural network method is simple and computationally efficient owing to its superior capacity for parallel processing. Simulation results show the effectiveness of the proposed method.