In this work, we deal with the problem of minimizing the load redistribution cost in parallel implementations for cluster architectures. Because network latency is so important in this kind of system, the redistribution cost depends primarily on the maximum number of messages sent or received by a processor. Load redistribution is an NP-hard problem similar to the multiple knapsack problem. Three heuristics are proposed to solve the problem in a global context, and a comparison is made to emphasize their characteristics. In a parallel application, it is important to decide whether or not it is efficient to carry out the workload redistribution. This decision is made by comparing the cost of the load imbalance with the communication overhead associated with the load balancing heuristic. From these costs, a theoretical imbalance threshold above which redistribution is profitable is defined. Experimental results show the accuracy of our proposals.
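The decision rule described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the three heuristics, so the greedy pairing below is a stand-in, and the cost model (imbalance cost proportional to the excess load, communication cost proportional to the maximum number of messages per processor) and all function and parameter names are assumptions made for illustration.

```python
def plan_redistribution(loads, eps=1e-9):
    """Greedy stand-in heuristic: pair the largest surplus with the largest deficit.

    Returns a list of (source, destination, amount) transfers.
    """
    target = sum(loads) / len(loads)
    senders = sorted(([l - target, p] for p, l in enumerate(loads) if l - target > eps),
                     reverse=True)
    receivers = sorted(([target - l, p] for p, l in enumerate(loads) if target - l > eps),
                       reverse=True)
    transfers = []
    i = j = 0
    while i < len(senders) and j < len(receivers):
        amount = min(senders[i][0], receivers[j][0])
        transfers.append((senders[i][1], receivers[j][1], amount))
        senders[i][0] -= amount
        receivers[j][0] -= amount
        if senders[i][0] <= eps:
            i += 1
        if receivers[j][0] <= eps:
            j += 1
    return transfers


def max_messages(transfers, n_procs):
    """Maximum number of messages sent or received by any single processor."""
    count = [0] * n_procs
    for src, dst, _ in transfers:
        count[src] += 1
        count[dst] += 1
    return max(count, default=0)


def should_redistribute(loads, time_per_unit, latency):
    """Assumed cost model: rebalance only if the imbalance costs more than the messages."""
    target = sum(loads) / len(loads)
    imbalance_cost = (max(loads) - target) * time_per_unit
    comm_cost = max_messages(plan_redistribution(loads), len(loads)) * latency
    return imbalance_cost > comm_cost
```

With loads `[10, 2, 6, 6]`, one transfer of 4 units from processor 0 to processor 1 balances the system, so the decision depends only on how the per-message latency compares with the time lost to the 4-unit excess.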
A new parallel normalized explicit preconditioned conjugate gradient method, used in conjunction with normalized approximate inverse matrix techniques, is presented for the efficient solution of sparse linear systems on multi-computer systems. Application of the proposed method to a three-dimensional boundary value problem is discussed and numerical results are given. The implementation and performance on a distributed-memory MIMD machine using the Message Passing Interface (MPI) are also investigated.
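For orientation, here is a minimal serial sketch of an explicitly preconditioned conjugate gradient iteration. It is not the normalized method of the abstract: the approximate inverse is replaced by the simplest possible choice, the inverse of the diagonal of A, and the dense list-of-lists matrix, the function names, and the tolerance are assumptions made for illustration.

```python
def matvec(A, x):
    """Dense matrix-vector product (a real solver would use a sparse format)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=1000):
    """Conjugate gradient for symmetric positive definite A.

    m_inv_diag holds the diagonal of an explicit approximate inverse M ~ A^-1,
    so applying the preconditioner is a multiplication, not a solve.
    """
    x = [0.0] * len(b)
    r = list(b)                                   # residual b - A x for x = 0
    z = [m * ri for m, ri in zip(m_inv_diag, r)]  # preconditioned residual
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m * ri for m, ri in zip(m_inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x
```

Because the preconditioner is applied as a product with an explicitly stored matrix, that step parallelizes in the same way as the matrix-vector product, which is the usual motivation for explicit approximate inverses on message-passing machines.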
The results of a full three-dimensional, ballistic quantum transport model for a quantum wire silicon MOSFET are presented. We use the recursive scattering matrix approach to simulate ballistic transport through the device (Gilbert and Ferry). An efficient, three-dimensional, self-consistent quantum simulation technique (Gilbert and Ferry) is used together with an adaptable non-uniform mesh to optimize the discretization of the solution space. A key issue in quantum simulations is this discretization: the grid must be chosen so that the corresponding energies stay within the artificially created bandstructure, even when a large bias is applied across the device. Should the energies exceed the numerical bandstructure, errors will appear in the output. In addition to keeping the solutions physical, the grid must be optimized to reduce the number of grid points so that the computational time, particularly at high bias (~0.5 V), stays at acceptable levels. These constraints call for a non-uniform mesh with finer grid spacing in the high-potential regions. We apply this methodology to the simulation of a quantum wire SOI MOSFET with a narrow channel (8 nm).
We compute the weight enumerators of various quadratic residue (QR) codes over F_2 and F_3, together with those of certain codes from related families such as the duadic codes. We use a parallel algorithm to find the number of codewords of a given (not too high) weight, from which we deduce, by the usual classical methods for self-dual and isodual codes over F_2 and F_3, their previously unknown weight enumerators. We compute weight enumerators for lengths as high as 152 for binary codes (except for n=138, for which the number of codewords of weight 34 is missing) and 84 for ternary codes.
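To make the notion of a weight enumerator concrete, the following sketch counts codewords by Hamming weight through brute-force enumeration. This is only feasible for tiny codes and is not the parallel algorithm of the abstract. The function name and the generator matrix `G7` (the cyclic code generated by 1 + x + x^3, i.e. the length-7 binary QR code, equivalent to the [7,4] Hamming code) are choices made here for illustration.

```python
from itertools import product


def weight_distribution(generator_rows, q=2):
    """Return counts[w] = number of codewords of Hamming weight w.

    Enumerates all q**k linear combinations of the generator rows over F_q
    (q prime), so the running time grows exponentially with the dimension k.
    """
    n = len(generator_rows[0])
    counts = [0] * (n + 1)
    for coeffs in product(range(q), repeat=len(generator_rows)):
        word = [sum(c * row[j] for c, row in zip(coeffs, generator_rows)) % q
                for j in range(n)]
        counts[sum(1 for symbol in word if symbol)] += 1
    return counts


# Cyclic shifts of the generator polynomial 1 + x + x^3 (coefficients 1101).
G7 = [
    [1, 1, 0, 1, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1, 0, 1],
]
```

For `G7` the result is `[1, 0, 0, 7, 7, 0, 0, 1]`, i.e. the weight enumerator 1 + 7x^3 + 7x^4 + x^7. At the lengths treated in the abstract, exhaustive enumeration is out of reach, which is why only low-weight counts are computed directly and the rest is deduced from the structure of self-dual and isodual codes.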
It is well known that the 4- or 8-neighborhood property plays an important role in many algorithms, for example in image processing and in solving partial differential equations (PDEs). In this paper, we establish these properties on an OTIS-Mesh, an optoelectronic parallel computer. We show that these properties can be established in constant time with the help of a new indexing scheme called the processor data index (PDI), proposed in this paper.
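As a reminder of what the two neighborhood properties mean, the sketch below lists the 4- and 8-neighbors of a cell on an ordinary two-dimensional grid. It does not model the OTIS-Mesh or the PDI scheme, whose details are not given in the abstract; the function name and signature are assumptions made for illustration.

```python
def neighbors(row, col, n_rows, n_cols, connectivity=4):
    """Return the in-bounds 4- or 8-neighbors of cell (row, col) on a 2D grid."""
    if connectivity == 4:
        offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    elif connectivity == 8:
        offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                   if (dr, dc) != (0, 0)]
    else:
        raise ValueError("connectivity must be 4 or 8")
    return [(row + dr, col + dc) for dr, dc in offsets
            if 0 <= row + dr < n_rows and 0 <= col + dc < n_cols]
```

On a parallel machine the point is that each processor can reach the data of all such neighbors in a constant number of communication steps, which is what the indexing scheme in the abstract is designed to guarantee.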
In this work, a method for the dynamic analysis of large operational Petri nets is described. A net is decomposed and its blocks are distributed across a computer network. Each block is simulated independently, and the simulation results are combined and interpreted by the master computer.
Summary form only given. Biological structures are extremely complex at the cellular level. The MCell project has been highly successful in simulating the microphysiology of systems of modest size, but many larger problems require too much storage and computation time to be simulated on a single workstation. MCell-K, a new parallel variant of MCell, has been implemented using the KeLP framework and is running on NPACI's Blue Horizon. MCell-K not only produces validated results consistent with the serial version of MCell but does so with unprecedented scalability. We have thus found a level of description and a way to simulate cellular systems that can approach the complexity of nature on its own terms. At the heart of MCell is a 3D random walk that models diffusion using a Monte Carlo method. We discuss two challenging issues that arose in parallelizing the diffusion process: detecting time-step termination efficiently and performing parallel diffusion of particles in a biophysically accurate way. We explore the scalability limits of the present parallel algorithm and discuss ways to improve upon these limits.
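The 3D random walk mentioned in this abstract can be sketched in a few lines. This is a generic Monte Carlo diffusion model, not MCell's implementation: the Gaussian step with per-axis standard deviation sqrt(2 D dt), the free-space setting without membranes or reactions, and all names are assumptions made for illustration.

```python
import random


def simulate_diffusion(n_particles, n_steps, diffusion_coeff, dt, seed=0):
    """Move particles from the origin by independent Gaussian steps on each axis."""
    rng = random.Random(seed)
    sigma = (2.0 * diffusion_coeff * dt) ** 0.5  # per-axis step standard deviation
    positions = [[0.0, 0.0, 0.0] for _ in range(n_particles)]
    for _ in range(n_steps):
        for pos in positions:
            for axis in range(3):
                pos[axis] += rng.gauss(0.0, sigma)
    return positions


def mean_squared_displacement(positions):
    """Average squared distance from the origin over all particles."""
    return sum(x * x + y * y + z * z for x, y, z in positions) / len(positions)
```

For free diffusion the mean squared displacement after time t should approach 6 D t, which gives a simple sanity check on the sampler. In a parallel version, particles that cross a subdomain boundary within a time step must be handed to another processor, which is where the termination-detection problem discussed in the abstract arises.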
A new algorithm for modular multiplication for public key cryptography is presented. The algorithm is optimised with respect to area and time by use of a combination of adders and fast lookup tables. This leads to a multiplication method that can significantly speed up exponentiation, because the values of the lookup table do not depend on the operands of the individual multiplication. The speedup is achieved by continuous modification of one operand.
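The idea of a lookup table whose entries do not depend on the operands can be illustrated with a classic interleaved (shift-and-add) modular multiplication. This is not the algorithm of the abstract, whose details are not given here; the table of small multiples of the modulus, the bit-serial loop, and all names are assumptions made for illustration.

```python
def build_table(modulus):
    """Multiples of the modulus used for reduction; depends on the modulus only."""
    return [0, modulus, 2 * modulus]


def mod_mul(a, b, modulus, table=None):
    """Compute (a * b) % modulus for 0 <= a, b < modulus, one bit of a at a time."""
    if table is None:
        table = build_table(modulus)
    result = 0
    for bit in bin(a)[2:]:                     # most significant bit first
        t = 2 * result + (b if bit == "1" else 0)
        # Invariant: result < modulus and b < modulus, hence t < 3 * modulus,
        # so subtracting 0, 1, or 2 times the modulus always suffices.
        k = 2 if t >= table[2] else (1 if t >= table[1] else 0)
        result = t - table[k]
    return result


def mod_exp(base, exponent, modulus):
    """Square-and-multiply exponentiation reusing one table for every multiplication."""
    table = build_table(modulus)
    result = 1 % modulus
    base %= modulus
    for bit in bin(exponent)[2:]:
        result = mod_mul(result, result, modulus, table)
        if bit == "1":
            result = mod_mul(result, base, modulus, table)
    return result
```

Because the table is built once per modulus, its cost is amortized over all the multiplications of an exponentiation, which is the kind of saving the abstract attributes to operand-independent table values.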
Summary form only given. We present a simulation of an acyclic n×n DR-Mesh on an n×n LR-Mesh. The simulation is efficient with regard to size, since both models use the same number of processors. The worst-case execution time of this simulation is O(n^2), but we demonstrate that its average execution time is O(log n). The fastest existing simulation takes O(log n) time, but it uses an extremely large number of processors. On the other hand, the most size-efficient existing simulation takes O(log^2 n) time with O(n^4 / log^2 n) processors. Both of the existing simulations are for the unrestricted DR-Mesh. This paper provides an important step toward efficiently simulating the unrestricted DR-Mesh on weaker models such as the R-Mesh and the LR-Mesh.
In this paper, the results of implementing the PBS/spl ***/LMS algorithm are reported. Transversal adaptive filters for digital signal processing have traditionally been implemented on DSP processors because of their ability to perform fast floating-point arithmetic. Motorola implemented an adaptive filter in ASIC technology (DSP56300). However, with their growing die sizes and embedded digital signal processing blocks, FPGA devices have become serious contenders in the signal processing market. In this paper, an adaptive filter is implemented on the 2V1500bg575 (Virtex-II family) and EP1S25F1020C (Stratix family) FPGAs from Xilinx and Altera, respectively. A comparison shows that this implementation achieves a speedup of about 10:1 over the Motorola ASIC.
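For readers unfamiliar with transversal adaptive filters, the sketch below shows the standard LMS weight update in floating point. It is the textbook LMS algorithm, not the variant implemented in the paper, and it says nothing about the fixed-point FPGA design; the function name, the step size `mu`, and the zero-padded delay line are assumptions made for illustration.

```python
def lms_filter(x, d, n_taps, mu):
    """Standard LMS adaptive transversal filter.

    x: input samples, d: desired samples, mu: step size.
    Returns the final tap weights and the error at every sample.
    """
    weights = [0.0] * n_taps
    errors = []
    for n in range(len(x)):
        # Tap-delay line: most recent sample first, zero before the signal starts.
        taps = [x[n - k] if n - k >= 0 else 0.0 for k in range(n_taps)]
        y = sum(w * u for w, u in zip(weights, taps))   # filter output
        e = d[n] - y                                    # estimation error
        weights = [w + mu * e * u for w, u in zip(weights, taps)]
        errors.append(e)
    return weights, errors
```

A typical use is system identification: feed the same input to an unknown FIR system and to the adaptive filter, and the weights converge toward the unknown impulse response as long as `mu` is small enough for stability. Each sample needs one multiply-accumulate per tap for the output and one for the update, which is the workload that maps naturally onto the embedded multiplier blocks of an FPGA.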