It is well known that the 4- or 8-neighborhood property plays an important role in many algorithms, for example in image processing and in solving partial differential equations (PDEs). In this paper, we establish these properties on an OTIS-Mesh, an optoelectronic parallel computer. We show that they can be established in constant time with the help of a new indexing scheme, called the processor data index (PDI), proposed in this paper.
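For readers unfamiliar with the terminology, the two neighborhood properties can be sketched on an ordinary 2D grid. This is only the generic definition; it does not reproduce the PDI scheme or the OTIS-Mesh mapping, and the function name `neighbors` and its parameters are illustrative assumptions.

```python
def neighbors(row, col, rows, cols, connectivity=4):
    """Return the in-bounds 4- or 8-neighbors of cell (row, col).

    Generic grid definition only (an assumption for illustration),
    not the processor data index scheme from the abstract.
    """
    if connectivity not in (4, 8):
        raise ValueError("connectivity must be 4 or 8")
    # Edge-adjacent cells: up, down, left, right.
    offsets = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    if connectivity == 8:
        # Add the four diagonal cells.
        offsets += [(-1, -1), (-1, 1), (1, -1), (1, 1)]
    return [
        (row + dr, col + dc)
        for dr, dc in offsets
        if 0 <= row + dr < rows and 0 <= col + dc < cols
    ]
```

Interior cells have 4 or 8 neighbors; border and corner cells have fewer because out-of-range positions are dropped.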
Hybrid adders, which combine a sparse carry-lookahead tree with a carry-select output stage, are a well-known implementation form of high-speed adders. In this paper, a hybrid Ling carry-select adder is presented. It is shown how a carry-select output stage can be used to eliminate the entire conversion of all pseudo-carries. The adder is implemented in enhanced multiple output domino logic (EMODL). A technique is presented to avoid false discharge paths, which impair EMODL, in the sum selection multiplexer.
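The carry-select idea itself can be illustrated with a small behavioral model: each block computes its sum for both possible carry-ins, and the incoming carry then acts as the multiplexer select. This sketch is a plain software model under stated assumptions (inputs fit in `width` bits, `width` is a multiple of `block`); it does not model Ling pseudo-carries, the sparse lookahead tree, or EMODL circuitry, and the names are hypothetical.

```python
def carry_select_add(a, b, width=16, block=4):
    """Behavioral model of a carry-select adder.

    Assumes 0 <= a, b < 2**width and width % block == 0.
    Returns (sum modulo 2**width, carry_out).
    """
    mask = (1 << block) - 1
    carry = 0
    result = 0
    for shift in range(0, width, block):
        x = (a >> shift) & mask
        y = (b >> shift) & mask
        # Both candidate sums are formed without waiting for the carry.
        sum_if_carry0 = x + y
        sum_if_carry1 = x + y + 1
        # The incoming carry selects one candidate (the output mux).
        chosen = sum_if_carry1 if carry else sum_if_carry0
        result |= (chosen & mask) << shift
        carry = chosen >> block
    return result, carry
```

In hardware the two candidate sums are computed in parallel, so only the select signal lies on the critical path; the sequential loop here captures the selection logic, not the timing.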
Summary form only given. The design of algorithms exhibiting a high degree of temporal and spatial locality of reference is crucial to attain good performance on current and foreseeable computing systems featuring ever deeper memory hierarchies. Previous work has demonstrated that task parallelism can be efficiently transformed into locality of reference in two-level hierarchies. Recently, we moved a step forward and showed how the more structured type of parallelism exposed by submachine locality can be efficiently turned into temporal locality on arbitrarily deep hierarchies. We complete and extend the above result by encompassing also spatial locality. Specifically, we present a scheme to simulate parallel algorithms designed for the decomposable BSP (a BSP variant which captures submachine locality) on the hierarchical memory model with block transfer. The simulation yields good hierarchy-conscious sequential algorithms from parallel ones, and provides evidence of the strict relation between submachine locality in parallel computation and locality of reference (both temporal and spatial) in the hierarchical memory setting.
Summary form only given. Biological structures are extremely complex at the cellular level. The MCell project has been highly successful in simulating the microphysiology of systems of modest size, but many larger problems require too much storage and computation time to be simulated on a single workstation. MCell-K, a new parallel variant of MCell, has been implemented using the KeLP framework and is running on NPACI's Blue Horizon. MCell-K not only produces validated results consistent with the serial version of MCell but does so with unprecedented scalability. We have thus found a level of description and a way to simulate cellular systems that can approach the complexity of nature on its own terms. At the heart of MCell is a 3D random walk that models diffusion using a Monte Carlo method. We discuss two challenging issues that arose in parallelizing the diffusion process: detecting time-step termination efficiently and performing parallel diffusion of particles in a biophysically accurate way. We explore the scalability limits of the present parallel algorithm and discuss ways to improve upon these limits.
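As a rough illustration of Monte Carlo diffusion by a 3D random walk, the sketch below moves particles on a cubic lattice, one unit step per time step. The lattice, the unit step length, and the function names are simplifying assumptions made here; this is not MCell's or MCell-K's actual algorithm.

```python
import random


def random_walk_3d(num_particles, num_steps, seed=0):
    """Walk particles from the origin on a cubic lattice (illustrative only)."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    moves = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    positions = [(0, 0, 0)] * num_particles
    for _ in range(num_steps):
        # Every particle takes one independent, uniformly chosen unit step.
        positions = [
            tuple(p + d for p, d in zip(pos, rng.choice(moves)))
            for pos in positions
        ]
    return positions


def mean_squared_displacement(positions):
    """Average squared distance from the origin over all particles."""
    return sum(x * x + y * y + z * z for x, y, z in positions) / len(positions)
```

For this lattice walk the expected mean squared displacement grows linearly with the number of steps, which is the signature of diffusive motion.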
ISBN: (print) 0769521355
Given an array of positive and negative values, we consider the problem of K maximum sums. When an overlapping property needs to be observed, previous algorithms for the maximum sum are not directly applicable. We designed an O(K * n) algorithm for the K maximum subsequences problem. This was then modified to solve the K maximum subarrays problem in O(K * n^3) time. Finally, we present a VLSI K maximum subarrays algorithm with O(K * n) steps and a circuit size of O(n^2), which is cost-optimal in parallelisation of the sequential algorithm.
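To make the one-dimensional problem concrete, a brute-force baseline can enumerate every contiguous segment sum with prefix sums and keep the K largest, with overlapping segments allowed. This is a reference sketch only, taking O(n^2 log K) time; it is not the O(K * n) algorithm of the abstract, and the function name is an assumption.

```python
import heapq
from itertools import accumulate


def k_maximum_sums(values, k):
    """Return the k largest sums of contiguous, non-empty segments.

    Brute-force baseline for illustration; segments may overlap.
    """
    # prefix[j] - prefix[i] is the sum of values[i:j].
    prefix = [0] + list(accumulate(values))
    n = len(values)
    segment_sums = (
        prefix[j] - prefix[i]
        for i in range(n)
        for j in range(i + 1, n + 1)
    )
    return heapq.nlargest(k, segment_sums)
```

For the input `[1, -2, 3, 4]` the three largest segment sums come from `[3, 4]`, the whole array, and `[-2, 3, 4]`, which overlap each other.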
Summary form only given. We present a simulation of an acyclic n × n DR-Mesh on an n × n LR-Mesh. The simulation is efficient with regard to size, since both models use the same number of processors. The worst-case execution time for this simulation is O(n^2), but we demonstrate that its average execution time is O(log n). The existing fastest simulation takes O(log n) time, but it uses an extremely large number of processors. On the other hand, the most efficient simulation in terms of size takes O(log^2 n) time with O(n^4 / log^2 n) processors. Both of the existing simulations are for the unrestricted DR-Mesh. This paper provides an important step toward efficiently simulating the unrestricted DR-Mesh on weaker models such as the R-Mesh and the LR-Mesh.
In this paper the result of implementing the PBS/spl ***/LMS algorithm is reported. Transversal adaptive filters for digital signal processing have traditionally been implemented on DSP processors because of their ability to perform fast floating-point arithmetic operations. Motorola implemented an adaptive filter in ASIC technology (DSP56300). However, with their growing die size and the incorporation of embedded digital signal processing blocks, FPGA devices have become a serious contender in the signal processing market. In this paper an adaptive filter is implemented on the 2V1500bg575 (Virtex-II family) and the EP1S25F1020C (Stratix family) FPGAs from Xilinx and Altera, respectively. A comparison shows that this implementation achieves a speed advantage of about 10:1 over the Motorola ASIC.
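For context, a transversal adaptive filter with the standard LMS update can be sketched in a few lines. This is the textbook LMS rule, not the variant named in the abstract, whose details are not given here; the function names, the step size `mu`, and the two-tap test system are all assumptions chosen for illustration.

```python
import random


def lms_filter(inputs, desired, num_taps, mu):
    """Textbook LMS transversal filter; returns (weights, errors)."""
    weights = [0.0] * num_taps
    window = [0.0] * num_taps  # most recent input sample first
    errors = []
    for x, d in zip(inputs, desired):
        window = [x] + window[:-1]
        y = sum(w * s for w, s in zip(weights, window))  # filter output
        e = d - y                                        # estimation error
        # Move each weight along the negative gradient of the squared error.
        weights = [w + mu * e * s for w, s in zip(weights, window)]
        errors.append(e)
    return weights, errors


def identify_demo(num_samples=2000, mu=0.1, seed=0):
    """Identify a hypothetical noise-free 2-tap system d[n] = 0.5 x[n] - 0.3 x[n-1]."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(num_samples)]
    ds = [0.5 * xs[n] - 0.3 * (xs[n - 1] if n > 0 else 0.0)
          for n in range(num_samples)]
    weights, _ = lms_filter(xs, ds, num_taps=2, mu=mu)
    return weights
```

Each iteration needs one multiply-accumulate per tap for the output and one more per tap for the update, which is why hardware with many parallel multipliers suits this kind of filter.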
In this work, a method for the dynamic analysis of large operational Petri nets is described. A net is decomposed and its blocks are distributed within a computer network. Each block is simulated independently, and the results of the simulation are joined and interpreted by the master computer.
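The basic step that any such simulation repeats is the firing rule of a place/transition net. The sketch below shows that rule on a single machine; it does not cover the decomposition into blocks or the distribution over a network described in the abstract, and the dictionary-based representation and function names are assumptions.

```python
def is_enabled(marking, pre):
    """A transition is enabled if every input place holds enough tokens."""
    return all(marking.get(place, 0) >= need for place, need in pre.items())


def fire(marking, pre, post):
    """Return the marking reached by firing a transition.

    marking: dict place -> token count
    pre:     dict place -> tokens consumed by the transition
    post:    dict place -> tokens produced by the transition
    """
    if not is_enabled(marking, pre):
        raise ValueError("transition is not enabled")
    new_marking = dict(marking)  # leave the caller's marking untouched
    for place, need in pre.items():
        new_marking[place] = new_marking.get(place, 0) - need
    for place, gain in post.items():
        new_marking[place] = new_marking.get(place, 0) + gain
    return new_marking
```

A simulator repeatedly picks an enabled transition and calls `fire`; in a decomposed net, each block would run such a loop over its own places and transitions.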
The results of a full three-dimensional, ballistic quantum transport model for a quantum wire silicon MOSFET are presented. We use the recursive scattering matrix approach for simulation of the ballistic transport through the device (Gilbert and Ferry). An efficient, three-dimensional, self-consistent quantum simulation technique (Gilbert and Ferry) was utilized with the inclusion of an adaptable non-uniform mesh to optimize the discretization of the solution space. One of the key issues surrounding the use of quantum simulations is the discretization of the solution space, as proper grid selection must keep the corresponding energies within the artificially created bandstructure, even when a large bias is applied across the device. Should the energies exceed the numerical bandstructure, errors will result in the output. However, in addition to keeping the solutions physical, the grid must be optimized to reduce the number of grid points in order to hold the computational time, particularly at high bias (~0.5 V), to acceptable levels. These constraints stipulate the use of a non-uniform mesh with finer grid spacing in the high-potential regions. We apply this methodology to the simulation of a quantum wire SOI MOSFET with a narrow channel (8 nm).
Summary form only given. Recent years have seen a huge development of low-cost, large-scale parallel systems. The design of efficient parallel algorithms has to be reconsidered to take into account new parameters of such execution platforms, which are characterized by a larger number of heterogeneous processors, often organized as hierarchical subsystems. Alternative computational models have been designed to take these new characteristics into account. The parallel tasks model (PT for short) is a promising alternative for scheduling parallel applications. Another way of looking at the problem (which is in some sense a dual view) is the divisible load model (DL), where an application is considered as a collection of a large number of elementary, sequential computing units. These two new views of the problem allow us to consider communications implicitly or to mask them, leading to more tractable problems. This paper first presents some approximation algorithms for the PT model, with a special emphasis on new execution platforms. We then show how to mix these results with the DL model to manage the resources of an actual computational grid of 600 processors.