ISBN: 0769521320 (print)
Within the trend of object-based distributed computing, we present the design and implementation of a numerical simulation of electromagnetic wave propagation. A sequential Java design and implementation is presented first. A distributed and parallel version is then derived from it using an active object pattern, and benchmarks are presented for this non-embarrassingly parallel application. A first contribution of this paper is the sequential object-oriented design, which proved to be very modular and extensible: the classes and abstractions are designed to allow both element- and volume-type methods and are valid on structured, unstructured, or hybrid meshes. Compared to a Fortran version, the performance of this highly modular version proved to be in the same range. It is also shown how smoothly the sequential version can be distributed while keeping the same structuring and object abstractions, which makes it possible to handle larger data sizes. Finally, benchmarks on up to 64 processors compare the performance of the sequential and parallel versions and put it in perspective with a comparable Fortran version.
ISBN: 9781479903566 (print)
Many practical applications require solving an optimization problem over large and high-dimensional data sets, which makes these problems hard to solve and prohibitively time consuming. In this paper, we propose a parallel distributed algorithm that uses an adaptive regularizer (PDAR) to solve a joint optimization problem with separable constraints. The regularizer is adaptive and depends on the step size between iterations and the iteration number. We show theoretical convergence of our algorithm to an optimal solution, and use a multi-agent three-bin resource allocation example to illustrate the effectiveness of the proposed algorithm. Numerical simulations show that our algorithm converges to the same optimal solution as other distributed methods, with significantly reduced computational time.
ISBN: 9781457710056 (print)
In recent years, three-dimensional imaging by means of SAR tomography has become a field of intensive research. In SAR tomography, the vertical reflectivity function for every azimuth-range pixel is usually recovered by processing data collected using a defined repeat-pass acquisition geometry. The most common approach is to generate a synthetic aperture in the elevation direction through imaging from a large number of parallel tracks. This imaging technique is appealing because it is very simple. However, it has the drawback that large temporal baselines, as is the case for space-borne platforms, can severely affect the reconstruction. In an attempt to reduce the number of parallel tracks, we propose a new tomographic focusing approach that trades the number of SAR images for correlations between neighboring azimuth-range pixels and polarimetric channels. This can be done under the framework of Distributed Compressed Sensing (DCS), which stems from Compressed Sensing (CS) theory and thus also exploits sparsity in our tomographic signal. In addition, we address the problem of measurements affected by additive as well as multiplicative speckle noise. Results demonstrating the potential of the DCS methodology will be validated using fully polarimetric L-band data acquired by the E-SAR sensor of DLR.
With the continuous expansion of data centers, their carbon emissions have become a serious issue. A number of studies are committed to reducing the carbon emissions of data centers. Carbon trading is a promising emission re...
ISBN: 0769526616 (print)
This paper presents a novel reconfigurable data flow processing architecture that promises high performance by explicitly targeting both fine- and coarse-grained parallelism. This architecture is based on multiple FPGAs organized in a scalable direct network that is substantially more interconnect-efficient than currently used crossbar technology. In addition, we discuss several ancillary issues and propose solutions required to support this architecture and achieve maximal performance for general-purpose applications; these include supporting IP, mapping techniques, and routing policies that enable greater flexibility for architectural evolution and code portability.
ISBN: 0818684046 (print)
We describe an efficient parallel algorithm for hidden-surface removal for terrain maps. The algorithm runs in O(log^4 n) steps on the CREW PRAM model with a work bound of O((n+k) polylog(n)), where n and k are the input and output sizes, respectively. In order to achieve the work bound we use a number of techniques, among which our use of persistent data structures is somewhat novel in the context of parallel algorithms. To the best of our knowledge this is the most efficient parallel algorithm for hidden-surface removal for an important class of 3-D scenes.
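To make the problem concrete, the following is a minimal sequential sketch of visibility along a single terrain profile. It is not the CREW PRAM algorithm of the abstract and uses no persistent data structures; the function name `visible_samples`, the viewer position at x = 0, and the sampling of the profile at positive distances are all assumptions made for illustration. A sample is treated as visible when its line-of-sight slope exceeds every slope seen closer to the viewer.

```python
def visible_samples(viewer_height, xs, zs):
    """Return a visibility flag for each terrain sample.

    Assumptions (illustrative only): the viewer sits at x = 0 with height
    viewer_height, xs are strictly positive distances in increasing order,
    and zs are the terrain heights at those distances.
    """
    flags = []
    max_slope = float("-inf")  # steepest line of sight seen so far
    for x, z in zip(xs, zs):
        slope = (z - viewer_height) / x
        is_visible = slope > max_slope
        flags.append(is_visible)
        if is_visible:
            max_slope = slope
    return flags
```

The running maximum over slopes is a prefix-maximum computation, which is one reason terrain visibility is a natural target for parallel treatment.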
ISBN: 9781479941162 (print)
Reconfigurable models have been shown to be very powerful, solving many problems faster than non-reconfigurable models. WECPAR W(M,N,k) is an M x N reconfigurable model that has point-to-point reconfigurable interconnection with k wires between neighboring processors. This paper studies several aspects of WECPAR. We first solve the list ranking problem on WECPAR. Some of the results obtained show that ranking one element in a list of N elements can be solved on a W(N,N,N) WECPAR in O(1) time. Also, on W(N,N,k), ranking a list L(N) of N elements can be done in O((log N) ⌈log_{k+1} N⌉) time. To transfer a large body of algorithms to work on WECPAR and to assess its relative computational power, several simulation algorithms are introduced between WECPAR and well-known models such as PRAM and RMBM. The simulation algorithms show that a PRIORITY CRCW PRAM of N processors and S shared memory locations can be simulated by a W(S,N,k) WECPAR in O(⌈log_{k+1} N⌉ + ⌈log_{k+1} S⌉) time. Also, we show that a PRIORITY CRCW Basic-RMBM(P,B) of P processors and B buses can be simulated by a W(B,P+B,k) WECPAR in O(⌈log_{k+1} (P+B)⌉) time. This has the effect of migrating a large number of algorithms to work directly on WECPAR with the simulation overhead.
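For readers unfamiliar with list ranking, the sketch below illustrates the problem with classical PRAM-style pointer jumping, simulated sequentially. It is not the WECPAR algorithm of the abstract; the names `list_rank` and `ceil_log` and the convention that the tail node points to itself are assumptions made for illustration. The helper `ceil_log` also evaluates terms of the form ⌈log_{k+1} N⌉ that appear in the stated bounds.

```python
def ceil_log(n, base):
    """Smallest integer c with base**c >= n, computed without floating point."""
    count, power = 0, 1
    while power < n:
        power *= base
        count += 1
    return count


def list_rank(succ):
    """Rank every node of a linked list by its distance to the tail.

    succ[i] is the index of the node after i; the tail points to itself
    (an illustrative convention). Each round doubles the distance every
    pointer spans, so ceil(log2 n) rounds are enough.
    """
    n = len(succ)
    nxt = list(succ)
    rank = [0 if nxt[i] == i else 1 for i in range(n)]
    for _ in range(ceil_log(n, 2)):
        # On a PRAM, all nodes would perform this update simultaneously.
        rank = [rank[i] + rank[nxt[i]] for i in range(n)]
        nxt = [nxt[nxt[i]] for i in range(n)]
    return rank
```

For example, `ceil_log(1024, 2)` gives the value of ⌈log_{k+1} N⌉ for N = 1024 and k = 1, while `ceil_log(1024, 1025)` gives it for k = N, which shows how more wires between neighbors shrink that factor.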
The advent of multi-core processors has made parallel computing techniques mandatory on mainstream systems. With the recent rise of hardware accelerators, hybrid parallelism adds yet another dimension of complexity t...
We describe an experimental time utility for synchronizing the operating system clocks on the SP1 and SP2 parallel system nodes. It synchronizes the node clocks typically within 5 microseconds of each other utilizing the synchronous feature of the SP1 and SP2 interconnection network. This is 2 to 3 orders of magnitude better than what can be achieved by previous methods. Synchronized clocks are useful for parallel program performance measurement and tuning, parallel program tracing and debugging, and gang scheduling of parallel processes, to name a few. We also measure the performance of a widely used time synchronization utility using the SP1 and SP2 interconnection network.