ISBN: (print) 0769507166
In this paper we present an approach to determining scheduling functions suitable for the design of processor arrays. The scheduling functions considered support a subsequent LSGP partitioning of the processor array: the tasks of those processors of the full-size array that are mapped onto one processor of the partitioned array may be executed in an arbitrary order. Several constraints are derived to ensure the causality of computations and to prevent access conflicts to both modules and registers. We propose an optimization problem that generates the scheduling functions and outline its implementation as an integer linear program. The proposed methods are also applicable to the mapping of algorithms onto parallel architectures. In this case, the scheduling function produces identical, independent small threads which can be combined to utilize the target architecture as fully as possible.
ISBN: (print) 3540679561
The emergence of multimedia technology in recent years has been strongly driven by its enormous commercial potential. For the scientific community this development is interesting because a number of attractive disciplines in computer science and engineering flow together into the multimedia mainstream: image processing, computer graphics, data compression, encoding, cryptography, and broadband communication, to mention just a few. These fields have always been driving forces behind the design of massively parallel architectures and algorithms as well as special-purpose processors and storage systems.
ISBN: (print) 3540679561
This paper presents the design of highly optimized TTA architectures for image processing applications. An automatic processor design framework as described in [2] is used. Specialized hardware is used to improve the performance-cost ratio of the processors. An explorer searches the design space for solutions that are good in terms of cost and performance. We show that architectures can be found that efficiently execute very different algorithms at low cost. A hardware-feasible architecture is presented that efficiently executes a set of image processing algorithms and performs almost as well as, or better than, alternative commercially available solutions.
Authors: Danielson, KT; Adley, MD
Northwestern Univ, Mech Engn & Army High Performance Comp Res Ctr, Evanston, IL 60208 USA
Waterways Expt Stn, Engineer Res & Dev Ctr, Vicksburg, MS 39180 USA
A meshless modeling procedure of three-dimensional targets for penetration analysis on parallel computing systems is described. Buried structures are modeled by arbitrary layers of concrete and geologic materials, and the projectile is modeled by standard finite elements. Penetration resistance of the buried structure is provided by functions derived from principles of dynamic cavity expansion. The resistance functions are influenced by the target material properties and projectile kinematics. Additional capabilities accommodate the varying structural and geometrical characteristics of the target. Coupling between the finite elements and the meshless target model is made by applying resistance loads to elements on the outer surface of the projectile mesh. Penetration experiments verify the approach. In this manner, the target is effectively modeled and the strategy is well suited for parallel processing. The procedure is incorporated into an explicit transient dynamics code, using mesh partitioning for a coarse-grain parallel processing paradigm. The Message Passing Interface (MPI) is used for all interprocessor communication. Large, detailed finite element analyses of projectiles are performed on up to several hundred processors with excellent scalability. The efficiency of the strategy is demonstrated by analyses executed on several types of scalable computing platforms.
Today's applications are both object-oriented and based on a new type of three-tiered client-server architecture with clients, processing servers, and data servers as cornerstones. Recognising these trends, industry and researchers have been defining standards and technologies for communication between the components of Distributed Information Systems and for providing compatible mechanisms to access databases, but a key problem with these complex architectures is still their performance. This paper presents a tool for predicting the performance of systems based on CORBA and DCOM as distributed-object architectures, and OLE-DB and PL/SQL as data-access architectures. The tool is an extension of SMART, a workbench that exploits analytical and simulation performance models to predict the performance of database applications. (C) 2000 Elsevier Science B.V. All rights reserved.
ISBN: (print) 0780365429
This paper presents a new implementation of a 2D wavelet transform in a VLSI circuit for real-time digital signal processing. The parallel algorithm of the 2D wavelet transform (2D-WT) used for designing and implementing this new architecture enhances the performance of computations. The proposed multi-elementary-processor architecture of the 2D-WT yields a very flexible hardware configuration. This approach computes the wavelet coefficients at a high processing speed relative to other methods. The 2D-WT is a powerful tool for several applications, the most important one being image processing.
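As a point of reference for the transform itself (not for the VLSI architecture described above), one level of a separable 2D wavelet transform can be sketched as follows. The choice of the Haar wavelet and the names `haar_1d` and `haar_2d` are assumptions made for illustration; the abstract does not state which wavelet is used. The row pass and the column pass each consist of independent 1D transforms, which is the property a parallel implementation can exploit.

```python
from math import sqrt


def haar_1d(signal):
    """One level of the orthonormal Haar transform of an even-length list.

    Returns the approximation coefficients followed by the detail coefficients.
    """
    if len(signal) == 0 or len(signal) % 2 != 0:
        raise ValueError("signal length must be even and non-zero")
    scale = 1.0 / sqrt(2.0)
    pairs = range(0, len(signal), 2)
    approx = [(signal[i] + signal[i + 1]) * scale for i in pairs]
    detail = [(signal[i] - signal[i + 1]) * scale for i in pairs]
    return approx + detail


def haar_2d(image):
    """One level of a separable 2D Haar transform (rows first, then columns).

    `image` is a rectangular list of rows. Every row, and afterwards every
    column, is transformed independently of the others.
    """
    rows = [haar_1d(list(row)) for row in image]        # row pass
    cols = [haar_1d(list(col)) for col in zip(*rows)]   # column pass
    return [list(row) for row in zip(*cols)]            # transpose back


if __name__ == "__main__":
    for row in haar_2d([[1.0] * 4 for _ in range(4)]):
        print(row)
```

For a constant image all of the energy ends up in the low-low quadrant (the top-left block), and the three detail quadrants are zero.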
Simplifying the programming models is paramount to the success of reconfigurable computing. We apply the principles of object-oriented programming to the design of stream architectures for reconfigurable computing. The...
This paper examines implementations of a multi-layer perceptron (MLP) on bus-based shared memory (SM) and on distributed memory (DM) multiprocessor systems. The goal has been to optimize the HW and SW architectures in order to obtain the fastest possible response. Parallel MLP algorithms for up to 8 processing nodes, with both DM and SM, were prototyped using the CSP-based TRANSIM tool. The results of prototyping MLPs of different sizes on various numbers of processing nodes demonstrate the feasible speedups, efficiencies and response times for a given CPU speed, link speed or bus bandwidth.
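The speedup and efficiency figures mentioned above are conventionally defined as S = T1 / Tp and E = S / p, where T1 is the single-node run time, Tp the run time on p nodes. A minimal sketch of that arithmetic follows; the function name and the timings in the example are hypothetical and are not results from the paper.

```python
def speedup_and_efficiency(t_serial, t_parallel, nodes):
    """Return (speedup, efficiency) using the standard definitions.

    speedup    = t_serial / t_parallel
    efficiency = speedup / nodes
    """
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("run times must be positive")
    if nodes < 1:
        raise ValueError("nodes must be at least 1")
    speedup = t_serial / t_parallel
    return speedup, speedup / nodes


if __name__ == "__main__":
    # Hypothetical timings: 8.0 s on one node, 2.0 s on eight nodes.
    print(speedup_and_efficiency(8.0, 2.0, 8))
```

An efficiency well below 1 on a bus-based system would typically point to communication or bus contention overhead rather than to the computation itself.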
We consider the design and performance of nonlinear minimum mean-square-error multiuser detectors for direct-sequence code-division multiple-access (CDMA) networks. With multiple users transmitting asynchronously at high data rates over multipath fading channels, the detectors contend with both multiple-access interference (MAI) and intersymbol interference (ISI). The cyclostationarity of the MAI and ISI is exploited through a feedforward filter (FFF), which processes the samples at the output of parallel chip-matched filters, and a feedback filter (FBF), which processes detected symbols. By altering the connectivity of the FFF and FBF, we define four architectures based on fully connected (FC) and nonconnected (NC) filters. Increased connectivity of the FFF gives each user access to more samples of the received signal, while increased connectivity of the FBF gives each user access to previous decisions of other users. We consider three methods for specifying the FFF sampling and propose a nonuniform FFF sampling scheme based on multipath rag tracking that can offer improved performance relative to uniform FFF sampling. For the FC architecture, we capitalize on the sharing of filter contents among users by deriving a multiuser recursive least squares (RLS) algorithm and a direct matrix inversion approach, which determine the coefficients more efficiently than single-user algorithms. We estimate the uncoded bit-error rate (BER) of the feedforward/feedback detectors for CDMA systems with varying levels of power control and timing control for multipath channels with quasi-static Rayleigh fading. The FC-FFF/FC-FBF architecture is shown to offer significant improvement over the NC architectures by sustaining eight users in an asynchronous CDMA system with a processing gain of 8, 2-Mb/s quadrature phase-shift keying (QPSK) transmissions, a delay spread of 1.25 μs, an average signal-to-noise ratio of 15 dB, with uncoded BERs less than 10^-4 and 10^-3 with 97% and 99.
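For readers unfamiliar with the RLS recursion that the multiuser algorithm above builds on, the textbook single-user, real-valued form can be sketched as follows. This is not the multiuser algorithm derived in the paper; the function name `rls_fit`, the forgetting factor, the initialisation constant `delta` and the synthetic data are all assumptions chosen for illustration.

```python
from math import sin, cos


def rls_fit(inputs, desired, forgetting=0.99, delta=100.0):
    """Standard exponentially weighted RLS for a real-valued linear filter.

    inputs  : list of input vectors x(n), each a list of equal length
    desired : list of desired scalar outputs d(n)
    Returns the final coefficient vector w.
    """
    n = len(inputs[0])
    w = [0.0] * n
    # P approximates the inverse of the weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for x, d in zip(inputs, desired):
        Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
        denom = forgetting + sum(x[i] * Px[i] for i in range(n))
        gain = [v / denom for v in Px]                       # gain vector k(n)
        err = d - sum(w[i] * x[i] for i in range(n))         # a priori error
        w = [w[i] + gain[i] * err for i in range(n)]
        # P <- (P - k x^T P) / lambda; x^T P equals Px^T because P is symmetric.
        P = [[(P[i][j] - gain[i] * Px[j]) / forgetting for j in range(n)]
             for i in range(n)]
    return w


if __name__ == "__main__":
    # Synthetic, noise-free data generated by hypothetical coefficients [2, -1].
    xs = [[sin(0.3 * t), cos(0.7 * t)] for t in range(200)]
    ds = [2.0 * x[0] - 1.0 * x[1] for x in xs]
    print(rls_fit(xs, ds))
```

Each update costs a number of operations quadratic in the filter length, which is why sharing filter contents across users, as the abstract describes for the FC architecture, can reduce the total work.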
The evolution of current and future broadband access techniques into the wireless domain introduces new and flexible network architectures with difficult and interesting challenges. System designers are faced with ...