ISBN: 3540679561 (print)
This paper presents the design of highly optimized TTA architectures for image processing applications, using the automatic processor design framework described in [2]. Specialized hardware is used to improve the performance-cost ratio of the processors. An explorer searches the design space for solutions that are good in terms of cost and performance. We show that architectures can be found that efficiently execute very different algorithms at low cost. A hardware-feasible architecture is presented that efficiently executes a set of image processing algorithms and performs nearly as well as, or better than, alternative commercially available solutions.
Authors: Danielson, KT; Adley, MD
Northwestern Univ, Mech Engn & Army High Performance Comp Res Ctr, Evanston, IL 60208 USA
Waterways Expt Stn, Engineer Res & Dev Ctr, Vicksburg, MS 39180 USA
A meshless modeling procedure of three-dimensional targets for penetration analysis on parallel computing systems is described. Buried structures are modeled by arbitrary layers of concrete and geologic materials, and the projectile is modeled by standard finite elements. Penetration resistance of the buried structure is provided by functions derived from principles of dynamic cavity expansion. The resistance functions are influenced by the target material properties and projectile kinematics. Additional capabilities accommodate the varying structural and geometrical characteristics of the target. Coupling between the finite elements and the meshless target model is made by applying resistance loads to elements on the outer surface of the projectile mesh. Penetration experiments verify the approach. In this manner, the target is effectively modeled and the strategy is well suited for parallel processing. The procedure is incorporated into an explicit transient dynamics code, using mesh partitioning for a coarse-grain parallel processing paradigm. Message Passing Interface (MPI) is used for all interprocessor communication. Large, detailed finite element analyses of projectiles are performed on up to several hundred processors with excellent scalability. The efficiency of the strategy is demonstrated by analyses executed on several types of scalable computing platforms.
Today's applications are both object-oriented and based on a new type of three-tiered client-server architecture with clients, processing servers, and data servers as cornerstones. Recognising these trends, industry and researchers have been defining standards and technologies for communication between the components of distributed information systems and for providing compatible mechanisms to access databases, but a key problem with these complex architectures is still their performance. This paper presents a tool for predicting the performance of systems based on CORBA and DCOM as distributed-object architectures, and OLE-DB and PL/SQL as data-access architectures. The tool is an extension of SMART, a workbench that exploits analytical and simulation performance models to predict the performance of database applications. (C) 2000 Elsevier Science B.V. All rights reserved.
ISBN: 0780365429 (print)
This paper presents a new implementation of a 2D wavelet transform in a VLSI circuit for real-time digital signal processing. The parallel algorithm of the 2D wavelet transform (2D-WT) used to design and implement this new architecture enhances computational performance. The proposed multi-elementary-processor architecture of the 2D-WT yields a very flexible hardware configuration. This approach computes the wavelet coefficients at a high processing speed relative to other methods. The 2D-WT is a powerful tool for several applications, the most important being image processing.
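As a rough illustration of what a 2D wavelet transform computes, the sketch below applies one level of a separable Haar transform in software, first along rows and then along columns. The Haar filter, the averaging normalization, and the function names `haar_1d` and `haar_2d` are assumptions made for this example; the abstract does not specify the filter bank, and this is not the paper's VLSI architecture.

```python
def haar_1d(signal):
    """One level of an averaging Haar transform (assumed filter, even length).

    Returns the pairwise averages followed by the pairwise half-differences.
    """
    half = len(signal) // 2
    approx = [(signal[2 * i] + signal[2 * i + 1]) / 2 for i in range(half)]
    detail = [(signal[2 * i] - signal[2 * i + 1]) / 2 for i in range(half)]
    return approx + detail


def haar_2d(image):
    """One level of a separable 2D transform: rows first, then columns."""
    # Each row is transformed independently of the others.
    rows = [haar_1d(list(row)) for row in image]
    # zip(*rows) yields the columns; each column is also independent.
    cols = [haar_1d(list(col)) for col in zip(*rows)]
    # Transpose back so the result has the same layout as the input.
    return [list(row) for row in zip(*cols)]


if __name__ == "__main__":
    for row in haar_2d([[1, 2], [3, 4]]):
        print(row)
```

Because every row pass, and then every column pass, is independent of the others, a transform of this kind lends itself to being distributed over several processing elements.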
Simplifying the programming models is paramount to the success of reconfigurable computing. We apply the principles of object-oriented programming to the design of stream architectures for reconfigurable computing. the...
This paper examines implementations of a multi-layer perceptron (MLP) on bus-based shared memory (SM) and on distributed memory (DM) multiprocessor systems. The goal has been to optimize HW and SW architectures in order to obtain the fastest possible response. Parallel MLP algorithms for up to 8 processing nodes, with both DM and SM, were prototyped using the CSP-based TRANSIM tool. The results of prototyping MLPs of different sizes on various numbers of processing nodes demonstrate the feasible speedups, efficiencies, and response times for a given CPU speed, link speed, or bus bandwidth.
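For readers unfamiliar with the metrics named above, here is a minimal sketch using the conventional definitions: speedup is serial time divided by parallel time, and efficiency is speedup divided by the number of nodes. The function names and the timings in the usage lines are hypothetical and are not results from the paper.

```python
def speedup(t_serial, t_parallel):
    """Conventional speedup: single-node time over multi-node time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, nodes):
    """Conventional parallel efficiency: speedup per processing node."""
    return speedup(t_serial, t_parallel) / nodes


if __name__ == "__main__":
    # Hypothetical timings in seconds, chosen only to illustrate the formulas.
    print(speedup(8.0, 2.0))
    print(efficiency(8.0, 2.0, nodes=8))
```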
We consider the design and performance of nonlinear minimum mean-square-error multiuser detectors for direct-sequence code-division multiple-access (CDMA) networks. With multiple users transmitting asynchronously at high data rates over multipath fading channels, the detectors contend with both multiple-access interference (MAI) and intersymbol interference (ISI). The cyclostationarity of the MAI and ISI is exploited through a feedforward filter (FFF), which processes the samples at the output of parallel chip-matched filters, and a feedback filter (FBF), which processes detected symbols. By altering the connectivity of the FFF and FBF, we define four architectures based on fully connected (FC) and nonconnected (NC) filters. Increased connectivity of the FFF gives each user access to more samples of the received signal, while increased connectivity of the FBF gives each user access to previous decisions of other users. We consider three methods for specifying the FFF sampling and propose a nonuniform FFF sampling scheme based on multipath ray tracking that can offer improved performance relative to uniform FFF sampling. For the FC architecture, we capitalize on the sharing of filter contents among users by deriving a multiuser recursive least squares (RLS) algorithm and a direct matrix inversion approach, which determine the coefficients more efficiently than single-user algorithms. We estimate the uncoded bit-error rate (BER) of the feedforward/feedback detectors for CDMA systems with varying levels of power control and timing control for multipath channels with quasi-static Rayleigh fading. The FC-FFF/FC-FBF architecture is shown to offer significant improvement over the NC architectures by sustaining eight users in an asynchronous CDMA system with a processing gain of 8, 2-Mb/s quadrature phase-shift keying (QPSK) transmissions, a delay spread of 1.25 μs, an average signal-to-noise ratio of 15 dB, with uncoded BERs less than 10^-4 and 10^-3 with 97% and 99.
The evolution of current and future broadband access techniques into the wireless domain introduces new and flexible network architectures with difficult and interesting challenges. The system designers are faced with ...
ISBN: 3540414290 (print)
The proceedings contain 51 papers. The special focus in this conference is on System Software and Algorithms. The topics include: Charon message-passing toolkit for scientific computations; dynamic slicing of concurrent programs; an efficient run-time scheme for exploiting parallelism on multiprocessor systems; characterization and enhancement of static mapping heuristics for heterogeneous systems; optimal segmented scan and simulation of reconfigurable architectures on fixed connection networks; reducing false causality in causal message ordering; the working-set based adaptive protocol for software distributed shared memory; evaluation of the optimal causal message ordering algorithm; register efficient mergesorting; applying patterns to improve the performance of fault tolerant CORBA; design, implementation and performance evaluation of a high performance CORBA group membership protocol; analyzing the behavior of event dispatching systems through simulation; a domain-specific semi-automatic parallelization tool; practical experiences with Java compilation; performance prediction and analysis of parallel out-of-core matrix factorization; integration of task and data parallelism; parallel and distributed computational fluid dynamics; parallel congruent regions on a mesh-connected computer; can scatter communication take advantage of multidestination message passing?; a first class design constraint for future architectures; embedded computing; instruction level distributed processing; speculative multithreaded processors; a fast tree-based barrier synchronization on switch-based irregular networks; meta-data management system for high-performance large-scale scientific data access; and parallel sorting algorithms with sampling techniques on clusters with processors running at different speeds.
ISBN: 9783540410683 (print)
The proceedings contain 34 papers. The special focus in this conference is on RTL Power Modeling, Power Estimation, System-Level Design, Transistor-Level Modeling and Asynchronous Circuit Design. The topics include: architectural design space exploration achieved through innovative RTL power estimation techniques; power models for semi-autonomous RTL macros; RTL estimation of steering logic power; reducing power consumption through dynamic frequency scaling for a class of digital receivers; framework for high-level power estimation of signal processing architectures; adaptive bus encoding technique for switching activity reduced data transfer over wide system buses; accurate power estimation of logic structures based on timed Boolean functions; a holistic approach to system level energy optimization; design-space exploration of low power coarse grained reconfigurable datapath array architectures; internal power dissipation modeling and minimization for submicronic CMOS design; degradation delay model extension to CMOS gates; second generation delay model for submicron CMOS process; semi-modular latch chains for asynchronous circuit design; comparative study on self-checking carry-propagate adders in terms of area, power and performance; VLSI implementation of a low-power high-speed self-timed adder; low power design techniques for contactless chipcards; dynamic memory design for low data-retention power; data-reuse and parallel embedded architectures for low-power, real-time multimedia applications; and modeling of power consumption of adiabatic gates versus fan-in and comparison with conventional gates.