This paper examines implementations of a multi-layer perceptron (MLP) on bus-based shared memory (SM) and distributed memory (DM) multiprocessor systems. The goal has been to optimize the HW and SW architectures in order to obtain the fastest possible response. Parallel MLP algorithms for up to 8 processing nodes with both DM and SM memory were prototyped using the CSP-based TRANSIM tool. The results of prototyping MLPs of different sizes on various numbers of processing nodes demonstrate the feasible speedups, efficiency and response times for a given CPU speed, link speed or bus bandwidth.
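The speedup and efficiency figures mentioned above are conventionally defined as S(p) = T(1)/T(p) and E(p) = S(p)/p, where T(p) is the response time on p processing nodes. The minimal sketch below applies those standard definitions; the response times in it are hypothetical placeholders, not results from the paper.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Speedup S(p) = T(1) / T(p)."""
    return t_serial / t_parallel


def efficiency(t_serial: float, t_parallel: float, p: int) -> float:
    """Efficiency E(p) = S(p) / p; 1.0 would mean perfectly linear scaling."""
    return speedup(t_serial, t_parallel) / p


# Hypothetical MLP response times in ms for 1..8 nodes (NOT data from the paper).
times = {1: 80.0, 2: 42.0, 4: 23.0, 8: 14.0}
for p, t in times.items():
    print(f"p={p}: S={speedup(times[1], t):.2f}, E={efficiency(times[1], t, p):.2f}")
```

Efficiency typically drops as p grows because communication over links or a shared bus does not shrink with the per-node compute load.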
ISBN: (print) 9783540675532
The proceedings contain 86 papers. The special focus in this conference is on workshops and several associated events. The topics include: a problem solving environment based on commodity software; a virtual programming environment for high performance parallel computing; grid computing on the web using the Globus Toolkit; algorithms for generic tools in parallel numerical simulation; dynamic grid adaption for computational magnetohydrodynamics; parallelization of irregular problems based on hierarchical domain representation; dynamic iterative method for fast network partitioning; a family of parallel incomplete Cholesky preconditioners; a parallel block preconditioner accelerated by coarse grid correction; towards an execution system for distributed business processes in a virtual enterprise; towards a multi-layer architecture for scientific virtual laboratories; modelling control systems in an event-driven coordination language; ruling agent motion in structured environments; dynamic reconfiguration in coordination languages; adding flexibility in a cooperative workflow execution engine; a web-based distributed programming environment; interoperability support in distributed on-line monitoring systems; run-time optimization using dynamic performance prediction; performance portability for skeletal programming; a novel distributed algorithm for high-throughput and scalable gossiping; run-time support to register allocation for loop parallelization of image processing programs; a Java-based parallel programming support environment; task farm computations in Java; simulating job scheduling for clusters of workstations; effective communication support for parallel applications running on clusters of commodity workstation; distributed parallel query processing on networks of workstations; and explicit schemes applied to aeroacoustic simulations.
ISBN: (print) 0780365429
This paper presents a new implementation of a 2D wavelet transform in a VLSI circuit for real-time digital signal processing. The parallel algorithm of the 2D wavelet transform (2D-WT) used to design and implement this new architecture enhances computational performance. The proposed multi-elementary-processor architecture for the 2D-WT yields a very flexible hardware configuration. Relative to other methods, this approach offers a high processing speed for computing the wavelet coefficients. The 2D-WT is a powerful tool for several applications, the most important being image processing.
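A 2D-WT is commonly computed separably: a 1D transform over every row, then over every column of the row results, which is what makes it amenable to parallel processing elements. The abstract does not state which wavelet filters the circuit uses, so this minimal sketch assumes the simple averaging Haar filter pair and computes one decomposition level in plain Python.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def haar_1d(x: List[float]) -> Tuple[List[float], List[float]]:
    """One level of the averaging Haar transform on an even-length sequence."""
    approx = [(x[i] + x[i + 1]) / 2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / 2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_2d(img: Matrix) -> Tuple[Matrix, Matrix, Matrix, Matrix]:
    """Separable one-level 2D Haar transform of an image with even dimensions.

    Returns (ll, lh, hl, hh): first letter = row filter, second = column filter.
    """
    # Row pass: rows are independent, so each could go to a separate processor.
    lo_rows, hi_rows = [], []
    for row in img:
        a, d = haar_1d(row)
        lo_rows.append(a)
        hi_rows.append(d)

    def column_pass(m: Matrix) -> Tuple[Matrix, Matrix]:
        cols = [list(c) for c in zip(*m)]  # transpose so columns become lists
        lo_cols, hi_cols = zip(*(haar_1d(c) for c in cols))
        # Transpose back so both outputs are row-major again.
        return [list(r) for r in zip(*lo_cols)], [list(r) for r in zip(*hi_cols)]

    ll, lh = column_pass(lo_rows)
    hl, hh = column_pass(hi_rows)
    return ll, lh, hl, hh


ll, lh, hl, hh = haar_2d([[1.0, 3.0], [5.0, 7.0]])
print(ll, lh, hl, hh)  # [[4.0]] [[-2.0]] [[-1.0]] [[0.0]]
```

Both the row pass and the column pass consist of independent 1D transforms, which is the kind of parallelism a multi-processor hardware layout can exploit.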
We consider the design and performance of nonlinear minimum mean-square-error multiuser detectors for direct-sequence code-division multiple-access (CDMA) networks. With multiple users transmitting asynchronously at high data rates over multipath fading channels, the detectors contend with both multiple-access interference (MAI) and intersymbol interference (ISI). The cyclostationarity of the MAI and ISI is exploited through a feedforward filter (FFF), which processes the samples at the output of parallel chip-matched filters, and a feedback filter (FBF), which processes detected symbols. By altering the connectivity of the FFF and FBF, we define four architectures based on fully connected (FC) and nonconnected (NC) filters. Increased connectivity of the FFF gives each user access to more samples of the received signal, while increased connectivity of the FBF gives each user access to previous decisions of other users. We consider three methods for specifying the FFF sampling and propose a nonuniform FFF sampling scheme based on multipath rag tracking that can offer improved performance relative to uniform FFF sampling. For the FC architecture, we capitalize on the sharing of filter contents among users by deriving a multiuser recursive least squares (RLS) algorithm and a direct matrix inversion approach, which determine the coefficients more efficiently than single-user algorithms. We estimate the uncoded bit-error rate (BER) of the feedforward/feedback detectors for CDMA systems with varying levels of power control and timing control for multipath channels with quasi-static Rayleigh fading. The FC-FFF/FC-FBF architecture is shown to offer significant improvement over the NC architectures by sustaining eight users in an asynchronous CDMA system with a processing gain of 8, 2-Mb/s quadrature phase-shift keying (QPSK) transmissions, a delay spread of 1.25 μs and an average signal-to-noise ratio of 15 dB, with uncoded BERs less than 10^-4 and 10^-3 with 97% and 99.
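The coefficient adaptation above relies on recursive least squares. The paper derives a multiuser RLS that shares filter contents among users, and that derivation is not reproduced here. This minimal sketch shows only the textbook single-user, exponentially weighted RLS update, assuming real-valued taps (a QPSK detector would use complex ones), a forgetting factor `lam` and an initialization constant `delta`, all of which are illustrative choices.

```python
import random
from typing import List


def rls_weights(inputs: List[List[float]], desired: List[float],
                lam: float = 0.99, delta: float = 100.0) -> List[float]:
    """Textbook exponentially weighted RLS (single user, real-valued taps)."""
    n = len(inputs[0])
    w = [0.0] * n
    # P approximates the inverse of the weighted input correlation matrix.
    P = [[delta if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, d in zip(inputs, desired):
        Pu = [sum(P[i][j] * u[j] for j in range(n)) for i in range(n)]
        denom = lam + sum(u[i] * Pu[i] for i in range(n))
        k = [v / denom for v in Pu]                      # gain vector
        e = d - sum(w[i] * u[i] for i in range(n))       # a priori error
        w = [w[i] + k[i] * e for i in range(n)]
        # P <- (P - k u^T P) / lam; P is symmetric, so u^T P equals Pu^T.
        P = [[(P[i][j] - k[i] * Pu[j]) / lam for j in range(n)] for i in range(n)]
    return w


# Identify a hypothetical 2-tap filter from noiseless data.
rng = random.Random(0)
true_w = [0.5, -0.3]
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(200)]
y = [true_w[0] * a + true_w[1] * b for a, b in X]
print(rls_weights(X, y))  # converges toward true_w
```

Each update costs O(n^2) operations for n taps, which is why sharing filter contents across users, as the FC architecture does, can pay off.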
ISBN: (print) 3540414290
The proceedings contain 51 papers. The special focus in this conference is on system software and algorithms. The topics include: Charon message-passing toolkit for scientific computations; dynamic slicing of concurrent programs; an efficient run-time scheme for exploiting parallelism on multiprocessor systems; characterization and enhancement of static mapping heuristics for heterogeneous systems; optimal segmented scan and simulation of reconfigurable architectures on fixed connection networks; reducing false causality in causal message ordering; the working-set based adaptive protocol for software distributed shared memory; evaluation of the optimal causal message ordering algorithm; register efficient mergesorting; applying patterns to improve the performance of fault tolerant CORBA; design, implementation and performance evaluation of a high performance CORBA group membership protocol; analyzing the behavior of event dispatching systems through simulation; a domain-specific semi-automatic parallelization tool; practical experiences with Java compilation; performance prediction and analysis of parallel out-of-core matrix factorization; integration of task and data parallelism; parallel and distributed computational fluid dynamics; parallel congruent regions on a mesh-connected computer; can scatter communication take advantage of multidestination message passing?; a first class design constraint for future architectures; embedded computing; instruction level distributed processing; speculative multithreaded processors; a fast tree-based barrier synchronization on switch-based irregular networks; meta-data management system for high-performance large-scale scientific data access; and parallel sorting algorithms with sampling techniques on clusters with processors running at different speeds.
A network of workstations (NOW) is an attractive alternative to parallel database systems. Here we present a distributed architecture for parallel query processing on networks of workstations. We describe a comprehensiv...
We use a network of workstations to compute all pairwise alignments of the 500 bp upstream regions of 6,225 yeast ORFs (Open Reading Frames). We correlate the alignments with DNA microarray expression data from buddin...
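Computing all pairwise alignments of n sequences means n(n-1)/2 independent jobs, which is what makes the problem a good fit for a network of workstations. The truncated abstract does not say how the work was divided, so this minimal sketch assumes a simple round-robin static schedule; `pairs_for_worker` is a hypothetical helper, and the alignment routine itself is not shown.

```python
from itertools import combinations
from typing import Iterator, Tuple


def total_pairs(n_seqs: int) -> int:
    """Number of unordered pairs i < j, i.e. n(n-1)/2 alignments."""
    return n_seqs * (n_seqs - 1) // 2


def pairs_for_worker(n_seqs: int, worker: int,
                     n_workers: int) -> Iterator[Tuple[int, int]]:
    """Yield the (i, j) pairs assigned to `worker` under round-robin scheduling."""
    for idx, pair in enumerate(combinations(range(n_seqs), 2)):
        if idx % n_workers == worker:
            yield pair


print(total_pairs(6225))  # 19372200 alignments for the 6,225 upstream regions
```

Because every alignment is independent, no communication is needed until the scores are gathered; a dynamic work queue would balance better if alignment times vary with sequence content.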
ISBN: (print) 3540675531
GeoFEM is solid earth simulator software under development for use on STA's massively parallel computer, the "Earth Simulator (GS40)". GeoFEM has already been used to analyze large-scale static linear problems with up to 100M (100,000,000) degrees of freedom (DOF) and has demonstrated high-performance computing capability through parallel processing. Therefore, in the present study, we apply the same data structure and a similar parallelization method to wave propagation problems. We describe the formulation, parallelization procedure and benchmark results of the wave propagation analysis function of GeoFEM. In a simple benchmark problem, GeoFEM achieved a CPU usage rate of more than 98% when performing parallel wave propagation analysis of a 100M-DOF problem using 1,000 PEs, and also showed almost linear scalability on the Hitachi SR2201.
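Parallel explicit wave propagation typically partitions the mesh among PEs, and each PE exchanges one layer of boundary ("halo") values with its neighbours every time step. GeoFEM works with finite elements on 3D meshes; this minimal sketch swaps in a 1D central-difference scheme for the wave equation u_tt = c^2 u_xx purely to illustrate the partitioning idea. The contiguous block partitioning and the parameter `c2` (the squared Courant number, assumed to be at most 1 for stability) are illustrative assumptions.

```python
from typing import List


def step(u_prev: List[float], u_curr: List[float], c2: float) -> List[float]:
    """One explicit central-difference step with fixed (zero) end values."""
    n = len(u_curr)
    u_next = [0.0] * n
    for i in range(1, n - 1):
        u_next[i] = (2 * u_curr[i] - u_prev[i]
                     + c2 * (u_curr[i + 1] - 2 * u_curr[i] + u_curr[i - 1]))
    return u_next


def step_partitioned(u_prev: List[float], u_curr: List[float], c2: float,
                     n_parts: int) -> List[float]:
    """Same step, computed block by block as separate PEs would do it."""
    n = len(u_curr)
    bounds = [round(k * n / n_parts) for k in range(n_parts + 1)]
    u_next = [0.0] * n
    for k in range(n_parts):
        lo, hi = bounds[k], bounds[k + 1]
        # Local copy plus one halo cell per side, as received from neighbours.
        g_lo, g_hi = max(lo - 1, 0), min(hi + 1, n)
        prev_loc, curr_loc = u_prev[g_lo:g_hi], u_curr[g_lo:g_hi]
        for i in range(max(lo, 1), min(hi, n - 1)):
            j = i - g_lo  # index into the local arrays
            u_next[i] = (2 * curr_loc[j] - prev_loc[j]
                         + c2 * (curr_loc[j + 1] - 2 * curr_loc[j] + curr_loc[j - 1]))
    return u_next


n = 21
u0 = [0.0] * n
u0[9], u0[10], u0[11] = 0.5, 1.0, 0.5  # initial pulse, starting from rest
s_prev, s_curr = u0[:], u0[:]
p_prev, p_curr = u0[:], u0[:]
for _ in range(30):
    s_prev, s_curr = s_curr, step(s_prev, s_curr, 0.5)
    p_prev, p_curr = p_curr, step_partitioned(p_prev, p_curr, 0.5, 4)
print(s_curr == p_curr)  # True: partitioned and serial results match exactly
```

Each PE only needs its neighbours' boundary values per step, so communication grows with the subdomain surface while computation grows with its volume, which is one reason such codes can scale almost linearly.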
ISBN: (print) 9781581131987
The two-layered distributed clustered server architecture, consisting of a control server and a group of storage servers, has been widely used to support multimedia file systems. With this kind of server architecture, it is easy to sustain disk load balancing and scalable disk bandwidth. However, it introduces communication overhead between the control server and the storage servers. Therefore, to reduce this overhead, media data should be transmitted from the storage servers directly to the users without the intervention of the control server. In addition, the file system must support effective data placement and disk scheduling policies, and should be designed for portability and flexibility across hardware and software environments in order to cope with rapidly evolving hardware and software technologies. To fulfill these requirements, we designed and implemented a parallel file system named PMFS (Parallel Multimedia File System) and analyzed its performance by comparing it with PVFS (Parallel Virtual File System). Our experimental simulation shows that PMFS performs better than PVFS in real-time data processing.
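One way a control server can stay off the data path is to hand clients a block-to-server map once, after which clients fetch blocks directly from the storage servers. The abstract does not describe PMFS's actual placement policy, so this minimal sketch assumes round-robin striping; `BlockLocation`, `stripe_map` and `per_server_load` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class BlockLocation:
    server: int       # storage server holding the block
    local_index: int  # position of the block within that server's share


def stripe_map(file_size: int, block_size: int, n_servers: int,
               start: int = 0) -> List[BlockLocation]:
    """Round-robin placement of a media file's blocks over storage servers.

    A control server could compute this once; clients then read blocks
    directly from the storage servers without further control traffic.
    """
    n_blocks = -(-file_size // block_size)  # ceiling division
    return [BlockLocation((start + b) % n_servers, b // n_servers)
            for b in range(n_blocks)]


def per_server_load(locs: List[BlockLocation], n_servers: int) -> Dict[int, int]:
    """Count how many blocks each storage server holds."""
    load = {s: 0 for s in range(n_servers)}
    for loc in locs:
        load[loc.server] += 1
    return load


locs = stripe_map(10 * 1024 + 1, 1024, 4)
print(per_server_load(locs, 4))  # {0: 3, 1: 3, 2: 3, 3: 2}
```

Round-robin striping spreads a sequential media stream evenly across disks, which supports the load balancing and scalable bandwidth the abstract attributes to the clustered architecture.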