The authors propose a new method for finding the sum of two or more multidigit values in a parallel computational model. The method uses carry-save addition to reduce the sum of a large number of multidigit values to the sum of two multidigit values, which can be computed efficiently in a parallel computational model by carry-lookahead addition of groups of words. Algorithms for computing the sum on one processor and on k processors are proposed, and their complexity is analyzed.
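The reduction described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the function names are hypothetical, operands are assumed to be non-negative integers, and Python's built-in addition stands in for the final carry-lookahead step. Each carry-save step replaces three addends with two that have the same total, and the triples within one round are independent of each other, which is what makes the reduction suitable for parallel execution.

```python
def carry_save_step(a, b, c):
    """3:2 compressor: return two numbers whose sum equals a + b + c.

    Assumes non-negative integers. No carry propagates between bit
    positions here, so every bit can be computed independently.
    """
    partial_sum = a ^ b ^ c                       # per-bit sum without carries
    carry = ((a & b) | (a & c) | (b & c)) << 1    # per-bit majority, shifted left
    return partial_sum, carry


def multi_operand_sum(values):
    """Reduce many addends to at most two by carry-save steps, then add them."""
    pending = list(values)
    while len(pending) > 2:
        reduced = []
        full = len(pending) - len(pending) % 3    # addends covered by whole triples
        for i in range(0, full, 3):               # triples are mutually independent
            reduced.extend(carry_save_step(*pending[i:i + 3]))
        reduced.extend(pending[full:])            # carry over the 0-2 leftovers
        pending = reduced
    # Assumption: built-in addition replaces the carry-lookahead adder.
    return sum(pending)


if __name__ == "__main__":
    print(multi_operand_sum([5, 9, 12, 7, 3]))
```

Only the last addition propagates carries; every earlier round uses bitwise operations alone.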
We present a parallel approach for integrating speech and natural language understanding. The method emphasizes a hierarchically structured knowledge base and memory-based parsing techniques. Processing is carried out by passing multiple markers in parallel through the knowledge base. Speech-specific problems such as insertion, deletion, substitution, and word boundary detection have been analyzed, and parallel solutions to them are provided. Results on the SNAP-1 multiprocessor show an 80% sentence recognition rate for the Air Traffic Control (ATC) domain. Furthermore, a speed-up of up to 15-fold is obtained from the parallel platform, which provides response times of a few seconds per sentence for the ATC domain.
Processor networks connected by buses have attracted considerable attention. Because a reconfigurable array is both more powerful and more practical than a PRAM, it has become a focus of attention. The Processor Array with Reconfigurable Bus System (PARBS) and the Reconfigurable Multiple Bus Machine (RMBM) are both models of parallel computation based on a reconfigurable bus and a processor array. The PARBS consists of processors arranged in a 2-dimensional grid with a reconfigurable bus system. The RMBM is also made of processors and a reconfigurable bus system, but its processors are located in a row, and the number of processors and the number of buses are independent of each other. Four versions of the RMBM have been proposed, and the Extended RMBM (E-RMBM) is regarded as the most powerful of them. In this paper, we show that a PARBS of size N × M can be simulated in constant time by an E-RMBM with 4NM processors, M + 3 buses and 1 read buffer, and that an E-RMBM with P processors, B buses and D read buffers can in turn be simulated in constant time by a PARBS of size B × P. A PARBS or RMBM that solves a computational problem of size n is polynomially bounded iff the product of the numbers of processors, buses, and read and write ports is O(n^c) for some constant c. When a PARBS is polynomially bounded, the E-RMBM that simulates it is also polynomially bounded, and vice versa.
The paper presents the homogenization of composite material properties obtained by the method of continuous source functions, which was developed for simulating both elasticity and heat conduction in composite materials reinforced by finite-length, regularly distributed, parallel, overlapping fibres. The interaction (fibre-fibre, fibre-matrix) of physical micro-fields influences the composite behaviour. With the finite element method (FEM), by comparison, the interaction is either simulated with a very fine FE mesh or smoothed out. The presented computational method is a mesh-reducing, boundary-type meshless method. Computational efficiency is increased by using parallel MATLAB in the presented computational models. The stiffness/conductivity is reduced incrementally, starting from superconductive/rigid material properties of the fibres, and the fibre-matrix interface boundary conditions are satisfied by an iterative procedure. The computational examples presented in the paper show the homogenized properties of finite-length fibre composites; the thermal and elastic behaviour of these composites; the similarities and differences in composite behaviour between thermal and elasticity problems; and the control volume element for homogenization of composite materials reinforced by finite-length fibres with a large aspect ratio (length/diameter). The behaviour of the finite-length fibre composite is shown for analogous heat conduction and elasticity problems. Moreover, the paper discusses the possibilities and difficulties of the present numerical models and suggests directions for further development.
The MASC (Multiple ASsociative Computing) model is a multi-SIMD model that uses control parallelism to coordinate the interaction of data-parallel threads and supports associative SIMD computing on each of its threads. A wide range of algorithms has been developed for this model. Research on using the model in real-time system applications and on building a scalable MASC architecture is currently quite active. In this paper, we present simulations between MASC and reconfigurable bus-based models, e.g., various versions of the Reconfigurable Multiple Bus Machine (RMBM). Constant-time simulations of the basic RMBM by MASC and vice versa are obtained. Simulations of the segmenting RMBM, fusing RMBM, and extended RMBM by MASC in non-constant time are also discussed. By taking advantage of previously established relationships between the RMBM and two other popular parallel computational models, namely the Reconfigurable Mesh (RM) and the Parallel Random Access Machine (PRAM), we extend our simulation results to further categorize the power of the MASC model in relation to the RM and the PRAM. © 2009 Elsevier Inc. All rights reserved.
ISBN: 9781450369350 (print)
In this paper we develop optimal algorithms in the binary-forking model for a variety of fundamental problems, including sorting, semisorting, list ranking, tree contraction, range minima, and ordered set union, intersection and difference. In the binary-forking model, tasks can only fork into two child tasks, but can do so recursively and asynchronously. The tasks share memory, supporting reads, writes and test-and-sets. Costs are measured in terms of work (total number of instructions) and span (longest dependence chain). The binary-forking model is meant to capture both algorithm performance and algorithm-design considerations in many existing multithreaded languages, which are also asynchronous and rely on binary forks either explicitly or under the covers. In contrast to the widely studied PRAM model, it assumes neither arbitrary-way forks nor synchronous operations, both of which are hard to implement on modern hardware. While optimal PRAM algorithms are known for the problems studied herein, it turns out that arbitrary-way forking and strict synchronization are powerful, if unrealistic, capabilities. Natural simulations of these PRAM algorithms in the binary-forking model (i.e., implementations in existing parallel languages) incur an Ω(log n) overhead in span. This paper explores techniques for designing optimal algorithms when limited to binary forking and assuming asynchrony. All algorithms described in this paper are the first with optimal work and span in the binary-forking model. Most of the algorithms are simple. Many are randomized.
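To make the work and span measures concrete, here is a small Python sketch that is not taken from the paper. It sums an array by recursive two-way splitting and counts costs under a simplified, assumed accounting: each combining addition costs one unit, and forks and leaf reads are free. The function name `fork_join_sum` is hypothetical, and the two recursive calls run sequentially here; in a real binary-forking runtime they would be the two child tasks.

```python
def fork_join_sum(values, lo=0, hi=None):
    """Sum values[lo:hi] by binary splitting; return (total, work, span).

    Simplified cost model (an assumption for illustration): only the
    combining additions are counted, one unit each.
    """
    if hi is None:
        hi = len(values)
    if hi - lo < 1:
        raise ValueError("fork_join_sum needs at least one element")
    if hi - lo == 1:
        return values[lo], 0, 0           # leaf: nothing to combine
    mid = (lo + hi) // 2
    # These two calls correspond to the two children of one binary fork.
    left, left_work, left_span = fork_join_sum(values, lo, mid)
    right, right_work, right_span = fork_join_sum(values, mid, hi)
    work = left_work + right_work + 1      # work adds up across both children
    span = max(left_span, right_span) + 1  # span follows the longer child only
    return left + right, work, span


if __name__ == "__main__":
    total, work, span = fork_join_sum(list(range(8)))
    print(total, work, span)
```

Under this accounting, n elements give n - 1 units of work and a span of about log2(n), because the recursion tree is balanced and only its depth contributes to the longest dependence chain.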
ISBN: 9781424448814 (print)
The paper presents an experimental parallel metaheuristics framework for solving combinatorial optimization problems arising in grand-challenge scientific and engineering applications. The framework is based on biologically inspired metaheuristics, modeling of social behavior and cultural evolution, and trajectory-based methods. A prototype class library for metaheuristics is developed, and several parallel computational models of metaheuristics for solving combinatorial optimization problems are implemented. The library contains C++ implementations of parallel computational models for both population-based and trajectory-based metaheuristics. Some improvements to the parallel models are suggested and implemented in the library PARMETAOPT. The influence of the parameters on the performance of some of the parallel algorithms is analyzed using the developed framework, and performance tuning rules are suggested. The implementations are based on message passing with MPICH2 for the flat programming models, and the OpenMP API is used for multithreading in the hybrid programming models.
Clusters of symmetric multiprocessor nodes (SMP clusters) are one of the most important parallel architectures at the moment. The architecture consists of shared-memory nodes with multiple processors and a fast interconnection network between the nodes. New programming models try to exploit this architecture by using threads within the nodes and message-passing libraries for inter-node communication. To develop efficient algorithms, it is necessary to consider the hybrid nature of both the architecture and the programming models. We present the κNUMA model and a methodology that together provide a sound basis for designing efficient algorithms for SMP clusters. The κNUMA model is a computational model that extends the bulk-synchronous parallel (BSP) model with the characteristics of SMP clusters and of the new hybrid programming models. The κNUMA methodology suggests developing an efficient overall algorithm by developing efficient algorithms for each level in the hierarchy. We use the problems of personalized one-to-all broadcast and dense matrix-vector multiplication for the presentation. The theoretical analysis of the dense matrix-vector multiplication is verified in practice: we show results of experiments made on a Linux cluster of dual Pentium III nodes.
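For orientation, the cost of one superstep in the plain BSP model, which the κNUMA model is said to extend, can be sketched in Python as follows. This shows only the standard BSP formula; the abstract does not give the additional hierarchy parameters of κNUMA, so none are modeled here. The function name, parameter names and sample numbers are hypothetical.

```python
def bsp_superstep_cost(local_work, words_exchanged, g, l):
    """Cost of one BSP superstep: max local work + h * g + l.

    local_work      -- computation steps per processor in this superstep
    words_exchanged -- per processor, the larger of words sent and received
    g               -- communication cost per word (machine parameter)
    l               -- barrier synchronization cost (machine parameter)
    """
    w = max(local_work)        # the slowest processor determines compute time
    h = max(words_exchanged)   # the superstep realizes an h-relation
    return w + h * g + l


if __name__ == "__main__":
    # Two processors with made-up illustrative numbers.
    print(bsp_superstep_cost([100, 120], [10, 8], g=4, l=50))
```

The cost of a whole BSP algorithm is the sum of this quantity over its supersteps. A hierarchical model for SMP clusters would presumably distinguish intra-node from inter-node parameters, but that is an assumption rather than something stated in the abstract.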