ISBN: 9780889868113 (print)
The proceedings contain 52 papers. The topics discussed include: performance and reliability evaluations on stochastic activity networks; multi-path position-based routing in mobile ad-hoc networks; PARA-SNOR: a multi-thread Snort on multi-core IA platform; on the surface area of the alternating group networks; a programming model for high-performance adaptive applications on pervasive mobile grids; dynamic P2P topology management for scalable H.264 multiple-description coded video streaming; a new protocol to optimize the degree of concurrency in object-oriented databases; highly parallel multi-dimensional fast Fourier transform on fine- and coarse-grained many-core approaches; cache-based bounds checking for multi-threaded C programs; design and evaluation of a user-oriented availability benchmark for distributed file systems; and partitioning strategies: spatiotemporal patterns of program decomposition.
A speech recognition front-end is a digital signal processing device that transforms an audio signal into feature vectors used for automatic speech recognition or for storing semantic audio information. The complete implementation of this device does not fit in the fabric of the FPGA on the development board used. By exploiting the parallelism inherent in the design, redesigning it with a purpose-built library of algorithmic skeletons, and using dynamic partial reconfiguration, it became possible to fit the device into the FPGA.
This paper discusses the monetary network cost of a parallel computing network with a homogeneous single-level tree topology. The monetary network cost, which depends linearly on the amount of divisible workload, is composed of a communication cost and a computing cost. Through mathematical analysis of the monetary network cost under two load distribution strategies (sequential distribution and simultaneous distribution), the relationship between the monetary network cost and the ratio of network speed parameters is studied comprehensively. By introducing a new parameter, cost efficiency, a numerical network model suited to cost-efficient parallel processing is examined. Simulation results yield insights into trends of network performance against network cost. The numerical analysis and simulation work is relevant to high-performance parallel computing networks operating under a limited network cost.
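The sequential-distribution case can be illustrated with a minimal sketch in the style of classic divisible load theory. Everything specific here is an assumption, not taken from the abstract: the root only distributes load and does not compute, the workers are served one after another and all finish at the same instant, `z` and `w` are inverse link and processor speeds, `t_cm` and `t_cp` are unit communication and computing times, and `c_link` and `c_proc` are hypothetical prices per unit of link and processor time. The paper's cost efficiency parameter is not defined in the abstract and is not modelled.

```python
def sequential_fractions(m, z, w, t_cm=1.0, t_cp=1.0):
    """Load fractions for m identical workers served one after another.

    Equal finish times require a[i] * w*t_cp == a[i+1] * (z*t_cm + w*t_cp),
    so consecutive fractions shrink by the constant ratio q below.
    """
    q = (w * t_cp) / (z * t_cm + w * t_cp)
    raw = [q ** i for i in range(m)]
    total = sum(raw)
    return [r / total for r in raw]  # normalised so the fractions sum to 1


def finish_times(fractions, z, w, t_cm=1.0, t_cp=1.0):
    """Finish time of each worker: wait for all earlier sends, then compute."""
    times = []
    sent = 0.0
    for a in fractions:
        sent += a * z * t_cm           # link is busy until this fraction arrives
        times.append(sent + a * w * t_cp)
    return times


def monetary_cost(fractions, z, w, c_link, c_proc, t_cm=1.0, t_cp=1.0):
    """Cost linear in the load: link time and processor time, each priced."""
    return sum(a * (c_link * z * t_cm + c_proc * w * t_cp) for a in fractions)


if __name__ == "__main__":
    alphas = sequential_fractions(m=3, z=0.5, w=1.0)
    print("fractions:", alphas)
    print("finish times:", finish_times(alphas, z=0.5, w=1.0))
    print("cost:", monetary_cost(alphas, z=0.5, w=1.0, c_link=2.0, c_proc=1.0))
```

Under these assumptions the total cost of a unit load does not depend on how the load is split among identical workers, so the trade-off of interest is between the finish time and the prices and speeds of the links and processors.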
This paper considers solving consensus in an n-process distributed system where the processes are pairwise connected by reliable links and up to f processes may be faulty. A ◇(n-1)-bisource is enough for solving consensus. That is, if the links between a correct process and all the other processes are eventually timely, consensus can be solved. This paper shows that a ◇3f-bisource is also enough, so we only require that the links between some correct process and 3f (instead of all) other processes are eventually timely. The significance is that the degree of synchrony needed to solve consensus depends only on the resilience (that is, f) and is independent of the scale n of the system.
Modern science requires close collaboration among scientists who may be scattered all over the world. The continuing spread of the internet and the emergence of grid and cloud techniques in recent years have intensified efforts to develop infrastructures that enhance worldwide collaboration among researchers. The term 'eScience' designates a scientific paradigm whose primary goal is to employ a shared digital infrastructure to improve the collaboration of researchers in the core areas of science. Nevertheless, merely interconnecting hardware resources, programs, and data would be insufficient to achieve this aim. In this paper we propose the concept of Shared Workspaces, which simplifies worldwide interdisciplinary collaboration. A platform based on this concept implements controlled and secure communication, facilitates connection to high-performance computers, and enables the exchange and archiving of programs, data, communications, and (intermediate) results. It also allows connection to legacy projects and repositories.
Biological ants organize themselves into forager groups that converge to shortest paths to and from food sources. This has motivated the development of a large class of biologically inspired agent-based graph search techniques, called Ant Colony Optimization, to solve diverse combinatorial problems. Our approach to parallel graph search uses multiple ant agent populations distributed across processors and clustered computers to solve large-scale graph search problems. We discuss our implementation using the NIST Data Flow System II, and show good scalability of our parallel search algorithm.
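The basic Ant Colony Optimization loop behind such a search can be sketched as follows. This is a generic, single-process illustration and not the authors' parallel implementation; it does not use the NIST Data Flow System II. The function name, the parameters (`alpha`, `beta`, `rho`, `q`), and the example graph are all assumptions chosen for the sketch.

```python
import random


def aco_shortest_path(graph, source, target, n_ants=20, n_iters=30,
                      alpha=1.0, beta=2.0, rho=0.5, q=1.0, seed=0):
    """Search for a short source-to-target path with a simple ant colony.

    graph maps each node to a dict {neighbour: positive edge weight};
    every node, including sinks, must appear as a key.
    """
    rng = random.Random(seed)
    # One pheromone value per directed edge, all equal at the start.
    tau = {(u, v): 1.0 for u in graph for v in graph[u]}
    best_path, best_len = None, float("inf")

    for _ in range(n_iters):
        walks = []
        for _ in range(n_ants):
            path, visited, length = [source], {source}, 0.0
            while path[-1] != target:
                u = path[-1]
                options = [v for v in graph[u] if v not in visited]
                if not options:      # dead end: this ant is discarded
                    path = None
                    break
                # Attractiveness combines pheromone and inverse edge weight.
                weights = [tau[(u, v)] ** alpha * (1.0 / graph[u][v]) ** beta
                           for v in options]
                v = rng.choices(options, weights=weights, k=1)[0]
                length += graph[u][v]
                path.append(v)
                visited.add(v)
            if path is not None:
                walks.append((path, length))
                if length < best_len:
                    best_path, best_len = path, length

        # Evaporate everywhere, then reinforce the edges ants actually used.
        for edge in tau:
            tau[edge] *= 1.0 - rho
        for path, length in walks:
            for u, v in zip(path, path[1:]):
                tau[(u, v)] += q / length

    return best_path, best_len


if __name__ == "__main__":
    g = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "D": 5}, "C": {"D": 1}, "D": {}}
    print(aco_shortest_path(g, "A", "D"))
```

A parallel variant along the lines described above would run several such populations on different processors; how they exchange pheromone or best paths is not specified in the abstract and is left out here.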
A recent study characterizing failures in computer networks shows that transient single-element (node/link) failures are the dominant failures in large communication networks like the Internet. Thus, globally recomputing the routing paths on a failure does not pay off, since the failed element recovers fairly quickly and the recomputed routing paths then need to be discarded. In this paper, we present the first distributed algorithm that computes the alternate paths required by some proactive recovery schemes for handling transient failures. Our algorithm computes paths that avoid a failed node, and provides an alternate path to a particular destination from an upstream neighbor of the failed node. With minor modifications, the algorithm can also compute alternate paths that avoid a failed link. To the best of our knowledge, all previously proposed algorithms for computing alternate paths are centralized and need complete information about the network graph as input.
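To make concrete what an alternate path is, the sketch below computes a shortest path from an upstream neighbor to a destination while skipping a failed node. It is deliberately the centralized baseline that needs the whole graph, not the distributed algorithm the paper presents; the function name and the example topology are assumptions made for illustration.

```python
import heapq


def shortest_path_avoiding(graph, source, target, failed):
    """Dijkstra's algorithm that never enters the node `failed`.

    graph maps each node to a dict {neighbour: non-negative link cost}.
    Returns (path, cost), or None if no path avoids the failed node.
    """
    if source == failed or target == failed:
        return None
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                  # stale queue entry
        if u == target:
            break
        for v, w in graph[u].items():
            if v == failed:
                continue              # treat the failed node as absent
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


if __name__ == "__main__":
    net = {
        "A": {"B": 1, "C": 2},
        "B": {"A": 1, "C": 1, "D": 1},
        "C": {"A": 2, "B": 1, "D": 2},
        "D": {"B": 1, "C": 2},
    }
    # A normally reaches D through B; with B down it must go through C.
    print(shortest_path_avoiding(net, "A", "D", failed="B"))
```

A link failure could be handled in the same spirit by skipping one specific edge rather than one node.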
Advances in semiconductor technology make it possible to implement a large system on a single chip, and physical problems such as IR drop and electromigration have become serious for VLSI circuits. To alleviate these problems, VLSI circuit optimization, which requires a fast and accurate circuit simulator, is necessary. This paper describes a fast and accurate parallel transient simulator for RLC power grid circuits that runs on a GPU (Graphics Processing Unit) using CUDA. The GPU is a processor specialized for graphics processing, and its architecture is quite distinctive. The paper proposes a transient simulation method that takes the features of the GPU architecture into account. Experimental results show that the proposed simulator is 173 times faster than simulation on a CPU, and that the simulation error between the proposed simulator and the CPU-based simulator is under 0.01%.
This work presents a parallel implementation of the implicitly restarted Arnoldi/Lanczos method for the solution of eigenproblems approximated by the finite element method. The implicitly restarted Arnoldi/Lanczos method uses a restart scheme to improve convergence to the desired portion of the spectrum while maintaining the orthogonality of the Krylov basis. The presented implementation is suitable for distributed-memory architectures, especially PC clusters. In the parallel solution, a subdomain-by-subdomain approach was implemented, and both overlapping and non-overlapping mesh partitions were used. Compressed data structures in the CSRC and CSRC/CSR formats were used to store the coefficients of the global matrices. The parallelization of the numerical linear algebra operations that appear in both the Krylov and implicitly restarted methods is discussed. A numerical example is shown to demonstrate the efficiency and applicability of the proposed algorithms.
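The dominant kernel in Krylov methods such as Arnoldi and Lanczos is the sparse matrix-vector product, which is why compressed storage matters. The sketch below uses the plain CSR (compressed sparse row) layout as an assumption; the CSRC variant named in the abstract is not reproduced here, and the helper names are hypothetical.

```python
def dense_to_csr(rows):
    """Convert a dense matrix (list of rows) to CSR arrays.

    Returns (values, col_idx, row_ptr): the non-zeros in row order, their
    column indices, and the offset at which each row starts in `values`.
    """
    values, col_idx, row_ptr = [], [], [0]
    for row in rows:
        for j, x in enumerate(row):
            if x != 0:
                values.append(x)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A @ x touching only the stored non-zeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        s = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            s += values[k] * x[col_idx[k]]
        y.append(s)
    return y


if __name__ == "__main__":
    # Small symmetric tridiagonal matrix, typical of a 1D finite element mesh.
    a = [[2, -1, 0],
         [-1, 2, -1],
         [0, -1, 2]]
    values, col_idx, row_ptr = dense_to_csr(a)
    print(csr_matvec(values, col_idx, row_ptr, [1.0, 2.0, 3.0]))
```

Each row's sum is independent of the others, so in a subdomain-by-subdomain scheme every process can apply this loop to the rows it owns once it has received the entries of `x` that belong to neighbouring subdomains.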
We are developing MegaScript, a task-parallel script language for executing large-scale workflows on widely distributed heterogeneous environments. For efficient execution of this language, we have proposed a multi-layered task scheduling scheme in which the upper layer performs rough global scheduling and the lower layer performs precise local scheduling. However, the cost of local scheduling is still a serious issue. We therefore propose an adaptive scheduling scheme appropriate to this kind of workflow. The scheme adaptively switches between DAG scheduling and independent task scheduling, reducing the scheduling cost for independent task sets in the workflow. Our evaluation shows that the scheme achieved a 540-fold speedup in total scheduling time when each host executes 100 tasks on average, while extending the makespan by less than 7%.