A novel parallel architecture for estimating computationally intensive 4th-order cumulants is presented. Unlike most systolic array implementations, a MIMD array processor is used to efficiently compute the cu...
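The abstract is truncated and does not state which estimator the architecture computes. As a reference point, here is a minimal Python sketch of the standard biased sample estimate of the fourth-order cumulant of a real sequence, c4(τ1, τ2, τ3) = E[x(n)x(n+τ1)x(n+τ2)x(n+τ3)] − R(τ1)R(τ3−τ2) − R(τ2)R(τ3−τ1) − R(τ3)R(τ2−τ1), where R is the autocorrelation. The function name and the restriction to non-negative lags are assumptions for illustration. Each lag triple costs O(N) multiplies, so filling a grid of L values per lag costs O(N·L³), which is why the computation is a natural target for parallel hardware.

```python
def fourth_order_cumulant(x, t1, t2, t3):
    """Biased sample estimate of c4(t1, t2, t3) for a real sequence x.

    Hypothetical helper; lags t1, t2, t3 are assumed non-negative.
    The sample mean is removed first so the zero-mean formula applies.
    """
    n = len(x)
    mean = sum(x) / n
    x = [v - mean for v in x]  # enforce a zero-mean sequence

    def moment4(a, b, c):
        # Fourth-order moment E[x(i) x(i+a) x(i+b) x(i+c)], biased (divide by n)
        m = n - max(a, b, c)
        return sum(x[i] * x[i + a] * x[i + b] * x[i + c] for i in range(m)) / n

    def autocorr(k):
        # Second-order moment R(k) = E[x(i) x(i+k)], symmetric in k
        k = abs(k)
        return sum(x[i] * x[i + k] for i in range(n - k)) / n

    return (moment4(t1, t2, t3)
            - autocorr(t1) * autocorr(t3 - t2)
            - autocorr(t2) * autocorr(t3 - t1)
            - autocorr(t3) * autocorr(t2 - t1))
```

At zero lag the estimate reduces to E[x⁴] − 3(E[x²])², so for the sequence `[1, -1, 1, -1]` it gives 1 − 3 = −2.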
In this work we describe several portable sequential and parallel algorithms for solving the inverse eigenproblem for Real Symmetric Toeplitz matrices. The algorithms are based on Newton’s method (and some variations...
ISBN (print): 3540424954
The proceedings contain 130 papers. The special focus in this conference is on parallel processing. The topics include: software component technology for high performance parallel and grid computing; connecting computational requirements with computing resources; a tool for binding threads to processors; a distributed object infrastructure for interaction and steering; optimal polling for latency-throughput tradeoffs in queue-based network interfaces for clusters; performance prediction of data-dependent task parallel programs; the hardware performance monitor toolkit; VIA communication performance on a Gigabit Ethernet cluster; group-based performance analysis for multithreaded SMP cluster applications; exploiting unused time slots in list scheduling considering communication contention; an evaluation of partitioners for parallel SAMR applications; load balancing on networks with dynamically changing topology; approximation algorithms for scheduling independent malleable tasks; load redundancy elimination on executable code; using a swap instruction to coalesce loads and stores; data-parallel compiler support for multipartitioning; parallel and distributed databases, data mining and knowledge discovery; an experimental performance evaluation of join algorithms for parallel object databases; a classification of skew effects in parallel database systems; experiments in parallel clustering with DBSCAN; analysis of the cycle structure of permutations; scanning biosequence databases on a hybrid parallel architecture; experiences in using MPI-IO on top of GPFS for the IFS weather forecast code; improving conditional branch prediction on speculative multithreading architectures; performances of a dynamic threads scheduler; and self-stabilizing neighborhood unique naming under unfair scheduler.
ISBN (print): 1880843390
A network of workstations is a cost-effective alternative to parallel database systems. This paper studies the performance of hash join operations on a Pentium-based network-of-workstations platform. Specifically, we study the performance of intra-query (pipelined) and intra-operation (partitioned) parallelism in a centralized query processing architecture. We also study the impact of network bandwidth on the performance of the join algorithms. For this purpose, our experimental study uses a 10 Mbps Ethernet and a 100 Mbps Fast Ethernet. Our performance results suggest that pipelined hash joins significantly improve response time for complex queries and for queries that involve large input relations.
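The abstract does not say how the pipelined hash join is built. One widely used pipelined scheme is the symmetric hash join, sketched below in Python as an illustration rather than the paper's code. It keeps a hash table for each input; every arriving tuple is inserted into its own side's table and immediately probed against the other side's table, so result tuples stream out before either input has been fully read, which is what lets a downstream operator start work early. The function name and the round-robin interleaving of the two inputs are assumptions.

```python
from collections import defaultdict
from itertools import zip_longest


def symmetric_hash_join(left, right, key_left, key_right):
    """Pipelined (symmetric) hash join over two iterables of tuples.

    Hypothetical helper: key_left / key_right extract the join key.
    Inputs are consumed alternately; each matching pair is yielded exactly
    once, at the moment the later of its two tuples arrives.
    """
    left_table = defaultdict(list)
    right_table = defaultdict(list)
    end = object()  # sentinel marking an exhausted input
    for l_row, r_row in zip_longest(left, right, fillvalue=end):
        if l_row is not end:
            k = key_left(l_row)
            left_table[k].append(l_row)           # build into own table
            for match in right_table.get(k, ()):  # probe the other table
                yield (l_row, match)
        if r_row is not end:
            k = key_right(r_row)
            right_table[k].append(r_row)
            for match in left_table.get(k, ()):
                yield (match, r_row)
```

For example, joining `[(1, "a"), (2, "b"), (1, "c")]` with `[(1, "x"), (3, "y")]` on the first field yields `((1, "a"), (1, "x"))` after the first round and `((1, "c"), (1, "x"))` after the third. Partitioned (intra-operation) parallelism would instead hash-partition both inputs by key so that each workstation joins one partition independently.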
We have developed and evaluated two parallelization schemes for a tree-based k-means clustering method on shared memory machines. One scheme is to partition the pattern space across processors. We have determined that...
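The tree-based method itself is not described in the truncated abstract. The following simplified sketch assumes plain Lloyd's k-means instead of the tree-based variant and illustrates only the idea of partitioning the patterns across processors on a shared-memory machine: the points are split into contiguous chunks, each thread computes per-cluster partial sums and counts for its chunk, and a final reduction merges them into new centers. All names are hypothetical. In CPython the GIL prevents the pure-Python distance loop from actually running in parallel, so this shows the structure of the scheme rather than its speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def _partial_sums(points, centers):
    """Assign each point in this chunk to its nearest center; return per-cluster sums and counts."""
    k, d = len(centers), len(centers[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        # Nearest center by squared Euclidean distance
        best = min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
        counts[best] += 1
        for j in range(d):
            sums[best][j] += p[j]
    return sums, counts


def kmeans_step(points, centers, n_workers=4):
    """One Lloyd iteration with the (non-empty) list of patterns split into chunks."""
    size = -(-len(points) // n_workers)  # ceiling division
    chunks = [points[i:i + size] for i in range(0, len(points), size)]
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(lambda ch: _partial_sums(ch, centers), chunks))
    # Reduction: combine the per-chunk partial sums into new centers
    k, d = len(centers), len(centers[0])
    new_centers = []
    for c in range(k):
        total = sum(counts[c] for _, counts in partials)
        if total == 0:
            new_centers.append(list(centers[c]))  # keep an empty cluster's old center
        else:
            new_centers.append([sum(sums[c][j] for sums, _ in partials) / total for j in range(d)])
    return new_centers
```

Because each chunk only reads the shared `centers` and writes its own private accumulators, no locking is needed until the final reduction.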
The most demanding image processing applications require real-time processing, often using special-purpose hardware. The work presented herein refers to the application of cluster computing for off-line image processi...
Retrograde analysis is an efficient exhaustive search method. It is a powerful tool for solving problems where end states have known values but starting states do not. It has been widely used to solve mathematically precise games such as chess endgames, and is potentially usable in energy-minimization problems. With increasing computing power, in both speed and storage capacity, retrograde analysis will become more and more useful. This paper looks at successful applications to games, the challenges ahead, and the modifications required to utilize distributed hardware. The power and usefulness of retrograde analysis are still limited by the computing resources one has access to. Today, the best sequential retrograde algorithms are capable of solving problems with about 10^9 states in a few hours on a standard personal computer. Bigger problems need more powerful computers, take much longer to solve, or are simply out of reach of today's technologies. Introducing parallelism to retrograde analysis is a natural way to attack bigger problems. There are today three main architectures available for parallel retrograde analysis, namely symmetric multiprocessor (SMP) systems, high-speed network-based distributed systems, and Internet-based distributed systems. In this paper, we discuss some of the key issues in doing parallel retrograde analysis on these different architectures. Technical challenges are addressed in detail, together with some examples and proposals. These examples and proposals are drawn from various board games, but the ideas can be applied to other problem domains.
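To make "end states have known values but starting states do not" concrete, here is a minimal sequential sketch of retrograde analysis for a two-player game, written in Python with hypothetical names; it is not the paper's code. It assumes the rule that a player with no legal move loses. Values propagate backward from those terminal positions: a position is a WIN if some move reaches a LOSS for the opponent, and a LOSS once every move has been shown to reach an opponent WIN, tracked with a per-position counter of unresolved successors. Positions never resolved lie on cycles and are labeled DRAW.

```python
from collections import deque


def retrograde_solve(states, moves):
    """Label every state WIN / LOSS / DRAW for the player to move.

    states: collection of all positions; moves(s): successor positions of s.
    Assumption: a position with no legal move is a LOSS for the player to move.
    """
    states = list(states)
    succ = {s: list(moves(s)) for s in states}
    pred = {s: [] for s in states}
    for s, targets in succ.items():
        for t in targets:
            pred[t].append(s)  # reverse edges drive the backward pass
    remaining = {s: len(targets) for s, targets in succ.items()}  # unresolved successors
    value = {}
    queue = deque()
    for s in states:
        if remaining[s] == 0:  # terminal positions have known values
            value[s] = "LOSS"
            queue.append(s)
    while queue:
        t = queue.popleft()
        for s in pred[t]:
            if s in value:
                continue
            if value[t] == "LOSS":
                value[s] = "WIN"  # s can move into a position lost for the opponent
                queue.append(s)
            else:
                remaining[s] -= 1
                if remaining[s] == 0:  # every move from s reaches an opponent WIN
                    value[s] = "LOSS"
                    queue.append(s)
    return {s: value.get(s, "DRAW") for s in states}
```

Applied to a toy subtraction game in which a move removes 1, 2 or 3 stones, `retrograde_solve(range(11), lambda n: [n - k for k in (1, 2, 3) if n - k >= 0])` labels exactly the positions 0, 4 and 8 as LOSS. Parallel versions typically partition the state space across processors and exchange the newly resolved positions between them.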
In this paper, we propose a new method for task decomposition based on output parallelism, in order to find appropriate architectures for large-scale real-world problems automatically and efficiently. With this method, a problem can be divided flexibly into as many sub-problems as chosen, each of which is composed of the whole input vector and a fraction of the output vector. Each module (one per sub-problem) is responsible for producing a fraction of the output vector of the original problem. In this way, the hidden structure for the original problem's output units is decoupled. These modules can be grown and trained in sequence or in parallel. Combined with the constructive learning algorithm, our method requires neither excessive computation nor any prior knowledge concerning decomposition. The feasibility of output parallelism is analyzed and proved. Several benchmarks are implemented to test the validity of this method. Their results show that this method can reduce computation time, increase learning speed, and improve generalization accuracy for both classification and regression problems.
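The decomposition described above can be sketched as follows. The paper grows constructive neural-network modules; in this Python sketch each module is replaced by an ordinary linear least-squares fit (`np.linalg.lstsq`) purely to keep the example short, and all function names are hypothetical. What carries over is the structure: every module sees the full input matrix `X` but fits only its own slice of output columns, so the modules are independent and can be trained in sequence or, as here, in parallel threads, and their outputs are reassembled column by column.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def split_outputs(n_outputs, n_modules):
    """Partition output indices 0..n_outputs-1 into n_modules contiguous groups."""
    return [g.tolist() for g in np.array_split(np.arange(n_outputs), n_modules)]


def train_module(X, Y, cols):
    # Stand-in learner: each module gets the WHOLE input X but only outputs Y[:, cols]
    W, *_ = np.linalg.lstsq(X, Y[:, cols], rcond=None)
    return cols, W


def train_output_parallel(X, Y, n_modules):
    """Train one independent module per output group, concurrently."""
    groups = split_outputs(Y.shape[1], n_modules)
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda cols: train_module(X, Y, cols), groups))


def predict(modules, X, n_outputs):
    """Reassemble the full output vector from each module's fraction."""
    out = np.empty((X.shape[0], n_outputs))
    for cols, W in modules:
        out[:, cols] = X @ W
    return out
```

For six outputs and three modules, `split_outputs(6, 3)` gives `[[0, 1], [2, 3], [4, 5]]`, so each module is responsible for two output units.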
In this paper we present a parallel algorithm that solves the Toeplitz Least Squares Problem. We exploit the displacement structure of Toeplitz matrices and parallelize the Generalized Schur method. The stability prob...
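To make "displacement structure" concrete: with `Z` the down-shift matrix, the displacement `T - Z T Z^T` of a Toeplitz matrix `T` is zero everywhere except in its first row and first column, so its rank is at most 2. Schur-type algorithms, such as the Generalized Schur method mentioned above, typically operate on low-rank generators of a displacement like this instead of on the full matrix. The minimal pure-Python check below uses hypothetical helper names; for a rectangular matrix, `Z` is taken as the down-shift of the matching size on each side, which gives the same entrywise formula.

```python
def toeplitz(col, row):
    """Build a Toeplitz matrix from its first column and first row (row[0] is ignored)."""
    n, m = len(col), len(row)
    return [[col[i - j] if i >= j else row[j - i] for j in range(m)] for i in range(n)]


def displacement(T):
    """Return T - Z T Z^T, where Z is the down-shift matrix.

    Entrywise, (Z T Z^T)[i][j] = T[i-1][j-1] for i, j >= 1 and 0 otherwise.
    """
    n, m = len(T), len(T[0])
    return [[T[i][j] - (T[i - 1][j - 1] if i > 0 and j > 0 else 0)
             for j in range(m)]
            for i in range(n)]
```

For `T = toeplitz([1, 2, 3], [1, 4, 5])`, i.e. `[[1, 4, 5], [2, 1, 4], [3, 2, 1]]`, the displacement is `[[1, 4, 5], [2, 0, 0], [3, 0, 0]]`: every entry below and to the right of the first row and column cancels, because a Toeplitz matrix is constant along its diagonals.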
ISBN (print): 0780370007
In order to fulfil real-time signal processing tasks such as clutter rejection, moving target detection (MTD) and constant false alarm rate (CFAR) control in airborne radar, an airborne radar parallel signal processing system (ARPS2) is proposed with DSP chips as its kernel processing nodes. The DSP chips are organized in a parallel architecture, and each node has its own private input and output memory. The system adopts several parallel techniques, such as parallel storage, parallel processing, parallel code loading and parallel data organization, to achieve high efficiency. It is simple in structure, highly flexible, and easy to develop for. ARPS2 is going to be applied to an airborne radar. It can also be applied to high-speed real-time signal processing algorithms in other kinds of radar.
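The abstract names CFAR control without saying which detector ARPS2 runs. As an illustration only, here is a minimal Python sketch of cell-averaging CFAR (CA-CFAR), one common variant, on a 1-D range profile of square-law (power) samples. For each cell under test, the noise level is estimated from `num_train` training cells on each side, skipping `num_guard` guard cells next to the test cell, and the threshold factor `alpha = N * (pfa**(-1/N) - 1)`, with `N` the total number of training cells, assumes independent, exponentially distributed noise samples. Parameter names are hypothetical.

```python
def ca_cfar(power, num_train, num_guard, pfa):
    """Cell-averaging CFAR over a list of power samples.

    Returns the indices of cells declared detections. Cells near the edges,
    which lack a full training window on both sides, are skipped.
    """
    n_total = 2 * num_train  # training cells on both sides of the test cell
    # Threshold factor giving false-alarm probability pfa for exponential noise
    alpha = n_total * (pfa ** (-1.0 / n_total) - 1.0)
    half = num_train + num_guard
    detections = []
    for i in range(half, len(power) - half):
        lead = power[i - half:i - num_guard]          # num_train cells before the guards
        lag = power[i + num_guard + 1:i + half + 1]   # num_train cells after the guards
        noise = (sum(lead) + sum(lag)) / n_total      # local noise-level estimate
        if power[i] > alpha * noise:
            detections.append(i)
    return detections
```

With a flat noise floor of 1.0 and a single strong return of 50.0 in cell 15, `ca_cfar(profile, num_train=4, num_guard=1, pfa=1e-3)` reports only cell 15; the threshold factor is about 11. Each cell's test is independent of the others, so the range profile can be split across processing nodes, with each node also receiving the few overlapping training cells at its boundaries.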