In several adaptive array application areas the Gaussian distribution has not proven to be an accurate model of the measured data. Nevertheless, Gaussian-based processors have demonstrated robust performance in spite of this statistical mismatch. A need therefore exists for the consideration of (i) problem reformulation and (ii) performance analysis in non-Gaussian environments. The theory of complex multivariate elliptically contoured (MEC) distributions provides an attractive theoretical framework for these considerations, especially in the adaptive array setting. We replace the Gaussian data assumption with an MEC-distributed one and reexamine the optimality and performance of widely used adaptive detection and beamforming structures.
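For orientation, the general shape of a complex elliptically contoured density can be written as below. This is a sketch in commonly used notation, not the paper's own: the symbols $N$ (array dimension), $\boldsymbol{\mu}$ (location), $\boldsymbol{\Sigma}$ (scatter matrix), and $h_N$ (density generator) are assumptions introduced here for illustration.

```latex
% Assumed notation: x is an N-dimensional complex data vector,
% mu its location, Sigma a positive definite Hermitian scatter matrix,
% and h_N a nonnegative density generator (includes the normalization).
f(\mathbf{x}) = |\boldsymbol{\Sigma}|^{-1}\,
  h_N\!\left( (\mathbf{x}-\boldsymbol{\mu})^{H}
  \boldsymbol{\Sigma}^{-1} (\mathbf{x}-\boldsymbol{\mu}) \right)

% The complex Gaussian is the special case
h_N(t) = \pi^{-N} e^{-t}
```

Every member of the family has the same ellipsoidal contours as the Gaussian and differs only in how the density decays along them.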
The paper proposes a linear, deterministic, logical time model for distributed systems. The authors give an account of causality within distributed systems which undergirds the time model. They discuss some advantages for the application programmer in using the time model.
The computational requirements for an adaptive solution of unsteady problems change as the simulation progresses. This causes workload imbalance among processors on a parallel machine which, in turn, requires significant data movement at runtime. We present a dynamic load-balancing framework, called JOVE, that balances the workload across all processors with a global view each time the computational mesh is adapted. JOVE has been implemented on an SP2 in MPI for portability. Experimental results for two model meshes demonstrate that mesh adaption with load balancing gives more than a sixfold improvement over one without load balancing. Furthermore, JOVE gives a 24-fold speedup on 64 processors compared to sequential execution.
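As a quick reading aid for the speedup figure reported for JOVE, the sketch below computes parallel efficiency under the usual definition (speedup divided by processor count). The function name `parallel_efficiency` is a hypothetical helper, not something from the paper.

```python
def parallel_efficiency(speedup: float, processors: int) -> float:
    """Fraction of ideal linear speedup actually achieved (hypothetical helper)."""
    if processors <= 0:
        raise ValueError("processors must be positive")
    return speedup / processors


# Figures from the abstract: 24-fold speedup on 64 processors.
jove_efficiency = parallel_efficiency(24.0, 64)
print(f"{jove_efficiency:.3f}")  # prints 0.375
```

Under that definition, the reported run uses 37.5% of the machine's ideal linear speedup.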
ISBN: (print) 0818676833
Most static scheduling algorithms that schedule parallel programs represented by directed acyclic graphs (DAGs) are sequential. This paper discusses the essential issues in parallelizing static scheduling algorithms. An efficient parallel scheduling algorithm, the HPMCP algorithm, is proposed. It produces high-quality schedules and is much faster than existing algorithms.
This paper addresses the problem of analyzing the performance of parallel algorithms for the training procedure of a neural-network-based fingerprint image comparison (FIC) system. The target architecture is assumed to be a coarse-grain distributed memory parallel architecture. Two types of parallelism, node parallelism and training set parallelism (TSP), are investigated. These algorithms are implemented on a 32-node CM-5. Theoretical analysis and experimental results comparing the performance of these algorithms are presented.
The authors empirically analyze and compare two distributed low-overhead policies for scheduling dynamic tree-structured computations on rings of identical PEs. The experiments show that both policies give significant parallel speedup on large classes of computations, and that one yields almost optimal speedup on moderate-size rings. They believe that the methodology of experiment design and analysis will prove useful in other such studies.
Efficient divide and conquer algorithms can be mapped to a parallel computer using either task parallelism or data parallelism. The former involves significant data movement and the latter can lead to severe load imbalances. A new strategy is proposed, which the authors call concatenated parallelism, for efficient parallel solution of problems resulting in divide and conquer trees. Their strategy is useful when the communication time due to data movement in distributing the subproblems is significant in comparison to the time required for subdivision.
Recently, there has been growing interest in simultaneous exploitation of task and data parallelism in scientific applications and in compiler and runtime support of this combined form of parallelism. In this paper we report on the integration of task and data parallelism in an important irregular application from the VLSI computer-aided design field, namely VLSI layout verification. We report on the implementation and experimental results of our study on a SUN Sparcserver 1000 shared memory multiprocessor and a CM-5 distributed memory multiprocessor.
The author considers the problem of designing efficient parallel algorithms for summing and prefix summing. The author presents optimal algorithms for summing on a latency-dependent distributed-memory model and shows that any optimal summing algorithm must have an inherent structure. Moreover, the author presents optimal or near-optimal algorithms for prefix summing for both non-commutative and commutative binary operators. Furthermore, the author shows that the optimal algorithms for prefix summing for these two types of operators are not equivalent.
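To make the distinction between commutative and non-commutative operators concrete, here is a minimal sketch of an inclusive prefix scan using the textbook doubling scheme, simulated sequentially. It is not the author's latency-dependent algorithm; the function name `inclusive_scan` and its parameters are assumptions for illustration. The operator must be associative, and the left operand always covers earlier positions, so operand order is preserved for non-commutative operators.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def inclusive_scan(values: List[T], op: Callable[[T, T], T]) -> List[T]:
    """Inclusive prefix scan by doubling; each round could run in parallel.

    Illustrative sketch only: `op` must be associative but need not be
    commutative, because earlier elements always appear on the left.
    """
    result = list(values)
    n = len(result)
    step = 1
    while step < n:
        previous = list(result)  # snapshot: all updates in a round read old values
        for i in range(step, n):
            # previous[i - step] summarizes positions strictly before those in previous[i]
            result[i] = op(previous[i - step], previous[i])
        step *= 2
    return result


# String concatenation is associative but not commutative.
print(inclusive_scan(["a", "b", "c", "d", "e"], lambda x, y: x + y))
# prints ['a', 'ab', 'abc', 'abcd', 'abcde']
```

With a commutative operator such as integer addition, the operands in each round could be combined in either order, which is the extra freedom that algorithms for the commutative case can exploit.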
We propose a completely general, informed randomized dynamic load balancing method called random seeking (RS), suitable for parallel algorithms with characteristics found in many search algorithms used in artificial intelligence and operations research and in many divide-and-conquer algorithms. In it, source processors randomly seek out sink processors for load balancing by flinging 'probe' messages. These probes not only locate sinks, but also collect load distribution information which is used to efficiently regulate load balancing activities. We empirically compare RS with a well-known randomized dynamic load balancing method, the random communication (RC) strategy, by using them in parallel best-first branch-and-bound algorithms on up to 512 processors of an nCUBE2 multicomputer. We find that the RC execution times exceed those of RS by 8-67% when used to perform combined dynamic quantitative and qualitative load balancing, and by 5-74% when used to perform just dynamic quantitative load balancing.
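The probing idea can be illustrated with a toy, single-process simulation. This is only a sketch under stated assumptions, not the RS protocol itself: the function `seek_sink`, the `threshold` that defines a sink, and the `max_hops` cutoff are all hypothetical, and the abstract does not say how the collected load information regulates balancing.

```python
import random
from typing import Dict, List, Optional, Tuple


def seek_sink(
    loads: List[int],
    source: int,
    threshold: int,
    max_hops: int,
    rng: random.Random,
) -> Tuple[Optional[int], Dict[int, int]]:
    """Toy probe: visit random processors until one has load below `threshold`.

    Returns the sink found (or None) plus the loads the probe observed on
    the way. All parameters are illustrative assumptions.
    """
    observed = {source: loads[source]}  # the probe starts with the source's own load
    others = [p for p in range(len(loads)) if p != source]
    if not others:
        return None, observed
    for _ in range(max_hops):
        target = rng.choice(others)       # probe is flung to a random processor
        observed[target] = loads[target]  # collect load information along the way
        if loads[target] < threshold:     # lightly loaded processor: a sink
            return target, observed
    return None, observed


sink, seen = seek_sink([10, 0], source=0, threshold=3, max_hops=5, rng=random.Random(0))
print(sink, seen)  # prints 1 {0: 10, 1: 0}
```

In a real system the probe would be a message hopping between processors, and the observed loads would feed back into decisions such as how much work to transfer.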