Search Results - Inner Mongolia University Library

Annual ACM Symposium on Parallel Algorithms and Architectures, 2000, pp. 176-185

Authors: Thulasiraman, Parimala; Theobald, Kevin B.; Khokhar, Ashfaq A.; Gao, Guang R. (Univ of Delaware, Newark, DE, United States)

In this paper we present fine-grained multithreaded algorithms and implementations for the Fast Fourier Transform (FFT) problem. The FFT problem has been formulated using two distinct approaches based on dataflow concepts. The first approach, referred to as the receiver-initiated algorithm, realizes the FFT iterations as a parent-child relationship while fully exploiting the underlying parallelism. The second approach, referred to as the sender-initiated algorithm, follows a data-flow model based on the producer-consumer style of programming and can be adapted to different architectural parameters for achieving high performance. The implementations of the proposed algorithms have been carried out on the EARTH (Efficient Architecture for Running THreads) platform. For both algorithms, we analyze the ratio of remote vs. local threads and study its impact on the experimental results. Our implementation results show that for certain block sizes on a fixed problem size and machine size, the receiver-initiated approach performs better than the sender-initiated approach. For a large number of processors, both algorithms perform well, yielding execution times of only 10 msec for an input of 16 K data points on a 64-processor machine, assuming each processor runs at a 140 MHz clock speed.

Keywords: parallel processing systems
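
For orientation, the stage structure that multithreaded FFT algorithms of this kind build on can be sketched sequentially. The block below is a minimal radix-2 iterative FFT in plain Python, not the receiver-initiated or sender-initiated EARTH implementation from the paper; the function names `bit_reverse` and `fft_iterative` are hypothetical. Within one stage every butterfly reads and writes a disjoint pair of elements, which is the independence a fine-grained threaded version would presumably exploit.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of the integer i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_iterative(x):
    """Radix-2 decimation-in-time FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Reorder the input so that each stage can work in place.
    a = [0j] * n
    for i in range(n):
        a[bit_reverse(i, bits)] = complex(x[i])

    # log2(n) stages; the butterflies inside one stage are independent.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w
                a[start + k] = u + v
                a[start + k + half] = u - v
                w *= w_step
        size *= 2
    return a
```

Calling `fft_iterative` on a list of 16 K samples would run 14 such stages; how the butterflies of each stage are grouped into threads and blocks is exactly the design space the abstract describes and is not modeled here.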

A comparative study of the NAS MG benchmark across parallel languages and architectures

2000 ACM/IEEE Conference on Supercomputing, SC 2000

Authors: Chamberlain, Bradford L.; Deitz, Steven J.; Snyder, Lawrence (University of Washington, Seattle, WA 98195-2350, United States)

ISBN: (print) 0780398025

Hierarchical algorithms such as multigrid applications form an important cornerstone for scientific computing. In this study, we take a first step toward evaluating parallel language support for hierarchical applications by comparing implementations of the NAS MG benchmark in several parallel programming languages: Co-Array Fortran, High Performance Fortran, Single Assignment C, and ZPL. We evaluate each language in terms of its portability, its performance, and its ability to express the algorithm clearly and concisely. Experimental platforms include the Cray T3E, IBM SP, SGI Origin, Sun Enterprise 5500, and a high-performance Linux cluster. Our findings indicate that while it is possible to achieve good portability, performance, and expressiveness, most languages currently fall short in at least one of these areas. We find a strong correlation between expressiveness and a language's support for a global view of computation, and we identify key factors for achieving portable performance in multigrid applications. © 2000 IEEE.

Keywords: Benchmarking

Fast stable matching algorithm using asynchronous parallel programming model

Computer Architectures for Machine Perception (CAMP)

Authors: F. Verdier; A. Merigot; B. Zavidovique (ETIS-Equipe Traitement des Image et du Signal, Université de Cergy Pontoise, Cergy-Pontoise, France; Institut d'Electronique Fondamentale, Université PARIS Sud, Orsay, France)

This paper presents some results of programming efficient matching algorithms on a new asynchronous parallel programming model. Matching algorithms are widely used in image processing when considering high-level treatments. Pattern analysis, database search, 2D and 3D reconstruction all need matching algorithms to perform. Experiments we did were mainly oriented towards a particular matching problem: the stable marriage algorithm. Different implementations of this algorithm have been done on a massively parallel asynchronous model. This model relies on a network of asynchronously communicating processors leading to very fast SIMD treatments. The asynchronous model and implementations of the matching algorithm are presented. An example of image processing problem is also used for illustration purpose and supports the architectural discussion and results.

Keywords: parallel programming; Image processing; Pattern matching; Pattern analysis; Image databases; parallel machines; Optimal matching; Computer networks; Asynchronous communication; Arithmetic
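
As background for the stable marriage problem named in the abstract, the classic sequential Gale-Shapley procedure can be sketched as follows. This is a swapped-in textbook algorithm, not the asynchronous massively parallel implementation the paper reports; the function name `stable_matching` and the dictionary-based input format are assumptions, and both sides are assumed to rank every member of the other side.

```python
from collections import deque


def stable_matching(proposer_prefs, receiver_prefs):
    """Gale-Shapley: return a dict mapping each proposer to a receiver.

    Both arguments map a name to a complete, ordered preference list
    over the other side (most preferred first).
    """
    # rank[r][p] is the position of proposer p in receiver r's list.
    rank = {
        r: {p: i for i, p in enumerate(prefs)}
        for r, prefs in receiver_prefs.items()
    }
    next_choice = {p: 0 for p in proposer_prefs}  # next list index to try
    engaged_to = {}                               # receiver -> proposer
    free = deque(proposer_prefs)

    while free:
        p = free.popleft()
        r = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = engaged_to.get(r)
        if current is None:
            engaged_to[r] = p          # r was free and accepts
        elif rank[r][p] < rank[r][current]:
            engaged_to[r] = p          # r trades up; old partner is free again
            free.append(current)
        else:
            free.append(p)             # r rejects p; p tries the next choice
    return {p: r for r, p in engaged_to.items()}
```

A small usage example under these assumptions: with `{"a": ["x", "y"], "b": ["x", "y"]}` as proposers and `{"x": ["b", "a"], "y": ["a", "b"]}` as receivers, both proposers first ask for `x`, `x` keeps `b`, and `a` ends up with `y`.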

A parallel tabu search and its hybridization with genetic algorithms

International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)

Authors: T. Matsumura; M. Nakamura; S. Tamaki; K. Onaga (Department of Information Engineering, University of Ryukyus, Okinawa, Japan; Okinawa Research Center, Telecommunications Advancement Organization of Japan, Naha, Okinawa, Japan)

ISBN: (print) 0769509363

The paper proposes two parallel meta-heuristics. One is a cooperative parallel tabu search which incorporates historical information exchange among processors in addition to its own searching of each processor. The other is a cooperative parallel search between genetic algorithm and tabu search processes. Through computational experiment, we observe the improvement of solutions by our proposed method.

Keywords: Genetic algorithms; Polynomials; parallel processing; Message passing; Telecommunication computing; Design optimization; Processor scheduling; Circuit synthesis; Computational modeling; Simulated annealing

A distributed multi-storage resource architecture and I/O performance prediction for scientific computing

9th IEEE International Symposium on High Performance Distributed Computing

Authors: Shen, XH; Choudhary, A (Northwestern Univ, Dept Elect & Comp Engn, Ctr Parallel & Distributed Comp, Evanston, IL 60208, USA)

ISBN: (print) 0769507840

I/O intensive applications have posed great challenges to computational scientists. A major problem of these applications is that users have to sacrifice performance requirements in order to satisfy storage capacity requirements in a conventional computing environment. Further performance improvement is impeded by the physical nature of these storage media even when state-of-the-art I/O optimizations are employed. In this paper we present a distributed multi-storage resource architecture that can satisfy both performance and capacity requirements by employing multiple storage resources. Compared to a traditional single storage resource architecture, our architecture provides a more flexible and reliable computing environment. It can bring new opportunities for high performance computing as well as inherit state-of-the-art I/O optimization approaches that have already been developed. We also develop an Application Programming Interface (API) that provides transparent management and access to various storage resources in our computing environment. As I/O usually dominates the performance in I/O intensive applications, we establish an I/O performance prediction mechanism which consists of a performance database and a prediction algorithm to help users better evaluate and schedule their applications. A tool is also developed to help users automatically generate the performance database. The experiments show that our multi-storage resource architecture is a promising platform for high performance distributed computing.

Keywords: Computer architecture; Computer interfaces; Databases; Distributed computing; Environmental management; Impedance; Prediction algorithms; Processor scheduling; Resource management; Scheduling algorithm

Calculational design of special purpose parallel algorithms

IEEE International Conference on Electronics, Circuits and Systems (ICECS)

Authors: A.E. Abdallah; J. Hawkins (South Bank University, London, UK)

ISBN: (print) 0780365429

This paper adopts a transformational programming approach for deriving massively parallel algorithms from functional specifications. It gives a brief description of a framework for relating key higher order functions such as map, reduce, and scan with communicating processes with different configurations. The parallelisation of many interesting functional algorithms can then be systematically synthesized by combining "off the shelf" parallel implementations of instances of these higher order functions. Efficiency in the final message-passing algorithms is achieved by exploiting data parallelism, for generating the intermediate results in parallel; and functional parallelism, for processing intermediate results in stages such that the output of one stage is simultaneously input to the next one. This approach is illustrated through a case study for testing whether all the elements of a given list are distinct. Bird-Meertens formalism is used to concisely carry out algebraic transformations.

Keywords: Algorithm design and analysis; parallel algorithms; Calculus; Functional programming; parallel programming; Testing; Skeleton; parallel architectures; Systolic arrays; Field programmable gate arrays
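
The case study mentioned in the abstract, testing whether all elements of a list are distinct, can be written as a functional specification built from map and reduce. The sketch below is an illustrative sequential Python version, not the paper's derivation or its message-passing algorithm; the name `all_distinct` is hypothetical. Each per-element check is independent of the others (data parallelism), and the logical AND used in the reduction is associative, so in principle the partial results could be combined in any grouping.

```python
from functools import reduce


def all_distinct(xs):
    """Return True if no value occurs more than once in xs.

    Quadratic specification: compare every element with all later ones.
    """
    xs = list(xs)
    # map: for each position i, check xs[i] against every later element.
    checks = map(
        lambda i: all(xs[i] != y for y in xs[i + 1:]),
        range(len(xs)),
    )
    # reduce: fold the per-element results together with logical AND.
    return reduce(lambda a, b: a and b, checks, True)
```

For example, `all_distinct([3, 1, 2])` returns `True` and `all_distinct([1, 2, 1])` returns `False`.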

Parallel performance study of Monte Carlo photon transport code on shared-, distributed-, and distributed-shared-memory architectures

International Symposium on Parallel and Distributed Processing (IPDPS)

Author: A. Majumdar (San Diego Supercomputer Center, University of California San Diego, La Jolla, CA, USA)

We have parallelized a Monte Carlo photon transport algorithm. Three different parallel versions of the algorithm were developed. The first version is for the Tera Multi-Threaded Architecture (MTA) and uses Tera specific directives. The second version, which uses MPI library calls, has been implemented on both the CRAY T3E and the 8-way SMP IBM SP with Power3 processors. The third version is a hybrid MPI-OpenMP implementation and is used on the SMP IBM SP. This version uses MPI to communicate between nodes and OpenMP to perform shared memory operations among processors within a node. We explain the three different parallelization approaches and present parallel performance results of these three parallel implementations on three different machines. We observe near perfect speedup for the three versions on the three architectures. The results on the SMP IBM SP suggest that the hybrid MPI-OpenMP programming is suitable for SMP type machines.

Keywords: Monte Carlo methods; parallel architectures; Message passing; Plasma simulation; Plasma temperature; Computer architecture; parallel programming; Plasma confinement; Plasma density; Plasma transport processes

Preemptive parallel task scheduling in O(n) + poly(m) time

11th Annual International Symposium on Algorithms and Computation, ISAAC 2000

Authors: Jansen, Klaus; Porkolab, Lorant (Institut für Informatik und praktische Mathematik, Christian Albrechts University of Kiel, Germany; Department of Computing, Imperial College, London, United Kingdom)

ISBN: (print) 3540412557

We study the problem of scheduling a set of n independent parallel tasks on m processors, where in addition to the processing time there is a size associated with each task indicating that the task can be processed on any subset of processors of the given size. Based on a linear programming formulation, we propose an algorithm for computing a preemptive schedule with minimum makespan, and show that the running time of the algorithm depends polynomially on m and only linearly on n. Thus for any fixed m, an optimal preemptive schedule can be computed in O(n) time. We also present extensions of this approach to other (more general) scheduling problems with malleable tasks, release times, due dates and maximum lateness minimization. © Springer-Verlag Berlin Heidelberg 2000.

Keywords: Scheduling

NP-completeness of the bulk synchronous task scheduling problem and its approximation algorithm

International Symposium on Parallel Architectures, Algorithms and Networks (ISPAN)

Authors: N. Fujimoto; K. Hagihara (Graduate School of Engineering Science, Osaka University, Toyonaka, Japan)

ISBN: (print) 0769509363

The bulk synchronous task scheduling problem (BSSP) is known as an effective task scheduling problem for distributed-memory machines, but the time complexity of BSSP is unknown. This paper presents a proof of NP-completeness of BSSP even in the case of unit time tasks and positive integer constant communication delays. This paper also gives an approximation algorithm for BSSP in several restricted cases.

Keywords: Approximation algorithms; Scheduling algorithm; Processor scheduling; Delay effects; Concurrent computing; TV; parallel machines; Software packages; Packaging machines; Coprocessors

Multicomputer algorithms for wavelet packet image decomposition

Proceedings of the International Parallel Processing Symposium, IPPS, 2000, pp. 793-798

Authors: Feil, Manfred; Uhl, Andreas (Univ of Salzburg, Austria)

In this work we describe and analyze algorithms for 2-D wavelet packet decomposition for MIMD distributed memory architectures. We discuss two different approaches: On the one hand algorithms generating the entire wavelet packet subband structure (as required for adaptive applications), on the other hand algorithms generating the lowest subband level only (as required for numerical applications). We investigate several optimizations and generalizations of corresponding message passing algorithms and finally compare the results obtained on a Cray T3D and a Parsytec GCel 1024.

Keywords: parallel processing systems
