检索结果-内蒙古大学图书馆

Scalable parallel computational geometry for coarse grained multicomputers

international JOURNAL OF COMPUTATIONAL GEOMETRY & APPLICATIONS 1996年第3期6卷 379-400页

作者： Dehne, F Fabri, A RauChaplin, A CARLETON UNIV SCH COMP SCIOTTAWAON K1S 5B6CANADA INRIA F-06902 SOPHIA ANTIPOLISFRANCE

We study scalable parallel computational geometry algorithms for the coarse grained multicomputer model: p processors solving a problem on n data items, were each processor has O(n/p) much greater than O(1) local memory and all processors are connected via some arbitrary interconnection network (e.g. mesh, hypercube, fat tree). We present O(T-sequential/p + T-s(n,p)) time scalable parallel algorithms for several computational geometry problems. T-s(n,p) refers to the time of a global sort operation. Our results are independent of the multicomputer's interconnection network. their time complexities become optimal when T-sequential/p dominates T-s(n,p) or when T-s(n,p) is optimal. this is the case for several standard architectures, including meshes and hypercubes, and a wide range of ratios n/p that include many of the currently available machine configurations. Our methods also have some important practical advantages: For interprocessor communication, they use only a small fixed number of one global routing operation, global sort, and all other programming is in the sequential domain. Furthermore, our algorithms use only a small number of very large messages, which greatly reduces the overhead for the communication protocol between processors. (Note however, that our time complexities account for the lengths of messages.) Experiments show that our methods are easy to implement and give good timing results.

关键词： computational geometry parallel algorithms scalability

来源：评论

学校读者我要写书评

暂无评论

parallel architecture dedicated to connected component analysis

Parallel architecture dedicated to connected component analy...

引用

international conference on Pattern Recognition

作者： E. Mozef S. Weber J. Jaber E. Tisserand Laboratoire d'hstrumentation Electronique de Nancy L.I.E.N Université de Nancy France

this paper presents the design of a dedicated parallel architecture for connected component analysis. Categorized in one-dimensional array processors, for an image of n/spl times/n pixels, the proposed architecture has n-1 linear processing elements (PEs), n/sup 2/ CAM memory modules, and a tree structure of (n/2)-1 switches allowing communication through the global bus in O(log n) unit of propagation time. Well suited for low and intermediate-level vision, this architecture allows sequential processing through its line structure which is perfectly adapted to real time image analysis from any interlaced-mode video signal. this paper presents the algorithms for connected component labeling, area and perimeter determination, all of which are in O(n log n). the performance of the proposed architecture is compared with another architecture types. the simulation results, the possibility of implementation, and future work are discussed.

关键词： parallel architectures Pixel Computer aided manufacturing CADCAM Tree data structures Switches Communication switching Image analysis Signal processing Labeling

来源：评论

学校读者我要写书评

暂无评论

An incremental benchmark suite for performance tuning of parallel discrete event simulation

An incremental benchmark suite for performance tuning of par...

引用

Annual Hawaii international conference on System Sciences (HICSS)

作者： R. Ronngren L. Barriga R. Ayani Department of Teleinformatics Royal Institute of Technology Stockholm Sweden

parallel simulation has been an active research area for more than a decade, and several parallel simulation algorithms have been proposed. To evaluate parallel simulation environments, there is a need for a common benchmark suite. Such benchmarks should allow the designer of simulation kernels to: (i) evaluate how efficiently the simulation kernel runs on specific architectures; and (ii) evaluate how simulation problems scale on the kernel. A vast majority of benchmarks suggested in the literature focus on the latter problem. the first requirement is however equally important, as such benchmarks are needed to implement efficient simulation systems. In this report, we advocate an incremental benchmark methodology which is primarily intended for performance tuning. the benchmark suite is based on a small set of ping models with which it is possible to effectively isolate and estimate various overheads, contention and latencies encountered in simulation kernels. the use of the benchmark suite is illustrated by its application to performance tuning and evaluation of a time warp kernel.

关键词： Discrete event simulation Computational modeling Kernel Computer simulation Computer architecture Performance analysis Communication networks parallel processing Delay Application software

来源：评论

学校读者我要写书评

暂无评论

parallel algorithms for logic synthesis using the MIS approach

Parallel algorithms for logic synthesis using the MIS approa...

引用

Proceedings of the IEEE 9th international parallel processing Symposium

作者： De, Kaushik Chandy, John A. Roy, Sumit Parkes, Steven Banerjee, Prithviraj LSI Logic Corp Milpitas United States

Combinational logic synthesis is a very important but computationally expensive phase of VLSI system design. parallel processing offers an attractive solution to reduce this design cycle time. In this paper, we describe ProperMIS, a portable parallel algorithm for logic synthesis based on the MIS multi-level logic synthesis system. As part of this work, we have developed novel parallel algorithms for the different logic transformations of the MIS system. Our algorithm uses an asynchronous message-driven computing model with no synchronizing barriers separating phases of parallel computation. the algorithm is portable across a wide variety of parallel architectures, and is built around a well-defined sequential algorithm interface, so that we can benefit from future expansion of the sequential algorithm. We present results on several MCNC and ISCAS benchmark circuits for a variety of shared memory and distributed processing architectures. Our implementation produces speedups of an average of 4 on 8 processors.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Analytical comparison of nearest neighbor algorithms for load balancing in parallel computers

Analytical comparison of nearest neighbor algorithms for loa...

引用

Proceedings of the IEEE 9th international parallel processing Symposium

作者： Xu, Chengzhong Monien, Burkhard Luling, Reinhard Lau, Francis C.M. Wayne State Univ United States

With nearest neighbor load balancing algorithms, a processor makes balancing decisions based on its local information and manages workload migrations within its neighborhood. this paper compares a couple of fairly well-known nearest neighbor algorithms, the dimension exchange and the diffusion methods and their variants in terms of their performances in both one-port and all-port communication architectures. It turns out that the dimension exchange method outperforms the diffusion method in the one-port communication model, and that the strength of the diffusion method is in asynchronous implementations in the all-port communication model. the underlying communication networks considered assume the most popular topologies, the mesh and the torus and their special cases: the hypercube and the k-ary n-cube.

关键词： parallel processing systems

来源：评论

学校读者我要写书评

暂无评论

Efficient routing and message bounds for optimal parallel algorithms

Efficient routing and message bounds for optimal parallel al...

引用

Proceedings of the IEEE 9th international parallel processing Symposium

作者： Deng, Xiaotie Dymond, Patrick York Univ North York Canada

the cost of interprocessor communication has a substantial impact on execution time when implementating parallel algorithms on physical parallel computers. We study these implementation costs, examining the number of inter-processor messages, the cost of routing these messages on various architectures, and the number of communication phases. We provide an improved direct routing algorithm for realizing h-relations on crossbar networks. We also introduce a round-robin message-delivery algorithm which reduces the number of times a communication link is established between a pair of processors (by delivering all messages of that phase for the pair in order without interruption.) We summarize criteria sufficient for a parallel algorithm to be implemented optimally on several common networks. We also describe a log n-phase optimal parallel list-ranking algorithm.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Highly parallel algorithm to approximate maxcut on distributed memory architectures

Highly parallel algorithm to approximate maxcut on distribut...

引用

Proceedings of the IEEE 9th international parallel processing Symposium

作者： Homer, Steven Peinado, Marcus Boston Univ Boston United States

We develop and experiment with a new parallel algorithm to approximate the maximum weight cut in a weighted undirected graph. Our implementation is based on the recent new algorithm of Goemans and Williamson for this problem. However, our work aims for an efficient, practical formulation of the algorithm with close to optimal parallelization. Our theoretical analysis and an implementation on the Connection Machine CM5 show that our parallelization achieves linear speedup. We test our implementation on several large graphs (more than 9000 vertices), particularly on large instances of the Ising model.

关键词： parallel algorithms

来源：评论

学校读者我要写书评

暂无评论

Impact of pipelining on SIMD architectures

Impact of pipelining on SIMD architectures

引用

Proceedings of the IEEE 9th international parallel processing Symposium

作者： Allen, James D. Schimmel, David E. Georgia Inst of Technology Atlanta United States

this paper explores the fundamental impacts of pipeline technology on massively parallel SIMD architectures. the potential for performance improvement in the instruction delivery is explored, and stall penalties associated with reduction operations are derived. Scheduling mechanisms to mitigate stall cycles are also presented. In addition, the design of pipelined processing elements is considered, and formula for stall penalties, and area costs are constructed. these results indicate a 5-10 fold improvement in SIMD performance is well within technological limits.

关键词： Pipeline processing systems

来源：评论

学校读者我要写书评

暂无评论

Object-oriented expert systems parallel knowledge processing in a COBRA-based architecture

Object-oriented expert systems parallel knowledge processing...

引用

Proceedings of the IEEE 1st international conference on algorithms and architectures for parallel processing. Part 1 (of 2)

作者： Flor, th. Eisenecker, U. Siembor, P. Technical Univ Ilmenau Ilmenau Germany

the prototype system OOXSDAR VISIS was implemented in VisualWorks/Smalltalk and Distributed Smalltalk respectively. To achieve distribution in a heterogeneous network a common object request broker architecture (CORBA)-based architecture was chosen. the architecture consists of three layers: knowledge client level, knowledge domain agent server level, and persistent knowledge storage level. the architecture is based on the semantic/presentation split of logical knowledge objects. this architecture combines the advantages of standardized communication protocols such as CORBA with the power and expressivity of OODBMS. First studies with the prototype system OOXSDAR VISIS were carried out for performance analysis. the results allowed significant improvement of the distributed inference process. Well known principles such as increasing intra-modul cohesion and minimizing inter-modul dependencies were applied for restructuring the distributed knowledge bases.

关键词： Expert systems

来源：评论

学校读者我要写书评

暂无评论

Performance evaluation of a parallel I/O architecture 95

Performance evaluation of a parallel I/O architecture

引用

9th international conference on Supercomputing, ICS 1995

作者： Baylor, Sandra Johnson Benveniste, Caroline Hsu, Yarsun IBM T.J. Watson Research Center P.O. Box 218 Yorktown HeightsNY10594 United States

ISBN: (纸本)0897917286

Presented are the results of a study conducted to evaluate the performance of parallel I/O on a massively parallel processor (MPP). the network traversal and total processing times are calculated for I/O reads and writes while varying the I/O and non-I/O request rates and the request size. Also studied is the performance impact of I/O and non-I/O traffic on each other. the results show that the system is scalable for I/O loads considered;however, the scalability is limited by I/O node saturation or considerable network contention. © 1995 ACM.

关键词： parallel architectures

来源：评论

学校读者我要写书评

暂无评论

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案：

请选择收藏分类：

通借通还

建议与咨询 留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

时间限定

文献类型

馆藏选择

核心期刊

语言

文献类型

帮助

文字说明：

检索规则说明：

检索范例：

分类表

所选分类

限定检索结果

文献类型

馆藏范围

日期分布

学科分类号

主题

机构

作者

语言

请选择保存的检索档案： 新增检索档案 确定 取消

请选择收藏分类： 新增自定义分类 确定 取消

通借通还

建议与咨询留下您的常用邮箱和电话号码，以便我们向您反馈解决方案和替代方法

请选择保存的检索档案：

请选择收藏分类：