Search results - Inner Mongolia University Library

3rd International Symposium on Parallel Architectures, Algorithms, and Networks, I-SPAN 1997

Authors: Kashem, M.A.; Zhou, Xiao; Nishizeki, T. (Graduate School of Information Sciences, Tohoku University, Sendai 980-77, Japan; Education Center for Information Processing, Tohoku University, Sendai 980-77, Japan)

A c-vertex-ranking of a graph G for a positive integer c is a labeling of the vertices of G with integers such that, for any label i, deletion of all vertices with labels >i leaves connected components, each having at most c vertices with label i. We present a parallel algorithm to find a c-vertex-ranking of a partial k-tree using the minimum number of ranks. This is the first parallel algorithm for c-vertex-ranking of a partial k-tree G, and takes O(log n) time using a polynomial number of processors on the common CRCW PRAM for any positive integer c and any fixed integer k, where n is the number of vertices in G. © 1997 IEEE.

Keywords: parallel algorithms
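The definition in the abstract above can be checked directly on a small graph. The following is a minimal sequential sketch, not the parallel algorithm the paper describes: it only verifies that a given labeling is a c-vertex-ranking. The function name `is_c_vertex_ranking` and the adjacency-dictionary input format are assumptions made for illustration.

```python
from collections import deque


def is_c_vertex_ranking(adj, label, c):
    """Return True if `label` is a c-vertex-ranking of the graph `adj`.

    adj   : dict mapping each vertex to a list of its neighbours (hypothetical format)
    label : dict mapping each vertex to an integer rank
    c     : positive integer bound on same-rank vertices per component
    """
    for i in set(label.values()):
        # Delete every vertex whose label is greater than i.
        kept = {v for v in adj if label[v] <= i}
        seen = set()
        for start in kept:
            if start in seen:
                continue
            # Breadth-first search over one remaining connected component.
            seen.add(start)
            queue = deque([start])
            count_i = 0
            while queue:
                v = queue.popleft()
                if label[v] == i:
                    count_i += 1
                for w in adj[v]:
                    if w in kept and w not in seen:
                        seen.add(w)
                        queue.append(w)
            # The component may hold at most c vertices labelled i.
            if count_i > c:
                return False
    return True


# A path a - b - c - d, used only as a small example graph.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
print(is_c_vertex_ranking(path, {"a": 1, "b": 2, "c": 1, "d": 3}, 1))
print(is_c_vertex_ranking(path, {"a": 1, "b": 1, "c": 1, "d": 1}, 1))
```

A checker like this is useful mainly for testing a ranking produced by some other procedure; finding a ranking with the minimum number of ranks is the hard part that the paper addresses.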

Memory hierarchy design for Jetpipeline: To execute scalar and vector instructions in parallel

2nd Aizu International Symposium on Parallel Algorithms/Architecture Synthesis

Authors: Sasaki, T; Nakaike, T; Takano, K; Katahira, M; Kobayashi, H; Nakamura, T (Tohoku Univ, Sendai-shi, Japan)

ISBN: (print) 0818678704

Superscalar and VLIW architectures are based on instruction-level parallelism (ILP), which ideally achieves high performance by executing multiple instructions in parallel. However, system performance is restricted by the Von Neumann bottleneck, so memory hierarchy design is very important in this kind of architecture. We have proposed a computer architecture named Jetpipeline, which can execute both vector and scalar instructions in parallel. To make full use of the computing ability of Jetpipeline, this paper presents the memory hierarchy design for Jetpipeline and evaluates the effect of the design on the system performance of Jetpipeline through simulations.

Keywords: parallel processing systems

Parallel approaches for discovering functional dependencies from data for information system design recovery

3rd International Symposium on Parallel Architectures, Algorithms, and Networks, I-SPAN 1997

Authors: Lim, Wie Ming; Harrison, J. (Centre for Software Maintenance, School of Information Technology, University of Queensland, QLD 4072, Australia)

The extraction of functional dependencies is a fundamental activity in the database design recovery process. Existing algorithms for this task are computationally expensive and appear to be impractical when applied to large legacy database instances; for example, their performance deteriorates when the number of attributes and/or instances is large. This paper presents strategies for parallelising the functional dependency discovery process. We propose three parallel discovery models, which are based on horizontal, vertical, and matrix database table slicing techniques. We exploit both program parallelism and data parallelism in our implementations. The results are discovery approaches that are more applicable to large real-world databases. © 1997 IEEE.

Keywords: Database systems
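To make the idea of horizontal slicing in the abstract above concrete, here is a minimal sketch that tests whether one functional dependency holds when the table is split into row slices. It is an illustration under stated assumptions, not the discovery models from the paper: the helper names `partial_map` and `fd_holds` and the list-of-dicts row format are hypothetical, and the slices are processed in a plain loop where a real system would hand each one to a separate worker.

```python
def partial_map(rows, lhs, rhs):
    """For one slice, map each LHS value combination to the set of RHS values seen with it."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        seen.setdefault(key, set()).add(tuple(row[a] for a in rhs))
    return seen


def fd_holds(slices, lhs, rhs):
    """Return True if the dependency lhs -> rhs holds over the union of all slices."""
    merged = {}
    for rows in slices:  # each slice could be handled by a separate worker
        for key, values in partial_map(rows, lhs, rhs).items():
            merged.setdefault(key, set()).update(values)
    # The dependency holds when every LHS value maps to exactly one RHS value.
    return all(len(values) == 1 for values in merged.values())


# Hypothetical example table, cut horizontally into two slices.
rows = [
    {"emp": 1, "dept": "A", "city": "X"},
    {"emp": 2, "dept": "A", "city": "X"},
    {"emp": 3, "dept": "B", "city": "X"},
    {"emp": 4, "dept": "C", "city": "Y"},
]
slices = [rows[:2], rows[2:]]
print(fd_holds(slices, ["dept"], ["city"]))
print(fd_holds(slices, ["city"], ["dept"]))
```

The merge step matters: in this example each slice on its own satisfies city -> dept, and the violation only appears once the partial maps from both slices are combined.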

Simple randomized merge/sort on parallel disks

Parallel Computing, 1997, vol. 23, no. 4-5, pp. 601-631

Authors: Barve, RD; Grove, EF; Vitter, JS (Duke Univ, Dept Comp Sci, Durham, NC 27708)

We consider the problem of sorting a file of N records on the D-disk model of parallel I/O, in which there are two sources of parallelism. Records are transferred to and from disk concurrently in blocks of B contiguous records. In each I/O operation, up to one block can be transferred to or from each of the D disks in parallel. We propose a simple, efficient, randomized mergesort algorithm called SRM that uses a forecast-and-flush approach to overcome the inherent difficulties of simple merging on parallel disks. SRM exhibits a limited use of randomization and also has a useful deterministic version. Generalizing the technique of forecasting, our algorithm is able to read in, at any time, the 'right' block from any disk, and using the technique of flushing, our algorithm evicts, without any I/O overhead, just the 'right' blocks from memory to make space for new ones to be read in. The disk layout of SRM is such that it enjoys perfect write parallelism, avoiding fundamental inefficiencies of previous mergesort algorithms. By analysis of generalized maximum occupancy problems, we are able to derive an analytical upper bound on SRM's expected overhead that is valid for arbitrary inputs. The upper bound derived on the expected I/O performance of SRM indicates that SRM is provably better than disk-striped mergesort (DSM) for realistic values of the parameters D, M, and B. Average-case simulations show further improvement on the analytical upper bound. Unlike previously proposed optimal sorting algorithms, SRM outperforms DSM even when the number D of parallel disks is small.

Keywords: I/O; external memory; disk; parallel disks; sorting; merge/sort; merging; forecasting; maximum occupancy; disk striping
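The abstract above does not spell out the disk layout, so the following sketch rests on an assumption: each sorted run is striped cyclically across the D disks, starting from a disk chosen at random. This is one layout that has the write-parallelism property the abstract mentions, since any D consecutive blocks of a run then land on D distinct disks and can go out in a single parallel write. The function `run_layout` and its parameters are hypothetical names, and the sketch does not implement forecasting or flushing.

```python
import random


def run_layout(num_blocks, num_disks, rng):
    """Return the disk index assigned to each block of one sorted run.

    Assumption (not taken from the abstract): blocks are striped cyclically,
    starting from a disk picked uniformly at random.
    """
    start = rng.randrange(num_disks)
    return [(start + i) % num_disks for i in range(num_blocks)]


rng = random.Random(0)  # seeded only to make the example repeatable
layout = run_layout(num_blocks=10, num_disks=4, rng=rng)

# Every window of 4 consecutive blocks touches 4 different disks.
windows_ok = all(len(set(layout[i:i + 4])) == 4 for i in range(len(layout) - 3))
print(layout, windows_ok)
```

Under this assumed layout, randomness enters only through the starting disk, which is consistent with the abstract's remark that the use of randomization is limited.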

Parallel and fault-tolerant LAN with dual communication subnetworks

Proceedings of the 1997 2nd Aizu International Symposium on Parallel Algorithms/Architecture Synthesis

Authors: Huiqiang, Wang; Zhenliang, Yin; Dao, Wang (Harbin Engineering Univ, Harbin, China)

In this paper, a parallel and fault-tolerant LAN (P_FTLAN) with dual communication subnetworks is presented to improve the reliability of LANs. Its function modes, technical characteristics, hardware and software architectures, and some key implementation techniques, such as logical addresses and parallel mechanisms, are described in detail. Our prototype system and analysis results suggest that the scheme presented in the paper not only provides an effective approach to highly reliable LANs, but can also improve their performance greatly.

Keywords: Local area networks

Interprocedural array remapping

1997 International Conference on Parallel Architectures and Compilation Techniques

Authors: Cierniak, M; Li, W (Univ of Rochester, Rochester, United States)

ISBN: (print) 0818680903

Programming languages like Fortran or C define exactly the layout of array elements in memory. Programmers often use that definition to access the same memory via variables of different types. For many real programs, this practice makes it impossible to change the layout of an array without violating the semantics of the program, since the same memory block may be accessed via variables of different types; such accesses may then receive the wrong array elements. On the other hand, changing array layout is often necessary to obtain good parallel performance or even to improve sequential performance by providing better cache locality. Our paper demonstrates that the problem of changing array layouts in the presence of multiple variables of different types accessing the same memory can be solved with our algorithms for 1) detecting overlapping arrays, 2) using procedure cloning to reduce overlapping, 3) array type coercion, and 4) code structure recovery.

Keywords: parallel processing systems

Evaluating parallel logic programming systems on scalable multiprocessors

Proceedings of the 1997 2nd International Symposium on Parallel Symbolic Computation, PASCO

Authors: Costa, Vitor Santos; Bianchini, Ricardo; de Castro Dutra, Ines (Porto Univ, Porto, Portugal)

Parallel logic programming systems are sophisticated examples of symbolic computing systems. They address problems such as dynamic memory allocation, scheduling irregular execution patterns, and managing different types of implicit parallelism. Most parallel logic programming systems have been developed for bus-based shared-memory architectures. The complexity of parallel logic programming systems and the large amount of data they process raise the question of whether logic programming systems can still obtain good performance on scalable architectures, such as distributed shared-memory systems. In this work we use execution-driven simulation to investigate the access patterns and caching behaviour exhibited by a parallel logic programming system, Andorra-I. We show that the system obtains reasonable performance, but that it does not scale well. By studying the behaviour of the major data structures in Andorra-I in detail, we conclude that this result is largely a consequence of the scheduling and work manipulation implementation used in the system. We also show that Andorra-I's data structures exhibit widely varying memory access patterns and caching behaviour, which depend not only on the number of processors, but also on the amount and type of parallelism available in the application program. Some of these data structures clearly favour invalidate-based cache coherence protocols, while others favour update-based protocols. Since most of Andorra-I's data structures are common to other parallel logic programming systems, we believe that these systems can greatly benefit from flexible coherence schemes where either the compiler can specify the protocol to be used for each data structure or the protocol can adapt to varying memory access patterns.

Keywords: parallel processing systems

Optimal weighted loop fusion for parallel programs

Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA

Authors: Megiddo, Nimrod; Sarkar, Vivek (IBM Almaden Research Cent, San Jose, CA, United States)

ISBN: (print) 9780897918909

Much of the computation involved in parallel programs occurs within loops, either nested loops as in parallel scientific applications or collections of loops as in stream-based applications. Loop fusion is a well-known program transformation that has been shown to be effective in improving data locality in parallel programs by reducing inter-processor communication and improving register and cache locality. Weighted loop fusion is the problem of finding a legal partition of loop nests into fusible clusters so as to minimize the total inter-cluster weights. The loop nests may contain parallel or sequential loops; care is taken to ensure that a parallel loop does not get serialized after fusion. It has been shown in past work that the weighted loop fusion problem is NP-hard. Despite the NP-hardness property, we show how optimal solutions can be found efficiently (i.e., within the compile-time constraints of a product-quality optimizing compiler) for weighted loop fusion problem sizes that occur in practice. In this paper, we present an integer programming formulation for weighted loop fusion with size (number of variables and constraints) that is linearly proportional to the size of the input weighted loop fusion problem. The linear-sized formulation is key to making the execution time small enough for use in a product-quality optimizing compiler, since the natural integer programming formulation for this problem has cubic size for which the execution time would be too large to be practical. The linear-sized integer programming formulation can be solved efficiently using any standard optimization package, but we also present a custom branch-and-bound algorithm that can be used if greater efficiency is desired. A prototype implementation of this approach has been completed, and preliminary compile-time measurements are included in the paper as validation of the practicality of this approach.

Keywords: parallel processing systems
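The objective described in the abstract above (partition the loop nests into clusters so that the total weight between clusters is minimal) can be illustrated on a toy instance. The sketch below uses exhaustive enumeration rather than the integer programming formulation or the branch-and-bound algorithm from the paper, so it is only practical for a handful of loops. It also simplifies legality to a set of pairs that must not share a cluster and ignores dependence ordering between clusters. The names `partitions`, `best_fusion`, `weights`, and `non_fusible` are assumptions made for illustration.

```python
def partitions(n):
    """Yield every way to assign n items to clusters, as a list of cluster ids."""
    def extend(prefix, used):
        if len(prefix) == n:
            yield list(prefix)
            return
        # An item may join any existing cluster or open the next new one.
        for cluster in range(used + 1):
            prefix.append(cluster)
            yield from extend(prefix, max(used, cluster + 1))
            prefix.pop()
    yield from extend([], 0)


def best_fusion(n, weights, non_fusible):
    """Return (cost, assignment) minimising the total weight between clusters.

    weights     : dict mapping a pair of loop indices (i, j) to its weight
    non_fusible : pairs (i, j) that must not be placed in the same cluster
    """
    best_cost, best_assign = None, None
    for assign in partitions(n):
        if any(assign[i] == assign[j] for i, j in non_fusible):
            continue  # skip partitions that fuse a forbidden pair
        cost = sum(w for (i, j), w in weights.items() if assign[i] != assign[j])
        if best_cost is None or cost < best_cost:
            best_cost, best_assign = cost, assign
    return best_cost, best_assign


# Hypothetical instance: four loop nests, loops 1 and 2 cannot be fused.
weights = {(0, 1): 5, (1, 2): 3, (2, 3): 4, (0, 3): 1}
print(best_fusion(4, weights, {(1, 2)}))
```

The number of partitions grows very quickly with the number of loops, which is one way to see why the paper looks for a compact formulation instead.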

Design of high-speed parallel arithmetic algorithms and architectures

1st International Workshop on Distributed Interactive Simulation and Real Time Applications, DS-RT 1997

Author: Markova, V. (Supercomputer Software Department, Pr. Lavrentieva 6, Novosibirsk 630090, Russia)

ISBN: (print) 0818677732

Presents an algorithm for computing a sum of products, realizing a fundamental compound multiply-and-add operation of high-speed arithmetic. Two new cellular pipelined algorithms and architectures (2D and 3D) are proposed. The initial data and results are binary signed-digit integers. The multipliers are loaded digit-serially, while the multiplicands are loaded in a digit-parallel fashion and the results are produced in the same way. The design is performed in terms of cellular technology, based on an original model of distributed computation (the parallel substitution algorithm). The time and structural complexities are obtained. © 1997 IEEE.

Keywords: parallel architectures
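The arithmetic behind the abstract above can be sketched in a few lines, assuming the usual meaning of a binary signed-digit integer: radix 2 with digits drawn from -1, 0, and 1. The code below is a sequential illustration only and does not model the cellular pipelined architectures or the parallel substitution algorithm. The function names and the choice of non-adjacent form as the encoding are assumptions. Each multiplier is consumed one digit at a time, while the multiplicand is used as a whole word.

```python
def sd_value(digits):
    """Value of a binary signed-digit number, least significant digit first."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def to_sd(n):
    """Encode an integer in non-adjacent form, one valid signed-digit encoding."""
    digits = []
    while n != 0:
        if n % 2:
            d = 2 - (n % 4)  # +1 when n is 1 mod 4, -1 when n is 3 mod 4
            n -= d
        else:
            d = 0
        digits.append(d)
        n //= 2
    return digits


def sum_of_products(pairs):
    """Accumulate the sum of a * b over (multiplier digits, multiplicand) pairs."""
    acc = 0
    for multiplier_digits, multiplicand in pairs:
        for i, d in enumerate(multiplier_digits):
            # Each digit adds, subtracts, or skips the shifted multiplicand.
            acc += d * (multiplicand << i)
    return acc


pairs = [(to_sd(7), 3), (to_sd(-3), 5), (to_sd(6), -4)]
print(to_sd(7), sum_of_products(pairs))
```

Because every digit is -1, 0, or 1, each step of the inner loop needs only a shift and at most one addition or subtraction, which is what makes a digit-serial multiplier input a natural fit for this number system.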

Temporal notions of synchronization and consistency in Beehive

Proceedings of the 1997 9th Annual ACM Symposium on Parallel Algorithms and Architectures, SPAA

Authors: Singla, Aman; Ramachandran, Umakishore; Hodgins, Jessica (College of Computing, Georgia Institute of Technology, Atlanta, GA; Digital Equipment Corporation Cambridge Research Lab and College of Computing, Georgia Institute of Technology, Atlanta, GA)

ISBN: (print) 9780897918909

An important attribute in the specification of many compute-intensive applications is 'time'. Simulation of interactive virtual environments is one such domain. There is a mismatch between the synchronization and consistency guarantees needed by such applications (which are temporal in nature) and the guarantees offered by current shared memory systems. Consequently, programming such applications using standard shared memory style synchronization and communication is cumbersome. Furthermore, such applications offer opportunities for relaxing both the synchronization and consistency requirements along the temporal dimension. In this work, we develop a temporal programming model that is more intuitive for the development of applications that need temporal correctness guarantees. This model embodies two mechanisms: 'delta consistency', a novel time-based correctness criterion to govern the shared memory access guarantees, and a companion 'temporal synchronization', a mechanism for thread synchronization along the time axis. These mechanisms are particularly appropriate for expressing the requirements in interactive application domains. In addition to the temporal programming model, we develop efficient explicit communication mechanisms that aggressively push the data out to 'future' consumers to hide the read miss latency at the receiving end. We implement these mechanisms on a cluster of workstations in a software distributed shared memory architecture called 'Beehive'. Using a virtual environment application as the driver, we show the efficacy of the proposed mechanisms in meeting the real-time requirements of such applications.

Keywords: Computer systems programming
