ISBN (print): 0780373154
To obtain the benefits of aggressive, wide-issue architectures, a large window of valid instructions must be available. While researchers have been successful in obtaining high accuracies with a range of dynamic branch predictors, there still remains the need for more aggressive instruction delivery. Loop bodies possess a large amount of spatial and temporal locality. A large percentage of a program's entire execution can be attributed to code found in loop bodies. If we retain this code in a buffer or the cache, we do not have to refetch this code on subsequent loop iterations. Loops tend to iterate multiple times before exiting, thus providing us with the opportunity to speculatively issue multiple iterations. While some loops can be unrolled by a compiler, many contain conditional branches. The number of times a loop iterates may be dependent on a program variable. These issues can hinder our ability to speculatively issue multiple iterations of a loop. If we are able to profile loops during runtime, we can use this information to more accurately issue speculative paths through loop bodies. In this paper we present a characterization of loop execution across the SPECint2000 benchmark suite. We intend for this study to serve as a guide in the selection of design parameters of a loop path predictor. We characterize the patterns exhibited during multiple visits to a loop body. We present the design of a table that records path-based loop execution history and allows us to predict multiple loop iterations dynamically.
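The kind of path-based loop history table the abstract alludes to can be pictured with a small sketch. The C++ class below is not the authors' design; it is a minimal, hypothetical illustration that indexes on a loop's identifying PC, records the sequence of conditional-branch outcomes seen in each iteration as a path signature, and predicts the most frequently observed path for speculatively issuing future iterations.

#include <cstdint>
#include <iostream>
#include <map>
#include <string>

// Hypothetical sketch of a path-based loop history table (not the paper's design).
// Each loop, identified by the PC of its backward branch, keeps a count of the
// branch-outcome paths ("T"/"N" per conditional branch) seen in past iterations.
class LoopPathTable {
public:
    // Record the path signature observed during one completed iteration.
    void record(uint64_t loop_pc, const std::string& path) {
        ++table_[loop_pc][path];
    }
    // Predict the most frequently observed path for the next iteration.
    std::string predict(uint64_t loop_pc) const {
        auto it = table_.find(loop_pc);
        if (it == table_.end()) return "";          // no history yet
        std::string best;
        unsigned best_count = 0;
        for (const auto& entry : it->second) {
            if (entry.second > best_count) {
                best_count = entry.second;
                best = entry.first;
            }
        }
        return best;
    }
private:
    std::map<uint64_t, std::map<std::string, unsigned>> table_;
};

int main() {
    LoopPathTable t;
    // Iterations of the loop at PC 0x400a10 mostly take the "TN" path.
    t.record(0x400a10, "TN");
    t.record(0x400a10, "TN");
    t.record(0x400a10, "NT");
    std::cout << "predicted path: " << t.predict(0x400a10) << "\n";  // prints TN
}

A real predictor would of course bound the table size and confidence counters; the sketch only shows the record-and-predict interface the characterization study is meant to parameterize.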
Author:
Grosz, L., Massey Univ, Inst Informat & Math Sci, N Shore Mail Ctr, Auckland, New Zealand
We consider the algebraic multilevel iteration (AMLI) for the solution of systems of linear equations as they arise from a finite-difference discretization on a rectangular grid. The key operation is the matrix-vector product, which can be executed efficiently on vector and parallel-vector computer architectures if the nonzero entries of the matrix are concentrated in a few diagonals. In order to maintain this structure for all matrices on all levels, coarsening in alternating directions is used. In some cases it is necessary to introduce additional dummy grid hyperplanes. The data movements in the restriction and prolongation are crucial, as they produce massive memory conflicts on vector architectures. By using a simple performance model the best of the possible vectorization strategies is automatically selected at runtime. Examples show that on a Fujitsu VPP300 the presented implementation of AMLI reaches about 85% of the useful performance, and scalability with respect to computing time can be achieved.
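To make concrete why concentrating the nonzeros in a few diagonals helps on vector architectures, here is a hedged sketch (not taken from the paper): a matrix stored in diagonal (DIA) format, with the matrix-vector product written as long, stride-one inner loops, which is exactly the access pattern vector units exploit.

#include <algorithm>
#include <iostream>
#include <vector>

// Minimal diagonal-format (DIA) matrix-vector product, y = A*x, for an n-by-n
// matrix whose nonzeros lie on a few diagonals. Each inner loop runs over a
// contiguous index range with unit stride, which vectorizes well.
// offsets[d] is a diagonal's offset (0 = main, +1 = first super, -1 = first sub);
// diag[d][i] holds A(i, i + offsets[d]); entries outside the valid row range
// of that diagonal are never read.
std::vector<double> dia_matvec(int n,
                               const std::vector<int>& offsets,
                               const std::vector<std::vector<double>>& diag,
                               const std::vector<double>& x) {
    std::vector<double> y(n, 0.0);
    for (std::size_t d = 0; d < offsets.size(); ++d) {
        int off = offsets[d];
        int lo = std::max(0, -off);          // first row touched by this diagonal
        int hi = std::min(n, n - off);       // one past the last row touched
        for (int i = lo; i < hi; ++i)        // stride-one, easily vectorized
            y[i] += diag[d][i] * x[i + off];
    }
    return y;
}

int main() {
    // 1D Poisson-like tridiagonal matrix: 2 on the main diagonal, -1 off-diagonal.
    int n = 5;
    std::vector<int> offsets = {-1, 0, 1};
    std::vector<std::vector<double>> diag = {
        std::vector<double>(n, -1.0),
        std::vector<double>(n, 2.0),
        std::vector<double>(n, -1.0)};
    std::vector<double> x(n, 1.0);
    for (double v : dia_matvec(n, offsets, diag, x)) std::cout << v << " ";
    std::cout << "\n";   // prints 1 0 0 0 1
}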
Let G be a Delta-regular graph with n vertices and girth at least 4 such that Delta much greater than log n. We give very simple, randomized, distributed algorithms for vertex coloring G with Delta/k colors in O(k + log n) communication rounds, where k = O(log Delta). The algorithm may fail or exceed the above running time, but the probability that this happens is o(1), a quantity that goes to zero as n grows. The probabilistic analysis relies on a powerful generalization of Azuma's martingale inequality that we dub the Method of Bounded Variances. (C) 2000 Academic Press.
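As a rough illustration of the kind of synchronous, randomized, conflict-driven rounds such distributed coloring algorithms use, here is a hedged C++ simulation. It is deliberately simplified: every uncolored vertex proposes a random color from a fixed palette and keeps it only if no neighbor already holds or simultaneously proposed that color. This is a generic round of this style, not the authors' Delta/k-coloring algorithm or its martingale analysis.

#include <iostream>
#include <random>
#include <vector>

// Simplified simulation of randomized coloring rounds on an 8-cycle (Delta = 2),
// using Delta + 1 colors; a generic conflict-resolution scheme for illustration.
int main() {
    const int n = 8;
    std::vector<std::vector<int>> adj(n);
    for (int v = 0; v < n; ++v) {
        adj[v].push_back((v + 1) % n);
        adj[v].push_back((v + n - 1) % n);
    }
    const int palette = 3;                   // Delta + 1 colors
    std::vector<int> color(n, -1), proposal(n, -1);
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> pick(0, palette - 1);

    int rounds = 0, uncolored = n;
    while (uncolored > 0) {
        ++rounds;
        // Each uncolored vertex proposes a random color for this round.
        for (int v = 0; v < n; ++v)
            proposal[v] = (color[v] == -1) ? pick(rng) : -1;
        // A proposal is kept only if no neighbor holds or proposed the same color.
        for (int v = 0; v < n; ++v) {
            if (color[v] != -1) continue;
            bool ok = true;
            for (int u : adj[v])
                if (color[u] == proposal[v] || proposal[u] == proposal[v]) ok = false;
            if (ok) { color[v] = proposal[v]; --uncolored; }
        }
    }
    std::cout << "colored the cycle in " << rounds << " rounds\n";
}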
ISBN (print): 9781581131963
Many data-parallel applications, including emerging media applications, have regular structures that can easily be expressed as a series of arithmetic kernels operating on data streams. Data-parallel architectures are designed to exploit this regularity by performing the same operation on many data elements concurrently. However, applications containing data-dependent control constructs perform poorly on these architectures. Conditional streams convert these constructs into data-dependent data movement. This allows data-parallel architectures to efficiently execute applications with data-dependent control flow. Essentially, conditional streams extend the range of applications that a data-parallel architecture can execute efficiently. For example, polygon rendering speeds up by a factor of 1.8 with the use of conditional streams.
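The core trick, turning data-dependent control flow into data-dependent data movement, can be illustrated with a hedged scalar sketch. The real mechanism operates on SIMD stream hardware; this toy model only shows the idea: elements are routed into two dense substreams by a predicate, each substream runs a branch-free kernel, and the results land back in their original positions. The example kernels below are made up for illustration.

#include <cstddef>
#include <iostream>
#include <vector>

// Toy model of conditional streams: instead of branching per element inside a
// kernel, the input stream is split by a predicate into two substreams, each
// substream is processed uniformly, and results return to their original slots.
int main() {
    std::vector<int> input = {3, -7, 4, -1, 0, 9, -5, 2};
    std::vector<int> output(input.size());

    // 1. Data-dependent routing: build index lists for each side of the condition.
    std::vector<std::size_t> pos_idx, neg_idx;
    for (std::size_t i = 0; i < input.size(); ++i)
        (input[i] >= 0 ? pos_idx : neg_idx).push_back(i);

    // 2. Each substream is processed uniformly (no per-element branch inside).
    for (std::size_t i : pos_idx) output[i] = input[i] * 2;   // "then" kernel
    for (std::size_t i : neg_idx) output[i] = -input[i];      // "else" kernel

    // 3. Results already landed in their original positions via the index lists.
    for (int v : output) std::cout << v << " ";
    std::cout << "\n";   // prints 6 7 8 1 0 18 5 4
}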
ISBN (print): 0898714532
High performance applications involving large data sets require the efficient and flexible use of multiple disks. In an external memory machine with D parallel, independent disks, only one block can be accessed on each disk in one I/O step. This restriction leads to a load balancing problem that is perhaps the main inhibitor for adapting single-disk external memory algorithms to multiple disks. This paper shows that this problem can be solved efficiently using a combination of randomized placement, redundancy and an optimal scheduling algorithm. A buffer of O(D) blocks suffices to support efficient writing of arbitrary blocks if blocks are distributed uniformly at random to the disks (e.g., by hashing). If two randomly allocated copies of each block exist, N arbitrary blocks can be read within ⌈N/D⌉ + 1 I/O steps with high probability. In addition, the redundancy can be reduced from 2 to 1 + 1/r for any integer r. These results can be used to emulate the simple and powerful "single-disk multi-head" model of external computing [1] on the physically more realistic independent disk model [33] with small constant overhead. This is faster than a lower bound for deterministic emulation [3].
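The randomized-placement-with-redundancy idea can be sketched as follows; note this uses a simple greedy assignment of each request to the currently less-loaded of its two copies, not the optimal scheduling algorithm the paper analyzes. With two random copies per block, a batch of N reads usually spreads almost evenly over the D disks, and the number of parallel I/O steps is the maximum per-disk load.

#include <algorithm>
#include <iostream>
#include <random>
#include <vector>

// Sketch of reading a batch of blocks that were each written to two randomly
// chosen disks. Each request is assigned greedily to whichever of its two
// copies sits on the currently less-loaded disk (a heuristic, not the paper's
// optimal scheduler). The number of parallel I/O steps equals the maximum load.
int main() {
    const int D = 8;            // number of independent disks
    const int N = 64;           // blocks requested in this batch
    std::mt19937 rng(1);
    std::uniform_int_distribution<int> disk(0, D - 1);

    std::vector<int> load(D, 0);
    for (int b = 0; b < N; ++b) {
        int c1 = disk(rng);                       // disk holding the first copy
        int c2 = disk(rng);                       // disk holding the second copy
        int chosen = (load[c1] <= load[c2]) ? c1 : c2;
        ++load[chosen];
    }
    int steps = *std::max_element(load.begin(), load.end());
    std::cout << "N/D = " << N / D << ", I/O steps needed = " << steps << "\n";
}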
ISBN (print): 076950728X
Discrete event simulations are widely used to study and analyze active and conventional networking architectures and protocols. Active networks must coexist and communicate with conventional networks to effectively utilize and extend the infrastructure of the Internet. Hence, large scale network simulations containing both conventional and active components should be conducted to study crucial scalability and performance issues. In this paper, a framework to enable parallel co-simulation of conventional and active networks is described. The framework integrates ANSE, a parallel Active Networks Simulation Environment, with NS, a popular sequential network simulator. Object-oriented techniques that completely insulate the application modules from the modifications made to parallelize NS have been employed, in order to eliminate changes to the network models. This paper presents the design and implementation of the parallel co-simulation framework along with the results obtained from our co-simulation studies. Our studies indicate that parallel simulation techniques considerably reduce simulation times even for small-sized network models.
A deterministic, oblivious permutation routing algorithm on 2D meshes of constant queue size is presented. An optimal, linear-time bound for oblivious permutation routing can be obtained for these meshes. A new algorithm for oblivious routing based on the bit-reversal permutation is proposed, running in (2.954+ε)n steps, where ε is any positive constant and the queue size is a function of ε. The running time of the algorithm is less than 1.5 times the network diameter of the mesh. The n×n mesh is considered and models on these meshes are presented. The problems in controlling the pipelining of packet flows are discussed.
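For reference, the bit-reversal permutation the algorithm builds on maps an index to the value obtained by reversing its binary representation. The hedged sketch below only generates that permutation for a power-of-two row length; it is not the routing algorithm itself.

#include <iostream>
#include <vector>

// Generate the bit-reversal permutation of 0..n-1 for n a power of two:
// index i maps to the integer whose bits are those of i in reverse order.
std::vector<unsigned> bit_reversal(unsigned n, unsigned bits) {
    std::vector<unsigned> perm(n);
    for (unsigned i = 0; i < n; ++i) {
        unsigned r = 0;
        for (unsigned b = 0; b < bits; ++b)
            if (i & (1u << b)) r |= 1u << (bits - 1 - b);
        perm[i] = r;
    }
    return perm;
}

int main() {
    // For n = 8 (3 bits) the permutation is 0 4 2 6 1 5 3 7.
    for (unsigned v : bit_reversal(8, 3)) std::cout << v << " ";
    std::cout << "\n";
}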
Suppose that a parallel algorithm can include any number of parallel threads. Each thread can proceed without ever having to busy-wait for another thread. A thread can proceed until its termination, but no new threads can be formed. What kind of problems can such restrictive algorithms solve and still be competitive in the total number of operations they perform with the fastest serial algorithm for the same problem? Intrigued by this informal question, we considered one of the most elementary parallel algorithmic paradigms, that of balanced binary trees. The main contribution of this paper is a new balanced (not necessarily binary) tree no-busy-wait paradigm for parallel algorithms; applications of the basic paradigm to two problems are presented: building heaps, and executing parallel tree contraction (assuming a preparatory stage); the latter is known to be applicable to evaluating a family of general arithmetic expressions. To put things in context, we also discuss our `PRAM-on-chip' vision (actually a small update to it), presented at SPAA98.
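As one concrete instance of the balanced-tree flavour of parallelism the abstract refers to (hedged: this is standard bottom-up heap construction, not the paper's paradigm itself), note that the sift-down operations at the nodes of one tree level act on disjoint subtrees, so all nodes of a level could be handled by independent threads that never wait on each other. The sketch below performs the same work sequentially, level by level, to make the batching visible.

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

// Bottom-up construction of a binary min-heap stored in an array (children of
// 0-based index i are 2i+1 and 2i+2). Sift-downs at nodes of the same level
// touch disjoint subtrees, so each level forms a batch of independent tasks.
void sift_down(std::vector<int>& a, std::size_t i) {
    std::size_t n = a.size();
    while (true) {
        std::size_t l = 2 * i + 1, r = 2 * i + 2, smallest = i;
        if (l < n && a[l] < a[smallest]) smallest = l;
        if (r < n && a[r] < a[smallest]) smallest = r;
        if (smallest == i) return;
        std::swap(a[i], a[smallest]);
        i = smallest;
    }
}

void build_heap_by_levels(std::vector<int>& a) {
    if (a.size() < 2) return;
    std::size_t last_internal = a.size() / 2 - 1;
    // Level starts are 0, 1, 3, 7, ...; find the level of the last internal node.
    std::size_t level_start = 0;
    while (2 * level_start + 1 <= last_internal) level_start = 2 * level_start + 1;
    // Process levels from the bottom up; nodes within a level are independent.
    for (;; level_start /= 2) {
        std::size_t level_end = std::min(2 * level_start, last_internal);
        for (std::size_t i = level_start; i <= level_end; ++i) sift_down(a, i);
        if (level_start == 0) break;
    }
}

int main() {
    std::vector<int> a = {9, 4, 7, 1, 0, 8, 5, 2, 6, 3};
    build_heap_by_levels(a);
    std::cout << "heap root (minimum): " << a[0] << "\n";   // prints 0
}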
We consider the problem of processing a given number of tasks on a given number of processors as quickly as possible when only vague information about the processing time of a task is available before it is completed. Whenever a processor is idle, it can be assigned, at the price of a certain overhead, a portion, called a chunk, of the unassigned tasks. The goal is to minimize the makespan, that is, the time that passes until all the tasks are completed. The difficulty then is to find the optimal tradeoff between the processors' load balance, which is favoured by having small, and therefore many, chunks, and the total scheduling overhead, which is lower when there are fewer chunks. This scheduling problem has been the subject of intensive research in the past, and a large variety of heuristics have been proposed. Its mathematical analysis, however, turned out to be difficult even for simplistic models of the vague-information issue, and little theoretical work has been presented to date. In this work we present a novel theoretical model that covers a multitude of natural vague-information scenarios, and for which we can prove general upper and lower bounds on the achievable makespan. From this we derive optimal bounds and algorithms for a whole variety of specific scenarios, including the modelling of task processing times as independent, identically distributed random variables, which guided the design of most of the previously existing heuristics. Unlike traditional approaches, our model neither ignores a priori knowledge of the input (the processing times) nor does it restrict the distribution of the input, but instead works with the concepts of an a priori estimate of the processing times, which is implicit in every algorithm, and a measure for the deviation of this estimate from the actual processing times, which is not known until all the tasks are completed.
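To make the chunk-size trade-off concrete, here is a hedged simulation of one classic heuristic, guided self-scheduling, in which an idle processor grabs a chunk of roughly (remaining tasks)/(number of processors). This only illustrates the load-balance-versus-overhead tension; it is not the model or the algorithms analysed in the paper, and the overhead and task-time distribution are made-up parameters.

#include <algorithm>
#include <functional>
#include <iostream>
#include <queue>
#include <random>
#include <vector>

// Simulation of guided self-scheduling: whenever a processor becomes idle it is
// assigned a chunk of about remaining/P unassigned tasks, paying a fixed
// scheduling overhead per chunk. Large chunks mean little overhead but poor
// balance near the end; small chunks balance well but pay the overhead often.
int main() {
    const int P = 4;                 // processors
    const int T = 1000;              // tasks
    const double overhead = 0.5;     // cost charged per assigned chunk
    std::mt19937 rng(7);
    std::exponential_distribution<double> task_time(1.0);  // mean 1 per task

    // Min-heap of the times at which each processor next becomes idle.
    std::priority_queue<double, std::vector<double>, std::greater<double>> idle;
    for (int p = 0; p < P; ++p) idle.push(0.0);

    int remaining = T;
    int chunks = 0;
    double makespan = 0.0;
    while (remaining > 0) {
        double t = idle.top(); idle.pop();           // earliest idle processor
        int chunk = std::max(1, remaining / P);      // guided self-scheduling rule
        remaining -= chunk;
        ++chunks;
        double work = 0.0;
        for (int i = 0; i < chunk; ++i) work += task_time(rng);
        double done = t + overhead + work;
        makespan = std::max(makespan, done);
        idle.push(done);
    }
    std::cout << "chunks assigned: " << chunks << ", makespan: " << makespan << "\n";
}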
We describe two deterministic algorithms for constructing the arrangement determined by a set of (algebraic) curve segments in the plane. They both use a divide-and-conquer approach based on derandomized geometric sampling and achieve the optimal running time O(n log n + k), where n is the number of segments and k is the number of intersections. The first algorithm, a simplified version of one presented in [1], generates a structure of size O(n log log n + k) and its parallel implementation runs in time O(log^2 n). The second algorithm is better in that the decomposition of the arrangement constructed has optimal size O(n + k) and it has a parallel implementation in the EREW PRAM model that runs in time O(log^(3/2) n). The improvements in the second algorithm are achieved by means of an approach that adds some degree of globality to the divide-and-conquer approach based on random sampling. The approach extends previous work by Dehne et al., Deng and Zhu and Kuehn, that use small separators for planar graphs in the design of randomized geometric algorithms for coarse grained multicomputers. The approach simplifies other previous geometric algorithms, and also has the potential of providing efficient deterministic algorithms for the external memory model.
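For orientation only: the parameters n and k in the O(n log n + k) bound are the number of input segments and the number of pairwise intersections in the arrangement. The hedged brute-force sketch below merely counts k for straight line segments in O(n^2) time with orientation tests; it is emphatically not the divide-and-conquer sampling algorithm described in the paper, and it ignores degenerate (collinear or touching) cases.

#include <cstddef>
#include <iostream>
#include <vector>

// Count pairwise intersections k of n straight line segments by brute force.
// Two segments properly intersect iff the endpoints of each one lie strictly
// on opposite sides of the other segment's supporting line.
struct Pt { double x, y; };
struct Seg { Pt a, b; };

double cross(Pt o, Pt p, Pt q) {
    return (p.x - o.x) * (q.y - o.y) - (p.y - o.y) * (q.x - o.x);
}

bool intersects(const Seg& s, const Seg& t) {
    double d1 = cross(s.a, s.b, t.a), d2 = cross(s.a, s.b, t.b);
    double d3 = cross(t.a, t.b, s.a), d4 = cross(t.a, t.b, s.b);
    return d1 * d2 < 0 && d3 * d4 < 0;
}

int main() {
    std::vector<Seg> segs = {
        {{0, 0}, {4, 4}},        // crosses the next two segments
        {{0, 4}, {4, 0}},
        {{0.5, 3}, {3.5, 3}},    // crosses the two diagonals
        {{5, 5}, {6, 6}}         // disjoint from everything else
    };
    int k = 0;
    for (std::size_t i = 0; i < segs.size(); ++i)
        for (std::size_t j = i + 1; j < segs.size(); ++j)
            if (intersects(segs[i], segs[j])) ++k;
    std::cout << "n = " << segs.size() << ", k = " << k << "\n";   // n = 4, k = 3
}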