Modern. real-time systems consist of complex parallel and heterogeneous architectures. Early design decisions, such as the partitioning of functionality onto architecture building blocks and the choice of algorithms, ...
详细信息
ISBN:
(纸本)0769525598
Modern. real-time systems consist of complex parallel and heterogeneous architectures. Early design decisions, such as the partitioning of functionality onto architecture building blocks and the choice of algorithms, have a large impact on the quality of the resulting platform. In order to support the designer during this concept phase we have developed our performance evaluation framework SystemQ. In this paper, we demonstrate why multiclass queuing networks as used by SystemQ are a natural abstraction for evaluating network processing platforms. In particular we reveal the impact of scheduling policies on the Quality-of-Service, such as the residence time of network traffic in the system. For the same stimuli the packet latency can vary by more than one order of magnitude if only one queuing discipline in the system is modified.
We describe an implementation of a compact parallel algorithm for 3D Delaunay tetrahedralization on a 64-processor shared-memory machine. Our algorithm uses a concurrent version of the Bowyer-Watson incremental insert...
详细信息
ISBN:
(纸本)1595933409
We describe an implementation of a compact parallel algorithm for 3D Delaunay tetrahedralization on a 64-processor shared-memory machine. Our algorithm uses a concurrent version of the Bowyer-Watson incremental insertion, and a thread-safe space-efficient structure for representing the mesh. Using the implementation we are able to generate significantly larger Delaunay meshes than have previously been generated - 10 billion tetrahedra on a 64 processor SMP using 200GB of RAM. The implementation makes use of a locality based rela-beling of the vertices that serves three purposes - it is used as part of the space efficient representation, it improves the memory locality, and it reduces the overhead necessary for locks. The implementation also makes use of a caching technique to avoid excessive decoding of vertex information, a technique for backing out of insertions that collide, and a shared work queue for maintaining points that have yet to be inserted. Copyright 2006 acm.
Hydra PPS is a collection of annotations, classes, a runtime, and a compiler designed to provide Java programmers with a fairly simple method of producing programs for Symmetric Multiprocessing (SMP) architectures. Th...
ISBN:
(纸本)9781595934529
Hydra PPS is a collection of annotations, classes, a runtime, and a compiler designed to provide Java programmers with a fairly simple method of producing programs for Symmetric Multiprocessing (SMP) architectures. This paper introduces the basics of this new system including the basic constructs for this new programming language and the relationship between the Java VM, the compiler, the runtime, and the parallel program. Hydra will exploit parallelism when the underlying architecture supports it and will run as normal sequential Java program when the architecture does not have support for parallelism. parallelism is expressed through events in Hydra, it is easy to use, and programs run efficiently on parallelarchitectures.
Platform FPGAs that integrate sequential processors with a spatial fabric have become prevalent. While these hybrid architectures ease the burden of integrating sequential and spatial code in a single application, pro...
详细信息
Platform FPGAs that integrate sequential processors with a spatial fabric have become prevalent. While these hybrid architectures ease the burden of integrating sequential and spatial code in a single application, programming them, and particularly their spatial fabrics remains challenging. The difficulty arises in part from the lack of an agreed upon computational model and family of programming languages. In addition, moving algorithms into hardware is an arcane art far removed from the experience of most programmers. To address this challenge, we present a new type architecture, an abstract model analogous to the von Neumann machine for sequential computers, that can serve as common ground for algorithm designers, language designers, and hardware architects. We show that many parallelarchitectures, including platform FPGAs, are implementations of this type architecture. Using examples from a variety of application domains, we show how algorithms can be analyzed to estimate their performance on implementations of this type architecture. This analysis is done without having to delve into the details of any architecture in particular. Finally, we describe some of the common features of languages designed for expressing micro-parallelism, highlighting connections with the type architecture
Suppose we have a parallel or distributed system whose nodes have limited capacities, such as processing speed, bandwidth, memory, or disk space. How does the performance of the system depend on the amount of heteroge...
详细信息
ISBN:
(纸本)9781595934529
Suppose we have a parallel or distributed system whose nodes have limited capacities, such as processing speed, bandwidth, memory, or disk space. How does the performance of the system depend on the amount of heterogeneity of its capacity distribution? We propose a general framework to quantify the worst-case effect of increasing heterogeneity in models of parallel systems. Given a cost function g(C,W) representing the system's performance as a function of its nodes' capacities C and workload W (such as the completion time of an optimum schedule of jobs W on machines C), we say that g has price of heterogeneity α when for any workload, cost cannot increase by more than a factor α if node capacities become arbitrarily more heterogeneous. We give constant bounds on the price of heterogeneity of several well-known job scheduling and graph degree/diameter problems, indicating that increasing heterogeneity can never be much of a disadvantage. On the other hand, with the introduction of timing constraints such as release times or precedence constraints on the jobs, the dependence on node capacities becomes more complex, so that increasing heterogeneity may be quite detrimental.
Modern real-time systems consist of complex parallel and heterogeneous architectures. Early design decisions, such as the partitioning of functionality onto architecture building blocks and the choice of algorithms, h...
详细信息
Modern real-time systems consist of complex parallel and heterogeneous architectures. Early design decisions, such as the partitioning of functionality onto architecture building blocks and the choice of algorithms, have a large impact on the quality of the resulting platform. In order to support the designer during this concept phase, we have developed our performance evaluation framework SystemQ. In this paper, we demonstrate why multiclass queuing networks as used by SystemQ are a natural abstraction for evaluating network processing platforms. In particular, we reveal the impact of scheduling policies on the quality-of-service, such as the residence time of network traffic in the system. For the same stimuli, the packet latency can vary by more than one order of magnitude if only one queuing discipline in the system is modified.
While there are strong beliefs within the community about whether one particular parallel programming model is easier to use than another, there has been little research to analyze these claims empirically. Currently,...
详细信息
ISBN:
(纸本)9781595934529
While there are strong beliefs within the community about whether one particular parallel programming model is easier to use than another, there has been little research to analyze these claims empirically. Currently, the most popular paradigm is message-passing, as implemented by the MPI library [1]. However, MPI is considered to be difficult for developing programs, because it forces the programmer to work at a very low level of abstraction. One alternative parallel programming model is the PRAM model, which supports fine-grained parallelism and has a substantial history of algorithmic theory [2]. It is not possible to program current parallel machines using the PRAM model because modern architectures are not designed to support such a model efficiently. However, current trends towards multicore chips suggest that large-scale, fine-grained uniform-memory access parallel machines may soon be feasible. XMT-C is an extension of the C language that supports parallel directives to provide a PRAM-like model to the programmer. A prototype compiler exists that generates code which runs on a simulator for an XMT architecture [3].To better understand how much benefit a PRAM-like model could provide over a message-passing model, we conducted a feasibility study in an academic setting to compare the effort required to solve a particular problem. The questions under study were: can we measure the effort in developing a program using these two programming models and can we differentiate the amount of effort for each model?.The subjects participating in the study were divided up into two groups. One group solved a problem using the MPI library in either C,C++, or Fortran, and the other group solved the problem using XMT-C. The task was to write a function to multiply a sparse matrix with a dense vector. To obtain subjects, we leveraged existing graduate-level parallel programming courses at two different universities: University of California, Santa Barbara (UCSB), and Universit
In [6] it is shown that every graph can be probabilistically embedded into a distribution over its spanning trees with expected distortion O(log2 n log log n), narrowing the gap left by [1], where a lower bound of Ω(...
ISBN:
(纸本)9780898716054
In [6] it is shown that every graph can be probabilistically embedded into a distribution over its spanning trees with expected distortion O(log2 n log log n), narrowing the gap left by [1], where a lower bound of Ω(log n) is established. This lower bound holds even for the class of series-parallel graphs as proved in [8]. In this paper we close this gap for series-parallel graphs, namely we prove that every n-vertex series-parallel graph can be probabilistically embedded into a distribution over its spanning trees with expected stretch O(log n) for every two vertices. We gain our upper bound by presenting a polynomial time probabilistic algorithm that constructs spanning trees with low expected stretch. This probabilistic algorithm can be derandomized to yield a deterministic polynomial time algorithm for constructing a spanning tree of a given series-parallel graph G, whose communication cost is at most O(log n) times larger than that of G.
暂无评论