One way to further exploit the reconfigurable resources of SRAM FPGAs and increase functional density is to reconfigure them during system operation. This process is referred to as Run-Time Reconfiguration (RTR). RTR is an approach to system implementation that divides an application or algorithm into time-exclusive operations that are implemented as separate configurations. The Run-Time Reconfiguration Artificial Neural Network (RRANN) is a proof-of-concept system that demonstrates the effectiveness of RTR for implementing neural networks. It implements the popular backpropagation training algorithm as three distinct time-exclusive FPGA configurations: feed-forward, backpropagation and update. System operation consists of sequencing through these three configurations at run time, one configuration at a time. RRANN has been fully implemented with Xilinx FPGAs, tested, and shown to increase the functional density of a network by up to 500% compared with FPGA-based implementations that do not use RTR.
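The sequencing described above can be sketched in software as a device that holds only one configuration at a time and a training step that loads feed-forward, backpropagation and update in turn. This is a minimal sketch under stated assumptions: `Device`, `train_step` and the three phase functions are hypothetical names, and the "network" is simplified to a single linear neuron with squared error rather than RRANN's actual multi-layer hardware design.

```python
# Hypothetical sketch of run-time reconfiguration (RTR) sequencing.
# Simplification (not RRANN's design): one linear neuron, squared-error loss.


class Device:
    """Models an FPGA that can hold only one configuration at a time."""

    def __init__(self):
        self.loaded = None  # name of the configuration currently loaded
        self.history = []   # every configuration loaded, in order

    def reconfigure(self, name):
        self.loaded = name
        self.history.append(name)


def feed_forward(state):
    # Neuron output: y = w . x
    state["y"] = sum(w * x for w, x in zip(state["w"], state["x"]))


def backpropagation(state):
    # Error term for loss 0.5 * (y - t)^2, then the gradient for each weight
    delta = state["y"] - state["t"]
    state["grad"] = [delta * x for x in state["x"]]


def update(state):
    # Gradient-descent weight update
    state["w"] = [w - state["lr"] * g for w, g in zip(state["w"], state["grad"])]


# The three time-exclusive configurations, in the order they are loaded
CONFIGURATIONS = [
    ("feed-forward", feed_forward),
    ("backpropagation", backpropagation),
    ("update", update),
]


def train_step(device, state):
    # Load each configuration, run it, then replace it with the next one
    for name, run in CONFIGURATIONS:
        device.reconfigure(name)
        run(state)


if __name__ == "__main__":
    device = Device()
    state = {"w": [0.0, 0.0], "x": [1.0, 2.0], "t": 1.0, "lr": 0.1}
    for _ in range(20):
        train_step(device, state)
    device.reconfigure("feed-forward")
    feed_forward(state)
    print(round(state["y"], 3))
```

In this sketch only the active phase's logic occupies the device at any moment, which is the property RTR uses to raise functional density; the cost it ignores is the reconfiguration time between phases.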
Many new remote procedure call (RPC) systems are being built to meet different application requirements, and much development effort has been spent on re-implementing significant parts of each RPC system. This paper describes URPC, a toolkit for prototyping new RPC systems. It allows programmers to provide high-level implementations of RPC semantics and to customize supporting RPC services, such as stub generation and name service, to match the requirements of different RPC semantics. This approach increases flexibility in constructing new RPC systems and greatly reduces coding effort. In addition, it allows application-specific optimization, both by increasing the semantic content of individual RPC calls through customization and by allowing programmers to import protocol machine implementations. As a result, the generated prototype RPC implementations can perform as fast as native RPCs.
By viewing different parallel programming paradigms as essentially heterogeneous approaches to mapping 'real-world' problems onto parallel systems, the authors discuss methodologies for integrating multiple programming models on a massively parallel system such as the Connection Machine CM5. Using a dataflow-based integration model built in the visualization software AVS, the authors describe a simple, effective and modular way to couple sequential, data-parallel and explicit message-passing modules into an integrated parallel programming environment on a CM5. A case study in numerical advection modeling demonstrates the integration of data-parallel and message-passing modules in the proposed multi-paradigm programming environment.
Applications programming for high-performance computing is notoriously difficult. Although parallel programming is intrinsically complex, the principal reason high-performance computing is difficult is the lack of effective software tools. We believe that this lack of tools is in turn largely due to market forces rather than to our inability to design and build such tools. Unfortunately, the poor availability and utilization of parallel tools hurt the entire supercomputing industry and the U.S. high-performance computing initiative, which focuses on applications. A disproportionate amount of resources is being spent on faster hardware and architectures, while tools are being neglected. This article introduces a taxonomy of tools, analyzes the major factors that contribute to this situation, suggests ways to redress the imbalance, and discusses the likely evolution of tools.
ISBN: 9780897917698 (print)
A novel methodology is presented for reducing the synchronization costs of programs compiled for SPMD execution. The methodology combines data flow analysis with communication analysis to determine the ordering between the production and consumption of data on different processors. It is shown that several commonly occurring computation patterns lend themselves well to this optimization. The framework also recognizes situations where the synchronization needs of multiple data transfers can be satisfied by a single synchronization message. Although the analysis also applies to shared-memory machines in general, it is especially useful for those with a flexible cache-coherence protocol.
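The idea that one synchronization message can cover several data transfers can be illustrated with a small grouping pass. This is a simplified sketch, not the paper's framework: it assumes each transfer is recorded as a hypothetical `(producer, consumer, variable)` tuple, and that all listed transfers fall between the same pair of synchronization points, so the consumer only needs one signal that the producer has finished writing all of them. The function names are illustrative.

```python
from collections import defaultdict

# Hypothetical representation: (producer, consumer, variable) for each data
# transfer that must be synchronized within the same program region.


def naive_sync_count(transfers):
    # Without coalescing: one synchronization message per data transfer.
    return len(transfers)


def coalesce_syncs(transfers):
    """Group transfers by (producer, consumer) pair.

    Under the assumption above, each group needs only one synchronization
    message, sent after the producer's last write in that group.
    """
    groups = defaultdict(list)
    for producer, consumer, var in transfers:
        groups[(producer, consumer)].append(var)
    return dict(groups)


if __name__ == "__main__":
    transfers = [(0, 1, "a"), (0, 1, "b"), (1, 2, "c")]
    groups = coalesce_syncs(transfers)
    print(naive_sync_count(transfers), "->", len(groups))
```

A real compiler would first need the data flow and communication analysis described above to prove that grouping is safe, for example that the consumer reads none of the variables before the producer's last write.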
Some applications, notably interactive and distributed systems, are most easily expressed in a programming language that supports concurrency. We propose extensions to the purely functional language Haskell that allow it to express explicitly concurrent applications; we call the resulting language Concurrent Haskell. The resulting system appears to be both expressive and efficient, and we give a number of examples of useful abstractions that can be built from our primitives. We have developed a freely available implementation of Concurrent Haskell and are now using it as a substrate for a graphical user interface toolkit.
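Concurrent Haskell's best-known primitives are `forkIO`, which starts a new thread, and the `MVar`, a shared box that is either empty or full, where `takeMVar` blocks while it is empty and `putMVar` blocks while it is full. The following is a rough Python analogue of that synchronization behavior, intended only to illustrate the semantics; the class `MVar` and the helper `fork` are hypothetical Python names, and Python threads are far heavier than Haskell's lightweight threads.

```python
import threading


class MVar:
    """Python analogue of an MVar: a box that is either empty or full.

    take() blocks while the box is empty; put() blocks while it is full.
    """

    _EMPTY = object()  # sentinel meaning "no value present"

    def __init__(self):
        self._value = MVar._EMPTY
        self._cond = threading.Condition()

    def take(self):
        with self._cond:
            while self._value is MVar._EMPTY:
                self._cond.wait()
            value, self._value = self._value, MVar._EMPTY
            self._cond.notify_all()  # wake any thread waiting to put
            return value

    def put(self, value):
        with self._cond:
            while self._value is not MVar._EMPTY:
                self._cond.wait()
            self._value = value
            self._cond.notify_all()  # wake any thread waiting to take


def fork(action, *args):
    # Rough stand-in for forkIO: run the action in a new thread.
    thread = threading.Thread(target=action, args=args)
    thread.start()
    return thread


if __name__ == "__main__":
    box = MVar()
    worker = fork(lambda: [box.put(i) for i in range(3)])
    print([box.take() for _ in range(3)])
    worker.join()
```

Because the box holds at most one value, a single producer and a single consumer hand values over one at a time, which is the kind of building block from which the abstract's higher-level abstractions (channels, semaphores) can be composed.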
The paper presents a partitioning and parallelizing programming environment for a novel parallel architecture. This universal embedded accelerator is based on reconfigurable datapath hardware. The environment accepts C programs and works in two steps: first, a profiling-driven host/accelerator partitioning for performance optimization; second, a resource-driven sequential/structural partitioning of the accelerator source code to optimize the utilization of its reconfigurable resources.
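To make the first, profiling-driven step concrete, here is a small greedy heuristic that moves code regions to the accelerator when the profile says they would run faster there. This is a hypothetical sketch, not the paper's algorithm: the `Region` fields, the time units and the `area_budget` limit are all assumptions, and ranking by time saved per unit of area is just one simple choice.

```python
from dataclasses import dataclass


@dataclass
class Region:
    name: str
    host_time: float   # profiled execution time on the host (hypothetical units)
    accel_time: float  # estimated execution time on the accelerator
    area: int          # reconfigurable resources needed (assumed > 0)


def partition(regions, area_budget):
    """Greedy host/accelerator split.

    Only regions that run faster on the accelerator are candidates; they are
    taken in order of time saved per unit of area while the budget allows.
    """
    candidates = [r for r in regions if r.accel_time < r.host_time]
    candidates.sort(key=lambda r: (r.host_time - r.accel_time) / r.area,
                    reverse=True)
    accel, used = [], 0
    for r in candidates:
        if used + r.area <= area_budget:
            accel.append(r.name)
            used += r.area
    host = [r.name for r in regions if r.name not in accel]
    return host, accel


if __name__ == "__main__":
    regions = [
        Region("fir", 50.0, 5.0, 4),
        Region("parse", 10.0, 12.0, 2),
        Region("fft", 40.0, 8.0, 6),
        Region("crc", 6.0, 1.0, 1),
    ]
    print(partition(regions, area_budget=6))
```

Greedy selection is fast but not optimal in general; an exact choice under a resource budget is a knapsack problem.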
An active object is a function that returns a pointer to its environment when an execution thread is attached to it. This facility of BaLinda K, a parallel Lisp dialect with an imperative appearance, is shown to be useful for constructing I/O interfaces and execution control mechanisms, and has potential as a tool for system program implementation.
Speculative evaluation, including leniency and futures, is often used to produce high degrees of parallelism. Existing speculative implementations, however, may serialize computation because of their implementation of queues of suspended threads. We give a provably efficient parallel implementation of a speculative functional language on various machine models. The implementation includes proper parallelization of the necessary queuing operations on suspended threads. Our target machine models are a butterfly network, hypercube, and PRAM. To prove the efficiency of our implementation, we provide a cost model using a profiling semantics and relate the cost model to implementations on the parallel machine models.
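The "queues of suspended threads" mentioned above arise because a thread that touches an unresolved future must wait until some other thread writes it. A minimal sequential sketch of such a future follows; the class and method names are hypothetical, and continuations stand in for suspended threads. It resumes waiters one at a time, which is exactly the kind of sequential queue handling that, according to the abstract, can serialize computation and that the paper's implementation parallelizes.

```python
from collections import deque


class Future:
    """Write-once cell. Continuations that touch it before it is written
    are suspended on a queue and resumed when the value arrives."""

    def __init__(self):
        self.ready = False
        self.value = None
        self.suspended = deque()  # continuations waiting for the value

    def touch(self, continuation):
        if self.ready:
            continuation(self.value)
        else:
            self.suspended.append(continuation)

    def write(self, value):
        if self.ready:
            raise RuntimeError("a future can be written only once")
        self.ready = True
        self.value = value
        # Sequential reactivation: one waiter at a time. A parallel
        # implementation would reactivate all waiters together instead.
        while self.suspended:
            self.suspended.popleft()(value)


if __name__ == "__main__":
    f = Future()
    f.touch(lambda v: print("first waiter got", v))
    f.touch(lambda v: print("second waiter got", v))
    f.write(42)
```

With many waiters, this loop costs time proportional to the queue length on a single processor, which is the bottleneck a provably efficient implementation must avoid.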
Prior to designing a system, customers and contractors should agree on the required black-box (externally apparent) system behavior. To define this behavior, practical, precise, design-independent methods are needed. This paper describes the results of a case study in which formal event-based approaches are used, demonstrating that a combination of history-based traces and guarded event-action statements is practical for defining black-box behavior. Externally apparent modes (states) simplify the specification and promote human understanding. The specifier may use both traces and event-action statements in a single specification, because requirements that define sequences of events are best specified using traces, whereas conditional requirements are best specified using event-action statements. Generating graphs from the model, as opposed to defining them by hand, makes this type of specification easier to define and maintain.
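The three ingredients named above (modes, guarded event-action statements and traces) can be shown together on a toy example. Everything here is hypothetical and illustrative: the pump controller, its events, the guard threshold and the trace rule are invented for the sketch, and the mode graph is generated from the rules rather than written by hand, mirroring the abstract's point about graph generation.

```python
# Hypothetical black-box specification of a toy pump controller.

# Guarded event-action statements:
# (mode, event, guard on the reported level, action, next mode)
RULES = [
    ("off", "start", lambda level: level > 10, "open_valve", "on"),
    ("on", "stop", lambda level: True, "close_valve", "off"),
    ("on", "level", lambda level: level <= 10, "close_valve", "off"),
]


def respond(mode, event, level):
    """Return (externally visible action, next mode) for one event."""
    for m, e, guard, action, nxt in RULES:
        if m == mode and e == event and guard(level):
            return action, nxt
    return None, mode  # no visible response; mode unchanged


def trace_ok(events):
    """Trace requirement: 'start' and 'stop' alternate, beginning with 'start'.
    Other events are ignored by this particular requirement."""
    expected = "start"
    for e in events:
        if e in ("start", "stop"):
            if e != expected:
                return False
            expected = "stop" if e == "start" else "start"
    return True


def mode_graph():
    # Generate the mode-transition graph from the rules instead of defining it.
    return {(m, nxt) for m, _e, _g, _a, nxt in RULES if m != nxt}


if __name__ == "__main__":
    print(respond("off", "start", 20))
    print(trace_ok(["start", "level", "stop"]))
    print(sorted(mode_graph()))
```

The conditional behavior lives in the guarded rules, while the ordering requirement is a separate trace predicate, matching the division of labor the abstract recommends.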