ISBN: 0818626720 (print)
In this paper, we propose the design of a library environment, called PARUL (Parallel User Library), for distributed memory multiprocessor systems. An important feature of the environment is that it allows the data distributed for use by a library function, as well as the results generated by that function, to be retained in the network of processors for use by subsequent library functions. The user of the library is given full control over the set of variables that are retained in the network. We describe the implementation details of PARUL on a multi-transputer system and discuss its performance.
ISBN: 9783540680673 (print)
Data redistribution of parallel data representations has become an important factor of grid frameworks for scientific computing. Providing the developers with generalized interfaces for flexible parallel data redistribution is a major goal of this research. In this article we present the architecture and the implementation of the redistribution module of TGrid. TGrid is a grid-enabled runtime system for applications consisting of cooperating multiprocessor tasks (M-tasks). The data redistribution module enables TGrid components to transfer data structures to other components which may be located on the same local subnet or may be executed remotely. We show how the parallel data redistribution is designed to be flexible, extendible, scalable, and particularly easy-to-use. The article includes a detailed experimental analysis of the redistribution module by providing a comparison of throughputs which were measured for a large range of processors and for different interconnection networks.
A new secret-sharing-based e-auction scheme is proposed. Distributed bid opening is employed to protect bid privacy. It can achieve all the desired properties for sealed-bid auctions at a reasonable cost. Moreover, at...
In data parallel applications, a major source of load imbalance is the uneven distribution of data between the nodes. The major contribution of this paper is the analysis of a new distributed load balancing scheme ...
This paper studies hierarchical configuration of distributed systems for achieving optimized system performance. A distributed system consists of a collection of local processes which are distributed over the network ...
Artificial neural networks can solve complex problems such as time series prediction, handwritten pattern recognition or speech processing. Though software simulations are essential when one sets about to study a new ...
ISBN: 9781479983919 (print)
Recent neuromorphic applications now use spiking neural networks (SNNs) because of their improved computational power compared to previous generations of neural networks. Efficient simulation is essential when using this type of neuron since many events have to be handled on a large number of neurons within the network. In this demonstration, a hardware simulator for SNNs that has applications in image recognition is presented. This SNN uses synchrony processing for efficient event-driven simulation (SPEEDS), which allows parallel computation of synchronized events. SPEEDS differs from common event-driven approaches that serialize every event and can significantly improve the computational efficiency of an SNN simulator. The hardware SNN is implemented on a Xilinx Virtex-6 XC6VLX240T field-programmable gate array (FPGA) and can contain 131 072 neurons. It can process approximately 70 million spikes per second on a 4-bank architecture clocked at 100 MHz. The presentation explains how such a system can be used for image processing tasks like image segmentation, feature extraction and pattern matching to realize a recognition system that can detect several objects in a given image.
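The throughput figures quoted above can be put on a per-cycle footing with a few lines of arithmetic. The sketch below is only a back-of-the-envelope reading of the abstract: the even split of spikes and neurons across the four banks is an assumption made for illustration, not something the abstract states, and the variable names are hypothetical.

```python
# Figures quoted in the abstract.
NEURONS = 131_072            # network capacity
SPIKES_PER_SECOND = 70e6     # "approximately 70 million spikes per second"
CLOCK_HZ = 100e6             # 100 MHz clock
BANKS = 4                    # 4-bank architecture


def spikes_per_cycle(spikes_per_second, clock_hz):
    """Average number of spikes processed per clock cycle."""
    return spikes_per_second / clock_hz


total = spikes_per_cycle(SPIKES_PER_SECOND, CLOCK_HZ)

# Assumption (not stated in the abstract): load and neurons are
# spread evenly over the banks.
per_bank = total / BANKS
neurons_per_bank = NEURONS // BANKS

print(f"spikes per cycle (all banks): {total:.3f}")
print(f"spikes per cycle per bank:    {per_bank:.3f}")
print(f"neurons per bank:             {neurons_per_bank}")
```

Under these assumptions the quoted rate corresponds to somewhat less than one spike per clock cycle across the whole device.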
ISBN: 9781509060580 (print)
As multicore architectures dominate mainstream computing platforms, migrating legacy applications into their parallel representation becomes a viable approach to reaping the benefits of multicore computing. In this paper we present a dataflow analysis tool that helps programmers exploit coarse-grained pipeline parallelism in stream-like Java applications on multicores. With this tool, programmers can partition a source Java program into a set of regions which, as pipeline stages, are connected via data channels to execute on multicores. To this end, we propose a simple yet effective framework that leverages JVMTI (JVM Tool Interface) and Javaagent techniques to track the data communication patterns among different regions, whereby a stream graph of the program is constructed. The graph is further used by the framework and programmers to refactor the Java application into a pipelined program so that the potential of the multicores can be fully utilized. This procedure can be repeated in several rounds to progressively improve the performance. By applying this tool to several selected benchmarks, we demonstrate the effectiveness of the approach in terms of the performance improvements of some stream-like Java applications.
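The target structure described above, regions acting as pipeline stages connected by data channels, can be sketched in a few lines. This is a minimal illustration of the general pattern only, written in Python rather than Java; it is not the tool's output or API, and the names `stage` and `run_pipeline` are hypothetical. Python threads are used here to show the structure, not to demonstrate multicore speedup.

```python
import queue
import threading

_DONE = object()  # sentinel marking the end of the stream


def stage(fn, inbox, outbox):
    """Run one pipeline stage: apply fn to each item until the sentinel arrives."""
    while True:
        item = inbox.get()
        if item is _DONE:
            outbox.put(_DONE)  # propagate shutdown to the next stage
            return
        outbox.put(fn(item))


def run_pipeline(items, fns):
    """Connect one thread per function with FIFO channels and collect the results."""
    channels = [queue.Queue() for _ in range(len(fns) + 1)]
    threads = [
        threading.Thread(target=stage, args=(fn, channels[i], channels[i + 1]))
        for i, fn in enumerate(fns)
    ]
    for t in threads:
        t.start()

    # Feed the first channel, then signal the end of the stream.
    for item in items:
        channels[0].put(item)
    channels[0].put(_DONE)

    # Drain the last channel until the sentinel comes through.
    results = []
    while True:
        out = channels[-1].get()
        if out is _DONE:
            break
        results.append(out)

    for t in threads:
        t.join()
    return results
```

For example, `run_pipeline([1, 2, 3], [lambda x: x + 1, lambda x: x * 2])` pushes each item through an increment stage and then a doubling stage. Because each stage is a single thread reading from a FIFO queue, the output order matches the input order.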
ISBN: 9781538649756 (print)
Writing correct and efficient parallel programs is hard. A lack of overview leads to errors in control- and dataflow, e.g., race conditions, which are hard to find due to their nondeterministic nature. In this paper, we present a graphical programming model for parallel stream processing applications, which improves the overview by visualizing high-level dataflow together with explicit and concise annotations for concurrency-related dependency information. The key idea of our approach is twofold: First, we present a powerful graphical task editor together with annotations that enable the designer to define stream properties, task dependencies, and routing information. These annotations facilitate fine-granular and correct parallelization. Second, we propose seamless integration with the safe parallel programming language Rust by providing automated code structure generation from the graphical representation, design patterns for common parallel programming constructs like filters, and a scheduling and runtime environment. We demonstrate the applicability of our approach with a network-based processing system as it is typically found in advanced firewalls.
ISBN: 9781509028252 (print)
The programming of heterogeneous clusters is inherently complex, as these architectures require programmers to manage both distributed memory and computational units of very different natures. Fortunately, there has been extensive research on the development of frameworks that raise the level of abstraction of cluster-based applications, thus enabling the use of programming models that are much more convenient than the traditional one based on message passing. One such proposal is the Hierarchically Tiled Array (HTA), a data type that represents globally distributed arrays on which it is possible to perform a wide range of data-parallel operations. In this paper we explore for the first time the development of heterogeneous applications for clusters using HTAs. In order to use a high-level API for the heterogeneous parts of the application as well, we developed them using the Heterogeneous Programming Library (HPL), which operates on top of OpenCL while providing much better programmability. Our experiments show that this approach is a very attractive alternative, as it obtains large programmability benefits with respect to a traditional implementation based on MPI and OpenCL, while presenting average performance overheads of only around 2%.