ISBN: 9781728165820 (print)
Cellular Automata (CA) are parallel models well suited for studying complex systems that are based on local rules of evolution. Notable examples of application are found in fluid dynamics, crowd simulation, flow simulation and many more. CA can also be fruitfully exploited as a support in numerical approaches, such as finite element and finite volume methods. Though easily parallelizable by partitioning the domain among the nodes of a parallel system, the performance and scalability of cellular automata executed on parallel/distributed machines are limited by the need to synchronize nodes at each computational step. With the aim of reducing the synchronization burden, we here present a preliminary study on techniques drawn from the Discrete Event Simulation field for the optimization of CA on distributed memory architectures. Preliminary results, obtained in a distributed memory environment, show the usefulness of the considered approach in reducing execution times and therefore in improving the speedup of the parallel execution of the test case.
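The per-step synchronization mentioned in this abstract can be illustrated with a minimal single-process sketch. It is not the authors' method: it assumes a one-dimensional, periodic, rule-90 automaton, and the function names (`step_global`, `split`, `step_partitioned`) are hypothetical. Each partition needs its neighbours' border cells (halos) before it can update, and that exchange is the synchronization point a distributed run pays for at every step.

```python
# Sketch only: a 1D periodic rule-90 automaton, chosen for brevity.
# All names here are illustrative and do not come from the paper.

def step_global(cells):
    """Reference update on the whole domain (new cell = left XOR right)."""
    n = len(cells)
    return [cells[(i - 1) % n] ^ cells[(i + 1) % n] for i in range(n)]

def split(cells, parts):
    """Partition the domain into equal chunks (assumes len(cells) % parts == 0)."""
    size = len(cells) // parts
    return [cells[p * size:(p + 1) * size] for p in range(parts)]

def step_partitioned(chunks):
    """One step over partitions; the halo gather stands in for node-to-node messages."""
    parts = len(chunks)
    # Synchronization point: every partition reads its neighbours' border cells
    # from the previous step before any partition is updated.
    halos = [(chunks[(p - 1) % parts][-1], chunks[(p + 1) % parts][0])
             for p in range(parts)]
    updated = []
    for chunk, (left, right) in zip(chunks, halos):
        padded = [left] + chunk + [right]
        updated.append([padded[i - 1] ^ padded[i + 1]
                        for i in range(1, len(padded) - 1)])
    return updated

if __name__ == "__main__":
    cells = [0] * 16
    cells[8] = 1
    chunks = split(cells, 4)
    for _ in range(5):
        cells = step_global(cells)
        chunks = step_partitioned(chunks)
    print(cells == [c for chunk in chunks for c in chunk])
```

In a real distributed run the halo gather would be a message exchange between nodes, which is why its cost grows with the number of steps rather than with the amount of local work.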
The binary-swap and the parallel-pipelined methods are two popular image composition methods for volume rendering on distributed memory multicomputers. However, these methods either restrict the number of processors t...
ISBN: 0769516777 (print)
We present the design of a global object space in a distributed Java Virtual Machine that supports parallel execution of a multi-threaded Java program on a cluster of computers. The global object space virtualizes a single Java object heap across machine boundaries to facilitate transparent object accesses. Based on the object connectivity information that is available at runtime, the objects reachable from threads at different nodes, named distributed-shared objects, are detected. With the detection of distributed-shared objects, we can alleviate overheads in maintaining memory consistency within the global object space. Several runtime optimization methods have been incorporated in the global object space design, including an object home migration method that reallocates the home of a distributed-shared object, synchronized method migration that allows the remote execution of a synchronized method at the home node of its synchronized object, and object pushing that uses the object connectivity information to improve access locality.
ISBN: 9781479984909 (print)
The trend toward efficient but more complex multicore designs has also reached the world of Digital Signal Processors (DSPs), a field where low-level programming has typically been prevalent. To overcome the additional complexity of programming multi-core and multi-chip DSP systems, we present an object-oriented framework for task-based parallel programming on the highly power-efficient Texas Instruments TMS320C6678 platform. Our framework incorporates hardware architectural details of this platform, such as DMA units, in a high-level manner while maintaining portability, guiding the path for algorithm designers from PCs to embedded DSP platforms. The whole framework has been designed and implemented with real-time requirements and low overhead in mind, which is crucial for the acceptance of higher-level solutions on embedded systems.
ISBN: 9781665414555 (print)
Stream processing applications are spread across different sectors of industry and people's daily lives. The growing volume of data we produce, such as audio, video, images, and text, demands fast and efficient computation. This can be achieved through stream parallelism, which is still a challenging task and mostly reserved for experts. We introduce a stream processing framework for assessing Parallel Programming Interfaces (PPIs). Our framework targets multi-core architectures and C++ stream processing applications, providing an API that abstracts the details of the stream operators of these applications. Users can therefore easily identify all the basic operators and implement parallelism through different PPIs. In this paper, we present the proposed framework, implement three applications using its API, and show how it works by using it to parallelize and evaluate the applications with the PPIs Intel TBB, FastFlow, and SPar. The performance results were consistent with the literature.
Parallel processing of thresholding based on image between-class variance (BCV) is studied in this paper. In parallel processing, a frame of image is divided into M sub-images with the same size. Computation of the no...
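As a rough illustration of BCV thresholding over M equal sub-images, the sketch below assumes the standard Otsu-style formulation: each sub-image contributes a grey-level histogram, the histograms are summed, and the threshold maximizing the between-class variance is selected. The abstract is truncated here, so the way the paper actually divides and combines the work is not known; the function names (`histogram`, `merge`, `bcv_threshold`) are hypothetical.

```python
# Sketch only: Otsu-style between-class variance on merged sub-image histograms.
# The decomposition used in the paper is not visible in the truncated abstract.

def histogram(pixels, levels=256):
    """Grey-level histogram of one sub-image (independent per sub-image)."""
    hist = [0] * levels
    for value in pixels:
        hist[value] += 1
    return hist

def merge(hists):
    """Combine per-sub-image histograms by element-wise addition."""
    return [sum(column) for column in zip(*hists)]

def bcv_threshold(hist):
    """Return the first threshold t maximizing the between-class variance.

    Class 0 holds levels <= t, class 1 holds levels > t.
    """
    total = sum(hist)
    total_sum = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0 = 0  # pixel count of class 0
    s0 = 0  # grey-level sum of class 0
    for t, count in enumerate(hist):
        w0 += count
        s0 += t * count
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        m0 = s0 / w0
        m1 = (total_sum - s0) / w1
        var = w0 * w1 * (m0 - m1) ** 2 / (total * total)
        if var > best_var:
            best_var, best_t = var, t
    return best_t

if __name__ == "__main__":
    image = [10] * 50 + [200] * 50                        # toy bimodal "image"
    subs = [image[i * 25:(i + 1) * 25] for i in range(4)]  # M = 4 sub-images
    print(bcv_threshold(merge([histogram(s) for s in subs])))
```

Because histograms add, the per-sub-image step needs no communication until the final merge, which is what makes this decomposition attractive for parallel execution.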
An edge detection process in computer vision and image processing detects any type of significant feature appearing as a discontinuity in intensity. This paper presents our experience with parallelizing an edge detection application algorithm that reduces noise and unnecessary detail in a gray-scale image from a coarse level to a fine level of resolution by using an edge focusing technique. Numerical methods and parallel implementations of edge focusing are presented. The image detection algorithms are implemented on three representative message-passing architectures: a low-cost heterogeneous PVM network, an Intel iPSC/860 hypercube, and a CM-5 massively parallel multicomputer. Our objectives are to provide insight into implementation and performance issues for image processing applications on general-purpose message-passing architectures, to investigate the implications of network variations, and to evaluate the computing scalability of the three network systems by examining execution and communication patterns of the image edge detection application.
ISBN: 0769507719 (print)
URL, or layer-5, switches can be used to implement locally and globally distributed web sites. URL switches must be able to exploit knowledge of server load and content (e.g., of reverse caches). Implementing globally distributed web sites poses difficulties not present in local server clusters due to bandwidth and delay constraints in the Internet. With delayed load information, server selection methods based on choosing the least-loaded server will result in oscillations in network and server load. In this paper, methods that make effective use of delayed load information are described and evaluated. The new Pick-KX method is developed and shown to be better than existing methods. Load information is adjusted with probabilistic information using Bloom filter summaries of site content. A combined load and content metric is suggested for selecting the best server in a globally distributed site.
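The Bloom filter summaries mentioned in this abstract can be sketched as follows. This is a generic Bloom filter, not the paper's implementation: the class name, the bit-array size, the number of hash functions, and the SHA-256-based hashing are all assumptions made for illustration. A switch holding such a summary can test whether a server probably holds a URL without storing the full content list; false positives are possible, false negatives are not.

```python
import hashlib

# Sketch only: a generic Bloom filter. The sizes and the hashing scheme are
# illustrative assumptions, not taken from the paper.

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        """Derive num_hashes bit positions by salting SHA-256 with an index."""
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "probably present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

if __name__ == "__main__":
    summary = BloomFilter()
    for url in ("/index.html", "/img/logo.png"):
        summary.add(url)
    print("/index.html" in summary)
```

The summary is small enough to be sent between sites periodically, which fits the setting of delayed, approximate information described in the abstract.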
ISBN: 0769529097 (print)
A difference scheme for noise removal based on fourth-order partial differential equations is suggested. It can approximate the actual image while preserving edges and avoiding blocky effects in image processing. Numerical results demonstrate its efficiency and indicate a better choice of parameters.
ISBN: 0819429139 (print)
We explore the filtering properties of wavelet functions in order to develop accurate and efficient numerical algorithms for image restoration problems. We propose a parallel implementation for MIMD distributed memory environments. The key insight of our approach is the use of distributed versions of the Level 3 Basic Linear Algebra Subprograms as computational building blocks and the use of the Basic Linear Algebra Communication Subprograms as communication building blocks for advanced architecture computers. The use of these low-level mathematical software libraries guarantees the development of efficient, portable and scalable high-level algorithms and hides many details of the parallelism from the user's point of view. Numerical experiments on a simulated image restoration application are shown. The parallel software has been tested on a 12-node IBM SP2 available at the Center for Research on Parallel Computing and Supercomputers in Naples (Italy).