ISBN: 3540297693 (print)
This article presents the C++ library vShark, which reduces the intra-node communication overhead of parallel programs on clusters of SMPs. The library is built on top of message-passing libraries such as MPI to provide thread-safe communication and, most importantly, to improve communication between threads within one SMP node. vShark uses a modular but transparent design that makes it independent of specific communication libraries. Thus, different subsystems such as MPI, CORBA, or PVM could also be used for low-level communication. We present an implementation of vShark based on MPI and the POSIX thread library, and show that the efficient intra-node communication of vShark improves the performance of parallel algorithms.
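The layering described in this abstract can be illustrated with a minimal sketch. It is not vShark's API (vShark is a C++ library, and the abstract gives no interface details): the class `NodeLocalRouter`, its methods, the `(node, thread)` address format, and the `remote_send` callback are all hypothetical names chosen for illustration. The sketch only shows the general idea of delivering same-node messages through shared memory while delegating everything else to a pluggable low-level transport.

```python
import queue
import threading
from typing import Any, Callable, Dict, Tuple

# Hypothetical address: (node id, thread id). Not taken from vShark.
Address = Tuple[int, int]


class NodeLocalRouter:
    """Routes messages; same-node traffic bypasses the low-level transport."""

    def __init__(self, node_id: int, remote_send: Callable[[Address, Any], None]):
        self.node_id = node_id
        self._remote_send = remote_send      # stand-in for MPI, PVM, CORBA, ...
        self._mailboxes: Dict[int, queue.Queue] = {}
        self._lock = threading.Lock()        # protects the mailbox table

    def register(self, thread_id: int) -> None:
        """Create a mailbox for a thread on this node."""
        with self._lock:
            self._mailboxes.setdefault(thread_id, queue.Queue())

    def send(self, dest: Address, payload: Any) -> str:
        node, thread_id = dest
        if node == self.node_id:
            # Intra-node: hand the object over through shared memory.
            with self._lock:
                box = self._mailboxes[thread_id]
            box.put(payload)
            return "local"
        # Inter-node: delegate to whichever subsystem was plugged in.
        self._remote_send(dest, payload)
        return "remote"

    def recv(self, thread_id: int, timeout: float = 1.0) -> Any:
        with self._lock:
            box = self._mailboxes[thread_id]
        return box.get(timeout=timeout)


# Usage: thread 0 sends to thread 1 on the same node, then to another node.
sent_remotely = []
router = NodeLocalRouter(node_id=0,
                         remote_send=lambda d, p: sent_remotely.append((d, p)))
router.register(0)
router.register(1)

received = []
worker = threading.Thread(target=lambda: received.append(router.recv(1)))
worker.start()
local_path = router.send((0, 1), "hello")
worker.join()
remote_path = router.send((1, 0), "world")
```

Keeping the transport behind a single callback is one way to stay independent of a specific communication library. A real implementation would also have to address message matching, buffer ownership, and the thread-safety level of the underlying library.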
ISBN: 3540297693 (print)
In this paper, an adaptive parallel ant colony optimization (APACO) is developed. We propose two different strategies for information exchange between the processors: selection based on sorting and selection based on distance. With either strategy, each processor chooses a partner to communicate with and updates its pheromone according to the partner's pheromone. To increase search ability and avoid early convergence, we also propose a method that adaptively adjusts the time interval of information exchange according to the convergence factor of each processor. Experimental results on the traveling salesman problem, obtained on the massively parallel processor (MPP) Dawn 2000, demonstrate that the proposed APACO is superior to classical ant colony optimization.
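The abstract names the two partner-selection strategies but does not define them, so the following is only a sketch under stated assumptions. It assumes that "sorting" means ranking processors by their best tour length, that "distance" means the Euclidean distance between pheromone matrices, and that the pheromone update is a convex blend controlled by a hypothetical weight `mix`. None of these rules is confirmed by the source.

```python
import math
from typing import List

Matrix = List[List[float]]


def partner_by_sorting(best_lengths: List[float], me: int) -> int:
    """Assumed rule: pick the processor with the shortest best tour, excluding me."""
    ranked = sorted((i for i in range(len(best_lengths)) if i != me),
                    key=lambda i: best_lengths[i])
    return ranked[0]


def matrix_distance(a: Matrix, b: Matrix) -> float:
    """Euclidean (Frobenius) distance between two pheromone matrices."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def partner_by_distance(pheromones: List[Matrix], me: int) -> int:
    """Assumed rule: pick the processor whose pheromone differs most from mine."""
    others = [i for i in range(len(pheromones)) if i != me]
    return max(others,
               key=lambda i: matrix_distance(pheromones[me], pheromones[i]))


def blend(mine: Matrix, partner: Matrix, mix: float) -> Matrix:
    """Assumed update: convex combination of my pheromone and the partner's."""
    return [[(1.0 - mix) * m + mix * p for m, p in zip(row_m, row_p)]
            for row_m, row_p in zip(mine, partner)]


# Three processors, each with a 2x2 pheromone matrix and a best tour length.
pheromones = [
    [[1.0, 1.0], [1.0, 1.0]],
    [[1.0, 2.0], [2.0, 1.0]],
    [[1.0, 5.0], [5.0, 1.0]],
]
best_lengths = [120.0, 95.0, 110.0]

p_sort = partner_by_sorting(best_lengths, me=0)   # processor 1 has the shortest tour
p_dist = partner_by_distance(pheromones, me=0)    # processor 2 is farthest from 0
updated = blend(pheromones[0], pheromones[p_sort], mix=0.5)
```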
This paper describes a parallel processing implementation for neural computing and its application to finite element mesh decomposition. The parallelized neural network software is based on the public-domain, NASA-developed program NETS 2.01, which in turn is based on the back propagation algorithm of Rumelhart et al. [Learning internal representations by error propagation. In Parallel Distributed Processing: Explorations in the Microstructure of Cognition (Edited by D. E. Rumelhart and J. L. McClelland), Vol. 1: Foundations. MIT Press, MA (1986)]. The principal focus of this research is the parallel implementation. Comparisons between sequential and parallel versions are given. Finally, a structural design problem concerned with finite element mesh generation is solved using the parallel neural network software. (C) 1997 Civil-Comp Ltd and Elsevier Science Ltd.
ISBN: 9781467347211 (print)
Physiological monitoring can be useful in a number of scenarios to evaluate or diagnose the status of individuals or groups, whether for physical or mental health. The devices used to collect this data have become increasingly portable, but deriving useful metrics from such data often requires significant processing, a commodity not always available in mobile environments. This paper presents and evaluates a system designed to easily process physiological signals in mobile environments by utilising commonly available smartphone hardware to collaboratively transform collected data.
ISBN: 9781467325851; 9781467325837 (print)
The aim of this paper is to demonstrate how, by integrating unsupervised and supervised parallel neural clustering methods on a GPU platform, we can carry out fast image segmentation with a satisfactory compromise between the topological preservation of the original image and the minimization of the quantization error, also known as clustering accuracy. To this end, an unsupervised parallel clustering method inspired by the Extended SOM (ESOM) and powered by a Learning Vector Quantization (LVQ)-like algorithm is proposed. Then, its parallel supervised version is presented to further minimize the quantization error when proper prototypes of the desired clusters are known. Finally, the GPU implementations of both methods are illustrated to show how we can support time-critical tasks such as real-time surveillance, interactive medical diagnosis, and control of dynamical systems. The performance of the GPU implementation is discussed with the help of small examples and realistic processing tasks.
ISBN: 3540297693 (print)
In this paper, we propose a holistic approach to realizing survivability of distributed information network systems for critical applications (DISCA). The approach is based on the three basic states of information: processed, stored, and transmitted (called a PST-based system model). Its evaluation method and some experimental results are given as an example of its application. A PST-based system model brings all three parts together and coordinates them through the services they support, so that the whole system's survivability is embodied by system services and their interdependency relations. With this model, a multi-layer survivability framework based on the information states is formed, and the complexity of implementing and evaluating a DISCA system can be managed with the prevalent "divide and conquer" approach.
ISBN: 0780342291 (print)
The paper describes the concept and implementation of a data cache architecture with concurrent, conflict-free access to shared data for DSPs with parallel, synchronized processing units. It utilizes techniques known from object-oriented software design to achieve efficient and programmer-friendly on-chip storage of data. The cache internally uses virtual 1D or 2D address spaces directly assigned to data structures instead of a conventional, linear address space. Data within the cache are distributed across a number of memory banks. Virtual local addresses are used for data location and hit/miss detection to minimize cost and memory latency. The object-oriented cache is fully transparent to the programmer and the compiler, reduces the number of address calculations to be performed, exploits the 2D spatial locality typical of image processing algorithms, and can be integrated into a standard RISC processor pipeline.
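To make the idea of distributing a 2D address space across memory banks concrete, here is a minimal sketch of one common interleaving scheme. It is an assumption for illustration, not the mapping used in the paper: the bank counts `BANKS_X` and `BANKS_Y` and the functions `bank_of` and `local_address` are hypothetical. Interleaving both axes means that any window of `BANKS_X` by `BANKS_Y` elements touches each bank exactly once, so parallel units reading such a window do not collide.

```python
from typing import Set, Tuple

BANKS_X = 4   # assumed number of banks along x
BANKS_Y = 2   # assumed number of banks along y


def bank_of(x: int, y: int) -> int:
    """Map a 2D virtual address to a bank index by interleaving both axes."""
    return (y % BANKS_Y) * BANKS_X + (x % BANKS_X)


def local_address(x: int, y: int, width: int) -> int:
    """Address inside the selected bank for an array `width` elements wide."""
    cols_per_bank = (width + BANKS_X - 1) // BANKS_X
    return (y // BANKS_Y) * cols_per_bank + (x // BANKS_X)


def window_is_conflict_free(x0: int, y0: int) -> bool:
    """Check that a BANKS_X x BANKS_Y window touches every bank exactly once."""
    banks: Set[int] = {bank_of(x0 + dx, y0 + dy)
                       for dy in range(BANKS_Y) for dx in range(BANKS_X)}
    return len(banks) == BANKS_X * BANKS_Y


# Check every window origin (x, y) with x < 12 and y < 14, aligned or not.
all_free = all(window_is_conflict_free(x, y)
               for y in range(14) for x in range(12))

# Distinct elements of a 16 x 16 array never share both bank and local address.
width, height = 16, 16
slots: Set[Tuple[int, int]] = {(bank_of(x, y), local_address(x, y, width))
                               for y in range(height) for x in range(width)}
unique_slots = len(slots) == width * height
```

Because the bank index depends only on the low-order parts of `x` and `y`, it can be computed without a multiplication when the bank counts are powers of two. This matches the abstract's goal of reducing address calculations, although the paper's actual scheme may differ.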
A scalable wallet-size cluster computing system has been proposed to provide a low-cost, low-power, small-sized digital home processing center. In this paper, we apply the wallet-size cluster for H.264 enc...
ISBN: 9781467385237 (print)
K-Means is one of the major clustering algorithms thanks to its simplicity and performance. Clustering is widely used in applications that involve image processing, machine intelligence, and other fields. This work discusses an enhanced parallel implementation of K-Means clustering using Cilk Plus and OpenMP on the CPU and CUDA on the GPU. Results are presented for datasets and images of varying sizes, with a focus on relatively large data. Different numbers of features and clusters are also considered.
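For readers unfamiliar with the algorithm being parallelized, the following is a minimal sequential sketch of standard K-Means (Lloyd's algorithm) in plain Python. It does not reproduce the paper's Cilk Plus, OpenMP, or CUDA code, and the function names are illustrative. The comments mark the two steps that parallel implementations typically target: the per-point assignment, which is independent for every point, and the per-cluster mean, which is a reduction.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points: List[Point], centroids: List[Point]) -> List[int]:
    """Assignment step: each point is independent, so this loop parallelizes."""
    return [min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points]


def update(points: List[Point], labels: List[int],
           centroids: List[Point]) -> List[Point]:
    """Update step: a per-cluster mean, i.e. a reduction over the points."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, lab in zip(points, labels) if lab == k]
        if not members:
            new_centroids.append(old)     # keep an empty cluster's centroid
            continue
        new_centroids.append(tuple(sum(p[d] for p in members) / len(members)
                                   for d in range(len(old))))
    return new_centroids


def kmeans(points: List[Point], centroids: List[Point], max_iter: int = 100):
    """Alternate the two steps until the assignments stop changing."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        centroids = update(points, labels, centroids)
        new_labels = assign(points, centroids)
        if new_labels == labels:          # converged: assignments are stable
            break
        labels = new_labels
    return labels, centroids


points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (10.0, 0.0)])
```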
ISBN: 9781612848372 (print)
Power systems are tending to form large-scale networks with numerous interconnected subsystems. To maintain the reliability and stability of the power grid, multiple analysis systems with specific hardware platforms are deployed in the electrical power dispatching centre, which makes software maintenance and hardware management more complicated and cumbersome. This paper describes a common task-based parallel platform for power system analysis and stability assessment, on which multiple analysis systems and applications can each be implemented while sharing the same hardware and support software. The computers in the platform are assigned as data servers, computation workstations, or console workstations. The computation workstations take one of two roles: one acts as the manager and the others act as nodes. The support software includes 9 programs: file transfer service, configuration service, database service, raw data service, task partitioning interface, task dispatching service, calculation interface, result processing interface, and computation management service. Analysis systems and applications can easily be integrated into the platform by developing 3 programs: the task partitioning program, the calculation program, and the result processing program. The parallel platform is designed to meet the needs of different applications, which reduces hardware costs and shortens the development cycle. It also helps to enhance the reliability of power system operations.
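The integration model in this abstract, where an application supplies only a task partitioning program, a calculation program, and a result processing program, can be sketched as follows. This is an illustrative assumption rather than the platform's actual interface: `run_analysis` and the three example callables are hypothetical names, and a thread pool stands in for the node workstations that the manager would dispatch tasks to.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List


def run_analysis(job: Any,
                 partition: Callable[[Any], List[Any]],
                 calculate: Callable[[Any], Any],
                 process_results: Callable[[List[Any]], Any],
                 nodes: int = 4) -> Any:
    """Manager role: split the job, dispatch tasks to nodes, merge the results."""
    tasks = partition(job)
    # Worker threads stand in for the node workstations (an assumption).
    with ThreadPoolExecutor(max_workers=nodes) as pool:
        results = list(pool.map(calculate, tasks))   # map preserves task order
    return process_results(results)


# Hypothetical application: score contingency cases and report the worst one.
def partition_cases(cases: List[int]) -> List[List[int]]:
    """Task partitioning program: split the case list into batches of two."""
    return [cases[i:i + 2] for i in range(0, len(cases), 2)]


def worst_in_batch(batch: List[int]) -> int:
    """Calculation program: evaluate one batch on one node."""
    return max(batch)


def worst_overall(batch_maxima: List[int]) -> int:
    """Result processing program: combine the per-batch results."""
    return max(batch_maxima)


worst = run_analysis([3, 9, 4, 7, 1],
                     partition_cases, worst_in_batch, worst_overall)
```

Separating the three callables from the dispatching logic is what allows different analysis applications to share the same hardware and support software, which is the benefit the abstract claims for the platform.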