ISBN: 9781509010332 (print)
We developed a software framework for boundary element method (BEM) analyses. The software supports a hybrid parallel programming model and is equipped with a hierarchical matrix (H-matrix) library to accelerate the BEM analysis.
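The abstract does not say how the H-matrix library partitions the BEM matrix. A common approach, sketched here under that assumption, builds a binary cluster tree over the boundary elements and marks a block as admissible (approximable by a low-rank factorization) when min(diam τ, diam σ) ≤ η · dist(τ, σ). The 1D points, leaf size, and η = 1 are illustrative choices, not the framework's settings, and all names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """A node of a binary cluster tree over 1D points, bounded by [lo, hi]."""
    lo: float
    hi: float
    points: list
    children: list = field(default_factory=list)

    @property
    def diam(self):
        return self.hi - self.lo


def build_tree(points, leaf_size=4):
    """Recursively bisect the sorted points until clusters hold <= leaf_size points."""
    pts = sorted(points)
    node = Cluster(pts[0], pts[-1], pts)
    if len(pts) > leaf_size:
        mid = len(pts) // 2
        node.children = [build_tree(pts[:mid], leaf_size), build_tree(pts[mid:], leaf_size)]
    return node


def dist(a, b):
    """Gap between the two bounding intervals (0 if they touch or overlap)."""
    return max(0.0, max(a.lo, b.lo) - min(a.hi, b.hi))


def admissible(a, b, eta=1.0):
    """Standard admissibility: min(diam(a), diam(b)) <= eta * dist(a, b)."""
    d = dist(a, b)
    return d > 0 and min(a.diam, b.diam) <= eta * d


def partition(a, b, eta=1.0, blocks=None):
    """Split block a x b into low-rank (admissible) blocks and dense near-field leaf blocks."""
    if blocks is None:
        blocks = []
    if admissible(a, b, eta):
        blocks.append(("lowrank", len(a.points), len(b.points)))
    elif not a.children or not b.children:
        blocks.append(("dense", len(a.points), len(b.points)))
    else:
        for ca in a.children:
            for cb in b.children:
                partition(ca, cb, eta, blocks)
    return blocks


if __name__ == "__main__":
    tree = build_tree([i / 64 for i in range(64)])
    blocks = partition(tree, tree)
    print(sum(1 for kind, _, _ in blocks if kind == "lowrank"), "low-rank blocks")
```

Admissible blocks would then be compressed (for example with adaptive cross approximation) and dense blocks computed exactly; that step is omitted here.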
ISBN: 9781509021611 (print)
Hyper-threading (HT) technology allows one thread to execute its task while another thread is stalled waiting for a shared resource or for other operations to complete, thereby reducing processor idle time. When HT is enabled, the operating system sees two logical cores per physical core, which gives each physical core the ability to run two threads simultaneously. However, HT does not necessarily make a parallel code run twice as fast as it would on the physical cores alone. This happens when two threads try to access the same shared CPU resource: their instructions can then only be executed one after another. In this case, parallel CPU-bound code could attain only a small speedup improvement from HT on a quad-core platform (Intel i5-2410M @ 2.30 GHz).
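To observe the effect described above on one's own machine, one can time the same CPU-bound workload with an increasing number of workers, up to the logical-core count. This sketch uses Python's `concurrent.futures.ProcessPoolExecutor` (processes rather than threads, because CPython's GIL serializes CPU-bound threads). `os.cpu_count()` reports logical cores, so on an HT machine the last rows run two workers per physical core. The workload size, job count, and function names are arbitrary assumptions, and no particular numbers are claimed.

```python
import os
import time
from concurrent.futures import ProcessPoolExecutor


def cpu_bound(n):
    """Pure-Python arithmetic loop that keeps one core's execution units busy."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def timed_run(workers, jobs=16, n=200_000):
    """Wall-clock seconds to finish `jobs` CPU-bound tasks on `workers` processes.

    Includes pool start-up time, so use a larger n for cleaner numbers."""
    start = time.perf_counter()
    with ProcessPoolExecutor(max_workers=workers) as pool:
        list(pool.map(cpu_bound, [n] * jobs))
    return time.perf_counter() - start


def report():
    logical = os.cpu_count() or 1  # logical cores: two per physical core when HT is on
    counts = [w for w in (1, 2, 4, 8, 16, 32, 64) if w < logical] + [logical]
    t1 = timed_run(1)
    for w in counts:
        tw = t1 if w == 1 else timed_run(w)
        print(f"{w:3d} workers: speedup {t1 / tw:5.2f}")


if __name__ == "__main__":
    report()
```

If the abstract's observation holds on a given machine, the speedup should grow roughly with the worker count up to the number of physical cores and then flatten when the HT siblings start competing for the same execution units.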
ISBN: 9781509006700 (print)
This paper presents experiments on mapping algorithmic structure patterns to implementation patterns. The selection of implementation patterns and data structures must take into account the parallel platform for which a program is developed, and this choice also affects the program's performance. The experimental results support the need for adaptive patterns in parallel programming, so that software can run on different parallel environments.
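The idea that one algorithmic structure can be realised by different implementation patterns can be made concrete with a small sketch. It assumes the terminology of Mattson, Sanders and Massingill's pattern language: Task Parallelism as the algorithm structure, and Loop Parallelism and Master/Worker as implementation-level patterns. The same independent prime-counting tasks are mapped onto both. The task set and function names are illustrative, not taken from the paper.

```python
import math
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


def count_primes(lo, hi):
    """Count primes p with lo <= p < hi by trial division (one independent task)."""
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, math.isqrt(n) + 1))
    return sum(1 for n in range(lo, hi) if is_prime(n))


def make_tasks(limit, n_tasks):
    """Task Parallelism: split [0, limit) into independent ranges."""
    step = -(-limit // n_tasks)  # ceiling division
    return [(lo, min(lo + step, limit)) for lo in range(0, limit, step)]


def loop_parallelism(tasks, workers):
    """Loop Parallelism: a parallel map over the task list."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda t: count_primes(*t), tasks))


def master_worker(tasks, workers):
    """Master/Worker: the master fills a shared queue; workers pull tasks until it is empty."""
    todo = queue.Queue()
    for t in tasks:
        todo.put(t)
    partials, lock = [], threading.Lock()

    def worker():
        local = 0
        while True:
            try:
                lo, hi = todo.get_nowait()
            except queue.Empty:
                break
            local += count_primes(lo, hi)
        with lock:
            partials.append(local)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for th in threads:
        th.start()
    for th in threads:
        th.join()
    return sum(partials)
```

Threads keep the sketch self-contained, but in CPython the GIL prevents CPU-bound threads from running in parallel. Timing the two versions here would therefore say more about the interpreter than about the patterns. That platform dependence is exactly what the abstract argues adaptive patterns should account for.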
ISBN: 9781509027729 (print)
Increasingly sophisticated, complex, and energy-efficient cyber-physical systems and wireless sensor networks are emerging, facilitated by recent advances in computing and sensor technologies. Integrating cyber-physical systems and wireless sensor networks with other contemporary technologies, such as unmanned aerial vehicles and fog or edge computing, enables the creation of completely new smart solutions. We present the concept of a Smart Mobile Access Point (SMAP), a key building block for a smart network, and propose an efficient placement approach for SMAPs. SMAPs predict the behavior of the network based on information collected from it and select the best approach to support the network at any given time. When needed, they autonomously change their positions to obtain a better configuration from the network performance perspective. Placement of SMAPs is therefore an important issue in such a system. The initial placement of SMAPs is an NP problem, and evolutionary algorithms provide an efficient means to solve it. Specifically, we present a parallel implementation of the imperialistic competitive algorithm, together with an efficient evaluation (fitness) function, to solve the initial placement of SMAPs in the fog computing context.
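The following is a compact sketch of the imperialist competitive algorithm (ICA) applied to a simplified placement problem. Everything problem-specific is an assumption. SMAPs are points in a square area, sensors are fixed points, and the hypothetical fitness `placement_cost` is the mean sensor-to-nearest-SMAP distance; the paper's actual evaluation function is not reproduced. Parallelism is confined to fitness evaluation, which is handed to any `concurrent.futures` executor. The angular deviation term of the original ICA is omitted.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def placement_cost(flat, sensors):
    """Hypothetical fitness: mean distance from each sensor to its nearest SMAP (lower is better)."""
    smaps = list(zip(flat[0::2], flat[1::2]))
    return sum(min(math.dist(s, m) for m in smaps) for s in sensors) / len(sensors)


def ica(sensors, n_smaps, pool, area=100.0, n_countries=40, n_imp=4, iters=60,
        beta=2.0, p_rev=0.1, xi=0.1, seed=0):
    """Minimal imperialist competitive algorithm; fitness batches run via pool.map."""
    rng = random.Random(seed)
    fit = partial(placement_cost, sensors=sensors)
    evaluate = lambda pop: list(pool.map(fit, pop))  # the parallel part
    new_country = lambda: [rng.uniform(0, area) for _ in range(2 * n_smaps)]

    countries = [new_country() for _ in range(n_countries)]
    costs = evaluate(countries)
    order = sorted(range(n_countries), key=costs.__getitem__)
    imps = [countries[i] for i in order[:n_imp]]
    imp_cost = [costs[i] for i in order[:n_imp]]
    power = [max(imp_cost) - c + 1e-9 for c in imp_cost]
    cols = [[] for _ in range(n_imp)]        # colonies of each empire
    col_cost = [[] for _ in range(n_imp)]
    for i in order[n_imp:]:                  # stronger empires tend to receive more colonies
        k = rng.choices(range(n_imp), weights=power)[0]
        cols[k].append(countries[i])
        col_cost[k].append(costs[i])

    for _ in range(iters):
        for k in range(len(imps)):
            # Assimilation toward the imperialist; revolution replaces a colony at random.
            cols[k] = [new_country() if rng.random() < p_rev else
                       [min(max(c + beta * rng.random() * (m - c), 0.0), area)
                        for c, m in zip(col, imps[k])]
                       for col in cols[k]]
            col_cost[k] = evaluate(cols[k])
            if cols[k]:                      # a better colony takes over its empire
                j = min(range(len(cols[k])), key=col_cost[k].__getitem__)
                if col_cost[k][j] < imp_cost[k]:
                    imps[k], cols[k][j] = cols[k][j], imps[k]
                    imp_cost[k], col_cost[k][j] = col_cost[k][j], imp_cost[k]
        if len(imps) == 1:
            continue
        # Imperialistic competition: the weakest empire loses its weakest colony.
        total = [imp_cost[k] + xi * (sum(col_cost[k]) / len(cols[k]) if cols[k] else 0.0)
                 for k in range(len(imps))]
        weak = max(range(len(imps)), key=total.__getitem__)
        rivals = [k for k in range(len(imps)) if k != weak]
        worst = total[weak]
        win = rng.choices(rivals, weights=[worst - total[k] + 1e-9 for k in rivals])[0]
        if cols[weak]:
            j = max(range(len(cols[weak])), key=col_cost[weak].__getitem__)
            cols[win].append(cols[weak].pop(j))
            col_cost[win].append(col_cost[weak].pop(j))
        else:                                # an empire without colonies collapses
            cols[win].append(imps[weak])
            col_cost[win].append(imp_cost[weak])
            for lst in (imps, imp_cost, cols, col_cost):
                del lst[weak]

    best = min(range(len(imps)), key=imp_cost.__getitem__)
    return imps[best], imp_cost[best]
```

With CPython, a `ProcessPoolExecutor` (used under an `if __name__ == "__main__":` guard) is needed for real CPU parallelism. The fitness is built with `functools.partial` around a module-level function so that it can be pickled.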
This article presents a method for enhancing the efficiency of Grid scheduling algorithms by grouping jobs according to their priorities and grouping Grid machines according to their configuration, before applying a suitable scheduling algorithm within each pair of groups. The Priority method groups jobs into four groups, while two different methods, Similar Together and Evenly Distributed, group machines into four groups; the Min-Min Grid scheduling algorithm is then run simultaneously within the paired groups. Running the scheduling algorithms simultaneously within paired groups (multi-scheduling) ensures a high degree of parallelism, increases throughput, and improves the overall performance of the scheduling algorithms. Two sets of controlled experiments were carried out on an HPC system. Analysis of the results shows that the Priority Grouping method improved scheduling efficiency by very large margins over the non-grouping method. © 2014 Elsevier Inc. All rights reserved.
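The grouping and multi-scheduling steps can be sketched as follows, assuming a simple model in which each job has a length, each machine has a speed, and execution time is length / speed. The specific choices are assumptions, not details from the article: priority 1 is taken as most urgent, Similar Together is modelled as contiguous speed bands, Evenly Distributed as round-robin dealing of machines sorted by speed, and job group g is paired with machine group g. All function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def min_min(jobs, machines):
    """Min-Min on one group: jobs = {name: length}, machines = {name: speed}.

    Repeatedly picks, among all pending jobs, the job whose earliest completion time
    is smallest and assigns it to the machine achieving that time."""
    ready = {m: 0.0 for m in machines}
    pending = dict(jobs)
    schedule = []
    while pending:
        finish, job, mach = min((ready[m] + length / speed, j, m)
                                for j, length in pending.items()
                                for m, speed in machines.items())
        ready[mach] = finish
        schedule.append((job, mach, finish))
        del pending[job]
    return schedule, max(ready.values(), default=0.0)


def group_jobs(jobs, priority, n_groups=4):
    """Priority grouping: group g holds the jobs with priority g + 1 (1 = most urgent, assumed)."""
    groups = [{} for _ in range(n_groups)]
    for j, length in jobs.items():
        groups[priority[j] - 1][j] = length
    return groups


def group_machines(machines, mode, n_groups=4):
    """'similar' = Similar Together (contiguous speed bands); 'even' = Evenly Distributed."""
    ranked = sorted(machines, key=machines.get, reverse=True)  # fastest first
    groups = [{} for _ in range(n_groups)]
    for i, m in enumerate(ranked):
        g = i * n_groups // len(ranked) if mode == "similar" else i % n_groups
        groups[g][m] = machines[m]
    return groups


def multi_schedule(jobs, priority, machines, mode="similar"):
    """Run Min-Min concurrently on each (job group, machine group) pair."""
    pairs = zip(group_jobs(jobs, priority), group_machines(machines, mode))
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(lambda p: min_min(*p), pairs))
```

For example, `min_min({"a": 4, "b": 2}, {"x": 1, "y": 2})` first places `b` on the faster machine `y` (finishing at 1.0), then `a` also on `y` (finishing at 3.0), because that still beats 4.0 on `x`.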
The simulation of magnetically geared electrical machines using the finite element method is especially demanding when movement has to be considered. Several methods that facilitate movement exist. In this paper, two of them, the macro air-gap element (AGE) and the moving band (MB), are applied in a time-stepped static simulation of a magnetically geared machine (MGM). The methods are evaluated in terms of accuracy and computational efficiency, both vitally important for numerical optimization. The implementations of both methods exploit the multi-core architecture of modern CPUs to solve several time steps in parallel, drastically reducing simulation time. Nevertheless, the computational cost of the AGE is prohibitively high in the simulation of MGMs, whereas the MB is computationally efficient and achieves good accuracy with a multilayer approach.
Streaming applications process possibly infinite streams of data and often have both high-throughput and low-latency requirements. They consist of operator graphs that produce and consume data tuples. General streaming applications use stateful, selective, and user-defined operators. The stream programming model naturally exposes task and pipeline parallelism, enabling it to exploit parallel systems of all kinds, including large clusters. However, data parallelism must either be introduced manually by programmers or extracted as an optimization by compilers. Previous data-parallel optimizations did not apply to selective, stateful, and user-defined operators. This article presents a compiler and runtime system that automatically extracts data parallelism for general stream processing. Data parallelization is safe if the transformed program has the same semantics as the original sequential version. The compiler forms parallel regions while considering operator selectivity, state, partitioning, and graph dependencies. The distributed runtime system ensures that tuples always exit parallel regions in the same order they would without data parallelism, using the most efficient strategy identified by the compiler. Our experiments using 100 cores across 14 machines show linear scalability for computation-bound parallel regions and near-linear scalability when tuples are shuffled across parallel regions.
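The ordering guarantee at the exit of a parallel region can be illustrated with one simple strategy: a splitter tags each tuple with a sequence number, replicas process tuples concurrently, and a merger holds results in a reorder buffer until the next expected sequence number arrives. This is only one possible strategy, and it assumes a stateless operator; the article's runtime chooses among strategies, and none of its actual mechanisms are reproduced here. To support a selective operator, which may drop a tuple (modelled here as returning `None`), replicas still forward the sequence number so that the merger can advance. The function name is hypothetical.

```python
import heapq
import queue
import threading


def parallel_region(tuples, operator, n_workers=4):
    """Apply a stateless `operator` in parallel, emitting results in input order.

    `operator` may return None to drop a tuple (a selective operator)."""
    inboxes = [queue.Queue() for _ in range(n_workers)]
    outbox = queue.Queue()
    DONE = object()

    def worker(inbox):
        while (item := inbox.get()) is not DONE:
            seq, tup = item
            outbox.put((seq, operator(tup)))  # drops travel as (seq, None) so order can advance
        outbox.put(DONE)

    threads = [threading.Thread(target=worker, args=(q,)) for q in inboxes]
    for t in threads:
        t.start()
    for seq, tup in enumerate(tuples):        # splitter: round-robin with sequence numbers
        inboxes[seq % n_workers].put((seq, tup))
    for q in inboxes:
        q.put(DONE)

    heap, next_seq, finished, out = [], 0, 0, []
    while finished < n_workers:               # merger: reorder buffer keyed by sequence number
        item = outbox.get()
        if item is DONE:
            finished += 1
            continue
        heapq.heappush(heap, item)
        while heap and heap[0][0] == next_seq:
            _, result = heapq.heappop(heap)
            if result is not None:
                out.append(result)
            next_seq += 1
    for t in threads:
        t.join()
    return out
```

Collecting the output into a list keeps the sketch testable. A real streaming merger would forward each tuple downstream as soon as it leaves the reorder buffer.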
We propose a parallel algorithm for computing exact solutions to the problem of minimizing the number of multileaf collimator apertures needed in step-and-shoot intensity modulated radiotherapy. These problems become very challenging as the problem size increases. Here, we investigate how advanced parallel computing methods can be applied to them, focusing on the issues that are peculiar to parallel search algorithms and do not arise in their serial counterparts. A previous paper by the authors presented the MU-RD method for solving such problems using a serial search method based on constraint programming; this method serves as the starting point for the parallel implementation. The key challenges in creating a parallel implementation are ensuring that the CPUs are not starved of work and avoiding unnecessary computation caused by the rearrangement of the search order in the parallel version. We show that efficient parallel optimisation is possible by dynamically changing the way work is split, using potentially multiple tree search processes as well as parallel search of nodes. A weakly sorted queueing system ensures appropriate prioritisation of tasks. Numerical results demonstrate the effectiveness of our algorithms in scaling from 8 to 64 CPUs.
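The abstract names a weakly sorted queueing system but does not describe it. One plausible reading, sketched here purely as an assumption, is a bucketed priority queue. Tasks are binned by a coarse priority (for example, a search node's bound) and served FIFO within each bin, so the most promising bins are served first without the cost of keeping every task in exact order. The class name and the `width` parameter are hypothetical.

```python
import threading
from collections import deque


class WeaklySortedQueue:
    """Approximate priority queue (hypothetical design): tasks are binned by
    priority // width and kept FIFO within a bin, so ordering is only
    guaranteed between bins. Lower priority values are served first."""

    def __init__(self, width=1.0):
        self.width = width
        self.bins = {}              # bin index -> deque of tasks
        self.lock = threading.Lock()  # shared by all search worker threads

    def push(self, priority, task):
        with self.lock:
            self.bins.setdefault(int(priority // self.width), deque()).append(task)

    def pop(self):
        """Return a task from the lowest non-empty bin, or None if the queue is empty."""
        with self.lock:
            if not self.bins:
                return None
            key = min(self.bins)
            bucket = self.bins[key]
            task = bucket.popleft()
            if not bucket:
                del self.bins[key]
            return task

    def __len__(self):
        with self.lock:
            return sum(len(b) for b in self.bins.values())
```

Pushing `(3.7, "c")`, `(0.2, "a")`, `(0.9, "b")`, `(3.1, "d")` with `width=1.0` pops `a, b, c, d`. Task `d` leaves after `c` despite its lower priority value, because only the order between bins is enforced. A wider bin trades prioritisation accuracy for cheaper queue operations.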
ISBN: 9781509036837 (print)
In this paper, we propose a parallel metaheuristic based on ant colony optimization for solving the maximum-weight clique problem, a variant of the maximum clique problem. The proposed parallel computing model is based on cooperation among multiple ant colonies. The cooperative system consists of a message center and a number of ant colonies, each of which explores the solution space using its own search strategy. The message center first collects solution information from the different ant colonies and then shares the current best solution with them. The performance of the proposed method was evaluated on a set of standard benchmark instances from the literature, and the results were compared with those obtained by the CPLEX solver and with the best solutions reported in the literature. The experimental results are encouraging.
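The cooperation scheme can be sketched as follows, with the caveat that the abstract gives neither the construction rule nor the pheromone model. The sketch therefore assumes vertex-based pheromone trails, a probabilistic clique construction weighted by pheromone^alpha × weight^beta, and colonies that differ only in their (alpha, beta) strategy. The message center follows the two steps named in the abstract: collect each colony's best solution, then share the current best with every colony. All class and function names are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor


class Colony:
    """One ant colony with its own search strategy (alpha, beta) and vertex pheromones."""

    def __init__(self, adj, weight, alpha, beta, rho=0.1, ants=10, seed=0):
        self.adj, self.weight = adj, weight
        self.alpha, self.beta, self.rho, self.ants = alpha, beta, rho, ants
        self.tau = {v: 1.0 for v in adj}
        self.rng = random.Random(seed)
        self.scale = sum(weight.values())

    def value(self, clique):
        return sum(self.weight[v] for v in clique)

    def build_clique(self):
        """Grow a maximal clique by adding vertices adjacent to every current member."""
        v = self.rng.choice(list(self.adj))
        clique, cand = {v}, set(self.adj[v])
        while cand:
            options = sorted(cand)
            w = [self.tau[u] ** self.alpha * self.weight[u] ** self.beta for u in options]
            u = self.rng.choices(options, weights=w)[0]
            clique.add(u)
            cand &= self.adj[u]
        return clique

    def reinforce(self, clique, evaporate=True):
        if evaporate:
            for v in self.tau:
                self.tau[v] *= 1 - self.rho
        for v in clique:
            self.tau[v] += self.value(clique) / self.scale

    def run_round(self):
        """Every ant builds a clique; the round's best one is reinforced."""
        best = max((self.build_clique() for _ in range(self.ants)), key=self.value)
        self.reinforce(best)
        return best


def message_center(adj, weight, strategies, rounds=30):
    """Collect each colony's best clique per round, then share the global best with all."""
    colonies = [Colony(adj, weight, a, b, seed=i) for i, (a, b) in enumerate(strategies)]
    best = set()
    with ThreadPoolExecutor(max_workers=len(colonies)) as pool:
        for _ in range(rounds):
            for clique in pool.map(Colony.run_round, colonies):  # colonies run concurrently
                if sum(weight[v] for v in clique) > sum(weight[v] for v in best):
                    best = clique
            for colony in colonies:
                colony.reinforce(best, evaporate=False)
    return best
```

Each colony owns its random generator and pheromone table, so the concurrent rounds share only the read-only graph.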
ISBN: 9781467386159 (print)
Sequence comparison problems, such as sequence alignment and approximate string matching, are fundamental problems in many fields, including natural language processing, data mining, and bioinformatics. However, the algorithms proposed to address them have high computational complexity, which prevents them from being widely used in practical large-scale settings. Many researchers have used parallel programming to reduce the execution time of these algorithms. In this paper, we follow this approach and use the parallelism of the Graphics Processing Unit (GPU) to accelerate one of the most common algorithms for computing the edit distance between two strings, known as the Levenshtein distance. To take full advantage of the large number of cores in a GPU, we employ a diagonal-based tracing technique, which yields even greater improvements in running time. Our CUDA implementation of the Levenshtein algorithm is about 11× faster than the sequential implementation, without any loss of accuracy.
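The diagonal-based idea can be shown on the CPU. Cells on the same anti-diagonal i + j = d of the dynamic-programming table depend only on the two previous anti-diagonals, so all cells of one anti-diagonal can be computed at the same time. In the sketch below, the inner loop over one anti-diagonal is the part that would be spread across GPU threads, with a synchronization between diagonals. This is a sequential Python illustration of the technique, not the authors' CUDA code, and the function name is hypothetical.

```python
def levenshtein_antidiagonal(a, b):
    """Edit distance computed one anti-diagonal (i + j = d) at a time.

    Every cell on diagonal d depends only on diagonals d - 1 and d - 2, so the
    cells of one diagonal are independent; on a GPU each could map to a thread."""
    n, m = len(a), len(b)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        D[i][0] = i  # i deletions
    for j in range(m + 1):
        D[0][j] = j  # j insertions
    for d in range(2, n + m + 1):
        # Interior cells on this diagonal: 1 <= i <= n and 1 <= j = d - i <= m.
        for i in range(max(1, d - m), min(n, d - 1) + 1):
            j = d - i
            cost = 0 if a[i - 1] == b[j - 1] else 1
            D[i][j] = min(D[i - 1][j] + 1,         # deletion
                          D[i][j - 1] + 1,         # insertion
                          D[i - 1][j - 1] + cost)  # match or substitution
    return D[n][m]
```

For instance, `levenshtein_antidiagonal("kitten", "sitting")` returns 3. A GPU version would usually keep only the last three diagonals in memory instead of the full table.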