This is an overview of the existing criteria for the failure of composite materials and of the results of applying some of them to simulate a low-speed impact on a composite material for the three-dimens...
ISBN:
(Print) 9781467390057
Graphlets represent small induced subgraphs and are becoming increasingly important for a variety of applications. Despite the importance of the local subgraph (graphlet) counting problem, existing work focuses mainly on counting graphlets globally over the entire graph. These global counts have been used for tasks such as graph classification as well as for understanding and summarizing the fundamental structural patterns in graphs. In contrast, this work proposes an accurate, efficient, and scalable parallel framework for the more challenging problem of counting graphlets locally for a given edge or set of edges. The local graphlet counts provide a topologically rigorous characterization of the local structure surrounding an edge. The aim of this work is to obtain the count of every graphlet of size k for each edge. The framework gives rise to efficient, parallel, and accurate unbiased estimation methods with provable error bounds, as well as exact algorithms for counting graphlets locally. Experiments demonstrate the effectiveness of the proposed exact and estimation methods on various datasets. In particular, the exact methods show strong scaling results (11-16x on 16 cores). Moreover, our estimation framework is accurate with error less than 5% on average.
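As a self-contained illustration of what a local (per-edge) graphlet count is, the sketch below counts the smallest connected graphlet beyond an edge itself, the triangle, for every edge of a tiny undirected graph via neighbor-set intersection. This is a toy sketch of the problem statement, not the paper's parallel framework; all names and the example graph are illustrative.

```python
from collections import defaultdict

def local_triangle_counts(edges):
    """For each undirected edge (u, v), count the triangles containing it,
    i.e. the number of common neighbors of u and v."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    # Per-edge local count = |N(u) & N(v)|
    return {(u, v): len(adj[u] & adj[v]) for u, v in edges}

# A 4-clique: every edge lies in exactly 2 triangles.
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
counts = local_triangle_counts(edges)
print(counts[(0, 1)])  # 2
```

Exact frameworks like the one described generalize this idea to all graphlets of size k per edge; estimation variants sample neighborhoods instead of intersecting them exhaustively.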
ISBN:
(Print) 9781509032280
A parallel implementation of a surface reconstruction algorithm is presented. This algorithm uses the vector field surface representation and was adapted in a previous work by the authors to handle large scale environment reconstruction. Two parallel implementations with different memory requirements and processing speeds are described and compared. These parallel implementations increase the vector field computation speed by a factor of up to 31 times relative to a purely serial implementation. The method is demonstrated on different datasets captured on the sites of Hydro-Quebec using a variety of sensors: LiDAR, sonar and the WireScan, an underwater laser scanner designed at our laboratory.
ISBN:
(Print) 9783319322438; 9783319322421
Particle Swarm Optimization (PSO) is a heuristic technique that has been used to solve problems where many events occur simultaneously and small pieces of the problem can collaborate to reach a solution. Among its advantages are fast convergence, large exploration coverage, and adequate global optimization; however, to address the premature convergence problem, modifications to the basic model have been developed, such as Aging Leader and Challengers (ALC) PSO and Bio-inspired Aging (BAM) PSO. These algorithms being parallel in nature, some authors have attempted different approaches to apply PSO using MPI and GPU. Nevertheless, ALC-PSO and BAM-PSO have not been implemented in parallel. For this study, we develop PSO, ALC-PSO and BAM-PSO through MPI and GPU using the High Performance Computing Cluster (HPCC) Agave. The results suggest that ALC-PSO and BAM-PSO reduce premature convergence, improving global precision, whilst BAM-PSO achieves better optima at the expense of significantly increasing the algorithm's computational complexity.
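For readers unfamiliar with the base model, here is a minimal sketch of the standard global-best PSO update (inertia plus cognitive and social pulls). The constants and the sphere objective are illustrative choices; none of the ALC or BAM aging machinery, nor the MPI/GPU parallelization, is included.

```python
import random

def pso_minimize(f, dim, n_particles=20, iters=200, seed=1):
    """Basic global-best PSO on [-5, 5]^dim; constants are illustrative."""
    rng = random.Random(seed)
    w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive, and social weights
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

sphere = lambda x: sum(v * v for v in x)
best, best_val = pso_minimize(sphere, dim=2)
```

The inner per-particle loop is what makes the method naturally parallel: each particle's velocity and position update depends only on its own state plus the shared global best.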
ISBN:
(Print) 9781509028238
We introduce a new algorithm for an unbounded concurrent double-ended queue (deque). Like the bounded deque of Herlihy, Luchangco, and Moir on which it is based, the new algorithm is simple and obstruction free, has no pathological long-latency scenarios, avoids interference between operations at opposite ends, and requires no special hardware support beyond the usual compare-and-swap. To the best of our knowledge, no prior concurrent deque combines these properties with unbounded capacity, or provides consistently better performance across a wide range of concurrent workloads.
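To make the compare-and-swap retry discipline concrete, here is a heavily simplified, single-threaded sketch in the spirit of such array-based designs: slots hold a "right null" marker, a push locates the boundary and claims a slot with a versioned compare-and-swap, and a failed CAS simply retries. The real HLM algorithm atomically coordinates two boundary cells, supports both ends, and relies on hardware CAS rather than the lock-simulated one below; `TinyDeque` and `Cell` are illustrative names, not the paper's.

```python
import threading

class Cell:
    """One array slot with a version counter, so a CAS can detect interference."""
    def __init__(self, val):
        self.val, self.ver = val, 0
        self._lock = threading.Lock()  # stands in for a hardware CAS

    def cas(self, exp_val, exp_ver, new_val):
        with self._lock:
            if self.val == exp_val and self.ver == exp_ver:
                self.val, self.ver = new_val, self.ver + 1
                return True
            return False

RN = "RN"  # "right null" marker, as in the HLM bounded deque

class TinyDeque:
    def __init__(self, size=8):
        self.cells = [Cell(RN) for _ in range(size)]  # starts empty

    def right_push(self, v):
        while True:  # obstruction-free retry loop
            # Oracle: locate the leftmost RN cell (the right boundary).
            k = next(i for i, c in enumerate(self.cells) if c.val == RN)
            c = self.cells[k]
            if c.cas(RN, c.ver, v):  # claim the slot atomically
                return
            # CAS failed: a concurrent operation interfered; retry.

    def right_pop(self):
        while True:
            occupied = [i for i, c in enumerate(self.cells) if c.val != RN]
            if not occupied:
                return None  # deque is empty
            c = self.cells[occupied[-1]]
            v = c.val
            if c.cas(v, c.ver, RN):
                return v

d = TinyDeque()
d.right_push("a"); d.right_push("b")
print(d.right_pop(), d.right_pop())  # b a
```

The version counters are what make the retry loop obstruction-free rather than merely lock-free in spirit: an operation running alone will always see a stable version and succeed.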
ISBN:
(Print) 9783319556680; 9783319556697
In the present paper, an approach to solving global optimization problems using a nested optimization scheme is developed. The use of different algorithms at different nesting levels is the novel element. A complex serial algorithm (on CPU) is used at the upper level, and a simple parallel algorithm (on GPU) is used at the lower level. This computational scheme has been implemented in the ExaMin parallel solver. The results of computational experiments demonstrating the speedup when solving a series of test problems are presented.
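The nesting idea can be illustrated on a two-variable problem: an outer one-dimensional search fixes x, and for each x an inner brute-force scan over y (each evaluation independent, hence a stand-in for the GPU stage) solves the lower-level subproblem. This is a toy sketch of the scheme's structure, not the ExaMin algorithms; the function and grid sizes are hypothetical.

```python
def nested_minimize(f, x_range, y_range, outer_steps=101, inner_steps=101):
    """Outer serial scan over x; inner brute-force scan over y.
    The inner evaluations are independent (trivially parallelizable)."""
    (x0, x1), (y0, y1) = x_range, y_range
    best = None
    for i in range(outer_steps):  # serial outer level (CPU in the paper's scheme)
        x = x0 + (x1 - x0) * i / (outer_steps - 1)
        # Inner level: every y-evaluation is independent of the others.
        inner = min(
            (f(x, y0 + (y1 - y0) * j / (inner_steps - 1)),
             x,
             y0 + (y1 - y0) * j / (inner_steps - 1))
            for j in range(inner_steps)
        )
        if best is None or inner < best:
            best = inner
    return best  # (f(x*, y*), x*, y*)

f = lambda x, y: (x - 1.0) ** 2 + (y + 2.0) ** 2
val, x, y = nested_minimize(f, (-5, 5), (-5, 5))
print(round(val, 6), round(x, 1), round(y, 1))  # 0.0 1.0 -2.0
```

In the paper's setting the outer level is an adaptive global search rather than a grid scan, but the division of labor is the same: a sophisticated serial driver over few variables, a massively parallel sweep over the rest.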
ISBN:
(Print) 9780791857496
In recent years, significant research effort has been invested in the development of mesh-free methods for different types of continuum problems. Prominent amongst these methods are the element free Galerkin (EFG) method, RKPM, and the meshless local Petrov-Galerkin (MLPG) method. Most of these methods employ a set of nodes for discretization of the problem domain, and use a moving least squares (MLS) approximation to generate shape functions. Of these methods, the MLPG method is seen as a pure meshless method since it does not require any background mesh. The accuracy and flexibility of the MLPG method are well established for a variety of continuum problems. However, most of the applications have been limited to small-scale problems solvable on serial machines. Very few attempts have been made to apply it to large-scale problems, which typically involve many millions (or even billions) of nodes and would require the use of parallel algorithms based on domain decomposition. Such parallel techniques are well established in the context of mesh-based methods. Extension of these algorithms in conjunction with the MLPG method requires considerable further research. The objective of this paper is to spell out the challenges which need urgent attention to enable the application of meshless methods to large-scale problems. We specifically address the issue of the solution of large-scale linear problems, which necessarily requires the use of iterative solvers. We focus on the application of the BiCGSTAB method and an appropriate set of preconditioners for the solution of the MLPG system.
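To make the solver step concrete, below is a compact, dependency-free sketch of BiCGSTAB with a simple Jacobi (diagonal) preconditioner applied by row scaling. It is a textbook illustration on a tiny dense system, not the MLPG solver or the preconditioner set discussed in the paper.

```python
def bicgstab(A, b, tol=1e-10, max_iter=200):
    """Textbook BiCGSTAB for a small dense system (lists of lists)."""
    n = len(b)
    mv = lambda M, v: [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
    dot = lambda u, v: sum(a * c for a, c in zip(u, v))
    x = [0.0] * n
    r = [bi - ri for bi, ri in zip(b, mv(A, x))]
    r_hat = r[:]  # fixed shadow residual
    rho = alpha = omega = 1.0
    v = p = [0.0] * n
    for _ in range(max_iter):
        rho_new = dot(r_hat, r)
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = [ri + beta * (pi - omega * vi) for ri, pi, vi in zip(r, p, v)]
        v = mv(A, p)
        alpha = rho / dot(r_hat, v)
        s = [ri - alpha * vi for ri, vi in zip(r, v)]
        if dot(s, s) ** 0.5 < tol:  # early convergence at the half step
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        t = mv(A, s)
        omega = dot(t, s) / dot(t, t)
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * ti for si, ti in zip(s, t)]
        if dot(r, r) ** 0.5 < tol:
            break
    return x

def jacobi_scale(A, b):
    """Left Jacobi preconditioning: scale each row by 1 / A[i][i]."""
    n = len(b)
    As = [[A[i][j] / A[i][i] for j in range(n)] for i in range(n)]
    bs = [b[i] / A[i][i] for i in range(n)]
    return As, bs

A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [1.0, 2.0, 3.0]
x = bicgstab(*jacobi_scale(A, b))
residual = max(abs(sum(A[i][j] * x[j] for j in range(3)) - b[i]) for i in range(3))
print(residual < 1e-8)  # True
```

BiCGSTAB is chosen in such settings because MLS-based discretizations generally yield nonsymmetric system matrices, ruling out plain conjugate gradients.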
ISBN:
(Print) 9781467388153
Delaunay tessellations are fundamental data structures in computational geometry. They are important in data analysis, where they can represent the geometry of a point set or approximate its density. The algorithms for computing these tessellations at scale perform poorly when the input data is unbalanced. We investigate the use of k-d trees to evenly distribute points among processes and compare two strategies for picking split points between domain regions. Because the resulting point distributions no longer satisfy the assumptions of existing parallel Delaunay algorithms, we develop a new parallel algorithm that adapts to its input and prove its correctness. We evaluate the new algorithm using two late-stage cosmology datasets. The new running times are up to 50 times faster using the k-d tree compared with a regular grid decomposition. Moreover, on the unbalanced datasets, decomposing the domain into a k-d tree is up to five times faster than decomposing it into a regular grid.
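Two common strategies for picking a k-d tree split point are the coordinate median (which balances point counts per side) and the domain midpoint (which balances volumes); whether these are exactly the two strategies compared in the paper is not stated in the abstract. The one-level sketch below, on hypothetical clustered data, shows how they diverge on unbalanced input.

```python
def split_median(points, axis):
    """Split at the median coordinate: equal point counts per side."""
    vals = sorted(p[axis] for p in points)
    cut = vals[len(vals) // 2]
    left = [p for p in points if p[axis] < cut]
    right = [p for p in points if p[axis] >= cut]
    return cut, left, right

def split_midpoint(points, axis, lo, hi):
    """Split at the middle of the domain: equal volumes, not equal counts."""
    cut = (lo + hi) / 2.0
    left = [p for p in points if p[axis] < cut]
    right = [p for p in points if p[axis] >= cut]
    return cut, left, right

# Unbalanced data: a tight cluster near x = 0 plus one far-away point.
pts = [(0.01 * i, 0.0) for i in range(9)] + [(9.0, 0.0)]
_, l_med, r_med = split_median(pts, axis=0)
_, l_mid, r_mid = split_midpoint(pts, axis=0, lo=0.0, hi=10.0)
print(len(l_med), len(r_med))  # 5 5
print(len(l_mid), len(r_mid))  # 9 1
```

The median split keeps per-process work even regardless of clustering, which is exactly what regular grids fail to do on late-stage cosmology data.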
Due to the recent increase in the volume of data being generated, organizing this data has become one of the biggest problems in Computer Science. Among the different strategies proposed to deal with this task efficiently and effectively, we highlight those related to clustering, more specifically density-based clustering strategies, which stand out for their ability to define clusters of arbitrary shape and their robustness in the presence of noise, such as DBSCAN and OPTICS. However, these algorithms remain a computational challenge since they are distance-based proposals. In this work we present a new approach to make OPTICS feasible, based on a data indexing strategy. Despite the simplicity with which the data are indexed, using graphs, the structure allows exploring various parallelization opportunities, which we exploit using a graphics processing unit (GPU). Based on this structure, the complexity of OPTICS is reduced to O(E * log V) in the worst case, making it very fast. In our evaluation we show that our proposal can be over 200x faster than its sequential CPU version.
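The abstract does not spell out the indexing details, but the general idea of trading repeated range queries for a precomputed neighborhood graph can be sketched as follows: build a k-NN graph once (the expensive, GPU-friendly part), then read each point's OPTICS core distance directly off its sorted edge list. This is a toy sketch with an illustrative convention for counting neighbors, not the paper's algorithm.

```python
import math

def knn_graph(points, k):
    """Brute-force k-NN graph; in practice built once and reused (GPU-friendly)."""
    adj = {}
    for i, p in enumerate(points):
        nbrs = sorted((math.dist(p, q), j)
                      for j, q in enumerate(points) if j != i)[:k]
        adj[i] = nbrs  # sorted list of (distance, neighbor)
    return adj

def core_distances(adj, min_pts):
    """OPTICS core distance, here taken as the distance to the min_pts-th
    nearest *other* point (conventions on counting the point itself vary)."""
    return {i: nbrs[min_pts - 1][0] for i, nbrs in adj.items()}

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (10.0, 10.0)]
adj = knn_graph(points, k=3)
cd = core_distances(adj, min_pts=2)
print(cd[0])       # 1.0: the corner point's 2nd-nearest neighbor
print(cd[4] > 10)  # True: the outlier has a large core distance
```

Once neighbor lookups are graph edges rather than full range scans, the reachability sweep touches each edge a bounded number of times, which is where the O(E * log V) bound comes from.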
ISBN:
(Print) 9781509021406
Many high-performance distributed memory applications rely on point-to-point messaging using the Message Passing Interface (MPI). Due to the latency of the network, and other costs, this communication can limit the scalability of an application when run on high node counts of distributed memory supercomputers. Communication costs are further increased on modern multi- and many-core architectures, when using more than one MPI process per node, as each process sends and receives messages independently, inducing multiple latencies and contention for resources. In this paper, we use shared memory constructs available in the MPI 3.0 standard to implement an aggregated communication method to minimize the number of inter-node messages to reduce these costs. We compare the performance of this Minimal Aggregated SHared Memory (MASHM) messaging to the standard point-to-point implementation on large-scale supercomputers, where we see that MASHM leads to enhanced strong scalability of a weighted Jacobi relaxation. For this application, we also see that the use of shared memory parallelism through MASHM and MPI 3.0 can be more efficient than using Open Multi-Processing (OpenMP). We then present a model for the communication costs of MASHM which shows that this method achieves its goal of reducing latency costs while also reducing bandwidth costs. Finally, we present MASHM as an open source library to facilitate the integration of this efficient communication method into existing distributed memory applications.
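The aggregation idea itself can be shown without MPI: messages from several ranks on one node that are destined for ranks on another node are packed into a single buffer per node pair, reducing the number of network messages and hence the number of latencies paid. The toy model below counts messages before and after aggregation; it is a sketch of the counting argument, not the MASHM API.

```python
from collections import defaultdict

def count_pairwise(messages, node_of):
    """Standard point-to-point: one network message per inter-node (src, dst) rank pair."""
    return sum(1 for src, dst, _ in messages if node_of[src] != node_of[dst])

def aggregate(messages, node_of):
    """Pack all messages sharing a (src_node, dst_node) pair into one buffer."""
    buffers = defaultdict(list)
    for src, dst, payload in messages:
        if node_of[src] != node_of[dst]:
            buffers[(node_of[src], node_of[dst])].append((src, dst, payload))
    return buffers  # one network send per key

# 4 ranks, 2 per node; every rank sends to every other rank.
node_of = {0: 0, 1: 0, 2: 1, 3: 1}
msgs = [(s, d, b"x") for s in range(4) for d in range(4) if s != d]
print(count_pairwise(msgs, node_of))  # 8 inter-node point-to-point sends
print(len(aggregate(msgs, node_of))) # 2 aggregated sends (one per node pair)
```

In MASHM the packing happens in MPI 3.0 shared-memory windows, so intra-node ranks fill the shared buffer directly and only one rank per node pair touches the network.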