ISBN (print): 9781479907298
Graph algorithms play a prominent role in several fields of science and engineering. Notable among them are graph traversal, finding the connected components of a graph, and computing shortest paths. There are several efficient implementations of these problems on a variety of modern multiprocessor architectures. In recent times, the graphs corresponding to real-world data sets have been growing in size. Parallelism offers only limited succor here, as current parallel architectures have severe shortcomings when deployed for most graph algorithms. At the same time, these graphs are also becoming very sparse. This calls for work-efficient solutions aimed at processing large, sparse graphs on modern parallel architectures. In this paper, we introduce graph pruning as a technique that aims to reduce the size of the graph. Certain elements of the graph can be pruned depending on the nature of the computation. Once a solution is obtained for the pruned graph, it is extended to the entire graph. We apply this technique to three fundamental graph algorithms: breadth-first search (BFS), connected components (CC), and all-pairs shortest paths (APSP). To validate our technique, we implement our algorithms on a heterogeneous platform consisting of a multicore CPU and a GPU, where we achieve an average improvement of 35% over state-of-the-art solutions. Such an improvement has the potential to speed up other applications that rely on these algorithms.
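The abstract does not specify which graph elements get pruned; one common instance of the idea, shown here as a hedged illustrative sketch rather than the paper's actual method, is to iteratively prune degree-1 vertices before computing connected components and then extend the labels back to the pruned vertices:

```python
from collections import defaultdict

def connected_components_with_pruning(n, edges):
    """Illustrative sketch (not the paper's algorithm): prune degree-1
    vertices, label components on the smaller graph, then extend the
    labels back to the pruned vertices."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Phase 1: iteratively peel degree-1 vertices, remembering each
    # vertex's anchor (the neighbour it hung off).
    pruned = {}
    stack = [v for v in range(n) if len(adj[v]) == 1]
    while stack:
        v = stack.pop()
        if len(adj[v]) != 1:
            continue
        (u,) = adj[v]
        pruned[v] = u
        adj[u].discard(v)
        adj[v].clear()
        if len(adj[u]) == 1:
            stack.append(u)

    # Phase 2: plain BFS-based connected components on the pruned graph.
    label = {}
    for s in range(n):
        if s in label or s in pruned:
            continue
        label[s] = s
        frontier = [s]
        while frontier:
            nxt = []
            for v in frontier:
                for w in adj[v]:
                    if w not in label:
                        label[w] = s
                        nxt.append(w)
            frontier = nxt

    # Phase 3: extend the solution to the pruned vertices. Processing in
    # reverse pruning order guarantees each anchor is already labelled.
    for v in reversed(list(pruned)):
        label[v] = label[pruned[v]]
    return label
```

Since an anchor is always pruned later than (or survives) the vertex attached to it, the reverse-order pass in phase 3 never reads an unlabelled anchor.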
Electromagnetic scattering from electrically large objects with multiscale features is an increasingly important problem in computational electromagnetics. A conventional approach is to use an integral equation-based solver that is then augmented with an accelerator, a popular choice being a parallel multilevel fast multipole algorithm (MLFMA). One consequence of multiscale features is locally dense discretization, which leads to low-frequency breakdown and requires nonuniform trees. To the authors' knowledge, the literature on parallel MLFMA for such multiscale distributions capable of arbitrary accuracy is sparse; this paper aims to fill this niche. We prescribe an algorithm that overcomes this bottleneck. We demonstrate the accuracy (with respect to analytical data) and performance of the algorithm for both PEC scatterers and point clouds as large as 755λ with several hundred million unknowns and nonuniform trees as deep as 16 levels.
This paper studies the nucleus decomposition problem, which has been shown to be useful in finding dense substructures in graphs. We present a novel parallel algorithm that is efficient both in theory and in practice....
In order to improve the optimal storage capacity of redundant data in serial hybrid network cascade database, a high efficiency compression algorithm for redundant data in serial hybrid network cascade database based ...
The implementation of nonlinear model predictive controllers for systems operating at high frequencies constitutes a significant challenge, mainly because of the complexity and time consumption of the optimization problem involved. One proposed alternative is to use data-driven techniques to learn the control law offline and then implement it on a target embedded platform. Following this trend, in this paper we propose the implementation of predictive controllers on FPGA platforms using a parallel version of the machine learning technique known as Lipschitz interpolation. By doing this, computation time can be enormously accelerated. The results are compared, in terms of error and computing time, with those obtained when the sequential algorithm runs on standard CPU platforms and when the system is controlled by solving the optimization problem online. The method is validated in a case study where the nonlinear model predictive controller is employed to control a self-balancing two-wheel robot.
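The paper's parallel FPGA implementation is not reproduced here, but the core of Lipschitz interpolation (also known as nonlinear set-membership prediction) is compact: the prediction at a query point is the midpoint of the tightest upper and lower bounds implied by the samples and the Lipschitz constant L. A minimal NumPy sketch under that standard formulation, where the sample-wise min/max reductions are the part that parallelizes:

```python
import numpy as np

def lipschitz_interpolate(X, y, L, xq):
    """Predict the value at query xq from samples (X, y) of an
    L-Lipschitz map. Each sample (x_i, y_i) bounds the target by
    y_i - L*d(xq, x_i) <= f(xq) <= y_i + L*d(xq, x_i); the prediction
    is the midpoint of the tightest such bounds."""
    d = np.linalg.norm(X - xq, axis=1)   # distances to all samples
    upper = np.min(y + L * d)            # tightest upper bound
    lower = np.max(y - L * d)            # tightest lower bound
    return 0.5 * (upper + lower)
```

The reductions over samples are embarrassingly parallel, which is what makes the technique a natural fit for hardware acceleration. The prediction also interpolates the data exactly at the sample points.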
This paper presents a simple and efficient approach for finding the bridges and failure points in a densely connected network mapped as a graph. The algorithm presented here is a parallel algorithm which works in a di...
ISBN (print): 9781665407601
Complex networks are large, and their analysis requires significantly different methods than small networks. Parallel processing is needed to analyze these networks in a timely manner. Graph centrality measures provide convenient ways to assess the structure of these networks. We review the main centrality algorithms, describe an implementation of closeness centrality in Python, propose a simple parallel closeness-centrality algorithm, and show its Python implementation along with the obtained results.
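The paper's own implementation is not shown above; as an illustration of why closeness centrality parallelizes easily, here is a hypothetical Python sketch (not the authors' code) in which each source vertex's BFS, and hence its closeness score, is computed independently by a process pool:

```python
from functools import partial
from multiprocessing import Pool

def _closeness_from(adj, s):
    """BFS from source s over adjacency dict adj; returns (s, closeness).
    Each source is independent, so these calls can run in parallel."""
    dist = {s: 0}
    frontier = [s]
    while frontier:
        nxt = []
        for v in frontier:
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    nxt.append(w)
        frontier = nxt
    total = sum(dist.values())
    # classic closeness for a connected graph: (reached - 1) / sum of distances
    return s, (len(dist) - 1) / total if total else 0.0

def closeness_centrality(adj, processes=4):
    """Map the per-source BFS across a process pool."""
    with Pool(processes) as pool:
        return dict(pool.map(partial(_closeness_from, adj), list(adj)))
```

On a path graph 0-1-2, the middle vertex gets closeness 1.0 and the endpoints 2/3, matching the standard definition for connected graphs.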
The integration of reduced-order models with high-performance computing is critical for developing digital twins, particularly for real-time monitoring and predictive maintenance of industrial systems. This paper presents a comprehensive, high-performance computing-enabled workflow for developing and deploying projection-based reduced-order models for large-scale mechanical simulations. We use PyCOMPSs’ parallel framework to efficiently execute reduced-order model training simulations, employing parallel singular value decomposition algorithms such as randomized singular value decomposition, Lanczos singular value decomposition, and full singular value decomposition based on tall-skinny QR. Moreover, we introduce a partitioned version of the hyperreduction scheme known as the Empirical Cubature Method to further enhance computational efficiency in projection-based reduced-order models for mechanical systems. Despite the widespread use of high-performance computing for projection-based reduced-order models, there is a significant lack of publications detailing comprehensive workflows for building and deploying end-to-end projection-based reduced-order models in high-performance computing environments. Our workflow is validated through a case study focusing on the thermal dynamics of a motor, a multiphysics problem involving convective heat transfer and mechanical components. The projection-based reduced-order model is designed to deliver a real-time prognosis tool that could enable rapid and safe motor restarts post-emergency shutdowns under different operating conditions, demonstrating its potential impact on the practice of simulations in engineering mechanics. To facilitate deployment, we use the High-Performance Computing Workflow as a Service strategy and Functional Mock-Up Units to ensure compatibility and ease of integration across high-performance computing, edge, and cloud environments. The outcomes illustrate the efficacy of combining projection-based reduc
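Of the SVD variants listed, randomized SVD is the simplest to sketch. The version below is a generic Halko-style prototype in NumPy, not the PyCOMPSs-parallel implementation used in the paper; the oversampling and power-iteration parameters are illustrative defaults:

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    """Generic randomized SVD sketch: project A onto a random subspace,
    orthonormalize, optionally refine with power iterations, then take
    an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, n)
    Q = np.linalg.qr(A @ rng.standard_normal((n, k)))[0]
    for _ in range(n_iter):              # power iterations sharpen the basis
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]
```

For a snapshot matrix whose numerical rank is at most `rank`, the factorization is exact up to rounding; for general matrices it is a near-optimal low-rank approximation.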
ISBN (print): 9781450323789
Recently, there has been substantial interest in the study of various random networks as mathematical models of complex systems. As these complex systems grow larger, the ability to generate progressively large random networks becomes all the more important. This motivates the need for efficient parallel algorithms for generating such networks. Naive parallelization of the sequential algorithms for generating random networks may not work due to the dependencies among the edges and the possibility of creating duplicate (parallel) edges. In this paper, we present MPI-based distributed memory parallel algorithms for generating random scale-free networks using the preferential-attachment model. Our algorithms scale very well to a large number of processors and provide almost linear speedups. The algorithms can generate scale-free networks with 50 billion edges in 123 seconds using 768 processors.
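To see why naive parallelization fails, consider the sequential preferential-attachment generator: every new vertex samples its targets from a pool whose contents depend on all previously inserted edges. A minimal sequential sketch (not the paper's MPI algorithm; the target-pool representation is one common textbook choice):

```python
import random

def barabasi_albert(n, m, seed=0):
    """Sequential preferential attachment: each new vertex attaches to m
    distinct existing vertices chosen proportionally to degree. The
    `targets` list holds one entry per edge endpoint, so uniform sampling
    from it is degree-proportional. Each step reads state written by all
    previous steps -- the dependency a parallel algorithm must break."""
    random.seed(seed)
    targets = list(range(m))   # m seed vertices, initially sampled uniformly
    edges = []
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:              # resample until m distinct targets
            chosen.add(random.choice(targets))
        for u in chosen:
            edges.append((v, u))
        targets.extend(chosen)              # each edge raises u's degree...
        targets.extend([v] * m)             # ...and v's degree by one
    return edges
```

The generator produces exactly (n - m) * m edges, and every edge points from a newer vertex to a strictly older one, so no self-loops or duplicate edges arise.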
ISBN (print): 9798350396386
Most current research on object detection aims to improve the overall framework in order to increase detection accuracy, but another problem of object detection is detection speed: the more complex the architecture, the slower the detector. In this work, we implemented a Single Shot MultiBox Detector (SSD) on a GPU with ***. We have improved the object detection speed of SSD, one of the most commonly used object detection frameworks. The most time-consuming part, the VGG16 network, was reimplemented using cuDNN, making it about 9% faster. The second most time-consuming part is post-processing, where non-maximum suppression (NMS) is performed. We accelerated NMS by implementing new algorithms suited to GPUs, which are about 52% faster than the original PyTorch version [11]. We also ported to the GPU those parts that were originally executed on the CPU. In total, our GPU-accelerated SSD detects objects 22.5% faster than the original version. We demonstrate that using GPUs to accelerate existing frameworks is a viable approach.
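The paper's GPU NMS kernels are not reproduced here; for reference, the sequential greedy NMS they accelerate can be sketched in NumPy as follows (the [x1, y1, x2, y2] box format is assumed):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes whose IoU with it exceeds the
    threshold. A plain CPU reference for the step moved to the GPU."""
    order = np.argsort(scores)[::-1]          # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]    # suppress heavy overlaps
    return keep
```

The data dependency between iterations (each kept box prunes the candidate set for the next) is exactly what makes a GPU formulation nontrivial.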