ISBN (print): 9781479907298
Graph algorithms play a prominent role in several fields of science and engineering. Notable among them are graph traversal, finding the connected components of a graph, and computing shortest paths. There are several efficient implementations of these problems on a variety of modern multiprocessor architectures. In recent times, the graphs corresponding to real-world data sets have been growing in size. Parallelism offers only limited succor here, as current parallel architectures have severe shortcomings when deployed for most graph algorithms. At the same time, these graphs are also becoming very sparse. This calls for work-efficient solutions aimed at processing large, sparse graphs on modern parallel architectures. In this paper, we introduce graph pruning as a technique that aims to reduce the size of the graph. Certain elements of the graph can be pruned depending on the nature of the computation. Once a solution is obtained for the pruned graph, it is extended to the entire graph. We apply this technique to three fundamental graph algorithms: breadth-first search (BFS), connected components (CC), and all-pairs shortest paths (APSP). To validate our technique, we implement our algorithms on a heterogeneous platform consisting of a multicore CPU and a GPU, where we achieve an average improvement of 35% over state-of-the-art solutions. Such an improvement has the potential to speed up other applications that rely on these algorithms.
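The abstract does not specify which graph elements get pruned; one common instance of the idea, shown here as a hedged illustrative sketch rather than the paper's actual method, is to iteratively prune degree-1 vertices before computing connected components and then extend the labels back to the pruned vertices:

```python
from collections import defaultdict

def connected_components_with_pruning(n, edges):
    """Illustrative sketch (not the paper's algorithm): prune degree-1
    vertices, label components on the smaller graph, then extend the
    labels back to the pruned vertices."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    # Phase 1: iteratively peel degree-1 vertices, remembering each
    # vertex's anchor (the neighbour it hung off).
    pruned = {}
    stack = [v for v in range(n) if len(adj[v]) == 1]
    while stack:
        v = stack.pop()
        if len(adj[v]) != 1:
            continue
        (u,) = adj[v]
        pruned[v] = u
        adj[u].discard(v)
        adj[v].clear()
        if len(adj[u]) == 1:
            stack.append(u)

    # Phase 2: plain BFS-based connected components on the pruned graph.
    label = {}
    for s in range(n):
        if s in label or s in pruned:
            continue
        label[s] = s
        frontier = [s]
        while frontier:
            nxt = []
            for v in frontier:
                for w in adj[v]:
                    if w not in label:
                        label[w] = s
                        nxt.append(w)
            frontier = nxt

    # Phase 3: extend the solution to the pruned vertices. Processing in
    # reverse pruning order guarantees each anchor is already labelled.
    for v in reversed(list(pruned)):
        label[v] = label[pruned[v]]
    return label
```

Since an anchor is always pruned later than (or survives) the vertex attached to it, the reverse-order pass in phase 3 never reads an unlabelled anchor.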
Electromagnetic scattering from electrically large objects with multiscale features is an increasingly important problem in computational electromagnetics. A conventional approach is to use an integral equation-based solver that is then augmented with an accelerator, a popular choice being a parallel multilevel fast multipole algorithm (MLFMA). One consequence of multiscale features is locally dense discretization, which leads to low-frequency breakdown and requires nonuniform trees. To the authors' knowledge, the literature on parallel MLFMA for such multiscale distributions capable of arbitrary accuracy is sparse; this paper aims to fill this niche. We prescribe an algorithm that overcomes this bottleneck. We demonstrate the accuracy (with respect to analytical data) and performance of the algorithm for both PEC scatterers and point clouds as large as 755λ with several hundred million unknowns and nonuniform trees as deep as 16 levels.
This paper studies the nucleus decomposition problem, which has been shown to be useful in finding dense substructures in graphs. We present a novel parallel algorithm that is efficient both in theory and in practice....
In order to improve the optimal storage capacity of redundant data in serial hybrid network cascade database, a high efficiency compression algorithm for redundant data in serial hybrid network cascade database based ...
The implementation of nonlinear model predictive controllers for systems operating at high frequencies constitutes a significant challenge, mainly because of the complexity and time consumption of the optimization problem involved. One proposed alternative is to use data-driven techniques to learn the control law offline and then implement it on a target embedded platform. Following this trend, in this paper we propose the implementation of predictive controllers on FPGA platforms using a parallel version of the machine learning technique known as Lipschitz interpolation. By doing this, computation time can be enormously accelerated. The results are compared, in terms of error and computing time, with those obtained when the sequential algorithm runs on standard CPU platforms and when the system is controlled by solving the optimization problem online. The method is validated in a case study where the nonlinear model predictive controller is employed to control a self-balancing two-wheel robot.
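The paper's parallel FPGA implementation is not reproduced here, but the core of Lipschitz interpolation (also known as nonlinear set-membership prediction) is compact: the prediction at a query point is the midpoint of the tightest upper and lower bounds implied by the samples and the Lipschitz constant L. A minimal NumPy sketch under that standard formulation, where the sample-wise min/max reductions are the part that parallelizes:

```python
import numpy as np

def lipschitz_interpolate(X, y, L, xq):
    """Predict the value at query xq from samples (X, y) of an
    L-Lipschitz map. Each sample (x_i, y_i) bounds the target by
    y_i - L*d(xq, x_i) <= f(xq) <= y_i + L*d(xq, x_i); the prediction
    is the midpoint of the tightest such bounds."""
    d = np.linalg.norm(X - xq, axis=1)   # distances to all samples
    upper = np.min(y + L * d)            # tightest upper bound
    lower = np.max(y - L * d)            # tightest lower bound
    return 0.5 * (upper + lower)
```

The reductions over samples are embarrassingly parallel, which is what makes the technique a natural fit for hardware acceleration. The prediction also interpolates the data exactly at the sample points.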
This paper presents a simple and efficient approach for finding the bridges and failure points in a densely connected network mapped as a graph. The algorithm presented here is a parallel algorithm which works in a di...
ISBN (print): 9781665407601
Complex networks are large, and their analysis requires significantly different methods than small networks. Parallel processing is needed to analyze these networks in a timely manner. Graph centrality measures provide convenient ways to assess the structure of these networks. We review the main centrality algorithms, describe an implementation of closeness centrality in Python, propose a simple parallel closeness-centrality algorithm, and show its Python implementation along with the obtained results.
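The paper's own implementation is not shown above; as an illustration of why closeness centrality parallelizes easily, here is a hypothetical Python sketch (not the authors' code) in which each source vertex's BFS, and hence its closeness score, is computed independently by a process pool:

```python
from functools import partial
from multiprocessing import Pool

def _closeness_from(adj, s):
    """BFS from source s over adjacency dict adj; returns (s, closeness).
    Each source is independent, so these calls can run in parallel."""
    dist = {s: 0}
    frontier = [s]
    while frontier:
        nxt = []
        for v in frontier:
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    nxt.append(w)
        frontier = nxt
    total = sum(dist.values())
    # classic closeness for a connected graph: (reached - 1) / sum of distances
    return s, (len(dist) - 1) / total if total else 0.0

def closeness_centrality(adj, processes=4):
    """Map the per-source BFS across a process pool."""
    with Pool(processes) as pool:
        return dict(pool.map(partial(_closeness_from, adj), list(adj)))
```

On a path graph 0-1-2, the middle vertex gets closeness 1.0 and the endpoints 2/3, matching the standard definition for connected graphs.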
The integration of reduced-order models with high-performance computing is critical for developing digital twins, particularly for real-time monitoring and predictive maintenance of industrial systems. This paper presents a comprehensive, high-performance computing-enabled workflow for developing and deploying projection-based reduced-order models for large-scale mechanical simulations. We use PyCOMPSs’ parallel framework to efficiently execute reduced-order model training simulations, employing parallel singular value decomposition algorithms such as randomized singular value decomposition, Lanczos singular value decomposition, and full singular value decomposition based on tall-skinny QR. Moreover, we introduce a partitioned version of the hyperreduction scheme known as the Empirical Cubature Method to further enhance computational efficiency in projection-based reduced-order models for mechanical systems. Despite the widespread use of high-performance computing for projection-based reduced-order models, there is a significant lack of publications detailing comprehensive workflows for building and deploying end-to-end projection-based reduced-order models in high-performance computing environments. Our workflow is validated through a case study focusing on the thermal dynamics of a motor, a multiphysics problem involving convective heat transfer and mechanical components. The projection-based reduced-order model is designed to deliver a real-time prognosis tool that could enable rapid and safe motor restarts post-emergency shutdowns under different operating conditions, demonstrating its potential impact on the practice of simulations in engineering mechanics. To facilitate deployment, we use the High-Performance Computing Workflow as a Service strategy and Functional Mock-Up Units to ensure compatibility and ease of integration across high-performance computing, edge, and cloud environments. The outcomes illustrate the efficacy of combining projection-based reduc
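Of the SVD variants listed, randomized SVD is the simplest to sketch. The version below is a generic Halko-style prototype in NumPy, not the PyCOMPSs-parallel implementation used in the paper; the oversampling and power-iteration parameters are illustrative defaults:

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, n_iter=2, seed=0):
    """Generic randomized SVD sketch: project A onto a random subspace,
    orthonormalize, optionally refine with power iterations, then take
    an exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, n)
    Q = np.linalg.qr(A @ rng.standard_normal((n, k)))[0]
    for _ in range(n_iter):              # power iterations sharpen the basis
        Q = np.linalg.qr(A.T @ Q)[0]
        Q = np.linalg.qr(A @ Q)[0]
    U_small, s, Vt = np.linalg.svd(Q.T @ A, full_matrices=False)
    U = Q @ U_small
    return U[:, :rank], s[:rank], Vt[:rank]
```

For a snapshot matrix whose numerical rank is at most `rank`, the factorization is exact up to rounding; for general matrices it is a near-optimal low-rank approximation.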
ISBN (print): 9781450323789
Recently, there has been substantial interest in the study of various random networks as mathematical models of complex systems. As these complex systems grow larger, the ability to generate progressively large random networks becomes all the more important. This motivates the need for efficient parallel algorithms for generating such networks. Naive parallelization of the sequential algorithms for generating random networks may not work due to the dependencies among the edges and the possibility of creating duplicate (parallel) edges. In this paper, we present MPI-based distributed memory parallel algorithms for generating random scale-free networks using the preferential-attachment model. Our algorithms scale very well to a large number of processors and provide almost linear speedups. The algorithms can generate scale-free networks with 50 billion edges in 123 seconds using 768 processors.
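To see why naive parallelization fails, consider the sequential preferential-attachment generator: every new vertex samples its targets from a pool whose contents depend on all previously inserted edges. A minimal sequential sketch (not the paper's MPI algorithm; the target-pool representation is one common textbook choice):

```python
import random

def barabasi_albert(n, m, seed=0):
    """Sequential preferential attachment: each new vertex attaches to m
    distinct existing vertices chosen proportionally to degree. The
    `targets` list holds one entry per edge endpoint, so uniform sampling
    from it is degree-proportional. Each step reads state written by all
    previous steps -- the dependency a parallel algorithm must break."""
    random.seed(seed)
    targets = list(range(m))   # m seed vertices, initially sampled uniformly
    edges = []
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:              # resample until m distinct targets
            chosen.add(random.choice(targets))
        for u in chosen:
            edges.append((v, u))
        targets.extend(chosen)              # each edge raises u's degree...
        targets.extend([v] * m)             # ...and v's degree by one
    return edges
```

The generator produces exactly (n - m) * m edges, and every edge points from a newer vertex to a strictly older one, so no self-loops or duplicate edges arise.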
ISBN (print): 9798350396386
Most current research on object detection aims to improve the overall framework in order to increase detection accuracy, but another problem of object detection is detection speed: the more complex the architecture, the slower the detector. In this work, we implemented a Single Shot MultiBox Detector (SSD) on a GPU with ***. We have improved the object detection speed of SSD, one of the most commonly used object detection frameworks. The most time-consuming part, the VGG16 network, was reimplemented using cuDNN, making it about 9% faster. The second most time-consuming part is post-processing, where non-maximum suppression (NMS) is performed. We accelerated NMS by implementing new algorithms suited to GPUs, which are about 52% faster than the original PyTorch version [11]. We also ported to the GPU those parts that were originally executed on the CPU. In total, our GPU-accelerated SSD detects objects 22.5% faster than the original version. We demonstrate that using GPUs to accelerate existing frameworks is a viable approach.
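The paper's GPU NMS kernels are not reproduced here; for reference, the sequential greedy NMS they accelerate can be sketched in NumPy as follows (the [x1, y1, x2, y2] box format is assumed):

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Greedy non-maximum suppression: repeatedly keep the highest-scoring
    box and discard remaining boxes whose IoU with it exceeds the
    threshold. A plain CPU reference for the step moved to the GPU."""
    order = np.argsort(scores)[::-1]          # indices by descending score
    keep = []
    while order.size:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # intersection of the kept box with all remaining boxes
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]    # suppress heavy overlaps
    return keep
```

The data dependency between iterations (each kept box prunes the candidate set for the next) is exactly what makes a GPU formulation nontrivial.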