This paper presents a simple and efficient approach for finding the bridges and failure points in a densely connected network mapped as a graph. The algorithm presented here is a parallel algorithm which works in a di...
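The bridge-finding problem this abstract targets has a classical sequential baseline, Tarjan's one-pass depth-first search; the sketch below shows that baseline, not the paper's parallel algorithm:

```python
def find_bridges(adj):
    """Tarjan-style bridge finding via one DFS: edge (u, v) is a
    bridge iff no back edge from v's subtree reaches u or above,
    i.e. low[v] > disc[u]. Sequential sketch only; the paper's
    contribution is a parallel/distributed variant."""
    disc, low, bridges = {}, {}, []
    timer = [0]

    def dfs(u, parent):
        disc[u] = low[u] = timer[0]
        timer[0] += 1
        for v in adj[u]:
            if v not in disc:
                dfs(v, u)
                low[u] = min(low[u], low[v])
                if low[v] > disc[u]:      # v's subtree cannot bypass (u, v)
                    bridges.append((u, v))
            elif v != parent:
                low[u] = min(low[u], disc[v])

    for u in adj:
        if u not in disc:
            dfs(u, None)
    return bridges

# Triangle 0-1-2 with a pendant node 3: only (2, 3) is a bridge.
example = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
```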
Application performance on graphical processing units (GPUs), in terms of execution speed and memory usage, depends on the efficient use of hierarchical memory. It is expected that enhancing data locality in molecular dynamics simulations will lower the cost of data movement across the GPU memory hierarchy. The work presented in this article analyses the spatial data locality and data reuse characteristics of row-major, Hilbert and Morton orderings and the impact these have on the performance of molecular dynamics simulations. A simple cache model is presented, and this is found to give results that are consistent with the timing results for the particle force computation obtained on NVidia GeForce GTX960 and Tesla P100 GPUs. Further analysis of the observed memory use, in terms of cache hits and the number of memory transactions, provides a more detailed explanation of execution behaviour for the different orderings. To the best of our knowledge, this is the first study to investigate memory analysis and data locality issues for molecular dynamics simulations of Lennard-Jones fluids on NVidia's Maxwell and Tesla architectures.
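As a concrete illustration of one of the orderings studied, a Morton (Z-order) code interleaves the bits of cell coordinates so that spatially nearby cells receive nearby indices; this sketch is illustrative and is not code from the paper:

```python
def morton_encode(x: int, y: int) -> int:
    """Morton (Z-order) code of a 2D cell: interleave the bits of x
    and y, so sorting by the code clusters neighbouring cells."""
    code = 0
    for i in range(16):                    # 16 bits per coordinate
        code |= ((x >> i) & 1) << (2 * i)
        code |= ((y >> i) & 1) << (2 * i + 1)
    return code

# Reordering particle cells by Morton code improves spatial locality
# when the force loop walks the particle array.
cells = [(3, 1), (0, 0), (1, 1), (2, 2)]
ordered = sorted(cells, key=lambda c: morton_encode(*c))
```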
The integration of reduced-order models with high-performance computing is critical for developing digital twins, particularly for real-time monitoring and predictive maintenance of industrial systems. This paper presents a comprehensive, high-performance computing-enabled workflow for developing and deploying projection-based reduced-order models for large-scale mechanical simulations. We use PyCOMPSs’ parallel framework to efficiently execute reduced-order model training simulations, employing parallel singular value decomposition algorithms such as randomized singular value decomposition, Lanczos singular value decomposition, and full singular value decomposition based on tall-skinny QR. Moreover, we introduce a partitioned version of the hyperreduction scheme known as the Empirical Cubature Method to further enhance computational efficiency in projection-based reduced-order models for mechanical systems. Despite the widespread use of high-performance computing for projection-based reduced-order models, there is a significant lack of publications detailing comprehensive workflows for building and deploying end-to-end projection-based reduced-order models in high-performance computing environments. Our workflow is validated through a case study focusing on the thermal dynamics of a motor, a multiphysics problem involving convective heat transfer and mechanical components. The projection-based reduced-order model is designed to deliver a real-time prognosis tool that could enable rapid and safe motor restarts post-emergency shutdowns under different operating conditions, demonstrating its potential impact on the practice of simulations in engineering mechanics. To facilitate deployment, we use the High-Performance Computing Workflow as a Service strategy and Functional Mock-Up Units to ensure compatibility and ease of integration across high-performance computing, edge, and cloud environments. The outcomes illustrate the efficacy of combining projection-based reduc
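One of the building blocks named in the abstract, randomized singular value decomposition, can be sketched in a few lines of NumPy (Halko-style range finding; the function name and defaults here are illustrative, not the paper's or PyCOMPSs' API):

```python
import numpy as np

def randomized_svd(A, k, oversample=5, seed=0):
    """Randomized SVD sketch: project A onto a random low-dimensional
    subspace, orthonormalize to approximate range(A), then take the
    exact SVD of the small projected matrix."""
    rng = np.random.default_rng(seed)
    Omega = rng.standard_normal((A.shape[1], k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)       # orthonormal basis for range(A @ Omega)
    B = Q.T @ A                          # small matrix sharing A's row space
    Ub, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ Ub)[:, :k], s[:k], Vt[:k]

# For a rank-2 matrix, a rank-2 randomized SVD reconstructs it exactly.
A = np.arange(12.0).reshape(4, 3)        # rows form arithmetic progressions: rank 2
U, s, Vt = randomized_svd(A, k=2)
```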
ISBN:
(Print) 9781665407601
Complex networks are large, and their analysis requires significantly different methods than small networks. Parallel processing is needed to analyse these networks in a timely manner. Graph centrality measures provide convenient ways to assess the structure of these networks. We review the main centrality algorithms, describe an implementation of closeness centrality in Python, propose a simple parallel closeness centrality algorithm, and show its implementation in Python together with the obtained results.
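A sequential Python baseline for closeness centrality via breadth-first search can be sketched as follows (an illustrative sketch, not the paper's code):

```python
from collections import deque

def closeness_centrality(adj, v):
    """Closeness centrality of v in an unweighted graph:
    (n - 1) / sum of shortest-path distances from v,
    with distances found by BFS."""
    dist = {v: 0}
    q = deque([v])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                q.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

# Path graph 0-1-2: distances from 0 are 0, 1, 2, so C(0) = 2/3.
adj = {0: [1], 1: [0, 2], 2: [1]}
```

Each vertex's BFS is independent of the others, which is why the per-vertex computation parallelizes naturally, e.g. with a `multiprocessing.Pool` mapping over the vertex set.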
ISBN:
(Print) 9781450323789
Recently, there has been substantial interest in the study of various random networks as mathematical models of complex systems. As these complex systems grow larger, the ability to generate progressively large random networks becomes all the more important. This motivates the need for efficient parallel algorithms for generating such networks. Naive parallelization of the sequential algorithms for generating random networks may not work due to the dependencies among the edges and the possibility of creating duplicate (parallel) edges. In this paper, we present MPI-based distributed memory parallel algorithms for generating random scale-free networks using the preferential-attachment model. Our algorithms scale very well to a large number of processors and provide almost linear speedups. The algorithms can generate scale-free networks with 50 billion edges in 123 seconds using 768 processors.
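The sequential preferential-attachment generator whose edge dependencies complicate naive parallelization can be sketched as follows (an illustrative sequential sketch, not the paper's MPI algorithm):

```python
import random

def barabasi_albert(n, m, seed=None):
    """Sequential preferential attachment: each new node attaches m
    edges to existing nodes chosen proportionally to degree. The
    'targets' list repeats each node once per incident edge, so
    uniform sampling from it is degree-proportional. Each new node's
    choices depend on all earlier edges, which is the dependency
    that defeats naive parallelization."""
    rng = random.Random(seed)
    edges = []
    targets = list(range(m))           # initial seed nodes
    for v in range(m, n):
        chosen = set()
        while len(chosen) < m:         # resample to avoid duplicate edges
            chosen.add(rng.choice(targets))
        for u in chosen:
            edges.append((v, u))
        targets.extend(chosen)
        targets.extend([v] * m)
    return edges

edges = barabasi_albert(1000, 3, seed=42)
```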
In this paper, we present the staggered parallel short-time Fourier transform, an algorithm that uses a quasi-parallel procedure to compute exact STFT coefficients of 1D signals. The algorithm combines parallelism with the capacity of feedforward STFT algorithms to re-use prior computations. It does this by carefully organizing input signals and collecting past computations into 2D memory buffers. Re-using stored information in memory enables fast computation of up to N/2 FFTs in parallel. The algorithm's time complexity is O[6T] under an abstract circuit implementation, a complexity measure that is independent of the sample complexity N. Its time complexity is asymptotically equivalent to that of the best possible exact algorithm, which runs in O[T] time, with a constant efficiency of O[1] relative to the best known sequential algorithm. This efficiency property holds both in an abstract circuit implementation and in a CPU implementation with a limited number of cores. In general, the algorithm uses fewer processors than other parallel STFT algorithms but can require more memory. To test the algorithm's properties, we implement several STFT algorithms on a CPU with varying numbers of cores. These algorithms use FFT, iterative, or feedforward schemes to cover the range of existing STFT algorithms for comparison. In our experimental results, the proposed algorithm has the lowest running time among exact STFT algorithms while using fewer CPU cores than other parallel implementations. (C) 2019 Elsevier Inc. All rights reserved.
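The baseline exact STFT against which feedforward and parallel variants are compared simply slides a window along the signal and transforms each frame; the sketch below uses a naive DFT for self-containment and is not the paper's staggered algorithm:

```python
import cmath

def dft(frame):
    """Naive O(N^2) DFT of one frame (for illustration only; a real
    implementation would use an FFT)."""
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N)
                for n in range(N))
            for k in range(N)]

def stft(signal, frame_len, hop):
    """Reference exact STFT: slide a frame_len-sample window by hop
    samples and transform each frame independently. Feedforward
    schemes instead re-use the overlap between consecutive frames."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    return [dft(f) for f in frames]

# 16-sample alternating signal, 4-sample frames, 50% overlap -> 7 frames.
coeffs = stft([0.0, 1.0, 0.0, -1.0] * 4, frame_len=4, hop=2)
```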
ISBN:
(Print) 9798350396386
Most current research on object detection improves the whole framework in order to increase detection accuracy, but another problem in object detection is detection speed: the more complex the architecture, the slower the inference. In this work, we implemented a Single Shot Multibox Detector (SSD) on a GPU. We have improved the object detection speed of SSD, one of the most commonly used object detection frameworks. The most time-consuming part, the VGG16 network, is reimplemented using cuDNN, which makes it about 9% faster. The second most time-consuming part is post-processing, where non-maximum suppression (NMS) is performed. We accelerated NMS by implementing new algorithms suited to GPUs, which are about 52% faster than the original PyTorch version [11]. We also ported the parts originally executed on the CPU to the GPU. In total, our GPU-accelerated SSD detects objects 22.5% faster than the original version. We demonstrate that using GPUs to accelerate existing frameworks is a viable approach.
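Greedy NMS, the post-processing step this paper accelerates, can be sketched as follows (a pure-Python reference, not the paper's GPU kernel):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

def nms(boxes, scores, thresh=0.5):
    """Greedy non-maximum suppression: keep the highest-scoring box,
    discard boxes overlapping it above thresh, repeat. The inner IoU
    comparisons are independent, which is what a GPU implementation
    parallelizes."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) <= thresh]
    return keep

# Two heavily overlapping boxes plus one distant box: the lower-scoring
# overlap is suppressed.
boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
```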
ISBN:
(Print) 9783642366086; 9783642366079
Information-seeking applications employ information filtering as a main component of their functioning. The purpose of the present article is to explore techniques for implementing scalable and efficient information filtering, based on the XML representation, when both structural and value constraints are imposed. Most existing implementations use the XML representation on single-processor systems, represent user profiles with the XPath query language, and employ efficient heuristics to constrain the complexity of the filtering mechanism. Here, we propose a parallel filtering algorithm based on the well-known YFilter algorithm, which dynamically applies a work-load balancing approach to each thread to achieve the best parallelization. In addition, the proposed filtering algorithm adds support for value-based predicates by embedding three different algorithms for handling value constraints during XML filtering, based on the popularity and the semantic interpretation of the predicate values. Experimental results show that the proposed system outperforms previous parallel approaches to the XML filtering problem.
Language communication and understanding involve ever more fields: human-computer interaction, translation systems, and similar technologies have entered our lives and are changing the way we live and work. The GLR algorithm can analyse English translation sentences well, and its analysis time complexity is lower than that of comparable parallel algorithms. The main purpose of this paper is to design and study an intelligent recognition model for English translation based on GLR and cloud computing. The paper outlines the main points of the GLR algorithm, the characteristics of the word algorithm, and the classification of words and collocations by computation, which facilitates intelligent processing. Experiments show that the accuracy of English translation improves by 24% after proofreading, a clear difference. The intelligent English-translation recognition model of this system is therefore effective.
We provide time lower bounds for sequential and parallel algorithms deciding bisimulation on labelled transition systems that use partition refinement. For sequential algorithms this is Ω((m + n) log n) and for parallel a...