Search results - Inner Mongolia University Library

arXiv, 2024

Authors: Kam, Jeffrey; Kamali, Shahin; Miller, Avery; Nishimura, Naomi. Affiliations: University of Waterloo, Canada; York University, Canada; University of Manitoba, Canada

We use the reconfiguration framework to analyze problems that involve the rearrangement of items among groups. In various applications, a group of items could correspond to the files or jobs assigned to a particular machine, and the goal of rearrangement could be improving efficiency or increasing locality. To cover problems arising in a wide range of application areas, we define the general Repacking problem as the rearrangement of multisets of multisets. We present hardness results for the general case and algorithms for various classes of instances that arise in real-life scenarios. By limiting the total size of items in each multiset, our results can be viewed as an offline approach to Bin Packing, in which each bin is represented as a multiset. In addition to providing the first results on reconfiguration of multisets, our contributions open up several research avenues: the interplay between reconfiguration and online algorithms and parallel algorithms; the use of the tools of linear programming in reconfiguration; and, in the longer term, a focus on extra resources in reconfiguration. Copyright © 2024, The Authors. All rights reserved.

Keywords: Parallel algorithms


FFT Algorithm Optimization and RD Imaging Algorithm Implementation Based on Heterogeneous Platform


International Conference on Communication Technology (ICCT)

Authors: BingZhi Hou; Chengguang Ma; Junyu Li; Daiwei Li. Affiliations: Graduate School of the Second Research Institute of China Aerospace Science and Industry Corporation, Beijing, China; Beijing Remote Sensing Equipment Research Institute, Beijing, China; China Telecom Corporation Limited, Beijing, China

ISBN (digital): 9798350363760

ISBN (print): 9798350363777

This paper implements the Fast Fourier Transform (FFT) algorithm for signal data processing using Open Computing Language (OpenCL). A parallel algorithm model suitable for staged FFT across different GPUs is proposed, including methods for execution and memory model settings. The characteristics of the OpenCL model and specific data structures are applied to optimize the logical structure of the parallel algorithm. Finally, the proposed method is applied and implemented in the Synthetic Aperture Radar (SAR) imaging RD algorithm. Experimental data confirm that the computational speed of the parallel algorithm in this paper is significantly higher than that of a serial CPU-based algorithm. Compared with FFTW, the fastest FFT implementation on the current CPU platform, it achieves substantially better performance. Additionally, compared with the CUDA-based cuFFT parallel algorithm, the performance of the algorithm in this paper is notably improved. In the SAR imaging RD algorithm, based on classical airborne SAR imaging parameters, it shows a significant improvement over FFTW.

Keywords: Fast Fourier transforms; Computational modeling; Signal processing algorithms; Graphics processing units; Imaging; Signal processing; Programming; Radar imaging; Radar polarimetry; Parallel algorithms
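The abstract above refers to a "staged" FFT. As a rough illustration of what stages mean, here is a minimal serial sketch of the textbook iterative radix-2 Cooley-Tukey FFT in Python. It is not the paper's OpenCL implementation; the function name `fft_staged` is hypothetical, and the input length is assumed to be a power of two. Within each stage the butterflies are independent of one another, which is the property a GPU kernel would typically exploit.

```python
import cmath


def fft_staged(x):
    """Iterative radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    bits = n.bit_length() - 1

    # Bit-reversal permutation so that each stage can work in place.
    a = [0j] * n
    for i in range(n):
        rev = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        a[rev] = complex(x[i])

    # log2(n) stages; butterflies inside one stage do not depend on each other.
    size = 2
    while size <= n:
        half = size // 2
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1 + 0j
            for k in range(half):
                u = a[start + k]
                t = w * a[start + k + half]
                a[start + k] = u + t
                a[start + k + half] = u - t
                w *= w_step
        size *= 2
    return a
```

Calling `fft_staged([1, 2, 3, 4])` should agree, up to rounding, with the direct DFT of that sequence.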


Distributed Path Compression for Piecewise Linear Morse-Smale Segmentations and Connected Components


IEEE Symposium on Large Data Analysis and Visualization (LDAV)

Authors: Michael Will; Jonas Lukasczyk; Julien Tierny; Christoph Garth. Affiliations: RPTU Kaiserslautern-Landau; CNRS and Sorbonne Université

ISBN (digital): 9798331516925

ISBN (print): 9798331516932

This paper describes the adaptation to a distributed computational setting of a well-scaling parallel algorithm for computing Morse-Smale segmentations based on path compression. Additionally, we extend the algorithm to efficiently compute connected components in distributed structured and unstructured grids, based either on the connectivity of the underlying mesh or a feature mask. Our implementation is seamlessly integrated with the distributed extension of the Topology ToolKit (TTK), ensuring robust performance and scalability. To demonstrate the practicality and efficiency of our algorithms, we conducted a series of scaling experiments on large-scale datasets, with sizes of up to $4096^3$ vertices on up to 64 nodes and 768 cores.

Keywords: Data analysis; Instruction sets; Scalability; Data visualization; Solids; Topology; Global communication; Parallel algorithms; Distributed computing; Optimization
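The abstract above builds on path compression. One common way to realize path compression in a data-parallel setting is pointer doubling, sketched below in serial Python. This is only an illustration under assumptions not stated in the abstract: each vertex is assumed to store a single successor pointer (for example, toward a neighboring vertex), terminal vertices point to themselves, and the name `compress_paths` is hypothetical. It does not reproduce the distributed TTK implementation.

```python
def compress_paths(succ):
    """Pointer doubling: make every entry point at the end of its path.

    succ[v] is the vertex that v points to; terminal vertices satisfy
    succ[v] == v. The input is assumed to contain no cycles other than
    these self-loops.
    """
    label = list(succ)
    changed = True
    while changed:
        changed = False
        nxt = label[:]  # double buffer mimics one synchronous parallel round
        for v in range(len(label)):
            target = label[label[v]]
            if target != label[v]:
                nxt[v] = target  # jump two steps along the path at once
                changed = True
        label = nxt
    return label
```

For a chain `0 -> 1 -> 2 -> 3` with terminal `3`, plus a second component `5 -> 4` with terminal `4`, the call `compress_paths([1, 2, 3, 3, 4, 4])` labels the first four vertices with `3` and the last two with `4`. The number of rounds grows only logarithmically with the longest path, since each round halves the remaining distance.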


Combinatorial Correlation Clustering


arXiv, 2024

Authors: Cohen-Addad, Vincent; Lolck, David Rasmussen; Pilipczuk, Marcin; Thorup, Mikkel; Yan, Shuyi; Zhang, Hanwen. Affiliations: Google Research, United States; University of Copenhagen, Denmark; University of Warsaw, Poland

Correlation Clustering is a classic clustering objective arising in numerous machine learning and data mining applications. Given a graph $G = (V, E)$, the goal is to partition the vertex set into clusters so as to minimize the number of edges between clusters plus the number of edges missing within clusters. The problem is APX-hard and the best known polynomial time approximation factor is 1.73 by Cohen-Addad, Lee, Li, and Newman [FOCS’23]. They use an LP with $|V|^{1/\epsilon^{\Theta(1)}}$ variables for some small $\epsilon$. However, due to the practical relevance of correlation clustering, there has also been great interest in getting more efficient sequential and parallel algorithms. The classic combinatorial pivot algorithm of Ailon, Charikar and Newman [JACM’08] provides a 3-approximation in linear time. Like most other algorithms discussed here, this uses randomization. Recently, Behnezhad, Charikar, Ma and Tan [FOCS’22] presented a $(3 + \epsilon)$-approximate solution for solving the problem in a constant number of rounds in the Massively Parallel Computation (MPC) setting. Very recently, Cao, Huang, Su [SODA’24] provided a 2.4-approximation in a polylogarithmic number of rounds in the MPC model and in $\tilde{O}(|E|^{1.5})$ time in the classic sequential setting. They asked whether it is possible to get a better than 3-approximation in near-linear time. We resolve this problem with an efficient combinatorial algorithm providing a drastically better approximation factor. It achieves a $\sim 2 - 2/13$ … © 2024, CC BY.

Keywords: Parallel algorithms
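To make the objective and the classic pivot algorithm mentioned in the abstract concrete, here is a minimal Python sketch. It shows the well-known randomized Pivot scheme (pick a random unclustered vertex, group it with its unclustered neighbors, repeat) together with a function that counts disagreements as defined above. It is not the new algorithm announced in the abstract; the function names `pivot_clustering` and `disagreements` are hypothetical, and the graph is assumed to be simple (no self-loops or parallel edges).

```python
import random


def pivot_clustering(vertices, edges, seed=None):
    """Randomized Pivot: cluster each pivot with its unclustered neighbors."""
    rng = random.Random(seed)
    adj = {v: set() for v in vertices}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)

    order = list(vertices)
    rng.shuffle(order)  # scanning a random permutation yields random pivots

    clustered = set()
    clusters = []
    for p in order:
        if p in clustered:
            continue  # already absorbed by an earlier pivot
        cluster = {p} | (adj[p] - clustered)
        clustered |= cluster
        clusters.append(cluster)
    return clusters


def disagreements(clusters, edges):
    """Edges between clusters plus missing edges within clusters."""
    label = {v: i for i, c in enumerate(clusters) for v in c}
    edge_set = {frozenset(e) for e in edges}  # assumes a simple graph
    cut = sum(1 for e in edge_set if len({label[v] for v in e}) == 2)
    inside_pairs = sum(len(c) * (len(c) - 1) // 2 for c in clusters)
    inside_edges = len(edge_set) - cut
    return cut + (inside_pairs - inside_edges)
```

On two disjoint triangles every pivot order recovers the two triangles with zero disagreements. On the path `0 - 1 - 2` every pivot choice costs exactly one disagreement, which is also optimal for that graph.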


6th IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems (ParSocial 2022)


IEEE International Symposium on Parallel and Distributed Processing Workshops and PhD Forum (IPDPSW)

Authors: John Korah; Eunice E. Santos

ISBN (digital): 9781665497473

ISBN (print): 9781665497480

Welcome to the Sixth IEEE Workshop on Parallel and Distributed Processing for Computational Social Systems (ParSocial 2022). This year the workshop highlights novel algorithms and models that leverage parallel computing with applications in social network and social media analysis. The first set of papers focuses on the key individual identification problem in social network analysis. The paper by Vandromme et al. entitled “Efficient Parallel PageRank Algorithm for Network Analysis” proposes a more efficient parallel algorithm for PageRank that has been shown to improve the time complexity by a factor of two. In a similar vein, the paper by Sahu et al. entitled “Dynamic Batch Parallel Algorithms for Updating PageRank” proposes two parallel algorithms for recomputing PageRank of nodes in a dynamic social network that can scale across various architectures. A related research problem is identifying opinion leaders that can improve information dissemination within communities. The paper entitled “Effect of Community-based Opinion Leaders on Guideline Dissemination in Large-Scale Physician Networks” by Murugappan et al. focuses on the problem of the dissemination of medical guidelines. The authors propose a culturally infused agent-based model to analyze the effectiveness of various opinion leader selection strategies and the tradeoffs between the reach and rate of spread of medical guideline information. The next set of papers focuses on social media analysis. Systems for large-scale ingestion of social media data sets can support a wide range of research problems in computational social systems. A step in this direction is taken by Huber et al., who have proposed a parallel system for large-scale processing of Reddit data in their paper entitled “A Streaming System for Large-scale Mining of Reddit Data”. On the other hand, Abeysinghe et al. in their short research paper entitled “Unsupervised User Stance Detection on Tweets Against Web Articles Using Sentence …

Keywords: Social networking (online); Conferences; Distributed processing; Parallel algorithms; Heuristic algorithms; Guidelines; Analytical models


Efficient Core Decomposition Over Large Heterogeneous Information Networks


International Conference on Data Engineering

Authors: Yucan Guo; Chenhao Ma; Yixiang Fang. Affiliation: The Chinese University of Hong Kong, Shenzhen

ISBN (digital): 9798350317152

ISBN (print): 9798350317169

Core decomposition is a critical metric for evaluating the vertex importance and analyzing graph structure. Given a graph $G$, a $k$-core is the largest subgraph of $G$ where each vertex has at least $k$ neighbors. Most existing works mainly focus on homogeneous graphs in which edges are of the same type and cannot be applied to heterogeneous information networks (HINs) directly. However, most real-world networks are HINs which consist of different vertex types and edge types. To reveal the cohesive subgraphs with hierarchical relations on HINs, we adopt the well-known $(k,\mathcal{P})$-core model to compute coreness over HINs, where $\mathcal{P}$ is a meta-path, i.e., a sequence of relations defined between different types of vertices. Hence, the $(k,\mathcal{P})$-core is a subgraph where each vertex is connected to at least $k$ other vertices via instances of $\mathcal{P}$. Based on two kinds of sparse matrix products, we propose two kinds of algebraic core decomposition algorithms, which are suitable for general HINs and locally dense HINs, respectively. We have performed extensive empirical evaluations of our algorithms on six large real-world HINs. The results show that the proposed solutions are highly efficient for core decomposition and achieve up to $258.84\times$ speedup over the state-of-the-art parallel algorithm on 20 cores. Moreover, other HIN tasks that involve homogeneous graph construction can also benefit from our algorithms.

Keywords: Measurement; Computational modeling; Buildings; Data engineering; Data models; Sparse matrices; Parallel algorithms
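The $k$-core definition in the abstract can be illustrated with the standard peeling procedure on an ordinary (homogeneous) graph. The Python sketch below is only that baseline idea; it is not the algebraic $(k,\mathcal{P})$-core algorithms the abstract proposes. The function name `core_numbers` is hypothetical, and the graph is assumed to be given as a dictionary mapping each vertex to the set of its neighbors.

```python
def core_numbers(adj):
    """Return {vertex: coreness} by repeatedly peeling a minimum-degree vertex."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core = {}
    k = 0
    while remaining:
        # Linear scan for the minimum; a bucket queue would make this faster.
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # coreness never decreases during peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core
```

For a triangle `0, 1, 2` with one extra vertex `3` attached to `0`, the triangle vertices get coreness 2 and the pendant vertex gets coreness 1. The $k$-core is then the set of vertices whose coreness is at least $k$.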


Efficient Parallelization of Global Graph Measures on Multicore Shared Memory Systems


Moratuwa Engineering Research Conference (MERCon)

Authors: Sangeetha Mahendran; Jenusiya Jeyaseelan; Nagulan Ratnarajah. Affiliation: Department of Physical Science, University of Vavuniya, Vavuniya, Sri Lanka

ISBN (digital): 9798331529048

ISBN (print): 9798331529055

Exploring the structural and functional properties of real-world large graphs, such as detecting community structure in social networks and assessing the connectivity of different brain regions in brain graphs, is an increasingly prominent research area. Quantitative graph theory has been developed to quantify both structural and functional aspects of graphs. Typically, nodal and global graph measures are employed to estimate the information content of a graph. There is currently a pronounced interest in parallel graph processing, driven by the imperative to quickly analyze the large graphs available today. Modern desktop and laptop computers are equipped with multicore processors featuring shared memory architecture. The utilization of the OpenMP API offers numerous advantages for shared memory systems. In this study, parallel algorithms for four global graph measures have been designed and implemented on multicore shared memory systems using both task-centric and data-centric parallel techniques. We assess performance across varying numbers of cores and for different sizes of random graphs, as well as numerous real brain graphs, comparing the results against serial algorithms within the same hardware environment. Experimental results demonstrate a significant enhancement in the parallel algorithms across multiple cores, effectively meeting the demand for accelerated computation of graph measures.

Keywords: Computers; Program processors; Portable computers; Multicore processing; Social networking (online); Memory management; Memory architecture; Hardware; Graph theory; Parallel algorithms


Dynamic Screening of Two-Electron Repulsion Integrals in GPU Parallelization


International Symposium on Computing and Networking Workshops (CANDARW)

Authors: Satoki Tsuji; Yasuaki Ito; Haruto Fujii; Nobuya Yokogawa; Kanta Suzuki; Koji Nakano; Akihiko Kasagi. Affiliations: Graduate School of Advanced Science and Technology, Hiroshima University, Higashi-Hiroshima, Japan; Computing Laboratory, Fujitsu Limited, Kawasaki, Japan

ISBN (digital): 9798331505349

ISBN (print): 9798331505356

Two-electron repulsion integrals (ERIs) are among the most foundational quantities for numerically solving the Schrödinger equation, accounting for the Coulomb interactions between electrons in a molecule. Calculating ERIs is a computationally intensive task of determining integral values for every combination of four atomic orbitals, which dominates the execution time of fundamental frameworks in quantum chemistry such as the Hartree-Fock method. It is known that the number of ERI calculations can be significantly reduced using the Schwarz inequality. However, Schwarz screening requires evaluating the upper bound for every integral value, leading to a time complexity comparable to that of the entire ERI calculation itself. This paper proposes a dynamic screening algorithm that minimizes the number of upper bound evaluations using the asynchronous parallelism of GPUs. Our parallel algorithm, performed by dynamically scheduled CUDA blocks, can discard most ERI calculations without even evaluating their upper bounds. Furthermore, by screening ERIs at the level of the finest integral units, primitive integrals, we eliminate more integral calculations than coarse-grained screening does. Experimental results for various molecules using an NVIDIA A100 GPU demonstrate that the proposed dynamic screening achieves up to an 18.0-fold speedup compared to the non-screening ERI computation while keeping the energy error below $1.0 \times 10^{-7}$ hartree.

Keywords: Upper bound; Heuristic algorithms; Quantum chemistry; Integral equations; Graphics processing units; Orbits; Mathematical models; Parallel algorithms; Time complexity; Electrons
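For reference, the Schwarz bound that the abstract relies on is usually written as follows, in the common chemists' notation where $(ab|cd)$ denotes the ERI over orbitals $a, b, c, d$. The threshold symbol $\tau$ is an assumption introduced here for illustration and does not appear in the abstract.

$$
|(ab|cd)| \le \sqrt{(ab|ab)}\,\sqrt{(cd|cd)}
$$

An integral can be skipped whenever $\sqrt{(ab|ab)}\,\sqrt{(cd|cd)} < \tau$, since its magnitude is then guaranteed to be below $\tau$. Checking this condition for every quadruple is itself the per-integral cost that the abstract says dynamic screening tries to avoid.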


Research on GPU Accelerated Polar Power Flow Calculation


International Conference on Energy, Power and Electrical Engineering (EPEE)

Authors: Haowei Li; Tong Jiang. Affiliation: School of Electrical and Electronic Engineering, North China Electric Power University, Beijing, China

ISBN (digital): 9798331518066

ISBN (print): 9798331518073

The increasing scale of power systems necessitates corresponding enhancements in the speed and accuracy of power flow calculations. Drawing on a study of Graphics Processing Unit (GPU) architecture, parallel algorithms, and power flow calculation methods, we exploit the high parallelism offered by GPUs. In this paper, we optimize the Newton method under traditional polar coordinates to develop parallel programs and utilize Compute Unified Device Architecture (CUDA) for data-parallel computing. Simulation results demonstrate that as the number of nodes in the test system increases, our proposed program exhibits more pronounced advantages, effectively improving power flow calculation speed and providing a promising approach for further research in this field.

Keywords: Electrical engineering; Accuracy; Simulation; Graphics processing units; Computer architecture; Programming; Power systems; Newton method; Parallel algorithms; Load flow


Batch Updates of Distributed Streaming Graphs using Linear Algebra


SC-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis

Authors: Elaheh Hassani; Md Taufique Hussain; Ariful Azad. Affiliation: Dept. of Intelligent Systems Engineering, Indiana University, Bloomington, IN, USA

ISBN (digital): 9798350355543

ISBN (print): 9798350355550

We develop a distributed-memory parallel algorithm for performing batch updates on streaming graphs, where vertices and edges are continuously added or removed. Our algorithm leverages distributed sparse matrices as the core data structures, utilizing equivalent sparse matrix operations to execute graph updates. By reducing unnecessary communication among processes and employing shared-memory parallelism, we accelerate updates of distributed graphs. Additionally, we maintain a balanced load in the output matrix by permuting the resultant matrix during the update process. We demonstrate that our streaming update algorithm is at least 25 times faster than alternative linear-algebraic methods and scales linearly up to 4,096 cores (32 nodes) on a Cray EX supercomputer.

Keywords: High performance computing; Conferences; Linear algebra; Data structures; Supercomputers; Sparse matrices; Parallel algorithms; Standards
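The idea of expressing graph updates as sparse matrix operations can be sketched on a single machine as follows. This Python example is a toy under stated assumptions, not the distributed-memory algorithm from the abstract: the adjacency matrix is stored as a dictionary keyed by `(row, col)`, insertions are modeled as element-wise addition of an update matrix, deletions as masking out entries, and the name `apply_batch` is hypothetical.

```python
def apply_batch(adj, insertions, deletions):
    """Apply one batch of edge updates to a sparse adjacency matrix.

    adj, insertions: {(row, col): value}; deletions: iterable of (row, col).
    Returns a new dictionary and leaves the inputs unchanged.
    """
    updated = dict(adj)
    # Insertion as element-wise addition A + U of two sparse matrices.
    for key, value in insertions.items():
        updated[key] = updated.get(key, 0) + value
    # Deletion as masking: drop the listed entries if they are present.
    for key in deletions:
        updated.pop(key, None)
    return updated
```

For example, starting from edges `(0, 1)` and `(1, 2)` with weight 1, a batch that inserts `(2, 0)` and `(0, 1)` and deletes `(1, 2)` leaves `(0, 1)` with weight 2 and `(2, 0)` with weight 1. Processing a whole batch at once, rather than edge by edge, is what lets a matrix-based formulation amortize its overhead.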
