In this paper, we consider mixed-integer global optimization problems and propose a parallel algorithm for solving problems of this class, based on the information-statistical approach to continuous global optimization. Within this algorithm, we suggest using a local tuning scheme that relies on the assumption that the multiextremality of the problem under discussion is weak. We also compare the sequential version of the algorithm with other similar methods. The effectiveness of parallelizing the algorithm has been confirmed by solving a series of mixed-integer global optimization problems on the Lobachevskii supercomputer.
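To make the information-statistical idea behind such methods concrete, the sketch below implements only the classical one-dimensional core of the approach (Strongin-style global search): trials are kept ordered, a Lipschitz-constant estimate m is maintained, each interval receives a characteristic R(i), and the next trial is placed inside the interval with the largest characteristic. The function name, the reliability parameter r and the tolerance eps are illustrative assumptions; the paper's mixed-integer handling, local tuning scheme and parallel trial selection are not reproduced here.

```python
import math

def information_statistical_minimize(f, a, b, r=2.0, eps=1e-4, max_trials=200):
    """Minimal 1-D information-statistical (Strongin-style) global search sketch."""
    xs, zs = [a, b], [f(a), f(b)]
    for _ in range(max_trials):
        # keep trials ordered by coordinate
        pairs = sorted(zip(xs, zs))
        xs, zs = [p[0] for p in pairs], [p[1] for p in pairs]
        # estimate the Lipschitz constant from the observed divided differences
        M = max(abs(zs[i] - zs[i - 1]) / (xs[i] - xs[i - 1]) for i in range(1, len(xs)))
        m = r * M if M > 0 else 1.0
        # characteristic R(i) of every interval; the next trial goes to the best one
        best_i, best_R = 1, -math.inf
        for i in range(1, len(xs)):
            d = xs[i] - xs[i - 1]
            R = m * d + (zs[i] - zs[i - 1]) ** 2 / (m * d) - 2.0 * (zs[i] + zs[i - 1])
            if R > best_R:
                best_i, best_R = i, R
        i = best_i
        if xs[i] - xs[i - 1] < eps:          # chosen interval is small enough: stop
            break
        # place the new trial strictly inside the chosen interval
        x_new = 0.5 * (xs[i] + xs[i - 1]) - (zs[i] - zs[i - 1]) / (2.0 * m)
        xs.append(x_new)
        zs.append(f(x_new))
    k = min(range(len(zs)), key=zs.__getitem__)
    return xs[k], zs[k]

# Example call on a multiextremal 1-D test function:
# x_star, f_star = information_statistical_minimize(
#     lambda x: math.sin(x) + math.sin(10.0 * x / 3.0), 2.7, 7.5)
```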
ISBN (print): 9781665440660
The widely used alternating least squares (ALS) algorithm for the canonical polyadic (CP) tensor decomposition is dominated in cost by the matricized-tensor times Khatri-Rao product (MTTKRP) kernel. This kernel is necessary to set up the quadratic optimization subproblems. State-of-the-art parallel ALS implementations use dimension trees to avoid redundant computations across MTTKRPs within each ALS sweep. In this paper, we propose two new parallel algorithms to accelerate CP-ALS. We introduce the multi-sweep dimension tree (MSDT) algorithm, which requires the contraction between an order-N input tensor and the first-contracted input matrix only once every (N - 1)/N sweeps. This algorithm reduces the leading-order computational cost by a factor of 2(N - 1)/N relative to the best previously known approach. In addition, we introduce a more communication-efficient approach to parallelizing an approximate CP-ALS algorithm, pairwise perturbation. This technique uses perturbative corrections to the subproblems rather than recomputing the contractions, and asymptotically accelerates ALS. Our benchmark results on 1024 processors of the Stampede2 supercomputer show that CP decomposition obtains a 1.25x speed-up from MSDT and a 1.94x speed-up from pairwise perturbation compared to state-of-the-art dimension-tree based CP-ALS implementations.
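For reference, the MTTKRP kernel itself can be written in a few lines of NumPy. The sketch below is the straightforward unfold-then-multiply formulation, not the dimension-tree or MSDT variants discussed in the paper; the function names are illustrative.

```python
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product of A (I x R) and B (J x R)."""
    R = A.shape[1]
    return (A[:, None, :] * B[None, :, :]).reshape(-1, R)

def mttkrp(T, factors, n):
    """MTTKRP along mode n: mode-n unfolding of T times the Khatri-Rao product
    of all factor matrices except the n-th (C-order index convention)."""
    N = T.ndim
    Tn = np.moveaxis(T, n, 0).reshape(T.shape[n], -1)   # mode-n unfolding
    rest = [factors[m] for m in range(N) if m != n]
    KR = rest[0]
    for M in rest[1:]:
        KR = khatri_rao(KR, M)                          # last remaining mode varies fastest
    return Tn @ KR                                      # shape: (T.shape[n], R)

if __name__ == "__main__":
    # quick check against a direct einsum formulation for a third-order tensor
    T = np.random.rand(3, 4, 5)
    U = [np.random.rand(s, 2) for s in T.shape]
    ref = np.einsum("ijk,jr,kr->ir", T, U[1], U[2])
    assert np.allclose(mttkrp(T, U, 0), ref)
```

In CP-ALS, the mode-n factor update uses this product as the right-hand side of its normal equations; dimension trees (and MSDT) save work by reusing the partial contractions shared by the N MTTKRPs of a sweep.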
The enumeration of all cliques in a graph or finding the largest clique are important problems that unfortunately are computationally intensive. Another alternative is to select only the most important motifs (e.g., s...
ISBN (print): 9789082797060
We consider the problem of nonnegative tensor completion. We adopt the alternating optimization framework and solve each nonnegative matrix completion problem via a stochastic variation of the accelerated gradient algorithm. We experimentally test the effectiveness and the efficiency of our algorithm using both real-world and synthetic data. We develop a shared-memory implementation of our algorithm using the multi-threaded API OpenMP, which attains significant speedup. We believe that our approach is a very competitive candidate for the solution of very large nonnegative tensor completion problems.
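As a rough illustration of one alternating-optimization subproblem, the sketch below applies a Nesterov-accelerated projected-gradient update to the nonnegative factor of a masked matrix factorization; the random entry subsampling stands in for the stochastic variation mentioned in the abstract. Function and parameter names are hypothetical, and the step-size handling is deliberately simplified rather than the paper's scheme.

```python
import numpy as np

def nonneg_factor_step(X, W, A, B, iters=50, sample_prob=0.5, seed=0):
    """Accelerated projected-gradient update of A in
    min_{A >= 0} 0.5 * || W * (X - A @ B.T) ||_F^2,
    where W is the 0/1 mask of observed entries."""
    rng = np.random.default_rng(seed)
    L = np.linalg.norm(B.T @ B, 2)              # Lipschitz constant of the full gradient
    step = 1.0 / max(L, 1e-12)
    A_prev, Y, t_prev = A.copy(), A.copy(), 1.0
    for _ in range(iters):
        S = W * (rng.random(W.shape) < sample_prob)        # subsample observed entries
        grad = -(S * (X - Y @ B.T)) @ B
        A_new = np.maximum(Y - step * grad, 0.0)           # project onto the nonnegative orthant
        t = 0.5 * (1.0 + np.sqrt(1.0 + 4.0 * t_prev ** 2))
        Y = A_new + ((t_prev - 1.0) / t) * (A_new - A_prev)  # Nesterov extrapolation
        A_prev, t_prev = A_new, t
    return A_prev
```

Alternating optimization would call such an update for each factor in turn, swapping the roles of A and B and transposing X and W.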
ISBN (print): 9783030739720; 9783030739737
Subgraph isomorphism is one of the most challenging problems on graph-based representations. Although many efficient sequential algorithms have been proposed over the last decades, solving this problem on large graphs is still a time-demanding task. For this reason, there is growing interest in realizing effective parallel algorithms able to fully exploit the modern multi-core architectures commonly available on servers and workstations. We propose a comparison of four parallel algorithms derived from the state-of-the-art sequential algorithm VF3-Light; two of them were presented in previous works, while the other two are introduced in this paper. In order to evaluate the strong points and weaknesses of each algorithm, we performed a benchmark over six datasets of large and dense random graphs, both labelled and unlabelled, measuring memory usage, speed-up and efficiency. We also add a comparison with a different parallel algorithm, named Glasgow, that is not derived from VF3-Light.
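To illustrate the kind of search such algorithms parallelize, here is a minimal backtracking subgraph matcher that distributes the candidates of the first pattern vertex across worker processes. It is a toy monomorphism counter with a static matching order, not VF3-Light, Glasgow, or any of the compared algorithms; all names are illustrative.

```python
from concurrent.futures import ProcessPoolExecutor

def extend(mapping, used, order, pattern_adj, target_adj):
    """Count completions of a partial pattern -> target mapping by backtracking."""
    if len(mapping) == len(order):
        return 1
    u = order[len(mapping)]
    count = 0
    for v in target_adj:
        if v in used:
            continue
        # every already-mapped pattern neighbour of u must land on a neighbour of v
        if all(mapping[w] in target_adj[v] for w in pattern_adj[u] if w in mapping):
            mapping[u] = v
            used.add(v)
            count += extend(mapping, used, order, pattern_adj, target_adj)
            used.discard(v)
            del mapping[u]
    return count

def count_from_root(args):
    v0, order, pattern_adj, target_adj = args
    return extend({order[0]: v0}, {v0}, order, pattern_adj, target_adj)

def parallel_subgraph_count(pattern_adj, target_adj, workers=4):
    """Distribute the candidates of the first pattern vertex across processes."""
    order = list(pattern_adj)    # static matching order (real matchers choose smarter orders)
    tasks = [(v0, order, pattern_adj, target_adj) for v0 in target_adj]
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(count_from_root, tasks))

if __name__ == "__main__":
    triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
    k4 = {i: {j for j in range(4) if j != i} for i in range(4)}
    print(parallel_subgraph_count(triangle, k4))   # 24 monomorphisms of a triangle into K4
```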
ISBN (print): 9781665410168
Lattice problems are a class of optimization problems that are notably hard. There are no classical or quantum algorithms known to solve these problems efficiently. Their hardness has made lattices a major cryptographic primitive for post-quantum cryptography. Several different approaches have been used for lattice problems, with different computational profiles; some suffer from super-exponential time, and others require exponential space. This motivated us to develop a novel lattice problem solver, CMAP-LAP, based on the clever coordination of different algorithms that run massively in parallel. With our flexible framework, heterogeneous modules run asynchronously in parallel on a large-scale distributed system while exchanging information, which drastically boosts the overall performance. We also implement full checkpoint-and-restart functionality, which is vital for high-dimensional lattice problems. CMAP-LAP facilitates the implementation of large-scale parallel strategies for lattice problems since all of its functions are designed to be customizable and abstract. Through numerical experiments with up to 103,680 cores, we evaluated the performance and stability of our system and demonstrated its high capability for future massive-scale experiments.
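The coordination pattern described above (heterogeneous solver modules running asynchronously, sharing incumbent solutions, with periodic checkpointing) can be sketched in a few lines. The toy below uses threads and a random-search placeholder solver; it is not CMAP-LAP's actual architecture or API, and every name in it is an assumption made for illustration only.

```python
import json
import random
import threading
import time

best = {"norm": None, "vector": None}      # shared incumbent (best solution found so far)
lock = threading.Lock()

def report(norm, vec):
    """Solver modules publish improvements; the coordinator keeps the incumbent."""
    with lock:
        if best["norm"] is None or norm < best["norm"]:
            best.update(norm=norm, vector=vec)

def random_search_solver(dim, iters, seed):
    """Placeholder module: a real system would run enumeration, sieving, etc."""
    rng = random.Random(seed)
    for _ in range(iters):
        v = [rng.randint(-5, 5) for _ in range(dim)]
        if any(v):
            report(sum(x * x for x in v) ** 0.5, v)

def checkpoint(path):
    """Persist the shared state so a crashed run can restart from it."""
    with lock, open(path, "w") as fh:
        json.dump(best, fh)

if __name__ == "__main__":
    workers = [threading.Thread(target=random_search_solver, args=(8, 50_000, s))
               for s in range(4)]           # heterogeneous modules would differ in strategy
    for w in workers:
        w.start()
    while any(w.is_alive() for w in workers):
        time.sleep(0.5)
        checkpoint("cmap_lap_checkpoint.json")   # periodic checkpoint-and-restart state
    for w in workers:
        w.join()
    checkpoint("cmap_lap_checkpoint.json")
    print(best)
```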
Given all pairwise weights (distances) among a set of objects, filtered graphs provide a sparse representation by only keeping an important subset of weights. Such graphs can be passed to graph clustering algorithms t...
ISBN (print): 9781611976465
We present a randomized parallel algorithm, in the Exclusive-Read Exclusive-Write (EREW) PRAM model, that computes a Maximal Independent Set (MIS) in O(log n) time and using O(m log² n) work, with high probability. Thus, MIS ∈ RNC¹. This time complexity is optimal and it improves on the celebrated O(log² n) time algorithms of Luby [STOC'85] and Alon, Babai, and Itai [JALG'86], which had remained the state of the art for the past 35 years.
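For context, the classical Luby-style scheme that this result improves upon is easy to state: in each synchronous round every live vertex draws a random priority, vertices that beat all live neighbours join the MIS, and they and their neighbours are removed; O(log n) rounds suffice with high probability, giving O(log² n) time overall. The sequential simulation below illustrates only that classical scheme, not the new EREW algorithm of the paper.

```python
import random

def luby_mis(adj, seed=0):
    """Luby-style randomized MIS, simulating the synchronous parallel rounds
    sequentially.  adj maps each vertex to the set of its neighbours."""
    rng = random.Random(seed)
    alive = set(adj)                # vertices not yet decided
    mis = set()
    while alive:
        # every live vertex draws a random priority (done in parallel on a PRAM)
        prio = {v: rng.random() for v in alive}
        # a vertex joins the MIS if it beats all of its live neighbours
        winners = {v for v in alive
                   if all(prio[v] < prio[u] for u in adj[v] if u in alive)}
        mis |= winners
        # winners and their neighbours are removed; O(log n) rounds suffice w.h.p.
        removed = set(winners)
        for v in winners:
            removed |= adj[v] & alive
        alive -= removed
    return mis

if __name__ == "__main__":
    cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}   # 6-cycle
    print(luby_mis(cycle))    # a maximal independent set of the cycle
```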
Community detection in social networks is the process of identifying cohesive groups of similar nodes. Detection of these groups can be helpful in many applications, such as finding networks of protein interaction in biological networks, finding like-minded users for ads and suggestions, finding a shared research field in collaborative networks, analyzing public health, predicting future links in social networks, analyzing criminology, and many more. However, with the increase in the number of profiles and content shared on social media platforms, the analysis is often time-consuming and exhaustive. In order to speed up and optimize the community detection process, parallel processing and shared/distributed memory techniques are widely used. Although community detection has widespread use in social networks, no attempt had previously been made to compile and systematically discuss research efforts on the emerging subject of parallel and distributed methods for community detection in social networks. Most existing surveys describe the serial algorithms used for community detection. Our survey work comes under the scope of new design techniques, exciting or novel applications, components or standards, and applications of an educational, transactional, and co-operational nature. This paper presents a systematic literature review of state-of-the-art research on the application of parallel processing and shared/distributed memory techniques to community detection for social network analysis. An advanced search strategy was applied to several digital libraries to extract studies for the review. The systematic search yielded 3220 studies, of which 65 relevant studies were selected for further review after several screening phases. The application of parallel computing, shared memory, and distributed memory to existing community detection methodologies is discussed thoroughly. More specifically,
ISBN (print): 9781665410168
Search trees are one of the most important and widely used data structures, and parallelization is an effective method to improve their performance. However, many existing parallel search trees incur high synchronization costs and low memory I/O efficiency, which limits their performance. We propose PPBT, a batched parallel search tree that minimizes synchronization by partitioning the tree using novel algorithms and minimizes I/O cost using buffering. We give a new sequential algorithm for batch processing on search trees with optimal I/O efficiency for insert and delete operations, and also present a fast parallel algorithm for joining disjoint search trees. We show experimentally that PPBT is over 6x faster than the state-of-the-art parallel tree in [1] and over 40x faster than the concurrent search tree in [7], and achieves a 21x speedup using 32 threads. PPBT's throughput on searches is lower due to reduced opportunities for buffering, but is still 1.3x that of [1]. In addition, PPBT has good response times for searches, for example completing 100K searches in under 1 ms in a tree with 10M elements.
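As a small illustration of the split/join-style batching that batched search trees rely on, the sketch below inserts a sorted batch into a treap by recursively partitioning both the batch and the tree; the two recursive calls work on disjoint subtrees and could run on separate threads. This is a generic treap example assuming distinct, previously absent keys, not PPBT or the algorithms of [1] or [7].

```python
import random

class Node:
    __slots__ = ("key", "prio", "left", "right")
    def __init__(self, key):
        self.key, self.prio = key, random.random()
        self.left = self.right = None

def split(t, key):
    """Split treap t into (keys < key, keys >= key)."""
    if t is None:
        return None, None
    if t.key < key:
        l, r = split(t.right, key)
        t.right = l
        return t, r
    l, r = split(t.left, key)
    t.left = r
    return l, t

def join(l, r):
    """Join two treaps, assuming every key in l is smaller than every key in r."""
    if l is None:
        return r
    if r is None:
        return l
    if l.prio > r.prio:
        l.right = join(l.right, r)
        return l
    r.left = join(l, r.left)
    return r

def batch_insert(t, batch):
    """Insert a sorted batch of new keys.  The two recursive calls touch
    disjoint subtrees, so a parallel version could run them concurrently."""
    if not batch:
        return t
    mid = len(batch) // 2
    l, r = split(t, batch[mid])
    left = batch_insert(l, batch[:mid])
    right = batch_insert(r, batch[mid + 1:])
    return join(join(left, Node(batch[mid])), right)

def inorder(t):
    return [] if t is None else inorder(t.left) + [t.key] + inorder(t.right)

t = batch_insert(None, [2, 5, 9])
t = batch_insert(t, [1, 3, 7])
print(inorder(t))   # [1, 2, 3, 5, 7, 9]
```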