ISBN (print): 9781728149523
We present the first conditional hardness results for massively parallel algorithms for some central graph problems, including (approximating) maximum matching, vertex cover, maximal independent set, and coloring. In some cases, these hardness results match or come close to the state-of-the-art algorithms. Our hardness results are conditioned on a widely believed conjecture in massively parallel computation about the complexity of the connectivity problem. We also note that an unconditional variant of such hardness results is known to be somewhat out of reach for now, as it would lead to considerably improved circuit complexity lower bounds and would concretely imply that NC^1 is a proper subset of P. We obtain our conditional hardness results via a general method that lifts unconditional lower bounds from the well-studied LOCAL model of distributed computing to the massively parallel computation setting.
Past research has shown that public key cryptosystems are usually slower than symmetric key cryptosystems because they use an additional cryptographic key and different methods for encryption and decryption. RSA is one of the most common asymmetric key cryptography algorithms. Recent research has focused on speeding up RSA using various techniques. With the introduction of distributed computing, parallelization enables algorithms to run on multiple cores concurrently. RSA consists of two resource-intensive operations: modular exponentiation with exponents of up to 1024 bits and repeated computation of the greatest common divisor. RSA is therefore a natural target for the Montgomery reduction algorithm, which optimizes the repeated modular multiplications in exponentiation. In this paper we propose a parallel scheme for RSA using a new parallel data structure, a concurrent indexed list of character blocks. The aim of our research was to improve the speed of RSA encryption and decryption through parallelism while remaining compatible with leading industry cryptography standards. We have simulated four different approaches: parallel and sequential, each with and without Montgomery reduction. We have also integrated our parallel paradigm with a well-known C++ crypto library and achieved a speedup of up to four times over the sequential approach. Unlike previous approaches, our implementation integrates easily with external libraries and can thus be adopted by other algorithmic schemes.
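To make the role of Montgomery reduction concrete, the sketch below shows textbook Montgomery modular exponentiation in Python. It is an illustrative stand-in, not the paper's C++ implementation; the function names (redc, mont_pow) are our own.

    def redc(t, n, r, n_prime, k):
        # Montgomery reduction: returns t * r^(-1) mod n, for 0 <= t < r*n.
        m = ((t & (r - 1)) * n_prime) & (r - 1)   # t * n' mod r (r = 2^k)
        u = (t + m * n) >> k                      # exact division by r
        return u - n if u >= n else u

    def mont_pow(base, exp, n):
        # Left-to-right square-and-multiply with all multiplications done in
        # Montgomery form, so each step needs only shifts and masks instead
        # of a full division by n.
        k = n.bit_length()
        r = 1 << k                                # r > n, gcd(r, n) = 1 (n odd)
        n_prime = (-pow(n, -1, r)) % r            # n * n_prime = -1 (mod r)
        base_m = (base * r) % n                   # base in Montgomery form
        x = r % n                                 # 1 in Montgomery form
        for bit in bin(exp)[2:]:
            x = redc(x * x, n, r, n_prime, k)
            if bit == "1":
                x = redc(x * base_m, n, r, n_prime, k)
        return redc(x, n, r, n_prime, k)          # leave Montgomery form

    assert mont_pow(42, 65537, 2**1024 - 105) == pow(42, 65537, 2**1024 - 105)

The interior of the exponentiation loop is a chain of independent-looking multiplications per block of data, which is what makes block-level parallelization of the kind the paper describes attractive.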
ISBN (print): 9781450362863
Model-based approaches in Software Performance Engineering (SPE) are used in early design phases to evaluate performance. Most current model-based prediction approaches work quite well for single-core CPUs but are not suitable or precise enough for multicore environments, because they consider only a single metric (i.e., the CPU speed) as a factor affecting performance. Therefore, we investigate parallel-performance-influencing factors (PPIFs) as a preparatory step towards improving current performance prediction models, by providing reference curves for the speedup behaviour of different resource demands and scenarios. In this paper, we show initial results and their relevance for future work.
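The abstract does not spell out a PPIF model, but the kind of reference curve it mentions can be illustrated with the classic Amdahl bound; the parallelizable fraction here is a hypothetical workload parameter, not something taken from the paper.

    def amdahl_speedup(parallel_fraction, cores):
        # Upper bound on speedup when only `parallel_fraction` of the
        # resource demand benefits from additional cores.
        return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

    for f in (0.5, 0.9, 0.99):
        curve = [round(amdahl_speedup(f, c), 2) for c in (1, 2, 4, 8, 16, 32)]
        print(f"parallel fraction {f}: {curve}")

Measured PPIF curves would refine such an idealized bound with effects like synchronization and memory contention.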
ISBN (print): 9781728147345
Gradient Boosting Decision Tree (GBDT) is a widely used machine learning algorithm whose training involves both irregular computation and random memory access, making it challenging for system optimizations. In this paper, we conduct a comprehensive performance analysis of two state-of-the-art systems, XGBoost and LightGBM. They represent two typical parallel implementations of GBDT: one is data-parallel and the other is parallel over features. Substantial thread synchronization overhead, as well as the inefficiency of random memory access, is identified. We propose HarpGBDT, a new GBDT system designed from the perspective of parallel efficiency optimization. First, we adopt a new tree growth method that selects the top K candidate tree nodes, enabling the use of more levels of parallelism without sacrificing the algorithm's accuracy. Second, we organize the training data and model data in blocks and propose a block-wise approach as a general model that enables the exploration of various parallelism options. Third, we propose a mixed mode that exploits the advantages of different modes of parallelism in different phases of training. By changing the configuration of the block size and parallel mode, HarpGBDT is able to attain better parallel efficiency. In extensive experiments on four datasets with different statistical characteristics on an Intel(R) Xeon(R) E5-2699 server, HarpGBDT on average performs 8x faster than XGBoost and 2.6x faster than LightGBM.
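HarpGBDT's code is not reproduced in this abstract; the toy sketch below only illustrates the "top K candidate nodes" growth idea it describes: instead of expanding one best leaf (leaf-wise) or every leaf (level-wise), expand the K frontier nodes with the highest split gain each iteration.

    import heapq

    def grow_top_k(frontier, gain, k):
        # Expand only the k frontier nodes with the highest split gain, a
        # middle ground between leaf-wise (k = 1) and level-wise (k = all)
        # growth that exposes more node-level parallelism per iteration.
        return heapq.nlargest(k, frontier, key=gain)

    leaves = [{"id": i, "gain": g} for i, g in enumerate([0.3, 1.2, 0.1, 0.9])]
    print([leaf["id"] for leaf in grow_top_k(leaves, lambda leaf: leaf["gain"], k=2)])
    # -> [1, 3]: the two most promising nodes can be grown in parallel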
ISBN (print): 9781450361842
Parallel radix sorting has been extensively studied in the literature for decades. However, the most efficient implementations require auxiliary memory proportional to the input size, which can be prohibitive for large inputs. The standard serial in-place radix sorting algorithm is based on swapping each key to its correct place in the array on each level of recursion. However, it is not straightforward to parallelize this algorithm due to dependencies among the swaps. This paper presents Regions Sort, a new parallel in-place radix sorting algorithm that is efficient both in theory and in practice. Our algorithm uses a graph structure to model dependencies across elements that need to be swapped, and generates independent tasks from this graph that can be executed in parallel. For sorting n integers from a range r, and a parameter K, Regions Sort requires only O(K log r log n) auxiliary memory. Our algorithm requires O(n log r) work and O((n/K + log K) log r) span, making it work-efficient and highly parallel. In addition, we present several optimizations that significantly improve the empirical performance of our algorithm. We compare the performance of Regions Sort to existing parallel in-place and out-of-place sorting algorithms on a variety of input distributions and show that Regions Sort is faster than optimized out-of-place radix sorting and comparison sorting algorithms, and is almost always faster than the fastest publicly available in-place sorting algorithm.
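The serial in-place pass that the paper parallelizes can be stated compactly. The sketch below is the standard "swap each key toward its bucket" routine (American flag sort style) for one digit level; it is the baseline the abstract refers to, not Regions Sort itself.

    def inplace_radix_pass(a, shift, bits=8):
        # Permute a in place so keys are grouped by digit (x >> shift) & mask;
        # recursing on each bucket with a smaller shift yields MSD radix sort.
        buckets = 1 << bits
        mask = buckets - 1
        counts = [0] * buckets
        for x in a:
            counts[(x >> shift) & mask] += 1
        heads, total = [0] * buckets, 0
        for d in range(buckets):
            heads[d], total = total, total + counts[d]
        tails = [heads[d] + counts[d] for d in range(buckets)]
        for d in range(buckets):
            while heads[d] < tails[d]:
                digit = (a[heads[d]] >> shift) & mask
                if digit == d:
                    heads[d] += 1              # already in its region
                else:                          # swap toward its region
                    j = heads[digit]
                    heads[digit] += 1
                    a[heads[d]], a[j] = a[j], a[heads[d]]

    data = [170, 45, 75, 90, 802, 24, 2, 66]
    inplace_radix_pass(data, shift=0)
    print(data)   # grouped by low byte: [2, 24, 802, 45, 66, 75, 90, 170]

Each swap depends on the current head of the destination bucket, which is exactly the dependency chain that Regions Sort models as a graph to extract parallel tasks.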
ISBN (print): 9781611975482
Balkanski and Singer [4] recently initiated the study of adaptivity (or parallelism) for constrained submodular function maximization, and studied the setting of a cardinality constraint. Subsequent improvements for this problem by Balkanski, Rubinstein, and Singer [6] and Ene and Nguyen [21] resulted in a near-optimal (1 - 1/e - epsilon)-approximation in O(log n / epsilon^2) rounds of adaptivity. Partly motivated by the goal of extending these results to more general constraints, we describe parallel algorithms for approximately maximizing the multilinear relaxation of a monotone submodular function subject to packing constraints. Formally, our problem is to maximize F(x) over x in [0,1]^n subject to Ax <= 1, where F is the multilinear relaxation of a monotone submodular function. Our algorithm achieves a near-optimal (1 - 1/e - epsilon)-approximation in O(log^2 m log n / epsilon^4) rounds, where n is the cardinality of the ground set and m is the number of packing constraints. For many constraints of interest, the resulting fractional solution can be rounded via known randomized rounding schemes that are oblivious to the specific submodular function. We thus derive randomized algorithms with poly-logarithmic adaptivity for a number of constraints including partition and laminar matroids, matchings, knapsack constraints, and their intersections. Our algorithm takes a continuous viewpoint and combines several ideas, ranging from the continuous greedy algorithm of [38, 13] and its adaptation to the MWU framework for packing constraints [20] to parallel algorithms for packing LPs [31, 41]. For the basic setting of cardinality constraints, this viewpoint gives rise to an alternative, simple-to-understand algorithm that matches recent results [6, 21]. Our algorithm to solve the multilinear relaxation is deterministic if it is given access to a value oracle for the multilinear extension and its gradient; this is possible in some interesting cases, such as the coverage function of an explicitly given set system.
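For reference, the multilinear relaxation F used above is the standard multilinear extension of a set function f on ground set N, written in LaTeX as:

    F(x) \;=\; \sum_{S \subseteq N} f(S) \prod_{i \in S} x_i \prod_{i \notin S} (1 - x_i)
         \;=\; \mathbb{E}\!\left[ f(R(x)) \right],

where R(x) is a random set that contains each element i independently with probability x_i. Evaluating F exactly requires summing over all subsets in general, which is why value-oracle or sampling access is the usual assumption.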
ISBN (print): 9781728143927
Optimizing the search for the best possible action depending on various states, such as the state of the environment and the system goal, has been a major area of study in computer systems. In any search algorithm, searching for the best possible solution among every known possibility can lead to the construction of the whole state search space, as in the well-known minimax algorithm. This may lead to impractical time complexities that are not suitable for real-time search operations. One practical way to reduce the computational time is alpha-beta pruning: instead of searching the whole state space, we prune the unnecessary branches, which reduces the time by a significant amount. This paper focuses on various possible implementations of the alpha-beta pruning algorithm and gives insight into which algorithm can be used for parallelism. Various studies have been conducted on how to make alpha-beta pruning faster. Parallelizing alpha-beta pruning for GPU-specific architectures such as mesh (CUDA) or a shared memory model (OpenMP) helps reduce the computational time. This paper studies the comparison between sequential and different parallel forms of alpha-beta pruning and their respective efficiency, using the game of chess as an application.
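As a baseline for the sequential form discussed above, here is a minimal alpha-beta sketch in Python; `children` and `evaluate` are hypothetical stand-ins for a real game interface such as chess move generation.

    def alphabeta(node, depth, alpha, beta, maximizing, children, evaluate):
        kids = children(node)
        if depth == 0 or not kids:
            return evaluate(node)          # horizon or leaf: static evaluation
        if maximizing:
            best = float("-inf")
            for child in kids:
                best = max(best, alphabeta(child, depth - 1, alpha, beta,
                                           False, children, evaluate))
                alpha = max(alpha, best)
                if alpha >= beta:          # beta cutoff: prune remaining moves
                    break
            return best
        best = float("inf")
        for child in kids:
            best = min(best, alphabeta(child, depth - 1, alpha, beta,
                                       True, children, evaluate))
            beta = min(beta, best)
            if alpha >= beta:              # alpha cutoff
                break
        return best

The difficulty for parallel variants is visible in the code: the alpha and beta bounds produced while searching one subtree are what prune the siblings, so naively searching siblings concurrently loses exactly those cutoffs.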
ISBN (print): 9781728137988
Differential Evolution (DE) is a well-known meta-heuristic designed to solve continuous optimization problems. Its simple structure and straightforward search operators make it suitable for solving a wide range of real-world problems. Despite its success, DE's performance may be limited when tackling high-dimensional complex problems. Its algorithmic structure can therefore be reconsidered by adaptively controlling its parameters and incorporating more resilient search operators. In this study, a Q-learning-based strategy is proposed to adapt DE parameters during the search process. Moreover, an eigenvector-based crossover is introduced in order to accelerate the convergence rate on ill-conditioned landscapes. To avoid premature convergence, a simple yet efficient switching technique is proposed to choose between the normal and the eigenvector-based crossover. Due to the high computational cost that can occur when applying the eigenvector-based crossover, a parallel counterpart of the algorithm has been implemented using graphics processing units (GPUs). The proposed algorithm has been applied to find the optimal mechanical structure of a recent electric motor. Its performance has also been validated on the CEC 2011 test suite, which contains a set of real-world problems. The experimental results reveal the competitive performance of our algorithm compared to recent adaptive DE versions. In addition, the parallel version achieved a substantial speedup over the sequential version while producing the same results.
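The abstract does not give the paper's state/action design, so the following is only a generic tabular Q-learning sketch of how DE control parameters could be adapted online; the (F, CR) action set and the reward signal are illustrative assumptions.

    import random

    ACTIONS = [(0.5, 0.9), (0.8, 0.5), (1.0, 0.1)]   # candidate (F, CR) pairs

    def choose_action(Q, state, eps=0.1):
        # Epsilon-greedy: mostly exploit the best-known action, sometimes explore.
        if random.random() < eps:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: Q.get((state, a), 0.0))

    def q_update(Q, state, action, reward, next_state, alpha=0.1, gamma=0.9):
        # One-step Q-learning update; `reward` could be, e.g., the relative
        # fitness improvement of the generation run with `action`.
        best_next = max(Q.get((next_state, a), 0.0) for a in ACTIONS)
        old = Q.get((state, action), 0.0)
        Q[(state, action)] = old + alpha * (reward + gamma * best_next - old)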
ISBN (print): 9781450361842
We study parallel algorithms for the classical balls-into-bins problem, in which m balls acting in parallel as separate agents are placed into n bins. Algorithms operate in synchronous rounds, in each of which balls and bins exchange messages once. The goal is to minimize the maximal load over all bins using a small number of rounds and few messages. While the case of m = n balls has been extensively studied, little is known about the heavily loaded case. In this work, we consider parallel algorithms for this somewhat neglected regime of m >> n. The naive solution of allocating each ball to a bin chosen uniformly and independently at random results in maximal load m/n + Theta(sqrt((m/n) log n)) (for m > n log n) with high probability (w.h.p.). In contrast, for the sequential setting Berenbrink et al. [5] showed that letting each ball join the least loaded of two randomly selected bins reduces the maximal load to m/n + O(log log m) w.h.p. To date, no parallel variant of such a result is known. We present a simple parallel threshold algorithm that obtains a maximal load of m/n + O(1) w.h.p. within O(log log(m/n) + log* n) rounds. The algorithm is symmetric (balls and bins all "look the same"), and balls send O(1) messages in expectation. The additive term of O(log* n) in the complexity is known to be tight for such algorithms [10]. We also prove that our analysis is tight, i.e., algorithms of the type we provide must run for Omega(min{log log(m/n), log* n}) rounds w.h.p. Finally, we give a simple asymmetric algorithm (i.e., balls are aware of a common labeling of the bins) that achieves a maximal load of m/n + O(1) in a constant number of rounds w.h.p. Again, balls send only a single message per round, and bins receive (1 + o(1)) m/n + O(log n) messages w.h.p. This goes to show that, similar to the case of m = n, asymmetry allows for highly efficient solutions.
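To make the symmetric threshold idea tangible, here is a toy simulation: each round, every unplaced ball contacts one uniformly random bin, and a bin admits balls only while its load is below the current threshold. The doubling-free threshold schedule below is a simple stand-in, not the schedule analyzed in the paper.

    import random

    def threshold_allocate(m, n, rounds):
        # Returns (maximal load, balls still unplaced after `rounds` rounds).
        loads, unplaced = [0] * n, m
        threshold = max(1, m // n)
        for _ in range(rounds):
            if unplaced == 0:
                break
            arrivals = [0] * n
            for _ in range(unplaced):                  # one message per ball
                arrivals[random.randrange(n)] += 1
            for b in range(n):
                accepted = min(arrivals[b], threshold - loads[b])
                loads[b] += accepted
                unplaced -= accepted
            threshold += 1                             # relax the threshold
        return max(loads), unplaced

    print(threshold_allocate(m=100_000, n=100, rounds=8))

Because rejected balls retry in later rounds against a slightly relaxed threshold, the load stays within a small additive term of m/n, mirroring the m/n + O(1) guarantee of the paper's algorithm.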