The solution of over-determined equations plays a very important role in fields such as data fitting, signal processing, and machine learning. It is of great significance in predicting natural phenomena, optimizing en...
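Although the abstract is truncated here, the problem it refers to has a standard concrete form: an over-determined system Ax = b (more equations than unknowns) generally has no exact solution, so one minimizes ||Ax - b||. A minimal least-squares sketch in Python, illustrative rather than taken from the paper:

```python
import numpy as np

# Over-determined system: 100 equations, 3 unknowns, so Ax = b has no
# exact solution in general; least squares minimizes ||Ax - b||_2.
rng = np.random.default_rng(0)
A = rng.normal(size=(100, 3))
x_true = np.array([1.5, -2.0, 0.5])
b = A @ x_true + 0.01 * rng.normal(size=100)  # noisy observations

# NumPy solves the least-squares problem via the SVD.
x_hat, residual, rank, sv = np.linalg.lstsq(A, b, rcond=None)
print(x_hat)  # close to x_true
```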
Automatic Modulation Classification (AMC) is a technique for identifying signal modulations in applications such as IoT devices, cognitive radar, software-defined radio, and electronic warfare. As IoT devices see ever wider deployment, AMC algorithms must become compact enough for resource-constrained embedded devices while maintaining acceptable accuracy. Although current AMC algorithms deliver high accuracy, they require substantial computing power, making them unsuitable for IoT devices. This paper introduces the novel Chessboard-based Automatic Modulation Classification (CAMC) algorithm. Test results show that CAMC achieves 99% accuracy (in most runs) at an SNR of 3 dB and 100% above 5 dB. The algorithm is scalable, demands little computing power, and offers better accuracy than state-of-the-art AMC algorithms while classifying the modulations common in IoT devices: BPSK, QPSK, 8PSK, and 16QAM. Additionally, CAMC is hardware-friendly due to its inherent parallelism and scalability. The novelty of this paper is classifying four different modulations in a computationally light, hardware-friendly way while achieving accuracy above 99% (in most runs) at SNRs of 3 dB and higher.
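The abstract does not describe the mechanics of the chessboard scheme. One plausible reading, sketched below purely as an illustration (all names, grid sizes, and thresholds are hypothetical, not taken from the paper), is to bin received I/Q constellation points into a coarse 2D grid, the "chessboard", and match the occupancy pattern against the ideal pattern of each candidate modulation; the per-cell work is independent, which would be consistent with the claimed parallelism.

```python
import numpy as np

# Hypothetical sketch: bin I/Q samples into an n x n "chessboard" and
# match cell occupancy against ideal BPSK/QPSK/8PSK/16QAM patterns.
MODS = {
    "BPSK": np.array([1, -1], dtype=complex),
    "QPSK": np.exp(1j * (np.pi / 4 + np.pi / 2 * np.arange(4))),
    "8PSK": np.exp(1j * np.pi / 4 * np.arange(8)),
    "16QAM": np.array([x + 1j * y for x in (-3, -1, 1, 3)
                       for y in (-3, -1, 1, 3)]) / 3,
}

def occupancy(symbols, n=8, lim=1.5):
    """Mark which chessboard cells contain at least one symbol."""
    grid = np.zeros((n, n), dtype=bool)
    iq = symbols.view(float).reshape(-1, 2)          # (re, im) pairs
    ij = np.clip(((iq + lim) / (2 * lim) * n).astype(int), 0, n - 1)
    grid[ij[:, 0], ij[:, 1]] = True
    return grid

def classify(rx):
    """Pick the modulation whose ideal grid best matches the received one."""
    g = occupancy(rx)
    scores = {m: np.sum(g == occupancy(pts)) for m, pts in MODS.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(1)
tx = rng.choice(MODS["QPSK"], size=2000)
noise = 0.05 * (rng.normal(size=2000) + 1j * rng.normal(size=2000))
print(classify(tx + noise))  # QPSK
```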
Deep learning's widespread adoption in various fields has made distributed training across multiple computing nodes essential. However, frequent communication between nodes can significantly slow training, creating a bottleneck in distributed training. To address this issue, researchers have focused on communication optimization algorithms for distributed deep learning systems. In this paper, we propose a standard, grounded in mathematical modeling, that systematically classifies communication optimization algorithms; existing surveys in the field do not provide such a classification. We categorize existing works into four categories according to their communication optimization strategy: communication masking, communication compression, communication frequency reduction, and hybrid optimization. Finally, we discuss potential future challenges and research directions in the field of communication optimization algorithms for distributed deep learning systems.
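As a concrete instance of the communication-compression category, top-k gradient sparsification transmits only the largest-magnitude gradient entries each step. A minimal sketch of that generic technique (an example of the category, not an algorithm from this survey):

```python
import numpy as np

def topk_compress(grad, k):
    """Keep the k largest-magnitude entries; send (indices, values).

    The message shrinks from len(grad) floats to k index/value pairs,
    trading a small accuracy loss for much less communication.
    """
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx]

def topk_decompress(idx, vals, n):
    """Rebuild a (sparse) gradient on the receiving node."""
    out = np.zeros(n)
    out[idx] = vals
    return out

grad = np.random.default_rng(0).normal(size=1_000_000)
idx, vals = topk_compress(grad, k=10_000)        # send ~1% of entries
restored = topk_decompress(idx, vals, grad.size)
print(np.linalg.norm(grad - restored) / np.linalg.norm(grad))
```

In practice such schemes are usually paired with error feedback, accumulating the dropped entries locally so they are eventually transmitted.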
A parallel algorithm for enumerating parse trees of a given string according to a fixed context-free grammar is defined. The algorithm computes the number of parse trees of an input string; more generally, it applies t...
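Although the preview is cut off, counting parse trees is a classic dynamic program, a counting variant of CYK, whose level-by-level table fill is also the natural source of parallelism. A minimal serial sketch, assuming a grammar in Chomsky normal form:

```python
from collections import defaultdict

def count_parse_trees(binary_rules, terminal_rules, start, s):
    """Count parse trees of s via CYK-style dynamic programming.

    binary_rules:   (A, B, C) triples meaning A -> B C
    terminal_rules: (A, a) pairs meaning A -> 'a'
    Counts for spans of length L depend only on shorter spans, so the
    table can be filled one diagonal at a time (in parallel, per span).
    """
    n = len(s)
    # table[i][j][A] = number of parse trees of s[i:j] rooted at A
    table = [[defaultdict(int) for _ in range(n + 1)] for _ in range(n)]
    for i, ch in enumerate(s):
        for A, a in terminal_rules:
            if a == ch:
                table[i][i + 1][A] += 1
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length
            for k in range(i + 1, j):              # split point
                for A, B, C in binary_rules:
                    table[i][j][A] += table[i][k][B] * table[k][j][C]
    return table[0][n][start]

# Ambiguous grammar S -> S S | 'a': "aaaa" has Catalan(3) = 5 parse trees.
print(count_parse_trees([("S", "S", "S")], [("S", "a")], "S", "aaaa"))
```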
A fuzzy-model-based approach is developed to investigate reinforcement-learning-based optimization for nonlinear Markov jump singularly perturbed systems. As a first attempt, an offline parallel iteration learning algorithm is presented to solve the coupled algebraic Riccati equations with singular perturbation and jumping parameters. Furthermore, based on the integral reinforcement learning approach, a novel online parallel learning algorithm is proposed that employs slow and fast sampled data simultaneously, avoiding both the impact of stochastic jumping and ill-conditioned numerical problems. The convergence of the proposed learning algorithms is proved. Finally, a tunnel diode circuit model demonstrates the efficacy of the proposed methods.
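For intuition about such iteration schemes: in the simplest single-mode, non-perturbed case, the offline idea reduces to Kleinman-style policy iteration, which replaces the nonlinear algebraic Riccati equation with a sequence of linear Lyapunov equations. A minimal sketch of that textbook special case (the paper's coupled, jump-parameter version is substantially more involved):

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

def kleinman_are(A, B, Q, R, K0, iters=20):
    """Policy iteration for A'P + P A - P B R^{-1} B' P + Q = 0.

    K0 must stabilize A - B K0.  Each iteration solves one *linear*
    Lyapunov equation (policy evaluation), then improves the policy.
    """
    K = K0
    for _ in range(iters):
        Ak = A - B @ K
        # Evaluate the policy: Ak' P + P Ak + Q + K' R K = 0
        P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
        # Improve the policy: K = R^{-1} B' P
        K = np.linalg.solve(R, B.T @ P)
    return P, K

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # already stable, so K0 = 0 works
B = np.array([[0.0], [1.0]])
P, K = kleinman_are(A, B, np.eye(2), np.eye(1), K0=np.zeros((1, 2)))
print(P)  # agrees with scipy.linalg.solve_continuous_are(A, B, Q, R)
```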
Given all pairwise weights (distances) among a set of objects, filtered graphs provide a sparse representation by only keeping an important subset of weights. Such graphs can be passed to graph clustering algorithms t...
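The preview is truncated, but one common instance of the generic idea, not necessarily the filtration this paper studies, is a k-nearest-neighbor filter: keep only each object's k smallest-distance edges. A small sketch:

```python
import numpy as np

def knn_filter(D, k):
    """Keep each node's k smallest-distance edges (symmetrized).

    D is a dense n x n distance matrix; the result is a sparse,
    undirected edge list suitable for a graph clustering algorithm.
    """
    n = D.shape[0]
    edges = set()
    for i in range(n):
        order = np.argsort(D[i])
        for j in [v for v in order if v != i][:k]:
            edges.add((min(i, j), max(i, j)))
    return sorted(edges)

rng = np.random.default_rng(0)
pts = rng.normal(size=(6, 2))
D = np.linalg.norm(pts[:, None] - pts[None, :], axis=-1)
print(knn_filter(D, k=2))  # at most 6 * 2 undirected edges survive
```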
The breadth-first search procedure is an algorithm that traverses the vertices of a graph, determining the distance from each vertex to the initial vertex; the distance is infinite for vertices not reachable from the starting vertex. Despite having an efficient serial version, this important algorithm is irregular, making an effective parallel implementation a daunting task. This paper presents the results of an OpenMP-based C++ implementation of the breadth-first search procedure using the bag data structure, reimplementing an existing proposal coded in the Cilk++ programming language. The experiments used 32 strongly connected graphs and 31 disconnected graphs, executed on two machines: the first with 28 cores and two threads per core, the second with 48 processing cores and hyperthreading disabled. Relative to the serial version, the parallel implementation yielded speedups of up to 20x when using 28 processing cores and up to 25x when using 56 threads on the machine with first-generation Intel(R) Xeon(R) Scalable processors. Furthermore, it yielded speedups of up to 45x when using 48 cores on the machine with second-generation Intel(R) Xeon(R) Scalable processors.
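At its core the bag-based algorithm is a level-synchronous BFS: the current frontier is expanded in parallel, and the bag structure lets threads split and merge partial frontiers cheaply. A serial Python sketch of the level-synchronous skeleton (the paper's C++/OpenMP version parallelizes the loop over the frontier):

```python
INF = float("inf")

def bfs_levels(adj, source):
    """Level-synchronous BFS over an adjacency list.

    dist[v] stays INF for vertices unreachable from source.  In the
    parallel version, the loop over `frontier` is divided among threads,
    each filling its own partial next-frontier ("bag"); the bags are
    then merged before the next level begins.
    """
    dist = [INF] * len(adj)
    dist[source] = 0
    frontier, level = [source], 0
    while frontier:
        level += 1
        nxt = []
        for u in frontier:            # the parallelized loop
            for v in adj[u]:
                if dist[v] == INF:    # in parallel this race is benign:
                    dist[v] = level   # every writer stores the same level
                    nxt.append(v)
        frontier = nxt
    return dist

adj = [[1, 2], [0, 3], [0], [1], []]  # vertex 4 is unreachable
print(bfs_levels(adj, 0))             # [0, 1, 1, 2, inf]
```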
ISBN (print): 9798400714436
Group testing is a widely used binary classification method that efficiently distinguishes between samples with and without a binary-classifiable attribute by pooling and testing subsets of a group. Bayesian Group Testing (BGT) is the state-of-the-art approach, which integrates prior risk information into a Bayesian Boolean lattice framework to minimize test counts and reduce false classifications. However, BGT, like other existing group testing techniques, struggles with multinomial group testing, where samples have multiple binary-classifiable attributes that can be individually distinguished simultaneously. We address this need by proposing Bayesian Multinomial Group Testing (BMGT), which includes a new Bayesian-based model and supporting theorems for an efficient and precise multinomial pooling strategy. We further design and develop SBMGT, a high-performance and scalable framework that tackles BMGT's computational challenges through three key innovations: 1) a parallel binary-encoded product lattice model with up to 99.8% efficiency; 2) the Bayesian Balanced Partitioning Algorithm (BBPA), a multinomial pooling strategy optimized for parallel computation with up to 97.7% scaling efficiency on 4096 cores; and 3) a scalable multinomial group testing analytics framework, demonstrated in a real-world disease surveillance case study using AIDS and STD datasets from Uganda, where SBMGT reduced tests by up to 54% and lowered false classification rates by 92% compared to BGT.
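For readers new to group testing itself, the basic saving comes from testing pools and splitting only the pools that test positive. A minimal (non-Bayesian, non-multinomial) adaptive sketch, illustrating the baseline idea rather than SBMGT's strategy:

```python
def binary_splitting(samples, test):
    """Adaptive group testing by recursive pool halving.

    `test(pool)` returns True iff the pool contains a positive sample.
    When positives are rare, the number of tests is far below testing
    each sample individually; BGT/BMGT go further by shaping the pools
    with prior risk information.
    """
    tests = 0

    def recurse(pool):
        nonlocal tests
        tests += 1
        if not test(pool):
            return set()
        if len(pool) == 1:
            return set(pool)
        mid = len(pool) // 2
        return recurse(pool[:mid]) | recurse(pool[mid:])

    return recurse(list(samples)), tests

positives = {13, 77}
found, n = binary_splitting(range(128),
                            lambda pool: any(s in positives for s in pool))
print(found, n)  # {13, 77} found with far fewer than 128 tests
```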
The development of high-precision surface modeling has always been an important research field in computer science. One way to solve this problem is to use differential geometry, which provides a mathematical framewor...
Running time is a key metric across the standard physical design flow stages. However, with the rapid growth in design sizes, routing has become the runtime bottleneck in the physical design flow. As a result, speeding up routing is a critical and pressing task for IC design automation. Besides running time, the quality of the global routing solution must also be evaluated, since a poor global routing engine degrades solution quality after the entire routing stage. This work considers both. We propose a global routing framework, called FastGR, with GPU-accelerated routing algorithms and a heterogeneous task graph scheduler, to accelerate the modern global router and improve its effectiveness. Its runtime-oriented version, FastGRL, achieves a 2.489x speedup over the state-of-the-art global router. Furthermore, the GPU-accelerated L-shape pattern routing algorithm used in FastGRL contributes a 9.324x speedup over the sequential algorithm on CPU. Its quality-oriented version, FastGRH, offers a 27.855% improvement in the number of shorts over the runtime-oriented version and remains 1.970x faster than the most advanced global router.
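L-shape pattern routing, the kernel FastGRL moves to the GPU, considers only the two single-bend routes between a pin pair and keeps the cheaper one on the congestion cost map; since every net's two candidates are evaluated independently, thousands of nets can be batched on a GPU. A minimal serial sketch of the per-net computation (illustrative, not FastGR's implementation):

```python
import numpy as np

def l_shape_route(cost, p, q):
    """Pick the cheaper of the two L-shaped (single-bend) routes p -> q.

    cost is a 2D congestion map.  Candidate 1 bends at (p_row, q_col),
    candidate 2 at (q_row, p_col); the shared corner cell is counted once.
    """
    (r0, c0), (r1, c1) = p, q

    def span(a, b):
        return range(min(a, b), max(a, b) + 1)

    cand1 = (sum(cost[r0, c] for c in span(c0, c1))
             + sum(cost[r, c1] for r in span(r0, r1)) - cost[r0, c1])
    cand2 = (sum(cost[r, c0] for r in span(r0, r1))
             + sum(cost[r1, c] for c in span(c0, c1)) - cost[r1, c0])
    bend = (r0, c1) if cand1 <= cand2 else (r1, c0)
    return min(cand1, cand2), bend

cost = np.ones((8, 8))
cost[0, 3:6] = 10.0                         # congested cells on row 0
print(l_shape_route(cost, (0, 0), (4, 7)))  # picks the bend avoiding them
```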