Author:
Träff, Jesper Larsson, TU Wien
Faculty of Informatics, Institute of Computer Engineering, Research Group Parallel Computing 191-4, Treitlstrasse 3, 5th Floor, 1040 Vienna, Austria
These lecture notes are designed to accompany an imaginary, virtual, undergraduate, one- or two-semester course on the fundamentals of parallel computing, as well as to serve as background and reference for graduate courses on high-performance computing, parallel algorithms, and shared-memory multiprocessor programming. They introduce theoretical concepts and tools for expressing, analyzing, and judging parallel algorithms and, in detail, cover the two most widely used concrete frameworks, OpenMP and MPI, as well as the threading interface pthreads, for writing parallel programs for either shared- or distributed-memory parallel computers, with emphasis on general concepts and principles. Code examples are given in a C-like style, and many are actual, correct C code. The lecture notes deliberately do not cover GPU architectures and GPU programming, but the general concerns, guidelines, and principles (time, work, cost, efficiency, scalability, memory structure and bandwidth) are just as relevant for efficiently utilizing various GPU architectures. Likewise, the lecture notes focus on deterministic algorithms only and do not use randomization. Slides or blackboard drawings are imagined to be worked out for the actual lectures by the lecturer, so the lecture notes deliberately do not provide such important visual aids; some are available from the author on request. The student of this material will also find it instructive to take the time to understand concepts and algorithms visually. The exercises can be used for self-study and as inspiration for small implementation projects in OpenMP and MPI that can and should accompany any serious course on parallel computing. The student will benefit from actually implementing and carefully benchmarking the suggested algorithms on the parallel computing system that may or should be made available as part of such a course. In class, the exercises can be used as the basis for hand-ins and small programming projects, for which su
Given multiple data sets, the problem of record linkage is to cluster them such that each cluster has all the information pertaining to a single entity and does not contain any other information. This problem has nume...
ISBN:
(digital) 9798331504205
ISBN:
(print) 9798331504212
In order to accurately and quickly find the network structure in big data, this paper proposes a big data clustering algorithm based on community maximal cliques. To address the time cost caused by the uncertainty of the initial nodes and by the evaluation of the fitness function, local key nodes are introduced and the fitness formula is improved. For the formation of the initial community, the concept of the maximal clique is introduced. By analyzing the characteristics of maximal cliques, it is concluded that the core category of a community is composed of maximal cliques. A method to obtain local core categories through maximal clique discovery is then proposed, along with a parallel strategy for the maximal clique discovery algorithm. Finally, a parallel strategy for the whole algorithm is proposed and experiments are conducted on real datasets. The experimental results show that the proposed algorithm is feasible, effective, and applicable to the discovery of network structures in large-scale data.
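The maximal-clique-discovery step that the abstract builds on can be illustrated by the classic Bron-Kerbosch enumeration. This is a minimal sequential sketch (the function names and toy graph are illustrative, not the paper's implementation); in a parallel strategy, the top-level branches of the recursion are natural independent tasks.

```python
def bron_kerbosch(R, P, X, adj, cliques):
    """Enumerate maximal cliques: R is the growing clique, P the candidate
    vertices, X the already-explored vertices (no-pivot variant)."""
    if not P and not X:
        cliques.append(sorted(R))
        return
    for v in list(P):
        # each branch is independent, so branches can run in parallel
        bron_kerbosch(R | {v}, P & adj[v], X & adj[v], adj, cliques)
        P.remove(v)
        X.add(v)

def maximal_cliques(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = []
    bron_kerbosch(set(), set(adj), set(), adj, cliques)
    return cliques

# Two triangles sharing the edge (2, 3)
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4)]
print(sorted(maximal_cliques(edges)))  # [[1, 2, 3], [2, 3, 4]]
```

The two maximal cliques found here would be the "local core categories" seeds in the abstract's terminology.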
ISBN:
(digital) 9798350376647
ISBN:
(print) 9798350376654
Sparse matrix computations are an important class of algorithms. One important topic in this field is SPCA (Sparse Principal Component Analysis), a variant of PCA used to compute sparse principal components of a matrix. There are various methods for computing the sparse principal components of a dataset; one of them is congradU (conditional gradient algorithm with unit step size), an iterative method that performs a matrix-vector multiplication in each iteration, so accelerating this multiplication accelerates the whole method. We propose a parallel algorithm for congradU that uses a master/worker model to distribute the rows of the matrix among the cores or processors so as to balance the workload among them; balancing the workload reduces the overall execution time. The proposed algorithm has been tested on randomly generated matrices with different sizes and sparsity percentages. We compare the time to find the first principal component using the proposed algorithm against the SVD algorithm, and observe that as the size and sparsity percentage of the matrix increase, the proposed algorithm finds the first principal component faster than SVD. We also compare the time of the multiplication operation in one iteration of the proposed algorithm against the dot operator (in Python), and observe that as the sparsity percentage increases, the proposed algorithm outperforms the dot operator.
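The row-distribution idea can be sketched as follows. This is a pure-Python stand-in, not the paper's implementation: the dict-of-rows sparse format, the function names, and the greedy longest-processing-time balancing heuristic are all assumptions made for illustration.

```python
def partition_rows(row_nnz, n_workers):
    """Greedily assign rows to workers, balancing total nonzero count
    (longest-processing-time-first heuristic)."""
    loads = [0] * n_workers
    parts = [[] for _ in range(n_workers)]
    for r in sorted(range(len(row_nnz)), key=lambda r: -row_nnz[r]):
        w = loads.index(min(loads))  # least-loaded worker
        parts[w].append(r)
        loads[w] += row_nnz[r]
    return parts

def spmv_rows(rows, A, x):
    """Worker task: y[r] = A[r] . x for the assigned rows
    (A is a dict mapping row -> {col: value})."""
    return {r: sum(v * x[c] for c, v in A.get(r, {}).items()) for r in rows}

A = {0: {0: 2.0, 2: 1.0}, 1: {1: 3.0}, 2: {0: 1.0, 1: 1.0, 2: 1.0}}
x = [1.0, 2.0, 3.0]
parts = partition_rows([len(A.get(r, {})) for r in range(3)], 2)
y = [0.0] * 3
for part in parts:  # each chunk could be dispatched to a worker process
    for r, val in spmv_rows(part, A, x).items():
        y[r] = val
print(y)  # [5.0, 6.0, 6.0]
```

In a real master/worker setup, each `spmv_rows` call would run on a separate core and the master would merge the partial results, as in the merge loop above.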
Computing strongly connected components (SCC) is among the most fundamental problems in graph analytics. Given the large size of today's real-world graphs, parallel SCC implementation is increasingly important. SCC is challenging in the parallel setting and is particularly hard on large-diameter graphs; many existing parallel SCC implementations can be even slower than Tarjan's sequential algorithm on such graphs. To tackle this challenge, we propose an efficient parallel SCC implementation using a new parallel reachability approach. Our solution is based on a novel idea referred to as vertical granularity control (VGC), which breaks synchronization barriers to increase parallelism and hide scheduling overhead. To use VGC in our SCC algorithm, we also design an efficient data structure called the parallel hash bag, which uses parallel dynamic resizing to avoid redundant work in maintaining frontiers (vertices processed in a round). We implement the parallel SCC algorithm by Blelloch et al. (J. ACM, 2020) using our new parallel reachability approach. We compare our implementation to state-of-the-art systems, including GBBS, iSpan, Multi-step, and our highly optimized Tarjan's (sequential) algorithm, on 18 graphs, including social, web, k-NN, and lattice graphs. On a machine with 96 cores, our implementation is the fastest on 16 of the 18 graphs. On average (geometric mean) over all graphs, our SCC is 6.0× faster than the best previous parallel code (GBBS), 12.8× faster than Tarjan's sequential algorithm, and 2.7× faster than the best existing implementation on each graph. We believe that our techniques are of independent interest. We also apply our parallel hash bag and VGC scheme to other graph problems, including connectivity and least-element lists (LE-lists); our implementations improve on the state-of-the-art parallel implementations for these two problems.
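The reachability-based decomposition underlying such SCC algorithms can be shown in a minimal sequential sketch: the SCC of a pivot vertex is the intersection of its forward and backward reachable sets, and the three remaining parts each contain whole SCCs, so they can be processed recursively (concurrently, in a parallel implementation). This sketch deliberately omits VGC, hash bags, and all parallel machinery; the toy graph is illustrative.

```python
from collections import deque

def reach(src, adj, allowed):
    """BFS from src, restricted to the 'allowed' vertex set.
    Each BFS level corresponds to one frontier round."""
    seen, frontier = {src}, deque([src])
    while frontier:
        u = frontier.popleft()
        for v in adj.get(u, ()):
            if v in allowed and v not in seen:
                seen.add(v)
                frontier.append(v)
    return seen

def fw_bw_scc(vertices, adj, radj):
    """Forward-backward SCC decomposition."""
    if not vertices:
        return []
    pivot = next(iter(vertices))
    fwd = reach(pivot, adj, vertices)   # forward-reachable from pivot
    bwd = reach(pivot, radj, vertices)  # backward-reachable from pivot
    scc = fwd & bwd                     # pivot's SCC
    out = [scc]
    # the three remainders contain whole SCCs: recurse (parallelizable)
    for part in (fwd - scc, bwd - scc, vertices - fwd - bwd):
        out += fw_bw_scc(part, adj, radj)
    return out

# Toy graph: cycle 1->2->3->1, bridge 3->4, cycle 4<->5
adj = {1: [2], 2: [3], 3: [1, 4], 4: [5], 5: [4]}
radj = {}
for u, vs in adj.items():
    for v in vs:
        radj.setdefault(v, []).append(u)
sccs = fw_bw_scc(set(adj), adj, radj)
print(sorted(sorted(s) for s in sccs))  # [[1, 2, 3], [4, 5]]
```

The abstract's contribution is precisely in making the `reach` step fast on large-diameter graphs, where the number of BFS rounds (and hence synchronization barriers) is large.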
ISBN:
(digital) 9798350390315
ISBN:
(print) 9798350390322
To reduce the high time cost of power flow calculation for urban rail transit traction power supply networks, we investigate an acceleration algorithm. As the complexity and scale of these systems continue to increase, traditional serial calculation methods can no longer meet practical needs, and parallel algorithms have become an important way to improve the speed and efficiency of power flow calculations. In the proposed algorithm, the traction power supply network data in the database are read in batches, and, based on the train's running time, the MapReduce model and a process pool are used to process and compute the partitioned data in parallel; the results are then merged and output. This approach makes full use of parallel computing hardware, such as multi-core CPUs, while reducing the time spent on data transmission and processing, thereby effectively improving the speed and efficiency of power flow calculations. Experimental results show that the speedup is closely related to the number of parallel processes and CPU cores: the more cores, the greater the speedup. In our experiments, the speedup is best when the number of CPU cores equals the number of processes, and the calculation time after optimization is about 1/6 of that before optimization.
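The process-pool MapReduce pattern described above can be sketched as follows. The per-record "solve" is a placeholder (a trivial P = V * I computation on fabricated records), since the actual power-flow equations are not given in the abstract; only the split/map/merge structure is the point.

```python
from multiprocessing import Pool

def solve_chunk(chunk):
    """Map step: stand-in for a per-timestep power-flow solve.
    Each record is (timestep, voltage, current); output is (timestep, power)."""
    return [(t, v * i) for t, v, i in chunk]

def chunked(records, n):
    """Split the batch of records into roughly n chunks."""
    k = max(1, len(records) // n)
    return [records[i:i + k] for i in range(0, len(records), k)]

def parallel_power_flow(records, n_procs=4):
    with Pool(n_procs) as pool:                         # process pool
        partials = pool.map(solve_chunk, chunked(records, n_procs))
    return sorted(r for part in partials for r in part)  # reduce: merge

if __name__ == "__main__":
    # fabricated sample data: 8 timesteps at 750 V
    records = [(t, 750.0, 0.1 * t) for t in range(8)]
    print(parallel_power_flow(records))
```

As the abstract notes, the speedup of such a scheme saturates once the number of pool processes exceeds the number of physical cores, which is why cores == processes was optimal in their experiments.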
In this paper, we propose a collision detection algorithm for a dynamic simulation system. This algorithm first conducts a global search for objects, optimizes global detection through spatial decomposition, and uses s...
Since the advent of parallel algorithms in the C++17 Standard Template Library (STL), the STL has become a viable framework for creating performance-portable applications. Given multiple existing implementations of th...
ISBN:
(digital) 9798350351019
ISBN:
(print) 9798350351026
Computational electromagnetics methods for analysing nonlinear systems, such as the harmonic balance (HB) method, are computationally complex, especially when dealing with a large number of frequency points. In this paper, we propose a fast parallel algorithm for the HB method to accelerate electromagnetic simulation. The new algorithm parallelizes the construction of the nonlinear Jacobian matrix, utilizing a graphics processing unit (GPU) to accelerate the simulation. We present the formulations of the parallel HB method and subsequently provide its implementation details on a mixed GPU/CPU platform. Experimental results from several industrial cases show that the new parallel algorithm achieves a 3× speedup over the conventional HB method while maintaining comparable accuracy, with the GPU-accelerated part about 10 times faster than its CPU counterpart.
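Why Jacobian construction parallelizes well can be seen even in a CPU-side sketch: in a finite-difference approximation, each column j depends only on an independent evaluation of f at a perturbed point, so the columns are embarrassingly parallel tasks (the kind of independence a GPU kernel exploits). The function below is a generic illustration, not the paper's HB formulation.

```python
def jacobian(f, x, h=1e-6):
    """Finite-difference Jacobian of f: R^n -> R^m.
    Column j needs only f(x + h*e_j), so each column is an
    independent task that could run on its own thread/GPU lane."""
    fx = f(x)
    n, m = len(x), len(fx)
    cols = []
    for j in range(n):  # independent tasks: one per column
        xp = list(x)
        xp[j] += h
        fxp = f(xp)
        cols.append([(fxp[i] - fx[i]) / h for i in range(m)])
    # transpose the column list into a row-major matrix
    return [[cols[j][i] for j in range(n)] for i in range(m)]

# f(x0, x1) = (x0^2, x0*x1), exact J = [[2*x0, 0], [x1, x0]]
f = lambda x: [x[0] ** 2, x[0] * x[1]]
J = jacobian(f, [3.0, 2.0])
print(J)  # approximately [[6.0, 0.0], [2.0, 3.0]]
```

For an analytic (rather than finite-difference) Jacobian, as in the HB method, the same structure holds: each entry is computed from local data, which is what makes the GPU parallelization of the construction step effective.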
This paper presents a reexamination of the research paper titled "Communication-Avoiding parallel algorithms for TRSM" by Wicky et al. We focus on the communication bandwidth cost analysis presented in the o...