All-reduce is a widely used communication technique for distributed and parallel applications, typically implemented using either a tree-based or a ring-based scheme. Each approach has its own limitations: tree-based schemes struggle to exchange large messages efficiently, while ring-based solutions assume constant communication throughput, an unrealistic expectation in modern network communication infrastructures. We present FMCC-RT, an all-reduce approach that combines the advantages of tree- and ring-based implementations while mitigating their drawbacks. FMCC-RT dynamically switches between tree- and ring-based implementations depending on the size of the message being processed. It uses an analytical model to assess the impact of message size on the achieved throughput, enabling the derivation of optimal work partitioning parameters. Furthermore, FMCC-RT is designed with an Open MPI-compatible API, requiring no modification to user code. We evaluated FMCC-RT through micro-benchmarks and real-world application tests. Experimental results show that FMCC-RT outperforms state-of-the-art tree- and ring-based methods, achieving speedups of up to 5.6×.
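The idea of choosing between tree and ring all-reduce by message size can be illustrated with a minimal sketch. It uses the textbook latency-bandwidth (alpha-beta) cost model, not FMCC-RT's own analytical model, which the abstract does not specify. The function names and the default values of `alpha` (per-message latency in seconds) and `beta` (seconds per byte) are assumptions made for illustration only.

```python
import math


def ring_allreduce_cost(n_bytes, p, alpha, beta):
    """Textbook cost of ring all-reduce (reduce-scatter + all-gather).

    2(p-1) steps, each moving n/p bytes between neighbours.
    """
    return 2 * (p - 1) * alpha + 2 * (p - 1) / p * n_bytes * beta


def tree_allreduce_cost(n_bytes, p, alpha, beta):
    """Textbook cost of a binomial-tree reduce followed by a broadcast.

    2*ceil(log2 p) steps, each moving the full n-byte message.
    """
    steps = math.ceil(math.log2(p))
    return 2 * steps * (alpha + n_bytes * beta)


def choose_allreduce(n_bytes, p, alpha=5e-6, beta=1e-9):
    """Pick the scheme with the lower modelled cost (hypothetical helper)."""
    ring = ring_allreduce_cost(n_bytes, p, alpha, beta)
    tree = tree_allreduce_cost(n_bytes, p, alpha, beta)
    return "tree" if tree <= ring else "ring"


if __name__ == "__main__":
    for size in (1024, 64 * 1024 * 1024):
        print(size, choose_allreduce(size, p=16))
```

Under this simplified model, small messages are latency-bound and favour the tree, while large messages are bandwidth-bound and favour the ring. A real implementation would also have to account for the varying throughput that the abstract highlights.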
Community detection is a vital task in many fields, such as social networks and financial analysis, to name a *** Louvain method, the main workhorse of community detection, is a popular heuristic *** apply it to large-scale graph networks, researchers have proposed several parallel Louvain methods (PLMs), which suffer from two challenges: the latency in the information synchronization, and the community *** tackle these two challenges, we propose an isolate sets based parallel Louvain method (IPLM) and a fusion IPLM with the hashtables based Louvain method (FIPLM), which are based on a novel graph partition *** graph partition algorithm divides the graph network into subgraphs called isolate sets, in which the vertices are relatively decoupled from *** first describe the concepts and properties of the isolate *** we propose an algorithm to divide the graph network into isolate sets, which enjoys the same computation complexity as the breadth-first ***, we propose IPLM, which can efficiently calculate and update vertices information in parallel without latency or community ***, we achieve further acceleration by FIPLM, which maintains a high quality of community detection with a faster speedup than *** two methods are for shared-memory architecture, and we implement our methods on an 8-core PC; the experiments show that IPLM achieves a maximum speedup of 4.62x and outputs higher modularity (maximum 4.76%) than the serial Louvain method on 14 of 18 ***, FIPLM achieves a maximum speedup of 7.26x.
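Modularity, the quality score that the Louvain method greedily optimizes and that the abstract above reports, can be computed as in the following sketch. It covers only the standard modularity formula for an unweighted, undirected graph; it is not the IPLM or FIPLM algorithm. The function name `modularity` and the example graph are assumptions made for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman modularity Q = sum_c [ L_c / m - (d_c / 2m)^2 ].

    edges        : list of (u, v) pairs of an undirected, unweighted graph
    community_of : dict mapping each vertex to its community label
    L_c is the number of edges inside community c, d_c the sum of the
    degrees of its vertices, and m the total number of edges.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c
    degree_sum = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
    print(modularity(edges, split))
```

Moving a vertex from one community to another only changes the terms of the two communities involved, which is what makes the local-move step of Louvain cheap, and also what makes concurrent updates to shared community totals a synchronization problem in parallel variants.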
The conventional Levenberg-Marquardt (LM) algorithm is a state-of-the-art trust-region optimization method for solving bundle adjustment problems in the Structure-from-Motion community, which not only takes advantage ...
Motion and appearance cues play a crucial role in Multi-object Tracking (MOT) algorithms for associating objects across consecutive frames. While most MOT methods prioritize accurate motion modeling and distincti...
Anomalies in time series appear consecutively, forming anomaly segments. Applying the classical point-based evaluation metrics to evaluate the detection performance of segments leads to considerable underestimation, s...
K-Means algorithm is one of the most common clustering algorithms widely applied in various data analysis applications. Yinyang K-Means algorithm is a popular enhanced K-Means algorithm that avoids most unnecessary ca...
Abstract: Predicting pollutant leakage and diffusion processes is crucial for ensuring people’s safety. While the deep learning method offers high simulation efficiency and superior generalization, there is currently...
Network traffic classification is crucial for network security and network management and is one of the most important network tasks. Current state-of-the-art traffic classifiers are based on deep learning models to a...
Recent years have seen the wide application of natural language processing (NLP) models in crucial areas such as finance, medical treatment, and news media, raising concerns about the model robustness and *** find that prompt paradigm can probe special robust defects of pre-trained language *** prompt texts are first constructed for inputs and a pre-trained language model can generate adversarial examples for victim models via *** results show that prompt paradigm can efficiently generate more diverse adversarial examples besides synonym ***, we propose a novel robust training approach based on prompt paradigm which incorporates prompt texts as the alternatives to adversarial examples and enhances robustness under a lightweight minimax-style optimization *** on three real-world tasks and two deep neural models show that our approach can significantly improve the robustness of models to resist adversarial attacks.
Depthwise convolutions are widely used in lightweight convolutional neural networks (CNNs). The performance of depthwise convolutions is mainly bounded by the memory access rather than the arithmetic operations for cl...