With the rapid development of deep learning, the size of data sets and deep neural networks (DNNs) models are also *** a result, the intolerably long time for models' training or inference with conventional strategies cannot meet the satisfaction of modern tasks ***, devices stay idle in the scenario of edge computing (EC), which presents a waste of resources since they can share the pressure of the busy devices but they do *** address the problem, the strategy leveraging distributed processing has been applied to load computation tasks from a single processor to a group of devices, which results in the acceleration of training or inference of DNN models and promotes the high utilization of devices in edge *** with existing papers, this paper presents an enlightening and novel review of applying distributed processing with data and model parallelism to improve deep learning tasks in edge *** the practicalities, commonly used lightweight models in a distributed system are introduced as *** the key technique, the parallel strategy will be described in *** some typical applications of distributed processing will be ***, the challenges of distributed processing with edge computing will be described.
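As a rough illustration of the data parallelism mentioned in this abstract, the sketch below splits one batch across several hypothetical workers, computes a gradient per shard, and combines the results with a size-weighted average. The one-parameter linear model and the names `grad_mse`, `split`, and `data_parallel_grad` are assumptions made for this example and do not come from the surveyed paper.

```python
from typing import List, Tuple

Sample = Tuple[float, float]  # (x, y) pair; illustrative data format


def grad_mse(w: float, shard: List[Sample]) -> float:
    """Gradient of the mean squared error of y_hat = w * x over one shard."""
    return sum(2.0 * (w * x - y) * x for x, y in shard) / len(shard)


def split(data: List[Sample], n_workers: int) -> List[List[Sample]]:
    """Round-robin split of the batch into at most n_workers shards."""
    return [data[i::n_workers] for i in range(n_workers)]


def data_parallel_grad(w: float, data: List[Sample], n_workers: int) -> float:
    """Each worker computes a local gradient; the results are averaged,
    weighted by shard size, which reproduces the full-batch gradient."""
    shards = [s for s in split(data, n_workers) if s]  # drop empty shards
    total = sum(len(s) for s in shards)
    return sum(grad_mse(w, s) * len(s) for s in shards) / total


data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.2), (5.0, 9.8)]
full_batch = grad_mse(0.5, data)
parallel = data_parallel_grad(0.5, data, n_workers=3)
```

With size-weighted averaging, `parallel` matches `full_batch` up to floating-point rounding, which is why synchronous data parallelism can speed up a step without changing the update itself.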
ISBN: 9798400714436 (print)
Graph Convolutional Networks (GCNs) are widely used in various domains. However, training distributed full-batch GCNs on large-scale graphs poses challenges due to inefficient memory access patterns and high communication overhead. This paper presents a general and efficient GCN training framework on CPU supercomputers. It comprises a general aggregation kernel designed to optimize irregular memory access and a quantization method with label propagation to reduce communication overhead. Experimental results show that our method achieves a speedup of up to 4.1× compared with the SoTA implementations.
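To give a sense of how quantization can cut communication volume, the sketch below applies plain uniform 8-bit quantization to a feature vector before it would be sent to another process. This is a generic technique chosen for illustration; it is not the paper's quantization method with label propagation, and the names `quantize` and `dequantize` are hypothetical.

```python
import struct
from typing import List, Tuple


def quantize(values: List[float], bits: int = 8) -> Tuple[List[int], float, float]:
    """Map floats onto integer codes in [0, 2**bits - 1] (uniform quantization)."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1
    scale = (hi - lo) / levels if hi > lo else 1.0  # avoid division by zero
    codes = [round((v - lo) / scale) for v in values]
    return codes, lo, scale


def dequantize(codes: List[int], lo: float, scale: float) -> List[float]:
    """Approximate reconstruction on the receiving side."""
    return [lo + c * scale for c in codes]


features = [0.12, -1.5, 3.25, 0.0, 2.75, -0.4]
codes, lo, scale = quantize(features)
restored = dequantize(codes, lo, scale)

# Payload size: one byte per value instead of four bytes for float32.
payload_q = bytes(codes)
payload_f32 = struct.pack(f"{len(features)}f", *features)
max_error = max(abs(a - b) for a, b in zip(features, restored))
```

The quantized payload is a quarter of the float32 payload (plus two scalars, `lo` and `scale`), and the reconstruction error per value is bounded by half a quantization step.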
The integration of social networks with the Internet of Things (IoT) has been explored in recent research, giving rise to the Social Internet of Things (SIoT). One promising application of SIoT is viral marketing, whi...
Graph models many real-world data like social, transportation, biology, and communication data. Hence, graph traversal including multi-hop or graph-walking queries has been the key operation atop graph stores. However, since different graph traversals may touch different sets of vertices, it is hard or even impossible to have a one-size-fits-all graph partitioning algorithm that preserves access locality for various graph traversal workloads. Meanwhile, prior shard-based migration faces a dilemma such that coarse-grained migration may incur more migration overhead over increased locality benefits, while fine-grained migration usually requires excessive metadata and incurs non-trivial maintenance costs. We present Pragh, an efficient locality-preserving live graph migration scheme for graph stores in the form of key-value pairs. The key idea of Pragh is a split migration model that only migrates values physically while retaining keys in the initial location. This allows fine-grained migration while avoiding the need to maintain excessive metadata. Pragh integrates an RDMA-friendly location cache from DrTM-KV to provide fully-localized access to migrated data and further makes a novel reuse of the cache replacement policy for lightweight monitoring. Pragh further supports evolving graphs through a check-and-forward mechanism to resolve the conflict between updates and migration of graph data. Evaluations on an 8-node RDMA-capable cluster (100 Gbps) using a representative graph traversal benchmark show that Pragh can increase the throughput by up to 19x and decrease the median latency by up to 94%, thanks to split live migration that eliminates 97% remote accesses. A port of split live migration to Wukong shows up to 2.53x throughput improvement on representative workloads like LUBM-10240, thanks to a reduction of 88% remote accesses. This further confirms the effectiveness and generality of Pragh. Finally, though Pragh focuses on RDMA-based graph traversal, we show it
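The split migration idea described in this abstract (values move, keys stay at their initial location) can be sketched with a toy in-memory model. Everything here is an assumption for illustration: the `Node` and `Cluster` classes, the byte-sum placement function, and the plain dictionary standing in for the location cache. It does not reflect Pragh's actual implementation or its RDMA mechanisms.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.keys = {}    # key -> (node_name, slot) where the value lives
        self.values = {}  # slot -> value physically stored on this node


class Cluster:
    def __init__(self, names):
        self.nodes = {n: Node(n) for n in names}
        self.next_slot = 0

    def home(self, key):
        """Deterministic placement: the key entry never leaves this node."""
        names = sorted(self.nodes)
        return self.nodes[names[sum(key.encode()) % len(names)]]

    def put(self, key, value):
        node, slot = self.home(key), self.next_slot
        self.next_slot += 1
        node.values[slot] = value
        node.keys[key] = (node.name, slot)

    def migrate(self, key, target):
        """Move only the value; the key entry stays home and is re-pointed."""
        home = self.home(key)
        src, slot = home.keys[key]
        self.nodes[target].values[slot] = self.nodes[src].values.pop(slot)
        home.keys[key] = (target, slot)

    def read(self, key, reader, cache):
        """Return (value, remote_accesses) for a read issued on node `reader`."""
        remote = 0
        loc = cache.get(key)
        if loc is not None:
            name, slot = loc
            remote += name != reader
            if slot in self.nodes[name].values:
                return self.nodes[name].values[slot], remote
            del cache[key]  # stale entry: the value moved after caching
        home = self.home(key)
        remote += home.name != reader
        name, slot = home.keys[key]
        cache[key] = (name, slot)
        remote += name != reader
        return self.nodes[name].values[slot], remote


cluster = Cluster(["a", "b"])
cluster.put("v1", ["v2", "v3"])  # adjacency list of vertex v1
home = cluster.home("v1").name
reader = next(n for n in cluster.nodes if n != home)
cache = {}
before = cluster.read("v1", reader, cache)    # key and value are both remote
cluster.migrate("v1", reader)
refresh = cluster.read("v1", reader, cache)   # stale cache entry is refreshed
after = cluster.read("v1", reader, cache)     # cached location, local value
```

In this toy model the first read costs two remote accesses, while a read after migration with a warm cache costs none, and the key entry is still found on its original node.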
Vertex Cover parameterized by the solution size k is the quintessential fixed-parameter tractable problem. FPT algorithms are most interesting when the parameter is small. Several lower bounds on k are well-known, suc...
At present, the development of bifacial photovoltaic modules is rapid, but there is a lack of concentrating light devices in bifacial photovoltaic modules for power generation. This paper analyzes the changing pattern...
In this paper, we introduce an artificial intelligence-based system designed for the segmentation and auxiliary diagnosis of focal liver lesions. This system can effectively segment lesions in both single-phase non-con...
In recent years, there has been a significant rise in the number of software startups globally, driven by advances in technology and increasing reliance on digital solutions. These startups are crucial in shaping the ...
Performing an object detection task after the restoration of a hazy image, or rather detecting with the network backbone directly, will result in the inclusion of information mixed with dehazing, which tends to interf...
ISBN: 9781713899921 (print)
Cylindrical Algebraic Decomposition (CAD) is one of the pillar algorithms of symbolic computation, and its worst-case complexity is doubly exponential in the number of variables. Researchers found that variable order dramatically affects efficiency and proposed various heuristics. The existing learning-based methods are all supervised learning methods that cannot cope with diverse polynomial sets. This paper proposes two Reinforcement Learning (RL) approaches combined with Graph Neural Networks (GNN) for Suggesting Variable Order (SVO). One is GRL-SVO(UP), a branching heuristic integrated with CAD. The other is GRL-SVO(NUP), a fast heuristic providing a total order directly. We generate a random dataset and collect a real-world dataset from SMT-LIB. The experiments show that our approaches outperform state-of-the-art learning-based heuristics and are competitive with the best expert-based heuristics. Interestingly, our models show a strong generalization ability, working well on various datasets even if they are only trained on a 3-var random dataset. The source code and data are available at https://***/dongyuhang22/GRL-SVO.