Edge computing has emerged as a promising paradigm to fulfill the escalating demands of latency-sensitive and computationally intensive applications. In this context, efficient server deployment and service placement ...
ISBN (Print): 9783031125973; 9783031125966
Balanced hypergraph partitioning is a classical NP-hard optimization problem with applications in various domains such as VLSI design, simulating quantum circuits, optimizing data placement in distributed databases, or minimizing communication volume in high performance computing. Engineering parallel partitioning heuristics is a topic of recent research, yet most such heuristics are non-deterministic. In this work, we design and implement a highly scalable deterministic algorithm in the parallel partitioning framework Mt-KaHyPar. On our extensive set of benchmark instances, it achieves partition quality and performance similar to a comparable but non-deterministic configuration of Mt-KaHyPar and outperforms the only other parallel deterministic algorithm, BiPart, in partition quality, running time, and speedups.
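For context on what "partition quality" and "balance" mean in this setting, the following minimal Python sketch computes the standard (lambda - 1) connectivity metric and checks the balance constraint for a given k-way hypergraph partition. The names and the epsilon value are illustrative assumptions; this is not code from Mt-KaHyPar or BiPart.

# Minimal sketch (not Mt-KaHyPar or BiPart code): connectivity metric and balance check.

def connectivity_metric(nets, part, net_weight=None):
    # Sum over nets of (number of blocks the net spans - 1) times the net's weight.
    total = 0
    for i, net in enumerate(nets):                 # net: iterable of vertex ids
        spanned = {part[v] for v in net}           # blocks touched by this net
        w = 1 if net_weight is None else net_weight[i]
        total += (len(spanned) - 1) * w
    return total

def is_balanced(part, k, vertex_weight, eps=0.03):
    # Each block may exceed the average block weight by at most a factor of (1 + eps).
    block_weight = [0] * k
    for v, b in enumerate(part):
        block_weight[b] += vertex_weight[v]
    limit = (1 + eps) * sum(vertex_weight) / k
    return all(bw <= limit for bw in block_weight)

# Tiny example: 4 vertices, 2 blocks, two nets.
nets = [[0, 1, 2], [2, 3]]
part = [0, 0, 1, 1]
print(connectivity_metric(nets, part))                       # 1: only the first net is cut
print(is_balanced(part, k=2, vertex_weight=[1, 1, 1, 1]))    # True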
ISBN (Print): 9781665473156
The massive growth in terminal users has created an explosive need for data residing at the edge of the network. Multiple Mobile Edge Computing (MEC) servers are built in or near base stations to meet this need. However, optimally allocating these servers to multiple users in real time remains a problem. Reinforcement Learning (RL), as a framework for solving interaction problems, is a promising solution. To apply an RL-based algorithm in a multi-agent environment, we propose an iterative scheme: individual users are selected by priority to interact with the environment one at a time. Building on this scheme, we optimize overall system performance. We construct three objective system performance indicators: average processing cost, delay, and energy consumption, and we improve the existing Deep Q-learning Network (DQN) by using the cost as the reward function and by replacing the fixed exploitation rate with a dynamic one associated with the reward and episode time. To explore the performance potential of the proposed algorithm, we simulate it alongside the standard DQN algorithm and a greedy algorithm under varying numbers of users and data sizes. The results show that the proposed algorithm reduces the average system processing cost by at least 12% compared to the greedy algorithm, and it also significantly outperforms the greedy and DQN algorithms in delay and energy consumption.
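As a hedged illustration of the cost-based reward and the dynamic exploration/exploitation trade-off described above, the following Python sketch shows one plausible way to couple an epsilon schedule to the episode index and recent reward. The weights, decay constant, and function names are illustrative assumptions, not the paper's exact formulation.

import math, random

def reward(delay, energy, w_delay=0.5, w_energy=0.5):
    # Illustrative assumption: lower weighted processing cost maps to a higher reward.
    return -(w_delay * delay + w_energy * energy)

def exploration_rate(episode, recent_reward, baseline,
                     eps_min=0.05, eps_max=1.0, decay=0.01):
    # Decay exploration with the episode index, but explore a bit more
    # while recent rewards still lag behind a running baseline.
    eps = eps_min + (eps_max - eps_min) * math.exp(-decay * episode)
    if recent_reward < baseline:
        eps = min(eps_max, eps * 1.2)
    return eps

def select_action(q_values, episode, recent_reward, baseline):
    eps = exploration_rate(episode, recent_reward, baseline)
    if random.random() < eps:
        return random.randrange(len(q_values))                  # explore
    return max(range(len(q_values)), key=q_values.__getitem__)  # exploit

# Example: pick an offloading action for one user given its current Q-value estimates.
print(select_action([0.2, 0.7, 0.1], episode=50, recent_reward=-1.4, baseline=-1.0))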
In parallel with the continuously increasing parameter space dimensionality, search and optimization algorithms should support distributed parameter evaluations to reduce cumulative runtime. Intel's neuromorphic o...
Convolutional neural network (CNN) has attracted increasing attention and been widely used in imaging processing, bioinformatics and so on. As the cloud computing and multiparty computing are booming, the training and...
ISBN (Print): 9798400708435
Hierarchical federated learning has been studied as a more practical approach to federated learning in terms of scalability, robustness, and privacy protection, particularly in edge computing. To achieve these advantages, operations are typically conducted in a grouped manner at the edge, which means that the formation of client groups can affect the learning performance, such as the benefits gained and costs incurred by group operations. This is especially true for edge and mobile devices, which are more sensitive to computation and communication overheads. The formation of groups is critical for group-based federated edge learning but has not been studied in detail, and has even been overlooked by researchers. In this paper, we consider a group-based federated edge learning framework that leverages the hierarchical cloud-edge-client architecture and probabilistic group sampling. We first theoretically analyze the convergence rate with respect to the characteristics of the client groups, and find that group heterogeneity plays an important role in convergence. Then, on the basis of this key observation, we propose new group formation and group sampling methods to reduce data heterogeneity within groups and to boost the convergence and performance of federated learning. Finally, our extensive experiments show that our methods outperform current algorithms in terms of prediction accuracy and training cost.
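To make the group-formation idea concrete, here is a minimal Python sketch of one plausible heuristic in the spirit described above: clients are greedily assigned so that each group's aggregate label distribution stays close to the global distribution while group sizes remain balanced. The heuristic and all names are illustrative assumptions, not the paper's actual method.

import numpy as np

def form_groups(client_label_counts, num_groups):
    # Illustrative heuristic, not the paper's method.
    # client_label_counts: array of shape (num_clients, num_classes) with per-client label histograms.
    counts = np.asarray(client_label_counts, dtype=float)
    global_dist = counts.sum(axis=0) / counts.sum()
    groups = [[] for _ in range(num_groups)]
    group_counts = np.zeros((num_groups, counts.shape[1]))

    def gap(vec):
        # L1 distance between a group's label mix and the global mix.
        return np.abs(vec / vec.sum() - global_dist).sum()

    # Assign larger clients first, keeping group sizes balanced.
    for c in np.argsort(-counts.sum(axis=1)):
        min_size = min(len(g) for g in groups)
        candidates = [g for g in range(num_groups) if len(groups[g]) == min_size]
        best = min(candidates, key=lambda g: gap(group_counts[g] + counts[c]))
        groups[best].append(int(c))
        group_counts[best] += counts[c]
    return groups

# Example: 6 clients with skewed label histograms over 3 classes, split into 2 groups.
histograms = [[50, 0, 0], [0, 50, 0], [0, 0, 50], [40, 10, 0], [0, 40, 10], [10, 0, 40]]
print(form_groups(histograms, num_groups=2))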
Magnetic resonance imaging is an essential tool for clinical diagnosis, but acquisition time and reconstruction errors have always been bottleneck issues limiting the development of this field. In this paper, we use S...
In high-throughput intelligent computing scenarios, multi-device parallelism strategies based on data parallelism or pipeline parallelism have been extensively utilized to accelerate large deep neural network model in...
ISBN (Digital): 9789819708628
ISBN (Print): 9789819708611; 9789819708628
Pointwise convolutions are widely used in various convolutional neural networks due to their low computational complexity and parameter requirements. However, pointwise convolutions are still time-consuming, like regular convolutions. As power consumption becomes an increasing concern, low-power embedded processors such as multi-core digital signal processors (DSPs) have been brought into the high-performance computing field. In this paper, we propose a high-performance multi-level parallel direct implementation of pointwise convolutions on the multi-core DSPs of FT-M7032, a CPU-DSP heterogeneous prototype processor. The main optimizations include on-chip memory blocking, loop ordering, vectorization, register blocking, and multi-core parallelization. The experimental results show that the proposed direct implementation achieves much better performance than GEMM-based ones on FT-M7032, with a speedup of up to 79.26 times.
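For context, the following Python/NumPy sketch shows the reference computation of a pointwise (1x1) convolution with simple output-channel blocking, which is the operation that the paper's multi-level parallel DSP kernels optimize. The blocking factor and data layout here are illustrative assumptions and do not reflect the FT-M7032 implementation.

import numpy as np

def pointwise_conv(x, w, block=16):
    # Reference sketch only; block size and NCHW layout are illustrative assumptions.
    # x: input of shape (N, C_in, H, W); w: 1x1 filters of shape (C_out, C_in).
    n, c_in, h, wd = x.shape
    c_out = w.shape[0]
    y = np.empty((n, c_out, h, wd), dtype=x.dtype)
    x2 = x.reshape(n, c_in, h * wd)                    # flatten the spatial dimensions
    for co in range(0, c_out, block):                  # block over output channels
        hi = min(co + block, c_out)
        # Each output pixel of a channel is a weighted sum over all input channels.
        y[:, co:hi] = np.einsum('oc,ncs->nos', w[co:hi], x2).reshape(n, hi - co, h, wd)
    return y

# Quick correctness check against an unblocked formulation.
x = np.random.rand(1, 8, 4, 4).astype(np.float32)
w = np.random.rand(32, 8).astype(np.float32)
ref = np.einsum('oc,nchw->nohw', w, x)
assert np.allclose(pointwise_conv(x, w), ref, atol=1e-5)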
In light of previous endeavors and trends in the realm of parallel programming, HPPython emerges as an essential superset that enhances the accessibility of parallel programming for developers, facilitating scalabilit...