ISBN: (print) 9783031695827; 9783031695834
With the scaling-up of high-performance computing (HPC) systems, resilience has become an important challenge. Checkpointing, a widely used resilience technique for HPC systems, saves checkpoints of the system during the execution of parallel programs and, in case of failure, resumes execution of the program from the most recent checkpoint. However, large-scale parallel programs often spawn thousands of processes, resulting in large volumes of simultaneous data writes at each checkpoint, which places heavy pressure on both the storage and the parallel file systems of HPC systems. To tackle this problem, this paper proposes AdapCK, an I/O-optimization scheme for checkpointing on large-scale HPC systems. AdapCK consists of two main parts: a load-balancing mechanism that balances workloads across low-level storage volumes during checkpointing, and a throughput-aware checkpoint-data writing mechanism that reduces I/O contention and increases I/O-bandwidth utilization. Experimental results show that AdapCK reduces checkpoint time by more than 30%, and by up to 54.5%.
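The abstract does not describe how AdapCK's load-balancing mechanism assigns work to storage volumes. Purely as an illustration of the general idea, the Python sketch below spreads per-process checkpoint files across volumes with the classic longest-processing-time (LPT) greedy rule, placing each file, largest first, on the currently least-loaded volume. The function name `balance_checkpoint_writes` and the use of file size as the load metric are assumptions, not AdapCK's design.

```python
import heapq

def balance_checkpoint_writes(file_sizes, num_volumes):
    """Hypothetical sketch (not AdapCK): assign checkpoint files to storage
    volumes with the greedy LPT rule, largest file first, each going to the
    currently least-loaded volume.

    Returns (assignment, loads): assignment[i] is the volume index for
    file i, and loads[v] is the total size placed on volume v.
    """
    if num_volumes <= 0:
        raise ValueError("num_volumes must be positive")
    # Min-heap of (current load, volume id); ties go to the lower volume id
    heap = [(0, v) for v in range(num_volumes)]
    heapq.heapify(heap)
    assignment = [None] * len(file_sizes)
    loads = [0] * num_volumes
    # Visit files in decreasing size order
    order = sorted(range(len(file_sizes)), key=lambda i: file_sizes[i], reverse=True)
    for i in order:
        load, v = heapq.heappop(heap)
        assignment[i] = v
        loads[v] = load + file_sizes[i]
        heapq.heappush(heap, (loads[v], v))
    return assignment, loads
```

With sizes `[8, 7, 6, 5, 4]` and two volumes, the loads come out as `[17, 13]`. The rule only balances bytes; it ignores per-volume bandwidth and write contention, which a throughput-aware writer would also have to consider.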
ISBN: (print) 9798350386783; 9798350386776
Energy consumption prediction is a crucial approach to enhancing the operational efficiency of high-performance computing (HPC) clusters. Existing energy consumption prediction methods mainly rely on time-series models that focus on the instantaneous power during job execution. In practice, however, because HPC clusters are large, the prediction models must process a vast amount of data, which often results in poor performance. Moreover, it is difficult to provide effective predictions for jobs with significant power fluctuations. We therefore propose a more efficient energy consumption prediction framework for HPC cluster jobs that achieves two main objectives. First, it uses a data encoding module to pre-encode the sampled data of energy-consumption-related indicators, simplifying the data that the time-series prediction model needs to process and improving prediction efficiency. Second, the prediction target is set to the power distribution range over a future period of time, rather than the instantaneous power at specific time points, which improves the usability of the prediction results. Test results show that the proposed framework achieves a notable performance improvement while preserving prediction effectiveness.
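To make the idea of a distribution-range target concrete, the sketch below turns a window of sampled power values into the fraction of samples falling into each of several power ranges, a label a predictor could be trained to output instead of a single instantaneous value. The bin edges, the clipping of out-of-range samples, and the function name `power_distribution` are illustrative assumptions; the paper's actual target definition and its data encoding module are not specified in the abstract.

```python
import bisect

def power_distribution(samples, bin_edges):
    """Hypothetical sketch: summarize a window of power samples (watts) as
    the fraction of samples in each range [bin_edges[k], bin_edges[k+1]).
    Samples outside the edges are clipped into the first or last bin."""
    if len(bin_edges) < 2:
        raise ValueError("need at least two bin edges")
    if not samples:
        raise ValueError("empty sample window")
    counts = [0] * (len(bin_edges) - 1)
    for watts in samples:
        # Index of the range whose lower edge is <= watts
        k = bisect.bisect_right(bin_edges, watts) - 1
        k = min(max(k, 0), len(counts) - 1)  # clip out-of-range samples
        counts[k] += 1
    n = len(samples)
    return [c / n for c in counts]
```

For samples `[100, 150, 210, 250, 390]` and edges `[0, 200, 300, 400]`, the result is `[0.4, 0.4, 0.2]`: a fluctuating job is described by how much time it spends in each range rather than by one point value.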
In application software development, memory defects are difficult to detect. Traditional memory defect detection tools generally suffer from high performance overhead and excessive memory consumption, which limits ...
ISBN: (print) 9798350318920; 9798350318937
Outdoor LiDAR point clouds are typically large-scale and complexly distributed. To achieve efficient and accurate registration, it is essential to emphasize the similarity among local regions and to prioritize global local-to-local matching; accuracy can then be refined through cost-effective fine registration. In this paper, a novel hierarchical neural network with double attention, named HDMNet, is proposed for large-scale outdoor LiDAR point cloud registration. Specifically, a novel feature-consistency-enhanced double-soft matching network is introduced to achieve highly flexible two-stage matching while efficiently enlarging the receptive field in a patch-to-patch manner, which significantly improves registration performance. Moreover, to further exploit the sparse matching information from deeper layers, we develop a novel trainable embedding mask that incorporates the confidence scores of correspondences obtained from pose estimation in the deeper layer without additional computation. The high-confidence keypoints in the sparser point cloud of the deeper layer correspond to high-confidence spatial neighborhoods in the shallower layer, which receive more attention, while the features of non-key regions are masked. Extensive experiments on two large-scale outdoor LiDAR point cloud datasets demonstrate the high accuracy and efficiency of the proposed HDMNet.
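The masking idea at the end of the abstract can be illustrated with a simplified, non-trainable version: keep shallow-layer points that lie within a radius of a high-confidence deep-layer keypoint and zero the features of the rest. HDMNet uses a trainable embedding mask, so this hard 0/1 mask, the `radius` and `threshold` parameters, and the function names are hypothetical stand-ins chosen only for illustration.

```python
import math

def neighborhood_mask(deep_points, deep_conf, shallow_points, radius, threshold):
    """Hypothetical hard-mask sketch: 1 for a shallow-layer point within
    `radius` of any deep-layer keypoint with confidence >= threshold,
    else 0. (HDMNet's actual mask is trainable, not a hard threshold.)"""
    keypoints = [p for p, c in zip(deep_points, deep_conf) if c >= threshold]
    mask = []
    for q in shallow_points:
        inside = any(math.dist(p, q) <= radius for p in keypoints)
        mask.append(1 if inside else 0)
    return mask

def apply_mask(features, mask):
    """Zero the feature vectors of shallow points outside the kept regions."""
    return [f if m else [0.0] * len(f) for f, m in zip(features, mask)]
```

In practice a spatial index (for example a k-d tree) would replace the all-pairs distance check, which is quadratic in the number of points.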
ISBN: (print) 9798350326598; 9798350326581
Spatial accelerators are specialized hardware that provide noticeable speedups for tensor computations, but they also make mapping tensor computations onto them challenging. Auto-tuning compilers are one of the most promising directions for tensor mapping. However, existing auto-tuning compilers suffer from either numerous invalid and inefficient programs or inaccurate evaluation of incomplete programs, leading to sub-optimal performance. In this paper, we propose Soter, a novel auto-tuning tensor compilation framework for spatial accelerators. The key idea is to explore a program design space that is both valid and efficient, and to optimize according to accurate evaluation of complete programs. First, we design an analytical model to generate a high-quality program design space that excludes invalid and inefficient programs. Second, we design an automatic program tuner that efficiently explores the program space and avoids evaluating incomplete programs. Finally, we coordinate the model and the tuner to further improve the quality of the program space: the model identifies the space, which is then updated as the tuner explores it. On average, Soter achieves 2.1x to 3.5x speedups over state-of-the-art tensor compilers. Moreover, Soter scales better to larger tensor computations and spatial architectures.
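Soter's analytical model is not detailed in the abstract. As a generic sketch of how an analytical model can prune a mapping space before any tuning happens, the code below enumerates tile sizes for a matrix multiplication and discards tiles that do not divide the loop bounds, overflow an on-chip buffer (invalid), or leave too much of a processing-element (PE) array idle (inefficient). The footprint formula, the simple utilization estimate, and all names are assumptions made for illustration.

```python
def divisors(n):
    return [d for d in range(1, n + 1) if n % d == 0]

def valid_tilings(M, N, K, buffer_elems, pe_rows, pe_cols, min_util=0.5):
    """Hypothetical pruning sketch for C[M,N] += A[M,K] * B[K,N].

    Keeps (tm, tn, tk) tiles that (1) divide the loop bounds evenly,
    (2) fit the A, B and C tiles into a buffer of `buffer_elems` elements,
    and (3) keep a pe_rows x pe_cols array at least `min_util` busy when
    tm and tn are mapped spatially onto its rows and columns."""
    space = []
    for tm in divisors(M):
        for tn in divisors(N):
            util = min(tm, pe_rows) * min(tn, pe_cols) / (pe_rows * pe_cols)
            if util < min_util:
                continue  # prune inefficient mappings
            for tk in divisors(K):
                footprint = tm * tk + tk * tn + tm * tn
                if footprint <= buffer_elems:  # prune invalid mappings
                    space.append((tm, tn, tk))
    return space
```

For a 4x4x4 problem on a 4x4 PE array with a 48-element buffer and `min_util=0.5`, only 9 of the 27 divisor tilings survive, so a tuner would search a much smaller, fully valid set.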
ISBN: (print) 9798350353013; 9798350353006
Real-time multi-person pose estimation presents significant challenges in balancing speed and precision. While two-stage top-down methods slow down as the number of people in the image increases, existing one-stage methods often fail to simultaneously deliver high accuracy and real-time performance. This paper introduces RTMO, a one-stage pose estimation framework that seamlessly integrates coordinate classification by representing keypoints using dual 1-D heatmaps within the YOLO architecture, achieving accuracy comparable to top-down methods while maintaining high speed. We propose a dynamic coordinate classifier and a tailored loss function for heatmap learning, specifically designed to address the incompatibilities between coordinate classification and dense prediction models. RTMO outperforms state-of-the-art one-stage pose estimators, achieving 1.1% higher AP on COCO while operating about 9 times faster with the same backbone. Our largest model, RTMO-l, attains 74.8% AP on COCO val2017 and 141 FPS on a single V100 GPU, demonstrating its efficiency and accuracy. The code and models are available at https://***/open-mmlab/mmpose/tree/main/projects/rtmo.
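Coordinate classification represents each keypoint with two 1-D heatmaps, one over horizontal bins and one over vertical bins, and reads each coordinate from the peak bin. The sketch below shows only that generic decoding step with a fixed, evenly spaced bin range; RTMO's dynamic coordinate classifier and its tailored loss are not reproduced, and the function names are hypothetical.

```python
def decode_1d(heatmap, start, end):
    """Map the argmax bin of a 1-D heatmap spanning [start, end] to the
    coordinate of that bin's center (bins are assumed evenly spaced)."""
    n = len(heatmap)
    k = max(range(n), key=lambda i: heatmap[i])  # first peak on ties
    step = (end - start) / n
    return start + (k + 0.5) * step

def decode_keypoint(x_heatmap, y_heatmap, x_range, y_range):
    """Hypothetical helper: decode (x, y) from dual 1-D heatmaps."""
    x = decode_1d(x_heatmap, *x_range)
    y = decode_1d(y_heatmap, *y_range)
    return x, y
```

Because each axis is handled by a 1-D distribution, the cost grows with width plus height rather than with width times height, which is part of what makes this representation attractive inside a dense one-stage detector.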
We aim to identify the differences in Input/Output (I/O) behavior between multiple user programs through the inspection of system calls (i.e., requests made to the operating system). A typical program issues a large n...
High-performance serverless computing has garnered significant attention. Researchers have developed numerous optimization strategies for serverless frameworks to fully leverage the benefits of serverless computing. H...
ISBN: (print) 9798350353013; 9798350353006
iToF is a prevalent, cost-effective technology for 3D perception. However, its reliance on multiple measurements commonly degrades performance in dynamic environments. Based on an analysis of the physical iToF imaging process, we propose iToF flow, composed of cross-mode transformation and uni-mode photometric correction, to model the variations in measurements caused by different measurement modes and by 3D motion, respectively. We propose a local linear transform (LLT) based cross-mode transfer module (LCTM) for mode-varying and pixel-shift compensation of the cross-mode flow, and a uni-mode photometric correction module (UPCM) for estimating the photometric residual of the uni-mode flow caused by depth-wise motion. We further propose an iToF flow-based depth extraction network that estimates the 4-phase measurements at each individual time, enabling high-frame-rate and accurate depth estimation. Extensive simulation and real-world experiments demonstrate the effectiveness of the proposed methods. Compared with the state-of-the-art (SOTA) method, our approach reduces computation time by 75% while improving performance by 38%. The code and database are available at https://***/ComputationalPerceptionLab/iToF_flow.
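The sensitivity to motion comes from the standard four-phase iToF depth computation, which combines four correlation samples taken at 0°, 90°, 180° and 270° phase offsets; if the scene moves between captures, the samples no longer share a single phase. The sketch below shows this textbook computation under the assumed signal model Q_θ = B + A·cos(φ − θ). Sign conventions differ between sensors, the function names and the 20 MHz modulation frequency are illustrative assumptions, and nothing here implements the paper's iToF flow.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def itof_depth(q0, q90, q180, q270, f_mod):
    """Depth in meters from four correlation samples at 0/90/180/270 deg,
    assuming the model Q_theta = B + A*cos(phi - theta)."""
    phi = math.atan2(q90 - q270, q0 - q180)  # offset B cancels in differences
    if phi < 0:
        phi += 2 * math.pi  # wrap phase into [0, 2*pi)
    # Round trip: phi = 4*pi*f_mod*d / c, so d = c*phi / (4*pi*f_mod)
    return C * phi / (4 * math.pi * f_mod)

def simulate(depth, f_mod, amplitude=1.0, offset=2.0):
    """Ideal static-scene samples for a target at `depth` (hypothetical helper)."""
    phi = 4 * math.pi * f_mod * depth / C
    return [offset + amplitude * math.cos(phi - t)
            for t in (0.0, math.pi / 2, math.pi, 3 * math.pi / 2)]
```

For example, `itof_depth(*simulate(3.0, 20e6), 20e6)` recovers a depth of about 3.0 m. The unambiguous range at modulation frequency f is c/(2f), roughly 7.5 m at 20 MHz.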
ISBN: (print) 9783031692567; 9783031692574
In this paper, we study a variant of the non-preemptive unrelated parallel machine scheduling problem with sequence-dependent setup times and machine eligibility restrictions. We first formulate the problem as a mixed integer linear program (MILP) and then devise a branch-and-cut (B&C) algorithm to solve it. Because the problem is NP-hard, we also propose a metaheuristic based on iterated local search (ILS). Building on this ILS, we develop several matheuristics for the problem. The proposed approaches are compared on different families of instances.
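The abstract does not give the objective function or the neighborhood structures of the ILS. Assuming makespan minimization, single-job relocation moves for local search, and random job relocations as perturbation (all assumptions), a compact iterated local search for this problem might look like the sketch below. Here `p[m][j]` is the processing time of job j on machine m (`None` if m is ineligible) and `s[m][i][j]` is the setup time between consecutive jobs i and j on m; every job is assumed to have at least one eligible machine, and no initial setup is charged.

```python
import random

# Hypothetical sketch: ILS minimizing makespan on unrelated parallel machines
# with sequence-dependent setups s[m][i][j] and eligibility (p[m][j] is None).

def machine_time(seq, m, p, s):
    """Completion time of job sequence `seq` on machine m."""
    t = 0
    for k, j in enumerate(seq):
        if k > 0:
            t += s[m][seq[k - 1]][j]  # setup from the previous job to j
        t += p[m][j]
    return t

def makespan(sched, p, s):
    return max(machine_time(seq, m, p, s) for m, seq in enumerate(sched))

def greedy_initial(p, s):
    """Append each job to the eligible machine that would finish it earliest."""
    sched = [[] for _ in p]
    for j in range(len(p[0])):
        best = min((m for m in range(len(p)) if p[m][j] is not None),
                   key=lambda m: machine_time(sched[m] + [j], m, p, s))
        sched[best].append(j)
    return sched

def neighbors(sched, p):
    """All schedules reachable by relocating one job to an eligible slot."""
    for m1, seq1 in enumerate(sched):
        for idx, j in enumerate(seq1):
            for m2 in range(len(sched)):
                if p[m2][j] is None:
                    continue  # machine eligibility restriction
                n_pos = len(sched[m2]) + (0 if m1 == m2 else 1)
                for pos in range(n_pos):
                    cand = [list(seq) for seq in sched]
                    cand[m1].pop(idx)
                    cand[m2].insert(pos, j)
                    yield cand

def local_search(sched, p, s):
    """First-improvement descent over relocation moves."""
    best = makespan(sched, p, s)
    improved = True
    while improved:
        improved = False
        for cand in neighbors(sched, p):
            val = makespan(cand, p, s)
            if val < best:
                sched, best, improved = cand, val, True
                break
    return sched, best

def perturb(sched, p, rng, k=2):
    """Move k random jobs to random eligible machines and positions."""
    cand = [list(seq) for seq in sched]
    for _ in range(k):
        m1 = rng.choice([m for m, seq in enumerate(cand) if seq])
        j = cand[m1].pop(rng.randrange(len(cand[m1])))
        m2 = rng.choice([m for m in range(len(cand)) if p[m][j] is not None])
        cand[m2].insert(rng.randrange(len(cand[m2]) + 1), j)
    return cand

def iterated_local_search(p, s, iters=100, seed=0):
    rng = random.Random(seed)
    cur, cur_val = local_search(greedy_initial(p, s), p, s)
    best, best_val = cur, cur_val
    for _ in range(iters):
        cand, val = local_search(perturb(cur, p, rng), p, s)
        if val <= cur_val:  # accept ties to drift across plateaus
            cur, cur_val = cand, val
            if val < best_val:
                best, best_val = cand, val
    return best, best_val
```

On a toy instance with `p = [[3, 2, None], [4, None, 2]]`, where job 1 may only run on machine 0 and job 2 only on machine 1, the search returns a makespan of 6. A matheuristic would typically replace parts of such a loop with MILP solves over restricted subproblems.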