ISBN (digital): 9798350387339
ISBN (print): 9798350387346
With the rapid growth of large language models, cloud computing has become an indispensable component of the AI industry. Cloud service providers (CSPs) are establishing AI data centers to serve AI workloads. In the face of this surging demand for AI computing power, building a connected computing environment across various clouds and forming a JointCloud is an attractive solution. However, scheduling AI tasks across multiple AI data centers within a JointCloud environment presents a significant challenge: how to satisfy users' demands while ensuring fairness among CSPs in scheduling. Existing research primarily focuses on optimizing scheduling quality, with limited consideration for fairness. This paper therefore proposes the Fairness-Aware AI-Workloads Allocation method (F3A), a fair cross-cloud allocation technique for AI tasks. F3A uses Points and Tokens to reflect both the resource status and the historical task allocations of AI data centers, enabling consideration of users' multidimensional demands and facilitating fair task allocation across multiple centers. To better assess scheduling fairness, we also devise a fairness indicator (FI) based on the Gini coefficient to measure the fairness of task allocation. Experimental results demonstrate that F3A consistently keeps FI within 0.1 across various cluster sizes and task quantities, a 76.45% improvement over the classical round-robin fair scheduling algorithm. F3A performs well in ensuring fair task allocation while also reducing cost and improving user satisfaction.
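The abstract says only that FI is built on the Gini coefficient; as a minimal sketch (the use of raw per-center task counts and this particular normalization are our assumptions, not the paper's exact definition, which also involves Points and Tokens), a Gini-style fairness indicator over allocations could look like:

```python
import numpy as np

def gini(x):
    """Gini coefficient of non-negative allocation values.

    0 means perfectly equal allocation across data centers; values
    near 1 mean allocation is concentrated on a few centers.
    """
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    if n == 0 or x.sum() == 0:
        return 0.0
    # Standard form: G = 2*sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    idx = np.arange(1, n + 1)
    return (2.0 * np.sum(idx * x) / (n * x.sum())) - (n + 1.0) / n

# Hypothetical example: tasks allocated to 4 AI data centers.
allocated_tasks = [120, 95, 110, 100]   # fairly even allocation
print(gini(allocated_tasks))            # ~0.05, i.e. within the FI < 0.1 regime
```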
ISBN (print): 9781450398336
Offline imitation learning (OIL) is often used to solve complex continuous decision-making tasks. For tasks such as robot control and autonomous driving, it is either difficult to design an effective reward for learning, or expensive and time-consuming for agents to collect data by interacting with the environment. However, the data used in previous OIL methods are gathered by reinforcement learning algorithms guided by task-specific rewards, which violates the reward-free premise and still suffers from the difficulty of designing an effective reward function for real tasks. To this end, we propose the reward-free exploratory data driven offline imitation learning (ExDOIL) framework. ExDOIL first trains an unsupervised reinforcement learning agent by interacting with the environment and collects sufficient unsupervised exploration data during training. Then, a task-independent yet simple and efficient reward function is used to relabel the collected data. Finally, an agent is trained to imitate the expert and complete the task through a conventional RL algorithm such as TD3. Extensive experiments on continuous control tasks demonstrate that the proposed framework achieves better imitation performance (28% higher episode returns on average) than the previous SOTA method (ORIL) without any task-specific rewards.
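The abstract does not spell out the task-independent relabeling function, so the reward below (proximity of a transition's next state to expert states) is purely an illustrative assumption of what such a relabeling step could look like:

```python
import numpy as np

def relabel_with_expert_proximity(transitions, expert_states, sigma=1.0):
    """Relabel reward-free exploration data with a task-independent reward.

    transitions:   list of (state, action, next_state) tuples from
                   unsupervised exploration
    expert_states: 2-D array of states visited in expert demonstrations

    The reward exp(-min_j ||s' - e_j||^2 / sigma) is our assumption;
    the paper's actual relabeling function may differ.
    """
    expert = np.asarray(expert_states)
    relabeled = []
    for s, a, s_next in transitions:
        d2 = np.min(np.sum((expert - s_next) ** 2, axis=1))
        r = np.exp(-d2 / sigma)  # high when the agent lands near expert states
        relabeled.append((s, a, r, s_next))
    return relabeled

# The relabeled tuples can then serve as a fixed replay buffer for an
# off-the-shelf TD3 (or similar) implementation.
```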
The scale of model parameters and the amount of training data are increasing exponentially, and training requires correspondingly more GPU memory. Recomputation and swapping are the two main memory optimization methods and have been extensively studied, along with strategies that combine them. However, most existing approaches are based on heuristic search, which does not explore the complete solution space and cannot guarantee optimality. An optimal search strategy with tensor-level recomputation and swapping is needed for large-scale model training. In this paper, we propose an optimal strategy-searching algorithm that combines tensor-based recomputation and swapping. Specifically, the memory swapping strategy is reformulated as an optimization problem that converts the memory constraints into a mixed integer program, from which the optimal memory optimization strategy is found. By leveraging the advantages of both recomputation and swapping, this approach minimizes computation cost without exceeding the available memory. Experimental results show that our method reduces memory requirements by about 60% during training. Furthermore, it reduces overall training time beyond existing algorithms: compared to Checkmate, our approach achieves about a 0.3-0.9% reduction in computation cost per iteration.
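To make the mixed-integer formulation concrete, here is a toy per-tensor decision model, far simpler than the paper's program (which also encodes execution order and overlap); the tensor names, sizes, and costs are illustrative assumptions:

```python
# Toy MIP: each tensor is kept in GPU memory, recomputed, or swapped out;
# minimize the extra time paid, subject to a memory cap.
from pulp import LpProblem, LpMinimize, LpVariable, lpSum, LpBinary

tensors = {            # tensor -> (memory in MB, recompute cost, swap cost)
    "act1": (512, 3.0, 5.0),
    "act2": (1024, 8.0, 6.0),
    "act3": (256, 1.5, 4.0),
}
MEM_LIMIT = 1024       # MB of GPU memory available for these tensors

prob = LpProblem("recompute_vs_swap", LpMinimize)
keep = {t: LpVariable(f"keep_{t}", cat=LpBinary) for t in tensors}
rec  = {t: LpVariable(f"rec_{t}",  cat=LpBinary) for t in tensors}
swp  = {t: LpVariable(f"swp_{t}",  cat=LpBinary) for t in tensors}

# Each tensor receives exactly one treatment.
for t in tensors:
    prob += keep[t] + rec[t] + swp[t] == 1

# Only kept tensors occupy GPU memory (a deliberate simplification).
prob += lpSum(tensors[t][0] * keep[t] for t in tensors) <= MEM_LIMIT

# Objective: minimize recomputation plus swapping overhead.
prob += lpSum(tensors[t][1] * rec[t] + tensors[t][2] * swp[t] for t in tensors)

prob.solve()
for t in tensors:
    choice = "keep" if keep[t].value() else ("recompute" if rec[t].value() else "swap")
    print(t, "->", choice)
```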
Instruction tuning for large language models (LLMs) can drive them to produce results consistent with human goals in specific downstream tasks. However, the process of continual instruction tuning (CIT) for LLMs may b...
In airfoil numerical simulation, the mesh quality has an important influence on the accuracy and error of numerical simulation. The existing mesh quality evaluation requires a lot of manual interaction, which greatly ...
ISBN (digital): 9798350387339
ISBN (print): 9798350387346
Serverless computing, comprising Function as a Service (FaaS) and Backend as a Service (BaaS), has garnered widespread attention owing to features such as maintenance-free operation, pay-per-use pricing, and automatic scalability. However, practical usage encounters several challenges: 1) The diversity of user applications makes comprehensive performance evaluation difficult, as benchmarks and application tests only reflect performance under specific conditions and cannot fully capture users' actual experience across different serverless platforms. 2) Disparities in performance and cost across serverless platforms make it hard to achieve optimal performance and cost efficiency through single-cloud deployment, underutilizing the advantages of each platform. 3) Vendor lock-in restricts the migration of user applications and exacerbates dependence on a single cloud. To address these challenges, this paper proposes a collaborative mechanism, referred to as DCSA, which integrates FaaS and storage services to achieve automatic cross-cloud deployment of user applications while considering both performance and cost. First, we adapt the interfaces of different serverless platforms, effectively reducing the complexity of cross-cloud deployment. Second, we develop cost and latency models for the cross-cloud deployment of chained serverless applications and propose a deployment scheduling algorithm that considers latency and cost simultaneously. Finally, we conduct experiments to evaluate the proposed algorithm. Results demonstrate that our method can effectively reduce latency (by up to 2.3%) and lower costs (by up to 9.9%).
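As a rough illustration of placing a chained application across clouds under a joint latency/cost objective (the cloud names, per-function numbers, cross-cloud penalty, and weighted objective below are all assumptions, not DCSA's actual models):

```python
# Enumerate placements of a 3-function chain over two clouds and pick the
# one minimizing cost + ALPHA * latency; a DP would scale to long chains.
import itertools

clouds = ["cloudA", "cloudB"]
# per-function: cloud -> (exec_latency_ms, cost_per_call)
funcs = [
    {"cloudA": (30, 0.8), "cloudB": (25, 1.0)},
    {"cloudA": (50, 1.2), "cloudB": (60, 0.9)},
    {"cloudA": (20, 0.5), "cloudB": (15, 0.7)},
]
CROSS_CLOUD_MS = 40     # extra latency when adjacent functions differ in cloud
ALPHA = 0.01            # weight converting latency (ms) into cost units

def plan_score(plan):
    latency = sum(funcs[i][c][0] for i, c in enumerate(plan))
    latency += CROSS_CLOUD_MS * sum(a != b for a, b in zip(plan, plan[1:]))
    cost = sum(funcs[i][c][1] for i, c in enumerate(plan))
    return cost + ALPHA * latency

best = min(itertools.product(clouds, repeat=len(funcs)), key=plan_score)
print("placement:", best, "score:", round(plan_score(best), 3))
```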
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
Sparse matrix reordering is an important step in Cholesky decomposition: by reordering the rows and columns of the matrix, computation time and storage cost can be greatly reduced. With the proliferation of reordering algorithms, selecting a suitable reordering method for a given matrix has become an important research topic. In this paper, we propose a method that predicts the optimal reordering method by visualizing sparse matrices in chunks in parallel and feeding the result into a deep convolutional neural network. The results show that the theoretical performance reaches 95% of the optimum, the prediction accuracy reaches up to 85%, the parallel framework achieves an average speedup of 11.35x over the serial framework, and performance is greatly improved compared with traversal-based selection on large sparse matrices.
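A minimal version of the "visualize in chunks" step is to downsample the nonzero pattern into a fixed-size density image a CNN can consume; the 128x128 resolution and normalization here are our assumptions, not the paper's settings:

```python
import numpy as np
from scipy.sparse import random as sparse_random

def density_image(A, res=128):
    """Map an m x n sparse matrix to a res x res grid of per-block nnz counts."""
    A = A.tocoo()
    img = np.zeros((res, res), dtype=np.float32)
    rows = A.row * res // A.shape[0]
    cols = A.col * res // A.shape[1]
    np.add.at(img, (rows, cols), 1.0)       # accumulate nonzeros per block
    return img / max(img.max(), 1.0)        # normalize to [0, 1]

A = sparse_random(10000, 10000, density=1e-4, format="coo")
x = density_image(A)   # chunks of A could be rasterized in parallel and
print(x.shape)         # stitched together before entering the CNN
```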
With advancements in AI infrastructure and Trusted Execution Environment (TEE) technology, Federated Learning as a Service (FLaaS) through JointCloud Computing (JCC) is promising to break through the resource constrai...
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Self-supervised time series anomaly detection (TSAD) achieves remarkable performance improvements by extracting high-level data semantics through proxy tasks. Nonetheless, most existing self-supervised TSAD techniques rely on manual or neural transformations when designing proxy tasks, overlooking the intrinsic temporal patterns of time series. This paper proposes local temporal pattern learning-based time series anomaly detection (LTPAD). LTPAD first generates sub-sequences. Pairwise sub-sequences naturally manifest proximity relationships along the time axis, and such correlations can be used to construct supervision and train neural networks to learn temporal patterns. The time interval between two sub-sequences serves as the label for each sub-sequence pair. By classifying these labeled pairs, our model captures the local temporal patterns of time series, thereby modeling temporal pattern-aware "normality". Anomaly scores for test data are obtained by evaluating their conformity to the learned patterns shared in the training data. Extensive experiments show that LTPAD significantly outperforms state-of-the-art competitors.
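A sketch of this proxy-task construction, labeling sub-sequence pairs by their time interval so a classifier can be trained on them (the window length, gap range, and sampling scheme are illustrative assumptions):

```python
import numpy as np

def make_pairs(series, win=16, max_gap=4, pairs_per_gap=100, rng=None):
    """Return (pairs, labels) where the label is the interval between windows."""
    rng = rng or np.random.default_rng(0)
    X, y = [], []
    for gap in range(1, max_gap + 1):          # gap measured in window units
        for _ in range(pairs_per_gap):
            i = rng.integers(0, len(series) - win * (gap + 1))
            a = series[i : i + win]
            b = series[i + win * gap : i + win * gap + win]
            X.append(np.stack([a, b]))         # one pairwise sub-sequence sample
            y.append(gap - 1)                  # class index = time interval
    return np.asarray(X), np.asarray(y)

series = np.sin(np.linspace(0, 60, 2000)) + 0.05 * np.random.randn(2000)
X, y = make_pairs(series)
print(X.shape, y.shape)   # a network classifying y from X learns local temporal
                          # patterns; at test time, misfit yields the anomaly score
```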
The talking head generation aims to synthesize a speech video of the source identity from a driving video or audio or text data irrelevant to the source identity. It can not only be applied to games and virtual realit...