Time series data are pervasive in varied real-world applications, and accurately identifying anomalies in time series is of great importance. Many current methods are insufficient to model long-term dependencies, whereas some anomalies can only be identified through long-range temporal context; missing such anomalies (false negatives) can ultimately lead to disastrous outcomes. Prior work employs Transformers (a neural network architecture with strong capability for modeling long-term dependencies and global associations) to alleviate this problem; however, Transformers are insensitive to local context and may therefore miss subtle anomalies. In this paper, we propose a local-adaptive Transformer based on cross-correlation for time series anomaly detection, which unifies global and local information to capture comprehensive time series patterns. Specifically, we devise a cross-correlation mechanism that employs causal convolution to adaptively capture local pattern variation, injecting diverse local information into the long-term temporal learning process. Furthermore, a novel optimization objective jointly optimizes the reconstruction of the entire time series and the matrix derived from the cross-correlation mechanism, which prevents the cross-correlation from becoming trivial during training. The generated cross-correlation matrix reveals underlying interactions between the dimensions of a multivariate time series, providing valuable insights for anomaly diagnosis. Extensive experiments on six real-world datasets demonstrate that our model outperforms state-of-the-art competing methods, achieving a 6.8%-27.5% $F_{1}$ score improvement. Our method also offers good anomaly interpretability and is effective for anomaly diagnosis.
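To make the mechanism concrete, the following is a minimal PyTorch sketch (not the authors' code) of how a causal-convolution branch can produce a cross-correlation matrix alongside ordinary self-attention; the module name LocalAdaptiveAttention and parameters such as d_model and kernel_size are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class LocalAdaptiveAttention(nn.Module):
        def __init__(self, d_model=64, kernel_size=3):
            super().__init__()
            self.qkv = nn.Linear(d_model, 3 * d_model)
            # Causal convolution: input is padded only on the left, so position t sees <= t.
            self.causal_conv = nn.Conv1d(d_model, d_model, kernel_size, padding=0)
            self.kernel_size = kernel_size

        def forward(self, x):                                  # x: (batch, length, d_model)
            q, k, v = self.qkv(x).chunk(3, dim=-1)
            # Local branch: causal convolution over time captures local pattern variation.
            x_pad = F.pad(x.transpose(1, 2), (self.kernel_size - 1, 0))
            local = self.causal_conv(x_pad).transpose(1, 2)    # (batch, length, d_model)
            scale = k.shape[-1] ** 0.5
            # Cross-correlation matrix between local features and global keys.
            cross_corr = torch.softmax(local @ k.transpose(1, 2) / scale, dim=-1)
            # Global self-attention as in a vanilla Transformer.
            attn = torch.softmax(q @ k.transpose(1, 2) / scale, dim=-1)
            out = ((attn + cross_corr) / 2) @ v                # fuse global and local weights
            return out, cross_corr                             # cross_corr can also enter the joint loss

    m = LocalAdaptiveAttention()
    out, cc = m(torch.randn(2, 100, 64))                       # cc: (2, 100, 100)

In the paper's formulation, the returned cross-correlation matrix would also be supervised by the joint reconstruction objective so that it does not collapse to a trivial solution during training.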
As deep learning grows rapidly, model training relies heavily on parallel methods, and numerous cluster configurations exist. However, current parallel-training practice targets data centers and overlooks the financial constraints faced by most researchers. To attain the best performance within a cost budget, we introduce a throughput-cost metric that accurately characterizes a cluster's cost-effectiveness. Based on this metric, we design a cost-effective cluster built around NVLink-connected RTX 3090 GPUs. Experimental results demonstrate that our cluster achieves remarkable cost-effectiveness across various distributed model-training schemes.
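As a rough illustration of how such a throughput-cost comparison can be computed (the paper's exact metric is not reproduced here, and every number below is a placeholder):

    def throughput_per_dollar(samples_per_second, cluster_cost_usd):
        """Higher is better: training throughput normalized by hardware cost."""
        return samples_per_second / cluster_cost_usd

    clusters = {
        "8x RTX 3090 + NVLink": (1800.0, 16000.0),    # (samples/s, cost in USD) - placeholders
        "8x A100 server":       (3200.0, 120000.0),
    }
    for name, (tput, cost) in clusters.items():
        print(f"{name}: {throughput_per_dollar(tput, cost):.4f} samples/s per USD")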
Multidimensional parallel training has been widely applied to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck of large-model training. Analyzing parameter-communication patterns and traffic provides an important reference for interconnection-network design and computing-task scheduling aimed at improving training performance. In this paper, we analyze the parameter-communication modes in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic of the different communication modes. Finally, taking GPT-3 as an example, we characterize the communication in its 3D parallel training.
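The following back-of-the-envelope sketch shows how such per-step volumes are commonly estimated, using the standard ring all-reduce and point-to-point formulas rather than the paper's exact model; the parallel degrees and tensor shapes are placeholders.

    def ring_allreduce_bytes(tensor_bytes, world_size):
        # Each rank transfers roughly 2 * (n - 1) / n of the tensor in a ring all-reduce.
        return 2 * (world_size - 1) / world_size * tensor_bytes

    params = 175e9                           # GPT-3-scale parameter count
    hidden, seq, micro_batch = 12288, 2048, 1
    dp, tp, pp = 8, 8, 16                    # illustrative 3D parallel degrees
    fp16 = 2                                 # bytes per element

    grad_sync = ring_allreduce_bytes(params / (tp * pp) * fp16, dp)    # data-parallel gradient sync
    pp_activation = micro_batch * seq * hidden * fp16                  # pipeline point-to-point transfer
    tp_allreduce = ring_allreduce_bytes(pp_activation, tp)             # per tensor-parallel all-reduce

    print(f"DP gradient sync per step        : {grad_sync / 1e9:.2f} GB per rank")
    print(f"PP activation per microbatch     : {pp_activation / 1e6:.1f} MB")
    print(f"TP all-reduce per layer collective: {tp_allreduce / 1e6:.1f} MB per rank")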
Large models have achieved impressive performance on many downstream tasks. Using pipeline parallelism to fine-tune large models on commodity GPU servers is an important way to make the excellent performance of large models available to the general public. Previous solutions fail to achieve efficient, memory-balanced pipeline parallelism. In this poster, we introduce a memory load-balanced pipeline-parallel solution. It balances memory consumption across stages on commodity GPU servers via NVLink bridges, establishing a new pathway that offloads data from GPU to CPU through the PCIe link of an adjacent GPU connected by the NVLink bridge. Furthermore, our method orchestrates offload operations to minimize offload latency during large-model fine-tuning. Experiments demonstrate that our solution balances the memory footprint among pipeline stages without sacrificing training performance.
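A minimal PyTorch sketch of the offload pathway described above, assuming two GPUs; the device indices (cuda:0, cuda:1) and the side stream are illustrative assumptions, and this is not the poster's implementation.

    import torch

    def offload_via_neighbor(t_gpu0, neighbor="cuda:1"):
        stream = torch.cuda.Stream(device=neighbor)
        with torch.cuda.stream(stream):
            # GPU0 -> GPU1 over the NVLink bridge.
            staged = t_gpu0.to(neighbor, non_blocking=True)
            # GPU1 -> pinned CPU memory over GPU1's PCIe link.
            host = torch.empty(staged.shape, dtype=staged.dtype, device="cpu", pin_memory=True)
            host.copy_(staged, non_blocking=True)
        return host, stream                   # synchronize the stream before reusing host

    if torch.cuda.device_count() >= 2:
        x = torch.randn(1024, 1024, device="cuda:0")
        host, stream = offload_via_neighbor(x)
        stream.synchronize()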
ISBN (digital): 9798350368741
ISBN (print): 9798350368758
Multivariate time series anomaly detection (MTAD) poses a challenge due to temporal and feature dependencies. The critical aspects of improving detection performance lie in accurately capturing the dependencies between variables within the sliding window and effectively leveraging them. Existing studies rely on domain knowledge to pre-set the window size and overlook the strength of dependencies, computing only their direction from variable similarity. This paper proposes GSLTE, a graph structure learning method for MTAD. GSLTE employs the Fast Fourier Transform to iteratively segment the whole series, selecting the dominant Fourier frequency to set the window size for each subsequence, subject to a minimum interval. GSLTE quantifies the direction and strength of dependencies via variable-lag transfer entropy, computed with the Dynamic Time Warping method, to learn asymmetric links between variables. Extensive experiments show that GNN-based MTAD methods applying GSLTE further improve anomaly detection performance and outperform state-of-the-art competitors.
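A sketch of the dominant-frequency step, reconstructed for illustration only; the function name, the minimum-interval handling, and the toy signal are assumptions, not the authors' code.

    import numpy as np

    def dominant_period(series, min_interval=4):
        spectrum = np.abs(np.fft.rfft(series - series.mean()))
        freqs = np.fft.rfftfreq(len(series))
        spectrum[0] = 0.0                              # ignore the DC component
        k = int(np.argmax(spectrum))
        period = int(round(1.0 / freqs[k])) if freqs[k] > 0 else len(series)
        return max(period, min_interval)               # dominant period used as the window size

    t = np.arange(1024)
    x = np.sin(2 * np.pi * t / 64) + 0.1 * np.random.randn(1024)
    print(dominant_period(x))                          # ~64 for this toy signal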
ISBN (digital): 9798331541750
ISBN (print): 9798331541767
The multiplier is an important component of a processor's computing unit. Multiply, multiply-add, and multiply-subtract operations are widely used in various signal-processing algorithms. Based on this, this article extends the RISC-V multiplication instructions and designs a 64-bit multiplier that combines the Booth-2 algorithm with Wallace-tree compression, together with a deep pipeline mechanism to improve performance. Finally, logic simulation and emulation confirm that the outputs are correct. Synthesis results show that, in a 28 nm CMOS process, the MAC unit reaches 2.22 GHz.
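To illustrate the partial-product reduction idea behind such a design, here is a radix-4 (Booth-2) recoding sketch in Python; the actual RTL (Booth-2 encoding feeding a Wallace compression tree in a deep pipeline) is not reproduced here.

    def booth2_partial_products(a, b, n_bits=64):
        """Return the signed partial products whose sum equals a * b (b >= 0, fits in n_bits-1)."""
        digits, products = [], []
        prev = 0                                        # b_{-1} = 0
        for i in range(0, n_bits, 2):
            b0 = (b >> i) & 1
            b1 = (b >> (i + 1)) & 1
            digit = b0 + prev - 2 * b1                  # recoded digit in {-2, -1, 0, 1, 2}
            digits.append(digit)
            products.append((digit * a) << i)           # digit * a * 4^(i/2)
            prev = b1
        return digits, products

    a, b = 123456789, 987654321
    digits, pps = booth2_partial_products(a, b)
    assert sum(pps) == a * b                            # at most n_bits/2 partial products to compress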
Foundation models are in the process of becoming the dominant deep learning technology. Pretraining a foundation model is always time-consuming due to the large scale of both the model parameters and the training dataset. ...
Graph neural networks (GNNs) have become important tools for processing structured graph data and have been successfully applied in multiple graph-based application scenarios. Existing GNN systems adopt sample-based training on large-scale graphs over multiple GPUs. Although they support large-scale graph training, the heavy data-loading overhead of transferring vertex features between CPU and GPU remains a bottleneck. In this work, we propose SCGraph, a method that supports high-speed GPU feature caching. SCGraph classifies graph vertices by out-degree. For high out-degree vertices, SCGraph builds graded caches across different GPUs, enlarging the overall cache capacity through high-speed NVLink data transfer between them. For low out-degree vertices, SCGraph expands training vertices' neighborhoods in advance to regenerate the cache. We evaluate SCGraph against two state-of-the-art industrial GNN frameworks, DGL and PaGraph, on various benchmarks. Experimental results show that SCGraph improves the GPU cache hit rate by up to 23.6% and achieves up to a 1.71x speedup over the state-of-the-art baselines while leaving convergence almost unchanged.
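A simplified sketch of an out-degree-based cache policy in this spirit; the cache budget, the two-way grading across GPUs, and the synthetic data are assumptions for illustration, not SCGraph's implementation.

    import numpy as np

    def build_feature_cache(out_degrees, features, cache_budget, num_gpus=2):
        order = np.argsort(-out_degrees)                   # hottest (highest out-degree) vertices first
        hot = order[:cache_budget * num_gpus]
        # Grade the hot set across GPUs; a remote GPU's share is reachable via NVLink.
        per_gpu = np.array_split(hot, num_gpus)
        return {g: {int(v): features[v] for v in part} for g, part in enumerate(per_gpu)}

    deg = np.random.zipf(1.5, size=10_000).astype(np.int64)
    feat = np.random.randn(10_000, 128).astype(np.float32)
    cache = build_feature_cache(deg, feat, cache_budget=1_000)
    print({g: len(c) for g, c in cache.items()})           # 1000 cached vertices per GPU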
ISBN (digital): 9798331509712
ISBN (print): 9798331509729
Blockchain technology has been extensively utilized in decentralized data-sharing applications, with the immutability of blockchain providing a witness for the circulation of data. However, current blockchain data-sharing solutions still fail to address the simultaneous screening needs of both the sender and the receiver with multiple keywords. Without support for bilateral simultaneous filtering, disclosing the reasons for matching failures could inadvertently expose sensitive user data. The challenge therefore lies in enabling ciphertexts with multiple keywords and receivers with multiple interests to achieve mutual, simultaneous matching. Based on the technical foundations of SE (Searchable Encryption), MABE (Multi-Attribute-Based Encryption), and polynomial fitting, this paper proposes a scheme called DMSA (Decentralized and Multi-keyword selective Sharing and selective Acquisition). The scheme satisfies soundness, enabling ciphertexts carrying multiple keywords and receivers representing multiple interests to match each other simultaneously. We conducted a security analysis confirming that DMSA is secure against chosen-plaintext attacks. Our experimental results demonstrate a significant efficiency improvement: a 67% increase over single-keyword data-sharing schemes and a 16% improvement over the existing multi-keyword data-sharing solution.
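A plaintext-only illustration of the polynomial-fitting idea behind multi-keyword matching: the keyword set is encoded as a polynomial whose roots are keyword hashes, and an interest matches iff the polynomial evaluates to zero at its hash. DMSA performs this under encryption via SE and MABE; the hash choice, the toy prime field, and the example keyword sets below are assumptions, and all cryptography is omitted.

    import hashlib

    P = 2**61 - 1                                    # toy prime modulus

    def h(word):
        return int.from_bytes(hashlib.sha256(word.encode()).digest(), "big") % P

    def fit_keyword_polynomial(keywords):
        """Coefficients (low to high degree) of prod_(w in keywords) (x - h(w)) mod P."""
        coeffs = [1]
        for w in keywords:
            root = h(w)
            coeffs = [(c2 - root * c1) % P
                      for c1, c2 in zip(coeffs + [0], [0] + coeffs)]
        return coeffs

    def matches(coeffs, interest):
        x, acc = h(interest), 0
        for c in reversed(coeffs):                   # Horner evaluation mod P
            acc = (acc * x + c) % P
        return acc == 0

    poly = fit_keyword_polynomial(["blockchain", "sharing", "privacy"])
    print(matches(poly, "privacy"), matches(poly, "auction"))    # True False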
Dear editor, The increasing awareness of the potential value hidden in data has resulted in many data mining studies being conducted. In the domain of software engineering, for example, developers' behavioral data and code review data have been leveraged in social coding sites to automatically recommend relevant projects [1] and candidate reviewers [2, 3].