A hierarchical diagnosis approach named Magnifier is proposed. It models the execution path graph of a user request as three layers (component, module, and function) and detects anomalies in each layer separately, proceeding from higher layers to lower ones. Extensive experiments were conducted on the Alibaba cloud computing platform. The results indicate that, with large data volumes and highly complex execution paths, Magnifier can accurately and efficiently locate the primary causes of performance degradation.
Beyond classical domain-specific adversarial training, a recently proposed task-specific framework has achieved great success in single-source domain adaptation by utilizing task-specific decision boundaries. However, compared with the single-source-single-target setting, multi-source domain adaptation (MDA) is better suited to most real-life cases. To align the target domain with diverse source domains using task-specific decision boundaries, we provide, for the first time, an in-depth analysis of the task-specific framework on MDA. Accordingly, we propose a novel task-specific multi-source domain adaptation method (TMDA) with a clustering-embedded adversarial training process. Specifically, TMDA detects and refines less discriminative target representations through a max-min optimization over two adversarial task-specific classifiers. Moreover, our analysis implies that scattered multi-source representations disturb adversarial training under the task-specific framework. To tighten the dispersed source representations, we embed a relationship-based domain clustering into TMDA. Empirical results demonstrate that TMDA outperforms state-of-the-art methods on a toy dataset, sentiment analysis, and digit classification.
Wearable health monitoring is a crucial technical tool that offers early warning for chronic diseases due to its superior portability and low power ***, most wearable health data is distributed across different organizations, such as hospitals, research institutes, and companies, and can only be accessed by the owners of the data in compliance with data privacy *** first challenge addressed in this paper is communicating in a privacy-preserving manner among different *** second technical challenge is handling the dynamic expansion of the federation without model *** address the first challenge, we propose a horizontal federated learning method called Federated Extremely Random Forest (FedERF). Its contribution-based splitting score computing mechanism significantly mitigates the impact of privacy protection constraints on model *** on FedERF, we present a federated incremental learning method called Federated Incremental Extremely Random Forest (FedIERF) to address the second technical *** introduces a hardness-driven weighting mechanism and an importance-based updating scheme to update the existing federated model *** experiments show that FedERF achieves performance comparable to non-federated methods, and FedIERF effectively addresses the dynamic expansion of the *** opens up opportunities for cooperation between different organizations in wearable health monitoring.
GPUs offer a higher density of computing units than contemporary CPUs and thus exhibit much higher power consumption despite their higher power efficiency. Power consumption has become an important issue that limits GPU applications, which makes low-power optimization technology for GPUs necessary. Software prefetching is an efficient way to alleviate the memory wall problem because it overlaps computation with memory access latency. However, software prefetching incurs some power overhead because it increases the number and density of instructions. The performance gain must therefore be balanced against the power overhead when applying this optimization. To address this problem, we first analyze the multi-threaded execution model of the GPU and validate the potential of software prefetching optimization. We then present a software prefetching method that improves the performance of GPU programs. Targeting two different objectives, energy optimization under a performance constraint and performance optimization under a power constraint, we discuss optimization methods based on software prefetching and dynamic voltage scaling. The experimental results show that our method can efficiently optimize energy consumption (performance) under the performance (power) constraint.
As an important component of multimedia analysis tasks, audio classification aims to discriminate between different audio signal types and has received intensive attention due to its wide applications. Generally speak...
ISBN: (print) 9781510835368
Convolution operation is the most important and time-consuming step in a convolutional neural network *** this work, we analyze the computing complexity of direct convolution and fast-Fourier-transform-based (FFT-based) *** creatively propose CS-unit, which is equivalent to a combination of a convolutional layer and a pooling layer but more *** computing complexity of and some other similar operation is demonstrated, revealing an advantage on computation of ***, practical experiments are also performed and the result shows that CS-unit holds a real superiority on run time.
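For orientation, the standard textbook complexity comparison between direct and FFT-based 2D convolution can be written as follows. The symbols are assumptions introduced here and do not come from the abstract: a single-channel n × n input and a k × k kernel. The cost of CS-unit itself is not stated in the abstract and is not reproduced.

```latex
% Assumed setting: one n x n input, one k x k kernel, "same"-size output.
\begin{align*}
  C_{\text{direct}} &= O\!\left(n^{2} k^{2}\right)
    && \text{($k^{2}$ multiply-adds per output pixel)} \\
  C_{\text{FFT}}    &= O\!\left(n^{2} \log n\right)
    && \text{(forward and inverse 2D FFTs plus $n^{2}$ pointwise products)}
\end{align*}
```

Under these assumptions, the FFT-based route becomes attractive once k² grows faster than log n, while direct convolution tends to be cheaper for very small kernels.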
In recent years, the rapidly growing scale of graphs has prompted many parallel graph analysis frameworks that leverage the massive hardware resources of CPUs or GPUs. Existing CPU implementations are time-consuming, while GPU implementations are restricted by memory capacity and programming complexity. In this paper, we present a high-performance hybrid CPU-GPU parallel graph analytics framework with good productivity, based on GraphMat. We map vertex programs to generalized sparse matrix-vector multiplication on GPUs to deliver high performance, and propose a high-level abstraction that lets developers implement various graph algorithms with relatively little effort. Several optimizations are also adopted to reduce communication cost and to exploit hardware resources, especially the memory hierarchy. We evaluate the proposed framework on three graph primitives (PageRank, BFS, and SSSP) with large-scale graphs. The experimental results show that our implementation achieves an average speedup of 7.0× over GraphMat on two 6-core Intel Xeon CPUs. It can also process larger datasets than MapGraph, a state-of-the-art GPU-based framework, while achieving comparable performance.
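The idea of mapping vertex programs to generalized sparse matrix-vector multiplication can be illustrated with a minimal CPU-only sketch. This is not the framework's API: the names `generalized_spmv`, `sssp`, and `pagerank_step` are hypothetical, and the edge-list representation is an assumption chosen for brevity. The point is only that swapping the "multiply" and "add" operators turns the same kernel into different graph primitives.

```python
from typing import Callable, List, Tuple

Edge = Tuple[int, int, float]  # (source vertex, destination vertex, weight)


def generalized_spmv(
    edges: List[Edge],
    x: List[float],
    multiply: Callable[[float, float], float],
    add: Callable[[float, float], float],
    identity: float,
) -> List[float]:
    """Compute y[dst] = add over incoming edges of multiply(weight, x[src])."""
    y = [identity] * len(x)
    for src, dst, weight in edges:
        y[dst] = add(y[dst], multiply(weight, x[src]))
    return y


def sssp(edges: List[Edge], num_vertices: int, source: int) -> List[float]:
    """Single-source shortest paths using the (min, +) operator pair."""
    inf = float("inf")
    dist = [inf] * num_vertices
    dist[source] = 0.0
    for _ in range(num_vertices - 1):
        relaxed = generalized_spmv(edges, dist, lambda w, d: w + d, min, inf)
        new_dist = [min(old, new) for old, new in zip(dist, relaxed)]
        if new_dist == dist:  # converged early
            break
        dist = new_dist
    return dist


def pagerank_step(
    edges: List[Edge],
    rank: List[float],
    out_degree: List[int],
    damping: float = 0.85,
) -> List[float]:
    """One PageRank iteration using the (+, select) operator pair.

    Simplification: rank held by vertices with no out-edges is dropped.
    """
    n = len(rank)
    contrib = [r / d if d else 0.0 for r, d in zip(rank, out_degree)]
    summed = generalized_spmv(
        edges, contrib, lambda w, c: c, lambda a, b: a + b, 0.0
    )
    return [(1.0 - damping) / n + damping * s for s in summed]


if __name__ == "__main__":
    graph = [(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)]
    print(sssp(graph, 3, 0))
    print(pagerank_step(graph, [1 / 3] * 3, [2, 1, 0]))
```

A GPU implementation would parallelize the loop in `generalized_spmv` over edges or rows of a compressed sparse format; the sequential loop here only shows the operator abstraction.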
The goal of knowledge graph completion (KGC) is to predict missing facts among entities. Previous methods for KGC re-ranking are mostly built on non-generative language models to obtain the probability of each candida...
When a large-scale distributed interactive simulation system is running on WAN, the sites usually disperse over a wide area in geography, which results in the simulation clock of each site is hardly to be accurately s...
Due to the large message transmission latency in Distributed Virtual Environments (DVEs) on a Wide Area Network (WAN), the effectiveness of causality consistency control of message ordering is determined by not only caus...