Graph pooling that summarizes the information in a large graph into a compact form is essential in hierarchical graph representation learning. Existing graph pooling methods either suffer from high computational comple...
An event usually consists of several actions, each of which is atomic. Inspired by this insight, we propose the Atomic-action-based Contrastive Network (ACN), a novel framework for the weakly supervised temporal language grounding task, which localizes the query-related event moment in an untrimmed video without access to any temporal annotations. Specifically, ACN first determines the accurate moment boundary of each action in a query-agnostic way. This adequately exploits homogeneous visual cues while preventing the heterogeneity of the query from harming the atomicity of each visual action, i.e., its action boundary. To effectively localize the query-related event, we identify the discriminative words in the given query and use a composite-grained contrastive module to retrieve the corresponding atomic actions in a common latent space across modalities. This improves the feature discrimination of visual event segments, so that irrelevant action segments can be removed. Experiments on two popular datasets show the efficacy of our model.
As a critical component for online advertising and marketing, click-through rate (CTR) prediction has drawn lots of attention from both industry and academia. Recently, deep learning has become the mainstream methodol...
Serverless computing offers more efficient and cost-effective application deployment, but the diversity of serverless platforms presents challenges to users, including platform lock-in and costly migration. Moreover, due to the black-box nature of function computing, traditional performance benchmarking methods are not applicable, necessitating new studies. This article presents a detailed comparison of six major public cloud function computing platforms and introduces a benchmarking framework for function computing performance. This framework aims to help users make comprehensive comparisons and select the most suitable platform for their specific needs.
B-mode ultrasound tongue imaging is widely used to visualize tongue motion, due to its appealing properties. Extracting the tongue surface contour from a B-mode ultrasound image remains a challenge, although it is a prerequisite for further quantitative analysis. Recently, deep learning-based approaches have been adopted for this task. However, standard deep models fail to handle faint contours, which occur when the ultrasound wave travels parallel to the tongue surface. To address faint or missing contours in a sequence, we explore a shape consistency-based regularizer that takes sequential information into account. With this regularizer, the deep model can not only extract frame-specific contours but also enforce similarity between the contours extracted from adjacent frames. Extensive experiments are conducted on both synthetic and real ultrasound tongue imaging datasets, and the results demonstrate the effectiveness of the proposed method. To better promote research in this field, we have released our code.
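The abstract does not give the form of the shape consistency regularizer. As a minimal sketch, assuming each contour is a list of heights sampled at the same positions in every frame, and assuming a plain mean squared difference between adjacent frames (both are illustrative choices, and `shape_consistency_penalty` is a hypothetical name), the idea could look like this:

```python
def shape_consistency_penalty(contours):
    """Mean squared difference between contours of adjacent frames.

    `contours` is a list of per-frame contours; each contour is a list of
    heights sampled at the same positions (an assumption of this sketch).
    """
    total = 0.0
    count = 0
    # Compare every frame with the frame that follows it.
    for previous, current in zip(contours, contours[1:]):
        for a, b in zip(previous, current):
            total += (a - b) ** 2
            count += 1
    # A single frame (or an empty sequence) has no neighbours to compare.
    return total / count if count else 0.0


steady = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
jumpy = [[0.0, 0.0], [1.0, 1.0]]
print(shape_consistency_penalty(steady), shape_consistency_penalty(jumpy))
```

In training, such a penalty would typically be added to the per-frame extraction loss with a weighting factor, so that a frame with a faint contour is pulled toward its neighbours.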
Multi-view learning has been explored for audio classification tasks, exploiting different representations of audio signals, ranging from MFCC and CQT to raw signals. The quality of each view may vary across audio signals, and appropriate uncertainty quantification for each view has not been fully explored. In this work, we explore a trusted multi-view learning framework for classification tasks in order to fully incorporate different views. Our framework consists of three parallel Transformer branches (Gammatone spectrogram, log-Mel and CQT), which are combined using the uncertainty estimates of the individual branches. In addition to the classification probabilities, the framework also yields the uncertainty of each representation. We first calculate the evidence from feature vectors to obtain the classification probabilities and the uncertainty for the Gammatone, log-Mel and CQT branches. By integrating the confidence of the different representations using Dempster–Shafer theory, the classification framework can provide higher accuracy and confidence. To demonstrate the effectiveness of the proposed framework, we conduct experiments on the GTZAN dataset. The results show that our method reaches an accuracy of 83.0%, which significantly outperforms single-representation methods while providing uncertainty estimates for the different views.
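The abstract does not spell out the combination rule. The sketch below assumes the reduced Dempster's rule commonly used in evidential (subjective-logic) multi-view classification, where non-negative per-class evidence is mapped to belief masses plus one uncertainty mass. The function names and the example evidence values are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty)."""
    num_classes = len(evidence)
    strength = sum(e + 1.0 for e in evidence)  # Dirichlet strength S
    beliefs = [e / strength for e in evidence]
    uncertainty = num_classes / strength
    return beliefs, uncertainty  # sum(beliefs) + uncertainty == 1


def combine(opinion_a, opinion_b):
    """Fuse two opinions with the reduced Dempster's rule (assumed here)."""
    b1, u1 = opinion_a
    b2, u2 = opinion_b
    # Conflict: belief mass the two views assign to different classes.
    conflict = sum(
        b1[i] * b2[j]
        for i in range(len(b1))
        for j in range(len(b2))
        if i != j
    )
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (x * y + x * u2 + y * u1) for x, y in zip(b1, b2)]
    uncertainty = scale * u1 * u2
    return beliefs, uncertainty


# Hypothetical evidence from two branches for a 3-class problem.
view_a = opinion_from_evidence([4.0, 0.0, 0.0])
view_b = opinion_from_evidence([2.0, 1.0, 0.0])
fused_beliefs, fused_uncertainty = combine(view_a, view_b)
print(fused_beliefs, fused_uncertainty)
```

A third branch can be fused by calling `combine` again on the result. When the views agree, the fused uncertainty drops below that of either view, which is the behaviour the abstract relies on for higher confidence.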
Semantic networks, such as knowledge graphs, represent knowledge by leveraging graph structure. Although the knowledge graph shows promising value in natural language processing, it suffers from incompleteness. This paper focuses on knowledge graph completion by predicting links between entities, which is a fundamental yet critical task. Semantic matching is a potential solution for link prediction because it can deal with unseen entities, whereas translational distance-based methods struggle with them. However, to achieve performance competitive with translational distance-based methods, semantic matching-based methods require large-scale training datasets, which are typically unavailable in practical settings. Therefore, we employ a language model and introduce a novel knowledge graph architecture named LP-BERT, which contains two main stages: multi-task pre-training and knowledge graph fine-tuning. In the pre-training phase, three tasks drive the model to learn relationship information from triples by predicting either entities or relations. In the fine-tuning phase, inspired by contrastive learning, we design a triple-style negative sampling scheme within a batch, which greatly increases the proportion of negative samples while keeping the training time almost unchanged. Furthermore, we propose a new data augmentation method that uses the inverse relationship of triples to improve the performance and robustness of the model. To demonstrate the effectiveness of our proposed framework, we conduct extensive experiments on three widely used knowledge graph datasets: WN18RR, FB15k-237, and UMLS. The experimental results demonstrate the superiority of our methods, and our approach achieves state-of-the-art results on the WN18RR and FB15k-237 datasets. Notably, the Hits@10 indicator is improved by 5% over the previous state-of-the-art result on the WN18RR dataset, while reaching 100% on the UMLS dataset.
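The abstract names in-batch negative sampling and inverse-relation augmentation without detailing either. The sketch below is one plausible reading, not the LP-BERT implementation: negatives are formed by pairing each (head, relation) with the tails of the other triples in the same batch, and augmentation adds a reversed triple with a suffixed relation name. The function names and the `_inverse` suffix are hypothetical.

```python
def add_inverse_triples(triples, suffix="_inverse"):
    """Augment (head, relation, tail) triples with their reversed form."""
    augmented = list(triples)
    for head, relation, tail in triples:
        augmented.append((tail, relation + suffix, head))
    return augmented


def in_batch_negatives(batch):
    """Build negatives by reusing the tails already present in the batch."""
    positives = set(batch)
    seen = set()
    negatives = []
    for head, relation, _ in batch:
        for _, _, other_tail in batch:
            candidate = (head, relation, other_tail)
            # Skip true triples and candidates that were already produced.
            if candidate in positives or candidate in seen:
                continue
            seen.add(candidate)
            negatives.append(candidate)
    return negatives


batch = [("cat", "is_a", "animal"), ("paris", "capital_of", "france")]
print(in_batch_negatives(batch))
print(add_inverse_triples(batch))
```

Because the negatives reuse entities that are already encoded for the batch, the number of negative samples grows with batch size at little extra cost, which matches the abstract's claim of nearly unchanged training time.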
ISBN (digital): 9781728187891
ISBN (print): 9781728187907
Deep reinforcement learning (RL) is playing an increasingly important role in web services such as news recommendation, vulnerability detection, and personalized services. Exploration is a key component of RL, as it determines whether these RL-based applications can eventually find effective solutions. In this paper, we propose a novel gradient-based fast adaptation approach for model-agnostic meta-reinforcement learning via Bayesian structure exploration (BSE-MAML). BSE-MAML can effectively learn exploration strategies from prior experience by updating the policy with an embedded latent space via a Bayesian mechanism. The coherent stochasticity injected by the latent space is more efficient than random noise and can produce exploration strategies that perform well in novel environments. We have conducted extensive experiments to evaluate BSE-MAML. Experimental results show that BSE-MAML achieves better exploration performance in realistic environments with sparse rewards than state-of-the-art meta-RL algorithms, RL methods that do not learn exploration strategies, and task-agnostic exploration approaches.
In JointCloud Computing, multi-party participation introduces complexity and uncertainty. All participants in JointCloud Computing require both continuous supervision and the necessary privacy protection. Traditional supervision methods usually adopt a centralized information interaction mode, which suffers from defects such as collusion of interests, single points of failure, and privacy disclosure. Building a decentralized supervision mechanism has therefore become a new research direction. In this paper, we propose PPSS, a privacy-preserving supervision scheme based on blockchain, which decentralizes the supervision of the participants in JointCloud Computing and combines “double encryption” and “threshold encryption” technologies to provide privacy protection. While making full use of the decentralization of the blockchain, a committee is established to carry out the analysis and decision-making tasks related to supervision and privacy protection. Experimental results indicate that PPSS can balance performance and security when the committee is configured reasonably.
Underwater acoustic classification is a challenging task due to complex background noise and complicated sound propagation patterns. How the signals are represented is important for the classification task. In this paper, we propose a novel representation learning method for underwater acoustic signals, leveraging the mask modeling-based self-supervised learning paradigm. Specifically, we first modify the Swin Transformer architecture to learn general representations of audio signals, combined with random masking on the log-mel spectrogram. The main goal of the pretext task is to predict the masked parts of the log-mel spectrogram and the Gammatone spectrogram, so that the model learns not only local and global features but also complementary information. For the downstream task, we use labelled datasets to fine-tune the pre-trained model. On the DeepShip dataset, which consists of 47 h and 4 min of ship sounds in four categories, our model achieves state-of-the-art performance compared with competitive approaches. Our method obtains a classification accuracy of 78.03%, which is better than that of the separable convolution autoencoder (SCAE) and of using the constant-Q transform spectrogram. This work demonstrates the potential of masked modeling-based self-supervised learning for the understanding and interpretation of underwater acoustic signals.
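The masking step of the pretext task can be illustrated with a small sketch. It assumes the spectrogram is split into a grid of rectangular patches, that a fixed fraction of patches is hidden, and that the reconstruction error is measured only on hidden patches. The patch size, mask ratio, zero fill value, and all function names are assumptions for illustration; the abstract does not specify them.

```python
import random


def random_patch_mask(rows, cols, mask_ratio, seed=None):
    """Return a rows x cols grid of booleans; True marks a masked patch."""
    rng = random.Random(seed)
    total = rows * cols
    masked = set(rng.sample(range(total), int(round(total * mask_ratio))))
    return [[(r * cols + c) in masked for c in range(cols)] for r in range(rows)]


def apply_mask(spec, mask, patch_h, patch_w, fill=0.0):
    """Replace every value inside a masked patch with `fill`."""
    return [
        [fill if mask[i // patch_h][j // patch_w] else value
         for j, value in enumerate(row)]
        for i, row in enumerate(spec)
    ]


def masked_mse(pred, target, mask, patch_h, patch_w):
    """Mean squared error computed over masked positions only."""
    errors = [
        (pred[i][j] - target[i][j]) ** 2
        for i in range(len(target))
        for j in range(len(target[0]))
        if mask[i // patch_h][j // patch_w]
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Toy 4x4 "spectrogram" (mel bins x frames) split into 2x2 patches.
spec = [[1.0] * 4 for _ in range(4)]
mask = random_patch_mask(rows=2, cols=2, mask_ratio=0.5, seed=0)
corrupted = apply_mask(spec, mask, patch_h=2, patch_w=2)
print(corrupted)
print(masked_mse(corrupted, spec, mask, patch_h=2, patch_w=2))
```

In the real setting the encoder sees the corrupted input and a decoder predicts the hidden patches; the toy call above only scores the corrupted input against the original to show how the loss is restricted to masked regions.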