Given a document in a source language, cross-lingual summarization (CLS) aims at generating a concise summary in a different target language. Unlike monolingual summarization (MS), naturally occurring source-language ...
ISBN: 9798350386226 (digital)
ISBN: 9798350386233 (print)
RNA-binding proteins (RBPs) are essential for gene expression, and the complex RNA-protein interaction mechanisms require analysis of global RNA information. Therefore, accurate prediction of RBP binding sites on full-length RNA transcripts is crucial for understanding these mechanisms and their roles in diseases. While machine learning methods can predict RBP binding to RNA fragments, extending this to full-length transcripts presents challenges due to sequence length and data imbalance. In this paper, we introduce RBP-Former, a binding site joint prediction model designed specifically for full-length RNA transcripts that can be used for multiple RBPs. This model processes information at both coarse and fine-grained levels to fully exploit sequence data and its interactions with multiple RBPs. We develop multi-level imbalance learning strategies, achieving favorable results on imbalanced data. Our method outperforms existing methods in predicting binding sites on full-length RNA transcripts for multiple RBPs, demonstrating its effectiveness in handling imbalanced label and sample distributions.
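The abstract does not say which imbalance learning strategies RBP-Former uses. As a generic illustration only, and not the authors' method, the sketch below shows one common way to handle rare positive labels: scaling the positive term of a binary cross-entropy loss by the negative-to-positive ratio. The function names, the labels, and the predicted probabilities are all hypothetical.

```python
import math

def positive_weight(labels):
    """Ratio of negatives to positives (assumes at least one positive label)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives

def weighted_bce(probs, labels, pos_weight):
    """Mean binary cross-entropy with the positive term scaled by pos_weight."""
    eps = 1e-12  # guards against log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        total += -(pos_weight * y * math.log(p + eps)
                   + (1 - y) * math.log(1 - p + eps))
    return total / len(labels)

# Hypothetical per-position binding labels: one binding site in ten positions.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 0]
# A model that predicts "no binding" almost everywhere.
probs = [0.1] * 10

w = positive_weight(labels)
loss_plain = weighted_bce(probs, labels, 1.0)
loss_weighted = weighted_bce(probs, labels, w)
```

With the weight applied, missing the single binding site costs far more than it does under the unweighted loss, which is the effect an imbalance-aware objective is meant to have.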
ISBN: 9781467396769 (print)
Nowadays, people are increasingly concerned about the safety of traffic systems. Road segmentation and recognition is a fundamental problem in perceiving traffic environments and serves as the basis for self-driving cars. In this paper, inspired by iterative deep analysis thinking, we propose a novel method that learns powerful features step by step and balances local and global information to reach the best precision in pixel-level classification for road segmentation. First, we introduce iterative deep analysis thinking, which shows how to design a strong and robust deep model from failure experience. Second, we choose a powerful global feature learning network as the basis of a novel framework for our task. Meanwhile, we employ patches and a multi-scale pyramid as input to enhance local feature learning. We conduct experiments on three datasets from the KITTI Vision Benchmark, namely UU, UM, and UMM. The experimental results demonstrate that our proposed method obtains performance comparable to state-of-the-art methods on these datasets.
This study improves a support vector machine (SVM) classifier by optimizing its parameters with an adjusted cockroach swarm optimization (CSO) algorithm. The classification system design includes data inputs, pre-processing, and cl...
ISBN: 9781509029181 (print)
In this work, a new glass classification method is proposed. First, images are enhanced by image preprocessing. Second, a series of glass features, including shape and texture features, is proposed. Finally, we employ a simple minimum distance classifier to classify the input glass images. The experimental results show that the proposed method has high classification efficiency and accuracy.
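A minimum distance classifier assigns a sample to the class whose mean feature vector is closest. The sketch below is a minimal illustration assuming Euclidean distance and two-dimensional features; the abstract does not specify the distance measure or the feature layout, and the data and class names here are hypothetical.

```python
import math

def class_means(features, labels):
    """Return the mean feature vector (centroid) of each class."""
    sums, counts = {}, {}
    for vec, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}

def classify(vec, centroids):
    """Assign vec to the class whose centroid is nearest (Euclidean distance)."""
    return min(centroids, key=lambda label: math.dist(vec, centroids[label]))

# Hypothetical features, e.g. one shape descriptor and one texture descriptor.
train_x = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [1.0, 0.8]]
train_y = ["type_a", "type_a", "type_b", "type_b"]

centroids = class_means(train_x, train_y)
prediction = classify([0.9, 0.9], centroids)
```

Training only computes one centroid per class, and prediction is a single distance comparison per class, which is consistent with the efficiency the abstract emphasizes.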
ISBN: 9781728188089; 9781728188096 (print)
Given a speech clip and a facial image, the goal of talking face generation is to synthesize a talking face video with accurate mouth synchronization and natural face motion. Recent progress has proven the effectiveness of landmarks as intermediate information during talking face generation. However, the large gap between the audio and visual modalities makes landmark prediction challenging and limits generation ability. This paper proposes a semantic and temporal synchronous landmark learning method for talking face generation. First, we introduce a word detector to enforce richer semantic information. Then, we preserve the temporal synchronization and consistency between landmarks and audio via the proposed temporal residual loss. Lastly, we employ a U-Net generation network with an adaptive reconstruction loss to generate facial images from the predicted landmarks. Experimental results on two benchmark datasets, LRW and GRID, demonstrate the effectiveness of our model compared to state-of-the-art methods for talking face generation.
Multi-view clustering has attracted more attention recently since many real-world data are comprised of different representations or views. Recent multi-view clustering works mainly exploit instance consistency to obtain shared representations across different views, and apply a single-view clustering method to perform data partitions. However, these existing methods often ignore the inconsistency of instance associations within the views, which may enlarge the intra-class diversity among the views and therefore degrade the clustering performance. To address this issue, this paper proposes an efficient mutual contrastive teacher-student learning (MC-TSL) model to enhance multi-view clustering, which is the first attempt to study inconsistency distillation for consistency learning. First, the proposed MC-TSL approach exploits a view-specific encoder with two heads, an instance encoding head and a semantic distillation head, for capturing consistent and discriminative feature representations. Specifically, the former head exploits a cross-view contrastive learning method to obtain a redundancy-free consistent representation at the instance level, while the latter head uses a mutual teacher-student learning module to capture the intra-view information at the semantic level. By training these two heads in an end-to-end manner, the discriminative multi-view embeddings are efficiently obtained and refined by minimizing the weighted sum of the reconstruction loss, contrastive loss, and contrast distillation loss. Extensive experiments verify the superiority of the proposed MC-TSL framework and show its competitive clustering performance.
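The weighted-sum objective described in the abstract can be written schematically as follows. The symbols are assumptions introduced here for illustration: the abstract names the three loss terms but gives neither their notation nor the weights, so the coefficients \alpha, \beta, and \gamma stand for unspecified non-negative trade-off weights.

```latex
\mathcal{L}_{\mathrm{total}}
  = \alpha \, \mathcal{L}_{\mathrm{rec}}
  + \beta \, \mathcal{L}_{\mathrm{con}}
  + \gamma \, \mathcal{L}_{\mathrm{dis}}
```

Here \mathcal{L}_{\mathrm{rec}} denotes the reconstruction loss, \mathcal{L}_{\mathrm{con}} the contrastive loss, and \mathcal{L}_{\mathrm{dis}} the contrast distillation loss.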
Camouflaged object detection (COD) and salient object detection (SOD) are two distinct yet closely related computer vision tasks widely studied during the past decades. Though sharing the same purpose of segmenting an...
Deep learning, in particular the Convolutional Neural Network (CNN), has achieved promising results in face recognition recently. However, why CNNs work well and how to design a 'good' architecture remain open questions. Existing works tend to focus on reporting CNN architectures that work well for face recognition rather than investigating the reasons. In this work, we conduct an extensive evaluation of CNN-based face recognition systems (CNN-FRS) on a common ground to make our work easily reproducible. Specifically, we use the public database LFW (Labeled Faces in the Wild) to train CNNs, unlike most existing CNNs, which are trained on private databases. We propose three CNN architectures, which are the first reported architectures trained using LFW data. This paper quantitatively compares the architectures of CNNs and evaluates the effect of different implementation choices. We identify several useful properties of CNN-FRS. For instance, the dimensionality of the learned features can be significantly reduced without adverse effect on face recognition accuracy. In addition, a traditional metric learning method exploiting CNN-learned features is evaluated. Experiments show that two factors crucial to good CNN-FRS performance are the fusion of multiple CNNs and metric learning. To make our work reproducible, source code and models will be made publicly available.
Video action anticipation aims to predict future action categories from observed frames. Current state-of-the-art approaches mainly resort to recurrent neural networks to encode history information into hidden states,...