Search Results - Inner Mongolia University Library

IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

Authors: Kuo Guo; Yifan Li; Hao Chen; Hong-Bin Shen; Yang Yang. Affiliations: Department of Computer Science and Engineering, Shanghai Jiao Tong University, and Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai, China; Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, and Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China; Computational Biology Department, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA, USA

Isoforms refer to different mRNA molecules transcribed from the same gene, which can be translated into proteins with varying structures and functions. Predicting the functions of isoforms is an essential topic in bioinformatics as it can provide valuable insights into the intricate mechanisms of gene regulation and biological processes. Conventionally, gene function labels are standardized in Gene Ontology (GO) terms. However, traditional methods for predicting isoform function are largely limited by the absence of isoform-specific labels, sparse annotations, and the vast number of GO terms. To address these issues, we propose HANIso, a deep learning-based method for isoform function prediction. HANIso leverages a pretrained protein language model to extract features from protein sequences. It also integrates heterogeneous information, such as isoform sequence features, GO annotations, and isoform interaction data, using a Heterogeneous Graph Attention Network (HAN). This allows the model to learn the importance of different sources of information and their semantic relationships through the attention mechanism. Our method can predict function labels at both the gene level and isoform level. We conduct experiments on two species datasets, and the results demonstrate that our method outperforms existing methods on both AUROC and AUPRC. HANIso has the potential to overcome the limitations of traditional methods and provide a more accurate and comprehensive understanding of isoform function.

Learning referee evaluation and assessing action quality from coarse to fine in diving sport

Neurocomputing, 2025

Authors: Hong-Ming Qiu; Hong-Bo Zhang; Qing Lei; Jing-Hua Liu; Ji-Xiang Du. Affiliations: Department of Computer Science and Technology, Huaqiao University, Xiamen 361021, China; Fujian Key Laboratory of Big Data Intelligence and Security, Huaqiao University, Xiamen 361021, China; Key Laboratory of Computer Vision and Machine Learning (Huaqiao University), Fujian Province University, Xiamen 361021, China; Xiamen Key Laboratory of Computer Vision and Pattern Recognition, Huaqiao University, Xiamen 361021, China

Intelligently assessing the quality of athletic performances in sports scenarios remains a fascinating challenge in computer vision. However, unraveling the subtle distinctions between two similar actions in videos and mapping those video representations to quality scores remain significant obstacles. To address these challenges, this work redefines the paradigm of quality score estimation from traditional relative quality score prediction to relative referee score prediction. To make this shift, a cross-feature fusion module rooted in Transformer-based video representation is introduced, to improve pairwise video feature learning in the realm of action quality assessment. Then, a novel contrastive action parsing decoder module generates mid-level representations to effectively connect visual features with detailed quality scores. Both modules utilize cross-attention mechanisms; the former refines the pairwise video features to represent the differences between video pairs, while the latter updates the input queries corresponding to each referee’s evaluation. Finally, to achieve precise quality score estimation, we introduce a meticulous coarse-to-fine decision process, integrating a score classifier and offset regressor. After validation on challenging diving datasets, including MTL-AQA, FineDiving, and TASD-2, the experimental results show that the proposed approach demonstrates effectiveness and feasibility when compared with state-of-the-art methods.
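
The coarse-to-fine decision mentioned in this abstract (a score classifier followed by an offset regressor) can be illustrated with a minimal sketch. The bin layout, the 0 to 10 score range, and the function name `coarse_to_fine_score` are assumptions made for illustration; the abstract does not specify them, and this is not the authors' implementation.

```python
def coarse_to_fine_score(bin_logits, bin_offsets, score_min=0.0, score_max=10.0):
    """Combine a coarse bin choice with a fine offset inside that bin.

    bin_logits  -- one logit per score bin (hypothetical classifier output)
    bin_offsets -- one offset in [0, 1] per bin (hypothetical regressor output)
    """
    if not bin_logits or len(bin_logits) != len(bin_offsets):
        raise ValueError("bin_logits and bin_offsets must be non-empty and equal in length")
    num_bins = len(bin_logits)
    bin_width = (score_max - score_min) / num_bins
    # Coarse step: index of the most likely score bin.
    coarse = max(range(num_bins), key=lambda i: bin_logits[i])
    # Fine step: position inside the chosen bin, clamped to [0, 1].
    offset = min(max(bin_offsets[coarse], 0.0), 1.0)
    return score_min + (coarse + offset) * bin_width
```

With four bins over the assumed range, each bin spans 2.5 points, so choosing bin 1 with an offset of 0.25 yields a score of 3.125.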

Learning dynamical human-joint affinity for 3D pose estimation in videos

arXiv, 2021

Authors: Zhang, Junhao; Wang, Yali; Zhou, Zhipeng; Luan, Tianyu; Wang, Zhe; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of California, Irvine, United States; Shanghai AI Laboratory, Shanghai, China

Graph Convolution Networks (GCNs) have been successfully used for 3D human pose estimation in videos. However, they are often built on a fixed human-joint affinity defined by the human skeleton. This may reduce the capacity of a GCN to adapt to complex spatio-temporal pose variations in videos. To alleviate this problem, we propose a novel Dynamical Graph Network (DG-Net), which can dynamically identify human-joint affinity and estimate 3D pose by adaptively learning spatial/temporal joint relations from videos. Different from traditional graph convolution, we introduce Dynamical Spatial/Temporal Graph convolution (DSG/DTG) to discover spatial/temporal human-joint affinity for each video exemplar, depending on the spatial distance/temporal movement similarity between human joints in that video. Hence, they can effectively identify which joints are spatially closer and/or have consistent motion, reducing depth ambiguity and/or motion uncertainty when lifting 2D pose to 3D pose. We conduct extensive experiments on three popular benchmarks, namely Human3.6M, HumanEva-I, and MPI-INF-3DHP, where DG-Net outperforms a number of recent SOTA approaches with fewer input frames and a smaller model size. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution
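
The idea of deriving joint affinity from spatial distance, as this abstract describes for DSG, can be sketched as follows. The softmax over negative Euclidean distances, the `temperature` parameter, and the function name `spatial_affinity` are illustrative assumptions; the abstract does not state how DG-Net actually turns distances into affinities.

```python
import math

def spatial_affinity(joints, temperature=1.0):
    """Build a row-normalised affinity matrix from pairwise joint distances.

    joints -- list of coordinate tuples, one per joint (2D or 3D)
    Closer joints receive larger weights; every row sums to 1.
    """
    n = len(joints)
    affinity = []
    for i in range(n):
        # Negative distance as a logit: smaller distance -> larger logit.
        logits = [-math.dist(joints[i], joints[j]) / temperature for j in range(n)]
        peak = max(logits)  # subtract the max for numerical stability
        weights = [math.exp(v - peak) for v in logits]
        total = sum(weights)
        affinity.append([w / total for w in weights])
    return affinity
```

Because the matrix is recomputed from the joint positions of each input, the affinity changes from one pose to the next instead of being fixed by the skeleton.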

Spatial Reconstructed Local Attention Res2Net with F0 Subband for Fake Speech Detection

arXiv, 2023

Authors: Fan, Cunhang; Xue, Jun; Tao, Jianhua; Yi, Jiangyan; Wang, Chenglong; Zheng, Chengshi; Lv, Zhao. Affiliations: Anhui Province Key Laboratory of Multimodal Cognitive Computation, School of Computer Science and Technology, Anhui University, Hefei 230601, China; Department of Automation, Tsinghua University, Beijing 100190, China; National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Noise and Vibration Research, Institute of Acoustics, Chinese Academy of Sciences, Beijing 100190, China

The rhythm of bonafide speech is often difficult to replicate, so the fundamental frequency (F0) of synthetic speech differs significantly from that of real speech. The F0 feature is therefore expected to carry discriminative information for the fake speech detection (FSD) task. In this paper, we propose a novel F0 subband for FSD. In addition, to model the F0 subband effectively and improve FSD performance, we propose the spatial reconstructed local attention Res2Net (SR-LA Res2Net). Specifically, Res2Net is used as the backbone network to obtain multiscale information and is enhanced with a spatial reconstruction mechanism to avoid losing important information as channel groups are repeatedly superimposed. In addition, local attention is designed to make the model focus on the local information of the F0 subband. Experimental results on the ASVspoof 2019 LA dataset show that the proposed method obtains an equal error rate (EER) of 0.47% and a minimum tandem detection cost function (min t-DCF) of 0.0159, achieving state-of-the-art performance among single systems. Copyright © 2023, The Authors. All rights reserved.

Keywords: Cost functions
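
The equal error rate (EER) reported in this abstract is the operating point at which the false acceptance rate and the false rejection rate coincide. A minimal sketch of how it can be estimated from detector scores is given below; it assumes that higher scores mean "more likely bonafide" and uses a simple threshold sweep, which is an illustrative choice and not the official ASVspoof evaluation script.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Estimate the EER by sweeping a threshold over all observed scores.

    A trial is accepted as bonafide when its score is >= the threshold.
    Returns the mean of FAR and FRR at the threshold where they are closest.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, best_eer = None, None
    for t in thresholds:
        # False rejection: bonafide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed trials scored at or above the threshold.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer
```

On a finite score set the two error curves rarely cross exactly, so this sketch reports the midpoint at the closest threshold; evaluation toolkits typically interpolate instead.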

Stable and Feature Decoupled of Deep Mutual Information Maximization Based on Wasserstein Distance

SSRN, 2023

Authors: He, Xing; Peng, Changgen; Wang, Lin; Tan, Weijie. Affiliations: State Key Laboratory of Public Big Data, College of Computer Science and Technology, Guizhou University, Guiyang 550025, PR China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang 550025, PR China; Guizhou Big Data Academy, Guizhou University, Guiyang 550025, PR China; Key Laboratory of Advanced Manufacturing Technology, Ministry of Education, Guizhou University, Guiyang 550025, PR China

Deep learning techniques usually require large amounts of expensive labeled data to train networks, and the extracted deep representations usually entangle multiple attributes and are hard to interpret, which limits the application and development of deep learning. It is therefore crucial to investigate the training stability of unsupervised learning methods and to learn feature-decoupled representations. Although the method based on deep mutual information maximization (Deep InfoMax, or DIM) has achieved good results in computer vision, it is extremely prone to exploding or vanishing gradients, which makes training unstable. At the same time, the features learned by the encoder are difficult to decouple. To solve these problems, we propose a stable, feature-decoupled deep mutual information maximization method based on the Wasserstein distance, which we call WDIM. The method first addresses the instability of models trained with DIM by measuring the distance between the encoder output and the prior distribution with the Wasserstein distance, which helps stabilize training. Second, to address the difficulty of decoupling the features learned by the encoder, the mutual information between the features learned by the hidden layer of the encoder and those of the intermediate layer is minimized during encoder training, so that the features learned by each filter are as uncorrelated as possible. Finally, the WDIM method is validated on the CIFAR-10, CIFAR-100, STL-10, and FashionMNIST datasets. Experiments show that the proposed WDIM method trains more stably, converges faster, and decouples features more markedly. © 2023, The Authors. All rights reserved.

Keywords: Network coding
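
To make the Wasserstein distance in this abstract concrete, here is a minimal sketch for the simplest case: two one-dimensional empirical samples of equal size, where the 1-Wasserstein distance equals the mean absolute difference between the sorted samples. The function name `wasserstein_1d` is hypothetical, and this closed form is only an illustration; the abstract does not say how WDIM estimates the distance between encoder outputs and the prior in high dimensions.

```python
def wasserstein_1d(samples_a, samples_b):
    """1-Wasserstein distance between two equal-size 1D empirical samples.

    In one dimension the optimal transport plan matches the i-th smallest
    value of one sample to the i-th smallest value of the other.
    """
    if not samples_a or len(samples_a) != len(samples_b):
        raise ValueError("samples must be non-empty and of equal size")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)
```

Unlike a divergence that saturates when two distributions do not overlap, this distance keeps growing with the shift between them, which is the usual argument for its smoother training signal.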

Synchronization of Nonlinear Neural Networks with Hybrid Couplings and Uncertain Time-Varying Perturbations: A Novel Distributed-Delay Impulsive Comparison Principle

SSRN, 2024

Authors: Fan, Hongguang; Shi, Kaibo; Zhao, Yi. Affiliations: College of Computer, Chengdu University, Chengdu 610106, China; Key Laboratory of Pattern Recognition and Intelligent Information Processing, Institutions of Higher Education of Sichuan Province, Chengdu University, Chengdu 610106, China; School of Electronic Information and Electrical Engineering, Chengdu University, Chengdu 610106, China; School of Mathematical Sciences, Shenzhen University, Shenzhen 518060, China

In this paper, the synchronization of nonlinear drive-response neural networks with uncertain time-varying perturbations, non-delayed coupling, and distributed delay coupling is studied. To address the impact of distributed delay and discrete delay on the system, a novel impulsive comparison principle is established, which can be viewed as an effective continuation of the Halanay inequality. With the help of Lyapunov stability theory, sufficient conditions guaranteeing the exponential synchronization of the concerned neural networks are derived by using a delayed impulsive controller with historical status information, which relaxes the conventional restriction that impulsive delays be smaller than impulsive intervals and generalizes existing synchronization results for distributed delay networks. Finally, numerical simulations for chaotic neural networks are presented to demonstrate the validity of the theoretical results and the sensitivity of the control gain matrix. © 2024, The Authors. All rights reserved.

Keywords: Complex networks
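
For context, the classical Halanay inequality that this abstract refers to can be stated as below. This is the standard textbook form, given here only as background; it is not the new distributed-delay impulsive comparison principle of the paper, whose exact statement the abstract does not give.

```latex
% Classical Halanay inequality (background only, not the paper's new principle).
% Assumptions: v(t) >= 0 is continuous, and a > b > 0, \tau >= 0 are constants.
\[
  D^{+} v(t) \le -a\,v(t) + b \sup_{t-\tau \le s \le t} v(s), \qquad t \ge t_0,
\]
\[
  \Longrightarrow\quad
  v(t) \le \Bigl(\sup_{t_0-\tau \le s \le t_0} v(s)\Bigr)\, e^{-\lambda (t - t_0)}, \qquad t \ge t_0,
\]
% where \lambda > 0 is the unique positive root of
\[
  \lambda = a - b\,e^{\lambda \tau}.
\]
```

The condition a > b ensures that the instantaneous decay dominates the delayed feedback, which is what yields exponential decay despite the delay.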

High-Resolution Natural Image Matting by Refining Low-resolution Alpha Mattes

IEEE Transactions on Image Processing, 2025, vol. 34, pp. 3323-3335

Authors: Ye, Xianmin; Liang, Yihui; Tan, Mian; Feng, Fujian; Wang, Lin; Huang, Han. Affiliations: Guizhou Minzu University, College of Data Science and Information Engineering, Guiyang 550025, China; Guizhou Minzu University, Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guiyang 550025, China; University of Electronic Science and Technology of China, School of Computer Science, Zhongshan Institute, Zhongshan 528400, China; South China University of Technology, School of Software Engineering, Guangzhou 510006, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China, Guangzhou 510006, China; Guangdong Engineering Center for Large Model and GenAI Technology, Guangzhou 510006, China; Jilin University, Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Changchun 130012, China

High-resolution natural image matting plays an important role in image editing, film-making, and remote sensing because it can accurately extract the foreground from a natural background. However, due to the complexity brought about by the growth in resolution, existing image matting methods cannot obtain high-quality alpha mattes on high-resolution images in reasonable time. To overcome this challenge, we introduce a high-resolution image matting framework based on alpha matte refinement from low resolution to high resolution (HRIMF-AMR). The proposed framework transforms the complex high-resolution image matting problem into a low-resolution image matting problem and a high-resolution alpha matte refinement problem. While the first problem is solved by adopting an existing image matting method, the latter is addressed by applying the Detail Difference Feature Extractor (DDFE) designed as part of our work. The DDFE extracts detail difference features from high-resolution images by measuring the image feature difference between high-resolution and low-resolution images. The low-resolution alpha matte is refined according to the extracted detail difference features, yielding the high-resolution alpha matte. In addition, the Matte Detail Resolution Difference (MDRD) loss is introduced to train the DDFE, which imposes an additional constraint on the extraction of detail difference features with mattes. Experimental results show that integrating HRIMF-AMR significantly enhances the performance of existing matting methods on high-resolution images of Transparent-460 and Alphamatting. © 1992-2012 IEEE.

Keywords: alpha matte detail; detail difference feature; high-resolution image matting; natural image matting

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM

arXiv, 2025

Authors: Zou, Yueying; Li, Peipei; Li, Zekun; Huang, Huaibo; Cui, Xing; Liu, Xuannan; Zhang, Chenghanyu; He, Ran. Affiliations: School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Science, Beijing University of Posts and Telecommunications, Beijing 100876, China; School of Computer Science, University of California, Santa Barbara, United States; State Key Laboratory of Multimodal Artificial Intelligence Systems, CASIA; New Laboratory of Pattern Recognition, CASIA; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100190, China

The proliferation of AI-generated media poses significant challenges to information authenticity and social trust, making reliable detection methods highly demanded. Methods for detecting AI-generated media have evolved rapidly, paralleling the advancement of Multimodal Large Language Models (MLLMs). Current detection approaches can be categorized into two main groups: Non-MLLM-based and MLLM-based methods. The former employs high-precision, domain-specific detectors powered by deep learning techniques, while the latter utilizes general-purpose detectors based on MLLMs that integrate authenticity verification, explainability, and localization capabilities. Despite significant progress in this field, there remains a gap in literature regarding a comprehensive survey that examines the transition from domain-specific to general-purpose detection methods. This paper addresses this gap by providing a systematic review of both approaches, analyzing them from single-modal and multimodal perspectives. We present a detailed comparative analysis of these categories, examining their methodological similarities and differences. Through this analysis, we explore potential hybrid approaches and identify key challenges in forgery detection, providing direction for future research. Additionally, as MLLMs become increasingly prevalent in detection tasks, ethical and security considerations have emerged as critical global concerns. We examine the regulatory landscape surrounding Generative AI (GenAI) across various jurisdictions, offering valuable insights for researchers and practitioners in this field. © 2025, CC BY.

Keywords: Adversarial machine learning

Zero-Shot Audio Captioning Using Soft and Hard Prompts

arXiv, 2024

Authors: Zhang, Yiming; Xu, Xuenan; Du, Ruoyi; Liu, Haohe; Dong, Yuan; Tan, Zheng-Hua; Wang, Wenwu; Ma, Zhanyu. Affiliations: The Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; The Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai 200240, China; The Department of Electronic Systems, Aalborg University, Aalborg 9220, Denmark; The Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford GU2 7XH, United Kingdom

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test sets from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, which, however, has received little attention. We propose an effective audio captioning method based on the contrastive language-audio pre-training (CLAP) model to address these issues. Our proposed method requires only textual data for training, enabling the model to generate text from the textual feature in the cross-modal semantic space. In the inference stage, the model generates the descriptive text for the given audio from the audio feature by leveraging the audio-text alignment from CLAP. We devise two strategies to mitigate the discrepancy between text and audio embeddings: a mixed-augmentation-based soft prompt and a retrieval-based acoustic-aware hard prompt. These approaches are designed to enhance the generalization performance of our proposed model, facilitating the model to generate captions more robustly and accurately. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method. © 2024, CC BY.

Keywords: Semantics
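
The retrieval-based hard prompt in this abstract relies on audio and text embeddings sharing one space. A minimal sketch of the retrieval step follows: given an audio embedding and a bank of text embeddings, return the entries with the highest cosine similarity. The names `retrieve_hard_prompt` and `text_bank`, the toy vectors, and the top-k rule are assumptions for illustration; the abstract does not describe the bank contents or how retrieved text is inserted into the prompt.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def retrieve_hard_prompt(audio_embedding, text_bank, top_k=2):
    """Return the top_k text entries closest to the audio embedding.

    text_bank -- list of (text, embedding) pairs; in practice the embeddings
                 would come from a shared audio-text encoder such as CLAP.
    """
    ranked = sorted(
        text_bank,
        key=lambda item: cosine(audio_embedding, item[1]),
        reverse=True,
    )
    return [text for text, _ in ranked[:top_k]]
```

The retrieved phrases would then serve as explicit textual hints for the caption decoder, complementing the soft prompt derived from the embedding itself.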

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation

arXiv, 2022

Authors: He, Mengzhe; Wang, Yali; Wu, Jiaxi; Wang, Yiru; Li, Hanqing; Li, Bo; Gan, Weihao; Wu, Wei; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; SenseTime Research; University of Chinese Academy of Science, China; Shanghai AI Laboratory, Shanghai, China; Beihang University, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

Cross domain object detection is a realistic and challenging task in the wild. It suffers from performance degradation due to a large shift in data distributions and a lack of instance-level annotations in the target domain. Existing approaches mainly focus on only one of these two difficulties, even though they are closely coupled in cross domain object detection. To solve this problem, we propose a novel Target-perceived Dual-branch Distillation (TDD) framework. By integrating the detection branches of both source and target domains in a unified teacher-student learning scheme, it can effectively reduce domain shift and generate reliable supervision. In particular, we first introduce a distinct Target Proposal Perceiver between the two domains. It adaptively enhances the source detector to perceive objects in a target image by leveraging target proposal contexts from iterative cross-attention. Afterwards, we design a concise Dual Branch Self Distillation strategy for model training, which progressively integrates complementary object knowledge from different domains via self-distillation in the two branches. Finally, we conduct extensive experiments on a number of widely used scenarios in cross domain object detection. The results show that our TDD significantly outperforms state-of-the-art methods on all the benchmarks. Our code and model will be made available. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation
