The performance of CLIP on the dynamic facial expression recognition (DFER) task does not match the exceptional results it achieves on other CLIP-based classification tasks. While CLIP’s primary objective is to achieve align...
We study the problem of leveraging the syntactic structure of text to enhance pre-trained models such as BERT and RoBERTa. Existing methods utilize syntax of text either in the pre-training stage or in the fine-tuning...
ISBN: (Print) 9781713871088
Traditional machine learning follows a close-set assumption that the training and test set share the same label space. While in many practical scenarios, it is inevitable that some test samples belong to unknown classes (open-set). To fix this issue, Open-Set Recognition (OSR), whose goal is to make correct predictions on both close-set samples and open-set samples, has attracted rising attention. In this direction, the vast majority of literature focuses on the pattern of open-set samples. However, how to evaluate model performance in this challenging task is still unsolved. In this paper, a systematic analysis reveals that most existing metrics are essentially inconsistent with the aforementioned goal of OSR: (1) For metrics extended from close-set classification, such as Open-set F-score, Youden's index, and Normalized Accuracy, a poor open-set prediction can escape from a low performance score with a superior close-set prediction. (2) Novelty detection AUC, which measures the ranking performance between close-set and open-set samples, ignores the close-set performance. To fix these issues, we propose a novel metric named OpenAUC. Compared with existing metrics, OpenAUC enjoys a concise pairwise formulation that evaluates open-set performance and close-set performance in a coupling manner. Further analysis shows that OpenAUC is free from the aforementioned inconsistency properties. Finally, an end-to-end learning method is proposed to minimize the OpenAUC risk, and the experimental results on popular benchmark datasets speak to its effectiveness.
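The pairwise formulation described above can be illustrated with a short NumPy sketch (not the authors' code; the openness score and inputs are hypothetical): each (close-set, open-set) pair counts only when the close-set sample is both classified correctly and ranked as less "open" than the open-set sample, so a strong close-set prediction cannot mask a poor open-set ranking.

```python
import numpy as np

def open_auc(close_scores, close_correct, open_scores):
    """OpenAUC-style pairwise metric (illustrative sketch).

    close_scores  : openness scores of close-set samples (higher = more "open")
    close_correct : bool per close-set sample, True if classified correctly
    open_scores   : openness scores of open-set samples
    """
    close_scores = np.asarray(close_scores, dtype=float)
    open_scores = np.asarray(open_scores, dtype=float)
    close_correct = np.asarray(close_correct, dtype=bool)
    # pairwise: close-set sample ranked as less "open" than the open-set sample
    ranked = close_scores[:, None] < open_scores[None, :]
    # a pair counts only if the close-set sample is ALSO classified correctly
    hits = ranked & close_correct[:, None]
    return hits.sum() / (len(close_scores) * len(open_scores))

# perfect close-set accuracy AND perfect ranking -> 1.0
print(open_auc([0.1, 0.2], [True, True], [0.8, 0.9]))  # 1.0
# correct close-set predictions cannot rescue a bad open-set ranking
print(open_auc([0.9, 0.95], [True, True], [0.1, 0.2]))  # 0.0
```

This coupling is exactly what separates the metric from Novelty-detection AUC (which ignores `close_correct`) and from close-set-style scores (which ignore the ranking).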
The problem of blind image super-resolution aims to recover high-resolution (HR) images from low-resolution (LR) images with unknown degradation modes. Most existing methods model the image degradation process using b...
Scene text recognition (STR) is still a hot research topic in the computer vision field due to its various applications. Existing works mainly focus on learning a general model with a huge number of synthetic text images to recognize unconstrained scene texts, and have achieved substantial progress. However, these methods are not quite applicable in many real-world scenarios where 1) high recognition accuracy is required, while 2) labeled samples are scarce. To tackle this challenging problem, this paper proposes a few-shot adversarial sequence domain adaptation (FASDA) approach to build sequence adaptation between the synthetic source domain (with many synthetic labeled samples) and a specific target domain (with only some or a few real labeled samples). This is done by simultaneously learning each character's feature representation with an attention mechanism and establishing the corresponding character-level latent subspace with adversarial learning. Our approach can maximize the character-level confusion between the source domain and the target domain, thus achieving sequence-level adaptation with even a small number of labeled samples in the target domain. Extensive experiments on various datasets show that our method significantly outperforms the fine-tuning scheme, and obtains performance comparable to state-of-the-art STR methods.
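The adversarial ingredient of such domain adaptation can be sketched minimally, assuming a logistic domain discriminator over toy character features (the dimensions, learning rates, and data below are made up; the paper's actual model uses attention-based sequence features): the discriminator first learns to tell synthetic from real characters apart, and reversed-gradient steps then move the features to confuse it, shrinking the domain gap.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy character-level features (hypothetical 8-dim embeddings)
Xs = rng.normal(0.0, 1.0, (64, 8))        # source domain: synthetic characters
Xt = rng.normal(0.5, 1.0, (64, 8))        # target domain: a few real characters
X = np.vstack([Xs, Xt])
d = np.r_[np.zeros(64), np.ones(64)]      # domain labels: 0 = source, 1 = target

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w, b = np.zeros(8), 0.0                   # logistic domain discriminator

gap_before = np.linalg.norm(X[:64].mean(0) - X[64:].mean(0))

# step 1: train the discriminator to tell the two domains apart
for _ in range(100):
    p = sigmoid(X @ w + b)
    g = p - d                             # dLoss/dlogit of the logistic loss
    w -= 0.1 * X.T @ g / len(d)
    b -= 0.1 * g.mean()

# step 2: adversarial (reversed-gradient) feature updates - each character
# feature climbs the domain loss, i.e. moves so as to confuse the discriminator
for _ in range(5):
    p = sigmoid(X @ w + b)
    X += 0.1 * (p - d)[:, None] * w       # reversed sign vs. a descent step

gap_after = np.linalg.norm(X[:64].mean(0) - X[64:].mean(0))
print(gap_before, gap_after)              # the domain gap shrinks
```

In the full method the two steps alternate during training, and the confusion is built per character, which is what yields sequence-level adaptation from few target labels.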
We address a challenging problem: recognizing multiple text sequences from an image by pure end-to-end learning. It is twofold: 1) Multiple text sequence recognition. Each image may contain multiple text sequences of different content, location, and orientation; we try to recognize all of these texts in the image. 2) Pure end-to-end (PEE) learning. We solve the problem in a pure end-to-end way, where each training image is labeled only by the text transcripts of the contained sequences, without any geometric annotations. Most existing works recognize multiple text sequences from an image in a non-end-to-end (NEE) or quasi-end-to-end (QEE) way, in which each image is trained with both text transcripts and text locations. Only recently was a PEE method proposed to recognize a text sequence that is split into several lines in an image; however, it cannot be directly applied to recognizing multiple text sequences. So in this paper, we propose a pure end-to-end learning method to recognize multiple text sequences from an image. Our method directly learns the probability distribution of multiple sequences conditioned on each input image, and outputs multiple text transcripts with a well-designed decoding strategy. To evaluate the proposed method, we construct several datasets, mainly based on an existing public dataset and two real application scenarios. Experimental results show that the proposed method can effectively recognize multiple text sequences from images, and outperforms CTC-based and attention-based baseline methods.
An electroencephalogram (EEG)-based brain–computer interface (BCI) speller allows a user to input text to a computer by thought. It is particularly useful to severely disabled individuals, e.g. amyotrophic lateral sclerosis patients, who have no other effective means of communication with another person or a computer. Most studies so far focused on making EEG-based BCI spellers faster and more reliable; however, few have considered their security. This study, for the first time, shows that P300 and steady-state visual evoked potential BCI spellers are very vulnerable, i.e. they can be severely attacked by adversarial perturbations, which are too tiny to be noticed when added to EEG signals, but can mislead the spellers to spell anything the attacker wants. The consequence could range from merely user frustration to severe misdiagnosis in clinical applications. We hope our research can attract more attention to the security of EEG-based BCI spellers, and more broadly, EEG-based BCIs, which has received little attention before.
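The kind of attack described can be illustrated with a hedged FGSM-style sketch on a made-up linear classifier (this is not the authors' attack pipeline; the weights and "EEG" vector below are random stand-ins): the perturbation follows the sign of the score gradient, is scaled just large enough to flip the decision, and remains tiny relative to the unit-variance signal.

```python
import numpy as np

rng = np.random.default_rng(42)

# hypothetical setup: a linear "speller" scoring one EEG epoch for a target flash
w = rng.normal(size=256)                # stand-in for trained classifier weights
x = rng.normal(size=256)                # stand-in for one EEG feature vector

def predict(v):                         # 1 = "target flash", 0 = "non-target"
    return int(v @ w > 0)

# FGSM-style step: for a linear model, d(score)/dx = w, so the sign of the
# gradient is sign(w); scale eps just past the decision boundary
score = x @ w
eps = 1.1 * abs(score) / np.abs(w).sum()
direction = -1.0 if predict(x) == 1 else 1.0
x_adv = x + direction * eps * np.sign(w)

print(predict(x), predict(x_adv))       # the decision flips
print(np.abs(x_adv - x).max())          # per-sample amplitude stays tiny (= eps)
```

Against a deep model the same idea applies with the gradient taken through the network; the point of the sketch is only that a sub-noise-level perturbation suffices to flip the output.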
ISBN: (Print) 9781713871088
Multi-modal knowledge graph embeddings (KGE) have attracted more and more attention in learning representations of entities and relations for link prediction tasks. Different from previous uni-modal KGE approaches, multi-modal KGE can leverage expressive knowledge from a wealth of modalities (image, text, etc.), leading to more comprehensive representations of real-world entities. However, the critical challenge along this course lies in that the multi-modal embedding spaces are usually heterogeneous. In this sense, direct fusion will destroy the inherent spatial structure of different modal embeddings. To overcome this challenge, we revisit multi-modal KGE from a distributional alignment perspective and propose optimal transport knowledge graph embeddings (OTKGE). Specifically, we model the multi-modal fusion procedure as a transport plan moving different modal embeddings to a unified space by minimizing the Wasserstein distance between multi-modal distributions. Theoretically, we show that by minimizing the Wasserstein distance between the individual modalities and the unified embedding space, the final results are guaranteed to maintain consistency and comprehensiveness. Moreover, experimental results on well-established multi-modal knowledge graph completion benchmarks show that our OTKGE achieves state-of-the-art performance.
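The transport-plan idea can be sketched with a standard Sinkhorn iteration for entropic optimal transport (a toy stand-in, not the OTKGE implementation; the embeddings, uniform weights, and regularization strength are made up): the plan moves mass from one modality's embeddings toward the other's, and a barycentric projection then maps one modality into the other's space.

```python
import numpy as np

def sinkhorn(a, b, C, eps=0.1, iters=1000):
    """Entropic-regularized optimal transport plan (illustrative sketch)."""
    K = np.exp(-C / eps)                  # Gibbs kernel of the cost matrix
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(iters):                # alternating marginal projections
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]    # transport plan with marginals a, b

rng = np.random.default_rng(0)
img = rng.normal(0.0, 1.0, (5, 4))        # toy image-modality entity embeddings
txt = rng.normal(2.0, 1.0, (6, 4))        # toy text-modality entity embeddings

a = np.full(5, 1 / 5)                     # uniform weights over image points
b = np.full(6, 1 / 6)                     # uniform weights over text points
C = ((img[:, None, :] - txt[None, :, :]) ** 2).sum(-1)
C /= C.max()                              # normalize cost for numerical stability
P = sinkhorn(a, b, C)

# barycentric projection: transport image embeddings toward the text space
img_mapped = (P / P.sum(1, keepdims=True)) @ txt
print(P.sum(1), P.sum(0))                 # marginals match a and b
```

The entropic regularization (`eps`) trades plan sharpness for stable, fast iterations; smaller `eps` approaches the unregularized Wasserstein plan.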
Sparse Mobile Crowdsensing (Sparse MCS) selects a small subset of sub-areas for data collection and infers the data of the remaining sub-areas from the collected data. Compared with Mobile Crowdsensing (MCS) without data inference, Sparse MCS saves sensing costs while ensuring the quality of global data. However, existing research on Sparse MCS focuses only on selecting a small set of higher-value sub-areas. It neither considers whether the recruited participants can actually collect the data of the required sub-areas, nor accounts for the value of the other data those participants collect along the way. To overcome these limitations of traditional sub-area selection, this paper starts from the perspective of participants and concentrates on the contribution of the data collected by each participant to the entire collection task. The total data contribution of each participant becomes the basis for deciding whether to select that participant, and correspondingly, a new approach to the participant selection problem under Sparse MCS is proposed. Given that each person's daily movement trajectory is basically stable, and that the data collected by different people on their respective trajectories have different values, this paper exploits this regularity and difference to study how to directly recruit participants who can collect high-value data. Furthermore, the participant selection problem considered in this paper is not limited to data collection in the next cycle; instead, some participants are recruited directly to continue the collection task over the next multiple cycles. This multi-cycle participant selection problem can be modeled as a dynamic decision-making problem. Since heuristic strategies may fall into a local optimum, this paper uses reinforcement learning to solve the participant selection problem: we use the participant selection system as an agent of reinforcement learning, and design the
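The reinforcement-learning framing can be caricatured with a stateless epsilon-greedy loop (a deliberately simplified sketch with invented contribution values, far simpler than the paper's full multi-cycle dynamic formulation): the agent repeatedly recruits a candidate participant, observes a noisy data-contribution reward reflecting that person's trajectory, and learns which participant is most valuable to recruit.

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical per-cycle data-contribution value of each candidate participant,
# reflecting how much high-value sub-area data their daily trajectory covers
true_value = np.array([0.2, 0.8, 0.5])

Q = np.zeros(3)                           # estimated contribution per participant
alpha, eps_greedy = 0.1, 0.2
for _ in range(2000):
    # epsilon-greedy recruitment over the candidate participants
    a = rng.integers(3) if rng.random() < eps_greedy else int(Q.argmax())
    reward = true_value[a] + rng.normal(0.0, 0.1)  # noisy observed contribution
    Q[a] += alpha * (reward - Q[a])       # incremental value update

print(int(Q.argmax()))                    # the agent settles on participant 1
```

The paper's setting adds what this sketch omits: state (which sub-areas remain uncovered across cycles) and recruitment over multiple future cycles at once, which is why a full reinforcement-learning formulation is used rather than a bandit.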
Chromosome classification is an important but difficult and tedious task in karyotyping. Previous methods only classify manually segmented single chromosomes, which is far from clinical practice. In this work, we propose a detection-based method, DeepACC, to locate and finely classify chromosomes simultaneously based on the whole metaphase image. We first introduce the Additive Angular Margin Loss to enhance the discriminative power of the model. To alleviate batch effects, we transform the decision boundary of each class case-by-case through a Siamese network, which makes full use of the prior knowledge that chromosomes usually appear in pairs. Furthermore, we take the clinical seven-group criteria as prior knowledge and design an additional Group Inner-Adjacency Loss to further reduce inter-class similarities. A private metaphase image dataset from a clinical laboratory is collected and labelled to evaluate the performance. Results show that the new design brings encouraging performance gains compared to state-of-the-art baseline models.
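The Additive Angular Margin Loss can be sketched as follows (an ArcFace-style construction in NumPy; the feature dimension, class count, margin `m`, and scale `s` below are illustrative, not the paper's settings): cosine similarities are computed between L2-normalized features and class weights, and the margin is added to each sample's true-class angle before rescaling, so correct classes must win by an angular margin.

```python
import numpy as np

def arc_margin_logits(features, weights, labels, m=0.5, s=30.0):
    """Additive Angular Margin (ArcFace-style) logits, as a sketch."""
    # cosine similarity between L2-normalized features and class weights
    f = features / np.linalg.norm(features, axis=1, keepdims=True)
    w = weights / np.linalg.norm(weights, axis=1, keepdims=True)
    cos = np.clip(f @ w.T, -1.0, 1.0)
    theta = np.arccos(cos)
    rows = np.arange(len(labels))
    # add the angular margin m to each sample's true-class angle (clipped to pi)
    theta[rows, labels] = np.minimum(theta[rows, labels] + m, np.pi)
    return s * np.cos(theta)              # rescaled logits for softmax training

rng = np.random.default_rng(0)
feats = rng.normal(size=(4, 16))          # toy chromosome features
W = rng.normal(size=(24, 16))             # one weight vector per chromosome class
y = np.array([0, 1, 2, 3])

logits = arc_margin_logits(feats, W, y)
plain = arc_margin_logits(feats, W, y, m=0.0)   # m=0 reduces to scaled cosine
# the margin only lowers true-class logits, tightening the decision boundary
print(np.all(logits[np.arange(4), y] <= plain[np.arange(4), y]))
```

Feeding these logits to an ordinary softmax cross-entropy yields the margin loss; the per-class, case-by-case boundary adjustment via the Siamese network in the paper builds on this same angular formulation.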