Search Results - Inner Mongolia University Library

arXiv, 2021

Authors: Luo, Gen; Zhou, Yiyi; Sun, Xiaoshuai; Wu, Yongjian; Gao, Yue; Ji, Rongrong

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, China; Institute of Artificial Intelligence, Xiamen University, 361005, China; Youtu Lab, Tencent, China; Software School of Tsinghua University, China

In this paper, we are committed to establishing a unified, end-to-end multi-modal network by exploring language-guided visual recognition. To this end, we first propose a novel multi-modal convolution module called Language-guided Dynamic Convolution (LaConv). Its convolution kernels are dynamically generated from natural language information, which helps extract differentiated visual features for different multi-modal examples. Based on the LaConv module, we further build a fully language-driven convolution network, termed LaConvNet, which unifies visual recognition and multi-modal reasoning in one forward structure. To validate LaConv and LaConvNet, we conduct extensive experiments on seven benchmark datasets covering three vision-and-language tasks, i.e., visual question answering (VQA), referring expression comprehension (REC) and referring expression segmentation (RES). The experimental results not only show competitive or better performance of LaConvNet compared with existing multi-modal networks, but also demonstrate the merits of LaConvNet as a unified structure, including a compact network, low computational cost and high generalization ability. Our source code is released in the SimREC project: https://***/luogen1996/LaConvNet. Copyright © 2021, The Authors. All rights reserved.

Keywords: Convolution
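The abstract describes convolution kernels that are generated from the language input rather than stored as fixed weights. A minimal sketch of that idea, reduced to a single-channel 1-D convolution in plain Python: the projection matrix `PROJ`, the toy sentence embeddings and the helper names are hypothetical stand-ins, not LaConv's actual kernel generator, which the abstract does not specify.

```python
# Hypothetical sketch: a language-conditioned ("dynamic") 1-D convolution.
# PROJ, the embeddings and all function names are illustrative assumptions.

def generate_kernel(lang_embedding, proj):
    """Map a sentence embedding to kernel weights via a linear projection.

    Each row of `proj` yields one kernel tap; in a trained model `proj`
    would be learned, here it is fixed by hand.
    """
    return [sum(w * e for w, e in zip(row, lang_embedding)) for row in proj]


def dynamic_conv1d(features, kernel):
    """Valid (unpadded) 1-D cross-correlation of `features` with `kernel`."""
    k = len(kernel)
    return [
        sum(kernel[j] * features[i + j] for j in range(k))
        for i in range(len(features) - k + 1)
    ]


# Toy projection: 2-D sentence embedding -> 3-tap kernel.
PROJ = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
VISUAL = [1.0, 2.0, 3.0, 4.0]  # the same visual features for both expressions

kernel_a = generate_kernel([1.0, 0.0], PROJ)  # embedding of expression A
kernel_b = generate_kernel([0.0, 1.0], PROJ)  # embedding of expression B
out_a = dynamic_conv1d(VISUAL, kernel_a)  # [4.0, 6.0]
out_b = dynamic_conv1d(VISUAL, kernel_b)  # [5.0, 7.0]
```

The same visual input yields different features for the two expressions, which is the property the abstract attributes to language-generated kernels.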

Lottery Jackpots Exist in Pre-trained Models

arXiv, 2021

Authors: Zhang, Yuxin; Lin, Mingbao; Zhong, Yunshan; Chao, Fei; Ji, Rongrong

Affiliations: The Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, China; School of Informatics, Xiamen University, Xiamen 361005, China; Youtu Laboratory, Tencent, Shanghai 200233, China; Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China

Network pruning is an effective approach to reduce network complexity with an acceptable performance compromise. Existing studies achieve sparsity in neural networks via time-consuming weight training or complex searching on networks with expanded width, which greatly limits the applications of network pruning. In this paper, we show that high-performing sparse sub-networks that require no weight training, termed "lottery jackpots", exist in pre-trained models with unexpanded width. Our presented lottery jackpots are traceable through empirical and theoretical outcomes. For example, on CIFAR-10 we obtain a lottery jackpot that has only 10% of the parameters and still reaches the performance of the original dense VGGNet-19 without any modification of the pre-trained weights. Furthermore, we improve the efficiency of searching for lottery jackpots from two perspectives. Firstly, we observe that the sparse masks derived from many existing pruning criteria have a high overlap with the searched mask of our lottery jackpot, and among them, magnitude-based pruning yields the mask most similar to ours. Following this insight, we initialize our sparse mask using magnitude-based pruning, resulting in at least a 3× cost reduction in lottery jackpot searching while achieving comparable or even better performance. Secondly, we conduct an in-depth analysis of the searching process for lottery jackpots. Our theoretical result suggests that the decrease in training loss during weight searching can be disturbed by the dependency between weights in modern networks. To mitigate this, we propose a novel short restriction method that restricts changes of masks that may have a negative impact on the training loss, which leads to faster convergence and reduced oscillation when searching for lottery jackpots. Consequently, our searched lottery jackpot removes 90% of the weights in ResNet-50, while it easily obtains more than 70% top-1 accuracy using only 5 searching

Keywords: Convolutional neural networks
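The abstract's first speed-up initializes the search from a magnitude-based pruning mask, and reports a high overlap between such masks and the searched lottery-jackpot mask. A minimal sketch of those two pieces, assuming a flat list of weights and a single global keep ratio; the example weights and the `searched_mask` are invented for illustration, and neither the actual mask-search procedure nor the short restriction method is reproduced here.

```python
# Hypothetical sketch: magnitude-based mask initialization for a frozen
# pre-trained layer. The weights and the "searched" mask below are made up.

def magnitude_mask(weights, keep_ratio):
    """Binary mask keeping the `keep_ratio` fraction of largest-|w| weights."""
    n_keep = max(1, round(len(weights) * keep_ratio))
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]), reverse=True)
    kept = set(order[:n_keep])
    return [1 if i in kept else 0 for i in range(len(weights))]


def apply_mask(weights, mask):
    """Prune by masking only; the surviving pre-trained values are unchanged."""
    return [w * m for w, m in zip(weights, mask)]


def mask_overlap(mask_a, mask_b):
    """Fraction of positions kept by `mask_a` that `mask_b` also keeps."""
    kept_a = sum(mask_a)
    shared = sum(a & b for a, b in zip(mask_a, mask_b))
    return shared / kept_a if kept_a else 0.0


WEIGHTS = [0.5, -0.05, 0.9, 0.01, -0.7, 0.2, 0.03, -0.3, 0.08, 0.6]
init_mask = magnitude_mask(WEIGHTS, keep_ratio=0.3)  # keeps 0.9, -0.7, 0.6
pruned = apply_mask(WEIGHTS, init_mask)
searched_mask = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]  # hypothetical search result
overlap = mask_overlap(init_mask, searched_mask)  # 2 of the 3 kept weights shared
```

Because only the mask changes, the pre-trained weight values stay untouched, which matches the "no weight training" setting the abstract describes.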

Uncovering the Over-smoothing Challenge in Image Super-Resolution: Entropy-based Quantification and Contrastive Optimization

arXiv, 2022

Authors: Xu, Tianshuo; Li, Lijiang; Mi, Peng; Zheng, Xiawu; Chao, Fei; Ji, Rongrong; Tian, Yonghong; Shen, Qiang

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, School of Informatics, Xiamen University, 361005, China; Peng Cheng Laboratory, Shenzhen 518066, China; Department of Computer Science, Institute of Mathematics, Physics and Computer Science, Aberystwyth University, SY23 3DB, United Kingdom; Institute of Artificial Intelligence, Xiamen University, Xiamen 361005, China; School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, China

PSNR-oriented models are a critical class of super-resolution models with applications across various fields. However, these models tend to generate over-smoothed images, a problem that has previously been analyzed from the perspective of models or loss functions, but without taking into account the impact of data properties. In this paper, we present a novel phenomenon that we term the center-oriented optimization (COO) problem, where a model's output converges towards the center point of similar high-resolution images rather than towards the ground truth. We demonstrate that the strength of this problem is related to the uncertainty of the data, which we quantify using entropy. We prove that as the entropy of high-resolution images increases, their center point moves further away from the clean image distribution, and the model generates over-smoothed images. Perceptual-driven approaches, such as perceptual loss, model structure optimization, or GAN-based methods, can be viewed as implicitly optimizing the COO problem. We propose an explicit solution to the COO problem, called Detail Enhanced Contrastive Loss (DECLoss). DECLoss utilizes the clustering property of contrastive learning to directly reduce the variance of the potential high-resolution distribution and thereby decrease the entropy. We evaluate DECLoss on multiple super-resolution benchmarks and demonstrate that it improves the perceptual quality of PSNR-oriented models. Moreover, when applied to GAN-based methods such as RaGAN, DECLoss helps achieve state-of-the-art performance, such as 0.093 LPIPS with 24.51 PSNR on 4× downsampled Urban100, validating the effectiveness and generalization of our approach. © 2022, CC BY-NC-ND.

Keywords: Benchmarking
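The center-oriented optimization argument rests on a standard fact: under a squared-error (PSNR-oriented) loss, the best single prediction for an input that is consistent with several high-resolution targets is their element-wise mean. A toy sketch, assuming two hypothetical 1-D "patches" that are equally plausible for one low-resolution input and differ only in where a sharp edge falls; it illustrates the COO effect only, not DECLoss.

```python
# Hypothetical sketch: why an L2-trained model drifts toward the "center point"
# of plausible high-resolution targets. The toy 1-D patches are invented.

def center_point(targets):
    """Element-wise mean; it minimizes the summed squared error to `targets`."""
    n = len(targets)
    return [sum(values) / n for values in zip(*targets)]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def total_loss(pred, targets):
    """Summed MSE of one prediction against every plausible target."""
    return sum(mse(pred, t) for t in targets)


# Two sharp HR patches assumed equally plausible for the same LR input.
SHARP_A = [0.0, 0.0, 1.0, 1.0]
SHARP_B = [0.0, 1.0, 1.0, 1.0]
TARGETS = [SHARP_A, SHARP_B]

center = center_point(TARGETS)             # [0.0, 0.5, 1.0, 1.0]: a blurred edge
loss_center = total_loss(center, TARGETS)  # 0.125
loss_sharp = total_loss(SHARP_A, TARGETS)  # 0.25
```

With a single target (no uncertainty) the center is that sharp patch itself; adding more divergent plausible targets pulls the center away from each of them, which is the link between entropy and over-smoothing stated in the abstract.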

Spatio-Temporal Joint Graph Convolutional Networks for Traffic Forecasting

arXiv, 2021

Authors: Zheng, Chuanpan; Fan, Xiaoliang; Pan, Shirui; Jin, Haibing; Peng, Zhaopeng; Wu, Zonghan; Wang, Cheng; Yu, Philip S.

Affiliations: Fujian Key Laboratory of Sensing and Computing for Smart Cities, School of Informatics, Computer Science and Technology Department, Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, China; School of Information and Communication Technology, Griffith University, Australia; Centre for Artificial Intelligence, FEIT, University of Technology Sydney, Australia; Department of Computer Science, University of Illinois at Chicago, Chicago, IL 60607, United States

Recent studies have shifted their focus towards formulating traffic forecasting as a spatio-temporal graph modeling problem. Typically, they constructed a static spatial graph at each time step and then connected each node with itself between adjacent time steps to create a spatio-temporal graph. However, this approach failed to explicitly reflect the correlations between different nodes at different time steps, thus limiting the learning capability of graph neural networks. Additionally, those models overlooked the dynamic spatio-temporal correlations among nodes by using the same adjacency matrix across different time steps. To address these limitations, we propose a novel approach called Spatio-Temporal Joint Graph Convolutional Networks (STJGCN) for accurate traffic forecasting on road networks over multiple future time steps. Specifically, our method encompasses the construction of both pre-defined and adaptive spatio-temporal joint graphs (STJGs) between any two time steps, which represent comprehensive and dynamic spatio-temporal correlations. We further introduce dilated causal spatio-temporal joint graph convolution layers on the STJG to capture spatio-temporal dependencies from distinct perspectives with multiple ranges. To aggregate information from different ranges, we propose a multi-range attention mechanism. Finally, we evaluate our approach on five public traffic datasets and experimental results demonstrate that STJGCN is not only computationally efficient but also outperforms 11 state-of-the-art baseline methods. Copyright © 2021, The Authors. All rights reserved.

Keywords: Graph neural networks
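Two mechanisms named in the abstract can be illustrated without a full graph model: stacking dilated causal layers widens the range of past time steps each output can see, and a multi-range attention fuses the outputs of the different ranges. A minimal sketch under stated assumptions: the dilation schedule `(1, 2, 4)`, the dot-product scoring against a `query` vector, and all names are hypothetical, and the construction of the pre-defined and adaptive STJGs themselves is not shown.

```python
import math

# Hypothetical sketch of two STJGCN ingredients; not the authors' code.

def receptive_steps(t, dilations):
    """Input time steps reachable from output step t through stacked dilated
    causal layers, where each layer combines step s with step s - d."""
    steps = {t}
    for d in dilations:
        steps |= {s - d for s in steps}
    return sorted(s for s in steps if s >= 0)


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def multi_range_attention(range_outputs, query):
    """Fuse one node's K per-range vectors with softmax attention weights."""
    scores = [sum(q * h for q, h in zip(query, out)) for out in range_outputs]
    weights = softmax(scores)
    dim = len(range_outputs[0])
    fused = [sum(w * out[i] for w, out in zip(weights, range_outputs)) for i in range(dim)]
    return fused, weights


steps = receptive_steps(7, dilations=(1, 2, 4))  # [0, 1, ..., 7]
fused, weights = multi_range_attention([[1.0, 0.0], [0.0, 1.0]], query=[0.0, 0.0])
# Equal scores give equal weights [0.5, 0.5], so fused == [0.5, 0.5].
```

Doubling the dilation at each layer lets three layers cover eight time steps, while the attention weights let each node decide how much to trust short-range versus long-range information.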

Discover and align taxonomic context priors for open-world semi-supervised learning

Proceedings of the 37th International Conference on Neural Information Processing Systems

Authors: Yu Wang, Zhun Zhong, Pengchong Qiao, Xuxin Cheng, Xiawu Zheng, Chang Liu, Nicu Sebe, Rongrong Ji, Jie Chen

Affiliations: School of Electronic and Computer Engineering, Peking University, Shenzhen, China, and AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China; School of Computer Science, University of Nottingham, United Kingdom; School of Electronic and Computer Engineering, Peking University, Shenzhen, China, and Department of Information Engineering and Computer Science, University of Trento, Italy; School of Electronic and Computer Engineering, Peking University, Shenzhen, China; Peng Cheng Laboratory, Shenzhen, China, and Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; Department of Automation, Tsinghua University, Beijing, China; Department of Information Engineering and Computer Science, University of Trento, Italy; School of Electronic and Computer Engineering, Peking University, Shenzhen, China, and Peng Cheng Laboratory, Shenzhen, China, and AI for Science (AI4S)-Preferred Program, Peking University Shenzhen Graduate School, China

Open-world Semi-Supervised Learning (OSSL) is a realistic and challenging task that aims to classify unlabeled samples from both seen and novel classes using partially labeled samples from the seen classes. Previous works typically explore the relationships between samples as priors on pre-defined single-granularity labels to help novel class recognition. In fact, classes follow a taxonomy, and samples can be classified at multiple levels of granularity, which provides more underlying relationships for supervision. We thus argue that learning with single-granularity labels results in sub-optimal representation learning and inaccurate pseudo labels, especially for unknown classes. In this paper, we take the initiative to explore and propose a unified framework, called Taxonomic context prIors Discovering and Aligning (TIDA), which exploits the relationships between samples at various granularities. It allows us to discover multi-granularity semantic concepts as taxonomic context priors (i.e., sub-class, target-class, and super-class), and then collaboratively leverage them to enhance representation learning and improve the quality of pseudo labels. Specifically, TIDA comprises two components: i) a taxonomic context discovery module that constructs a set of hierarchical prototypes in the latent space to discover the underlying taxonomic context priors; ii) a taxonomic context-based prediction alignment module that enforces consistency across hierarchical predictions to build reliable relationships between classes at various granularities and provide additional supervision. We demonstrate that these two components are mutually beneficial for an effective OSSL framework, which is theoretically explained from the perspective of the EM algorithm. Extensive experiments on seven commonly used datasets show that TIDA significantly improves performance and achieves a new state of the art. The source code is publicly available at https://***/rain305f/TIDA.

Keywords: (none listed)
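TIDA's two components are described as hierarchical prototypes in the latent space (sub-class, target-class, super-class) and an alignment module that enforces consistency across the hierarchical predictions. A minimal sketch of what "consistency" means, assuming hard nearest-prototype predictions and a known parent map between levels; the prototypes, maps and function names are hypothetical, and TIDA itself learns the prototypes and enforces consistency during training, whereas this sketch only detects a disagreement.

```python
# Hypothetical sketch: nearest-prototype predictions at three granularities
# and a check of whether they agree with the class taxonomy.
# All prototypes and parent maps are invented toy values.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(x, prototypes):
    """Index of the prototype closest to x (squared Euclidean distance)."""
    return min(range(len(prototypes)), key=lambda k: sq_dist(x, prototypes[k]))


def hierarchical_assign(x, levels):
    """Assign x at every granularity; `levels` maps level name -> prototypes."""
    return {name: nearest(x, protos) for name, protos in levels.items()}


def is_consistent(assign, sub_to_target, target_to_super):
    """True if the predicted sub-class's parent is the predicted target class
    and that target class's parent is the predicted super-class."""
    return (sub_to_target[assign["sub"]] == assign["target"]
            and target_to_super[assign["target"]] == assign["super"])


LEVELS = {
    "sub": [[0.0, 0.0], [0.0, 2.0], [4.0, 0.0], [4.0, 2.0]],
    "target": [[0.0, 1.0], [3.0, 1.0]],
    "super": [[2.0, 1.0]],
}
SUB_TO_TARGET = [0, 0, 1, 1]
TARGET_TO_SUPER = [0, 0]

a = hierarchical_assign([0.2, 1.8], LEVELS)  # {"sub": 1, "target": 0, "super": 0}
b = hierarchical_assign([1.9, 0.0], LEVELS)  # {"sub": 0, "target": 1, "super": 0}
```

The second embedding is nearest to a sub-class of target class 0 but nearest to target class 1; a disagreement of this kind is what a prediction-alignment objective would penalize.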

Deep Code Search with Naming-Agnostic Contrastive Multi-view Learning

ACM Transactions on Knowledge Discovery from Data

Authors: Jiadong Feng, Wei Li, Suhuang Wu, Zhao Wei, Yong Xu, Juhong Wang, Hui Li

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China; School of Electronic and Computer Engineering, Peking University, China; Tencent, China

Software development is a repetitive task, as developers usually reuse or draw inspiration from existing implementations. Code search, which refers to retrieving relevant code snippets from a codebase according to the developer's intent expressed as a query, has become increasingly important in the software development process. Owing to the success of deep learning in various applications, a great number of deep-learning-based code search approaches have emerged and achieved promising results. However, developers may not follow the same naming conventions, and the same variable may have different names in different implementations, which challenges deep-learning-based code search methods that rely on explicit variable correspondences to understand source code. To overcome this challenge, we propose a naming-agnostic code search method (NACS) based on contrastive multi-view code representation learning. NACS strips information bound to variable names from the Abstract Syntax Tree (AST), the representation of the abstract syntactic structure of source code, and focuses on capturing intrinsic properties solely from AST structures. We use semantic-level and syntax-level augmentation techniques to prepare realistic and plausible data, and adopt contrastive learning to design a graph-view modeling component in NACS that enhances the understanding of code snippets. We further model ASTs in a path view to strengthen the graph-view modeling component through multi-view learning. Extensive experiments show that NACS provides superior code search performance compared to baselines, and that NACS can be adapted to help existing code search methods overcome the impact of different naming conventions. Our implementation is available at https://***/KDEGroup/NACS.

Keywords: code search; multi-view learning; graph self-supervised learning; graph neural network
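NACS strips information bound to variable names from the AST and learns from structure alone. The preprocessing idea can be sketched with Python's standard `ast` module, assuming Python source and a simple canonical renaming in which user-defined identifiers become `VAR0`, `VAR1`, ... in order of first appearance while builtins are kept; this renaming scheme and the helper names are assumptions, not the paper's exact procedure, and the graph-view and path-view contrastive learning are not shown.

```python
import ast
import builtins

BUILTIN_NAMES = set(dir(builtins))


class CanonicalizeNames(ast.NodeTransformer):
    """Rename user-chosen identifiers to VAR0, VAR1, ... by first appearance.

    Hypothetical preprocessing: one flat namespace (scopes are ignored), and
    builtins such as `len` are kept because they carry meaning.
    """

    def __init__(self):
        self.mapping = {}

    def _canon(self, name):
        if name not in self.mapping:
            self.mapping[name] = f"VAR{len(self.mapping)}"
        return self.mapping[name]

    def visit_FunctionDef(self, node):
        node.name = self._canon(node.name)
        self.generic_visit(node)
        return node

    def visit_arg(self, node):
        node.arg = self._canon(node.arg)
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        if node.id not in BUILTIN_NAMES:
            node.id = self._canon(node.id)
        return node


def naming_agnostic_dump(source):
    """Structural AST dump with identifier names canonicalized."""
    tree = CanonicalizeNames().visit(ast.parse(source))
    return ast.dump(tree)


SNIPPET_A = """
def total(xs):
    s = 0
    for x in xs:
        s += x
    return s
"""

SNIPPET_B = """
def add_all(values):
    acc = 0
    for v in values:
        acc += v
    return acc
"""
```

The two snippets produce different raw AST dumps but identical canonicalized ones. A flat namespace is a simplification; a scope-aware renamer would keep identically named variables in different functions apart.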

MMICT: Boosting Multi-Modal Fine-Tuning with In-Context Examples

ACM Transactions on Multimedia Computing, Communications, and Applications

Authors: Tao Chen, Enwei Zhang, Yuting Gao, Ke Li, Xing Sun, Yan Zhang, Hui Li, Rongrong Ji

Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China; Tencent Youtu Lab, China

Although In-Context Learning (ICL) brings remarkable performance gains to Large Language Models (LLMs), the improvements still fall short of fine-tuning on downstream tasks. This paper introduces Multi-Modal In-Context Tuning (MMICT), a novel multi-modal fine-tuning paradigm that boosts multi-modal fine-tuning by fully leveraging the promising ICL capability of multi-modal LLMs (MM-LLMs). We propose the Multi-Modal Hub (M-Hub), a unified module that captures various multi-modal features according to different inputs and objectives. Based on M-Hub, MMICT enables MM-LLMs to learn from in-context visual-guided textual features and subsequently generate outputs conditioned on the textual-guided visual features. Moreover, leveraging the flexibility of M-Hub, we design a variety of in-context demonstrations. Extensive experiments on a diverse range of downstream multi-modal tasks demonstrate that MMICT significantly outperforms the traditional fine-tuning strategy and the vanilla ICT method that directly takes the concatenation of all information from different modalities as input. Our implementation is available at: https://***/KDEGroup/MMICT.

Keywords: Multi-Modal Alignment; Text Generation; In-Context Tuning
