Search results - Inner Mongolia University Library

Bridging explicit and implicit deep generative models via neural Stein estimators

Proceedings of the 35th International Conference on Neural Information Processing Systems

Authors: Qitian Wu, Rui Gao, Hongyuan Zha. Affiliations: Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University; University of Texas at Austin; Department of Computer Science and Engineering, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University and School of Data Science, Shenzhen Institute of Artificial Intelligence and Robotics for Society, The Chinese University of Hong Kong, Shenzhen

ISBN: (print) 9781713845393

There are two types of deep generative models: explicit and implicit. The former defines an explicit density form that allows likelihood inference, while the latter targets a flexible transformation from random noise to generated samples. Although both classes of generative models have shown great power in many applications, each, when used alone, suffers from its own limitations and drawbacks. To take full advantage of both models and enable mutual compensation, we propose a novel joint training framework that bridges an explicit (unnormalized) density estimator and an implicit sample generator via Stein discrepancy. We show that our method 1) induces novel mutual regularization via kernel Sobolev norm penalization and Moreau-Yosida regularization, and 2) stabilizes the training dynamics. Empirically, we demonstrate that the proposed method helps the density estimator identify data modes more accurately and guides the generator to output higher-quality samples, compared with training either model alone. The new approach also shows promising results when the training samples are contaminated or limited.

Incrementally zero-shot detection by an extreme value analyzer

arXiv, 2021

Authors: Zheng, Sixiao; Fu, Yanwei; Hou, Yanxi. Affiliations: Academy for Engineering & Technology, Fudan University, Shanghai Engineering Research Center of AI & Robotics, Engineering Research Center of AI & Robotics, Ministry of Education, China; School of Data Science, MOE Frontiers Center for Brain Science, Shanghai Key Lab of Intelligent Information Processing, Fudan University, China; School of Data Science, Fudan University, China

Human beings can not only recognize novel unseen classes but also incrementally incorporate new classes into their existing knowledge. However, zero-shot learning models assume that all seen classes are known beforehand, while incremental learning models cannot recognize unseen classes. This paper introduces a novel and challenging task, Incrementally Zero-Shot Detection (IZSD), a practical strategy for both zero-shot learning and class-incremental learning in real-world object detection. An end-to-end model, IZSD-EVer, is proposed to tackle this task, which requires incrementally detecting new classes and detecting classes that have never been seen. Specifically, we propose a novel extreme value analyzer to detect objects from old seen, new seen, and unseen classes simultaneously. In addition, we propose two losses: a background-foreground mean squared error loss that alleviates the extreme imbalance between the background and foreground of images, and a projection distance loss that aligns the visual and semantic spaces of old seen classes. Experiments demonstrate the efficacy of our model in detecting objects from both seen and unseen classes, outperforming the alternative models on the Pascal VOC and MSCOCO datasets. © 2021, CC BY-NC-SA.

Keywords: Object detection

SEA: Sentence encoder assembly for video retrieval by textual queries

arXiv, 2020

Authors: Li, Xirong; Zhou, Fangming; Xu, Chaoxi; Ji, Jiaqi; Yang, Gang. Affiliations: Key Lab of Data Engineering and Knowledge Engineering, AI & Media Computing Lab, School of Information, Renmin University of China, Beijing 100872, China

Retrieving unlabeled videos by textual queries, known as Ad-hoc Video Search (AVS), is a core theme in multimedia data management and retrieval. The success of AVS depends on cross-modal representation learning that encodes both query sentences and videos into common spaces for semantic similarity computation. Inspired by the initial success of a few previous works in combining multiple sentence encoders, this paper takes a step forward by developing a new and general method for effectively exploiting diverse sentence encoders. The novelty of the proposed method, which we term Sentence Encoder Assembly (SEA), is two-fold. First, unlike prior art that uses only a single common space, SEA supports text-video matching in multiple encoder-specific common spaces. This property prevents the matching from being dominated by a specific encoder that produces an encoding vector much longer than those of other encoders. Second, in order to explore complementarities among the individual common spaces, we propose multi-space multi-loss learning. As extensive experiments on four benchmarks (MSR-VTT, TRECVID AVS 2016-2019, TGIF and MSVD) show, SEA surpasses the state of the art. In addition, SEA is extremely easy to implement. All this makes SEA an appealing solution for AVS and promising for continuously advancing the task by harvesting new sentence encoders. © 2020, CC-BY.

Keywords: Signal encoding

Fast Quaternion Product Units for Learning Disentangled Representations in SO(3)

TechRxiv, 2022

Authors: Qin, Shaofei; Zhang, Xuan; Xu, Hongteng; Xu, Yi. Affiliations: MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China; Department of Computer Science & Engineering, Texas A&M University, College Station, TX 77843, United States; Gaoling School of Artificial Intelligence, Renmin University of China, Beijing Key Laboratory of Big Data Management and Analysis Methods, Beijing, China

Real-world 3D structured data such as point clouds and skeletons can often be represented as data in the 3D rotation group (denoted SO(3)). However, most existing neural networks are tailored to data in Euclidean space, so 3D rotation data are not closed under their algebraic operations, which leads to sub-optimal performance in 3D-related learning tasks. To resolve the issues caused by this mismatch between data and model, we propose a novel non-real neuron model called the quaternion product unit (QPU) to represent data on 3D rotation groups. The proposed QPU leverages quaternion algebra and the law of the 3D rotation group, representing 3D rotation data as quaternions and merging them via a weighted chain of Hamilton products. We demonstrate that the QPU mathematically maintains the SO(3) structure of the 3D rotation data during inference and disentangles the 3D representations into "rotation-invariant" features and "rotation-equivariant" features. Moreover, we design a fast QPU to accelerate the computation of the QPU. The fast QPU applies a tree-structured data indexing process and accordingly leverages the power of parallel computing, which reduces the computational complexity of the QPU in a single thread from O(N) to O(log N). Taking the fast QPU as a basic module, we develop a series of quaternion neural networks (QNNs), including the quaternion multi-layer perceptron (QMLP), quaternion message passing (QMP), and so on. In addition, we make the QNNs compatible with conventional real-valued neural networks and applicable to both skeletons and point clouds. Experiments on synthetic and real-world 3D tasks show that the QNNs based on our fast QPUs are superior to state-of-the-art real-valued models, especially in scenarios requiring robustness to random rotations. © 2022, CC BY.
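
The chain of Hamilton products and the tree-structured speed-up described in this abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the QPU's learned weights are omitted, and each tree level is evaluated serially here. The sketch relies only on the associativity of the Hamilton product, which allows adjacent pairs to be merged in any grouping as long as their order is preserved.

```python
# Illustrative sketch only; names are hypothetical and QPU weights are omitted.
# A quaternion is a tuple (w, x, y, z).

def hamilton(p, q):
    """Hamilton product of two quaternions (non-commutative)."""
    a1, b1, c1, d1 = p
    a2, b2, c2, d2 = q
    return (
        a1 * a2 - b1 * b2 - c1 * c2 - d1 * d2,
        a1 * b2 + b1 * a2 + c1 * d2 - d1 * c2,
        a1 * c2 - b1 * d2 + c1 * a2 + d1 * b2,
        a1 * d2 + b1 * c2 - c1 * b2 + d1 * a2,
    )

def chain_sequential(quaternions):
    """Left-to-right chain of Hamilton products: N - 1 dependent steps."""
    result = (1, 0, 0, 0)  # identity quaternion
    for q in quaternions:
        result = hamilton(result, q)
    return result

def chain_tree(quaternions):
    """Same product, computed by merging adjacent pairs level by level.

    The products inside one level do not depend on each other, so they
    could be evaluated in parallel; about log2(N) levels are needed.
    """
    level = list(quaternions)
    if not level:
        return (1, 0, 0, 0)
    while len(level) > 1:
        merged = [hamilton(level[i], level[i + 1])
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # carry an unpaired last element, keeping order
            merged.append(level[-1])
        level = merged
    return level[0]
```

Both functions return the same quaternion for the same input sequence; only the grouping of the products differs.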

Keywords: 3D modeling

Simple and Deep Graph Convolutional Networks

arXiv, 2020

Authors: Chen, Ming; Wei, Zhewei; Huang, Zengfeng; Ding, Bolin; Li, Yaliang. Affiliations: School of Information, Renmin University of China, China; Gaoling School of Artificial Intelligence, Renmin University of China, China; Beijing Key Lab of Big Data Management and Analysis Methods, China; MOE Key Lab of Data Engineering and Knowledge Engineering, China; School of Data Science, Fudan University, China; Alibaba Group

Graph convolutional networks (GCNs) are a powerful deep learning approach for graph-structured data. Recently, GCNs and subsequent variants have shown superior performance in various application areas on real-world datasets. Despite their success, most current GCN models are shallow because of the over-smoothing problem. In this paper, we study the problem of designing and analyzing deep graph convolutional networks. We propose GCNII, an extension of the vanilla GCN model with two simple yet effective techniques: initial residual and identity mapping. We provide theoretical and empirical evidence that the two techniques effectively relieve the problem of over-smoothing. Our experiments show that the deep GCNII model outperforms the state-of-the-art methods on various semi- and full-supervised tasks. Code is available at https://***/chennnM/GCNII. Copyright © 2020, The Authors. All rights reserved.
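
The abstract names the two techniques without stating the layer update. As a rough sketch, assuming the commonly cited form H_next = ReLU(((1 - alpha) P H + alpha H0)((1 - beta) I + beta W)), where P is a normalized adjacency matrix, H0 the initial representation, W a weight matrix, and alpha and beta hyperparameters, a single layer could be written in plain Python as follows. The form of the update, the function names, and the parameter names are assumptions for illustration and are not taken from this record.

```python
# Illustrative sketch only; the update rule and all names are assumptions.
# Matrices are lists of rows.

def matmul(a, b):
    """Plain matrix product of two list-of-rows matrices."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def gcnii_layer(p, h, h0, w, alpha, beta):
    """One layer combining an initial residual and an identity mapping.

    p: normalized adjacency (n x n); h: current features (n x d);
    h0: initial features (n x d); w: weights (d x d).
    """
    ph = matmul(p, h)
    rows, cols = len(h0), len(h0[0])
    # Initial residual: blend propagated features with the first-layer input.
    s = [[(1 - alpha) * ph[i][j] + alpha * h0[i][j] for j in range(cols)]
         for i in range(rows)]
    # Identity mapping: S((1 - beta) I + beta W) = (1 - beta) S + beta S W.
    sw = matmul(s, w)
    return [[max(0.0, (1 - beta) * s[i][j] + beta * sw[i][j])
             for j in range(cols)]
            for i in range(rows)]
```

With alpha = 1 and beta = 0 the layer returns ReLU(H0), which shows how the two terms keep a path back to the initial features regardless of depth.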

Keywords: Convolutional neural networks

Breaking the moments condition barrier: no-regret algorithm for bandits with super heavy-tailed payoffs

Proceedings of the 35th International Conference on Neural Information Processing Systems

Authors: Han Zhong, Jiayi Huang, Lin F. Yang, Liwei Wang. Affiliations: Center for Data Science, Peking University; Center for Data Science, Peking University, Pazhou Lab; Department of Electrical and Computer Engineering, University of California, Los Angeles; Key Laboratory of Machine Perception, MOE, School of EECS, Institute for Artificial Intelligence, Peking University

ISBN: (print) 9781713845393

Despite a large amount of effort in dealing with heavy-tailed error in machine learning, little is known about the case where moments of the error may not exist: the random noise η satisfies Pr[|η| > |y|] ≤ 1/|y|^α for some α > 0. We make the first attempt to actively handle such super heavy-tailed noise in bandit learning problems. We propose a novel robust statistical estimator, mean of medians, which estimates a random variable by computing the empirical mean of a sequence of empirical medians. We then present a generic reductionist algorithmic framework for solving bandit learning problems (including multi-armed and linear bandit problems): the mean of medians estimator can be applied to nearly any bandit learning algorithm as a black-box filter for its reward signals and obtains a regret bound similar to the one obtained when the reward is sub-Gaussian. We show that the regret bound is near-optimal even with very heavy-tailed noise. We also empirically demonstrate the effectiveness of the proposed algorithm, which further corroborates our theoretical results.
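
The mean of medians estimator described in this abstract, an empirical mean of a sequence of empirical medians, can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions rather than the authors' procedure: the function name, the `num_groups` parameter, and the choice to split the samples into consecutive equal-sized groups (dropping any remainder) are all hypothetical.

```python
# Illustrative sketch only; the grouping scheme and names are assumptions.
from statistics import fmean, median

def mean_of_medians(samples, num_groups):
    """Average the medians of `num_groups` consecutive equal-sized groups."""
    samples = list(samples)
    if not 1 <= num_groups <= len(samples):
        raise ValueError("num_groups must be between 1 and len(samples)")
    group_size = len(samples) // num_groups
    # Trailing samples that do not fill a whole group are ignored.
    medians = [
        median(samples[i * group_size:(i + 1) * group_size])
        for i in range(num_groups)
    ]
    return fmean(medians)
```

For example, `mean_of_medians([1, 2, 1e9, 4, 5, 6], 2)` takes the medians 2 and 5 of the two groups and returns 3.5, so the single extreme value does not affect the estimate, whereas the plain sample mean would be dominated by it.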

A Framework of Meta Functional Learning for Regularising Knowledge Transfer

arXiv, 2022

Authors: Li, Pan; Fu, Yanwei; Gong, Shaogang. Affiliations: The School of Electrical Engineering and Computer Science, Queen Mary University of London, London E1 4NS, United Kingdom; The School of Data Science, Fudan University, Shanghai Key Lab of Intelligent Information Processing, Fudan University, China; The MOE Frontiers Center for Brain Science, Fudan University, China

Machine learning classifiers' capability is largely dependent on the scale of available training data and limited by the model overfitting in data-scarce learning tasks. To address this problem, this work proposes a novel framework of Meta Functional Learning (MFL) by meta-learning a generalisable functional model from data-rich tasks whilst simultaneously regularising knowledge transfer to data-scarce tasks. The MFL computes meta-knowledge on functional regularisation generalisable to different learning tasks by which functional training on limited labelled data promotes more discriminative functions to be learned. Based on this framework, we formulate three variants of MFL: MFL with Prototypes (MFL-P) which learns a functional by auxiliary prototypes, Composite MFL (ComMFL) that transfers knowledge from both functional space and representational space, and MFL with Iterative Updates (MFL-IU) which improves knowledge transfer regularisation from MFL by progressively learning the functional regularisation in knowledge transfer. Moreover, we generalise these variants for knowledge transfer regularisation from binary classifiers to multi-class classifiers. Extensive experiments on two few-shot learning scenarios, Few-Shot Learning (FSL) and Cross-Domain Few-Shot Learning (CD-FSL), show that meta functional learning for knowledge transfer regularisation can improve FSL classifiers. Copyright © 2022, The Authors. All rights reserved.

Keywords: Knowledge management

High-Resolution Natural Image Matting by Refining Low-resolution Alpha Mattes

IEEE Transactions on Image Processing: a publication of the IEEE Signal Processing Society, 2025, vol. PP, p. PP

Authors: Xianmin Ye, Yihui Liang, Mian Tan, Fujian Feng, Lin Wang, Han Huang. Affiliations: College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang, China; School of Computer Science, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System and the College of Data Science and Information Engineering, Guizhou Minzu University, Guiyang, China; School of Software Engineering, South China University of Technology, Guangzhou, China; Key Laboratory of Big Data and Intelligent Robot (SCUT), MOE of China, Guangzhou, China; Guangdong Engineering Center for Large Model and GenAI Technology, Guangzhou, China; Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, Changchun, China

High-resolution natural image matting plays an important role in image editing, film-making and remote sensing because it can accurately extract the foreground from a natural background. However, owing to the complexity introduced by increasing resolution, existing image matting methods cannot obtain high-quality alpha mattes on high-resolution images in reasonable time. To overcome this challenge, we introduce a high-resolution image matting framework based on alpha matte refinement from low resolution to high resolution (HRIMF-AMR). The proposed framework transforms the complex high-resolution image matting problem into a low-resolution image matting problem and a high-resolution alpha matte refinement problem. While the first problem is solved by adopting an existing image matting method, the latter is addressed by applying the Detail Difference Feature Extractor (DDFE) designed as part of our work. The DDFE extracts detail difference features from high-resolution images by measuring the image feature difference between high-resolution images and low-resolution images. The low-resolution alpha matte is refined according to the extracted detail difference features, yielding the high-resolution alpha matte. In addition, the Matte Detail Resolution Difference (MDRD) loss is introduced to train the DDFE, which imposes an additional constraint on the extraction of detail difference features with mattes. Experimental results show that integrating HRIMF-AMR significantly enhances the performance of existing matting methods on high-resolution images of Transparent-460 and Alphamatting. Project page: https://***/yexianmin/HRAMR-Matting.

Keywords: Feature extraction; Image resolution; Image color analysis; Optimization; Remote sensing; Clustering algorithms; Accuracy; Visualization; Transforms; Training

Deep Representation of Hierarchical Semantic Attributes for Zero-shot Learning

International Joint Conference on Neural Networks (IJCNN)

Authors: Zhaocheng Zhang, Gang Yang. Affiliations: School of Information, Renmin University of China, Beijing, China; Key Lab of Data Engineering and Knowledge Engineering, Renmin University of China, Beijing, China

ISBN: (digital) 9781728169262

ISBN: (print) 9781728169279

Because large-scale datasets must be annotated to fit specific tasks, Zero-Shot Learning (ZSL) has attracted much attention and made significant progress in recent research, owing to the prevalence of deep neural networks. At present, ZSL is mainly solved through the use of auxiliary information, such as semantic attributes and text descriptions. A mapping method can then be employed to bridge the gap between the visual and semantic spaces. However, because auxiliary information is not used effectively, this problem has not been solved well. Inspired by previous work, we consider that the visual space can be used as the embedding space to better express the precise characteristics of semantic information. Meanwhile, we take into account that the annotated information of public datasets contains some noisy attributes that need to be processed. Based on these considerations, we propose an end-to-end method with a convolutional architecture, instead of a conventional linear projection, to provide a deep representation of semantic information for ZSL. Semantic features express more detailed and precise information after being fed into our method. In addition, we use word embeddings to generate superclasses for the original classes and propose a new loss function for these superclasses to assist in training. Experiments show that our method achieves decent improvements for ZSL and Generalized Zero-Shot Learning (GZSL) on several public datasets.

Keywords: Semantics; Visualization; Training; Testing; Task analysis; Neural networks; Probabilistic logic