Search Results - Inner Mongolia University Library

Can Gaussian sketching converge faster on a preconditioned landscape? 24

Proceedings of the 41st International Conference on Machine Learning

Authors: Yilong Wang Haishan Ye Guang Dai Ivor W. Tsang Center for Intelligent Decision-Making and Machine Learning School of Management Xi'an Jiaotong University China and SGIT AI Lab State Grid Corporation of China SGIT AI Lab State Grid Corporation of China CFAR and IHPC Agency for Science Technology and Research (A*STAR) Singapore and College of Computing and Data Science NTU Singapore

This paper focuses on large-scale optimization, which is very popular in the big data era. Gradient sketching is an important technique in large-scale optimization. In particular, the random coordinate descent algorithm is a gradient sketching method that uses a random sampling matrix as the sketching matrix. In this paper, we propose a novel gradient sketching method called GSGD (Gaussian Sketched Gradient Descent). Compared with classical gradient sketching methods such as random coordinate descent and SEGA (Hanzely et al., 2018), GSGD does not require importance sampling, yet it achieves a fast convergence rate matching that of these methods with importance sampling. Furthermore, if the objective function has a non-smooth regularization term, GSGD can also exploit the implicit structure information of the regularization term to achieve a fast convergence rate. Finally, our experimental results substantiate the effectiveness and efficiency of our algorithm.
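
As a rough illustration of the gradient sketching idea in the abstract above, the following Python sketch replaces the full gradient with an average of its projections onto a few random Gaussian directions. It is not the authors' GSGD algorithm, whose details are not given in this record. The function names, the sketch size `k`, the step size, and the diagonal quadratic test objective are all assumptions chosen for illustration.

```python
import random


def grad(x, diag, b):
    # Gradient of the assumed test objective
    # f(x) = 0.5 * sum(diag[i] * x[i]**2) - sum(b[i] * x[i]).
    return [d * xi - bi for d, xi, bi in zip(diag, x, b)]


def sketched_gradient(g, k, rng):
    # Draw k standard Gaussian directions s and average (s . g) * s.
    # Because E[s s^T] = I, this average is an unbiased estimate of g.
    est = [0.0] * len(g)
    for _ in range(k):
        s = [rng.gauss(0.0, 1.0) for _ in g]
        proj = sum(si * gi for si, gi in zip(s, g))
        for i, si in enumerate(s):
            est[i] += proj * si / k
    return est


def sketched_gd(diag, b, k=2, step=0.02, iters=3000, seed=0):
    # Plain gradient descent, except each step uses the sketched gradient.
    rng = random.Random(seed)
    x = [0.0] * len(diag)
    for _ in range(iters):
        g = sketched_gradient(grad(x, diag, b), k, rng)
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


if __name__ == "__main__":
    # The minimizer of the test objective is b[i] / diag[i] per coordinate.
    print(sketched_gd([1.0, 2.0, 3.0, 4.0, 5.0], [1.0] * 5))
```

In a real implementation only the `k` inner products would be computed (for example through directional derivatives) rather than the full gradient. The full gradient is formed here only to keep the example short. Swapping the Gaussian directions for randomly chosen coordinate vectors would give a random coordinate descent style update.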

A Lightweight Network Model For Video Frame Interpolation Using Spatial Pyramids

IEEE International Conference on Image Processing

Authors: Jiankai Zhuang Zengchang Qin Jialu Chen Tao Wan Intelligent Computing and Machine Learning Lab School of ASEE Beihang University China School of BSME Beihang University China

ISBN: (digital) 9781728163956

ISBN: (print) 9781728163963

In recent years, deep learning based video frame interpolation methods have shown impressive results in handling occlusion, blur and large motion. However, they are usually very heavy in terms of model size, and they can hardly be deployed on devices with limited computing power, e.g. mobile phones or other portable devices. To address this problem, we propose the lightweight Spatial Pyramid Frame Interpolation Network (SPFIN), a hierarchical network that reconstructs frames in a coarse-to-fine manner. At each pyramid level, we apply two light sub-networks to model the optical flow and the visibility mask instead of the commonly used U-Net architecture. The flow and mask are up-sampled and optimized progressively. Finally, the intermediate frame is formed by linearly blending warped frames and masks. Experimental results on two benchmark problems show that our model has the smallest size while achieving better or comparable performance compared to existing state-of-the-art models.

Keywords: Interpolation; Computational modeling; Estimation; Optical transmitters; Training; Machine learning; Integrated optics
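
The final blending step mentioned in the abstract above can be pictured with a small Python sketch. The record does not state the exact formula, so the per-pixel rule used here, `mask * warped0 + (1 - mask) * warped1` with a visibility mask in [0, 1], is an assumption based on a common formulation. The function name `blend_frames` and the nested-list frame layout are hypothetical.

```python
def blend_frames(warped0, warped1, mask):
    """Linearly blend two warped frames, pixel by pixel.

    Assumed rule (not taken from the paper): a mask value of 1.0 keeps the
    pixel from warped0, a value of 0.0 keeps the pixel from warped1, and
    values in between mix the two.
    """
    return [
        [m * a + (1.0 - m) * b for a, b, m in zip(row0, row1, row_m)]
        for row0, row1, row_m in zip(warped0, warped1, mask)
    ]


if __name__ == "__main__":
    frame0 = [[10.0, 20.0]]  # frame warped from the previous input
    frame1 = [[30.0, 40.0]]  # frame warped from the next input
    visibility = [[1.0, 0.25]]
    print(blend_frames(frame0, frame1, visibility))
```

In this toy input the first pixel is taken entirely from the first warped frame, while the second pixel is weighted mostly toward the second warped frame.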

DAM: Deliberation, abandon and memory networks for generating detailed and non-repetitive responses in visual dialogue 29

29th International Joint Conference on Artificial Intelligence, IJCAI 2020

Authors: Jiang, Xiaoze Yu, Jing Sun, Yajing Qin, Zengchang Zhu, Zihao Hu, Yue Wu, Qi Institute of Information Engineering Chinese Academy of Sciences Beijing China Intelligent Computing and Machine Learning Lab School of ASEE Beihang University Beijing China School of Cyber Security University of Chinese Academy of Sciences Beijing China AI Research Codemao Inc University of Adelaide Australia

ISBN: (print) 9780999241165

The Visual Dialogue task requires an agent to engage in a conversation with a human about an image. The ability to generate detailed and non-repetitive responses is crucial for the agent to achieve human-like conversation. In this paper, we propose a novel generative decoding architecture to generate high-quality responses, which moves away from decoding the whole encoded semantics towards a design that advocates both transparency and flexibility. In this architecture, word generation is decomposed into a series of attention-based information selection steps, performed by the novel recurrent Deliberation, Abandon and Memory (DAM) module. Each DAM module performs an adaptive combination of the response-level semantics captured from the encoder and the word-level semantics specifically selected for generating each word. Therefore, the responses contain more detailed and non-repetitive descriptions while maintaining semantic accuracy. Furthermore, DAM can flexibly cooperate with existing visual dialogue encoders and adapt to the encoder structures by constraining the information selection mode in DAM. We apply DAM to three typical encoders and verify the performance on the VisDial v1.0 dataset. Experimental results show that the proposed models achieve new state-of-the-art performance with high-quality responses. The code is available at https://***/JXZe/DAM. © 2020 Inst. Sci. inf., Univ. Defence in Belgrade. All rights reserved.

Keywords: Semantics

KBGN: Knowledge-bridge graph network for adaptive vision-text reasoning in visual dialogue

arXiv, 2020

Authors: Jiang, Xiaoze Du, Siyi Qin, Zengchang Sun, Yajing Yu, Jing Intelligent Computing and Machine Learning Lab School of ASEE Beihang University Beijing China AI Research Codemao Inc Institute of Information Engineering Chinese Academy of Sciences Beijing China

Visual dialogue is a challenging task that requires extracting implicit information from both visual (image) and textual (dialogue history) contexts. Classical approaches pay more attention to the integration of the current question, vision knowledge and text knowledge, overlooking the heterogeneous semantic gaps between the cross-modal information. Meanwhile, the concatenation operation has become the de facto standard for cross-modal information fusion, although it has a limited ability in information retrieval. In this paper, we propose a novel Knowledge-Bridge Graph Network (KBGN) model that uses a graph to bridge the cross-modal semantic relations between vision and text knowledge at fine granularity, and retrieves the required knowledge via an adaptive information selection mode. Moreover, the reasoning clues for visual dialogue can be clearly drawn from intra-modal entities and inter-modal bridges. Experimental results on VisDial v1.0 and VisDial-Q datasets demonstrate that our model outperforms existing models with state-of-the-art results. Copyright © 2020, The Authors. All rights reserved.

Keywords: Graph neural networks

Generalized label enhancement with sample correlations

arXiv, 2020

Authors: Zheng, Qinghai Zhu, Jihua Tang, Haoyu Liu, Xinyuan Li, Zhongyu Lu, Huimin Lab of Vision Computing and Machine Learning School of Software Engineering Xi'an Jiaotong University Xi'an 710049 China Environment Recognition & Intelligent Computation Laboratory Kyushu Institute of Technology Japan

Recently, label distribution learning (LDL) has drawn much attention in machine learning, where an LDL model is learned from labeled instances. Unlike single-label and multi-label annotations, label distributions describe an instance by multiple labels with different intensities and accommodate more general scenes. Since most existing machine learning datasets merely provide logical labels, label distributions are unavailable in many real-world applications. To handle this problem, we propose two novel label enhancement methods, i.e., Label Enhancement with Sample Correlations (LESC) and generalized Label Enhancement with Sample Correlations (gLESC). More specifically, LESC employs a low-rank representation of samples in the feature space, and gLESC leverages a tensor multi-rank minimization to further investigate the sample correlations in both the feature space and label space. Benefiting from the sample correlations, the proposed methods can boost the performance of label enhancement. Extensive experiments on 14 benchmark datasets demonstrate the effectiveness and superiority of our methods. Copyright © 2020, The Authors. All rights reserved.

Keywords: Machine learning

Follow me up sports: New benchmark for 2d human keypoint recognition 1

2nd Chinese Conference on Pattern Recognition and Computer Vision, PRCV 2019

Authors: Huang, Ying Sun, Bin Kan, Haipeng Zhuang, Jiankai Qin, Zengchang Alibaba Business School Hangzhou Normal University Hangzhou China Keep Inc Beijing China Intelligent Computing and Machine Learning Lab School of ASEE Beihang University Beijing China

ISBN: (digital) 9783030317263

ISBN: (print) 9783030317256

Human pose estimation has advanced significantly in recent years. However, the existing datasets are limited in their coverage of pose variety. In this paper, we introduce a novel benchmark "FollowMeUp Sports" that makes an important advance in terms of specific postures, self-occlusion and class balance, a contribution that we feel is required for future development in human body models. This comprehensive dataset was collected using an established taxonomy of over 200 standard workout activities with three different shot angles. The collected videos cover a wider variety of specific workout activities than previous datasets, including push-ups, squats and body movement near the ground with severe self-occlusion or occlusion by sports equipment and outfits. Given these rich images, we perform a detailed analysis of the leading human pose estimation approaches, gaining insights into the successes and failures of these methods. © Springer Nature Switzerland AG 2019.

Keywords: Sports

CogTree: Cognition tree loss for unbiased scene graph generation

arXiv, 2020

Authors: Yu, Jing Chai, Yuan Wang, Yujing Hu, Yue Wu, Qi Institute of Information Engineering Chinese Academy of Sciences Beijing China Intelligent Computing and Machine Learning Lab School of ASEE Beihang University Beijing China Key Laboratory of Machine Perception MOE School of EECS Peking University Beijing China University of Adelaide Australia

Scene graphs are semantic abstractions of images that encourage visual understanding and reasoning. However, the performance of Scene Graph Generation (SGG) is unsatisfactory when faced with biased data in real-world scenarios. Conventional debiasing research mainly approaches the problem by balancing the data distribution or learning unbiased models and representations, ignoring the correlations among the biased classes. In this work, we analyze this problem from a novel cognition perspective: automatically building a hierarchical cognitive structure from the biased predictions and navigating that hierarchy to locate the relationships, so that the tail relationships receive more attention in a coarse-to-fine mode. To this end, we propose a novel debiasing Cognition Tree (CogTree) loss for unbiased SGG. We first build a cognitive structure, CogTree, to organize the relationships based on the predictions of a biased SGG model. The CogTree first distinguishes remarkably different relationships and then focuses on a small portion of easily confused ones. Then, we propose a debiasing loss specifically for this cognitive structure, which supports coarse-to-fine distinction for the correct relationships. The loss is model-agnostic and consistently boosts the performance of several state-of-the-art models. The code is available at: https://***/CYVincent/Scene-Graph-Transformer-CogTree. Copyright © 2020, The Authors. All rights reserved.

Keywords: Forestry

Overview of the Tenth Dialog System Technology Challenge: DSTC10

IEEE/ACM Transactions on Audio Speech and Language Processing, 2024, vol. 32, pp. 765-778

Authors: Yoshino, Koichiro Chen, Yun-Nung Crook, Paul Kottur, Satwik Li, Jinchao Hedayatnia, Behnam Moon, Seungwhan Fei, Zhengcong Li, Zekang Zhang, Jinchao Feng, Yang Zhou, Jie Kim, Seokhwan Liu, Yang Jin, Di Papangelis, Alexandros Gopalakrishnan, Karthik Hakkani-Tur, Dilek Damavandi, Babak Geramifard, Alborz Hori, Chiori Shah, Ankit Zhang, Chen Li, Haizhou Sedoc, Joao D'haro, Luis F. Banchs, Rafael Rudnicky, Alexander Guardian Robot Project R-IH RIKEN 2-2-2 Hikaridai Seika Shoraku 619-0288 Japan Information Science Nara Institute of Science and Technology Ikoma 630-0101 Japan Computer Science and Information Engineering National Taiwan University Taipei 10617 Taiwan Inc. Palo Alto CA 95054 United States Alexa AI *** Inc. Sunnyvale CA 94089 United States Meta Seattle Redmond WA 98052 United States Institute of Computing Technology Chinese Academy of Sciences Beijing 100190 China Key Laboratory of Intelligent Information Processing Institute of Computing Technology Chinese Academy of Sciences Beijing 100190 China Tencent AI Lab Beijing Beijing China Kexueyuan South Road Zhongguancun Beijing 100190 China Beijing 100190 China Alexa AI *** Inc. Sunnyvale CA United States 1120 Enterprise way Sunnyvale 94089 United States *** Inc. Seattle WA United States Menlo Park CA United States Audio and Speech Group Mitsubishi Electric Research Laboratories Cambridge MA 02139-1955 United States Carnegie Mellon University Department of Language and Information Technologies or just Carnegie Mellon University Pittsburgh United States National University of Singapore Singapore Singapore Department of Electrical and Computer Engineering National University of Singapore Singapore Singapore Shenzhen Research Institute of Big Data School of Data Science Chinese University of Hong Kong Shenzhen 518172 China New York University New York NY United States ETSI de Telecomunicacion - Speech Technology and Machine Learning Group Universidad Politecnica de Madrid Ciudad Universitaria Madrid 28040 Spain Nanyang Technological University Singapore Singapore Carnegie Mellon University Pittsburgh PA United States

This article introduces the Tenth Dialog System Technology Challenge (DSTC-10). This edition of the DSTC focuses on applying end-to-end dialog technologies to five distinct tasks in dialog systems, namely 1. Incorporation of Meme images into open domain dialogs, 2. Knowledge-grounded Task-oriented Dialogue Modeling on Spoken Conversations, 3. Situated Interactive Multimodal dialogs, 4. Reasoning for Audio Visual Scene-Aware Dialog, and 5. Automatic Evaluation and Moderation of Open-domain Dialogue Systems. This article describes the task definition, provided datasets, baselines, and evaluation setup for each track. We also summarize the results of the submitted systems to highlight the general trends of the state-of-the-art technologies for the tasks. © 2023 The Authors.

Keywords: Job analysis

Tackling Instance-Dependent Label Noise via a Universal Probabilistic Model

arXiv, 2021

Authors: Wang, Qizhou Han, Bo Liu, Tongliang Niu, Gang Yang, Jian Gong, Chen Key Laboratory of Intelligent Perception and Systems for High-Dimensional Information of MoE School of Computer Science and Engineering Nanjing University of Science and Technology China Department of Computer Science Hong Kong Baptist University Hong Kong Trustworthy Machine Learning Lab School of Computer Science Faculty of Engineering The University of Sydney Australia Japan Department of Computing Hong Kong Polytechnic University Hong Kong

A drastic increase in data quantity often brings a severe decrease in data quality, such as incorrect label annotations, which poses a great challenge for robustly training Deep Neural Networks (DNNs). Existing methods for learning with label noise either employ ad-hoc heuristics or are restricted to specific noise assumptions. However, more general situations, such as instance-dependent label noise, have not been fully explored, as few studies focus on their label corruption process. By categorizing instances into confusing and unconfusing instances, this paper proposes a simple yet universal probabilistic model, which explicitly relates noisy labels to their instances. The resultant model can be realized by DNNs, where the training procedure is accomplished by employing an alternating optimization algorithm. Experiments on datasets with both synthetic and real-world label noise verify that the proposed method yields significant improvements in robustness over state-of-the-art counterparts. Copyright © 2021, The Authors. All rights reserved.

Keywords: Deep neural networks