Bootstrap aggregating (bagging) and boosting are two popular ensemble learning approaches that combine multiple base learners into a composite model for more accurate and more reliable performance. They have ...
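The preview above is cut off, but the two paradigms it contrasts are standard; below is a minimal sketch with scikit-learn (the dataset, base learners, and hyperparameters are illustrative, not from the paper, and parameter names assume a recent scikit-learn):

```python
# Minimal sketch contrasting bagging and boosting with scikit-learn.
# Dataset and base learners are illustrative, not from the paper.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

# Bagging: base learners are trained independently on bootstrap resamples
# and their predictions are averaged, which mainly reduces variance.
bagging = BaggingClassifier(
    estimator=DecisionTreeClassifier(), n_estimators=50, random_state=0
)

# Boosting: base learners are trained sequentially, each reweighting the
# examples its predecessors misclassified, which mainly reduces bias.
boosting = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=1), n_estimators=50, random_state=0
)

for name, model in [("bagging", bagging), ("boosting", boosting)]:
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")
```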
ISBN (print): 9781665442084
Emotion recognition, which aims to identify an individual's emotional state from acquired physiological or body signals, is very important in affective computing. Emotions have two common representations: categorical, e.g., happy, sad, etc., and dimensional (continuous), e.g., valence, arousal, and dominance. Training a good emotion classification or regression model usually requires a large amount of labeled data. However, the labeling process is very difficult: because emotions are subtle and uncertain, it usually takes multiple assessors to label each emotional instance to obtain the ground-truth categorical label or dimensional values. In this paper, we propose a multi-task active learning (MTAL) framework to query the most useful samples for labeling, which enables the efficient training of an emotion classification model and multiple emotion regression models simultaneously. This is novel and challenging, as all previous research considered emotion classification or regression alone, but not both simultaneously. Experimental results on the IEMOCAP dataset demonstrate that MTAL outperformed random selection and several state-of-the-art single-task active learning approaches, i.e., with the same number of labeled samples, MTAL obtains better emotion classification and regression models simultaneously.
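The abstract does not specify MTAL's acquisition function, so the following is only a plausible sketch of how a joint classification/regression query score could be computed, assuming predictive entropy for the classifier and ensemble disagreement for the regressors (the function name, weighting scheme, and normalization are all assumptions):

```python
import numpy as np

def mtal_acquisition(clf_probs, reg_preds, alpha=0.5):
    """Score unlabeled samples for a joint classification/regression query.

    clf_probs: (n_samples, n_classes) class probabilities from the classifier.
    reg_preds: (n_models, n_samples, n_dims) predictions from an ensemble of
               regressors (e.g., for valence/arousal/dominance).
    alpha:     illustrative trade-off between the two task uncertainties.
    """
    # Classification uncertainty: predictive entropy.
    eps = 1e-12
    entropy = -np.sum(clf_probs * np.log(clf_probs + eps), axis=1)

    # Regression uncertainty: disagreement (variance) across the ensemble,
    # averaged over the emotion dimensions.
    disagreement = reg_preds.var(axis=0).mean(axis=1)

    # Normalize each term to [0, 1] so the two scales are comparable.
    def norm(x):
        span = x.max() - x.min()
        return (x - x.min()) / span if span > 0 else np.zeros_like(x)

    return alpha * norm(entropy) + (1 - alpha) * norm(disagreement)

# Query the top-k most useful samples for labeling:
# scores = mtal_acquisition(clf_probs, reg_preds)
# query_idx = np.argsort(scores)[-k:]
```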
This technical report presents our solution for the temporal action detection task in the ActivityNet Challenge 2021. The purpose of this task is to locate and identify actions of interest in long untrimmed videos. The cruci...
ISBN (print): 9781665428132
We show that relation modeling between visual elements matters in cropping view recommendation. Cropping view recommendation addresses the problem of image recomposition conditioned on the composition quality and the ranking of views (cropped sub-regions). This task is challenging because the visual difference is subtle when a visual element is kept or removed. Existing methods represent visual elements by extracting region-based convolutional features inside and outside the cropping view boundaries, without probing a fundamental question: why are some visual elements of interest while others are discarded? In this work, we observe that the relations between different visual elements significantly affect their relative positions to the desired cropping view, and that such relations can be characterized by attraction inside/outside the cropping view boundaries and repulsion across the boundaries. By instantiating a transformer-based solution that represents visual elements as visual words and models the dependencies between them, we report not only state-of-the-art performance on public benchmarks, but also interesting visualizations that depict the attraction and repulsion between visual elements, which may shed light on what makes for effective cropping view recommendation.
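The paper's exact architecture is not given here; as a rough sketch of the idea, the module below scores one candidate view by marking each region token ("visual word") as inside or outside the crop and letting a transformer encoder model the dependencies between them (the token construction, fusion, and scoring head are assumptions):

```python
import torch
import torch.nn as nn

class ViewScorer(nn.Module):
    """Illustrative sketch: score a candidate cropping view by modeling
    dependencies between visual words (region tokens) with a transformer.
    """
    def __init__(self, dim=256, heads=8, layers=4):
        super().__init__()
        # A binary embedding marks whether a token lies inside or outside
        # the candidate crop, so attention can capture attraction/repulsion
        # relative to the view boundary.
        self.side_embed = nn.Embedding(2, dim)
        layer = nn.TransformerEncoderLayer(dim, heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, layers)
        self.head = nn.Linear(dim, 1)  # composition-quality score

    def forward(self, tokens, inside_mask):
        # tokens: (B, N, dim) visual-word features from a backbone.
        # inside_mask: (B, N) long tensor, 1 = inside the crop, 0 = outside.
        x = tokens + self.side_embed(inside_mask)
        x = self.encoder(x)
        return self.head(x.mean(dim=1)).squeeze(-1)  # (B,) ranking score

# Candidate views would then be ranked by this score:
# scores = model(tokens, inside_mask); best = scores.argmax()
```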
Human pose estimation is the task of localizing body keypoints from still images. The state-of-the-art methods suffer from insufficient examples of challenging cases such as symmetric appearance, heavy occlusion and n...
In semantic segmentation, pixels often share the same mask label across a vast region. However, in recent prevalent transformer-based models, predictions frequently suffer from incompleteness or discontinuities. This dilemma is caused by the sparse activation of vanilla self-attention during feature extraction: each query over-focuses on a small number of relevant keys but neglects the many keys sharing its category, restricting the capture of a universal feature representation for that category. Such sparsely activated self-attention further causes non-negligible feature differences among tokens of the same class in the feature maps, introducing noise into the final predictions. To reduce these differences, we propose the Densely Activated self-attention Module (DAM), a novel pluggable module designed to generate densely activated self-attention. Inserted after the encoder, it encourages each query to attend to a broader range of keys, yielding more consistent features. Experimental results on three widely used benchmarks with six different baselines demonstrate that DAM consistently improves performance with a negligible increase in parameters and FLOPs. Our work provides a new perspective on the behavior of self-attention in semantic segmentation.
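The abstract does not spell out how dense activation is realized; one simple way to push each query toward a broader set of keys is to smooth the attention distribution with a softmax temperature, which the illustrative module below does (the temperature mechanism is an assumption, not necessarily the paper's design):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DenselyActivatedSelfAttention(nn.Module):
    """Illustrative pluggable module inserted after an encoder.
    Dense activation is approximated here by a softmax temperature > 1,
    which flattens each query's attention so it covers more keys; the
    paper's exact mechanism may differ.
    """
    def __init__(self, dim, temperature=2.0):
        super().__init__()
        self.qkv = nn.Linear(dim, dim * 3)
        self.proj = nn.Linear(dim, dim)
        self.temperature = temperature
        self.scale = dim ** -0.5

    def forward(self, x):
        # x: (B, N, dim) token features from the segmentation encoder.
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = (q @ k.transpose(-2, -1)) * self.scale
        # Dividing the logits by a temperature > 1 densifies the attention
        # map, pulling tokens of the same category toward similar features.
        attn = F.softmax(attn / self.temperature, dim=-1)
        return x + self.proj(attn @ v)  # residual keeps the module pluggable
```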
With the development and application of computer vision, many object detection networks have been applied to detecting floating objects in rivers. For detection problems such as small targets being easily missed and mi...
3D interacting hand pose estimation from a single RGB image is a challenging task, due to serious self-occlusion and inter-occlusion between hands, confusingly similar appearance patterns between the two hands, the ill-posed mapping of joint positions from 2D to 3D, etc. To address these issues, we propose to extend A2J, the state-of-the-art depth-based 3D single-hand pose estimation method, to the RGB domain under interacting-hand conditions. Our key idea is to equip A2J with strong local-global awareness to jointly capture interacting hands' local fine details and the global articulation clues among joints. To this end, A2J is evolved under the Transformer's non-local encoding-decoding framework to build A2J-Transformer. It holds three main advantages over A2J. First, self-attention across local anchor points is built to make them aware of global spatial context, better capturing joints' articulation clues to resist occlusion. Second, each anchor point is regarded as a learnable query with adaptive feature learning to facilitate pattern-fitting capacity, instead of sharing the same local representation with the others. Last but not least, anchor points are located in 3D space instead of 2D as in A2J, to better support 3D pose prediction. Experiments on the challenging InterHand 2.6M dataset demonstrate that A2J-Transformer achieves state-of-the-art model-free performance (a 3.38 mm MPJPE improvement in the two-hand case) and can also be applied to the depth domain with strong generalization. The code is available at https://***/ChanglongJiangGit/A2J-Transformer.
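Based on the description above, a rough sketch of the anchor-to-joint aggregation with anchors as learnable queries might look as follows (layer sizes, the decoder configuration, and the head design are illustrative, not the paper's exact architecture):

```python
import torch
import torch.nn as nn

class AnchorToJointHead(nn.Module):
    """Sketch of A2J-style aggregation with anchors as learnable queries.
    num_joints=42 assumes 21 joints per hand for two hands.
    """
    def __init__(self, num_anchors=256, num_joints=42, dim=256):
        super().__init__()
        # Each 3D anchor point is a learnable query with its own embedding,
        # rather than sharing one local representation across anchors.
        self.anchor_pos = nn.Parameter(torch.rand(num_anchors, 3))
        self.queries = nn.Embedding(num_anchors, dim)
        layer = nn.TransformerDecoderLayer(dim, nhead=8, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=3)
        self.offset_head = nn.Linear(dim, num_joints * 3)  # per-joint 3D offsets
        self.weight_head = nn.Linear(dim, num_joints)      # per-joint weights

    def forward(self, memory):
        # memory: (B, HW, dim) flattened image features from the backbone.
        B = memory.size(0)
        q = self.queries.weight.unsqueeze(0).expand(B, -1, -1)
        # Self-attention across anchor queries makes each anchor aware of
        # global spatial context, which helps resist occlusion.
        feats = self.decoder(q, memory)                    # (B, A, dim)
        A = feats.size(1)
        offsets = self.offset_head(feats).view(B, A, -1, 3)  # (B, A, J, 3)
        weights = self.weight_head(feats).softmax(dim=1)     # (B, A, J)
        # Each joint is a weighted sum over anchors of (anchor + offset).
        cand = self.anchor_pos.view(1, A, 1, 3) + offsets    # (B, A, J, 3)
        return (weights.unsqueeze(-1) * cand).sum(dim=1)     # (B, J, 3)
```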
Real-time eyeblink detection in the wild can serve widely for fatigue detection, face anti-spoofing, emotion analysis, etc. Existing research efforts generally focus on single-person cases in trimmed videos. H...
Emotion Recognition in Conversation (ERC) has attracted widespread attention in the natural language processing field due to its enormous potential for practical applications. Existing ERC methods struggle to generalize to diverse scenarios due to insufficient modeling of context, ambiguous capture of dialogue relationships, and overfitting in speaker modeling. In this work, we present a Hybrid Continuous Attributive Network (HCAN) to address these issues from the perspectives of emotional continuation and emotional attribution. Specifically, HCAN adopts a hybrid recurrent and attention-based module to model global emotion continuity. A novel Emotional Attribution Encoding (EAE) is then proposed to model intra- and inter-emotional attribution for each utterance. Moreover, to enhance the robustness of the model in speaker modeling and improve its performance in different scenarios, a comprehensive emotional cognitive loss $\mathcal{L}_{EC}$ is proposed to alleviate emotional drift and overcome the model's overfitting to speaker modeling. Our model achieves state-of-the-art performance on three datasets, demonstrating the superiority of our work. Extensive comparative experiments and ablation studies on the three benchmarks provide further evidence for the efficacy of each module. Experiments on generalization ability further show the plug-and-play nature of the EAE module.
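The abstract leaves the wiring of the hybrid module open; the sketch below pairs a GRU branch (emotional continuation) with a self-attention branch (emotional attribution) and fuses them by summation, purely as an assumed illustration of such a hybrid design:

```python
import torch
import torch.nn as nn

class HybridContextEncoder(nn.Module):
    """Illustrative hybrid recurrent/attention context module for ERC.
    The combination shown (GRU plus self-attention, fused by summation)
    is an assumption; HCAN's exact wiring is not given in the abstract.
    """
    def __init__(self, dim=256, heads=8):
        super().__init__()
        # The recurrent branch tracks how emotion flows utterance by
        # utterance (emotional continuation)...
        self.gru = nn.GRU(dim, dim, batch_first=True)
        # ...while the attention branch lets each utterance attribute its
        # emotion to any other utterance in the conversation.
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, utt):
        # utt: (B, T, dim) utterance embeddings for one conversation.
        rec, _ = self.gru(utt)
        att, _ = self.attn(utt, utt, utt)
        return self.norm(utt + rec + att)  # fused context-aware features
```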