Search results - Inner Mongolia University Library

Parsing Objects at a Finer Granularity: A Survey

Machine Intelligence Research, 2024, Vol. 21, No. 3, pp. 431-451

Authors: Yifan Zhao, Jia Li, Yonghong Tian. Affiliations: School of Computer Science, Peking University, Beijing 100871, China; State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Fine-grained visual parsing, including fine-grained part segmentation and fine-grained object recognition, has attracted considerable critical attention due to its importance in many real-world applications, e.g., agriculture, remote sensing, and space technologies. Predominant research efforts tackle these fine-grained sub-tasks following different paradigms, while the inherent relations between these tasks are neglected. Moreover, given most of the research remains fragmented, we conduct an in-depth study of the advanced work from a new perspective of learning the part relationship. In this perspective, we first consolidate recent research and benchmark syntheses with new taxonomies. Based on this consolidation, we revisit the universal challenges in fine-grained part segmentation and recognition tasks and propose new solutions by part relationship learning for these important challenges. Furthermore, we conclude several promising lines of research in fine-grained visual parsing for future research.

Keywords: Finer granularity; visual parsing; part segmentation; fine-grained object recognition; part relationship

Drogue detection for autonomous aerial refueling via hybrid pigeon-inspired optimized color opponent and saliency aggregation

Chinese Journal of Aeronautics, 2024, Vol. 37, No. 5, pp. 27-38

Authors: Tongyan WU, Haibin DUAN, Yanming FAN. Affiliations: State Key Laboratory of Virtual Reality Technology and Systems, School of Automation Science and Electrical Engineering, Beihang University, Beijing 100083, China; Virtual Reality Fundamental Research Laboratory, Department of Mathematics and Theories, Peng Cheng Laboratory, Shenzhen 518000, China; AVIC Shenyang Aircraft Design and Research Institute, Shenyang 110035, China

Drogue detection is one of the challenging tasks in autonomous aerial refueling due to the requirement for accuracy and *** detection based on image intrinsic cues can achieve fast detection, but with poor *** studies reveal that optimization-based methods provide accurate and quick solutions for saliency *** paper presents a hybrid pigeon-inspired optimization method, the optimized color opponent, that aims to adjust the weight of color opponent channels to detect the drogue *** can optimize the weights in the selected aerial refueling scene offline, and the results are applied for drogue detection in the scene. A novel algorithm aggregated by the optimized color opponent and robust background detection is presented to provide better precision and *** results on benchmark datasets and aerial refueling images show that the proposed method successfully extracts the saliency region or drogue and exhibits superior performance against the other saliency detection methods with intrinsic *** algorithm designed in this paper is competent for the drogue detection task of autonomous aerial refueling.

Keywords: Autonomous aerial refueling; Drones; Hybrid pigeon-inspired optimization; Color opponent; Saliency detection; Saliency aggregation

TalkingStyle: Personalized Speech-Driven 3D Facial Animation with Style Preservation

IEEE Transactions on Visualization and Computer Graphics, 2024, Vol. PP, pp. 1-12

Authors: Song, Wenfeng; Wang, Xuan; Zheng, Shi; Li, Shuai; Hao, Aimin; Hou, Xia. Affiliations: Computer School, Beijing Information Science and Technology University, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, China

It is a challenging task to create realistic 3D avatars that accurately replicate individuals' speech and unique talking styles for speech-driven facial animation. Existing techniques have made remarkable progress but still struggle to achieve lifelike mimicry. This paper proposes “TalkingStyle”, a novel method to generate personalized talking avatars while retaining the talking style of the person. Our approach uses a set of audio and animation samples from an individual to create new facial animations that closely resemble their specific talking style, synchronized with speech. We disentangle the style codes from the motion patterns, allowing our method to associate a distinct identifier with each person. To manage each aspect effectively, we employ three separate encoders for style, speech, and motion, ensuring the preservation of the original style while maintaining consistent motion in our stylized talking avatars. Additionally, we propose a new style-conditioned transformer decoder, offering greater flexibility and control over the facial avatar styles. We comprehensively evaluate TalkingStyle through qualitative and quantitative assessments, as well as user studies demonstrating its superior realism and lip synchronization accuracy compared to current state-of-the-art methods. To promote transparency and further advancements in the field, we also make the source code publicly available at https://***/wangxuanx/TalkingStyle.

Keywords: Synchronization

D-scheduler: A scheduler in time-triggered distributed system through decoupling dependencies between tasks and messages

Science China (Technological Sciences), 2024, Vol. 67, No. 1, pp. 183-196

Authors: YANG TingTing, ZHANG YuQi, YUE FengLai, WUNIRI QiQiGe, TONG Chao. Affiliations: School of Computer Science & Engineering, Beihang University, Beijing 100191, China; State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China; National Innovation Center of Intelligent and Connected Vehicles, Beijing 100176, China

Time-triggered architecture, as a mainstream design of the distributed real-time system, has been successfully applied in the aerospace, automotive and mechanical ***, time-triggered scheduling is a challenging NP-hard *** are few studies that could quickly solve the scheduling problem of large distributed time-triggered *** solve this problem, a communication affinity parameter is defined in this paper to describe the degree of bias of the shaper task towards sending or receiving *** on this, an innovative task-message decoupling model named D-scheduler is built to reduce the computation complexity of the scheduling problem in large-scale ***, we provide mathematical proof that our model is a convex optimization that is easy to solve with existing computational *** experiments substantiate the efficacy of the *** dramatically reduces the scheduling complexity of large-scale real-time systems with a small loss of solving space compared to the federal scheduler.

Keywords: time-triggered architecture; time-triggered scheduling; communication affinity parameter; task-message decoupling model

EM-Gaze: eye context correlation and metric learning for gaze estimation

Visual Computing for Industry, Biomedicine, and Art, 2023, Vol. 6, No. 1, pp. 97-108

Authors: Jinchao Zhou, Guoan Li, Feng Shi, Xiaoyan Guo, Pengfei Wan, Miao Wang. Affiliations: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China; Kuaishou Technology, Beijing 100085, China

In recent years, deep learning techniques have been used to estimate gaze, a significant task in computer vision and human-computer *** studies have made significant achievements in predicting 2D or 3D gazes from monocular face *** study presents a deep neural network for 2D gaze estimation on mobile *** achieves state-of-the-art 2D gaze point regression error, while significantly improving gaze classification error on quadrant divisions of the *** this end, an efficient attention-based module that correlates and fuses the left and right eye contextual features is first proposed to improve gaze point regression ***, through a unified perspective for gaze estimation, metric learning for gaze classification on quadrant divisions is incorporated as additional ***, both gaze point regression and quadrant classification performances are *** experiments demonstrate that the proposed method outperforms existing gaze-estimation methods on the GazeCapture and MPIIFaceGaze datasets.

Keywords: Computer vision; Gaze estimation; Metric learning; Attention; Multi-task learning

From animal collective behaviors to swarm robotic cooperation

National Science Review, 2023, Vol. 10, No. 5, pp. 82-99

Authors: Haibin Duan, Mengzhen Huo, Yanming Fan. Affiliations: State Key Laboratory of Virtual Reality Technology and Systems, School of Automation Science and Electrical Engineering, Beihang University; Virtual Reality Fundamental Research Laboratory, Department of Mathematics and Theories, Peng Cheng Laboratory; AVIC Shenyang Aircraft Design and Research Institute

The collective behaviors of animals, from schooling fish to packing wolves and flocking birds, display plenty of fascinating phenomena that result from simple interaction rules among *** emergent intelligent properties of the animal collective behaviors, such as self-organization, robustness, adaptability and expansibility, have inspired the design of autonomous unmanned swarm *** article reviews several typical natural collective behaviors, introduces the origin and connotation of swarm intelligence, and gives the application case of animal collective *** this basis, the article focuses on the forefront of progress and bionic achievements of aerial, ground and marine robotics swarms, illustrating the mapping relationship from biological cooperative mechanisms to cooperative unmanned cluster ***, considering the significance of the coexisting-cooperative-cognitive human-machine system, the key technologies to be solved are given as the reference directions for the subsequent exploration.

Keywords: collective behaviors; swarm intelligence; cooperative robotics swarm; human-machine system

Saliency Based Data Augmentation for Few-Shot Video Action Recognition

31st International Conference on Multimedia Modeling, MMM 2025

Authors: Kong, Yongqiang; Wang, Yunhong; Li, Annan. Affiliation: The State Key Laboratory of Virtual Reality Technology and Systems, Beihang University, Beijing 100191, China

ISBN: (print) 9789819620630

Despite the progress made in few-shot video action recognition, existing methods still struggle to achieve satisfactory performance when support samples are limited (e.g., the 1-shot task). This paper proposes to augment training samples without relying on additional supervision or labor costs, aiming to improve the generalizability of learned representations. We introduce a novel self-supervised salient object detection model that yields frame-level saliency and background features of videos. A shared encoder is employed to fuse saliency and background information from different videos. Both intra- and inter-class fusion are performed, with the latter controlled by a prior probability to avoid semantic ambiguities. This effectively augments the training data in feature space. The saliency-background representations formed from query and support videos are used to construct class prototypes through Temporal-Relational CrossTransformers. Experimental results on four standard benchmarks demonstrate that the proposed method outperforms state-of-the-art methods under various few-shot settings, particularly excelling in the 1-shot case. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords: Wages

Model-guided 3D stitching for augmented virtual environment

Science China (Information Sciences), 2023, Vol. 66, No. 1, pp. 115-130

Authors: Zhong ZHOU, Ming MENG, Yi ZHOU, Zhe ZHU, Jingdi YOU. Affiliations: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University; Department of Radiology, Duke University; Beijing Big View Technology Co. Ltd.

Image and video stitching have made tremendous progress in the construction of wide field-of-view (FOV). However, some long-term challenges still exist, including wide baselines between cameras, large parallaxes, and low texture in overlapping areas. The augmented virtual environment (AVE) captures videos as live textures of 3D models in a virtual environment, and provides another 3D solution to overcome the aforementioned challenges. Existing AVE methods primarily follow from video projection, and cannot produce satisfactory stitching results compared with image stitching. In this paper, we propose a novel model-guided 3D stitching algorithm for AVE. The algorithm recovers an approximate 3D model for each video stream and optimizes the warping of the models to meet the requirements of feature point matching of the 3D models from adjacent videos. Experimental results illustrate that our method significantly improves the stitching quality compared with previous state-of-the-art methods.

Keywords: 3D stitching; video model; multiple video visualization; augmented virtual environment

Joint self-supervised and reference-guided learning for depth inpainting

Computational Visual Media, 2022, Vol. 8, No. 4, pp. 597-612

Authors: Heng Wu, Kui Fu, Yifan Zhao, Haokun Song, Jia Li. Affiliation: State Key Laboratory of Virtual Reality Technology and Systems, School of Computer Science and Engineering, Beihang University, Beijing 100191, China

Depth information can benefit various computer vision tasks on both images and ***, depth maps may suffer from invalid values in many pixels, and also large *** improve such data, we propose a joint self-supervised and reference-guided learning approach for depth *** the self-supervised learning strategy, we introduce an improved spatial convolutional sparse coding module in which total variation regularization is employed to enhance the structural information while preserving edge *** module alternately learns a convolutional dictionary and sparse coding from a corrupted depth ***, both the learned convolutional dictionary and sparse coding are convolved to yield an initial depth map, which is effectively smoothed using local contextual *** reference-guided learning part is inspired by the fact that adjacent pixels with close colors in the RGB image tend to have similar depth *** thus construct a hierarchical joint bilateral filter module using the corresponding color image to fill in large *** summary, our approach integrates a convolutional sparse coding module to preserve local contextual information and a hierarchical joint bilateral filter module for filling using specific adjacent *** results show that the proposed approach works well for both invalid value restoration and large hole inpainting.

Keywords: depth inpainting; self-supervised learning; reference-guided learning
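
The abstract above rests on the observation that adjacent pixels with close colors tend to have similar depth. A minimal single-scale sketch of that idea follows, using a plain joint bilateral weighting rather than the paper's hierarchical module. The function name `fill_depth_holes`, its parameters, the use of `0.0` as the invalid-depth marker, and the nested-list image layout are all assumptions made for illustration; none of them come from the paper.

```python
import math


def fill_depth_holes(depth, color, radius=1, sigma_space=1.0,
                     sigma_color=0.1, invalid=0.0):
    """Fill invalid depth pixels with a color-guided weighted average.

    depth: list of rows of floats; pixels equal to `invalid` are holes.
    color: list of rows of (r, g, b) tuples in [0, 1], same size as depth.
    Illustrative single-scale joint bilateral fill, not the paper's module.
    """
    height, width = len(depth), len(depth[0])
    filled = [row[:] for row in depth]  # copy so the input is not mutated
    for y in range(height):
        for x in range(width):
            if depth[y][x] != invalid:
                continue  # measured depth is kept as-is
            weight_sum = 0.0
            value_sum = 0.0
            for ny in range(max(0, y - radius), min(height, y + radius + 1)):
                for nx in range(max(0, x - radius), min(width, x + radius + 1)):
                    if depth[ny][nx] == invalid:
                        continue  # holes cannot contribute depth
                    # Spatial term: nearer neighbours count more.
                    spatial = ((ny - y) ** 2 + (nx - x) ** 2) / (2.0 * sigma_space ** 2)
                    # Range term: neighbours with a similar color count more.
                    color_dist = sum((a - b) ** 2
                                     for a, b in zip(color[ny][nx], color[y][x]))
                    rng = color_dist / (2.0 * sigma_color ** 2)
                    w = math.exp(-(spatial + rng))
                    weight_sum += w
                    value_sum += w * depth[ny][nx]
            if weight_sum > 0.0:
                filled[y][x] = value_sum / weight_sum
    return filled
```

Because the range term is computed on the color image and not on the depth map, a hole is filled mostly from neighbours that belong to the same colored surface, which is what keeps depth edges from being blurred across object boundaries.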

Automatic image matting and fusing for portrait synthesis

Science China (Information Sciences), 2022, Vol. 65, No. 2, pp. 235-237

Authors: Zhike YI, Wenfeng SONG, Shuai LI, Aimin HAO. Affiliation: State Key Laboratory of Virtual Reality Technology and Systems, Beihang University

We propose an automatic image matting and fusing system for portrait synthesis in this *** firstly use a face detection algorithm to determine if the input contains a ***, we use a semantic segmentation neural network to generate a trimap and feed the trimap and the portrait into the neural network to predict the alpha channel ***, the input portrait's background is replaced with the given background via an image synthesis algorithm to obtain the synthesized portrait.

Keywords: Neural Network; Automatic Image Matting; Image Fusion; Deep Learning; Gradient Domain
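
The last step in the abstract above replaces the portrait's background once an alpha channel has been predicted. The sketch below shows the simplest form of that step, the standard per-pixel blend C = alpha * F + (1 - alpha) * B. It is only an illustration: the keywords suggest the paper's own fusion works in the gradient domain, which this sketch does not attempt, and the function name `composite` and the nested-list image layout are assumptions.

```python
def composite(foreground, background, alpha):
    """Blend two same-sized RGB images pixel by pixel using an alpha matte.

    foreground, background: lists of rows of (r, g, b) tuples in [0, 1].
    alpha: list of rows of floats in [0, 1]; 1 keeps the foreground pixel,
    0 keeps the background pixel.
    Plain alpha blending for illustration, not the paper's fusion algorithm.
    """
    out = []
    for fg_row, bg_row, a_row in zip(foreground, background, alpha):
        row = []
        for fg, bg, a in zip(fg_row, bg_row, a_row):
            # C = a * F + (1 - a) * B, applied to each color channel.
            row.append(tuple(a * f + (1.0 - a) * b for f, b in zip(fg, bg)))
        out.append(row)
    return out
```

Fractional alpha values matter mostly along hair and soft edges, where a pixel is a mixture of subject and background; a hard 0/1 mask there would produce visible halos.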
