Search results - Inner Mongolia University Library

IEEE Transactions on Intelligent Vehicles, 2024, pp. 1-19

Authors: Cao, Yue; Shangguan, Wei; Visser, Arnoud; Chen, Junjie; Chai, Linguo; Cai, Baigen

Affiliations: School of Automation and Intelligence, Beijing Jiaotong University, Beijing, China; School of Automation and Intelligence and State Key Laboratory of Rail Traffic Control and Safety, Beijing Jiaotong University, Beijing, China; Intelligent Robotics and Computer Vision Lab of the Informatics Institute, Faculty of Science, University of Amsterdam, The Netherlands

Detecting surrounding situations and reacting accordingly to avoid collisions remains a challenging task for autonomous driving. This task requires predicting the trajectories of surrounding agents and assessing the potential risk of future situations, which can be difficult to achieve solely through onboard vehicle devices. Therefore, this paper proposes a cooperative architecture for trajectory prediction and risk assessment conducted on roadside devices (RSUs) to assist Connected and Autonomous Vehicles (CAVs). Firstly, we develop a segment-based prediction model (SegNet) tailored to hub signalized intersections. Intersections are divided into multiple segments, and curvilinear coordinates are utilized to indicate the geometric road features. The model leverages individual interaction cues in the ego segment and group features in the merging segments, while also incorporating traffic signal information to generate multimodal prediction results. In terms of risk assessment, we utilize the prediction results to provide hierarchical assistance, such as risk values, risk maps, and reference trajectories. Offline experimental results demonstrate that our SegNet model achieves competitive and well-balanced performance compared to state-of-the-art methods on the CitySim Database, with more accurate and smooth prediction trajectories. Through real-time CARLA and SUMO co-simulation, the performance of assisted CAVs indicates that they can safely and effectively navigate with the support of the proposed architecture.

Keywords: Real time systems

MetaVSR: A Novel Approach to Video Super-Resolution for Arbitrary Magnification

30th International Conference on MultiMedia Modeling, MMM 2024

Authors: Hong, Zixuan; Cao, Weipeng; Xu, Zhiwu; Chen, Zhenru; Tao, Xi; Ming, Zhong; Cao, Chuqing; Zheng, Liang

Affiliations: College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; Shenzhen 518107, China; Anhui Province Key Laboratory of Machine Vision Inspection, Yangtze River Delta HIT Robot Technology Research Institute, Wuhu 241000, China

ISBN: (print) 9783031533044

Video super-resolution is a pivotal task that involves the recovery of high-resolution video frames from their low-resolution counterparts, possessing a multitude of applications in real-world scenarios. Within the domain of prevailing video super-resolution models, a majority of these models are tailored to specific magnification factors, thereby lacking a cohesive architecture capable of accommodating arbitrary magnifications. In response to this lacuna, this study introduces "MetaVSR", a novel video super-resolution model devised to handle arbitrary magnifications. This model is structured around three distinct modules: inter-frame alignment, feature extraction, and upsampling. In the inter-frame alignment module, a bidirectional propagation technique is employed to attain the alignment of adjacent frames. The feature extraction module amalgamates superficial and profound video features to enhance the model’s representational prowess. The upsampling module serves to establish a mapping correlation between the desired target resolution and the input provided in lower resolution. An array of empirical findings attests to the efficacy of the proposed MetaVSR model in addressing this challenge. © 2024, The Author(s), under exclusive license to Springer Nature Switzerland AG.

Keywords: Feature extraction

Expression-Aware Masking and Progressive Decoupling for Cross-Database Facial Expression Recognition

International Conference on Automatic Face and Gesture Recognition

Authors: Tao Zhong; Xiaole Xian; Zihan Wang; Weicheng Xie; Linlin Shen

Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University; Shenzhen Institute of Artificial Intelligence and Robotics for Society; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University

ISBN: (digital) 9798350394948

ISBN: (print) 9798350394955

Cross-database facial expression recognition (CD-FER) has been widely studied due to its promising applicability in real-life situations, while the generalization performance is the main concern in this task. For improving cross-database generalization, current works frequently resort to masked auto encoder (MAE) to learn the expression representation in an unsupervised manner, and disentanglement of expression and domain features. (i) For MAE, current algorithms mainly employ random masking, and leverage the reconstruction of these masked regions to enable networks to learn the expression representation. However, these masked regions are expression-irrelevant, can not well reflect the characteristics of expression, thus are not efficient enough in representation learning. To this end, we propose an expression-aware masking in MAE to improve the learning efficiency of expression representation, by guiding MAE to mask out expression-aware regions during training. (ii) For disentanglement of expression and domain features, current algorithms realize it mainly in the deep layers. However, the coupling of these features in the shallow layers are rarely concerned, which may largely affect the disentanglement performance in deep layers. Thus, we propose a progressive decoupler to disentangle these features block by block, to use the feature disentanglement in shallow layers to facilitate that in deep layers. Extensive quantitative and qualitative results on multiple expression datasets show that our method can largely outperform the state of the arts in terms of cross-database generalization performance.

Keywords: Training; Representation learning; Couplings; Face recognition; Gesture recognition; Task analysis

Optimizing NeRF-based SLAM with Trajectory Smoothness Constraints

arXiv, 2024

Authors: He, Yicheng; Chen, Guangcheng; Zhang, Hong

Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China

The joint optimization of Neural Radiance Fields (NeRF) and camera trajectories has been widely applied in SLAM tasks due to its superior dense mapping quality and consistency. NeRF-based SLAM learns camera poses using constraints by implicit map representation. A widely observed phenomenon that results from the constraints of this form is jerky and physically unrealistic estimated camera motion, which in turn affects the map quality. To address this deficiency of current NeRF-based SLAM, we propose in this paper TS-SLAM (TS for Trajectory Smoothness). It introduces smoothness constraints on camera trajectories by representing them with uniform cubic B-splines with continuous acceleration that guarantees smooth camera motion. Benefiting from the differentiability and local control properties of B-splines, TS-SLAM can incrementally learn the control points end-to-end using a sliding window paradigm. Additionally, we regularize camera trajectories by exploiting the dynamics prior to further smooth trajectories. Experimental results demonstrate that TS-SLAM achieves superior trajectory accuracy and improves mapping quality versus NeRF-based SLAM that does not employ the above smoothness constraints. © 2024, CC BY.

Keywords: Mapping
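The TS-SLAM abstract above relies on two properties of uniform cubic B-splines: continuous acceleration and local control. The following is a minimal sketch of those two properties for positions only, written in plain Python. It is not the authors' implementation; the function names (`bspline_basis`, `bspline_accel_basis`, `evaluate`) and the example control points are hypothetical, and camera rotations, the sliding window, and the dynamics prior are not modeled.

```python
import math


def bspline_basis(u):
    """Uniform cubic B-spline blending weights for local parameter u in [0, 1]."""
    return (
        (1 - u) ** 3 / 6,
        (3 * u**3 - 6 * u**2 + 4) / 6,
        (-3 * u**3 + 3 * u**2 + 3 * u + 1) / 6,
        u**3 / 6,
    )


def bspline_accel_basis(u):
    """Second derivatives of the blending weights with respect to u."""
    return (1 - u, 3 * u - 2, -3 * u + 1, u)


def evaluate(ctrl, t, basis=bspline_basis):
    """Blend four consecutive control points at global parameter t.

    ctrl is a list of equal-length tuples (hypothetical camera positions),
    and t must lie in [0, len(ctrl) - 3]. Each unit interval of t is one
    spline segment that depends on exactly four control points.
    """
    seg = min(int(math.floor(t)), len(ctrl) - 4)
    u = t - seg
    w = basis(u)
    dim = len(ctrl[0])
    return tuple(sum(w[k] * ctrl[seg + k][d] for k in range(4)) for d in range(dim))


if __name__ == "__main__":
    # Hypothetical 2D control points, chosen only for illustration.
    ctrl = [(0.0, 0.0), (1.0, 2.0), (3.0, 3.0), (4.0, 1.0), (6.0, 0.0)]
    print("position at t=1.5:", evaluate(ctrl, 1.5))
    print("acceleration just before the knot:", evaluate(ctrl, 1.0 - 1e-9, bspline_accel_basis))
    print("acceleration at the knot:", evaluate(ctrl, 1.0, bspline_accel_basis))
```

Because each segment depends on only four control points, moving one control point changes at most four neighboring segments, which is what makes incremental, windowed optimization of the control points plausible.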

Efficient Object Rearrangement via Multi-view Fusion

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Dehao Huang; Chao Tang; Hong Zhang

Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China

ISBN: (digital) 9798350384574

ISBN: (print) 9798350384581

The prospect of assistive robots aiding in object organization has always been compelling. In an image-goal setting, the robot rearranges the current scene to match the single image captured from the goal scene. The key to an image-goal rearrangement system is estimating the desired placement pose of each object based on the single goal image and observations from the current scene. In order to establish sufficient associations for accurate estimation, the system should observe an object from a viewpoint similar to that in the goal image. Existing image-goal rearrangement systems, due to their reliance on a fixed viewpoint for perception, often require redundant manipulations to randomly adjust an object’s pose for a better perspective. Addressing this inefficiency, we introduce a novel object rearrangement system that employs multi-view fusion. By observing the current scene from multiple viewpoints before manipulating objects, our approach can estimate a more accurate pose without redundant manipulation times. A standard visual localization pipeline at the object level is developed to capitalize on the advantages of multi-view observations. Simulation results demonstrate that the efficiency of our system outperforms existing single-view systems. The effectiveness of our system is further validated in a physical experiment. For videos, please visit https://***/view/multi-view-rearr.

Keywords: Visualization; Accuracy; Databases; Simulation; Standards organizations; Pose estimation; Pipelines

RTAGrasp: Learning Task-Oriented Grasping from Human Videos via Retrieval, Transfer, and Alignment

arXiv, 2024

Authors: Dong, Wenlong; Huang, Dehao; Liu, Jiangshan; Tang, Chao; Zhang, Hong

Affiliations: Shenzhen Key Laboratory of Robotics and Computer Vision, Southern University of Science and Technology, Shenzhen, China; Department of Electronic and Electrical Engineering, Southern University of Science and Technology, Shenzhen, China

Task-oriented grasping (TOG) is crucial for robots to accomplish manipulation tasks, requiring the determination of TOG positions and directions. Existing methods either rely on costly manual TOG annotations or only extract coarse grasping positions or regions from human demonstrations, limiting their practicality in real-world applications. To address these limitations, we introduce RTAGrasp, a Retrieval, Transfer, and Alignment framework inspired by human grasping strategies. Specifically, our approach first effortlessly constructs a robot memory from human grasping demonstration videos, extracting both TOG position and direction constraints. Then, given a task instruction and a visual observation of the target object, RTAGrasp retrieves the most similar human grasping experience from its memory and leverages semantic matching capabilities of vision foundation models to transfer the TOG constraints to the target object in a training-free manner. Finally, RTAGrasp aligns the transferred TOG constraints with the robot’s action for execution. Evaluations on the public TOG benchmark, TaskGrasp dataset, show the competitive performance of RTAGrasp on both seen and unseen object categories compared to existing baseline methods. Real-world experiments further validate its effectiveness on a robotic arm. Our code, appendix, and video are available at https://***/view/rtagrasp/home. © 2024, CC BY.

Keywords: Robotic arms

GelPixel: A Single-Pixel-Based Tactile Sensor

2023 IEEE International Conference on Real-Time Computing and Robotics, RCAR 2023

Authors: Huang, Binhua; Li, Xiaoyu; Sumari, Putra; Ye, Chaoxiang; Zhou, Zhenning; Yin, Meng; Yi, Zhengkun; Wu, Xinyu

Affiliations: Shenzhen Institute of Artificial Intelligence and Robotics for Society, SIAT Branch, Shenzhen 518055, China; Universiti Sains Malaysia, School of Computer Science, 11800, Malaysia; Chinese Academy of Sciences, Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institute of Advanced Technology, Shenzhen 518055, China

ISBN: (print) 9798350327182

In this paper, we present the design and development of a novel optical tactile sensor that uses a single-pixel color light-to-frequency converter (TCS3200) and spectral decoding to recognize presses at different positions. This innovative approach overcomes the limitations associated with camera-based sensors, such as increased manufacturing costs and shape restrictions, enabling potential integration as a skin-like layer over a robot's body. Our proposed direct light propagation structure demonstrates enhanced sensitivity and a broader measuring range compared to traditional reflection structures. Using Ecoflex as the elastomer, the sensor's design incorporates a three-layer structure, including an LED layer, an elastomer layer, and a color sensor layer. Experimental results demonstrate that the proposed sensor performs well in localization tasks, achieving over 95% accuracy using multi-target regression for rendering poked positions. This study demonstrates the potential of single-pixel-based tactile sensors in various applications and provides a foundation for further exploration in this area. © 2023 IEEE.

Keywords: Tactile sensors

FLIP-80M: 80 Million Visual-Linguistic Pairs for Facial Language-Image Pre-Training

32nd ACM International Conference on Multimedia, MM 2024

Authors: Li, Yudong; Hou, Xianxu; Dezhi, Zheng; Shen, Linlin; Zhao, Zhe

Affiliations: School of Computer Science and Software Engineering, Shenzhen University, Shenzhen, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; School of AI and Advanced Computing, Xi'an Jiaotong-Liverpool University, Shenzhen, China; Guangdong Provincial Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen, China; Tencent AI Lab, Beijing, China

ISBN: (print) 9798400706868

While significant progress has been made in multi-modal learning driven by large-scale image-text datasets, there is still a noticeable gap in the availability of such datasets within the facial domain. To facilitate and advance the field of facial representation learning, we present FLIP-80M, a large-scale visual-linguistic dataset comprising over 80 million face images paired with text descriptions. FLIP-80M is constructed by leveraging the large openly available image-text-pair dataset LAION-5B and a mixed-method approach to filter face-related pairs from both visual and linguistic perspectives. Our curation process involves face detection, face caption classification, text de-noising, and synthesis-based image augmentation. As a result, FLIP-80M stands as the largest face-text dataset to date. To evaluate the potential of our dataset, we fine-tune the CLIP model using the proposed FLIP-80M, to create FLIP (Facial Language-Image Pretraining) and assess its representation capabilities across various downstream tasks. Our experiments demonstrate that our FLIP model achieves state-of-the-art results in a range of face analysis tasks, including face parsing, face alignment, and face attribute classification. The dataset and models are available at https://***/ydli-ai/FLIP. © 2024 ACM.

Keywords: Face recognition

Dual Consistency Learning for Semi-Supervised Airway Segmentation

2nd International Conference on Mechatronics, IoT and Industrial Informatics, ICMIII 2024

Authors: Li, Mingshuang; Yuan, Yunyi; Wang, Qiong; Qian, Yinling; Zhu, Lei

Affiliations: Southern University of Science and Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China; Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen, China; Guangzhou, China

ISBN: (print) 9798350386639

Airway segmentation serves as an essential foundational process for both the diagnosis of lung conditions and the navigation of surgical interventions. Although numerous attempts have been proposed to address airway segmentation, the application of semi-supervised learning in this task has not yet been extensively investigated. In this study, we introduce an innovative semi-supervised learning framework that incorporates dual consistency learning during the re-training. Dual consistency learning exploits image-level and feature-level perturbations simultaneously to fully and efficiently extract additional information from unlabeled data. Specifically, image-level consistency learning employs a novel frequency domain data augmentation scheme to enforce topological feature capturing, and feature-level consistency learning could explore a broader perturbation space, thus realizing discriminative representation extraction from feature space. We carry out comprehensive experiments across various public datasets to substantiate the efficacy of our proposed methodology. The results illustrate that our framework attains remarkable performance levels, surpassing the current state-of-the-art approaches. © 2024 IEEE.

Keywords: Self-supervised learning

Online Self-distillation and Self-modeling for 3D Brain Tumor Segmentation

IEEE Journal of Biomedical and Health Informatics, 2025, vol. 29, no. 6, pp. 4290-4302

Authors: Pang, Yan; Li, Yunhao; Huang, Teng; Liang, Jiaming; Wang, Zhen; Dong, Changyu; Kuang, Dongyang; Hu, Ying; Chen, Hao; Lei, Tim; Wang, Qiong

Affiliations: The Guangdong Provincial Key Laboratory of Computer Vision and Virtual Reality Technology, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; The School of Artificial Intelligence, Guangzhou University, China; The Zhejiang Lab, Hangzhou, China; Sun Yat-sen University, China; The Department of Computer Science and Engineering, The Department of Chemical and Biological Engineering, Hong Kong University of Science and Technology, China; The Department of Electrical Engineering, University of Colorado Denver, United States

In the specialized domain of brain tumor segmentation, supervised segmentation approaches are hindered by the limited availability of high-quality labeled data, a condition arising from data privacy concerns, significant costs, and ethical issues. In response to this challenge, this paper presents a training framework that adeptly integrates a plug-and-play component, MOD, into current supervised learning models, boosting their efficacy in scenarios with limited data. The MOD consists of an Online Tokenizer and a Dense Predictor, which employs self-distillation and self-modeling on masked patches, promoting swift convergence and efficient representation learning. During the inference phase, the plug-and-play MOD component is excluded, preserving the computational efficiency of the original model without incurring extra processing costs. We substantiated the value of our approach through experiments on leading 3D brain tumor segmentation baselines. Remarkably, models augmented with the MOD consistently showcased superior results, achieving elevated Dice coefficients and HD95 scores on two datasets: BraTS 2021 and MSD 2019 Task-01 Brain Tumor. Code: https://***/aigzhusmart/MOD © 2013 IEEE.

Keywords: Self-supervised learning
