Search Results - Inner Mongolia University Library

arXiv, 2023

Authors: Sun, Lei; Gehrig, Daniel; Sakaridis, Christos; Gehrig, Mathias; Liang, Jingyun; Sun, Peng; Xu, Zhijie; Wang, Kaiwei; Van Gool, Luc; Scaramuzza, Davide. Affiliations: National Research Center for Optical Instrumentation, Zhejiang University, Hangzhou 310027, China; Robotics and Perception Group, University of Zurich, Zurich 8050, Switzerland; Computer Vision Lab, ETH Zurich, Zurich 8092, Switzerland; Centre for Visual and Immersive Computing, Huddersfield University, HD1 3DH, United Kingdom; INSAIT, Sofia University "St. Kliment Ohridski", Bulgaria

Effective video frame interpolation hinges on the adept handling of motion in the input scene. Prior work acknowledges asynchronous event information for this, but often overlooks whether motion induces blur in the video, limiting its scope to sharp frame interpolation. We instead propose a unified framework for event-based frame interpolation that performs deblurring ad hoc and thus works on both sharp and blurry input videos. Our model consists of a bidirectional recurrent network that incorporates the temporal dimension of interpolation and fuses information from the input frames and the events adaptively based on their temporal proximity. To enhance generalization from synthetic data to real event cameras, we integrate a self-supervised framework with the proposed model. At the dataset level, we introduce a novel real-world high-resolution dataset with events and color videos named HighREV, which provides a challenging evaluation setting for the examined task. Extensive experiments show that our network consistently outperforms previous state-of-the-art methods on frame interpolation, single image deblurring, and the joint task of both. Experiments on domain transfer reveal that self-supervised training effectively mitigates the performance degradation observed when transitioning from synthetic data to real-world data. Code and datasets are available at https://***/AHupuJR/REFID. Copyright © 2023, The Authors. All rights reserved.

Keywords: Self-supervised learning

Robotic workcell for sole grasping in footwear manufacturing

International Conference on Emerging Technologies and Factory Automation (ETFA)

Authors: Guillermo Oliver; Pablo Gil; Fernando Torres. Affiliation: Automatics, Robotics and Artificial Vision Lab (AUROVA), Computer Science Research Institute, University of Alicante, Spain

ISBN (electronic): 9781728189567

ISBN (print): 9781728189574

The goal of this paper is to present a robotic workcell to automate several tasks of the cementing process in footwear manufacturing. Our cell's main applications are sole digitization of a wide variety of footwear, glue dispensing and sole grasping from conveyor belts. This cell is made up of a manipulator arm endowed with a gripper, a conveyor belt and a 3D scanner. We have integrated all the elements into a ROS simulation environment facilitating control and communication among them, also providing flexibility to support future extensions. We propose a novel method to grasp soles of different shape, size and material, exploiting the particular characteristics of these objects. Our method relies on object contour extraction using concave hulls. We evaluate it on point clouds of 16 digitized real soles in three different scenarios: concave hull, k-NNs extension and PCA correction. While we have tested this workcell in a simulated environment, the presented system's performance is scheduled to be tested on a real setup at INESCOP facilities in the upcoming months.

Cross Domain Object Detection by Target-Perceived Dual Branch Distillation

arXiv, 2022

Authors: He, Mengzhe; Wang, Yali; Wu, Jiaxi; Wang, Yiru; Li, Hanqing; Li, Bo; Gan, Weihao; Wu, Wei; Qiao, Yu. Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; SenseTime Research; University of Chinese Academy of Science, China; Shanghai AI Laboratory, Shanghai, China; Beihang University, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

Cross domain object detection is a realistic and challenging task in the wild. It suffers from performance degradation due to a large shift of data distributions and a lack of instance-level annotations in the target domain. Existing approaches mainly focus on either of these two difficulties, even though they are closely coupled in cross domain object detection. To solve this problem, we propose a novel Target-perceived Dual-branch Distillation (TDD) framework. By integrating detection branches of both source and target domains in a unified teacher-student learning scheme, it can reduce domain shift and generate reliable supervision effectively. In particular, we first introduce a distinct Target Proposal Perceiver between two domains. It can adaptively enhance the source detector to perceive objects in a target image, by leveraging target proposal contexts from iterative cross-attention. Afterwards, we design a concise Dual Branch Self Distillation strategy for model training, which can progressively integrate complementary object knowledge from different domains via self-distillation in two branches. Finally, we conduct extensive experiments on a number of widely used scenarios in cross domain object detection. The results show that our TDD significantly outperforms the state-of-the-art methods on all the benchmarks. Our code and model will be made available. Copyright © 2022, The Authors. All rights reserved.

Keywords: Distillation

NTIRE 2023 HR NonHomogeneous Dehazing Challenge Report

2023 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, CVPRW 2023

Authors: Ancuti, Codruta O.; Ancuti, Cosmin; Vasluianu, Florin-Alexandru; Timofte, Radu; Zhou, Han; Dong, Wei; Liu, Yangyi; Chen, Jun; Liu, Huan; Li, Liangyan; Wu, Zijun; Dong, Yubo; Li, Yuyan; Qiu, Tian; He, Yu; Lu, Yonghong; Wu, Yinwei; Jiang, Zhenxiang; Liu, Songhua; Yang, Xingyi; Jing, Yongcheng; Benjdira, Bilel; Ali, Anas M.; Koubaa, Anis; Yang, Hao-Hsiang; Chen, I-Hsiang; Chen, Wei-Ting; Huang, Zhi-Kai; Chen, Yi-Chung; Hsieh, Chia-Hsuan; Chang, Hua-En; Chiang, Yuan-Chun; Kuo, Sy-Yen; Guo, Yu; Gao, Yuan; Liu, Ryan Wen; Lu, Yuxu; Qu, Jingxiang; He, Shengfeng; Ren, Wenqi; Hoang, Trung; Zhang, Haichuan; Yazdani, Amirsaeed; Monga, Vishal; Yang, Lehan; Wu, Alex Jiahao; Mai, Tiancheng; Cong, Xiaofeng; Yin, Xuemeng; Yin, Xuefei; Emad, Hazim; Abdallah, Ahmed; Yasser, Yahya; Elshahat, Dalia; Elbaz, Esraa; Li, Zhan; Kuang, Wenqing; Luo, Ziwei; Gustafsson, Fredrik K.; Zhao, Zheng; Sjölund, Jens; Schön, Thomas B.; Zhang, Zhao; Wei, Yanyan; Wang, Junhu; Zhao, Suiyi; Zheng, Huan; Guo, Jin; Sun, Yangfan; Liu, Tianli; Hao, Dejun; Jiang, Kui; Sarvaiya, Anjali; Prajapati, Kalpesh; Patra, Ratnadeep; Barik, Pragnesh; Rathod, Chaitanya; Upla, Kishor; Raja, Kiran; Ramachandra, Raghavendra; Busch, Christoph. Affiliations: ETcTI, Universitatea Politehnica Timisoara, Romania; ICTEAM, UCL, Belgium; Computer Vision Lab, University of Wuerzburg, Germany; Computer Vision Lab, ETH Zurich, Switzerland; Department of Electrical and Computer Engineering, McMaster University, Canada; Department of Electrical and Computer Engineering, University of Alberta, Canada; McMaster University, Canada; Xidian University, China; Research Institute Singapore; National University of Singapore, Singapore; University of Sydney, Australia; Robotics and Internet-of-Things Laboratory, Prince Sultan University, Riyadh 12435, Saudi Arabia; Department of Electrical Engineering, National Taiwan University, Taiwan; Graduate Institute of Electronics Engineering, National Taiwan University, Taiwan; Graduate Institute of Communication Engineering, National Taiwan University, Taiwan; Wuhan University of Technology, China; Singapore Management University, Singapore, Singapore; Sun Yat-sen University, China; Electrical Engineering Department, Pennsylvania State University, United States; The University of Sydney, Australia; Southeast University, China; University of California, Los Angeles, United States; Beijing Jiaotong University, China; Mansoura University, Egypt; College of Information Science and Technology, Jinan University, China; Department of Information Technology, Uppsala University, Sweden; Hefei University of Technology, China; Zhejiang Dahua Technology, China; Sardar Vallabhbhai National Institute of Technology, India; Norwegian University of Science and Technology, Norway

ISBN (print): 9798350302493

This study assesses the outcomes of the NTIRE 2023 Challenge on Non-Homogeneous Dehazing, wherein novel techniques were proposed and evaluated on a new image dataset called HD-NH-HAZE. The HD-NH-HAZE dataset contains 50 high-resolution pairs of real-life outdoor images featuring nonhomogeneous hazy images and corresponding haze-free images of the same scene. The nonhomogeneous haze was simulated using a professional setup that replicated real-world conditions of hazy scenarios. The competition had 246 participants, and 17 teams submitted solutions for the final testing phase. The proposed solutions demonstrated the cutting edge of image dehazing technology. © 2023 IEEE.

Keywords: Demulsification

Moving object detection for visual odometry in a dynamic environment based on occlusion accumulation

IEEE International Conference on Robotics and Automation (ICRA)

Authors: Haram Kim; Pyojin Kim; H. Jin Kim. Affiliations: Lab for Autonomous Robotics Research (LARR), Seoul National University, Seoul, South Korea; Computer Graphics and Vision Lab (GrUVi), Simon Fraser University, Burnaby, BC

ISBN (electronic): 9781728173955

ISBN (print): 9781728173962

Detection of moving objects is an essential capability in dealing with dynamic environments. Most moving object detection algorithms have been designed for color images without depth. For robotic navigation, where real-time RGB-D data is often readily available, utilization of the depth information would be beneficial for obstacle recognition. Here, we propose a simple moving object detection algorithm that uses RGB-D images. The proposed algorithm does not require estimating a background model. Instead, it uses an occlusion model which enables us to estimate the camera pose on a background confused with moving objects that dominate the scene. The proposed algorithm allows moving object detection and visual odometry (VO) to be separated, so that an arbitrary robust VO method can be employed in a dynamic situation in combination with moving object detection, whereas other VO algorithms for a dynamic environment are inseparable. In this paper, we use dense visual odometry (DVO) as a VO method with a bi-square regression weight. Experimental results show the segmentation accuracy and the performance improvement of DVO in such situations. We validate our algorithm on public datasets and on our own dataset, which is also publicly accessible.
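
The "bi-square regression weight" mentioned in the abstract is commonly understood as Tukey's bi-square (biweight) function from robust regression. The sketch below shows only that generic weight function, not the authors' implementation; the function name, the tuning constant c = 4.685 (a conventional default), and the assumption that residuals are already divided by a scale estimate are all illustrative choices not taken from the record.

```python
import numpy as np

def bisquare_weights(scaled_residuals, c=4.685):
    """Tukey bi-square weights for residuals already divided by a scale estimate.

    Residuals with |r| >= c receive weight 0, so gross outliers (for example,
    pixels on a moving object) would not influence a weighted least-squares fit.
    """
    u = np.asarray(scaled_residuals, dtype=float) / c
    # Inside the cutoff: (1 - u^2)^2; outside: exactly zero.
    return np.where(np.abs(u) < 1.0, (1.0 - u**2) ** 2, 0.0)

weights = bisquare_weights([0.0, 2.0, 4.685, 10.0])
```

Small residuals keep a weight near 1, while anything beyond the cutoff is ignored entirely, which is what makes this weighting attractive when part of the image violates the static-scene assumption.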

Keywords: Cameras; Heuristic algorithms; Object detection; Robustness; Trajectory; Visual odometry; Feature extraction

Collaborative Multi-View Convolutions With Gating For Accurate And Fast Volumetric Medical Image Segmentation

IEEE International Symposium on Biomedical Imaging

Authors: Cheng Li; Jin Ye; Junjun He; Shanshan Wang; Lixu Gu; Yu Qiao. Affiliations: Paul C. Lauterbur Research Center for Biomedical Imaging, SIAT, CAS, Shenzhen, China; Shenzhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, SIAT, CAS, Shenzhen, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; School of Biomedical Engineering/the Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China

ISBN (print): 9781665412469; 9781665429474

Due to their high capacity in capturing 3D spatial information, 3D Fully Convolutional Neural Networks (3D FCNs), especially 3D U-Net, are prevalent for volumetric medical image segmentation. However, 3D convolutions are much more computationally complex than 2D convolutions and are thus more prone to overfitting. This paper proposes Collaborative Multi-View convolutions (CMV convs) that can keep the model complexity similar to those employing 2D convolutions while capturing the 3D spatial context like 3D convolutions. Specifically, CMV convs simultaneously extract information from three orthogonal views with three parameter-shared 2D convolutions. A Global-Guided Gating mechanism (3G) is further designed that selectively passes information from CMV convs to the next stage. Combined with 3G, a CMV conv becomes a G-CMV conv that constitutes a plug-and-play module, which can be easily integrated into various 3D CNNs for image segmentation. Extensive experiments on the BraTS18 dataset have been conducted. Our method achieves competitive results compared to state-of-the-art methods with over 10× fewer parameters than 3D-UNet.

Keywords: Solid modeling; Image segmentation; Interpolation; Three-dimensional displays; Computational modeling; Biological system modeling; Collaboration

PIPAL: A Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

16th European Conference on Computer Vision, ECCV 2020

Authors: Gu, Jinjin; Cai, Haoming; Chen, Haoyu; Ye, Xiaoxing; Ren, Jimmy S.; Dong, Chao. Affiliations: The School of Data Science, The Chinese University of Hong Kong, Shenzhen, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; SenseTime Research, Science Park, Hong Kong; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China

ISBN (print): 9783030586201

Image quality assessment (IQA) is the key factor for the fast development of image restoration (IR) algorithms. The most recent IR methods based on Generative Adversarial Networks (GANs) have achieved significant improvement in visual performance, but also presented great challenges for quantitative evaluation. Notably, we observe an increasing inconsistency between perceptual quality and the evaluation results. We therefore raise two questions: (1) Can existing IQA methods objectively evaluate recent IR algorithms? (2) When focusing on beating current benchmarks, are we getting better IR algorithms? To answer these questions and promote the development of IQA methods, we contribute a large-scale IQA dataset, called the Perceptual Image Processing Algorithms (PIPAL) dataset. In particular, this dataset includes the results of GAN-based methods, which are missing in previous datasets. We collect more than 1.13 million human judgments to assign subjective scores for PIPAL images using the more reliable "Elo system". Based on PIPAL, we present new benchmarks for both IQA and super-resolution methods. Our results indicate that existing IQA methods cannot fairly evaluate GAN-based IR algorithms. While using appropriate evaluation methods is important, IQA methods should also be updated along with the development of IR algorithms. Finally, we improve the performance of IQA networks on GAN-based distortions by introducing anti-aliasing pooling. Experiments show the effectiveness of the proposed method. © 2020, Springer Nature Switzerland AG.
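
The abstract says subjective scores are assigned from pairwise human judgments with the "Elo system". As a rough illustration of how one pairwise judgment can update two scores, here is the textbook Elo update. This is a sketch only: the K-factor of 32, the starting rating of 1500, and the function names are assumptions for illustration, and the record does not say which variant or parameters the PIPAL authors used.

```python
def elo_expected(rating_a, rating_b):
    """Expected score of A against B under the standard logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))

def elo_update(rating_a, rating_b, score_a, k=32.0):
    """Return updated (rating_a, rating_b) after one comparison.

    score_a is 1.0 if image A was preferred, 0.0 if B was preferred,
    and 0.5 for a tie. k (hypothetical value) controls the step size.
    """
    delta = k * (score_a - elo_expected(rating_a, rating_b))
    return rating_a + delta, rating_b - delta

# Two images start with equal ratings; a judge prefers image A.
new_a, new_b = elo_update(1500.0, 1500.0, score_a=1.0)
```

Because the update is zero-sum, the total rating across all images stays constant, and an unexpected preference (a low-rated image beating a high-rated one) moves both ratings more than an expected one.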

Keywords: Generative adversarial networks

Self-slimmed Vision Transformer

arXiv, 2021

Authors: Zong, Zhuofan; Li, Kunchang; Song, Guanglu; Wang, Yali; Qiao, Yu; Leng, Biao; Liu, Yu. Affiliations: School of Computer Science and Engineering, Beihang University, China; SenseTime Research, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China; Shanghai AI Laboratory, China

Vision transformers (ViTs) have become popular structures and have outperformed convolutional neural networks (CNNs) on various vision tasks. However, such powerful transformers bring a huge computation burden because of the exhausting token-to-token comparison. Previous works focus on dropping insignificant tokens to reduce the computational cost of ViTs. But when the dropping ratio increases, this hard dropping will inevitably discard vital tokens, which limits its efficiency. To solve this issue, we propose a generic self-slimmed learning approach for vanilla ViTs, namely SiT. Specifically, we first design a novel Token Slimming Module (TSM), which can boost the inference efficiency of ViTs by dynamic token aggregation. As a general method of token hard dropping, our TSM softly integrates redundant tokens into fewer informative ones. It can dynamically zoom visual attention without cutting off discriminative token relations in the images, even with a high slimming ratio. Furthermore, we introduce a concise Feature Recalibration Distillation (FRD) framework, wherein we design a reverse version of TSM (RTSM) to recalibrate the unstructured token in a flexible auto-encoder manner. Due to the similar structure between teacher and student, our FRD can effectively leverage structure knowledge for better convergence. Finally, we conduct extensive experiments to evaluate our SiT. The results demonstrate that our method can speed up ViTs by 1.7× with a negligible accuracy drop, and even speed up ViTs by 3.6× while maintaining 97% of their performance. Surprisingly, by simply arming LV-ViT with our SiT, we achieve new state-of-the-art performance on ImageNet. Code is available at https://***/Sense-X/SiT. © 2021, CC BY.
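
To make the contrast between hard token dropping and soft aggregation concrete, the sketch below merges N tokens into M < N tokens as softmax-weighted combinations. It is not the authors' Token Slimming Module: in particular, the `scores` matrix stands in for whatever learned sub-network would produce the aggregation logits, and all names and shapes here are hypothetical.

```python
import numpy as np

def soft_aggregate(tokens, scores):
    """Merge N input tokens into M output tokens by soft weighting.

    tokens: (N, D) array of token features.
    scores: (M, N) array of logits (hypothetical; a real model would learn them).
    Returns the (M, D) aggregated tokens and the (M, N) weight matrix.
    """
    # Numerically stable softmax over the N input tokens for each output token.
    shifted = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(shifted)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ tokens, weights

rng = np.random.default_rng(0)
tokens = rng.normal(size=(8, 4))   # 8 tokens with 4 features each
scores = rng.normal(size=(3, 8))   # logits for 3 output tokens
slimmed, weights = soft_aggregate(tokens, scores)
```

Every input token contributes to every output token with some nonzero weight, so no token is discarded outright; hard dropping corresponds to the special case where each weight row is one-hot.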

Keywords: Distillation

OTE: Optimal Trustworthy EdgeAI solutions for smart cities

IEEE/ACM International Symposium on Cluster Computing and the Grid (CCGRID)

Authors: Vasileios Mygdalis; Lorenzo Carnevale; Jose Ramiro Martínez-De-Dios; Dmitriy Shutin; Giovanni Aiello; Massimo Villari; Ioannis Pitas. Affiliations: Department of Informatics, Aristotle University of Thessaloniki, Thessaloniki, Greece; Department of Mathematical and Computer Science, Physics and Hearth Sciences, University of Messina, Messina, Italy; Gruppo Nazionale per il Calcolo Scientifico (GNCS), Istituto Nazionale di Alta Matematica (INdAM) "F. Severi", Rome, Italy; Robotics, Vision and Control Group, University of Seville, Seville, Spain; Institute of Communications and Navigation, German Aerospace Center (DLR), Wessling, Germany; Research and Development Lab., Engineering Ingegneria Informatica S.p.A, Rome, Italy

This work studies and defines the problem of providing extensive and opportunistic Edge AI-based area coverage in smart city application scenarios, by researching and determining the optimal configuration of sensing and computational resources for minimizing the environmental/technology footprint of the solution. A typical smart city computing continuum consists of statically installed multimodal sensing Internet-of-Things (IoT) nodes at various city locations, accompanied by interconnected computational Cloud/Edge/IoT nodes. This paper presents Optimal Trustworthy EdgeAI (OTE), an entirely novel research pipeline that complements existing smart city infrastructure with intelligent drone Edge/IoT nodes (in the form of modularly equipped unmanned aerial vehicles), capable of autonomous repositioning according to individual/collective sensing and coverage criteria. Thereby, we envisage emerging cutting-edge technologies for trustworthy sensing, perception and modelling that predict the behavior of moving targets (e.g., citizens/vehicles/objects) and understand natural phenomena (e.g., sea wave motion, urban flora/fauna, biodiversity) in order to anticipate events (people's bad habits, environmental changes), by exploiting novel continuous data processing services across the whole span of the enhanced Cloud-Edge-IoT computing continuum.

Keywords: Cloud computing; Privacy; Smart cities; Pipelines; Semantics; Robot sensing systems; Software

PIPAL: a Large-Scale Image Quality Assessment Dataset for Perceptual Image Restoration

arXiv, 2020

Authors: Gu, Jinjin; Cai, Haoming; Chen, Haoyu; Ye, Xiaoxing; Ren, Jimmy S.; Dong, Chao. Affiliations: School of Data Science, Chinese University of Hong Kong, Shenzhen, China; ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, China; SenseTime Research; SIAT Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, China

Keywords: Image reconstruction
