Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Wu, Xu; Hou, XianXu; Lai, Zhihui; Zhou, Jie; Zhang, Ya-Nan; Pedrycz, Witold; Shen, Linlin

Affiliations: The Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen 518060, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China; School of AI and Advanced Computing, Xi’an Jiaotong-Liverpool University, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University; SZU Branch, Shenzhen Institute of Artificial Intelligence and Robotics for Society, Guangdong, Shenzhen 518060, China; The Department of Electrical & Computer Engineering, University of Alberta, Canada

Low-light image enhancement (LLIE) aims to improve low-illumination images. However, existing methods face two challenges: (1) uncertainty in restoration from diverse brightness degradations; (2) loss of texture and color information caused by noise suppression and light enhancement. In this paper, we propose a novel enhancement approach, CodeEnhance, which leverages quantized priors and image refinement to address these challenges. In particular, we reframe LLIE as learning an image-to-code mapping from low-light images to a discrete codebook that has been learned from high-quality images. To enhance this process, we introduce a Semantic Embedding Module (SEM), which integrates semantic information with low-level features, and a Codebook Shift (CS) mechanism, which adapts the pre-learned codebook to the distinct characteristics of our low-light dataset. Additionally, we present an Interactive Feature Transformation (IFT) module to refine texture and color information during image reconstruction, allowing for interactive enhancement based on user preferences. Extensive experiments on both real-world and synthetic benchmarks demonstrate that the incorporation of prior knowledge and controllable information transfer significantly enhances LLIE performance in terms of quality and fidelity. The proposed CodeEnhance exhibits superior robustness to various degradations, including uneven illumination, noise, and color distortion. Copyright © 2024, The Authors. All rights reserved.

Keywords: Image enhancement
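
The abstract above describes mapping image features to entries of a discrete codebook. As a rough illustration only, the sketch below shows the plain nearest-neighbour codebook lookup commonly used in vector-quantized models. It is not the CodeEnhance method, which learns the mapping with a network; the function names and the toy codebook values are hypothetical.

```python
import math


def nearest_code(feature, codebook):
    """Return the index of the codebook entry closest to `feature`
    under squared Euclidean distance."""
    best_index, best_dist = 0, math.inf
    for index, code in enumerate(codebook):
        dist = sum((f - c) ** 2 for f, c in zip(feature, code))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def quantize(features, codebook):
    """Map each feature vector to its code index and quantized vector."""
    indices = [nearest_code(f, codebook) for f in features]
    return indices, [codebook[i] for i in indices]


# Toy codebook with three 2-D entries (hypothetical values).
codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 2.0]]
# Toy "encoder outputs", one vector per spatial location.
features = [[0.1, -0.2], [0.9, 1.2], [0.2, 1.8]]

indices, quantized = quantize(features, codebook)
print(indices)  # [0, 1, 2]
```

In a full model the codebook would be learned from high-quality images and the quantized vectors would be passed to a decoder; here the lookup is shown in isolation.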

Learning Multi-dimensional Edge Feature-based AU Relation Graph for Facial Action Unit Recognition

arXiv, 2022

Authors: Luo, Cheng; Song, Siyang; Xie, Weicheng; Shen, Linlin; Gunes, Hatice

Affiliations: Computer Vision Institute, Shenzhen University, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, China; Guangdong Key Laboratory of Intelligent Information Processing, China; Department of Computer Science and Technology, University of Cambridge, United Kingdom

The activations of Facial Action Units (AUs) mutually influence one another. While the relationship between a pair of AUs can be complex and unique, existing approaches fail to specifically and explicitly represent such cues for each pair of AUs in each facial display. This paper proposes an AU relationship modelling approach that deep learns a unique graph to explicitly describe the relationship between each pair of AUs of the target facial display. Our approach first encodes each AU’s activation status and its association with other AUs into a node feature. Then, it learns a pair of multi-dimensional edge features to describe multiple task-specific relationship cues between each pair of AUs. During both node and edge feature learning, our approach also considers the influence of the unique facial display on AUs’ relationship by taking the full face representation as an input. Experimental results on BP4D and DISFA datasets show that both node and edge feature learning modules provide large performance improvements for CNN and transformer-based backbones, with our best systems achieving the state-of-the-art AU recognition results. Our approach not only has a strong capability in modelling relationship cues for AU recognition but also can be easily incorporated into various backbones. Our PyTorch code is made available at https://***/CVI-SZU/ME-GraphAU. Copyright © 2022, The Authors. All rights reserved.

Keywords: Chemical activation

Introducing the structural bases of typicality effects in deep learning

arXiv, 2021

Authors: Pino, Omar Vidal; Nascimento, Erickson R.; Campos, Mario F.M.

Affiliations: Computer Vision and Robotics Laboratory, Computer Science Department, Universidade Federal de Minas Gerais, Belo Horizonte 31270-010, Brazil

In this paper, we hypothesize that the effects of the degree of typicality in natural semantic categories can be generated based on the structure of artificial categories learned with deep learning models. Motivated by the human approach to representing natural semantic categories and based on the foundations of Prototype Theory, we propose a novel Computational Prototype Model (CPM) to represent the internal structure of semantic categories. Unlike other prototype learning approaches, our mathematical framework is a first approach to providing deep neural networks with the ability to model abstract semantic concepts such as the central semantic meaning of a category (semantic prototype), the typicality degree of an object's image, and the family resemblance relationship. We propose several methodologies based on the concept of typicality to evaluate our CPM model in image semantic processing tasks such as image classification, global semantic description of images, and transfer learning. Our experiments on different image datasets, such as ImageNet and COCO, showed that our approach might be an admissible proposition in the effort to endow machines with greater power of abstraction for the semantic representation of objects' *** Codes 68T07 (Primary) 68Q55 (Secondary) Copyright © 2021, The Authors. All rights reserved.

Keywords: Semantics

GenFace: A Large-Scale Fine-Grained Face Forgery Benchmark and Cross Appearance-Edge Learning

arXiv, 2024

Authors: Zhang, Yaning; Yu, Zitong; Wang, Tianyi; Huang, Xiaobin; Shen, Linlin; Gao, Zan; Ren, Jianfeng

Affiliations: Computer Vision Institute, College of Computer Science and Software Engineering, Shenzhen University, Shenzhen 518060, China; School of Computing and Information Technology, Great Bay University, Dongguan 523000, China; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen 518060, China; Nanyang Technological University, 50 Nanyang Ave, Block N 4, 639798, Singapore; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen 518129, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, China; Jinan 250014, China; Key Laboratory of Computer Vision and System, Ministry of Education, Tianjin University of Technology, Tianjin 300384, China; School of Computer Science, University of Nottingham Ningbo China

Photorealistic generators have advanced to a critical juncture where authentic and manipulated images are increasingly indistinguishable. Benchmarking and advancing techniques for detecting digital manipulation has therefore become urgent. Although a number of face forgery datasets are publicly available, their forged faces are mostly generated with GAN-based synthesis and do not involve more recent technologies such as diffusion. The diversity and quality of images generated by diffusion models have improved significantly, so a much more challenging face forgery dataset is needed to evaluate SOTA forgery detection methods. In this paper, we propose a large-scale, diverse, and fine-grained high-fidelity dataset, namely GenFace, to facilitate the advancement of deepfake detection. It contains a large number of forged faces generated by advanced generators such as diffusion-based models, along with more detailed labels about the manipulation approaches and adopted generators. In addition to evaluating SOTA approaches on our benchmark, we design an innovative Cross Appearance-Edge Learning (CAEL) detector to capture multi-grained appearance and edge global representations and to detect discriminative and general forgery traces. Moreover, we devise an Appearance-Edge Cross-Attention (AECA) module to explore the various integrations across the two domains. Extensive experimental results and visualizations show that our detection model outperforms the state of the art in different settings such as cross-generator, cross-forgery, and cross-dataset evaluations. Code and datasets will be available at https://***/Jenine-321/GenFace. Copyright © 2024, The Authors. All rights reserved.

Keywords: Image enhancement

Multi-scale Contrastive Learning for Gastroenteroscopy Classification

Annual IEEE Symposium on Computer-Based Medical Systems

Authors: Dan Li; Xuechen Li; Zhibin Peng; Wenting Chen; Linlin Shen; Guangyao Wu

Affiliations: Computer Vision Institute, School of Computer Science & Software Engineering, Shenzhen University; National Engineering Laboratory for Big Data System Computing Technology, Shenzhen University, Shenzhen, China; City University of Hong Kong, Hong Kong SAR, China; Shenzhen Institute of Artificial Intelligence & Robotics for Society; Guangdong Key Laboratory of Intelligent Information Processing; Shenzhen University General Hospital

In gastroenteroscopy image analysis, numerous computer-aided diagnosis (CAD) systems demonstrate that deep learning aids doctors' diagnosis. Lesions vary in shape and size, and clinical datasets tend to be imbalanced. However, existing methods classify directly by texture and ignore the variety of lesion shapes and sizes. To address these issues, we propose a deep neural network that consists of multi-scale feature extraction, contrastive feature learning, and a multi-scale feature fusion module. We train the contrastive feature learning module and the multi-scale feature fusion module simultaneously to alleviate the issue of data distribution differences. Thus, the proposed network can better identify the various categories. Extensive experiments on the Hyper Kvasir dataset show that the proposed Hybrid-M2CL outperforms the benchmark proposed with the dataset by 5.0% Macro Precision, 3.3% Macro Recall, 3.4% Macro F1-score, 3.3% Micro Precision, and 3.6% MCC. In addition, it outperforms the SOTA by 1.1% Macro F1-score, 2.6% MCC, and 2.0% B-ACC.
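
The abstract above reports both Macro and Micro Precision, which can differ noticeably on imbalanced data. The sketch below is a minimal illustration of the two averages and is not taken from the paper. The class labels and predictions are made-up toy values, and giving precision 0 to a class that is never predicted is an assumed convention.

```python
def precision_scores(y_true, y_pred):
    """Return (macro precision, micro precision) for single-label predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = {c: 0 for c in labels}  # true positives per predicted class
    fp = {c: 0 for c in labels}  # false positives per predicted class
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[pred] += 1
        else:
            fp[pred] += 1
    # Macro: average the per-class precisions, so every class counts equally.
    # Assumed convention: a class that is never predicted gets precision 0.
    per_class = [
        tp[c] / (tp[c] + fp[c]) if (tp[c] + fp[c]) else 0.0 for c in labels
    ]
    macro = sum(per_class) / len(labels)
    # Micro: pool all decisions first, so frequent classes dominate.
    micro = sum(tp.values()) / (sum(tp.values()) + sum(fp.values()))
    return macro, micro


# Imbalanced toy data: 8 "polyp" samples and 2 "ulcer" samples (hypothetical).
y_true = ["polyp"] * 8 + ["ulcer"] * 2
y_pred = ["polyp"] * 8 + ["polyp", "ulcer"]

macro, micro = precision_scores(y_true, y_pred)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

Because macro averaging weights each class equally, it tends to be more sensitive to performance on rare classes than micro averaging.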

Image-based Navigation in Real-World Environments via Multiple Mid-level Representations: Fusion Models, Benchmark and Efficient Evaluation

arXiv, 2022

Authors: Rosano, Marco; Furnari, Antonino; Gulino, Luigi; Santoro, Corrado; Farinella, Giovanni Maria

Affiliations: FPV@IPLAB, Department of Mathematics and Computer Science, University of Catania, Catania, Italy; Robotics Laboratory, Department of Mathematics and Computer Science, University of Catania, Catania, Italy; OrangeDev s.r.l., Firenze, Italy; Cognitive Robotics and Social Sensing Laboratory, ICAR-CNR, Palermo, Italy; Next Vision s.r.l., Catania, Italy

Robot visual navigation is a relevant research topic. Current deep navigation models conveniently learn their navigation policies in simulation, given the large amount of experience they need to collect. Unfortunately, the resulting models show limited generalization ability when deployed in the real world. In this work we explore solutions to facilitate the development of visual navigation policies that are trained in simulation and can be successfully transferred to the real world. We first propose an efficient evaluation tool to reproduce realistic navigation episodes in simulation. We then investigate a variety of deep fusion architectures to combine a set of mid-level representations, with the aim of finding the merge strategy that maximizes real-world performance. Our experiments, performed both in simulation and on a robotic platform, show the effectiveness of the considered models based on mid-level representations and confirm the reliability of the evaluation tool. The 3D models of the environment and the code of the validation tool are publicly available at the following link: https://***/EmbodiedVN/ © 2022, CC BY-NC-SA.

Keywords: Reinforcement learning

Image-Based Navigation in Real-World Environments Via Multiple Mid-Level Representations: Fusion Models, Benchmark and Efficient Evaluation

SSRN, 2022

Authors: Rosano, Marco; Furnari, Antonino; Gulino, Luigi; Santoro, Corrado; Farinella, Giovanni Maria

Affiliations: FPV@IPLAB, Department of Mathematics and Computer Science, University of Catania, Catania, Italy; Robotics Laboratory, Department of Mathematics and Computer Science, University of Catania, Catania, Italy; OrangeDev s.r.l., Firenze, Italy; Cognitive Robotics and Social Sensing Laboratory, ICAR-CNR, Palermo, Italy; Next Vision s.r.l., Catania, Italy

Robot visual navigation is a relevant research topic. Current deep navigation models mostly learn their navigation policies in simulation. This is convenient, given the efficiency with which simulators collect the required training experience. Unfortunately, the resulting models show limited generalization ability when deployed in the real world. In this work we investigate the problem of learning robust visual navigation policies in simulation that can be successfully transferred to the real world. We propose an evaluation tool to reproduce realistic navigation episodes in simulation. We then study whether mid-level visual representations, which have proven effective in previous applications, can be successfully used in simulation to close the domain gap. Our experiments, performed both in simulation and on a robotic platform, show that the proposed evaluation tool provides results consistent with those obtained in the real world. © 2022, The Authors. All rights reserved.

Keywords: Reinforcement learning

Delving into the Scale Variance Problem in Object Detection

arXiv, 2022

Authors: Chen, Junliang; Zhao, Xiaodong; Shen, Linlin

Affiliations: Computer Vision Institute, School of Computer Science and Software Engineering, Shenzhen University, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China

Object detection has made substantial progress in the last decade, due to the capability of convolution in extracting the local context of objects. However, the scales of objects are diverse, and current convolution can only process single-scale input. The capability of traditional convolution, with its fixed receptive field, to deal with this scale variance problem is thus limited. Multi-scale feature representation has proven to be an effective way to mitigate the scale variance problem. Recent research mainly adopts partial connections with certain scales, or aggregates features from all scales and focuses on the global information across the scales. However, the information across the spatial and depth dimensions is ignored. Motivated by this, we propose multi-scale convolution (MSConv) to handle the problem. By taking scale, spatial, and depth information into consideration at the same time, MSConv is able to process multi-scale input more comprehensively. MSConv is effective and computationally efficient, with only a small increase in computational cost. For most single-stage object detectors, replacing the traditional convolutions with MSConvs in the detection head can bring more than 2.5% improvement in AP (on the COCO 2017 dataset), with only a 3% increase in FLOPs. MSConv is also flexible and effective for two-stage object detectors. When extended to mainstream two-stage object detectors, MSConv can bring up to 3.0% improvement in AP. Our best model under single-scale testing achieves 48.9% AP on the COCO 2017 test-dev split, which surpasses many state-of-the-art methods. © 2022, CC BY.

Keywords: Object detection

Selective Multi-Scale Learning for Object Detection

arXiv, 2022

Authors: Chen, Junliang; Lu, Weizeng; Shen, Linlin

Affiliations: Computer Vision Institute, School of Computer Science and Software Engineering, Shenzhen University, China; Shenzhen Institute of Artificial Intelligence and Robotics for Society, Shenzhen, China; Guangdong Key Laboratory of Intelligent Information Processing, Shenzhen University, Shenzhen 518060, China

Pyramidal networks are standard methods for multi-scale object detection. Current research on feature pyramid networks usually adopts layer connections to collect features from certain levels of the feature hierarchy and does not consider the significant differences among them. We propose a better architecture for feature pyramid networks, named selective multi-scale learning (SMSL), to address this issue. SMSL is efficient and general: it can be integrated into both single-stage and two-stage detectors to boost detection performance, with nearly no extra inference cost. RetinaNet combined with SMSL obtains a 1.8% improvement in AP (from 39.1% to 40.9%) on the COCO dataset. When integrated with SMSL, two-stage detectors gain around 1.0% in AP. © 2022, CC BY.

Keywords: Object detection

Correlate-and-excite: Real-time stereo matching via guided cost volume excitation

arXiv, 2021

Authors: Bangunharcana, Antyanta; Cho, Jae Won; Lee, Seokju; Kweon, In So; Kim, Kyung-Soo; Kim, Soohyun

Affiliations: Mechatronics Systems and Control Laboratory, KAIST, Daejeon 34141, Republic of Korea; Robotics and Computer Vision Laboratory, KAIST, Daejeon 34141, Republic of Korea

Volumetric deep learning approaches to stereo matching aggregate a cost volume computed from the input left and right images using 3D convolutions. Recent works showed that the use of extracted image features and spatially varying cost volume aggregation complements 3D convolutions. However, existing methods with spatially varying operations are complex, require considerable computation time, and increase memory consumption. In this work, we construct Guided Cost volume Excitation (GCE) and show that simple channel excitation of the cost volume guided by the image can improve performance considerably. Moreover, we propose a novel method of using top-k selection prior to soft-argmin disparity regression for computing the final disparity estimate. Combining our novel contributions, we present an end-to-end network that we call Correlate-and-Excite (CoEx). Extensive experiments on the SceneFlow, KITTI 2012, and KITTI 2015 datasets demonstrate the effectiveness and efficiency of our model and show that it outperforms other speed-based algorithms while remaining competitive with other state-of-the-art algorithms. © 2021, CC BY.

Keywords: Convolution
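
The abstract above mentions applying top-k selection before soft-argmin disparity regression. The sketch below illustrates that general idea for a single pixel and is not the CoEx implementation. The function name, the use of raw matching costs (lower is better), and the toy cost values are assumptions made for illustration.

```python
import math


def soft_argmin_topk(costs, k):
    """Regress a disparity from per-disparity matching costs, using only
    the k lowest-cost candidates. `costs[d]` is the cost of disparity d."""
    # Keep the k disparities with the lowest matching cost.
    top = sorted(range(len(costs)), key=lambda d: costs[d])[:k]
    # Softmax over the negated costs of the kept candidates only
    # (shifted by the minimum cost for numerical stability).
    floor = min(costs[d] for d in top)
    weights = [math.exp(-(costs[d] - floor)) for d in top]
    total = sum(weights)
    # Expected disparity under the renormalised distribution.
    return sum(d * w for d, w in zip(top, weights)) / total


# Toy cost curve: a clear minimum around disparities 1 and 2, plus a
# weaker secondary dip at disparity 7 (hypothetical values).
costs = [4.0, 1.0, 1.0, 4.0, 4.0, 4.0, 4.0, 2.0]

full = soft_argmin_topk(costs, k=len(costs))  # ordinary soft-argmin
topk = soft_argmin_topk(costs, k=2)           # top-2 before regression
print(full, topk)
```

With all candidates included, the secondary dip pulls the estimate away from the main minimum; restricting the regression to the top-k candidates keeps the estimate between the two best disparities.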
