Search Results - Inner Mongolia University Library

Conference on computer vision and pattern recognition (CVPR)

Authors: Bin Fu; Fanghua Yu; Anran Liu; Zixuan Wang; Jie Wen; Junjun He; Yu Qiao

Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; The University of Hong Kong; Harbin Institute of Technology, Shenzhen; Shanghai Artificial Intelligence Laboratory

ISBN: (digital) 9798350353006

ISBN: (print) 9798350353013

Few-shot font generation (FFG) produces stylized font images from a limited number of reference samples, which can significantly reduce labor costs in manual font design. Most existing FFG methods follow the style-content disentanglement paradigm and employ a Generative Adversarial Network (GAN) to generate target fonts by combining the decoupled content and style representations. These methods generate the complicated structure and the detailed style simultaneously, which may be sub-optimal for the FFG task. Inspired by the manual design process followed by most expert font designers, in this paper we model font generation as a multi-stage generative process. Specifically, as the injected noise and the data distribution in diffusion models can be well separated into different sub-spaces, we are able to incorporate the font transfer process into these models. Based on this observation, we generalize diffusion methods to model the font generative process by separating the reverse diffusion process into three stages with different functions: the structure construction stage first generates the structure information for the target character based on the source image, and the font transfer stage subsequently transforms the source font to the target font. Finally, the font refinement stage enhances the appearance and local details of the target font images. Based on the above multi-stage generative process, we construct our font generation framework, named MSD-Font, with a dual-network approach to generate font images. The superior performance demonstrates the effectiveness of our model. The code is available at: https://***/fubinfbIMSD-Font.

Keywords: Costs; Noise; Diffusion processes; Transforms; Manuals; Diffusion models; Generative adversarial networks

Non-Uniform Illumination Attack for Fooling Convolutional Neural Networks

arXiv, 2024

Authors: Jain, Akshay; Dubey, Shiv Ram; Singh, Satish Kumar; Santosh, K.C.; Chaudhuri, Bidyut Baran

Affiliations: The Computer Vision and Biometrics Lab, Department of Information Technology, Indian Institute of Information Technology Allahabad, Uttar Pradesh, Prayagraj 211015, India; The AI Research Lab, Department of Computer Science, University of South Dakota, Vermillion, SD 57069, United States; The Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata 700108, India

Convolutional Neural Networks (CNNs) have made remarkable strides; however, they remain susceptible to vulnerabilities, particularly in the face of minor image perturbations that humans can easily recognize. This weakness, often termed ‘attacks,’ underscores the limited robustness of CNNs and the need for research into fortifying their resistance against such manipulations. This study introduces a novel Non-Uniform Illumination (NUI) attack technique, where images are subtly altered using varying NUI masks. Extensive experiments are conducted on widely accepted datasets including CIFAR10, TinyImageNet, and CalTech256, focusing on image classification with 12 different NUI attack models. The resilience of VGG, ResNet, MobilenetV3-small and InceptionV3 models against NUI attacks is evaluated. Our results show a substantial decline in the CNN models’ classification accuracy when subjected to NUI attacks, indicating their vulnerability under non-uniform illumination. To mitigate this, a defense strategy is proposed that adds NUI-attacked images, generated through the new NUI transformation, to the training set. The results demonstrate a significant enhancement in CNN model performance when confronted with perturbed images affected by NUI attacks. This strategy seeks to bolster CNN models’ resilience against NUI attacks. Copyright © 2024, The Authors. All rights reserved.

Keywords: Convolutional neural networks
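
The abstract above describes altering images with non-uniform illumination masks but does not define the 12 masks it uses. As a rough illustration only, the sketch below assumes one simple mask: a gain that ramps linearly from left to right. The function names, the `strength` parameter, and the mask shape are hypothetical and are not taken from the paper.

```python
def linear_illumination_mask(height, width, strength=0.5):
    """Build a height x width gain mask (hypothetical mask shape).

    The gain ramps linearly from (1 - strength) at the left edge
    to (1 + strength) at the right edge.
    """
    mask = []
    for _ in range(height):
        row = []
        for x in range(width):
            # Horizontal position in [0, 1]; a single column gets a neutral gain.
            t = x / (width - 1) if width > 1 else 0.5
            row.append(1.0 - strength + 2.0 * strength * t)
        mask.append(row)
    return mask


def apply_mask(image, mask):
    """Multiply a grayscale image (rows of floats in [0, 1]) by the mask.

    Results are clipped back into [0, 1].
    """
    return [
        [min(1.0, max(0.0, pixel * gain)) for pixel, gain in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


if __name__ == "__main__":
    image = [[0.5, 0.5, 0.5], [0.5, 0.5, 0.5]]
    attacked = apply_mask(image, linear_illumination_mask(2, 3, strength=0.5))
    print(attacked)
```

Under the same assumption, the defense mentioned in the abstract would amount to adding such perturbed copies to the training set alongside the clean images.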

CodePhys: Robust Video-based Remote Physiological Measurement through Latent Codebook Querying

arXiv, 2025

Authors: Chu, Shuyang; Xia, Menghan; Yuan, Mengyao; Liu, Xin; Seppanen, Tapio; Zhao, Guoying; Shi, Jingang

Affiliations: The School of Software Engineering, Xi’an Jiaotong University, Xi’an, China; The Tencent AI Lab, Shenzhen, China; The Computer Vision and Pattern Recognition Laboratory, Lappeenranta-Lahti University of Technology LUT, Lappeenranta 53850, Finland; The Center for Machine Vision and Signal Analysis, University of Oulu, Finland

Remote photoplethysmography (rPPG) aims to measure physiological signals from facial videos without contact, and has shown great potential in many applications. Most existing methods directly extract video-based rPPG features by designing neural networks for heart rate estimation. Although they can achieve acceptable results, recovery of the rPPG signal faces intractable challenges when real-world interference affects the facial video. Specifically, facial videos are inevitably affected by non-physiological factors (e.g., camera device noise, defocus, and motion blur), leading to the distortion of extracted rPPG signals. Recent rPPG extraction methods are easily affected by interference and degradation, resulting in noisy rPPG signals. In this paper, we propose a novel method named CodePhys, which innovatively treats rPPG measurement as a code query task in a noise-free proxy space (i.e., codebook) constructed by ground-truth PPG signals. We consider noisy rPPG features as queries and generate high-fidelity rPPG features by matching them with noise-free PPG features from the codebook. Our approach also incorporates a spatial-aware encoder network with a spatial attention mechanism to highlight physiologically active areas and uses a distillation loss to reduce the influence of non-periodic visual interference. Experimental results on four benchmark datasets demonstrate that CodePhys outperforms state-of-the-art methods in both intra-dataset and cross-dataset settings. © 2025, CC BY.

Keywords: Heart
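
The abstract above treats a noisy feature as a query matched against noise-free entries in a codebook, without saying how the match is made. A minimal sketch of the general idea, assuming a plain nearest-neighbour lookup under squared Euclidean distance (as in vector quantization), could look as follows. The matching rule, function names, and toy vectors are illustrative assumptions, not the paper's method.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def query_codebook(query, codebook):
    """Return (index, entry) of the codebook entry closest to the query.

    Assumption: matching is a nearest-neighbour search; the paper may
    use a different (for example learned) matching rule.
    """
    best_index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(query, codebook[i]),
    )
    return best_index, codebook[best_index]


if __name__ == "__main__":
    # Toy "clean" feature vectors standing in for codebook entries.
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]
    noisy_query = [0.9, 1.2]
    print(query_codebook(noisy_query, codebook))
```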

KV Inversion: KV Embeddings Learning for Text-Conditioned Real Image Action Editing

arXiv, 2023

Authors: Huang, Jiancheng; Liu, Yifan; Qin, Jin; Chen, Shifeng

Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

Text-conditioned image editing is a recently emerged and highly practical task, and its potential is immeasurable. However, most concurrent methods are unable to perform action editing, i.e., they cannot produce results that conform to the action semantics of the editing prompt while preserving the content of the original image. To solve the problem of action editing, we propose KV Inversion, a method that achieves satisfactory reconstruction performance and action editing and addresses two major problems: 1) the edited result can match the corresponding action, and 2) the edited object can retain the texture and identity of the original real image. In addition, our method does not require training the Stable Diffusion model itself, nor does it require scanning a large-scale dataset to perform time-consuming training. Copyright © 2023, The Authors. All rights reserved.

Keywords: Textures

Bootstrap Diffusion Model Curve Estimation for High Resolution Low-Light Image Enhancement

arXiv, 2023

Authors: Huang, Jiancheng; Liu, Yifan; Chen, Shifeng

Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China; University of Chinese Academy of Sciences, Beijing, China

Learning-based methods have attracted a lot of research attention and led to significant improvements in low-light image enhancement. However, most of them still suffer from two main problems: expensive computational cost in high resolution images and unsatisfactory performance in simultaneous enhancement and denoising. To address these problems, we propose BDCE, a bootstrap diffusion model that exploits the learning of the distribution of the curve parameters instead of the normal-light image itself. Specifically, we adopt the curve estimation method to handle the high-resolution images, where the curve parameters are estimated by our bootstrap diffusion model. In addition, a denoise module is applied in each iteration of curve adjustment to denoise the intermediate enhanced result of each iteration. We evaluate BDCE on commonly used benchmark datasets, and extensive experiments show that it achieves state-of-the-art qualitative and quantitative performance. Copyright © 2023, The Authors. All rights reserved.

Keywords: Image enhancement
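
The abstract above describes iterative curve adjustment with a denoising step in each iteration, but it does not give the curve's form. The sketch below assumes the quadratic curve x + αx(1 − x) used by earlier curve-estimation work such as Zero-DCE. The curve choice, the per-iteration parameters `alphas`, and the identity `denoise` placeholder are all assumptions for illustration; in the paper the curve parameters are estimated by a diffusion model, which is not reproduced here.

```python
def adjust_curve(pixel, alphas, denoise=lambda value: value):
    """Iteratively brighten one pixel value in [0, 1].

    Each iteration applies the assumed quadratic curve
    value + alpha * value * (1 - value), then a denoising step.
    `denoise` is a hypothetical placeholder (identity by default).
    """
    value = pixel
    for alpha in alphas:
        value = value + alpha * value * (1.0 - value)
        value = denoise(value)
    return value


if __name__ == "__main__":
    # With alpha in [-1, 1] the output stays inside [0, 1].
    print(adjust_curve(0.5, [1.0]))        # one iteration
    print(adjust_curve(0.5, [1.0, 1.0]))   # two iterations
```

Because the curve is applied per pixel with only a few parameters, its cost grows with the number of pixels rather than with a full-resolution generative pass, which is consistent with the abstract's motivation for using curve estimation on high-resolution images.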

UNIFORMER: UNIFIED TRANSFORMER FOR EFFICIENT SPATIOTEMPORAL REPRESENTATION LEARNING

10th International Conference on Learning Representations, ICLR 2022

Authors: Li, Kunchang; Wang, Yali; Gao, Peng; Song, Guanglu; Liu, Yu; Li, Hongsheng; Qiao, Yu

Affiliations: ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, China; University of Chinese Academy of Sciences, China; Shanghai AI Laboratory, Shanghai, China; SenseTime Research; The Chinese University of Hong Kong, Hong Kong

It is a challenging task to learn rich and multi-scale spatiotemporal semantics from high-dimensional videos, due to large local redundancy and complex global dependency between video frames. The recent advances in this research have been mainly driven by 3D convolutional neural networks and vision transformers. Although 3D convolution can efficiently aggregate local context to suppress local redundancy from a small 3D neighborhood, it lacks the capability to capture global dependency because of the limited receptive field. Alternatively, vision transformers can effectively capture long-range dependency through the self-attention mechanism, but they are limited in reducing local redundancy because they blindly compare similarity among all the tokens in each layer. Based on these observations, we propose a novel Unified transFormer (UniFormer) which seamlessly integrates the merits of 3D convolution and spatiotemporal self-attention in a concise transformer format, and achieves a preferable balance between computation and accuracy. Different from traditional transformers, our relation aggregator can tackle both spatiotemporal redundancy and dependency, by learning local and global token affinity in shallow and deep layers, respectively. We conduct extensive experiments on the popular video benchmarks, e.g., Kinetics-400, Kinetics-600, and Something-Something V1&V2. With only ImageNet-1K pretraining, our UniFormer achieves 82.9%/84.8% top-1 accuracy on Kinetics-400/Kinetics-600, while requiring 10× fewer GFLOPs than other state-of-the-art methods. For Something-Something V1 and V2, our UniFormer achieves new state-of-the-art performances of 60.9% and 71.2% top-1 accuracy, respectively. Code is available at https://***/Sense-X/UniFormer. © 2022 ICLR 2022 - 10th International Conference on Learning Representations. All rights reserved.

Keywords: Redundancy

DegAE: A New Pretraining Paradigm for Low-Level Vision

Conference on computer vision and pattern recognition (CVPR)

Authors: Yihao Liu; Jingwen He; Jinjin Gu; Xiangtao Kong; Yu Qiao; Chao Dong

Affiliations: Shanghai Artificial Intelligence Laboratory; ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences; The University of Sydney

Self-supervised pretraining has achieved remarkable success in high-level vision, but its application in low-level vision remains ambiguous and not well-established. What is the primitive intention of pretraining? What is the core problem of pretraining in low-level vision? In this paper, we aim to answer these essential questions and establish a new pretraining scheme for low-level vision. Specifically, we examine previous pretraining methods in both high-level and low-level vision, and categorize current low-level vision tasks into two groups based on the difficulty of data acquisition: low-cost and high-cost tasks. Existing literature has mainly focused on pretraining for low-cost tasks, where the observed performance improvement is often limited. However, we argue that pretraining is more significant for high-cost tasks, where data acquisition is more challenging. To learn a general low-level vision representation that can improve the performance of various tasks, we propose a new pretraining paradigm called degradation autoencoder (DegAE). DegAE follows the philosophy of designing a pretext task for self-supervised pretraining and is elaborately tailored to low-level vision. With DegAE pretraining, SwinIR achieves a 6.88dB performance gain on the image dehaze task, while Uformer obtains 3.22dB and 0.54dB improvements on the dehaze and derain tasks, respectively.

Generalist Segmentation Algorithm for Photoreceptors Analysis in Adaptive Optics Imaging

arXiv, 2024

Authors: Kulyabin, Mikhail; Sindel, Aline; Pedersen, Hilde; Pedersen, Hilde R.; Gilson, Stuart; Baraas, Rigmor; Maier, Andreas

Affiliations: Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; National Centre for Optics Vision and Eye Care, Faculty of Health and Social Sciences, University of South-Eastern Norway, Kongsberg, Norway

Analyzing the cone photoreceptor pattern in images obtained from the living human retina using quantitative methods can be crucial for the early detection and management of various eye conditions. Confocal adaptive optics scanning light ophthalmoscope (AOSLO) imaging enables visualization of the cones from reflections of waveguiding cone photoreceptors. While there have been significant improvements in automated algorithms for segmenting cones in confocal AOSLO images, the process of labeling data remains labor-intensive and manual. This paper introduces a method based on deep learning (DL) for detecting and segmenting cones in AOSLO images. The models were trained on a semi-automatically labeled dataset of 20 AOSLO batches of images of 18 participants for 0°, 1°, and 2° from the foveal center. F1 scores were 0.968, 0.958, and 0.954 for 0°, 1°, and 2°, respectively, which is better than previously reported DL approaches. Our method minimizes the need for labeled data by only necessitating a fraction of labeled cones, which is especially beneficial in the field of ophthalmology, where labeled data can often be limited. © 2024, CC BY.

Keywords: Adaptive optics
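
The abstract above reports F1 scores for cone detection. For reference, F1 is the harmonic mean of precision and recall, which simplifies to 2·TP / (2·TP + FP + FN). The sketch below computes it from detection counts; the counts in the usage example are made up, and how the paper matches detected cones to ground-truth cones is not stated in the abstract.

```python
def f1_score(true_positives, false_positives, false_negatives):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when there are no detections or targets."""
    denominator = 2 * true_positives + false_positives + false_negatives
    if denominator == 0:
        return 0.0
    return 2 * true_positives / denominator


if __name__ == "__main__":
    # Hypothetical counts: 8 correctly detected cones, 2 spurious, 2 missed.
    print(f1_score(8, 2, 2))
```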

Automated Segmentation and Analysis of Cone Photoreceptors in Multimodal Adaptive Optics Imaging

arXiv, 2024

Authors: Shrestha, Prajol; Kulyabin, Mikhail; Sindel, Aline; Pedersen, Hilde R.; Gilson, Stuart; Baraas, Rigmor; Maier, Andreas

Affiliations: Pattern Recognition Lab, Department of Computer Science, Friedrich-Alexander-Universität Erlangen-Nürnberg, Erlangen, Germany; National Centre for Optics Vision and Eye Care, Faculty of Health and Social Sciences, University of South-Eastern Norway, Kongsberg, Norway

Accurate detection and segmentation of cone cells in the retina are essential for diagnosing and managing retinal diseases. In this study, we used advanced imaging techniques, including confocal and non-confocal split detector images from adaptive optics scanning light ophthalmoscopy (AOSLO), to analyze photoreceptors for improved accuracy. Precise segmentation is crucial for understanding each cone cell’s shape, area, and distribution. It helps to estimate the surrounding areas occupied by rods, which allows the calculation of the density of cone photoreceptors in the area of interest. In turn, density is critical for evaluating overall retinal health and functionality. We explored two U-Net-based segmentation models: StarDist for confocal and Cellpose for calculated modalities. Analyzing cone cells in images from two modalities and achieving consistent results demonstrates the study’s reliability and potential for clinical application. © 2024, CC BY.

Keywords: Adaptive optics

Anomaly Handwritten Text Detection for Automatic Descriptive Answer Evaluation

11th International Conference on Computing and Pattern Recognition, ICCPR 2022

Authors: Chatterjee, Nilanjana; Shivakumara, Palaiahnaakote; Pal, Umapada; Lu, Tong; Lu, Yue

Affiliations: Computer Vision and Pattern Recognition Unit, Indian Statistical Institute, Kolkata, India; Faculty of Computer Science and Information Technology, University of Malaya, Kuala Lumpur, Malaysia; National Key Lab for Novel Software Technology, Nanjing University, Nanjing, China; Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai, China

ISBN: (print) 9781450397056

Although there are advanced technologies for character recognition, automatic descriptive answer evaluation is an open challenge for the document image analysis community because handwritten text and answers to a question are highly diverse. This paper presents a novel method for detecting anomalous handwritten text in the responses written by students. The method is based on the observation that students who are confident in their answers usually write legibly and neatly, whereas students who are not confident write sloppily, which may make the text hard for the reader to understand. To detect such anomalous handwritten text, we explore a new combination of the Fourier transform and a deep learning model for detecting edges. The resulting edge image preserves the structure of the handwritten text. To extract features for classifying anomalous and normal text, the proposed method studies the behavior of the writing style, especially the variation at ascenders and descenders. To this end, the proposed work draws the principal axis of the edge image, which is invariant to rotation, scaling and, to some extent, distortion. With respect to the principal axis, the proposed method draws the medial axis using the uppermost and lowermost points. The distances between the medial axis and the principal axis points are used as the feature vector. The feature vector is then passed to an Artificial Neural Network for classification of anomalous text. The proposed method is evaluated on our own dataset, a standard dataset for gender identification (IAM) and a handwritten forgery detection dataset (ACPR 2019). The results on the different datasets show that the proposed work outperforms the existing methods. © 2022 ACM.

Keywords: Students
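
The abstract above draws a principal axis through the edge image but does not say how that axis is computed. One common way to obtain a principal axis from a set of 2D edge points is the dominant direction of their covariance (as in PCA), sketched below. Using the covariance-based axis here is an assumption, and the function name and toy points are hypothetical.

```python
import math


def principal_axis_angle(points):
    """Angle (radians) of the dominant direction of a set of 2D points.

    Uses the closed form for the leading eigenvector of the 2x2
    covariance matrix: 0.5 * atan2(2 * cov_xy, cov_xx - cov_yy).
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    cov_xx = sum((x - mean_x) ** 2 for x, _ in points) / n
    cov_yy = sum((y - mean_y) ** 2 for _, y in points) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points) / n
    return 0.5 * math.atan2(2.0 * cov_xy, cov_xx - cov_yy)


if __name__ == "__main__":
    # Toy edge points lying on the line y = x.
    edge_points = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0)]
    print(math.degrees(principal_axis_angle(edge_points)))
```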
