Search Results - Inner Mongolia University Library

arXiv, 2024

Authors: Feng, Hao; Wang, Wendi; Liu, Shaokai; Deng, Jiajun; Zhou, Wengang; Li, Houqiang. Affiliations: The CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China; The University of Adelaide, Australian Institute for Machine Learning, Australia

In this work, we present DeepEraser, an effective deep network for generic text removal. DeepEraser utilizes a recurrent architecture that erases the text in an image via iterative operations. Our idea comes from the process of erasing pencil script, where the text area designated for removal is subject to continuous monitoring and the text is attenuated progressively, ensuring a thorough and clean erasure. Technically, at each iteration, an innovative erasing module is deployed, which not only explicitly aggregates the previous erasing progress but also mines additional semantic context to erase the target text. Through iterative refinements, the text regions are progressively replaced with more appropriate content and finally converge to a relatively accurate status. Furthermore, a custom mask generation strategy is introduced to improve the capability of DeepEraser for adaptive text removal, as opposed to indiscriminately removing all the text in an image. DeepEraser is notably compact, with only 1.4M parameters, and is trained in an end-to-end manner. To verify its effectiveness, extensive experiments are conducted on several prevalent benchmarks, including SCUT-Syn, SCUT-EnsText, and the Oxford Synthetic text dataset. The quantitative and qualitative results demonstrate the effectiveness of DeepEraser compared with state-of-the-art methods, as well as its strong generalization ability in custom mask text removal. The code and pre-trained models are available at https://***/fh2019ustc/DeepEraser. Copyright © 2024, The Authors. All rights reserved.

Keywords: Iterative methods

Global Homography Motion Compensation for Versatile Video Coding

IEEE Visual Communications and Image Processing (VCIP)

Authors: Yao Li; Zhuoyuan Li; Li Li; Dong Liu; Houqiang Li. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781665475938

In Versatile Video Coding (VVC), local affine motion compensation (LAMC) is adopted to handle complex motions, such as rotation and zooming. However, it is inefficient to use LAMC to handle the global motion due to the following two reasons. First, the use of LAMC may lead to some extra bit cost on the affine motion model parameters. Second, the precision of LAMC is restricted by the MV precision of the control points. Therefore, in this paper, we propose a global homography motion compensation (GHMC) framework to better characterize the global motion. For each coding block, an extra mode is added to perform motion compensation based on an 8-parameter global homography motion model. In addition, an extrapolation scheme is designed to derive the parameters from reference frames to save the bit cost for signaling them. The proposed framework is implemented into the VVC reference software VTM-6.0. Experimental results show that, on average, 0.69% and 0.66% BD-rate reduction is achieved under Low Delay P and Low Delay B configurations, respectively, for sequences with rich complex global motions.

Keywords: Video coding; Extrapolation; Adaptation models; Costs; Image coding; Visual communication; Motion compensation
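
As a rough illustration of the 8-parameter global homography model mentioned in the abstract above, the following sketch maps a pixel position through a homography and derives the resulting per-pixel motion vector. The parameter ordering (h0..h7, with the ninth matrix entry fixed to 1) and the function names `warp_point` and `motion_vector` are assumptions made for this example; they are not taken from the paper or from the VTM software.

```python
from typing import Sequence, Tuple


def warp_point(h: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Map (x, y) through an 8-parameter homography.

    Assumed layout: the 3x3 matrix is
        [[h0, h1, h2],
         [h3, h4, h5],
         [h6, h7, 1 ]]
    so only eight values are free parameters.
    """
    h0, h1, h2, h3, h4, h5, h6, h7 = h
    w = h6 * x + h7 * y + 1.0  # projective denominator
    if abs(w) < 1e-12:
        raise ValueError("point maps to infinity under this homography")
    return (h0 * x + h1 * y + h2) / w, (h3 * x + h4 * y + h5) / w


def motion_vector(h: Sequence[float], x: float, y: float) -> Tuple[float, float]:
    """Displacement from (x, y) to its warped position."""
    xr, yr = warp_point(h, x, y)
    return xr - x, yr - y


if __name__ == "__main__":
    translation = (1.0, 0.0, 2.0, 0.0, 1.0, -1.0, 0.0, 0.0)
    print(motion_vector(translation, 10.0, 20.0))
```

With h6 = h7 = 0 the model reduces to a 6-parameter affine model; the two extra terms are what allow perspective effects to be described.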

Self-guided Few-Shot Semantic Segmentation for Remote Sensing Imagery Based on Large Vision Models

8th EAI International Conference on Machine Learning and Intelligent Communications, MLICOM 2023

Authors: Qi, Xiyu; Wu, Yifan; Mao, Yongqiang; Zhang, Wenhui; Zhang, Yidan. Affiliations: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China; Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China; School of Electronic Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China

ISBN: (print) 9783031717154

The Segment Anything Model (SAM) exhibits remarkable versatility and zero-shot learning abilities, owing largely to its extensive training data (SA-1B). Recognizing SAM's dependency on manual guidance given its category-agnostic nature, we identified unexplored potential within few-shot semantic segmentation tasks for remote sensing imagery. This research introduces a structured framework designed for the automation of few-shot semantic segmentation. It utilizes the SAM model and facilitates a more efficient generation of semantically discernible segmentation outcomes. Central to our methodology is a novel automatic prompt learning approach, leveraging a prior-guided mask to produce coarse pixel-wise prompts for SAM. Extensive experiments on the DLRSD dataset underline the superiority of our approach, which outperforms other available few-shot methodologies. © ICST Institute for Computer Sciences, Social Informatics and Telecommunications Engineering 2024.

Keywords: Semantic Segmentation

One-shot Generative Domain Adaptation in 3D GANs

arXiv, 2024

Authors: Li, Ziqiang; Wu, Yi; Wang, Chaoyue; Rui, Xue; Li, Bin. Affiliations: Nanjing University of Information Science and Technology, Nanjing, China; University of Science and Technology of China, Hefei, China; University of Sydney, Australia; CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

3D-aware image generation necessitates extensive training data to ensure stable training and mitigate the risk of overfitting. This paper first considers a novel task known as One-shot 3D Generative Domain Adaptation (GDA), aimed at transferring a pre-trained 3D generator from one domain to a new one, relying solely on a single reference image. One-shot 3D GDA is characterized by the pursuit of specific attributes, namely, high fidelity, large diversity, cross-domain consistency, and multi-view consistency. Within this paper, we introduce 3D-Adapter, the first one-shot 3D GDA method, for diverse and faithful generation. Our approach begins by judiciously selecting a restricted weight set for fine-tuning, and subsequently leverages four advanced loss functions to facilitate adaptation. An efficient progressive fine-tuning strategy is also implemented to enhance the adaptation process. The synergy of these three technological components empowers 3D-Adapter to achieve remarkable performance, substantiated both quantitatively and qualitatively, across all desired properties of 3D GDA. Furthermore, 3D-Adapter seamlessly extends its capabilities to zero-shot scenarios, and preserves the potential for crucial tasks such as interpolation, reconstruction, and editing within the latent space of the pre-trained generator. Code will be available at https://***/iceli1007/3D-Adapter. © 2024, CC BY.

Keywords: Generative adversarial networks

Light Field Compression Based on Implicit Neural Representation

Picture Coding Symposium, PCS

Authors: Henan Wang; Hanxin Zhu; Zhibo Chen. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: (print) 9781665492584

Light field, as a new data representation format in multimedia, has the ability to capture both the intensity and direction of light rays. However, the additional angular information also brings a large volume of data. Classical coding methods do not effectively describe the relationship between different views, leaving residual redundancy. To address this problem, we propose a novel light field compression scheme based on implicit neural representation to reduce redundancies between views. We store the information of a light field image implicitly in a neural network and adopt model compression methods to further compress the implicit representation. Extensive experiments have demonstrated the effectiveness of our proposed method, which achieves comparable rate-distortion performance as well as superior perceptual quality over traditional methods.

Keywords: Image coding; Redundancy; Pipelines; Neural networks; Rate-distortion; Light fields; Encoding

L-hypersurface Based Parameters Selection for Sparse SAR Imaging via Composite Regularization

2022 International Conference on Radar Systems, RADAR 2022

Authors: Fan, Yizhe; Xu, Zhongqiu; Zhou, Guoru; Zhang, Bingchen; Wu, Yirong. Affiliations: Aerospace Information Research Institute, Chinese Academy of Sciences, China; Key Laboratory of Technology in Geo-spatial Information Processing and Application System; School of Electronic Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100190, China

ISBN: (print) 9781839537776

Composite regularization models are widely used in sparse signal processing, which makes the selection of multiple regularization parameters a significant problem. Various kinds of composite regularization models are used in SAR imaging, including the L1 and TV penalty, the L1 and L2,1 penalty, etc. In this article, a new adaptive method for selecting multiple regularization parameters, named L-hypersurface, is proposed. The effectiveness of the proposed method is verified by experiments. Simulation experiments indicate that the selected regularization parameters yield satisfactory reconstruction results, both visually and numerically. Furthermore, experiments on Gaofen-3 SAR satellite data are also conducted to show the performance of the proposed method. © IET Conference Proceedings. All rights reserved.

Keywords: Synthetic aperture radar

AdaptVision: Dynamic Input Scaling in MLLMs for Versatile Scene Understanding

arXiv, 2024

Authors: Wang, Yonghui; Zhou, Wengang; Feng, Hao; Li, Houqiang. Affiliations: The CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Department of Electronic Engineering and Information Science, University of Science and Technology of China, Hefei 230027, China; Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, China

Over the past few years, the advancement of Multimodal Large Language Models (MLLMs) has captured the wide interest of researchers, leading to numerous innovations to enhance MLLMs' comprehension. In this paper, we present AdaptVision, a multimodal large language model specifically designed to dynamically process input images at varying resolutions. We hypothesize that the requisite number of visual tokens for the model is contingent upon both the resolution and content of the input image. Generally, natural images with a lower information density can be effectively interpreted by the model using fewer visual tokens at reduced resolutions. In contrast, images containing textual content, such as documents with rich text, necessitate a higher number of visual tokens for accurate text interpretation due to their higher information density. Building on this insight, we devise a dynamic image partitioning module that adjusts the number of visual tokens according to the size and aspect ratio of images. This method mitigates the distortion that arises from resizing images to a uniform resolution and dynamically optimizes the visual tokens fed to the LLM. Our model is capable of processing images with resolutions up to 1008 × 1008. Extensive experiments across various datasets demonstrate that our method achieves impressive performance in handling vision-language tasks in both natural and text-related scenes. The source code and dataset are now publicly available at https://***/harrytea/AdaptVision. Copyright © 2024, The Authors. All rights reserved.

Keywords: Modeling languages
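
The abstract above does not state the partitioning rule, so the following is only a hypothetical sketch of how a tile grid might be chosen from an image's size and aspect ratio. The 336-pixel tile size, the cap of 3 tiles per side (chosen so that the largest grid covers 1008 × 1008, the stated limit), and the names `choose_grid` and `num_tiles` are all assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Tuple

TILE = 336      # assumed tile side in pixels (hypothetical)
MAX_SIDE = 3    # assumed cap: 3 * 336 = 1008, the limit given in the abstract


def choose_grid(width: int, height: int,
                tile: int = TILE, max_side: int = MAX_SIDE) -> Tuple[int, int]:
    """Return (rows, cols) of tiles needed to cover the image, capped per side.

    Small images get a single tile; wide or tall images get more tiles along
    their longer dimension, so the grid roughly follows the aspect ratio.
    """
    if width <= 0 or height <= 0:
        raise ValueError("image dimensions must be positive")
    cols = min(max_side, max(1, math.ceil(width / tile)))
    rows = min(max_side, max(1, math.ceil(height / tile)))
    return rows, cols


def num_tiles(width: int, height: int) -> int:
    """Total tile count, a proxy for how many visual tokens are spent."""
    rows, cols = choose_grid(width, height)
    return rows * cols


if __name__ == "__main__":
    for size in [(300, 300), (1000, 400), (4000, 4000)]:
        print(size, choose_grid(*size), num_tiles(*size))
```

Under these assumptions a small natural image uses one tile, while a wide document page uses more tiles along its width, which is the qualitative behaviour the abstract describes.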

AN AUTOFOCUS NETWORK FOR MULTI-CHANNEL PHASE ERRORS WITH APPLICATION TO TOMOSAR IMAGING

IET International Radar Conference 2023, IRC 2023

Authors: Wang, Muhan; Gao, Silin; Zhang, Zhe; Qiu, Xiaolan. Affiliations: Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Intelligent Aerospace Big Data Application Technology, Suzhou 215123, China; Suzhou Aerospace Information Research Institute, Suzhou 215123, China; Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; School of Electronic Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China

ISBN: (print) 9781839539954

Synthetic aperture radar (SAR) tomography (TomoSAR) has garnered significant attention due to its capability for three-dimensional reconstruction. Compressed sensing (CS) methods are widely employed to address the TomoSAR inversion challenge. Nevertheless, practical applications reveal phase errors among different channels, resulting in defocusing and blurring when relying solely on CS for 3D reconstruction. Current state-of-the-art autofocus techniques suffer from prohibitive computational complexity, limiting their applicability to large-scale 3D imaging. In pursuit of efficient TomoSAR 3D autofocusing, we propose ASAMP-Net, an innovative deep unfolding network. Operating within a two-step framework, each layer comprises two stages: phase error estimation and iterative scattering coefficient reconstruction using the sparse adaptive matching pursuit (SAMP) algorithm. Additionally, phase error estimation is obtained through mathematical derivation, while challenges associated with fixed sparsity and limited efficiency in conventional methods are mitigated through deep learning techniques. Simulation experiments and real data validation affirm the effectiveness and superiority of the proposed method. © The Institution of Engineering & Technology 2023.

Keywords: Radar imaging

GRIDLESS DOA ESTIMATION FOR AUTOMOTIVE RADARS WITH VARIOUS ARRAY GEOMETRIES: THE NON-VANDERMONDE ATOMIC SOFT THRESHOLDING APPROACH

IET International Radar Conference 2023, IRC 2023

Authors: Gao, Silin; Wang, Muhan; Zhang, Zhe; Zhang, Bingchen; Wu, Yirong. Affiliations: Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100094, China; Key Laboratory of Technology in Geo-spatial Information Processing and Application System, Chinese Academy of Sciences, Beijing 100190, China; School of Electronic Electrical and Communication Engineering, University of Chinese Academy of Sciences, Beijing 100049, China; Key Laboratory of Intelligent Aerospace Big Data Application Technology, Suzhou 215123, China; Suzhou Aerospace Information Research Institute, Suzhou 215123, China

ISBN: (print) 9781839539954

This paper is centered on gridless direction of arrival (DoA) estimation for single-snapshot data collected by non-uniform linear arrays (NLAs) in the context of automotive applications. While recent single-snapshot DoA estimation algorithms rooted in grid-based compressed sensing (CS) offer super-resolution capabilities, they are notably sensitive to discrepancies between the assumed sparsity basis and the actual basis. Furthermore, the existing atomic soft thresholding (AST) algorithm, a promising gridless sparse recovery technique within the Toeplitz model, is constrained to uniform linear arrays (ULAs) exhibiting Vandermonde structure in their array manifolds. However, in automotive scenarios, it becomes imperative to efficiently apply gridless DoA estimation to NLAs featuring non-Vandermonde array manifolds. In this paper, we introduce the non-Vandermonde atomic soft thresholding (NVAST) algorithm and employ it for DoA estimation in NLAs. The novel algorithm establishes a link between Vandermonde and non-Vandermonde atoms by decomposing non-Vandermonde vectors and addressing them using semi-definite programming (SDP). Simulation and measurement experiments conducted with automotive radars underscore the remarkable performance of our proposed method. © The Institution of Engineering & Technology 2023.

Keywords: Automotive radar

Source-free Unsupervised Domain Adaptation for Blind Image Quality Assessment

arXiv, 2022

Authors: Liu, Jianzhao; Li, Xin; An, Shukun; Chen, Zhibo. Affiliation: The CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

Existing learning-based methods for blind image quality assessment (BIQA) are heavily dependent on large amounts of annotated training data, and usually suffer from a severe performance degradation when encountering the domain/distribution shift problem. Thanks to the development of unsupervised domain adaptation (UDA), some works attempt to transfer the knowledge from a label-sufficient source domain to a label-free target domain under domain shift with UDA. However, UDA requires the coexistence of source and target data, and access to the source data might be impractical due to privacy or storage issues. In this paper, we take the first step towards source-free unsupervised domain adaptation (SFUDA) for BIQA in a simple yet efficient manner, tackling the domain shift without access to the source data. Specifically, we cast the quality assessment task as a rating distribution prediction problem. Based on the intrinsic properties of BIQA, we present a group of well-designed self-supervised objectives to guide the adaptation of the BN affine parameters towards the target domain. Among them, minimizing the prediction entropy and maximizing the batch prediction diversity aim to encourage more confident results while avoiding the trivial solution. Besides, based on the observation that the IQA rating distribution of a single image follows the Gaussian distribution, we apply Gaussian regularization to the predicted rating distribution to make it more consistent with the nature of human scoring. Extensive experimental results under cross-domain scenarios demonstrate the effectiveness of our proposed method in mitigating the domain shift. Copyright © 2022, The Authors. All rights reserved.

Keywords: Forecasting
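
The two objectives named in the abstract above, minimizing prediction entropy and maximizing batch prediction diversity, can be sketched in a few lines. The formulation below (mean per-sample entropy minus the entropy of the batch-averaged distribution) is a common way to combine the two and is an assumption here; the paper's exact loss, weighting, and Gaussian regularization term are not reproduced. The names `entropy` and `adaptation_loss` are hypothetical.

```python
import math
from typing import List, Sequence


def entropy(p: Sequence[float], eps: float = 1e-12) -> float:
    """Shannon entropy (natural log) of a discrete distribution."""
    return -sum(q * math.log(q + eps) for q in p)


def adaptation_loss(batch: List[Sequence[float]]) -> float:
    """Assumed objective: confident per-image predictions, diverse batch.

    batch[i] is the predicted rating distribution for image i (sums to 1).
    Lower is better: the first term rewards peaked predictions, the second
    penalizes every image collapsing onto the same rating bin.
    """
    n = len(batch)
    k = len(batch[0])
    mean_entropy = sum(entropy(p) for p in batch) / n
    mean_dist = [sum(p[j] for p in batch) / n for j in range(k)]
    diversity = entropy(mean_dist)
    return mean_entropy - diversity


if __name__ == "__main__":
    collapsed = [[1.0, 0.0], [1.0, 0.0]]   # confident but trivial
    diverse = [[1.0, 0.0], [0.0, 1.0]]     # confident and diverse
    print(adaptation_loss(collapsed), adaptation_loss(diverse))
```

A batch whose predictions all collapse onto one bin scores worse than a confident but varied batch, which is the "trivial solution" the diversity term is meant to avoid.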
