Search Results - Inner Mongolia University Library

Multi-scale Residual Alignment Transformer for Remote Sensing Image Change Detection

IEEE Geoscience and Remote Sensing Letters, 2025

Authors: Li, Lei; Zhou, Xiao; Yu, Xiaodong; Gao, Bin; Yan, Guogang; Liu, Yidan; Cui, Tingting; Zhao, Guangyu; Hu, Yuxin. Affiliations: Chinese Academy of Sciences Key Laboratory of Technology in Geo-spatial Information Processing and Application System; Key Laboratory of Target Cognition and Application Technology; Aerospace Information Research Institute, Beijing 100094, China

Deep learning (DL) methods have recently shown great potential for remote sensing image change detection, but they still suffer from several limitations. Within the same semantic concept, significant but irrelevant changes in the surface texture, color, and spatial position of building objects, caused by variations in physical imaging factors, lead to feature inconsistency of building objects in bitemporal scenes. Conventional DL methods cannot effectively distinguish real changes from such irrelevant changes, which leads to false detections. This letter proposes a novel framework, the multi-scale residual alignment transformer (AlignFormer), to mitigate these issues. Specifically, inspired by the deformable attention mechanism, we first design an adaptive feature alignment module (AFAM) to suppress the inconsistency of feature pairs: by flexibly sampling keys/values for each given query, it adaptively focuses on building regions and effectively captures the spatial-temporal dependencies of relevant building objects in the feature pairs. In addition, we use an extremely tiny Swin Transformer as the backbone of a differencing-based framework to obtain hierarchical features. Moreover, inspired by residual learning, three AFAMs are integrated into the framework to form a multi-scale residual architecture for coarse-to-fine alignment of the paired features. Experimental results confirm the superiority of the proposed method over several state-of-the-art algorithms. Our code will be released at https://***/lilei-aircas/AlignFormer_CD. © 2004-2012 IEEE.
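The abstract describes AFAM as sampling keys/values flexibly for each query, in the spirit of deformable attention. As a rough illustration only, the following Python sketch shows a generic single-head version of that idea for one query: keys and values are bilinearly sampled from the other date's feature map at a reference point plus a set of offsets. It is not AFAM; the function names, the fixed offsets (a real deformable attention layer predicts them with a small network), and the projection matrices are assumptions made for the example.

```python
import numpy as np


def bilinear_sample(feat, y, x):
    """Bilinearly interpolate feat (H, W, C) at fractional location (y, x); zero outside the map."""
    H, W, C = feat.shape
    y0, x0 = int(np.floor(y)), int(np.floor(x))
    out = np.zeros(C)
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < H and 0 <= xx < W:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                out += weight * feat[yy, xx]
    return out


def deformable_attention(query, kv_map, ref, offsets, Wq, Wk, Wv):
    """Single-head attention for one query whose keys/values are sampled at ref + offsets.

    query: (C,), kv_map: (H, W, C), ref: (y, x), offsets: sequence of (dy, dx) pairs.
    The offsets are given here; in deformable attention they would be predicted (assumption).
    """
    samples = np.stack([bilinear_sample(kv_map, ref[0] + dy, ref[1] + dx)
                        for dy, dx in offsets])           # (P, C) sampled features
    q, k, v = query @ Wq, samples @ Wk, samples @ Wv
    logits = k @ q / np.sqrt(q.shape[-1])                  # (P,) scaled dot products
    attn = np.exp(logits - logits.max())
    attn /= attn.sum()                                     # softmax over the sampled points
    return attn @ v
```

In a bitemporal setting, `query` could come from the feature map of one date and `kv_map` from the other, so attention can follow a building that shifted slightly between acquisitions; this pairing is likewise an assumption, not a description of AlignFormer.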

Keywords: Change detection

360HRL: Hierarchical Reinforcement Learning Based Rate Adaptation for 360-Degree Video Streaming

IEEE Visual Communications and Image Processing (VCIP)

Authors: Jun Fu; Chen Hou; Zhibo Chen. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: 9781728173221 (print)

Recently, reinforcement learning (RL)-based adaptive bitrate (ABR) algorithms have achieved remarkable success in tile-based 360-degree video streaming. However, they rely heavily on accurate viewport prediction. To alleviate this issue, we propose a hierarchical RL-based ABR algorithm, dubbed 360HRL. Specifically, 360HRL consists of a top agent and a bottom agent. The top agent decides whether to download a new segment for continuous playback or to re-download an old segment to correct wrong bitrate decisions caused by inaccurate viewport estimation; the bottom agent selects bitrates for the tiles in the chosen segment. In addition, 360HRL adopts a two-stage training methodology. In the first stage, the bottom agent is trained in an environment where the top agent always chooses to download a new segment. In the second stage, the bottom agent is fixed and the top agent is optimized with the help of a heuristic decision rule. Experimental results demonstrate that 360HRL outperforms existing RL-based ABR algorithms across a broad range of network conditions and quality of experience (QoE) objectives.

Keywords: Training; Visual communication; Image processing; Bit rate; Estimation; Reinforcement learning; Streaming media

Analyzing Time Complexity of Practical Learned Image Compression Models

IEEE Visual Communications and Image Processing (VCIP)

Authors: Xiaohan Pan; Zongyu Guo; Zhibo Chen. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei, China

ISBN: 9781728173221 (print)

We have witnessed the rapid development of learned image compression (LIC). The latest LIC models outperform almost all traditional image compression standards in terms of rate-distortion (RD) performance. However, the time complexity of LIC models remains underexplored, which limits their practical application in industry. Even with GPU acceleration, LIC models still suffer from long coding times, especially on the decoder side. In this paper, we analyze and test several prevailing and representative LIC models and compare their complexity with traditional codecs, including H.265/HEVC intra and H.266/VVC intra. We provide a comprehensive analysis of every module in the LIC models and investigate how changes in bitrate affect coding time. We observe that the time complexity bottleneck lies mainly in entropy coding and context modelling. Although this paper focuses on experimental statistics, our analysis offers insights for further accelerating LIC models, such as modifying models for parallel computing, model pruning, and designing more parallel context models.
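To make the serial bottleneck of context modelling concrete, the sketch below counts how many sequential decoding passes a latent of size H x W needs when every position whose context is already decoded is processed in parallel. It compares strict one-symbol-at-a-time raster decoding, a 3x3 masked (PixelCNN-style) causal context, and a two-pass checkerboard context as one example of a more parallel context model. The context shapes and function names are illustrative assumptions, not the models benchmarked in the paper.

```python
from functools import lru_cache


def num_sequential_steps(H, W, parents):
    """Longest dependency chain over an H x W latent, i.e. the minimum number of
    sequential passes when all positions whose context is decoded run in parallel."""

    @lru_cache(maxsize=None)
    def step(i, j):
        inside = [(a, b) for a, b in parents(i, j) if 0 <= a < H and 0 <= b < W]
        return 1 + max((step(a, b) for a, b in inside), default=0)

    return max(step(i, j) for i in range(H) for j in range(W))


def serial_raster(W):
    """One symbol at a time: position k waits for position k - 1 in raster order."""
    def parents(i, j):
        k = i * W + j
        return [divmod(k - 1, W)] if k > 0 else []
    return parents


def causal_3x3(i, j):
    """3x3 masked (PixelCNN-style) context: left, top-left, top and top-right neighbours."""
    return [(i, j - 1), (i - 1, j - 1), (i - 1, j), (i - 1, j + 1)]


def checkerboard(i, j):
    """Two-pass context: anchors ((i + j) even) have no spatial context; the other
    positions condition on their four anchor neighbours."""
    if (i + j) % 2 == 0:
        return []
    return [(i - 1, j), (i + 1, j), (i, j - 1), (i, j + 1)]


for name, ctx in [("serial", serial_raster(8)), ("causal 3x3", causal_3x3),
                  ("checkerboard", checkerboard)]:
    print(name, num_sequential_steps(8, 8, ctx))
```

For an 8 x 8 latent this prints 64, 22 and 2 passes respectively: the causal 3x3 context already allows a diagonal wavefront (W + 2(H - 1) passes), while the checkerboard split reduces decoding to a constant two passes at the cost of giving the anchor positions no spatial context.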

Keywords: Analytical models; Image coding; Codecs; Visual communication; Computational modeling; Rate-distortion; Decoding

Multi-dimensional Graph Linear Canonical Transform

arXiv, 2024

Authors: Li, Na; Zhang, Zhichao; Han, Jie; Chen, Yunjie; Cao, Chunzheng. Affiliations: School of Mathematics and Statistics; Center for Applied Mathematics of Jiangsu Province; Jiangsu International Joint Laboratory on System Modeling and Data Analysis, Nanjing University of Information Science and Technology, Nanjing 210044, China; Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai 200240, China; Key Laboratory of Computational Science and Application of Hainan Province, Haikou 571158, China; School of Remote Sensing and Geomatics Engineering, Nanjing University of Information Science and Technology, Nanjing 210044, China

Many multi-dimensional (M-D) graph signals appear in the real world, such as digital images, sensor network measurements, and temperature records from weather observation stations. A key challenge is to design a transform method for processing these M-D graph signals in the linear canonical transform domain. This paper proposes the two-dimensional graph linear canonical transform based on the central discrete dilated Hermite functions (2-D CDDHFs-GLCT) and the two-dimensional graph linear canonical transform based on the chirp multiplication-chirp convolution-chirp multiplication decomposition (2-D CM-CC-CM-GLCT). These are then extended to M-D CDDHFs-GLCT and M-D CM-CC-CM-GLCT, which are compared in terms of computational complexity, additivity, and reversibility. Theoretical analysis shows that the computational complexity of the M-D CM-CC-CM-GLCT algorithm is markedly reduced. Simulation results indicate that M-D CM-CC-CM-GLCT achieves additivity comparable to that of M-D CDDHFs-GLCT while exhibiting better reversibility. Finally, M-D GLCT is applied to data compression to demonstrate its practical advantages. The experimental results show the superiority of M-D GLCT in the algorithm design and implementation of data compression. Copyright © 2024, The Authors. All rights reserved.
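For background, the CM-CC-CM name refers to a standard factorization of the classical (continuous, one-dimensional) linear canonical transform. With parameter matrix entries a, b, c, d satisfying ad - bc = 1 and b ≠ 0, and in the convention where a chirp multiplication (CM) is a lower-triangular shear and a chirp convolution (CC) is an upper-triangular shear, the parameter matrix splits as shown below. How the authors carry this over to graph signals is not stated in the abstract; the block only records the classical identity, and sign conventions differ between texts.

```latex
\[
\begin{pmatrix} a & b \\ c & d \end{pmatrix}
=
\underbrace{\begin{pmatrix} 1 & 0 \\ \frac{d-1}{b} & 1 \end{pmatrix}}_{\text{CM}}
\underbrace{\begin{pmatrix} 1 & b \\ 0 & 1 \end{pmatrix}}_{\text{CC}}
\underbrace{\begin{pmatrix} 1 & 0 \\ \frac{a-1}{b} & 1 \end{pmatrix}}_{\text{CM}},
\qquad ad - bc = 1,\quad b \neq 0.
\]
\[
\text{lower-left entry of the product: }\;
\frac{(d-1)\,a + (a-1)}{b} = \frac{ad - 1}{b} = c .
\]
```

Reading right to left, the signal is multiplied by one chirp, convolved with a chirp, and multiplied by a second chirp. Each factor is either a pointwise multiplication or a convolution, which is consistent with the reduced complexity reported above.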

Keywords: Chirp modulation

Graph Chirp Signal and Graph Fractional Vertex-Frequency Energy Distribution

arXiv, 2025

Authors: Cui, Manjun; Zhang, Zhichao. Affiliations: School of Mathematics and Statistics; The Center for Applied Mathematics of Jiangsu Province; The Jiangsu International Joint Laboratory on System Modeling and Data Analysis, Nanjing University of Information Science and Technology, Nanjing 210044, China; School of Mathematics and Statistics, Nanjing University of Information Science and Technology, Nanjing 210044, China; Key Laboratory of System Control and Information Processing, Ministry of Education, Shanghai Jiao Tong University, Shanghai 200240, China; Key Laboratory of Computational Science and Application of Hainan Province, Hainan Normal University, Haikou 571158, China

Graph signal processing (GSP) has emerged as a powerful framework for analyzing data on irregular domains. In recent years, many classical signal processing (SP) techniques have been successfully extended to GSP. Among these, chirp signals play a crucial role in various SP applications. However, despite their importance, graph chirp signals have not been formally defined. Here, we define graph chirp signals and establish a comprehensive theoretical framework for their analysis. We propose the graph fractional vertex-frequency energy distribution (GFED), which provides a powerful tool for processing and analyzing graph chirp signals. We also introduce the general fractional graph distribution (GFGD), a generalized vertex-frequency distribution, and the reduced-interference GFED, which suppresses cross-term interference and enhances signal clarity. Furthermore, we propose a novel method for detecting graph signals through filtering in the GFED domain, enabling robust detection and analysis of graph chirp signals in noisy environments. Moreover, when applied to real-world data, this method denoises more effectively than some state-of-the-art methods, further demonstrating its practical significance. Copyright © 2025, The Authors. All rights reserved.
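The abstract does not define the fractional transform underlying GFED. A common definition of the graph fractional Fourier transform (GFRFT), assumed here for illustration, takes the graph Fourier transform matrix F = U^T built from the Laplacian eigenvectors and raises it to a fractional power alpha through its eigendecomposition. Because F is orthogonal, its complex Schur form is diagonal, which gives a numerically stable way to compute F^alpha. The function names and the choice of the combinatorial Laplacian are assumptions; this is not the authors' code.

```python
import numpy as np
from scipy.linalg import schur


def graph_fourier_matrix(A):
    """GFT matrix U^T from the combinatorial Laplacian L = D - A of an undirected graph."""
    L = np.diag(A.sum(axis=1)) - A
    _, U = np.linalg.eigh(L)  # columns of U: Laplacian eigenvectors, ascending eigenvalues
    return U.T


def graph_fractional_fourier(F, alpha):
    """F**alpha for an orthogonal GFT matrix F (assumed eigendecomposition-based GFRFT)."""
    # F is normal, so its complex Schur form T is (numerically) diagonal: F = Z T Z^H
    T, Z = schur(F.astype(complex), output="complex")
    lam = np.diag(T)
    lam_alpha = np.exp(alpha * np.log(lam))  # principal branch of lam**alpha
    return Z @ np.diag(lam_alpha) @ Z.conj().T


# Usage on a 5-node path graph: F_half @ x is a "half-way" spectrum of a signal x
A = np.diag(np.ones(4), 1) + np.diag(np.ones(4), -1)
F = graph_fourier_matrix(A)
F_half = graph_fractional_fourier(F, 0.5)
```

Under this definition the transform is index-additive, F^a F^b = F^(a+b): applying alpha = 0.5 twice reproduces the ordinary GFT, and alpha = 0 gives the identity.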

Keywords: Wiener filtering

Domain-class correlation decomposition for generalizable person re-identification

arXiv, 2021

Authors: Yang, Kaiwen; Tian, Xinmei. Affiliation: CAS Key Laboratory of Technology in Geo-spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

Domain generalization in person re-identification is an important and practical task in which a model trained with data from several source domains is expected to generalize well to unseen target domains. Domain adversarial learning is a promising domain generalization method that aims to remove domain information from the latent representation through adversarial training. However, in person re-identification, domain and class are correlated, and we show theoretically that domain adversarial learning loses certain class information because of this domain-class correlation. Inspired by causal inference, we propose performing interventions on the domain factor d to decompose the domain-class correlation. To this end, we estimate the representation z∗ resulting from the intervention through first- and second-order statistical characteristic matching. Specifically, we build a memory bank to store the statistical characteristics of each domain. Then, we use the newly generated samples {z∗, y, d∗} to compute the loss function. Because the domain-class correlation is decomposed in these samples, we can learn a domain-invariant representation that captures more class-related features. Extensive experiments show that our model outperforms state-of-the-art methods on the large-scale domain generalization Re-ID benchmark. Copyright © 2021, The Authors. All rights reserved.
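The abstract says z∗ is estimated by matching first- and second-order statistics stored per domain in a memory bank, but gives no formula. A minimal sketch, assuming a per-dimension mean/standard-deviation swap similar to AdaIN and running-average updates of the bank, looks like this; the class and method names, the momentum value, and the eps term are hypothetical.

```python
import numpy as np


class DomainStatsBank:
    """Hypothetical memory bank of per-domain feature mean/std, kept as running averages."""

    def __init__(self, num_domains, dim, momentum=0.9):
        self.mu = np.zeros((num_domains, dim))
        self.sigma = np.ones((num_domains, dim))
        self.momentum = momentum

    def update(self, z, d):
        """Fold a batch z (batch, dim) of features from domain d into the stored statistics."""
        m = self.momentum
        self.mu[d] = m * self.mu[d] + (1 - m) * z.mean(axis=0)
        self.sigma[d] = m * self.sigma[d] + (1 - m) * z.std(axis=0)

    def intervene(self, z, d, d_star, eps=1e-6):
        """Re-express features of domain d with the first/second-order statistics of d_star."""
        z_norm = (z - self.mu[d]) / (self.sigma[d] + eps)   # strip domain-d statistics
        return z_norm * self.sigma[d_star] + self.mu[d_star]  # impose domain-d_star statistics
```

A sample (z, y, d) then yields (intervene(z, d, d_star), y, d_star) for some other domain d_star, so the class label is kept while the domain statistics change.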

Keywords: Machine learning

Attribute Artifacts Removal for Geometry-based Point Cloud Compression

arXiv, 2021

Authors: Sheng, Xihua; Li, Li; Liu, Dong; Xiong, Zhiwei. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

Geometry-based point cloud compression (G-PCC) can achieve remarkable compression efficiency for point clouds. However, it still introduces serious attribute compression artifacts, especially at low bitrates. In this paper, we propose a Multi-Scale Graph Attention Network (MS-GAT) to remove the artifacts of point cloud attributes compressed by G-PCC. We first construct a graph based on the point cloud geometry coordinates and then use Chebyshev graph convolutions to extract features of the point cloud attributes. Since a point may be correlated with points both near to and far from it, we propose a multi-scale scheme to capture the short- and long-range correlations between the current point and its neighboring and distant points. Because different points may suffer different degrees of artifacts under adaptive quantization, we feed the quantization step of each point to the network as an extra input. We also incorporate a weighted graph attention layer into the network to pay special attention to points with more attribute artifacts. To the best of our knowledge, this is the first attribute artifact removal method for G-PCC. We validate the effectiveness of our method on various point clouds. Objective comparison results show that our method achieves average BD-rate reductions of 9.74% compared with Predlift and 10.13% compared with RAHT. Subjective comparison results show that visual artifacts such as color shifting, blurring, and quantization noise are reduced. Copyright © 2021, The Authors. All rights reserved.
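MS-GAT builds a graph from point coordinates and applies Chebyshev graph convolutions. The sketch below shows the standard K-term Chebyshev filter (ChebNet-style) on a k-nearest-neighbour graph; the Gaussian edge weights, the bandwidth, the value of k, and the function names are assumptions for the example, and the multi-scale branches, quantization-step input, and attention layer of MS-GAT are not reproduced.

```python
import numpy as np


def knn_adjacency(coords, k):
    """Symmetric k-nearest-neighbour graph with Gaussian edge weights (assumed weighting)."""
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    sigma2 = d2.mean()                      # hypothetical bandwidth choice
    np.fill_diagonal(d2, np.inf)            # exclude self-loops from the neighbour search
    A = np.zeros_like(d2)
    for i in range(len(coords)):
        nbrs = np.argsort(d2[i])[:k]
        A[i, nbrs] = np.exp(-d2[i, nbrs] / sigma2)
    return np.maximum(A, A.T)               # symmetrize


def cheb_conv(L, x, theta):
    """Chebyshev graph convolution: sum_k T_k(L_tilde) @ x @ theta[k].

    L: (n, n) graph Laplacian, x: (n, c_in) point attributes, theta: (K, c_in, c_out) weights.
    """
    lam_max = np.linalg.eigvalsh(L).max()
    L_tilde = 2.0 * L / lam_max - np.eye(L.shape[0])    # rescale spectrum into [-1, 1]
    T_prev, T_cur = x, L_tilde @ x                       # T_0 x and T_1 x
    out = T_prev @ theta[0]
    if len(theta) > 1:
        out = out + T_cur @ theta[1]
    for k in range(2, len(theta)):
        T_prev, T_cur = T_cur, 2.0 * L_tilde @ T_cur - T_prev   # T_k = 2 L~ T_{k-1} - T_{k-2}
        out = out + T_cur @ theta[k]
    return out
```

The recursion keeps a K-term filter K-hop localized and avoids a full eigendecomposition in the forward pass; for simplicity the sketch still obtains the largest eigenvalue with `eigvalsh`.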

Keywords: Convolution

aiWave: Volumetric Image Compression with 3-D Trained Affine Wavelet-like Transform

arXiv, 2022

Authors: Xue, Dongmei; Ma, Haichuan; Li, Li; Liu, Dong; Xiong, Zhiwei. Affiliations: The CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China; The Institute of Artificial Intelligence, Hefei Comprehensive National Science Center, Hefei 230088, China

Volumetric image compression has become an urgent task for effectively transmitting and storing images produced in biological research and clinical practice. At present, the most commonly used volumetric image compression methods are based on the wavelet transform, such as JP3D. However, JP3D employs an ideal, separable, global, and fixed wavelet basis to convert input images from the pixel domain to the frequency domain, which seriously limits its performance. In this paper, we first design a 3-D trained wavelet-like transform to enable a signal-dependent and non-separable transform. Then, an affine wavelet basis is introduced to capture the various local correlations in different regions of volumetric images. Furthermore, we embed the proposed wavelet-like transform into an end-to-end compression framework called aiWave to enable an adaptive compression scheme for various datasets. Last but not least, we introduce weight-sharing strategies for the affine wavelet-like transform, based on the characteristics of volumetric data along the axial direction, to reduce the number of parameters. The experimental results show that: 1) when our trained 3-D affine wavelet-like transform is combined with a simple factorized entropy coding module, aiWave performs better than JP3D with comparable encoding and decoding complexity; 2) when a context module is added to further remove signal redundancy, aiWave achieves much better performance than HEVC. Copyright © 2022, The Authors. All rights reserved.
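For comparison with the trained transform, a fixed wavelet of the kind the abstract criticizes can be written with the lifting scheme. The sketch below is one level of the 1-D LeGall 5/3 wavelet (the filter used for reversible coding in JPEG 2000), in a floating-point form with symmetric boundary extension. Treating aiWave as a lifting-based design with learned predict/update steps is an assumption here; the abstract only says its transform is trained, 3-D, non-separable, and affine. The function names are hypothetical.

```python
import numpy as np


def lifting_53_forward(x):
    """One level of the LeGall 5/3 lifting wavelet (float version) on an even-length 1-D signal."""
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    even_next = np.append(even[1:], even[-1])   # symmetric extension at the right edge
    d = odd - 0.5 * (even + even_next)          # predict step -> detail (high-pass) coefficients
    d_prev = np.insert(d[:-1], 0, d[0])         # symmetric extension at the left edge
    s = even + 0.25 * (d_prev + d)              # update step -> approximation (low-pass) coefficients
    return s, d


def lifting_53_inverse(s, d):
    """Undo the update step, then the predict step, and re-interleave the samples."""
    d_prev = np.insert(d[:-1], 0, d[0])
    even = s - 0.25 * (d_prev + d)
    even_next = np.append(even[1:], even[-1])
    odd = d + 0.5 * (even + even_next)
    x = np.empty(2 * len(s))
    x[0::2], x[1::2] = even, odd
    return x
```

Because each step only adds a function of the other half of the samples, the inverse simply undoes the steps in reverse order, so reconstruction is exact whatever the predict and update functions are. This property is what allows lifting-style transforms to replace fixed filters with trained ones without losing invertibility.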

Keywords: Image compression

LIQA: Lifelong blind image quality assessment

arXiv, 2021

Authors: Liu, Jianzhao; Zhou, Wei; Xu, Jiahua; Li, Xin; An, Shukun; Chen, Zhibo. Affiliation: CAS Key Laboratory of Technology in Geo-Spatial Information Processing and Application System, University of Science and Technology of China, Hefei 230027, China

Existing blind image quality assessment (BIQA) methods are mostly designed in a one-off manner and cannot adaptively evolve with unseen distortions, which greatly limits the deployment and application of BIQA models in real-world scenarios. To address this problem, we propose a novel Lifelong blind Image Quality Assessment (LIQA) approach, aiming to achieve lifelong learning for BIQA. Without access to previous training data, the proposed LIQA can not only learn new distortions but also mitigate catastrophic forgetting of seen distortions. Specifically, we adopt a Split-and-Merge distillation strategy to train a single-head network that makes task-agnostic predictions. In the split stage, we first employ a distortion-specific generator to obtain pseudo features of each seen distortion. Then, we use an auxiliary multi-head regression network to generate the predicted quality for each seen distortion. In the merge stage, we replay the pseudo features paired with pseudo labels to distill the knowledge of the multiple heads into the final single regression head. Experimental results demonstrate that the proposed LIQA method can handle continuous shifts across different distortion types and even datasets. More importantly, our LIQA model achieves stable performance even when the task sequence is long. Copyright © 2021, The Authors. All rights reserved.
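The merge stage replays pseudo features with pseudo labels to distill several task-specific heads into one. In the simplest case of linear heads and a mean-squared-error distillation loss, that step reduces to least squares, as in this toy sketch; the Gaussian generators, linear heads, and function name are hypothetical stand-ins for the paper's distortion-specific generator and regression heads.

```python
import numpy as np


def merge_heads(generators, heads, n_per_task=256):
    """Toy merge stage: distill several linear regression heads into one linear head.

    generators[t](n) -> (n, dim) pseudo features for seen distortion t (hypothetical);
    heads[t] -> (dim,) weights of that distortion's quality head (hypothetical).
    Returns (dim + 1,) weights of a single task-agnostic head, bias last.
    """
    feats, labels = [], []
    for gen, head in zip(generators, heads):
        f = gen(n_per_task)              # replay pseudo features of one seen distortion
        feats.append(f)
        labels.append(f @ head)          # pseudo quality labels from the task-specific head
    X = np.vstack(feats)
    X = np.hstack([X, np.ones((len(X), 1))])       # bias column for the single head
    y = np.concatenate(labels)
    w, *_ = np.linalg.lstsq(X, y, rcond=None)       # minimizes the MSE distillation loss
    return w


# Usage with three hypothetical "distortions" whose pseudo features differ in mean
rng = np.random.default_rng(0)
gens = [lambda n, m=m: rng.normal(loc=m, size=(n, 4)) for m in (-1.0, 0.0, 2.0)]
heads = [rng.normal(size=4) for _ in range(3)]
w_single = merge_heads(gens, heads)
```

When the heads disagree, the single head can only approximate them all; how well it does depends on how separable the replayed feature distributions are, which is where a learned nonlinear student would be needed.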

Keywords: Distillation