When transmission medium and compression degradation are intertwined, new challenges emerge. This study addresses the problem of raindrop removal from compressed images, where raindrops obscure large areas of the background and compression leads to the loss of high-frequency (HF) information. Restoring the former requires global contextual information, while the latter necessitates guidance from high-frequency details, creating a conflict between these two types of information in the design of existing methods. To address this issue, we propose a novel transformer architecture that leverages the advantages of the attention mechanism and an HF-friendly design to effectively restore compressed raindrop images at the framework, component, and module levels. Specifically, at the framework level, we integrate relative-position multi-head self-attention and convolutional layers into the proposed low-high-frequency transformer (LHFT), where the former captures global contextual information and the latter focuses on high-frequency information. Their combination effectively resolves the issue of mixed degradation. At the component level, we employ high-frequency depth-wise convolution (HFDC) with zero-mean kernels to improve the extraction of high-frequency features, drawing inspiration from typical high-frequency filters such as the Prewitt and Sobel operators. Finally, at the module level, we introduce a low-high-attention module (LHAM) to adaptively allocate the importance of low and high frequencies along channels for effective fusion. We establish a JPEG-compressed raindrop image dataset and conduct extensive experiments at different compression rates. Experimental results demonstrate that the proposed method outperforms state-of-the-art methods without increasing computational cost.
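The zero-mean constraint of HFDC can be illustrated with a short PyTorch sketch. The class name and the mean-subtraction re-parameterization below are our assumptions for illustration; the paper's exact formulation may differ:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ZeroMeanDepthwiseConv2d(nn.Module):
    """Depth-wise convolution whose kernels are constrained to sum to zero."""
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        self.weight = nn.Parameter(0.1 * torch.randn(channels, 1, kernel_size, kernel_size))
        self.groups = channels
        self.padding = kernel_size // 2

    def forward(self, x):
        # Subtracting the per-kernel mean makes each kernel zero-mean, so it
        # rejects the DC component and responds only to high frequencies,
        # as the Prewitt and Sobel operators do.
        w = self.weight - self.weight.mean(dim=(2, 3), keepdim=True)
        return F.conv2d(x, w, padding=self.padding, groups=self.groups)

conv = ZeroMeanDepthwiseConv2d(8)
flat = torch.full((1, 8, 32, 32), 5.0)       # a purely low-frequency (constant) input
out = conv(flat)
print(out[..., 1:-1, 1:-1].abs().max())      # ~0 away from borders: DC is filtered out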
Tampered images can easily be used for illegal activities, such as spreading rumors, economic fraud, fabricating false news, and illegally obtaining experience benefits. With the development of artificial intelligence (AI), image manipulation technology has also advanced, and more and more retouching software in daily life adopts AI techniques. So far, however, there has been no dataset of AI-based tampered images. To address this challenge, we propose a dataset, IPM15K, which utilizes the most advanced image processing technology and contains a total of 15,000 doctored images. The dataset could also serve as a catalyst for many vision tasks, e.g., localization, segmentation, and alpha matting. Additionally, we propose an effective multi-feature fusion identification network (MFI-Net) to identify these challenging images. Our model consists of four modules: the detail extraction module (DEM), which utilizes convolutions of different sizes and receptive fields to extract more valuable information about tampered locations; the multi-branch attention fusion module (MAFM), which fully exploits contextual information at different levels to capture subtle traces of tampering; the feature decoder component (FDC), which combines the fused features to identify tampered regions; and the detail enhancement block (DEB), which further supplements the detailed information of the detected regions. Extensive experiments on three public datasets and the proposed dataset show that MFI-Net outperforms various state-of-the-art (SOTA) manipulation detection baselines.
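As a rough illustration of the DEM idea, parallel convolutions with different kernel sizes and receptive fields fused by a 1x1 convolution, consider the following PyTorch sketch; the block structure and names are our assumptions, not the paper's exact design:

import torch
import torch.nn as nn

class MultiScaleDetailBlock(nn.Module):
    """Parallel convolutions with growing kernel sizes/receptive fields,
    fused by concatenation and a 1x1 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv2d(in_ch, out_ch // 4, k, padding=k // 2)
            for k in (1, 3, 5, 7)                 # four receptive-field scales
        ])
        self.fuse = nn.Conv2d(out_ch, out_ch, 1)  # 1x1 conv merges the branches

    def forward(self, x):
        return self.fuse(torch.cat([b(x) for b in self.branches], dim=1))

y = MultiScaleDetailBlock(3, 64)(torch.randn(1, 3, 64, 64))
print(y.shape)  # torch.Size([1, 64, 64, 64])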
In low-bitrate audio coding, modern coders often rely on efficient parametric techniques to enhance the performance of the waveform-preserving transform coder core. While the latter features well-known perceptually adapted quantization of spectral coefficients, parametric techniques reconstruct the signal parts that have been quantized to zero by the encoder to meet the low-bitrate constraint. Large numbers of zeroed spectral values, and especially consecutive zeros constituting gaps, often lead to audible artifacts at the decoder. To avoid such artifacts, the new 3GPP Enhanced Voice Services (EVS) coding standard utilizes noise filling and intelligent gap filling (IGF) techniques, guided by spectral envelope information. In this paper, the underlying considerations of the parametric energy adjustment and transmission in EVS and their relation to noise filling, IGF, and tonality preservation are presented. It is further shown that complex-valued IGF envelope calculation in the encoder improves the temporal energy stability of some signals while retaining real-valued decoder-side processing.
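Why a complex-valued envelope is temporally stabler can be seen in a toy NumPy experiment. We use a plain DFT in place of the MDCT/MDST pair of EVS, so this illustrates the principle rather than the standard's actual processing:

import numpy as np

fs, n = 16000, 256
t = np.arange(8 * n) / fs
x = np.sin(2 * np.pi * 1015.0 * t)        # tone not bin-aligned: phase drifts per frame
frames = x.reshape(8, n) * np.hanning(n)  # eight consecutive analysis frames

spec = np.fft.rfft(frames, axis=1)        # complex spectrum of each frame
band = slice(12, 21)                      # an arbitrary scale-factor band around the tone

real_energy = np.sum(spec[:, band].real ** 2, axis=1)    # real-part-only band energy
cplx_energy = np.sum(np.abs(spec[:, band]) ** 2, axis=1) # complex (magnitude) band energy

print(real_energy / real_energy.mean())   # fluctuates from frame to frame
print(cplx_energy / cplx_energy.mean())   # nearly constant: phase-independent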
Existing deep learning-based steganography detection methods utilize convolution to automatically capture and learn steganographic features, yielding higher detection efficiency compared to manually designed steganography detection methods. Detection methods based on convolutional neural network frameworks can extract global features by increasing the network's depth and width, but such frameworks are not highly sensitive to global features and can lead to significant resource consumption. This manuscript proposes a lightweight steganography detection method based on multiple residual structures and a transformer (ResFormer). A multi-residual block based on channel rearrangement is designed in the preprocessing layer, where multiple residuals are used to enrich the residual features and channel shuffle is used to enhance the feature representation capability. A lightweight convolutional and transformer feature extraction backbone is constructed, which reduces the computational and parameter complexity of the network by employing depth-wise separable convolutions. This backbone integrates local and global image features through the fusion of convolutional layers and the transformer, enhancing the network's ability to learn global features and effectively enriching feature diversity. An effective weighted loss function is introduced for learning both local and global features: the Bias Loss function is used to give full play to the role of feature diversity in classification, and the cross-entropy loss function and contrastive loss function are organically combined to enhance the expressive ability of features. Based on BOSSbase-1.01, BOWS2, and ALASKA#2, extensive experiments are conducted on stego images generated by spatial- and JPEG-domain adaptive steganographic algorithms, employing both classical and state-of-the-art steganalysis techniques. The experimental results demonstrate that, compared to the SRM, SRNet, SiaStegNet, CSANet, LWENet, and SiaIRNet methods, the proposed ResFormer achieves superior detection performance.
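Two of the named ingredients, channel shuffle and depth-wise separable convolution, are standard building blocks. A minimal PyTorch sketch of both (our own illustration, not the paper's code):

import torch
import torch.nn as nn

def channel_shuffle(x, groups):
    # Rearranges channels so information flows across groups: reshape to
    # (B, g, C/g, H, W), swap the two channel axes, and flatten back.
    b, c, h, w = x.shape
    return x.view(b, groups, c // groups, h, w).transpose(1, 2).reshape(b, c, h, w)

class DepthwiseSeparableConv(nn.Module):
    """3x3 depth-wise conv followed by a 1x1 point-wise conv: far fewer
    multiply-adds and parameters than a dense 3x3 convolution."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.dw = nn.Conv2d(in_ch, in_ch, 3, padding=1, groups=in_ch)
        self.pw = nn.Conv2d(in_ch, out_ch, 1)

    def forward(self, x):
        return self.pw(self.dw(x))

x = torch.randn(2, 32, 16, 16)
y = DepthwiseSeparableConv(32, 64)(channel_shuffle(x, groups=4))
print(y.shape)  # torch.Size([2, 64, 16, 16])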
ISBN (print): 9781510679344; 9781510679351
Current video coding standards, including H.264/AVC, HEVC, and VVC, utilize the discrete cosine transform (DCT) and discrete sine transform (DST) to decorrelate the intra-prediction residuals. However, these transforms often face challenges in effectively decorrelating signals with complex, non-smooth, and non-periodic structures. Even in smooth areas, an abrupt transition (due to noise or prediction artifacts) can limit their effectiveness. This paper presents a novel block-adaptive separable path graph-based transform (GBT) that is particularly adept at handling such signals. The method adaptively modifies the block size and learns the GBT to enhance performance. The GBT is learned online using sequential K-means clustering, where each available block size has K clusters and K GBT kernels. This approach allows the GBT for the current block to be dynamically learned from previously reconstructed areas with the same block size and similar characteristics. Our evaluation, integrating this method with H.264/AVC intra-coding tools, shows significant improvement over the traditional H.264/AVC DCT in processing high-resolution natural images.
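The core of a path GBT, taking the eigenvectors of a graph Laplacian over a line graph as the transform basis, fits in a few lines of NumPy. The edge weights and block size below are illustrative stand-ins for the learned, cluster-specific weights:

import numpy as np

def path_gbt(weights):
    """GBT basis for a path graph: eigenvectors of the graph Laplacian.
    `weights` holds the N-1 edge weights between consecutive samples."""
    n = len(weights) + 1
    W = np.zeros((n, n))
    idx = np.arange(n - 1)
    W[idx, idx + 1] = W[idx + 1, idx] = weights
    L = np.diag(W.sum(axis=1)) - W           # combinatorial Laplacian L = D - W
    eigvals, eigvecs = np.linalg.eigh(L)     # eigenvectors in ascending frequency order
    return eigvecs.T                         # rows are the transform basis vectors

# Uniform edge weights recover the DCT-II basis (up to sign); a weak edge
# (e.g. across a discontinuity in the residual) adapts the basis to it.
T_uniform = path_gbt(np.ones(7))
T_adapted = path_gbt(np.array([1, 1, 1, 0.1, 1, 1, 1.0]))
residual = np.random.randn(8)
print(T_adapted @ residual)                  # transform coefficients of the residual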
Iris recognition for identity authentication and verification is one of the most precise and accepted biometrics in the world. The use of portable iris systems, mostly in law enforcement applications, has been increasing rapidly. A portable device, however, may be restricted to a narrow-bandwidth communication channel to transmit the iris code or iris image. Though a full-resolution iris image is preferred for accurate recognition of individuals, image compression should be used to minimize the image size and thus the transmission time over a narrow-bandwidth channel for emergency identification. This paper investigates the effects of compression on iris images in the wavelet domain, using the spatial-orientation tree wavelet (STW), embedded zerotree wavelet (EZW), and set partitioning in hierarchical trees (SPIHT) coders, to identify the most suitable image compression scheme. The Haar wavelet transform is utilized for image decomposition and compression, with varying decomposition levels. The results are examined in terms of peak signal-to-noise ratio (PSNR), mean square error (MSE), bits per pixel (BPP), and compression ratio (CR). It is found that wavelet-based compression is effective, as recognition performance is minimally affected, and that the Haar transform is well suited to this task. The CASIA and MMU iris databases have been used for this purpose.
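The evaluation pipeline can be sketched with PyWavelets: a multi-level Haar decomposition, coefficient truncation standing in for the STW/EZW/SPIHT embedded coders, reconstruction, and MSE/PSNR measurement. The 5% retention threshold and the random image are placeholders:

import numpy as np
import pywt  # PyWavelets

img = np.random.rand(128, 128) * 255          # stand-in for an iris image

# Multi-level Haar decomposition; `level` is the decomposition depth
# varied in the experiments.
coeffs = pywt.wavedec2(img, 'haar', level=3)
arr, slices = pywt.coeffs_to_array(coeffs)

# Crude embedded-style compression: keep only the largest 5% of coefficients.
thresh = np.percentile(np.abs(arr), 95)
arr_c = np.where(np.abs(arr) >= thresh, arr, 0.0)

rec = pywt.waverec2(pywt.array_to_coeffs(arr_c, slices, output_format='wavedec2'),
                    'haar')
mse = np.mean((img - rec) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse)
print(f"MSE={mse:.2f}  PSNR={psnr:.2f} dB")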
Deep neural networks (DNNs) have shown great potential in no-reference image quality assessment (NR-IQA). However, the annotation of NR-IQA is labor-intensive and time-consuming, which severely limits its application, especially for authentic images. To relieve the dependence on quality annotation, some works have applied unsupervised domain adaptation (UDA) to NR-IQA. However, these methods ignore the fact that the alignment space used in classification is sub-optimal, since that space is not elaborately designed for perception. To address this challenge, we propose an effective perception-oriented unsupervised domain adaptation method, StyleAM (Style Alignment and Mixup), for NR-IQA, which transfers sufficient knowledge from label-rich source-domain data to label-free target-domain images. Specifically, we find a more compact and reliable space, the feature style space, for perception-oriented UDA, based on an interesting observation: the feature style (i.e., the mean and variance) of deep layers in DNNs is closely associated with the quality score in NR-IQA. Therefore, we propose to align the source and target domains in this more perception-oriented space, the feature style space, to reduce interference from quality-irrelevant feature factors. Furthermore, to increase the consistency (i.e., the ordinal/continuous characteristics) between quality scores and feature styles, we also propose a novel feature augmentation strategy, Style Mixup, which mixes the feature styles (i.e., the mean and variance) before the last layer of the DNN together with mixing their labels. Extensive experimental results on many cross-domain settings (e.g., synthetic to authentic, and multiple distortions to one distortion) demonstrate the effectiveness of our proposed StyleAM on NR-IQA.
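Style Mixup as described, interpolating the channel-wise means and standard deviations of two samples' deep features together with their labels, can be sketched as follows. The AdaIN-style re-stylization and the Beta-distributed mixing coefficient are our assumptions:

import torch

def style_mixup(feat_a, feat_b, label_a, label_b, alpha=0.4):
    """Mix the feature styles (channel-wise mean and std) of two samples and
    mix their quality labels with the same coefficient.
    Features: (B, C, H, W); labels: (B,)."""
    lam = torch.distributions.Beta(alpha, alpha).sample()

    mu_a, sig_a = feat_a.mean((2, 3), keepdim=True), feat_a.std((2, 3), keepdim=True)
    mu_b, sig_b = feat_b.mean((2, 3), keepdim=True), feat_b.std((2, 3), keepdim=True)

    mu_mix = lam * mu_a + (1 - lam) * mu_b      # interpolated style statistics
    sig_mix = lam * sig_a + (1 - lam) * sig_b

    # Normalize feat_a, then re-stylize it with the mixed statistics
    # (AdaIN-style): content is kept while the style is interpolated.
    mixed_feat = sig_mix * (feat_a - mu_a) / (sig_a + 1e-6) + mu_mix
    mixed_label = lam * label_a + (1 - lam) * label_b
    return mixed_feat, mixed_label

f, g = torch.randn(4, 64, 8, 8), torch.randn(4, 64, 8, 8)
mf, ml = style_mixup(f, g, torch.rand(4), torch.rand(4))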
Recent advances in Synthetic Aperture Radar (SAR) sensors and innovative advanced imaging techniques have enabled SAR systems to acquire very high-resolution images with wide swaths, large bandwidth, and multiple polarization channels. The improved capabilities of SAR systems also imply a significant increase in SAR data acquisition rates, such that efficient and effective compression methods become necessary. The compression of SAR raw data plays a crucial role in addressing the challenges posed by downlink and memory limitations onboard SAR satellites and directly affects the quality of the generated SAR image. Neural data compression techniques using deep models have attracted much interest for natural image compression tasks and demonstrated promising results. In this study, neural data compression is extended into the complex domain to develop a Complex-Valued (CV) autoencoder-based data compression method for SAR raw data. To this end, the fundamentals of data compression and Rate-Distortion (RD) theory are reviewed; the well-known Block Adaptive Quantization (BAQ) and JPEG2000 compression methods are implemented and tested for SAR raw data compression; and a neural data compression method based on CV autoencoders is developed for SAR raw data. Furthermore, since the available Sentinel-1 SAR raw products are already compressed with Flexible Dynamic BAQ (FDBAQ), an adaptation procedure is applied to the decoded SAR raw data to generate SAR raw data with quasi-uniform quantization whose statistics resemble those of the uncompressed SAR raw data onboard the satellite.
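Of the baselines, BAQ is simple enough to sketch directly. The NumPy toy below scales each block by its own statistics and applies a uniform low-bit quantizer to I and Q; operational BAQ instead uses Lloyd-Max optimal levels, so treat this as a simplification:

import numpy as np

def baq(raw, block=256, bits=3):
    """Toy Block Adaptive Quantization of complex SAR raw data: each block is
    normalized by its own standard deviation, then I and Q are uniformly
    quantized to 2**bits levels and dequantized (decoder side)."""
    half = 2 ** (bits - 1)
    rec = np.empty_like(raw)
    for s in range(0, raw.size, block):
        blk = raw[s:s + block]
        step = (np.std(blk.real) + np.std(blk.imag)) / half + 1e-12  # per-block gain
        qi = np.clip(np.round(blk.real / step), -half, half - 1)     # quantized I
        qq = np.clip(np.round(blk.imag / step), -half, half - 1)     # quantized Q
        rec[s:s + block] = (qi + 1j * qq) * step                     # dequantize
    return rec

raw = (np.random.randn(4096) + 1j * np.random.randn(4096)).astype(np.complex64)
rec = baq(raw)
snr = 10 * np.log10(np.mean(np.abs(raw) ** 2) / np.mean(np.abs(raw - rec) ** 2))
print(f"{snr:.1f} dB reconstruction SNR at 3 bits per I/Q sample")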
Semantic communication (SC) is an emerging communication paradigm that transmits only task-related semantic features to receivers, offering advantages in speed. However, existing robust steganography cannot extract messages correctly after SC. To address this issue, we propose a novel steganography framework for SC based on Generative Adversarial Networks (GANs), called "Image Semantic Steganography". Our framework embeds messages into semantic features to guarantee extraction, while considering both pixel-level and semantic-level distortions to enhance security. Experimental results show that our framework not only achieves successful message extraction and behavioral covertness during and after SC, but also does not impact the implementation of SC.
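The dual-distortion objective suggests a training loss of the following shape. This is a speculative sketch: the weights, the MSE/BCE choices, and all tensor names are our assumptions rather than the paper's formulation:

import torch
import torch.nn.functional as F

def stego_loss(cover_img, stego_img, cover_feat, stego_feat, msg, msg_pred,
               w_pix=1.0, w_sem=1.0, w_msg=10.0):
    """Combined objective for semantic-feature steganography: pixel-level
    distortion + semantic-level distortion + message recovery."""
    pixel_d = F.mse_loss(stego_img, cover_img)        # pixel-level distortion
    semantic_d = F.mse_loss(stego_feat, cover_feat)   # semantic-level distortion
    msg_loss = F.binary_cross_entropy_with_logits(msg_pred, msg)  # extraction term
    return w_pix * pixel_d + w_sem * semantic_d + w_msg * msg_loss

msg = torch.randint(0, 2, (2, 64)).float()
loss = stego_loss(torch.rand(2, 3, 64, 64), torch.rand(2, 3, 64, 64),
                  torch.randn(2, 256), torch.randn(2, 256),
                  msg, torch.randn(2, 64))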
We propose DeepPCC, an end-to-end learning-based approach for the lossy compression of large-scale object point clouds. For both the geometry and attribute components, we introduce the Multiscale Neighborhood Information Aggregation (NIA) mechanism, which applies resolution downscaling progressively (i.e., dyadic downsampling of geometry and average pooling of attributes) and combines sparse convolution and local self-attention at each resolution scale for effective feature representation. Under a simple autoencoder structure, scale-wise NIA blocks are stacked as the analysis and synthesis transforms in the encoder-decoder pair to best characterize spatial neighbors for accurate approximation of geometry occupancy probability and attribute intensity. Experiments demonstrate that DeepPCC remarkably outperforms the state-of-the-art rule-based MPEG G-PCC and learning-based solutions both quantitatively and qualitatively, providing strong evidence that DeepPCC is a promising solution for emerging AI-based PCC.
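The progressive downscaling step, dyadic downsampling of geometry with average pooling of attributes, can be illustrated in NumPy. DeepPCC itself operates on sparse tensors with sparse convolutions; the dense sketch below shows only the pooling logic:

import numpy as np

def downscale_point_cloud(coords, attrs):
    """One dyadic downscaling step: integer coordinates are halved, and the
    attributes of points falling into the same coarse voxel are averaged."""
    coarse = coords // 2                              # dyadic downsampling of geometry
    uniq, inv = np.unique(coarse, axis=0, return_inverse=True)
    inv = inv.ravel()
    pooled = np.zeros((len(uniq), attrs.shape[1]))
    np.add.at(pooled, inv, attrs)                     # sum attributes per coarse voxel
    counts = np.bincount(inv, minlength=len(uniq))[:, None]
    return uniq, pooled / counts                      # average pooling of attributes

coords = np.random.randint(0, 64, (1000, 3))          # voxelized geometry
attrs = np.random.rand(1000, 3)                       # e.g. RGB attributes
c2, a2 = downscale_point_cloud(coords, attrs)
print(len(c2), "coarse voxels from", len(coords), "points")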