Search Results - Inner Mongolia University Library

Distribution-flexible subset quantization for post-quantizing super-resolution networks

Science China (Information Sciences), 2025, Vol. 68, No. 3, pp. 163-180

Authors: Yunshan ZHONG; Mingbao LIN; Jingjing XIE; Yuxin ZHANG; Fei CHAO; Rongrong JI. Affiliations: Institute of Artificial Intelligence, Xiamen University; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University; Tencent Youtu Lab; Department of Artificial Intelligence, School of Informatics, Xiamen University; Peng Cheng Laboratory

This paper introduces distribution-flexible subset quantization (DFSQ), a post-training quantization method for super-resolution networks. Our motivation for developing DFSQ is based on the distinctive activation distributions of current super-resolution models, which exhibit significant variance across samples and channels. To address this issue, DFSQ conducts channel-wise normalization of the activations and applies distribution-flexible subset quantization (SQ), wherein the quantization points are selected from a universal set consisting of multi-word additive log-scale values. To expedite the selection of quantization points in SQ, we propose a fast quantization points selection strategy that uses K-means clustering to select the quantization points closest to the centroids. Compared to the common iterative exhaustive search algorithm, our strategy avoids the enumeration of all possible combinations in the universal set, reducing the time complexity from exponential to linear. Consequently, the constraint of time costs on the size of the universal set is greatly relaxed. Extensive evaluations of various super-resolution models show that DFSQ effectively improves performance even without fine-tuning. For example, for 4-bit EDSR×2 on the Urban benchmark, DFSQ obtains 0.242 dB PSNR gains.
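The selection step described in the abstract (cluster the activations with K-means, then snap each centroid to the nearest member of a universal set) can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the construction of the universal set from two "words" of power-of-two values, the function names, and the quantile-based K-means initialisation are all assumptions made for this example.

```python
import itertools
import numpy as np

def build_universal_set(word_a, word_b):
    """Hypothetical universal set: signed pairwise sums of two log-scale 'words'."""
    sums = {a + b for a, b in itertools.product(word_a, word_b)}
    signed = sums | {-s for s in sums}
    return np.array(sorted(signed))

def kmeans_1d(x, k, iters=50):
    """Plain Lloyd's algorithm on scalars, initialised at evenly spaced quantiles."""
    centroids = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centroids[None, :]), axis=1)
        for j in range(k):
            members = x[labels == j]
            if members.size:  # keep the old centroid if a cluster is empty
                centroids[j] = members.mean()
    return np.sort(centroids)

def select_quantization_points(activations, universal_set, bits):
    """Pick, for each centroid, the closest value in the universal set."""
    centroids = kmeans_1d(activations, 2 ** bits)
    nearest = np.argmin(
        np.abs(universal_set[None, :] - centroids[:, None]), axis=1
    )
    # Two centroids may snap to the same value, so fewer than 2**bits
    # distinct points can come back.
    return np.unique(universal_set[nearest])

def quantize(x, points):
    """Round every activation to its nearest selected quantization point."""
    idx = np.argmin(np.abs(x[:, None] - points[None, :]), axis=1)
    return points[idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(0.0, 0.3, size=2000)  # stand-in for normalized activations
    universal = build_universal_set([0, 2**-1, 2**-2, 2**-3], [0, 2**-2, 2**-4])
    points = select_quantization_points(acts, universal, bits=2)
    print("selected points:", points)
    print("mean abs error:", np.mean(np.abs(acts - quantize(acts, points))))
```

The cost of this sketch grows with the number of activations times the number of clusters, plus one nearest-neighbour lookup per centroid in the universal set, so no subset of the universal set is ever enumerated.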

Keywords: super-resolution; post-training quantization; distribution-flexible subset quantization; neural network

Aligning Text-to-Image Diffusion Models without Human Feedback

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Liu, Tao; Kuang, Huafeng; Lin, Xianming. Affiliation: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, China

ISBN: (print) 9798350368741

Incorporating human feedback to optimize text-to-image models has demonstrated significant effectiveness. However, the process of collecting high-quality human preference labels is both resource-intensive and time-consuming. To address this challenge, we propose a novel approach that leverages a large language model (LLM) to generate sophisticated prompts, guiding the diffusion model towards enhanced image generation. This process inherently produces ranking pairs that approximate human preferences. We further introduce a novel integration of AI feedback with a Supervised Fine-Tuning (SFT) policy, aligning the model with preference labels derived from AI. Our experiments demonstrate that our approach achieves a notable approximation of human preferences, achieving a performance level of 68.13% compared to human-level benchmarks and delivering competitive results. Furthermore, we showcase the synergistic effects of combining AI feedback with human feedback, resulting in further improvements in image quality. This research offers fresh insights into AI feedback learning within text-to-image generation and lays the groundwork for more efficient and cost-effective training methodologies. © 2025 IEEE.

Keywords: AI feedback; diffusion model; human feedback; large language model

Efficient Infrared Image Super-Resolution Reconstruction via Guided Filter Coefficients Estimation with Parallax Attention Mechanism

2025 IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP 2025

Authors: Wu, Qingyao; Chen, Bosheng; Li, Chen; Tu, Xiaotong; Ding, Xinghao; Huang, Yue. Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, China; National Key Laboratory of Infrared Detection Technologies, Shanghai, China

ISBN: (print) 9798350368741

Due to the spectral range mismatch between the images, building an efficient infrared (IR) image super-resolution algorithm suitable for embedded devices remains a significant challenge. Given that visible images possess more abundant high-frequency information than infrared images, we utilize the visible light to guide infrared image super-resolution reconstruction. Specifically, we transfer the reconstruction task to a guided filter learning process, whose coefficients are estimated by joint learning of visible and infrared images to complete the reconstruction through homologous constraints. In order to efficiently predict guided filter coefficients, we design a lightweight network which incorporates reparameterized differential convolution blocks and a feature fusion strategy. To enhance the performance of the fusion strategy, we utilize a parallax attention mechanism to solve the non-pixel registration problem between infrared and visible images. Extensive experiments on two challenging IR image datasets show that our method outperforms current state-of-the-art approaches in terms of PSNR, SSIM and LPIPS, while showing its effectiveness and practicality on the RK3588 edge platform. © 2025 IEEE.

Keywords: attention mechanism; image reconstruction; infrared image; lightweight

Learning Based Interference Coordination for Maritime Communications

China Communications, 2025, Vol. 22, No. 4, pp. 356-374

Authors: Liu Chuhuan; Xiao Liang; Chen Yifan; Li Siyao; Yang Helin; Lyu Zefang. Affiliations: Department of Information and Communication Engineering, Xiamen University, Xiamen 361005, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen 361005, China

With the boom in maritime activities, the need for highly reliable maritime communication is becoming urgent, which is an important component of 5G/6G communication ***, the bandwidth reuse characteristic of 5G/6G networks will inevitably lead to severe interference, resulting in degradation in the communication performance of maritime *** this paper, we propose a safe deep reinforcement learning based interference coordination scheme to jointly optimize the power control and bandwidth allocation in maritime communication systems, and exploit the quality-of-service requirements of users as the risk value references to evaluate the communication *** particular, this scheme designs a deep neural network to select the communication policies through the evaluation network and update the parameters using the target network, which improves the communication performance and speeds up the convergence ***, the Nash equilibrium of the interference coordination game and the computational complexity of the proposed scheme are *** and experimental results verify the performance gain of the proposed scheme compared with benchmarks.

Keywords: bandwidth allocation; interference coordination; maritime communication; power control; reinforcement learning

M3ixup: A multi-modal data augmentation approach for image captioning

Pattern Recognition, 2025, Vol. 158

Authors: Li, Yinan; Ji, Jiayi; Sun, Xiaoshuai; Zhou, Yiyi; Luo, Yunpeng; Ji, Rongrong. Affiliation: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, 361005, China

Despite the great success, most models in image captioning (IC) are still stuck in the dilemma of generating simple and non-discriminative captions. In this paper, we study this problem from the perspective of data augmentation and propose a novel method called Multi-modal Mixup (M3ixup). Compared with the original Mixup strategy designed for image classification, the proposed M3ixup has three novel designs to mix IC samples from the aspects of visual features, sentence embeddings and loss values, respectively. In practice, M3ixup can not only enrich the diversity of IC training data, but also enforce the model to focus more on visual information for captioning, thereby alleviating the negative effect of dataset bias and addressing the issue of simple captioning. To validate M3ixup, we apply it to three baseline models and conduct extensive experiments on MS COCO. The experimental results demonstrate that our proposed M3ixup can not only improve the discriminability and quality of generated captions, but also help the baseline models obtain obvious performance gains, i.e., improving the CIDEr scores of the state-of-the-art model from 133.8 to 135.3 on off-line testing and 135.4 to 137.1 on online testing. © 2024
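For orientation, the original Mixup interpolation that the abstract contrasts against can be applied to the three quantities it names (visual features, sentence embeddings, and loss values) as sketched below. This shows only the generic convex-combination idea under assumed names and shapes; the specific mixing designs of M3ixup are not given in the abstract and are not reproduced here.

```python
import numpy as np

def mixup_pair(a, b, lam):
    """Convex combination used by the original Mixup: lam * a + (1 - lam) * b."""
    return lam * a + (1.0 - lam) * b

def mix_captioning_samples(feat_a, feat_b, emb_a, emb_b, loss_a, loss_b,
                           alpha=0.4, rng=None):
    """Mix two image-captioning samples with a single Beta-distributed weight.

    All argument names and the default alpha are hypothetical; the features
    and embeddings are assumed to already have matching shapes.
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = float(rng.beta(alpha, alpha))  # mixing weight in [0, 1]
    return {
        "lam": lam,
        "visual": mixup_pair(feat_a, feat_b, lam),    # mixed visual features
        "sentence": mixup_pair(emb_a, emb_b, lam),    # mixed sentence embeddings
        "loss": mixup_pair(loss_a, loss_b, lam),      # mixed per-sample loss
    }

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feat_a, feat_b = rng.normal(size=(36, 2048)), rng.normal(size=(36, 2048))
    emb_a, emb_b = rng.normal(size=512), rng.normal(size=512)
    mixed = mix_captioning_samples(feat_a, feat_b, emb_a, emb_b, 2.1, 3.4, rng=rng)
    print("lambda:", mixed["lam"], "mixed loss:", mixed["loss"])
```

Sharing one weight across the three quantities keeps the mixed target consistent with the mixed input, which is the usual motivation for Mixup-style augmentation.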

Keywords: Modal analysis

Monte Carlo Tree Search Based Prompt Autogeneration for Jailbreak Attacks against LLMs

31st International Conference on Computational Linguistics, COLING 2025

Authors: Wu, Suhuang; Wang, Huimin; Zhao, Yutian; Wu, Xian; Zheng, Yefeng; Li, Wei; Li, Hui; Ji, Rongrong. Affiliations: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China; Tencent Jarvis Lab, China; Medical Artificial Intelligence Lab, Westlake University, China; Faculty of Computing, Harbin Institute of Technology, China

ISBN: (print) 9798891761964

Jailbreak attacks craft specific prompts or append adversarial suffixes to prompts, thereby inducing language models to generate harmful or unethical content and bypassing the model's safety guardrails. With the recent blossom of large language models (LLMs), there's a growing focus on jailbreak attacks to probe their safety. While current white-box attacks typically focus on meticulously identifying adversarial suffixes for specific models, their effectiveness and efficiency diminish when applied to different LLMs. In this paper, we propose a Monte Carlo Tree Search (MCTS) based Prompt Auto-generation (MPA) method to enhance the effectiveness and efficiency of attacks across various models. MPA automatically searches for and generates adversarial suffixes for valid jailbreak attacks. Specifically, we first identify a series of action candidates that could potentially trick LLMs into providing harmful responses. To streamline the exploration of adversarial suffixes, we design a prior confidence probability for each MCTS node. We then iteratively auto-generate adversarial prompts using the MCTS framework. Extensive experiments on multiple open-source models (like Llama, Gemma, and Mistral) and closed-source models (such as ChatGPT) show that our proposed MPA surpasses existing methods in search efficiency as well as attack effectiveness. The codes are available at https://***/KDEGroup/MPA. © 2025 Association for Computational Linguistics.

Keywords: Guard rails

LIGHTMOTION: A LIGHT AND TUNING-FREE METHOD FOR SIMULATING CAMERA MOTION IN VIDEO GENERATION

arXiv, 2025

Authors: Song, Quanjian; Lin, Zhihang; Zeng, Zhanpeng; Zhang, Ziyue; Cao, Liujuan; Ji, Rongrong. Affiliation: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China

Existing camera motion-controlled video generation methods face computational bottlenecks in fine-tuning and inference. This paper proposes LightMotion, a light and tuning-free method for simulating camera motion in video generation. Operating in the latent space, it eliminates additional fine-tuning, inpainting, and depth estimation, making it more streamlined than existing methods. The endeavors of this paper comprise: (i) The latent space permutation operation effectively simulates various camera motions like panning, zooming, and rotation. (ii) The latent space resampling strategy combines background-aware sampling and cross-frame alignment to accurately fill new perspectives while maintaining coherence across frames. (iii) Our in-depth analysis shows that the permutation and resampling cause an SNR shift in latent space, leading to poor-quality generation. To address this, we propose latent space correction, which reintroduces noise during denoising to mitigate SNR shift and enhance video generation quality. Exhaustive experiments show that our LightMotion outperforms existing methods, both quantitatively and qualitatively. Copyright © 2025, The Authors. All rights reserved.

Keywords: Signal to noise ratio

Representation Purification for End-to-End Speech Translation

31st International Conference on Computational Linguistics, COLING 2025

Authors: Zhang, Chengwei; Zhou, Yue; Zhao, Rui; Chen, Yidong; Shi, Xiaodong. Affiliations: School of Informatics, Xiamen University, China; Key Laboratory of Digital Protection and Intelligent Processing of Intangible Cultural Heritage of Fujian and Taiwan, Ministry of Culture and Tourism, China; Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, Xiamen, China

ISBN: (print) 9798891761964

Speech-to-text translation (ST) is a cross-modal task that involves converting spoken language into text in a different language. Previous research primarily focused on enhancing speech translation by facilitating knowledge transfer from machine translation, exploring various methods to bridge the gap between speech and text modalities. Despite substantial progress made, factors in speech that are not relevant to translation content, such as timbre and rhythm, often limit the efficiency of knowledge transfer. In this paper, we conceptualize speech representation as a combination of content-agnostic and content-relevant factors. We examine the impact of content-agnostic factors on translation performance through preliminary experiments and observe a significant performance deterioration when content-agnostic perturbations are introduced to speech signals. To address this issue, we propose a Speech Representation Purification with Supervision Enhancement (SRPSE) framework, which excludes the content-agnostic components within speech representations to mitigate their negative impact on ST. Experiments on MuST-C and CoVoST-2 datasets demonstrate that SRPSE significantly improves translation performance across all translation directions in three settings and achieves preeminent performance under a transcript-free setting. © 2025 Association for Computational Linguistics.

Evolving High-Quality Rendering and Reconstruction in a Unified Framework with Contribution-Adaptive Regularization

arXiv, 2025

Authors: Shen, You; Zhang, Zhipeng; Li, Xinyang; Qu, Yansong; Lin, Yu; Zhang, Shengchuan; Cao, Liujuan. Affiliation: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China

Representing 3D scenes from multiview images is a core challenge in computer vision and graphics, which requires both precise rendering and accurate reconstruction. Recently, 3D Gaussian Splatting (3DGS) has garnered significant attention for its high-quality rendering and fast inference speed. Yet, due to the unstructured and irregular nature of Gaussian point clouds, ensuring accurate geometry reconstruction remains difficult. Existing methods primarily focus on geometry regularization, with common approaches including primitive-based and dual-model frameworks. However, the former suffers from inherent conflicts between rendering and reconstruction, while the latter is computationally and storage-intensive. To address these challenges, we propose CarGS, a unified model leveraging Contribution-adaptive regularization to achieve simultaneous, high-quality rendering and surface reconstruction. The essence of our framework is learning adaptive contribution for Gaussian primitives by squeezing the knowledge from geometry regularization into a compact MLP. Additionally, we introduce a geometry-guided densification strategy with clues from both normals and Signed Distance Fields (SDF) to improve the capability of capturing high-frequency details. Our design improves the mutual learning of the two tasks, meanwhile its unified structure doesn’t require separate models as in dual-model based approaches, guaranteeing efficiency. Extensive experiments demonstrate CarGS’s ability to achieve state-of-the-art (SOTA) results in both rendering fidelity and reconstruction accuracy while maintaining real-time speed and minimal storage size. Copyright © 2025, The Authors. All rights reserved.

Keywords: 3D reconstruction

Training-Free Hierarchical Scene Understanding for Gaussian Splatting with Superpoint Graphs

arXiv, 2025

Authors: Dai, Shaohui; Qu, Yansong; Li, Zheyan; Li, Xinyang; Zhang, Shengchuan; Cao, Liujuan. Affiliation: Key Laboratory of Multimedia Trusted Perception and Efficient Computing, Ministry of Education of China, Xiamen University, China

Bridging natural language and 3D geometry is a crucial step toward flexible, language-driven scene understanding. While recent advances in 3D Gaussian Splatting (3DGS) have enabled fast and high-quality scene reconstruction, research has also explored incorporating open-vocabulary understanding into 3DGS. However, most existing methods require iterative optimization over per-view 2D semantic feature maps, which not only results in inefficiencies but also leads to inconsistent 3D semantics across views. To address these limitations, we introduce a training-free framework that constructs a superpoint graph directly from Gaussian primitives. The superpoint graph partitions the scene into spatially compact and semantically coherent regions, forming view-consistent 3D entities and providing a structured foundation for open-vocabulary understanding. Based on the graph structure, we design an efficient reprojection strategy that lifts 2D semantic features onto the superpoints, avoiding costly multi-view iterative training. The resulting representation ensures strong 3D semantic coherence and naturally supports hierarchical understanding, enabling both coarse- and fine-grained open-vocabulary perception within a unified semantic field. Extensive experiments demonstrate that our method achieves state-of-the-art open-vocabulary segmentation performance, with semantic field reconstruction completed over 30× faster. Our code will be available at https://***/Atrovast/THGS. Copyright © 2025, The Authors. All rights reserved.

Keywords: Gaussian distribution
