Search Results - Inner Mongolia University Library

2024 16th International Conference on Graphics and Image Processing, ICGIP 2024

Authors: Sun, Liliang; He, Yuanlie; Li, Wensheng; Feng, Fujian; Liang, Yihui. Affiliations: School of Computer, Guangdong University of Technology, Guangzhou 510000, China; School of Computer Science, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang 550025, China

ISBN: (digital) 9781510688780

ISBN: (print) 9781510688773

This paper focuses on the image composition of transparent objects, where existing image matting methods suffer from composition errors due to the lack of accurate foreground during the composition process. We propose a foreground prediction model named ALGM, which leverages the local feature extraction capabilities of Convolutional Neural Networks (CNNs) and incorporates an attention mechanism for global information modeling. The proposed alpha-assisted foreground prediction module extracts foreground information from the original image and conveys it. The extracted foreground color information is combined with the deep structural features of the encoder and used for foreground color prediction. ALGM reduces image composition errors in the quantitative data from the Composition-1k dataset and improves the visual quality of composed images on the AIM-500 and Transparent-460 datasets. © 2025 SPIE.

Keywords: Prediction models
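The composition error described in the abstract above can be illustrated with the standard alpha compositing equation, C = alpha * F + (1 - alpha) * B. The following is a minimal sketch on a single RGB pixel, not the ALGM model: the helper names and the pixel values are assumptions chosen for illustration. It shows what happens when the observed colour is reused as the foreground instead of an accurately predicted foreground.

```python
# Standard alpha compositing on one RGB pixel (channel values in [0, 1]).
# Helper names and pixel values are hypothetical; this is not ALGM itself.

def composite(alpha, fg, bg):
    """Blend a foreground colour over a background: C = alpha*F + (1-alpha)*B."""
    return tuple(alpha * f + (1.0 - alpha) * b for f, b in zip(fg, bg))

def max_abs_error(c1, c2):
    """Largest per-channel difference between two colours."""
    return max(abs(a - b) for a, b in zip(c1, c2))

# A semi-transparent white object photographed on a red background.
alpha = 0.4
true_fg = (1.0, 1.0, 1.0)
old_bg = (1.0, 0.0, 0.0)
observed = composite(alpha, true_fg, old_bg)  # colour seen in the original image

# Compose onto a new blue background in two ways.
new_bg = (0.0, 0.0, 1.0)
ideal = composite(alpha, true_fg, new_bg)     # uses the accurate foreground
naive = composite(alpha, observed, new_bg)    # reuses the observed colour as foreground

error_naive = max_abs_error(naive, ideal)     # residue of the old background
```

In this toy case the naive result keeps a tint from the old red background, which is the kind of error an accurate foreground estimate is meant to avoid.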

Adaptive Multi-Resolution Feature Fusion for Fine-Grained Visual Classification

IEEE Transactions on Circuits and Systems for Video Technology, 2025

Authors: Yang, Yuqi; Chang, Dongliang; Du, Ruoyi; Song, Yi-Zhe; Ma, Zhanyu. Affiliations: Beijing University of Posts and Telecommunications, Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing 100876, China; Tsinghua University, Department of Automation, Beijing 100084, China; University of Surrey, SketchX, CVSSP, Guildford GU2 7XH, United Kingdom

Despite significant progress, the shortage of labeled data and expert knowledge remains a challenge for Fine-grained Visual Classification (FGVC). Some multi-source approaches that incorporate additional modalities, such as sound or bounding boxes, show promise for data enrichment but introduce added complexity to data collection. In this paper, we pose the question: can multi-source capabilities be achieved solely with existing images? The answer, confirmed by a pilot study, is affirmative. By analyzing the probability distribution of model outputs for images of different resolutions, we find that complementary information beneficial to FGVC exists among images of different resolutions. Although the classification accuracy on low-resolution images is lower than that on high-resolution images, low-resolution images can provide additional information for high-resolution input images. We designed a naive baseline that uses mixed training of multi-resolution images. Through the experimental results of the baseline, we find that i) not all low-resolution images are beneficial, and ii) adaptively selecting low-resolution images is what we need. Therefore, we proposed a meta-learning-based adaptive "resolution" pooling layer. Through the pooling operation, the features of low-resolution images are obtained from high-resolution images, and the most appropriate complementary features are selected for the features of high-resolution images through the gating mechanism, which enables the model to fully and autonomously exploit the complementary information. Experimental results on three FGVC datasets validate the effectiveness of our proposed method. Our code is available at https://***/PRIS-CV/Adaptive-Multi-Resolution-Feature-Fusion. © 2025 IEEE. All rights reserved.

Keywords: Information fusion

Zero-Shot Audio Captioning Using Soft and Hard Prompts

IEEE Transactions on Audio, Speech and Language Processing, 2025, vol. 33, pp. 2045-2058

Authors: Yiming Zhang; Xuenan Xu; Ruoyi Du; Haohe Liu; Yuan Dong; Zheng-Hua Tan; Wenwu Wang; Zhanyu Ma. Affiliations: Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing, China; Department of Computer Science and Engineering, Shanghai Jiao Tong University, Shanghai, China; Centre for Vision, Speech and Signal Processing, University of Surrey, Guildford, U.K.; Department of Electronic Systems, Aalborg University, Aalborg, Denmark

In traditional audio captioning methods, a model is usually trained in a fully supervised manner using a human-annotated dataset containing audio-text pairs and then evaluated on the test set from the same dataset. Such methods have two limitations. First, these methods are often data-hungry and require time-consuming and expensive human annotations to obtain audio-text pairs. Second, these models often suffer from performance degradation in cross-domain scenarios, i.e., when the input audio comes from a different domain than the training set, and this issue has received little attention. To address these issues, we propose a new zero-shot method for audio captioning. Our method is built on the contrastive language-audio pre-training (CLAP) model. During training, the model reconstructs the ground-truth caption using the CLAP text encoder. In the inference stage, the model generates text descriptions from the CLAP audio embeddings of given audio inputs. To enhance the ability of the model in transitioning from text-to-text generation to audio-to-text generation, we propose to use the mixed-augmentations-based soft prompt to learn more robust latent representations, leveraging instance replacement and embedding augmentation. Additionally, we introduce the retrieval-based acoustic-aware hard prompt to improve the cross-domain performance of the model by employing the domain-agnostic label information of sound events. Extensive experiments on AudioCaps and Clotho benchmarks show the effectiveness of our proposed method, which outperforms other zero-shot audio captioning approaches for in-domain scenarios and outperforms the compared methods for cross-domain scenarios, underscoring the generalization ability of our method.

Keywords: Training; Decoding; Semantics; Data models; Acoustics; Electronic mail; Benchmark testing; Transformers; Robustness; Perturbation methods

Efficient Image Super-Resolution With Feature Interaction Weighted Hybrid Network

IEEE Transactions on Multimedia, 2025, vol. 27, pp. 2256-2267

Authors: Li, Wenjie; Li, Juncheng; Gao, Guangwei; Deng, Weihong; Yang, Jian; Qi, Guo-Jun; Lin, Chia-Wen. Affiliations: Beijing University of Posts and Telecommunications, Pattern Recognition and Intelligent System Laboratory, School of Artificial Intelligence, Beijing 100080, China; Shanghai University, School of Communication and Information Engineering, Shanghai 200444, China; Nanjing University of Posts and Telecommunications, IVIPLab, Institute of Advanced Technology, Nanjing 210046, China; Ministry of Education, Key Laboratory of Artificial Intelligence, Shanghai 200240, China; Soochow University, Provincial Key Laboratory for Computer Information Processing Technology, Suzhou 215006, China; Nanjing University of Science and Technology, School of Computer Science and Technology, Nanjing 210094, China; Westlake University, Research Center for Industries of the Future, School of Engineering, Hangzhou 310024, China; OPPO Research, Seattle, WA 98101, United States; National Tsing Hua University, Department of Electrical Engineering, Institute of Communications Engineering, Hsinchu 300044, Taiwan

Lightweight image super-resolution aims to reconstruct high-resolution images from low-resolution images at low computational cost. However, existing methods result in the loss of middle-layer features due to activation functions. To minimize the impact of intermediate feature loss on reconstruction quality, we propose a Feature Interaction Weighted Hybrid Network (FIWHN), which comprises a series of Wide-residual Distillation Interaction Blocks (WDIBs) as the backbone. Every third WDIB forms a Feature Shuffle Weighted Group (FSWG) by applying mutual information shuffle and fusion. Moreover, to mitigate the negative effects of intermediate feature loss, we introduce Wide Residual Weighting units within WDIB. These units effectively fuse features of varying levels of detail through a Wide-residual Distillation Connection (WRDC) and a Self-Calibrating Fusion (SCF). To compensate for global feature deficiencies, we incorporate a Transformer and explore a novel architecture to combine CNN and Transformer. We show that our FIWHN achieves a favorable balance between performance and efficiency through extensive experiments on low-level and high-level tasks. © 1999-2012 IEEE.

Keywords: Transformers; Convolutional neural networks; Computational modeling; Feature extraction; Training; Superresolution; Adaptation models; Computer architecture; Telecommunications; Image reconstruction

Fire Detection Based on Flame Enhancement for Weak Fires

19th Chinese Conference on Image and Graphics Technologies and Applications, IGTA 2024

Authors: Chen, Kuan; Wen, Wen; Feng, Fujian; Xu, Xiang; Liang, Yihui. Affiliations: School of Computer, Guangdong University of Technology, Guangzhou 510000, China; School of Computer Science, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang 550025, China

ISBN: (print) 9789819799183

Detecting weak fires, such as overexposed and highly transparent flames, remains a significant challenge in vision-based fire detection. Convolutional Neural Network (CNN) based methods are widely used for automatic fire feature extraction, but they struggle to accurately recognize overexposed flames similar to the background and highly transparent flames that blend with the background, leading to false negative fire detection results. To address this issue, we establish a large-scale fire detection dataset with weak flames and introduce a fire detection method based on flame context enhancement. While employing a flame feature extraction module to identify and delineate the potential flame area, the proposed method primarily introduces a flame context module to capture the effective features of weak flames from this area, significantly enhancing the understanding of fires. Experimental results show that the proposed method can effectively reduce false negatives and improve the recall rate for detecting weak flames. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords: Premixed flames

Adaptive Pixel Pair Generation Strategy for Image Matting Methods Based on Pixel Pair Optimization

19th Chinese Conference on Image and Graphics Technologies and Applications, IGTA 2024

Authors: Zheng, Jiamin; Wen, Wen; Liang, Yihui; Feng, Fujian; Xu, Xiang. Affiliations: School of Computer, Guangdong University of Technology, Guangzhou 510000, China; School of Computer Science, Zhongshan Institute, University of Electronic Science and Technology of China, Zhongshan 528400, China; Guizhou Key Laboratory of Pattern Recognition and Intelligent System, Guizhou Minzu University, Guiyang 550025, China

ISBN: (print) 9789819799183

Natural image matting plays a crucial role in numerous real-world applications. Image matting methods based on pixel pair optimization are a class of matting algorithms with significant advantages in parallel computation, in handling images with similar foreground and background, and in operating under limited computing time. However, these methods cannot provide high-quality alpha mattes for high-resolution images within limited computational resources. In this paper, we design an adaptive pixel pair generation strategy to solve this problem. The strategy adaptively generates pixel pairs for each unknown pixel according to its estimated alpha value, which promotes the diversity of pixel pairs and improves the efficiency of pixel pair optimization. It categorizes unknown pixels into three types and additionally incorporates nearby foreground and background pixels to generate pixel pairs. Experimental results show that our strategy achieves high-quality alpha mattes and competitive matting performance under limited computational resources. © The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2025.

Keywords: Pixels

Constructing Maximum Entropy Language Models for Movie Review Subjectivity Analysis

Journal of Computer Science & Technology, 2008, vol. 23, no. 2, pp. 231-239

Authors: 陈博; 何慧; 郭军. Affiliation: Pattern Recognition and Intelligent System Laboratory, School of Information Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China

Document subjectivity analysis has become an important aspect of web text content mining. This problem is similar to traditional text categorization, so many related classification techniques can be adapted to it. However, one significant difference is that more language or semantic information is required to better estimate the subjectivity of a document. Therefore, in this paper, we focus mainly on two aspects. One is how to extract useful and meaningful language features, and the other is how to construct appropriate language models efficiently for this special task. For the first issue, we apply a Global-Filtering and Local-Weighting strategy to select and evaluate language features in a series of n-grams with different orders and within various distance-windows. For the second issue, we adopt Maximum Entropy (MaxEnt) modeling methods to construct our language model framework. Besides the classical MaxEnt models, we have also constructed two kinds of improved models with Gaussian and exponential priors respectively. Detailed experiments given in this paper show that with well selected and weighted language features, MaxEnt models with exponential priors are significantly more suitable for the text subjectivity analysis task.

Keywords: exponential prior; language model; maximum entropy; n-gram; subjectivity analysis

Improved discriminative training for generative model

The Journal of China Universities of Posts and Telecommunications, 2009, vol. 16, no. 3, pp. 126-130

Authors: WU Ya-hui; GUO Jun; LIU Gang. Affiliation: Laboratory of Pattern Recognition and Intelligent System, Beijing University of Posts and Telecommunications, Beijing 100876, China

This article proposes a model combination method to enhance the discriminability of the generative model. Generative and discriminative models have different optimization objectives, and each has its own advantages and drawbacks. The method proposed in this article intends to strike a balance between the two. It extracts the discriminative parameter from the generative model and generates a new model based on a multi-model combination. The combination weight is determined by the ratio of the inter-variance to the intra-variance of the classes. The higher the ratio is, the greater the weight is, and the more discriminative the model will be. Experiments on speech recognition demonstrate that the new model outperforms the model trained with the traditional generative method.

Keywords: model combination discriminability generative model training speech recognition
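The weighting rule in the abstract above (a higher ratio of inter-class to intra-class variance gives a greater combination weight) can be sketched as follows. This is an illustrative sketch only: the function names, the use of scalar scores per class, and the normalisation of weights to sum to one are assumptions, since the abstract does not specify them.

```python
from statistics import mean, pvariance

def variance_ratio(scores_by_class):
    """Inter-class variance of class means divided by mean intra-class variance.

    Assumes every class has some spread; a zero intra-class variance
    would need special handling.
    """
    class_means = [mean(scores) for scores in scores_by_class.values()]
    inter = pvariance(class_means)
    intra = mean([pvariance(scores) for scores in scores_by_class.values()])
    return inter / intra

def combination_weights(models):
    """Weight each model in proportion to its variance ratio (assumed rule)."""
    ratios = {name: variance_ratio(scores) for name, scores in models.items()}
    total = sum(ratios.values())
    return {name: ratio / total for name, ratio in ratios.items()}

# Toy scores for two hypothetical models on two classes.
models = {
    "sharp": {"a": [0.9, 1.1], "b": [2.9, 3.1]},   # classes well separated
    "blurry": {"a": [0.0, 2.0], "b": [1.0, 3.0]},  # classes overlap
}
weights = combination_weights(models)
```

Here the model whose class scores are well separated receives nearly all of the weight, matching the behaviour the abstract describes.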

Audio Fingerprinting Based on N-grams

International Journal of Digital Content Technology and its Applications, 2012, vol. 6, no. 10, pp. 361-368

Authors: Wang, Qiang; Guo, Zhiyuan; Liu, Gang; Guo, Jun. Affiliation: Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing 100876, China

In this paper, we present a novel audio fingerprinting method based on N-grams, which can quickly identify a segment of audio even when the audio signals are seriously distorted. We make use of N peaks in spectrum to form the audio fingerprint, which accelerates the retrieval speed greatly. We take advantage of the initial robust peaks to calculate the similarity between candidates and the input audio, which improves the retrieval accuracy significantly. The effectiveness of the N-gram method was evaluated on a music database of 10,000 songs. Experimental results show that the proposed approach outperforms two state-of-the-art algorithms (Shazam and Philips Robust Hash) in both effectiveness (in terms of retrieval accuracy) and efficiency (in terms of average retrieval time).

Keywords: Music
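The general idea of indexing audio by N-grams of spectral peaks can be sketched as below. This is a generic illustration, not the paper's algorithm: it assumes each frame has already been reduced to the frequency-bin index of its strongest peak, and the function names, the exact-match lookup, and the simple voting scheme are all hypothetical.

```python
from collections import Counter, defaultdict

def peak_ngrams(peak_bins, n=3):
    """Slide a window of n consecutive peak bins over the sequence."""
    return [tuple(peak_bins[i:i + n]) for i in range(len(peak_bins) - n + 1)]

def build_index(tracks, n=3):
    """Map every peak N-gram to the set of tracks that contain it."""
    index = defaultdict(set)
    for track_id, peaks in tracks.items():
        for gram in peak_ngrams(peaks, n):
            index[gram].add(track_id)
    return index

def identify(index, query_peaks, n=3):
    """Return the track with the most matching N-grams, or None if nothing matches."""
    votes = Counter()
    for gram in peak_ngrams(query_peaks, n):
        for track_id in index.get(gram, ()):
            votes[track_id] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy peak sequences for two hypothetical tracks.
tracks = {
    "song_a": [12, 40, 33, 7, 19, 25, 40, 8],
    "song_b": [5, 5, 18, 33, 7, 2, 30, 11],
}
index = build_index(tracks)
clean_match = identify(index, [33, 7, 19, 25])
noisy_match = identify(index, [33, 7, 19, 99, 25])  # one spurious peak inserted
```

Because lookup is a dictionary access per N-gram, retrieval cost depends on the query length rather than on the database size, and a query with one corrupted peak can still match through its remaining intact N-grams.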

Efficient a priori SNR estimation based on parameter adaptive spectral method

Journal of China Universities of Posts and Telecommunications, 2011, vol. 18, suppl. 1, pp. 60-63

Authors: Fang, Yu; Liu, Gang; Guo, Jun. Affiliation: Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing 100876, China

The a priori signal-to-noise ratio (SNR) is one of the most important parameters in short-time spectrum estimation techniques for speech enhancement. This paper presents a new and convenient algorithm for estimating the a priori SNR. The intra-frame a priori and a posteriori SNRs are defined, which track the variation of the a priori SNR of each frame better and solve the delay problem introduced by the traditional approaches. Simulations show that the proposed algorithm performs better than the traditional estimators in terms of log-spectral distance and segmental SNR improvement, especially in non-stationary noise environments. © 2011 The Journal of China Universities of Posts and Telecommunications.

Keywords: Speech enhancement
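The delay problem mentioned in the abstract above can be seen in the classic decision-directed estimator, a widely used traditional approach. The abstract does not name the estimators it compares against, so using this one is an assumption, and the sketch below is not the paper's intra-frame method. It tracks a single frequency bin; the Wiener gain, the first-frame initialisation, and the parameter names are also choices made for illustration.

```python
def decision_directed_snr(noisy_power, noise_power, smoothing=0.98):
    """Decision-directed a priori SNR estimates for one frequency bin.

    noisy_power: noisy-speech power |Y|^2 for each frame.
    noise_power: noise power estimate (assumed constant here).
    """
    estimates = []
    prev_clean_power = None
    for y_pow in noisy_power:
        gamma = y_pow / noise_power          # a posteriori SNR
        ml_term = max(gamma - 1.0, 0.0)      # instantaneous estimate, floored at 0
        if prev_clean_power is None:
            xi = ml_term                     # first frame: no history (assumed init)
        else:
            # Weighted mix of the previous frame's clean-speech estimate
            # and the current instantaneous estimate.
            xi = (smoothing * prev_clean_power / noise_power
                  + (1.0 - smoothing) * ml_term)
        gain = xi / (1.0 + xi)               # Wiener gain
        prev_clean_power = (gain ** 2) * y_pow
        estimates.append(xi)
    return estimates

# The true a priori SNR jumps from 0 to 10 at the fourth frame.
frames = [1.0, 1.0, 1.0, 11.0, 11.0, 11.0]
xi_track = decision_directed_snr(frames, noise_power=1.0)
```

With a smoothing factor close to 1, the estimate rises only slowly after the jump and stays far below the true value of 10 for several frames, which is the lag that motivates frame-local alternatives.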
