Search Results - Inner Mongolia University Library

3rd International Conference on Geographic Information and Remote Sensing Technology, GIRST 2024

Authors: Zeng, Yuxian; Zhang, Xinxin (College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen University of Technology, Xiamen 361024, China)

ISBN: (print) 9781510689077

Deep learning techniques have significantly improved the accuracy and efficiency of change detection of very high resolution (VHR) images. However, many current models ignore the inherent heterogeneity of bi-temporal remote sensing images, thus making it difficult to distinguish between target changes and background variations. To address this problem, we propose a deep learning (DL) framework, namely temporal-spatial feature coordination network (TSFCN), to improve the accuracy of detection by enhancing the differentiation of change pixel features and improving the extraction and processing of change information. Further comparison and ablation experiments on the SVCD dataset show that the proposed model, compared with other models, improves the F1 scores and IoU by 2.09% and 3.81%, respectively. The results demonstrate that our model can achieve better accuracy and performance improvement. © 2025 SPIE.

Keywords: Change detection
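The abstract above reports gains in F1 score and IoU. As a reminder of how these two metrics are usually computed for binary change masks, here is a minimal sketch. The function name, the list-of-lists mask layout, and the toy masks are illustrative assumptions, not taken from the paper.

```python
def change_detection_scores(pred, truth):
    """Return (precision, recall, f1, iou) for two binary change masks.

    pred and truth are equally sized 2D lists of 0/1 values, where 1 marks
    a changed pixel. The names and the mask layout are illustrative only.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p == 1 and t == 1:
                tp += 1  # changed pixel correctly detected
            elif p == 1 and t == 0:
                fp += 1  # false alarm
            elif p == 0 and t == 1:
                fn += 1  # missed change
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, f1, iou


# Toy masks, hypothetical data for illustration.
pred = [[1, 1, 0],
        [0, 1, 0]]
truth = [[1, 0, 0],
         [0, 1, 1]]
precision, recall, f1, iou = change_detection_scores(pred, truth)
```

Both metrics ignore true negatives, which matters in change detection because unchanged pixels usually dominate the image.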

Automatic Weight Allocation: optimizing remote sensing image retrieval from contrastive learning perspective

Multimedia Systems, 2025, vol. 31, no. 3, pp. 1-19

Authors: Wang, Sijia; Ge, Yun; Liu, Qiyang; Zeng, Yan (School of Software, Nanchang Hangkong University, Jiangxi Nanchang 330000, China; Jiangxi Province Key Laboratory of Image Processing and Pattern Recognition, Jiangxi Nanchang 330063, China)

Traditional supervised learning methods achieve remarkable performance in high-resolution remote sensing image retrieval, but are limited by their dependence on large-scale annotated images. Contrastive learning can leverage unlabeled images to learn powerful visual features, demonstrating its potential in many unsupervised tasks. Moreover, hash algorithms show significant potential in image retrieval thanks to their efficiency and low storage cost. Therefore, we propose the Contrastive Hashing Framework based on Automatic Weight Allocation. The framework employs a two-stage training strategy. In the feature learning stage, we propose the Automatic Weighted Contrastive Loss (AWCLoss). It incorporates Gaussian weighting and a dynamic adjustment strategy into the loss function, enabling it to focus on the distinctiveness and importance of samples. Gaussian weighting assigns different weight values based on the similarity of sample pairs, enhancing the learning of critical sample pairs. Meanwhile, the dynamic adjustment strategy sets a threshold to identify hard negative samples and then adjusts the weight values to reduce the disturbance that hard negative samples cause the model. In the hashing learning stage, a hashing layer is added to the end of the network, which converts high-dimensional representations into hash codes. A quantization loss is introduced to learn the hash codes so that the semantic similarity structure between data can be preserved in Hamming space. Additionally, the AWCLoss is utilized to enhance the discriminative power of the hash codes. Extensive experiments on three remote sensing datasets, UCM, AID and NWPU-RESISC45, demonstrate the significant superiority of our approach in remote sensing image retrieval. Our source code is available at https://***/WANGSJ77/AWCH. © The Author(s), under exclusive licence to Springer-Verlag GmbH Germany, part of Springer Nature 2025.

Keywords: Contrastive Learning
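The hashing stage described in the abstract above turns continuous network outputs into binary codes that are compared in Hamming space. The sketch below shows one common form of a quantization loss (mean squared gap to the nearest value in {-1, +1}) together with a Hamming distance. The abstract does not give the paper's exact formulas, so every function name and the loss form here are assumptions for illustration.

```python
def binarize(outputs):
    """Map continuous hashing-layer outputs to a {-1, +1} hash code."""
    return [1 if value >= 0 else -1 for value in outputs]


def quantization_loss(outputs):
    """Mean squared gap between each output and its binarized value.

    Assumed form: a common choice, not necessarily the one used in the paper.
    """
    code = binarize(outputs)
    return sum((o - c) ** 2 for o, c in zip(outputs, code)) / len(outputs)


def hamming_distance(code_a, code_b):
    """Number of positions at which two equal-length codes differ."""
    return sum(a != b for a, b in zip(code_a, code_b))


# Hypothetical 4-bit outputs for a query image and a candidate image.
query = [0.9, -0.8, 0.5, -1.0]
candidate = [0.7, 0.6, 0.5, -0.9]
loss = quantization_loss(query)
distance = hamming_distance(binarize(query), binarize(candidate))
```

A small quantization loss means the continuous outputs already sit close to their binary codes, so little similarity information is lost when retrieval switches to Hamming distance.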

Adjustable Gating Prompt Transformer for Facial Attribute Recognition with Limited Labeled Data

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Ye, Qinxian; Chen, Si; Wang, Da-Han; Jiang, Nanfeng; Su, Yanfei; Yan, Yan (Fujian Key Laboratory of Pattern Recognition and Image Understanding, School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; School of Informatics, Xiamen University, Xiamen 361005, China)

ISBN: (print) 9783031781032

Existing supervised facial attribute recognition (FAR) methods that rely on large labeled datasets can pose a challenge in real-world scenarios. In the case of limited labeled data, the current methods that introduce auxiliary tasks with a large number of parameters are not conducive to the embedded applications of FAR. To overcome these challenges, this paper develops an adjustable gating prompt Transformer that can handle the limited labeled FAR task with a small number of training parameters. Specifically, we employ an effective image-guided prompt tuning, where the image-related prompt sequence is first generated by feeding image tokens into an image-guided prompt generation network (IPG-Net). Then, the prompt sequence can learn facial image information and guide the frozen pre-trained Transformer to fine-tune the model. In addition, dynamically adjustable gating is applied to the prompt sequence to adaptively adjust the contribution of the prompts from different encoder layers, which enhances the interaction between the different encoder layers and retains effective feature information during the iterative process. Experimental results on the CelebA and LFWA datasets demonstrate that our method outperforms competitive methods with a very small amount of training parameters when only limited labeled data are used. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Signal encoding

DocHFormer: Document Image Dewarping via Harmonized Modeling of Hierarchical Priors

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Zhou, Xinyue; Li, Guanting; Jiang, Nanfeng; Wang, Da-Han; Zhang, Xu-Yao; Zhu, ShunZhi (School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation of Chinese Academy of Sciences, Beijing, China)

ISBN: (print) 9783031781186

The document image dewarping (DID) task aims to correct geometric distortion and improve image quality. In this paper, we propose a simple but effective method, named DocHFormer, that takes hierarchical prior features of images, including the document image mask and coordinate positions, as additional information to achieve accurate representation. To better exploit this fused information for dewarping, we pass it through a harmonized-space random shuffle operation, which stochastically rearranges the pixels across the spatial dimensions and then uses the inverse operation to recover the original order. This allocates each feature pixel with equal probability and thus makes full use of the multi-type features. Furthermore, we introduce this mechanism into local self-attention, whose complexity is linear in the input resolution, and also design a new feed-forward network with structural modeling to boost representation. With the help of the above components, DocHFormer achieves competitive performance with lower complexity and outperforms the existing state of the art on several popular datasets. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image enhancement
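The abstract above describes a random shuffle that rearranges pixels and an inverse operation that restores their original order. A minimal sketch of that shuffle-and-restore idea on a flat list of values follows. The function names, the seed argument, and the use of a stored inverse permutation are assumptions for illustration; the paper's operation works on feature maps inside a network and may differ in detail.

```python
import random


def shuffle_with_inverse(items, seed):
    """Shuffle a flat list of pixels and return the inverse permutation."""
    rng = random.Random(seed)
    perm = list(range(len(items)))
    rng.shuffle(perm)
    shuffled = [items[i] for i in perm]
    # inverse[i] is the position in `shuffled` that holds original item i.
    inverse = [0] * len(perm)
    for new_pos, old_pos in enumerate(perm):
        inverse[old_pos] = new_pos
    return shuffled, inverse


def unshuffle(shuffled, inverse):
    """Undo the shuffle using the stored inverse permutation."""
    return [shuffled[inverse[i]] for i in range(len(inverse))]


# Hypothetical flattened feature values.
pixels = [10, 20, 30, 40, 50, 60]
shuffled, inverse = shuffle_with_inverse(pixels, seed=0)
restored = unshuffle(shuffled, inverse)
```

Because a uniform shuffle gives every pixel the same chance of landing in any position, a local attention window applied to the shuffled layout sees a random sample of the whole image, and the inverse permutation puts the processed values back in place.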

Document Image Shadow Removal via Frequency Information-Oriented Network

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Yang, Fan; Zhou, Xinyue; Jiang, Nanfeng; Wang, Da-Han; Zhang, Xu-Yao; Li, Guantin; Man, Wang; Wu, Yun (School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation of Chinese Academy of Sciences, Beijing, China)

ISBN: (print) 9783031781186

Removing shadows from document images can significantly improve the Quality of Experience (QoE) and boost the performance of downstream document analysis and recognition tasks. However, existing methods still generalize poorly to complex document images and tend to disrupt image details. To address this issue, we observe that different shadow types affect the image content on different frequency sub-bands. This motivates us to exploit frequency-domain information and design a Frequency Information-oriented Deshadow Network (FID-Net). The proposed FID-Net relies mainly on two purpose-built modules: a Frequency Feature Extractor (FFE) and a Frequency Feature Refinement (FFR) module. FFE generates low- and high-frequency features by adaptively decomposing the spectra of the shadow image. FFR then refines both frequency features with mutual information operations. With these key designs, extensive experimental results on commonly used benchmarks demonstrate that the proposed method can learn discriminative shadows and achieve favorable performance against state-of-the-art approaches. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Image enhancement
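The abstract above splits a shadow image into low- and high-frequency features. The sketch below illustrates the basic idea of a frequency split on a single row of pixel values, using a naive discrete Fourier transform and a fixed cutoff. This is a deliberate simplification: the paper describes an adaptive decomposition of image spectra, whereas the fixed `cutoff`, the 1D signal, and all function names here are assumptions for illustration.

```python
import cmath


def dft(signal):
    """Naive discrete Fourier transform of a real or complex sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]


def idft(spectrum):
    """Inverse of dft; returns the real part of each sample."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n)
                 for k in range(n)) / n).real for t in range(n)]


def split_frequencies(signal, cutoff):
    """Split a signal into low and high frequency parts at a fixed cutoff."""
    n = len(signal)
    spectrum = dft(signal)
    low, high = [], []
    for k, coeff in enumerate(spectrum):
        # Bins k and n - k carry the same frequency magnitude.
        is_low = min(k, n - k) <= cutoff
        low.append(coeff if is_low else 0)
        high.append(0 if is_low else coeff)
    return idft(low), idft(high)


# Hypothetical row of pixel intensities.
row = [4.0, 5.0, 3.0, 6.0, 4.0, 5.0, 3.0, 6.0]
low_part, high_part = split_frequencies(row, cutoff=1)
recombined = [l + h for l, h in zip(low_part, high_part)]
```

The two parts add back up to the original row, so the split loses no information. The low part keeps smooth variation such as broad illumination changes, while the high part keeps fine detail such as text strokes.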

SANS: Spatial-Aware Neural Solver for Plane Geometry Problem

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Lin, Zi-Hao; Xiao, Shun-Xin; Chen, Zi-Rong; Li, Jian-Min; Wang, Da-Han; Zhang, Xu-Yao (School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation of Chinese Academy of Sciences, Beijing 100190, China)

ISBN: (print) 9783031781186

Geometry problem solving (GPS) is an important research direction in artificial intelligence. Previous studies have demonstrated the effectiveness of neural solvers in GPS. However, they are deficient in accurately representing the spatial relationships of geometric primitives within visually rich geometric diagrams. This paper presents a novel neural solver, termed spatial-aware neural solver (SANS), that can perceive spatial relationships between geometric primitives. SANS includes two new modules: a multimodal dual-branch spatial-aware pre-trained language module and a point-primitive spatial-aware attention module. The pre-training module employs a dual-branch visual-textual point-matching strategy to align visual and textual points, and utilizes semantic structure pre-training to model global relationships. Additionally, the point-primitive spatial-aware attention module enhances the model's ability to perceive spatial relationships between geometric primitives by accounting for the relative positions of points. Experiments show that SANS achieves accuracies of 81.5 and 74.1 on the Geometry3K and PGPS9K datasets, respectively. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Semantics

Ocean archaea PPI prediction with pretraining models

Proceedings of the 2025 5th International Conference on Bioinformatics and Intelligent Computing

Authors: Ying Zhang; Yuan Liu; Xiaoyong Pan; Hongbin Shen (Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai, China; Key Laboratory of System Control and Information Processing, Ministry of Education of China, Shanghai, China)

ISBN: (print) 9798400712203

Protein-protein interactions (PPIs) provide important insights into the metabolic mechanisms of different biological processes. Although PPIs in some organisms have been investigated systematically, PPIs in ocean archaea remain largely unexplored. Such species are nevertheless of particular interest, since their adaptation to extreme living conditions may generate unique PPIs. In this paper, we aim to characterize and predict PPIs in ocean archaea to advance understanding of their metabolic networks. First, we collect all high-confidence ocean archaea PPIs from the STRING database and analyze the PPI network features, including centrality and enrichment analysis. The functional enrichment results for the largest connected subgraph in the PPI network show that most PPIs in our constructed dataset are related to the translation and transcription processes. Then, we generate an equal number of negative PPI pairs, whose members differ in either subcellular location or GO terms. We also use the generated dataset to test the performance of three pretraining methods and their ensemble methods in the binary PPI prediction task. Our results suggest that the ensemble methods could be applied to further improve the models' performance. Fine-tuned models trained on the ocean archaea dataset are expected to predict other ocean archaea PPIs that are not included in the STRING database and to deepen understanding of the ocean archaea PPI universe.

Keywords: Binary PPI prediction
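The abstract above builds a balanced dataset by generating as many negative pairs as there are positive pairs, choosing pairs whose members differ in subcellular location or GO terms. The sketch below shows one way such sampling could look. The data layout, the function name, the toy annotations, and the reading of "different GO terms" as "no shared GO term" are all assumptions for illustration, not the authors' procedure.

```python
import itertools
import random


def sample_negative_pairs(annotations, positives, seed=0):
    """Sample as many negative pairs as there are positive pairs.

    annotations maps a protein ID to (subcellular_location, set_of_go_terms).
    A candidate pair qualifies when it is not a known positive and its two
    members differ in location or share no GO term (assumed criterion).
    """
    known = {frozenset(pair) for pair in positives}
    candidates = []
    for a, b in itertools.combinations(sorted(annotations), 2):
        if frozenset((a, b)) in known:
            continue
        loc_a, go_a = annotations[a]
        loc_b, go_b = annotations[b]
        if loc_a != loc_b or not (go_a & go_b):
            candidates.append((a, b))
    rng = random.Random(seed)
    # Raises ValueError if there are fewer candidates than positives.
    return rng.sample(candidates, len(positives))


# Toy annotations with hypothetical protein IDs.
annotations = {
    "P1": ("cytoplasm", {"GO:0006412"}),
    "P2": ("cytoplasm", {"GO:0006412"}),
    "P3": ("membrane", {"GO:0006351"}),
    "P4": ("membrane", {"GO:0006351"}),
}
positives = [("P1", "P2"), ("P3", "P4")]
negatives = sample_negative_pairs(annotations, positives)
```

Matching the number of negatives to the number of positives keeps the binary classification task balanced, so accuracy is not inflated by a majority class.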

Learning Explicit Radical Representations for Zero-Shot Chinese Character Recognition

27th International Conference on Pattern Recognition, ICPR 2024

Authors: Pan, Song-Liang; Wang, Da-Han; Jiang, Nanfeng; Zhang, Xu-Yao; Zhu, Shunzhi (School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China; Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China; State Key Laboratory of Multimodal Artificial Intelligence Systems, Institute of Automation of Chinese Academy of Sciences, Beijing 100190, China)

ISBN: (print) 9783031781186

Zero-shot Chinese character recognition (ZSCCR) aims to recognize unseen Chinese characters by learning the semantic knowledge of seen characters. Radical-based methods treat Chinese characters as combinations of radicals, recognizing characters by predicting the radicals in the images. Existing radical-based methods have a closed radical parsing process that cannot be intervened in mid-course, relying only on semantic labels for constraints. However, semantic embedding vectors are usually manually designed and lack alignment with visual features, making it extremely difficult for the model to learn and locate discriminative radical representations from visual features. This paper proposes a ZSCCR network called Learning Explicit Radical Representations (LERRNet). LERRNet introduces learnable attribute hint vectors to guide the model in locating discriminative radicals and learning explicit representations of images. Specifically, we introduce a Radical Relevance Enhanced Encoder (RREE) to enhance the correlation of local radicals by augmenting the relationships between grid regions in visual features. Guided by attribute hint vectors, LERRNet employs a Radical Representation Decoder (RRD) to locate the most relevant regions of each radical in the given image and learn explicit radical representations. Extensive experiments demonstrate that LERRNet outperforms state-of-the-art radical/stroke-based methods across three ZSCCR benchmarks. © The Author(s), under exclusive license to Springer Nature Switzerland AG 2025.

Keywords: Zero-shot learning

C²DFL: cross-view cross-layer discriminative feature learning for fine-grained 3D shape classification

Neural Computing and Applications, 2025, pp. 1-22

Authors: Jiang, Jinzhe; Bai, Jing; Ma, Xiangyu (The School of Computer Science and Engineering, North Minzu University, Yinchuan, China; The Key Laboratory of Images Processing and Pattern Recognition Laboratory, North Minzu University, Yinchuan, China)

Fine-grained 3D shape classification poses challenges in effectively capturing and integrating discriminative features residing in subtle local regions. Previous methods typically extract features independently from individual views of 3D shapes, with a focus on various strategies for fusing these extracted view features. However, this approach neglects inter-view correlations and potential redundancies among different views. In this study, we introduce C²DFL, which consists of two primary modules: cross-view discriminative feature extraction (CV-DFE) and cross-layer discriminative feature fusion (CL-DFF). CV-DFE integrates discriminative features by merging inputs from multiple views, mitigating limitations associated with isolated feature extraction. CL-DFF dynamically selects key tokens using a transformer model to interactively fuse discriminative features from various levels. Extensive experiments conducted on three categories of the FG3D dataset demonstrate the efficacy of C²DFL in capturing and integrating discriminative features of 3D shapes. The proposed method achieves state-of-the-art accuracy in fine-grained 3D shape classification (FGSC).

FG3DFormer: Fine-Grained 3D Shape Classification Based on Vision Transformer

International Conference on Acoustics, Speech, and Signal Processing (ICASSP)

Authors: Xiangyu Ma; Jing Bai; Jinzhe Jiang; Bin Peng (The School of Computer Science and Engineering, North Minzu University; The Key Laboratory of Images Processing and Pattern Recognition Laboratory, Yinchuan, China)

ISBN: (digital) 9798350368741

ISBN: (print) 9798350368758

Fine-grained 3D shape classification (FGSC) remains challenging due to the difficulty of adaptively capturing global structure differences and subtle inter-class distinctions. This paper directly extends Vision Transformer (ViT) to FGSC, proposing a pure Transformer network, FG3DFormer, that fully leverages ViT's global correlation and local attention abilities. FG3DFormer comprises Hierarchical Feature Extraction (HFE) and Hierarchical Feature Refinement (HFR), interconnected through Adaptive View Region Selection (AVRS). First, the HFE comprehensively evaluates the significance of intra-view patches and views, driven by inter-view and intra-view attention. Then, the AVRS adaptively selects crucial patch tokens from different views to serve as sources of subtle local features. Finally, the HFR refines the 3D shape descriptor, capturing more discriminative global and subtle local features by leveraging both the view tokens and the selected crucial patch tokens. Extensive experiments on FG3D and ModelNet40 demonstrate the superiority of FG3DFormer in FGSC and meta-category 3D shape classification tasks.

Keywords: Computer vision; Visualization; Solid modeling; Three-dimensional displays; Correlation; Shape; Signal processing; Transformers; Feature extraction; Speech processing
