Search Results - Inner Mongolia University Library

arXiv, 2025

Authors: Ou, Ruizhe; Hu, Yuan; Zhang, Fan; Chen, Jiaxin; Liu, Yu. Affiliations: Pattern Recognition and Intelligent System Lab, School of Artificial Intelligence, Beijing University of Posts and Telecommunications, Beijing 100876, China; Institute of Remote Sensing and Geographic Information Systems, School of Earth and Space Sciences, Peking University, Beijing 100871, China; Peking University Ordos Research Institute of Energy, Ordos 017000, China

Multi-modal large language models (MLLMs) have achieved remarkable success in image- and region-level remote sensing (RS) image understanding tasks, such as image captioning, visual question answering, and visual grounding. However, existing RS MLLMs lack the pixel-level dialogue capability, which involves responding to user instructions with segmentation masks for specific instances. In this paper, we propose GeoPix, an RS MLLM that extends image understanding capabilities to the pixel level. This is achieved by equipping the MLLM with a mask predictor, which transforms visual features from the vision encoder into masks conditioned on the LLM’s segmentation token embeddings. To facilitate the segmentation of multi-scale objects in RS imagery, a class-wise learnable memory module is integrated into the mask predictor to capture and store class-wise geo-context at the instance level across the entire dataset. In addition, to address the absence of large-scale datasets for training pixel-level RS MLLMs, we construct the GeoPixInstruct dataset, comprising 65,463 images and 140,412 instances, with each instance annotated with text descriptions, bounding boxes, and masks. Furthermore, we develop a two-stage training strategy to balance the distinct requirements of text generation and mask prediction in multi-modal multi-task optimization. Extensive experiments verify the effectiveness and superiority of GeoPix in pixel-level segmentation tasks, while also maintaining competitive performance in image- and region-level benchmarks. The models, dataset, and code are publicly available at https://***/Norman-Ou/GeoPix. © 2025, CC BY.

Keywords: Image segmentation

The Journal of China Universities of Posts and Telecommunications, 2008, Vol. 15, No. 2, pp. 130-134

Authors: ZHAO Jian; DONG Yuan; ZHAO Xian-yu; YANG Hao; WANG Hai-la. Affiliations: Laboratory of Pattern Recognition and Intelligent System, Beijing University of Posts and Telecommunications, Beijing 100876, China; France Telecom Research and Development Center, Beijing 100080, China

Speaker adaptive test normalization (ATnorm) is the most effective of the widely used score normalization approaches in text-independent speaker verification; it selects speaker-adaptive impostor cohorts with an extra development corpus in order to enhance recognition performance. In this paper, an improved implementation of ATnorm that offers significant overall advantages over the original ATnorm is presented. This method adopts a novel cross similarity measurement in speaker-adaptive cohort model selection without an extra development corpus. It achieves performance comparable to the original ATnorm and moderately reduces the computational complexity. With full use of the saved extra development corpus, the overall system performance can be improved significantly. Results are presented on the NIST 2006 Speaker Recognition Evaluation data corpora, where this method provides significant improvements in system performance, with relative gains of 14.4% in equal error rate (EER) and 14.6% in decision cost function (DCF) overall.

Keywords: speaker ATnorm score normalization cross similarity measurement speaker verification NIST speaker recognition evaluation
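
For readers unfamiliar with the score normalization named in the abstract above, the following is a minimal sketch of test normalization (T-norm) with a per-speaker cohort subset. It is not the paper's implementation: the function names, the top-k selection rule, and the similarity values are assumptions made for illustration, and the cross similarity measurement the paper proposes is not reproduced here.

```python
from statistics import mean, pstdev


def tnorm(raw_score, cohort_scores):
    """Normalize a verification score by the mean and standard deviation
    of the scores the same test utterance obtains against cohort models."""
    mu = mean(cohort_scores)
    sigma = pstdev(cohort_scores)
    if sigma == 0:
        raise ValueError("cohort scores have zero variance")
    return (raw_score - mu) / sigma


def adaptive_tnorm(raw_score, cohort_scores, similarities, k):
    """Hypothetical speaker-adaptive variant: keep only the k cohort models
    most similar to the target speaker, then apply T-norm on that subset.
    similarities[i] is an assumed similarity between the target speaker
    model and cohort model i (higher means more similar)."""
    ranked = sorted(range(len(cohort_scores)),
                    key=lambda i: similarities[i], reverse=True)
    selected = [cohort_scores[i] for i in ranked[:k]]
    return tnorm(raw_score, selected)
```

Restricting the cohort to models close to the target speaker is the general idea behind adaptive cohort selection; how closeness is measured is exactly what differs between methods.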

Learning reference-based representation for image categorization

Journal of Information and Computational Science, 2012, Vol. 9, No. 15, pp. 4261-4269

Authors: Li, Qun; Zhang, Honggang; Guo, Jun; Bhanu, Bir. Affiliations: Pattern Recognition and Intelligent System Laboratory, Beijing University of Posts and Telecommunications, Beijing 100876, China; Center for Research in Intelligent Systems, University of California, Riverside, CA 92521, United States

This paper develops a reference-based representation method for image categorization and shows that this representation has favorable performance characteristics for multi-class problems. We learn a reconstructive dictionary by associating the reference-set with training data to form a unified objective function. The optimal solution is efficiently obtained using the K-SVD algorithm. After dictionaries are generated, Locality-constrained Linear Coding (LLC) features are extracted. Then, the image feature vector is reconstructed to a histogram-like descriptor by all the image features in the reference-set with a regularization term, leading to a significant reduction of the dimensionality in the feature space. Experimental results show that the proposed method achieves or outperforms results on several benchmarks. © 2012 by Binary Information Press.

Keywords: Graphic methods

Corporate Credit Ratings Based on Hierarchical Heterogeneous Graph Neural Networks

Machine Intelligence Research, 2024, Vol. 21, No. 2, pp. 257-271

Authors: Bo-Jing Feng; Xi Cheng; Hao-Nan Xu; Wen-Fang Xue. Affiliation: Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China

In order to help investors understand the credit status of target corporations and reduce investment risks, the corporate credit rating model has become an important evaluation tool in the financial *** models are based on statistical learning, machine learning and deep learning, especially graph neural networks (GNNs). However, we found that only a few models take the hierarchy, heterogeneity or unlabeled data into account in the actual corporate credit rating ***, we propose a novel framework named hierarchical heterogeneous graph neural networks (HHGNN), which can fully model the hierarchy of corporate features and the heterogeneity of relationships between *** addition, we design an adversarial learning block to make full use of the rich unlabeled samples in the financial *** experiments conducted on the public-listed corporate rating dataset prove that HHGNN achieves SOTA compared to the baseline methods.

Keywords: Corporate credit rating; hierarchical relation; heterogeneous graph neural networks; adversarial learning

Voice-based local search using a language model look-ahead structure for efficient pruning

2011 7th International Conference on Computational Intelligence and Security, CIS 2011

Authors: Lu, Yao; Liu, Gang; Chen, Wei; Olsen, Jesper. Affiliations: Pattern Recognition and Intelligent System Laboratory, BUPT, Beijing, China; Nokia Research Center, NOKIA, Beijing, China

ISBN (print): 9780769545844

On mobile terminals, voice-based local search services [1] are quickly becoming a new important application. Voice search is essentially a large vocabulary speech recognition task with an open ended vocabulary, and this is a problem because speed and accuracy are essential for a good user experience. Fortunately when a user submits a local search query, contextual information such as the user's current position can be used for constraining the full search space. In this paper, we use local information and present a pruning algorithm based on a LMLA (Language Model Look-Ahead) tree, which can significantly improve both the speed and the accuracy of the voice search system. © 2011 IEEE.

Keywords: Trees (mathematics)

Boosting Multi-modal Ocular Recognition via Spatial Feature Reconstruction and Unsupervised Image Quality Estimation

Machine Intelligence Research, 2024, Vol. 21, No. 1, pp. 197-214

Authors: Zihui Yan; Yunlong Wang; Kunbo Zhang; Zhenan Sun; Lingxiao He. Affiliations: Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; JD AI Research, Beijing 100176, China

In the daily application of an iris-recognition-at-a-distance (IAAD) system, many ocular images of low quality are *** the iris part of these images is often not qualified for the recognition requirements, the more accessible periocular regions are a good complement for *** further boost the performance of IAAD systems, a novel end-to-end framework for multi-modal ocular recognition is *** proposed framework mainly consists of iris/periocular feature extraction and matching, unsupervised iris quality assessment, and a score-level adaptive weighted fusion ***, ocular feature reconstruction (OFR) is proposed to sparsely reconstruct each probe image by high-quality gallery images based on proper feature ***, a brand new unsupervised iris quality assessment method based on random multiscale embedding robustness is *** from the existing iris quality assessment methods, the quality of an iris image is measured by its robustness in the embedding *** last, the fusion strategy exploits the iris quality score as the fusion weight to coalesce the complementary information from the iris and periocular *** experimental results on ocular datasets prove that the proposed method is obviously better than unimodal biometrics, and the fusion strategy can significantly improve the recognition performance.

Keywords: Iris recognition; periocular recognition; spatial feature reconstruction; fully convolutional network; flexible matching; unsupervised iris quality assessment; adaptive weight fusion

Quantization Based Watermarking Methods Against Valumetric Distortions

International Journal of Automation and Computing, 2017, Vol. 14, No. 6, pp. 672-685

Authors: Zai-Ran Wang; Jing Dong; Wei Wang. Affiliations: College of Engineering and Information Technology, University of Chinese Academy of Sciences; Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences

Most quantization-based watermarking algorithms are very sensitive to valumetric distortions, even though these distortions are regarded as common processing in audio/video analysis. In recent years, watermarking methods that can resist this kind of distortion have attracted considerable interest. However, many proposed methods can deal with only one kind of valumetric distortion, such as the amplitude scaling attack, and fail under other kinds such as the constant change attack, gamma correction or contrast stretching. In this paper, we propose a simple but effective method to tackle all three kinds of valumetric distortions. The algorithm first constructs an invariant domain by a spread transform that satisfies certain constraints. Then an amplitude-scale-invariant watermarking scheme is applied on the constructed domain. The validity of the approach has been confirmed by applying the watermarking scheme to Gaussian host data and real images. Experimental results confirm its intrinsic invariance against amplitude scaling and the constant change attack, and its improved robustness against nonlinear valumetric distortions.

Keywords: Quantization index modulation (QIM); watermarking; valumetric distortions; amplitude scaling; constant change attack
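
To make the sensitivity described in the abstract above concrete, here is a minimal sketch of textbook scalar quantization index modulation (QIM) on a single sample. It is not the method proposed in the paper (no spread transform or invariant domain is built); the function names and the step size are assumptions chosen for illustration.

```python
def qim_embed(x, bit, step):
    """Embed one bit by quantizing x onto one of two interleaved lattices:
    multiples of `step` for bit 0, shifted by step/2 for bit 1."""
    offset = bit * step / 2.0
    return step * round((x - offset) / step) + offset


def qim_decode(y, step):
    """Decode by choosing the lattice whose nearest point is closest to y."""
    def distance(bit):
        return abs(y - qim_embed(y, bit, step))
    return 0 if distance(0) <= distance(1) else 1


step = 4.0                              # assumed quantization step
marked = qim_embed(10.3, 1, step)       # host sample carrying bit 1
clean_bit = qim_decode(marked, step)    # decoded from the unmodified sample
scaled_bit = qim_decode(1.6 * marked, step)  # decoded after amplitude scaling
```

With these values the unmodified sample decodes to the embedded bit, while the sample scaled by 1.6 lands on the other lattice and decodes to the wrong bit, because the decoder's step no longer matches the scaled signal.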

Towards Interpretable Defense Against Adversarial Attacks via Causal Inference

Machine Intelligence Research, 2022, Vol. 19, No. 3, pp. 209-226

Authors: Min Ren; Yun-Long Wang; Zhao-Feng He. Affiliations: University of Chinese Academy of Sciences, Beijing 100190, China; Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100190, China; Laboratory of Visual Computing and Intelligent System, Beijing University of Posts and Telecommunications, Beijing 100876, China

Deep learning-based models are vulnerable to adversarial attacks. Defense against adversarial attacks is essential for sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms against adversarial attacks. Most of the existing methods are just stopgaps for specific adversarial samples. The main obstacle is that how adversarial samples fool the deep learning models is still unclear. The underlying working mechanism of adversarial samples has not been well explored, and it is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples. The self-attention/transformer is adopted as a powerful tool in this causal model. Compared to existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed, and instructive analysis is provided. Then, we propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods. Our methods outperform the state-of-the-art defense methods under various adversarial attacks.

Keywords: Adversarial sample; adversarial defense; causal inference; interpretable machine learning; transformers

Comparing and integrating alignment template and standard phrase-based statistical machine translation

8th International Conference on Intelligent Text Processing and Computational Linguistics

Authors: Xu, Lin; Cao, Xiaoguang; Zhang, Bufeng; Li, Mu. Affiliations: Lab. of Pattern Recognition and Intelligent System, Image Processing Center, BeiHang University, China; AI Lab., Computer Science and Technology, Tianjin University, China; Microsoft Research Asia, China

ISBN (print): 9783540709381

In statistical machine translation (SMT) research, phrase-based methods have been receiving more interest in recent years. In this paper, we first give a brief survey of the phrase-based SMT framework, and then make detailed comparisons of two typical implementations: the alignment template approach and the standard phrase-based approach. Finally, we propose an improved model that integrates the alignment template into standard phrase-based SMT as a new feature in a log-linear model. Experimental results show that our method outperforms the baseline method.

Keywords: Natural language processing systems

A novel solution for multi-camera object tracking

Authors: Chen, Weihua; Cao, Lijun; Chen, Xiaotang; Huang, Kaiqi. Affiliation: Center for Research on Intelligent Perception and Computing, National Laboratory of Pattern Recognition, Chinese Academy of Sciences, China

ISBN (print): 9781479957514

Traditional multi-camera object tracking contains two steps: single-camera object tracking (SCT) and inter-camera object tracking (ICT). ICT performance relies strongly on good SCT results. In practice, most current SCT methods are imperfect and produce many fragments. In this paper, a novel solution using global tracklet association is proposed, which can provide good ICT performance when the SCT results are not perfect. The proposed solution also works with non-overlapping views through a new tracklet representation, and experiments show its effectiveness in real scenes. © 2014 IEEE.

Keywords: Tracking (position)
