Multi-modal large language models (MLLMs) have achieved remarkable success in image- and region-level remote sensing (RS) image understanding tasks, such as image captioning, visual question answering, and visual grou...
Speaker adaptive test normalization (ATnorm) is the most effective of the widely used score normalization approaches in text-independent speaker verification. It selects speaker-adaptive impostor cohorts using an extra development corpus in order to enhance recognition performance. This paper presents an improved implementation of ATnorm that offers significant overall advantages over the original. The method adopts a novel cross-similarity measurement for speaker-adaptive cohort model selection and does not require an extra development corpus. It achieves performance comparable to the original ATnorm while moderately reducing computational complexity. When the development corpus saved in this way is put to full use, overall system performance improves significantly. Results on the NIST 2006 Speaker Recognition Evaluation corpora show relative gains of 14.4% in equal error rate (EER) and 14.6% in decision cost function (DCF) overall.
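The general idea of adaptive test normalization can be sketched as follows. This is a minimal illustration of standard T-norm with a top-K cohort selection step, not the paper's implementation: the function name `adaptive_tnorm`, the `cohort_similarity` input (standing in for whatever similarity measure ranks cohort models against the target speaker) and the `top_k` parameter are all assumptions made for the example.

```python
from statistics import mean, pstdev


def adaptive_tnorm(raw_score, cohort_scores, cohort_similarity, top_k, eps=1e-12):
    """Normalize a trial score with the top_k cohort models closest to the target.

    raw_score:         score of the test utterance against the target model
    cohort_scores:     scores of the same test utterance against each cohort model
    cohort_similarity: hypothetical similarity of each cohort model to the target
    """
    if len(cohort_scores) != len(cohort_similarity):
        raise ValueError("cohort_scores and cohort_similarity must align")
    if not 2 <= top_k <= len(cohort_scores):
        raise ValueError("top_k must be between 2 and the cohort size")

    # Rank cohort models by similarity to the target speaker (most similar first).
    ranked = sorted(range(len(cohort_scores)),
                    key=lambda i: cohort_similarity[i], reverse=True)
    selected = [cohort_scores[i] for i in ranked[:top_k]]

    # Standard T-norm: shift and scale by the selected impostor score statistics.
    mu = mean(selected)
    sigma = max(pstdev(selected), eps)  # guard against a zero spread
    return (raw_score - mu) / sigma
```

With all cohort models selected (`top_k` equal to the cohort size) this reduces to plain T-norm; the adaptive variant differs only in which impostor scores feed the mean and standard deviation.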
This paper develops a reference-based representation method for image categorization and shows that this representation has favorable performance characteristics for multi-class problems. We learn a reconstructive dic...
In order to help investors understand the credit status of target corporations and reduce investment risks, the corporate credit rating model has become an important evaluation tool in the financial *** models are based on statistical learning, machine learning and deep learning, especially graph neural networks (GNNs). However, we found that only a few models take hierarchy, heterogeneity or unlabeled data into account in the actual corporate credit rating ***, we propose a novel framework named hierarchical heterogeneous graph neural networks (HHGNN), which can fully model the hierarchy of corporate features and the heterogeneity of relationships between *** addition, we design an adversarial learning block to make full use of the rich unlabeled samples in the financial *** experiments conducted on the public-listed corporate rating dataset show that HHGNN achieves state-of-the-art performance compared with the baseline methods.
On mobile terminals, voice-based local search services [1] are quickly becoming a new important application. Voice search is essentially a large vocabulary speech recognition task with an open ended vocabulary, and th...
In the daily application of an iris-recognition-at-a-distance (IAAD) system, many ocular images of low quality are *** the iris part of these images often does not meet the recognition requirements, the more accessible periocular regions are a good complement for *** further boost the performance of IAAD systems, a novel end-to-end framework for multi-modal ocular recognition is *** proposed framework mainly consists of iris/periocular feature extraction and matching, unsupervised iris quality assessment, and a score-level adaptive weighted fusion ***, ocular feature reconstruction (OFR) is proposed to sparsely reconstruct each probe image from high-quality gallery images based on proper feature ***, a new unsupervised iris quality assessment method based on random multiscale embedding robustness is *** from the existing iris quality assessment methods, the quality of an iris image is measured by its robustness in the embedding *** last, the fusion strategy exploits the iris quality score as the fusion weight to combine the complementary information from the iris and periocular *** experimental results on ocular datasets show that the proposed method clearly outperforms unimodal biometrics, and that the fusion strategy can significantly improve the recognition performance.
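Score-level fusion weighted by iris quality can be illustrated with a short sketch. The abstract does not give the fusion formula, so the convex combination below, the function name `fuse_scores`, and the assumption that the quality score lies in [0, 1] are illustrative choices only.

```python
def fuse_scores(iris_score, periocular_score, iris_quality):
    """Fuse two matching scores, trusting the iris more when its quality is high.

    iris_quality is assumed to be a value in [0, 1]; out-of-range values are
    clamped. Both matching scores are assumed to be on a comparable scale.
    """
    weight = min(1.0, max(0.0, iris_quality))
    # Convex combination: low iris quality shifts the decision to the periocular score.
    return weight * iris_score + (1.0 - weight) * periocular_score
```

With `iris_quality` near 0 the fused score falls back to the periocular match, which matches the motivation given above: the periocular region complements iris images that are too poor for recognition.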
Most quantization-based watermarking algorithms are very sensitive to valumetric distortions, even though these distortions are regarded as common processing in audio/video analysis. In recent years, watermarking methods that can resist this kind of distortion have attracted considerable interest. However, many proposed methods can deal with only one kind of valumetric distortion, such as the amplitude scaling attack, and fail under other kinds, such as the constant change attack, gamma correction or contrast stretching. In this paper, we propose a simple but effective method to tackle all three kinds of valumetric distortion. The algorithm first constructs an invariant domain by a spread transform that satisfies certain constraints. An amplitude-scale-invariant watermarking scheme is then applied on the constructed domain. The validity of the approach has been confirmed by applying the watermarking scheme to Gaussian host data and real images. Experimental results confirm its intrinsic invariance against amplitude scaling and the constant change attack, and improved robustness against nonlinear valumetric distortions.
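The following sketch shows one way such invariance can arise; it is not the scheme from the paper. It assumes that the constraint on the spread transform is that each projection vector sums to zero, which cancels a constant offset, and it uses a ratio of two projections as a stand-in scale-invariant feature. The names `project` and `invariant_feature` are hypothetical.

```python
def project(signal, direction):
    """Spread-transform style projection of the host signal onto a direction."""
    return sum(x * u for x, u in zip(signal, direction))


def invariant_feature(signal, u, v):
    """Feature unchanged by x -> a*x + c, assuming sum(u) == sum(v) == 0.

    A constant offset c contributes c * sum(u) = 0 to each projection, and a
    gain a multiplies both projections equally, so it cancels in the ratio.
    """
    denominator = project(signal, v)
    if denominator == 0:
        raise ZeroDivisionError("projection onto v is zero; choose another v")
    return project(signal, u) / denominator


host = [1.0, 2.0, 4.0, 8.0]
u = [1.0, -1.0, 1.0, -1.0]   # zero-sum direction (assumed constraint)
v = [1.0, 1.0, -1.0, -1.0]   # second zero-sum direction
attacked = [2.0 * x + 3.0 for x in host]  # amplitude scaling plus constant change
```

Here `invariant_feature(host, u, v)` and `invariant_feature(attacked, u, v)` agree, so a quantizer applied to this feature would see the same value before and after the attack. Nonlinear distortions such as gamma correction are not cancelled by this construction.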
Deep learning-based models are vulnerable to adversarial attacks. Defense against adversarial attacks is essential for sensitive and safety-critical scenarios. However, deep learning methods still lack effective and efficient defense mechanisms against adversarial attacks. Most existing methods are just stopgaps for specific adversarial samples. The main obstacle is that it is still unclear how adversarial samples fool deep learning models. The underlying working mechanism of adversarial samples has not been well explored, and this is the bottleneck of adversarial attack defense. In this paper, we build a causal model to interpret the generation and performance of adversarial samples. The self-attention/transformer is adopted as a powerful tool in this causal model. Compared to existing methods, causality enables us to analyze adversarial samples more naturally and intrinsically. Based on this causal model, the working mechanism of adversarial samples is revealed, and instructive analysis is provided. We then propose simple and effective adversarial sample detection and recognition methods according to the revealed working mechanism. The causal insights enable us to detect and recognize adversarial samples without any extra model or training. Extensive experiments are conducted to demonstrate the effectiveness of the proposed methods. Our methods outperform the state-of-the-art defense methods under various adversarial attacks.
ISBN: (print) 9783540709381
In statistical machine translation (SMT) research, phrase-based methods have received growing interest in recent years. In this paper, we first give a brief survey of the phrase-based SMT framework, and then make detailed comparisons of two typical implementations: the alignment template approach and the standard phrase-based approach. Finally, we propose an improved model that integrates the alignment template into standard phrase-based SMT as a new feature in a log-linear model. Experimental results show that our method outperforms the baseline method.
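Adding a feature to a log-linear model amounts to adding one more weighted term to the candidate score. The sketch below shows only that mechanism; the feature names (`phrase_tm`, `lm`, `alignment_template`), the weights and the candidate values are invented for illustration and do not come from the paper.

```python
def loglinear_score(features, weights):
    """Log-linear model score: weighted sum of (log-domain) feature values."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def best_candidate(candidates, weights):
    """Return the translation candidate with the highest log-linear score."""
    return max(candidates, key=lambda c: loglinear_score(c["features"], weights))


# Hypothetical candidates with log-probability style feature values.
candidates = [
    {"text": "A", "features": {"phrase_tm": -2.0, "lm": -4.0, "alignment_template": -1.0}},
    {"text": "B", "features": {"phrase_tm": -1.0, "lm": -4.0, "alignment_template": -3.0}},
]
baseline_weights = {"phrase_tm": 1.0, "lm": 0.5}
extended_weights = {"phrase_tm": 1.0, "lm": 0.5, "alignment_template": 2.0}
```

Under `baseline_weights` the template feature is ignored and candidate B wins; under `extended_weights` the extra term changes the ranking to A. In practice the weights would be tuned on held-out data rather than set by hand.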
The traditional multi-camera object tracking contains two steps: single camera object tracking (SCT) and inter-camera object tracking (ICT). The ICT performance strongly relies on the great results of SCT. In practice...