Low-altitude unmanned aerial vehicle (UAV) targets are challenging to detect by radar due to their low flight altitude, which exposes them to strong interference from complex ground clutter. To address this, this pap...
Much research indicates the importance of polarimetric information in radar systems for purposes such as detection, tracking, and parameter estimation. The calibration process is essential for obtaining precise polari...
Partitioning individual targets into proper clusters is crucial in tracking group targets with complex spatial structures. However, most current clustering methods suffer from heavy computational burdens and ...
There are a number of leaf recognition methods, but most of them are based on Euclidean space. In this paper, we introduce a new feature description for leaf image recognition, which represents the leaf co...
Vision-language pre-trained models (e.g., CLIP), trained on large-scale datasets via self-supervised learning, are drawing increasing research attention because they achieve superior performance on multi-modal downstream tasks. Nevertheless, we find that adversarial perturbations crafted on vision-language pre-trained models can be used to attack the corresponding downstream task models. Specifically, to investigate this adversarial transferability, we introduce a task-agnostic method named Global and Local Augmentation (GLA) attack, which generates highly transferable adversarial examples on CLIP to attack black-box downstream task models. GLA applies random crop and resize at both the global and local patch levels to create more diversity and make the adversarial noise robust. GLA then generates the adversarial perturbations by minimizing the cosine similarity between intermediate features from augmented adversarial and benign examples. Extensive experiments on three CLIP image encoders with different backbones and three different downstream tasks demonstrate the superiority of our method compared with other strong baselines. The code is available at https://***/yqlvcoding/GLAattack.
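The two ingredients named in this abstract, random crop sampling and a cosine-similarity objective over intermediate features, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the crop-scale parameters, and the use of plain Python lists in place of CLIP feature tensors are all assumptions made for illustration, and the gradient-based update of the perturbation is not shown.

```python
import math
import random


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def random_crop_box(height, width, min_scale, max_scale, rng):
    """Sample a crop box (top, left, h, w) inside a height x width image.

    min_scale and max_scale (both assumed to lie in (0, 1]) bound the
    fraction of each side that the crop keeps. These names are hypothetical.
    """
    scale = rng.uniform(min_scale, max_scale)
    h = max(1, int(height * scale))
    w = max(1, int(width * scale))
    top = rng.randint(0, height - h)
    left = rng.randint(0, width - w)
    return top, left, h, w


def attack_objective(adv_features, benign_features):
    """Mean cosine similarity over paired feature vectors.

    An attack in the spirit of the abstract would minimize this value
    with respect to the adversarial perturbation.
    """
    sims = [cosine_similarity(a, b) for a, b in zip(adv_features, benign_features)]
    return sum(sims) / len(sims)
```

In a full pipeline, the crop boxes would be applied (and resized) to both the adversarial and the benign image before feature extraction, once with a large scale range for the global view and once per patch for the local view; those scale ranges are not given in the abstract.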
ISBN: 9781665428132 (print)
Video captioning is an important vision task and has been intensively studied in the computer vision community. Existing methods that utilize fine-grained spatial information have achieved significant improvements; however, they either rely on costly external object detectors or do not sufficiently model the spatial/temporal relations. In this paper, we aim to design a spatial information extraction and aggregation method for video captioning that does not need external object detectors. For this purpose, we propose a Recurrent Region Attention module to better extract diverse spatial features. By employing Motion-Guided Cross-frame Message Passing, our model is aware of the temporal structure and able to establish high-order relations among the diverse regions across frames. Together, these components encourage information communication and produce compact and powerful video representations. Furthermore, an Adjusted Temporal Graph Decoder is proposed to flexibly update video features and model high-order temporal relations during decoding. Experimental results on three benchmark datasets (MSVD, MSR-VTT, and VATEX) demonstrate that our proposed method can outperform state-of-the-art methods.
ISBN: 7801501144 (print)
A joint speech signal enhancement method that applies a singular value decomposition filter after spectral subtraction (SSVD) is proposed in this paper. The residual noise after spectral subtraction, which results in audible musical noise, is reduced further by the SVD filter. The matrix size in the spectral domain can be reduced by half, and the larger step length adopted by the SVD filter in the spectral domain leads to lower cost, which ensures that the system can work in real time. A novel entropy-based speech/pause detector (ESPD) is also proposed. The new detector significantly improves the performance of the whole noise suppression system.
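As a rough illustration of two building blocks mentioned in this abstract, the sketch below shows generic magnitude spectral subtraction with a spectral floor and a spectral-entropy measure that could drive a speech/pause decision. It is a sketch under stated assumptions, not the paper's SSVD or ESPD: the `floor` parameter, the threshold rule in `is_pause`, and all function names are hypothetical, and the SVD filtering stage is not shown.

```python
import math


def spectral_subtract(noisy_mag, noise_mag, floor=0.02):
    """Subtract a noise magnitude estimate from each frequency bin.

    Bins that would fall below floor * noisy magnitude are clamped to that
    floor, a common way to limit musical noise (floor is an assumed value).
    """
    return [max(y - n, floor * y) for y, n in zip(noisy_mag, noise_mag)]


def spectral_entropy(power):
    """Shannon entropy in bits of a power spectrum normalized to sum to 1."""
    total = sum(power)
    probs = [p / total for p in power if p > 0]
    return -sum(p * math.log2(p) for p in probs)


def is_pause(power, threshold_bits):
    """Hypothetical decision rule: a flat, noise-like spectrum has high
    entropy, so frames above the threshold are treated as pauses."""
    return spectral_entropy(power) > threshold_bits
```

A frame classified as a pause would typically be used to update the noise estimate passed to `spectral_subtract`; how the paper's detector sets its threshold is not stated in the abstract.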
In the early stages of drug discovery, molecule toxicity prediction is crucial for excluding drug candidates that are likely to fail in clinical trials. In this paper, we present a novel molecular representation method and develop a corresponding deep learning-based framework called TOP (short for TOxicity Prediction). TOP integrates a series of special data processing methods, a bidirectional gated recurrent unit-based RNN (BiGRU), and a fully connected neural network for end-to-end molecular representation learning and chemical toxicity prediction. TOP can automatically learn a mixed molecular representation not only from SMILES contextual information, which describes the molecular structure, but also from physicochemical properties. TOP can therefore overcome the drawbacks of existing methods that use only one of the two, and thus greatly improves toxicity prediction. We conducted extensive experiments over 14 classic toxicity prediction tasks on three different benchmark datasets, including balanced and imbalanced ones. The results show that, with the help of the novel molecular representation method, TOP significantly outperforms not only three baseline machine learning methods, but also five state-of-the-art methods.
ISBN: 7801501144 (print)
A new method for speech signal reconstruction is proposed based on nonlinear Kernel Principal Component Analysis (KPCA). By using kernel functions, one can efficiently compute principal components in high-dimensional feature spaces and reconstruct vectors mapped from input space using the dominant principal components. Because the reconstructed vectors are expressed in the high-dimensional feature space, they may have no exact pre-image in input space. To find a pre-image, we use an iterative method to approximate it. The experimental results of using KPCA for data reconstruction and denoising of speech signals show that it has many potential advantages compared with PCA.
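The abstract does not say which iterative scheme is used to approximate the pre-image. One widely used option for Gaussian kernels is a fixed-point update in which the estimate is repeatedly replaced by a kernel-weighted average of the training vectors. The sketch below shows that update only, assuming a Gaussian kernel and assuming the projection coefficients `gammas` (one per training vector) have already been computed from the retained principal components; all names are hypothetical.

```python
import math


def gaussian_kernel(x, y, sigma):
    """Gaussian (RBF) kernel between two vectors of equal length."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def approximate_preimage(train, gammas, z0, sigma, iters=50):
    """Fixed-point iteration for an approximate pre-image z.

    Each step computes weights w_i = gamma_i * k(z, x_i) and sets z to the
    weighted average sum_i w_i x_i / sum_i w_i over the training vectors.
    """
    z = list(z0)
    for _ in range(iters):
        weights = [g * gaussian_kernel(z, x, sigma) for g, x in zip(gammas, train)]
        total = sum(weights)
        if abs(total) < 1e-12:
            # The update is undefined when the weights cancel; stop here.
            break
        z = [sum(w * x[d] for w, x in zip(weights, train)) / total
             for d in range(len(z))]
    return z
```

For denoising, the noisy input vector itself is a natural starting point `z0`, since the pre-image is expected to lie near it. The iteration can stall when the weights sum to nearly zero, which is why the sketch stops rather than dividing.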
The Resource Space Model (RSM) is a semantic model for managing resources in the future interconnection environment. Query capability is an important aspect of RSM as a semantic resource management model. This paper reports research results on the query capability of RSM from two perspectives: resource space algebra and resource space calculus. The equivalence of the resource space algebra and the resource space calculus is also discussed.