On-site lithium-ion battery state of health (SoH) estimation is of crucial importance for the reliable operation of electric vehicles (EVs). Yet, due to the low quality of unlabeled real-time field data, diverse operatin...
Automatic code summarization aims to generate concise natural language descriptions (summaries) of source code, which can free software developers from the heavy burden of manual commenting and software maintenance. Ex...
An entailment tree is a structured reasoning path that clearly demonstrates the process of deriving hypotheses through multiple steps of inference from known premises. It enhances the interpretability of QA systems. E...
Generating coherent and credible explanations remains a significant challenge in the field of AI. In recent years, researchers have delved into the utilization of entailment trees to depict explanations, which exhibit...
Much of the commonsense knowledge in the real world takes the form of procedures, or sequences of steps to achieve particular goals. In recent years, knowledge extraction on procedural documents has attracted considerable atte...
The conventional deep learning paradigm often involves training a deep model on a server and then deploying the model, or a distilled version of it, to resource-limited edge devices. Usually, the models remain fixed once...
Transfer-based Adversarial Attacks (TAAs) can deceive a victim model even without prior knowledge. This is achieved by leveraging the property of adversarial examples. That is, when generated from a surrogate model, they retain their features if applied to other models due to their good transferability. However, adversarial examples often exhibit overfitting, as they are tailored to exploit the particular architecture and feature representation of source models. Consequently, when attempting black-box transfer attacks on different target models, their effectiveness is decreased. To solve this problem, this study proposes an approach based on a Regularized Constrained Feature Layer (RCFL). The proposed method first uses regularization constraints to attenuate the initial examples of low-frequency components. Perturbations are then added to a pre-specified layer of the source model using the back-propagation technique, in order to modify the original adversarial examples. Finally, a regularized loss function is used to enhance the black-box transferability between different target models. The proposed method is tested on the ImageNet, CIFAR-100, and Stanford Car datasets with various target models. The obtained results demonstrate that it achieves a significantly higher transfer-based adversarial attack success rate compared with baseline techniques.
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, namely video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are manifested as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query features. First, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
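The regression-token idea in the ViGT abstract can be illustrated with a minimal sketch. This is not the authors' implementation: a single scaled dot-product attention step stands in for the vision-language transformer, random vectors stand in for learned parameters and encoded features, and the center/width boundary head is an assumption made only for this example. All function and variable names are hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def aggregate_with_reg_token(reg_token, video_feats, query_feats):
    """One attention step in which [REG] attends over all tokens.

    [REG] is concatenated with the video and query features, so the
    returned vector pools global context from both modalities.
    """
    tokens = [reg_token] + video_feats + query_feats
    scale = math.sqrt(len(reg_token))
    weights = softmax([dot(reg_token, t) / scale for t in tokens])
    dim = len(reg_token)
    return [sum(w * t[j] for w, t in zip(weights, tokens)) for j in range(dim)]


def predict_boundary(reg_state, center_weights, width_weights):
    """Map the aggregated [REG] state to a normalized (start, end) pair.

    Assumption for this sketch: the head predicts a center and a width,
    each squashed to (0, 1), then clips the interval to [0, 1].
    """
    center = sigmoid(dot(reg_state, center_weights))
    width = sigmoid(dot(reg_state, width_weights))
    start = max(0.0, center - width / 2.0)
    end = min(1.0, center + width / 2.0)
    return start, end


rng = random.Random(0)
dim = 8

# Random stand-ins for the learnable token, encoded features, and head weights.
reg_token = [rng.gauss(0, 1) for _ in range(dim)]
video_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(16)]
query_feats = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(6)]
center_weights = [rng.gauss(0, 1) for _ in range(dim)]
width_weights = [rng.gauss(0, 1) for _ in range(dim)]

reg_state = aggregate_with_reg_token(reg_token, video_feats, query_feats)
start, end = predict_boundary(reg_state, center_weights, width_weights)
print(f"predicted moment (normalized): start={start:.3f}, end={end:.3f}")
```

The point of the sketch is that the boundary is read off the token's state alone, so the prediction head never operates directly on the fused video and query features.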
Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt enginee...
Math word problem (MWP) represents a critical research area within reading comprehension, where accurate comprehension of math problem text is crucial for generating math expressions. However, current approaches still...