Spodoptera frugiperda (fall armyworm, FAW) is a pest that poses a significant threat to global agriculture, with its larvae exhibiting unique morphological characteristics and varying degrees of harm at different inst...
Aspect-level sentiment classification aims to determine the sentiment polarity of a sentence toward a given aspect term or aspect category. For sentiment classification toward a given aspect term, some opinions in the sentence may not modify that term, because a sentence may contain more than one aspect term. Hence, it is necessary to capture the opinion relevant to a given aspect term. To capture the opinion nearest to the aspect term, researchers have used the relative distance between the aspect term and all other words in the sentence. However, this can fail when the sentence has a complex syntactic structure. In this paper, we introduce dependency relations to detect the dependency-related sentiment features for the aspect term in the dependency parse tree, and integrate this relationship into a convolutional neural network and a bidirectional long short-term memory network. Experiments show that the related sentiment features for an aspect term help models discriminate its sentiment polarity. The proposed models achieve state-of-the-art results among neural networks. The code and datasets are released at https://***/LittleSummer114/DW-CNN.
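The contrast between relative (linear) distance and distance in the dependency parse tree can be illustrated with a minimal sketch. The example sentence, its hand-written dependency edges, and the function names below are assumptions made for illustration; this is not the DW-CNN implementation from the paper, only the distance notion the abstract refers to.

```python
from collections import deque

# Hypothetical example sentence; the dependency edges are written by hand
# (head index, dependent index), not produced by a real parser.
TOKENS = ["The", "food", "that", "my", "friend", "recommended", "was", "great"]
EDGES = [(7, 1), (1, 0), (1, 5), (5, 2), (5, 4), (4, 3), (7, 6)]


def linear_distances(n_tokens, aspect_idx):
    """Relative distance: number of positions between each word and the aspect term."""
    return [abs(i - aspect_idx) for i in range(n_tokens)]


def tree_distances(n_tokens, edges, aspect_idx):
    """Number of dependency edges between each word and the aspect term (BFS)."""
    adjacency = [[] for _ in range(n_tokens)]
    for head, dependent in edges:
        adjacency[head].append(dependent)
        adjacency[dependent].append(head)
    dist = [None] * n_tokens
    dist[aspect_idx] = 0
    queue = deque([aspect_idx])
    while queue:
        node = queue.popleft()
        for neighbor in adjacency[node]:
            if dist[neighbor] is None:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


aspect = TOKENS.index("food")
opinion = TOKENS.index("great")
print(linear_distances(len(TOKENS), aspect)[opinion])       # 6
print(tree_distances(len(TOKENS), EDGES, aspect)[opinion])  # 1
```

In this toy sentence the relative clause pushes the opinion word "great" six positions away from "food", while the two words are directly connected in the (assumed) parse tree.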
The video grounding (VG) task aims to locate the queried action or event in an untrimmed video based on rich linguistic descriptions. Existing proposal-free methods are trapped in the complex interaction between video and query, overemphasizing cross-modal feature fusion and feature correlation for VG. In this paper, we propose a novel boundary regression paradigm that performs regression token learning in a transformer. In particular, we present a simple but effective proposal-free framework, namely the video grounding transformer (ViGT), which predicts the temporal boundary using a learnable regression token rather than multi-modal or cross-modal features. In ViGT, the benefits of a learnable token are as follows. (1) The token is unrelated to the video or the query and avoids data bias toward the original video and query. (2) The token simultaneously performs global context aggregation from video and query ***, we employed a shared feature encoder to project both video and query into a joint feature space before performing cross-modal co-attention (i.e., video-to-query attention and query-to-video attention) to highlight discriminative features in each modality. Furthermore, we concatenated a learnable regression token [REG] with the video and query features as the input of a vision-language transformer. Finally, we utilized the token [REG] to predict the target moment and visual features to constrain the foreground and background probabilities at each timestamp. The proposed ViGT performed well on three public datasets: ANet-Captions, TACoS, and YouCookⅡ. Extensive ablation studies and qualitative analysis further validated the interpretability of ViGT.
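The idea of concatenating a [REG] token with video and query features and reading the boundary off that token alone can be sketched roughly as follows. This is a toy illustration under strong assumptions, not the ViGT architecture: a single attention step with identity projections stands in for the vision-language transformer, the linear head and its weights are hypothetical, and sorting the two outputs into (start, end) is a convenience chosen here.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reg_token_attention(reg, video, query):
    """One attention step (identity projections), read out at the [REG] position."""
    sequence = [reg] + video + query  # [REG] concatenated with both modalities
    scale = math.sqrt(len(reg))
    scores = [sum(a * b for a, b in zip(reg, tok)) / scale for tok in sequence]
    weights = softmax(scores)
    return [sum(w * tok[d] for w, tok in zip(weights, sequence))
            for d in range(len(reg))]


def predict_boundary(reg_state, head):
    """Map the [REG] state to a normalized (start, end) pair; head has two rows."""
    a, b = (1.0 / (1.0 + math.exp(-sum(w * x for w, x in zip(row, reg_state))))
            for row in head)
    return min(a, b), max(a, b)  # ordering is an assumption of this sketch


random.seed(0)
dim = 4
reg = [random.gauss(0, 1) for _ in range(dim)]                         # stands in for a learnable token
video = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(5)]   # 5 clip features
query = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(3)]   # 3 word features
head = [[random.gauss(0, 1) for _ in range(dim)] for _ in range(2)]    # hypothetical regression head
start, end = predict_boundary(reg_token_attention(reg, video, query), head)
print(f"normalized moment: [{start:.3f}, {end:.3f}]")
```

The point of the sketch is structural: the prediction head sees only the aggregated [REG] state, never the raw video or query features.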
Measuring the transverse velocity field in high-resolution solar images is essential for understanding solar *** paper introduces an innovative unsupervised deep learning optical flow model designed to calculate the transverse velocity field, addressing the challenges of missing optical flow labels and the limited accuracy of velocity field measurements in high-resolution solar *** proposed method converts the transverse velocity field computation problem into an optical flow computation problem, using two forward propagations of features to remove the reliance on optical flow ***, it reduces the impact of the "Brightness Consistency" constraint on optical flow accuracy by identifying and handling optical flow *** apply this method to compute the transverse velocity fields of high-resolution solar image sequences from the Hα and TiO bands, observed by the New Vacuum Solar *** experiments with several well-established optical flow methods, including those based on supervised deep learning models, show that our approach outperforms the comparison methods according to key evaluation metrics such as Residual Map Mean, Residual Map Variance, Cross Correlation, and Structural Similarity Index ***, since optical flow captures the fundamental motion information in image sequences, the proposed method can be applied to a variety of research areas, including solar image registration, sequence alignment, image super-resolution, magnetic field calibration, and solar activity *** code is available at https://***/jackie-willianm/Transverse-Velocity-Field-Measurement-of-Solar-High-Resolution-Images.
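For reference, the "Brightness Consistency" constraint named in the abstract is usually stated in the optical flow literature as brightness constancy. The notation below (image intensity I, transverse velocity components u and v) is the standard textbook form and is an assumption here; the abstract does not give the paper's own formulation.

```latex
% Brightness constancy: a feature keeps its intensity while moving with
% velocity (u, v) over a time step \Delta t
I(x + u\,\Delta t,\; y + v\,\Delta t,\; t + \Delta t) = I(x, y, t)

% First-order Taylor expansion yields the optical flow constraint equation
\frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t} = 0
```

The constraint is violated wherever intensity changes for reasons other than motion, which is one reason a method might identify and treat such regions separately.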
Visual Commonsense Reasoning (VCR) is a challenging task that requires a model to select the correct answer in the context of a given image and question, while also providing a reasonable explanation to justify the ch...
Distributed training of graph neural networks (GNNs) has become a crucial technique for processing large graphs. Prevalent GNN frameworks are model-centric, necessitating the transfer of massive graph vertex features ...
Recently, diffusion-based deep generative models (e.g., Stable Diffusion) have shown impressive results in text-to-image synthesis. However, current text-to-image models often require multiple passes of prompt enginee...
Chain-of-thought distillation is a powerful technique for transferring reasoning abilities from large language models (LLMs) to smaller student models. Previous methods typically require the student to mimic the step-by-step rationale produced by LLMs, and often face the following challenges: (i) Tokens within a rationale vary in significance, and treating them equally may fail to accurately mimic keypoint tokens, leading to reasoning errors. (ii) They usually distill knowledge by consistently predicting all the steps in a rationale, which does not distinguish the order in which step generation should be learned. This diverges from the human cognitive progression of starting with easy tasks and advancing to harder ones, resulting in sub-optimal outcomes. To this end, we propose a unified framework, called KPOD, to address these issues. Specifically, we propose a token weighting module that uses mask learning to encourage accurate mimicry of keypoint tokens by the student during distillation. In addition, we develop an in-rationale progressive distillation strategy, which starts by training the student to generate the final reasoning steps and gradually extends to cover the entire rationale. To accomplish this, a weighted token generation loss is proposed to assess step reasoning difficulty, and a value function is devised to schedule the progressive distillation by considering both step difficulty and question diversity. Extensive experiments on four reasoning benchmarks show that KPOD outperforms previous methods by a large margin. Copyright 2024 by the author(s)
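The two ingredients described above, a token-weighted generation loss and a schedule that starts from the final reasoning steps and grows toward the full rationale, can be sketched as follows. This is a simplified illustration, not KPOD itself: normalizing the loss by the weight sum is an assumption, and a plain linear schedule is swapped in for the paper's value function, which also accounts for step difficulty and question diversity.

```python
import math


def weighted_token_nll(token_probs, weights):
    """Weighted negative log-likelihood over rationale tokens.

    token_probs: student probability assigned to each target token.
    weights: per-token importance (higher for keypoint tokens).
    Normalizing by the weight sum is an assumption of this sketch.
    """
    total_weight = sum(weights)
    return sum(-w * math.log(p) for p, w in zip(token_probs, weights)) / total_weight


def progressive_targets(steps, stage, n_stages):
    """Return the suffix of reasoning steps the student generates at a stage.

    Stage 1 covers only the final step(s); stage n_stages covers the whole
    rationale. The linear growth rule is a stand-in for a learned schedule.
    """
    keep = math.ceil(len(steps) * stage / n_stages)
    return steps[len(steps) - keep:]


rationale = ["step 1", "step 2", "step 3", "step 4"]
for stage in range(1, 5):
    print(stage, progressive_targets(rationale, stage, 4))

# A keypoint token (weight 3) that is predicted poorly dominates the loss.
print(weighted_token_nll([0.9, 0.2, 0.9], [1.0, 3.0, 1.0]))
```

Earlier steps are supplied as context while the student learns the later ones, so each stage asks the student to generate a longer portion of the rationale than the one before.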
Dear Editor, This letter proposes a symmetry-preserving dual-stream graph neural network (SDGNN) for precise representation learning on an undirected weighted graph (UWG). Although existing graph neural networks (GNNs) are influential instruments for representation learning on a UWG, they invariably adopt a single node feature matrix to represent the sole node set of a UWG.
Large language models (LLMs) such as ChatGPT have exhibited remarkable performance in generating human-like texts. However, machine-generated texts (MGTs) may carry critical risks, such as plagiarism issues, misleadin...