This paper presents one-bit supervision, a novel setting of learning from incomplete annotations, in the scenario of image classification. Instead of training a model upon the accurate label of each sample, our settin...
ISBN: (digital) 9781728171685
ISBN: (print) 9781728171692
Visual attention not only improves the performance of image captioners, but also serves as a visual interpretation for qualitatively assessing caption rationality and model transparency. Specifically, we expect a captioner to fix its attentive gaze on the correct objects while generating the corresponding words. This ability is known as grounded image captioning. However, the grounding accuracy of existing captioners is far from satisfactory. To improve the grounding accuracy while retaining the captioning quality, it is expensive to collect the word-region alignment as strong supervision. To this end, we propose a Part-of-Speech (POS) enhanced image-text matching model (SCAN [24]), POS-SCAN, as an effective form of knowledge distillation for more grounded image captioning. The benefits are two-fold: 1) given a sentence and an image, POS-SCAN can ground the objects more accurately than SCAN; 2) POS-SCAN serves as a word-region alignment regularization for the captioner's visual attention module. Benchmark experimental results demonstrate that conventional image captioners equipped with POS-SCAN can significantly improve the grounding accuracy without strong supervision. Last but not least, we explore the indispensable Self-Critical Sequence Training (SCST [46]) in the context of grounded image captioning and show that the image-text matching score can serve as a reward for more grounded captioning.
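The idea of using an image-text matching score as an SCST reward can be illustrated with a minimal sketch. It assumes the standard self-critical formulation, in which the reward of a greedily decoded caption is the baseline for a sampled caption. The function names, the linear mix of a caption metric with the matching score, and the weight values are hypothetical and are not taken from the paper.

```python
def combined_reward(caption_metric, matching_score, match_weight=0.5):
    """Mix a caption-quality metric with an image-text matching score.

    The linear mix and the default weight are assumptions for illustration.
    """
    return caption_metric + match_weight * matching_score


def scst_loss(sample_log_probs, sample_reward, greedy_reward):
    """Self-critical loss for one sampled caption.

    sample_log_probs: log-probabilities of the sampled words.
    sample_reward:    reward of the sampled caption.
    greedy_reward:    reward of the greedy caption, used as the baseline.
    """
    # A positive advantage means the sample beat the greedy caption.
    advantage = sample_reward - greedy_reward
    # Minimizing this loss raises the likelihood of samples with positive advantage.
    return -advantage * sum(sample_log_probs)


# Hypothetical numbers: the sampled caption matches the image better.
r_sample = combined_reward(caption_metric=1.0, matching_score=0.8)
r_greedy = combined_reward(caption_metric=1.0, matching_score=0.4)
loss = scst_loss([-0.5, -1.0], r_sample, r_greedy)
```

In this sketch a sampled caption that matches the image better than the greedy one gets a positive advantage, so a gradient step on the loss would make that caption more likely.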
ISBN: (digital) 9781728181561
ISBN: (print) 9781728181578
At present, the query and acquisition of fragmented knowledge in Chinese court verdicts rely mainly on search-engine-based class case retrieval and on rough extraction of part of the data in court verdicts. These traditional methods cannot extract fragmented knowledge from Chinese court verdicts in a structured way, and they do not meet people's needs for follow-up analysis of court verdicts. In this paper, we therefore present a structured subject event extraction method (SEE) for Chinese court verdict cases that combines techniques of event extraction (EE) and attribute-value pair extraction (AVPE). Specifically, we provide a subject event representation frame for organizing fragmented knowledge in Chinese court verdict cases. We then extract subject events from the unstructured cases using trained sequence labeling models and constructed heuristic rules, and fill them into the subject event representation frame in the form of attribute-value pairs (AVPs). The experimental results show that SEE can efficiently and automatically extract subject events from Chinese court verdict cases and display them visually via frame-filling, which makes searching for legal materials more efficient and facilitates further research and analysis.
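The frame-filling step described above can be sketched as follows, assuming the sequence labeling model emits BIO tags (a common convention, not confirmed by the abstract). The tag names, the slot names, and both helper functions are hypothetical illustrations, not the authors' implementation.

```python
def bio_to_avps(tokens, tags, sep=""):
    """Collect (attribute, value) pairs from BIO-tagged tokens.

    sep joins the tokens of one value; "" suits character-level Chinese text.
    """
    pairs = []
    attr, span = None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            # A new span starts; close the previous one if it is open.
            if attr is not None:
                pairs.append((attr, sep.join(span)))
            attr, span = tag[2:], [token]
        elif tag.startswith("I-") and attr == tag[2:]:
            # Continue the currently open span.
            span.append(token)
        else:
            # "O" or an inconsistent tag closes the open span.
            if attr is not None:
                pairs.append((attr, sep.join(span)))
            attr, span = None, []
    if attr is not None:
        pairs.append((attr, sep.join(span)))
    return pairs


def fill_frame(slots, pairs):
    """Fill a frame (slot -> value); the first value found for a slot wins."""
    frame = {slot: None for slot in slots}
    for attr, value in pairs:
        if attr in frame and frame[attr] is None:
            frame[attr] = value
    return frame


# Hypothetical example with English tokens for readability.
tokens = ["Zhang", "San", "stole", "a", "phone"]
tags = ["B-defendant", "I-defendant", "O", "O", "B-object"]
pairs = bio_to_avps(tokens, tags, sep=" ")
frame = fill_frame(["defendant", "object", "location"], pairs)
```

Slots that no extracted pair fills stay `None`, which keeps missing attributes visible when the frame is displayed.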
Local causal structure learning aims to discover and distinguish direct causes (parents) and direct effects (children) of a variable of interest from data. While emerging successes have been made, existing methods nee...
Single Image Deraining task aims at recovering the rain-free background from an image degraded by rain streaks and rain accumulation. For the powerful fitting ability of deep neural networks and massive training data,...
ISBN: (digital) 9781728171685
ISBN: (print) 9781728171692
Visual dialog is a challenging task that requires comprehending the semantic dependencies among implicit visual and textual contexts. The task can be cast as relation inference in a graphical model with sparse contexts and an unknown graph structure (relation descriptor), so modeling the underlying context-aware relation inference is critical. To this end, we propose a novel Context-Aware Graph (CAG) neural network. Each node in the graph corresponds to a joint semantic feature that includes both object-based (visual) and history-related (textual) context representations. The graph structure (relations in dialog) is iteratively updated using an adaptive top-K message passing mechanism. Specifically, in every message passing step, each node selects the K most relevant nodes and receives messages only from them. After the update, we impose graph attention on all the nodes to get the final graph embedding and infer the answer. In CAG, each node has dynamic relations in the graph (a different set of K related neighbor nodes), and only the most relevant nodes contribute to the context-aware relational graph inference. Experimental results on the VisDial v0.9 and v1.0 datasets show that CAG outperforms comparative methods. Visualization results further validate the interpretability of our method.
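One step of top-K message passing, as described in the abstract, might look like the following minimal sketch. The dot-product relevance score, the softmax weighting of messages, and the residual update are assumptions made for illustration; the abstract does not specify how CAG computes relevance or how it updates node features.

```python
import math


def top_k_message_passing(nodes, k):
    """One message passing step in which each node hears only its top-K neighbors.

    nodes: list of feature vectors (lists of floats), one per node.
    k:     number of neighbors each node receives messages from.
    Returns a new list with the updated node features.
    """
    n = len(nodes)
    updated = []
    for i in range(n):
        # Relevance of every other node to node i (dot product, an assumption).
        scores = [
            (sum(a * b for a, b in zip(nodes[i], nodes[j])), j)
            for j in range(n) if j != i
        ]
        # Keep only the K most relevant neighbors.
        top = sorted(scores, key=lambda s: s[0], reverse=True)[:k]
        if not top:
            # A node without neighbors keeps its feature unchanged.
            updated.append(list(nodes[i]))
            continue
        # Softmax over the selected scores gives the message weights.
        max_score = max(s for s, _ in top)
        exps = [math.exp(s - max_score) for s, _ in top]
        total = sum(exps)
        message = [
            sum(e / total * nodes[j][d] for e, (_, j) in zip(exps, top))
            for d in range(len(nodes[i]))
        ]
        # Residual update: own feature plus the aggregated message.
        updated.append([x + m for x, m in zip(nodes[i], message)])
    return updated


# Three hypothetical 2-dimensional node features.
nodes = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
out = top_k_message_passing(nodes, k=1)
```

Because relevance is recomputed from the current features, calling the function repeatedly lets each node's K neighbors change from step to step, which corresponds to the iteratively updated graph structure mentioned in the abstract.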
ISBN: (digital) 9781728162515
ISBN: (print) 9781728162522
Multivariate time series forecasting is important for many applications, and many studies have sought accurate and interpretable prediction methods. However, existing methods cannot take both time series and covariates into consideration, lack interpretability, or ignore global trends across multivariate time series. In this paper, we aim to solve these issues. To this end, we propose a new model named TEDGE for accurate and interpretable time series prediction. In this model, we extract global trends hidden across multivariate time series to improve prediction accuracy. Meanwhile, we use a deep recurrent model with an attention mechanism to find, in an interpretable way, long- and short-term sequential patterns hidden in individual time series. We conduct experiments on several datasets to evaluate the proposed model's performance. The results demonstrate the superior performance of our proposed model.
Based on the Social Cognition Theory, we constructed the influencing factors model of college students' intention of online health information behavior from three levels of individual, society and information syst...
Ultra-low-frequency gravitational waves (GWs) generated by individual inspiraling supermassive black hole binaries (SMBHBs) at the centers of galaxies may be detected by pulsar timing arrays (PTAs) in the future. Thes...