We evaluated the capability of generative pre-trained transformers (GPT-4) in analysis of textual data in tasks that require highly specialized domain expertise. Specifically, we focused on the task of analyzing court...
Algorithms for text-generation in dialogue can be misguided. For example, in task-oriented settings, reinforcement learning that optimizes only task-success can lead to abysmal lexical diversity. We hypothesize this i...
Despite recent attention to depth for various tasks, it is still an unexplored modality for weakly-supervised object detection (WSOD). We propose an amplifier method for enhancing the performance of WSOD by integratin...
ISBN (digital): 9798350365474
ISBN (print): 9798350365481
An important task for intelligent systems is affordance grounding, where the goal is to locate regions on an object where an action can be performed. Past weakly supervised approaches learn from human-object interaction (HOI) by transferring grounding knowledge from exocentric to egocentric views of an object. The use of HOI priors is inherently noisy and thus provides a limited source of supervision. To address this challenge, we identify that recent foundational models (i.e., VLMs and LLMs) can serve as auxiliary sources of knowledge for such frameworks due to their vast world knowledge. In this work, we propose strategies to extract and leverage foundational model knowledge related to attributes and object parts to enhance an HOI-based affordance grounding framework. In particular, we propose to combine HOI and foundational model priors through (1) a spatial consistency loss and (2) heatmap aggregation. Our strategies result in mKLD and mNSS improvements, and our insights suggest future directions for improving affordance grounding capabilities.
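The abstract reports results in mKLD and mNSS but does not spell out how they are computed. As a rough illustration only, the sketch below uses the common saliency-evaluation definitions of KL divergence (KLD) and Normalized Scanpath Saliency (NSS) for a single heatmap; the "m" prefix is assumed here to mean an average over test samples. The function names, the binary fixation mask used for NSS, and the `eps` smoothing constant are assumptions of this sketch, not details taken from the paper.

```python
import math

def _normalize(heatmap):
    """Flatten a 2D heatmap and scale it to sum to 1 (assumes a nonzero sum)."""
    flat = [v for row in heatmap for v in row]
    total = sum(flat)
    return [v / total for v in flat]

def kld(pred, target, eps=1e-12):
    """KL divergence of the predicted heatmap from the target; lower is better."""
    p = _normalize(pred)
    q = _normalize(target)
    return sum(t * math.log(eps + t / (pi + eps)) for pi, t in zip(p, q))

def nss(pred, fixations):
    """Mean z-scored prediction at positions marked in a binary mask; higher is better."""
    flat = [v for row in pred for v in row]
    mask = [f for row in fixations for f in row]
    mean = sum(flat) / len(flat)
    std = math.sqrt(sum((v - mean) ** 2 for v in flat) / len(flat))
    if std == 0 or not any(mask):
        return 0.0  # undefined case; returning 0 is a choice made for this sketch
    hits = [(v - mean) / std for v, f in zip(flat, mask) if f]
    return sum(hits) / len(hits)

def mean_metric(metric, preds, targets):
    """Average a per-sample metric over a dataset (the assumed meaning of the 'm' prefix)."""
    values = [metric(p, t) for p, t in zip(preds, targets)]
    return sum(values) / len(values)
```

Under these definitions, a prediction that concentrates mass on the annotated region gives a KLD near zero and a positive NSS, while a prediction that peaks elsewhere gives a large KLD and a negative NSS.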
In this paper, we present a new approach that lets us extract and represent relations among terms (concepts) in documents and that uses these relations to support various document analysis applications. Our approach works by building a graph of local co-occurrence relations among terms that are extracted directly from text and by defining a global similarity metric among these terms and sets of terms using the graph and its connectivity. We demonstrate the benefit of the approach on the problem of MeSH keyword annotation of documents based on their abstracts.
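The two steps described above, a local co-occurrence graph followed by a global similarity derived from graph connectivity, can be sketched as follows. The abstract does not name the similarity metric, so this example substitutes a random walk with restart as one plausible connectivity-based choice. The window size, restart probability, and function names are all assumptions made for illustration.

```python
from collections import defaultdict

def build_cooccurrence_graph(docs, window=3):
    """Undirected weighted graph; weight = how often two terms fall in the same window."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in docs:
        for i, term in enumerate(tokens):
            # Pair each term with the next (window - 1) terms in the document.
            for other in tokens[i + 1 : i + window]:
                if other != term:
                    graph[term][other] += 1.0
                    graph[other][term] += 1.0
    return graph

def random_walk_similarity(graph, source, restart=0.3, iterations=50):
    """Approximate random-walk-with-restart scores from `source` to every term."""
    nodes = list(graph)
    scores = {n: 0.0 for n in nodes}
    scores[source] = 1.0
    for _ in range(iterations):
        nxt = {n: 0.0 for n in nodes}
        for node, mass in scores.items():
            if mass == 0.0:
                continue
            total = sum(graph[node].values())
            # Spread the walker's mass to neighbours in proportion to edge weight.
            for neighbour, weight in graph[node].items():
                nxt[neighbour] += (1.0 - restart) * mass * weight / total
        nxt[source] += restart  # the walker jumps back to the source
        scores = nxt
    return scores

docs = [
    ["protein", "binding", "site"],
    ["protein", "binding", "assay"],
    ["court", "ruling", "appeal"],
]
graph = build_cooccurrence_graph(docs)
scores = random_walk_similarity(graph, "protein")
```

Terms that are reachable from the source through many short, heavily weighted paths score higher, and terms in a disconnected component score zero. Similarity between sets of terms could be obtained by restarting from several source terms at once, though that extension is not shown here.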
Vision-language alignment learned from image-caption pairs has been shown to benefit tasks like object recognition and detection. Methods are mostly evaluated in terms of how well object class names are learned, but captions also contain rich attribute context that should be considered when learning object alignment. It is unclear how methods use this context in learning, as well as whether models succeed when tasks require attribute and object understanding. To address this gap, we conduct extensive analysis of the role of attributes in vision-language models. We specifically measure model sensitivity to the presence and meaning of attribute context, gauging influence on object embeddings through unsupervised phrase grounding and classification via description methods. We further evaluate the utility of attribute context in training for open-vocabulary object detection, fine-grained text-region retrieval, and attribution tasks. Our results show that attribute context can be wasted when learning alignment for detection, attribute meaning is not adequately considered in embeddings, and describing classes by only their attributes is ineffective. A viable strategy that we find to increase benefits from attributes is contrastive training with adjective-based negative captions.
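The abstract does not describe how the adjective-based negative captions are built. One simple way to construct such negatives, shown purely as an assumption, is to swap a single adjective in the caption for a contrasting one so that the object words stay fixed and only the attribute changes. The `ADJECTIVE_SWAPS` table and the `adjective_negatives` function are hypothetical names introduced for this sketch.

```python
# Hypothetical swap table; a real system would need a much larger vocabulary
# and probably a part-of-speech tagger to find adjectives reliably.
ADJECTIVE_SWAPS = {
    "red": "blue",
    "blue": "red",
    "small": "large",
    "large": "small",
    "wooden": "metal",
    "metal": "wooden",
}

def adjective_negatives(caption):
    """Return one negative caption per swappable adjective, changing only that word."""
    tokens = caption.split()
    negatives = []
    for i, token in enumerate(tokens):
        swap = ADJECTIVE_SWAPS.get(token.lower())
        if swap is not None:
            negatives.append(" ".join(tokens[:i] + [swap] + tokens[i + 1 :]))
    return negatives

negatives = adjective_negatives("a small red car")
```

In contrastive training, each negative would be paired with the original image as a hard negative, so that the model can only separate the true caption from the altered one by attending to the attribute.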
ISBN (print): 1595936807
This paper describes an intelligent tutoring system, LARGO, that helps students learn skills of legal reasoning with hypotheticals by analyzing oral arguments before the US Supreme Court. The skills involve proposing a rule-like test for deciding a case, posing hypotheticals to challenge the rule, and responding by analogizing or distinguishing the hypotheticals and/or modifying the proposed test. Students diagram arguments in a special-purpose graphical language and receive feedback in the form of reflection questions. Copyright 2007 ACM.
This paper presents an investigation of score prediction for the Organization dimension of an assessment of analytical writing in response to text. With the long-term goal of producing feedback for students and teache...
Investigating cooperativity of interlocutors is central in studying pragmatics of dialogue. Models of conversation that only assume cooperative agents fail to explain the dynamics of strategic conversations. Thus, we ...