Nowadays, many commodities are sold across restaurants. Some earn the seller a profit, while others incur a loss. Predicting the price of a commodity leads to success in terms of ...
ISBN (digital): 9781665469463
ISBN (print): 9781665469463
In this paper, we focus on the Audio-Visual Question Answering (AVQA) task, which aims to answer questions regarding different visual objects, sounds, and their associations in videos. The problem requires comprehensive multimodal understanding and spatio-temporal reasoning over audio-visual scenes. To benchmark this task and facilitate our study, we introduce a large-scale MUSIC-AVQA dataset, which contains more than 45K question-answer pairs covering 33 different question templates spanning different modalities and question types. We develop several baselines and introduce a spatio-temporal grounded audio-visual network for the AVQA problem. Our results demonstrate that AVQA benefits from multisensory perception and that our model outperforms recent A-, V-, and AVQA approaches. We believe that our dataset has the potential to serve as a testbed for evaluating and promoting progress in audio-visual scene understanding and spatio-temporal reasoning. Code and dataset: http://***/MUSIC-AVQA/
ISBN (digital): 9798350380903
ISBN (print): 9798350380910
Electrical Tomography (ET) is an advanced visualization technique that is low-cost, non-invasive, non-polluting, and fast in response. However, the inherent ill-posedness of the problem and soft-field effects in the ET reconstruction process cause uncertainty in boundary measurements, thereby decreasing the quality of ET reconstruction. To solve this problem, this study employs Gaussian-type fuzzy membership functions to represent measurement uncertainty. The study constructs the objective function for ET reconstruction and performs fuzzy optimization using the Expectation-Maximization (EM) algorithm, aiming to enhance the accuracy and stability of ET reconstruction. Experimental findings affirm the effectiveness of the proposed approach in improving ET reconstruction quality, offering a novel and valuable tool for ET reconstruction.
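As a rough illustration of what a Gaussian-type fuzzy membership function looks like, the sketch below maps a measurement to a degree of membership between 0 and 1. The abstract does not give the exact parameterization, so the function name, the `center` and `sigma` parameters, and the sample voltage readings are all assumptions made for this example; the EM-based optimization is not shown.

```python
import math


def gaussian_membership(x, center, sigma):
    """Return the degree (0..1) to which x belongs to a fuzzy set.

    Standard Gaussian form exp(-(x - center)^2 / (2 * sigma^2)).
    `center` and `sigma` are hypothetical parameters, not taken from the paper.
    """
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    return math.exp(-((x - center) ** 2) / (2.0 * sigma ** 2))


if __name__ == "__main__":
    # Hypothetical boundary voltage readings (arbitrary units) around a nominal value.
    nominal = 1.00
    spread = 0.05
    for reading in (1.00, 1.05, 1.10):
        print(reading, gaussian_membership(reading, nominal, spread))
```

Readings close to the nominal value receive a membership near 1, and the membership decays smoothly as the reading moves away, which is one common way to encode how much a noisy measurement should be trusted.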
ISBN (digital): 9781665495721
ISBN (print): 9781665495721
In recent years, the rapid development of Visual Simultaneous Localization and Mapping (vSLAM) technology has improved robot localization. However, for autonomous underwater vehicles, feature extraction methods in underwater environments have received little study because of the specificity of their application scenarios. This paper therefore evaluates and analyzes the performance of feature detection and matching in underwater environments using well-known interest point detectors and descriptors from vSLAM. Their performance is compared in simulation experiments on four sets of underwater image data, in combination with histogram equalization-based and deep learning-based image enhancement algorithms. The experimental results verify the effectiveness of the image enhancement algorithms and feature extraction methods in underwater environments, and the quantitative comparison concludes that the ORB algorithm has an advantage in underwater environments.
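To make the histogram equalization step concrete, here is a minimal sketch of plain global histogram equalization on a grayscale image stored as nested lists. The abstract only says the enhancement is "histogram equalization-based", so this is the textbook form and not necessarily the variant the authors used; the function name and the tiny sample image are assumptions for illustration.

```python
def equalize_histogram(image, levels=256):
    """Globally equalize a grayscale image given as rows of ints in [0, levels)."""
    pixels = [p for row in image for p in row]
    if not pixels:
        return []

    # Count how often each intensity occurs.
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1

    # Cumulative distribution of intensities.
    cdf = []
    running = 0
    for count in hist:
        running += count
        cdf.append(running)

    total = len(pixels)
    cdf_min = next(c for c in cdf if c > 0)
    if total == cdf_min:
        # Every pixel has the same intensity; nothing to stretch.
        return [row[:] for row in image]

    # Map each intensity so the output histogram is roughly uniform.
    scale = (levels - 1) / (total - cdf_min)
    lut = [max(0, round((c - cdf_min) * scale)) for c in cdf]
    return [[lut[p] for p in row] for row in image]


if __name__ == "__main__":
    # Hypothetical low-contrast patch, as might appear in a murky underwater frame.
    patch = [[50, 50], [51, 52]]
    print(equalize_histogram(patch))
```

The narrow input range (50 to 52) is stretched across the full 0 to 255 range, which tends to make corners and edges easier for a detector such as ORB to find.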
ISBN (print): 9781665437868
High-quality labeled textual data are vital for automatic mining and analysis of the massive textual data produced by software systems. Several tools have been designed to facilitate manual labeling of textual data at different levels of granularity. However, these tools neither aim to provide statistics and analysis of labeled textual data, nor support collaboration among coders to reduce the time cost of manual labeling and enhance the quality of labeling results. In this paper, we developed a Web-based labeling tool named CoolTeD (available at http://***) for collaborative labeling of textual datasets. Specifically, CoolTeD can be used: (1) to label textual data from the perspective of requirements types based on ISO 25010, (2) to review the labeling results with different confidence levels and contradictory labels, (3) to automatically calculate Cohen's Kappa coefficient among multiple coders, and (4) to visualize the labeling results. The tool demo is available at https://***/xVkrB_Cs1J8
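For readers unfamiliar with Cohen's Kappa, the sketch below computes it for one pair of coders: observed agreement corrected by the agreement expected from each coder's label frequencies. Cohen's Kappa is defined for two raters, so a tool handling multiple coders would presumably apply it pairwise; how CoolTeD does this is not stated in the abstract. The function name and the sample labels are assumptions for illustration.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's Kappa for two coders who labeled the same items in the same order."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both coders must label the same non-empty set of items")

    n = len(labels_a)
    # Fraction of items on which the two coders agree.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # Agreement expected by chance, from each coder's label frequencies.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)

    if expected == 1.0:
        # Both coders used one identical label throughout; kappa is undefined,
        # and this sketch treats the case as perfect agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # Hypothetical labels drawn from ISO 25010 quality characteristics.
    coder_1 = ["performance", "security", "performance", "usability"]
    coder_2 = ["performance", "security", "usability", "usability"]
    print(cohens_kappa(coder_1, coder_2))
```

A value of 1 means complete agreement, 0 means agreement no better than chance, and negative values mean the coders agree less often than chance would predict.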
In order to effectively analyze and utilize a large number of maternal health care diagnosis and treatment guidelines, electronic medical records and clinical data, and promote the sharing and reuse of knowledge and e...
ISBN (print): 9781728196817
In this paper, we present a framework rooted in control and planning that enables quadrupedal robots to traverse challenging terrains with discrete footholds using visual feedback. Navigating discrete terrain is challenging for quadrupeds because the motion of the robot can be aperiodic, highly dynamic, and blind for the hind legs of the robot. Additionally, the robot needs to reason over both the feasible footholds and the base velocity in order to speed up or slow down at different parts of the discrete terrain. To address these challenges, we build an offline library of periodic gaits that span two trotting steps, and switch between different motion primitives to achieve aperiodic motions of different step lengths on a quadrupedal robot. The motion library is used to provide targets to a geometric model predictive controller, which outputs the contact forces at the stance feet. To incorporate visual feedback, we use terrain mapping tools and a forward-facing depth camera to build a local height map of the terrain around the robot, and extract feasible foothold locations around both the front and hind legs of the robot. Our experiments show a small-scale quadruped robot navigating multiple unknown, challenging, and discrete terrains in the real world.
ISBN (digital): 9781665469463
ISBN (print): 9781665469463
Visual grounding is a task to locate the target indicated by a natural language expression. Existing methods extend the generic object detection framework to this problem. They base the visual grounding on the features from pre-generated proposals or anchors, and fuse these features with the text embeddings to locate the target mentioned by the text. However, modeling the visual features from these predefined locations may fail to fully exploit the visual context and attribute information in the text query, which limits their performance. In this paper, we propose a transformer-based framework for accurate visual grounding by establishing text-conditioned discriminative features and performing multi-stage cross-modal reasoning. Specifically, we develop a visual-linguistic verification module to focus the visual features on regions relevant to the textual descriptions while suppressing the unrelated areas. A language-guided feature encoder is also devised to aggregate the visual contexts of the target object to improve the object's distinctiveness. To retrieve the target from the encoded visual features, we further propose a multi-stage cross-modal decoder to iteratively speculate on the correlations between the image and text for accurate target localization. Extensive experiments on five widely used datasets validate the efficacy of our proposed components and demonstrate state-of-the-art performance.
ISBN (digital): 9781665413084
ISBN (print): 9781665413084
This paper presents the design and prototype of a soft-fingered AI-enabled hand (SofIA) based on the Fin-Ray® effect. The study proposes a material and method for fabricating soft Fin-Ray fingers by molding them entirely from urethane rubber. SofIA is equipped with a depth camera that provides visual feedback on the state of the fingers, which will be used in the development of a versatile sensing system based on deep learning. Flexible side supports were added to further improve the mechanical performance of the fingers. Using SofIA, a series of experiments was conducted with the original and modified Fin-Ray finger structures to test and validate the desired behaviour of the gripper. It was found that the hand is capable of manipulating objects ranging from 10 mm to 90 mm in diameter, objects up to 90 mm × 90 mm in length and width, and objects with a maximum mass of 400 g in a position parallel to the ground, regardless of the object material.
The braking system is a key part of trains, and its health status over the full life cycle is essential to ensuring train safety. How to accurately assess real-time health status throughout the full life-cycle of th...