ISBN:
(print) 9781728196817
This work aims to learn how to perform complex robot manipulation tasks that are composed of several consecutively executed low-level sub-tasks, given as input a few visual demonstrations of the tasks performed by a person. The sub-tasks consist of moving the robot's end-effector until it reaches a sub-goal region in the task space, performing an action, and triggering the next sub-task when a pre-condition is met. Most prior work in this domain has been concerned with learning only low-level tasks, such as hitting a ball or reaching an object and grasping it. This paper describes a new neural network-based framework for simultaneously learning low-level policies as well as high-level policies, such as deciding which object to pick next or where to place it relative to other objects in the scene. A key feature of the proposed approach is that the policies are learned directly from raw videos of task demonstrations, without any manual annotation or post-processing of the data. Empirical results on object manipulation tasks with a robotic arm show that the proposed network can efficiently learn from real visual demonstrations to perform the tasks, and outperforms popular imitation learning algorithms.
Here we report an intelligent soft robotic gripper enabled by the integration of an ultrasonic remote sensor and triboelectric sensors. Due to the noncontact distance sensing ability, the ultrasonic sensor is used to ...
ISBN:
(digital) 9781728127828
ISBN:
(print) 9781728127828
A visual neuroprosthesis delivers electrical stimulation to the surviving neural cells of the visual pathway to produce prosthetic vision. While the retina is often chosen as the stimulation site, current retinal prostheses are hindered by a lack of functional selectivity that impairs resolution. A possible strategy to improve the resolution is to combine retinal stimulation with stimulation of the optic nerve bundle, which contains myelinated retinal ganglion cell (RGC) axons that vary in diameter. In this study, we used a computational model of RGCs with myelinated axons to predict whether the frequency of electrical stimulation delivered to the optic nerve can be modulated to preferentially inhibit a subset of optic nerve fibres classified by diameter. The model combined a finite element model of bipolar penetrating electrodes delivering sinusoidal stimulation in the range of 25-10000 Hz to the optic nerve with a double-cable model representing an optic nerve fibre. We found that the diameter of the axon fibre and the ion kinetic properties of the RGC affect the neuron's frequency response, demonstrating the potential of optic nerve stimulation to produce selective inhibition based on axon fibre size.
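The diameter-dependent frequency response described above can be illustrated with a toy first-order low-pass model. This is a purely hypothetical sketch for intuition only: the study itself uses a finite element model coupled to a double-cable axon model, and the `cutoff_hz` scaling below is an invented placeholder, not a fitted physiological value.

```python
import math

def cutoff_hz(diameter_um, k=500.0):
    """Hypothetical cutoff frequency that scales linearly with fibre diameter."""
    return k * diameter_um

def response(freq_hz, diameter_um):
    """Relative excitability of a fibre at a given stimulation frequency
    (first-order low-pass roll-off)."""
    fc = cutoff_hz(diameter_um)
    return 1.0 / math.sqrt(1.0 + (freq_hz / fc) ** 2)

def inhibited_fibres(freq_hz, diameters, threshold=0.5):
    """Fibre diameters whose response falls below threshold, i.e. the
    subset preferentially inhibited at this stimulation frequency."""
    return [d for d in diameters if response(freq_hz, d) < threshold]

# In this toy model, a high stimulation frequency drops thin fibres first:
print(inhibited_fibres(5000.0, [1.0, 5.0, 10.0]))  # [1.0, 5.0]
```

Sweeping `freq_hz` across the 25-10000 Hz range then selects different diameter subsets, which is the kind of frequency-based selectivity the study investigates with far more realistic biophysics.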
ISBN:
(digital) 9781665469463
ISBN:
(print) 9781665469463
Single image super-resolution (SISR) with generative adversarial networks (GAN) has recently attracted increasing attention due to its potential to generate rich details. However, the training of GAN is unstable, and it often introduces many perceptually unpleasant artifacts along with the generated details. In this paper, we demonstrate that it is possible to train a GAN-based SISR model which can stably generate perceptually realistic details while inhibiting visual artifacts. Based on the observation that the local statistics (e.g., residual variance) of artifact areas are often different from those of areas with perceptually friendly details, we develop a framework to discriminate between GAN-generated artifacts and realistic details, and consequently generate an artifact map to regularize and stabilize the model training process. Our proposed locally discriminative learning (LDL) method is simple yet effective; it can be easily plugged into off-the-shelf SISR methods and boost their performance. Experiments demonstrate that LDL outperforms state-of-the-art GAN-based SISR methods, achieving not only higher reconstruction accuracy but also superior perceptual quality on both synthetic and real-world datasets. Codes and models are available at https://***/csjliang/LDL.
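The local-statistics observation behind LDL can be sketched in a few lines: artifact regions tend to show higher local residual variance than well-reconstructed detail, so thresholding per-patch variance yields a binary artifact map. The patch size and threshold below are illustrative placeholders, not the paper's actual implementation.

```python
def local_variance(values):
    """Population variance of a flat list of pixel residuals."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / n

def artifact_map(residual, patch=2, threshold=0.5):
    """Mark each non-overlapping patch whose residual variance exceeds
    `threshold` as a likely artifact region (1) rather than detail (0)."""
    h, w = len(residual), len(residual[0])
    out = []
    for i in range(0, h, patch):
        row = []
        for j in range(0, w, patch):
            vals = [residual[y][x]
                    for y in range(i, min(i + patch, h))
                    for x in range(j, min(j + patch, w))]
            row.append(1 if local_variance(vals) > threshold else 0)
        out.append(row)
    return out

# A smooth residual patch (left) versus a noisy, artifact-like one (right):
residual = [
    [0.1, 0.1, 2.0, -2.0],
    [0.1, 0.1, -2.0, 2.0],
]
print(artifact_map(residual))  # [[0, 1]]
```

In the full method such a map would weight the training loss so that high-variance regions are penalized, regularizing the GAN toward realistic detail.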
This study is devoted to the consideration of methods for creating mathematical models of acoustic coherent images in relation to the tasks of monitoring the underwater environment of the World Ocean using modern dist...
ISBN:
(digital) 9781665469463
ISBN:
(print) 9781665469463
This paper strives for activity recognition under domain shift, for example caused by a change of scenery or camera viewpoint. The leading approaches reduce the shift in activity appearance by adversarial training and self-supervised learning. Different from these vision-focused works, we leverage activity sounds for domain adaptation, as they have less variance across domains and can reliably indicate which activities are not happening. We propose an audio-adaptive encoder and associated learning methods that discriminatively adjust the visual feature representation and address shifts in the semantic distribution. To further eliminate domain-specific features and include domain-invariant activity sounds for recognition, an audio-infused recognizer is proposed, which effectively models the cross-modal interaction across domains. We also introduce the new task of actor shift, with a corresponding audio-visual dataset, to challenge our method with situations where the activity appearance changes dramatically. Experiments on this dataset, EPIC-Kitchens and CharadesEgo show the effectiveness of our approach. Project page: https://xiaobai1217.***/DomainAdaptation
ISBN:
(digital) 9781665487399
ISBN:
(print) 9781665487399
Facial action units (FAUs), defined by the Facial Action Coding System (FACS), have become an important basis for facial expression analysis. Most work on FAU detection considers only spatial-temporal features and ignores label-wise AU correlations. In practice, the strong relationships between facial AUs can help AU detection. We propose a transformer-based FAU detection model that leverages both local spatial-temporal features and label-wise FAU correlations. Specifically, we first design a visual spatial-temporal transformer-based model and a convolution-based audio model to extract action unit-specific features. Second, inspired by the relationships between FAUs, we propose a transformer-based correlation module to learn correlations between AUs. The action unit-specific features from the aural and visual models are further aggregated in the correlation modules to produce per-frame predictions of 12 AUs. Our model was trained on the Aff-Wild2 dataset of the ABAW3 challenge and achieved state-of-the-art performance in the FAU task, verifying the effectiveness of the proposed network.
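The idea of exploiting label-wise AU correlation can be illustrated with a minimal sketch that refines each action unit's score by a correlation-weighted sum over the other AUs. The correlation matrix and blending weight below are invented placeholders; the paper instead learns this interaction with a transformer correlation module.

```python
def refine_scores(scores, corr, alpha=0.5):
    """Blend each AU's own score with the scores of correlated AUs.
    scores: per-AU logits; corr[i][j]: correlation of AU i with AU j."""
    n = len(scores)
    refined = []
    for i in range(n):
        context = sum(corr[i][j] * scores[j] for j in range(n) if j != i)
        norm = sum(abs(corr[i][j]) for j in range(n) if j != i) or 1.0
        refined.append((1 - alpha) * scores[i] + alpha * context / norm)
    return refined

# AU1 and AU2 strongly co-occur (e.g. inner and outer brow raiser),
# so a confident AU1 pulls AU2's score up; the uncorrelated AU stays put:
scores = [0.9, 0.1, 0.0]
corr = [
    [1.0, 0.8, 0.0],
    [0.8, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
print(refine_scores(scores, corr))
```

A learned module replaces the fixed `corr` matrix with attention weights, letting the network discover which AU pairs actually reinforce each other in the data.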
In recent years, there have been considerable developments in web development, and ReactJS has become a potent tool for creating effective and engaging web apps. This abstract offers a succinct synopsis of the study...
Understanding data is crucial right now since doing so will improve the productivity and efficiency of corporate operations across the globe. Data analysis employs methodical techniques to search for patterns, cluster...
ISBN:
(digital) 9781665469463
ISBN:
(print) 9781665469463
Scene graph generation (SGG) aims to capture a wide variety of interactions between pairs of objects, which is essential for full scene understanding. Existing SGG methods trained on the entire set of relations fail to acquire complex reasoning about visual and textual correlations due to various biases in the training data. Learning on trivial relations that indicate generic spatial configuration, like 'on' instead of informative relations such as 'parked on', does not enforce this complex reasoning, harming generalization. To address this problem, we propose a novel framework for SGG training that exploits relation labels based on their informativeness. Our model-agnostic training procedure imputes missing informative relations for less informative samples in the training data and trains an SGG model on the imputed labels along with existing annotations. We show that this approach can successfully be used in conjunction with state-of-the-art SGG methods and significantly improves their performance on multiple metrics on the standard Visual Genome benchmark. Furthermore, we obtain considerable improvements for unseen triplets in a more challenging zero-shot setting.
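The imputation idea can be sketched as follows: when a training triplet carries only a generic spatial relation, replace it with the most informative candidate preferred by some scoring model. The `REFINEMENTS` mapping and the toy scorer here are hypothetical placeholders for illustration, not the paper's actual procedure.

```python
# Generic relations mapped to hypothetical informative refinements.
REFINEMENTS = {
    "on": ["parked on", "standing on", "mounted on"],
}

def impute_relation(subject, relation, obj, score):
    """Swap a generic relation for the highest-scoring informative
    candidate; leave already-informative labels untouched."""
    candidates = REFINEMENTS.get(relation)
    if not candidates:
        return relation
    return max(candidates, key=lambda r: score(subject, r, obj))

def toy_score(subject, relation, obj):
    """Stand-in for a learned model that rates triplet plausibility."""
    if subject == "car" and relation == "parked on":
        return 1.0
    return 0.1

print(impute_relation("car", "on", "street", toy_score))   # parked on
print(impute_relation("dog", "chasing", "cat", toy_score))  # chasing
```

Training on the imputed labels alongside the original annotations is what pushes the model toward the informative relations that demand real visual-textual reasoning.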