Extracting modular segments from raw video demonstrations of high-level actions is important for understanding the underlying building blocks of different tasks in human-robot interaction. While (data-hungry) supervised learning approaches for Action Segmentation show good performance when the underlying segments are predefined, their performance degrades when unseen actions are introduced on the go, as new data samples are scarce. In this regard, Zero- and Few-Shot Learning approaches have shown good performance in generalizing to unseen examples. In Action Segmentation, where each frame needs to be labeled, annotating new data even for a few tasks can become tedious as the number of tasks scales. In this work, we propose Interactive Iterative Improvement $(I^{3})$ for Few-Shot Action Segmentation, a Semi-Supervised Interactive Meta-Learning approach for Zero-Shot Learning on unlabeled videos and Few-Shot Learning on small amounts of labeled videos. $I^{3}$ consists of a Prototypical Network model for frame-wise prediction coupled with a Hidden Semi-Markov Model to prevent over-segmentation. The model is iteratively improved in an interactive manner through users’ annotations provided via a web interface. This is done in a task-agnostic manner that, in theory, can be reused for a number of different actions. Our model provides sequentially accurate segmentations using only a limited amount of labeled data, which shows the efficacy of our learning approach. A lower edit distance compared to baselines indicates a lower number of required user edits, making it well suited for non-expert users to smoothly provide annotations and giving them more control over the learned model.
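The abstract above describes two coupled components: distance-based, prototypical frame classification learned from a handful of labeled frames, and a Hidden Semi-Markov Model that smooths the frame-wise predictions to prevent over-segmentation. The following is a minimal sketch of that idea, assuming numpy, toy 16-dimensional frame embeddings, and a constant label-switch penalty with Viterbi decoding as a simplified stand-in for the full HSMM; all function names and parameters are illustrative, not the authors' implementation.

```python
# Sketch: prototypical frame classification + Viterbi-style smoothing
# (a simplified surrogate for the HSMM used in I^3).
import numpy as np

def class_prototypes(support_emb, support_labels, n_classes):
    """Mean embedding per class from the few labeled support frames."""
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def frame_log_probs(query_emb, prototypes):
    """Log-softmax over negative squared distances to each prototype, per frame."""
    d2 = ((query_emb[:, None, :] - prototypes[None, :, :]) ** 2).sum(-1)
    logits = -d2
    logits -= logits.max(axis=1, keepdims=True)
    return logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))

def smooth_decode(log_probs, switch_penalty=2.0):
    """Viterbi decoding with a constant penalty for changing the action label;
    this discourages over-segmentation much like the HSMM prior."""
    T, C = log_probs.shape
    score = log_probs[0].copy()
    back = np.zeros((T, C), dtype=int)
    for t in range(1, T):
        trans = score[None, :] - switch_penalty * (1 - np.eye(C))
        back[t] = trans.argmax(axis=1)
        score = trans.max(axis=1) + log_probs[t]
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(back[t, path[-1]]))
    return np.array(path[::-1])

# Toy usage: 3 action classes, 16-d frame embeddings.
rng = np.random.default_rng(0)
protos = class_prototypes(rng.normal(size=(30, 16)),
                          rng.integers(0, 3, size=30), n_classes=3)
labels = smooth_decode(frame_log_probs(rng.normal(size=(100, 16)), protos))
print(labels.shape)  # (100,) frame-wise segment labels
```

In this sketch the switch penalty plays the role of the segment-duration prior in the HSMM: larger values trade frame-level responsiveness for fewer, longer segments.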
The recent surge in artificial intelligence, particularly in multimodal processing technology, has advanced human-computer interaction by altering how intelligent systems perceive, understand, and respond to contextu...
A unique digital identity, user ID, is essential for everyday online activities in the Internet era. These user IDs represent a user in a digital environment using stored credentials on a system called authentication ...
ISBN (digital): 9798331536459
ISBN (print): 9798331536466
Advancements in 3D rendering like Gaussian Splatting (GS) allow novel view synthesis and real-time rendering in virtual reality (VR). However, GS-created 3D environments are often difficult to edit. For scene enhancement or to incorporate 3D assets, segmenting Gaussians by class is essential. Existing segmentation approaches are typically limited to certain types of scenes, e.g., "circular" scenes, to determine clear object boundaries. However, this method is ineffective when removing large objects in non-"circling" scenes such as large outdoor scenes. We propose Semantics-Controlled GS (SCGS), a segmentation-driven GS approach, enabling the separation of large scene parts in uncontrolled, natural environments. SCGS allows scene editing and the extraction of scene parts for VR. Additionally, we introduce a challenging outdoor dataset, overcoming the "circling" setup. We outperform the state-of-the-art in visual quality on our dataset and in segmentation quality on the 3D-OVS dataset. We conducted an exploratory user study, comparing a 360-video, plain GS, and SCGS in VR with a fixed viewpoint. In our subsequent main study, users were allowed to move freely, evaluating plain GS and SCGS. Our main study results show that participants clearly prefer SCGS over plain GS. Overall, we present an innovative approach that surpasses the state-of-the-art both technically and in user experience.
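SCGS attaches a semantic class to each Gaussian so that whole scene parts can be extracted as assets or removed for editing. The sketch below illustrates only that final masking step on a splat container; the GaussianScene fields and the per-Gaussian classes array are assumptions for illustration, not the paper's data structures or segmentation pipeline.

```python
# Sketch: extracting or removing scene parts once each Gaussian carries a class id.
from dataclasses import dataclass
import numpy as np

@dataclass
class GaussianScene:
    means: np.ndarray      # (N, 3) splat centers
    scales: np.ndarray     # (N, 3) per-axis extents
    rotations: np.ndarray  # (N, 4) quaternions
    opacities: np.ndarray  # (N,)
    colors: np.ndarray     # (N, 3) base color per Gaussian
    classes: np.ndarray    # (N,) integer semantic label per Gaussian

    def subset(self, mask: np.ndarray) -> "GaussianScene":
        """Keep only the Gaussians selected by a boolean mask."""
        return GaussianScene(self.means[mask], self.scales[mask],
                             self.rotations[mask], self.opacities[mask],
                             self.colors[mask], self.classes[mask])

def extract_class(scene: GaussianScene, class_id: int) -> GaussianScene:
    """Pull out one semantic part of the scene, e.g. to reuse it as a VR asset."""
    return scene.subset(scene.classes == class_id)

def remove_class(scene: GaussianScene, class_id: int) -> GaussianScene:
    """Delete a (possibly large) object from the scene for editing."""
    return scene.subset(scene.classes != class_id)

# Toy usage with random splats and 5 hypothetical classes.
rng = np.random.default_rng(0)
N = 1000
scene = GaussianScene(rng.normal(size=(N, 3)), np.abs(rng.normal(size=(N, 3))),
                      rng.normal(size=(N, 4)), rng.uniform(size=N),
                      rng.uniform(size=(N, 3)), rng.integers(0, 5, size=N))
asset = extract_class(scene, class_id=2)    # reusable scene part
edited = remove_class(scene, class_id=2)    # scene with that object deleted
print(asset.means.shape, edited.means.shape)
```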
ISBN (digital): 9798350374766
ISBN (print): 9798350374773
Roopkotha is a storytelling robot that seamlessly combines traditional storytelling methods and technology, creating a captivating robot storyteller. We are creating a special prototype in the world of robots that tells stories in a way that is easy to understand and enjoy. In this era of technological advancement, Roopkotha combines voice recognition with Bangla language processing, emotion recognition, and human behavior detection. Roopkotha aims to revolutionize the way stories are told and to engage with users on a deep emotional level. Furthermore, Roopkotha is equipped with advanced facial expression and emotion recognition technology. The emotion recognition feature helps the robot form a profound connection with its users.
Person image synthesis with controllable body poses and appearances is an essential task owing to the practical needs in the context of virtual try-on, image editing and video production. However, existing methods fac...
Over the years, Machine Learning models have been successfully employed on neuroimaging data for accurately predicting brain age. Deviations from the healthy brain aging pattern are associated with the accelerated bra...
ISBN (digital): 9798350359312
ISBN (print): 9798350359329
Contrastive clustering has recently been an emerging topic in deep unsupervised learning. Nevertheless, previous works mostly adopt stochastic data augmentations, which easily lead to the semantic drift problem due to limited transformations. Moreover, these approaches ignore data distribution information when generating positive and negative pair-wise samples. In light of this, this paper proposes a simple yet effective unsupervised clustering network termed Confidence-oriented Contrastive Graph Clustering (CoCGC). In particular, we design an end-to-end network paradigm with un-shared weights, in which a hybrid graph filter is utilized to generate two views of reliable augmentations. Guided by the non-dominated sorting theory, we further construct a confidence-oriented sample set from the latent data distribution perspective. By considering the local density and cluster distribution of the embedding representations, discriminative sample pairs can be derived from the confidence-oriented sets in a two-view contrastive manner. Finally, a cross-view neighbor contrastive loss is devised to better exploit the self-supervised network signals. Extensive experimental results on five benchmark datasets demonstrate the effectiveness of our method against existing state-of-the-art deep graph clustering methods.
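Two ingredients named in the CoCGC abstract lend themselves to a compact illustration: building two views by graph filtering rather than stochastic augmentation, and contrasting the same node across the two views. The sketch below is a hedged approximation, using a symmetrically normalized adjacency raised to two different powers as the "hybrid" filter and a plain cross-view InfoNCE loss in place of the paper's neighbor-aware, confidence-oriented variant; the filter orders and temperature are assumptions.

```python
# Sketch: graph-filtered views and a cross-view contrastive loss
# (an illustrative approximation, not the CoCGC implementation).
import numpy as np

def normalized_adjacency(A):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

def graph_filter_views(A, X, low_order=2, high_order=5):
    """Two smoothed views of the node features X: a mildly and a strongly filtered one."""
    S = normalized_adjacency(A)
    return (np.linalg.matrix_power(S, low_order) @ X,
            np.linalg.matrix_power(S, high_order) @ X)

def cross_view_contrastive_loss(Z1, Z2, temperature=0.5):
    """InfoNCE-style loss: the same node in the other view is the positive pair."""
    Z1 = Z1 / np.linalg.norm(Z1, axis=1, keepdims=True)
    Z2 = Z2 / np.linalg.norm(Z2, axis=1, keepdims=True)
    sim = Z1 @ Z2.T / temperature                        # (N, N) cross-view similarity
    sim -= sim.max(axis=1, keepdims=True)                # numerical stability
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))                   # positives on the diagonal

# Toy usage on a random undirected graph with 50 nodes and 8-d features.
rng = np.random.default_rng(0)
A = (rng.uniform(size=(50, 50)) < 0.1).astype(float)
A = np.maximum(A, A.T)
np.fill_diagonal(A, 0)
X = rng.normal(size=(50, 8))
Z1, Z2 = graph_filter_views(A, X)
print(cross_view_contrastive_loss(Z1, Z2))
```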
This paper presents a CAD-based approach for automated surface defect detection. We leverage the a-priori knowledge embedded in a CAD model and integrate it with point cloud data acquired from commercially available s...
Exposure to ideas in domains outside a scientist’s own may benefit her in reformulating existing research problems in novel ways and discovering new application domains for existing solution ideas. While improved per...