ISBN: 9781665409155 (print)
First person action recognition is becoming an increasingly researched area thanks to the rising popularity of wearable cameras. This is bringing to light cross-domain issues that are yet to be addressed in this context. Indeed, the information extracted from learned representations suffers from an intrinsic "environmental bias". This strongly affects the ability to generalize to unseen scenarios, limiting the application of current methods to real settings where labeled data are not available during training. In this work, we introduce the first domain generalization approach for egocentric activity recognition, by proposing a new audiovisual loss, called Relative Norm Alignment loss. It rebalances the contributions from the two modalities during training, over different domains, by aligning their feature norm representations. Our approach leads to strong results in domain generalization on both EPIC-Kitchens-55 and EPIC-Kitchens-100, as demonstrated by extensive experiments, and can be extended to work also on domain adaptation settings with competitive results.
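The idea of aligning feature norms across the two modalities can be illustrated with a minimal sketch. The abstract does not give the formula, so the squared-ratio penalty below is an assumption made for illustration, and the names `mean_l2_norm` and `relative_norm_alignment` are hypothetical, not the authors' code.

```python
import math

def mean_l2_norm(features):
    """Average L2 norm over a batch of feature vectors (list of lists)."""
    return sum(math.sqrt(sum(x * x for x in f)) for f in features) / len(features)

def relative_norm_alignment(visual_feats, audio_feats, eps=1e-12):
    """Assumed penalty: zero when both modalities have the same mean feature norm."""
    ratio = mean_l2_norm(visual_feats) / (mean_l2_norm(audio_feats) + eps)
    return (ratio - 1.0) ** 2

# Toy batch: visual norms are 5 and 10 (mean 7.5), audio norms are both 2.5.
visual = [[3.0, 4.0], [6.0, 8.0]]
audio = [[0.0, 2.5], [2.5, 0.0]]
loss = relative_norm_alignment(visual, audio)  # large, since visual dominates
```

In a training loop a term of this kind would be added to the classification loss, so that neither modality dominates simply because its features have a larger magnitude.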
ISBN: 9781665448994 (print)
This paper introduces a novel dataset for video enhancement and studies the state-of-the-art methods of the NTIRE 2021 challenge on quality enhancement of compressed video. The challenge is the first NTIRE challenge in this direction, with three competitions, hundreds of participants and tens of proposed solutions. Our newly collected Large-scale Diverse Video (LDV) dataset is employed in the challenge. In our study, we analyze the solutions of the challenge and several representative methods from previous literature on the proposed LDV dataset. We find that the NTIRE 2021 challenge advances the state of the art of quality enhancement on compressed video.
Sign language is a means of communication that uses hand shapes, orientation, movements, and facial expressions instead of the spoken words of ordinary language. Different regions have developed their own version...
ISBN: 9798350376975; 9798350376968 (print)
With the exponential expansion of information on the internet, users increasingly struggle to locate the information they need within an intricate web information system (WIS). Meanwhile, developers struggle to craft user interfaces that deliver optimal user experiences (UX) within complex web architectures. Chatbots, emerging as integral components of new WISs, complement traditional graphical user interfaces (GUIs). However, there are no established methods that provide clear guidelines or practices for implementing Chatbots on an existing WIS. This study addresses this gap by proposing an approach that transforms existing web functionalities into conversational interfaces (i.e., Chatbot interfaces). We present a comprehensive step-by-step guideline and a set of patterns to facilitate the conversion, referred to as "Chatbotification." To validate the feasibility of the proposed approach, we implemented Chatbots using two distinct frameworks, Rasa and GPT. The experimental results show that all participants found the Chatbots easy to use and understandable, and that completing a given task with the Chatbots takes only a few interactions on average.
With the rapid development of artificial intelligence technology, visual inspection and image processing algorithms have been continuously improved in accuracy and efficiency, and intelligent inspection systems based ...
This study presents a novel approach for precisely detecting the angles of rotated rectangles using a hybrid architecture of Convolutional Neural Networks (CNN) with Multi-Layer Perceptron (MLP) and Support Vect...
Researchers across the world have been giving computers vision for many years. Now it is the era of digitization. Recognizing handwritten text is a must for a computer vision system. Due to the variation and complex...
To meet the requirement of automatically recognizing traffic police gestures against complex backgrounds using vision sensors in driverless cars, we propose a method for traffic police gesture action recognition based on tw...
ISBN: 9798350383638; 9798350383645 (print)
The field of artificial intelligence (AI) holds a variety of algorithms designed with the goal of achieving high accuracy at low computational cost and latency. One popular algorithm is the vision transformer (ViT), which excels at various computer vision tasks owing to its ability to capture long-range dependencies effectively. This paper analyzes a computing paradigm, namely, spatial transformer networks (STN), in terms of accuracy and hardware complexity for image classification tasks. The paper reveals that for 2D applications, such as image recognition and classification, STN is a strong backbone for AI algorithms because of its efficiency and fast inference time. This framework offers a promising solution for efficient and accurate AI on resource-constrained Internet of Things (IoT) and edge devices. The comparative analysis of STN implementations on the central processing unit (CPU), Raspberry Pi (RPi), and Resistive Random Access Memory (RRAM) architectures reveals nuanced performance variations, providing valuable insights into their respective computational efficiency and energy utilization.
ISBN: 9798350349405; 9798350349399 (print)
Graph convolutional networks (GCNs), which can effectively capture the spatial and temporal relationships between skeleton joints through graph topology, have shown promising performance in skeleton-based activity recognition in recent years. These methods typically learn the semantic features of the vertices of a skeleton and the associated adjacency matrix. However, how to efficiently establish relationships between vertices remains a substantial problem. To solve this problem, we propose a novel Hierarchical Vertex-wise Intensification Graph Convolution Network (HVI-GCN) for skeleton-based action recognition. The proposed module dilates input features into higher dimensions to broaden the temporal horizon, and builds a vertex-wise topology based on self-adaptively learned attention. With the adjacency matrix, features from other positions can be collected to aid the prediction of the current position. The proposed module provides a better receptive field and semantic understanding of both the spatial and temporal domains than related methods. Experiments were mainly conducted on the NTU-RGB-D, NTU-RGB-D 120, and NW-UCLA datasets with joint and bone integrated with motion sequences. Experimental results show that HVI-GCN can improve accuracy by up to 1.1% on the NTU-RGB-D 120 dataset. Meanwhile, the accuracy on the NTU-RGB-D 60 dataset and the NW-UCLA dataset can be boosted by 1.4% and 1.2%, respectively.
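How an adjacency matrix lets one vertex collect features from other vertices can be shown with a small sketch. This is the generic graph-convolution aggregation step only, with a fixed, row-normalized adjacency matrix assumed for illustration; it is not the HVI-GCN module, whose learned attention topology the abstract does not specify. The helper names are hypothetical.

```python
def normalize_rows(adj):
    """Scale each row of the adjacency matrix so its weights sum to 1."""
    out = []
    for row in adj:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def aggregate(adj, feats):
    """Give each vertex the weighted average of its neighbours' features."""
    norm = normalize_rows(adj)
    dims = len(feats[0])
    return [
        [sum(norm[i][j] * feats[j][k] for j in range(len(feats))) for k in range(dims)]
        for i in range(len(norm))
    ]

# Toy skeleton: three joints in a chain, each with a self-loop.
adjacency = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1]]
joint_features = [[1.0], [2.0], [3.0]]
mixed = aggregate(adjacency, joint_features)
```

A full layer would follow this aggregation with a learned linear projection and a nonlinearity; learning the adjacency weights rather than fixing them is where attention-based topologies come in.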