As a hot research topic in the field of computer vision, blind image quality assessment (BIQA) can provide high-quality images for end-users and promote the development of other fields of computer vision. Although the...
ISBN: 9781665405409 (print)
Recently, inspired by the success of Transformers in natural language processing tasks, a number of works have attempted to apply Transformer-based models to video action recognition. Existing works use only one RGB stream as the input to the Transformer. How to use multiple pathways and multiple streams with Transformers for action recognition has not been studied. To address this issue, we present a novel structure, the Two-Pathway Vision Transformer (TP-ViT). Two parallel spatial Transformer encoders are used as two pathways with different framerates and resolutions of the input video. The high-resolution pathway contains more spatial information, while the high-framerate pathway contains more temporal information. The two outputs are fused and fed into a temporal Transformer encoder for action recognition. Furthermore, we fuse skeleton features into our model to get better results. Our experiments demonstrate that our proposed models achieve state-of-the-art results on both the coarse-grained dataset Kinetics and the fine-grained dataset FineGym.
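The trade-off between the two pathways can be illustrated with a small sketch of how each one might sample the same clip. The abstract does not give frame counts, resolutions, or patch size, so the values below (8 frames at 224 pixels, 32 frames at 112 pixels, 16-pixel patches) and the helper names are assumptions chosen only for illustration; the fusion step and the Transformer encoders themselves are not modeled.

```python
def sample_frame_indices(total_frames, num_samples):
    """Pick num_samples evenly spaced frame indices from a clip."""
    if num_samples <= 0 or num_samples > total_frames:
        raise ValueError("num_samples must be in 1..total_frames")
    step = total_frames / num_samples
    return [int(i * step) for i in range(num_samples)]


def tokens_per_frame(resolution, patch_size):
    """Number of square patches a ViT-style encoder sees per frame."""
    if resolution % patch_size != 0:
        raise ValueError("resolution must be divisible by patch_size")
    return (resolution // patch_size) ** 2


# Hypothetical pathway settings (not taken from the paper).
PATHWAYS = {
    "high_resolution": {"frames": 8, "resolution": 224},
    "high_framerate": {"frames": 32, "resolution": 112},
}


def describe_pathways(total_frames=64, patch_size=16):
    """Return the sampled frame indices and token budget of each pathway."""
    summary = {}
    for name, cfg in PATHWAYS.items():
        indices = sample_frame_indices(total_frames, cfg["frames"])
        per_frame = tokens_per_frame(cfg["resolution"], patch_size)
        summary[name] = {
            "indices": indices,
            "tokens_per_frame": per_frame,
            "total_tokens": per_frame * len(indices),
        }
    return summary
```

With these assumed settings, the high-resolution pathway sees few frames with many patches each, and the high-framerate pathway sees many frames with few patches each, so both spend a comparable token budget on different kinds of information.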
With the advent of Industry 4.0, the capability to automatically detect and measure the liquid levels and volumes in transparent containers (e.g., test tubes in a laboratory) in real time is a crucial part in t...
Convolution and self-attention are popular paradigms and many works take them as two separate components to explore their potential combination. In this work, we consider their intrinsic properties in spatial and chan...
ISBN: 9798350349405; 9798350349399 (print)
WiFi Channel State Information (CSI)-based Human Activity Recognition (HAR) enables contactless, long-range sensing in spatially constrained environments while preserving visual privacy. However, despite the ubiquity of WiFi-enabled devices, few expose CSI, limiting sensing hardware options. Variants of the Espressif ESP32 have emerged as potential compact, low-cost, and easy-to-deploy solutions for WiFi CSI-based HAR. In this work, four ESP32-S3-based 2.4 GHz directional antenna systems are evaluated for their ability to facilitate long-range through-wall HAR. Two promising systems are identified: one combines the ESP32-S3 with a directional biquad antenna, and the second uses the built-in printed inverted-F antenna (PIFA), achieving directionality through a plane reflector. In a comprehensive evaluation of line-of-sight (LOS) and non-line-of-sight (NLOS) HAR performance, both systems are deployed in an office environment spanning a distance of 18 meters across five rooms. In this experimental setup, the Wallhack1.8k dataset, comprising 1,806 CSI amplitude spectrograms of human activities, is collected and made publicly available. Based on Wallhack1.8k, activity recognition models using the EfficientNetV2 architecture are trained to assess system performance in LOS and NLOS scenarios. For the core NLOS activity recognition problem, the biquad antenna and PIFA-based systems achieve accuracies of 92.0 ± 3.5 and 86.8 ± 4.7, respectively, demonstrating the feasibility of long-range through-wall HAR.
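As a rough illustration of what a CSI amplitude spectrogram is, the sketch below turns complex CSI samples into a time-by-subcarrier amplitude matrix and rescales it to [0, 1]. The input layout (one list of `(real, imag)` pairs per received packet) and the min-max scaling are assumptions made for this example; the abstract does not describe the preprocessing used for Wallhack1.8k.

```python
import math


def csi_amplitude_matrix(packets):
    """Convert per-packet complex CSI into amplitudes.

    packets: list of packets, each a list of (real, imag) pairs,
             one pair per subcarrier (assumed layout).
    Returns a matrix with one row per packet (time) and one
    column per subcarrier (frequency).
    """
    return [[math.hypot(re, im) for re, im in packet] for packet in packets]


def min_max_normalize(matrix):
    """Rescale all amplitudes to the range [0, 1]."""
    values = [v for row in matrix for v in row]
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: return zeros rather than dividing by zero.
        return [[0.0 for _ in row] for row in matrix]
    return [[(v - lo) / (hi - lo) for v in row] for row in matrix]
```

Rendering the normalized matrix as an image gives a picture that an image classifier can consume, with time on one axis and subcarrier index on the other.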
Author:
Martinez, Geovanni
Escuela de Ingeniería Eléctrica, Universidad de Costa Rica, San José 11501-2060, Costa Rica
This paper presents an algorithm capable of determining the three-dimensional position and orientation (3D pose) of an exploration robot from the processing of two multidimensional signals, a monocular near-infrared (...
ISBN: 9789819906161 (print)
The proceedings contain 47 papers. The special focus in this conference is on Cognitive Systems and Information Processing. The topics include: Mastering "Gongzhu" with Self-play Deep Reinforcement Learning; Improved Vanishing Gradient Problem for Deep Multi-layer Neural Networks; Incremental Quaternion Random Neural Networks; Question Answering on Agricultural Knowledge Graph Based on Multi-label Text Classification; Dairy Cow Individual Identification System Based on Deep Learning; Automatic Packaging System Based on Machine Vision; Meteorological and Hydrological Monitoring Technology Based on Wireless Sensor Network Model and Its Application; A Review of Deep Reinforcement Learning Exploration Methods: Prospects and Challenges for Application to Robot Attitude Control Tasks; AerobotSim: A High-Photo-Fidelity Simulator for Heterogeneous Aerial Systems Under Physical Interaction; T3SFNet: A Tuned Topological Temporal-Spatial Fusion Network for Motor Imagery with Rehabilitation Exoskeleton; Trailer Tag Hitch: An Automatic Reverse Hanging System Using Fiducial Markers; Anatomical and Vision-Guided Path Generation Method for Nasopharyngeal Swabs Sampling; A Hierarchical Model for Dynamic Simulation of the Fault in Satellite Operations; Design and Implementation of Autonomous Navigation System Based on Tracked Mobile Robot; Towards Flying Carpet: Dynamics Modeling, and Differential-Flatness-Based Control and Planning; Region Clustering for Mobile Robot Autonomous Exploration in Unknown Environment; Human Intention Understanding and Trajectory Planning Based on Multi-modal Data; Robotic Arm Movement Primitives Assembly Planning Method Based on BT and DMP; Center-of-Mass-Based Regrasping of Unknown Objects Using Reinforcement Learning and Tactile Sensing; Alongshore Circumnavigating Control of a Manta Robot Based on Fuzzy Control and an Obstacle Avoidance Strategy; Joint Trajectory Generation of Obstacle Avoidance in Tight Space for Robot Manipulator; Robot Calligraphy Based on Footprint Mo
Applications in fields such as medicine, robotics, and automobiles must have embedded vision with optimized design metrics in today's smart environment. A tiny computer and camera can be used to create a tiny vision...
ISBN: 9798350349405; 9798350349399 (print)
Malaria is a major health issue worldwide, and its diagnosis requires scalable solutions that can work effectively with low-cost microscopes (LCM). Deep learning-based methods have shown success in computer-aided diagnosis from microscopic images. However, these methods need annotated images that show cells affected by malaria parasites and their life stages. Annotating images from LCM significantly increases the burden on medical experts compared to annotating images from high-cost microscopes (HCM). For this reason, a practical solution would train on HCM images and generalize well to LCM images during testing. While earlier methods adopted a multi-stage learning process, they did not offer an end-to-end approach. In this work, we present an end-to-end learning framework named CodaMal (COntrastive Domain Adaptation for MALaria). To bridge the gap between HCM (training) and LCM (testing), we propose a domain adaptive contrastive loss. It reduces the domain shift by promoting similarity between the representations of an HCM image and its corresponding LCM image, without imposing an additional annotation burden. In addition, the training objective includes object detection objectives with carefully designed augmentations, ensuring the accurate detection of malaria parasites. On the publicly available large-scale M5-dataset, our proposed method shows a significant improvement of 16% over the state-of-the-art methods in terms of the mean average precision (mAP) metric, provides a 21× speed improvement during inference, and requires only half of the learnable parameters used in prior methods. Our code is publicly available: https://***/codamal-webpage/.
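The idea of pulling together the representations of an HCM image and its paired LCM image can be sketched with a generic InfoNCE-style contrastive loss. This is not the CodaMal loss itself, whose exact form the abstract does not give; the function names, the temperature value, and the use of other images in the batch as negatives are all assumptions made for illustration. Embeddings are assumed to be non-zero vectors.

```python
import math


def l2_normalize(vector):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vector))
    return [x / norm for x in vector]


def paired_contrastive_loss(hcm_embeddings, lcm_embeddings, temperature=0.1):
    """InfoNCE-style loss over paired HCM/LCM embeddings.

    Row i of hcm_embeddings and row i of lcm_embeddings are assumed to
    come from the same field of view; all other rows act as negatives.
    """
    hcm = [l2_normalize(v) for v in hcm_embeddings]
    lcm = [l2_normalize(v) for v in lcm_embeddings]
    total = 0.0
    for i, anchor in enumerate(hcm):
        # Cosine similarity of the anchor to every LCM embedding.
        logits = [
            sum(a * b for a, b in zip(anchor, other)) / temperature
            for other in lcm
        ]
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(value - max_logit) for value in logits)
        )
        # Cross-entropy with the matching LCM image as the target.
        total += log_denominator - logits[i]
    return total / len(hcm)
```

The loss is small when each HCM embedding is closest to its own LCM counterpart and grows when pairs are mismatched, which is the behavior a domain-adaptive term needs; no extra annotations are involved because only the pairing is used.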
Vision Transformers (ViTs) have gained significant attention for their exceptional model accuracies on computer vision applications, but their demanding memory requirements and computational complexity have hindered a...