ISBN: (print) 1577358872
With the increasing importance of video data in real-world applications, there is a rising need for efficient object detection methods that utilize temporal information. While existing video object detection (VOD) techniques employ various strategies to address this challenge, they typically depend on locally adjacent frames or randomly sampled images within a clip. Although recent Transformer-based VOD methods have shown promising results, their reliance on multiple inputs and additional network complexity to incorporate temporal information limits their practical applicability. In this paper, we propose a novel approach to single-image object detection, called Context Enhanced TRansformer (CETR), which incorporates temporal context into DETR using a newly designed memory module. To store temporal information efficiently, we construct a class-wise memory that collects contextual information across the data. Additionally, we present a classification-based sampling technique to selectively utilize the memory relevant to the current image. For inference, we introduce a test-time memory adaptation method that updates individual memory functions by considering the test distribution. Experiments on the CityCam and ImageNet VID datasets demonstrate the efficiency of the framework on various video systems. The project page and code will be made available at: https://***/CETR.
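The class-wise memory and classification-based sampling described above can be illustrated with a minimal sketch. This is not the CETR implementation: the class name `ClassWiseMemory`, the fixed-capacity FIFO eviction per class, and selecting memory by ranking image-level class scores and keeping the top-k classes are all assumptions made for illustration.

```python
from collections import deque
from typing import Dict, List, Sequence


class ClassWiseMemory:
    """Hypothetical per-class memory bank (illustrative, not CETR's code).

    Each class keeps a fixed-size FIFO queue of feature vectors; once a
    queue is full, appending a new entry drops the oldest one.
    """

    def __init__(self, num_classes: int, capacity: int) -> None:
        self.banks: Dict[int, deque] = {
            c: deque(maxlen=capacity) for c in range(num_classes)
        }

    def update(self, class_id: int, feature: Sequence[float]) -> None:
        # Store a copy of the feature under its (ground-truth or predicted) class.
        self.banks[class_id].append(list(feature))

    def sample(self, class_scores: Sequence[float], top_k: int) -> List[List[float]]:
        # Assumed form of classification-based sampling: rank classes by
        # image-level scores and return only the memory of the top-k classes.
        ranked = sorted(range(len(class_scores)),
                        key=lambda c: class_scores[c], reverse=True)
        selected: List[List[float]] = []
        for c in ranked[:top_k]:
            selected.extend(self.banks[c])
        return selected


if __name__ == "__main__":
    memory = ClassWiseMemory(num_classes=3, capacity=2)
    for feat in ([0.1, 0.2], [0.3, 0.4], [0.5, 0.6]):
        memory.update(0, feat)  # the third update evicts [0.1, 0.2]
    memory.update(2, [0.9, 0.9])
    print(memory.sample([0.7, 0.1, 0.2], top_k=1))  # [[0.3, 0.4], [0.5, 0.6]]
```

Restricting the returned context to the highest-scoring classes keeps the amount of memory attended to per image small, which is the motivation the abstract gives for selective use of the memory.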
Implementing image dehazing and defogging on a Field Programmable Gate Array (FPGA) offers efficiency. Dehazing an image becomes particularly challenging in the presence of fog or haze. However, employing a dark chann...
This work presents a novel approach to real-time video frame interpolation using Generative Adversarial Networks (GANs) to enhance streaming services. We developed a custom GAN architecture comprising a generator, whi...
In today's fast-paced world, hardly anyone really cares about skin care; it is one of the things that gets left behind. Skin texture analysis is important for several purposes, such as dermatology, skin t...
With the rapid development of low-end IoT devices and artificial intelligence (AI), the volume of heterogeneous and dynamic data has been increasing drastically, especially in the era of AI-enabled maritime cyber-physical syst...
We propose a pixel-level vibration imaging method for high frame rate (HFR)-video-based localization of flying objects with large movement. When the ratio of the translation speed of a target to its vibration frequenc...
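The idea of pixel-level vibration imaging can be illustrated by measuring, for every pixel, how strongly its intensity oscillates at a chosen frequency across a window of high-frame-rate frames. The sketch below assumes a plain single-bin discrete Fourier transform per pixel; the function name `vibration_map`, its parameters, and the DFT formulation are illustrative assumptions, not the method proposed in the paper.

```python
import cmath
import math
from typing import List

Frame = List[List[float]]


def vibration_map(frames: List[Frame], fps: float, freq_hz: float) -> List[List[float]]:
    """Per-pixel amplitude of the intensity signal at freq_hz (illustrative sketch).

    frames: T grayscale frames of equal size H x W, captured at `fps`.
    Returns an H x W map; large values mark pixels oscillating at freq_hz.
    """
    t_count = len(frames)
    height, width = len(frames[0]), len(frames[0][0])
    # Complex exponential for a single DFT bin at freq_hz, one value per frame.
    basis = [cmath.exp(-2j * math.pi * freq_hz * t / fps) for t in range(t_count)]
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            signal = [frames[t][y][x] for t in range(t_count)]
            mean = sum(signal) / t_count  # remove the static (DC) component
            acc = sum((s - mean) * b for s, b in zip(signal, basis))
            out[y][x] = 2.0 * abs(acc) / t_count  # amplitude estimate of the sinusoid
    return out
```

The estimate is exact when the window spans an integer number of cycles of `freq_hz`; otherwise spectral leakage spreads energy into neighbouring frequencies, and `freq_hz` must stay below half of `fps` to avoid aliasing.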
ISBN: (print) 9798350304572
Despite the remarkable success of deep learning in image and video recognition, constructing real-time recognition systems for computationally intensive tasks such as spatiotemporal human action localization is still challenging. As the computational complexity of these tasks can easily exceed the capacity of edge devices, inference must be performed in remote (cloud) environments. In that case, however, recognition accuracy becomes subject to fluctuating networking conditions in best-effort networks, because low-bitrate video streaming introduces compression artefacts. To improve overall recognition accuracy under various networking conditions, we propose SwitchingNet, an edge-assisted inference model switching method. In SwitchingNet, we train multiple recognition models, each specialized for a different level of image quality, and a neural switching model that dynamically chooses among these specialized models during system operation. Switching decisions are made at the edge based on an image quality vector computed from compressed and uncompressed frames. In our experiments, we show that our approach can, on average, sustain higher recognition accuracy than plain recognition systems under heavily fluctuating networking conditions. Moreover, our switching-based recognition approach is far less computationally intensive than competing ensemble methods and significantly reduces cloud computing costs.
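The edge-side switching step can be sketched as follows, assuming a quality vector made of PSNR and mean absolute error between the uncompressed and compressed frames, and a linear scorer standing in for the learned neural switching model. The function names, the choice of features, and the weights are hypothetical; the paper's actual quality vector and switching network may differ.

```python
import math
from typing import Dict, List, Sequence


def quality_vector(original: Sequence[float], compressed: Sequence[float]) -> List[float]:
    """Hypothetical image-quality features computed at the edge.

    Uses PSNR (8-bit peak of 255) and mean absolute error between the
    uncompressed and compressed frames, both flattened to pixel lists.
    """
    n = len(original)
    mse = sum((a - b) ** 2 for a, b in zip(original, compressed)) / n
    mae = sum(abs(a - b) for a, b in zip(original, compressed)) / n
    psnr = float("inf") if mse == 0 else 10.0 * math.log10(255.0 ** 2 / mse)
    return [min(psnr, 100.0), mae]  # cap PSNR so identical frames stay finite


def choose_model(q: Sequence[float],
                 weights: Dict[str, Sequence[float]],
                 biases: Dict[str, float]) -> str:
    """Stand-in for the learned switching model: one linear score per
    specialized recognizer; the highest-scoring recognizer is selected."""
    scores = {name: sum(w * x for w, x in zip(ws, q)) + biases[name]
              for name, ws in weights.items()}
    return max(scores, key=scores.get)
```

A linear scorer keeps the sketch small; in the setting the abstract describes, a trained neural network would take its place while consuming the same kind of quality vector.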
This paper presents a Genesys-2 FPGA implementation of three video watermarking techniques in both spatial and frequency domains followed by a comparative analysis. The video acquisition is realized using OV7670 camer...
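As a point of reference for the spatial-domain case, least-significant-bit (LSB) substitution is one common spatial-domain watermarking technique. The sketch below is a generic software illustration on a flattened list of 8-bit pixels; the excerpt does not say whether LSB is among the three techniques implemented in this paper, and the function names are hypothetical.

```python
from typing import List


def embed_lsb(pixels: List[int], bits: List[int]) -> List[int]:
    """Write one watermark bit into the least significant bit of each 8-bit pixel."""
    if len(bits) > len(pixels):
        raise ValueError("watermark is longer than the carrier frame")
    out = list(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0xFE) | (bit & 1)  # clear the LSB, then set it to the bit
    return out


def extract_lsb(pixels: List[int], n_bits: int) -> List[int]:
    """Read back the first n_bits watermark bits from the pixel LSBs."""
    return [p & 1 for p in pixels[:n_bits]]
```

Because only the lowest bit changes, each pixel moves by at most one intensity level, which makes the mark invisible but also fragile under lossy compression; that fragility is one reason frequency-domain schemes are commonly compared against spatial ones.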
ISBN: (print) 9798350302615
This paper addresses the potential impact of video-mediated and video-recorded communication on the sign language production of Flemish signers. We present preliminary results of a comparison of face-to-face communication in Flemish Sign Language with real-time online communication and with video-recorded messages intended for later viewing. We pay particular attention to the use of the signing space, simultaneity and referent tracking mechanisms. A better understanding of this effect is needed for the design and development of sign language recognition, which is fundamental for machine translation of these languages and the development of avatar technology.
The core of sports video analysis technology is the analysis of semantic events and their relationships, which is a complex and challenging problem. Its essence lies in the huge gap between the low-level features suit...