Herein, a novel methodology, termed MVABLSTM, is proposed for real-time human activity detection and recognition in the compressed domain of videos using motion vectors and an attention-guided bidirectional LSTM. Videos in the MPEG-4 and H.264 compression formats are considered in the present study. By adapting the proposed method to various video codecs and camera settings, any video source can be handled without prior setup. Existing algorithms for human action recognition in compressed-domain video have limitations in this regard: (i) they require keyframes at a fixed interval, (ii) they use P-frames only, and (iii) they normally support a single codec. The proposed method overcomes these limitations by accommodating arbitrary keyframe intervals, using both P- and B-frames, and supporting both the MPEG-4 and H.264 codecs. Experiments are carried out on the benchmark datasets UCF101, HMDB51, and THUMOS14; recognition accuracy in the compressed domain is found to be comparable to that observed on raw video data, but with reduced computational time. The proposed MVABLSTM method outperforms other recent methods in the literature with 65% fewer parameters and 92% fewer GFLOPS, while improving accuracy by 0.8%, 5.95%, and 16.65% on UCF101, HMDB51, and THUMOS14, respectively, and speed by 8% in the MPEG-4 domain. The performance of the proposed method is analysed using MVABLSTM variants in different codecs in comparison with state-of-the-art network models.
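As a rough illustration of the recognition backbone described above, the following is a minimal sketch of an attention-guided bidirectional LSTM over per-frame motion-vector features, assuming PyTorch; the layer sizes, names, and upstream feature extraction are illustrative assumptions, not the paper's actual MVABLSTM configuration.

    # Minimal sketch: attention-guided BiLSTM over motion-vector features.
    import torch
    import torch.nn as nn

    class MVAttnBiLSTM(nn.Module):
        def __init__(self, feat_dim=256, hidden=128, n_classes=101):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)   # scalar score per time step
            self.head = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                      # x: (batch, time, feat_dim)
            h, _ = self.lstm(x)                    # (batch, time, 2*hidden)
            w = torch.softmax(self.attn(h), dim=1) # attention over time steps
            ctx = (w * h).sum(dim=1)               # weighted temporal pooling
            return self.head(ctx)

    # 2 clips of 30 frames with hypothetical 256-dim motion-vector features
    logits = MVAttnBiLSTM()(torch.randn(2, 30, 256))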
ISBN (Print): 9798350318920; 9798350318937
Eliminating time-consuming post-production processes and delivering high-quality videos in today's fast-paced digital landscape are the key advantages of real-time approaches. To address these needs, we present realtime GAZED: a real-time adaptation of the GAZED framework integrated with CineFilter, a novel real-time camera trajectory stabilization approach. It enables users to create professionally edited videos in real time. Comparative evaluations against baseline methods, including the non-real-time GAZED, demonstrate that realtime GAZED achieves similar editing results, ensuring high-quality video output. Furthermore, a user study confirms the aesthetic quality of the video edits produced by the realtime GAZED approach. With these advancements in real-time camera trajectory optimization and video editing, the demand for immediate and dynamic content creation in industries such as live broadcasting, sports coverage, news reporting, and social media content creation can be met more efficiently.
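As a toy illustration of what real-time (causal) camera-trajectory stabilization involves, the sketch below blends each incoming raw camera coordinate with the previous smoothed value, using only past observations so it can run online. The exponential-smoothing rule and gain are assumptions for illustration, not CineFilter's actual optimization.

    # Causal (online) smoothing of one camera coordinate over time.
    def smooth_trajectory(raw_xs, alpha=0.15):
        smoothed, prev = [], None
        for x in raw_xs:
            # blend the new raw sample with the previous smoothed value
            prev = x if prev is None else (1 - alpha) * prev + alpha * x
            smoothed.append(prev)
        return smoothed

    # a jittery pan becomes a gradual one; no future frames are needed
    print(smooth_trajectory([0.0, 10.0, 9.5, 30.0, 29.0]))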
Remote video monitoring over networks inevitably introduces a certain degree of communication latency. Although numerous studies have been conducted to reduce latency in network systems, achieving "zero latency" is fundamentally impossible for video monitoring. To address this issue, we investigate a practical method to compensate for latency in video monitoring using video prediction techniques. We apply the lightweight PredNet to predict future frames and evaluate their image quality through quantitative image-quality metrics and subjective assessment. The evaluation results suggest that, for simple movements of the robot arm, the prediction time to generate future frames can tolerate up to 333 ms. The video prediction method is integrated into a remote monitoring system, and its processing time is also evaluated. We define the object-to-display latency for video monitoring and explore the potential for realizing a zero-latency remote video monitoring system. The evaluation, involving simultaneous capture of the robot arm's movement and the display of the remote monitoring system, confirms the feasibility of compensating for an object-to-display latency of several hundred milliseconds by using video prediction. Experimental results demonstrate that our approach can function as a new compensation method for communication latency.
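The latency-compensation idea above can be made concrete with a small calculation: given a measured object-to-display latency and the camera frame interval, the predictor must generate enough future frames to cover that latency. The sketch below assumes 30 fps capture; PredNet itself is not reimplemented here.

    # How many future frames must be predicted to hide a given latency?
    def frames_to_predict(latency_ms, frame_interval_ms=33.3):
        # one predicted frame covers one frame interval of latency
        return max(0, round(latency_ms / frame_interval_ms))

    # ~300 ms of latency at 30 fps needs about 9 predicted frames,
    # within the ~333 ms tolerance reported for simple arm movements.
    print(frames_to_predict(300))  # -> 9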
ISBN (Print): 9781510673199; 9781510673182
The article proposes an approach to the development of computationally simple and fast algorithms for data preprocessing and the selection of stable features. The following algorithms are used:
1. A modified method of multicriteria processing in local windows. The method is based on minimizing an objective function, which both reduces the noise component in locally stationary areas and preserves and strengthens transition boundaries (a minimal sketch follows the list).
2. A method of reducing the scope of clusters, which changes the number of color histograms by absorbing nearby areas while preserving objects.
3. A method of non-local change in color balance, which allows selecting areas on a dark or light background when the color balance is shifted.
4. An edge detector based on the analysis of local areas in various data layers.
The effectiveness test was carried out on a set of test images obtained from a flip-chip machine, images from a microcircuit analyzer, and data from a product production line. The analyzed frames had low resolution and poor lighting; the images were captured in the RGB color space.
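A minimal sketch of item 1 follows, assuming one plausible form of the local-window objective: a quadratic trade-off between data fidelity and closeness to the local window mean, which has a closed-form per-pixel minimizer. The article's actual multicriteria objective may differ.

    # Local-window processing via a simple quadratic objective (grayscale).
    import numpy as np

    def local_window_filter(img, radius=2, lam=0.5):
        h, w = img.shape
        out = img.astype(float)
        for y in range(radius, h - radius):
            for x in range(radius, w - radius):
                win = img[y - radius:y + radius + 1,
                          x - radius:x + radius + 1]
                # closed-form minimizer of (u - p)^2 + lam * (u - mean(win))^2
                out[y, x] = (img[y, x] + lam * win.mean()) / (1 + lam)
        return out

    print(local_window_filter(np.random.rand(8, 8)).shape)

A small lam keeps the result close to the data (preserving boundaries), while a larger lam smooths locally stationary areas more strongly.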
ISBN (Print): 9781510673199; 9781510673182
The intersection of deep learning and programmable logic controllers (PLCs) can lead to innovative applications in automation. One of the exciting application areas is gesture-based control systems for Automated Guided Vehicles (AGVs). AGVs are used in various industries for material handling, logistics, warehouse automation, and more. Traditionally, these vehicles are controlled using predefined routes or remote controls, but with gesture-based control, operators can communicate more naturally and efficiently. The incorporation of YOLO-Pose in YOLO versions 7 and 8 has elevated the YOLO algorithm to a leading tool for creating gesture recognition models. The YOLO algorithm employs convolutional neural networks (CNNs) to detect objects in real time. These latest YOLO models offer significantly improved accuracy and speed, as well as reduced training times. This paper presents comparative results for 2D gesture recognition transfer-learning models created using the YOLO v5, v7, and v8 models, along with the steps taken to implement the model in a PLC-controlled AGV. Over 14,000 images were collected to build the models and annotated using a semi-automated approach. Five models were created with transfer-learning techniques under the same hyperparameters: two keypoint models and three object detection models.
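For readers who want to see what such a transfer-learning setup looks like, the sketch below uses the ultralytics package with a pretrained YOLOv8 pose model; the dataset YAML, image path, and hyperparameters are placeholders, not the paper's settings.

    # Transfer learning of a YOLOv8 keypoint (pose) model with ultralytics.
    from ultralytics import YOLO

    model = YOLO("yolov8n-pose.pt")            # pretrained pose weights
    model.train(data="gestures-pose.yaml",     # hypothetical gesture dataset
                epochs=50, imgsz=640)
    results = model("operator_frame.jpg")      # keypoint inference on a frame

The exported model would then be converted for the edge device that talks to the PLC, a step that varies by vendor and is omitted here.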
Bimodal objects, such as the checkerboard pattern used in camera calibration, markers for object tracking, and text on road signs, to name a few, are prevalent in our daily lives and serve as a visual form to embed information that can be easily recognized by vision systems. While binarization from intensity images is crucial for extracting the embedded information in bimodal objects, few previous works consider the task of binarizing images blurred by relative motion between the vision sensor and the environment. Blurry images degrade binarization quality and thus the downstream applications where the vision system is in motion. Recently, neuromorphic cameras have offered new capabilities for alleviating motion blur, but it is non-trivial to first deblur and then binarize the images in real time. In this work, we propose an event-based binary reconstruction method that leverages prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space, merging the results from both domains to generate a sharp binary image. We also develop an efficient integration method to propagate this binary image to high-frame-rate binary video. Finally, we develop a novel method to naturally fuse events and images for unsupervised threshold identification. The proposed method is evaluated on publicly available data and our collected data sequences, and the results show that it outperforms SOTA methods in generating high-frame-rate binary video in real time on CPU-only devices.
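A heavily simplified sketch of the propagation step follows: accumulate signed event polarities per pixel and flip a pixel's binary state once the accumulated contrast change crosses a threshold. The contrast constant and flip rule are illustrative assumptions, not the paper's inference scheme.

    # Toy propagation of a binary image to binary video using events.
    import numpy as np

    def propagate_binary(binary0, events, shape, c=0.3):
        state = binary0.astype(bool).copy()
        acc = np.zeros(shape)                   # accumulated log-intensity change
        frames = []
        for t, x, y, polarity in events:        # events sorted by timestamp t
            acc[y, x] += c if polarity > 0 else -c
            if abs(acc[y, x]) >= 2 * c:         # enough contrast change to flip
                state[y, x] = acc[y, x] > 0
                acc[y, x] = 0.0
            frames.append(state.copy())         # one snapshot per event (toy)
        return frames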
Purpose: A stereoscopic surgical video stream consists of left-right image pairs provided by a stereo endoscope. While the surgical display shows these image pairs synchronised, most capture cards cause de-synchronisation. This means that the paired left and right images may not correspond once used in downstream tasks such as stereo depth computation. The stereo synchronisation problem is to recover the corresponding left-right images. This is particularly challenging in the surgical setting, owing to the moist tissues, rapid camera motion, quasi-staticity, and the real-time processing requirement. Existing methods exploit image cues from the diffuse reflection component and are defeated by the above challenges.
Methods: We propose to exploit the specular reflection. Specifically, we propose a powerful left-right comparison score (LRCS) using the specular highlights commonly occurring on moist tissues. We detect the highlights using a neural network, characterise them with invariant descriptors, match them, and use the number of matches to form the proposed LRCS. We perform an evaluation against 147 existing LRCS in 44 challenging robotic partial nephrectomy and robotic-assisted hepatic resection video sequences with simulated and real de-synchronisation.
Results: The proposed LRCS outperforms the existing ones, with average and maximum offsets of 0.055 and 1 frames and 94.1 +/- 3.6% successfully synchronised frames. In contrast, the best existing LRCS achieves average and maximum offsets of 0.3 and 3 frames and 81.2 +/- 6.4% successfully synchronised frames.
Conclusion: The use of specular reflection brings a tremendous boost to the real-time surgical stereo synchronisation problem.
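To make the LRCS idea concrete, the sketch below scores each candidate temporal offset by the number of descriptor matches between the left and right streams and picks the best-scoring offset. The brute-force Hamming matcher is an assumption standing in for the paper's neural highlight detection and invariant descriptors.

    # Score left-right correspondence and recover the temporal offset.
    import cv2

    def lrcs(desc_left, desc_right):
        # LRCS = number of cross-checked descriptor matches
        if desc_left is None or desc_right is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        return len(matcher.match(desc_left, desc_right))

    def best_offset(left_descs, right_descs, max_offset=3):
        # left_descs / right_descs: per-frame binary descriptor arrays
        scores = {}
        for d in range(-max_offset, max_offset + 1):
            pairs = zip(left_descs[max(0, d):], right_descs[max(0, -d):])
            scores[d] = sum(lrcs(l, r) for l, r in pairs)
        return max(scores, key=scores.get)      # offset with most matches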
Real-world images captured in remote sensing, image or video retrieval, and outdoor surveillance are often degraded by poor weather conditions, such as rain and mist. These conditions introduce artifacts that make visual analysis challenging and limit the performance of high-level computer vision methods. In time-critical applications, it is vital to develop algorithms that automatically remove rain without compromising the quality of the image contents. This article proposes a novel approach called QSAM-Net, a quaternion multi-stage multiscale neural network with a self-attention module. The algorithm requires 3.98 times fewer parameters than its real-valued counterpart and state-of-the-art methods while improving the visual quality of the images. This efficiency makes the network suitable for edge devices and applications requiring near-real-time performance. Extensive evaluation and benchmarking on synthetic and real-world rainy images demonstrate the effectiveness of QSAM-Net. Furthermore, the experiments show that the improved visual quality of images also leads to better object detection accuracy and training speed.
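The roughly fourfold parameter saving comes from quaternion algebra: one quaternion weight (4 real numbers) mixes four input channels into four output channels via the Hamilton product, where a real-valued layer would need a 4x4 weight block (16 numbers). A minimal sketch of the product, as a worked illustration rather than QSAM-Net's layer code:

    # Hamilton product of two quaternions (r, i, j, k): the core operation
    # that lets 4 weights replace a 4x4 real weight block.
    import numpy as np

    def hamilton_product(w, x):
        r1, i1, j1, k1 = w
        r2, i2, j2, k2 = x
        return np.array([
            r1*r2 - i1*i2 - j1*j2 - k1*k2,
            r1*i2 + i1*r2 + j1*k2 - k1*j2,
            r1*j2 - i1*k2 + j1*r2 + k1*i2,
            r1*k2 + i1*j2 - j1*i2 + k1*r2,
        ])

    # multiplying by the identity quaternion (1, 0, 0, 0) returns the weight
    print(hamilton_product(np.ones(4), np.array([1., 0., 0., 0.])))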
Neural volumetric representations such as Neural Radiance Fields (NeRF) have emerged as a compelling technique for learning to represent 3D scenes from images, with the goal of rendering photorealistic images of the scene from unobserved viewpoints. However, NeRF's computational requirements are prohibitive for real-time applications: rendering views from a trained NeRF requires querying a multilayer perceptron (MLP) hundreds of times per ray. We present a method to train a NeRF, then precompute and store (i.e., "bake") it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables real-time rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF's architecture and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF's ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real time (at more than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.
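A toy sketch of the baked-grid rendering idea follows: march a ray through a voxel grid of precomputed densities and colors and alpha-composite the samples, with no MLP queries along the ray. For simplicity the grid is dense, the lookup is nearest-neighbor, and the view-dependent MLP is omitted, so this illustrates the principle rather than SNeRG itself.

    # Alpha-composite precomputed grid values along one ray.
    import numpy as np

    def render_ray(grid_density, grid_rgb, origin, direction,
                   n_steps=64, dt=0.02):
        # grid_density: (N, N, N); grid_rgb: (N, N, N, 3); unit cube scene
        n = grid_density.shape[0]
        color, transmittance = np.zeros(3), 1.0
        for s in range(n_steps):
            p = origin + s * dt * direction
            idx = tuple(np.clip((p * n).astype(int), 0, n - 1))
            alpha = 1.0 - np.exp(-grid_density[idx] * dt)  # opacity of sample
            color += transmittance * alpha * grid_rgb[idx]
            transmittance *= 1.0 - alpha                   # light remaining
        return color

Because every per-sample quantity is a table lookup instead of an MLP evaluation, the per-ray cost drops from hundreds of network queries to simple memory reads, which is what makes real-time rates possible.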
The rise of IoT devices has led to a surge in data generation, necessitating efficient processing solutions. Traditional cloud-centric approaches face challenges such as latency, bandwidth, and privacy issues. Edge computing, a promising paradigm, enables data processing closer to the source, enhancing IoT-driven computer vision applications. This study integrates edge computing frameworks with a novel drosophila food search-tuned convolutional neural network (DFS-CNN) and computer vision algorithms for real-time tasks such as object detection and anomaly detection. Data were collected from the Labeled Faces in the Wild (LFW) video and image dataset and preprocessed using a bilateral filter to minimize noise while maintaining sharp edges; features were then extracted from the filtered data using a histogram of oriented gradients (HOG), and the DFS-CNN model used the HOG feature set for detection. The optimal DFS-CNN model was deployed on edge computing within an IoT-based architecture to compute and transport data, with real-time performance simulated using TensorFlow Lite. The proposed method is compared with other traditional algorithms. The proposed DFS-CNN model detects objects in the LFW images with 96% accuracy. The model was also used to address students' inactivity status during online exams, and the outcome of the suggested method was tested for data latency and real-time response in a comparative performance analysis.
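The preprocessing and feature-extraction pipeline described above can be sketched with OpenCV: bilateral filtering for edge-preserving denoising, then a HOG feature vector. The parameter values are illustrative defaults rather than the paper's tuned settings, and the DFS-CNN classifier itself is omitted.

    # Bilateral denoising followed by HOG feature extraction (OpenCV).
    import cv2
    import numpy as np

    img = (np.random.rand(128, 64, 3) * 255).astype(np.uint8)  # stand-in frame
    den = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    gray = cv2.cvtColor(den, cv2.COLOR_BGR2GRAY)
    hog = cv2.HOGDescriptor()              # default 64x128 detection window
    features = hog.compute(gray)           # feature vector fed to a classifier
    print(features.size)                   # 3780 with the default parameters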