Herein, a novel methodology, termed MVABLSTM, is proposed for real-time human activity detection and recognition in the compressed domain of videos using motion vectors and an attention-guided bidirectional LSTM. Videos in the MPEG-4 and H.264 compression formats are considered in the present study. By adapting the proposed method to various video codecs and camera settings, any video source can be handled without prior setup. Existing algorithms for human action recognition in compressed-domain video have limitations in this regard: (i) they require keyframes at a fixed interval, (ii) they use P-frames only, and (iii) they normally support a single codec. The proposed method overcomes these limitations by accommodating arbitrary keyframe intervals, using both P- and B-frames, and supporting both the MPEG-4 and H.264 codecs. Experiments are carried out on the benchmark datasets UCF101, HMDB51, and THUMOS14; recognition accuracy in the compressed domain is found to be comparable to that observed on raw video data, but with reduced computational time. The proposed MVABLSTM method outperforms other recent methods in the literature with 65% fewer parameters and 92% fewer GFLOPS, while improving accuracy by 0.8%, 5.95%, and 16.65% on UCF101, HMDB51, and THUMOS14, respectively, and speed by 8% in the MPEG-4 domain. The performance of the proposed method is analysed using MVABLSTM variants in different codecs in comparison with state-of-the-art network models.
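As a rough illustration of the recognition backbone described above, the following is a minimal sketch of an attention-guided bidirectional LSTM over per-frame motion-vector features, assuming PyTorch; the layer sizes, names, and upstream feature extraction are illustrative assumptions, not the paper's actual MVABLSTM configuration.

    # Minimal sketch: attention-guided BiLSTM over motion-vector features.
    import torch
    import torch.nn as nn

    class MVAttnBiLSTM(nn.Module):
        def __init__(self, feat_dim=256, hidden=128, n_classes=101):
            super().__init__()
            self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True,
                                bidirectional=True)
            self.attn = nn.Linear(2 * hidden, 1)   # scalar score per time step
            self.head = nn.Linear(2 * hidden, n_classes)

        def forward(self, x):                      # x: (batch, time, feat_dim)
            h, _ = self.lstm(x)                    # (batch, time, 2*hidden)
            w = torch.softmax(self.attn(h), dim=1) # attention over time steps
            ctx = (w * h).sum(dim=1)               # weighted temporal pooling
            return self.head(ctx)

    # 2 clips of 30 frames with hypothetical 256-dim motion-vector features
    logits = MVAttnBiLSTM()(torch.randn(2, 30, 256))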
ISBN (Print): 9798350318920; 9798350318937
Eliminating time-consuming post-production processes and delivering high-quality videos in today's fast-paced digital landscape are the key advantages of real-time approaches. To address these needs, we present realtime GAZED: a real-time adaptation of the GAZED framework integrated with CineFilter, a novel real-time camera trajectory stabilization approach. It enables users to create professionally edited videos in real time. Comparative evaluations against baseline methods, including the non-real-time GAZED, demonstrate that realtime GAZED achieves similar editing results, ensuring high-quality video output. Furthermore, a user study confirms the aesthetic quality of the video edits produced by the realtime GAZED approach. With these advancements in real-time camera trajectory optimization and video editing, the demand for immediate and dynamic content creation in industries such as live broadcasting, sports coverage, news reporting, and social media content creation can be met more efficiently.
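As a toy illustration of what real-time (causal) camera-trajectory stabilization involves, the sketch below blends each incoming raw camera coordinate with the previous smoothed value, using only past observations so it can run online. The exponential-smoothing rule and gain are assumptions for illustration, not CineFilter's actual optimization.

    # Causal (online) smoothing of one camera coordinate over time.
    def smooth_trajectory(raw_xs, alpha=0.15):
        smoothed, prev = [], None
        for x in raw_xs:
            # blend the new raw sample with the previous smoothed value
            prev = x if prev is None else (1 - alpha) * prev + alpha * x
            smoothed.append(prev)
        return smoothed

    # a jittery pan becomes a gradual one; no future frames are needed
    print(smooth_trajectory([0.0, 10.0, 9.5, 30.0, 29.0]))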
Remote video monitoring over networks inevitably introduces a certain degree of communication latency. Although numerous studies have been conducted to reduce latency in network systems, achieving "zero latency" is fundamentally impossible for video monitoring. To address this issue, we investigate a practical method to compensate for latency in video monitoring using video prediction techniques. We apply the lightweight PredNet to predict future frames and evaluate their image quality through quantitative image-quality metrics and subjective assessment. The evaluation results suggest that, for simple movements of the robot arm, the prediction time to generate future frames can tolerate up to 333 ms. The video prediction method is integrated into a remote monitoring system, and its processing time is also evaluated. We define the object-to-display latency for video monitoring and explore the potential for realizing a zero-latency remote video monitoring system. The evaluation, involving simultaneous capture of the robot arm's movement and the display of the remote monitoring system, confirms the feasibility of compensating for an object-to-display latency of several hundred milliseconds by using video prediction. Experimental results demonstrate that our approach can function as a new compensation method for communication latency.
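The latency-compensation idea above can be made concrete with a small calculation: given a measured object-to-display latency and the camera frame interval, the predictor must generate enough future frames to cover that latency. The sketch below assumes 30 fps capture; PredNet itself is not reimplemented here.

    # How many future frames must be predicted to hide a given latency?
    def frames_to_predict(latency_ms, frame_interval_ms=33.3):
        # one predicted frame covers one frame interval of latency
        return max(0, round(latency_ms / frame_interval_ms))

    # ~300 ms of latency at 30 fps needs about 9 predicted frames,
    # within the ~333 ms tolerance reported for simple arm movements.
    print(frames_to_predict(300))  # -> 9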
ISBN (Print): 9781510673199; 9781510673182
The article proposes an approach to the development of computationally simple and fast algorithms for data preprocessing and the selection of stable features. The following algorithms are used:
1. A modified method of multicriteria processing in local windows. The method is based on minimizing an objective function, which both reduces the noise component in locally stationary areas and preserves and strengthens transition boundaries (a minimal sketch follows the list).
2. A method of reducing the scope of clusters, which changes the number of color histograms by absorbing nearby areas while preserving objects.
3. A method of non-local change in color balance, which allows selecting areas on a dark or light background when the color balance is shifted.
4. An edge detector based on the analysis of local areas in various data layers.
The effectiveness test was carried out on a set of test images obtained from a flip-chip machine, images from a microcircuit analyzer, and data from a product production line. The analyzed frames had low resolution and poor lighting; the images were captured in the RGB color space.
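A minimal sketch of item 1 follows, assuming one plausible form of the local-window objective: a quadratic trade-off between data fidelity and closeness to the local window mean, which has a closed-form per-pixel minimizer. The article's actual multicriteria objective may differ.

    # Local-window processing via a simple quadratic objective (grayscale).
    import numpy as np

    def local_window_filter(img, radius=2, lam=0.5):
        h, w = img.shape
        out = img.astype(float)
        for y in range(radius, h - radius):
            for x in range(radius, w - radius):
                win = img[y - radius:y + radius + 1,
                          x - radius:x + radius + 1]
                # closed-form minimizer of (u - p)^2 + lam * (u - mean(win))^2
                out[y, x] = (img[y, x] + lam * win.mean()) / (1 + lam)
        return out

    print(local_window_filter(np.random.rand(8, 8)).shape)

A small lam keeps the result close to the data (preserving boundaries), while a larger lam smooths locally stationary areas more strongly.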
ISBN (Print): 9781510673199; 9781510673182
The intersection of deep learning and programmable logic controllers (PLCs) can lead to innovative applications in automation. One of the exciting application areas is gesture-based control systems for Automated Guided Vehicles (AGVs). AGVs are used in various industries for material handling, logistics, warehouse automation, and more. Traditionally, these vehicles are controlled using predefined routes or remote controls, but with gesture-based control, operators can communicate more naturally and efficiently. The incorporation of YOLO-Pose in YOLO versions 7 and 8 has elevated the YOLO algorithm to a leading tool for creating gesture recognition models. The YOLO algorithm employs convolutional neural networks (CNNs) to detect objects in real time. These latest YOLO models offer significantly improved accuracy and speed, as well as reduced training times. This paper presents comparative results for 2D gesture recognition transfer-learning models created using the YOLO v5, v7, and v8 models, along with the steps taken to implement the model in a PLC-controlled AGV. Over 14,000 images were collected to build the models and annotated using a semi-automated approach. Five models were created with transfer-learning techniques under the same hyperparameters: two keypoint models and three object detection models.
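For readers who want to see what such a transfer-learning setup looks like, the sketch below uses the ultralytics package with a pretrained YOLOv8 pose model; the dataset YAML, image path, and hyperparameters are placeholders, not the paper's settings.

    # Transfer learning of a YOLOv8 keypoint (pose) model with ultralytics.
    from ultralytics import YOLO

    model = YOLO("yolov8n-pose.pt")            # pretrained pose weights
    model.train(data="gestures-pose.yaml",     # hypothetical gesture dataset
                epochs=50, imgsz=640)
    results = model("operator_frame.jpg")      # keypoint inference on a frame

The exported model would then be converted for the edge device that talks to the PLC, a step that varies by vendor and is omitted here.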
Bimodal objects, such as the checkerboard pattern used in camera calibration, markers for object tracking, and text on road signs, to name a few, are prevalent in our daily lives and serve as a visual form to embed information that can be easily recognized by vision systems. While binarization from intensity images is crucial for extracting the embedded information in bimodal objects, few previous works consider the task of binarizing images blurred by relative motion between the vision sensor and the environment. Blurry images degrade binarization quality and thus the downstream applications where the vision system is in motion. Recently, neuromorphic cameras have offered new capabilities for alleviating motion blur, but it is non-trivial to first deblur and then binarize the images in real time. In this work, we propose an event-based binary reconstruction method that leverages prior knowledge of the bimodal target's properties to perform inference independently in both event space and image space, merging the results from both domains to generate a sharp binary image. We also develop an efficient integration method to propagate this binary image to high-frame-rate binary video. Finally, we develop a novel method to naturally fuse events and images for unsupervised threshold identification. The proposed method is evaluated on publicly available data and our collected data sequences, and the results show that it outperforms SOTA methods in generating high-frame-rate binary video in real time on CPU-only devices.
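A heavily simplified sketch of the propagation step follows: accumulate signed event polarities per pixel and flip a pixel's binary state once the accumulated contrast change crosses a threshold. The contrast constant and flip rule are illustrative assumptions, not the paper's inference scheme.

    # Toy propagation of a binary image to binary video using events.
    import numpy as np

    def propagate_binary(binary0, events, shape, c=0.3):
        state = binary0.astype(bool).copy()
        acc = np.zeros(shape)                   # accumulated log-intensity change
        frames = []
        for t, x, y, polarity in events:        # events sorted by timestamp t
            acc[y, x] += c if polarity > 0 else -c
            if abs(acc[y, x]) >= 2 * c:         # enough contrast change to flip
                state[y, x] = acc[y, x] > 0
                acc[y, x] = 0.0
            frames.append(state.copy())         # one snapshot per event (toy)
        return frames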
Purpose: A stereoscopic surgical video stream consists of left-right image pairs provided by a stereo endoscope. While the surgical display shows these image pairs synchronised, most capture cards cause de-synchronisation. This means that the paired left and right images may not correspond once used in downstream tasks such as stereo depth computation. The stereo synchronisation problem is to recover the corresponding left-right images. This is particularly challenging in the surgical setting, owing to the moist tissues, rapid camera motion, quasi-staticity, and the real-time processing requirement. Existing methods exploit image cues from the diffuse reflection component and are defeated by the above challenges.
Methods: We propose to exploit the specular reflection. Specifically, we propose a powerful left-right comparison score (LRCS) using the specular highlights commonly occurring on moist tissues. We detect the highlights using a neural network, characterise them with invariant descriptors, match them, and use the number of matches to form the proposed LRCS. We perform an evaluation against 147 existing LRCS in 44 challenging robotic partial nephrectomy and robotic-assisted hepatic resection video sequences with simulated and real de-synchronisation.
Results: The proposed LRCS outperforms the existing ones, with average and maximum offsets of 0.055 and 1 frames and 94.1 +/- 3.6% successfully synchronised frames. In contrast, the best existing LRCS achieves average and maximum offsets of 0.3 and 3 frames and 81.2 +/- 6.4% successfully synchronised frames.
Conclusion: The use of specular reflection brings a tremendous boost to the real-time surgical stereo synchronisation problem.
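To make the LRCS idea concrete, the sketch below scores each candidate temporal offset by the number of descriptor matches between the left and right streams and picks the best-scoring offset. The brute-force Hamming matcher is an assumption standing in for the paper's neural highlight detection and invariant descriptors.

    # Score left-right correspondence and recover the temporal offset.
    import cv2

    def lrcs(desc_left, desc_right):
        # LRCS = number of cross-checked descriptor matches
        if desc_left is None or desc_right is None:
            return 0
        matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
        return len(matcher.match(desc_left, desc_right))

    def best_offset(left_descs, right_descs, max_offset=3):
        # left_descs / right_descs: per-frame binary descriptor arrays
        scores = {}
        for d in range(-max_offset, max_offset + 1):
            pairs = zip(left_descs[max(0, d):], right_descs[max(0, -d):])
            scores[d] = sum(lrcs(l, r) for l, r in pairs)
        return max(scores, key=scores.get)      # offset with most matches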
Real-world images captured in remote sensing, image or video retrieval, and outdoor surveillance are often degraded by poor weather conditions, such as rain and mist. These conditions introduce artifacts that make visual analysis challenging and limit the performance of high-level computer vision methods. In time-critical applications, it is vital to develop algorithms that automatically remove rain without compromising the quality of the image contents. This article proposes a novel approach called QSAM-Net, a quaternion multi-stage multiscale neural network with a self-attention module. The algorithm requires 3.98 times fewer parameters than its real-valued counterpart and state-of-the-art methods while improving the visual quality of the images. This efficiency makes the network suitable for edge devices and applications requiring near-real-time performance. Extensive evaluation and benchmarking on synthetic and real-world rainy images demonstrate the effectiveness of QSAM-Net. Furthermore, the experiments show that the improved visual quality of images also leads to better object detection accuracy and training speed.
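The roughly fourfold parameter saving comes from quaternion algebra: one quaternion weight (4 real numbers) mixes four input channels into four output channels via the Hamilton product, where a real-valued layer would need a 4x4 weight block (16 numbers). A minimal sketch of the product, as a worked illustration rather than QSAM-Net's layer code:

    # Hamilton product of two quaternions (r, i, j, k): the core operation
    # that lets 4 weights replace a 4x4 real weight block.
    import numpy as np

    def hamilton_product(w, x):
        r1, i1, j1, k1 = w
        r2, i2, j2, k2 = x
        return np.array([
            r1*r2 - i1*i2 - j1*j2 - k1*k2,
            r1*i2 + i1*r2 + j1*k2 - k1*j2,
            r1*j2 - i1*k2 + j1*r2 + k1*i2,
            r1*k2 + i1*j2 - j1*i2 + k1*r2,
        ])

    # multiplying by the identity quaternion (1, 0, 0, 0) returns the weight
    print(hamilton_product(np.ones(4), np.array([1., 0., 0., 0.])))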
Neural volumetric representations such as Neural Radiance Fields (NeRF) have emerged as a compelling technique for learning to represent 3D scenes from images, with the goal of rendering photorealistic images of the scene from unobserved viewpoints. However, NeRF's computational requirements are prohibitive for real-time applications: rendering views from a trained NeRF requires querying a multilayer perceptron (MLP) hundreds of times per ray. We present a method to train a NeRF, then precompute and store (i.e., "bake") it as a novel representation called a Sparse Neural Radiance Grid (SNeRG) that enables real-time rendering on commodity hardware. To achieve this, we introduce 1) a reformulation of NeRF's architecture and 2) a sparse voxel grid representation with learned feature vectors. The resulting scene representation retains NeRF's ability to render fine geometric details and view-dependent appearance, is compact (averaging less than 90 MB per scene), and can be rendered in real time (at more than 30 frames per second on a laptop GPU). Actual screen captures are shown in our video.
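A toy sketch of the baked-grid rendering idea follows: march a ray through a voxel grid of precomputed densities and colors and alpha-composite the samples, with no MLP queries along the ray. For simplicity the grid is dense, the lookup is nearest-neighbor, and the view-dependent MLP is omitted, so this illustrates the principle rather than SNeRG itself.

    # Alpha-composite precomputed grid values along one ray.
    import numpy as np

    def render_ray(grid_density, grid_rgb, origin, direction,
                   n_steps=64, dt=0.02):
        # grid_density: (N, N, N); grid_rgb: (N, N, N, 3); unit cube scene
        n = grid_density.shape[0]
        color, transmittance = np.zeros(3), 1.0
        for s in range(n_steps):
            p = origin + s * dt * direction
            idx = tuple(np.clip((p * n).astype(int), 0, n - 1))
            alpha = 1.0 - np.exp(-grid_density[idx] * dt)  # opacity of sample
            color += transmittance * alpha * grid_rgb[idx]
            transmittance *= 1.0 - alpha                   # light remaining
        return color

Because every per-sample quantity is a table lookup instead of an MLP evaluation, the per-ray cost drops from hundreds of network queries to simple memory reads, which is what makes real-time rates possible.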
The rise of IoT devices has led to a surge in data generation, necessitating efficient processing solutions. Traditional cloud-centric approaches face challenges such as latency, bandwidth, and privacy issues. Edge computing, a promising paradigm, enables data processing closer to the source, enhancing IoT-driven computer vision applications. This study integrates edge computing frameworks with a novel drosophila food search-tuned convolutional neural network (DFS-CNN) and computer vision algorithms for real-time tasks such as object detection and anomaly detection. Data were collected from the Labeled Faces in the Wild (LFW) video and image dataset and preprocessed using a bilateral filter to minimize noise while maintaining sharp edges; features were then extracted from the filtered data using a histogram of oriented gradients (HOG), and the DFS-CNN model used the HOG feature set for detection. The optimal DFS-CNN model was deployed on edge computing within an IoT-based architecture to compute and transport data, with real-time performance simulated using TensorFlow Lite. The proposed method is compared with other traditional algorithms. The proposed DFS-CNN model detects objects in the LFW images with 96% accuracy. The model was also used to address students' inactivity status during online exams, and the outcome of the suggested method was tested for data latency and real-time response in a comparative performance analysis.
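The preprocessing and feature-extraction pipeline described above can be sketched with OpenCV: bilateral filtering for edge-preserving denoising, then a HOG feature vector. The parameter values are illustrative defaults rather than the paper's tuned settings, and the DFS-CNN classifier itself is omitted.

    # Bilateral denoising followed by HOG feature extraction (OpenCV).
    import cv2
    import numpy as np

    img = (np.random.rand(128, 64, 3) * 255).astype(np.uint8)  # stand-in frame
    den = cv2.bilateralFilter(img, d=9, sigmaColor=75, sigmaSpace=75)
    gray = cv2.cvtColor(den, cv2.COLOR_BGR2GRAY)
    hog = cv2.HOGDescriptor()              # default 64x128 detection window
    features = hog.compute(gray)           # feature vector fed to a classifier
    print(features.size)                   # 3780 with the default parameters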