Search results - Inner Mongolia University Library

31st International Conference on Systems, Signals and Image Processing (IWSSIP)

Authors: Dwivedi, Vivek; Bhatnagar, Mansi; Tursunov, Javlon; Rozinaj, Gregor

Affiliations: Slovak Univ Technol Bratislava, Inst Multimedia Informet Commun Technol, Bratislava, Slovakia

ISBN: (print) 9798350391893; 9798350391886

In this paper, the development of an immersive virtual conference room designed to replicate the experience of a physical meeting environment has been presented. Utilizing Unity software, authors have created a virtual space equipped with several simple screens instead of holograms, each attached with individual cameras. Live camera feeds seamlessly integrate real-time participant interactions within virtual environments, enhancing telepresence and fostering immersive collaboration akin to face-to-face meetings. While the current implementation operates locally, our future focus is on enabling remote connectivity, facilitating collaboration among individuals across different geographic locations, and later a hologram-based virtual conference. This innovative approach aims to enhance remote collaboration experiences and bridge the gap between virtual interactions and physical presence.

Keywords: Virtual conference room; Unity software; Multi-screen setup; Live streaming; Immersive experience; Virtual communication; Distance interaction; Geographical connectivity

Realistic Avatar Control Through Video-Driven Animation for Augmented Reality

7th IFIP TC 12 International Conference on Computational Intelligence in Data Science, ICCIDS 2024

Authors: Nath, A. Gokul; Kumar, K. Suresh

Affiliations: Department of IT, Saveetha Engineering College, Chennai, India

ISBN: (print) 9783031699818

This paper proposes an efficient real-time framework to generate detailed avatar animations solely from monocular camera videos, avoiding costly motion capture equipment. It extracts 3D facial and body landmarks using Blaze Pose key points on the input video. A novel adaptor mapping function then transforms the 2D landmark topology into diverse 3D avatar rigs, enabling the animation of different characters. The unified approach produces high-fidelity lip sync, expressions, gestures, and full-body motions in real-time. Extensive experiments demonstrate the framework generates realistic avatar mimicry of humans in video for immersive real-time applications in VR/AR entertainment and animation. A novel adaptor mapping function transforms 2D landmarks extracted by Blaze Pose into diverse 3D avatar rigs, overcoming topology limitations. The unified approach produces detailed facial expressions, lip sync, gestures, and body motions in real-time, enabling the avatar to mimic humans in video. Extensive experiments validate that the framework generates realistic avatar animations comparable to motion capture, with applications in real-time VR/AR. Key innovations include the novel mapping function to transform 2D landmarks into 3D avatar motions, and the real-time performance to animate avatars that closely imitate people in monocular video. © IFIP International Federation for Information Processing 2024.

Keywords: Animation

ViTrack: Efficient Tracking on the Edge for Commodity Video Surveillance Systems

IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, 2022, vol. 33, no. 3, pp. 723-735

Authors: Cheng, Linsong; Wang, Jiliang; Li, Yinghui

Affiliations: Tsinghua Univ, Sch Software & BNrist, Beijing 100084, Peoples R China

Nowadays, video surveillance systems are widely deployed in various places, e.g., schools, parks, airports, and roads. However, existing video surveillance systems are far from fully utilized due to the high computation overhead of video processing. In this work, we present ViTrack, a framework for efficient multi-video tracking using computation resources on the edge for commodity video surveillance systems. At the heart of ViTrack lies a two-layer spatial/temporal compressed target detection method that significantly reduces the computation overhead by combining videos from multiple cameras. Further, ViTrack derives the video relationship and camera information even in the absence of camera location, direction, etc. To alleviate the impact of varying video quality and missing targets, ViTrack leverages a Markov model based approach to efficiently recover missing information and finally derive the complete trajectory. We implement ViTrack on a real deployed video surveillance system with 110 cameras. The experimental results demonstrate that ViTrack can provide efficient trajectory tracking with a processing time 45x less than the existing approach. For 110 video cameras, ViTrack can run on a Dell OptiPlex 390 computer to track given targets in almost real time. We believe ViTrack can enable practical video analysis for widely deployed commodity video surveillance systems.

Keywords: Cameras; Video surveillance; Trajectory; Image edge detection; Target tracking; Object detection; Trajectory tracking; Video tracking; Edge computing

Visual Cardiac Signal Classifiers: A Deep Learning Classification Approach for Heart Signal Estimation From Video

IEEE ACCESS, 2024, vol. 12, pp. 144377-144394

Authors: Moustafa, Mohamed; Farooq, Muhammad Ali; Elrasad, Amr; Lemley, Joseph; Corcoran, Peter

Affiliations: Univ Galway, Dept Elect & Elect Engn, Imaging Lab C3I, Galway H91 TK33, Ireland; FotoNation, Sensing Team, Galway H91 V0TX, Ireland

Heart rate is a crucial metric in health monitoring. Traditional computer vision solutions estimate cardiac signals by detecting physical manifestations of heartbeats, such as facial discoloration caused by blood oxygenation changes, from subject videos using regression methods. As continuous signals are more complex and expensive to de-noise, this study introduces an alternative approach, employing end-to-end classification models to remotely derive a discrete representation of cardiac signals from face videos. These visual cardiac signal classifiers are trained on discretized cardiac signals, a novel pre-processing method with limited precedent in health monitoring literature. Consequently, various methods to convert continuous cardiac signals into binary form are presented, and their impact on training is evaluated. An implementation of this approach, the temporal shift convolutional attention binary classifier, is presented using the regression-based convolutional attention network architecture. The classifier and a baseline regression model are trained and tested using publicly available and locally collected datasets designed for heart signal detection from face video. The model performance is then assessed based on the heart rate error from the extracted cardiac signals. Results show the proposed method outperforms the baseline on the UBFC-rPPG dataset, reducing cross-dataset root mean square error from 2.33 to 1.63 beats per minute. However, both models struggled to generalize to the PURE dataset, with root mean square errors of 12.40 and 16.29 beats per minute, respectively. Additionally, the proposed approach reduces the computational complexity of model output post-processing, enhancing its suitability for real-time applications and deployment on systems with restricted resources.
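The abstract above does not say which binarization methods the authors evaluated, so the following Python sketch is only an illustration under stated assumptions: it labels each sample by the sign of the signal's slope (a hypothetical rule chosen here for simplicity), recovers a heart rate by counting rising transitions, and computes the root mean square error metric the abstract reports. The function names and the toy signal are invented for this example.

```python
import math

def binarize_by_slope(signal):
    """Label each sample 1 while the signal is rising, 0 otherwise (assumed rule)."""
    bits = [0]  # the first sample has no predecessor, so it is labelled 0
    for prev, cur in zip(signal, signal[1:]):
        bits.append(1 if cur > prev else 0)
    return bits

def heart_rate_bpm(bits, fs):
    """Count 0 -> 1 transitions (one per beat) and scale to beats per minute."""
    beats = sum(1 for a, b in zip(bits, bits[1:]) if a == 0 and b == 1)
    return 60.0 * beats / (len(bits) / fs)

def rmse(estimates, references):
    """Root mean square error between estimated and reference heart rates."""
    squared = [(e - r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))

# Toy triangle wave: one "beat" every 4 samples, sampled at 4 Hz for 10 s.
toy_signal = [0, 1, 2, 1] * 10
toy_bits = binarize_by_slope(toy_signal)
print(heart_rate_bpm(toy_bits, fs=4))  # 60.0
```

Because the binary output needs only a transition count rather than peak detection on a noisy continuous waveform, a sketch like this hints at why post-processing can be cheaper for a classifier's output than for a regressor's.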

Keywords: Convolutional neural networks; Heart rate; Computer vision; Training; Estimation; Remote sensing; Visualization; Standards; Monitoring; Feature extraction; Deep learning; Biomedical signal processing; Photoplethysmography; Image classification; Remote monitoring

Beyond boundaries: Advancements in fire and smoke detection for indoor and outdoor surveillance feeds

ENGINEERING APPLICATIONS OF ARTIFICIAL INTELLIGENCE, 2025, vol. 142

Authors: Khan, Rafaqat Alam; Bajwa, Usama Ijaz; Raza, Rana Hammad; Anwar, Muhammad Waqas

Affiliations: COMSATS Univ Islamabad, Dept Comp Sci, Lahore Campus, 1-5 KM Def Rd, Raiwind Rd, Lahore, Pakistan; Lahore Garrison Univ, Dept Software Engn, DHA Phase 6, Sect C, Lahore, Pakistan; Natl Univ Sci & Technol NUST, Pakistan Navy Engn Coll PNEC, Elect & Power Engn Dept, Habib Ibrahim Rehmatullah Rd, Karachi, Pakistan; Govt Coll Univ, Dept Comp Sci, Lahore, Pakistan

This survey paper reviews the challenges and recent advancements in Artificial Intelligence (AI) video-based smoke and fire detection systems, with particular focus on both indoor and outdoor environments. The main problem addressed is the high false alarm rates (ranging from 3.4% to 29.49% across various systems) and the challenges posed by environmental variability, dataset scarcity, and the complexity of real-time detection. The paper critically examines key methodologies, including traditional approaches, deep learning techniques (with accuracy rates reaching up to 98.72% and false alarm rates reduced to as low as 0.61%), hybrid methods, and domain transfer-based tools, highlighting their evolution and current trends. This survey also provides an in-depth analysis of publicly available datasets and evaluation metrics, such as detection accuracy (ranging from 79.66% to 98.72%), robustness to dynamic environments, and real-time processing capabilities (with some systems achieving up to 333 frames per second (FPS)). By synthesizing insights from 33 papers published between 2013 and 2024, the survey not only summarizes the current state of the art but also identifies emerging trends, such as the increasing use of automatic feature learning and multi-fusion systems, which have demonstrated significant improvements in detection accuracy. The paper concludes by advocating for future research focused on improving system robustness and reducing false alarms through the integration of visible range cameras and traditional sensors, with the goal of achieving more accurate and reliable fire detection in surveillance systems.

Keywords: Fire detection; Smoke detection; Video image; Deep learning; Computer vision; Indoor and outdoor environment; Artificial intelligence

MobileNetV2-Transformer Hybrid Architecture for Effective Video Frame Embeddings Prediction

2nd IEEE International Conference on Electrical, Automation and Computer Engineering, ICEACE 2024

Authors: Zhang, Xiangyue

Affiliations: Georgia Institute of Technology, School of Computer Science, Atlanta, GA, United States

ISBN: (print) 9798350368208

In the realm of video processing and analysis, accurate prediction of future frames is crucial in applications like video compression, anomaly detection and augmented reality. This paper introduces a novel approach that leverages the powerful feature extraction capabilities of MobileNetV2 combined with the temporal sequence modeling strengths of Transformer models to predict the embeddings of following video frames. More specifically, the paper first employs the MobileNetV2 architecture to extract high-dimensional embeddings from individual frames, capitalizing on its efficiency and robustness in handling image data. These embeddings serve as a compact representation of the visual content of each frame. Then, a Transformer model is utilized to analyze the sequential nature of those embeddings, aiming to predict the embeddings of following frames. The Transformer's multi-head self-attention mechanism allows for an enhanced understanding of the temporal dynamics of the video, facilitating more accurate predictions compared with traditional methods. The paper later evaluates the method on test video datasets. The results indicate that the model needs further development such as incorporating a greater variety and quantity of experimental data. The method involved in the paper opens new avenues for real-time video analysis applications with the combination of Convolutional Neural Networks (CNNs) and Transformer models. The implications of this research extend to improve the performance of video-based systems across various domains, possibly indicating the potential of integrated architectures in advancing video analytic techniques. © 2024 IEEE.

Keywords: Video analysis

Color image denoising: a hybrid approach for mixed Gaussian and impulsive noise

Conference on Real-Time Image Processing and Deep Learning

Authors: Smolka, Bogdan; Kusnik, Damian; Smolka, Milena; Kawulok, Michal; Cyganek, Boguslaw

Affiliations: Silesian Tech Univ, Gliwice, Poland; AGH Univ Sci & Technol, Krakow, Poland

ISBN: (print) 9781510673878; 9781510673861

This paper tackles the problem of mixed Gaussian and impulsive noise suppression in color images. The proposed method comprises two essential steps. Firstly, we detect impulsive noise through an approach based on the concept of digital path exploring the local pixel neighborhood. Each pixel is assigned a cost of a path connecting the boundary of a local processing window with its center. When the central pixel exhibits a high value of the path with lowest cost, it is identified as an impulse. To achieve this, we use a thresholding procedure for detecting corrupted pixels. Analyzing the distribution of minimum path costs, we employ the k-means technique to classify pixels into three distinct categories: those nearly undistorted, those corrupted by Gaussian noise, and those affected by impulsive noise. Subsequently, we employ the Laplace interpolation technique to restore the impulsive pixels - a fast and effective method yielding satisfactory denoising results. In the second step, we address the residual Gaussian noise using the Non-Local Means method, which selectively considers pixels from the local window that have not been flagged as impulsive. The experimental results confirm that our proposed hybrid method consistently yields superior outcomes compared to state-of-the-art denoising techniques. Moreover, its computational complexity remains low, rendering it suitable for real-time applications.
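The impulse-detection step described above can be illustrated with a small Python sketch. It is not the authors' implementation: the abstract does not define the path cost, so this example assumes a grayscale window, 8-connected moves, and a step cost equal to the absolute intensity difference between neighbouring pixels. The names `min_path_cost` and `is_impulse` and the threshold value are hypothetical.

```python
import heapq

def min_path_cost(window):
    """Lowest cost of any 8-connected path from the window centre to its boundary.

    Assumed step cost: absolute intensity difference between adjacent pixels.
    The window is a square list of lists with odd side length.
    """
    n = len(window)
    c = n // 2
    dist = {(c, c): 0}
    heap = [(0, c, c)]
    while heap:
        d, r, col = heapq.heappop(heap)
        if d > dist.get((r, col), float("inf")):
            continue  # stale heap entry
        if r in (0, n - 1) or col in (0, n - 1):
            return d  # Dijkstra: the first boundary pixel popped is the cheapest
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if dr == 0 and dc == 0:
                    continue
                nr, nc = r + dr, col + dc  # in bounds: (r, col) is interior
                nd = d + abs(window[nr][nc] - window[r][col])
                if nd < dist.get((nr, nc), float("inf")):
                    dist[(nr, nc)] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return 0

def is_impulse(window, threshold=50):
    """Flag the centre pixel when even its cheapest escape path is expensive."""
    return min_path_cost(window) > threshold

flat = [[100] * 5 for _ in range(5)]
noisy = [row[:] for row in flat]
noisy[2][2] = 255  # isolated outlier at the centre
print(min_path_cost(flat), min_path_cost(noisy))  # 0 155
```

A pixel lying on an image edge still has a cheap path along its own side of the edge, so under these assumptions it is not flagged, whereas an isolated outlier has no cheap path in any direction.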

Keywords: Image enhancement; Color imaging; Mixed noise reduction; Impulsive noise

ACCURATE MULTISCALE SELECTIVE FUSION OF CT AND VIDEO IMAGES FOR REAL-TIME ENDOSCOPIC CAMERA 3D TRACKING IN ROBOTIC SURGERY

47th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)

Authors: Luo, Xiongbiao

Affiliations: Xiamen Univ, Dept Comp Sci, Xiamen 36105, Peoples R China

ISBN: (print) 9781665405409

Robotic surgery requires endoscope 3D tracking to navigate the endoscope in the body. This paper proposes an accurate multiscale selective fusion framework to register 2D endoscopic video images to 3D pre-operative CT data for endoscope 3D tracking. Current video-based 3D tracking depends on the performance of the 2D-3D fusion procedure that suffers from inaccurate similarity and image uncertainties. To boost video-based 3D tracking, we develop multiscale selective similarity characterization to enhance the 2D-3D fusion procedure. Such fusion not only uses image pyramids in multiple scales to represent endoscopic images but also selects specific structure information from these multiscale images to compute the similarity. We validated our method on clinical data. Our method can reduce the current tracking error from 8.9 to 5.4 mm without using any external trackers, while it provides surgeons with robust real-time surgical 3D tracking.

Keywords: Endoscope Tracking; Surgical Navigation; 2D-3D Registration; Image Similarity; Robotic Surgery

Video detection method of pointer instrument based on improved FCOS

2022 International Conference on Computer Graphics, Artificial Intelligence, and Data Processing, ICCAID 2022

Authors: Chen, Xi; Zhang, Pengfei; Xu, Wei; Chang, Yongjuan; Liu, Mingshuo; Zhao, Zhenyuan

Affiliations: State Grid Hebei Information & Telecommunication Branch, Shijiazhuang 050000, China; North China Electric Power University, Hebei, Baoding 071003, China

ISBN: (print) 9781510663350

To address the slow localization speed and poor real-time performance of pointer instrument detection algorithms on edge devices, this paper proposes a pointer instrument video detection method based on an improved FCOS. The algorithm builds on the FCOS model and uses the lightweight network ShuffleNetV2 to extract image features. A PAN structure strengthens the original feature fusion network, forming a bidirectional feature fusion network. An attention module with global context information is introduced to reduce information attenuation during feature fusion. Two parameters, pixel utilization (PUR) and relative time increase (RIT), are introduced to show more intuitively how images with different pixel counts affect detection. In experiments with an input resolution of 1280×1280, the proposed method reduces detection time by 91.60% relative to the baseline model at similar detection accuracy. © 2023 SPIE.

Keywords: Pixels

UnDIVE: Generalized Underwater Video Enhancement Using Generative Priors

2025 IEEE/CVF Winter Conference on Applications of Computer Vision, WACV 2025

Authors: Srinath, Suhas; Chandrasekar, Aditya; Jamadagni, Hemang; Soundararajan, Rajiv; Prathosh, A.P.

Affiliations: Indian Institute of Science, India; Qualcomm, United States; National Institute of Technology Karnataka, India

ISBN: (print) 9798331510831

With the rise of marine exploration, underwater imaging has gained significant attention as a research topic. Underwater video enhancement has become crucial for real-time computer vision tasks in marine exploration. However, most existing methods focus on enhancing individual frames and neglect video temporal dynamics, leading to visually poor enhancements. Furthermore, the lack of ground-truth references limits the use of abundant available underwater video data in many applications. To address these issues, we propose a two-stage framework for enhancing underwater videos. The first stage uses a denoising diffusion probabilistic model to learn a generative prior from unlabeled data, capturing robust and descriptive feature representations. In the second stage, this prior is incorporated into a physics-based image formulation for spatial enhancement, while also enforcing temporal consistency between video frames. Our method enables real-time and computationally-efficient processing of high-resolution underwater videos at lower resolutions, and offers efficient enhancement in the presence of diverse water-types. Extensive experiments on four datasets show that our approach generalizes well and outperforms existing enhancement methods. Our code is available at github.com/suhas-srinath/undive. © 2025 IEEE.

Keywords: Photointerpretation
