Versatile Video Coding (VVC) offers compression efficiency improvements of 50% and 75% compared to High Efficiency Video Coding (HEVC) and Advanced Video Coding (AVC), respectively. However, the VVC encoder software (...
ISBN:
(Print) 9781510673878; 9781510673861
Modern wafer inspection systems in Integrated Circuit (IC) manufacturing utilize deep neural networks. Training such networks requires a very large number of defective or faulty die patterns on a wafer, called wafer maps, yet the number of defective wafer maps on a production line is often limited. Generative models can therefore be used to synthesize realistic defective wafer maps in the quantities that training demands. This paper compares three generative models commonly used for image synthesis: the Generative Adversarial Network (GAN), the Variational Auto-Encoder (VAE), and CycleGAN, a variant of GAN. The comparison is carried out on the public domain wafer map dataset WM-811K. The quality of the generated wafer map images is evaluated with five metrics: peak signal-to-noise ratio (PSNR), structural similarity index measure (SSIM), inception score (IS), Fréchet inception distance (FID), and kernel inception distance (KID). Furthermore, the computational efficiency of these generative networks is examined in terms of their deployment in a real-time inspection system.
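The pairwise metrics among these are straightforward to compute with standard tooling. Below is a minimal sketch of scoring a synthesized wafer map against a real one with PSNR and SSIM using scikit-image; the real and fake arrays are random placeholders rather than actual WM-811K wafer maps, and a full evaluation would also include the set-level metrics IS, FID, and KID.

    import numpy as np
    from skimage.metrics import peak_signal_noise_ratio, structural_similarity

    # Placeholder images standing in for a real and a generated wafer map.
    real = np.random.randint(0, 256, (64, 64), dtype=np.uint8)
    fake = np.random.randint(0, 256, (64, 64), dtype=np.uint8)

    # Pairwise quality metrics; IS/FID/KID operate on whole image sets
    # and additionally require a pretrained feature network.
    psnr = peak_signal_noise_ratio(real, fake, data_range=255)
    ssim = structural_similarity(real, fake, data_range=255)
    print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")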
The detection of potentially illicit behaviors from recorded video footage is an emerging field of study in the domain of image processing and computer vision. Detecting suspicious activities is essential for maintain...
The field of image processing plays a vital role in driving technological changes that result in real-time applications. Image scaling is one such fundamental method that helps to resolve storage issues and al...
ISBN:
(Print) 9781728198354
Versatile Video Coding (VVC) allows for large compression efficiency gains over its predecessor, High Efficiency Video Coding (HEVC). The added efficiency comes at the cost of increased runtime complexity, especially for encoding, so it is highly relevant to explore all available runtime reduction options. This paper proposes a novel first pass for two-pass rate control in the all-intra configuration, using low-complexity video analysis and a Random Forest (RF)-based machine learning model to derive the data required for driving the second pass. The proposed method is validated using VVenC, an open and optimized VVC encoder. Compared to the default two-pass rate control algorithm in VVenC, the proposed method achieves around a 32% reduction in encoding time for the "faster" preset, while on average causing only a 2% BD-rate increase and achieving similar rate control accuracy.
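As an illustrative sketch of the machine-learning component only (the paper's actual features and prediction targets are not listed in this abstract): a Random Forest regressor maps cheap per-frame analysis features to the per-frame statistics that would otherwise come from a full first encoding pass. All feature and target choices below are hypothetical.

    import numpy as np
    from sklearn.ensemble import RandomForestRegressor

    rng = np.random.default_rng(0)
    # Hypothetical low-complexity features, e.g. variance, gradient energy, SAD.
    X = rng.random((500, 3))
    # Hypothetical target: per-frame bit estimates a first pass would produce.
    y = 1000 + 5000 * X[:, 0] + rng.normal(0, 50, 500)

    model = RandomForestRegressor(n_estimators=100, random_state=0)
    model.fit(X, y)
    # Predicted statistics then drive the second-pass rate control.
    print(model.predict(rng.random((1, 3))))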
Denoising videos in real time is critical in many applications, including robotics and medicine, where varying light conditions and miniaturized sensors and optics can substantially compromise image quality. This work proposes the first video denoising method based on a deep neural network that achieves state-of-the-art performance on dynamic scenes while running in real time on VGA video resolution with no frame latency. The backbone of our method is a novel, remarkably simple temporal network of cascaded blocks with forward block output propagation. We train our architecture with short, long, and global residual connections by minimizing the restoration loss of pairs of frames, leading to more effective training across noise levels, and the resulting model is robust to heavy noise following Poisson-Gaussian statistics. Because the algorithm requires no future frames to denoise the current frame, its latency is considerably reduced. Evaluated on RAW and RGB data, the visual and quantitative results show that our algorithm achieves state-of-the-art performance among efficient algorithms, with two-fold to two-orders-of-magnitude speed-ups on standard benchmarks for video denoising.
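The abstract does not spell out the architecture, but the idea of cascaded blocks with forward block output propagation can be sketched as follows: each block receives the current input together with its own output from the previous time step, so no future frames are ever needed. This is a toy PyTorch sketch under those assumptions, not the authors' network.

    import torch
    import torch.nn as nn

    class DenoiseBlock(nn.Module):
        def __init__(self, ch=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Conv2d(6, ch, 3, padding=1), nn.ReLU(),
                nn.Conv2d(ch, 3, 3, padding=1),
            )

        def forward(self, x, prev_out):
            # Short residual connection around the block.
            return x + self.net(torch.cat([x, prev_out], dim=1))

    blocks = nn.ModuleList([DenoiseBlock() for _ in range(2)])
    frames = torch.randn(8, 3, 64, 64)                    # noisy clip (T, C, H, W)
    states = [torch.zeros(1, 3, 64, 64) for _ in blocks]  # zero state at t = 0
    with torch.no_grad():
        for t in range(frames.shape[0]):
            x = frames[t:t + 1]
            for i, blk in enumerate(blocks):
                x = blk(x, states[i])
                states[i] = x  # each block's output propagates forward to t + 1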
ISBN:
(Print) 9798400704123
Volumetric capture is an important topic in eXtended Reality (XR) as it enables the integration of realistic three-dimensional content into virtual scenarios and immersive applications. Certain systems are even capable of delivering these volumetric captures live and in real time, opening the door to interactive use cases such as immersive videoconferencing. One example of such systems is FVV Live, a Free Viewpoint Video (FVV) application capable of working in real time with low delay. Current breakthroughs in Artificial Intelligence (AI) in general, and deep learning in particular, report great success when applied to the computer vision tasks involved in volumetric capture, helping to overcome the quality and bandwidth restrictions that these systems often face. Despite their promising results, state-of-the-art approaches still come with the disadvantage of requiring large processing power and time. This project aims to advance the volumetric capture state of the art by applying the aforementioned deep learning techniques, optimizing the models to work in real time while still delivering high quality. The technology developed will be validated by integrating it into immersive video communication systems such as FVV Live, in order to overcome their main restrictions and improve the quality delivered to the end user.
ISBN:
(Print) 9798350353006
Recent Large Language Models (LLMs) have been enhanced with vision capabilities, enabling them to comprehend images, videos, and interleaved vision-language content. However, the learning methods of these large multimodal models (LMMs) typically treat videos as predetermined clips, rendering them less effective and efficient at handling streaming video inputs. In this paper, we propose a novel Learning-In-Video-Stream (LIVE) framework, which enables temporally aligned, long-context, and real-time dialogue within a continuous video stream. Our LIVE framework comprises comprehensive approaches to achieve video streaming dialogue, encompassing: (1) a training objective designed to perform language modeling for continuous streaming inputs, (2) a data generation scheme that converts offline temporal annotations into a streaming dialogue format, and (3) an optimized inference pipeline to speed up interactive chat in real-world video streams. With our LIVE framework, we develop a simplified model called videoLLM-online and demonstrate its significant advantages in processing streaming videos. For instance, our videoLLM-online-7B model can operate at over 10 FPS on an A100 GPU for a 5-minute video clip from Ego4D narration. Moreover, videoLLM-online also showcases state-of-the-art performance on public offline video benchmarks, such as recognition, captioning, and forecasting. The code, model, data, and demo have been made available at ***/videollm-online.
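Item (1), the streaming training objective, can be illustrated with a toy snippet: conceptually, every frame time step is trained to emit a special "silent" token unless the temporally aligned annotation says the assistant should be speaking, so the model learns when to talk as well as what to say. The token IDs and logit shapes below are invented for illustration; this is not the LIVE code.

    import torch
    import torch.nn.functional as F

    vocab_size, silent_id = 100, 0
    logits = torch.randn(10, vocab_size)        # per-step logits from an LMM backbone
    targets = torch.full((10,), silent_id)      # default: stay silent on frame steps
    targets[6:] = torch.tensor([42, 17, 3, 9])  # annotated reply tokens start at t = 6
    loss = F.cross_entropy(logits, targets)     # one language-modeling loss per step
    print(loss.item())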
ISBN:
(Print) 9798350310856
The trend of recent years is the continuous development of the Internet of Things (IoT). Among such things, a significant share is occupied by visual sensors and video cameras that generate large amounts of data. In turn, real-time video analytics inevitably demands significant storage resources, transmission throughput, and processing power. Thus, the combination of smart cameras with the Cloud/Edge computing paradigm and IoT architectures forms the next generation of video surveillance systems, called the "Internet of Video Things" (IoVT). In this paper, a new IoVT platform is developed that, in addition to harmoniously combining Edge/Cloud computing, uses SDN to overcome challenges such as flexible management, control, and maintenance of IoVT devices. In particular, within the proposed IoVT platform, an algorithm for the dynamic selection of Edge or Cloud computing is implemented using an SDN controller to provide effective video analytics in real time. This algorithm considers parameters such as the priority of computational tasks, the number of video streams, and the image quality, and can adapt to a specific application through software configuration of the IoVT platform. We also demonstrate the effectiveness of the proposed solutions on real equipment and discuss several promising areas of application for the developed platform.
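As a purely hypothetical illustration of such a placement decision (the paper's actual rules and thresholds are not given in this abstract), the SDN controller could weigh the three named parameters like this:

    # Illustrative thresholds only, not taken from the paper.
    def select_compute(task_priority: int, n_streams: int, quality: str) -> str:
        """Return 'edge' for latency-critical light loads, else 'cloud'."""
        heavy = n_streams > 8 or quality == "4k"
        if task_priority >= 2 and not heavy:
            return "edge"   # low latency, load fits edge resources
        return "cloud"      # offload heavy or low-priority analytics

    print(select_compute(task_priority=3, n_streams=4, quality="1080p"))  # edge
    print(select_compute(task_priority=1, n_streams=12, quality="4k"))    # cloud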
Visually impaired or blind people need guidance in order to avoid collision risks with outdoor obstacles. Recently, technology has been proving its presence in all aspects of human life, and new devices provide assistance to humans on a daily basis. However, due to real-time dynamics or a lack of specialized knowledge, object detection confronts a reliability difficulty. To overcome this challenge, YOLO Glass, a video-based smart object detection model, has been proposed to help visually impaired persons navigate effectively in indoor and outdoor environments. Initially, the captured video is converted into key frames and pre-processed using a Correlation Fusion-based disparity approach. The pre-processed images are augmented to prevent overfitting of the trained model. The proposed method uses an obstacle detection system based on a Squeeze and Attendant Block YOLO Network model (SAB-YOLO). The system assists visually impaired users in detecting multiple objects and their locations relative to their line of sight, alerting them with audio messages delivered via headphones, and thereby helps blind and visually impaired people manage their daily tasks and navigate their surroundings. The experimental results show that the proposed system achieves an accuracy of 98.99%, proving that it can accurately identify objects; its detection accuracy is 5.15%, 7.15%, and 9.7% better than that of the existing YOLO v6, YOLO v5, and YOLO v3, respectively.
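The overall pipeline shape (capture frames, run the detector, speak an alert) can be sketched as below. The detect and announce functions are hypothetical stand-ins for SAB-YOLO inference and the headphone text-to-speech output, respectively; none of this is the paper's implementation.

    import cv2

    def detect(frame):
        # Placeholder for SAB-YOLO inference; returns (label, confidence) pairs.
        return [("chair", 0.91)]

    def announce(label):
        # Stand-in for the text-to-speech alert sent to the headphones.
        print(f"[audio] {label} ahead")

    cap = cv2.VideoCapture(0)   # any video source works here
    for _ in range(30):         # process a short burst of frames
        ok, frame = cap.read()
        if not ok:
            break
        for label, conf in detect(frame):
            if conf > 0.5:
                announce(label)
    cap.release()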