This paper proposes a hybrid spatial error concealment based on human faces and image edges. Although human faces in video sequences are of the greatest interest, concealing errors in faces remains difficult when face information is lost. Furthermore, bit errors in regular edge shapes in the background degrade visual quality more severely than errors in irregular image regions. To overcome these challenges, the proposed algorithm first classifies a lost block as foreground, boundary, or background using face detection, and then adaptively selects bilinear interpolation (BI) and horizontal symmetrical interpolation (HSI) for the foreground, multi-direction filling interpolation (MDFI) for the boundary, and block division-based interpolation (BDI) for the background. HSI, MDFI, Bezier curve-based block division of foreground and background, and BDI for the background are novel error concealments proposed in this paper. Our tests reveal that the proposed error concealment outperforms previous works, including separate, adaptive, and hybrid concealments, in terms of visual quality, PSNR, and runtime. The proposed algorithm may serve as an effective error-resilience tool for real-time video applications such as teleconferencing, mobile teleconferencing, and wireless multimedia camera networks, where power consumption should be low.
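The bilinear interpolation (BI) mode above can be sketched as follows. This is a generic BI concealment that recovers each lost pixel from the four nearest correctly received boundary pixels with inverse-distance weights; the paper's exact variant may differ.

```python
import numpy as np

def bilinear_conceal(img, top, left, size):
    """Conceal a lost size x size block by bilinear interpolation
    from the four surrounding boundary pixels (generic BI scheme)."""
    out = img.copy()
    for i in range(size):
        for j in range(size):
            r, c = top + i, left + j
            # intact boundary pixels above, below, left of, right of the block
            vals = np.array([img[top - 1, c], img[top + size, c],
                             img[r, left - 1], img[r, left + size]])
            # inverse-distance weights to each boundary pixel
            w = 1.0 / np.array([i + 1, size - i, j + 1, size - j], dtype=float)
            out[r, c] = (w * vals).sum() / w.sum()
    return out
```

For content that varies linearly across the block (e.g. a smooth gradient), this scheme reconstructs the lost pixels exactly; on textured content it produces a smooth, blur-like fill.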
ISBN:
(Print) 9789464593617; 9798331519773
Appearance-based gaze estimation methods based on deep learning perform significantly better when they have been appropriately calibrated at a per-participant (subject) level. However, their calibration process typically includes neural model retraining with ground-truth subject gaze data, which is difficult to obtain, leaving much room for error and consuming a non-negligible portion of the recording time. To address this issue, we propose a novel train-free calibration scheme, comprising a novel neural architecture and a training process that learns to operate with implicit calibration by design. More specifically, the input image representation is refined by extracting information about the visual similarity between the input image and the proposed calibration anchors, i.e., representative images of subjects linked with rough gaze directions, using an attention mechanism. During deployment, the model is adapted to new subjects by enriching the input image representation with its similarity to a set of representative test-subject images, without model retraining or ground-truth gaze data. Our experiments on publicly available eye-tracking datasets show that the proposed method provides an approximately 10-15% reduction in angular error with respect to baseline solutions.
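The anchor-similarity attention idea can be illustrated with a minimal sketch: given an input embedding and a set of anchor embeddings with rough gaze directions, compute softmax attention weights from cosine similarity and return a similarity-weighted gaze prior. The cosine/softmax form, the temperature `tau`, and all names here are assumptions for illustration; the paper's module is a learned attention layer inside the network.

```python
import numpy as np

def anchor_attention(x, anchors, anchor_gazes, tau=0.1):
    """Similarity-weighted gaze prior from calibration anchors.
    x: (d,) input image embedding
    anchors: (k, d) anchor embeddings for the test subject
    anchor_gazes: (k, 2) rough gaze directions (e.g. yaw, pitch)"""
    # cosine similarity between the input and each anchor
    sim = anchors @ x / (np.linalg.norm(anchors, axis=1) * np.linalg.norm(x) + 1e-8)
    w = np.exp(sim / tau)
    w /= w.sum()              # softmax attention weights
    return w @ anchor_gazes   # attention-pooled gaze prior
```

New subjects are handled by swapping in their anchor set; no retraining or ground-truth gaze is needed, which mirrors the train-free deployment described above.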
Object detection plays an important role in various mobile robot tasks. However, directly applying existing detectors to videos from a mobile robot causes a sharp accuracy decline, because such videos introduce extra difficulties for accurate detection. This paper proposes a viewpoint-based memory mechanism to handle this performance deterioration and improve detection accuracy on such videos in real time. The mechanism organizes previous results from multiple viewpoints of target objects as prior knowledge to enhance detection accuracy for succeeding frames, and it is designed as an extension module for an existing image detector. In experiments, we collect a test dataset from an indoor mobile robot and compare the performance of several standalone image detectors with the same detectors extended by the proposed module. The results show that the module achieves a 20.7% object localization rate margin on average at a cost of 18.1 ms, and that the mechanism has a positive impact on various existing detectors. The results indicate that the proposed method achieves a good accuracy margin, has acceptable time cost, and offers a degree of universal applicability.
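The viewpoint-indexed memory can be sketched as a store that bins past detections by coarse robot heading and replays them as priors for later frames from a similar viewpoint. The binning granularity and replay policy here are assumptions; the paper's module is attached to a neural detector rather than implemented as a plain dictionary.

```python
import math

class ViewpointMemory:
    """Toy viewpoint-indexed memory of past detection results."""
    def __init__(self, bins=8):
        self.bins = bins
        self.mem = {}  # bin index -> list of past detections

    def _bin(self, heading_rad):
        # quantize heading into one of `bins` angular sectors
        return int((heading_rad % (2 * math.pi)) / (2 * math.pi) * self.bins) % self.bins

    def store(self, heading_rad, detections):
        self.mem.setdefault(self._bin(heading_rad), []).extend(detections)

    def priors(self, heading_rad):
        # prior knowledge for a frame captured from a similar viewpoint
        return list(self.mem.get(self._bin(heading_rad), []))
```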
ISBN:
(Print) 9798350365474
Stereoscopic video conferencing is still challenging due to the need to compress stereo RGB-D video in real time. Though hardware implementations of standard video codecs such as H.264/AVC and HEVC are widely available, they are not designed for stereoscopic videos and suffer from reduced quality and performance. Specific multiview or 3D extensions of these codecs are complex and lack efficient implementations. In this paper, we propose a new approach to upgrade a 2D video codec to support stereo RGB-D video compression by wrapping it with a neural pre- and post-processor pair. The neural networks are trained end-to-end with an image codec proxy and are shown to work with a more sophisticated video codec. We also propose a geometry-aware loss function to improve rendering quality. We train the neural pre- and post-processors on a synthetic 4D people dataset and evaluate them on both synthetic and real-captured stereo RGB-D videos. Experimental results show that the neural networks generalize well to unseen data and work out of the box with various video codecs. Our approach saves about 30% bit-rate compared to a conventional video coding scheme and MV-HEVC at the same level of rendering quality from a novel view, without the need for a task-specific hardware upgrade.
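A geometry-aware reconstruction loss can take many forms; one minimal sketch weights the per-pixel error by inverse depth, so that nearby content, which projects large in a rendered novel view, counts more. This weighting scheme is an assumption for illustration only; the paper's actual loss is tied to novel-view rendering quality.

```python
import numpy as np

def geometry_aware_loss(pred, target, depth, eps=1e-3):
    """Per-pixel L1 error weighted by inverse depth (toy sketch).
    pred, target: arrays of the same shape; depth: matching depth map."""
    w = 1.0 / (depth + eps)   # near pixels get larger weight
    w = w / w.mean()          # normalize so uniform depth reduces to plain L1
    return float((w * np.abs(pred - target)).mean())
```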
ISBN:
(Print) 9798400706899
Can we modify existing web-based computer graphics content through JavaScript injection? We study how to hijack the WebGL context of any external website to perform GPU-accelerated image processing and scene modification. This allows client-side modification of 2D and 3D content without access to the web server. We demonstrate how JavaScript can overload an existing WebGL context and present examples such as color replacement, edge detection, image filtering, and complete visual transformations of external websites, as well as vertex and geometry processing and manipulation. We discuss the potential of such an approach and present open-source software for real-time processing using a bookmarklet implementation.
With the development of artificial intelligence technology, urban traffic management has become increasingly convenient, and the task of illegal parking detection has become a major research focus. Currently, most ill...
ISBN:
(Print) 9798400711312
We introduce Lumiere - a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion - a pivotal challenge in video synthesis. To this end, we introduce a Space-time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution - an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
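The key architectural idea above, down-sampling a clip jointly in space and time, can be illustrated with a toy average-pooling sketch over a `(T, H, W, C)` tensor. Lumiere uses learned convolutional down- and up-sampling inside a U-Net; the pooling form and parameter names here are assumptions.

```python
import numpy as np

def spacetime_downsample(video, s=2, t=2):
    """Average-pool a (T, H, W, C) clip by factor `s` in space
    and `t` in time (toy sketch of one Space-Time U-Net level)."""
    T, H, W, C = video.shape
    # crop so every dimension divides evenly
    v = video[: T - T % t, : H - H % s, : W - W % s]
    v = v.reshape(T // t, t, H // s, s, W // s, s, C)
    return v.mean(axis=(1, 3, 5))
```

Processing the whole clip at a coarse space-time scale, rather than synthesizing sparse keyframes and temporally super-resolving them, is what lets the model reason about the full temporal duration in a single pass.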
Although underwater robots can replace humans to explore the ocean which is rich in resources but fraught with unknown risks, there are phenomena such as monotonous colors, complex backgrounds and uneven illuminatio...
The advancement of drone services utilizing 5G technology has led to an increasing need for displaying high-resolution video and associated geometric information on web-based maps in real-time. Traditionally, Full HD ...
ISBN:
(Print) 9798350349740; 9798350349757
FPGAs are increasingly used in recent real-time implementations of a variety of image processing applications such as medical imaging. In this paper, we present a parallel hardware architecture through co-simulation using Xilinx System Generator (XSG), which is integrated with MATLAB/Simulink; the synthesis tool used is Xilinx Vivado. We propose a new strategy for FPGA memory management based on an already-implemented edge detection design. The goal is to optimize memory use by minimizing the consumption of slice registers and slice LUTs. The technique was successfully verified on the images obtained, and the resource utilization of the proposed architectures shows that the new architectures use fewer resources than the existing architecture.
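A software reference for the edge-detection design helps validate an FPGA co-simulation against a known result. The abstract does not name the exact kernel, so a Sobel gradient-magnitude detector with a fixed threshold is assumed here.

```python
import numpy as np

def sobel_edges(img, thresh=1.0):
    """Sobel gradient-magnitude edge map with a fixed threshold
    (software reference; the paper's exact kernel is not stated)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = img.shape
    out = np.zeros((H, W), dtype=bool)
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            win = img[r - 1:r + 2, c - 1:c + 2]
            gx = (win * kx).sum()  # horizontal gradient
            gy = (win * ky).sum()  # vertical gradient
            out[r, c] = np.hypot(gx, gy) > thresh
    return out
```

In an XSG flow, the same image would be streamed through the hardware design in Simulink and the resulting edge map compared pixel-by-pixel against a reference like this one.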