This paper proposes a hybrid spatial error concealment based on human faces and image edges. Although human faces in video sequences are of the greatest interest, concealing errors in faces remains difficult when face information is lost. Furthermore, bit errors in regular edge shapes in the background degrade visual quality more severely than errors in irregular image regions. To overcome these challenges, the proposed algorithm first classifies a lost block as foreground, boundary, or background using face detection, and then adaptively selects bilinear interpolation (BI) and horizontal symmetrical interpolation (HSI) for the foreground, multi-direction filling interpolation (MDFI) for the boundary, and block division-based interpolation (BDI) for the background. HSI, MDFI, Bezier curve-based block division of foreground and background, and BDI for the background are novel error concealments proposed in this paper. Our tests reveal that the proposed error concealment outperforms previous works, including separate, adaptive, and hybrid concealments, in terms of visual quality, PSNR, and runtime. The proposed algorithm may serve as an effective error-resilience tool for real-time video applications such as teleconferencing, mobile teleconferencing, and wireless multimedia camera networks, where power consumption should be low.
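The bilinear interpolation (BI) mode above can be sketched as follows. This is a generic BI concealment that recovers each lost pixel from the four nearest correctly received boundary pixels with inverse-distance weights; the paper's exact variant may differ.

```python
import numpy as np

def bilinear_conceal(img, top, left, size):
    """Conceal a lost size x size block by bilinear interpolation
    from the four surrounding boundary pixels (generic BI scheme)."""
    out = img.copy()
    for i in range(size):
        for j in range(size):
            r, c = top + i, left + j
            # intact boundary pixels above, below, left of, right of the block
            vals = np.array([img[top - 1, c], img[top + size, c],
                             img[r, left - 1], img[r, left + size]])
            # inverse-distance weights to each boundary pixel
            w = 1.0 / np.array([i + 1, size - i, j + 1, size - j], dtype=float)
            out[r, c] = (w * vals).sum() / w.sum()
    return out
```

For content that varies linearly across the block (e.g. a smooth gradient), this scheme reconstructs the lost pixels exactly; on textured content it produces a smooth, blur-like fill.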
ISBN:
(Print) 9789464593617; 9798331519773
Appearance-based gaze estimation methods based on deep learning perform significantly better when they have been appropriately calibrated at a per-participant (subject) level. However, their calibration process typically includes neural model retraining with ground-truth subject gaze data, which is difficult to obtain, leaving much room for error and consuming a non-negligible portion of the recording time. To address this issue, we propose a novel train-free calibration scheme, comprising a novel neural architecture and a training process that learns to operate with implicit calibration by design. More specifically, the input image representation is refined by extracting information about the visual similarity between the input image and the proposed calibration anchors, i.e., representative images of subjects linked with rough gaze directions, using an attention mechanism. During deployment, the model is adapted to new subjects by enriching the input image representation with its similarity to a set of representative test-subject images, without model retraining or ground-truth gaze data. Our experiments on publicly available eye-tracking datasets show that the proposed method provides an approximately 10-15% reduction in angular error with respect to baseline solutions.
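The anchor-similarity attention idea can be illustrated with a minimal sketch: given an input embedding and a set of anchor embeddings with rough gaze directions, compute softmax attention weights from cosine similarity and return a similarity-weighted gaze prior. The cosine/softmax form, the temperature `tau`, and all names here are assumptions for illustration; the paper's module is a learned attention layer inside the network.

```python
import numpy as np

def anchor_attention(x, anchors, anchor_gazes, tau=0.1):
    """Similarity-weighted gaze prior from calibration anchors.
    x: (d,) input image embedding
    anchors: (k, d) anchor embeddings for the test subject
    anchor_gazes: (k, 2) rough gaze directions (e.g. yaw, pitch)"""
    # cosine similarity between the input and each anchor
    sim = anchors @ x / (np.linalg.norm(anchors, axis=1) * np.linalg.norm(x) + 1e-8)
    w = np.exp(sim / tau)
    w /= w.sum()              # softmax attention weights
    return w @ anchor_gazes   # attention-pooled gaze prior
```

New subjects are handled by swapping in their anchor set; no retraining or ground-truth gaze is needed, which mirrors the train-free deployment described above.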
Object detection plays an important role in various mobile robot tasks. However, directly applying existing detectors to videos from a mobile robot causes a sharp accuracy decline, because such videos introduce extra difficulties for accurate detection. This paper proposes a viewpoint-based memory mechanism to handle this performance deterioration and improve detection accuracy on such videos in real time. The mechanism organizes previous results from multiple viewpoints of target objects as prior knowledge to enhance detection accuracy for succeeding frames, and it is designed as an extension module for an existing image detector. In experiments, we collect a test dataset from an indoor mobile robot and compare the performance of several standalone image detectors with the same detectors extended by the proposed module. The results show that the module achieves a 20.7% object localization rate margin on average at a cost of 18.1 ms, and that the mechanism has a positive impact on various existing detectors. The results indicate that the proposed method achieves a good accuracy margin, has acceptable time cost, and offers a degree of universal applicability.
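The viewpoint-indexed memory can be sketched as a store that bins past detections by coarse robot heading and replays them as priors for later frames from a similar viewpoint. The binning granularity and replay policy here are assumptions; the paper's module is attached to a neural detector rather than implemented as a plain dictionary.

```python
import math

class ViewpointMemory:
    """Toy viewpoint-indexed memory of past detection results."""
    def __init__(self, bins=8):
        self.bins = bins
        self.mem = {}  # bin index -> list of past detections

    def _bin(self, heading_rad):
        # quantize heading into one of `bins` angular sectors
        return int((heading_rad % (2 * math.pi)) / (2 * math.pi) * self.bins) % self.bins

    def store(self, heading_rad, detections):
        self.mem.setdefault(self._bin(heading_rad), []).extend(detections)

    def priors(self, heading_rad):
        # prior knowledge for a frame captured from a similar viewpoint
        return list(self.mem.get(self._bin(heading_rad), []))
```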
ISBN:
(Print) 9798350365474
Stereoscopic video conferencing is still challenging due to the need to compress stereo RGB-D video in real time. Though hardware implementations of standard video codecs such as H.264/AVC and HEVC are widely available, they are not designed for stereoscopic videos and suffer from reduced quality and performance. Specific multiview or 3D extensions of these codecs are complex and lack efficient implementations. In this paper, we propose a new approach to upgrade a 2D video codec to support stereo RGB-D video compression by wrapping it with a neural pre- and post-processor pair. The neural networks are trained end-to-end with an image codec proxy and are shown to work with a more sophisticated video codec. We also propose a geometry-aware loss function to improve rendering quality. We train the neural pre- and post-processors on a synthetic 4D people dataset and evaluate them on both synthetic and real-captured stereo RGB-D videos. Experimental results show that the neural networks generalize well to unseen data and work out of the box with various video codecs. Our approach saves about 30% bit-rate compared to a conventional video coding scheme and MV-HEVC at the same level of rendering quality from a novel view, without the need for a task-specific hardware upgrade.
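A geometry-aware reconstruction loss can take many forms; one minimal sketch weights the per-pixel error by inverse depth, so that nearby content, which projects large in a rendered novel view, counts more. This weighting scheme is an assumption for illustration only; the paper's actual loss is tied to novel-view rendering quality.

```python
import numpy as np

def geometry_aware_loss(pred, target, depth, eps=1e-3):
    """Per-pixel L1 error weighted by inverse depth (toy sketch).
    pred, target: arrays of the same shape; depth: matching depth map."""
    w = 1.0 / (depth + eps)   # near pixels get larger weight
    w = w / w.mean()          # normalize so uniform depth reduces to plain L1
    return float((w * np.abs(pred - target)).mean())
```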
ISBN:
(Print) 9798400706899
Can we modify existing web-based computer graphics content through JavaScript injection? We study how to hijack the WebGL context of any external website to perform GPU-accelerated image processing and scene modification. This allows client-side modification of 2D and 3D content without access to the web server. We demonstrate how JavaScript can overload an existing WebGL context and present examples such as color replacement, edge detection, image filtering, and complete visual transformations of external websites, as well as vertex and geometry processing and manipulation. We discuss the potential of such an approach and present open-source software for real-time processing using a bookmarklet implementation.
With the development of artificial intelligence technology, urban traffic management has become increasingly convenient, and the task of illegal parking detection has become a major research focus. Currently, most ill...
ISBN:
(Print) 9798400711312
We introduce Lumiere - a text-to-video diffusion model designed for synthesizing videos that portray realistic, diverse and coherent motion - a pivotal challenge in video synthesis. To this end, we introduce a Space-time U-Net architecture that generates the entire temporal duration of the video at once, through a single pass in the model. This is in contrast to existing video models which synthesize distant keyframes followed by temporal super-resolution - an approach that inherently makes global temporal consistency difficult to achieve. By deploying both spatial and (importantly) temporal down- and up-sampling and leveraging a pre-trained text-to-image diffusion model, our model learns to directly generate a full-frame-rate, low-resolution video by processing it in multiple space-time scales. We demonstrate state-of-the-art text-to-video generation results, and show that our design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylized generation.
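The key architectural idea above, down-sampling a clip jointly in space and time, can be illustrated with a toy average-pooling sketch over a `(T, H, W, C)` tensor. Lumiere uses learned convolutional down- and up-sampling inside a U-Net; the pooling form and parameter names here are assumptions.

```python
import numpy as np

def spacetime_downsample(video, s=2, t=2):
    """Average-pool a (T, H, W, C) clip by factor `s` in space
    and `t` in time (toy sketch of one Space-Time U-Net level)."""
    T, H, W, C = video.shape
    # crop so every dimension divides evenly
    v = video[: T - T % t, : H - H % s, : W - W % s]
    v = v.reshape(T // t, t, H // s, s, W // s, s, C)
    return v.mean(axis=(1, 3, 5))
```

Processing the whole clip at a coarse space-time scale, rather than synthesizing sparse keyframes and temporally super-resolving them, is what lets the model reason about the full temporal duration in a single pass.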
Although underwater robots can replace humans to explore the ocean which is rich in resources but fraught with unknown risks, there are phenomena such as monotonous colors, complex backgrounds and uneven illuminatio...
The advancement of drone services utilizing 5G technology has led to an increasing need for displaying high-resolution video and associated geometric information on web-based maps in real-time. Traditionally, Full HD ...
ISBN:
(Print) 9798350349740; 9798350349757
FPGAs are increasingly used in recent real-time implementations of a variety of image processing applications such as medical imaging. In this paper, we present a parallel hardware architecture through co-simulation using Xilinx System Generator (XSG), which is integrated with MATLAB/Simulink; the synthesis tool used is Xilinx Vivado. We propose a new strategy for FPGA memory management based on an already-implemented edge detection design. The goal is to optimize memory use by minimizing the consumption of slice registers and slice LUTs. The technique was successfully verified on the images obtained, and the resource utilization of the proposed architectures shows that the new architectures use fewer resources than the existing architecture.
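A software reference for the edge-detection design helps validate an FPGA co-simulation against a known result. The abstract does not name the exact kernel, so a Sobel gradient-magnitude detector with a fixed threshold is assumed here.

```python
import numpy as np

def sobel_edges(img, thresh=1.0):
    """Sobel gradient-magnitude edge map with a fixed threshold
    (software reference; the paper's exact kernel is not stated)."""
    kx = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ky = kx.T
    H, W = img.shape
    out = np.zeros((H, W), dtype=bool)
    for r in range(1, H - 1):
        for c in range(1, W - 1):
            win = img[r - 1:r + 2, c - 1:c + 2]
            gx = (win * kx).sum()  # horizontal gradient
            gy = (win * ky).sum()  # vertical gradient
            out[r, c] = np.hypot(gx, gy) > thresh
    return out
```

In an XSG flow, the same image would be streamed through the hardware design in Simulink and the resulting edge map compared pixel-by-pixel against a reference like this one.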